Feeds

Real Python: Quiz: Interacting With Python

Planet Python - Thu, 2024-11-21 07:00

In this quiz, you’ll test your understanding of the different ways you can interact with Python.

By working through this quiz, you’ll revisit key concepts related to Python interaction in interactive mode using the Read-Eval-Print Loop (REPL), through Python script files, and within Integrated Development Environments (IDEs) and code editors.

You’ll also test your knowledge of some other options that may be useful, such as Jupyter Notebooks.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Django Weblog: Announcing the 6.x Steering Council elections 🚀

Planet Python - Thu, 2024-11-21 03:00

Today, we’re announcing early elections for the Django Software Foundation Steering Council over the 6.x Django release cycle. Elected members will be on the Steering Council for two years, from the end of those elections in December, until April 2027 with the scheduled start of the Django 7.x release cycle.

Why we have early elections

The DSF Board of Directors previously shared Django’s technical governance challenges, and opportunities. Now that the Board elections are completed, we’re ready to proceed with this other, separate election, following existing processes. We will want a Steering Council who strives  to meet the group’s intended goals:

  1. To safeguard big decisions that affect Django projects at a fundamental level.

  2. To help shepherd the project’s future direction.

We expect the new Steering Council will take on those known challenges, resolve those questions of technical leadership, and update Django’s technical governance. They will have the full support of the Board of Directors to address this threat to Django’s future. And the Board will also be more decisive in intervening, should similar issues keep arising.

Elections timeline

Here are the important dates of the Steering Council elections, subject to change:

  • 2024-11-21: announcement & opening of voter registration
  • 2024-11-26 23:59 AoE (Anywhere on Earth): voter registration closes
  • 2024-11-27: opening of Steering Council candidates registration
  • 2024-12-04 23:59 AoE: candidates registration closes
  • (one week gap per defined processes)
  • 2024-12-10: voting starts
  • 2024–12-17 23:59 AoE: voting ends
  • 2024-12-18: results ratification by DSF Board of Directors
  • 2024-12-19: results announcement
Voter registration

If you’re an Individual Member of the Django Software Foundation, you’re already registered to vote. There’s nothing further for you to do. If you aren’t, consider nominating yourself for individual membership. Once approved, you will be registered to vote for this election.

Alternatively, for members of our community who want to vote in this election but don’t want to become Individual Members, you can register to vote from now until 2024-11-26 23:59 Anywhere on Earth, use our form: Django 6.x Steering Council Voter Registration.

Candidate registration

If you’re interested, don’t wait until formal candidate registration. You can already fill in our 6.x Steering Council expression of interest form. At the end of the form, select “I would like what my submissions to this form to be used as part of my candidate registration for the elections”.

Django 6.x Steering Council elections - Expression of interest

Voting

Once voting opens, those eligible to vote in this election will receive information on how to vote via email. Please check for an email with the subject line “6.x Steering Council elections voting”. Voting will be open until 23:59 on December 17, 2024 Anywhere on Earth.

—

Any questions? Ask on our dedicated forum discussion thread, or reach out via email to foundation\@djangoproject.com.

Categories: FLOSS Project Planets

PyPodcats: Trailer: Episode 7 With Anna Makarudze

Planet Python - Thu, 2024-11-21 03:00
A preview of our chat with Anna Makarudze. Watch the full episode on November 20, 2024A preview of our chat with Anna Makarudze. Watch the full episode on November 20, 2024

Sneak Peek of our chat with Anna Makarudze, hosted by Mariatta Wijaya and Cheuk Ting Ho.

Since discovering Python and Django in 2015, Anna has been actively involved in the Django community. She helped organize PyCon Zimbabwe, and she has coached at Django Girls in Harare and Windhoek.

She served on the Board of Directors at Django Software Foundation for five years, and she is currently a Django Girls Foundation Trustee & Fundraising Coordinator.

Anna became aware of the lack of representation of women in tech industry, something that became more evident as she attended Django Under the Hood in 2016 where most of the attendees were white men, and only a few are women. That’s when she realized the importance of communities like Django Girls in supporting more women in the Django Community.

In this chat, Anna shared ways on how you can contribute and help support Django Girls+ Foundation.

Full episode is coming on November 27, 2024! Subscribe to our podcast now!

Categories: FLOSS Project Planets

www-zh-cn @ Savannah: Welcome our new member - bingchuanjuzi

GNU Planet! - Wed, 2024-11-20 21:01

Hi, All:

Please join me in welcoming our new member:

User Details:
-------------
Name:    Haoran Du
Login:   bingchuanjuzi
Email:   dududu233@outlook.com

I wish bingchuanjuzi a wonderful journey in GNU CTT.

Happy Hacking
wxie

Categories: FLOSS Project Planets

Matt Glaman: Lenient Composer Plugin officially replaces lenient packages endpoint

Planet Drupal - Wed, 2024-11-20 18:17

Well, it's official. My Drupal Lenient Composer Plugin has allowed the lenient Composer repository endpoint on Drupal.org to be sunset and removed. I created the mglaman/composer-drupal-lenient repository two years ago at DrupalCon Portland. It is pretty wild how much it has been adopted in just two years. Not only has it allowed the Drupal Association to dismantle some infrastructure, but it is also baked into the Drupal.org GitLab CI. The package is pushing over 3,000,000 downloads from Packagist!

Categories: FLOSS Project Planets

libtool @ Savannah: libtool-2.5.4 released [stable]

GNU Planet! - Wed, 2024-11-20 15:27

Libtoolers!

The Libtool Team is pleased to announce the release of libtool 2.5.4.

GNU Libtool hides the complexity of using shared libraries behind a
consistent, portable interface. GNU Libtool ships with GNU libltdl, which
hides the complexity of loading dynamic runtime libraries (modules)
behind a consistent, portable interface.

There have been 49 commits by 16 people in the 8 weeks since 2.5.3.

See the NEWS below for a brief summary.

Thanks to everyone who has contributed!
The following people contributed changes to this release:

  Adrien Destugues (1)
  Alastair McKinstry (6)
  Bruno Haible (1)
  Ileana Dumitrescu (27)
  Jerome Duval (1)
  Jonathan Nieder (2)
  Joshua Root (1)
  Khalid Masum (1)
  Markus MĂŒtzel (1)
  Martin Storsjö (1)
  Richard Purdie (1)
  Sergey Poznyakoff (1)
  Tim Schumacher (1)
  Vincent Lefevre (2)
  mintsuki (1)
  streaksu (1)

Ileana
 [on behalf of the libtool maintainers]
==================================================================

Here is the GNU libtool home page:
    https://gnu.org/s/libtool/

For a summary of changes and contributors, see:
  https://git.sv.gnu.org/gitweb/?p=libtool.git;a=shortlog;h=v2.5.4
or run this command from a git-cloned libtool directory:
  git shortlog v2.5.3..v2.5.4

Here are the compressed sources:
  https://ftpmirror.gnu.org/libtool/libtool-2.5.4.tar.gz   (2.0MB)
  https://ftpmirror.gnu.org/libtool/libtool-2.5.4.tar.xz   (1.1MB)

Here are the GPG detached signatures:
  https://ftpmirror.gnu.org/libtool/libtool-2.5.4.tar.gz.sig
  https://ftpmirror.gnu.org/libtool/libtool-2.5.4.tar.xz.sig

Use a mirror for higher download bandwidth:
  https://www.gnu.org/order/ftp.html

Here are the SHA1 and SHA256 checksums:

  77227188ead223ed8ba447301eda3761cb68ef57  libtool-2.5.4.tar.gz
  2o67LOTc9GuQCY2vliz/po9LT2LqYPeY0O8Skp7eat8=  libtool-2.5.4.tar.gz
  9781a113fe6af1b150571410b29d3eee2e792516  libtool-2.5.4.tar.xz
  +B9YYGZrC8fYS63e+mDRy5+m/OsjmMw7rKavqmAmZnU=  libtool-2.5.4.tar.xz

Verify the base64 SHA256 checksum with cksum -a sha256 --check
from coreutils-9.2 or OpenBSD's cksum since 2007.

Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify libtool-2.5.4.tar.gz.sig

The signature should match the fingerprint of the following key:

  pub   rsa4096 2021-09-23 [SC]
        FA26 CA78 4BE1 8892 7F22  B99F 6570 EA01 146F 7354
  uid   Ileana Dumitrescu <ileanadumi95@protonmail.com>
  uid   Ileana Dumitrescu <ileanadumitrescu95@gmail.com>

If that command fails because you don't have the required public key,
or that public key has expired, try the following commands to retrieve
or refresh it, and then rerun the 'gpg --verify' command.

  gpg --locate-external-key ileanadumi95@protonmail.com

  gpg --recv-keys 6570EA01146F7354

  wget -q -O- 'https://savannah.gnu.org/project/release-gpgkeys.php?group=libtool&download=1' | gpg --import -

As a last resort to find the key, you can try the official GNU
keyring:

  wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
  gpg --keyring gnu-keyring.gpg --verify libtool-2.5.4.tar.gz.sig

This release was bootstrapped with the following tools:
  Autoconf 2.72e
  Automake 1.17
  Gnulib v1.0-1108-gea58a72d4d

NEWS

  • Noteworthy changes in release 2.5.4 (2024-11-20) [stable]


** New features:

  - New libtool command line flag, --no-finish, to skip executing
    finish_cmds that would alter the shared library cache during testing.

  - New libtool command line flag, --reorder-cache=DIRS, to reorder the
    shared library cache, only on OpenBSD.

** Bug fixes:

  - Fix incorrect use of workarounds designed for Darwin versions that
    don't have -single_module support.

  - Fix errors when executing 'make distclean' and 'make maintainer-clean'.

  - Fix bug where the constructed rpath omit directories, instead of
    appending them to the end.

  - Fix configure error for when variable 'multlib' is unset.

  - Fix searching for -L in link paths being over-greedy and incorrectly
    handling paths with -L in them.

  - Avoid using AC_TRY_EVAL macro, "dangerous and undocumented".

  - Fix linking libraries at runtime with tcc by adding run path.

  - Fix path comparison by removing trailing slashes on install commands.

  - Fix linking for mingw with lld by prefering response files over the
    linker script.

  - Fix '-Fe' usage with linking in MSVC.

  - Fix '--no-warnings' flag.

  - Fix handling xlc(1)-specific options.

  - Fix Haiku support.

** Changes in supported systems or compilers:

  - Support additional flang-based compilers, 'f18' and 'f95'.

  - Support for 'netbsdelf*-gnu'.

  - Support for '*-mlibc', and subsequently Ironclad and Managarm.

  - Support for SerenityOS.

  - Support for wasm32-emscripten.

Enjoy!

Categories: FLOSS Project Planets

Give Your Input on the State of Open Source Survey

Open Source Initiative - Wed, 2024-11-20 14:05

As we announced back in September, the OSI has partnered again with OpenLogic by Perforce to produce a comprehensive report on global, industry-wide Open Source software adoption trends. The 2025 State of Open Source Report will be based on responses to a survey of those working with Open Source software in their organizations, from developers to CTOs and everyone in between. 

“This is our fourth year being involved in the State of Open Source Report, and there is never any shortage of surprises in the data,” says Stefano Maffulli, Executive Director, Open Source Initiative. “Now, however, the aim of the survey is not to determine whether or not organizations are using Open Source — we know they are — but to find out how they are handling complexities related to AI, licensing, and of course, security.”

This year, the survey includes new sections on Big Data, the impact of CentOS EOL, and security/compliance. As always, there are questions about technology usage in various categories such as infrastructure, cloud-native, frameworks, CI/CD, automation, and programming languages. Finally, a few questions toward the end look at Open Source maturity and stewardship, including sponsoring or being involved with open source foundations and organizations like OSI. 

Of course, any report like this is only as valuable as its data and the more robust and high-quality the dataset, the stronger the report will be. As stewards of the Open Source community, OSI members are encouraged to take the survey so that the 2025 State of Open Source Report accurately reflects the interests, concerns, and preferences of Open Source software users around the world. 

You can access the State of Open Source Survey here: https://www.research.net/r/SLQWZGF

Categories: FLOSS Research

Trey Hunner: Python Black Friday &amp; Cyber Monday sales (2024)

Planet Python - Wed, 2024-11-20 14:00

Ready for some Python skill-building sales?

This is my seventh annual compilation of Python learning deals.

I’m publishing this post extra early this year, so bookmark this page and set a calendar event for yourself to check back on Friday November 29.

Currently live sales

Here are Python-related sales that are live right now:

Anticipated sales

Here are sales that will be live soon:

Here are some sales I expect to see, but which haven’t been announced yet:

Even more sales

Also see Adam Johnson’s Django-related Deals for Black Friday 2024 for sales on Adam’s books, courses from the folks at Test Driven, Django templates, and various other Django-related deals.

And for non-Python/Django Python deals, see the Awesome Black Friday / Cyber Monday deals GitHub repository and the BlackFridayDeals.dev website.

If you know of another sale (or a likely sale) please comment below or email me.

Categories: FLOSS Project Planets

Security advisories: Drupal core - Moderately critical - Gadget chain - SA-CORE-2024-008

Planet Drupal - Wed, 2024-11-20 12:29
Project: Drupal coreDate: 2024-November-20Security risk: Moderately critical 14 ∕ 25 AC:Complex/A:Admin/CI:All/II:All/E:Theoretical/TD:UncommonVulnerability: Gadget chainAffected versions: >= 8.0.0 < 10.2.11 || >= 10.3.0 < 10.3.9Description: 

Drupal core contains a potential PHP Object Injection vulnerability that (if combined with another exploit) could lead to Remote Code Execution. It is not directly exploitable.

This issue is mitigated by the fact that in order for it to be exploitable, a separate vulnerability must be present to allow an attacker to pass unsafe input to unserialize(). There are no such known exploits in Drupal core.

To help protect against this potential vulnerability, some additional checks have been added to Drupal core's database code. If you use a third-party database driver, check the release notes for additional configuration steps that may be required in certain cases.

Solution: 

Install the latest version:

All versions of Drupal 10 prior to 10.2 are end-of-life and do not receive security coverage. (Drupal 8 and Drupal 9 have both reached end-of-life.)

Reported By: Fixed By: Coordinated By: 
Categories: FLOSS Project Planets

Security advisories: Drupal core - Moderately critical - Gadget chain - SA-CORE-2024-007

Planet Drupal - Wed, 2024-11-20 12:27
Project: Drupal coreDate: 2024-November-20Security risk: Moderately critical 14 ∕ 25 AC:Complex/A:Admin/CI:All/II:All/E:Theoretical/TD:UncommonVulnerability: Gadget chainAffected versions: >= 8.0.0 < 10.2.11 || >= 10.3.0 < 10.3.9 || >= 11.0.0 < 11.0.8Description: 

Drupal core contains a potential PHP Object Injection vulnerability that (if combined with another exploit) could lead to Remote Code Execution. It is not directly exploitable.

This issue is mitigated by the fact that in order for it to be exploitable, a separate vulnerability must be present to allow an attacker to pass unsafe input to unserialize(). There are no such known exploits in Drupal core.

To help protect against this potential vulnerability, types have been added to properties in some of Drupal core's classes. If an application extends those classes, the same types may need to be specified on the subclass to avoid a TypeError.

Solution: 

Install the latest version:

All versions of Drupal 10 prior to 10.2 are end-of-life and do not receive security coverage. (Drupal 8 and Drupal 9 have both reached end-of-life.)

Reported By: Fixed By: Coordinated By: 
Categories: FLOSS Project Planets

Security advisories: Drupal core - Less critical - Gadget chain - SA-CORE-2024-006

Planet Drupal - Wed, 2024-11-20 12:25
Project: Drupal coreDate: 2024-November-20Security risk: Less critical 8 ∕ 25 AC:Complex/A:User/CI:None/II:Some/E:Theoretical/TD:UncommonVulnerability: Gadget chainAffected versions: >= 8.0.0 < 10.2.11 || >= 10.3.0 < 10.3.9 || >= 11.0.0 < 11.0.8Description: 

Drupal core contains a potential PHP Object Injection vulnerability that (if combined with another exploit) could lead to Artbitrary File Deletion. It is not directly exploitable.

This issue is mitigated by the fact that in order to be exploitable, a separate vulnerability must be present that allows an attacker to pass unsafe input to unserialize(). There are no such known exploits in Drupal core.

To help protect against this vulnerability, types have been added to properties in some of Drupal core's classes. If an application extends those classes, the same types may need to be specified on the subclass to avoid a TypeError.

Solution: 

Install the latest version:

All versions of Drupal 10 prior to 10.2 are end-of-life and do not receive security coverage. (Drupal 8 and Drupal 9 have both reached end-of-life.)

Reported By: Fixed By: Coordinated By: 
Categories: FLOSS Project Planets

Security advisories: Drupal core - Critical - Cross Site Scripting - SA-CORE-2024-005

Planet Drupal - Wed, 2024-11-20 12:24
Project: Drupal coreDate: 2024-November-20Security risk: Critical 17 ∕ 25 AC:None/A:None/CI:Some/II:Some/E:Theoretical/TD:DefaultVulnerability: Cross Site ScriptingDescription: 

Drupal 7 core's Overlay module doesn't safely handle user input, leading to reflected cross-site scripting under certain circumstances.

Only sites with the Overlay module enabled are affected by this vulnerability.

Solution: 

Install the latest version:

  • If you are using Drupal 7, update to Drupal 7.102
  • Sites may also disable the Overlay module to avoid the issue.

Drupal 10 and Drupal 11 are not affected, as the Overlay module was removed from Drupal core in Drupal 8.

Reported By: Fixed By: Coordinated By: 
Categories: FLOSS Project Planets

Security advisories: Drupal core - Moderately critical - Access bypass - SA-CORE-2024-004

Planet Drupal - Wed, 2024-11-20 12:21
Project: Drupal coreDate: 2024-November-20Security risk: Moderately critical 10 ∕ 25 AC:Basic/A:User/CI:None/II:Some/E:Theoretical/TD:DefaultVulnerability: Access bypassAffected versions: >= 8.0.0 < 10.2.11 || >= 10.3.0 < 10.3.9 || >= 11.0.0 < 11.0.8Description: 

Drupal's uniqueness checking for certain user fields is inconsistent depending on the database engine and its collation.

As a result, a user may be able to register with the same email address as another user.

This may lead to data integrity issues.

Solution: 

Install the latest version:

All versions of Drupal 10 prior to 10.2 are end-of-life and do not receive security coverage. (Drupal 8 and Drupal 9 have both reached end-of-life.)

Updating Drupal will not solve potential issues with existing accounts affected by this bug. See Fixing emails that vary only by case for additional guidance.

Reported By: Fixed By: Coordinated By: 
Categories: FLOSS Project Planets

Security advisories: Drupal core - Moderately critical - Cross Site Scripting - SA-CORE-2024-003

Planet Drupal - Wed, 2024-11-20 12:20
Project: Drupal coreDate: 2024-November-20Security risk: Moderately critical 13 ∕ 25 AC:Basic/A:User/CI:Some/II:Some/E:Theoretical/TD:DefaultVulnerability: Cross Site ScriptingAffected versions: >= 8.8.0 < 10.2.11 || >= 10.3.0 < 10.3.9 || >= 11.0.0 < 11.0.8Description: 

Drupal uses JavaScript to render status messages in some cases and configurations. In certain situations, the status messages are not adequately sanitized.

Solution: 

Install the latest version:

All versions of Drupal 10 prior to 10.2 are end-of-life and do not receive security coverage. (Drupal 8 and Drupal 9 have both reached end-of-life.)

Reported By: Fixed By: Coordinated By: 
Categories: FLOSS Project Planets

Real Python: NumPy Practical Examples: Useful Techniques

Planet Python - Wed, 2024-11-20 09:00

The NumPy library is a Python library used for scientific computing. It provides you with a multidimensional array object for storing and analyzing data in a wide variety of ways. In this tutorial, you’ll see examples of some features NumPy provides that aren’t always highlighted in other tutorials. You’ll also get the chance to practice your new skills with various exercises.

In this tutorial, you’ll learn how to:

  • Create multidimensional arrays from data stored in files
  • Identify and remove duplicate data from a NumPy array
  • Use structured NumPy arrays to reconcile the differences between datasets
  • Analyze and chart specific parts of hierarchical data
  • Create vectorized versions of your own functions

If you’re new to NumPy, it’s a good idea to familiarize yourself with the basics of data science in Python before you start. Also, you’ll be using Matplotlib in this tutorial to create charts. While it’s not essential, getting acquainted with Matplotlib beforehand might be beneficial.

Get Your Code: Click here to download the free sample code that you’ll use to work through NumPy practical examples.

Take the Quiz: Test your knowledge with our interactive “NumPy Practical Examples: Useful Techniques” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

NumPy Practical Examples: Useful Techniques

This quiz will challenge your knowledge of working with NumPy arrays. You won't find all the answers in the tutorial, so you'll need to do some extra investigating. By finding all the answers, you're sure to learn some interesting things along the way.

Setting Up Your Working Environment

Before you can get started with this tutorial, you’ll need to do some initial setup. In addition to NumPy, you’ll need to install the Matplotlib library, which you’ll use to chart your data. You’ll also be using Python’s pathlib library to access your computer’s file system, but there’s no need to install pathlib because it’s part of Python’s standard library.

You might consider using a virtual environment to make sure your tutorial’s setup doesn’t interfere with anything in your existing Python environment.

Using a Jupyter Notebook within JupyterLab to run your code instead of a Python REPL is another useful option. It allows you to experiment and document your findings, as well as quickly view and edit files. The downloadable version of the code and exercise solutions are presented in Jupyter Notebook format.

The commands for setting things up on the common platforms are shown below:

Fire up a Windows PowerShell(Admin) or Terminal(Admin) prompt, depending on the version of Windows that you’re using. Now type in the following commands:

Windows PowerShell PS> python -m venv venv\ PS> venv\Scripts\activate (venv) PS> python -m pip install numpy matplotlib jupyterlab (venv) PS> jupyter lab Copied!

Here you create a virtual environment named venv\, which you then activate. If the activation is successful, then the virtual environment’s name will precede your Powershell prompt. Next, you install numpy and matplotlib into this virtual environment, followed by the optional jupyterlab. Finally, you start JupyterLab.

Note: When you activate your virtual environment, you may receive an error stating that your system can’t run the script. Modern versions of Windows don’t allow you to run scripts downloaded from the Internet as a security feature.

To fix this, you need to type the command Set-ExecutionPolicy RemoteSigned, then answer Y to the question. Your computer will now run scripts that Microsoft has verified. Once you’ve done this, the venv\Scripts\activate command should work.

Fire up a terminal and type in the following commands:

Shell $ python -m venv venv/ $ source venv/bin/activate (venv) $ python -m pip install numpy matplotlib jupyterlab (venv) $ jupyter lab Copied!

Here you create a virtual environment named venv/, which you then activate. If the activation is successful, then the virtual environment’s name will precede your command prompt. Next, you install numpy and matplotlib into this virtual environment, followed by the optional jupyterlab. Finally, you start JupyterLab.

You’ll notice that your prompt is preceded by (venv). This means that anything you do from this point forward will stay in this environment and remain separate from other Python work you have elsewhere.

Now that you have everything set up, it’s time to begin the main part of your learning journey.

NumPy Example 1: Creating Multidimensional Arrays From Files

When you create a NumPy array, you create a highly-optimized data structure. One of the reasons for this is that a NumPy array stores all of its elements in a contiguous area of memory. This memory management technique means that the data is stored in the same memory region, making access times fast. This is, of course, highly desirable, but an issue occurs when you need to expand your array.

Suppose you need to import multiple files into a multidimensional array. You could read them into separate arrays and then combine them using np.concatenate(). However, this would create a copy of your original array before expanding the copy with the additional data. The copying is necessary to ensure the updated array will still exist contiguously in memory since the original array may have had non-related content adjacent to it.

Constantly copying arrays each time you add new data from a file can make processing slow and is wasteful of your system’s memory. The problem becomes worse the more data you add to your array. Although this copying process is built into NumPy, you can minimize its effects with these two steps:

  1. When setting up your initial array, determine how large it needs to be before populating it. You may even consider over-estimating its size to support any future data additions. Once you know these sizes, you can create your array upfront.

  2. The second step is to populate it with the source data. This data will be slotted into your existing array without any need for it to be expanded.

Next, you’ll explore how to populate a three-dimensional NumPy array.

Populating Arrays With File Data

In this first example, you’ll use the data from three files to populate a three-dimensional array. The content of each file is shown below, and you’ll also find these files in the downloadable materials:

The first file has two rows and three columns with the following content:

CSV file1.csv 1.1, 1.2, 1.3 1.4, 1.5, 1.6 Copied! Read the full article at https://realpython.com/numpy-example/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Ian Jackson: The Rust Foundation's 2nd bad draft trademark policy

Planet Debian - Wed, 2024-11-20 07:50

tl;dr: The Rust Foundation’s new trademark policy still forbids unapproved modifications: this would forbid both the Rust Community’s own development work(!) and normal Free Software distribution practices.

Background

In April 2023 I wrote about the Rust Foundation’s ham-fisted and misguided attempts to update the Rust trademark policy. This turned into drama.

The new draft

Recently, the Foundation published a new draft. It’s considerably less bad, but the most serious problem, which I identified last year, remains.

It prevents redistribution of modified versions of Rust, without pre-approval from the Rust Foundation. (Subject to some limited exceptions.) The people who wrote this evidently haven’t realised that distributing modified versions is how free software development works. Ie, the draft Rust trademark policy even forbids making a github branch for an MR to contribute to Rust!

It’s also very likely unacceptable to Debian. Rust is still on track to repeat the Firefox/Iceweasel debacle.

Below is a copy of my formal response to the consultation. The consultation closes at 07:59:00 UTC tomorrow (21st November), ie, at the end of today (Wednesday) US Pacific time, so if you want to reply, do so quickly.

My consultation response

Hi. My name is Ian Jackson. I write as a Rust contributor and as a Debian Developer with first-hand experience of Debian’s approach to trademarks. (But I am not a member of the Debian Rust Packaging Team.)

Your form invites me to state any blocking concerns. I’m afraid I have one:

PROBLEM

The policy on distributing modified versions of Rust (page 4, 8th bullet) is far too restrictive.

PROBLEM - ASPECT 1

On its face the policy forbids making a clone of the Rust repositories on a git forge, and pushing a modified branch there. That is publicly distributing a modified version of Rust.

I.e., the current policy forbids the Rust’s community’s own development workflow!

PROBLEM - ASPECT 2

The policy also does not meet the needs of Software-Freedom-respecting downstreams, including community Linux distributions such as Debian.

There are two scenarios (fuzzy, and overlapping) which provide a convenient framing to discuss this:

Firstly, in practical terms, Debian may need to backport bugfixes, or sometimes other changes. Sometimes Debian will want to pre-apply bugfixes or changes that have been contributed by users, and are intended eventually to go upstream, but are not included upstream in official Rust yet. This is a routine activity for a distribution. The policy, however, forbids it.

Secondly, Debian, as a point of principle, requires the ability to diverge from upstream if and when Debian decides that this is the right choice for Debian’s users. The freedom to modify is a key principle of Free Software. This includes making changes that the upstream project disapproves of. Some examples of this, where Debian has made changes, that upstream do not approve of, have included things like: removing user-tracking code, or disabling obsolescence “timebombs” that stop a particular version working after a certain date.

Overall, while alignment in values between Debian and Rust seems to be very good right now, modifiability it is a matter of non-negotiable principle for Debian. The 8th bullet point on page 4 of the PDF does not give Debian (and Debian’s users) these freedoms.

POSSIBLE SOLUTIONS

Other formulations, or an additional permission, seem like they would be able to meet the needs of both Debian and Rust.

The first thing to recognise is that forbidding modified versions is probably not necessary to prevent language ecosystem fragmentation. Many other programming languages are distributed under fully Free Software licences without such restrictive trademark policies. (For example, Python; I’m sure a thorough survey would find many others.)

The scenario that would be most worrying for Rust would be “embrace - extend - extinguish”. In projects with a copyleft licence, this is not a concern, but Rust is permissively licenced. However, one way to address this would be to add an additional permission for modification that permits distribution of modified versions without permission, but if the modified source code is also provided, under the original Rust licence.

I suggest therefore adding the following 2nd sub-bullet point to the 8th bullet on page 4:

  • changes which are shared, in source code form, with all recipients of the modified software, and publicly licenced under the same licence as the official materials.

This means that downstreams who fear copyleft have the option of taking Rust’s permissive copyright licence at face value, but are limited in the modifications they may make, unless they rename. Conversely downstreams such as Debian who wish to operate as part of the Free Software ecosystem can freely make modifications.

It also, obviously, covers the Rust Community’s own development work.

NON-SOLUTIONS

Some upstreams, faced with this problem, have offered Debian a special permission: ie, said that it would be OK for Debian to make modifications that Debian wants to. But Debian will not accept any Debian-specific permissions.

Debian could of course rename their Rust compiler. Debian has chosen to rename in the past: infamously, a similar policy by Mozilla resulted in Debian distributing Firefox under the name Iceweasel for many years. This is a PR problem for everyone involved, and results in a good deal of technical inconvenience and makework.

“Debian could seek approval for changes, and the Rust Foundation would grant that approval quickly”. This is unworkable on a practical level - requests for permission do not fit into Debian’s workflow, and the resulting delays would be unacceptable. But, more fundamentally, Debian rightly insists that it must have the freedom to make changes that the Foundation do not approve of. (For example, if a future Rust shipped with telemetry features Debian objected to.)

“Debian and Rust could compromise”. However, Debian is an ideological as well as technological project. The principles I have set out are part of Debian’s Foundation Documents - they are core values for Debian. When Debian makes compromises, it does so very slowly and with great deliberation, using its slowest and most heavyweight constitutional governance processes. Debian is not likely to want to engage in such a process for the benefit of one programming language.

“Users will get Rust from upstream”. This is currently often the case. Right now, Rust is moving very quickly, and by Debian standards is very new. As Rust becomes more widely used, more stable, and more part of the infrastructure of the software world, it will need to become part of standard, stable, reliable, software distributions. That means Debian.

(The consultation was a Google Forms page with a single text field, so the formatting isn’t great. I have edited the formatting very lightly to avoid rendering bugs here on my blog.)



comments
Categories: FLOSS Project Planets

Real Python: Quiz: NumPy Practical Examples: Useful Techniques

Planet Python - Wed, 2024-11-20 07:00

In this quiz, you’ll test your understanding of the techniques covered in the tutorial NumPy Practical Examples: Useful Techniques.

By working through the questions, you’ll review your understanding of NumPy arrays and also expand on what you learned in the tutorial.

You’ll need to do some research outside of the tutorial to answer all the questions. Embrace this challenge and let it take you on a learning journey.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Russell Coker: Solving Spam and Phishing for Corporations

Planet Debian - Wed, 2024-11-20 00:22
Centralisation and Corporations

An advantage of a medium to large company is that it permits specialisation. For example I’m currently working in the IT department of a medium sized company and because we have standardised hardware (Dell Latitude and Precision laptops, Dell Precision Tower workstations, and Dell PowerEdge servers) and I am involved in fixing all Linux compatibility issues on that I can fix most problems in a small fraction of the time that I would take to fix on a random computer. There is scope for a lot of debate about the extent to which companies should standardise and centralise things. But for computer problems which can escalate quickly from minor to serious if not approached in the correct manner it’s clear that a good deal of centralisation is appropriate.

For people doing technical computer work such as programming there’s a large portion of the employees who are computer hobbyists who like to fiddle with computers. But if the support system is run well even they will appreciate having computers just work most of the time and for a large portion of the failures having someone immediately recognise the problem, like the issues with NVidia drivers that I have documented so that first line support can implement workarounds without the need for a lengthy investigation.

A big problem with email in the modern Internet is the prevalence of Phishing scams. The current corporate approach to this is to send out test Phishing email to people and then force computer security training on everyone who clicks on them. One problem with this is that attackers only need to fool one person on one occasion and when you have hundreds of people doing something on rare occasions that’s not part of their core work they will periodically get it wrong. When every test Phishing run finds several people who need extra training it seems obvious to me that this isn’t a solution that’s working well. I will concede that the majority of people who click on the test Phishing email would probably realise their mistake if asked to enter the password for the corporate email system, but I think it’s still clear that this isn’t a great solution.

Let’s imagine for the sake of discussion that everyone in a company was 100% accurate at identifying Phishing email and other scam email, if that was the case would the problem be solved? I believe that even in that hypothetical case it would not be a solved problem due to the wasted time and concentration. People can spend minutes determining if a single email is legitimate. On many occasions I have had relatives and clients forward me email because they are unsure if it’s valid, it’s great that they seek expert advice when they are unsure about things but it would be better if they didn’t have to go to that effort. What we ideally want to do is centralise the anti-Phishing and anti-spam work to a small group of people who are actually good at it and who can recognise patterns by seeing larger quantities of spam. When a spam or Phishing message is sent to 600 people in a company you don’t want 600 people to individually consider it, you want one person to recognise it and delete/block all 600. If 600 people each spend one minute considering the matter then that’s 10 work hours wasted!

The Rationale for Human Filtering

For personal email human filtering usually isn’t viable because people want privacy. But corporate email isn’t private, it’s expected that the company can read it under certain circumstances (in most jurisdictions) and having email open in public areas of the office where colleagues might see it is expected. You can visit gmail.com on your lunch break to read personal email but every company policy (and common sense) says to not have actually private correspondence on company systems.

The amount of time spent by reception staff in sorting out such email would be less than that taken by individuals. When someone sends a spam to everyone in the company instead of 500 people each spending a couple of minutes working out whether it’s legit you have one person who’s good at recognising spam (because it’s their job) who clicks on a “remove mail from this sender from all mailboxes” button and 500 messages are deleted and the sender is blocked.

Delaying email would be a concern. It’s standard practice for CEOs (and C*Os at larger companies) to have a PA receive their email and forward the ones that need their attention. So human vetting of email can work without unreasonable delays. If we had someone checking all email for the entire company probably email to the senior people would never get noticeably delayed and while people like me would get their mail delayed on occasion people doing technical work generally don’t have notifications turned on for email because it’s a distraction and a fast response isn’t needed. There are a few senders where fast response is required, which is mostly corporations sending a “click this link within 10 minutes to confirm your password change” email. Setting up rules for all such senders that are relevant to work wouldn’t be difficult to do.

How to Solve This

Spam and Phishing became serious problems over 20 years ago and we have had 20 years of evolution of email filtering which still hasn’t solved the problem. The vast majority of email addresses in use are run by major managed service providers and they haven’t managed to filter out spam/phishing mail effectively so I think we should assume that it’s not going to be solved by filtering. There is talk about what “AI” technology might do for filtering spam/phishing but that same technology can product better crafted hostile email to avoid filters.

An additional complication for corporate email filtering is that some criteria that are used to filter personal email don’t apply to corporate mail. If someone sends email to me personally about millions of dollars then it’s obviously not legit. If someone sends email to a company then it could be legit. Companies routinely have people emailing potential clients about how their products can save millions of dollars and make purchases over a million dollars. This is not a problem that’s impossible to solve, it’s just an extra difficulty that reduces the efficiency of filters.

It seems to me that the best solution to the problem involves having all mail filtered by a human. A company could configure their mail server to not accept direct external mail for any employee’s address. Then people could email files to colleagues etc without any restriction but spam and phishing wouldn’t be a problem. The issue is how to manage inbound mail. One possibility is to have addresses of the form it+russell.coker@example.com (for me as an employee in the IT department) and you would have a team of people who would read those mailboxes and forward mail to the right people if it seemed legit. Having addresses like it+russell.coker means that all mail to the IT department would be received into folders of the same account and they could be filtered by someone with suitable security level and not require any special configuration of the mail server. So the person who read the is mailbox would have a folder named russell.coker receiving mail addressed to me. The system could be configured to automate the processing of mail from known good addresses (and even domains), so they could just put in a rule saying that when Dell sends DMARC authenticated mail to is+$USER it gets immediately directed to $USER. This is the sort of thing that can be automated in the email client (mail filtering is becoming a common feature in MUAs).

For a FOSS implementation of such things the server side of it (including extracting account data from a directory to determine which department a user is in) would be about a day’s work and then an option would be to modify a webmail program to have extra functionality for approving senders and sending change requests to the server to automatically direct future mail from the same sender. As an aside I have previously worked on a project that had a modified version of the Horde webmail system to do this sort of thing for challenge-response email and adding certain automated messages to the allow-list.

The Change

One of the first things to do is configuring the system to add every recipient of an outbound message to the allow list for receiving a reply. Having a script go through the sent-mail folders of all accounts and adding the recipients to the allow lists would be easy and catch the common cases.

But even with processing the sent mail folders going from a working system without such things to a system like this will take some time for the initial work of adding addresses to the allow lists, particularly for domain wide additions of all the sites that send password confirmation messages. You would need rules to direct inbound mail to the old addresses to the new style and then address a huge amount of mail that needs to be categorised. If you have 600 employees and the average amount of time taken on the first day is 10 minutes per user then that’s 100 hours of work, 12 work days. If you had everyone from the IT department, reception, and executive assistants working on it that would be viable. After about a week there wouldn’t be much work involved in maintaining it. Then after that it would be a net win for the company.

The Benefits

If the average employee spends one minute a day dealing with spam and phishing email then with 600 employees that’s 10 hours of wasted time per day. Effectively wasting one employee’s work! I’m sure that’s the low end of the range, 5 minutes average per day doesn’t seem unreasonable especially when people are unsure about phishing email and send it to Slack so multiple employees spend time analysing it. So you could have 5 employees being wasted by hostile email and avoiding that would take a fraction of the time of a few people adding up to less than an hour of total work per day.

Then there’s the training time for phishing mail. Instead of having every employee spend half an hour doing email security training every few months (that’s 300 hours or 7.5 working weeks every time you do it) you just train the few experts.

In addition to saving time there are significant security benefits to having experts deal with possibly hostile email. Someone who deals with a lot of phishing email is much less likely to be tricked.

Will They Do It?

They probably won’t do it any time soon. I don’t think it’s expensive enough for companies yet. Maybe government agencies already have equivalent measures in place, but for regular corporations it’s probably regarded as too difficult to change anything and the costs aren’t obvious. I have been unsuccessful in suggesting that managers spend slightly more on computer hardware to save significant amounts of worker time for 30 years.

Related posts:

  1. blocking spam There are two critical things that any anti-spam system must...
  2. Please Turn off Your Spam Protection Hi, I’d like to send an email from a small...
  3. A New Spam Trick One item on my todo list is to set up...
Categories: FLOSS Project Planets

Julien Tayon: The advantages of HTML as a data model over basic declarative ORM approach

Planet Python - Tue, 2024-11-19 23:04
Very often, backend devs don't want to write code.

For this, we use one trick : derive HTML widget for presentation, database access, REST endpoints from ONE SOURCE of truth and we call it MODEL.

A tradition, and I insist it's a conservative tradition, is to use a declarative model where we mad the truth of the model from python classes.

By declaring a class we will implicitly declare it's SQL structure, the HTML input form for human readable interaction and the REST endpoint to access a graph of objects which are all mapped on the database.

Since the arrival of pydantic it makes all the more sense when it comes to empower a strongly type approach in python.

But is it the only one worthy ?

I speak here as a veteran of the trenchline which job is to read a list of entries of customer in an xls file from a project manager and change the faulty value based on the retro-engineering of an HTML formular into whatever the freak the right value is supposed to be.

In this case your job is in fact to short circuit the web framework to which you don't have access to change values directly into the database.

More often than never is these real life case you don't have access to the team who built the framework (to much bureaucracy to even get a question answered before the situation gets critical) ... So you look at the form.

And you guess the name of the table that is impacted by looking at the « network tab » in the developper GUI when you hit the submit button.

And you guess the name of the field impacted in the table to guess the name of the columns.

And then you use your only magical tool which is a write access to the database to reflect the expected object with an automapper and change values.

You could do it raw SQL I agree, but sometimes you need to do a web query in the middle to change the value because you have to ask a REST service what is the new ID of the client.

And you see the more this experience of having to tweak into real life frameworks that often surprise users for the sake of the limitation of the source of truth, the more I want the HTML to be the source of truth.

The most stoĂŻcian approach to full stack framework approach : to derive Everything from an HTML page.

The views, the controllers, the route, the model in such a true way that if you modify the HTML you modify in real time the database model, the routes, the displayed form.



What are the advantages of HTML as a declarative language ?

Here, one of the tradition is to prefere the human readable languages such as YAML and JSON, or machine readable as XML over HTML.

However, JSON and YAML are more limited in expressiveness of data structure than HTML (you can have a dict as a key in a dict in json ? Me I can.)

And on the other hand XML is quite a pain to read and write without mistakes.

HTML is just XML

HTML is a lax and lenient grammarless XML. No parsers will raise an exception because you wrote "<br>" instead of "<br/>" (or the opposite). You can add non existent attributes to tags and the parser will understand this easily without you having to redefine a full fledge grammar.

HTML is an XML YOU CAN SEE.

There are some tags that are related to a grammar of visual widget to which non computer people are familiar with.

If you use a FORM as a mapping to a database table, and all input inside has A column name you have already input drawn on your screen.



Modern « remote procedure call » are web based

Call it RPC, call it soap, call it REST, nowadays the web technologies trust 99% of how computer systems exchange data between each others.

You buy something on the internet, at the end you interact with a web formular or a web call. Hence, we can assert with strong convictions that 100% of web technologies can serve web pages. Thus, if you use your html as a model and present it, therefore you can deduce the data model from the form without needing a new pivoting language.

Proof of concept

For the convenience of « fun » we are gonna imagine a backend for « agile by micro blogging » (à la former twitter).

We are gonna assume the platform is structured micro blogging around where agile shines the most : not when things are done, but to move things on.

Things that are done will be called statements. Like : « software is delivered. Here is a factoid (a git url for instance) ». We will call this nodes in a graph and are they will be supposed to immutable states that can't be contested.

Each statement answers another statement's factoid like a delivery statement tends to follow a story point (at least should lead by the mean of a transition.

Hence in this application we will mirco-blog about the transition ... like on a social network with members of concerned group.
The idea of the application is to replace scrum meetings with micro blogging.

Are you blocked ? Do you need anything ? Can be answered on the mirco blogging platform, and every threads that are presented archived, used for machine learning (about what you want to hear as a good news) in a data form that is convenient for large language model.

As such we want to harvest a text long enough to express emotions, constricted to a laughingly small amount of characters so that finesse and ambiguity are tough to raise. That's the heart of the application : harvesting comments tagged with associated emotions to ease the work of tagging for Artificial Intelligence.

Hear me out, this is just a stupid idea of mine to illustrate a graph like structure described with HTML, not a real life idea. Me I just love to represent State Machine Diagram with everything that fall under my hands.

Here is the entity relationship diagram I have in mind :

Let's see what a table declaration might look like in HTML, let's say transition : <form action=/transition > <input type=number name=id /> <input type=number name=user_group_id nullable=false reference=user_group.id /> <textarea name=message rows=10 cols=50 nullable=false ></textarea> <input type=url name=factoid /> <select name="emotion_for_group_triggered" value=neutral > <option value="">please select a value</option> <option value=positive >Positive</option> <option value=neutral >Neutral</option> <option value=negative >Negative</option> </select> <input type=number name=expected_fun_for_group /> <input type=number name=previous_statement_id reference=statement.id nullable=false /> <input type=number name=next_statement_id reference=statement.id /> <unique_constraint col=next_statement_id,previous_statement_id name=unique_transition ></unique_constraint> <input type=checkbox name=is_exception /> </form> Through the use of additionnal tags of html and attributes we can convey a lot of informations usable for database construction/querying that are gonna be silent at the presentation (like unique_constraint). And with a little bit of javascript and CSS this html generate the following rendering (indicating the webservices endpoint as input type=submit :

Meaning that you can now serve a landing page that serve the purpose of human interaction, describing a « curl way » of automating interaction and a full model of your database.

Most startup think data model should be obfuscated to prevent being copied, most free software project thinks that sharing the non valuable assets helps adopt the technology.

And thanks to this, I can now create my own test suite that is using the HTML form to work on a doppleganger of the real database by parsing the HTML served by the application service (pdca.py) and launch a perfectly functioning service out of it: from requests import post from html.parser import HTMLParser import requests import os from dateutil import parser from passlib.hash import scrypt as crypto_hash # we can change the hash easily from urllib.parse import parse_qsl, urlparse # heaviweight from requests import get from sqlalchemy import * from sqlalchemy.ext.automap import automap_base from sqlalchemy.orm import Session DB=os.environ.get('DB','test.db') DB_DRIVER=os.environ.get('DB_DRIVER','sqlite') DSN=f"{DB_DRIVER}://{DB_DRIVER == 'sqlite' and not DB.startswith('/') and '/' or ''}{DB}" ENDPOINT="http://127.0.0.1:5000" os.chdir("..") os.system(f"rm {DB}") os.system(f"DB={DB} DB_DRIVER={DB_DRIVER} python pdca.py & sleep 2") url = lambda table : ENDPOINT + "/" + table os.system(f"curl {url('group')}?_action=search") form_to_db = transtype_input = lambda attrs : { k: ( # handling of input having date/time in the name "date" in k or "time" in k and v and type(k) == str ) and parser.parse(v) or # handling of boolean mapping which input begins with "is_" k.startswith("is_") and [False, True][v == "on"] or # password ? "password" in k and crypto_hash.hash(v) or v for k,v in attrs.items() if v and not k.startswith("_") } post(url("user"), params = dict(id=1, secret_password="toto", name="jul2", email="j@j.com", _action="create"), files=dict(pic_file=open("./assets/diag.png", "rb").read())).status_code #os.system(f"curl {ENDPOINT}/user?_action=search") #os.system(f"sqlite3 {DB} .dump") engine = create_engine(DSN) metadata = MetaData() transtype_true = lambda p : (p[0],[False,True][p[1]=="true"]) def dispatch(p): return dict( nullable=transtype_true, unique=transtype_true, default=lambda p:("server_default",eval(p[1])), ).get(p[0], lambda *a:None)(p) transtype_input = lambda attrs : dict(filter(lambda x :x, map(dispatch, attrs.items()))) class HTMLtoData(HTMLParser): def __init__(self): global engine, tables, metadata self.cols = [] self.table = "" self.tables= [] self.enum =[] self.engine= engine self.meta = metadata super().__init__() def handle_starttag(self, tag, attrs): global tables attrs = dict(attrs) simple_mapping = { "email" : UnicodeText, "url" : UnicodeText, "phone" : UnicodeText, "text" : UnicodeText, "checkbox" : Boolean, "date" : Date, "time" : Time, "datetime-local" : DateTime, "file" : Text, "password" : Text, "uuid" : Text, #UUID is postgres specific } if tag in {"select", "textarea"}: self.enum=[] self.current_col = attrs["name"] self.attrs= attrs if tag == "option": self.enum.append( attrs["value"] ) if tag == "unique_constraint": self.cols.append( UniqueConstraint(*attrs["col"].split(','), name=attrs["name"]) ) if tag in { "input" }: if attrs.get("name") == "id": self.cols.append( Column('id', Integer, **( dict(primary_key = True) | transtype_input(attrs )))) return try: if attrs.get("name").endswith("_id"): table=attrs.get("name").split("_") self.cols.append( Column(attrs["name"], Integer, ForeignKey(attrs["reference"])) ) return except Exception as e: log(e, ln=line()) if attrs.get("type") in simple_mapping.keys() or tag in {"select",}: self.cols.append( Column( attrs["name"], simple_mapping[attrs["type"]], **transtype_input(attrs) ) ) if attrs["type"] == "number": if attrs.get("step","") == "any": self.cols.append( Columns(attrs["name"], Float) ) else: self.cols.append( Column(attrs["name"], Integer) ) if tag== "form": self.table = urlparse(attrs["action"]).path[1:] def handle_endtag(self, tag): global tables if tag == "select": # self.cols.append( Column(self.current_col,Enum(*[(k,k) for k in self.enum]), **transtype_input(self.attrs)) ) self.cols.append( Column(self.current_col, Text, **transtype_input(self.attrs)) ) if tag == "textarea": self.cols.append( Column( self.current_col, String(int(self.attrs["cols"])*int(self.attrs["rows"])), **transtype_input(self.attrs)) ) if tag=="form": self.tables.append( Table(self.table, self.meta, *self.cols), ) #tables[self.table] = self.tables[-1] self.cols = [] with engine.connect() as cnx: self.meta.create_all(engine) cnx.commit() HTMLtoData().feed(get("http://127.0.0.1:5000/").text) os.system("pkill -f pdca.py") #metadata.reflect(bind=engine) Base = automap_base(metadata=metadata) Base.prepare() with Session(engine) as session: for table,values in tuple([ ("user", form_to_db(dict( name="him", email="j2@j.com", secret_password="toto"))), ("group", dict(id=1, name="trolol") ), ("group", dict(id=2, name="serious") ), ("user_group", dict(id=1,user_id=1, group_id=1, secret_token="secret")), ("user_group", dict(id=2,user_id=1, group_id=2, secret_token="")), ("user_group", dict(id=3,user_id=2, group_id=1, secret_token="")), ("statement", dict(id=1,user_group_id=1, message="usable agile workflow", category="story" )), ("statement", dict(id=2,user_group_id=1, message="How do we code?", category="story_item" )), ("statement", dict(id=3,user_group_id=1, message="which database?", category="question")), ("statement", dict(id=4,user_group_id=1, message="which web framework?", category="question")), ("statement", dict(id=5,user_group_id=1, message="preferably less", category="answer")), ("statement", dict(id=6,user_group_id=1, message="How do we test?", category="story_item" )), ("statement", dict(id=7,user_group_id=1, message="QA framework here", category="delivery" )), ("statement", dict(id=8,user_group_id=1, message="test plan", category="test" )), ("statement", dict(id=9,user_group_id=1, message="OK", category="finish" )), ("statement", dict(id=10, user_group_id=1, message="PoC delivered",category="delivery")), ("transition", dict( user_group_id=1, previous_statement_id=1, next_statement_id=2, message="something bugs me",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=4, message="standup meeting feedback",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=3, message="standup meeting feedback",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=6, message="change accepted",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=4, next_statement_id=5, message="arbitration",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=3, next_statement_id=5, message="arbitration",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=6, next_statement_id=7, message="R&D", )), ("transition", dict( user_group_id=1, previous_statement_id=7, next_statement_id=8, message="Q&A", )), ("transition", dict( user_group_id=1, previous_statement_id=8, next_statement_id=9, message="CI action", )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=10, message="situation unblocked", )), ("transition", dict( user_group_id=1, previous_statement_id=9, next_statement_id=10, message="situation unblocked", )), ]): session.add(getattr(Base.classes,table)(**values)) session.commit() os.system("python ./generate_state_diagram.py sqlite:///test.db > out.dot ;dot -Tpng out.dot > diag2.png; xdot out.dot") s = requests.session() os.system(f"DB={DB} DB_DRIVER={DB_DRIVER} python pdca.py & sleep 1") print(s.post(url("group"), params=dict(_action="delete", id=3,name=1)).status_code) print(s.post(url("grant"), params = dict(secret_password="toto", email="j@j.com",group_id=1, )).status_code) print(s.post(url("grant"), params = dict(_redirect="/group",secret_password="toto", email="j@j.com",group_id=2, )).status_code) print(s.cookies["Token"]) print(s.post(url("user_group"), params=dict(_action="search", user_id=1)).text) print(s.post(url("group"), params=dict(_action="create", id=3,name=2)).text) print(s.post(url("group"), params=dict(_action="delete", id=3)).status_code) print(s.post(url("group"), params=dict(_action="search", )).text) os.system("pkill -f pdca.py") Which give me a nice set of data to play with while I experiment on how to handle the business logic where the core of the value is.
Categories: FLOSS Project Planets

Julien Tayon: The advantages of HTML as a data model over basic declarative ORM approach

Planet Python - Tue, 2024-11-19 23:04
Very often, backend devs don't want to write code.

For this, we use one trick : derive HTML widget for presentation, database access, REST endpoints from ONE SOURCE of truth and we call it MODEL.

A tradition, and I insist it's a conservative tradition, is to use a declarative model where we mad the truth of the model from python classes.

By declaring a class we will implicitly declare it's SQL structure, the HTML input form for human readable interaction and the REST endpoint to access a graph of objects which are all mapped on the database.

Since the arrival of pydantic it makes all the more sense when it comes to empower a strongly type approach in python.

But is it the only one worthy ?

I speak here as a veteran of the trenchline which job is to read a list of entries of customer in an xls file from a project manager and change the faulty value based on the retro-engineering of an HTML formular into whatever the freak the right value is supposed to be.

In this case your job is in fact to short circuit the web framework to which you don't have access to change values directly into the database.

More often than never is these real life case you don't have access to the team who built the framework (to much bureaucracy to even get a question answered before the situation gets critical) ... So you look at the form.

And you guess the name of the table that is impacted by looking at the « network tab » in the developper GUI when you hit the submit button.

And you guess the name of the field impacted in the table to guess the name of the columns.

And then you use your only magical tool which is a write access to the database to reflect the expected object with an automapper and change values.

You could do it raw SQL I agree, but sometimes you need to do a web query in the middle to change the value because you have to ask a REST service what is the new ID of the client.

And you see the more this experience of having to tweak into real life frameworks that often surprise users for the sake of the limitation of the source of truth, the more I want the HTML to be the source of truth.

The most stoĂŻcian approach to full stack framework approach : to derive Everything from an HTML page.

The views, the controllers, the route, the model in such a true way that if you modify the HTML you modify in real time the database model, the routes, the displayed form.



What are the advantages of HTML as a declarative language ?

Here, one of the tradition is to prefere the human readable languages such as YAML and JSON, or machine readable as XML over HTML.

However, JSON and YAML are more limited in expressiveness of data structure than HTML (you can have a dict as a key in a dict in json ? Me I can.)

And on the other hand XML is quite a pain to read and write without mistakes.

HTML is just XML

HTML is a lax and lenient grammarless XML. No parsers will raise an exception because you wrote "<br>" instead of "<br/>" (or the opposite). You can add non existent attributes to tags and the parser will understand this easily without you having to redefine a full fledge grammar.

HTML is an XML YOU CAN SEE.

There are some tags that are related to a grammar of visual widget to which non computer people are familiar with.

If you use a FORM as a mapping to a database table, and all input inside has A column name you have already input drawn on your screen.



Modern « remote procedure call » are web based

Call it RPC, call it soap, call it REST, nowadays the web technologies trust 99% of how computer systems exchange data between each others.

You buy something on the internet, at the end you interact with a web formular or a web call. Hence, we can assert with strong convictions that 100% of web technologies can serve web pages. Thus, if you use your html as a model and present it, therefore you can deduce the data model from the form without needing a new pivoting language.

Proof of concept

For the convenience of « fun » we are gonna imagine a backend for « agile by micro blogging » (à la former twitter).

We are gonna assume the platform is structured micro blogging around where agile shines the most : not when things are done, but to move things on.

Things that are done will be called statements. Like : « software is delivered. Here is a factoid (a git url for instance) ». We will call this nodes in a graph and are they will be supposed to immutable states that can't be contested.

Each statement answers another statement's factoid like a delivery statement tends to follow a story point (at least should lead by the mean of a transition.

Hence in this application we will mirco-blog about the transition ... like on a social network with members of concerned group.
The idea of the application is to replace scrum meetings with micro blogging.

Are you blocked ? Do you need anything ? Can be answered on the mirco blogging platform, and every threads that are presented archived, used for machine learning (about what you want to hear as a good news) in a data form that is convenient for large language model.

As such we want to harvest a text long enough to express emotions, constricted to a laughingly small amount of characters so that finesse and ambiguity are tough to raise. That's the heart of the application : harvesting comments tagged with associated emotions to ease the work of tagging for Artificial Intelligence.

Hear me out, this is just a stupid idea of mine to illustrate a graph like structure described with HTML, not a real life idea. Me I just love to represent State Machine Diagram with everything that fall under my hands.

Here is the entity relationship diagram I have in mind :

Let's see what a table declaration might look like in HTML, let's say transition : <form action=/transition > <input type=number name=id /> <input type=number name=user_group_id nullable=false reference=user_group.id /> <textarea name=message rows=10 cols=50 nullable=false ></textarea> <input type=url name=factoid /> <select name="emotion_for_group_triggered" value=neutral > <option value="">please select a value</option> <option value=positive >Positive</option> <option value=neutral >Neutral</option> <option value=negative >Negative</option> </select> <input type=number name=expected_fun_for_group /> <input type=number name=previous_statement_id reference=statement.id nullable=false /> <input type=number name=next_statement_id reference=statement.id /> <unique_constraint col=next_statement_id,previous_statement_id name=unique_transition ></unique_constraint> <input type=checkbox name=is_exception /> </form> Through the use of additionnal tags of html and attributes we can convey a lot of informations usable for database construction/querying that are gonna be silent at the presentation (like unique_constraint). And with a little bit of javascript and CSS this html generate the following rendering (indicating the webservices endpoint as input type=submit :

Meaning that you can now serve a landing page that serve the purpose of human interaction, describing a « curl way » of automating interaction and a full model of your database.

Most startup think data model should be obfuscated to prevent being copied, most free software project thinks that sharing the non valuable assets helps adopt the technology.

And thanks to this, I can now create my own test suite that is using the HTML form to work on a doppleganger of the real database by parsing the HTML served by the application service (pdca.py) and launch a perfectly functioning service out of it: from requests import post from html.parser import HTMLParser import requests import os from dateutil import parser from passlib.hash import scrypt as crypto_hash # we can change the hash easily from urllib.parse import parse_qsl, urlparse # heaviweight from requests import get from sqlalchemy import * from sqlalchemy.ext.automap import automap_base from sqlalchemy.orm import Session DB=os.environ.get('DB','test.db') DB_DRIVER=os.environ.get('DB_DRIVER','sqlite') DSN=f"{DB_DRIVER}://{DB_DRIVER == 'sqlite' and not DB.startswith('/') and '/' or ''}{DB}" ENDPOINT="http://127.0.0.1:5000" os.chdir("..") os.system(f"rm {DB}") os.system(f"DB={DB} DB_DRIVER={DB_DRIVER} python pdca.py & sleep 2") url = lambda table : ENDPOINT + "/" + table os.system(f"curl {url('group')}?_action=search") form_to_db = transtype_input = lambda attrs : { k: ( # handling of input having date/time in the name "date" in k or "time" in k and v and type(k) == str ) and parser.parse(v) or # handling of boolean mapping which input begins with "is_" k.startswith("is_") and [False, True][v == "on"] or # password ? "password" in k and crypto_hash.hash(v) or v for k,v in attrs.items() if v and not k.startswith("_") } post(url("user"), params = dict(id=1, secret_password="toto", name="jul2", email="j@j.com", _action="create"), files=dict(pic_file=open("./assets/diag.png", "rb").read())).status_code #os.system(f"curl {ENDPOINT}/user?_action=search") #os.system(f"sqlite3 {DB} .dump") engine = create_engine(DSN) metadata = MetaData() transtype_true = lambda p : (p[0],[False,True][p[1]=="true"]) def dispatch(p): return dict( nullable=transtype_true, unique=transtype_true, default=lambda p:("server_default",eval(p[1])), ).get(p[0], lambda *a:None)(p) transtype_input = lambda attrs : dict(filter(lambda x :x, map(dispatch, attrs.items()))) class HTMLtoData(HTMLParser): def __init__(self): global engine, tables, metadata self.cols = [] self.table = "" self.tables= [] self.enum =[] self.engine= engine self.meta = metadata super().__init__() def handle_starttag(self, tag, attrs): global tables attrs = dict(attrs) simple_mapping = { "email" : UnicodeText, "url" : UnicodeText, "phone" : UnicodeText, "text" : UnicodeText, "checkbox" : Boolean, "date" : Date, "time" : Time, "datetime-local" : DateTime, "file" : Text, "password" : Text, "uuid" : Text, #UUID is postgres specific } if tag in {"select", "textarea"}: self.enum=[] self.current_col = attrs["name"] self.attrs= attrs if tag == "option": self.enum.append( attrs["value"] ) if tag == "unique_constraint": self.cols.append( UniqueConstraint(*attrs["col"].split(','), name=attrs["name"]) ) if tag in { "input" }: if attrs.get("name") == "id": self.cols.append( Column('id', Integer, **( dict(primary_key = True) | transtype_input(attrs )))) return try: if attrs.get("name").endswith("_id"): table=attrs.get("name").split("_") self.cols.append( Column(attrs["name"], Integer, ForeignKey(attrs["reference"])) ) return except Exception as e: log(e, ln=line()) if attrs.get("type") in simple_mapping.keys() or tag in {"select",}: self.cols.append( Column( attrs["name"], simple_mapping[attrs["type"]], **transtype_input(attrs) ) ) if attrs["type"] == "number": if attrs.get("step","") == "any": self.cols.append( Columns(attrs["name"], Float) ) else: self.cols.append( Column(attrs["name"], Integer) ) if tag== "form": self.table = urlparse(attrs["action"]).path[1:] def handle_endtag(self, tag): global tables if tag == "select": # self.cols.append( Column(self.current_col,Enum(*[(k,k) for k in self.enum]), **transtype_input(self.attrs)) ) self.cols.append( Column(self.current_col, Text, **transtype_input(self.attrs)) ) if tag == "textarea": self.cols.append( Column( self.current_col, String(int(self.attrs["cols"])*int(self.attrs["rows"])), **transtype_input(self.attrs)) ) if tag=="form": self.tables.append( Table(self.table, self.meta, *self.cols), ) #tables[self.table] = self.tables[-1] self.cols = [] with engine.connect() as cnx: self.meta.create_all(engine) cnx.commit() HTMLtoData().feed(get("http://127.0.0.1:5000/").text) os.system("pkill -f pdca.py") #metadata.reflect(bind=engine) Base = automap_base(metadata=metadata) Base.prepare() with Session(engine) as session: for table,values in tuple([ ("user", form_to_db(dict( name="him", email="j2@j.com", secret_password="toto"))), ("group", dict(id=1, name="trolol") ), ("group", dict(id=2, name="serious") ), ("user_group", dict(id=1,user_id=1, group_id=1, secret_token="secret")), ("user_group", dict(id=2,user_id=1, group_id=2, secret_token="")), ("user_group", dict(id=3,user_id=2, group_id=1, secret_token="")), ("statement", dict(id=1,user_group_id=1, message="usable agile workflow", category="story" )), ("statement", dict(id=2,user_group_id=1, message="How do we code?", category="story_item" )), ("statement", dict(id=3,user_group_id=1, message="which database?", category="question")), ("statement", dict(id=4,user_group_id=1, message="which web framework?", category="question")), ("statement", dict(id=5,user_group_id=1, message="preferably less", category="answer")), ("statement", dict(id=6,user_group_id=1, message="How do we test?", category="story_item" )), ("statement", dict(id=7,user_group_id=1, message="QA framework here", category="delivery" )), ("statement", dict(id=8,user_group_id=1, message="test plan", category="test" )), ("statement", dict(id=9,user_group_id=1, message="OK", category="finish" )), ("statement", dict(id=10, user_group_id=1, message="PoC delivered",category="delivery")), ("transition", dict( user_group_id=1, previous_statement_id=1, next_statement_id=2, message="something bugs me",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=4, message="standup meeting feedback",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=3, message="standup meeting feedback",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=6, message="change accepted",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=4, next_statement_id=5, message="arbitration",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=3, next_statement_id=5, message="arbitration",is_exception=True, )), ("transition", dict( user_group_id=1, previous_statement_id=6, next_statement_id=7, message="R&D", )), ("transition", dict( user_group_id=1, previous_statement_id=7, next_statement_id=8, message="Q&A", )), ("transition", dict( user_group_id=1, previous_statement_id=8, next_statement_id=9, message="CI action", )), ("transition", dict( user_group_id=1, previous_statement_id=2, next_statement_id=10, message="situation unblocked", )), ("transition", dict( user_group_id=1, previous_statement_id=9, next_statement_id=10, message="situation unblocked", )), ]): session.add(getattr(Base.classes,table)(**values)) session.commit() os.system("python ./generate_state_diagram.py sqlite:///test.db > out.dot ;dot -Tpng out.dot > diag2.png; xdot out.dot") s = requests.session() os.system(f"DB={DB} DB_DRIVER={DB_DRIVER} python pdca.py & sleep 1") print(s.post(url("group"), params=dict(_action="delete", id=3,name=1)).status_code) print(s.post(url("grant"), params = dict(secret_password="toto", email="j@j.com",group_id=1, )).status_code) print(s.post(url("grant"), params = dict(_redirect="/group",secret_password="toto", email="j@j.com",group_id=2, )).status_code) print(s.cookies["Token"]) print(s.post(url("user_group"), params=dict(_action="search", user_id=1)).text) print(s.post(url("group"), params=dict(_action="create", id=3,name=2)).text) print(s.post(url("group"), params=dict(_action="delete", id=3)).status_code) print(s.post(url("group"), params=dict(_action="search", )).text) os.system("pkill -f pdca.py") Which give me a nice set of data to play with while I experiment on how to handle the business logic where the core of the value is.
Categories: FLOSS Project Planets

Pages