Planet Python

Planet Python - http://planetpython.org/
Updated: 21 hours 47 min ago

Real Python: Python Mappings: A Comprehensive Guide

Wed, 2024-06-12 10:00

One of the main data structures you learn about early in your Python learning journey is the dictionary. Dictionaries are the most common and well-known of Python’s mappings. However, there are other mappings in Python’s standard library and third-party modules. Mappings share common characteristics, and understanding these shared traits will help you use them more effectively.

In this tutorial, you’ll learn about:

  • Basic characteristics of a mapping
  • Operations that are common to most mappings
  • Abstract base classes Mapping and MutableMapping
  • User-defined mutable and immutable mappings and how to create them

This tutorial assumes that you’re familiar with Python’s built-in data types, especially dictionaries, and with the basics of object-oriented programming.

Take the Quiz: Test your knowledge with the interactive “Python Mappings” quiz. It covers the basic characteristics and operations of Python mappings, along with the key concepts and techniques of creating a custom mapping.

Understanding the Main Characteristics of Python Mappings

A mapping is a collection that allows you to look up a key and retrieve its value. The keys in mappings can be objects of a broad range of types. However, in most mappings, there are object types that can’t be used as keys, as you’ll learn later in this tutorial.

The previous paragraph described mappings as collections. A collection is an iterable container that has a defined size. However, mappings also have additional features. You’ll explore each of these mapping characteristics with examples from Python’s main mapping types.

The feature that’s most characteristic of mappings is the ability to retrieve a value using a key. You can use a dictionary to demonstrate this operation:

    >>> points = {
    ...     "Denise": 3,
    ...     "Igor": 2,
    ...     "Sarah": 3,
    ...     "Trevor": 1,
    ... }
    >>> points["Sarah"]
    3
    >>> points["Matt"]
    Traceback (most recent call last):
      ...
    KeyError: 'Matt'

The dictionary points contains four items, each with a key and a value. You can use the key within the square brackets to fetch the value associated with that key. However, if the key doesn’t exist in the dictionary, the code raises a KeyError.

You can use one of the mappings in the standard-library collections module to assign a default value for keys that aren’t present in the collection. The defaultdict type includes a callable that’s called each time you try to access a key that doesn’t exist. If you want the default value to be zero, you can use a lambda function that returns 0 as the first argument in defaultdict:

    >>> from collections import defaultdict
    >>> points_default = defaultdict(
    ...     lambda: 0,
    ...     points,
    ... )
    >>> points_default
    defaultdict(<function <lambda> at 0x104a95da0>, {'Denise': 3, 'Igor': 2, 'Sarah': 3, 'Trevor': 1})
    >>> points_default["Sarah"]
    3
    >>> points_default["Matt"]
    0
    >>> points_default
    defaultdict(<function <lambda> at 0x103e6c700>, {'Denise': 3, 'Igor': 2, 'Sarah': 3, 'Trevor': 1, 'Matt': 0})

The defaultdict constructor has two arguments in this example. The first argument is the callable that’s used when a default value is needed. The second argument is the dictionary you created earlier. As the second argument in defaultdict(), you can use any argument that’s valid when you call dict(), or you can omit this argument to create an empty defaultdict.

When you access a key that’s missing from the dictionary, the key is added, and the default value is assigned to it. You can also create the same points_default object using the callable int as the first argument since calling int() with no arguments returns 0.
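As a minimal sketch of the int variant described above, assuming the same points dictionary as in the earlier example:

```python
from collections import defaultdict

points = {"Denise": 3, "Igor": 2, "Sarah": 3, "Trevor": 1}

# int() called with no arguments returns 0, so int works as the default factory.
points_default = defaultdict(int, points)

existing = points_default["Sarah"]  # key exists, so its value is returned
missing = points_default["Matt"]    # key is missing, so int() supplies 0

# Accessing the missing key also added it to the defaultdict,
# while the original dictionary is left unchanged.
print(existing, missing)
print("Matt" in points_default, "Matt" in points)
```

Note that only square-bracket access triggers the default factory. A membership test with in doesn't add the key.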

All mappings are also collections, which means they’re iterable containers with a defined length. You can explore these characteristics with another mapping in Python’s standard library, collections.Counter:

    >>> from collections import Counter
    >>> letters = Counter("learning python")
    >>> letters
    Counter({'n': 3, 'l': 1, 'e': 1, 'a': 1, 'r': 1, 'i': 1, 'g': 1, ' ': 1, 'p': 1, 'y': 1, 't': 1, 'h': 1, 'o': 1})

The letters in the string "learning python" are converted into keys in Counter, and the number of occurrences of each letter is used as the value corresponding to each key.

You can confirm that this mapping is iterable, has a defined length, and is a container:

    >>> for letter in letters:
    ...     print(letter)
    ...
    l
    e
    a
    r
    n
    i
    g
     
    p
    y
    t
    h
    o
    >>> len(letters)
    13
    >>> "n" in letters
    True
    >>> "x" in letters
    False

You can use the Counter object letters in a for loop, which confirms it’s iterable. All mappings are iterable. However, the iteration loops through the keys and not the values. You’ll see how to iterate through the values or through both keys and values later in this tutorial.
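As a brief preview, and assuming the same letters counter as above, the standard mapping methods .values() and .items() cover those two cases. This is only a sketch of the general idea, not the tutorial's own example:

```python
from collections import Counter

letters = Counter("learning python")

keys = list(letters)             # plain iteration yields the keys
values = list(letters.values())  # the counts only
pairs = list(letters.items())    # (key, value) tuples

print(keys[:5])
print(values[:5])
print(pairs[:5])
```

Because Counter is a dict subclass, the keys appear in the order in which each character was first encountered in the string.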

The built-in len() function returns the number of items in the mapping. This is equal to the number of unique characters in the original string, including the space character. The object is sized since len() returns a value.
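One way to check these traits programmatically is with the abstract base classes in collections.abc. The sketch below assumes the same letters counter and tests it against the ABCs that correspond to each characteristic:

```python
from collections import Counter
from collections.abc import Collection, Container, Iterable, Mapping, Sized

letters = Counter("learning python")

# isinstance() against an ABC reports whether the object provides
# the behavior that the ABC describes.
checks = {
    abc_class.__name__: isinstance(letters, abc_class)
    for abc_class in (Iterable, Sized, Container, Collection, Mapping)
}

for name, result in checks.items():
    print(f"{name}: {result}")
```

Every check reports True here, since Counter is a subclass of dict.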

You can use the in keyword to confirm which elements are in the mapping. This check alone isn’t sufficient to confirm that the mapping is a container. However, you can also access the object’s .__contains__() special method directly:

    >>> letters.__contains__("n")
    True

Read the full article at https://realpython.com/python-mappings/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Real Python: Quiz: Python Mappings

Wed, 2024-06-12 08:00

In this quiz, you’ll test your understanding of the basic characteristics and operations of Python mappings. By working through this quiz, you’ll revisit the key concepts and techniques of creating a custom mapping.


Kay Hayen: Nuitka Release 2.3

Tue, 2024-06-11 18:00

This is to inform you about the new stable release of Nuitka, the extremely compatible Python compiler.

This release bumps the long-awaited 3.12 support to a complete level. Now, Nuitka behaves identically to CPython 3.12 for the most part.

In terms of bug fixes, it’s also huge. Especially for Unicode paths and software with Unicode extension module names and Unicode program names, and even non-UTF8 code names, there have been massive amounts of improvements.

Bug Fixes
  • Standalone: Added support for python-magic-bin package. Fixed in 2.2.1 already.

  • Fix: The cache directory creation could fail when multiple compilations started simultaneously. Fixed in 2.2.1 already.

  • macOS: For arm64 builds, DLLs can also have an architecture dependent suffix; check that as well. Makes the soundfile dependency scan work. Fixed in 2.2.1 already.

  • Fix: When lazy loader handling added hard imports as a module was first processed, those imports did not affect the current module, potentially causing it not to resolve hidden imports. Fixed in 2.2.1 already.

  • macOS: The use of libomp in numba needs to cause the extension module not to be included and not to look elsewhere. Fixed in 2.2.1 already.

  • Python3.6+: Fix, added support for keyword arguments of ModuleNotFoundError. Fixed in 2.2.1 already.

  • macOS: Detect more versioned DLLs and arm64 specific filenames. Fixed in 2.2.1 already.

  • Fix, was not annotating exception exit when converting an import to a hard submodule import. Fixed in 2.2.2 already.

  • Fix, branches that became empty can still have traces that need to be merged.

    Otherwise, usages outside the branch will not see propagated assignment statements. As a result, these falsely became unassigned instead. Fixed in 2.2.2 already.

  • Windows: Fix, uninstalled self-compiled Python didn’t have proper installation prefix added for DLL scan, resulting in runtime DLLs not picked up from there. Fixed in 2.2.2 already.

  • Standalone: Added support for newer PySide6 version 6.7. It needed correction on macOS and has a new data file type. Fixed in 2.2.3 already.

  • Standalone: Complete support for pyocd package. Fixed in 2.2.3 already.

  • Module: Fix, the created .pyi files were incomplete.

    The list of imported modules created in the finalization step was incomplete. We now go over the modules that were actually done and mark all non-included modules as dependencies.

  • Scons: Fix, need to avoid using Unicode paths towards the linker on Windows. Instead, use a temporary output filename and rename it to the actual filename after Scons has completed.

  • Windows: Avoid passing Unicode paths to the dependency walker on Windows, as it cannot handle those. Also, the temporary filenames in the build folder must be short paths, as it cannot handle them if the build folder is a Unicode path.

  • Scons: For ccache on Windows, the log filename must be a short path too, if the build folder is a Unicode path.

  • Windows: Make sure the Scons build executes inside a short path as well, so that a potential Unicode path is visible to the C compiler when resolving the current directory.

  • Windows: The encoding of Unicode paths for accelerated mode values of __file__ was not making sure that hex sequences were correctly terminated, so in some cases, it produced ambiguous C literals.

  • Windows: Execute binaries created with the --windows-uac-admin and --run options with a proper UAC prompt.

  • Fix, need to allow for non-UTF8 Unicode in variable names, function names, class names, and method names.

  • Python3.10+: Fix, match statements that captured the rest of mapping checks were not working yet.

    match value:
        case {"key1": 5, **rest}:
            ... # rest was not assigned here

  • Windows: When deleting build folders, make sure the retries always lead to a complete deletion.

  • Python2: Fix, could crash with non-unicode program paths on Windows.

  • Avoid giving SyntaxWarning from reading source code

    For example, the standard site module of Python 3.12 gives warnings about illegal escape sequences that nobody cares about apparently.

  • Fix, the matplotlib warnings by options-nanny were still given even if the no-qt plugin was used, since the variable name referenced there was not actually set yet by that plugin.

  • Windows: Fix, when using the uninstalled self-compiled Python, we need python.exe to find DLL dependencies. Otherwise it doesn’t locate the MSVC runtime and Python DLL properly.

  • Standalone: Added support for freetype package.

New Features
  • Support for Python 3.12 is finally there. We focused on scalability first and did things the correct way immediately, rather than rushing to get it working and improving it only later.

    As a result, the correctness and performance of Nuitka with previous Python releases are improved as well.

    Some things got delayed, though. We need to do more work to take advantage of other core changes. Concerning exceptions normalized at creation time, the created module code doesn’t yet take advantage. Also, more efficient two-digit long handling is possible with Python 3.12, but not implemented. It will take more time before we have these changes completed.

  • Experimental support for Python 3.13 beta 1 is also there, which is potentially surprising, but we will try to follow its release cycle closely and aim to support it at the time of release.

    Nuitka has followed all of its core changes so far, and basic tests are passing; the accelerated, module, standalone, and onefile modes all work as expected. The only thing delayed is the uncompiled generator integration, where we need to replicate the exact CPython behavior. We need to have perfect integration only for working with the asyncio loop, so we wait with it until release candidates appear.

  • Plugins: Added support to include directories entirely unchanged by adding raw_dir values for data-files section, see Nuitka Package Configuration.

  • UI: The new command line option --include-raw-dir was added to allow including directories entirely unchanged.

  • Module: Added support for creating modules with Unicode names. Needs a different DLL entry function name and to make use of two-phase initialization for the created extension module.

  • Added support for OpenBSD standalone mode.

Optimization
  • Python3: Avoid API calls for allocators

    This is most effective with Python 3.11 or higher, but many types, such as bytes, dict keys, float, and list objects, are also faster to create with all Python3 versions.

  • Python3.5+: Directly use the Python allocator functions for object creation, avoiding the DLL API calls. The coverage is complete with Python3.11 or higher, but many object types like float, dict, list, bytes benefit even before that version.

  • Python3: Faster creation of StopIteration objects.

    With Python 3.12, the object is created directly and set as the current exception without normalization checks.

    We also added a new specialized function to create the exception object and populate it directly, avoiding the overhead of calling the StopIteration type.

  • Python3.10+: When accessing freelists, we were not passing tstate but getting the interpreter object locally, which can be slower by a few percent in some configurations. We now use the free lists more efficiently with tuple, list, and dict objects.

  • Python3.8+: Call uncompiled functions via vector calls.

    We avoid an API call that ends up being slower than using the same function via the vector call directly.

  • Python3.4+: Avoid using _PyObject_LengthHint API calls in list.extend and use our own variant that is faster to call.

  • Added specialization for os.path.normpath. We might benefit from compile time analysis of it once we want to detect file accesses.

  • Avoid using module constants accessor for global constant values

    For example, with (), we used the module-level accessor for no reason, as it is already available as a global value. As a result, constant blobs shrink, and the compiled code becomes slightly smaller, too.

  • Anti-Bloat: Avoid using dask from the sparse module. Added in 2.2.2 already.

Organizational
  • UI: Major change in console handling.

    Compiled programs on Windows now have a third mode, besides console or not. You can now create GUI applications that attach to an available console and output there.

    The new option --console controls this. The force value enforces a console, the disable value disables using it, and the attach value activates the new behavior.

    Note

    Redirection of outputs to a file in attach mode only works if the program is launched correctly, for example, interactively in a shell. Some forms of invocation will not work; most prominently, subprocess.call without inheritable outputs will still output to a terminal.

    On macOS, the distinction doesn’t exist anymore; technically, it hasn’t been valid for a while already. You need to use bundles for non-console applications, though, since otherwise macOS itself forces a console by default.

  • Detect patchelf usage in buggy version 0.18.0 and ask the user to upgrade or downgrade it, as this specific version is known to be broken.

  • UI: Make clear that the --nofollow-import-to option accepts patterns.

  • UI: Added warning for module mode and usage of the options to force outputs as they don’t have any effect.

  • UI: Check the success of Scons in creating the expected binary immediately after running it and not only once we reach post-processing.

  • UI: Detect empty user package configuration files

  • UI: Do not output module ast when a plugin reports an error for the module, for example, a forbidden import.

  • Actions: Update from deprecated action versions to the latest versions.

Tests
  • Use Nuitka Project Options for the user plugin test rather than passing by environment variables to the test runner.

  • Added a new search mode, skip, to complement resume. It resumes right after the last test that resume stopped on. We can use that while support for a Python version is not complete.

Cleanups
  • Solved a TODO about using unified code for setting StopIteration; the code for coroutines, generators, and asyncgen used to be different.

  • Unified how the binary result filename is passed to Scons for modules and executables to use the same result_exe key.

Summary

This release marks a huge step in catching up with compatibility of Python. After being late with 3.12 support, we will now be early with 3.13 support if all goes well.

The many Unicode support related changes also enhanced Nuitka to generate 2 phase loading extension modules, which also will be needed for sub-interpreter support later on.

From here on, we need to re-visit compatibility. A few more obscure 3.10 features are missing, the 3.11 compatibility is not yet complete, and we need to take advantage of the new caching possibilities to enhance performance, for example, to bring attribute lookups to where they can be with the core changes there.

For the coming releases until 3.13 is released, we hope to focus on scalability a lot more and get a much needed big improvement there, and complete these other tasks on the side.


Brett Cannon: Saying thanks to open source maintainers

Tue, 2024-06-11 17:29

After signing up for GitHub Sponsors, I had a nagging feeling that somehow asking for money from other people to support my open source work was inappropriate. But after much reflection, I realized that phrasing the use of GitHub Sponsors as a way to express patronage/support and appreciation for my work instead of sponsorship stopped me feeling bad about it. It also led me to reflect on to what degree people can express thanks to open source maintainers.

⚠️ This blog post is entirely from my personal perspective and thus will not necessarily apply to every open source developer out there.

Be nice

The absolutely easiest way to show thanks is to simply not be mean. It sounds simple, but plenty of people fail at even this basic level of civility. This isn't to say you can't say that a project didn't work for you or you disagree with something, but there's a massive difference between saying "I tried the project and it didn't meet my needs" and "this project is trash".

People failing to support this basic level of civility is what leads to burnout.

Be an advocate

It's rather indirect, but saying nice things about a project is a way of showing thanks. As an example, I have seen various people talk positively about pyproject.toml online, but not directly at me. That still feels nice due to how much effort I put into helping make that file exist and creating the [project] table.

Or put another way, you never know who is reading your public communications.

Produce your own open source

Another indirect way to show thanks is by sharing your own open source code. By maintaining your own code, you'll increase the likelihood I myself will become a user of your project. That then becomes a circuitous cycle of open source support between us.

Say thanks

Directly saying "thank you" actually goes a really long way. It takes a lot of positive interactions to counteract a single negative interaction. You might be surprised how much it might brighten someone's day when someone takes the time and effort to reach out and say "thank you", whether that's by DM, email, in-person at a conference, etc.

Fiscal support

As I said in the opening of this post, I set up GitHub Sponsors for myself as a way for people to show fiscal support for my open source work if that's how they prefer to express their thanks (including businesses). Now I'm purposefully not saying "sponsor" as to me that implies that giving money leads to some benefit (e.g. getting a shout-out somewhere) which is totally reasonable for people to do. But for me, since every commit is a gift, I'm financially secure, and I'm not trying to make a living from my volunteer open source work or put in the effort to make sponsorship worth it, I have chosen to treat fiscal support as a way of showing reciprocity for the gift of sharing my code that you've already received. This means I fully support all open source maintainers setting up fiscal support at a minimum, and if they want to put in the effort to go the sponsorship route then they definitely should.

Producing open source also isn't financially free. For instance, I pay for:

  1. The hosting of this blog via Ghost(Pro)
  2. Obsidian Sync to keep my open source notes available on all my devices so when I have an idea I can write it down
  3. Obsidian Publish to share my open source notes
  4. Computer upgrades (including ergonomic upgrades like keyboards)
  5. My personal time away from my wife and child, family and friends (which my open source journal exists to try and point out for those who don't realize how much time I put into my volunteer work)

So while open source is "free" for you as the consumer, the producer very likely has concrete financial costs in producing that open source on top of the intangible costs like volunteering their personal time.

But as I listed earlier, there are plenty of other ways to show thanks without having to spend money that can be equally valuable to a maintainer.

I also specifically didn't mention contributing. I have said before that contributions are like giving someone a puppy: it seems like a lovely gift at the time, but the recipient is now being "gifted" daily walks involving scooping 💩 and vet bills. As such, contributions from others can be a blessing and a curse all at the same time depending on the contribution itself, the attitude of the person making the contribution, etc. So I wouldn't always assume my contribution is as welcomed and desired as much as a "thank you" note.


PyCoder’s Weekly: Issue #633 (June 11, 2024)

Tue, 2024-06-11 15:30

#633 – JUNE 11, 2024

String Interpolation in Python: Exploring Available Tools

In this tutorial, you’ll learn about the different tools that Python provides for performing string interpolation. String interpolation allows you to create new strings by inserting different objects into a string template.
REAL PYTHON

Notebooks for Fundamentals of Music Processing

This is a collection of Python Notebooks for teaching and learning the fundamentals of music processing. Examples include illustrations, sound samples, math, and more.
INTERNATIONAL AUDIO LABS

Upgrade Python Versions Without the Pain

Stop wasting 30% of your team’s sprint on maintaining legacy codebases. Automatically migrate and keep up-to-date on Python versions, so that you can focus on being productive while staying secure, without the risk of breaking changes - Get a code assessment today →
ACTIVESTATE sponsor

Python’s Many Command-Line Utilities

This article describes every command-line tool included with Python, each of which can be run with python -m module_name.
TREY HUNNER
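As a small illustration of the python -m module_name form, the sketch below runs the standard-library json.tool module from Python itself. The sample JSON string is made up for the example, and sys.executable stands in for the python command so that the same interpreter is used:

```python
import json
import subprocess
import sys

raw = '{"name": "planet", "tags": ["python", "feeds"]}'

# Equivalent to piping the text into `python -m json.tool` in a shell:
# with no file argument, json.tool reads JSON from standard input.
result = subprocess.run(
    [sys.executable, "-m", "json.tool"],
    input=raw,
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout)

# The pretty-printed text is still the same JSON document.
parsed = json.loads(result.stdout)
```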

String Interpolation in Python (Quiz)

Take this quiz to test your understanding of the available tools for string interpolation in Python, as well as their strengths and weaknesses. These tools include f-strings, the .format() method, and the modulo operator.
REAL PYTHON

Python 3.12.4 Released

See the full list of changes in this release
CPYTHON DEV BLOG

PEP 712 Rejected

This Python Enhancement Proposal “Adding a ‘converter’ parameter to dataclasses.field” was determined to have an insufficient number of use cases.
PYTHON

Python 3.13.0 Beta 2 Released

CPYTHON DEV BLOG

Articles & Tutorials

What Are CRUD Operations?

CRUD operations are the cornerstone of application functionality. Whether you access a database or interact with a REST API, you usually want to create, retrieve, update, and delete data. In this tutorial, you’ll explore how CRUD operations work in practice.
REAL PYTHON

What We Talk About When We Talk About System Design

Mahesh talks about the rules he has encountered when doing research on designing large systems. Guidelines include late-binding on the design, focusing on the problem rather than existing systems, talking about other applications, and more.
MAHESH BALAKRISHNAN

Get Your Own AI Agent to Answer Questions From Your Database

Introducing “Database Mind” - a ready-to-use AI system designed for easy integration into your projects. As part of the “Minds Endpoints” AI platform, it offers a simple plug-and-play API service, enabling developers to effortlessly incorporate advanced AI capabilities into their solutions →
MINDSDB sponsor

Statically Typed Functional Programming With Python 3.12

This detailed article looks at how to use the match statement along with Python’s typing mechanism to write functional programs similar in style to Kotlin.
OSKAR WICKSTROM

How to Annotate a Graph With Matplotlib and Python

The Matplotlib package is great for visualizing data. One of its many features is the ability to annotate points on your graph. This article shows you how.
MIKE DRISCOLL

bytes: The Lesser-Known Python Built-in Sequence

The bytes data type looks a bit like a string, but it isn’t a string. This article explores it and also looks at the main Unicode encoding, UTF-8
STEPHEN GRUPPETTA

Reflecting on One Year of Being an Engineering Manager

“Being a manager is a focus change from code to people, from output to outcomes and from being productive to making most of everyone’s time.” Read more of Victor’s reflections on his first year as a manager.
VICTOR STOJANOV

Testing With Python: Fake It

This article is on using mock in your Python testing and is part of a larger series on testing in general.
BITECODE

Projects & Code

Mesop: Build Web Apps in Python

GITHUB.COM/GOOGLE

WeasyPrint: The Awesome Document Factory

GITHUB.COM/KOZEA

django-axes: Keep Track of Failed Login Attempts in Django

GITHUB.COM/JAZZBAND

Zango: Microservices in Django

GITHUB.COM/HEALTHLANE-TECHNOLOGIES

gloe: Library for Flow-Oriented Code

GITHUB.COM/IDEOS

Events

Weekly Real Python Office Hours Q&A (Virtual)

June 12, 2024
REALPYTHON.COM

Wagtail Space NL

June 12 to June 15, 2024
WAGTAIL.SPACE

Django Girls Abraka Workshop 2024

June 13 to June 15, 2024
DJANGOGIRLS.ORG

Python Atlanta

June 13 to June 14, 2024
MEETUP.COM

PyData London 2024

June 14 to June 17, 2024
PYDATA.ORG

PyCamp Leipzig 2024

June 15 to June 17, 2024
BARCAMPS.EU

Wagtail Space US

June 20 to June 23, 2024
WAGTAIL.SPACE

Happy Pythoning!
This was PyCoder’s Weekly Issue #633.

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]


Real Python: Listing All Files in a Directory With Python

Tue, 2024-06-11 10:00

Getting a list of all the files and folders in a directory is a natural first step for many file-related operations in Python. When looking into it, though, you may be surprised to find various ways to go about it.

When you’re faced with many ways of doing something, it can be a good indication that there’s no one-size-fits-all solution to your problems. Most likely, every solution will have its own advantages and trade-offs. This is the case when it comes to getting a list of the contents of a directory in Python.

In this video course, you’ll be focusing on the most general-purpose techniques in the pathlib module to list items in a directory, but you’ll also learn a bit about some alternative tools.


Python Bytes: #387 Heralding in a new era of database queries

Tue, 2024-06-11 04:00

Topics covered in this episode:

  • Dataherald
  • Python's many command-line utilities
  • Distroless Python
  • functools.cache, cachetools, and cachebox
  • Extras
  • Joke

Watch on YouTube: https://www.youtube.com/watch?v=ETZ3CvfbF_o

About the show

Sponsored by ScoutAPM: pythonbytes.fm/scout (https://pythonbytes.fm/scout)

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org
  • Brian: @brianokken@fosstodon.org
  • Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live (https://pythonbytes.fm/stream/live) to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list (https://pythonbytes.fm/friends-of-the-show), we'll never share it.

Michael #1: Dataherald (https://github.com/Dataherald/dataherald)

  • Interact with your SQL database, Natural Language to SQL using LLMs.
  • Allows you to set up an API from your database that can answer questions in plain English
  • Uses include
      • Allow business users to get insights from the data warehouse without going through a data analyst
      • Enable Q+A from your production DBs inside your SaaS application
      • Create a ChatGPT plug-in from your proprietary data

Brian #2: Python's many command-line utilities (https://www.pythonmorsels.com/cli-tools)

  • Trey Hunner
  • Too many to list, but here’s some fun ones
      • json.tool - nicely format json data
      • calendar - print the calendar
          • current by default, but you can pass in year and month
      • gzip, ftplib, tarfile, and other unixy things
          • handy on Windows
      • cProfile & pstats

Michael #3: Distroless Python (https://github.com/wolfi-dev)

  • via Patrick Smyth
  • What is distroless (https://www.chainguard.dev/unchained/minimal-container-images-towards-a-more-secure-future) anyway?
      • These are container images without package managers or shells included.
      • Debugging these images presents some wrinkles (can't just exec into a shell inside the image), but they're a lot more secure.
  • Chainguard, creates low/no CVE distroless images based on our FOSS distroless OS, Wolfi (https://github.com/wolfi-dev).
  • Some Python use-cases:

    docker run -it cgr.dev/chainguard/python:latest
    # The entrypoint is a Python REPL, since no b/a/sh is included
    docker run -it cgr.dev/chainguard/python:latest-dev
    # This is their dev version and has pip, bash, apk, etc.

Brian #4: functools.cache, cachetools, and cachebox

  • functools (https://docs.python.org/3/library/functools.html) cache and lru_cache - built in
  • cachetools (https://github.com/tkem/cachetools/) - “This module provides various memoizing collections and decorators, including variants of the Python Standard Library's @lru_cache function decorator.”
  • cachebox (https://github.com/awolverp/cachebox) - “The fastest caching Python library written in Rust”

Extras

Brian:

  • Python 3.12.4 is out (https://pythoninsider.blogspot.com/2024/06/python-3124-released.html)
  • VSCode has some pytest improvements (https://devblogs.microsoft.com/python/python-in-visual-studio-code-june-2024-release/)

Michael:

  • Time for a bartender alternative (https://www.macrumors.com/2024/06/06/alternatives-bartender-mac-menu-bar/), I’ve switched to Ice (https://icemenubar.app).
  • Rocket.chat (https://www.rocket.chat) as an alternative to Slack

Joke: CSS Cartoons (https://dev.to/alvaromontoro/css-cartoons-29bp)
Categories: FLOSS Project Planets

Real Python: Python News: What's New From May 2024

Mon, 2024-06-10 10:00

May was packed with exciting updates and events in the Python community. This month saw the release of the first beta version of Python 3.13, the conclusion of PyCon US 2024, and the announcement of the keynote speakers for EuroPython 2024. Additionally, PEP 649 has been delayed until the Python 3.14 release, and the Python Software Foundation published its 2023 Annual Impact Report.

Get ready to explore the recent highlights!


The First Beta Version of Python 3.13 Released

After nearly a year of continuous development, the first beta release of Python 3.13 was made available to the general public. It marks a significant milestone in Python’s annual release cycle, officially kicking off the beta testing phase and introducing a freeze on new features. Beyond this point, Python’s core developers will shift their focus to only identifying and fixing bugs, enhancing security, and improving the interpreter’s performance.

While it’s still months before the final release planned for October 2024, as indicated by the Python 3.13 release schedule, third-party library maintainers are strongly encouraged to test their packages with this new Python version. The goal of early beta testing is to ensure compatibility and address any issues that may arise so that users can expect a smoother transition when Python 3.13 gets officially released later this year.

Although you shouldn’t use a beta version in any of your projects, especially in production environments, you can go ahead and try out the new version today. To check out Python’s latest features, you must install Python 3.13.0b1 using one of several approaches.

Note: If you’d like to share your feedback or file a bug against a pre-release development version, then open an issue on Python’s GitHub repository.

The quickest and arguably the most straightforward way to manage multiple Python versions alongside your system-wide interpreter is to use a tool like pyenv in the terminal:

    $ pyenv install 3.13.0b1
    $ pyenv shell 3.13.0b1
    $ python
    Python 3.13.0b1 (main, May 15 2024, 10:41:55) [GCC 13.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

The first command brings the first beta release of Python 3.13 onto your computer, while the following command temporarily sets the path to the python executable in your current shell session. As a result, the python command points to the specified Python interpreter.
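
Once the beta interpreter is active, one way to confirm which build a script is actually running on is to inspect sys.version_info. The following is a minimal sketch; the helper name describe_interpreter is illustrative and not part of the original article.

```python
import sys


def describe_interpreter(info=sys.version_info):
    """Return a label such as '3.13.0 (beta 1)' for a version_info-style tuple."""
    version = f"{info[0]}.{info[1]}.{info[2]}"
    level, serial = info[3], info[4]
    # Final releases need no suffix; pre-releases report their level and serial
    if level == "final":
        return version
    return f"{version} ({level} {serial})"


if __name__ == "__main__":
    print(describe_interpreter())
```

Any release level other than "final" indicates a pre-release build, which is a useful guard in test suites that should only warn, not fail, on beta interpreters.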

Alternatively, you can use a Python installer, which you’ll find at the bottom of the downloads page, or run Python in an isolated Docker container to keep it completely separate from your operating system. However, for ultimate control, you can try building the interpreter from source code based on the instructions in the README file. This method will let you experiment with more advanced features, like turning off the GIL.

Unlike previous Python releases, which introduced a host of tangible syntactical features that you could get your hands on, this one mainly emphasizes internal optimizations and cleanup. That said, according to the official release summary document, there are a few notable new features that will be immediately visible to most Python programmers:

Some of these features don’t work on Windows at the moment because they rely on Unix-specific libraries, so you won’t see any difference unless you’re a macOS or Linux user. The good news is that Windows support is coming in the second beta release, which will arrive soon, thanks to Real Python team member Anthony Shaw.

Read the full article at https://realpython.com/python-news-may-2024/ »


Categories: FLOSS Project Planets

Robin Wilson: Introducing offline_folium

Mon, 2024-06-10 05:35

Another new-ish package that I’ve never got around to writing about on my blog is offline_folium. It has a somewhat niche use-case, but it seems like a few people have found it useful.

In brief, it allows you to use the folium package for creating interactive maps from Python, but without an internet connection. Folium is built on top of the Leaflet web mapping library – and it loads all the relevant JS and CSS files directly from CDNs. This means that if you try and use folium without an internet connection it just doesn’t work. Offline_folium works around this by downloading the files when you do have an internet connection, and then re-writing the links to the files to point to the offline versions.

I originally created this package when doing freelance work on a project for the UK Navy – they wanted to have interactive maps on an air-gapped computer, so I built a system that would allow this (in this case, the JS/CSS file download step was run as part of building the ‘app’ we produced). I should note at this point that without an internet connection the default OpenStreetMap background mapping will not work. In some situations that would be a significant problem – but we were using other background mapping, coastline vector files and so on, so it wasn’t a problem for us.

So, how do you use this package?

First, install it using pip install offline_folium. Then make sure you have an internet connection and run python -m offline_folium – this will download the JS/CSS files and store them in a sensible place (chosen automatically). Now you’re all set up.

Once you have no internet connection and want to use folium, you must import it in a slightly different way – first import offline from offline_folium, and then import folium, like this:

    from offline_folium import offline
    import folium

Now you can use folium as usual, for example by creating a simple map object:

m = folium.Map()

So, that’s pretty much it. People are actively submitting pull requests at the moment, so hopefully the functionality will expand to work with various folium plugins too.

For more information (or to submit a PR yourself), see the Github repo.

Categories: FLOSS Project Planets

Python Software Foundation: It’s time to make nominations for the PSF Board Election!

Mon, 2024-06-10 05:00

This year’s Board Election Nomination period opens tomorrow and closes on June 25th. Who runs for the board? People who care about the Python community, who want to see it flourish and grow, and also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.

Current Board members want to share what being on the Board is like and are making themselves available to answer all your questions about responsibilities, activities, and time commitments via online chat. Please join us on the PSF Discord for the Board Election Office Hours (June 11th, 4-5 PM UTC and June 18th, 12-1 PM UTC)  to talk with us about running for the PSF Board. 

Board Election Timeline

Nominations open: Tuesday, June 11th, 2:00 pm UTC
Nominations close: Tuesday, June 25th, 2:00 pm UTC
Voter application cut-off date: Tuesday, June 25th, 2:00 pm UTC
Announce candidates: Thursday, June 27th
Voting start date: Tuesday, July 2nd, 2:00 pm UTC
Voting end date: Tuesday, July 16th, 2:00 pm UTC 

Not sure what UTC is for you locally? Check here!

Nomination details

You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, and end on June 25th, 2:00 PM UTC.

Please send nominations and questions regarding nominations to psf-board-nominations@pyfound.org. Include the name, email address, and an endorsement of the nominee's candidacy for the PSF Board. To get an idea of what a nomination looks like, check out the Nominees for 2023 PSF Board page. After the nomination period closes, we will request a statement and other relevant information from the nominees to publish for voter review.

Voting Reminder!

Every PSF Voting Member (Supporting, Managing, Contributing, and Fellow) needs to affirm their membership to vote in this year’s election. You should have received an email from "psf@psfmember.org <Python Software Foundation>" with subject "[Action Required] Affirm your PSF Membership voting status" that contains information on how to affirm your voting status.

You can see your membership record and status on your PSF Member User Information page. If you are a voting-eligible member and do not already have a login, please create an account on psfmembership.org first and then email psf-donations@python.org so we can link your membership to your account.

Categories: FLOSS Project Planets

Tryton News: Security Release for issue #92

Mon, 2024-06-10 04:00

Ashish Kunwar has found that python-sql accepts any string in the offset or limit parameters when Python is run with -O, which makes any system exposing those parameters vulnerable to an SQL injection attack.

Impact

CVSS v3.0 Base Score: 9.1

  • Attack Vector: Network
  • Attack Complexity: Low
  • Privileges Required: Low
  • User Interaction: None
  • Scope: Changed
  • Confidentiality: High
  • Integrity: Low
  • Availability: Low
Workaround

Do not use the -O switch or PYTHONOPTIMIZE environment variable when executing python.
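
The reason the -O switch matters is that Python removes assert statements when it runs in optimized mode, so any validation written as an assert silently disappears. The sketch below illustrates the mechanism only; it is a hypothetical stand-in and not python-sql's actual code, and the function names are invented for the example.

```python
# Hypothetical stand-in for an assert-based check; this is NOT python-sql's code.
SOURCE = '''
def limit_clause(limit):
    assert isinstance(limit, int), "limit must be an integer"
    return "LIMIT %s" % limit
'''


def load_limit_clause(optimize):
    """Compile SOURCE at the given optimization level (0 = default, 1 = like -O)."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["limit_clause"]


def safe_limit_clause(limit):
    """Validate with an explicit exception, which -O never strips out."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 0:
        raise ValueError("limit must be a non-negative integer")
    return "LIMIT %d" % limit


if __name__ == "__main__":
    unchecked = load_limit_clause(1)
    # With asserts stripped, an arbitrary string reaches the SQL text
    print(unchecked("1; DROP TABLE t"))
```

Input validation that must hold in production is generally better expressed as an explicit check that raises an exception, as in safe_limit_clause.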

Resolution

All affected users should upgrade python-sql to the latest version.

Affected versions: <= 1.5.0
Non affected versions: >= 1.5.1

Reference Concerns?

Any security concerns should be reported on the bug-tracker at https://bugs.tryton.org/python-sql with the confidential checkbox checked.

Categories: FLOSS Project Planets

Zato Blog: HL7 FHIR Integrations in Python

Mon, 2024-06-10 04:00
HL7 FHIR Integrations in Python 2024-06-10, by Dariusz Suchojad

HL7 FHIR, pronounced "fire", is a data model and message transfer protocol designed to facilitate the exchange of information among systems used in health care settings.

In such environments, a FHIR server will assume the role of a central repository of health records with other systems integrating with it, potentially in a hub-and-spoke fashion, thus letting the FHIR server become a unified and consistent source of data that would otherwise stay confined to a silo of each individual health information system.

While FHIR is the way forward, the current reality of health care systems is that much of the useful and actionable information is scattered among many individual data sources - paper-based directories, applications or databases belonging to the same or different enterprises - and that directly hampers the progress towards delivering good health care. Anyone who has witnessed health providers copying and pasting the same information from one application to another, or not having access to already existing data, not to mention people lacking an easy way to access their own data, can understand what the lack of interoperability looks like from the outside.

The challenges that integrators face are two-fold. On the one hand, the already existing systems, including software as well as medical appliances, were often not, or are still not being, designed for the contemporary inter-connected world. On the other hand, FHIR in itself is a relatively new technology which means that it is not straightforward to re-use the existing skills and competencies.

Zato is an open-source platform that makes it possible to integrate systems with FHIR using Python. Specifically, its support for FHIR enables quick on-boarding of integrators who may be new to health care interoperability, who are coming to FHIR with previous experience or interest in web development technologies, and who need an easy way to get started with and to navigate the complex landscape of health care integrations.

Connecting to FHIR servers

Outgoing FHIR connections are what allow Python-based services to communicate with FHIR servers. Throughout the rest of the chapter, the following definition will be used. It connects to a live, publicly available FHIR server.

Filling out the form below will suffice; there is no need for any server restarts. This principle, that restarts are not needed, applies throughout the platform: whenever you change any piece of configuration, it will be automatically propagated as necessary.

  • Name: FHIR.Sample
  • Address: https://simplifier.net/zato
  • Security: No security definition (we will talk about security later)
  • TLS CA Certs: Default bundle

Retrieving data from FHIR servers

In Python code, you obtain client connections to FHIR servers through self.out.hl7.fhir objects, as in the example below which first refers to the server by its name and then looks up all the patients in the server.

The structure of the Patient resource that we expect to receive can be found here.

    # -*- coding: utf-8 -*-

    # Zato
    from zato.server.service import Service

    class FHIService1(Service):
        name = 'demo.fhir.1'

        def handle(self) -> 'None':

            # Connection to use
            conn_name = 'FHIR.Sample'

            with self.out.hl7.fhir[conn_name].conn.client() as client:

                # This is how we can refer to patients
                patients = client.resources('Patient')

                # Get all active patients, sorted by their birth date
                result = patients.sort('active', '-birthdate')

                # Log the result that we received
                for elem in result:
                    self.logger.info('Received -> %s', elem['name'])

Invoking the service will write the expected data to the logs:

INFO - Received -> [{'use': 'official', 'family': 'Chalmers', 'given': ['Peter', 'James']}]

For comparison, this is what the FHIR server displays in its frontend - the information is the same.
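
The logged value is a list of FHIR HumanName structures, so a service will often want to turn it into a display string. Below is a minimal sketch that works on plain dicts shaped like the log line above; the helper names format_human_name and official_name are illustrative and not part of Zato.

```python
def format_human_name(name):
    """Join the 'given' and 'family' parts of one FHIR HumanName dict."""
    given = " ".join(name.get("given", []))
    family = name.get("family", "")
    return " ".join(part for part in (given, family) if part)


def official_name(names):
    """Prefer the entry marked 'official'; fall back to the first entry."""
    if not names:
        return ""
    for name in names:
        if name.get("use") == "official":
            return format_human_name(name)
    return format_human_name(names[0])


if __name__ == "__main__":
    sample = [{'use': 'official', 'family': 'Chalmers', 'given': ['Peter', 'James']}]
    print(official_name(sample))
```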

Storing data in FHIR servers

To save information in a FHIR server, create the required resources and call .save to permanently store the data in the server. Resources can be saved either individually (as in the example below) or as a bundle.

    # -*- coding: utf-8 -*-

    # Zato
    from zato.server.service import Service

    class CommandsService(Service):
        name = 'demo.fhir.2'

        def handle(self) -> 'None':

            # Connection to use
            conn_name = 'FHIR.Sample'

            with self.out.hl7.fhir[conn_name].conn.client() as client:

                # First, create a new patient
                patient = client.resource('Patient')

                # Save the patient in the FHIR server
                patient.save()

                # Create a new appointment object
                appointment = client.resource('Appointment')

                # Who will attend it
                participant = {
                    'actor': patient,
                    'status': 'accepted'
                }

                # Fill out the information about the appointment
                appointment.status = 'booked'
                appointment.participant = [participant]
                appointment.start = '2022-11-11T11:11:11.111+00:00'
                appointment.end = '2022-12-22T22:22:22.222+00:00'

                # Save the appointment in the FHIR server
                appointment.save()

Learning what FHIR resources to use

The "R" in FHIR stands for "Resources" and the sample code above uses resources such as Patient or Appointment, but how does one learn what other resources exist and what they look like? In other words, how does one learn the underlying data model?

First, you need to get familiar with the spec itself which, in addition to textual information, offers visualizations of the data model. For instance, here is the description of the Observation object, including details such as all the attributes an Observation is composed of as well as their multiplicities.

Secondly, do spend time with FHIR servers such as Simplifier. Use Zato services to create test resources, look them up and compare the results with what the spec says. There is no substitute for experimentation when learning a new data model.
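
When comparing test resources with the spec, it can help to encode the multiplicities you have read about and check your own data against them. The sketch below assumes FHIR R4, where Observation.status and Observation.code both have cardinality 1..1; the helper name missing_required and the sample values are illustrative only.

```python
# Assumption: FHIR R4, where Observation.status and Observation.code are 1..1
REQUIRED_OBSERVATION_ELEMENTS = ("status", "code")


def missing_required(resource, required=REQUIRED_OBSERVATION_ELEMENTS):
    """Return the names of required elements that are absent from a resource dict."""
    return [name for name in required if name not in resource]


# A hand-written test resource, expressed as a plain dict
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Body temperature"},
    "valueQuantity": {"value": 36.6, "unit": "Cel"},
}

if __name__ == "__main__":
    print(missing_required(observation))
    print(missing_required({"resourceType": "Observation"}))
```

This only checks presence, not types or value sets, so it complements rather than replaces validation done by the FHIR server itself.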

FHIR security

Outgoing FHIR connections can be secured in several ways, depending on what a given FHIR server requires:

  • With Basic Auth definitions
  • With OAuth definitions
  • With SSL/TLS. If the server is not a public one (e.g. it is in a private network with a private IP address), you may need to upload the server's certificate to Zato first if you plan to use SSL/TLS because, without it, the server's certificate may be rejected.
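
For orientation, this is what a Basic Auth definition amounts to on the wire: a base64-encoded "username:password" pair sent in the Authorization header. In Zato the security definition takes care of this, so the sketch below is only an illustration of the mechanism, with a helper name and credentials invented for the example.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Authorization header used by Basic Auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials, for illustration only
    print(basic_auth_header("user", "pass"))
```

Because base64 is an encoding and not encryption, Basic Auth should only be used over SSL/TLS.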
MLLP, HL7 v2 and v3

While FHIR is what new deployments use, it is worth adding that there are still other HL7 versions frequently seen in integrations:

  • Version 2, using its own MLLP protocol
  • Version 3, using XML

Both of them can be used in Zato services, in both directions. For instance, it is possible to both receive HL7 v2 messages as well as to send them to external applications. It is also possible to send v2 messages using REST in addition to MLLP.
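
To give a feel for what MLLP adds on top of an HL7 v2 message, the protocol wraps each message in a start block byte (0x0B) and an end block (0x1C followed by 0x0D). The following is a minimal sketch of that framing in plain Python; it is not Zato's implementation, and the function names are illustrative.

```python
START_BLOCK = b"\x0b"      # <VT>, marks the start of an MLLP frame
END_BLOCK = b"\x1c\x0d"    # <FS><CR>, marks the end of an MLLP frame


def mllp_wrap(message: bytes) -> bytes:
    """Wrap a raw HL7 v2 message in MLLP framing bytes."""
    return START_BLOCK + message + END_BLOCK


def mllp_unwrap(frame: bytes) -> bytes:
    """Strip MLLP framing, rejecting anything that is not a complete frame."""
    if not (frame.startswith(START_BLOCK) and frame.endswith(END_BLOCK)):
        raise ValueError("not a complete MLLP frame")
    return frame[len(START_BLOCK):-len(END_BLOCK)]


if __name__ == "__main__":
    # Truncated sample header, for illustration only
    message = b"MSH|^~\\&|SENDER"
    print(mllp_wrap(message))
```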

Categories: FLOSS Project Planets

Ed Crewe: Software Development with Generative AI - 2024 Update

Sun, 2024-06-09 12:37

Why write an update?
I wrote a blog post on Software Development with Generative AI last year, which questioned the approach of the current AI software authoring assistants. I believe the bigger picture holds true: fully utilizing AI to write software will require an entirely different approach, one that changes the job of a software developer in a far more radical manner and perhaps makes many of today's software languages redundant.

However, I also raised the issue that I found the utility of the current generative AI helpers questionable for seasoned developers:
"The generative AI can help students and others who are learning to code in a computer language, but can it actually improve productivity for real, full time, developers who are fluent in that language?
I think that question is currently debatable... (but it is improving rapidly) ... We may reach that point within a year or two"

Well it hasn't been a year or two, just 6 months. But I believe the addition of the Chat window to CoPilot and an improvement in the accuracy of its models has already made a significant difference. 

On balance I would now say that even a fluent programmer may get some benefits from its use. Given the speed of improvement it is likely that all commercial programming will use an AI assistant within a few years. 


To delay the inevitable and not embed it into your work process is like King Canute commanding the sea to retreat. There are increasing numbers of alternatives available too. However, as CoPilot is the market leader, I believe it is worth going into slightly more depth on its current state of play.

Copilot Features

The new Chat window within your IDE gives you a context sensitive version of Copilot ChatGPT that can act as a pair programmer and code reviewer for your work. 

If you have enabled auto-complete then you instigate that usage by writing functional comments, i.e. prompts, then tabbing out to accept the suggestions it responds with.

To override these prompts, you can instead type a dot and get real code completion options (as long as your IDE is configured correctly). Since code completion has your whole codebase as context, it complements CoPilot reasonably well. But whilst the code completion is always correct, CoPilot is less so, probably more like 75% now compared to its initial release level of 50%.

It takes some time to improve the quality of your prompting. An effort must be made to eradicate any nuance, assumption, implication or subtlety from your English. Precise mechanical instructions are what are required. However its language model will have learnt common usage. So if you ask it to sort out your variables it will understand that you mean replace all hardcoded values in the body of your code with a set of constants defined at the top, explain that is what it thinks you mean and give you the code that does that.
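
As an illustration of the "sort out your variables" request described above, a before-and-after result might look like the sketch below. The function and constant names are invented for the example and are not actual CoPilot output.

```python
# Before: hardcoded values buried in the body of the code
def shipping_cost_before(weight_kg):
    if weight_kg > 20:
        return weight_kg * 1.5 + 10
    return weight_kg * 1.5


# After: the same behaviour, with the values lifted into named constants
HEAVY_PARCEL_THRESHOLD_KG = 20
RATE_PER_KG = 1.5
HEAVY_PARCEL_SURCHARGE = 10


def shipping_cost_after(weight_kg):
    cost = weight_kg * RATE_PER_KG
    if weight_kg > HEAVY_PARCEL_THRESHOLD_KG:
        cost += HEAVY_PARCEL_SURCHARGE
    return cost
```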

You can ask it anything about the usage of the language you are working in, how something should be coded, alternatives to that etc. So taking a pair programming approach and explaining what you are about to code and why to CoPilot chat as you go,  can be very useful. Given rubber duck programming is useful, having an intelligent duck that can answer back ... is clearly more so. 

It excels as a learning tool, largely replacing Googling and Stack Overflow with an IDE embedded search for learning new languages. But even for a language you know well, there can be details and nuances of usage you have overlooked or changes in syntactic standards with new releases you have missed.

You can also ask it to give your file a code review, where it will list out a series of suggested refactors that it judges would improve it.

Copilot Limitations

Currently, however, there are many limitations. Understanding them helps you know how to use CoPilot and not turn it off in frustration at its failings!

The most important one is that CoPilot's context is extremely limited. There is no RAG enhancement yet, no learning from your usage. It may seem to improve with usage, but that is just you getting better at using it. It does not learn about you and your coding style as you might expect, given a dumb shopping site does that as standard.

It does not create a user context for you and populate it with your codebase. It simply grabs the content of the currently edited file and the Chat prompt text and the language version for the session as a big query. The same for the auto-suggestion. But here the chat text is from the comments or doc strings on the lines preceding. 

It posts the lot to a fixed CoPilot LLM that is some months out of date, although apparently it has weekly updates from continuous retraining.

This total lack of context can mean the only way you can get CoPilot to suggest what you actually want is to write very detailed prompts. It is often simpler to just cut and paste example code as comments into the file: please rewrite blah like this ... paste example. Only content that is in the file or in the latest Chat question gets posted to inform the response.

At the time of writing CoPilot is due to at least retain and learn from Chat window history to extend its context a little. But currently it only knows about the currently open file and latest Chat message. Other providers have tools that do load the whole code base, for example Cody, plus there are open source tools to post more of your code base to ChatGPT or to an open source LLM.

As this blog post update indicates, the whole area is evolving at an extremely rapid pace.

The model it has for a language is fixed and dated. Less so for the core language but for example you may use a newer version of the leading 3rd party Postgres library that came out 2 years ago. But the majority of users are still on the previous one since it is still maintained. Their syntax differs. Copilot may only know the syntax for the old library because that is what it was trained with, even though a later version is being imported in the file, so is in Copilot's limited context. So any chat window or code prompts it suggests will be wrong.

I have yet to find it brings up anything useful that I didn't know about the code when using the code review feature, plus the suggestions can include things that are inapplicable or already applied. But I am sure it would be more useful for learning a new language.

AI prompting and commenting issue

Good practice for software teams around code commenting is that you should NOT stick in functional comments that just explain what the next few lines do. The team are developers and they can read the code just as quickly for its base functionality. Adding lots of functional commenting makes things unclear through excessive verbosity.
It is something that is only done for teaching people how to code in example snippets. It has no place in production code.

Comments should be added to give wider context, caveats, assumptions etc. So commenting is all about explaining the Why, not the How.

Doc strings at the head of methods and packages can contain a summary of what the function does in terms of the codebase. So more functional in orientation, but as a big scale summary. So again they are a What not a How.
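
To make the distinction concrete, here is a small sketch contrasting a redundant How comment with a Why comment and a What doc string. The function and the scenario in the comments are hypothetical, chosen only to illustrate the commenting style.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay_seconds=0.0):
    """Call fetch() until it succeeds and return its result (the What)."""
    # A redundant "How" comment would read: "loop over the attempts and call fetch"
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            # Why (hypothetical context): the upstream service briefly drops
            # connections during deploys, so a short retry avoids false alarms
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

The Why comment carries information that cannot be recovered from the code itself, which is exactly the kind of context an auto-complete prompt tends not to want.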

It looks like current AI assistants may mess that up. Since they need comments that are basically as close to pseudo code as possible. Adding information about real world issues, roadmap, wider codebase, integration with other services ... ie all the Why is likely to  confuse them and degrade the auto-complete.

Unfortunately code comments are not AI prompts for generating code and vice versa.
Which suggests that you may want to write a temporary prompt as a comment to generate the code, then replace it with a proper comment once it has served its purpose.

Or otherwise introduce a separate form of hideable, prompt-marked comment that makes it clear what is for the AI and what is for the Human!

Alternatively use the chat window for code generation then paste it in.

Copilot Translation

Translation is an area where Copilot can be very beneficial. As a non-native English speaker you can interact with it in your own language for prompting and comments and it will handle that and translate any comments in the file to English if asked to.

Code translation is more problematic, since the whole structure of a program and common libraries can be different. But if the code is doing some very encapsulated common process, for example just maths operations or file operations, it can extract the comments and prompts and regenerate the code in another language for you.

One can imagine that one day the only language anyone will need will be a very high level, succinct English-like language, e.g. Python.
When you want to write in a verbose or low-level language, you just write the simpler prompts in a spoken language, but use Python when it is faster to communicate explicitly than in a spoken language, since spoken languages are so unsuited to creating machine instructions.
Press a button and Copilot turns the lot into verbose C or Java code with English comments.

Categories: FLOSS Project Planets

Ed Crewe: Software development with Generative AI

Sun, 2024-06-09 09:25
The Current State of AI Software Generation

The user tries to describe what they want generated in terms of a snippet of high level programming language code using standard English. They submit it to the AI tool. So what are they asking the AI to generate and how does it do it?

The high level language

High level programming languages are human languages composed of English and maths symbols designed for the comprehension and composition of precise computer instructions. The language makes no more sense than English to a computer. It has to be compiled or interpreted to computer language for it to run. So it may compile to an intermediate bytecode language and then maybe to human readable assembly language - before final translation into the unreadable machine code that the computer runs.
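
The intermediate bytecode step can be observed directly in CPython using the standard dis module. The sketch below lists the bytecode operation names for a one-line function; the function add_tax is invented for the example, and the exact operation names vary between Python versions.

```python
import dis


def add_tax(price):
    return price * 1.2


# Collect the names of the bytecode instructions CPython compiled the function to
opnames = [instruction.opname for instruction in dis.get_instructions(add_tax)]

if __name__ == "__main__":
    print(opnames)
```

Even a single readable line expands into several lower-level instructions, which is the machine-specific complexity the high level language hides.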

A programmer learns the high level language and becomes fluent in it. They can read and understand the functionality of that code. With the complexity of the machine specific implementation stripped away.

Leaving just the precise functional maths and english / symbology that describes the computer functionality. They think in that code, in order to write it.
Even then, the majority of a programmer's time is spent debugging the high level language and fixing what they have written to be bug free, because it is difficult to think clearly in code, pre-determining all edge cases etc.

Unlike English language, it can succinctly describe computer functionality in a few lines. 

The AI

A detailed English language description of the required functionality, plus the name of a high level programming language, is submitted to the AI tool.

It does a search of the web, eg. stack overflow etc. for results for that code language. For Chatbot use (eg. ChatGPT) it applies an English language Large Language Model, LLM (a numeric encoding of learning of the English language) to generate a well phrased aggregation of the most popular results that match the English prompt. 

For software use (e.g. CoPilot) it works just the same, but the LLM learns an aggregate translation from English to the high level software language from code example data, e.g. GitHub, to generate what the code syntax might be to match the English description of it.

Finally it returns an untested snippet of generated high level code.

The Non-Developer

The non-developer pastes it in place and tries to run the program with it in.

They may be able to puzzle out the high level language - but don't naturally think in it, just as people without mathematics skills can only think as far as basic arithmetic and are dyslexic when it comes to complex equations.

It seems to work around 50% of the time. When it fails, they go back to square one and try to rephrase their English prompt.

They patch together block after block of prompt created generated code. A crazy paving of a program that likely has a number of bugs and inappropriate features in it. But it kind of works, for the non-developer, that is good enough.

The code gets pushed out there with all its imperfections, and starts to populate the web of code data that is used to generate the next AI code snippet.

Or the Developer
The developer reads the code and understands it, determines if it should do what they want. Or if they just want to use some of it as an example.

They cut paste and rewrite it, using it as a hint tool. Or an extension to their IDE's existing auto-code generation tools that work using templated code and language / import library searches.

Hopefully their IDE is set up to clearly distinguish between real code completions and possible generative code completions. Otherwise the percentage of nonsense code created by the generative AI pollutes the 100% reliability of IDE code completion, and harms productivity.

Then they run their code and debug as usual.

At least 75% of programming time is not on writing code, but on making sure that the high level instructions are exactly correct for generating bug free machine code. So iteratively refining the lines of code. With code a single comma out of place can break the whole program. When language has to be so carefully groomed, succinct minimal language is essential.
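
As a small Python illustration of how much a single comma can matter, a stray trailing comma silently turns a number into a one-element tuple. The function names below are invented for the example.

```python
def total_with_stray_comma(price, quantity):
    subtotal = price * quantity,  # stray trailing comma: subtotal is now a tuple
    return subtotal


def total(price, quantity):
    subtotal = price * quantity
    return subtotal


if __name__ == "__main__":
    print(total_with_stray_comma(2, 3))
    print(total(2, 3))
```

Nothing fails at the line with the comma; the error only surfaces later when other code tries to use the result as a number, which is part of why debugging takes so much of the time.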

For many developers, adding an imprecise, non-mathematical language such as English, which is entirely unsuited to defining machine code instructions, to generate such code is problematic. It introduces a whole layer of imprecision, complexity and bugs to the process, slowing it right down. It also requires developers to write a lot more sentences (in English) rather than just quickly typing out the succinct lines of Python (or similar) programming language they have in their head.

The generative AI can help students and others who are learning to code in a computer language, but can it actually improve productivity for real, full time, developers who are fluent in that language?

I think that question is currently debatable. Because I believe the goal of adding yet another language to the stack of languages that need to be interpreted for humans authoring computer code, especially one as unsuited as English, is only useful for people who are far from fluent in the software language.

Once we move beyond error prone early releases of LLMs like ChatGPT-4 then tools such as CoPilot may start to become much more effective at authoring software, and actually produce code that is as likely to work first time and have the same number of bugs as your average software developer's first cut of the code. We may reach that point within a year or two. At that point professional software developers will need to be adept at using it as part of their toolset.

Even so I believe the whole conception of the application of AI to writing software could benefit from more work engaged in a computer centric alternative approach to the current one focussed on generating plausible human language responses. It only dominates because of all the efforts related to NLP and human interaction. But taking that and sticking on to writing human software languages is more about creating a revenue stream than attempting to have AI do the main work of software development.

Until then, AI will never be able to replace me, as a software developer. Only be another IDE tool I need to learn ... in time when it improves sufficiently to increase productivity.

NOTE - June 2024 Update
Having come back to CoPilot 6 months later, I have come to appreciate some of its new features, so I have added a new blog post that accepts that it now provides utility even for the seasoned programmer.

Another Way

Copilot and the like currently use the ChatGPT approach of a Chatbot front end tied to an English language LLM to generate aggregate search engine results in a human language. But there is no domain specific machine learning knowledge about the semantics of the content. So it doesn't understand, and certainly doesn't pre-check the code. Just as ChatGPT doesn't understand the search engine content. Since currently there are no domain specific trained models for the content in the loop. So if asked a question about pharmacy it doesn't plug in one of the AI models that has learnt pharmacy and is used by that industry to aid in the development of medicines. It understands nothing, it is a chatbot, just a constructor of plausible answers based on search popularity.
Similarly CoPilot has learnt how to predict what code somebody might be trying to write, but it hasn't learnt how to code.

This approach cannot lead to AI generating innovative new coding approaches, full self-coding computers, or remove the need for human readable high level programming languages.

There have been experiments with applying test driven development to AI generated code, but I have not heard of serious attempts to address the bigger picture...

  • Move all functional code writing to be AI only.
  • Remove the need for any high level computer language for humans to gain fluency in.
  • Have AI develop software by hundreds of thousands of iterative composition  TDD cycles.
  • Parallel refactoring thousands of solutions to arrive at the optimum one.
  • Use AI that understands the machine code it is generating by training it on the results of running that code. 
  • The ML training cycle must be running code not matching outputs to pre-ranked static result training sets.
  • In addition to the static LLM that encodes the learning of machine code authoring, dynamic training cycles should be run as part of the code composition. Task based ephemeral training models.
  • Get rid of the wasted effort training AI to understand English, Python, Java, Go or any other existing human language evolved for other tasks.
  • Finally we are left with the job of telling the computer what its software should do.
    We do not want to use English for that, it's way too verbose and inaccurate; similarly we don't want a full high level programming language to do it. We need a new half way house. A domain specific language (DSL) for defining functionality only, designed for giving software specifications to AI that it can use to generate automated test suites.

Self-Programming Computers

Exploring the last point in more detail...

Create a higher level pseudo-code language for describing the required functionality that is more English readable than even current high level languages such as Python.

Make that functional DSL focus on defining inputs and outputs - not creating the functionality, but creating the black box functional tests that describe what the working code should do.

Maybe add tools for a slightly no-code approach, with visual generators for the language, eg graphical pipeline builder tools. For people who find thinking visually easier than thinking symbolically.

The software creator uses the DSL to create an extensive set of functional definitions for a project.

The DSL language design and evolution is optimised for LLM interpretation.  So it has very tight grammatical and syntactical usage that promote accurate generative outputs.

A new non-developer friendly high level pseudo code language / rigorous AI prompt writing lingo.

Some basic characteristics of the DSL:

  1. auto-formatting (like Go) minimizing syntactical variation
  2. To quote Python's creator - 'There should be one-- and preferably only one --obvious way to do it.'
    But strictly applied, rather than as a vague principle as Python does
  3. unlike any other high level language, the design needs to be optimized only for specifying functionality, a high level templating language from which test suites are generated.
  4. the language will never be used to implement functionality
  5. uses simple english vocabulary and ideally minimal mathematical symbology

These DSL prompts are written with the help of an LLM trained for the DSL: it helps create its own prompts, and the code creator uses it to refine all the DSL definitions that specify the full functionality.

The specification DSL auto generates all the required tests in a low level language.

The system should also have a generative AI LLM trained for C or assembly language.
This is what creates the actual functional code by iteratively running and rewriting it against the specification encoded into the tests.

The AI tool then generates the tests for that implementation and uses TDD to generate the actual functional code - eventually the system should improve to a level better than most software developers. The code it writes no longer needs to be read by a human - because a human will be unable to debug it at anything like the speed the AI tool can.

So we use generative AI to do the part of the job that actually takes all the time. Debugging, refactoring and maintaining the code, making sure it really does what is required functionally. Rather than the quick job of writing a first cut of it that might run without crashing.

Most importantly we don't introduce the use of the full English language, the language of Shakespeare, the language of puns, double meanings, multiple interpretations, shades of grey, implied feeling and emotions, into a binary world to which it is entirely unsuited.

Also we don't need English or high level computer languages in the stack of mistranslation at all.
Because we are not training the AI to understand human languages. We are training it to write its own machine code language based on defining what behaviour it should implement.
BDD / TDD generative AI if you like.

Humans no longer learn complex mathematical process based languages that can be translated into machine code. They learn a more generic language for specifying functional behaviour.

This gives more freedom to widen the DSL to mature into a general precise AI prompt language.

Whilst allowing computers to evolve more machine learning driven software architectures that are self maintaining and not so constrained into the models imposed by programming languages based on current human intelligence and coding practice.

Could AI take my job?

Perhaps if all of the above were in place, then finally we would arrive at a place where AI could replace traditional software development and high level software languages.
With concerted effort it could be in 10 years, if some big companies put some serious investment in trying to replace traditional software development.
Code monkeys will all be automated. Only software architects would be required and they would use a new functional specification AI prompt language, not a programming language.

Of course, if politicians are scared that dumb ChatGPT can already write as good a speech as they can, and replicate all the prejudices and errors of its training data and trainers, then setting AI free to fully write software, and itself ... will be way more scary in its long term implications.
Meanwhile we are currently at a place where it arguably doesn't even improve productivity for an experienced software developer, only allows non-developers, students and other language newbies to have a go at writing one of the many dialects of human languages, known as computer languages. 

Their mix of math, English, symbols, logic and process may appear more like English than musical notation or pure maths, but sadly they are no more suited to creation by an English language Chatbot approach.

Categories: FLOSS Project Planets

Jeremy Epstein: Introducing: Floyd-Warshall CSV Generator

Sat, 2024-06-08 20:00

I built a little Python script called the Floyd-Warshall CSV Generator. It takes a CSV of graph edges as input, and generates a CSV of the edges that are the shortest paths between all pairs of vertices.

The script is a simple wrapper of the SciPy floyd_warshall function, which in turn implements the Floyd-Warshall Algorithm. Hope you find it useful for all your directed (or undirected) weighted graph needs.

Given an input CSV of the following graph edges:

point_a,point_b,cost
a,b,5
b,c,8
c,d,23
d,e,6

When the script is called as follows:

floyd-warshall-csv-generator \
    /path/to/input_data.csv \
    --vertex-i-column-name point_a \
    --vertex-j-column-name point_b \
    --weight-column-name cost \
    --no-directed \
    --max-weight 35

It generates an output CSV that looks like this:

point_a,point_b,cost
a,b,5.0
a,c,13.0
b,c,8.0
b,d,31.0
c,d,23.0
c,e,29.0
d,e,6.0

That is, it generates all the possible (indirect) paths from one point to all other points, based on the (direct) paths that are already known, with duplicate (undirected) paths filtered out, and with paths whose cost is more than max-weight filtered out.
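
To make the filtering rules concrete, here is a minimal pure-Python sketch of the same computation. It is not the script itself (which wraps SciPy's floyd_warshall); the function name shortest_path_edges and its max_weight parameter are made up for illustration, and it only handles the undirected case shown above.

```python
from itertools import combinations


def shortest_path_edges(rows, max_weight=None):
    """Return (a, b, cost) for each vertex pair joined by an undirected path.

    Hypothetical helper for illustration only; not part of the real script.
    """
    vertices = sorted({v for a, b, _ in rows for v in (a, b)})
    inf = float("inf")
    # Distance table: 0 on the diagonal, infinity where no edge is known yet.
    dist = {(i, j): (0.0 if i == j else inf) for i in vertices for j in vertices}
    for a, b, cost in rows:
        cost = float(cost)
        dist[a, b] = min(dist[a, b], cost)
        dist[b, a] = min(dist[b, a], cost)  # undirected: mirror every edge
    # Floyd-Warshall: allow each vertex k in turn as an intermediate stop.
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    result = []
    # combinations() yields each unordered pair once, which drops duplicates.
    for i, j in combinations(vertices, 2):
        cost = dist[i, j]
        if cost == inf:
            continue  # no path between this pair
        if max_weight is not None and cost > max_weight:
            continue  # path is too expensive to report
        result.append((i, j, cost))
    return result


EDGES = [("a", "b", 5), ("b", "c", 8), ("c", "d", 23), ("d", "e", 6)]
paths = shortest_path_edges(EDGES, max_weight=35)
```

With the sample edges, the pairs a-d (36), a-e (42) and b-e (37) exceed the limit of 35, which is why they are absent from the output CSV.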

I wrote this script in order to generate the "all edges" data that's shown in the World Locality Transit Graph, which I'll also be blogging about real soon. Let me know if you put this script to any other interesting uses!

Categories: FLOSS Project Planets

Pythonicity: GraphQL cursors

Sat, 2024-06-08 20:00
Contrarian view on cursor-based pagination.

GraphQL documentation recommends cursor-based pagination, and it has subsequently become a popular standard.

In general, we’ve found that cursor-based pagination is the most powerful of those designed. Especially if the cursors are opaque, either offset or ID-based pagination can be implemented using cursor-based pagination (by making the cursor the offset or the ID), and using cursors gives additional flexibility if the pagination model changes in the future. As a reminder that the cursors are opaque and that their format should not be relied upon, we suggest base64 encoding them. …

{
  hero {
    name
    friends(first: 2) {
      totalCount
      edges {
        node {
          name
        }
        cursor
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

There are several oversights with this well-intentioned advice.

Cursors and state

Cursors imply state, at least they used to. A database cursor is used for iterating over a result set. Meaning it has transactional integrity to pick up where it left off.

The vast majority of GraphQL APIs are inherently stateless. The “cursor” is being decoded as input to a new request, and offers no guarantees. From this observation, the advice falls apart.

The problem with stateless pagination is inconsistency; items may shift, appear, or disappear. Which gives the client the perception of missing or duplicate items. This happens regardless of whether the pagination is offset or ID based. Arguably worse in the case of IDs, since the reference can move arbitrarily or be gone.

Cursors don’t solve the consistency problem; they give the client the false impression of solving the problem.

Opaqueness and compatibility

The claim is that an opaque cursor is compatible across changes. Changed to do what exactly, would be the more relevant question.

Taking a step back, what is the problem being solved here? We assume there is a list of items, with an inherent ordering, and too many to return to the client with acceptable performance.

Given those assumptions, the first obvious step is an optional size limit. That is not in dispute; the disagreement is over the “offset”. A simple and versatile solution is a range filter over whatever field(s) is relevant to ordering. This is not even remotely controversial when the field in question has a name like date. In other words, “pagination” is not necessarily the problem that needs solving.

Range filters with a size limit are sufficient to implement pagination, and new optional filters are always backwards compatible. They also offer the flexibility of search, whereas cursors can only be used iteratively. And what if the client does not want visibility into the range filters? That is exactly what offset is for; offset is a range filter over an implied index field.
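
As a rough illustration of that claim, the sketch below paginates an in-memory list with nothing but an optional range filter, an optional offset, and an optional size limit. The names list_items, date_after, and the sample ITEMS data are assumptions made up for this example, not part of any GraphQL library.

```python
# Hypothetical data set, already ordered by its "date" field.
ITEMS = [{"id": i, "date": f"2024-06-{i:02d}"} for i in range(1, 8)]


def list_items(items, date_after=None, offset=0, limit=None):
    """Stateless listing: range filter, then offset, then size limit."""
    if date_after is not None:
        # Range filter over the field that defines the ordering.
        items = [item for item in items if item["date"] > date_after]
    # Offset is just a range filter over the implied index.
    items = items[offset:]
    return items if limit is None else items[:limit]


page_1 = list_items(ITEMS, limit=3)
# "Next page" via a range filter on the last date the client saw.
page_2 = list_items(ITEMS, date_after=page_1[-1]["date"], limit=3)
# The same page via offset, for clients that do not want to see the filter.
page_2_by_offset = list_items(ITEMS, offset=3, limit=3)
```

Both optional arguments can be added later without breaking existing callers, which is the backwards compatibility the opaque cursor is supposed to provide.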

There is a reason why the recommendation does not offer a useful example of this supposed compatibility; there isn’t one. The advice is equivocating on the ambiguity of an after: $ID filter. Is the ID field relevant to the ordering?

  • If yes, then it is just another range filter
  • If no, then it is just another placeholder for index

There is no third case. There is no future secret field that relates to ordering, is relevant to the client, but somehow still opaque to the client.

Stateless pagination is a combination of range filters and size limits. No matter what the input fields are called. A true stateful cursor is opaque precisely because it does not represent any known field.

Next optimization

The “next” piece of advice is that the cursor implementation should indicate whether another request is worthwhile. Again, in a stateless API, the server can make no such guarantee.

If the server can provide a total count, by all means do so. It solves the “next” problem, and is more generally useful.

If it is not feasible for the server to provide a total count, how is it going to implement whether there are more items? At the data layer, it is going to stop processing at N + 1 items instead of the requested N. The client could do that too. Instead of requesting the next 10, it could go to 11.
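
The N + 1 trick is small enough to sketch in a few lines. The helper name fetch_page is hypothetical, and a plain list stands in for the data layer; the same logic works on either side of the API.

```python
def fetch_page(items, offset, n):
    """Return up to n items plus a flag saying whether more items follow."""
    # Ask for one extra item; its presence answers the "next" question.
    window = items[offset : offset + n + 1]
    return window[:n], len(window) > n


data = list(range(25))
first_page, has_more_after_first = fetch_page(data, offset=0, n=10)
last_page, has_more_after_last = fetch_page(data, offset=20, n=10)
```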

Better yet, why stop at the server optimizing for N + 0? If it knows there is just 1 more item, why not go ahead and include that last one too. N + 2 anyone? Obsessing over the last “next” is a pointless micro-optimization, all the more so because it is irrelevant whenever the total count is not coincidentally a multiple of N. If N is arbitrary, then optimizing for a particular residue mod N is clearly arbitrary.

API design

Not only is there no good reason to blindly add opaque cursors, there is also no reason to add range filters before needed. A size limit alone solves the first order of magnitude of performance issues. If a client requests the first 10 items, then needs the next 10, actually pressure test whether it is unreasonable to request the first 20. The advantage is the client then has a consistent snapshot of the first 20 regardless of changes, which could provide a better user experience.

A simple strategy for pagination: start with none. Then proceed to next steps as performance warrants.

  1. size limit
  2. range filter on known field(s)
  3. offset

In the unlikely event your API is stateful, you didn’t need this advice because you already had a cursor. Otherwise, cursors are an overly-complicated useless abstraction.

Categories: FLOSS Project Planets

Gaël Varoquaux: Promoting open-source, from inria to :probabl.

Sat, 2024-06-08 18:00

Note

Open-source efforts around scikit-learn at Inria are spinning off to a new enterprise, Probabl, in charge of sustainable development of a data-science commons.

Prelude: funding scikit-learn is hard

Scikit-learn is a central software component in today’s machine learning landscape, and it is open source, governed by a community, easy to install, and well documented. It started many years ago as a project that we did on the side, and we were joined by many volunteers, which was key to the success of the project. We soon decided to ensure that scikit-learn was not only a volunteer-based effort. Over more than a decade, I’ve dedicated a lot of energy to this, using a variety of funding mechanisms: first grants (as an academic), then sponsoring and related contracts with various actors.

Digital commons eliminate scarcity and exclusivity

Funding digital commons is really hard. People build fortunes by leveraging competitive advantages, by creating lock-ins, or selling access to data. What makes a great open-source library, such as scikit-learn, is exactly what prevents these tricks: we are committed to being independent, easy to use and install, lightweight…

The birth of a new ambition

Scikit-learn is very successful, but it could be more. For instance, it does not facilitate pushing to production as much as tensorflow, which can be served, deployed to android… And scikit-learn is not very visible to top decision makers: it’s not a line on their budget, a brand that they know. As a consequence, it is not reaping the benefit of its success [1].

[1] Many commercial tools are sitting on top of open source software like scikit-learn (splunk, sagemaker, to name only a few), making profits, and not helping in any way the open source world that they build upon.

The French government is backing us to push the envelope

3 years ago, the French government challenged us to go further, to consolidate the ecosystem into a consistent data-science commons. The strategic interest of France is to preserve some technological autonomy on data, eg sensitive data. Thus, the government offered us, at Inria, a funding opportunity to go further.

They promised us a lot of money (tens of millions of Euros), but with a specific mission to develop a sustainable “data-science commons” [2] ecosystem around scikit-learn. I’ll spare you the details of the number of meetings we had, and the documents that we wrote, to sketch the outline of the project. I pushed forward a vision of technical components that fit in the broader open-source ecosystem, complementing it.

[2] The letter that we received from the French government specifically defines the objective in these words: “data-science common” (“Communs numériques pour la Science des Données”)

As I moved forward, I faced a difficulty: the French government wanted a sustainability plan, and private investment to back it. To be honest, this is not what I’m good at. François Goupil, the COO of the scikit-learn consortium, was helping me, but we needed more for our ambitions. And this is when we started talking to Yann Lechelle, a tech entrepreneur with an impressive track record interested in the impact of France on the global tech world.

Probabl, a mission-driven enterprise

With Yann, we built a new vision. Our challenge is to be long-term sustainable and virtuous for scikit-learn, its broader ecosystem, and its community. Yann brought in a business point of view, and I tried to bring that of open-source communities beyond probabl [3], for instance avoiding getting in the way of others building businesses that contribute to scikit-learn. Indeed, we are convinced that having a broad and diverse community around scikit-learn is central to its future.

[3] One of the first things that Probabl did (Guillaume Lemaître, to be specific) was submit a grant application (to the Chan Zuckerberg Initiative) to fund, via NumFOCUS, a developer employed by Quansight, with no money transiting via Probabl (one reason being that we have no operations outside of Europe so far).

Our sustainability model is still being fine-tuned. What I can say is that it will involve a mix of professional services, support and sponsorship agreements, as well as a product-based offer, where we supplement scikit-learn with enterprise features. Our focus will be on features that are typically not the focus of open-source developers: integration in large structures, such as access control, LDAP connection, regulatory compliance. We will not shoehorn scikit-learn in open core or dual licensing approaches: we want our incentives to be aligned with scikit-learn, and its ecosystem, being as complete as possible.

Foster growth and adoption of our open-source stack

In a sense, our inspiration is that of RedHat, where the growth of the company fosters the growth and adoption of the software (Linux in the case of RedHat), beyond the company, in an ecosystem, and for a wide variety of applications.

Strong growth will mean external capital. To ensure that we do not lose the focus on our mission, building data-science commons, Yann penciled down a specific governance of the company (and then validated it with many people, as we are a spin-off from a governmental organization). The ultimate share structure, and the board, are divided in three electoral colleges: one for outside investors, one for founders and employees, and one for public institutions. This ensures a balance of power that hopefully will keep us aligned to our mission. I think that this structure sends a strong signal that we are not just another for-profit that will go from creating useful tech to dark money-generating patterns.


Probabl is already having an impact

A strong open-source team

In February, the whole team developing scikit-learn at Inria moved to Probabl, joined by Adrin Jalali, a Berlin-based core developer of scikit-learn and fairlearn. We’ve been hiring excellent people, and we now have 9 people on open-source (see the Probabl team), spending their time contributing to open source (Jérémie, for instance, has been doing the last releases for scikit-learn).

Fostering an ecosystem

Probabl is not only about scikit-learn. We are prioritizing 8 libraries, central to the machine-learning and data science ecosystem: joblib, fairlearn, imbalanced-learn… In general, as we have always done, we will not hesitate to contribute to upstream or related projects. Our goal is to have a healthy open-source ecosystem around data-science.

Not only software

Not everybody sees the important lines of code. I’ve become increasingly aware of the need to do outreach and communication, to coders, but also to decision makers. At Probabl we dedicate energy to be in business meetings, to participate in the tech narrative, to teach how to best do data science, eg with didactic videos. We’re starting a mentoring program, we’ll be organizing sprints… I am convinced that all this is a useful long-term investment.


My position within Probabl, my vested interests

I am a French civil servant (a researcher at Inria, one of our national research institutes). Such a position comes with strong responsibilities to control conflicts of interest. The creation of Probabl underwent strict scrutiny (that took a long long time). I have been recently cleared to take an active role: 10% of my time is allocated to be a scientific and open-source advisor for Probabl.

I am not paid by Probabl. 100% of my salary comes from Inria (and I was not given a raise because of my involvement in Probabl). I do have financial interests as a founder, but given that I have a small active part, I hold one of the smallest numbers of shares among the founders.

My main interest in Probabl is really the success of its mission: the long-term growth of an open-source data-science ecosystem. Spinning-off from Inria actually continues my efforts in this direction, but with more agility and breadth. And having on top of open source a variety of complementary commercial activities makes it stronger, by answering better the needs of some actors.

More to come

There are many things that we are still ironing out. Clearing out specific details takes time (for instance, clearing my role took a while). We are still to announce the future of the sponsorship program that we had set up at the Inria foundation. Its mission has been transferred to Probabl. Currently, Probabl’s open source team is ensuring continuity of our work with the existing sponsors. But we will set up broader partnership opportunities, with a similar governance, that enable third-parties to invest in open source on a roadmap decided jointly with the open-source community.

I believe that we need a lot of transparency in how we decide upon priorities in our open source team. Our 2024 priorities for scikit-learn are visible here.

I look forward to when Probabl will start adding value to scikit-learn for enterprises with an offer enriching scikit-learn and the broader open-source ecosystem.

I am acutely aware that good open source is made of communities, and that communities need trust and understanding of big players such as Probabl (well, so far we are not that big). I hope that with time our actions will become easy to read and speak for themselves.

Categories: FLOSS Project Planets

Trey Hunner: A beautiful Python monstrosity

Sat, 2024-06-08 17:30

Creating performance tests for Python Morsels exercises is a frequent annoyance

I loathe writing automated tests for performance-related exercises because they’re always flaky. How flaky depends on the exercise, what I’m testing, and the time variability inherent in the particular Python features that a learner might use.

I came up with a solution for flaky tests recently, but it also makes my tests less readable. I then came up with a tool to improve the readability, but that has its own trade-offs.

The code I eventually came up with is a beautiful Python monstrosity.

@attempt_n_times(10)
def _():
    nonlocal micro_time, tiny_time
    micro_time = time(micro_numbers)
    tiny_time = time(tiny_numbers)
    self.assertLess(tiny_time, micro_time*n)

I’ll explain what that code does, but first let’s talk about why it’s needed.

The flaky performance tests

My flaky performance tests initially looked like this:

def test_some_test(self):
    n, m = 2.45, 2.04

    micro_time = time(micro_numbers)
    tiny_time = time(tiny_numbers)
    self.assertLess(tiny_time, micro_time*n)

    small_time = time(small_numbers)
    self.assertLess(small_time, tiny_time*n)
    self.assertLess(small_time, micro_time*n*m)

    medium_time = time(medium_numbers)
    self.assertLess(medium_time, micro_time*n*m*m)
    self.assertLess(medium_time, tiny_time*n*m)
    self.assertLess(medium_time, small_time*n)

The first block runs a performance test for the user’s function on a very small list and on a slightly larger list, and then asserts that the slightly larger list didn’t take too much longer to run. The next two blocks run the same code on even larger lists and make further assertions about the relative times that the code took to run.

This roughly approximates the time complexity of this code.
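
The time helper used in these tests isn’t shown in this post. As a rough sketch of the idea, assuming a best-of-several wall-clock measurement, something like the following would do; measure and ratio_within are hypothetical names, not the actual Python Morsels helpers.

```python
from time import perf_counter


def measure(function, argument, repeat=5):
    """Hypothetical stand-in for time(): best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = perf_counter()
        function(argument)
        # Keep the fastest run; slower runs are mostly scheduling noise.
        best = min(best, perf_counter() - start)
    return best


def ratio_within(small_time, large_time, factor):
    """True if the larger input took less than factor times the smaller one."""
    return large_time < small_time * factor


elapsed = measure(sum, list(range(1_000)))
```

If each input is roughly double the size of the previous one, a factor a little above 2 passes for linear-time code and fails for quadratic code, whose time would roughly quadruple.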

Running performance checks in a loop

These performance checks need to:

  1. Predictably fail for inefficient solutions
  2. Predictably pass for efficient solutions
  3. Run fast (within just a few seconds) even when the code is inefficient
  4. Avoid the use of threading because they’ll be running on WebAssembly in the browser
  5. Run consistently on pretty much any computer

These 5 requirements together have caused me countless headaches. I get the tests passing well, but they don’t always fail when they should. I get the tests failing and passing when they should, but then they’re too slow. And so on…

Notice the n and m factors in the above assertions:

self.assertLess(small_time, micro_time*n*m)

If n and m are too big, we’ll get false positives (tests passing when they should fail). If n and m are too small, we’ll get false negatives (tests failing when they should pass).

To avoid both Type I and Type II errors, I decided to keep n and m small but attempt the assertion block multiple times.

Here’s the (far less flaky) revised code:

def test_some_test(self):
    n, m = 2.45, 2.04

    for attempts_left in reversed(range(10)):
        try:
            micro_time = time(micro_numbers)
            tiny_time = time(tiny_numbers)
            self.assertLess(tiny_time, micro_time*n)
            break
        except AssertionError:
            if attempts_left == 0:
                raise

    for attempts_left in reversed(range(5)):
        try:
            small_time = time(small_numbers)
            self.assertLess(small_time, tiny_time*n)
            self.assertLess(small_time, micro_time*n*m)
            break
        except AssertionError:
            if attempts_left == 0:
                raise

    for attempts_left in reversed(range(3)):
        try:
            medium_time = time(medium_numbers)
            self.assertLess(medium_time, micro_time*n*m*m)
            self.assertLess(medium_time, tiny_time*n*m)
            self.assertLess(medium_time, small_time*n)
            break
        except AssertionError:
            if attempts_left == 0:
                raise

The for loop runs the code multiple times, the break statement stops the code as soon as the assertions all pass, and the except and if ensure that any assertion errors are suppressed until/unless we’re on the final iteration of the loop.

Let’s call this a for-try-break-except-if-raise pattern. It’s an absurdly verbose name fitting of absurdly verbose code.

This for-try-break-except-if-raise pattern works pretty well! But it’s not pretty.

Like many programmers, I believe that Don’t Repeat Yourself (DRY) need not apply to tests. Tests are allowed to be repetitive if the verbosity improves readability.

But there is so much noise in that code! I decided that removing some noise might improve readability. So I devised a helper utility to reduce the repetition.

In search of a solution

While pondering the repetitive noise in this code, I wondered what Python features I could use to abstract away this for-try-break-except-if-raise pattern.

Could I make a context manager and use a with block? That might help with the try-except, but context managers can’t run their code block multiple times, so that wouldn’t help with the for and the break. So a context manager is out.

Could I abstract this away into a looping helper by implementing a generator function? We are looping and generator functions can break early. But, a generator function can’t catch an exception that’s raised within the body of a loop. So a generator function wouldn’t work either.
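
Here is a small sketch of why the generator idea fails. The attempts generator is a hypothetical helper written only for this demonstration: an exception raised in the body of the for loop propagates out of the loop and is never delivered to the generator’s try-except.

```python
caught = []


def attempts(n):
    """Hypothetical looping helper that tries to swallow assertion failures."""
    for attempt in range(n):
        try:
            yield attempt
        except AssertionError:
            # Never reached by errors raised in the body of the caller's loop.
            caught.append(attempt)


def run():
    try:
        for attempt in attempts(3):
            assert False, "raised in the loop body, not in the generator"
    except AssertionError:
        return "escaped the loop"
    return "swallowed"


outcome = run()
```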

What about a decorator? 🤔

Context managers and decorators both sandwich a block of code. But decorators sandwich functions and they have the power to run the same function repeatedly. A decorator might work!

Here’s a decorator that will run a given function up to 10 times (until no AssertionError is raised):

def try_10_times(function):
    def wrapper():
        for attempts_left in reversed(range(10)):
            try:
                return function()
            except AssertionError:
                if attempts_left == 0:
                    raise
    return wrapper

To use this decorator, we would need to define a function and then call that function:

@try_10_times
def assertions():
    micro_time = time(micro_numbers)
    tiny_time = time(tiny_numbers)
    self.assertLess(tiny_time, micro_time*n)

assertions()

This isn’t quite good enough though…

  1. We need a pattern to run code N times (not necessarily exactly 10)
  2. We reference the variables defined in each block in later blocks, so micro_time and tiny_time will need to be available outside that function
  3. We need this function to run just one time right after it’s defined… could we do that automatically?

All 3 of these problems are solvable:

  1. We need a decorator that accepts arguments
  2. We need to use the rarely seen nonlocal statement
  3. We could have the decorator automatically call the decorated function

The final weird decorator

Here’s the decorator I ended up with:

def attempt_n_times(n):
    """
    Run tests multiple times if assertions are raised.

    Allows for more forgiving tests when assertions may be a bit flaky.
    """
    def decorator(function):
        """This looks like a decorator, but it actually runs the function!"""
        for attempts_left in reversed(range(n)):
            try:
                return function()
            except AssertionError:
                if attempts_left == 0:
                    raise
    return decorator

This decorator accepts an n argument which determines the maximum number of times the decorated function should be called. The decorator then calls the function repeatedly in a for loop and a try-except block. As soon as an AssertionError is not raised during one of these function calls, the looping stops.

The weirdest part about this decorator is that it calls the decorated function. Note that the decorator function doesn’t define a wrapper function within itself… it just runs code right away!
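As a rough, self-contained illustration of that behavior, the sketch below applies the same decorator to a deliberately flaky function. The `calls` list and the `result` function are made up for this demonstration. Because the decorator returns the function's return value, the decorated name ends up bound to that value rather than to a function:

```python
def attempt_n_times(n):
    def decorator(function):
        # Runs the function right away, retrying on AssertionError.
        for attempts_left in reversed(range(n)):
            try:
                return function()
            except AssertionError:
                if attempts_left == 0:
                    raise
    return decorator

calls = []  # records each attempt (illustrative only)

@attempt_n_times(5)
def result():
    calls.append(len(calls) + 1)
    # Fails on the first two attempts, passes on the third.
    assert len(calls) >= 3, "still flaky"
    return "passed"

print(calls)
print(result)
```

By the time the `def` block finishes, the function has already run three times and `result` holds the string it returned.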

The resulting beautiful Python monstrosity

Here’s the final refactored test code:

    def test_some_test(self):
        n, m = 2.45, 2.04
        micro_time = tiny_time = small_time = medium_time = 0

        @attempt_n_times(10)
        def _():
            nonlocal micro_time, tiny_time
            micro_time = time(micro_numbers)
            tiny_time = time(tiny_numbers)
            self.assertLess(tiny_time, micro_time*n)

        @attempt_n_times(5)
        def _():
            nonlocal small_time
            small_time = time(small_numbers)
            self.assertLess(small_time, tiny_time*n)
            self.assertLess(small_time, micro_time*n*m)

        @attempt_n_times(3)
        def _():
            nonlocal medium_time
            medium_time = time(medium_numbers)
            self.assertLess(medium_time, micro_time*n*m*m)
            self.assertLess(medium_time, tiny_time*n*m)
            self.assertLess(medium_time, small_time*n)

The attempt_n_times decorator immediately calls the function it decorates. Each function is defined and immediately called one or more times, in a try-except block within a loop.

That’s why we’ve named these functions with the throwaway _ name: we don’t care about the name of a function we’re never going to refer to again.

Also note the use of the nonlocal statement. Each function in Python has its own scope and all assignments assign to the local scope by default. The nonlocal statement makes those names refer to the variables in the scope of the outer function instead.
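A small sketch of the difference, assuming a hypothetical `run_now` decorator that stands in for attempt_n_times by calling the function once (all names here are illustrative):

```python
def run_now(function):
    """Hypothetical stand-in for attempt_n_times: call the function once."""
    return function()

def without_nonlocal():
    elapsed = 0

    @run_now
    def _():
        elapsed = 42  # creates a new local variable inside _()

    return elapsed

def with_nonlocal():
    elapsed = 0

    @run_now
    def _():
        nonlocal elapsed
        elapsed = 42  # rebinds the variable in with_nonlocal()

    return elapsed

print(without_nonlocal(), with_nonlocal())
```

Without the nonlocal statement the outer variable keeps its initial value, which is why the test initializes every timing variable to 0 up front and then declares each one nonlocal where it is assigned.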

Compare the above code to the code just before this refactor:

    def test_some_test(self):
        n, m = 2.45, 2.04

        for attempts_left in reversed(range(10)):
            try:
                micro_time = time(micro_numbers)
                tiny_time = time(tiny_numbers)
                self.assertLess(tiny_time, micro_time*n)
                break
            except AssertionError:
                if attempts_left == 0:
                    raise

        for attempts_left in reversed(range(5)):
            try:
                small_time = time(small_numbers)
                self.assertLess(small_time, tiny_time*n)
                self.assertLess(small_time, micro_time*n*m)
                break
            except AssertionError:
                if attempts_left == 0:
                    raise

        for attempts_left in reversed(range(3)):
            try:
                medium_time = time(medium_numbers)
                self.assertLess(medium_time, micro_time*n*m*m)
                self.assertLess(medium_time, tiny_time*n*m)
                self.assertLess(medium_time, small_time*n)
                break
            except AssertionError:
                if attempts_left == 0:
                    raise

I find the refactored version easier to skim.

But that attempt_n_times decorator does abuse the decorator syntax. Decorators aren’t meant to call the function they’re decorating.

Is this misuse of decorators worth it?

Is this worth it?

Decorators aren’t supposed to immediately call the function they decorate. But there’s nothing stopping them from doing so. I feel that I’ve traded “normal code” for a beautiful monstrosity that’s easier to skim at a glance.

The attempt_n_times decorator is pretending that it’s a block-level tool by using a function because there’s no other way to invent such a tool in Python.

I think abstracting away the for-try-break-except-if-raise pattern was worth it, even though I ended up abusing Python’s decorator syntax in the process.

What do you think? Was that attempt_n_times abstraction worth it?

Categories: FLOSS Project Planets

Talk Python to Me: #465: The AI Revolution Won't Be Monopolized

Sat, 2024-06-08 04:00
There hasn't been a boom like the AI boom since the .com days. And it may look like a space destined to be controlled by a couple of tech giants. But Ines Montani thinks open source will play an important role in the future of AI. I hope you join us for this excellent conversation about the future of AI and open source.

Episode sponsors

  • Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
  • Porkbun: https://talkpython.fm/porkbun
  • Talk Python Courses: https://talkpython.fm/training

Links from the show

  • Ines Montani on Twitter: @_inesmontani (https://twitter.com/_inesmontani)
  • spaCy: https://spacy.io
  • Prodigy App: https://prodi.gy
  • Ines' presentation at PyCon Lithuania: https://www.youtube.com/watch?v=SsnDN7LI7IY
  • LM Studio: https://lmstudio.ai
  • Little Bobby Tables: https://xkcd.com/327/
  • spaCy and NLP course: https://talkpython.fm/spacy
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=zaZrWZwKJH4
  • Episode transcripts: https://talkpython.fm/episodes/transcript/465/the-ai-revolution-wont-be-monopolized

Stay in touch with us

  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: talkpython (https://fosstodon.org/web/@talkpython)
  • Follow Michael on Mastodon: mkennedy (https://fosstodon.org/web/@mkennedy)

Categories: FLOSS Project Planets

Anwesha Das: Event Driven Ansible, what, why and how?

Fri, 2024-06-07 14:02

Ansible Playbooks is a familiar term. Now a new term is being floated in the project: Ansible Rulebooks. Today we are going to discuss Ansible's journey from Playbook to Rulebook, or rather Playbook with Rulebook.

What is Event Driven Ansible?

In simple terms, some action is triggered by some event. The idea of EDA comes from event-driven architecture. Event Driven Ansible runs code automatically based on received event notifications.

Some important terms:

What is an event in Event Driven Ansible?

The event is the notification of a certain incident.

Where do we get the events from?

We get the events from event sources. Ansible EDA provides different plugins to support various event sources. There are several event source plugins, such as url_check (checking the HTTP status code), webhook (providing and checking events from a webhook), journald (monitoring the journald logs), and the list goes on.

When to take actions?

A rulebook defines conditions and the actions to take when those conditions are fulfilled. Conditions use operators on string, boolean, and numerical data. Actions are what happens once the conditions are met: running a playbook, setting a fact, running a module, etc.

Small example Project

Here is a small example of Event Driven Ansible and how it is run. The idea is that on receiving a message (here the number 42), a playbook will run on the host. There are the following 3 files:

demo_rule.yml

    ---
    - name: Listen for events on a webhook
      hosts: all
      sources:
        - ansible.eda.webhook:
            host: 0.0.0.0
            port: 8000
      rules:
        - name: Say thank you
          condition: event.payload.message == "42"
          action:
            run_playbook:
              name: demo.yml

This is the rulebook. We are using the webhook plugin here as the event source. The rule says that when the webhook receives the message 42 as a JSON payload, we run the playbook called demo.yml.

demo.yml

    - hosts: localhost
      connection: local
      tasks:
        - debug:
            msg: "Thank you for the answer."

demo.yml is the playbook that runs when the event mentioned in the rulebook occurs. It prints a debug message.

    ---
    local:
      hosts: localhost

inventory.yml mentions the hosts to run the action against.

Further, there are 2 files, 42.json and 43.json, to test the code.

42.json

    { "message" : "42" }

43.json

    { "message" : "43" }

First we have to install all related dependencies before we can run the rulebook.

    $ python -m venv .venv
    $ source .venv/bin/activate
    $ python -m pip install ansible ansible-rulebook ansible-runner psycopg
    $ ansible-galaxy collection install ansible.eda
    $ ansible-rulebook --rulebook demo_rule.yml -i inventory.yml --verbose

Open another terminal in the same directory and run the following command to test the rulebook. After receiving the message, the playbook runs.

    curl -X POST -H "Content-Type: application/json" -d @42.json 127.0.0.1:8000/endpoint

Output:

    2024-06-07 16:48:53,868 - ansible_rulebook.app - INFO - Starting sources
    2024-06-07 16:48:53,868 - ansible_rulebook.app - INFO - Starting rules
    ...
    TASK [debug] *******************************************************************
    ok: [localhost] => {
        "msg": "Thank you for the answer."
    }

    PLAY RECAP *********************************************************************
    localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
    2024-06-07 16:50:08,224 - ansible_rulebook.action.runner - INFO - Ansible runner Queue task cancelled
    2024-06-07 16:50:08,225 - ansible_rulebook.action.run_playbook - INFO - Ansible runner rc: 0, status: successful

Now if we send the other JSON file, 43.json, we see that the playbook does not run even though the HTTP status code is 200.

curl -X POST -H "Content-Type: application/json" -d @43.json 127.0.0.1:8000/endpoint

Output :

2024-06-07 18:20:37,633 - aiohttp.access - INFO - 127.0.0.1 [07/Jun/2024:17:20:37 +0100] "POST /endpoint HTTP/1.1" 200 159 "-" "curl/8.2.1"
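To make the difference between the two requests concrete, here is a rough Python sketch of the rule's logic. It is only an illustration and not how ansible-rulebook is implemented: the function names are hypothetical, and it assumes the posted JSON body ends up under the event's payload key, as the condition event.payload.message suggests.

```python
import json

def condition_matches(event):
    """Rough stand-in for the condition: event.payload.message == "42"."""
    return event.get("payload", {}).get("message") == "42"

def handle_post(body):
    """Return (http_status, playbook_ran) for a JSON request body."""
    # Assumption: the webhook wraps the posted JSON under "payload".
    event = {"payload": json.loads(body)}
    # The request is accepted either way; only a matching event triggers the action.
    return 200, condition_matches(event)

print(handle_post('{ "message" : "42" }'))
print(handle_post('{ "message" : "43" }'))
```

Both requests are accepted with status 200, but only the first one satisfies the condition, so only the first would trigger the playbook.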

You can try this yourself by following this git repository.

Categories: FLOSS Project Planets
