Planet Python

Subscribe to Planet Python feed
Planet Python - http://planetpython.org/
Updated: 2 hours 39 min ago

Real Python: Listing All Files in a Directory With Python

Tue, 2024-06-11 10:00

Getting a list of all the files and folders in a directory is a natural first step for many file-related operations in Python. When looking into it, though, you may be surprised to find various ways to go about it.

When you’re faced with many ways of doing something, it can be a good indication that there’s no one-size-fits-all solution to your problems. Most likely, every solution will have its own advantages and trade-offs. This is the case when it comes to getting a list of the contents of a directory in Python.

In this video course, you’ll be focusing on the most general-purpose techniques in the pathlib module to list items in a directory, but you’ll also learn a bit about some alternative tools.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Python Bytes: #387 Heralding in a new era of database queries

Tue, 2024-06-11 04:00
<strong>Topics covered in this episode:</strong><br> <ul> <li><a href="https://github.com/Dataherald/dataherald">Dataherald</a></li> <li><a href="https://www.pythonmorsels.com/cli-tools"><strong>Python's many command-line utilities</strong></a></li> <li><a href="https://github.com/wolfi-dev">Distroless Python</a></li> <li><a href="https://docs.python.org/3/library/functools.html"><strong>functools.cache</strong></a>, <a href="https://github.com/tkem/cachetools/"><strong>cachetools</strong></a><strong>, and</strong> <a href="https://github.com/awolverp/cachebox"><strong>cachebox</strong></a></li> <li><strong>Extras</strong></li> <li><strong>Joke</strong></li> </ul><a href='https://www.youtube.com/watch?v=ETZ3CvfbF_o' style='font-weight: bold;'data-umami-event="Livestream-Past" data-umami-event-episode="387">Watch on YouTube</a><br> <p><strong>About the show</strong></p> <p>Sponsored by ScoutAPM: <a href="https://pythonbytes.fm/scout"><strong>pythonbytes.fm/scout</strong></a></p> <p><strong>Connect with the hosts</strong></p> <ul> <li>Michael: <a href="https://fosstodon.org/@mkennedy"><strong>@mkennedy@fosstodon.org</strong></a></li> <li>Brian: <a href="https://fosstodon.org/@brianokken"><strong>@brianokken@fosstodon.org</strong></a></li> <li>Show: <a href="https://fosstodon.org/@pythonbytes"><strong>@pythonbytes@fosstodon.org</strong></a></li> </ul> <p>Join us on YouTube at <a href="https://pythonbytes.fm/stream/live"><strong>pythonbytes.fm/live</strong></a> to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.</p> <p>Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to <a href="https://pythonbytes.fm/friends-of-the-show">our friends of the show list</a>, we'll never share it.</p> <p><strong>Michael #1:</strong> <a href="https://github.com/Dataherald/dataherald">Dataherald</a></p> <ul> <li>Interact with your SQL database, Natural Language to SQL using LLMs.</li> <li>Allows you to set up an API from your database that can answer questions in plain English</li> <li>Uses include <ul> <li>Allow business users to get insights from the data warehouse without going through a data analyst</li> <li>Enable Q+A from your production DBs inside your SaaS application</li> <li>Create a ChatGPT plug-in from your proprietary data</li> </ul></li> </ul> <p><strong>Brian #2:</strong> <a href="https://www.pythonmorsels.com/cli-tools"><strong>Python's many command-line utilities</strong></a></p> <ul> <li>Trey Hunner</li> <li>Too many to list, but here’s some fun ones <ul> <li>json.tool - nicely format json data</li> <li>calendar - print the calendar <ul> <li>current by default, but you can pass in year and month</li> </ul></li> <li>gzip, ftplib, tarfile, and other unixy things <ul> <li>handy on Windows</li> </ul></li> <li>cProfile &amp; pstats</li> </ul></li> </ul> <p><strong>Michael #3:</strong> <a href="https://github.com/wolfi-dev">Distroless Python</a></p> <ul> <li>via Patrick Smyth</li> <li>What is <a href="https://www.chainguard.dev/unchained/minimal-container-images-towards-a-more-secure-future">distroless</a> anyway? <ul> <li>These are container images without package managers or shells included.</li> <li>Debugging these images presents some wrinkles (can't just exec into a shell inside the image), but they're a lot more secure.</li> </ul></li> <li>Chainguard, creates low/no CVE distroless images based on our FOSS distroless OS, <a href="https://github.com/wolfi-dev">Wolfi</a>.</li> <li>Some Python use-cases: <pre><code>docker run -it cgr.dev/chainguard/python:latest # The entrypoint is a Python REPL, since no b/a/sh is included docker run -it cgr.dev/chainguard/python:latest-dev # This is their dev version and has pip, bash, apk, etc. </code></pre></li> </ul> <p><strong>Brian #4:</strong> <a href="https://docs.python.org/3/library/functools.html"><strong>functools.cache</strong></a>, <a href="https://github.com/tkem/cachetools/"><strong>cachetools</strong></a><strong>, and</strong> <a href="https://github.com/awolverp/cachebox"><strong>cachebox</strong></a></p> <ul> <li><a href="https://docs.python.org/3/library/functools.html"><strong>functools</strong></a> cache and lru_cache - built in </li> <li><a href="https://github.com/tkem/cachetools/"><strong>cachetools</strong></a> - “This module provides various memoizing collections and decorators, including variants of the Python Standard Library's @lru_cache function decorator.”</li> <li><a href="https://github.com/awolverp/cachebox"><strong>cachebox</strong></a> - “The fastest caching Python library written in Rust”</li> </ul> <p><strong>Extras</strong> </p> <p>Brian:</p> <ul> <li><a href="https://pythoninsider.blogspot.com/2024/06/python-3124-released.html">Python 3.12.4 is out</a></li> <li><a href="https://devblogs.microsoft.com/python/python-in-visual-studio-code-june-2024-release/">VSCode has some pytest improvements</a></li> </ul> <p>Michael:</p> <ul> <li>Time for a <a href="https://www.macrumors.com/2024/06/06/alternatives-bartender-mac-menu-bar/">bartender alternative</a>, I’ve switched to <a href="https://icemenubar.app">Ice</a>.</li> <li><a href="https://www.rocket.chat">Rocket.chat</a> as an alternative to Slack</li> </ul> <p><strong>Joke:</strong> <a href="https://dev.to/alvaromontoro/css-cartoons-29bp">CSS Cartoons</a></p>
Categories: FLOSS Project Planets

Real Python: Python News: What's New From May 2024

Mon, 2024-06-10 10:00

May was packed with exciting updates and events in the Python community. This month saw the release of the first beta version of Python 3.13, the conclusion of PyCon US 2024, and the announcement of the keynote speakers for EuroPython 2024. Additionally, PEP 649 has been delayed until the Python 3.14 release, and the Python Software Foundation published its 2023 Annual Impact Report.

Get ready to explore the recent highlights!

Join Now: Click here to join the Real Python Newsletter and you'll never miss another Python tutorial, course update, or post.

The First Beta Version of Python 3.13 Released

After nearly a year of continuous development, the first beta release of Python 3.13 was made available to the general public. It marks a significant milestone in Python’s annual release cycle, officially kicking off the beta testing phase and introducing a freeze on new features. Beyond this point, Python’s core developers will shift their focus to only identifying and fixing bugs, enhancing security, and improving the interpreter’s performance.

While it’s still months before the final release planned for October 2024, as indicated by the Python 3.13 release schedule, third-party library maintainers are strongly encouraged to test their packages with this new Python version. The goal of early beta testing is to ensure compatibility and address any issues that may arise so that users can expect a smoother transition when Python 3.13 gets officially released later this year.

Although you shouldn’t use a beta version in any of your projects, especially in production environments, you can go ahead and try out the new version today. To check out Python’s latest features, you must install Python 3.13.0b1 using one of several approaches.

Note: If you’d like to share your feedback or file a bug against a pre-release development version, then open an issue on Python’s GitHub repository.

The quickest and arguably the most straightforward way to manage multiple Python versions alongside your system-wide interpreter is to use a tool like pyenv in the terminal:

Shell $ pyenv install 3.13.0b1 $ pyenv shell 3.13.0b1 $ python Python 3.13.0b1 (main, May 15 2024, 10:41:55) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> Copied!

The highlighted line brings the first beta release of Python 3.13 onto your computer, while the following command temporarily sets the path to the python executable in your current shell session. As a result, the python command points to the specified Python interpreter.

Alternatively, you can use a Python installer, which you’ll find at the bottom of the downloads page, or run Python in an isolated Docker container to keep it completely separate from your operating system. However, for ultimate control, you can try building the interpreter from source code based on the instructions in the README file. This method will let you experiment with more advanced features, like turning off the GIL.

Unlike previous Python releases, which introduced a host of tangible syntactical features that you could get your hands on, this one mainly emphasizes internal optimizations and cleanup. That said, according to the official release summary document, there are a few notable new features that will be immediately visible to most Python programmers:

Some of these features don’t work on Windows at the moment because they rely on Unix-specific libraries, so you won’t see any difference unless you’re a macOS or Linux user. The good news is that Windows support is coming in the second beta release, which will arrive soon, thanks to Real Python team member Anthony Shaw.

Read the full article at https://realpython.com/python-news-may-2024/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Robin Wilson: Introducing offline_folium

Mon, 2024-06-10 05:35

Another new-ish package that I’ve never got around to writing about on my blog is offline_folium. It has a somewhat niche use-case, but it seems like a few people have found it useful.

In brief, it allows you to use the folium package for creating interactive maps from Python, but without an internet connection. Folium is built on top of the Leaflet web mapping library – and it loads all the relevant JS and CSS files directly from CDNs. This means that if you try and use folium without an internet connection it just doesn’t work. Offline_folium works around this by downloading the files when you do have an internet connection, and then re-writing the links to the files to point to the offline versions.

I originally created this package when doing freelance work on a project for the UK Navy – they wanted to have interactive maps on an air-gapped computer, so I built a system that would allow this (in this case, the JS/CSS file download step was run as part of building the ‘app’ we produced). I should note at this point that without an internet connection the default OpenStreetMap background mapping will not work. In some situations that would be a significant problem – but we were using other background mapping, coastline vector files and so on, so it wasn’t a problem for us.

So, how do you use this package?

First, install it using pip install offline_folium. Then make sure you have an internet connection and run python -m offline_folium – this will download the JS/CSS files and store them in a sensible place (chosen automatically). Now you’re all set up.

Once you have no internet connection and want to use folium, you must import it in a slightly different way – first import offline from offline_folium, and then import folium, like this:

from offline_folium import offline import folium

Now you can use folium as usual, for example by creating a simple map object:

m = folium.Map()

So, that’s pretty-much it. People are actively submitting pull requests at the moment, so hopefully the functionality will expand to work with various folium plugins too.

For more information (or to submit a PR yourself), see the Github repo.

Categories: FLOSS Project Planets

Python Software Foundation: It’s time to make nominations for the PSF Board Election!

Mon, 2024-06-10 05:00

This year’s Board Election Nomination period opens tomorrow and closes on June 25th. Who runs for the board? People who care about the Python community, who want to see it flourish and grow, and also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.

Current Board members want to share what being on the Board is like and are making themselves available to answer all your questions about responsibilities, activities, and time commitments via online chat. Please join us on the PSF Discord for the Board Election Office Hours (June 11th, 4-5 PM UTC and June 18th, 12-1 PM UTC)  to talk with us about running for the PSF Board. 

Board Election Timeline

Nominations open: Tuesday, June 11th, 2:00 pm UTC
Nominations close: Tuesday, June 25th, 2:00 pm UTC
Voter application cut-off date: Tuesday, June 25th, 2:00 pm UTC
Announce candidates: Thursday, June 27th
Voting start date: Tuesday, July 2nd, 2:00 pm UTC
Voting end date: Tuesday, July 16th, 2:00 pm UTC 

Not sure what UTC is for you locally? Check here!

Nomination details

You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, and end on June 25th, 2:00 PM UTC.

Please send nominations and questions regarding nominations to psf-board-nominations@pyfound.org. Include the name, email address, and an endorsement of the nominee's candidacy for the PSF Board. To get an idea of what a nomination looks like, check out the Nominees for 2023 PSF Board page. After the nomination period closes, we will request a statement and other relevant information from the nominees to publish for voter review.

Voting Reminder!

Every PSF Voting Member (Supporting, Managing, Contributing, and Fellow) needs to affirm their membership to vote in this year’s election. You should have received an email from "psf@psfmember.org <Python Software Foundation>" with subject "[Action Required] Affirm your PSF Membership voting status" that contains information on how to affirm your voting status.

You can see your membership record and status on your PSF Member User Information page. If you are a voting-eligible member and do not already have a login, please create an account on psfmembership.org first and then email psf-donations@python.org so we can link your membership to your account.

Categories: FLOSS Project Planets

Tryton News: Security Release for issue #92

Mon, 2024-06-10 04:00

Ashish Kunwar has found that python-sql accepts any string in the offset or limit parameters when python is ran with -O which makes any system exposing those vulnerable to an SQL injection attack.

Impact

CVSS v3.0 Base Score: 9.1

  • Attack Vector: Network
  • Attack Complexity: Low
  • Privileges Required: Low
  • User Interaction: None
  • Scope: Changed
  • Confidentiality: High
  • Integrity: Low
  • Availability: Low
Workaround

Do not use the -O switch or PYTHONOPTIMIZE environment variable when executing python.

Resolution

All affected users should upgrade python-sql to the latest version.

Affected versions: <= 1.5.0
Non affected versions: >= 1.5.1

Reference Concerns?

Any security concerns should be reported on the bug-tracker at https://bugs.tryton.org/python-sql with the confidential checkbox checked.

1 post - 1 participant

Read full topic

Categories: FLOSS Project Planets

Zato Blog: HL7 FHIR Integrations in Python

Mon, 2024-06-10 04:00
HL7 FHIR Integrations in Python 2024-06-10, by Dariusz Suchojad

HL7 FHIR, pronounced "fire", is a data model and message transfer protocol designed to facilitate the exchange of information among systems used in health care settings.

In such environments, a FHIR server will assume the role of a central repository of health records with other systems integrating with it, potentially in a hub-and-spoke fashion, thus letting the FHIR server become a unified and consistent source of data that would otherwise stay confined to a silo of each individual health information system.

While FHIR is the way forward, the current reality of health care systems is that much of the useful and actionable information is distributed and scattered among many individual data sources - paper-based directories, applications or data bases belonging to the same or different enterprises - and that directly hampers the progress towards delivering good health care. Anyone witnessing health providers copy-and-pasting the same information from one application to another, not having access to the already existing data, not to mention people not having an easy way to access their own data about themselves either, can understand what the lack of interoperability looks like externally.

The challenges that integrators face are two-fold. On the one hand, the already existing systems, including software as well as medical appliances, were often not, or are still not being, designed for the contemporary inter-connected world. On the other hand, FHIR in itself is a relatively new technology which means that it is not straightforward to re-use the existing skills and competencies.

Zato is an open-source platform that makes it possible to integrate systems with FHIR using Python. Specifically, its support for FHIR enables quick on-boarding of integrators who may be new to health care interoperability, who are coming to FHIR with previous experience or interest in web development technologies, and who need an easy way to get started with and to navigate the complex landscape of health care integrations.

Connecting to FHIR servers

Outgoing FHIR connections are what allows Python-based services to communicate with FHIR servers. Throughout the rest of the chapter, the following definition will be used. It connects to a live, publicly available FHIR server.

Filling out the form below will suffice, there is no need for any server restarts. This principle, that restarts are not needed, applies all throughout the platform, whenever you change any piece of configuration, it will be automatically propagated as necessary.

  • Name: FHIR.Sample
  • Address: https://simplifier.net/zato
  • Security: No security definition (we will talk about security later)
  • TLS CA Certs: Default bundle

Retrieving data from FHIR servers

In Python code, you obtain client connections to FHIR servers through self.out.hl7.fhir objects, as in the example below which first refers to the server by its name and then looks up all the patients in the server.

The structure of the Patient resource that we expect to receive can be found here.

# -*- coding: utf-8 -*- # Zato from zato.server.service import Service class FHIService1(Service): name = 'demo.fhir.1' def handle(self) -> 'None': # Connection to use conn_name = 'FHIR.Sample' with self.out.hl7.fhir[conn_name].conn.client() as client: # This is how we can refer to patients patients = client.resources('Patient') # Get all active patients, sorted by their birth date result = patients.sort('active', '-birthdate') # Log the result that we received for elem in result: self.logger.info('Received -> %s', elem['name'])

Invoking the service will store in logs the data expected:

INFO - Received -> [{'use': 'official', 'family': 'Chalmers', 'given': ['Peter', 'James']}]

For comparison, this is what the FHIR server displays in its frontend. - the information is the same.

Storing data in FHIR servers

To save information in a FHIR server, create the required resources and call .save to permanently store the data in the server. Resources can be saved either individually (as in the example below) or as a bundle.

# -*- coding: utf-8 -*- # Zato from zato.server.service import Service class CommandsService(Service): name = 'demo.fhir.2' def handle(self) -> 'None': # Connection to use conn_name = 'FHIR.Sample' with self.out.hl7.fhir[conn_name].conn.client() as client: # First, create a new patient patient = client.resource('Patient') # Save the patient in the FHIR server patient.save() # Create a new appointment object appointment = client.resource('Appointment') # Who will attend it participant = { 'actor': patient, 'status':'accepted' } # Fill out the information about the appointment appointment.status = 'booked' appointment.participant = [participant] appointment.start = '2022-11-11T11:11:11.111+00:00' appointment.end = '2022-12-22T22:22:22.222+00:00' # Save the appointment in the FHIR server appointment.save() Learning what FHIR resources to use

The "R" in FHIR stands for "Resources" and the sample code above uses resources such a Patient or Appointment but how does one learn what other resources exist and what they look like? In other words, how does one learn the underlying data model?

First, you need to get familiar with the spec itself which, in addition to textual information, offers visualizations of the data model. For instance, here is the description of the Observation object, including details such as all the attributes an Observation is composed of as well as their multiplicities.

Secondly, do spend time with FHIR servers such as Simplifier. Use Zato services to create test resources, look them up and compare the results with what the spec says. There is no substitute for experimentation when learning a new data model.

FHIR security

Outgoing FHIR connections can be secured in several ways, depending on what a given FHIR requires:

  • With Basic Auth definitions
  • With OAuth definitions
  • With SSL/TLS. If the server is not a public one (e.g. it is in a private network with a private IP address), you may need to upload the server's certificate to Zato first if you plan to use SSL/TLS because, without it, the server's certificate may be rejected.
MLLP, HL7 v2 and v3

While FHIR is what new deployments use, it is worth to add that there are still other HL7 versions frequently seen in integrations:

  • Version 2, using its own MLLP protocol
  • Version 3, using XML

Both of them can be used in Zato services, in both directions. For instance, it is possible to both receive HL7 v2 messages as well as to send them to external applications. It is also possible to send v2 messages using REST in addition to MLLP.

More blog posts
Categories: FLOSS Project Planets

Ed Crewe: Software Development with Generative AI - 2024 Update

Sun, 2024-06-09 12:37

Why write an update?
I wrote a blog post on Software Development with Generative AI last year, which was questioning the approach of the current AI software authoring assistants. I believe the bigger picture holds true that to fully utilize AI to write software, will require an entirely different approach. Changing the job of a software developer in a far more radical manner and perhaps making many of today's software languages redundant.

However I also raised the issue that I found the current generative AI helpers utility questionable for seasoned developers:
"The generative AI can help students and others who are learning to code in a computer language, but can it actually improve productivity for real, full time, developers who are fluent in that language?
I think that question is currently debatable... (but it is improving rapidly) ... We may reach that point within a year or two"

Well it hasn't been a year or two, just 6 months. But I believe the addition of the Chat window to CoPilot and an improvement in the accuracy of its models has already made a significant difference. 

On balance I would now say that even a fluent programmer may get some benefits from its use. Given the speed of improvement it is likely that all commercial programming will use an AI assistant within a few years. 


To delay the inevitable and not embed it in to your work process is like King Canute commanding the sea to retreat. There are increasing numbers of alternatives available too. However as the market leader I believe it is worth going in to slightly more depth as to the current state of play with CoPilot.

Copilot Features

The new Chat window within your IDE gives you a context sensitive version of Copilot ChatGPT that can act as a pair programmer and code reviewer for your work. 

If you have enabled auto-complete then you instigate that usage by writing functional comments, ie prompts then tabbing out to accept the suggestions it responds with.

To override these prompts, you instead can use dot and get real code completion options (as long as your IDE is configured correctly). Since code completion has your whole codebase as context, it complements CoPilot reasonably well. But whilst the code completion is always correct, CoPilot is less so, probably more like 75% now compared to its initial release level of 50%

It takes some time to improve the quality of your prompting. An effort must be made to eradicate any nuance, assumption, implication or subtlety from your English. Precise mechanical instructions are what are required. However its language model will have learnt common usage. So if you ask it to sort out your variables it will understand that you mean replace all hardcoded values in the body of your code with a set of constants defined at the top, explain that is what it thinks you mean and give you the code that does that.

You can ask it anything about the usage of the language you are working in, how something should be coded, alternatives to that etc. So taking a pair programming approach and explaining what you are about to code and why to CoPilot chat as you go,  can be very useful. Given rubber duck programming is useful, having an intelligent duck that can answer back ... is clearly more so. 

It excels as a learning tool, largely replacing Googling and Stack Overflow with an IDE embedded search for learning new languages. But even for a language you know well, there can be details and nuances of usage you have overlooked or changes in syntactic standards with new releases you have missed.

You can also ask it to give your file a code review. Where it will list out a series of suggested refactors that it judges would improve it.

Copilot Limitations

Currently however there are many limitations, understanding them, helps you know how to use CoPilot and not turn it off in frustration at its failings! 

The most important one is that CoPilot's context is extremely limited. There is no RAG enhancement yet, no learning from your usage. It may seem to improve with usage, but that is just you getting better at using it. It does not learn about you and your coding style as you might expect, given a dumb shopping site does that as standard.

It does not create a user context for you and populate it with your codebase. It simply grabs the content of the currently edited file and the Chat prompt text and the language version for the session as a big query. The same for the auto-suggestion. But here the chat text is from the comments or doc strings on the lines preceding. 

Posting the lot to a fixed CoPilot LLM that is some months out of date. Although apparently it has weekly updates from continuous retraining. 

This total lack of context can mean the only way you can get CoPilot to suggest what you actually want is to write very detailed prompts. It is often simpler to just cut and paste example code as comments into the file - please rewrite blah like this ... paste example. Since only if its in the file or latest Chat question will it get posted to inform the response.

At the time of writing CoPilot is due to at least retain and learn from Chat window history to extend its context a little. But currently it only knows about the currently open file and latest Chat message. Other providers have tools that do load the whole code base, for example Cody, plus there are open source tools to post more of your code base to ChatGPT or to an open source LLM.

As this blog post update indicates, the whole area is evolving at an extremely rapid pace.

The model it has for a language is fixed and dated. Less so for the core language but for example you may use a newer version of the leading 3rd party Postgres library that came out 2 years ago. But the majority of users are still on the previous one since it is still maintained. Their syntax differs. Copilot may only know the syntax for the old library because that is what it was trained with, even though a later version is being imported in the file, so is in Copilot's limited context. So any chat window or code prompts it suggests will be wrong.

I have yet to find it brings up anything useful that I didn't know about the code when using the code review feature, plus the suggestions can include things that are inapplicable or already applied. But I am sure it would be more useful for learning a new language.

AI prompting and commenting issue

Good practise for software teams around code commenting are that you should NOT stick in functional comments that just explain what the next few lines do.  The team are developers and they can read the code as quickly for its base functionality. Adding lots of functional commenting makes things unclear by excessive verbosity.
It is something that is only done for teaching people how to code in example snippets. It has no place in production code.

Comments should be added to give wider context, caveats, assumptions etc. So commenting is all about explaining the Why, not the How.

Doc strings at the head of methods and packages can contain a summary of what the function does in terms of the codebase. So more functional in orientation, but as a big scale summary. So again they are a What not a How.

It looks like current AI assistants may mess that up. Since they need comments that are basically as close to pseudo code as possible. Adding information about real world issues, roadmap, wider codebase, integration with other services ... ie all the Why is likely to  confuse them and degrade the auto-complete.

Unfortunately code comments are not AI prompts for generating code and vice versa.
Which suggests that you may want to write a temporary prompt as a comment to generate the code, then replace it with a proper comment once it has served its purpose.

Or otherwise introduce a separate form of hideable prompt marked comment that make it clear what is for the AI and what is for the Human!

Alternatively use the chat window for code generation then paste it in.

Copilot Translation

Translation is an area where Copilot can be very beneficial. As a non-native English speaker you can interact with it in your own language for prompting and comments and it will handle that and translate any comments in the file to English if asked to.

Code translation is more problematic, since the whole structure of a program and common libraries can be different. But if the code is doing some very encapsulated common process. For example just maths operations, or file operations. It can extract the comments and prompts and regenerate the code into another language for you.

One can imagine that one day the only language anyone will need will be a very high level, succinct English-like language, eg. Python.
When you want to write in a verbose or low-level language. You just write the simpler prompts in a spoken language, but use Python when it is faster to communicate explicitly than spoken. Since spoken languages are so unsuited to creating machine instructions.
Press a button and Copilot turns the lot into verbose C or Java code with English comments.

Categories: FLOSS Project Planets

Ed Crewe: Software development with Generative AI

Sun, 2024-06-09 09:25
The Current State of AI Software GenerationThe user tries to describe what they want generated in terms of a snippet of high level programming language code using standard English. They submit it to the AI tool. So what are they asking the AI to generate and how does it do it?

The high level language

High level programming languages are human languages composed of english and maths symbols designed for the comprehension and composition of precise computer instructions. The language makes no more sense than English to a computer. It has to be compiled or interpreted to computer language for it to run. So it may compile to an intermediate bytecode language and then maybe to human readable assembly language - before final translation into the unreadable machine code that the computer runs.

A programmer learns the high level language and becomes fluent in it. They can read and understand the functionality of that code. With the complexity of the machine specific implementation stripped away.

Leaving just the precise functional maths and english / symbology that describes the computer functionality. They think in that code, in order to write it.
Even then, the majority of a programmers time is spent debugging the high level language - and fixing what they have written to be bug free. Because it is difficult to think clearly in code, pre-determining  all edge cases etc.

Unlike English language, it can succinctly describe computer functionality in a few lines. 

The AI

A detailed English language description of what functionality is required. Plus the name of a high level programming language, are submitted to the AI tool.

It does a search of the web, eg. stack overflow etc. for results for that code language. For Chatbot use (eg. ChatGPT) it applies an English language Large Language Model, LLM (a numeric encoding of learning of the English language) to generate a well phrased aggregation of the most popular results that match the English prompt. 

For software use (eg. CoPilot) it works just the same, but the LLM learns English to high level software language aggregate translation. From code examples data, eg. github, to generate what the code syntax might be to match the English description of it.

Finally it returns an untested snippet of generated high level code.

The Non-Developer

The non-developer pastes it in place and tries to run the program with it in.

They may be able to puzzle out the high level language - but don't naturally think in it, just as people without mathematics skills can only think as far as basic arithmetic and are dyslexic when it comes to complex equations.

It seems to work around 50% of the time. When it fails they, go back to square one and try to rephrase their English prompt. 

They patch together block after block of prompt created generated code. A crazy paving of a program that likely has a number of bugs and inappropriate features in it. But it kind of works, for the non-developer, that is good enough.

The code gets pushed out there with all its imperfections, and starts to populate the web of code data that is used to generate the next AI code snippet.

Or the Developer
The developer reads the code and understands it, determines if it should do what they want. Or if they just want to use some of it as an example.

They cut paste and rewrite it, using it as a hint tool. Or an extension to their IDE's existing auto-code generation tools that work using templated code and language / import library searches.

Hopefully their IDE is set up to clearer distinguish between real code completions and possible generative code completions. Since otherwise the percentage of nonsense code created by the generative AI pollutes the 100% reliability of IDE code completion, and harms productivity.

Then they run their code and debug as usual.

At least 75% of programming time is not on writing code, but on making sure that the high level instructions are exactly correct for generating bug free machine code. So iteratively refining the lines of code. With code a single comma out of place can break the whole program. When language has to be so carefully groomed, succinct minimal language is essential.

For many developers adding an imprecise, non mathematical language, that is entirely unsuited to defining machine code instructions, such as English, to generate such code is problematic. It introduces a whole layer of imprecision, complexity and bugs to the process. Slowing it right down. Along with requiring developers to write a lot lot more sentences (in English) rather than just quickly typing out the succinct lines of Python (or similar) programming language they have in their head.

The generative AI can help students and others who are learning to code in a computer language, but can it actually improve productivity for real, full time, developers who are fluent in that language?

I think that question is currently debatable. Because I believe the goal of adding yet another language to the stack of languages that need to be interpreted for humans authoring computer code, especially one as unsuited as English, is only useful for people who are far from fluent in the software language.

Once we move beyond error prone early releases of LLMs like ChatGPT-4 then tools such as CoPilot may start to become much more effective at authoring software, and actually produce code that is as likely to work first time and have the same amount of bugs as your average software developer's first cut of the code. We may reach that point within a year or two. At which point professional software developer will need to be adept at using it as part of their toolset.

Even so I believe the whole conception of the application of AI to writing software could benefit from more work engaged in a computer centric alternative approach to the current one focussed on generating plausible human language responses. It only dominates because of all the efforts related to NLP and human interaction. But taking that and sticking on to writing human software languages is more about creating a revenue stream than attempting to have AI do the main work of software development.

Until then, AI will never be able to replace me, as a software developer. Only be another IDE tool I need to learn ... in time when it improves sufficiently to increase productivity.

NOTE - June 2024 Update
Having come back to CoPilot 6 months later. I have come to appreciate some of its new features so have added a new blog post that accepts that it now provides utility even for the seasoned programmer.

Another WayCopilot and the like currently use the ChatGPT approach of a Chatbot front end tied to an English language LLM to generate aggregate search engine results in a human language. But there is no domain specific machine learning knowledge about the semantics of the content. So it doesn't understand, and certainly doesn't pre-check the code. Just as ChatGPT doesn't understand the search engine content. Since currently there are no domain specific trained models for the content in the loop. So if asked a question about pharmacy it doesn't plug in one of the AI models that has learnt pharmacy and is used by that industry to aid in the development of medicines. It understands nothing, it is a chatbot, just a constructor of plausible answers based on search popularity.
Similarly CoPilot has learnt how to predict what code somebody might be trying to write, but it hasn't learnt how to code.

This approach cannot lead to AI generating innovative new coding approaches, full self-coding computers, or remove the need for human readable high level programming languages.

There have been experiments with applying test driven development to AI generated code, but I have not heard of serious attempts to address the bigger picture...

  • Move all functional code writing to be AI only.
  • Remove the need for any high level computer language for humans to gain fluency in.
  • Have AI develop software by hundreds of thousands of iterative composition  TDD cycles.
  • Parallel refactoring thousands of solutions to arrive at the optimum one.
  • Use AI that understands the machine code it is generating by training it on the results of running that code. 
  • The ML training cycle must be running code not matching outputs to pre-ranked static result training sets.
  • In addition to the static LLM that encodes the learning of machine code authoring, dynamic training cycles should be run as part of the code composition. Task based ephemeral training models.
  • Get rid of the wasted effort training AI to understand English, Python, Java, Go or any other existing human language evolved for other tasks.
  • Finally we are left with the job of telling the computer what its software should do.
    We do not want to use English for that, its way too verbose and inaccurate, similarly we don't want a full high level programming language to do it. We need a new half way house. A domain specific language (DSL) for defining functionality only, designed for giving software specification's to AI that it can use to generate automated test suites.

Self-Programming Computers

Exploring the last point in more detail...

Create a higher level pseudo-code language for describing the required functionality that is more English readable than even current high level languages such as Python.

Make that functional DSL focus on defining inputs and outputs - not creating the functionality, but creating the black box functional tests that describe what the working code should do.

Maybe add tools for a slightly no-code approach, with visual generators for the language, eg graphical pipeline builder tools. For people who find thinking visually easier than thinking symbolically.

The software creator uses the DSL to create an extensive set of functional definitions for a project.

The DSL language design and evolution is optimised for LLM interpretation.  So it has very tight grammatical and syntactical usage that promote accurate generative outputs.

A new non-developer friendly high level pseudo code language / rigorous AI prompt writing lingo.

Some basic characteristics of the DSL:

  1. auto-formatting (like Go) minimizing syntactical variation
  2. To quote Python's creator - 'There should be one-- and preferably only one --obvious way to do it.'
    But strictly applied, rather than as a vague principle as Python does
  3. unlike any other high level language, the design needs to be optimized only for specifying functionality, a high level templating language from which test suites are generated.
  4. the language will never be used to implement functionality
  5. uses simple english vocabulary and ideally minimal mathematical symbology

These DSL prompts are written with a LLM for the DSL it helps create its own prompts and the code creator uses it to refine all the DSL definitions that specify the full functionality. 

The specification DSL auto generates all the required tests in a low level language.

Since the system should also have a generative AI LLM trained for C or assembly language.
This is what creates the actual functional code by iteratively running and rewriting it against the specification encoded into the tests.

The AI tool then generates the tests for that implementation and uses TDD to generate the actual functional code - eventually the system should improve to a level better than most software developers. The code it writes no longer needs to be read by a human - because a human will be unable to debug it at anything like the speed the AI tool can.

So we use generative AI to do the part of the job that actually takes all the time. Debugging, refactoring and maintaining the code, making sure it really does what is required functionally. Rather than the quick job of writing a first cut of it that might run without crashing.

Most importantly we don't introduce the use of the full English language, the language of Shakespeare, the language of puns, double meanings, multiple interpretations, shades of grey, implied feeling and emotions, into a binary world to which it is entirely unsuited.

Also we don't need English or high level computer languages in the stack of mistranslation at all.
Because we are not training the AI to understand human languages. We are training it to write its own machine code language based on defining what behaviour it should implement.
BDD / TDD generative AI if you like.

Human's no longer learn complex mathematical process based languages that can be translated into machine code. They learn a more generic language for specifying functional behaviour.

This gives more freedom to widen the DSL to mature into a general precise AI prompt language.

Whilst allowing computers to evolve more machine learning driven software architectures that are self maintaining and not so constrained into the models imposed by current human intelligence and coding practise based programming languages.

Could AI could take my job?Perhaps if all of the above were in place, then finally we would arrive at a place where AI could replace traditional software development and high level software languages.
With concerted effort it could be in 10 years, if some big companies put some serious investment in trying to replace traditional software development.
Code monkeys will all be automated. Only software architects would be required and they would use a new functional specification AI prompt language, not a programming language.

Of course if politicians are scared that dumb ChatGPT can already write as good a speech as they can. Plus replicate all the prejudices and errors of its training data and trainers.
Then setting AI free to fully write software, and itself ... will be way more scary in its long term implications.
Meanwhile we are currently at a place where it arguably doesn't even improve productivity for an experienced software developer, only allows non-developers, students and other language newbies to have a go at writing one of the many dialects of human languages, known as computer languages. 

Their mix of math, english, symbols, logic and process may appear more like English than Musical notation or pure maths, but sadly they are no more suited to creation by an English language Chatbot approach.

Categories: FLOSS Project Planets

Jeremy Epstein: Introducing: Floyd-Warshall CSV Generator

Sat, 2024-06-08 20:00

I built a little Python script called the Floyd-Warshall CSV Generator. It takes a CSV of graph edges as input, and generates a CSV of the edges that are the shortest paths between all pairs of vertices.

The script is a simple wrapper of the SciPy floyd_warshall function, which in turn implements the Floyd-Warshall Algorithm. Hope you find it useful for all your directed (or undirected) weighted graph needs.

Given an input CSV of the following graph edges:

point_a,point_b,cost a,b,5 b,c,8 c,d,23 d,e,6

When the script is called as follows:

floyd-warshall-csv-generator &bsol /path/to/input_data.csv &bsol --vertex-i-column-name point_a &bsol --vertex-j-column-name point_b &bsol --weight-column-name cost &bsol --no-directed &bsol --max-weight 35

It generates an output CSV that looks like this:

point_a,point_b,cost a,b,5.0 a,c,13.0 b,c,8.0 b,d,31.0 c,d,23.0 c,e,29.0 d,e,6.0

That is, it generates all the possible (indirect) paths from one point to all other points, based on the (direct) paths that are already known, with duplicate (undirected) paths filtered out, and with paths whose cost is more than max-weight filtered out.

I wrote this script in order to generate the "all edges" data that's shown in the World Locality Transit Graph, which I'll also be blogging about real soon. Let me know if you put this script to any other interesting uses!

Categories: FLOSS Project Planets

Pythonicity: GraphQL cursors

Sat, 2024-06-08 20:00
Contrarian view on cursor-based pagination.

GraphQL documentation recommends cursor-based pagination, and it has subsequently become a popular standard.

In general, we’ve found that cursor-based pagination is the most powerful of those designed. Especially if the cursors are opaque, either offset or ID-based pagination can be implemented using cursor-based pagination (by making the cursor the offset or the ID), and using cursors gives additional flexibility if the pagination model changes in the future. As a reminder that the cursors are opaque and that their format should not be relied upon, we suggest base64 encoding them. …

{ hero { name friends(first: 2) { totalCount edges { node { name } cursor } pageInfo { endCursor hasNextPage } } } }

There are several oversights with this well-intentioned advice.

Cursors and state

Cursors imply state, at least they used to. A database cursor is used for iterating over a result set. Meaning it has transactional integrity to pick up where it left off.

The vast majority of GraphQL APIs are inherently stateless. The “cursor” is being decoded as input to a new request, and offers no guarantees. From this observation, the advice falls apart.

The problem with stateless pagination is inconsistency; items may shift, appear, or disappear. Which gives the client the perception of missing or duplicate items. This happens regardless of whether the pagination is offset or ID based. Arguably worse in the case of IDs, since the reference can move arbitrarily or be gone.

Cursors don’t solve the consistency problem; they give the client the false impression of solving the problem.

Opaqueness and compatibility

The claim is that an opaque cursor is compatible across changes. Changed to do what exactly, would be the more relevant question.

Taking a step back, what is the problem being solved here? We assume there is a list of items, with an inherent ordering, and too many to return to the client with acceptable performance.

Given those assumptions, the first obvious step is an optional size limit. That is not in dispute; the disagreement if over the “offset”. A simple and versatile solution is a range filter over whatever field(s) is relevant to ordering. This is not even remotely controversial when the field in question has a name like date. In other words, “pagination” is not necessarily the problem that needs solving.

Range filters with a size limit are sufficient to implement pagination, and new optional filters are always backwards compatible. They also offer the flexibility of search, whereas cursors can only be used iteratively. And what if the client does not want visibility into the range filters? That is exactly what offset is for; offset is a range filter over an implied index field.

There is a reason why the recommendation does not offer a useful example of this supposed compatibility; there isn’t one. The advice is equivocating on the ambiguity of an after: $ID filter. Is the ID field relevant to the ordering?

  • If yes, then it is just another range filter
  • If no, then it is just another placeholder for index

There is no third case. There is no future secret field that relates to ordering, is relevant to the client, but somehow still opaque to the client.

Stateless pagination is a combination of range filters and size limits. No matter what the input fields are called. A true stateful is cursor is opaque precisely because it does not represent any known field.

Next optimization

The “next” piece of advice is that the cursor implementation should indicate whether another request is worthwhile. Again, in a stateless API, the server can make no such guarantee.

If the server can provide a total count, by all means do so. It solves the “next” problem, and is more generally useful.

If it is not feasible for the server to provide a total count, how is it going to implement whether there are more items? At the data layer, it is going to stop processing at N + 1 items instead of the requested N. The client could do that too. Instead of requesting the next 10, it could go to 11.

Better yet, why stop at the server optimizing for N + 0? If it knows there is just 1 more item, why not go ahead and include that last one too. N + 2 anyone? Obsessing over the last “next” is a pointless micro-optimization, all the more so because it is irrelevant whenever the total count is not coincidentally a multiple of N. If N is arbitrary, then optimizing for a particular residue mod N is clearly arbitrary.

API design

Not only is there no good reason to blindly add opaque cursors, there is also no reason to add range filters before needed. A size limit alone solves the first order of magnitude of performance issues. If a client requests the first 10 items, then needs the next 10, actually pressure test whether it is unreasonable to request the first 20. The advantage is the client then has a consistent snapshot of the first 20 regardless of changes, which could provide a better user experience.

A simple strategy for pagination: start with none. Then proceed to next steps as performance warrants.

  1. size limit
  2. range filter on known field(s)
  3. offset

In the unlikely event your API is stateful, you didn’t need this advice because you already had a cursor. Otherwise, cursors are an overly-complicated useless abstraction.

Categories: FLOSS Project Planets

Ga&#235;l Varoquaux: Promoting open-source, from inria to :probabl.

Sat, 2024-06-08 18:00

Note

Open-source efforts around scikit-learn at Inria are spinning off to a new enterprise, Probabl, in charge of sustainable development of a data-science commons.

Contents

Prelude: funding scikit-learn is hard

Scikit-learn is a central software component in today’s machine learning landscape, and it is open source, governed by a community, easy to install, and well documented. It started many years ago as a project that we did on the side, and we were joined by many volunteers, which was key to the success of the project. We soon decided to ensure that scikit-learn was not only a volunteer-based effort. Over more than a decade, I’ve dedicated a lot of energy to this, using a variety of funding mechanisms: first grants (as an academic), then sponsoring and related contracts with various actors.

Digital commons eliminate scarcity and exclusivity

Funding digital commons is really hard. People build fortunes by leveraging competitive advantages, by creating lock-ins, or selling access to data. What makes a great open-source library, as scikit-learn, is exactly what prevents these tricks: we are committed to being independent, easy to use and install, lightweight…

The birth of a new ambition

Scikit-learn is very successful, but it could be more. For instance, it does not facilitate pushing to production as much as tensorflow, which can be served, deployed to android… And scikit-learn is not very visible to top decision makers: it’s not a line on their budget, a brand that they know. As a consequence, it is not reaping the benefit of its success [1].

[1]Many commercial tools are sitting on top of open source software like scikit-learn (splunk, sagemaker, to name only a few), making profits, and not helping in any way the open source world that they build upon. The French government is backing us to push the envelope

3 years ago, the French government challenged us to go further, to consolidate the ecosystem into a consistent data-science commons. The strategic interest of France is to preserve some technological autonomy on data, eg sensitive data. Thus, the government offered us, at Inria, a funding opportunity to go further.

They promised us a lot of money (dozens of millions of Euros), but with a specific mission to develop a sustainable “data-science commons” [2] ecosystem around scikit-learn. I’ll spare you the details of the amount of meetings we had, documents that we wrote, to sketch the outline of the project. I pushed forward a vision of technical components that fit in the broader open-source ecosystem, complementing it.

[2]The letter that we received from the French government specifically defines the objective in these words: “data-science common” (“Communs numériques pour la Science des Données”)

As I moved forward, I faced a difficulty: the French government wanted a sustainability plan, and private investment to back it. To be honest, this is not what I’m good at. François Goupil, the COO of the scikit-learn consortium, was helping me, but we needed more for our ambitions. And this is when we started talking to Yann Lechelle, a tech entrepreneur with an impressive track record interested in the impact of France on the global tech world.

Probabl, a mission-driven enterprise

With Yann, we built a new vision. Our challenge is to be long-term sustainable and virtuous for scikit-learn, its broader ecosystem, and its community. Yann brought in a business point of view, and I tried to bring that of open-source communities beyond probabl [3], for instance avoiding to getting in the way of others building businesses that contribute to scikit-learn. Indeed, we are convinced that having a broad and diverse community around scikit-learn is central to its future.

[3]One of the first things that Probabl did (Guillaume Lemaître, to be specific), was submit a grant application (to the Chang-Zuckenberg Institute), to fund, via NumFocus, a developer employed by Quantsight, with no money transiting via Probabl (one reason being that we have no operations outside of Europe so far).

Our sustainability model is still being finetuned. What I can tell is that it will involve a mix of professional service, support & sponsorship agreement, as well as a product-based offer, where we supplement scikit-learn with enterprise features. Our focus will be on features that are typically not the focus of open-source developers: integration in large structures, such as access control, LDAP connection, regulatory compliance. We will not shoehorn scikit-learn in open core or dual licensing approaches: we want our incentives to be aligned with scikit-learn, and its ecosystem, being as complete as possible.

Foster growth and adoption of our open-source stack

In a sense, our inspiration is that of RedHat, where the growth of the company fosters the growth and adoption of the software (Linux in the case of RedHat), beyond the company, in an ecosystem, and for a wide variety of applications.

Strong growth will mean external capital. To ensure that we do not lose the focus on our mission, building data-science commons, Yann penciled down a specific governance of the company (and then validated it with many people, as we are a spin-off from a governmental organization). The ultimate share structure, and the board, are divided in three electoral colleges: one for outside investors, one for founders and employees, and one for public institutions. This ensures a balance of power that hopefully will keep us aligned to our mission. I think that this structure sends a strong signal that we are not just another for-profit that will go from creating useful tech to dark money-generating patterns.


Probabl is already having an impact

A strong open-source team In February, the whole team developing scikit-learn at Inria moved to Probabl, joined by Adrin Jalali, a Berlin-based core developer of scikit-learn and fairlearn. We’ve been hiring excellent people, and we now have 9 people on open-source (see the Probabl team), spending their time contributing to open source (Jérémie, for instance, has been doing the last releases for scikit-learn).

Fostering an ecosystem Probabl is not only about scikit-learn. We are prioritizing 8 libraries, central to the machine-learning and data science ecosystem: joblib, fairlearn, imbalanced-learn… In general, as we have always done, we will not hesitate contributing to upstream or related projects. Our goal is to have a healthy open-source ecosystem around data-science.

Not only software Not everybody sees the important lines of code. I’ve become increasingly aware of the need to do outreach and communication, to coders, but also to decision makers. At Probabl we dedicate energy to be in business meetings, to participate in the tech narrative, to teach how to best do data science, eg with didactic videos. We’re starting a mentioning program, we’ll be organizing sprints… I am convinced that all this is a useful long-term investment.


My position within Probabl, my vested interests

I am a French civil servant (a researcher at Inria, one of our national research institute). Such a position comes with strong responsibilities to control conflicts of interest. The creation of Probabl underwent strict scrutiny (that took a long long time). I have been recently cleared to take an active role: 10% of my time is allocated to be a scientific and open-source advisor for Probabl.

I am not paid by Probabl. 100% of my salary comes from Inria (and I was not given a raise because of my involvement in Probabl). I do have financial interests as a founder, but given that I have a small active part, I have one of the smallest amount of shares among founders.

My main interest in Probabl is really the success of its mission: the long-term growth of an open-source data-science ecosystem. Spinning-off from Inria actually continues my efforts in this direction, but with more agility and breadth. And having on top of open source a variety of complementary commercial activities makes it stronger, by answering better the needs of some actors.

More to come

There are many things that we are still ironing. Clearing out specific details takes time (for instance, clearing my role took a while). We are still to announce the future of the sponsorship program that we had set up at the Inria foundation. Its mission has been transferred to Probabl. Currently, Probabl’s open source team is ensuring continuity of our work with the existing sponsors. But we will set up broader partnership opportunities, with a similar governance, that enable third-parties to invest in open source on a roadmap decided jointly with the open-source community.

I believe that we need a lot of transparency in how we decide upon priorities in our open source team. Our 2024 priorities for scikit-learn are visible here.

I look forward to when Probabl will start adding value to scikit-learn for enterprises with an offer enriching scikit-learn and the broader open-source ecosystem.

I am acutely aware that good open source is made of communities, and that communities need trust and understanding of big players such as Probabl (well, so far we are not that big). I hope that with time our actions will become easy to read and speak of themselves.

Categories: FLOSS Project Planets

Trey Hunner: A beautiful Python monstrosity

Sat, 2024-06-08 17:30

Creating performance tests for Python Morsels exercises is a frequent annoyance

I loathe writing automated tests for performance-related exercises because they’re always flaky. How flaky depends on the exercise, what I’m testing, and the time variability inherent in the particular Python features that a learner might use.

I came up with a solution for flaky tests recently, but it also makes my tests less readable. I then came up with a tool to improve the readability, but that has its own trade-offs.

The code I eventually came up with is a beautiful Python monstrosity.

1 2 3 4 5 6 @attempt_n_times(10) def _(): nonlocal micro_time, tiny_time micro_time = time(micro_numbers) tiny_time = time(tiny_numbers) self.assertLess(tiny_time, micro_time*n)

I’ll explain what that code does, but first let’s talk about why it’s needed.

The flaky performance tests

My flaky performance tests initially looked like this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 def test_some_test(self): n, m = 2.45, 2.04 micro_time = time(micro_numbers) tiny_time = time(tiny_numbers) self.assertLess(tiny_time, micro_time*n) small_time = time(small_numbers) self.assertLess(small_time, tiny_time*n) self.assertLess(small_time, micro_time*n*m) medium_time = time(medium_numbers) self.assertLess(medium_time, micro_time*n*m*m) self.assertLess(medium_time, tiny_time*n*m) self.assertLess(medium_time, small_time*n)

The first block runs a performance test for the user’s function on a very small list and on a slightly larger list and then asserting that the slightly larger list didn’t take too much longer to run. The next two blocks run the same code on even larger lists and make further assertions about the relative times that the code took to run.

This roughly approximates the time complexity of this code.

Running performance checks in a loop

These performance checks need to:

  1. Predictably fail for inefficient solutions
  2. Predictably pass for efficient solutions
  3. Run fast (within just a few seconds) even when the code is inefficient
  4. Avoid the use of threading because they’ll be running on WebAssembly in the browser
  5. Run consistently on pretty much any computer

These 5 requirements together have caused me countless headaches. I get the tests passing well, but they don’t always fail when they should. I get the tests failing and passing when they should, but then they’re too slow. And so on…

Notice the n and m factors in the above assertions:

1 self.assertLess(small_time, micro_time*n*m)

If n and m are too big, we’ll get false positives (tests passing when they should fail). If n and m are too small, we’ll get false negatives (tests failing when they should pass).

To avoid both Type I and Type II errors, I decided to keep n and m small but attempt the assertion block multiple times.

Here’s the (far less flaky) revised code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 def test_some_test(self): n, m = 2.45, 2.04 for attempts_left in reversed(range(10)): try: micro_time = time(micro_numbers) tiny_time = time(tiny_numbers) self.assertLess(tiny_time, micro_time*n) break except AssertionError: if attempts_left == 0: raise for attempts_left in reversed(range(5)): try: small_time = time(small_numbers) self.assertLess(small_time, tiny_time*n) self.assertLess(small_time, micro_time*n*m) break except AssertionError: if attempts_left == 0: raise for attempts_left in reversed(range(3)): try: medium_time = time(medium_numbers) self.assertLess(medium_time, micro_time*n*m*m) self.assertLess(medium_time, tiny_time*n*m) self.assertLess(medium_time, small_time*n) break except AssertionError: if attempts_left == 0: raise

The for loop runs the code multiple times, the break statement stops the code as soon as the assertions all pass, and the except and if ensure that any assertion errors are suppressed until/unless we’re on the final iteration of the loop.

Let’s call this a for-try-break-except-if-raise pattern. It’s an absurdly verbose name fitting of absurdly verbose code.

This for-try-break-except-if-raise pattern works pretty well! But it’s not pretty.

Like many programmers, I believe that Don’t Repeat Yourself (DRY) need not apply to tests. Tests are allowed to be repetitive if the verbosity improves readability.

But there is so much noise in that code! I decided that removing some noise might improve readability. So I devised a helper utility to reduce the repetition.

In search of a solution

While pondering the repetitive noise in this code, I wondered what Python features I could use to abstract away this for-try-break-except-if-raise pattern.

Could I make a context manager and use a with block? That might help with the try-except, but context managers can’t run their code block multiple times, so that wouldn’t help with the for and the break. So a context manager is out.

Could I abstract this away into a looping helper by implementing a generator function? We are looping and generator functions can break early. But, a generator function can’t catch an exception that’s raised within the body of a loop. So a generator function wouldn’t work either.

What about a decorator? 🤔

Context managers and decorators both sandwich a block of code. But decorators sandwich functions and they have the power to run the same function repeatedly. A decorator might work!

Here’s a decorator that will run a given function up to 10 times (until no AssertionError is raised):

1 2 3 4 5 6 7 8 9 def try_10_times(function): def wrapper(): for attempts_left in reversed(range(10)): try: return function() except AssertionError: if attempts_left == 0: raise return wrapper

To use this decorator, we would need to define a function and then call that function:

1 2 3 4 5 6 7 @try_10_times def assertions(): micro_time = time(micro_numbers) tiny_time = time(tiny_numbers) self.assertLess(tiny_time, micro_time*n) assertions()

This isn’t quite good enough though…

  1. We need a pattern to run code N times (not necessarily exactly 10)
  2. We reference the variables defined in each block in later blocks, so micro_time and tiny_time will need to be available outside that function
  3. We need this function to run just one time right after it’s defined… could we do that automatically?

All 3 of these problems are solvable:

  1. We need a decorator that accepts arguments
  2. We need to use rarely seen nonlocal statement
  3. We could have the decorator automatically call the decorated function
The final weird decorator

Here’s the decorator I ended up with:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 def attempt_n_times(n): """ Run tests multiple times if assertions are raised. Allows for more forgiving tests when assertions may be a bit flaky. """ def decorator(function): """This looks like a decorator, but it actually runs the function!""" for attempts_left in reversed(range(n)): try: return function() except AssertionError: if attempts_left == 0: raise return decorator

This decorator accepts an n argument which determines the maximum number of times the decorated function should be called. The decorator then calls the function repeatedly in a for loop and a try-except block. As soon as an AssertionError is not raised during one of these function calls, the looping stops.

The weirdest part about this decorator is that it calls the decorated function. Note that the decorator function doesn’t define a wrapper function within itself… it just runs code right away!

The resulting beautiful Python monstrosity

Here’s the final refactored test code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 def test_some_test(self): n, m = 2.45, 2.04 micro_time = tiny_time = small_time = medium_time = 0 @attempt_n_times(10) def _(): nonlocal micro_time, tiny_time micro_time = time(micro_numbers) tiny_time = time(tiny_numbers) self.assertLess(tiny_time, micro_time*n) @attempt_n_times(5) def _(): nonlocal small_time small_time = time(small_numbers) self.assertLess(small_time, tiny_time*n) self.assertLess(small_time, micro_time*n*m) @attempt_n_times(3) def _(): nonlocal medium_time medium_time = time(medium_numbers) self.assertLess(medium_time, micro_time*n*m*m) self.assertLess(medium_time, tiny_time*n*m) self.assertLess(medium_time, small_time*n)

The attempt_n_times decorator immediately calls the function it decorates. Each function is defined and immediately called one or more times, in a try-except block within a loop.

That’s why we’ve named these functions with the throwaway _ name: we don’t care about the name of a function we’re never going to refer to again.

Also note the use of the nonlocal statement. Each function in Python has its own scope and all assignments assign to the local scope by default. That nonlocal variable pulls those variables to the scope of the outer function instead.

Compare the above code to the code just before this refactor:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 def test_some_test(self): n, m = 2.45, 2.04 for attempts_left in reversed(range(10)): try: micro_time = time(micro_numbers) tiny_time = time(tiny_numbers) self.assertLess(tiny_time, micro_time*n) break except AssertionError: if attempts_left == 0: raise for attempts_left in reversed(range(5)): try: small_time = time(small_numbers) self.assertLess(small_time, tiny_time*n) self.assertLess(small_time, micro_time*n*m) break except AssertionError: if attempts_left == 0: raise for attempts_left in reversed(range(3)): try: medium_time = time(medium_numbers) self.assertLess(medium_time, micro_time*n*m*m) self.assertLess(medium_time, tiny_time*n*m) self.assertLess(medium_time, small_time*n) break except AssertionError: if attempts_left == 0: raise

I find the refactored version easier to skim.

But that attempt_n_times decorator does abuse the decorator syntax. Decorators aren’t meant to call the function they’re decorating.

Is this misuse of decorators worth it?

Is this worth it?

Decorators aren’t supposed to immediately call the function they decorate. But there’s nothing stopping them from doing so. I feel that I’ve traded “normal code” for a beautiful monstrosity that’s easier to skim at a glance.

The attempt_n_times decorator is pretending that it’s a block-level tool by using a function because there’s no other way to invent such a tool in Python.

I think abstracting away the for-try-break-except-if-raise pattern was worth it, even though I ended up abusing Python’s decorator syntax in the process.

What do you think? Was that attempt_n_times abstraction worth it?

Categories: FLOSS Project Planets

Talk Python to Me: #465: The AI Revolution Won't Be Monopolized

Sat, 2024-06-08 04:00
There hasn't been a boom like the AI boom since the .com days. And it may look like a space destined to be controlled by a couple of tech giants. But Ines Montani thinks open source will play an important role in the future of AI. I hope you join us for this excellent conversation about the future of AI and open source.<br/> <br/> <strong>Episode sponsors</strong><br/> <br/> <a href='https://talkpython.fm/sentry'>Sentry Error Monitoring, Code TALKPYTHON</a><br> <a href='https://talkpython.fm/porkbun'>Porkbun</a><br> <a href='https://talkpython.fm/training'>Talk Python Courses</a><br/> <br/> <strong>Links from the show</strong><br/> <br/> <div><b>Ines Montani on Twitter</b>: <a href="https://twitter.com/_inesmontani" target="_blank" rel="noopener">@_inesmontani</a><br/> <b>spaCy</b>: <a href="https://spacy.io" target="_blank" rel="noopener">spacy.io</a><br/> <b>Prodigy App</b>: <a href="https://prodi.gy" target="_blank" rel="noopener">prodi.gy</a><br/> <b>Ines' presentation at PyCon Lithuania</b>: <a href="https://www.youtube.com/watch?v=SsnDN7LI7IY" target="_blank" rel="noopener">youtube.com</a><br/> <b>LM Studio</b>: <a href="https://lmstudio.ai" target="_blank" rel="noopener">lmstudio.ai</a><br/> <b>Little Bobby Tables</b>: <a href="https://xkcd.com/327/" target="_blank" rel="noopener">xkcd.com</a><br/> <br/> <b>spaCy and NLP course</b>: <a href="https://talkpython.fm/spacy" target="_blank" rel="noopener">talkpython.fm</a><br/> <b>Watch this episode on YouTube</b>: <a href="https://www.youtube.com/watch?v=zaZrWZwKJH4" target="_blank" rel="noopener">youtube.com</a><br/> <b>Episode transcripts</b>: <a href="https://talkpython.fm/episodes/transcript/465/the-ai-revolution-wont-be-monopolized" target="_blank" rel="noopener">talkpython.fm</a><br/> <br/> <b>--- Stay in touch with us ---</b><br/> <b>Subscribe to us on YouTube</b>: <a href="https://talkpython.fm/youtube" target="_blank" rel="noopener">youtube.com</a><br/> <b>Follow Talk Python on Mastodon</b>: <a href="https://fosstodon.org/web/@talkpython" target="_blank" rel="noopener"><i class="fa-brands fa-mastodon"></i>talkpython</a><br/> <b>Follow Michael on Mastodon</b>: <a href="https://fosstodon.org/web/@mkennedy" target="_blank" rel="noopener"><i class="fa-brands fa-mastodon"></i>mkennedy</a><br/></div>
Categories: FLOSS Project Planets

Anwesha Das: Event Driven Ansible, what, why and how?

Fri, 2024-06-07 14:02

Ansible Playbooks is the known term, now there is a new term which is being floted in the project, which is Ansible Rulebooks. Today we are going to discuss about Ansible&aposs journey from Playbook to Rulebook rather Playbook with Rulebook.

What is Event Driven Ansible?

What is Event Driven Ansible? In simple terms, some action is triggered by some events. The idea of EDA comes from Event driven architecture. Event driven ansible runs code automatically based on received event notifications.

Some important terms:

What is event in Event Driven Ansible?

The event is the notification of a certain incident.

Where do we get the events from?

We get the events from event sources. Ansible EDA provides different pulgins to support various event sources. There are several event source plugins such as :
url_check (checking the http status code), webhook (providing and checking events from webhook), journald (monitoring the journald logs) and the list goes on.

When to take actions?

Rulebook defines conditions and actions in case of fulfilling those actions. Conditions use operators as strings, boolean and numerical data. And actions are occurrence of events once the conditions are met. Running a playbook, setting a fact, running a module etc.

Small example Project

Here is a small example of Event Driven Ansible and how it is run. The idea is on receiving of a message (here the number 42) a playbook will run in the host. There are the following 3 files :

demo_rule.yml --- - name: Listen for events on a webhook hosts: all sources: - ansible.eda.webhook: host: 0.0.0.0 port: 8000 rules: - name: Say thank you condition: event.payload.message == "42" action: run_playbook: name: demo.yml

This is the rulebook. We are using the webhook plugin here as the event source. As a rule in the event of receiving the message 42 as json payload in the webhook, we run the playbook called demo.yml

demo.yml - hosts: localhost connection: local tasks: - debug: msg: "Thank you for the answer."

demo.yml, the playbook which run on the occurrence of the event mentioned in the rulebook and prints a debug message.

--- local: hosts: localhost

inventory.yml mentions the hosts to run the action against.

Further there are 2 files to one to test 42.json and 43.json to test the code.

{ "message" : "42" } { "message" : "43" }

First we have to install all related dependencies before we can run the rulebook.

$ python -m venv .venv $ source .venv/bin/activate $ python -m pip install ansible ansible-rulebook ansible-runner psycopg $ ansible-galaxy collection install ansible.eda $ ansible-rulebook --rulebook demo_rule.yml -i inventory.yml --verbose

Go to another terminal and on the same directory path and run the following command to test the Rulebook. After receiving the message, the playbook runs.

curl -X POST -H "Content-Type: application/json" -d @42.json 127.0.0.1:8000/endpoint Output 2024-06-07 16:48:53,868 - ansible_rulebook.app - INFO - Starting sources 2024-06-07 16:48:53,868 - ansible_rulebook.app - INFO - Starting rules ... TASK [debug] ******************************************************************* ok: [localhost] => { "msg": "Thank you for the answer." } PLAY RECAP ********************************************************************* localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 2024-06-07 16:50:08,224 - ansible_rulebook.action.runner - INFO - Ansible runner Queue task cancelled 2024-06-07 16:50:08,225 - ansible_rulebook.action.run_playbook - INFO - Ansible runner rc: 0, status: successful

Now if we run the other json file 43.json we see that the playbook does not run even after the http status code being 200.

curl -X POST -H "Content-Type: application/json" -d @43.json 127.0.0.1:8000/endpoint

Output :

2024-06-07 18:20:37,633 - aiohttp.access - INFO - 127.0.0.1 [07/Jun/2024:17:20:37 +0100] "POST /endpoint HTTP/1.1" 200 159 "-" "curl/8.2.1"

You can try this yourself follwoing this git repository.

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #207: Decomposing Software Problems &amp; Avoiding the Trap of Clever Code

Fri, 2024-06-07 08:00

How do you effectively break a software problem into individual steps? What are signs you're writing overly clever code? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Python Insider: Python 3.12.4 released

Thu, 2024-06-06 17:50

I'm pleased to announce the release of Python 3.12.4:

https://www.python.org/downloads/release/python-3124/

 This is the third maintenance release of Python 3.12

Python 3.12 is the newest major release of the Python programming language, and it contains many new features and optimizations. 3.12.4 is the latest maintenance release, containing more than 250 bugfixes, build improvements and documentation changes since 3.12.3.

 Major new features of the 3.12 series, compared to 3.11  New features Type annotations Deprecations
  • The deprecated wstr and wstr_length members of the C implementation of unicode objects were removed, per PEP 623.
  • In the unittest module, a number of long deprecated methods and classes were removed. (They had been deprecated since Python 3.1 or 3.2).
  • The deprecated smtpd and distutils modules have been removed (see PEP 594 and PEP 632. The setuptools package continues to provide the distutils module.
  • A number of other old, broken and deprecated functions, classes and methods have been removed.
  • Invalid backslash escape sequences in strings now warn with SyntaxWarning instead of DeprecationWarning, making them more visible. (They will become syntax errors in the future.)
  • The internal representation of integers has changed in preparation for performance enhancements. (This should not affect most users as it is an internal detail, but it may cause problems for Cython-generated code.)

For more details on the changes to Python 3.12, see What’s new in Python 3.12.

 More resources  Enjoy the new releases

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.


Your release team,
Thomas Wouters
Łukasz Langa
Ned Deily
Steve Dower 

Categories: FLOSS Project Planets

Python Insider: Python 3.13.0 beta 2 released

Thu, 2024-06-06 17:46

I'm pleased to announce the release of Python 3.13 beta 2.

https://www.python.org/downloads/release/python-3130b2/

 

This is a beta preview of Python 3.13

Python 3.13 is still in development. This release, 3.13.0b2, is the second of four beta release previews of 3.13.

Beta release previews are intended to give the wider community the opportunity to test new features and bug fixes and to prepare their projects to support the new feature release.

We strongly encourage maintainers of third-party Python projects to test with 3.13 during the beta phase and report issues found to the Python bug tracker as soon as possible. While the release is planned to be feature complete entering the beta phase, it is possible that features may be modified or, in rare cases, deleted up until the start of the release candidate phase (Tuesday 2024-07-30). Our goal is to have no ABI changes after beta 4 and as few code changes as possible after 3.13.0rc1, the first release candidate. To achieve that, it will be extremely important to get as much exposure for 3.13 as possible during the beta phase.

Two particularly noteworthy changes in beta 2 involve the macOS installer we provide:

  • The minimum supported macOS version was changed from 10.9 to 10.13 (High Sierra). Older macOS versions will not be supported going forward.
  • The macOS installer package now includes an optional additional build of Python 3.13 with the experimental free-threading feature enabled. The free-threaded version, python3.13t, is separate from and co-exists with the traditional GIL-only installation. The free-threaded build is not installed by default; use the Customize option of the installer as explained in the installer readme. Since this is an experimental feature, there may be late-breaking issues found; see the free-threaded macOS build issue on GitHub for the most recent status.

Please keep in mind that this is a preview release and its use is not recommended for production environments.

 Major new features of the 3.13 series, compared to 3.12

Some of the new major new features and changes in Python 3.13 are:

New features Typing Removals and new deprecations
  • PEP 594 (Removing dead batteries from the standard library) scheduled removals of many deprecated modules: aifc, audioop, chunk, cgi, cgitb, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu, xdrlib, lib2to3.
  • Many other removals of deprecated classes, functions and methods in various standard library modules.
  • C API removals and deprecations. (Some removals present in alpha 1 were reverted in alpha 2, as the removals were deemed too disruptive at this time.)
  • New deprecations, most of which are scheduled for removal from Python 3.15 or 3.16.

(Hey, fellow core developer, if a feature you find important is missing from this list, let Thomas know.)

For more details on the changes to Python 3.13, see What’s new in Python 3.13 . The next pre-release of Python 3.13 will be 3.13.0b3, currently scheduled for 2024-06-25.

 More resources  Enjoy the new releases

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.

Your release team,
Thomas Wouters
Łukasz Langa
Ned Deily
Steve Dower 

 

Categories: FLOSS Project Planets

Python Engineering at Microsoft: Python in Visual Studio Code – June 2024 Release

Thu, 2024-06-06 10:44

We’re excited to announce the June 2024 release of the Python and Jupyter extensions for Visual Studio Code!

This release includes the following announcements:

  • VS Code Native REPL for Python with Intellisense and syntax highlighting
  • Pytest improvements in the testing rewrite

If you’re interested, you can check the full list of improvements in our changelogs for the Python, Jupyter and Pylance extensions.

VS Code Native REPL for Python with Intellisense and syntax highlighting

Starting in this release, we are experimenting with a new REPL in the Python extension which includes features such as Intellisense and syntax highlighting to make your Python development experience more efficient. For those familiar with Jupyter’s Interactive Window, this REPL may look similar; however, it has two key differentiators: it does not rely on the Jupyter extension, nor does it require a kernel to be installed in your development environment. This VS Code Native REPL for Python also adheres to principles present in the REPL built-in to Python itself, in that history is immutable.

To enable this feature, set "python.REPL.sendToNativeREPL": true in your settings.json file. This will execute code in the VS Code Native REPL on Shift+Enter and Run Selection/Line. Moreover, the Native REPL will smartly execute on Enter, similar to Python’s original interactive interpreter, if you add the setting "interactiveWindow.executeWithShiftEnter": false, in your settings.json. You can opt to continue to use the REPL built-in to Python located in the terminal ( >>> ) by setting "python.REPL.sendToNativeREPL": false in your settings.json.

As we continue iterating on this feature, all feedback is welcomed and can be made as issues in our GitHub repository.

Pytest improvements in the testing rewrite

The experience with pytest when using the Python Testing Rewrite has been improved to better support setting pytest’s cwd (current working directory) when it is adjacent to the VS Code workspace root, and for displaying parameterized tests on the Test Explorer when function names are repeated across classes. Additionally, we reduced test discovery failure scenarios by adding the system configuration script path to PATH to enable shells for test execution.

As we continue to add improvements to the testing experience under the rewrite to make the experience more stable and performant, we will begin adopting the rewrite as default in the upcoming month on pre-release of the Python extension.

Other Changes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python and Jupyter Notebooks in Visual Studio Code. Some notable changes include:

  • You can now enable/disable auto-indent with Pylance without having to restart the server (pylance-release#5778)
  • “Implement all inherited abstract classes” Code Action is now available as a Quick Fix for the reportAbstractUsage diagnostic (pylance-release#5757)

We would also like to extend special thanks to this month’s contributors:

Call for Community Feedback

As we are planning and prioritizing future work, we value your feedback! Below are a few issues we would love feedback on:

Try out these new improvements by downloading the Python extension and the Jupyter extension from the Marketplace, or install them directly from the extensions view in Visual Studio Code (Ctrl + Shift + X or ⌘ + ⇧ + X). You can learn more about Python support in Visual Studio Code in the documentation. If you run into any problems or have suggestions, please file an issue on the Python VS Code GitHub page.

The post Python in Visual Studio Code – June 2024 Release appeared first on Python.

Categories: FLOSS Project Planets

The Python Show: 43 - Python Image Processing with Pillow

Thu, 2024-06-06 09:46

In this episode, Mike has Alex Clark on the show. Alex is the leading developer of the fork of the Python Imaging Library (PIL), known as Pillow.

We chatted about the following topics:

  • The origins of Pillow

  • Lessons learned in open source development

  • Surprising use cases for Pillow

  • AI and Pillow

  • and much more!

Show Links
Categories: FLOSS Project Planets

Pages