Planet Python

Planet Python - http://planetpython.org/

Real Python: Quiz: Python's Built-in Exceptions: A Walkthrough With Examples

Wed, 2024-07-31 08:00

In this quiz, you’ll test your understanding of the most commonly used built-in exceptions in Python.

Exception handling is a core topic in Python. Knowing how to use some of the most common built-in exceptions can help you to debug your code and handle your own exceptions.

Good luck!



PyCoder’s Weekly: Issue #640 (July 30, 2024)

Tue, 2024-07-30 15:30

#640 – JULY 30, 2024

Build Captivating Display Tables in Python With Great Tables

Do you need help making data tables in Python look interesting and attractive? How can you create beautiful display-ready tables as easily as charts and graphs in Python? This week on the show, we speak with Richard Iannone and Michael Chow from Posit about the Great Tables Python library.
REAL PYTHON podcast

Overview of the Module itertools

This article presents the three most useful iterators from the module itertools, classifies all 19 of its iterators into 5 categories, and then provides brief usage examples for every iterator in the module.
RODRIGO GIRÃO SERRÃO • Shared by Rodrigo Girão Serrão

Take a Free Course. It’s on us

Learn how to speed up Python programs on NVIDIA GPUs using Numba, a type-specializing just-in-time compiler. Join the NVIDIA Developer Program to take our ‘Fundamentals of Accelerated Computing with CUDA Python’ course for free →
NVIDIA sponsor

Asyncio Event Loop in Separate Thread

Typically, the asyncio event loop runs in the main thread, but since that is the thread the interpreter itself uses, sometimes you want the event loop to run in a separate thread instead. This article talks about why and how to do just that.
JASON BROWNLEE
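
A minimal sketch of the pattern (an illustration, not code from the article): run the loop forever in a dedicated thread, then submit coroutines to it from the main thread with asyncio.run_coroutine_threadsafe().

import asyncio
import threading

loop = asyncio.new_event_loop()

def run_loop():
    # Make the loop current in this thread and run it until stopped.
    asyncio.set_event_loop(loop)
    loop.run_forever()

threading.Thread(target=run_loop, daemon=True).start()

async def work():
    await asyncio.sleep(0.1)
    return "done"

# Submit a coroutine from the main thread; .result() blocks until it finishes.
future = asyncio.run_coroutine_threadsafe(work(), loop)
print(future.result())  # 'done'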

Quiz: Python Type Checking

In this quiz, you’ll test your understanding of Python type checking. You’ll revisit concepts such as type annotations, type hints, adding static types to code, running a static type checker, and enforcing types at runtime. This knowledge will help you develop your code more efficiently.
REAL PYTHON

Quiz: Build a Blog Using Django, GraphQL, and Vue

In this quiz, you’ll test your understanding of building a Django blog back end and a Vue front end, using GraphQL to communicate between them. This will help you decouple your back end and front end, handle data persistence in the API, and display the data in a single-page app (SPA).
REAL PYTHON

PEP 751: A File Format to List Python Dependencies for Installation Reproducibility (New)

This PEP proposes a new file format for dependency specification to enable reproducible installation in a Python environment.
PYTHON.ORG

pytest 8.3 Released

PYTEST.ORG

Django 5.1 RC 1 Released

DJANGO SOFTWARE FOUNDATION

Discussions

Interesting Topics for an Advanced Python Lecture?

DISCUSSIONS ON PYTHON.ORG

Articles & Tutorials Wide Angle Lens Distortion Correction With Straight Lines

This article discusses how to estimate and correct wide-angle lens distortion using straight lines in an image. It covers techniques like the Radon transform, the Hough transform, and an iterative optimization algorithm to estimate the distortion parameters and undistort the image. The author also provides Python code to match the division-based undistortion model to the OpenCV distortion model.
HUGO HADFIELD

Testing Python Integration With an Azure Eventhub

Using an Azure EventHub with Python is pretty easy thanks to the Azure SDK for Python. However, ensuring that your code actually sends events into an event hub in a reliable and automated way can be a bit harder. This article demonstrates how you can achieve this with asyncio, Docker, and pytest.
BENOÎT GODARD • Shared by Benoît Godard

Crunchy Bridge Integrates Postgres with DuckDB

Postgres excels in managing transactional databases. DuckDB offers fast performance for queries and data analysis. Integrating these two databases provides a hybrid solution leveraging the strengths of both transactional and analytical workloads.
CRUNCHY DATA sponsor

pandas GroupBy: Grouping Real World Data in Python

In this course, you’ll learn how to work adeptly with the pandas GroupBy while mastering ways to manipulate, transform, and summarize data. You’ll work with real-world datasets and chain GroupBy methods together to get data into an output that suits your needs.
REAL PYTHON course
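
For instance, chaining GroupBy methods can look like this (an illustrative snippet, not from the course):

import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Paris", "Paris"],
    "temp": [12, 14, 18, 21],
})

# Group by city, aggregate, and sort in one chained expression.
print(df.groupby("city")["temp"].mean().sort_values(ascending=False))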

10 Open-Source Tools for Optimizing Cloud Expenses

The cloud gets you scale, but it can also be complicated to price properly. This article covers ten different open source tools that you can use to optimize your deployment and understand the associated costs.
TARUN SINGH

Hugging Face Transformers: Open-Source AI With Python

As the AI boom continues, the Hugging Face platform stands out as the leading open-source model hub. In this tutorial, you’ll get hands-on experience with Hugging Face and the Transformers library in Python.
REAL PYTHON

Tanda Runner: A Personalized Running Dashboard

This post talks about a new dashboard tool for visualizing your Strava running data and getting personalized recommendations for your next big race. It is built using Django and includes an LLM integration.
DUARTE O. CARMO

You Don’t Have to Guess to Estimate

“There are roughly three senses of ‘estimate.’ One is ‘a prediction of how much something will cost.’ One is ‘a guess.’ But another definition is a rough calculation.”
NAT BENNETT

Using else in a Comprehension

While list comprehensions in Python don’t support the else keyword directly, conditional expressions can be embedded within list comprehensions.
TREY HUNNER
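
In short, the else belongs to a conditional expression in the output position, not to the comprehension syntax itself. A quick illustration (mine, not from the article):

numbers = [1, 2, 3, 4]

# Conditional expression: choose the output value for every item.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
print(labels)  # ['odd', 'even', 'odd', 'even']

# A trailing `if` (no `else` allowed) filters items instead.
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [2, 4]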

TIL: Difference Between __getattr__ and __getattribute__

A quick post on the difference between __getattr__ and __getattribute__.
RODRIGO GIRÃO SERRÃO
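
The gist: __getattribute__ intercepts every attribute access, while __getattr__ is only called when normal lookup fails. A small illustration (mine, not from the post):

class Demo:
    existing = 42

    def __getattribute__(self, name):
        # Runs for *every* attribute access.
        print(f"__getattribute__({name!r})")
        return super().__getattribute__(name)

    def __getattr__(self, name):
        # Runs only after normal lookup raises AttributeError.
        print(f"__getattr__({name!r})")
        return None

d = Demo()
d.existing  # prints __getattribute__('existing')
d.missing   # prints __getattribute__('missing'), then __getattr__('missing')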

Projects & Code taipy: Turns Data Into a Web App

GITHUB.COM/AVAIGA

posting: The Modern API Client That Lives in Your Terminal

GITHUB.COM/DARRENBURNS

django-sql-explorer: Share Data With SQL Queries

GITHUB.COM/EXPLORERHQ

pyxel: A Retro Game Engine for Python

GITHUB.COM/KITAO

Herbie: Retrieve Weather Prediction Data

GITHUB.COM/BLAYLOCKBK • Shared by Brian Blaylock

Maelstrom: A Clustered Test Runner for Python and Rust

GITHUB.COM/MAELSTROM-SOFTWARE • Shared by Neal Fachan

Events

Weekly Real Python Office Hours Q&A (Virtual)

July 31, 2024
REALPYTHON.COM

Canberra Python Meetup

August 1, 2024
MEETUP.COM

Sydney Python User Group (SyPy)

August 1, 2024
SYPY.ORG

Django Girls Ecuador 2024

August 3, 2024
OPENLAB.EC

Melbourne Python Users Group, Australia

August 5, 2024
J.MP

STL Python

August 8, 2024
MEETUP.COM

Happy Pythoning!
This was PyCoder’s Weekly Issue #640.



Real Python: Simulate a Text File in Python

Tue, 2024-07-30 10:00

Testing applications that read files from a disk can be challenging. Issues such as machine dependencies, special access requirements, and slow performance often arise when you need to read text from a file.

In this Code Conversation with instructor Martin Breuss, you’ll discover how to simplify this process by simulating text files with StringIO from the io module in Python’s standard library.

In this video course, you’ll learn how to:

  • Use io.StringIO to simulate a text file on disk
  • Perform file operations on an io.StringIO object
  • Decide when to use io.StringIO and when to avoid it
  • Understand possible alternatives
  • Mock a file object using unittest.mock

Understanding how to simulate text file objects and mock file objects can help streamline your testing strategy and development process.
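
To give a flavor of the technique, here is a minimal sketch of simulating a text file with io.StringIO:

import io

# An in-memory "file" supporting the usual text-file operations.
fake_file = io.StringIO("first line\nsecond line\n")

print(fake_file.readline())  # 'first line\n'
fake_file.seek(0)            # rewind, just like a real file object
for line in fake_file:
    print(line.rstrip())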



Marcos Dione: Writing a tile server in python

Tue, 2024-07-30 05:02

Another dictated post[1][11], but heavily edited. Buyer beware.

I developed a tileset based on OpenStreetMap data and style, plus elevation information, but I don't have a render server. What I have been doing is using my own version of an old script from the mapnik version of the OSM style. This script is called generate_tiles, and I made big modifications to it; now it's capable of doing many things, including spawning several processes for handling the rendering. You can define regions that you want to render, or you can just provide a bbox, a set of tiles, or just coordinates. You can change the size of the metatile, and it handles empty tiles. If you find a sea tile, most probably you will not need to render its children[9], where children are the four tiles that are just under it in the next zoom level. For instance, in zoom level zero we have only one tile (0,0,0), and its children are (1,0,0), (1,0,1), (1,1,0) and (1,1,1). 75% of the planet's surface is water, and with the Mercator projection and the Antarctic Ocean, the percentage of tiles could be even bigger, so this optimization cuts a lot of useless rendering time.
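
To make the parent/child relation concrete, here is a small helper (my own illustration, not part of generate_tiles):

def children(z, x, y):
    """The four tiles directly under (z, x, y) at the next zoom level."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dx in (0, 1) for dy in (0, 1)]

print(children(0, 0, 0))  # [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]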

Another optimization is that it assumes that when you render zoom level N, you will be using at least the same data for zoom level N+1. Of course, I am not caching that data, because mapnik does not allow this, but the operating system does the caching. So if you have enough RAM, then you should be able to reuse all the data that's already in buffers and cache, instead of having to fetch it again from disk. This in theory should accelerate rendering, and probably it does[10].

The script works very well, and I've been using it for years already for rendering tiles in batches for several zoom levels. Because my personal computer is way more powerful than my server (and younger; 2018 vs 2011), I render on my computer and rsync the tiles to my server.

So now I wanted to make a tile server based on this. Why do I want to make my own and not use renderd? I think my main issue with renderd is that it does not store the individual tiles, but keeps metatiles of 8x8 tiles and serves the individual tiles from there. This saves inode usage and internal fragmentation. Since my main usage so far has been (and probably will continue to be) rendering regions by hand, and since my current (static) tile server stores all the latest versions of the tiles I have rendered since I started doing this some 15 years ago, I want updating the server to be fast. Most tile storage methods I know fail terribly at update time (see here); most of the time it means sending the whole file over the wire. Also, individual tiles are easier to convert to anything else, like creating an MBTiles file, pushing it to my phone, and having an offline tile service I can carry with me on treks where there is no signal. Also, serving the tiles can be as easy as python -m http.server from the tileset root directory. So renderd is not useful for me. Another reason is, well, I already have the rendering engine working. So how does it work?

The rendering engine consists of one main thread, which I call Master, and rendering threads[3]. These rendering threads load the style and wait for work to do. The current style file is 6MiB+ and takes mapnik 4s+ to load and generate all its structures, which means these threads have to be created once per service lifetime. I have one queue that can send commands from the Master to the renderer pool asking for rendering a metatile, which is faster than rendering the individual tiles. Then one of the rendering threads picks the request from this queue, calls mapnik, generates the metatile, cuts it into the subtiles and saves them to disk. The rendering thread then posts to another queue, telling the Master about the children metatiles that must be rendered, which due to emptiness can be between 0 and 4.

To implement the caching optimization I mentioned before, I use a third structure to maintain a stack. At the beginning I push into it the initial work; later I pop one element from it, and when a renderer returns the list of children to be rendered, I push them on top of the rest. This is what tries to guarantee that a metatile's children will be rendered before moving on to another region that would thrash the cache. And because the renderers can inspect the tiles being written, they can figure out when a child is all sea tiles and not return it for rendering.

At the beginning I thought that, because the multiprocessing queues are implemented with pipes, I could use select()[4] to see whether the queue was ready for writing or reading and use a typical non-blocking loop. When you're trying to write, these queues will block when the queue is full, and when you're trying to read, they will block when the queue is empty. But these two conditions, full and empty, are actually handled by semaphores, not by the size of the pipe. That means that even if I could reach all the way down into the structures of the multiprocessing.Queue and add its pipes to a selector, it wouldn't work: yes, the read end will not be selected if the queue is empty (nothing to read), but the write end can still be selected when the queue is full, since availability of space in the pipe does not mean the queue is not full.

So instead I'm peeking into these queues. For the work queue, I know that the Master thread[8] is the only writer, so I can peek to see if it is full. If it is, I am not going to send any new work to be done, because it means that all the renderers are busy, and the only work queued to be done has not been picked up yet. For the reading side it's the same: Master is the only reader, so I can peek whether it's empty, and if it is, I am not going to try to read any information from it. So, I have a loop, peeking first into the work queue and then into the info queue. If nothing has been done, I sleep a fraction of a second.
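
A condensed sketch of that loop (my own reconstruction; single_step is the name mentioned in footnote 8, while the queue and stack names are mine):

import time

def single_step(work_queue, info_queue, stack):
    busy = False
    # Master is the only writer, so full() cannot race with another writer.
    if stack and not work_queue.full():
        work_queue.put(stack.pop())
        busy = True
    # Master is the only reader, so empty() cannot race with another reader.
    if not info_queue.empty():
        children = info_queue.get()
        stack.extend(children)  # children go on top: keep the cache hot
        busy = True
    return busy

def main_loop(work_queue, info_queue, stack):
    while True:
        if not single_step(work_queue, info_queue, stack):
            time.sleep(0.1)  # nothing happened; don't spin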

Now let's try to think about how to replace this main loop with a web frontend. What is the web frontend going to do? It's going to be getting queries from different clients. It could be just a slippy map in a web page, so we have a browser as a client, or it could be any of the applications that can also render slippy maps. For instance, on Linux, we have Marble; on Android, I use MyTrails and OsmAnd.

One of the things about these clients is that they have timeouts. Why am I mentioning this? Because rendering a metatile for me can take between 3 and 120 seconds, depending on the zoom level. There are zoom levels that are really, really expensive, like between 7 and 10. If a client is going to ask a rendering service directly for a tile, and the tile takes too long to render, the client will time out and close the connection. How do we handle this on the server side? Well, instead of the work stack, the server will have a request queue, which will be collecting the requests from the clients, and the Master will be sending these requests to the render pool.

So if the client closes the connection, I want to be able to react to that, removing any lingering requests made by that client from the request queue. If I don't do that, the request queue will start piling up more and more requests, creating a denial of service. This is not possible with multiprocessing queues; you cannot remove an element. The only container that can do that is a deque[5], which is also optimized for pushing and popping things at both ends (it's probably implemented using a circular buffer), which is perfect for a queue. As for the info queue, I will no longer care about children metatiles, because I will not be doing any work that the clients are not requesting.

What framework would allow me to do this? Let's recap the requirements:

  • Results are computed, and take several seconds.
  • The library that generates the results is not async, nor thread safe, so I need to use subprocesses to achieve parallelization.
  • A current batch implementation uses 2 queues to send and retrieve computations to a pool of subprocesses; my idea is to "just" add a web frontend to this.
  • Each subprocess spends some seconds warming up, so I can't spawn a new process for each request.
  • Since I will have a queue of requested computations: if a client dies while its query is being processed, I let it finish; if the query hasn't started yet, I should remove it from the waiting queue.

I started with FastAPI, but it doesn't have the support that I need. At first I just implemented a tile server; the idea was to grow from there[6], but reading the docs I found it only allows doing long-running async stuff after the response has been sent.

Next was Flask. Flask is not async unless you want to use sendfile(). sendfile() is a way to make the kernel read a file and write it directly to a socket without intervention from the process requesting it. The alternative is to open the file, read a block, write it on the socket, repeat. This definitely makes your code more complex; you have to handle lots of cases. So sendfile() is very, very handy, but it's also faster because it's 0-copy. But Flask does not give control over what happens when the client suddenly closes the connection. I can instruct it to cancel the tasks in flight, but as per all the previous explanation, that's not what I want.

This same problem seems to affect all the async frameworks I looked into: asyncio, aiohttp, tornado. Except, of course, twisted, but its API for that is based on callbacks, and TBH, I was starting to get tired of all this, and the prospect of callback hell, even when all the rest of the system could be developed in a more async way, was too much. And this is not counting the fact that I need to hook into the main loop to step the Master. This could be implemented with timed callbacks, such as twisted's callLater(), but another thought started to form in my head.

Why did I go directly for frameworks? Because they're supposed to make our lives easier, but from the beginning I had the impression that this would not be a run-of-the-mill service. The main issue came down to being able to send things to render, return the rendered data to the right clients, associate several clients to a single job before it finishes (more than one client might request the same tile or several tiles that belong to the same metatile), and handle client and job cancellation when clients disappear. The more frameworks' documentation I read, the more I started to fear that the only solution was to implement a non-blocking[12] loop myself.

I gotta be honest, I dusted off an old Unix Network Programming book, 2nd Ed., 1998 (!!!), read half a chapter, and I was ready to do it. And thanks to the simple selector API, it's a breeze:

  1. Create a listening socket.
  2. Register it for read events (connections).
  3. On connection, accept the client and wait for read events in that one too.
  4. We were not registering for write before this point because the client is always ready for write before we start sending anything, which would lead to tight loops.
  5. On client read, read the request and send the job to Master. Unregister for read.
  6. But if there's nothing to read, the client disconnected. Send an empty response, unregister for read and register for write.
  7. Step Master.
  8. If anything came back, generate the responses and queue them for sending. Register the right clients for write.
  9. On client write (almost always), send the response and the file with sendfile() if any.
  10. Then close the connection and unregister.
  11. Loop to #3.
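
A bare-bones sketch of that loop (my own condensation of the steps above; step_master() and pending_responses are hypothetical stand-ins for the Master call and its results):

import selectors
import socket

sel = selectors.DefaultSelector()
pending_responses = {}  # hypothetical: socket -> response produced by Master

server = socket.socket()
server.bind(("", 8080))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)  # steps 1-2

while True:
    for key, events in sel.select(timeout=0.1):
        sock = key.fileobj
        if sock is server:                          # step 3: new client
            client, _ = server.accept()
            client.setblocking(False)
            sel.register(client, selectors.EVENT_READ)
        elif events & selectors.EVENT_READ:         # step 5: read the request
            request = sock.recv(4096)               # assumes the whole query in one go
            sel.unregister(sock)
            if not request:                         # step 6: client disconnected
                sock.close()
            # else: parse the tile request and queue it for Master
        elif events & selectors.EVENT_WRITE:        # step 9: send the response
            sock.send(pending_responses.pop(sock))  # then sendfile() for the tile body
            sel.unregister(sock)                    # step 10: close and unregister
            sock.close()
    # step 7: step Master here; for each finished job, fill pending_responses and
    # sel.register(client, selectors.EVENT_WRITE) for the right clients (step 8)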

Initially all this, including reimplementing fake Master and render threads, took less than 200 lines of code and some 11h of on-and-off work. Now that I have finished, I have a better idea of how to implement this at least with twisted, which I think I will have to do, since step 4 assumes the whole query can be recv()'ed in one go, and step 7 similarly for send()'ing; luckily I don't need to do any handholding for sendfile(), even when the socket is non-blocking. A more production-ready service needs to handle short reads and writes. Also, the HTTP/1.1 protocol all clients are using allows me to assume that once a query is received, the client will be waiting for an answer before trying anything else, and that I can close the connection once a response has been sent and assume the client will open a new connection for more tiles. And even then, supporting keep-alive should not be that hard (instead of closing the client, unregister for write, register for read, and only do the close dance when the response is empty). And because I can simply step Master in the main loop, I don't have to worry about blocking queues.

Of course, now it's more complex, because it's implementing support for multiple clients with different queries requiring rendering the same metatile. This is because applications will open several connections for fetching tiles when showing a region, and unless it's only 4 tiles falling at the corners of 4 adjacent metatiles, they will always mean more than one client per metatile. Also, I could have several clients looking at the same region. The current code is approaching 500 lines, but all that should also be present in any other implementation.

I'm pretty happy about how fast I could make it work and how easy it was. Soon I'll finish integrating a real render thread that saves the tiles, and implement the logic that if one of a metatile's tiles is not present, we can assume it's OK, but if all are missing, I have to find out whether they were all empty or never rendered. A last step would be making all this testable. And of course, the twisted port.

  1. This is getting out of hand. The audio was 1h long, I'm not sure how long it took to auto-transcribe, and while editing, just when I thought I was getting to the end of it, the preview told me I still had like half the text to go through. 

  2. No idea what I wanted to write here :) 

  3. Because mapnik is not thread safe and because of the GIL, they're actually subprocesses via the multiprocessing module, but I'll keep calling them threads to simplify. 

  4. Again, a simplification. Python provides the selectors module, whose abstract implementations spare us from having to select the best one for the platform. 

  5. I just found out it's pronounced like 'deck'. 

  6. All the implementations I did followed the same pattern. In fact, right now, I haven't implemented the rendering part of the tile server: it's only blockingly sleep()'ing for some time (up to 75s, to trigger client timeouts), and then returning the tiles already present. What's currently missing is figuring out whether I should rerender or use the tiles already present[7], and actually connecting the rendering part. 

  7. Two reasons to rerender: the data is stale, or the style has changed. The latter requires reloading the styles, which will probably mean rebuilding the rendering threads. 

  8. I keep calling this the Master thread, but at this point instead of having its own main loop, I'm just calling a function that implements the body of such loop. Following previous usage for such functions, it's called single_step(). 

  9. Except when you start rendering ferry routes. 

  10. I never measured it :( 

  11. Seems like nikola renumbers the footnotes based on which order they are here at the bottom of the source. The first note was 0, but it renumbered it and all the rest to start counting from 1. 

  12. Note that I'm explicitly distinguishing between a non-blocking/select() loop and an async/await system, but bear in mind that the latter is actually implemented with the former. 


Python Bytes: #394 Python is easy now?

Tue, 2024-07-30 04:00
<strong>Topics covered in this episode:</strong><br> <ul> <li><a href="https://rdrn.me/postmodern-python/"><strong>Python is easy now</strong></a></li> <li><strong><a href="https://til.simonwillison.net/python/trying-free-threaded-python">Trying out free-threaded Python on macOS</a></strong></li> <li><a href="https://mathspp.com/blog/module-itertools-overview"><strong>Module itertools overview</strong></a></li> <li><strong><a href="https://github.com/louislam/uptime-kuma">uptime-kuma</a></strong></li> <li><strong>Extras</strong></li> <li><strong>Joke</strong></li> </ul><a href='https://www.youtube.com/watch?v=6v7VLgfhZ5o' style='font-weight: bold;'data-umami-event="Livestream-Past" data-umami-event-episode="394">Watch on YouTube</a><br> <p><strong>About the show</strong></p> <p>Sponsored by ScoutAPM: <a href="https://pythonbytes.fm/scout"><strong>pythonbytes.fm/scout</strong></a></p> <p><strong>Connect with the hosts</strong></p> <ul> <li>Michael: <a href="https://fosstodon.org/@mkennedy"><strong>@mkennedy@fosstodon.org</strong></a></li> <li>Brian: <a href="https://fosstodon.org/@brianokken"><strong>@brianokken@fosstodon.org</strong></a></li> <li>Show: <a href="https://fosstodon.org/@pythonbytes"><strong>@pythonbytes@fosstodon.org</strong></a></li> </ul> <p>Join us on YouTube at <a href="https://pythonbytes.fm/stream/live"><strong>pythonbytes.fm/live</strong></a> to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.</p> <p>Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to <a href="https://pythonbytes.fm/friends-of-the-show">our friends of the show list</a>, we'll never share it. </p> <p><strong>Brian #1:</strong> <a href="https://rdrn.me/postmodern-python/"><strong>Python is easy now</strong></a></p> <ul> <li>or Postmodern Python</li> <li>or Beyond Hypermodern</li> <li>Chris Ardene</li> <li>Mostly a cool review of using rye for <ul> <li>setup</li> <li>linting</li> <li>typing</li> <li>testing</li> <li>documentation</li> <li>CI/CD</li> </ul></li> <li>Also a nice discussion of how to deal with a Monorepo for Python projects</li> </ul> <p><strong>Michael #2:</strong> <a href="https://til.simonwillison.net/python/trying-free-threaded-python">Trying out free-threaded Python on macOS</a></p> <ul> <li>via pycoders</li> <li>How to install free threaded Python the easy way</li> <li>Testing the CPU bound work speed ups for FT Python</li> </ul> <p><strong>Brian #3:</strong> <a href="https://mathspp.com/blog/module-itertools-overview"><strong>Module itertools overview</strong></a></p> <ul> <li>Rodrigo</li> <li>20 tools that every Python developer should be aware of.</li> <li>In 5 categories <ul> <li>Reshaping</li> <li>Filtering</li> <li>Combinatorial</li> <li>Infinite</li> <li>Iterators that complement other tools</li> </ul></li> <li>Things I forgot about <ul> <li>chain</li> <li>pairwise</li> <li>zip_longest</li> <li>tee</li> </ul></li> </ul> <p><strong>Michael #4:</strong> <a href="https://github.com/louislam/uptime-kuma">uptime-kuma</a></p> <ul> <li>A fancy self-hosted monitoring tool</li> <li><strong>Features</strong> <ul> <li>Monitoring uptime for HTTP(s) / TCP / HTTP(s) Keyword / HTTP(s) Json Query / Ping / DNS Record / Push / Steam Game Server / Docker Containers</li> <li>Fancy, Reactive, Fast UI/UX</li> <li>Notifications via Telegram, Discord, Gotify, Slack, Pushover, Email (SMTP), and <a 
href="https://github.com/louislam/uptime-kuma/tree/master/src/components/notifications">90+ notification services, click here for the full list</a></li> <li>20-second intervals</li> <li><a href="https://github.com/louislam/uptime-kuma/tree/master/src/lang">Multi Languages</a></li> <li>Multiple status pages</li> <li>Map status pages to specific domains</li> <li>Ping chart</li> <li>Certificate info</li> <li>Proxy support</li> <li>2FA support</li> </ul></li> </ul> <p><strong>Extras</strong> </p> <p>Brian:</p> <ul> <li>Still working on a new pytest course. Hoping to get it released soon-ish.</li> </ul> <p>Michael:</p> <ul> <li><a href="https://x.com/kennethreitz42/status/1815881034334126539?prefetchTimestamp=1722279033597">Open source Switzerland</a> </li> <li><a href="https://mastodon.social/@ffalcon31415/112852910444032717">spyoungtech/FreeSimpleGUI</a> — actively maintained fork of the last release of PySimpleGUI</li> </ul> <p><strong>Joke:</strong> <a href="https://devhumor.com/media/java-amp-javascript">Java vs. JavaScript</a></p>

Kay Hayen: Nuitka Release 2.4

Tue, 2024-07-30 03:09

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler, “download now”.

This release largely contains bug fixes for the previous changes, but also finishes full compatibility with the match statements of 3.10, something that was long overdue since there were always some incompatible behaviors there.

In terms of bug fixes, it’s also huge. An upgrade is required, especially for new setuptools that made compiled programs segfault at startup.


Bug Fixes
  • UI: Fix, we had reversed disable / force and wrong option name recommendation for --windows-console-mode when the user used old-style options.

  • Python3.10+: Fix, must not check for len greater than or equal to 0 for sequence match cases. That is unnecessary and incompatible, and can raise exceptions with custom sequences not implementing __len__. Fixed in 2.3.1 already.

  • Python3.10+: Fix, match sequence with final star arguments failed in some cases to capture the rest. The assigned value then was empty when it shouldn’t have been. Fixed in 2.3.1 already.

  • Python3.8+: Fix, calls to variable args functions now need to be done differently, or else they can crash, as was observed with 3.10 in PGO instrumentation, at least. Fixed in 2.3.1 already.

  • PGO: Fix, using nuitka-run did not execute the program created as expected. Fixed in 2.3.1 already.

  • Linux: Support extension modules used as DLLs by other DLLs or extension modules. That makes newer tensorflow and potentially more packages work again. Fixed in 2.3.1 already.

  • Python3.10+: Matches classes were not fully compatible.

    We need to check against the case-defined class __match_args__, not the matched value type’s __match_args__, which is not necessarily the same.

    Also, properly annotating the exception exit of subscript matches; the subscript value can indeed raise an exception.

    Collect keyword and positional match values in one go and detect duplicate attributes used, which we previously did not.

  • Scons: Fix, do not crash when clang is not reporting its version correctly. It happened if Clang usage was required with --clang option but not installed. Fixed in 2.3.2 already.

  • Debian: Fix, detecting the Debian flavor of Python was not working anymore, and as a result, the intended defaults were no longer applied by Nuitka, leading to incorrect suggestions that didn’t work. Fixed in 2.3.3 already.

  • Ubuntu: Fix, the static link library for Python 3.12 is not usable unless we provide parts of HACL for the sha2 module so as not to cause link errors. Fixed in 2.3.3 already.

  • Standalone: Fix, importing newer pkg_resources was crashing. Fixed in 2.3.3 already.

  • Python3.11+: Added support for newer Python with dill-compat. Fixed in 2.3.4 already.

  • Standalone: Support locating Windows icons for pywebview. Fixed in 2.3.4 already.

  • Standalone: Added support for spacy related packages. Fixed in 2.3.4 already.

  • Python3.12: Fix, our workaround for cv2 support cannot use the imp module anymore. Fixed in 2.3.4 already.

  • Compatibility: Added support for __init__ files that are extension modules. Architecture checks for macOS were false negatives for them, and the case insensitive import scan failed to find them on Windows. Fixed in 2.3.4 already.

  • Standalone: Added missing dependencies for standard library extension modules, mainly exhibited on macOS. Fixed in 2.3.4 already.

  • Windows: Fix build failures on mapped network drives. Fixed in 2.3.4 already.

  • Python3.12: Fix, need to set frame prev_inst or else f_lasti is random. Some packages, for example PySide6, use this to check what bytecode calls them or how they import them, and it could crash when attempting it. Fixed in 2.3.6 already.

  • Fix, fork bomb in cpuinfo package no longer happens. Fixed in 2.3.8 already.

  • Nuitka-Python: Fix, cannot ask for shared library prefixes. Fixed in 2.3.8 already.

  • Standalone: Make sure keras package dependency for tensorflow is visible. Fixed in 2.3.10 already.

  • Linux: Fix, for static executables we should ignore errors setting a DLL load path. Fixed in 2.3.10 already.

  • Compatibility: Fix, nuitka resource readers also need to have .parent attribute. Fixed in 2.3.10 already.

  • Fix, need to force no-locale language outputs for tools outputs on non-Windows. Our previous methods were not forcing enough.

    For non-Windows this makes Nuitka work on systems with locales active for message outputs only. Fixed in 2.3.10 already.

  • Fix, was not using proper result value for SET_ATTRIBUTE to check success in a few corner cases. Fixed in 2.3.10 already.

  • Windows: Retry deleting dist and build folders, allowing users to recognize still running programs and not crashing on Anti-Virus software still locking parts of them.

  • Fix, dict.fromkeys didn’t give compatible error messages for no args given.

  • Fix, output correct unsupported exception messages for in-place operations.

    For in-place **, it was also incompatible, since it must not mention the pow function.

  • Fix, included metadata could lead to unstable code generation. We were using a dictionary for it, but its ordering is not stable enough for the C compiler to fully benefit.

  • Fix, including data files for packages that are extension modules was not working yet.

  • macOS: Detect the DLL path of libpython (if used) by looking at dependencies of the running Python binary rather than encoding what CPython does. Doing that covers other Python flavors as well.

  • Fix, need to prefer extension modules over Python code for packages.

  • Fix, immutable constant values are not to be treated as very trusted.

  • Python3: Fix, the __loader__ attribute of a module should be an object and not only the class, otherwise only static methods can work.

  • Python3: Added .name and .path attributes to Nuitka loader objects for enhanced compatibility with code that expects source code loaders.

  • Fix, the sys.argv[0] needs to be absolute for best usability.

    For dirname(sys.argv[0]) to be usable even if the program is launched via PATH environment by a shell, we cannot rely on how we are launched since that won’t be a good path, unlike with Python interpreter, where it always is.

  • Standalone: Fix, adding missing dependencies for some crypto packages.

  • Python3.12: Need to write to thread local variable during import. This however doesn’t work for Windows and non-static libpython flavors in general.

  • macOS: Enforce using system codesign as the Anaconda one is not working for us.

  • Fix, we need to read .pyi files as source code. Otherwise unicode characters can cause crashes.

  • Standalone: Fix, some packages query private values for distribution objects, so use the same attribute name for the path.

  • Multidist: Make sure to follow the multidist reformulation modules. Otherwise in accelerated mode, these could end up not being included.

  • Fix, need to hold a reference of the iterable while converting it to list.

  • Plugins: Fix, this wasn’t properly ignoring None values in load descriptions as intended.

  • macOS: Need to allow DLLs from all Homebrew paths.

  • Reports: Do not crash during report writing for very early errors.

  • Python3.11+: Fix, need to make sure we have split as a constant value when using exception groups.

  • Debian: More robust against problematic distribution folders with no metadata, these apparently can happen with OS upgrades.

  • Fix, an exception was leaking in --python-flag=-m mode that could cause errors.

  • Compatibility: Close standard file handles on process forks as CPython does. This should enhance things for compilations using attach on Windows.

Package Support
  • Standalone: Added data file for older bokeh version. Fixed in 2.3.1 already.

  • Standalone: Support older pandas versions as well.

  • Standalone: Added data files for panel package.

  • Standalone: Added support for the newer kivy version and added macOS support as well. Fixed in 2.3.4 already.

  • Standalone: Include all kivy.uix packages with kivy, so their typical config driven usage is not too hard.

  • Standalone: Added implicit dependencies of lxml.sax module. Fixed in 2.3.4 already.

  • Standalone: Added implicit dependencies for zeroconf package. Fixed in 2.3.4 already.

  • Standalone: Added support for numpy version 2. Fixed in 2.3.7 already.

  • Standalone: More complete support for tables package. Fixed in 2.3.8 already.

  • Standalone: Added implicit dependencies for scipy.signal package. Fixed in 2.3.8 already.

  • Standalone: Added support for moviepy and imageio_ffmpeg packages. Fixed in 2.3.8 already.

  • Standalone: Added support for newer scipy. Fixed in 2.3.10 already.

  • Standalone: Added data files for bpy package. For full support more work will be needed.

  • Standalone: Added support for nes_py and gym_tetris packages.

  • Standalone: Added support for dash and plotly.

  • Standalone: Added support for usb1 package.

  • Standalone: Added support for azure.cognitiveservices.speech package.

  • Standalone: Added implicit dependencies for tinycudann package.

  • Standalone: Added support for newer win32com.server.register.

  • Standalone: Added support for jaxtyping package.

  • Standalone: Added support for open3d package.

  • Standalone: Added workaround for torch submodule import function.

  • Standalone: Added support for newer paddleocr.

New Features
  • Experimental support for Python 3.13 beta 3. We try to follow its release cycle closely and aim to support it at the time of CPython release. We also detect no-GIL Python and can make use of it. The GIL status is output in the --version format and the GIL usage is available as a new {GIL} variable for project options.

  • Scons: Added experimental option --experimental=force-system-scons to enforce system Scons to be used. That allows for the non-use of inline copy, which can be interesting for experiments with newer Scons releases. Added in 2.3.2 already.

  • Debugging: A new non-deployment handler helps when segmentation faults occurred. The crashing program then outputs a message pointing to a page with helpful information unless the deployment mode is active.

  • Begin merging changes for WASI support. Parts of the C changes were merged, and for other parts, the command line option --target=wasi was added, and we are starting to address cross-platform compilation for it. More work will be necessary to fully merge it; right now it doesn’t work at all yet.

  • PGO: Added support for using it in standalone mode as well, so once we use it more, it will immediately be practical.

  • Make --list-package-dlls use plugins as well, and make the delvewheel plugin announce its DLL path internally, too. Listing DLLs for packages using plugins can use these paths for more complete outputs.

  • Plugins: The no-qt plugin is now usable in accelerated mode.

  • Reports: Added included metadata and reasons for it.

  • Standalone: Added support for spacy with a new plugin.

  • Compatibility: Use existing source files as if they were .pyi files for extension modules. That gives us dependencies for code that installs source code and extension modules.

  • Plugins: Make version information, onefile mode, and onefile cached mode indication available in Nuitka Package Configuration, too.

  • Onefile: Warn about using tendo.singleton in non-cached onefile mode.

    Tendo uses the running binary name for locking by default, so it’s not going to work if that changes for each execution; we make the user aware of that, so they can use cached mode instead.

  • Reports: Include the micro pass counts and tracing merge statistics so we can see the impact of new optimization.

  • Plugins: Allow to specify modes in the Nuitka Package Configuration for annotations, doc_strings, and asserts. These overrule global configuration, which is often not practical. Some modules may require annotations, but for other packages, we will know they are fine without them. Simply disabling annotations globally barely works. For some modules, removing annotations can give a 30% compile-time speedup.

  • Standalone: Added module configuration for Django to find commands and load its engine.

  • Allow negative values for --jobs to be relative to the system core count so that you can tell Nuitka to use all but two cores with --jobs=-2 and need not hardcode your current core count.

  • Python3.12: Annotate libraries that are currently not supported.

    We will need to provide our own Python3.12 variant to make them work.

  • Python3.11+: Catch calls to uncompiled function objects with compiled code objects. We now raise a RuntimeError in the bytecode making it easier to catch them rather than segfaulting.

Optimization
  • Statically optimize constant subscripts of variables with immutable constant values.

  • Forward propagate very trusted values for variable references enabling a lot more optimization.

  • Python3.8+: Calls of C functions are faster and more compact code using vector calls, too.

  • Python3.10+: Mark our compiled types as immutable.

  • Python3.12: Constant returning functions are dealing with immortal values only. Makes their usage slightly faster since no reference count handling is needed.

  • Python3.10+: Faster attribute descriptor lookups. We have our own replacement for PyDescr_IsData, which had become an API call, making it very slow on Windows specifically.

  • Avoid using Python API function for determining sequence sizes when getting a length size for list creations.

  • Data Composer: More compact and portable Python3 int (Python2 long) value representation.

    Rather than fixed native length 8 or 4 bytes, we use variable length encoding which for small values uses only a single byte.

    This also avoids using struct.pack with C types, as we might be doing cross platform, so this makes part of the WASI changes unnecessary at the same time.

    Large values are also more compact, because middle 31-bit portions can be less than 4 bytes and save space on average. (A generic sketch of this kind of variable-length encoding appears after this list.)

  • Data Composer: Store the bytecode blob size more efficiently and portably, too.

  • Prepare having knowledge of __prepare__ result to be dictionaries per compile time decisions.

  • Added more hard trust for the typing module.

    The typing.Text is a constant too. In debug mode, we now check all exports of typing for constant values. This will allow finding missing values sooner in the future.

    Added the other types as known to exist. That should help scalability for typing-intensive code somewhat by removing error handling for them.

  • macOS: Should use static libpython with Anaconda as it works there too, and reduces issues with Python3.12 and extension module imports.

  • Standalone: Statically optimize by OS in sysconfig.

    Consequently, standalone distributions can exclude OS-specific packages such as _aix_support and _osx_support.

  • Avoid changing code names for complex call helpers.

    The numbering of complex call helpers, as normally applied to all functions, caused this issue: when part of the code is used from the bytecode cache, the helpers never come to exist, and the C code of modules using them then didn’t match.

    This avoids an extra C re-compilation for some modules that were using renumbered functions the second time a compilation happened. Added in 2.3.10 already.

  • Avoid using C-API when creating __path__ value.

  • Faster indentation of generated code.
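
The variable-length integer encoding mentioned under “Data Composer” above can be sketched generically like this (an illustration of the technique for non-negative values, not Nuitka’s actual format):

def encode_varint(value: int) -> bytes:
    """LEB128-style: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

print(encode_varint(5).hex())    # '05'   -- small values need only one byte
print(encode_varint(300).hex())  # 'ac02' -- larger values grow as needed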

Anti-Bloat
  • Add new pydoc bloat mode to trigger warnings when using it.

  • Recognize usage of numpy.distutils as setuptools bloat for more direct reporting.

  • Avoid compiling large opcua modules that generate huge C files much like asyncua package. Added in 2.3.1 already.

  • Avoid shiboken2 and shiboken6 modules from matplotlib package when the no-qt plugin is used. Added in 2.3.6 already.

  • Changes for not using pydoc and distutils in numpy version 2. Added in 2.3.7 already.

  • Avoid numpy and packaging dependencies from PIL package.

  • Avoid using webbrowser module from pydoc.

  • Avoid using unittest in keras package. Added in 2.3.1 already.

  • Avoid distutils from _osx_support (used by sysconfig) module on macOS.

  • Avoid using pydoc for werkzeug package. Fixed in 2.3.10 already.

  • Avoid using pydoc for site module. Fixed in 2.3.10 already.

  • Avoid pydoc from xmlrpc.server. Fixed in 2.3.10 already.

  • Added no_docstrings support for numpy2 as well. Fixed in 2.3.10 already.

  • Avoid pydoc in joblib.memory.

  • Avoid setuptools in gsplat package.

  • Avoid dask and jax in scipy package.

  • Avoid using matplotlib for networkx package.

Organizational
  • Python3.12: Added annotations of official support for Nuitka PyPI package and test runner options that were still missing. Fixed in 2.3.1 already.

  • UI: Change runner scripts. The nuitka3 is no more. Instead, we have nuitka2 where it applies. Also, we now use CMD files rather than batch files.

  • UI: Check filenames for data files for illegal paths on the respective platforms. Some user errors with data file options become more apparent this way.

  • UI: Check spec paths more for illegal paths as well. Also, do not accept system paths like {TEMP} with no path separator after them.

  • UI: Handle report writing interrupted with CTRL-C more gracefully. No need to present this as a general problem; rather, inform the user that they did it.

  • NoGIL: Warn if using a no-GIL Python version, as this mode is not yet officially supported by Nuitka.

  • Added badges to the README.rst of Nuitka to display package support and more. Added in 2.3.1 already.

  • UI: Use the retry decorator when removing directories in general. It will be more thorough with properly annotated retries on Windows. For the dist folder, mention the running program as a probable cause.

  • Quality: Check replacements and replacements_plain Nuitka package configuration values.

  • Quality: Catch backslashes in paths provided in Nuitka Package Configuration values for dest_path, relative_path, dirs, raw_dirs and empty_dirs.

  • Debugging: Disable pagination in gdb with the --debugger option.

  • PGO: Warn if the PGO binary does not run successfully.

  • UI: The new console mode option is a Windows-specific option now, move it to that group.

  • UI: Detect “rye python” on macOS. Added in 2.3.8 already.

  • UI: Be forgiving about release candidates; Ubuntu shipped one in an LTS release. Changed in 2.3.8 already.

  • Debugging: Allow fine-grained debug control for immortal checks

    Can use --no-debug-immortal-assumptions to allow for corrupted immortal objects, which might be done by non-Nuitka code and then break the debug mode.

  • UI: Avoid leaking compile time Nuitka environment variables to the child processes.

    They were primarily visible with --run, but we should avoid it for everything.

    For non-Windows, we now recognize if we are the exact re-execution, and otherwise reject them.

  • Watch: Delete the existing virtualenv in case of errors updating or upgrading it.

  • Watch: Keep track of Nuitka compiled program exit code in newly added result files, too.

  • Watch: Redo compilations in case of previous errors when executing the compile program.

  • Quality: Wasn’t detecting files to ignore for PyLint on Windows properly, also detect crashes of PyLint.

Tests
  • Added test to cover the dill-compat plugin.

  • macOS: Make actual use of ctypes in its standalone test to ensure correctness on that OS, too.

  • Make compile extension module test work on macOS, too.

  • Avoid using 2to3 in our tests since newer Python no longer contains it by default; we split up tests with mixed contents into two tests instead.

  • Python3.11+: Make the large constants test executable for these versions as well. We no longer can easily create those values on the fly and output them, due to security enhancements.

  • Python3.3: Remove support from the test runner as well.

  • Tests: Added construct-based tests for coroutines so we can compare their performance as well.

Cleanups
  • Make try/finally variable releases through common code. It will allow us to apply special exception value trace handling for only those for scalability improvements, while also making many re-formulations simpler.

  • Avoid using the anti-bloat configuration value replacements where replacements_plain is good enough. A lot of configuration pre-dates its addition.

  • Avoid Python3 and Python3.5+ specific Jinja2 modules on versions before that, and consequently, avoid warning about the SyntaxError given.

  • Moved code object extraction of dill-compat plugin from Python module template to C code helper for shared usage and better editing.

  • Also call va_end for standards compliance when using va_start. Some C compilers may need that, so we better do it even if what we have seen so far doesn’t need it.

  • Don’t pass main filename to the tree building anymore, and make nuitka.Options functions usage explicit when importing.

  • Change comments that still mentioned Python 3.3 as where a change in Python happened, since we no longer support this version. Now, we consider anything first seen in Python 3.4 to be a Python3 change.

  • Cleanup, change Python 3.4 checks to 3.0 checks as Python3.3 is no longer supported. Cleans up version checks, as we now treat >=3.4 either as >=3 or can drop checks entirely.

  • The usual flow of spelling cleanups, this time for C codes.

Summary

This release cycle was longer than usual, with much new optimization and package support requiring attention.

For optimization, we got quite a few things going, especially with more forward propagation, but the big ones for scalability are still all queued up, and things are only prepared.

The 3.13 work was continuing smoothly and seems to be doing fine. We are still on track for supporting it right after release.

The parts where we address WASI prepare for cross-compilation, but we will not aim at cross-compilation generally just yet; instead, we target our own Nuitka standalone backend Python that is supposed to be added in coming releases.


Real Python: Strings and Character Data in Python

Mon, 2024-07-29 10:00

In Python, string objects contain sequences of characters that allow you to manipulate textual data. It’s rare to find an application, program, or library that doesn’t need to manipulate strings to some extent. So, processing characters and strings is integral to programming and a fundamental skill for you as a Python programmer.

In this tutorial, you’ll learn how to:

  • Create strings using literals and the str() function
  • Use operators and built-in functions with strings
  • Index and slice strings
  • Do string interpolation and formatting
  • Use string methods

To get the most out of this tutorial, you should have a good understanding of core Python concepts, including variables, functions, and operators and expressions.

Get Your Code: Click here to download the free sample code that shows you how to work with strings and character data in Python.

Take the Quiz: Test your knowledge with our interactive “Python Strings and Character Data” quiz. You’ll receive a score upon completion to help you track your learning progress:


This quiz will evaluate your understanding of Python's string data type and your knowledge about manipulating textual data with string objects. You'll cover the basics of creating strings using literals and the `str()` function, applying string methods, using operators and built-in functions with strings, indexing and slicing strings, and more!

Getting to Know Strings and Characters in Python

Python provides the built-in string (str) data type to handle textual data. Other programming languages, such as Java, have a character data type for single characters. Python doesn’t have that. Single characters are strings of length one.

In practice, strings are immutable sequences of characters. This means you can’t change a string once you define it. Any operation that modifies a string will create a new string instead of modifying the original one.

A string is also a sequence, which means that the characters in a string have a consecutive order. This feature allows you to access characters using integer indices that start with 0. You’ll learn more about these concepts in the section about indexing strings. For now, you’ll learn about how to create strings in Python.
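
For example, indexing and immutability look like this (an illustrative preview of what’s covered below):

Python

>>> greeting = "Hello"
>>> greeting[0]  # indices start at 0
'H'
>>> greeting[0] = "J"  # strings are immutable
Traceback (most recent call last):
  ...
TypeError: 'str' object does not support item assignment
>>> "J" + greeting[1:]  # any "modification" builds a new string
'Jello'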

Creating Strings in Python

There are different ways to create strings in Python. The most common practice is to use string literals. Because strings are everywhere and have many use cases, you’ll find a few different types of string literals. There are standard literals, raw literals, and formatted literals.

Additionally, you can use the built-in str() function to create new strings from other existing objects.

In the following sections, you’ll learn about the multiple ways to create strings in Python and when to use each of them.

Standard String Literals

A standard string literal is just a piece of text or a sequence of characters that you enclose in quotes. To create single-line strings, you can use single ('') and double ("") quotes:

Python

>>> 'A single-line string in single quotes'
'A single-line string in single quotes'
>>> "A single-line string in double quotes"
'A single-line string in double quotes'

In the first example, you use single quotes to delimit the string literal. In the second example, you use double quotes.

Note: Python’s standard REPL displays string objects using single quotes even though you create them using double quotes.

You can define empty strings using quotes without placing characters between them:

Python >>> "" '' >>> '' '' >>> len("") 0 Copied!

An empty string doesn’t contain any characters, so when you use the built-in len() function with an empty string as an argument, you get 0 as a result.

To create multiline strings, you can use triple-quoted strings. In this case, you can use either single or double quotes:

Python

>>> '''A triple-quoted string
... spanning across multiple
... lines using single quotes'''
'A triple-quoted string\nspanning across multiple\nlines using single quotes'
>>> """A triple-quoted string
... spanning across multiple
... lines using double quotes"""
'A triple-quoted string\nspanning across multiple\nlines using double quotes'

The primary use case for triple-quoted strings is to create multiline strings. You can also use them to define single-line strings, but this is a less common practice.

Read the full article at https://realpython.com/python-strings/ »



PyCharm: Learning Resources for pytest

Mon, 2024-07-29 04:31

In this blog post, we’ll look at how PyCharm helps you when you’re working with pytest, and we will signpost you to a bunch of resources for learning pytest. While some of these resources were created by us here at JetBrains, others were crafted by the storytellers in the pytest community.

Using pytest in PyCharm

PyCharm has extensive support for pytest, including a dedicated pytest test runner. PyCharm also gives you code completion for the test subject and pytest fixtures, as well as detailed assert failure reports, so you can get to the root of the problem quickly. 

Download PyCharm

Resources for pytest

If you like reading blog posts, we have plenty of those. If videos are more your thing, we have some awesome content there, too. If you prefer a sequential course, we have some pointers for you, and if you prefer to snuggle down and read a book, we’ve got a great recommendation as well. There’s a little something here for everyone!  

Although the majority of these resources don’t assume any prior knowledge of pytest, some delve deeper into the subject than others – so there’s plenty to explore when you’re ready. 

First, let me point you to two important links:

  • The pytest page on the JetBrains Guide serves as a starting point for all the pytest resources we’ve created.
  • Brian Okken maintains a website for all his pytest resources (some of which are free, whereas others are paid). 
Videos about pytest

We have a video tutorial series composed of nine videos on pytest that start from the beginning of pytest-time. You can check out the tutorials on YouTube.

If there’s something specific you want to take a look at, each video in the series covers an individual topic, so you can jump straight to the one you need.

If you want a super-speedy refresher of pytest in PyCharm, you can watch PyCharm and pytest in Under 7 Minutes (beware – it's fast!).

Tutorials about pytest

If you prefer learning by doing, we have some great pytest tutorials for you. First, the above video series is also available as a written tutorial in the JetBrains Guide.

Alternatively, Brian Okken has produced a detailed tutorial on everything pytest if you want to explore all areas. This tutorial is paid content, but it’s well worth it! 

Blog posts and books about pytest

If you prefer reading, we have lots of blog posts for you, including a number of pytest resources on the JetBrains Guide.

Additionally, Brian has a blog that covers a diverse range of pytest subjects you can dive into. 

While we’re on the subject of reading, Brian has also written an excellent book on pytest that you can purchase and curl up with if that’s your thing.

Official pytest documentation

Last but not least, the official pytest documentation is another one to bookmark and keep close by as you go on your journey to pytest mastery. 

Conclusion

The Python testing framework pytest is packed with helpful features such as fixtures, mocking, and parametrizing that make testing your applications easier, giving you confidence in the quality of your code. Go ahead and try pytest out and let us know what you learn on your path to Python testing excellence!

Categories: FLOSS Project Planets

Zato Blog: Automating telecommunications networks with Python and SFTP

Mon, 2024-07-29 03:43
Automating telecommunications networks with Python and SFTP
2024-07-29, by Dariusz Suchojad

In telecommunications, the Secure File Transfer Protocol (SFTP) serves as a critical mechanism for secure and reliable file exchange between different network components, devices, and systems, whether it is updating configurations, network monitoring, exchanging customer data, or facilitating software updates. Python, in turn, is an ideal tool for the automation of telecommunications networks thanks to its readability and versatility.

Let's dive into how to employ the two effectively and efficiently using the Zato integration and automation platform.

Dashboard

The first step is to define a new SFTP connection in your Dashboard, as in the screenshots below.

The form lets you provide all the default options that apply to each SFTP connection - remote host, what protocol to use, whether file metadata should be preserved during transfer, logging level and other details that you would typically provide.

Simply fill it out with the same details that you would use for a command line-based SFTP connection.

Pinging

The next thing, right after the creation of a new connection, is to ping it to check if the server is responding.

Pinging opens a new SFTP connection and runs the ping command - in the screenshot above it was ls . - a practically no-op command whose sole purpose is to confirm that commands can in fact be executed, which proves the correctness of the configuration.

This will either return details of why a connection could not be established or the response time if it was successful.

Cloud SFTP console

Having validated the configuration by pinging it, we can now execute SFTP commands straight in Dashboard from a command console:

Any SFTP command, or even a series of commands, can be sent and responses retrieved immediately. It is also possible to increase the logging level for additional SFTP protocol-level details.

This makes it possible to rapidly prototype file transfer functionality as a series of scripts that can then be moved, as they are, to Python-based services.

Python automation

Now, in Python, your API automation services have access to an extensive array of capabilities - from executing transfer commands individually or in batches to reusing the SFTP scripts previously created in your Dashboard.

Here is how Python can be used in practice:

# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class MySFTPService(Service):
    def handle(self):

        # Connection to use
        conn_name = 'My SFTP Connection'

        # Get a handle to the connection object
        conn = self.out.sftp[conn_name].conn

        # Execute an arbitrary script with one or more SFTP commands,
        # like in web-admin
        my_script = 'ls -la /remote/path'
        conn.execute(my_script)

        # Ping a remote server to check if it responds
        conn.ping()

        # Download an entry, possibly recursively
        conn.download('/remote/path', '/local/path')

        # Like .download but remote path must point to a file
        # (exception otherwise)
        conn.download_file('/remote/path', '/local/path')

        # Makes the contents of a remote file available on output
        out = conn.read('/remote/path')

        # Uploads a local file or directory to remote path
        conn.upload('/local/path', '/remote/path')

        # Writes input data out to a remote file
        data = 'My data'
        conn.write(data, '/remote/path')

        # Create a new directory
        conn.create_directory('/path/to/new/directory')

        # Create a new symlink
        conn.create_symlink('/path/to/new/symlink')

        # Create a new hard-link
        conn.create_hardlink('/path/to/new/hardlink')

        # Delete an entry, possibly recursively, no matter what kind it is
        conn.delete('/path/to/delete')

        # Like .delete but path must be a directory
        conn.delete_directory('/path/to/delete')

        # Like .delete but path must be a file
        conn.delete_file('/path/to/delete')

        # Like .delete but path must be a symlink
        conn.delete_symlink('/path/to/delete')

        # Get information about an entry, e.g. modification time,
        # owner, size and more
        info = conn.get_info('/remote/path')
        self.logger.info(info.last_modified)
        self.logger.info(info.owner)
        self.logger.info(info.size)
        self.logger.info(info.size_human)
        self.logger.info(info.permissions_oct)

        # A boolean flag indicating if path is a directory
        result = conn.is_directory('/remote/path')

        # A boolean flag indicating if path is a file
        result = conn.is_file('/remote/path')

        # A boolean flag indicating if path is a symlink
        result = conn.is_symlink('/remote/path')

        # List contents of a directory - items are in the same format
        # that .get_info uses
        items = conn.list('/remote/path')

        # Move (rename) remote files or directories
        conn.move('/from/path', '/to/path')

        # An alias to .move
        conn.rename('/from/path', '/to/path')

        # Change mode of entry at path
        conn.chmod('600', '/path/to/entry')

        # Change owner of entry at path
        conn.chown('myuser', '/path/to/entry')

        # Change group of entry at path
        conn.chgrp('mygroup', '/path/to/entry')

Summary

Given how important SFTP is in telecommunications, having a convenient and easy way to automate it using Python is an essential ability in a network engineer's skill-set.

Thanks to the SFTP connections in Zato, you can prototype SFTP scripts in Dashboard and employ them in API services right after that. To complement it, a full Python API is available for programmatic access to remote file servers.

Combined, the features make it possible to create scalable and reusable file transfer services in a quick and efficient manner using the most convenient programming language, Python.

More resources

Click here to read more about using Python and Zato in telecommunications
What is a Network Packet Broker? How to automate networks in Python?
What is an integration platform?
Python Integration platform as a Service (iPaaS)
What is an Enterprise Service Bus (ESB)? What is SOA?

More blog posts
Categories: FLOSS Project Planets

Turnkey Linux: Python PEP 668 - working with "externally managed environment"

Sun, 2024-07-28 23:25

Python Linux users would have noticed that python is now an "externally managed environment" on newer releases. I suspect that it has caused many frustrations. It certainly did for me - at least initially. Marcos, a long term friend of TurnKey recently reached out to me to ask about the best way to work around this when developing on our Odoo appliance.

The issue

Before I go on, for those of you who are not sure what I'm talking about, try installing or updating a python package via pip. It will fail with an error message:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python
installation or OS distribution provider. You can override this, at
the risk of breaking your Python installation or OS, by passing
--break-system-packages.
hint: See PEP 668 for the detailed specification.

Whilst this does fix a legitimate issue, it's also a big PITA for developers. To read more about the rationale, please see PEP 668.

Resolution options

As per the message, installing Python packages via apt is preferable. But what if you need a Python library that isn't available in Debian? Or a newer version than what Debian provides? As noted by the error message, using a Python venv is the next best option. But that means duplicating any apt packages you may already have installed, and therefore bloat. It also means that you miss out on the automated security updates that TurnKey provides for Debian packages. The only remaining option is to "break system packages". That doesn't sound good! It will revert your system to the behavior before the application of PEP 668 - thus making life more complicated in the future...

Pros and Cons of each approach.

Assuming that you want or need versions and/or packages not available in Debian, what is the best path? Obviously each option noted has pros and cons, so which way should you go? In his message to me, Marcos nicely laid out the pros and cons of the 2 suggested approaches, so I'll share them here:

Virtual Environment

Pros:
  • Isolates application dependencies, avoiding conflicts with system packages.
  • Allows for more flexible and up-to-date package management.
Cons:
  • Adds complexity to the setup and maintenance process.
  • Increases the overall footprint and resource requirements of the deployment.
System-Wide Installation

Pros:
  • Simpler setup and integration with the system.
  • Utilizes the standard Turnkey Linux deployment model.
Cons:
  • Potential conflicts with system-managed packages.
  • Limited by the constraints imposed by Debian 12.
Another - perhaps better - option

Another option not noted in the pip error message is to create a virtual environment with the system Python packages passed through. Whilst it's still not perfect, in my opinion it is by far the best option - unless of course you can get by just using system packages alone. TBH, I'm a bit surprised that it's not noted in the error message. It's pretty easy to set up, just add the '--system-site-packages' switch when creating your virtual environment. I.e.:

python3 -m venv --system-site-packages /path/to/venv

What you get with this approach

Let's have a look at what you get when using '--system-site-packages'. First let's create an example venv. Note that all of this is running as root from root's home (/root), although for AWS Marketplace users (or non-TurnKey users) most of these commands should work fine as a "sudo" user (for apt installs).

root@core ~# mkdir ~/test_venv
root@core ~# python3 -m venv --system-site-packages ~/test_venv
root@core ~# source ~/test_venv/bin/activate
(test_venv) root@core ~#

Now for a demonstration of what happens when you use it.

I'll use a couple of apt packages with my examples:

  • python3-pygments (initially installed)
  • python3-socks (initially not-installed)

Continuing on from creating the venv above, let's confirm the package versions and status:

(test_venv) root@core ~# apt list python3-pygments python3-socks
Listing... Done
python3-pygments/stable,now 2.14.0+dfsg-1 all [installed,automatic]
python3-socks/stable 1.7.1+dfsg-1 all

So we have python3-pygments installed and it's version 2.14.0. python3-socks is not installed, but the available version is 1.7.1. Now let's check that the installed package (pygments) is available in the venv and that it's the system version. For those not familiar with grep, the grep command does a case-insensitive search for lines that include socks or pygments.

(test_venv) root@core ~# pip list | grep -i 'socks\|pygments'
Pygments 2.14.0

Let's install python3-socks and check the status again:

(test_venv) root@core ~# apt install -y python3-socks
[...]
(test_venv) root@core ~# apt list python3-pygments python3-socks
Listing... Done
python3-pygments/stable,now 2.14.0+dfsg-1 all [installed,automatic]
python3-socks/stable,now 1.7.1+dfsg-1 all [installed]

Ok so python3-socks is installed now. And it's instantly available in our venv:

(test_venv) root@core ~# pip list | grep -i 'socks\|pygments'
Pygments 2.14.0
PySocks 1.7.1

Woohoo! :) And can we still install and/or update packages in our venv with pip? Let's try:

(test_venv2) root@core ~# pip install --upgrade pygments
Requirement already satisfied: pygments in /usr/lib/python3/dist-packages (2.14.0)
Collecting pygments
  Downloading pygments-2.18.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 6.5 MB/s eta 0:00:00
Installing collected packages: pygments
  Attempting uninstall: pygments
    Found existing installation: Pygments 2.14.0
    Not uninstalling pygments at /usr/lib/python3/dist-packages, outside environment /root/test_venv2
    Can't uninstall 'Pygments'. No files were found to uninstall.
Successfully installed pygments-2.18.0

Yes! When using pip install in our venv there are some extra lines related to the system package, but otherwise it's the same as using a standalone venv. Let's double check the new version:

(test_venv) root@core ~# pip list | grep -i 'socks\|pygments'
Pygments 2.18.0
PySocks 1.7.1

So we've updated from the system version of Pygments to 2.18.0, but the system version still exists - and is still 2.14.0:

(test_venv) root@core ~# apt list python3-pygments
Listing... Done
python3-pygments/stable,now 2.14.0+dfsg-1 all [installed,automatic]

So what happens if we remove the pip installed version?:

(test_venv) root@core ~# pip uninstall pygments
Found existing installation: Pygments 2.18.0
Uninstalling Pygments-2.18.0:
  Would remove:
    /root/test_venv2/bin/pygmentize
    /root/test_venv2/lib/python3.11/site-packages/pygments-2.18.0.dist-info/*
    /root/test_venv2/lib/python3.11/site-packages/pygments/*
Proceed (Y/n)? y
  Successfully uninstalled Pygments-2.18.0

This time there is no mention of the system package. Let's double check the system and the venv:

(test_venv) root@core ~# apt list python3-pygments
Listing... Done
python3-pygments/stable,now 2.14.0+dfsg-1 all [installed,automatic]
(test_venv) root@core ~# pip list | grep -i 'pygments'
Pygments 2.14.0

Yep, the system package is still there and it's still in the venv!

The best of both worlds

So using '--system-site-packages' is essentially the best of both worlds. Where possible you can use system packages via apt, but you still have all the advantages of a virtual environment. In my opinion it's the best option by far! What do you think? Feel free to share your thoughts and feedback below.

Blog Tags: python, virtual environment, venv, apt, pip, debian, linux
Categories: FLOSS Project Planets

Kushal Das: Multi-factor authentication in django

Fri, 2024-07-26 10:24

Multi-factor authentication is a must-have feature in any modern web application, especially support for both TOTP (think applications on your phone) and FIDO2 (say, Yubikeys). I created a small Django demo, mfaforgood, which shows how to enable both.

I am using django-mfa3 for all the hard work, specifically from a PR branch by my friend Giuseppe De Marco.

I also fetched the cbor-js package into the repository so that hardware tokens for FIDO2 work. I hope this example will help you add MFA support to your Django application.

Major points of the code
  • Adding example templates from MFA project, with admin theme and adding cbor-js to the required templates.
  • Adding mfa to INSTALLED_APPS.
  • Adding mfa.middleware.MfaSessionMiddleware to MIDDLEWARE.
  • Adding MFA_DOMAIN and MFA_SITE_TITLE to settings.py.
  • Also adding STATICFILES_DIRS.
  • Adding mfa.views.MFAListView as the Index view of the application.
  • Also adding mfa URLs (a configuration sketch follows this list).
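Putting those points together, here is a hedged sketch of the relevant configuration; the values for MFA_DOMAIN, MFA_SITE_TITLE, and the static directory are placeholders, so check the django-mfa3 documentation for your own project:

# settings.py (sketch)
INSTALLED_APPS = [
    # ... the usual django.contrib apps ...
    "mfa",
]

MIDDLEWARE = [
    # ... the usual Django middleware ...
    "mfa.middleware.MfaSessionMiddleware",
]

MFA_DOMAIN = "localhost"       # placeholder
MFA_SITE_TITLE = "mfaforgood"  # placeholder

# Assumes BASE_DIR is defined as in a default Django settings module;
# this is where assets such as cbor-js can live.
STATICFILES_DIRS = [BASE_DIR / "static"]

# urls.py (sketch)
from django.urls import include, path
from mfa.views import MFAListView

urlpatterns = [
    path("", MFAListView.as_view(), name="index"),
    path("mfa/", include("mfa.urls")),
]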

After logging in for the first time, one can enable MFA in the following screen.

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #214: Build Captivating Display Tables in Python With Great Tables

Fri, 2024-07-26 08:00

Do you need help making data tables in Python look interesting and attractive? How can you create beautiful display-ready tables as easily as charts and graphs in Python? This week on the show, we speak with Richard Iannone and Michael Chow from Posit about the Great Tables Python library.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Python Software Foundation: Notice of Python Software Foundation Bylaws change, effective 10 August 2024

Fri, 2024-07-26 07:33
There has been a lot of attention directed at our Bylaws over the last few weeks, and as a result of that conversation, the Board was alerted to a defect in our Bylaws that exposes the Foundation to an unbounded financial liability.

Specifically, Bylaws Article XIII as originally written compels the Python Software Foundation to extend indemnity coverage to individual Members (including our thousands of “Basic Members”) in certain cases, and to advance legal defense expenses to individual Members with surprisingly few restrictions.

Further, the Bylaws compel the Foundation to take out insurance to cover these requirements, however, insurance of this nature is not actually available to 501(c)(3) nonprofit corporations such as the Python Software Foundation to purchase, and thus it is impossible in practice to comply with this requirement.

In the unlikely but not impossible event of the Foundation being called upon to advance such expenses, the potential financial burden would be virtually unlimited, and there would be no recourse to insurance.

As this is an existential threat to the Foundation, the Board has agreed that it must immediately reduce the Foundation’s exposure, and has opted to exercise its ability to amend the Bylaws by a majority vote of the Board directors, rather than by putting it to a vote of the membership, as allowed by Bylaws Article XI.

Acting on legal advice, the full Board has voted unanimously to amend its Bylaws to no longer extend an offer to indemnify, advance legal expenses, or insure Members when they are not serving at the request of the Foundation. The amended Bylaws still allow for indemnification of a much smaller set of individuals acting on behalf of the PSF such as Board Members and officers, which is in line with standard nonprofit governance practices and for which we already hold appropriate insurance.

The full text of the changes can be viewed at https://github.com/python/psf-bylaws/compare/a35a607...298843b

These changes shall become effective on Saturday 10 August 2024, 15 days from the date of this notice.

Any questions about these changes may be sent to psf@python.org. We gladly welcome further suggestions or recommendations for future Bylaws amendments.

Thank you,

The PSF Board of Directors
Categories: FLOSS Project Planets

Python Software Foundation: Python’s Supportive and Welcoming Environment is Tightly Coupled to Its Progress

Fri, 2024-07-26 05:49
Python is as popular as it is today because we have gone above and beyond to make this a welcoming community. Being a friendly and supportive community is part of how we are perceived by the wider world and is integral to the wide popularity of Python. We won a “Wonderfully Welcoming Award” last year at GitHub Universe. Over and over again, the tech press refers to Python as a supportive community. We aren’t the fastest, the newest or the best-funded programming language, but we are the most welcoming and supportive. Our philosophy is a big part of why Python is a fantastic choice for not only new programmers, glue programmers, and folks who split their time between research and programming but for everyone who wants to be part of a welcoming community.

We believe to be “welcoming” means to do our best to provide all participants with a safe, civil, and respectful environment when they are engaging with our community - on our forums, at PyCon events, and other spaces that have committed to following our Code of Conduct. That kind of environment doesn’t happen by accident - a lot of people have worked hard over a long time to figure out the best ways to nurture this welcoming quality for the Python community. That work has included drafting and improving the Code of Conduct, crafting and implementing processes for enforcing it, and moderating the various online spaces where it applies. And most importantly the huge, collective effort of individuals across the community, each putting in consistent effort to show up in all the positive ways that make the Python community the warm and welcoming place that we know.

The recent slew of conversations, initially kicked off in response to a bylaws change proposal, has been pretty alienating for many members of our community. They haven’t all posted publicly to explain their feelings, but they have found other ways to let the PSF know how they are feeling.
  • After the conversation on PSF-Vote had gotten pretty ugly, forty-five people out of ~1000 unsubscribed. (That list has since been put on announce-only)
  • We received a lot of Code of Conduct reports or moderation requests about the PSF-vote mailing list and the discuss.python.org message board conversations. (Several reports have already been acted on or closed and the rest will be soon).
  • PSF staff received private feedback that the blanket statements about “neurodiverse people”, the bizarre motives ascribed to the people in charge of the PSF and various volunteers and the sideways comments about the kinds of people making reports were also very off-putting.
As an open source code community, we do most things out in the open which is a fantastic strategy for code. (Many eyes, shallow bugs, etc.) We also try to be transparent about what is going on here at the Foundation and are always working to improve visibility into our policies, current resource levels, spending priorities and aspirations. Sometimes staff and volunteers are a little too busy “doing the work" to “talk about the work” but we do our best to be responsive, especially in the areas that people want to know more about. That said, sometimes things do need to be kept confidential, for privacy, legal, or other good reasons.

Some examples:
  • Most Code of Conduct reports – Oftentimes, these reports have the potential to affect both the reporter and the reported person’s reputations and livelihoods so our practice is to keep them confidential when possible to protect everyone involved. Some of you have been here long enough to remember the incident at PyCon US in 2013, an example of the entire internet discussing a Code of Conduct violation that led to negative repercussions for everyone involved, but especially for the person who reported the behavior.
  • Legal advice and proceedings – It is an unfortunate fact of the world that the legal system(s) we operate under sometimes require us to keep secret information we might otherwise prefer to disclose, often because doing so could open us up to liability in a way that would create significant risk to the PSF or it could potentially put us in violation of laws or regulation. It’s our responsibility to follow legal guidance about how to protect the Foundation, our resources, and our mission in these situations.
  • Mental health, personal history, or disability status – Community members should not, for example, have to disclose their status as neurodivergent or share their history with abuse so that others can decide if they are allowed to be offended. Community members should also not be speculating about other individuals’ characteristics or experience in this regard.
We have a moral imperative – as one of the very best places to bring new people into tech and into open source – to keep being good at welcoming new people. If we do not rise and continue to rise every day to this task, then we are not fulfilling our own mission, “to support and facilitate the growth of a diverse and international community of Python programmers.” Technical skills are a game-changer for the people who acquire them and joining a vast global network of people with similar interests opens many doors. Behavior that contributes to a hostile environment around Python or throws up barriers and obstacles to those who would join the Python community must be addressed because it endangers what we have built here.

Part of the care-taking of a diverse community “where everyone feels welcome” sadly often means asking some people to leave – or at least take a break. This is known as the paradox of tolerance. We can not tolerate intolerance and we will not allow combative and aggressive behavior to ruin the experience in our spaces for everyone else. People do make honest mistakes and don’t always understand the impact that their words have had. All we ask is that as community members we all do our best to adhere to the Code of Conduct we’ve committed to as a community, and that we gracefully accept feedback when our efforts fall short. Sometimes that means learning that the words, assumptions or tone you’re using aren’t coming across the way you’ve intended. When a person’s words and actions repeatedly come in conflict with our community norms and cause harm, and that pattern hasn’t changed in response to feedback – then we have to ask people to take a break or as a last resort to leave the conversation.

Our forum, mailing lists and events will continue to be moderated. We want to thank everyone who contributed positively to the recent conversations and everyone who made the hard choice to write to us to point out off-putting, harmful, unwelcoming or offensive comments. We especially want to thank all the volunteers who serve on the Python Discourse moderation team and our Code of Conduct Working Group. We know it’s been a long couple of weeks, and although your work may occasionally be draining and unpleasant, it is also absolutely essential and endlessly appreciated by the vast majority of the community. Thank you for everything you do!


Sincerely,
Deb Nicholson
Dawn Wages
Tania Allard
KwonHan Bae
Kushal Das
Georgi Ker
Jannis Leidel
Cristián Maureira-Fredes
Christopher Neugebauer
Denny Perez
Cheuk Ting Ho
Simon Willison

Categories: FLOSS Project Planets

Talk Python to Me: #472: State of Flask and Pallets in 2024

Fri, 2024-07-26 04:00
Flask is one of the most important Python web frameworks and powers a bunch of the internet. David Lord, Flask's lead maintainer, is here to give us an update on the state of Flask and Pallets in 2024. If you care about where Flask is and where it's going, you'll definitely want to listen in.

Episode sponsors

  • Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
  • Talk Python Courses: https://talkpython.fm/training

Links from the show

  • David on Mastodon: https://mas.to/@davidism
  • David on X: https://twitter.com/davidism
  • State of Pallets 2024 FlaskCon Talk: https://www.youtube.com/watch?v=TYeMf0bCbr8
  • FlaskCon: https://flaskcon.com/2024/
  • FlaskCon 2024 Talks: https://www.youtube.com/playlist?list=PL-MSuSC-Kjb6n0HsxU_knxCOLuToQm44z
  • Pallets Discord: https://discord.com/invite/pallets
  • Pallets Eco: https://github.com/pallets-eco
  • JazzBand: https://jazzband.co
  • Pallets GitHub Org: https://github.com/pallets
  • Jinja: https://github.com/pallets/jinja
  • Click: https://github.com/pallets/click
  • Werkzeug: https://github.com/pallets/werkzeug
  • MarkupSafe: https://github.com/pallets/markupsafe
  • ItsDangerous: https://github.com/pallets/itsdangerous
  • Quart: https://github.com/pallets/quart
  • pypistats: https://pypistats.org/packages/flask
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=EvNx5fwcib0
  • Episode transcripts: https://talkpython.fm/episodes/transcript/472/state-of-flask-and-pallets-in-2024

Stay in touch with us

  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
  • Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy
Categories: FLOSS Project Planets

Real Python: Quiz: Getting Started With Testing in Python

Thu, 2024-07-25 08:00

In this quiz, you’ll test your understanding of testing your Python code.

Testing in Python is a huge topic and can come with a lot of complexity, but it doesn’t need to be hard. You can get started creating simple tests for your application in a few easy steps and then build on it from there.

With this quiz, you can check your understanding of the fundamentals of Python testing. Good luck!

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Real Python: Quiz: Python Basics: Lists and Tuples

Thu, 2024-07-25 08:00

In Python Basics: Lists and Tuples, you’ve met two new and important data structures:

  • Lists
  • Tuples

Both of these data types are sequences, meaning they are objects that contain other objects in a certain order. They each have some important distinguishing properties and come with their own set of methods for interacting with objects of each type.

In this quiz, you'll test your knowledge of:

  • Creating lists and tuples
  • Indexing and slicing lists and tuples
  • Iterating over these containers
  • Understanding their differences, specifically the impact of mutability
  • Adding and removing items from a list

Then, you can move on to other Python Basics courses.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

EuroPython Society: EuroPython 2024 Code of Conduct Transparency Report

Thu, 2024-07-25 02:00

The 2024 version of the EuroPython conference took place both online and in person in July 2024. This was the second conference under our new Code of Conduct (CoC), and we had Code of Conduct working group members continuously available both online and in person.

Reports

We had 4 Code of Conduct working group members continuously available both online and in person. Over the course of the conference the Code of Conduct team was made aware of the following issues:

  • A disabled person had requested reserved seating for talks, but when he arrived the first day, there was none. He reported this to a CoC member, who filed a report with Ops. It turned out that while the request had been gathered on the web form, there was no mechanism to get that information to the people involved. Once they were informed, the issue was quickly resolved, and the reporter expressed satisfaction with the way it was handled.
  • One person was uncomfortable with having their last name shown on Discord. They were informed that they could change that as soon as the registration bot ran, limiting the exposure to a minute or so, or that they could come to the registration desk for assistance. The report came via email and there was no response to the email suggesting those options.
  • An attendee reported that one talk's slides included a meme that seemed to reflect a racist trope. The CoC team reviewed that talk's slides, and agreed that the meme might be interpreted that way. A member of the CoC team contacted the presenter who immediately agreed to remove that meme before uploading the slides, and the video team was alerted to edit that meme out of the talk video before final publication.
  • There were multiple reports that the toilet signage was confusing and causing people to be uncomfortable with choosing a toilet. Once this was reported the signage was adjusted to make the gender designation visible and no further reports were received. It should be noted that none of the complaints objected to the text of the signs, just to the fact that covering of gender markers led to people entering a toilet they didn't want to.
  • The CoC team also were presented with a potential lightning talk topic that had caused complaints at another conference due to references to current wars that some viewers found disturbing. Since lightning talks are too short for content warnings to be effective, and since they are not reviewed in any detail by the programme committee, the CoC team counselled the prospective presenter against using the references that had been problematic at a prior conference. Given that advice, the presenter elected not to submit that topic.
Categories: FLOSS Project Planets

PyPy: Abstract interpretation in the Toy Optimizer

Wed, 2024-07-24 10:48

This is a cross-post from Max Bernstein from his excellent blog where he writes about programming languages, compilers, optimizations, virtual machines. He's looking for a (dynamic language runtime or compiler related) job too.

CF Bolz-Tereick wrote some excellent posts in which they introduce a small IR and optimizer and extend it with allocation removal. We also did a live stream together in which we did some more heap optimizations.

In this blog post, I'm going to write a small abstract interpreter for the Toy IR and then show how we can use it to do some simple optimizations. It assumes that you are familiar with the little IR, which I have reproduced unchanged in a GitHub Gist.

Abstract interpretation is a general framework for efficiently computing properties that must be true for all possible executions of a program. It's a widely used approach both in compiler optimizations as well as offline static analysis for finding bugs. I'm writing this post to pave the way for CF's next post on proving abstract interpreters correct for range analysis and known bits analysis inside PyPy.

Before we begin, I want to note a couple of things:

  • The Toy IR is in SSA form, which means that every variable is defined exactly once. This means that abstract properties of each variable are easy to track.
  • The Toy IR represents a linear trace without control flow, meaning we won't talk about meet/join or fixpoints. They only make sense if the IR has a notion of conditional branches or back edges (loops).

Alright, let's get started.

Welcome to abstract interpretation

Abstract interpretation means a couple different things to different people. There's rigorous mathematical formalism thanks to Patrick and Radhia Cousot, our favorite power couple, and there's also sketchy hand-wavy stuff like what will follow in this post. In the end, all people are trying to do is reason about program behavior without running it.

In particular, abstract interpretation is an over-approximation of the behavior of a program. Correctly implemented abstract interpreters never lie, but they might be a little bit pessimistic. This is because instead of using real values and running the program---which would produce a concrete result and some real-world behavior---we "run" the program with a parallel universe of abstract values. This abstract run gives us information about all possible runs of the program.[1]

Abstract values always represent sets of concrete values. Instead of literally storing a set (in the world of integers, for example, it could get pretty big...there are a lot of integers), we group them into a finite number of named subsets.[2]

Let's learn a little about abstract interpretation with an example program and example abstract domain. Here's the example program:

v0 = 1
v1 = 2
v2 = add(v0, v1)

And our abstract domain is "is the number positive" (where "positive" means nonnegative, but I wanted to keep the words distinct):

         top
        /   \
positive     negative
        \   /
        bottom

The special top value means "I don't know" and the special bottom value means "empty set" or "unreachable". The positive and negative values represent the sets of all positive and negative numbers, respectively.

We initialize all the variables v0, v1, and v2 to bottom and then walk our IR, updating our knowledge as we go.

# here
v0:bottom = 1
v1:bottom = 2
v2:bottom = add(v0, v1)

In order to do that, we have to have transfer functions for each operation. For constants, the transfer function is easy: determine if the constant is positive or negative. For other operations, we have to define a function that takes the abstract values of the operands and returns the abstract value of the result.

In order to be correct, transfer functions for operations have to be compatible with the behavior of their corresponding concrete implementations. You can think of them having an implicit universal quantifier forall in front of them.

Let's step through the constants at least:

v0:positive = 1
v1:positive = 2
# here
v2:bottom = add(v0, v1)

Now we need to figure out the transfer function for add. It's kind of tricky right now because we haven't specified our abstract domain very well. I keep saying "numbers", but what kinds of numbers? Integers? Real numbers? Floating point? Some kind of fixed-width bit vector (int8, uint32, ...) like an actual machine "integer"?

For this post, I am going to use the mathematical definition of integer, which means that the values are not bounded in size and therefore do not overflow. Actual hardware memory constraints aside, this is kind of like a Python int.

So let's look at what happens when we add two abstract numbers:

             top       positive   negative   bottom
  top        top       top        top        bottom
  positive   top       positive   top        bottom
  negative   top       top        negative   bottom
  bottom     bottom    bottom     bottom     bottom

As an example, let's try to add two numbers a and b, where a is positive and b is negative. We don't know anything about their values other than their signs. They could be 5 and -3, where the result is 2, or they could be 1 and -100, where the result is -99. This is why we can't say anything about the result of this operation and have to return top.

The short of this table is that we only really know the result of an addition if both operands are positive or both operands are negative. Thankfully, in this example, both operands are known positive. So we can learn something about v2:

v0:positive = 1
v1:positive = 2
v2:positive = add(v0, v1)
# here

This may not seem useful in isolation, but analyzing more complex programs even with this simple domain may be able to remove checks such as if (v2 < 0) { ... }.

Let's take a look at another example using an sample absval (absolute value) IR operation:

v0 = getarg(0)
v1 = getarg(1)
v2 = absval(v0)
v3 = absval(v1)
v4 = add(v2, v3)
v5 = absval(v4)

Even though we have no constant/concrete values, we can still learn something about the states of values throughout the program. Since we know that absval always returns a positive number, we learn that v2, v3, and v4 are all positive. This means that we can optimize out the absval operation on v5:

v0:top = getarg(0)
v1:top = getarg(1)
v2:positive = absval(v0)
v3:positive = absval(v1)
v4:positive = add(v2, v3)
v5:positive = v4

Other interesting lattices include:

  • Constants (where the middle row is pretty wide)
  • Range analysis (bounds on min and max of a number)
  • Known bits (using a bitvector representation of a number, which bits are always 0 or 1)

For the rest of this blog post, we are going to do a very limited version of "known bits", called parity. This analysis only tracks the least significant bit of a number, which indicates if it is even or odd.

Parity

The lattice is pretty similar to the positive/negative lattice:

      top
     /   \
  even    odd
     \   /
     bottom

Let's define a data structure to represent this in Python code:

class Parity:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

And instantiate the members of the lattice:

TOP = Parity("top")
EVEN = Parity("even")
ODD = Parity("odd")
BOTTOM = Parity("bottom")

Now let's write a forward flow analysis of a basic block using this lattice. We'll do that by assuming that a method on Parity is defined for each IR operation. For example, Parity.add, Parity.lshift, etc.

def analyze(block: Block) -> None:
    parity = {v: BOTTOM for v in block}

    def parity_of(value):
        if isinstance(value, Constant):
            return Parity.const(value)
        return parity[value]

    for op in block:
        transfer = getattr(Parity, op.name)
        args = [parity_of(arg.find()) for arg in op.args]
        parity[op] = transfer(*args)

For every operation, we compute the abstract value---the parity---of the arguments and then call the corresponding method on Parity to get the abstract result.

We need to special case Constants due to a quirk of how the Toy IR is constructed: the constants don't appear in the instruction stream and instead are free-floating.

Let's start by looking at the abstraction function for concrete values---constants:

class Parity:
    # ...

    @staticmethod
    def const(value):
        if value.value % 2 == 0:
            return EVEN
        else:
            return ODD

Seems reasonable enough. Let's pause on operations for a moment and consider an example program:

v0 = getarg(0)
v1 = getarg(1)
v2 = lshift(v0, 1)
v3 = lshift(v1, 1)
v4 = add(v2, v3)
v5 = dummy(v4)

This function (which is admittedly a little contrived) takes two inputs, shifts each left by one bit, and adds the results. It then passes the sum into a dummy function, which you can think of as "return" or "escape". (Later, we'll extend it to check the least significant bit of the addition result.)

To do some abstract interpretation on this program, we'll need to implement the transfer functions for lshift and add (dummy will just always return TOP). We'll start with add. Remember that adding two even numbers returns an even number, adding two odd numbers returns an even number, and mixing even and odd returns an odd number.

class Parity:
    # ...

    def add(self, other):
        if self is BOTTOM or other is BOTTOM:
            return BOTTOM
        if self is TOP or other is TOP:
            return TOP
        if self is EVEN and other is EVEN:
            return EVEN
        if self is ODD and other is ODD:
            return EVEN
        return ODD

We also need to fill in the other cases where the operands are top or bottom. In this case, they are both "contagious"; if either operand is bottom, the result is as well. If neither is bottom but either operand is top, the result is as well.

Now let's look at lshift. Shifting any number left by a non-zero number of bits will always result in an even number, but we need to be careful about the zero case! Shifting by zero doesn't change the number at all. Unfortunately, since our lattice has no notion of zero, we have to over-approximate here:

class Parity:
    # ...

    def lshift(self, other):
        # self << other
        if other is ODD:
            return EVEN
        return TOP

This means that we will miss some opportunities to optimize, but it's a tradeoff that's just part of the game. (We could also add more elements to our lattice, but that's a topic for another day.)

Now, if we run our abstract interpretation, we'll collect some interesting properties about the program. If we temporarily hack on the internals of bb_to_str, we can print out parity information alongside the IR operations:

v0:top = getarg(0)
v1:top = getarg(1)
v2:even = lshift(v0, 1)
v3:even = lshift(v1, 1)
v4:even = add(v2, v3)
v5:top = dummy(v4)

This is pretty awesome, because we can see that v4, the result of the addition, is always even. Maybe we can do something with that information.

Optimization

One way that a program might check if a number is odd is by checking the least significant bit. This is a common pattern in C code, where you might see code like y = x & 1. Let's introduce a bitand IR operation that acts like the & operator in C/Python. Here is an example of use of it in our program:

v0 = getarg(0)
v1 = getarg(1)
v2 = lshift(v0, 1)
v3 = lshift(v1, 1)
v4 = add(v2, v3)
v5 = bitand(v4, 1)  # new!
v6 = dummy(v5)

We'll hold off on implementing the transfer function for it---that's left as an exercise for the reader---and instead do something different.
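(If you do want to fill in that exercise, here's one possible sketch. This is my guess at a reasonable transfer function, not code from the original optimizer. The key observation is that the low bit of x & y is the AND of the two low bits, so a single even operand is enough to force an even result:)

class Parity:
    # ...

    def bitand(self, other):
        # self & other
        if self is BOTTOM or other is BOTTOM:
            return BOTTOM
        # One even operand zeroes the low bit, even if the other is top.
        if self is EVEN or other is EVEN:
            return EVEN
        if self is ODD and other is ODD:
            return ODD
        return TOP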

Instead, we'll see if we can optimize operations of the form bitand(X, 1). If we statically know the parity as a result of abstract interpretation, we can replace the bitand with a constant 0 or 1.

We'll first modify the analyze function (and rename it) to return a new Block containing optimized instructions:

def simplify(block: Block) -> Block:
    parity = {v: BOTTOM for v in block}

    def parity_of(value):
        if isinstance(value, Constant):
            return Parity.const(value)
        return parity[value]

    result = Block()
    for op in block:
        # TODO: Optimize op
        # Emit
        result.append(op)
        # Analyze
        transfer = getattr(Parity, op.name)
        args = [parity_of(arg.find()) for arg in op.args]
        parity[op] = transfer(*args)
    return result

We're approaching this the way that PyPy does things under the hood, which is all in roughly a single pass. It tries to optimize an instruction away, and if it can't, it copies it into the new block.

Now let's add in the bitand optimization. It's mostly some gross-looking pattern matching that checks if the right hand side of a bitwise and operation is 1 (TODO: the left hand side, too). CF had some neat ideas on how to make this more ergonomic, which I might save for later.[3]

Then, if we know the parity, optimize the bitand into a constant.

def simplify(block: Block) -> Block:
    parity = {v: BOTTOM for v in block}

    def parity_of(value):
        if isinstance(value, Constant):
            return Parity.const(value)
        return parity[value]

    result = Block()
    for op in block:
        # Try to simplify
        if isinstance(op, Operation) and op.name == "bitand":
            arg = op.arg(0)
            mask = op.arg(1)
            if isinstance(mask, Constant) and mask.value == 1:
                if parity_of(arg) is EVEN:
                    op.make_equal_to(Constant(0))
                    continue
                elif parity_of(arg) is ODD:
                    op.make_equal_to(Constant(1))
                    continue
        # Emit
        result.append(op)
        # Analyze
        transfer = getattr(Parity, op.name)
        args = [parity_of(arg.find()) for arg in op.args]
        parity[op] = transfer(*args)
    return result

Remember: because we use union-find to rewrite instructions in the optimizer (make_equal_to), later uses of the same instruction get the new optimized version "for free" (find).

Let's see how it works on our IR:

v0 = getarg(0)
v1 = getarg(1)
v2 = lshift(v0, 1)
v3 = lshift(v1, 1)
v4 = add(v2, v3)
v6 = dummy(0)

Hey, neat! bitand disappeared and the argument to dummy is now the constant 0 because we know the lowest bit.
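If you want to reproduce this end to end, a hypothetical driver might look like the following. It assumes the Block builder and bb_to_str helpers from the Toy IR gist, so treat the exact names and signatures as assumptions:

# Build the example block using the gist's operator-builder syntax,
# then run our parity-based simplifier over it.
bb = Block()
v0 = bb.getarg(0)
v1 = bb.getarg(1)
v2 = bb.lshift(v0, 1)
v3 = bb.lshift(v1, 1)
v4 = bb.add(v2, v3)
v5 = bb.bitand(v4, 1)
bb.dummy(v5)

opt_bb = simplify(bb)
print(bb_to_str(opt_bb, "v"))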

Wrapping up

Hopefully you have gained a little bit of an intuitive understanding of abstract interpretation. Last year, being able to write some code made me more comfortable with the math. Now being more comfortable with the math is helping me write the code. It's a nice upward spiral.

The two abstract domains we used in this post are simple and not very useful in practice, but it's possible to get very far using slightly more complicated abstract domains. Common domains include: constant propagation, type inference, range analysis, effect inference, liveness, etc. For example, here is a sample lattice for constant propagation:

[Lattice diagram: bottom sits below the individual constants -inf, ..., -2, -1, 0, 1, 2, ..., +inf. The negative constants flow up into negative and the positive ones into nonnegative, while 0 flows straight to top. negative and nonnegative both flow into nonzero, which sits just below top.]

It has multiple levels to indicate more and less precision. For example, you might learn that a variable is either 1 or 2 and be able to encode that as nonnegative instead of just going straight to top.

Check out some real-world abstract interpretation in open source projects:

If you have some readable examples, please share them so I can add them here.

Acknowledgements

Thank you to CF Bolz-Tereick for the toy optimizer and helping edit this post!

  1. In the words of abstract interpretation researchers Vincent Laviron and Francesco Logozzo in their paper Refining Abstract Interpretation-based Static Analyses with Hints (APLAS 2009):

    The three main elements of an abstract interpretation are: (i) the abstract elements ("which properties am I interested in?"); (ii) the abstract transfer functions ("which is the abstract semantics of basic statements?"); and (iii) the abstract operations ("how do I combine the abstract elements?").

    We don't have any of these "abstract operations" in this post because there's no control flow but you can read about them elsewhere! 

  2. These abstract values are arranged in a lattice, which is a mathematical structure with some properties but the most important ones are that it has a top, a bottom, a partial order, a meet operation, and values can only move in one direction on the lattice.

    Using abstract values from a lattice promises two things:

    • The analysis will terminate
    • The analysis will be correct for any run of the program, not just one sample run

  3. Something about __match_args__ and @property... 

Categories: FLOSS Project Planets

Real Python: Hugging Face Transformers: Leverage Open-Source AI in Python

Wed, 2024-07-24 10:00

Transformers is a powerful Python library created by Hugging Face that allows you to download, manipulate, and run thousands of pretrained, open-source AI models. These models cover multiple tasks across modalities like natural language processing, computer vision, audio, and multimodal learning. Using pretrained open-source models can reduce costs, save the time needed to train models from scratch, and give you more control over the models you deploy.

In this tutorial, you’ll learn how to:

  • Navigate the Hugging Face ecosystem
  • Download, run, and manipulate models with Transformers
  • Speed up model inference with GPUs

Throughout this tutorial, you’ll gain a conceptual understanding of Hugging Face’s AI offerings and learn how to work with the Transformers library through hands-on examples. When you finish, you’ll have the knowledge and tools you need to start using models for your own use cases. Before starting, you’ll benefit from having an intermediate understanding of Python and popular deep learning libraries like pytorch and tensorflow.

Get Your Code: Click here to download the free sample code that shows you how to use Hugging Face Transformers to leverage open-source AI in Python.

Take the Quiz: Test your knowledge with our interactive “Hugging Face Transformers” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Hugging Face Transformers

In this quiz, you'll test your understanding of the Hugging Face Transformers library. This library is a popular choice for working with transformer models in natural language processing tasks, computer vision, and other machine learning applications.

The Hugging Face Ecosystem

Before using Transformers, you’ll want to have a solid understanding of the Hugging Face ecosystem. In this first section, you’ll briefly explore everything that Hugging Face offers with a particular emphasis on model cards.

Exploring Hugging Face

Hugging Face is a hub for state-of-the-art AI models. It’s primarily known for its wide range of open-source transformer-based models that excel in natural language processing (NLP), computer vision, and audio tasks. The platform offers several resources and services that cater to developers, researchers, businesses, and anyone interested in exploring AI models for their own use cases.

There’s a lot you can do with Hugging Face, but the primary offerings can be broken down into a few categories:

  • Models: Hugging Face hosts a vast repository of pretrained AI models that are readily accessible and highly customizable. This repository is called the Model Hub, and it hosts models covering a wide range of tasks, including text classification, text generation, translation, summarization, speech recognition, image classification, and more. The platform is community-driven and allows users to contribute their own models, which facilitates a diverse and ever-growing selection.

  • Datasets: Hugging Face has a library of thousands of datasets that you can use to train, benchmark, and enhance your models. These range from small-scale benchmarks to massive, real-world datasets that encompass a variety of domains, such as text, image, and audio data. Like the Model Hub, 🤗 Datasets supports community contributions and provides the tools you need to search, download, and use data in your machine learning projects.

  • Spaces: Spaces allows you to deploy and share machine learning applications directly on the Hugging Face website. This service supports a variety of frameworks and interfaces, including Streamlit, Gradio, and Jupyter notebooks. It is particularly useful for showcasing model capabilities, hosting interactive demos, or for educational purposes, as it allows you to interact with models in real time.

  • Paid offerings: Hugging Face also offers several paid services for enterprises and advanced users. These include the Pro Account, the Enterprise Hub, and Inference Endpoints. These solutions offer private model hosting, advanced collaboration tools, and dedicated support to help organizations scale their AI operations effectively.

These resources empower you to accelerate your AI projects and encourage collaboration and innovation within the community. Whether you’re a novice looking to experiment with pretrained models, or an enterprise seeking robust AI solutions, Hugging Face offers tools and platforms that cater to a wide range of needs.

This tutorial focuses on Transformers, a Python library that lets you run just about any model in the Model Hub. Before using transformers, you’ll need to understand what model cards are, and that’s what you’ll do next.

Understanding Model Cards

Model cards are the core components of the Model Hub, and you’ll need to understand how to search and read them to use models in Transformers. Model cards are nothing more than files that accompany each model to provide useful information. You can search for the model card you’re looking for on the Models page:

Hugging Face Models page

On the left side of the Models page, you can search for model cards based on the task you’re interested in. For example, if you’re interested in zero-shot text classification, you can click the Zero-Shot Classification button under the Natural Language Processing section:

Hugging Face Models page filtered for zero-shot text classification models

In this search, you can see 266 different zero-shot text classification models. Zero-shot classification is a paradigm where language models assign labels to text without explicit training or seeing any examples. In the upper-right corner, you can sort the search results based on model likes, downloads, creation dates, updated dates, and popularity trends.
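To make that concrete, here's a minimal sketch of running one of these models through the Transformers pipeline API; the example text and candidate labels are made up for illustration:

from transformers import pipeline

# Downloads facebook/bart-large-mnli on first use, then runs it locally.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "I love using Python for data analysis!",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0])  # The highest-scoring label, e.g. "technology"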

Each model card button tells you the model’s task, when it was last updated, and how many downloads and likes it has. When you click a model card button, say the one for the facebook/bart-large-mnli model, the model card will open and display all of the model’s information:

A Hugging Face model card

Even though a model card can display just about anything, Hugging Face has outlined the information that a good model card should provide. This includes detailed information about the model, its uses and limitations, the training parameters and experiment details, the dataset used to train the model, and the model’s evaluation performance.

A high-quality model card also includes metadata such as the model’s license, references to the training data, and links to research papers that describe the model in detail. In some model cards, you’ll also get to tinker with a deployed instance of the model via the Inference API. You can see an example of this in the facebook/bart-large-mnli model card:

Tinker with Hugging Face models using the Inference API

Read the full article at https://realpython.com/huggingface-transformers/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets
