Planet Python

Planet Python - http://planetpython.org/

Real Python: Using Python for Data Analysis

Wed, 2024-01-17 09:00

Data analysis is a broad term that covers a wide range of techniques that enable you to reveal any insights and relationships that may exist within raw data. As you might expect, Python lends itself readily to data analysis. Once Python has analyzed your data, you can then use your findings to make good business decisions, improve procedures, and even make informed predictions based on what you’ve discovered.

In this tutorial, you’ll:

  • Understand the need for a sound data analysis workflow
  • Understand the different stages of a data analysis workflow
  • Learn how you can use Python for data analysis

Before you start, you should familiarize yourself with Jupyter Notebook, a popular tool for data analysis. Alternatively, JupyterLab will give you an enhanced notebook experience. You might also like to learn how a pandas DataFrame stores its data. Knowing the difference between a DataFrame and a pandas Series will also prove useful.
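If that distinction is new to you, here's a minimal illustration with made-up data (not the tutorial's dataset): a DataFrame is a two-dimensional table, and selecting one of its columns gives you a Series.

    import pandas as pd

    # A DataFrame is a two-dimensional table; each of its columns is a Series.
    df = pd.DataFrame({"title": ["A", "B"], "rating": [7.2, 6.8]})
    print(type(df))            # <class 'pandas.core.frame.DataFrame'>
    print(type(df["rating"]))  # <class 'pandas.core.series.Series'>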

Get Your Code: Click here to download the free data files and sample code for your mission into data analysis with Python.

In this tutorial, you’ll use a file named james_bond_data.csv. This is a doctored version of the free James Bond Movie Dataset. The james_bond_data.csv file contains a subset of the original data with some of the records altered to make them suitable for this tutorial. You’ll find it in the downloadable materials. Once you have your data file, you’re ready to begin your first mission into data analysis.

Understanding the Need for a Data Analysis Workflow

Data analysis is a very popular field and can involve performing many different tasks of varying complexity. Which specific analysis steps you perform will depend on which dataset you’re analyzing and what information you hope to glean. To overcome these scope and complexity issues, you need to take a strategic approach when performing your analysis. This is where a data analysis workflow can help you.

A data analysis workflow is a process that provides a set of steps for your analysis team to follow when analyzing data. The implementation of each of these steps will vary depending on the nature of your analysis, but following an agreed-upon workflow allows everyone involved to know what needs to happen and to see how the project is progressing.

Using a workflow also helps futureproof your analysis methodology. By following the defined set of steps, your efforts become systematic, which minimizes the possibility that you’ll make mistakes or miss something. Furthermore, when you carefully document your work, you can reapply your procedures against future data as it becomes available. Data analysis workflows therefore also provide repeatability and scalability.

There’s no single data workflow process that suits every analysis, nor is there universal terminology for the procedures used within it. To provide a structure for the rest of this tutorial, the diagram below illustrates the stages that you’ll commonly find in most workflows:

A Data Analysis Workflow

The solid arrows show the standard data analysis workflow that you’ll work through to learn what happens at each stage. The dashed arrows indicate where you may need to carry out some of the individual steps several times depending upon the success of your analysis. Indeed, you may even have to repeat the entire process should your first analysis reveal something interesting that demands further attention.

Now that you have an understanding of the need for a data analysis workflow, you’ll work through its steps and perform an analysis of movie data. The movies that you’ll analyze all relate to the British secret agent Bond … James Bond.

Setting Your Objectives

The very first workflow step in data analysis is to carefully but clearly define your objectives. It’s vitally important for you and your analysis team to be clear on what exactly you’re all trying to achieve. This step doesn’t involve any programming but is every bit as important because, without an understanding of where you want to go, you’re unlikely to ever get there.

The objectives of your data analysis will vary depending on what you’re analyzing. Your team leader may want to know why a new product hasn’t sold, or perhaps your government wants information about a clinical test of a new medical drug. You may even be asked to make investment recommendations based on the past results of a particular financial instrument. Regardless, you must still be clear on your objectives. These define your scope.

In this tutorial, you’ll gain experience in data analysis by having some fun with the James Bond movie dataset mentioned earlier. What are your objectives? Now pay attention, 007:

  • Is there any relationship between the Rotten Tomatoes ratings and those from IMDb?
  • Are there any insights to be gleaned from analyzing the lengths of the movies?
  • Is there a relationship between the number of enemies James Bond has killed and the user ratings of the movie in which they were killed?

Now that you’ve been briefed on your mission, it’s time to get out into the field and see what intelligence you can uncover.

Acquiring Your Data

Once you’ve established your objectives, your next step is to think about what data you’ll need to achieve them. Hopefully, this data will be readily available, but you may have to work hard to get it. You may need to extract it from the data storage systems within an organization or collect survey data. Regardless, you’ll somehow need to get the data.

In this case, you’re in luck. When your bosses briefed you on your objectives, they also gave you the data in the james_bond_data.csv file. You must now spend some time becoming familiar with what you have in front of you. During the briefing, you made some notes on the content of this file:

  • Release: The release date of the movie
  • Movie: The title of the movie
  • Bond: The actor playing the title role
  • Bond_Car_MFG: The manufacturer of James Bond’s car
  • US_Gross: The movie’s gross US earnings
  • World_Gross: The movie’s gross worldwide earnings
  • Budget ($ 000s): The movie’s budget, in thousands of US dollars
  • Film_Length: The running time of the movie
  • Avg_User_IMDB: The average user rating from IMDb
  • Avg_User_Rtn_Tom: The average user rating from Rotten Tomatoes
  • Martinis: The number of martinis that Bond drank in the movie

As you can see, you have quite a variety of data. You won’t need all of it to meet your objectives, but you can think more about this later. For now, you’ll concentrate on getting the data out of the file and into Python for cleansing and analysis.
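A minimal sketch of that first step might look like this, assuming the file sits in your working directory (the tutorial itself may load it slightly differently):

    import pandas as pd

    # Read the raw CSV into a DataFrame for cleansing and analysis.
    james_bond_data = pd.read_csv("james_bond_data.csv")

    # A first look at what you're dealing with.
    print(james_bond_data.head())
    james_bond_data.info()  # column names, dtypes, and non-null counts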

Read the full article at https://realpython.com/python-for-data-analysis/ »



Python Engineering at Microsoft: Join us for AI Chat App Hack from Jan. 29 – Feb. 12

Tue, 2024-01-16 19:21

Over the past six months, we’ve met hundreds of developers who are using Python to build AI chat apps for their own knowledge domains, using the RAG (Retrieval Augmented Generation) approach to send chunks of knowledge to an LLM along with the user question.

We’ve also heard from many developers that they’d like to learn how to build their own RAG chat apps, but they don’t know where to start. So we’re hosting a virtual hackathon to help you learn how to build your own RAG chat app with Python!


From January 29th to February 12th, we’ll host live streams showing you how to build on our most popular RAG chat sample repository, while also explaining the core concepts underlying all modern RAG chat apps. Live stream topics will include vector search, access control, and GPT-4 with vision. We’re hoping to get developers from all over the world involved, so we’ll also have live streams in Spanish, Portuguese, and Chinese. There will be prizes for the best chat apps, and even a prize for our most helpful community member.

To learn more, visit the AI Chat App Hack page, and follow the steps there to register and meet the community. Hope to see you there!

More RAG resources for Python developers

If you’re interested in learning more about RAG chat apps but can’t join the hack, here are some resources to get you started:



Python⇒Speed: Beware of misleading GPU vs CPU benchmarks

Tue, 2024-01-16 19:00

Do you use NumPy, Pandas, or scikit-learn and want to get faster results? Nvidia has created GPU-based replacements for each of these with the shared promise of extra speed.

For example, if you visit the front page of Nvidia’s RAPIDS project, you’ll see benchmarks showing cuDF, a GPU-based Pandas replacement, is 15× to 80× faster than Pandas!

Unfortunately, while those speed-ups are impressive, they are also misleading. GPU-based libraries might be the answer to your performance problems… or they might be an unnecessary and expensive distraction.

Read more...

Seth Michael Larson: Defending against the PyTorch supply chain attack PoC

Tue, 2024-01-16 19:00

Published 2024-01-17 by Seth Larson

This critical role would not be possible without funding from the OpenSSF Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

Last week there was a publication describing a proof-of-concept supply chain attack against PyTorch, using persistence in self-hosted GitHub runners, capturing tokens from triggerable jobs as a third-party contributor, and modifying workflows. This report was #1 on Hacker News for most of Sunday. In the comments of this publication there was a lot of discussion, with folks asking "how do you defend against this type of attack?"

Luckily for open source users, there are already techniques that can be used today to mitigate the downstream impact of a compromised dependency:

  • Using a lock file with pinned hashes like pip with --require-hashes, poetry.lock, or Pipfile.lock.
  • Reviewing diffs between currently pinned and new candidate releases. The diff must be of the installed artifacts, not using git tags or source repository information. Tools like diffoscope are useful for diffing wheel files, which are actually zip files in disguise (see the sketch after this list).
  • For larger organizations the cost of manual review can be amortized by mirroring PyPI and only updating dependencies that have been manually reviewed.
  • Binary or compiled dependencies can be compiled from source to ensure malicious code isn't hidden from human inspection.
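As a concrete illustration of the hash-pinning and wheel-inspection points above, here is a minimal Python sketch. The file name and pinned digest are hypothetical placeholders; in practice the digest comes from your hash-pinned lock file:

    import hashlib
    import zipfile

    # Hypothetical artifact and pinned digest (placeholders, not real values).
    WHEEL = "example_pkg-1.0.0-py3-none-any.whl"
    PINNED_SHA256 = "0" * 64  # in practice, taken from your lock file

    # Verify the downloaded artifact matches the digest that was reviewed.
    with open(WHEEL, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != PINNED_SHA256:
        raise SystemExit("sha256 mismatch: artifact differs from the reviewed one")

    # Wheels are zip archives, so their contents can be listed for inspection.
    with zipfile.ZipFile(WHEEL) as whl:
        for name in whl.namelist():
            print(name)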

These are tried-and-true methods to protect yourself and ensure dependencies aren't compromised regardless of what happens upstream. Obviously, the suggestions above take time and effort to implement. Generally, there's a desire from me and others to make the above steps easier for consumers, such as exposing build provenance for easier review of source code, or improving the overall safety of PyPI content using malware scanning and reporting.

Part of my plans for 2024 is to create guidance for Python open source consumers and maintainers for how to safely use packaging tools both from the perspective of supply chain integrity but also for vulnerabilities, builds, etc. So stay tuned for that!

CPython Software Bill-of-Materials update

Last week I published a draft for CPython's SBOM document specifically for the source tarballs in order to solicit feedback from consumers of SBOMs and developers of SBOM tooling. I received great feedback from Adolfo Garcia Veytia and Ritesh Noronha including the following points:

  • Strip version information from the fileName attribute
  • The top-level CPython component had no relationships to non-file components; it should have DEPENDS_ON relationships to all its dependent packages (see the sketch after this list).
  • Fix the formatting of the "Tool: " name and version. Correct format is {name}-{version}.
  • Use the fileName attribute on the CPython package instead of using a separate file component for the tarball containing CPython source code.
  • Include an email address for all "Person" identities.
  • Guidance on alternatives to the documentNamespace field.
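For readers unfamiliar with SPDX, a DEPENDS_ON relationship like the one mentioned above looks roughly like this in an SPDX 2.x JSON document, shown here as a Python dict (the element identifiers are made-up examples, not the actual IDs in CPython's SBOM):

    # Sketch of one SPDX relationship entry; the IDs are hypothetical.
    relationship = {
        "spdxElementId": "SPDXRef-cpython",
        "relationshipType": "DEPENDS_ON",
        "relatedSpdxElement": "SPDXRef-mpdecimal",
    }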

After applying this feedback we now have an SBOM which meets NTIA's Minimum Elements of an SBOM and scores 9.6 out of 10 for the SBOM Quality Score.

Next I'm working on the infrastructure for actually generating and making the SBOM available for consumers.

Other items
  • Reviewed PEP 740 proposal for arbitrary attestation mechanism for PyPI artifacts.
  • Triaged multiple reports to the Python Security Response Team.

That's all for this week! 👋 If you're interested in more you can read last week's report.

Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.

This work is licensed under CC BY-SA 4.0


PyCoder’s Weekly: Issue #612 (Jan. 16, 2024)

Tue, 2024-01-16 14:30

#612 – JANUARY 16, 2024

Exploring Python in Excel

Are you interested in using your Python skills within Excel? Would you like to share a data science project or visualization as a single Office file? This week on the show, we speak with Principal Architect John Lam and Sr. Cloud Developer Advocate Sarah Kaiser from Microsoft about Python in Excel.
REAL PYTHON podcast

Python 3.13 Gets a JIT

This article does a deeper dive into the JIT recently added to the CPython 3.13 pre-release. This JIT is a bit different: it’s a copy-and-patch JIT, and the post explains what that means.
ANTHONY SHAW

NumPy 2 Is Coming: Preventing Breakage, Updating Your Code

NumPy 2 is coming, and it’s backwards incompatible. Learn how to keep your code from breaking, and how to upgrade.
ITAMAR TURNER-TRAURING

Build Invincible Apps With Temporal’s Python SDK

Get an introduction to Temporal’s Python SDK by walking through our easy, free tutorials. Learn how to build Temporal applications using Python, including building a data pipeline Workflow and a subscription Workflow. Get started here →
TEMPORAL sponsor

PSF Says: EU’s Cyber Resilience Act Has Wins for Open Source

PYTHON SOFTWARE FOUNDATION

Articles & Tutorials Learn From 2023’s Most Popular Python Tutorials and Courses

Revisit your favorite Real Python tutorials and video courses from 2023. Explore various topics, from Python basics to web development, machine learning, and effective coding environments. It’s been a busy year of learning, and there’s something for everyone to discover and build upon in 2024.
REAL PYTHON

Python’s Array: Working With Numeric Data Efficiently

In this tutorial, you’ll dive deep into working with numeric arrays in Python, an efficient tool for handling binary data. Along the way, you’ll explore low-level data types exposed by the array module, emulate custom types, and even pass a Python array to C for high-performance processing.
REAL PYTHON
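For a taste of what the module offers (a minimal sketch, not taken from the tutorial):

    from array import array

    # "d" is the type code for C double; values are stored as raw machine
    # doubles rather than as individual Python float objects.
    values = array("d", [1.0, 2.0, 3.0])
    values.append(4.0)
    print(values[0], len(values))  # 1.0 4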

Data Deduplication in Python with RecordLinkage

Duplicate detection is a critical process in data preprocessing, especially when dealing with large datasets. In this tutorial, you will explore data deduplication using Python’s RecordLinkage package, paired with Pandas for data manipulation.
PATRYK SZLAGOWSKI • Shared by Izabela Pawlik

The Curious Case of Pydantic and the 1970s Timestamps

When parsing Unix timestamps, Pydantic guesses whether to interpret them in seconds or milliseconds. While this is certainly convenient and works most of the time, it can drastically (and silently) distort timestamps from a few decades ago.
ARIE BOVENBERG • Shared by Arie Bovenberg
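A brief illustration of that failure mode, assuming Pydantic v2's documented watershed of roughly 2×10^10 (above which a number is read as milliseconds rather than seconds):

    from datetime import datetime
    from pydantic import BaseModel

    class Event(BaseModel):
        when: datetime

    # 5_100_000_000 is meant as *milliseconds* (about 1970-03-01), but it is
    # below the watershed, so Pydantic reads it as *seconds*: roughly year 2131.
    print(Event(when=5_100_000_000).when)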

A Critical Supply Chain Attack on PyTorch

This post describes how coders found an exploit in the PyTorch supply chain, leaving the repo and its maintainers up for attack. Full details on what the vulnerability was and how to avoid the problem in your own repos is covered.
JOHN STAWINSKI

Comparing Coroutines, by Example, in Kotlin and Python

This series of three articles compares Python and Kotlin, with a focus on coroutines and generators. Through examples, it shows how coroutines are used in both languages to read files and perform network requests.
MEDIUM.COM • Shared by Carmen Alvarez

Enhance Your Flask Web Project With a Database

Adding a database to your Flask project elevates your web app to the next level. In this tutorial, you’ll learn how to connect your Flask app to a database and how to receive and store posts from users.
REAL PYTHON

SQLAlchemy vs Django ORM

If you are working with Django ORM most of the time and then switching to SQLAlchemy, you may face some unexpected behavior. This post describes the most important differences between them.
ALEXEY EVSEEV

Annotating *args and **kwargs in Python

“Typing *args and **kwargs has always been a pain since you couldn’t annotate them precisely before.” This article shows you what your options are when typing function signatures.
REDOWAN DELOWAR
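For instance, homogeneous annotations have long been possible, and PEP 692's Unpack now allows precise kwargs typing. A short sketch (Unpack needs Python 3.11+, or typing_extensions on older versions; checkers apply PEP 692 semantics from 3.12):

    from typing import TypedDict, Unpack  # Unpack: Python 3.11+

    class Options(TypedDict):
        retries: int
        timeout: float

    # Homogeneous: every positional argument must be a float.
    def average(*args: float) -> float:
        return sum(args) / len(args)

    # Precise: **kwargs must match the Options keys and types (PEP 692).
    def fetch(url: str, **kwargs: Unpack[Options]) -> None:
        print(url, kwargs)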

Python Gotcha: Modifying a List While Iterating

Python makes it easy to modify a list while you are iterating through its elements. This will bite you. Read on to find out how and what can be done about it.
ANDREW WEGNER
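The classic demonstration of the gotcha (a minimal sketch, not necessarily the article's exact example):

    nums = [1, 2, 2, 3]
    for n in nums:
        if n == 2:
            nums.remove(n)  # shifts the list under the iterator
    print(nums)  # [1, 2, 3] -- the second 2 was skipped, not removed

    # One common fix: iterate over a copy while mutating the original.
    nums = [1, 2, 2, 3]
    for n in nums.copy():
        if n == 2:
            nums.remove(n)
    print(nums)  # [1, 3]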

A Deep Dive Into Python’s functools.wraps Decorator

Take a deep dive into Python’s functools.wraps decorator to learn how it maintains metadata in your code. A concise guide to effective decorator use.
JACOB PADILLA
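In short, functools.wraps copies the wrapped function's metadata onto the wrapper (a minimal sketch):

    import functools

    def log_calls(func):
        @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            print(f"calling {func.__name__}")
            return func(*args, **kwargs)
        return wrapper

    @log_calls
    def greet(name):
        """Say hello."""
        return f"Hello, {name}!"

    print(greet("World"))  # calling greet / Hello, World!
    print(greet.__name__)  # 'greet', not 'wrapper', thanks to wraps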

max() is broken

The built-in function max in Python is broken and this article explains why, drawing parallels with other programming and mathematics concepts.
MATHSPP.COM • Shared by Rodrigo Girão Serrão
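One frequently cited sharp edge, which is not necessarily the article's whole argument: max() raises on an empty iterable unless you opt into a default.

    print(max([1, 5, 3]))      # 5
    print(max([], default=0))  # 0; without default=, this raises ValueError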

All PyCon 2023 (US and AU) Talks Sorted by the View Count

A full list of PyCon talks given in the US and Australia which are available on YouTube, and sorted by popularity.
SUBSTACK.COM

Projects & Code PikaPython: Python Interpreter in 4KB of RAM

GITHUB.COM/PIKASTECH

Fontimize: Optimize Fonts to the Glyphs on Your Site

GITHUB.COM/VINTAGEDAVE

instructor: Structured Outputs for LLMs

GITHUB.COM/JXNL

Pint: Units for Python

PYPI.ORG

Events

Weekly Real Python Office Hours Q&A (Virtual)

January 17, 2024
REALPYTHON.COM

PyData Bristol Meetup

January 18, 2024
MEETUP.COM

PyLadies Dublin

January 18, 2024
PYLADIES.COM

Chattanooga Python User Group

January 19 to January 20, 2024
MEETUP.COM

IndyPy: Models & AI For Dummies (Hybrid)

January 23, 2024
MEETUP.COM • Shared by Laura Stephens

Happy Pythoning!
This was PyCoder’s Weekly Issue #612.



Real Python: Create a Tic-Tac-Toe Python Game Engine With an AI Player

Tue, 2024-01-16 09:00

A classic childhood game is tic-tac-toe, also known as noughts and crosses. It’s simple and enjoyable, and coding a version of it with Python is an exciting project for a budding programmer. Now, adding some artificial intelligence (AI) using Python can make an old favorite even more thrilling.

In this comprehensive tutorial, you’ll construct a flexible game engine. This engine will include an unbeatable computer player that employs the minimax algorithm to play tic-tac-toe flawlessly. Throughout the tutorial, you’ll explore concepts such as immutable class design, generic plug-in architecture, and modern Python coding practices and patterns.

In this video course, you’ll learn how to:

  • Develop a reusable Python library containing the tic-tac-toe game engine
  • Create a Pythonic code style that accurately models the tic-tac-toe domain
  • Implement various artificial players, including one using the powerful minimax algorithm
  • Construct a text-based console front end for the game, enabling human players to participate
  • Discover effective strategies for optimizing performance

Are you ready to embark on this step-by-step adventure of building an extensible game engine with an unbeatable AI player using the minimax algorithm?
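To give a flavor of the core idea, here is a self-contained minimax sketch for tic-tac-toe. It is deliberately unoptimized and is not the course's engine, just an illustration of the algorithm:

    # Board: a list of 9 cells, each "X", "O", or " ".
    WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
            (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def winner(board):
        for a, b, c in WINS:
            if board[a] != " " and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def minimax(board, player):
        """Return (score, move) from X's perspective; X maximizes."""
        win = winner(board)
        if win:
            return (1 if win == "X" else -1), None
        moves = [i for i, cell in enumerate(board) if cell == " "]
        if not moves:
            return 0, None  # draw
        best_score, best_move = None, None
        for i in moves:
            board[i] = player  # try the move...
            score, _ = minimax(board, "O" if player == "X" else "X")
            board[i] = " "     # ...then undo it
            if (best_score is None
                    or (player == "X" and score > best_score)
                    or (player == "O" and score < best_score)):
                best_score, best_move = score, i
        return best_score, best_move

    score, move = minimax([" "] * 9, "X")
    print(score, move)  # score 0: perfect play from both sides is a draw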



Python People: Pamela Fox - Teaching Python, Accessibility, and Tools

Tue, 2024-01-16 09:00

Pamela Fox is a Python Cloud Developer Advocate at Microsoft. 


Topics include:

  • Girl Develop It
  • Django Girls
  • Girls Who Code
  • Teaching a language vs teaching a tool
  • What a dev advocate does
  • Accessibility (A11y) testing
  • Playwright
  • axe-core
  • Snapshot testing
  • pytest plugin authoring
  • Flask SQLAlchemy
  • Relearning Go

Links from the show:

  • Python Bytes 323 with Pamela: AI search wars have begun (https://pythonbytes.fm/episodes/show/323/ai-search-wars-have-begun)
  • Python Test 199 with Pamela: Is Azure Right for a Side Project? (https://podcast.pythontest.com/episodes/199-is-azure-right-for-a-side-project)
  • gdi: Girl Develop It (https://girldevelopit.com)
  • Django Girls (https://djangogirls.org/en/)
  • Girls Who Code (https://girlswhocode.com)
  • "Automated accessibility audits" - Pamela Fox (North Bay Python 2023) (https://www.youtube.com/watch?v=J-4Qa6PSomM)
  • Playwright (https://playwright.dev)
  • axe-core (https://github.com/dequelabs/axe-core)
  • pytest-axe-playwright-snapshot, plugin from Pamela (https://github.com/pamelafox/pytest-axe-playwright-snapshot)
  • pytest-crayons, a plugin from a PyCascades talk about building plugins (https://www.youtube.com/watch?v=kevcz8NRcQU)
  • pytest-check, yet another plugin (https://github.com/okken/pytest-check)
  • Flask-SQLAlchemy (https://flask-sqlalchemy.palletsprojects.com/en/3.1.x/)
  • Concurrency is not Parallelism by Rob Pike (https://www.youtube.com/watch?v=oV9rvDllKEg)

The Complete pytest Course

  • Level up your testing skills and save time during coding and maintenance.
  • Check out courses.pythontest.com (https://courses.pythontest.com/p/complete-pytest-course)

★ Support this podcast on Patreon ★ (https://www.patreon.com/PythonPeople)

Python Bytes: #367 A New Cloud Computing Paradigm at Python Bytes

Tue, 2024-01-16 03:00
Topics covered in this episode:

  • Leaving the cloud (https://world.hey.com/dhh/we-have-left-the-cloud-251760fb)
  • PEP 723 - Inline script metadata (https://peps.python.org/pep-0723/)
  • Flet for Android (https://flet.dev/blog/flet-for-android)
  • harlequin: The SQL IDE for Your Terminal (https://github.com/tconbeer/harlequin)
  • Extras
  • Joke

Watch on YouTube: https://www.youtube.com/watch?v=qjl95MJwW1A

About the show

Sponsored by Bright Data: pythonbytes.fm/brightdata

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org
  • Brian: @brianokken@fosstodon.org
  • Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Michael #1: Leaving the cloud

  • Also see Five values guiding our cloud exit (https://world.hey.com/dhh/five-values-guiding-our-cloud-exit-638add47):
    • We value independence above all else.
    • We serve the internet.
    • We spend our money wisely.
    • We lead the way.
    • We seek adventure.
  • And We stand to save $7m over five years from our cloud exit (https://world.hey.com/dhh/we-stand-to-save-7m-over-five-years-from-our-cloud-exit-53996caa)
  • Slice our new monster 192-thread Dell R7625s into isolated VMs
  • Which added a combined 4,000 vCPUs with 7,680 GB of RAM and 384TB of NVMe storage to our server capacity (https://world.hey.com/dhh/the-hardware-we-need-for-our-cloud-exit-has-arrived-99d66966)
  • They created Kamal (https://kamal-deploy.org): deploy web apps anywhere
  • A lot of these ideas have changed how I run the infrastructure at Talk Python and for Python Bytes.

Brian #2: PEP 723 - Inline script metadata (https://peps.python.org/pep-0723/)

  • Author: Ofek Lev
  • This PEP specifies a metadata format that can be embedded in single-file Python scripts to assist launchers, IDEs and other external tools which may need to interact with such scripts.
  • Example:

    # /// script
    # requires-python = ">=3.11"
    # dependencies = [
    #     "requests<3",
    #     "rich",
    # ]
    # ///
    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])

Michael #3: Flet for Android (https://flet.dev/blog/flet-for-android)

  • via Balázs
  • Remember Flet? (https://talkpython.fm/episodes/show/378/flet-flutter-apps-in-python)
  • Here’s a code sample (https://flet.dev/docs/guides/python/drag-and-drop) (scroll down a bit).
  • It’s amazing but has been basically impossible to deploy.
  • Now we have Android.
  • Here’s a good YouTube video showing the build process for APKs (https://www.youtube.com/watch?v=Hj09tFCdjSw).

Brian #4: harlequin: The SQL IDE for Your Terminal (https://github.com/tconbeer/harlequin)

  • Ted Conbeer & other contributors
  • Works with DuckDB and SQLite
  • Speaking of SQLite:
    • Jeff Triplett and warnings of using Docker and SQLite in production (https://mastodon.social/@webology/111766195410833730)
    • Anže’s post (https://blog.pecar.me/)
    • and an article: Django, SQLite, and the Database is Locked Error (https://blog.pecar.me/django-sqlite-dblock)

Extras

Brian:

  • Recent Python People episodes (https://pythonpeople.fm):
    • Will Vincent
    • Julian Sequeira
    • Pamela Fox

Michael:

  • PageFind and how I’m using it (https://fosstodon.org/@mkennedy/111637520985150159)
  • When "Everything" Becomes Too Much: The npm Package Chaos of 2024 (https://socket.dev/blog/when-everything-becomes-too-much?utm_source=tldrnewsletter)
  • Essay: Unsolicited Advice for Mozilla and Firefox (https://mkennedy.codes/posts/michael-kennedys-unsolicited-advice-for-mozilla-and-firefox/)
  • SciPy 2024 is coming to Washington (https://fosstodon.org/@matthewfeickert/111763520503201675)

Joke: Careful with that bike lock combination code (https://trello.com/1/cards/655ef44fcc1657159ad4102c/attachments/655ef452b9b27b86253285c2/download/1700711828998blob.jpg)

Seth Michael Larson: urllib3 is fundraising for HTTP/2 support

Mon, 2024-01-15 19:00

Published 2024-01-16 by Seth Larson

TLDR: urllib3 is raising ~$40,000 USD to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023.

What is urllib3?

urllib3 is an HTTP client library for Python and is depended on by widely used projects like pip, Requests, major cloud and service provider SDKs, and more. urllib3 is one of the most used Python packages overall, installed over 4 billion times in 2023 with 1.5 million dependent repos on GitHub, up 50% from just last year.
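If you've never used it directly, the v2.x API is small. A minimal sketch (the top-level request() helper was added in v2.0):

    import urllib3

    # One-off request without managing a PoolManager yourself.
    resp = urllib3.request("GET", "https://example.com")
    print(resp.status)                    # e.g. 200
    print(resp.headers["Content-Type"])   # e.g. text/html; charset=UTF-8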

Project update

2023 was a transformative year for urllib3, headlined by the first stable release of v2.0 after multiple years of development by our maintainers and community. This major release is only the beginning of our plans to overhaul the library’s capabilities by removing constraints on our HTTP implementation while preserving backwards compatibility.

We’ve been able to accomplish this incredible work in 2023 thanks to financial support from Tidelift, the Spotify 2022 FOSS Fund, and our other sponsors which allowed us to offer bounties on tasks to fairly compensate maintainers and contributors for their time investments with the project.

Unfortunately, compared to past years we’ve experienced a sharp drop in financial support from non-Tidelift sources heading into 2024.

Non-Tidelift funding by year:

  • 2019: $18,580
  • 2020: $100*
  • 2021: $9,950
  • 2022: $14,493
  • 2023: $2,330

* December 2020 was the first time we offered ad-hoc financial support via GitHub Sponsors. Before this we only accepted grants for funding.

Our team has worked hard to set the stage for HTTP/2 support with urllib3 v2.0, and we plan to land HTTP/2 support without compromising on the sustainability of the project. Backwards-compatible HTTP/2 support in urllib3 would immediately benefit millions of users, among them the largest companies in the world, and requires adding more long-term maintenance burden to maintainers. This important work and its maintenance should not be uncompensated.

To ensure timely and sustainable development of HTTP/2 for urllib3 we're launching a fundraiser with a goal of raising our Open Collective balance to $50,000 USD. HTTP/2 support has just started being developed and we're hoping to release stable support once our fundraising goal has been reached. Donations to Open Collective directly or to platforms like GitHub Sponsors or Thanks.dev will all be counted towards this fundraising goal.

Our team has a long track record of using our financial resources to complete larger projects like secure URL parsing, TLS 1.3, modernizing our test suite framework, and finding security issues across multiple projects. All receipts are published publicly on our Open Collective with links to the work items being accomplished and blogged about by our maintainers. If you or your organization has questions about this fundraiser please email sethmichaellarson@gmail.com or ask in our community Discord.

There’s more information below about the work we’ve done so far for HTTP/2 support and what else we plan to do in 2024 during our fundraiser. Thanks for supporting open source software!

Funding update

urllib3 received $17,830 USD in financial support in 2023 from all sources and distributed $24,350 to contributors and maintainers. Our primary supporter continues to be Tidelift, who provided $15,500 to core maintainers Seth, Quentin, and Illia.

We distributed $1,800 to community contributors through our bounty program, less than last year but still a sizable amount. We are looking to leverage our bounty program more in 2024 to implement HTTP/2 and WebAssembly features.

Our Open Collective started the year with nearly $19,000 USD and ended the year with $12,179. This statistic clearly shows the gap in funding, comparing this year's fundraising of $2,330 to the average across 4 prior years of over $10,000 per year.

The flow of funds for 2023 (from the flow diagram in the original post):

  • 2022 OC Balance → Open Collective: $18,932
  • Tidelift → Tidelift Lifters: $15,500
  • Open Collective → 2023 OC Balance: $12,179
  • Tidelift → Tidelift Partnerships*: $12,000
  • Tidelift Partnerships* → Seth Larson: $12,000
  • Tidelift Lifters → Seth Larson: $6,904
  • Tidelift Lifters → Quentin Pradet: $6,603
  • Open Collective → Illia Volochii: $3,275
  • Open Collective → Quentin Pradet: $2,325
  • Tidelift Lifters → Illia Volochii: $1,993
  • Open Collective → Bounty Program: $1,800
  • Open Collective → Seth Larson: $1,450
  • GitHub Sponsors → Open Collective: $1,346
  • Sourcegraph → Open Collective: $600
  • Thanks.dev → Open Collective: $379
  • Open Collective → OSC Host Fees: $233
  • Donations → Open Collective: $5

* Seth Larson was also paid $7,000 by Tidelift for a packaging security standards project and $5,000 as a part of their "lifter advocate" program. Neither of these projects are directly related to urllib3 but are listed for completeness.

Maintenance update

2023 marks the 15th anniversary of urllib3 being first published to PyPI! 🥳 Not many open source projects stand the test of time and continue to see the widespread usage that urllib3 does every day. We attribute our longevity to quickly elevating contributors from our community into project maintainers which we believe is a critical property of a sustainable open source project. Financial rewards through our bounty program is a crucial piece of our approach to staying sustainable for the long-term.

This year we welcomed a new core maintainer to our team, Illia Volochii! 🎉 Illia has been putting in high quality and consistent work to get v2.0 out the door. Illia started contributing to urllib3 in 2022 and after landing multiple high-quality pull requests was asked to join the team of collaborators and begin reviewing PRs and issues and helping with the release process.

After adding Illia we now have three core maintainers including Seth Larson and Quentin Pradet, in addition to multiple collaborators and community contributors.

We landed 160 commits from 13 unique contributors during 2023, which is up from ~130 commits during 2022. We published 16 releases to PyPI in 2023, up from 8 in 2022.

From a security perspective, we continue to lead the pack for Python packages in terms of implementing security standards. urllib3 is the highest rated project according to OpenSSF Scorecard with a score of 9.6 out of 10 overall. We also were an early adopter of Trusted Publishers, adopting the new feature days after they were announced during PyCon US 2023.

We remediated two moderate-severity vulnerabilities in 2023 and made the fixes available in both the new v2.0 and security-fix only v1.26.x release streams. Support for the previous major version of urllib3 is provided thanks to funding from Tidelift.

Support for HTTP/2

When you first read this post you might have thought:

“Hasn't HTTP/2 been around for a long time?” 🤔

And you'd be right! HTTP/2 was published in 2015 in RFC 7540 and is now used for the majority of web requests. HTTP/2 has been around for so long that there's already an HTTP/3!

So why are we only just now starting to add support for HTTP/2 to urllib3? The reason is that the standard library module http.client only supports HTTP/1 and before urllib3 v2.0 was released urllib3 was strongly tied to http.client APIs. By breaking backwards compatibility in a few key ways (while maintaining compatibility where it matters for most users) we've been able to set the stage for adding HTTP/2 to urllib3! 🚀

urllib3 is in good company: many of Python's stable HTTP clients, like Requests (which uses urllib3 under the hood), aiohttp, and httplib2, don't support HTTP/2.

Even though we're waiting to release HTTP/2 support until after our fundraiser concludes, we aren't waiting to get started. Our team has already started some of the required prep-work to implement HTTP/2. Want to follow along? We have a top-level tracking issue for HTTP/2 support on GitHub.

Over the past two months Quentin has migrated our test suite from the venerable Tornado web backend to using the Hypercorn server and Quart microframework. Our test application communicates with the server using ASGI, which is perfect for our use-case: low-level enough to satisfy the needs of the test suite and high-level enough to abstract the differences between HTTP/1 and HTTP/2. Now that the test suite runs with both HTTP/1 and HTTP/2, we can start developing HTTP/2 with an extensive initial battery of test cases.
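For the curious, this is roughly what such a backend looks like (a minimal sketch, not urllib3's actual test suite): a tiny Quart app that Hypercorn can serve over either HTTP/1.1 or HTTP/2.

    from quart import Quart

    app = Quart(__name__)

    @app.route("/")
    async def index() -> str:
        return "hello"

    # Run with, e.g.:
    #   hypercorn --certfile cert.pem --keyfile key.pem app:app
    # (clients typically negotiate HTTP/2 over TLS via ALPN)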

Support for WebAssembly and Emscripten

When PyScript was first announced at PyCon US 2022 during a keynote by Peter Wang, Seth was sitting front row to witness Python moving to the web. Later that same day in the PyScript open space there were experiments for making HTTP requests with urllib3 and Pyodide together using a synchronous call to the JavaScript fetch() API. At the time, despite having assistance from PyScript maintainers, there didn't seem to be a way forward yet.

Fast-forward to today, the pyodide-http project has figured out how to make a synchronous or streaming HTTP exchange using the fetch() and XMLHttpRequest JavaScript APIs along with Web Workers. Now that a synchronous approach to HTTP requests was possible, we could add support to urllib3!

Thanks to Joe Marshall, urllib3 now has experimental support for the Emscripten platform, complete with bundling a small JavaScript stub for Web Worker support and testing against Chrome and Firefox in our CI. What's next is to thoroughly test and document the feature. We're aiming to release stable Emscripten support for urllib3 in 2024.

The most exciting part of this is that once a core dependency like urllib3 has been made compatible with Emscripten we'll likely see a wave of other packages that immediately become compatible too, bringing even more of the Python package ecosystem to the web 🥳

Stable release of urllib3 v2.0

urllib3 had its first stable v2.0 release in April 2023, followed by the v2.1.0 release, which removed many long-deprecated features: the [secure] extra, which had become redundant with new improvements to the ssl standard library module, and the urllib3.contrib.securetransport module, which was once needed on macOS because no OpenSSL library was available on the platform to perform HTTPS requests to PyPI.

This release also put the project in a good place for future improvements like those discussed above. The biggest blocker to adopting new HTTP implementations was vestigial APIs from urllib3 primarily subclassing the standard library's http.client module (or, for Python 2, httplib).

By removing and discouraging these implicit APIs we're better able to adopt alternate HTTP implementations such as the h2 library for HTTP/2 and JavaScript's fetch API for Emscripten.

Increasing adoption of urllib3 v2.x

The initial adoption of urllib3 v2.x was lower than expected, due to the following factors:

  • By default, RedHat Enterprise Linux 7 (RHEL 7), AWS Lambda, Amazon Linux 2 and Read the Docs were all compiling the ssl module with OpenSSL 1.0.2. While botocore still pinned urllib3 to 1.26.x, Amazon Linux 2 was more popular than we expected and many users were not pinning or resolving their dependencies correctly and thus were receiving an incompatible version of urllib3.
  • Various third-party packages like docker-py, requests-toolbelt, and vcrpy were relying on implementation details of urllib3 that were deprecated or removed in v2.0, so they couldn't upgrade right away.
  • And finally, we intentionally removed the strict parameter from HTTPResponse which had no effect since Python 3. This affected only a few users.

After a few weeks, we had around 3 million daily downloads for v2.0. That's a lot of downloads, but it only accounted for 30% of 1.26.x downloads at the time, without any obvious upward trend. The only exception was Read the Docs, which encouraged users to move to Ubuntu 22.04 and Python 3.11 shortly after the urllib3 2.0 release. To avoid a prolonged split in the ecosystem, we took various actions to help users migrate to 2.x:

Our friend and Requests maintainer, Nate Prewitt, allowed urllib3 v2.0 for Python 3.10+ users of botocore. This work on Requests inspired snowflake-connector-python to follow suit.

Today, most popular libraries support urllib3 2.0 and later, at least with Python 3.10 and above. And the libraries that don't support it yet get requests from users. urllib3 2.x is reliably above 70% of 1.26.x downloads and growing. Additionally, Python 3.10+ users already download 2.x more than 1.26.x, making us confident that the ecosystem split will eventually disappear in favor of the newest major version of urllib3.

👋 That's all for now, if you want to discuss this article you can join our community Discord. Please share this article to help spread the word of our fundraiser and coming HTTP/2 support.

Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.

This work is licensed under CC BY-SA 4.0


Chris Warrick: Python Packaging, One Year Later: A Look Back at 2023 in Python Packaging

Mon, 2024-01-15 13:50

A year ago, I wrote about the sad state of Python packaging. The large number of tools in the space, the emphasis on writing vague standards instead of rallying around the One True Tool, and the complicated venv-based ecosystem instead of a solution similar to node_modules. What has changed in the past year? Has anything improved, is everything the same, or are things worse than they were before?

The tools

The original post listed a bunch of packaging tools, calling fourteen tools at least twelve too many. My idea with that was that most people would be happy with one tool that does everything, but the scientific-Python folks might have special requirements that would work best as a second tool.

Out of the tools named in last year’s post, all of them still seem to be maintained. Except for Flit (zero new commits in the past 30 days) and virtualenv (only automated and semi-automated version bumps), the tools have recent commits, pull requests, and issues.

All of those tools are still in use. Françoise Conil analysed all PyPI packages and checked their PEP 517 build backends: setuptools is the most popular (at 50k packages), Poetry is second at 41k, Hatchling is third at 8.1k. Other tools used by more than 500 packages include Flit (4.4k), PDM (1.3k), and Maturin (1.3k, the build backend for Rust-based packages).

There are some new tools, of course. Those that crossed my radar are Posy and Rye. Posy is a project of Nathaniel J. Smith (of trio fame), Rye is a project of Armin Ronacher (of Flask fame). The vision for both of them is to manage Python interpreters and projects, but not have a custom build backend (instead using something like hatchling). Posy is built on top of PyBI (a format for distributing binaries of Python interpreters, proposed by Smith in draft PEP 711), Rye uses Gregory Szorc’s pre-built Pythons. Rye seems to be fairly complete and usable, Posy is right now a PoC of the PyBI format, and only offers a REPL with pre-installed packages.

Both Posy and Rye are written in Rust. On the one hand, it makes sense that the part that manages Python interpreters is not written in Python, because that would require a separate Python, not managed by Posy/Rye, to run those tools. But Rye also has its own pyproject.toml parser in Rust, and many of its commands are implemented mostly or largely using Rust (sometimes also calling one-off Python scripts; although the main tasks of creating venvs, installing packages, and working with lockfiles are handed off to venv, pip, and pip-tools respectively).

Speaking of Rust and Python, there’s been another project in that vein that has grown a lot (and gathered a lot of funding) in the past year. That project is Ruff, which is a linter and code formatter. Ruff formats Python code, and is written in Rust. This means it’s 10–100× faster than existing tools written in Python (according to Ruff’s own benchmarks). Fast is good, I guess, but what does this say about Python? What does it mean when packaging tools (which aren’t rocket science, maybe except for fast dependency solvers, and which often need access to Python internals to do their job) and code formatters (which require a deep understanding of Python syntax, and parsing Python sources to ASTs, something made easy by the ast module) are written in another language? Does this trend make Python a toy language (as it is also often considered a glue language for NumPy and friends)? Also, why should contributing to a tool important to many Python developers require learning Rust?

The standards

Last time we looked at packaging standards, we focused on PEP 582. It proposed the introduction of __pypackages__, which would be a place for third-party packages to be installed to locally, on a per-project basis, without involving virtual environments, similarly to what node_modules is for node. The PEP was ultimately rejected in March 2023. The PEP wasn’t perfect, and some of its choices were questionable or insufficient (such as not recursively searching for __pypackages__ in parent directories, or focusing on simple use-cases only). No new standards for something in that vein (with a better design) were proposed to this day.

Another contentious topic is lock files. Lock files for packaging systems are useful for reproducible dependency installations. The lock file records all installed packages (i.e. includes transitive dependencies) and their versions. Lock files often include checksums (like sha512) of the installed packages, and they often support telling apart packages installed via different groups of dependencies (runtime, buildtime, optional, development, etc.).

The classic way of achieving this goal are requirements.txt files. They are specific to pip, and they only contain a list of packages, versions, and possibly checksums. Those files can be generated by pip freeze, or the third-party pip-compile from pip-tools. pip freeze is very basic, pip-compile can’t handle different groups of dependencies other than making multiple requirements.in files, compiling them, and hoping there are no conflicts.

Pipenv, Poetry, and PDM have their own lockfile implementations, incompatible with one another. Rye piggybacks on top of pip-tools. Hatch doesn’t have anything in core; they’re waiting for a standard implementation (there are some plugins though). PEP 665 was rejected in January 2022. Its author, Brett Cannon, is working on a PoC of something that might become a standard (named mousebender).

This is the danger of the working model adopted by the Python packaging world. Even for something as simple as lock files, there are at least four incompatible formats. An attempt at a specification was rejected due to “lukewarm reception”, even though there exist at least four implementations achieving roughly the same goals, and other ecosystems also went through this before.

Another thing important to Python are extension modules. Extension modules are written in C, and they are usually used to interact with libraries written in other languages (and also sometimes for performance). Poetry, PDM, and Hatchling don’t really support building extension modules. Setuptools does; SciPy and NumPy migrated from their custom numpy.distutils to Meson. The team behind the PyO3 Rust bindings for Python develops Maturin, which allows for building Rust-based extension modules — but it’s not useful if you’re working with C.

There weren’t many packaging-related standards that were accepted in 2023. A standard worth mentioning is PEP 668, which allows distributors to prevent pip from working (to avoid breaking distro-owned site packages) by adding an EXTERNALLY-MANAGED file. It was accepted in June 2022, but pip only implemented support for it in January 2023, and many distros already have enabled this feature in 2023. Preventing broken systems is a good thing.

But some standards did make it through. Minor and small ones aside, the most prominent 2023 standard would be PEP 723: inline script metadata. It allows you to add a comment block at the top of a file that specifies the dependencies and the minimum Python version in a way that can be consumed by tools. Is it super useful? I don’t think so; setting up a project with pyproject.toml would easily allow things to grow. If you’re sending something via a GitHub gist, just make a repo. If you’re sending something by e-mail, just tar the folder. That approach promotes messy programming without source control.

Learning curves and the deception of “simple”

Microsoft Word is simple, and a great beginner’s writing tool. You can make text bold with a single click. You can also make it blue in two clicks. But it’s easy to make an inconsistent mess. To make section headers, many users may just make the text bold and a bit bigger, without any consistency or semantics [1]. Making a consistent document with semantic formatting is hard in Word. Adding section numbering requires you to select a heading and turn it into a list. There’s also supposedly some magic involved; that magic doesn’t work for me, and I have to tell Word to update the heading style. Even if you try doing things nicely, Word will randomly break, mess up the styles, mix up styles and inline ad-hoc formatting, and your document may look different on different computers.

LaTeX is very confusing to a beginner, and has a massive learning curve. And you can certainly write \textbf{hello} everywhere. But with some learning, you’ll be producing beautiful documents. You’ll define a \code{} command that makes code monospace and adds a border today, but it might change the background and typeset in Comic Sans tomorrow if you so desire. You’ll use packages that can render code from external files with syntax highlighting. Heading numbering is on by default, but it can easily be disabled for a section. LaTeX can also automatically put new sections on new pages, for example. LaTeX was built for scientific publishing, so it has stellar support for maths and bibliographies, among other things.

Let’s now talk about programming. Python is simple, and a great beginner’s programming language. You can write hello world in a single line of code. The syntax is simpler, there are no confusing leftovers from C (like the index-based for loop) or machine-level code (like break in switch), no pointers in sight. You also don’t need to write classes at all; you don’t need to write a class only to put a public static void main(String[] args) method there [2]. You don’t need an IDE, you can just write code using any editor (even notepad.exe will do for the first day or so), you can save it as a .py file and run it using python whatever.py.

Your code got more complicated? No worry, you can split it into multiple .py files, use import name_of_other_file_without_py and it will just work. Do you need more structure, grouping into folders perhaps? Well, forget about python whatever.py, you must use python -m whatever, and you must cd to where your code is, or mess with PYTHONPATH, or install your thing with pip. This simple yet common action (grouping things into folders) has massively increased complexity.

The standard library is not enough [3] and you need a third-party dependency? You find some tutorial that tells you to pip install, but pip will now tell you to use apt. And apt may work, but it may give you an ancient version that does not match the tutorial you’re reading. Or it may not have the package. Or the Internet will tell you not to use Python packages from apt. So now you need to learn about venvs (which add more complexity, more things to remember; most tutorials teach activation, venvs are easy to mess up via basic operations like renaming a folder, and you may end up with a venv in git or your code in a venv). Or you need to pick one of the many one-stop-shop tools to manage things.

In other ecosystems, an IDE is often a necessity, even for beginners. The IDE will force you into a project system (maybe not the best or most common one by default, but it will still be a coherent project system). Java will force you to make more than one file with the “1 public class = 1 file” rule, and it will be easy to do so, you won’t even need an import.

Do you want folders? In Java or C#, you just create a folder in the IDE, and create a class there. The new file may have a different package/namespace, but the IDE will help you to add the correct import/using to the codebase, and there is no risk of you using too many directories (including something like src) or using too few (not making a top-level package for all your code) that will require correcting all imports. The disruption from adding a folder in Java or C# is minimal.

The project system will also handle third-party packages without you needing to think about where they’re downloaded or what a virtual environment is and how to activate it from different contexts. A few clicks and you’re done. And if you don’t like IDEs? Living in the CLI is certainly possible in many ecosystems, they have reasonable CLI tools for common management tasks, as well as building and running your project.

PEP 723 solves a very niche problem: dependency management for single-file programs. Improving life for one-off things and messy code was apparently more important to the packaging community than any other improvements for big projects.

By the way, you could adapt this lesson to static and dynamic typing. Dynamic typing is easier to get started with and requires less typing, but compile-time checking can prevent many bugs — bugs that require higher test coverage to catch with dynamic typing. That’s why the JS world has TypeScript, that’s why mypy/pyright/typing has gained a lot of mindshare in the Python world.
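A two-line illustration of the kind of bug a checker catches (a minimal sketch):

    def double(x: int) -> int:
        return x * 2

    result = double("oops")  # runs fine at runtime (returns "oopsoops"),
                             # but mypy/pyright reject the str argument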

The future…

Looking at the Python Packaging Discourse, there were some discussions about ways to improve things.

For example, this discussion about porting off setup.py was started by Gregory Szorc, who had a long list of complaints, pointing out the issues with the communication from the packaging world, and documentation mess (his post is worth a read, or at least a skim, because it’s long and full of packaging failures). There’s one page which recommends setuptools, another which has four options with Hatchling as a default, and another still promoting Pipenv. We’ve seen this a year ago, nothing changed in that regard. Some people tried finding solutions, some people shared their opinions… and then the Discourse moderator decided to protect his PyPA friends from having to read user feedback and locked the thread.

Many other threads about visions were had, like the one about 10-year views or about singular packaging tools. The strategy discussions, based on the user survey, had a second part (the first one concluded in January 2023), but it saw fewer posts than the first one, and discussions did not continue (and there were discussions about how to hold the discussions). There are plans to create a packaging council — design-by-committee at its finest.

But all those discussions, even when not locked by an overzealous moderator, haven’t had any meaningful effect. The packaging ecosystem is still severely fragmented and confusing. The PyPA docs and tutorials still contradict each other. The PyPA-affiliated tools still have fewer features than the unaffiliated competition (even the upstart Rye has some form of lockfiles, unlike Hatch or Flit), and going by the PEP 517 build backend usage statistics, the unaffiliated tools are more popular than the modern PyPA ones. The authors of similar yet competing tools have not joined forces to produce the One True Packaging Tool.

…is looking pretty bleak

On the other hand, if you look at the 2023 contribution graphs for most packaging tools, you might be worried about the state of the packaging ecosystem.

  • Pip has had a healthy mix of contributors and a lot of commits going into it.

  • Pipenv and setuptools each have two lead committers, but still a healthy number of commits.

  • Hatch, however, is a one-man show: Ofek Lev (the project founder) made 184 commits, second place belongs to Dependabot with 6 commits, and the third-place contributor (who is a human) has 5 commits. The bus factor of Hatch and Hatchling is 1.

The non-PyPA tools aren’t doing much better:

  • Poetry has two top contributors, but at least there are four human contributors with a double-digit number of commits.

  • PDM is a one-man show, like Hatch.

  • Rye has one main contributor, and three with a double-digit number of commits; note it’s pretty new (started in late April 2023) and it’s not as popular as the others.

Conclusion

I understand the PyPA is a loose association of volunteers. It is sometimes said the name Python Packaging Authority was originally a joke. However, they are also the group that maintains all the packaging standards, so they are the authority when it comes to packaging. For example, PEP 668 starts with a warning block saying it’s a historical document, and the up-to-date version of the specification is on PyPA’s site (as well as a bunch of other packaging specs).

The PyPA should shut down or merge some duplicate projects, and work with the community (including maintainers of non-PyPA projects) to build the One True Packaging Tool. To make things easier. To avoid writing code that does largely the same thing five times. To make sure thousands of projects don’t depend on tools with a bus factor of 1 or 2. To turn packaging from a problem and an insurmountable obstacle into something that just works™, something that an average developer doesn’t need to think about.

It’s not rocket science. Tons of languages, big and small, have a coherent packaging ecosystem (just read last year’s post for some examples of how simple it can be). Instead of focusing on specifications and governance, focus on producing one comprehensive, usable, user-friendly tool.

Discuss below or on Hacker News.

Footnotes [1]

Modern Word at least makes this easier, because the heading styles get top billing on the ribbon; they were hidden behind a completely non-obvious combo box that said Normal in Word 2003 and older.

[2]

C# 10 removed the requirement to make a class with a Main method; the compiler can pick up one file with top-level statements and make it the entrypoint.

[3]

The Python standard library gets a lot of praise. It is large compared to C’s, but nothing special compared to Java’s or C#’s. It is also full of low-quality libraries, like http.server or urllib.request, yet some people insist on only using the standard library. The standard library is also less stable and dependable (with constant deprecations and removals, and with new features requiring an upgrade of all of Python). All the “serious” use cases, like web development or ML/AI/data science, are impossible with just the standard library.

Categories: FLOSS Project Planets

TechBeamers Python: Create a Full-fledged LangChain App – A ChatBot

Mon, 2024-01-15 13:37

In this tutorial, we have provided the basic code to create the LangChain chatbot app. You’ll find a comprehensive example, instructions, and guidance to help you. Also Read: Introduction to LangChain – Use With Python LangChain ChatBot App Here’s a detailed example of a chatbot that generates poems based on user-provided prompts. Firstly, let’s try […]

The post Create a Full-fledged LangChain App – A ChatBot appeared first on TechBeamers.

Categories: FLOSS Project Planets

Django Weblog: DjangoCon Europe 2025 Call for Proposals

Mon, 2024-01-15 11:14

DjangoCon Europe 2024 will be held June 5th-9th in Vigo, Spain but we're already looking ahead to the 2025 conference. Could your town - or your football stadium, circus tent, private island or city hall - host this wonderful community event?

Hosting a DjangoCon is an ambitious undertaking. It's hard work, but each year it has been successfully run by a team of community volunteers, not all of whom have had previous experience - more important is enthusiasm, organizational skills, the ability to plan and manage budgets, time and people - and plenty of time to invest in the project.

How to apply

We've set up a working group of previous DjangoCon Europe organizers that you can reach out to with questions about organizing and running a DjangoCon Europe: european-organizers-support@djangoproject.com. There will also be an informational session set up towards the end of January or early February for interested organizers. Please email the working group to express interest in participating.

In order to give people the chance to go to many different conferences, DjangoCon Europe should be held between January 5 and April 15, 2025. Please read the licensing agreement the selected organizers will need to sign for the specific requirements around hosting a DjangoCon Europe.

If you're interested, we'd love to hear from you. This year we are going to do rolling reviews of applications, in order to hopefully give more time and certainty to the selected proposal to start planning. The board will begin evaluating proposals on February 20th. The selection will be made at any time between February 20th and May 31st. The DSF Board will communicate when a selection has been made and the application process is complete. If you are interested in organizing, it is in your best interest to get a good proposal in early.

Following the established tradition, the selected hosts will be publicly announced at this year's DjangoCon Europe by the current organizers.

The more detailed and complete your proposal, the better. Things you should consider, and that we'd like to know about, are:

  • dates: ideally between early January and mid-April 2025
  • numbers of attendees
  • venue(s)
  • accommodation
  • transport links
  • budgets and ticket prices
  • committee members

We'd like to see:

  • timelines
  • pictures
  • prices
  • draft agreements with providers
  • alternatives you have considered

Email your proposals to djangocon-europe-2025-proposals at djangoproject dot com. We look forward to reviewing great proposals that continue the excellence the whole community associates with DjangoCon Europe.

Categories: FLOSS Project Planets

Real Python: Inheritance and Composition: A Python OOP Guide

Mon, 2024-01-15 09:00

In this tutorial, you’ll explore inheritance and composition in Python. Inheritance and composition are two important concepts in object-oriented programming that model the relationship between two classes. They’re the building blocks of object-oriented design, and they help programmers to write reusable code.

By the end of this tutorial, you’ll know how to:

  • Use inheritance in Python
  • Model class hierarchies using inheritance
  • Use multiple inheritance in Python and understand its drawbacks
  • Use composition to create complex objects
  • Reuse existing code by applying composition
  • Change application behavior at runtime through composition

Get Your Code: Click here to get the free sample code that shows you how to use inheritance and composition in Python.

What Are Inheritance and Composition?

Inheritance and composition are two major concepts in object-oriented programming that model the relationship between two classes. They drive the design of an application and determine how the application should evolve as new features are added or requirements change.

Both of them enable code reuse, but they do it in different ways.

What’s Inheritance?

Inheritance models what’s called an is a relationship. This means that when you have a Derived class that inherits from a Base class, you’ve created a relationship where Derived is a specialized version of Base.

Inheritance is represented using the Unified Modeling Language, or UML, in the following way:

This model represents classes as boxes with the class name on top. It represents the inheritance relationship with an arrow from the derived class pointing to the base class. The word extends is usually added to the arrow.

Note: In an inheritance relationship:

  • Classes that inherit from another are called derived classes, subclasses, or subtypes.
  • Classes from which other classes are derived are called base classes or super classes.
  • A derived class is said to derive, inherit, or extend a base class.

Say you have the base class Animal, and you derive from it to create a Horse class. The inheritance relationship states that Horse is an Animal. This means that Horse inherits the interface and implementation of Animal, and you can use Horse objects to replace Animal objects in the application.

This is known as the Liskov substitution principle. The principle states that if S is a subtype of T, then replacing objects of type T with objects of type S doesn’t change the program’s behavior.
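As a minimal sketch of that relationship in code (the class and method names here are illustrative, not part of the tutorial's materials):

class Animal:
    def speak(self):
        return "..."

class Horse(Animal):  # Horse *is an* Animal
    def speak(self):
        return "neigh"

def describe(animal):
    # Any code written against Animal also accepts a Horse,
    # which is exactly the substitution the principle asks for.
    return animal.speak()

describe(Horse())  # "neigh"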

You’ll see in this tutorial why you should always follow the Liskov substitution principle when creating your class hierarchies, and you’ll learn about the problems that you’ll run into if you don’t.

What’s Composition?

Composition is a concept that models a has a relationship. It enables creating complex types by combining objects of other types. This means that a class Composite can contain an object of another class Component. This relationship means that a Composite has a Component.

UML represents composition as follows:

The model represents composition through a line that starts with a diamond at the composite class and points to the component class. The composite side can express the cardinality of the relationship. The cardinality indicates the number or the valid range of Component instances that the Composite class will contain.

In the diagram above, the 1 represents that the Composite class contains one object of type Component. You can express cardinality in the following ways:

  • A number indicates the number of Component instances that Composite contains.
  • The * symbol indicates that the Composite class can contain a variable number of Component instances.
  • A range 1..4 indicates that the Composite class can contain a range of Component instances. You indicate the range with the minimum and maximum number of instances, or with a minimum and unlimited instances, as in 1..*.

Note: Classes that contain objects of other classes are usually referred to as composites, while classes that are used to create more complex types are referred to as components.

For example, your Horse class can be composed by another object of type Tail. Composition allows you to express that relationship by saying Horse has a Tail.
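A matching minimal sketch of composition (again, illustrative names):

class Tail:
    def swish(self):
        return "swish"

class Horse:
    def __init__(self):
        self.tail = Tail()  # Horse *has a* Tail

Horse().tail.swish()  # "swish"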

Read the full article at https://realpython.com/inheritance-composition-python/ »


Categories: FLOSS Project Planets

PyPy: PyPy v7.3.15 release

Mon, 2024-01-15 07:22
PyPy v7.3.15: release of python 2.7, 3.9, and 3.10

The PyPy team is proud to release version 7.3.15 of PyPy.

This is primarily a bug-fix release, and includes work done to migrate PyPy to Git and Github.

The release includes three different interpreters:

  • PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.18+ (the + is for backported security updates)

  • PyPy3.9, which is an interpreter supporting the syntax and the features of Python 3.9, including the stdlib for CPython 3.9.18.

  • PyPy3.10, which is an interpreter supporting the syntax and the features of Python 3.10, including the stdlib for CPython 3.10.13.

The interpreters are based on much the same codebase, thus the multiple releases. This is a micro release; all APIs are compatible with the other 7.3 releases. It follows the 7.3.14 release on Dec 25, 2023.

We recommend updating. You can find links to download the v7.3.15 releases here:

https://pypy.org/download.html

We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for direct consulting work. If PyPy is helping you out, we would love to hear about it and encourage submissions to our blog via a pull request to https://github.com/pypy/pypy.org

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: bug fixes, PyPy and RPython documentation improvements, or general help with making RPython's JIT even better.

If you are a python library maintainer and use C-extensions, please consider making a HPy / CFFI / cppyy version of your library that would be performant on PyPy. In any case, both cibuildwheel and the multibuild system support building wheels for PyPy.

What is PyPy?

PyPy is a Python interpreter, a drop-in replacement for CPython. It's fast (PyPy and CPython 3.7.4 performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

We provide binary builds for:

  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)

  • 64-bit ARM machines running Linux (aarch64).

  • Apple M1 arm64 machines (macos_arm64).

  • s390x running Linux

PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, and Linux ARM 32 bit, but does not release binaries for them. Please reach out to us if you wish to sponsor binary releases for those platforms. Downstream packagers provide binary builds for debian, Fedora, conda, OpenBSD, FreeBSD, Gentoo, and more.

What else is new?

For more information about the 7.3.15 release, see the full changelog.

Please update, and continue to help us make pypy better.

Cheers, The PyPy Team

Categories: FLOSS Project Planets

PyCharm: Join the Livestream: “Python, Django, PyCharm, and More”

Mon, 2024-01-15 05:44

Join us for the new PyCharm Livestream episode to learn about everything new in the world of Python on January 25 at 4:00 pm UTC.

We will be chatting with Helen Scott, Jodie Burchell, Sarah Boyce, Mukul Mantosh, and Paul Everitt. Among other things, we’ll be talking about Python 3.12, Django 5.0, and PyCharm 2023.3. We’ll highlight some of the features we’re most excited about, and what we’re tracking in the future.

It’s an excellent opportunity to put faces to names and share your thoughts. If you have something you’re excited about in the Python world or you want to share your latest data science project with us, we’re looking forward to hearing about it!

Join the livestream

Date: January 25, 2024

Time: 4:00 pm UTC (5:00 pm CET)

Categories: FLOSS Project Planets

Zato Blog: Network packet brokers and automation in Python

Mon, 2024-01-15 03:00
Network packet brokers and automation in Python 2024-01-15, by Dariusz Suchojad

Packet brokers are crucial for network engineers, providing a clear, detailed view of network traffic, aiding in efficient issue identification and resolution.

But what is a network packet broker (NPB) really? Why are they needed? And how can you automate one in Python?

Read this article about network packet brokers and their automation in Python to find out more.

Next steps
  • Click here to read more about using Python and Zato in telecommunications
  • Start the tutorial which will guide you how to design and build Python API services for automation and integrations
More blog posts
Categories: FLOSS Project Planets

Python GUIs: Plotting With PyQtGraph — Create Custom Plots in PyQt with PyQtGraph

Mon, 2024-01-15 01:00

One of the major fields where Python shines is in data science. For data exploration and cleaning, Python has many powerful tools, such as pandas and polars. For visualization, Python has Matplotlib.

When you're building GUI applications with PyQt, you can have access to all those tools directly from within your app. While it is possible to embed matplotlib plots in PyQt, the experience doesn't feel entirely native. So, for highly integrated plots, you may want to consider using the PyQtGraph library instead.

PyQtGraph is built on top of Qt's native QGraphicsScene, so it gives better drawing performance, particularly for live data. It also provides interactivity and the ability to customize plots according to your needs.

In this tutorial, you'll learn the basics of creating plots with PyQtGraph. You'll also explore the different plot customization options, including background color, line colors, line type, axis labels, and more.

Table of Contents Installing PyQtGraph

To use PyQtGraph with PyQt, you first need to install the library in your Python environment. You can do this using pip as follows:

$ python -m pip install pyqtgraph

Once the installation is complete, you will be able to import the module into your Python code. So, now you are ready to start creating plots.

Creating a PlotWidget Instance

In PyQtGraph, all plots use the PlotWidget class. This widget provides a canvas on which we can add and configure many types of plots. Under the hood, PlotWidget uses Qt's QGraphicsScene class, meaning that it's fast, efficient, and well-integrated with the rest of your app.

The code below shows a basic GUI app with a single PlotWidget in a QMainWindow:

import pyqtgraph as pg
from PyQt5 import QtWidgets


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        # Temperature vs time plot
        self.plot_graph = pg.PlotWidget()
        self.setCentralWidget(self.plot_graph)
        time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        temperature = [30, 32, 34, 32, 33, 31, 29, 32, 35, 30]
        self.plot_graph.plot(time, temperature)


app = QtWidgets.QApplication([])
main = MainWindow()
main.show()
app.exec()

In this short example, you create a PyQt app with a PlotWidget as its central widget. Then you create two lists of sample data for time and temperature. The final step to create the plot is to call the plot() method with the data you want to visualize.

The first argument to plot() will be your x coordinate, while the second argument will be the y coordinate.

In all the examples in this tutorial, we import PyQtGraph using import pyqtgraph as pg. This is a common practice in PyQtGraph examples to keep things tidy and reduce typing.

If you run the above application, then you'll get the following window on your screen:

Basic PyQtGraph plot: Temperature vs time.

PyQtGraph's default plot style is quite basic — a black background with a thin (barely visible) white line. Fortunately, the library provides several options that will allow us to deeply customize our plots.

In the examples in this tutorial, we'll create the PyQtGraph widget in code. To learn how to embed PyQtGraph plots when using Qt Designer, check out Embedding custom widgets from Qt Designer.

In the following section, we'll learn about the options we have available in PyQtGraph to improve the appearance and usability of our plots.

Customizing PyQtGraph Plots

Because PyQtGraph uses Qt's QGraphicsScene to render the graphs, we have access to all the standard Qt line and shape styling options for use in plots. PyQtGraph provides an API for using these options to draw plots and manage the plot canvas.

Below, we'll explore the most common styling features that you'll need to create and customize your own plots with PyQtGraph.

Background Color

Beginning with the app skeleton above, we can change the background color by calling setBackground() on our PlotWidget instance, self.plot_graph. The code below sets the background to white by passing in the string "w":

import pyqtgraph as pg
from PyQt5 import QtWidgets


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        # Temperature vs time plot
        self.plot_graph = pg.PlotWidget()
        self.setCentralWidget(self.plot_graph)
        self.plot_graph.setBackground("w")
        time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        temperature = [30, 32, 34, 32, 33, 31, 29, 32, 35, 45]
        self.plot_graph.plot(time, temperature)


app = QtWidgets.QApplication([])
main = MainWindow()
main.show()
app.exec()

Calling setBackground() with "w" as an argument changes the background of your plot to white, as you can see in the following window:

PyQtGraph plot with a white background.

There are a number of colors available using single letters, as we did in the example above. They're based on the standard colors used in Matplotlib. Here are the most common codes:

Letter Code   Color
"b"           Blue
"c"           Cyan
"d"           Grey
"g"           Green
"k"           Black
"m"           Magenta
"r"           Red
"w"           White
"y"           Yellow

In addition to these single-letter codes, we can create custom colors using the hexadecimal notation as a string:

python self.plot_graph.setBackground("#bbccaa") # Hex

We can also use RGB and RGBA values passed in as 3-value and 4-value tuples, respectively. We must use values in the range from 0 to 255:

self.plot_graph.setBackground((100, 50, 255))      # RGB, each 0-255
self.plot_graph.setBackground((100, 50, 255, 25))  # RGBA (A = alpha opacity)

The first call to setBackground() takes a tuple representing an RGB color, while the second call takes a tuple representing an RGBA color.

We can also specify colors using Qt's QColor class if we prefer it:

from PyQt5 import QtGui

# ...
self.plot_graph.setBackground(QtGui.QColor(100, 50, 254, 25))

Using QColor can be useful when you're using specific QColor objects elsewhere in your application and want to reuse them in your plots. For example, say that your app has a custom window background color, and you want to use it in the plots as well. Then you can do something like the following:

color = self.palette().color(QtGui.QPalette.Window)

# ...
self.plot_graph.setBackground(color)

In the first line, you get the GUI's background color, while in the second line, you use that color for your plots.

Line Color, Width, and Style

Plot lines in PyQtGraph are drawn using the Qt QPen class. This gives us full control over line drawing, as we would have in any other QGraphicsScene drawing. To use a custom pen, you need to create a new QPen instance and pass it into the plot() method.

In the app below, we use a custom QPen object to change the line color to red:

from PyQt5 import QtWidgets
import pyqtgraph as pg


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        # Temperature vs time plot
        self.plot_graph = pg.PlotWidget()
        self.setCentralWidget(self.plot_graph)
        self.plot_graph.setBackground("w")
        pen = pg.mkPen(color=(255, 0, 0))
        time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        temperature = [30, 32, 34, 32, 33, 31, 29, 32, 35, 45]
        self.plot_graph.plot(time, temperature, pen=pen)


app = QtWidgets.QApplication([])
main = MainWindow()
main.show()
app.exec()

Here, we create a QPen object, passing in a 3-value tuple that defines an RGB red color. We could also define this color with the "r" code or with a QColor object. Then, we pass the pen to plot() with the pen argument.

PyQtGraph plot with a red plot line.

By tweaking the QPen object, we can change the appearance of the line. For example, you can change the line width in pixels and the style (dashed, dotted, etc.), using Qt's line styles.

Update the following lines of code in your app to create a red, dashed line with 5 pixels of width:

from PyQt5 import QtCore, QtWidgets

# ...
pen = pg.mkPen(color=(255, 0, 0), width=5, style=QtCore.Qt.DashLine)

The result of this code is shown below, giving a 5-pixel, dashed, red line:

PyQtGraph plot with a red, dashed, and 5-pixel line

You can use all of Qt's other line styles, including Qt.SolidLine, Qt.DotLine, Qt.DashDotLine, and Qt.DashDotDotLine. Examples of each of these lines are shown in the image below:

Qt's line styles.

To learn more about Qt's line styles, check the documentation about pen styles. There, you'll find all you need to deeply customize the lines in your PyQtGraph plots.

Line Markers

For many plots, it can be helpful to use point markers in addition to, or instead of, lines on the plot. To draw a marker on your plot, pass the symbol you want to use as a marker when calling plot(). The following example uses the plus sign as a marker:

self.plot_graph.plot(time, temperature, symbol="+")

In this line of code, you pass a plus sign to the symbol argument. This tells PyQtGraph to use that symbol as a marker for the points in your plot.

If you use a custom symbol, then you can also use the symbolSize, symbolBrush, and symbolPen arguments to further customize the marker.

The value passed as symbolBrush can be any color, or QBrush instance, while symbolPen can be any color or a QPen instance. The pen is used to draw the shape, while the brush is used for the fill.

Go ahead and update your app's code to use a blue marker of size 15, on a red line:

pen = pg.mkPen(color=(255, 0, 0))
self.plot_graph.plot(
    time,
    temperature,
    pen=pen,
    symbol="+",
    symbolSize=15,
    symbolBrush="b",
)

In this code, you pass a plus sign to the symbol argument. You also customize the marker size and color. The resulting plot looks something like this:

PyQtGraph plot with a plus sign as a point marker.

In addition to the + plot marker, PyQtGraph supports the markers shown in the table below:

Character       Marker Shape
"o"             Circle
"s"             Square
"t"             Triangle
"d"             Diamond
"+"             Plus
"t1"            Triangle pointing upwards
"t2"            Triangle pointing right side
"t3"            Triangle pointing left side
"p"             Pentagon
"h"             Hexagon
"star"          Star
"x"             Cross
"arrow_up"      Arrow Up
"arrow_right"   Arrow Right
"arrow_down"    Arrow Down
"arrow_left"    Arrow Left
"crosshair"     Crosshair

You can use any of these symbols as markers for your data points. If you have more specific marker requirements, then you can also use a QPainterPath object, which allows you to draw completely custom marker shapes.
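As a rough sketch of that, assuming a QPainterPath is accepted wherever a symbol string is (PyQtGraph draws symbols in a roughly unit-sized box centered on each data point, scaled by symbolSize):

from PyQt5 import QtGui

# Hypothetical custom marker: a small hourglass shape.
hourglass = QtGui.QPainterPath()
hourglass.moveTo(-0.5, -0.5)
hourglass.lineTo(0.5, -0.5)
hourglass.lineTo(-0.5, 0.5)
hourglass.lineTo(0.5, 0.5)
hourglass.closeSubpath()

self.plot_graph.plot(
    time, temperature, symbol=hourglass, symbolSize=20, symbolBrush="b"
)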

Plot Titles

Plot titles are important to provide context around what is shown on a given chart. In PyQtGraph, you can add a main plot title using the setTitle() method on the PlotWidget object. Your title can be a regular Python string:

python self.plot_graph.setTitle("Temperature vs Time")

You can style your titles and change their font color and size by passing additional arguments to setTitle(). The code below sets the color to blue and the font size to 20 points:

python self.plot_graph.setTitle("Temperature vs Time", color="b", size="20pt")

In this line of code, you set the title's font color to blue and the size to 20 points using the color and size arguments of setTitle().

You can even use CSS styles and basic HTML tag syntax if you prefer, although it's less readable:

self.plot_graph.setTitle(
    '<span style="color: blue; font-size: 20pt">Temperature vs Time</span>'
)

In this case, you use a span HTML tag to wrap the title and apply some CSS styles on top of it. The final result is the same as using the color and size arguments. Your plot will look like this:

PyQtGraph plot with title.

Your plot looks way better now. You can continue customizing it by adding informative labels to both axes.

Axis Labels

When it comes to axis labels, we can use the setLabel() method to create them. This method requires two arguments, position and text.

python self.plot_graph.setLabel("left", "Temperature (°C)") self.plot_graph.setLabel("bottom", "Time (min)")

The position argument can be any one of "left", "right", "top", or "bottom". They define the position of the axis on which the text is placed. The second argument, text, is the text you want to use for the label.

You can pass an optional style argument into the setLabel() method. In this case, you need to use valid CSS name-value pairs. To provide these CSS pairs, you can use a dictionary:

python styles = {"color": "red", "font-size": "18px"} self.plot_graph.setLabel("left", "Temperature (°C)", **styles) self.plot_graph.setLabel("bottom", "Time (min)", **styles)

Here, you first create a dictionary containing CSS pairs. Then you pass this dictionary as an argument to the setLabel() method. Note that you need to use the dictionary unpacking operator to unpack the styles in the method call.

Again, you can use basic HTML syntax and CSS for the labels if you prefer:

self.plot_graph.setLabel(
    "left",
    '<span style="color: red; font-size: 18px">Temperature (°C)</span>',
)
self.plot_graph.setLabel(
    "bottom",
    '<span style="color: red; font-size: 18px">Time (min)</span>',
)

This time, you've passed the styles in a span HTML tag with appropriate CSS styles. In either case, your plot will look something like this:

PyQtGraph plot with axis labels.

Having axis labels greatly improves the readability of your plots, as you can see in the above example. So, it's a good practice to keep in mind when creating your plots.

Plot Legends

In addition to the axis labels and the plot title, you will often want to show a legend identifying what a given line represents. This feature is particularly important when you start adding multiple lines to a plot.

You can add a legend to a plot by calling the addLegend() method on the PlotWidget object. However, for this method to work, you need to provide a name for each line when calling plot().

The example below assigns the name "Temperature Sensor" to the plot() method. This name will be used to identify the line in the legend:

self.plot_graph.addLegend()

# ...
self.plot_graph.plot(
    time,
    temperature,
    name="Temperature Sensor",
    pen=pen,
    symbol="+",
    symbolSize=15,
    symbolBrush="b",
)

Note that you must call addLegend() before you call plot() for the legend to show up. Otherwise, the plot won't show the legend at all. Now your plot will look like the following:

PyQtGraph plot with legend.

The legend appears in the top left by default. If you would like to move it, you can drag and drop the legend elsewhere. You can also specify a default offset by passing a 2-value tuple to the offset parameter when calling the addLegend() method. This will allow you to specify a custom position for the legend.
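For example, this one-liner (a minimal sketch; the pixel values are arbitrary) would pin the legend near the top-left corner, 10 pixels in from each edge:

self.plot_graph.addLegend(offset=(10, 10))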

Background Grid

Adding a background grid can make your plots easier to read, particularly when you're trying to compare relative values against each other. You can turn on the background grid for your plot by calling the showGrid() method on your PlotWidget instance. The method takes two Boolean arguments, x and y:

self.plot_graph.showGrid(x=True, y=True)

In this call to the showGrid() method, you enable the grid lines in both dimensions x and y. Here's how the plot looks now:

PyQtGraph plot with grid.

You can toggle the x and y arguments independently, according to the dimension on which you want to enable the grid lines.

Axis Range

Sometimes, it can be useful to predefine the range of values that is visible on the plot or to lock the axis to a consistent range regardless of the data input. In PyQtGraph, you can do this using the setXRange() and setYRange() methods. They force the plot to only show data within the specified ranges.

Below, we set two ranges, one on each axis. The first argument is the minimum value, and the second is the maximum:

self.plot_graph.setXRange(1, 10)
self.plot_graph.setYRange(20, 40)

The first line of code sets the x-axis to show values between 1 and 10. The second line sets the y-axis to display values between 20 and 40. Here's how this changes the plot:

PyQtGraph plot with axis ranges

Now your plot looks more consistent. The axes show fixed scales that are specifically set for the expected range of input data.

Multiple Plot Lines

It is common to have plots that involve more than one dependent variable. In PyQtGraph, you can plot multiple variables in a single chart by calling .plot() multiple times on the same PlotWidget instance.

In the following example, we plot temperature values from two different sensors. We use the same line style but change the line color. To avoid code repetition, we define a new plot_line() method on our window:

from PyQt5 import QtWidgets
import pyqtgraph as pg


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        # Temperature vs time plot
        self.plot_graph = pg.PlotWidget()
        self.setCentralWidget(self.plot_graph)
        self.plot_graph.setBackground("w")
        self.plot_graph.setTitle("Temperature vs Time", color="b", size="20pt")
        styles = {"color": "red", "font-size": "18px"}
        self.plot_graph.setLabel("left", "Temperature (°C)", **styles)
        self.plot_graph.setLabel("bottom", "Time (min)", **styles)
        self.plot_graph.addLegend()
        self.plot_graph.showGrid(x=True, y=True)
        self.plot_graph.setXRange(1, 10)
        self.plot_graph.setYRange(20, 40)
        time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        temperature_1 = [30, 32, 34, 32, 33, 31, 29, 32, 35, 30]
        temperature_2 = [32, 35, 40, 22, 38, 32, 27, 38, 32, 38]
        pen = pg.mkPen(color=(255, 0, 0))
        self.plot_line("Temperature Sensor 1", time, temperature_1, pen, "b")
        pen = pg.mkPen(color=(0, 0, 255))
        self.plot_line("Temperature Sensor 2", time, temperature_2, pen, "r")

    def plot_line(self, name, time, temperature, pen, brush):
        self.plot_graph.plot(
            time,
            temperature,
            name=name,
            pen=pen,
            symbol="+",
            symbolSize=15,
            symbolBrush=brush,
        )


app = QtWidgets.QApplication([])
main = MainWindow()
main.show()
app.exec()

The custom plot_line() method on the main window does the hard work. It accepts a name to set the line name for the plot legend. Then it takes the time and temperature arguments. The pen and brush arguments allow you to tweak other features of the lines.

To plot separate temperature values, we create a new list called temperature_2 and populate it with random numbers similar to our old temperature, which is now temperature_1. Here's how the plot looks now:

PyQtGraph plot with two lines.

You can play around with the plot_line() method, customizing the markers, line widths, colors, and other parameters.

Creating Dynamic Plots

You can also create dynamic plots with PyQtGraph. The PlotWidget can take new data and update the plot in real time without affecting other elements. To update a plot dynamically, we need a reference to the line object that the plot() method returns.

Once we have the reference to the plot line, we can call the setData() method on the line object to apply the new data. In the example below, we've adapted our temperature vs time plot to accept new temperature measures every minute. Note that we've set the timer to 300 milliseconds so that we don't have to wait an entire minute to see the updates:

from random import randint

import pyqtgraph as pg
from PyQt5 import QtCore, QtWidgets


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        # Temperature vs time dynamic plot
        self.plot_graph = pg.PlotWidget()
        self.setCentralWidget(self.plot_graph)
        self.plot_graph.setBackground("w")
        pen = pg.mkPen(color=(255, 0, 0))
        self.plot_graph.setTitle("Temperature vs Time", color="b", size="20pt")
        styles = {"color": "red", "font-size": "18px"}
        self.plot_graph.setLabel("left", "Temperature (°C)", **styles)
        self.plot_graph.setLabel("bottom", "Time (min)", **styles)
        self.plot_graph.addLegend()
        self.plot_graph.showGrid(x=True, y=True)
        self.plot_graph.setYRange(20, 40)
        self.time = list(range(10))
        self.temperature = [randint(20, 40) for _ in range(10)]
        # Get a line reference
        self.line = self.plot_graph.plot(
            self.time,
            self.temperature,
            name="Temperature Sensor",
            pen=pen,
            symbol="+",
            symbolSize=15,
            symbolBrush="b",
        )
        # Add a timer to simulate new temperature measurements
        self.timer = QtCore.QTimer()
        self.timer.setInterval(300)
        self.timer.timeout.connect(self.update_plot)
        self.timer.start()

    def update_plot(self):
        self.time = self.time[1:]
        self.time.append(self.time[-1] + 1)
        self.temperature = self.temperature[1:]
        self.temperature.append(randint(20, 40))
        self.line.setData(self.time, self.temperature)


app = QtWidgets.QApplication([])
main = MainWindow()
main.show()
app.exec()

The first step to creating a dynamic plot is to get a reference to the plot line. In this example, we've used a QTimer object to set the measuring interval. We've connected the update_plot() method with the timer's timeout signal.

The update_plot() method does the work of updating the data at every interval. If you run the app, then you will see a plot with random data scrolling to the left:

The time scale in the x-axis changes as the stream of data provides new values. You can replace the random data with your own real data. You can take the data from a live sensor readout, API, or from any other stream of data. PyQtGraph is performant enough to support multiple simultaneous dynamic plots using this technique.

Conclusion

In this tutorial, you've learned how to draw basic plots with PyQtGraph and customize plot components, such as lines, markers, titles, axis labels, and more. For a complete overview of PyQtGraph methods and capabilities, see the PyQtGraph documentation. The PyQtGraph repository on Github also has a complete set of plot examples.

Categories: FLOSS Project Planets

Chris Moffitt: Introduction to Polars

Sun, 2024-01-14 17:25
Introduction

It’s been a while since I’ve posted anything on the blog. One of the primary reasons for the hiatus is that I have been using python and pandas but not to do anything very new or different.

In order to shake things up and hopefully get back into the blog a bit, I’m going to write about polars. This article assumes you know how to use pandas and are interested in determining if polars can fit into your workflow. I will cover some basic polars concepts that should get you started on your journey.

Along the way I will point out some of the things I liked and some of the differences that might limit your usage of polars if you're coming from pandas.

Ultimately, I do like polars and what it is trying to do. I’m not ready to throw out all my pandas code and move over to polars. However, I can see where polars could fit into my toolkit and provide some performance and capability that is missing from pandas.

As you evaluate the choice for yourself, it is important to try other frameworks and tools and evaluate them on their merits as they apply to your needs. Even if you decide polars doesn’t meet your needs it is good to evaluate options and learn along the way. Hopefully this article will get you started down that path.

Polars

As mentioned above, pandas has been the data analysis tool for python for the past few years. Wes McKinney started the initial work on pandas in 2008 and the 1.0 release was in January 2020. Pandas has been around a long time and will continue to be.

While pandas is great, it has its warts. Wes McKinney wrote about several of these challenges. There are many other criticisms online but most boil down to two items: performance and an awkward/complex API.

Polars was initially developed by Richie Vink to solve these issues. His 2021 blog post does a thorough job of laying out metrics to back up his claims on the performance improvements and the underlying design that leads to these benefits in polars.

The user guide concisely lays out the polars philosophy:

The goal of Polars is to provide a lightning fast DataFrame library that:

  • Utilizes all available cores on your machine.
  • Optimizes queries to reduce unneeded work/memory allocations.
  • Handles datasets much larger than your available RAM.
  • Has an API that is consistent and predictable.
  • Has a strict schema (data-types should be known before running the query).

Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine.

As such Polars goes to great lengths to:

  • Reduce redundant copies.
  • Traverse memory cache efficiently.
  • Minimize contention in parallelism.
  • Process data in chunks.
  • Reuse memory allocations.

Clearly performance is an important goal in the development of polars and key reason why you might consider using polars.

This article won’t discuss performance but will focus on the polars API. The main reason is that for the type of work I do, the data easily fits in RAM on a business-class laptop. The data will fit in Excel but it is slow and inefficient on a standard computer. I rarely find myself waiting on pandas once I have read in the data and have done basic data pre-processing.

Of course performance matters but it’s not everything. If you’re trying to make a choice between pandas, polars or other tools don’t make a choice based on general notions of “performance improvement” but based on what works for your specific needs.

Getting started

For this article, I’ll be using data from an earlier post which you can find on github.

I would recommend following the latest polars installation instructions in the user guide.

I chose to install polars with all of the dependencies:

python -m pip install polars[all]

Once installed, reading the downloaded Excel file is straightforward:

import polars as pl

df = pl.read_excel(
    source="2018_Sales_Total_v2.xlsx",
    schema_overrides={"date": pl.Datetime},
)

When I read this specific file, I found that the date column did not come through as a DateTime type, so I used the schema_overrides argument to make sure the data was properly typed.

Since data typing is so important, here’s one quick way to check on it:

df.schema

OrderedDict([('account number', Int64),
             ('name', Utf8),
             ('sku', Utf8),
             ('quantity', Int64),
             ('unit price', Float64),
             ('ext price', Float64),
             ('date', Datetime(time_unit='us', time_zone=None))])

A lot of the standard pandas commands such as head, tail, and describe work as expected, with a little extra output sprinkled in:

df.head()
df.describe()

The polars output has a couple of notable features:

  • The shape is included which is useful to make sure you’re not dropping rows or columns inadvertently
  • Underneath each column name is a data type which is another useful reminder
  • There are no index numbers
  • The string columns include quotation marks (" ") around the values

Overall, I like this output and do find it useful for analyzing the data and making sure the data is stored in the way I expect.

Basic concepts - selecting and filtering rows and columns

Polars introduces the concept of Expressions to help you work with your data. There are four main expressions you need to understand when working with data in polars:

  • select to choose the subset of columns you want to work with
  • filter to choose the subset of rows you want to work with
  • with_columns to create new columns
  • group_by to group data together

Choosing or reordering columns is straightforward with select()

df.select(pl.col("name", "quantity", "sku"))

The pl.col() code is used to create column expressions. You will want to use this any time you want to specify one or more columns for an action. There are shortcuts where you can pass bare column names without pl.col(), but I'm choosing to show the recommended way; the shortcut is shown below.
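For example, passing bare column names produces the same selection as the pl.col() version above:

df.select("name", "quantity", "sku")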

Filtering is a similar process (note the use of pl.col() again):

df.filter(pl.col("quantity") > 50)

Coming from pandas, I found selecting columns and filtering rows to be intuitive.

Basic concepts - adding columns

The next expression, with_columns, takes a little more getting used to. The easiest way to think about it is that any time you want to add a new column to your data, you need to use with_columns.

To illustrate, I will add a month name column, which will also show how to work with dates and strings.

df.with_columns((pl.col("date").dt.strftime("%b").alias("month_name")))

This command does a couple of things to create a new column:

  • Select the date column
  • Access the underlying date with dt and convert it to the 3 character month name using strftime
  • Name the newly created column month_name using the alias function

As a brief aside, I like using alias to rename columns. As I played with polars, this made a lot of sense to me.

Here’s another example to drive the point home.

Let’s say we want to understand how much any one product order contributes to the total percentage unit volume for the year:

df.with_columns( (pl.col("quantity") / pl.col("quantity").sum()).alias("pct_total") )

In this example, we divide the line-item quantity by the total quantity, pl.col("quantity").sum(), and label it pct_total.

You may have noticed that the previous month_name column is not there. That’s because none of the operations we have done are in-place. If we want to persist a new column, we need to assign it to a new variable. I will do so in a moment.
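In the meantime, here is the general pattern, in a minimal sketch that reuses the month_name expression from above:

# Persist the new column by assigning the result back
df = df.with_columns(
    pl.col("date").dt.strftime("%b").alias("month_name")
)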

I briefly mentioned working with strings but here’s another example.

Let’s say that any of the sku data with an “S” at the front is a special product and we want to indicate that for each item. We use str in a way very similar to the pandas str accessor.

df.with_columns(pl.col("sku").str.starts_with("S").alias("special"))

Polars has a useful when/then/otherwise construct, which can replace pandas mask or np.where.

Let's say we want to create a column that indicates a special product, or includes the original sku if it's not a special product.

df.with_columns(
    pl.when(pl.col("sku").str.starts_with("S"))
    .then(pl.lit("Special"))
    .otherwise(pl.col("sku"))
    .alias("sales_status")
)

Which yields:

This is somewhat analogous to an if-then-else statement in Python. I personally like this syntax because I always struggle with the pandas equivalents.

This example also introduces pl.lit(), which we use to assign a literal value to the column.
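If it helps, here is the plain-Python logic that the when/then/otherwise expression applies to each row (the function is illustrative, not part of the polars API):

def sales_status(sku):
    if sku.startswith("S"):
        return "Special"
    else:
        return sku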

Basic concepts - grouping data

The pandas groupby and polars group_by function similarly, but the key difference is that polars does not have the concept of an index or multi-index.

There are pros and cons to this approach which I will briefly touch on later in this article.

Here's a simple polars group_by example to total the unit quantity by customer and sku.

df.group_by("name", "sku").agg(pl.col("quantity").sum().alias("qty-total"))

The syntax is similar to the pandas groupby with the agg dictionary approach I have mentioned before. You will notice that we continue to use pl.col() to reference our column of data and then alias() to assign a custom name.

The other big change here is that the data does not have a multi-index; the result is roughly the same as using as_index=False with a pandas groupby. The benefit of this approach is that it is easy to work with this data without flattening or resetting your data.

The downside is that you cannot use unstack and stack to make the data wider or narrower as needed.

When working with date/time data, you can group data similar to the pandas grouper function by using group_by_dynamic :

df.sort(by="date").group_by_dynamic("date", every="1mo").agg( pl.col("quantity").sum().alias("qty-total-month") )

There are a couple items to note:

  • Polars asks that you sort the data by the grouping column before calling group_by_dynamic
  • The every argument allows you to specify what date/time level to aggregate to

To expand on this example, what if we wanted to show the month name and year, instead of the date time? We can chain together the group_by_dynamic and add a new column by using with_columns

df.sort(by="date").group_by_dynamic("date", every="1mo").agg( pl.col("quantity").sum().alias("qty-total-month") ).with_columns(pl.col("date").dt.strftime("%b-%Y").alias("month_name")).select( pl.col("month_name", "qty-total-month") )

This example starts to show the API expressiveness of polars. Once you understand the basic concepts, you can chain them together in a way that is generally more straightforward than doing so with pandas.

To summarize this example:

  • Grouped the data by month
  • Totaled the quantity and assigned the column name to qty-total-month
  • Changed the date label to be more readable and assigned the name month_name
  • Then down-selected to show the two columns I wanted to focus on
Chaining expressions

We have touched on chaining expressions but I wanted to give one full example below to act as a reference.

Combining multiple expressions is available in pandas but it’s not required. This post from Tom Augspurger shows a nice example of how to use different pandas functions to chain operations together. This is also a common topic that Matt Harrison (@__mharrison__) discusses.

Chaining expressions together is a first-class citizen in polars, so it is intuitive and an essential part of working with polars.

Here is an example combining several concepts we showed earlier in the article:

df_month = df.with_columns(
    (pl.col("date").dt.month().alias("month")),
    (pl.col("date").dt.strftime("%b").alias("month_name")),
    (pl.col("quantity") / pl.col("quantity").sum()).alias("pct_total"),
    (
        pl.when(pl.col("sku").str.starts_with("S"))
        .then(pl.lit("Special"))
        .otherwise(pl.col("sku"))
        .alias("sales_status")
    ),
).select(
    pl.col(
        "name", "quantity", "sku", "month", "month_name", "sales_status", "pct_total"
    )
)
df_month

I made this graphic to show how the pieces of code interact with each other:

The image is small on the blog but if you open it in a new window, it should be more legible.

It may take a little time to wrap your head around this approach to programming. But the results should pay off in more maintainable and performant code.

Additional notes

As you work with pandas and polars there are convenience functions for moving back and forth between the two. Here’s an example of creating a pandas dataframe from polars:

df.with_columns( pl.when(pl.col("sku").str.starts_with("S")) .then(pl.lit("Special")) .otherwise(pl.lit("Standard")) .alias("sales_status") ).to_pandas()

Having this capability means you can gradually start to use polars and go back to pandas if there are activities you need in polars that don’t quite work as expected.

If you need to work the other way, you can convert a pandas dataframe to a polars one using from_pandas().
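Here is a quick sketch with made-up data:

import pandas as pd
import polars as pl

pd_df = pd.DataFrame({"sku": ["S1-100", "B1-200"], "quantity": [10, 20]})
pl_df = pl.from_pandas(pd_df)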

Finally, one other item I noticed when working with polars is that there are some nice convenience features when saving your polars dataframe to Excel. By default the dataframe is stored in a table, and you can make a lot of changes to the output by tweaking the parameters of write_excel(). I recommend reviewing the official API docs for the details.

To give you a quick flavor, here is an example of some simple configuration:

df.group_by("name", "sku").agg(pl.col("quantity").sum().alias("qty-total")).write_excel( "sample.xlsx", table_style={ "style": "Table Style Medium 2", }, autofit=True, sheet_zoom=150, )

There are a lot of configuration options available, but I generally find this default output easier to work with than the pandas output.

Additional resources

I have only touched on the bare minimum of capabilities in polars. If there is interest, I’ll write some more. In the meantime, I recommend you check out the following resources:

The Modern Polars resource goes into a much more detailed look at how to work with pandas and polars, with code examples side by side. It's a top-notch resource. You should definitely check it out.

Conclusion

Pandas has been the go-to data analysis tool in the python ecosystem for over a decade. Over that time it has grown and evolved and the surrounding ecosystem has changed. As a result some of the core parts of pandas might be showing their age.

Polars brings a new approach to working with data. It is still in the early phases of its development but I am impressed with how far it has come in the first few years. As of this writing, polars is moving to a 1.0 release. This milestone means that there will be fewer breaking changes going forward and the API will stabilize. It's a good time to jump in and learn more for yourself.

I’ve only spent a few hours with polars so I’m still developing my long-term view on where it fits. Here are a few of my initial observations:

Polars pros:

  • Performant design from the ground up which maximizes modern hardware and minimizes memory usage
  • Clean, consistent and expressive API for chaining methods
  • Not having indices simplifies many cases
  • Useful improvement in displaying output, saving excel files, etc.
  • Good API and user documentation
  • No built-in plotting library (see the note below).

Regarding the plotting functionality, I think it's better to use the available plotting libraries than to try to include one in polars. There is a plot namespace in polars but it defers to other libraries to do the plotting.

Polars cons:

  • Still newer code base with breaking API changes
  • Not as much third party documentation
  • Not as seamlessly integrated with other libraries (although it is improving)
  • Some pandas functions like stacking and unstacking are not as mature in polars

Pandas pros:

  • Tried and tested code base that has been improved significantly over the years
  • The multi-index support provides helpful shortcuts for re-shaping data
  • Strong integrations with the rest of the python data ecosystem
  • Good official documentation as well as lots of 3rd party sources for tips and tricks

Pandas cons:

  • Some cruft in the API design. There’s more than one way to do things in many cases.
  • Performance for large data sets can get bogged down

This is not necessarily exhaustive but I think it hits the highlights. At the end of the day, diversity in tools and approaches is helpful. I intend to continue evaluating the integration of polars into my analysis, especially in cases where performance becomes an issue or the pandas code gets to be too messy. However, I don't think pandas is going away any time soon, and I continue to be excited about the evolution of pandas.

I hope this article helps you get started. As always, if you have experiences, thoughts or comments on the article, let me know below.

Categories: FLOSS Project Planets

Ned Batchelder: Randomly sub-setting test suites

Sun, 2024-01-14 09:39

I needed to run random subsets of my test suite to narrow down the cause of some mysterious behavior. I didn’t find an existing tool that worked the way I wanted to, so I cobbled something together.

I wanted to run 10 random tests (out of 1368), and keep choosing randomly until I saw the bad behavior. Once I had a selection of 10, I wanted to be able to whittle it down to try to reduce it further.

I tried a few different approaches, and here’s what I came up with, two tools in the coverage.py repo that combine to do what I want:

  • A pytest plugin (select_plugin.py) that lets me run a command to output the names of the exact tests I want to run,
  • A command-line tool (pick.py) to select random lines of text from a file. For convenience, blank or commented-out lines are ignored.

More details are in the comment at the top of pick.py, but here’s a quick example:

  1. Get all the test names in tests.txt. These are pytest “node” specifications:

     pytest --collect-only | grep :: > tests.txt

  2. Now tests.txt has a line per test node. Some are straightforward:

     tests/test_cmdline.py::CmdLineStdoutTest::test_version
     tests/test_html.py::HtmlDeltaTest::test_file_becomes_100
     tests/test_report_common.py::ReportMapsPathsTest::test_map_paths_during_html_report

     but with parameterization they can be complicated:

     tests/test_files.py::test_invalid_globs[bar/***/foo.py-***]
     tests/test_files.py::FilesTest::test_source_exists[a/b/c/foo.py-a/b/c/bar.py-False]
     tests/test_config.py::ConfigTest::test_toml_parse_errors[[tool.coverage.run]\nconcurrency="foo"-not a list]

  3. Run a random bunch of 10 tests:

     pytest --select-cmd="python pick.py sample 10 < tests.txt"

     We’re using --select-cmd to specify the shell command that will output the names of tests. Our command uses pick.py to select 10 random lines from tests.txt.

  4. Run many random bunches of 10, announcing the seed each time:

     for seed in $(seq 1 100); do
         echo seed=$seed
         pytest --select-cmd="python pick.py sample 10 $seed < tests.txt"
     done

  5. Once you find a seed that produces the small batch you want, save that batch:

     python pick.py sample 10 17 < tests.txt > bad.txt

  6. Now you can run that bad batch repeatedly:

     pytest --select-cmd="cat bad.txt"
  7. To reduce the bad batch, comment out lines in bad.txt with a hash character, and the tests will be excluded. Keep editing until you find the small set of tests you want.

I like that this works and I understand it. I like that it’s based on the bedrock of text files and shell commands. I like that there’s room for different behavior in the future by adding to how pick.py works. For example, it doesn’t do any bisecting now, but it could be adapted to it.

As usual, there might be a better way to do this, but this works for me.

Categories: FLOSS Project Planets

TechBeamers Python: Understanding LangChain: A Guide for Beginners

Sun, 2024-01-14 09:21

LangChain is a toolkit for building apps powered by large language models like GPT-3. Think of it as Legos for AI apps – it simplifies connecting these powerful models to build things like text generators, chatbots, and question answerers. It was created by an open-source community and lets developers quickly prototype and deploy AI-powered apps. […]

The post Understanding LangChain: A Guide for Beginners appeared first on TechBeamers.

Categories: FLOSS Project Planets
