Planet Python


Mike Driscoll: Episode 42 – Harlequin – The SQL IDE for Your Terminal

Wed, 2024-05-29 17:06

This episode focuses on the Harlequin application, a Python SQL IDE for your terminal written using the amazing Textual package.

I was honored to have Ted Conbeer, the creator of Harlequin, on the show to discuss his creation and the other things he does with Python.

Specifically, we focused on the following topics:

  • Favorite Python packages
  • Origins of Harlequin
  • Why program for the terminal versus a GUI
  • Lessons learned in creating the tool
  • Asyncio
  • and more!

The post Episode 42 – Harlequin – The SQL IDE for Your Terminal appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Django Weblog: Django Enhancement Proposal 14: Background Workers

Wed, 2024-05-29 15:04

As of today, DEP-14 has been approved 🛫

The DEP was written and stewarded by Jake Howard. A very enthusiastic community has been active with feedback and encouragement, while the Django Steering Council gave the final inputs before its formal acceptance. The implementation of DEP-14 is expected to be a major leap forward for the “batteries included” philosophy of Django.

Whilst Django is a web framework, there's more to web applications than just the request-response lifecycle. Sending emails, communicating with external services or running complex actions should all be done outside the request-response cycle.

Django doesn't have a first-party solution for long-running tasks; however, the ecosystem is filled with incredibly popular frameworks, all of which interact with Django in slightly different ways. Other frameworks such as Laravel have background workers built in, allowing them to push tasks into the background to be processed at a later date, without requiring the end user to wait for them to occur.

Library maintainers must implement support for any possible task backend separately, should they wish to offload functionality to the background. This includes smaller libraries, but also larger meta-frameworks with their own package ecosystem such as Wagtail.

This proposal sets out to provide an interface and base implementation for long-running background tasks in Django.

Future work

The DEP will now move on to the Implementation phase before being merged into Django itself.

If you would like to help or try it out, go have a look at django-tasks, a separate reference implementation by Jake Howard, the author of the DEP.

Jake will also be speaking about the DEP in his talk at DjangoCon Europe 2024 in Vigo next week.


PyCharm: PyCharm 2024.1.2: What’s New!

Wed, 2024-05-29 14:54

PyCharm 2024.1.2 is here with features designed to enhance your productivity and streamline your development workflow. This update includes support for DRF viewsets and routers in the Endpoints tool window, code assistance for TypedDict and Unpack, and improved debugger performance when handling large collections.

You can download the latest version from our download page or update your current version through our free Toolbox App.

For more details, please visit our What’s New page.

Download PyCharm 2024.1.2

Key features

Support for DRF viewsets and routers in the Endpoints tool window

When working with the Django REST Framework in PyCharm, not only can you specify function-based or class-based views in the path, but you can now also specify viewsets and see the results in the Endpoints tool window. Additionally, you can map HTTP methods to viewset methods, and PyCharm will display the HTTP methods next to the relevant route, including for custom methods. Routes without @actions decorators are now displayed with the related viewset methods.

Code assistance for TypedDict and Unpack

PEP 692 made it possible to add type information for keyword arguments of different types by using TypedDict and Unpack. PyCharm allows you to use this feature confidently by providing parameter info, type checking, and code completion.
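As a rough illustration of what PEP 692 enables, here is a minimal sketch. The names `Movie` and `describe` are hypothetical and do not come from the PyCharm announcement; the fallback import assumes `typing_extensions` is installed on Python versions older than 3.11.

```python
from typing import TypedDict

try:
    from typing import Unpack  # available in the standard library from Python 3.11
except ImportError:  # assumption: typing_extensions is installed on older versions
    from typing_extensions import Unpack


class Movie(TypedDict):
    # Hypothetical keyword-argument schema used only for this sketch.
    title: str
    year: int


def describe(**kwargs: Unpack[Movie]) -> str:
    # A type checker can now verify each keyword argument against Movie.
    return f"{kwargs['title']} ({kwargs['year']})"


print(describe(title="Alien", year=1979))
```

A type checker should flag a call such as `describe(title="Alien", year="1979")`, because `year` is declared as an `int`.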

Improved debugger performance for large collections

PyCharm’s debugger now offers a smoother experience, even when very large collections are involved. You can now work on your data science projects without having to put up with high CPU loads and UI freezes.


Be sure to check out our release notes to learn all of the details and ensure you don’t miss out on any new features.

We appreciate your support as we work to improve your PyCharm experience. Please report any bugs via our issue tracker so we can resolve them promptly. Connect with us on X (formerly Twitter) to share your feedback on PyCharm 2024.1.2!


The Python Show: 42 - Harlequin - The SQL IDE for Your Terminal

Wed, 2024-05-29 10:14

This episode focuses on the Harlequin application, a Python SQL IDE for your terminal written using the amazing Textual package.

I was honored to have Ted Conbeer, the creator of Harlequin, on the show to discuss his creation and the other things he does with Python.

Specifically, we focused on the following topics:

  • Favorite Python packages
  • Origins of Harlequin
  • Why program for the terminal versus a GUI
  • Lessons learned in creating the tool
  • Asyncio
  • and more!


Real Python: What Are CRUD Operations?

Wed, 2024-05-29 10:00

CRUD operations are at the heart of nearly every application you interact with. As a developer, you usually want to create data, read or retrieve data, update data, and delete data. Whether you access a database or interact with a REST API, only when all four operations are present are you able to make a complete data roundtrip in your app.

Creating, reading, updating, and deleting are so vital in software development that these methods are widely referred to as CRUD. Understanding CRUD will give you an actionable blueprint when you build applications and help you understand how the applications you use work behind the scenes. So, what exactly does CRUD mean?


In Short: CRUD Stands for Create, Read, Update, and Delete

CRUD operations are the cornerstone of application functionality, touching every aspect of how apps store, retrieve, and manage data. Here’s a brief overview of the four CRUD operations:

  • Create: This is about adding new entries to your database. But it’s also applicable to other types of persistent storage, such as files or networked services. When you perform a create operation, you’re initiating a journey for a new piece of data within your system.
  • Read: Through reading, you retrieve or view existing database entries. This operation is as basic as checking your email or reloading a website. Every piece of information you get has been received from a database, thanks to the read operation.
  • Update: Updating allows you to modify the details of data already in the database, for example when you update a profile picture or edit a chat message. Each time, there’s an update operation at work, ensuring your new data is stored in the database.
  • Delete: Deleting removes existing entries from the database. Whether you’re closing an account or removing a post, delete operations ensure that unwanted or unnecessary data can be properly discarded.

CRUD operations describe the steps that data takes from creation to deletion, regardless of what programming language you use. Every time you interact with an application, you’re likely engaging in one of the four CRUD operations.

Why Are CRUD Operations Essential?

Whether you’re working on a basic task list app or a complex e-commerce platform, CRUD operations offer a universal language for designing and manipulating data models. Knowing about CRUD as a user helps you understand what’s happening behind the curtains. As a developer, understanding CRUD provides you with a structured framework for storing data in your application with persistence:

In computer science, persistence refers to the characteristic of state of a system that outlives (persists more than) the process that created it. This is achieved in practice by storing the state as data in computer data storage. (Source)

So even when a program crashes or a user disconnects, the data is safe and can be retrieved later. This also means that the order of the operations is important. You can only read, update, or delete items that were previously created.

It’s good practice to implement each CRUD operation separately in your applications. For example, when you retrieve items, you shouldn’t update them at the same time.

Note: An exception to this rule may be when you update a “last time retrieved” value after a read operation. Although the user performs a read CRUD operation to retrieve data, you may want to trigger an update operation in the back end to keep track of a user’s retrievals. This can be handy if you want to show the last visited posts to the user.

While CRUD describes a concept that’s independent of specific programming languages, one could argue that CRUD operations are strongly connected to SQL commands and HTTP methods.

What Are CRUD Operations in SQL?

The idea of CRUD is strongly connected with databases. That’s why it’s no surprise that CRUD operations correspond almost one-to-one with SQL commands:

CRUD Operation   SQL Command
Create           INSERT
Read             SELECT
Update           UPDATE
Delete           DELETE

When you create data, you’re using the INSERT command to add new records to a table. After creation, you may read data using SELECT. With a SELECT query, you’re asking the database to retrieve the specific pieces of information you need, whether it’s a single value, a set of records, or complex relationships between data points.

The update operation corresponds to the UPDATE command in SQL, which allows you to modify data. It lets you edit or change an existing item.

Lastly, the delete operation relates to the DELETE command. This is the digital equivalent of shredding a confidential document. With DELETE, you permanently remove an item from the database.

Writing CRUD Operations in Raw SQL

CRUD operations describe actions. That’s why it’s a good idea to pull up your sleeves and write some code to explore how CRUD operations translate into raw SQL commands.

In the examples below, you’ll use Python’s built-in sqlite3 package. SQLite is a convenient SQL library to try things out, as you’ll work with a single SQLite database file.

You’ll name the database birds.db. As the name suggests, you’ll use the database to store the names of birds you like. To keep the example small, you’ll only keep track of the bird names and give them an ID as a unique identifier.
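The four operations described above can be sketched with sqlite3 as follows. This is a minimal illustration, not the article's own code: the table name `bird`, the column names, and the sample bird names are assumptions, and an in-memory database stands in for birds.db so the snippet leaves no file behind.

```python
import sqlite3

# Assumption: an in-memory database instead of the birds.db file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bird (id INTEGER PRIMARY KEY, name TEXT)")

# Create: INSERT adds new records.
conn.execute("INSERT INTO bird (name) VALUES (?)", ("Humming Bird",))
conn.execute("INSERT INTO bird (name) VALUES (?)", ("Sugar Glider",))

# Read: SELECT retrieves existing records.
rows = conn.execute("SELECT id, name FROM bird ORDER BY id").fetchall()

# Update: UPDATE modifies an existing record.
conn.execute("UPDATE bird SET name = ? WHERE id = ?", ("Hummingbird", 1))

# Delete: DELETE removes a record.
conn.execute("DELETE FROM bird WHERE id = ?", (2,))

remaining = conn.execute("SELECT id, name FROM bird ORDER BY id").fetchall()
print(rows)
print(remaining)
conn.close()
```

The `?` placeholders let sqlite3 handle value quoting, which is generally safer than building SQL strings by hand.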

Read the full article at »



Talk Python to Me: #464: Seeing code flows and generating tests with Kolo

Wed, 2024-05-29 04:00

Do you want to look inside your Django request? How about all of your requests in development and see where they overlap? If that sounds useful, you should check out Kolo. It's a pretty incredible extension for your editor (VS Code at the moment, more editors to come most likely). We have Wilhelm Klopp on to tell us all about it.

Episode sponsors

  • Sentry Error Monitoring, Code TALKPYTHON
  • Talk Python Courses

Links from the show

  • Wil on Twitter: @wilhelmklopp
  • Kolo
  • Kolo's info repo
  • Kolo Playground
  • Generating tests with Kolo
  • Watch this episode on YouTube
  • Episode transcripts


Python Morsels: Equality versus identity in Python

Tue, 2024-05-28 19:52

Equality checks whether two objects represent the same value. Identity checks whether two variables point to the same object.

Table of contents

  1. The equality operator in Python
  2. The is operator in Python
  3. How equality and identity work differently?
  4. Inequality and non-identity operators
  5. Where are identity checks used?
  6. Equality vs. Identity

The equality operator in Python

Let's say we have two variables that point to two lists:

>>> a = [2, 1, 3]
>>> b = [2, 1, 3, 4]

When we use the == operator to check whether these lists are equal, we'll see that they are not equal:

>>> a == b
False

These lists don't have the same values right now, so they're not equal.

Let's update the first list so that these two lists do have equivalent values:

>>> a.append(4)
>>> a
[2, 1, 3, 4]

If we use == again, we'll see that these lists are equal now:

>>> a == b
True

Python's == operator checks for equality. Two objects are equal if they represent the same data.
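To contrast this with the identity check mentioned at the top of the article, here is a small sketch. The variable names `c`, `equal`, `same_object`, and `alias` are introduced only for illustration.

```python
a = [2, 1, 3, 4]
b = [2, 1, 3, 4]
c = a  # c is a second name for the same list object as a

equal = a == b        # same values, so the lists are equal
same_object = a is b  # two separate list objects, so not identical
alias = a is c        # both names point to one object

print(equal, same_object, alias)  # prints: True False True
```

Because `a` and `c` refer to one object, a change made through `c` (such as `c.append(5)`) is also visible through `a`, while `b` is unaffected.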

The is operator in Python

Python also has an is …

Read the full article:

Trey Hunner: PyCon 2024 Reflection

Tue, 2024-05-28 16:00

I traveled back home from PyCon US 2024 last week. This is my reflection on my time at PyCon.

Attempting to eat vegan

Since 2020, I’ve been gradually eating more plant-based and a few months ago I decided to take PyCon as an opportunity to attempt exclusively vegan eating outside my own home. As I noted on Mastodon, it was a challenge and I failed every day at least once but I found the experience worthwhile. Our food system is very dairy-oriented.

Staying hydrated and fed

One of the first things I did before heading to the convention center was walk to Target and buy snacks and drinks. When at PyCon, I prefer to spend 30 minutes and $20 to have a backup plan for last minute hydration and calories (even if not the greatest calories). I never quite know when I might sleep through breakfast, find lunch lacking, or wish I’d eaten more dinner.

A tutorial, an orientation, a lightning talk, and open spaces

My responsibilities at PyCon this year included teaching a tutorial and helping run the Newcomer’s Orientation with Kojo and Sumana.

Yngve and Marie offered to act as teaching assistants during my tutorial and I was very grateful for their help! Rodrigo and Krishna also offered to TA just before my tutorial started and I was extra grateful to have even more help than I’d expected. The attendees were mostly better prepared than I expected they would be, which was also great. It’s always great to spend less time on setup and more time exploring Python together.

The newcomer’s orientation the next day went well. We kept it fairly brief and were able to address about 10 minutes of audience questions before the opening reception started.

Once my PyCon responsibilities were completed, I invented a few more (light) responsibilities for myself. 😅 I signed up to give a lightning talk on how to give a lightning talk. They slotted it as the first talk of the first lightning talk session on Friday night. I kept this talk pretty much the same as the one I presented at DjangoCon 2016. I could have made the transitions fancier, but I decided to embrace the idea of simplicity with the hope that audience members might think “look if that first speaker can give such a simple and succinct presentation, maybe I can too.”

On Saturday I ran an open space on Python Learning. Some of you showed up because you’re on my mailing list or you’re paying Python Morsels subscribers. Many folks showed up because the topic was interesting, either as a learner or as a teacher. I really enjoyed the round-table-style conversation we had.

I also ran a Cabo Card game open space during lunch on Sunday on the 4th floor rooftop. Cabo is my usual conference ice breaker game and I played it at least a few nights in The Westin lobby as well.

Seeing conference friends, old and new

For me, PyCon is largely about having conversations. The talks and tutorials are great for starting me thinking about an idea. The hallway track, open spaces, and meals are great for continuing conversations about those ideas (or other ideas).

My first morning in Pittsburgh, I chatted with Naomi Ceder and Reuven Lerner. I’m glad I ran into them before the conference kicked off because (as often happens at PyCon) I only very briefly saw either of them during the rest of PyCon!

After my tutorial that afternoon, I did dinner with Marie, Yngve, and Rodrigo at Rosewater Mediterranean (good vegan options, assuming you enjoy falafel and various sauces). As sometimes happens at PyCon, another PyCon attendee, Sachin, joined our table because we noticed him eating on his own at a table near us and invited him to join us.

On Saturday, Melanie, David, Jay, and I had a sort of mini San Diego Python study group reunion dinner before inviting folks to join us for Cabo and Knucklebones one night. The 4 of us originally met each other (along with Carol and other wonderful Python folks) at the San Diego Python study group about 10 years ago.

I had some wonderful conversations about ways to improve the Python documentation over dinner (at Nicky’s Thai) on Sunday night with so many docs-concerned folks who I highly respect. I’m really excited that Python has the documentation editorial board and I’m hopeful that that board, with the help of many other community members, will usher in big improvements to the documentation in the coming years.

I also met a number of Internet acquaintances IRL for the first time at PyCon. I met Tereza and Jessica, who I know from our work in the PSF Code of Conduct workgroup. I met Steve Lott, who I originally knew as a prolific question-answerer. I also met Hugo, a CPython core dev, the Python 3.14 & 3.15 release manager, and a social media user (which is how I’ve primarily interacted with him because the Internet is occasionally lovely). I was also very excited to meet many Python Morsels members as well as folks who know me through my weekly Python tips newsletter.

I was grateful to chat with Hynek and Al about creating talks, YouTube videos, and other online content. I also enjoyed chatting with Glyph a bit about our experiences consulting and training and (in hindsight) wished I’d planned an open space for either consultants or trainers, both of which have been held at PyCon before but it just takes someone to stick it on the open space board.

Many folks I only saw very briefly (I said a quick hi and bye to Andrew over lunch during the sprints) and some I didn’t see at all (Frank was at PyCon but we never ran into each other). Some I essentially saw through playing a few rounds of Cabo (Thomas and Ethan among many others). We also ran into at least 4 other PyCon attendees in the airport on Tuesday afternoon, including Bob and Julian, who it’s always a pleasure to see.

A Mastodon-oriented PyCon

On Thursday night I had the feeling that the number of Mastodon posts I saw on the #PyConUS hashtag was greater than the number of Twitter posts. I (very unscientifically) counted up the number of posts I was seeing on each and found that my perception was correct: Mastodon seemed to slightly overtake Twitter at PyCon this year.

Over dinner on Wednesday, I tried to convince Marie, Yngve, and Rodrigo to get Mastodon accounts just to follow the hashtag during PyCon. I succeeded with all three: Marie, Yngve, and Rodrigo!

Mastodon will never be the social media platform. Its decentralized nature is too much of a barrier for many folks. However, it does seem to be used by enough somewhat nerdy Python folks to now be one of the most used social media platforms for PyCon posting.

The talks

I ended up spending little time in the talks during PyCon. This wasn’t on purpose. I just happened to attend many open spaces, take personal breaks, and end up in hallway conversations often. I did see many of the lightning talks live, as well as Jay, Simon, and Sumana’s keynotes (all of them were exceptional) and the opening and closing remarks. I also watched a few talks from my hotel room while taking breaks.

While I’m often a bit light on my talk load at PyCon, I do recommend folks attend a good handful of live talks during PyCon, as Jon and others recommend. I wish I had seen more talks live. I also wish I had attended a few open spaces that I missed.

At any one time, I know that I’m always missing about 90% of what’s scheduled during PyCon (if you include the talks and the open spaces). That’s assuming I don’t ditch the conference entirely for a few hours and walk across a bridge or ride a funicular (neither of which I did, as I stuck around the venue the whole time this year). I am glad I saw, did, and talked about everything I did, but there’s always something I wish I’d seen/done!

The sprints

Thanks to the documentation dinner, I had a couple documentation-related ideas in mind on the first day of sprints. But I’m also really excited about the new Python REPL coming in Python 3.13 (in case you can’t tell from how much I talk about it), so I sprinted on that instead. Łukasz assigned me the task of researching keyboard shortcuts that the new REPL is missing (compared to the current one on Linux and Mac) so I spent some time researching that. I got to see the REPL running on Anthony’s laptop on Windows and I am so excited that Windows support will be included before 3.13.0 lands! 🎉

Partly inspired by Carol Willing’s PyCon preview message, I also thanked Pablo, Łukasz, and Lysandros in-person for all their work on the new Python REPL. 🤗

Until next year

I’ll be keynoting at PyOhio this year.

Besides PyOhio, I’m not sure whether I’ll make it to another conference until PyCon US next year. I’d love to attend all of them, but I do have work and personal goals that need accomplishing too!

I hope to see you at PyCon US 2025! In the meantime, if you’re wishing we’d exchanged contact details or met in-person, please feel free to stay in touch through Mastodon, LinkedIn, my weekly emails, YouTube, or Twitter.


PyCoder’s Weekly: Issue #631 (May 28, 2024)

Tue, 2024-05-28 15:30

#631 – MAY 28, 2024

Building a Python GUI Application With Tkinter

In this video course, you’ll learn the basics of GUI programming with Tkinter, the de facto Python GUI framework. Master GUI programming concepts such as widgets, geometry managers, and event handlers. Then, put it all together by building two applications: a temperature converter and a text editor.

pyastgrep and Custom Linting

This article from the developer of pyastgrep introduces you to the tool which can now be used as a library. The post talks about how to use it and what kind of linting it does best.

Upgrade Python Versions Without the Pain

Stop wasting 30% of your team’s sprint on maintaining legacy codebases. Automatically migrate and keep up-to-date on Python versions, so that you can focus on being productive while staying secure, without the risk of breaking changes - Get a code assessment today →

What’s New in Django 5.1

Django 5.1 has gone alpha so the list of features targeting this release has more or less solidified. This article introduces you to what is coming in Django 5.1.

Quiz: How to Create Pivot Tables With Pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won’t find all the answers by reading the tutorial, so you’ll need to do some investigating on your own. By finding all the answers, you’re sure to learn some other interesting things along the way.

PEP 649 Re-targeted to 3.14

Python Enhancement Proposal 649: Deferred Evaluation Of Annotations Using Descriptors has been re-targeted to the Python 3.14 release.

JupyterLab 4.2 and Notebook 7.2 Released


Articles & Tutorials

Testing With Python: The Different Types of Tests

This is part 5 of a deep dive into writing automated tests, but also works well as an independent article. This post talks about the taxonomy of testing, like the differences between unit and integration tests, and how nobody can quite agree on a definition of either.

Python’s Built-in Exceptions: A Walkthrough With Examples

In this tutorial, you’ll get to know some of the most commonly used built-in exceptions in Python. You’ll learn when these exceptions can appear in your code and how to handle them. Finally, you’ll learn how to raise some of these exceptions in your code.

Software Engineering Hiring and Firing

This article is a deep dive on the hiring and firing practices in the software field, and unlike most articles focuses on senior engineering roles. It isn’t a “first job” post, but a “how the decision process works” article.

Enabling Async MongoDB Operations in Streamlit

Streamlit is a wonderful tool for building dashboards with its peculiar execution model, but using asyncio data sources with it can be a real pain. This article is about how you correctly use those two technologies together.
HANDMADESOFTWARE • Shared by Thorin Schiffer

EuroPython 2024 Announces Keynote Speakers

EuroPython happens in Prague July 8-14 and as the conference approaches more and more is happening. This posting from their May newsletter highlights the keynotes and other announcements.

Writing Commit Messages

This guide admits to being “yet another”, but unlike most that are out there, spends less time discussing the cosmetic aspects of a good commit message and more time on the content.

PSF Announces 5-Year Sponsorship Commitment From Fastly

Python Software Foundation securing this sponsorship affects the entire Python ecosystem, most notably the security and reliability of the Python Package Index (PyPI).
SOCKET.DEV • Shared by Sarah Gooding

Untold Stories From 6 Years Working on Python Packaging

Sumana gave the closing keynote address at PyCon US this year and this posting shares all the links and references from the talk.

The Python calendar Module: Create Calendars With Python

Learn to use the Python calendar module to create and customize calendars in plain text, HTML or directly in your terminal.

TIL: Accessibility Resources #2

This post is a collection of accessibility resources mostly for web sites, but some tools can be used elsewhere as well.

Projects & Code

PgQueuer: Python & PostgreSQL Job Queuing Library


Tapyr: Shiny for Python Application Template

GITHUB.COM/APPSILON • Shared by Appsilon

Oven: Explore Python Packages


tkforge: Drag & Drop in Figma to Create a Python GUI


tach: Enforce a Modular, Decoupled Package Architecture


Events

Weekly Real Python Office Hours Q&A (Virtual)

May 29, 2024

SPb Python Drinkup

May 30, 2024

Building Python Communities Yaounde

June 1 to June 3, 2024

Django Girls Medellín

June 1 to June 2, 2024

PyDelhi User Group Meetup

June 1, 2024

Melbourne Python Users Group, Australia

June 3, 2024

DjangoCon Europe 2024

June 5 to June 10, 2024

PyCon Colombia 2024

June 7 to June 10, 2024

Happy Pythoning!
This was PyCoder’s Weekly Issue #631.


Ned Batchelder: One way to fix Python circular imports

Tue, 2024-05-28 13:46

In Python, a circular import is when two files each try to import the other, causing a failure when a module isn’t fully initialized. The best way to fix this situation is to organize your code in layers so that the importing relationships naturally flow in just one direction. But sometimes it works to simply change the style of import statement you use. I’ll show you.

Let’s say you have these files:

# module: one
from two import func_two

def func_one():
    func_two()

# module: two
from one import func_one

def do_work():
    func_one()

def func_two():
    print("Hello, world!")

# main script
from two import do_work

do_work()

If we run the main script, we get this:

% python
Traceback (most recent call last):
  File "", line 2, in <module>
    from two import do_work
  File "", line 2, in <module>
    from one import func_one
  File "", line 2, in <module>
    from two import func_two
ImportError: cannot import name 'func_two' from partially initialized
  module 'two' (most likely due to a circular import) (

When Python imports a module, it executes the file line by line. Every global in the file (top-level name including functions and classes) becomes an attribute on the module object being constructed. In the two module, we import from the one module at line 2. At that moment, the two module has been created, but it has no attributes yet because nothing has been defined yet. It will eventually have do_work and func_two, but we haven’t executed those def statements yet, so they don’t exist. Like a function call, when the import statement is run, it begins executing the imported file, and doesn’t come back to the current file until the import is done.

The import of the one module starts, and its line 2 tries to get a name from the two module. As we just said, the two module exists, but has no names defined yet. That gives us the error.

Instead of importing names from modules, we can import whole modules instead. All we do is change the form of the imports, and how we reference the functions from the imported modules, like this:

# module: one
import two              # was:  from two import func_two

def func_one():
    two.func_two()      # was:  func_two()

# module: two
import one              # was:  from one import func_one

def do_work():
    one.func_one()      # was:  func_one()

def func_two():
    print("Hello, world!")

# main script
from two import do_work

do_work()

Running the fixed code, we get this:

% python
Hello, world!

It works because the two module imports one at line 2, and then the one module imports two at its line 2. That works just fine, because the two module exists. It’s still empty like it was before the fix, but now we aren’t trying to find a name in it during the import. Once all of the imports are done, the one and two modules both have all their names defined, and we can access them from inside our functions.

The key idea here is that “from two import func_two” tries to find func_two during the import, before it exists. Deferring the name lookup to the body of the function by using “import two” lets all of the modules get themselves fully initialized before we try to use them, avoiding the circular import error.
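If you want to see both behaviors side by side, the following sketch writes the two variants into temporary directories and runs each in a fresh interpreter. The file names one.py, two.py, and main.py, and the helper `run_project`, are assumptions made for this illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed file layout; the circular "from ... import name" variant.
BROKEN = {
    "one.py": "from two import func_two\n\ndef func_one():\n    func_two()\n",
    "two.py": (
        "from one import func_one\n\n"
        "def do_work():\n    func_one()\n\n"
        "def func_two():\n    print('Hello, world!')\n"
    ),
    "main.py": "from two import do_work\n\ndo_work()\n",
}

# Same program, but importing whole modules and looking names up at call time.
FIXED = dict(BROKEN)
FIXED["one.py"] = "import two\n\ndef func_one():\n    two.func_two()\n"
FIXED["two.py"] = (
    "import one\n\n"
    "def do_work():\n    one.func_one()\n\n"
    "def func_two():\n    print('Hello, world!')\n"
)


def run_project(files):
    """Write the files to a temporary directory and run main.py there."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        return subprocess.run(
            [sys.executable, "main.py"],
            cwd=tmp, capture_output=True, text=True,
        )


broken = run_project(BROKEN)
fixed = run_project(FIXED)
print("broken fails with ImportError:", "ImportError" in broken.stderr)
print("fixed output:", fixed.stdout.strip())
```

Running each variant in a subprocess keeps the half-initialized modules out of the current interpreter's `sys.modules`.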

As I mentioned at the top, the best way to fix circular imports is to structure your code so that modules don’t have mutual dependencies like this. But that isn’t always easy, and this can buy you a little time to get your code working again.


Go Deh: Recreating the CVM algorithm for estimating distinct elements gives problems

Tue, 2024-05-28 12:15


Someone at work posted a link to this Quanta Magazine article. It describes a novel and seemingly straightforward way to estimate the number of distinct elements in a data stream.

Quanta describes the algorithm, and as an example gives "counting the number of distinct words in Hamlet".

Following Quanta

I looked at the description and decided to follow their text. They carefully described each round of the algorithm, which I coded up; I then looked for the generalizations and implemented a loop over all items in the stream...

It did not work! I got silly numbers. I could download Hamlet, split it into words (around 32,000), do len(set(words)) to get the exact number of distinct words (around 7,000), then run it through the algorithm and get a stupid result with tens of digits for the estimated number of distinct words.
I re-checked my implementation of the Quanta-described algorithm and couldn't see any mistake, but I had originally noticed a link to the original paper. I did not follow it at first, as original papers can be heavy on maths notation and I prefer reading algorithms described in code/pseudocode.

I decided to take a look at the original.

The CVM Original Paper

I scanned the paper.

I read the paper.

I looked at Algorithm 1 as a probable candidate to decipher into Python, but the description was cryptic. Here's that description taken from the paper:

AI to the rescue!?

I had a brainwave 💡: let's chuck it at two AIs and see what they do. I had Gemini and Copilot to hand and asked each of them to express Algorithm 1 as Python. Gemini did something, and Copilot finally did something, but I first had to open the page in Microsoft Edge.
There followed hours of me reading and cross-comparing between the algorithm and the AIs' output. If I did not understand where something came from, I would ask the generating AI; if I found an error, I would first (and second, and ...) try to get the AI to make a fix I suggested.

At this stage I was also trying to get a feel for how the AIs could help me (now way past what I thought the algorithm should be, just to see what it would take to get those AIs to cross the T's and dot the I's on a good solution).
Not a good use of time! I now know that asking questions to update one of the 20 to 30 lines of the Python function might fix that line, but unfix another line you had fixed before. Code from the AI does not have line numbers, making it difficult to state what needs changing, and where. They can suggest type hints and create the beginnings of docstrings but, for example, it pulled out the wrong authors for the name of the algorithm.
In line 1 of the algorithm, the initialisation of thresh is clearly shown, I thought, but both AIs had difficulty getting the Python right. Eventually I cut-and-pasted the text into each AI, whereupon they confidently said "Of course...", made a change, and then I had to re-check for any other changes.

My Code

I first created this function:

import math
import random
from typing import Any, Collection

def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    ...
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            X = {x_item for x_item in X
                    if random.random() < 0.5}
            p /= 2
    return len(X) / p

I tested it with Hamlet data and it made OK estimates.

Elated, I took a break.

Hacker News

The next evening I decided to do a search to see if anyone else was talking about the algorithm and found a thread on Hacker News that was right up my street. People were discussing those same problems found in the Quanta article, and getting similarly ginormous answers. One of the original authors of the paper was making comments! Others had created code from the actual paper and said it was also easier than the Quanta description.

The author mentioned that no less than Donald Knuth had taken an interest in their algorithm and had noted that the expression starting `X = ...` four lines from the end could, theoretically, make no change to X, and the solution was to encase the assignment in a while loop that only exited if len(X) < thresh.

Code update

I decided to add that change:

def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.

    The stream object must support an initial call to __len__

    Parameters:
    stream (Collection[Any]): The input stream as a collection of hashable
        items.
    epsilon (float): The desired relative error in the estimate. It must be in
        the range (0, 1).
    delta (float): The allowed probability of the estimate falling outside
        the relative error. It must be in the range (0, 1).

    Returns:
    float: An estimate of the number of distinct elements in the input stream.
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2
    return len(X) / p


In the code above, the variable thresh (threshold), named from Algorithm 1, corresponds to what the Quanta article describes as the maximum storage available to keep items from the stream that have been seen before. You must know the length of the stream, m, as well as epsilon and delta, to calculate thresh.
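As a rough illustration of how that formula behaves, the sketch below evaluates thresh for a Hamlet-sized stream. The value m = 32_000 is the approximate word count quoted earlier; the epsilon values, delta = 0.05, and the helper name cvm_thresh are assumptions chosen for illustration only.

```python
import math


def cvm_thresh(m: int, epsilon: float, delta: float) -> int:
    """Same expression as the thresh assignment in F0_Estimator."""
    return math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))


m = 32_000    # roughly the number of words in Hamlet, as quoted above
delta = 0.05  # assumed value, for illustration

# Threshold needed for each (assumed) relative error.
thresholds = {eps: cvm_thresh(m, eps, delta) for eps in (0.1, 0.25, 0.5)}

for eps, thresh in thresholds.items():
    print(f"epsilon = {eps:<4} -> thresh = {thresh:_}")
```

With these inputs, an epsilon of 0.1 asks for a threshold larger than the roughly 7,000 distinct words in the text, so the set would simply end up holding every distinct word; the memory saving only appears for looser error bounds or much longer streams.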

If you were to have just the stream and thresh as the arguments, you could return both the estimate of the number of distinct items in the stream and a count of the total number of elements in the stream.
Epsilon could then be calculated from the numbers we now know.

from typing import Any, Iterable

def F0_Estimator2(stream: Iterable[Any],
                  thresh: int,
                  ) -> tuple[float, int]:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.

    The stream object does NOT have to support a call to __len__

    Parameters:
    stream (Iterable[Any]): The input stream as an iterable of hashable
        items.
    thresh (int): The max threshold of stream items used in the
        estimation.

    Returns:
    tuple[float, int]: An estimate of the number of distinct elements in the
        input stream, and the count of the number of items in stream.
    """
    p = 1
    X = set()
    m = 0  # Count of items in stream

    for item in stream:
        m += 1
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2

    return len(X) / p, m

def F0_epsilon(
               thresh: int,
               m: int,
               delta: float=0.05,  #  0.05 is 95%
              ) -> float:
    """
    Calculate the relative error in the estimate from F0_Estimator2(...)

    Parameters:
    thresh (int): The thresh value used in the call TO F0_Estimator2.
    m (int): The count of items in the stream FROM F0_Estimator2.
    delta (float): The allowed probability of the estimate falling outside
        the relative error. It must be in the range (0, 1) and is usually 0.05
        to 0.01 (95% to 99% certainty).

    Returns:
    float: The calculated relative error in the estimate

    """
    return math.sqrt(12 / thresh * math.log(8 * m / delta))

Testing

def stream_gen(k: int=30_000, r: int=7_000) -> list[int]:
    "Create a randomised list of k ints of up to r different values."
    return random.choices(range(r), k=k)

def stream_stats(s: list[Any]) -> tuple[int, int]:
    length, distinct = len(s), len(set(s))
    return length, distinct

stream_size = 2**18
reps = 5
target_uniques = 1
while target_uniques < stream_size:
    the_stream = stream_gen(stream_size+1, target_uniques)
    target_uniques *= 4
    size, unique = stream_stats(the_stream)

    print(f"\n  Actual:\n    {size = :_}, {unique = :_}\n  Estimations:")

    delta = 0.05
    threshhold = 2
    print(f"    All runs using {delta = :.2f} and with estimate averaged from {reps} runs:")
    while threshhold < size:
        estimate, esize = F0_Estimator2(the_stream.copy(), threshhold)
        estimate = sum([estimate] +
                    [F0_Estimator2(the_stream.copy(), threshhold)[0]
                        for _ in range(reps - 1)]) / reps
        estimate = int(estimate + 0.5)
        epsilon = F0_epsilon(threshhold, esize, delta)
        print(f"      With {threshhold = :7_} -> "
            f"{estimate = :_}, +/-{epsilon*100:.0f}%"
            + (f" {esize = :_}" if esize != size else ""))
        threshhold *= 8

The algorithm generates an estimate based on random sampling, so I run it multiple times for the same input and report the mean estimate from those runs.

Sample output


  Actual:
    size = 262_145, unique = 1
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 1, +/-1026%
      With threshhold =      16 -> estimate = 1, +/-363%
      With threshhold =     128 -> estimate = 1, +/-128%
      With threshhold =   1_024 -> estimate = 1, +/-45%
      With threshhold =   8_192 -> estimate = 1, +/-16%
      With threshhold =  65_536 -> estimate = 1, +/-6%

  Actual:
    ...

  Actual:
    size = 262_145, unique = 1_024
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 16_384, +/-1026%
      With threshhold =      16 -> estimate = 768, +/-363%
      With threshhold =     128 -> estimate = 1_101, +/-128%
      With threshhold =   1_024 -> estimate = 1_018, +/-45%
      With threshhold =   8_192 -> estimate = 1_024, +/-16%
      With threshhold =  65_536 -> estimate = 1_024, +/-6%

  Actual:
    size = 262_145, unique = 4_096
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 13_107, +/-1026%
      With threshhold =      16 -> estimate = 3_686, +/-363%
      With threshhold =     128 -> estimate = 3_814, +/-128%
      With threshhold =   1_024 -> estimate = 4_083, +/-45%
      With threshhold =   8_192 -> estimate = 4_096, +/-16%
      With threshhold =  65_536 -> estimate = 4_096, +/-6%

  Actual:
    size = 262_145, unique = 16_384
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 0, +/-1026%
      With threshhold =      16 -> estimate = 15_155, +/-363%
      With threshhold =     128 -> estimate = 16_179, +/-128%
      With threshhold =   1_024 -> estimate = 16_986, +/-45%
      With threshhold =   8_192 -> estimate = 16_211, +/-16%
      With threshhold =  65_536 -> estimate = 16_384, +/-6%

  Actual:
    size = 262_145, unique = 64_347
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 26_214, +/-1026%
      With threshhold =      16 -> estimate = 73_728, +/-363%
      With threshhold =     128 -> estimate = 61_030, +/-128%
      With threshhold =   1_024 -> estimate = 64_422, +/-45%
      With threshhold =   8_192 -> estimate = 64_760, +/-16%
      With threshhold =  65_536 -> estimate = 64_347, +/-6%

 Looks good!


Another day, and I decided to start writing this blog post. I searched again and found the Wikipedia article on what it called the Count-distinct problem.

Looking through it, I found it had this wrong description of the CVM algorithm:

The (or a?) problem with the Wikipedia entry is that it shows

p ← p / 2

...within the while loop. You need an enclosing if |B| >= s for the while loop, and the assignment to p outside the while loop but inside this new if statement.

It's tough!

Both Quanta Magazine, and whoever added the algorithm to Wikipedia got the algorithm wrong.

I've written around two hundred tasks on site for over a decade. Others had to read my description and create code in their chosen language to implement those tasks. I have learnt from the feedback I got on talk pages to hone that craft, but details matter. Examples matter. Constructive feedback matters.



Categories: FLOSS Project Planets

Real Python: Efficient Iterations With Python Iterators and Iterables

Tue, 2024-05-28 10:00

Python’s iterators and iterables are two different but related tools that come in handy when you need to iterate over a data stream or container. Iterators power and control the iteration process, while iterables typically hold data that you want to iterate over one value at a time.

Iterators and iterables are fundamental components of Python programming, and you’ll have to deal with them in almost all your programs. Learning how they work and how to create them is key for you as a Python developer.

In this video course, you’ll learn how to:

  • Create iterators using the iterator protocol in Python
  • Understand the differences between iterators and iterables
  • Work with iterators and iterables in your Python code
  • Use generator functions and the yield statement to create generator iterators
  • Build your own iterables using different techniques, such as the iterable protocol
  • Use the asyncio module and the await and async keywords to create asynchronous iterators
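As a small taste of the first few points, here is a minimal sketch of the same countdown written twice: once as a class that implements the iterator protocol and once as a generator function. The names Countdown and countdown are made up for this illustration and are not taken from the course.

```python
class Countdown:
    """Iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration  # signals that the data is exhausted
        value = self.current
        self.current -= 1
        return value


def countdown(start: int):
    """Generator function: Python builds the iterator for us."""
    while start > 0:
        yield start
        start -= 1


class_based = list(Countdown(3))
generator_based = list(countdown(3))

# A list is an iterable but not an iterator: iter() hands out a fresh
# iterator each time, so the list can be looped over repeatedly.
numbers = [3, 2, 1]
first_pass = list(iter(numbers))
second_pass = list(iter(numbers))
```

The generator version is usually the shorter way to get the same behaviour; the class version makes the protocol methods visible.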

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Python Software Foundation: Thinking about running for the Python Software Foundation Board of Directors? Let’s talk!

Tue, 2024-05-28 06:27

PSF Board elections are a chance for the community to choose representatives to help the PSF create a vision for and build the future of the Python community. This year there are 3 seats open on the PSF board. Check out who is currently on the PSF Board. (Débora Azevedo, Kwon-Han Bae, and Tania Allard are at the end of their current terms.)

Office Hours Details

This year, the PSF Board is running Office Hours so you can connect with current members to ask questions and learn more about what being a part of the Board entails. There will be two Office Hour sessions:

  • June 11th, 4 PM UTC
  • June 18th, 12 PM UTC

Make sure to check what time that is for you. We welcome you to join the PSF Discord and navigate to the #psf-elections channel to participate in Office Hours. The server is moderated by PSF Staff and locked between office hours sessions. If you’re new to Discord, check out some Discord Basics to help you get started.

Who runs for the Board?

People who care about the Python community, who want to see it flourish and grow, and also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.

Nomination info

You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, so you have a few weeks to research the role and craft a nomination statement. The nomination period ends on June 25th, 2:00 PM UTC.

Categories: FLOSS Project Planets

Robin Wilson: How to install the Python triangle package on an Apple Silicon Mac

Tue, 2024-05-28 05:53

I was recently trying to set up RasterVision on my Apple Silicon Mac (specifically an M1 MacBook Pro, but I’m pretty sure this applies to any Apple Silicon Mac). It all went fine until it came time to install the triangle package, when I got an error. The error output is fairly long, but the key part is at the end:

triangle/core.c:196:12: fatal error: 'longintrepr.h' file not found
#include "longintrepr.h"
         ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

It took me quite a bit of searching to find the answer (Google just isn’t very good at giving relevant results these days), but it turns out to be very simple. The latest version of triangle on PyPI doesn’t work on Apple Silicon, but the code in the GitHub repository does work, so you can install directly from GitHub with this command:

pip install git+

and it should all work fine.

Once you’ve done this, install rastervision again and it should recognise that the triangle package is already installed and not try to install it again.

Categories: FLOSS Project Planets

Real Python: How to Create Pivot Tables With pandas

Mon, 2024-05-27 10:00

A pivot table is a data analysis tool that allows you to take columns of raw data from a pandas DataFrame, summarize them, and then analyze the summary data to reveal its insights.

Pivot tables allow you to perform common aggregate statistical calculations such as sums, counts, averages, and so on. Often, the information a pivot table produces reveals trends and other observations your original raw data hides.

Pivot tables were originally implemented in early spreadsheet packages and are still a commonly used feature of the latest ones. They can also be found in modern database applications and in programming languages. In this tutorial, you’ll learn how to implement a pivot table in Python using pandas’ DataFrame.pivot_table() method.

Before you start, you should familiarize yourself with what a pandas DataFrame looks like and how you can create one. Knowing the difference between a DataFrame and a pandas Series will also prove useful.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.

The other thing you’ll need for this tutorial is, of course, data. You’ll use the Sales Data Presentation - Dashboards data, which is freely available for you to use under the Apache 2.0 License. The data has been made available for you in the sales_data.csv file that you can download by clicking the link below.

Get Your Code: Click here to download the free sample code you’ll use to create a pivot table with pandas.

This table provides an explanation of the data you’ll use throughout this tutorial:

Column Name        Data Type (PyArrow)   Description
order_number       int64                 Order number (unique)
employee_id        int64                 Employee’s identifier (unique)
employee_name      string                Employee’s full name
job_title          string                Employee’s job title
sales_region       string                Sales region employee works within
order_date         timestamp[ns]         Date order was placed
order_type         string                Type of order (Retail or Wholesale)
customer_type      string                Type of customer (Business or Individual)
customer_name      string                Customer’s full name
customer_state     string                Customer’s state of residence
product_category   string                Category of product (Bath Products, Gift Basket, Olive Oil)
product_number     string                Product identifier (unique)
product_name       string                Name of product
quantity           int64                 Quantity ordered
unit_price         double                Selling price of one product
sale_price         double                Total sale price (unit_price × quantity)

As you can see, the table stores data for a fictional set of orders. Each row contains information about a single order. You’ll become more familiar with the data as you work through the tutorial and try to solve the various challenge exercises contained within it.

Throughout this tutorial, you’ll use the pandas library to allow you to work with DataFrames and the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types pandas uses by default.

If you’re working at the command line, you can install both pandas and pyarrow using python -m pip install pandas pyarrow, perhaps within a virtual environment to avoid clashing with your existing environment. If you’re working within a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. With the libraries in place, you can then read your data into a DataFrame:

>>> import pandas as pd
>>> sales_data = pd.read_csv(
...     "sales_data.csv",
...     parse_dates=["order_date"],
...     dayfirst=True,
... ).convert_dtypes(dtype_backend="pyarrow")

First of all, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the sales_data variable, you used pandas’ read_csv() function. The first parameter refers to the file being read, while parse_dates highlights that the order_date column’s data is intended to be read as the datetime64[ns] type. But there’s an issue that will prevent this from happening.

In your source file, the order dates are in dd/mm/yyyy format, so to tell read_csv() that the first part of each date represents a day, you also set the dayfirst parameter to True. This allows read_csv() to now read the order dates as datetime64[ns] types.
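To see what dayfirst changes, here is a minimal sketch that reads two made-up rows from an in-memory string instead of sales_data.csv. The CSV text and the variable names are assumptions for illustration, and the PyArrow conversion is left out to keep the example small.

```python
import io

import pandas as pd

# Two hypothetical orders with dates in dd/mm/yyyy format.
csv_text = "order_number,order_date\n1,03/04/2024\n2,25/12/2024\n"

orders = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["order_date"],
    dayfirst=True,  # treat 03/04/2024 as 3 April, not 4 March
)

first_date = orders["order_date"].iloc[0]
print(orders.dtypes)
print(first_date.day, first_date.month)
```

The first date is ambiguous on its own, which is why the hint matters: only the second row, with a day of 25, rules out a month-first reading.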

With order dates successfully read as datetime64[ns] types, the .convert_dtypes() method can then successfully convert them to a timestamp[ns][pyarrow] data type, and not the more general string[pyarrow] type it would have otherwise done. Although this may seem a bit circuitous, your efforts will allow you to analyze data by date should you need to do this.

If you want to take a look at the data, you can run sales_data.head(2). This will let you see the first two rows of your dataframe. When using .head(), it’s preferable to do so in a Jupyter Notebook because all of the columns are shown. Many Python REPLs show only the first and last few columns unless you use pd.set_option("display.max_columns", None) before you run .head().

If you want to verify that PyArrow types are being used, sales_data.dtypes will confirm it for you. As you’ll see, each data type contains [pyarrow] in its name.

Note: If you’re experienced in data analysis, you’re no doubt aware of the need for data cleansing. This is still important as you work with pivot tables, but it’s equally important to make sure your input data is also tidy.

Tidy data is organized as follows:

  • Each row should contain a single record or observation.
  • Each column should contain a single observable or variable.
  • Each cell should contain an atomic value.

If you tidy your data in this way, as part of your data cleansing, you’ll also be able to analyze it better. For example, rather than store address details in a single address field, it’s usually better to split it down into house_number, street_name, city, and country component fields. This allows you to analyze it by individual streets, cities, or countries more easily.

In addition, you’ll also be able to use the data from individual columns more readily in calculations. For example, if you had columns room_length and room_width, they can be multiplied together to give you room area information. If both values are stored together in a single column in a format such as "10 x 5", the calculation becomes more awkward.
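The room example can be sketched in plain Python. The sample values and the helper parse_dimensions are made up for illustration; the point is only that the combined "10 x 5" form needs a parsing step before any arithmetic can happen.

```python
# Tidy: one value per field, so the area is a direct multiplication.
tidy_rooms = [
    {"room_length": 10, "room_width": 5},
    {"room_length": 4, "room_width": 3},
]
tidy_areas = [room["room_length"] * room["room_width"] for room in tidy_rooms]

# Untidy: both values share one field and must be split apart first.
untidy_rooms = ["10 x 5", "4 x 3"]


def parse_dimensions(text: str) -> tuple[int, int]:
    """Split a 'length x width' string into two integers."""
    length, width = (int(part) for part in text.split("x"))
    return length, width


untidy_areas = [
    length * width for length, width in map(parse_dimensions, untidy_rooms)
]
```

Both lists end up with the same areas, but the untidy version depends on every cell following exactly the same text format.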

The data within the sales_data.csv file is already in a suitably clean and tidy format for you to use in this tutorial. However, not all raw data you acquire will be.

It’s now time to create your first pandas pivot table with Python. To do this, first you’ll learn the basics of using the DataFrame’s .pivot_table() method.

Take the Quiz: Test your knowledge with our interactive “How to Create Pivot Tables With pandas” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

How to Create Pivot Tables With pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won't find all the answers by reading the tutorial, so you'll need to do some investigating on your own. By finding all the answers, you're sure to learn some other interesting things along the way.

How to Create Your First Pivot Table With pandas

Now that your learning journey is underway, it’s time to progress toward your first learning milestone and complete the following task:

Calculate the total sales for each type of order for each region.
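One possible shape for this task, shown on a tiny made-up DataFrame, is sketched below. The column names follow the table above, but the four rows and the region names are hypothetical, and this is not necessarily the approach the full article takes.

```python
import pandas as pd

# Hypothetical sample rows using the tutorial's column names.
sample = pd.DataFrame(
    {
        "sales_region": ["North", "North", "South", "South"],
        "order_type": ["Retail", "Wholesale", "Retail", "Retail"],
        "sale_price": [10.0, 20.0, 5.0, 7.5],
    }
)

# One row per region, one column per order type, each cell a sum.
totals = sample.pivot_table(
    values="sale_price",
    index="sales_region",
    columns="order_type",
    aggfunc="sum",
)
print(totals)
```

Combinations that never occur in the data, such as a Wholesale order in the South region here, show up as missing values in the result.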

Read the full article at »

Categories: FLOSS Project Planets

Python Bytes: #385 RESTing on Postgres

Mon, 2024-05-27 04:00
<strong>Topics covered in this episode:</strong><br> <ul> <li><a href="">PostgREST</a></li> <li><a href=""><strong>How Python Asyncio Works: Recreating it from Scratch</strong></a></li> <li><a href="">Bend</a></li> <li><a href="">The Smartest Way to Learn Python Regular Expressions</a></li> <li><strong>Extras</strong></li> <li><strong>Joke</strong></li> </ul><a href='' style='font-weight: bold;'data-umami-event="Livestream-Past" data-umami-event-episode="385">Watch on YouTube</a><br> <p><strong>About the show</strong></p> <p>Sponsored by Mailtrap: <a href=""><strong></strong></a></p> <p><strong>Connect with the hosts</strong></p> <ul> <li>Michael: <a href=""><strong></strong></a></li> <li>Brian: <a href=""><strong></strong></a></li> <li>Show: <a href=""><strong></strong></a></li> </ul> <p>Join us on YouTube at <a href=""><strong></strong></a> to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.</p> <p>Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to <a href="">our friends of the show list</a>; we'll never share it.</p> <p><strong>Michael #1:</strong> <a href="">PostgREST</a></p> <ul> <li>PostgREST serves a fully RESTful API from any existing PostgreSQL database. It provides a cleaner, more standards-compliant, faster API than you are likely to write from scratch.</li> <li>Speedy <ul> <li>First the server is written in <a href="">Haskell</a> using the <a href="">Warp</a> HTTP server (aka a compiled language with lightweight threads). </li> <li>Next it delegates as much calculation as possible to the database.</li> <li>Finally it uses the database efficiently with the <a href="">Hasql</a> library</li> </ul></li> <li>PostgREST <a href="">handles authentication</a> (via JSON Web Tokens) and delegates authorization to the role information defined in the database. 
This ensures there is a single declarative source of truth for security.</li> </ul> <p><strong>Brian #2:</strong> <a href=""><strong>How Python Asyncio Works: Recreating it from Scratch</strong></a></p> <ul> <li>Jacob Padilla</li> <li>Cool tutorial walking through how async works, including <ul> <li>Generators Review</li> <li>The Event Loop</li> <li>Sleeping</li> <li>Yield to Await</li> <li>Await with AsyncIO</li> </ul></li> <li>Another great async resource is: <ul> <li><a href="">Build your Own Async</a> <ul> <li>David Beazley talk from 2019</li> </ul></li> </ul></li> </ul> <p><strong>Michael #3:</strong> <a href="">Bend</a></p> <ul> <li>A massively parallel, high-level programming language.</li> <li>With <strong>Bend</strong> you can write parallel code for multi-core CPUs/GPUs without being a C/CUDA expert with 10 years of experience. </li> <li>It feels just like Python!</li> <li>No need to deal with the complexity of concurrent programming: locks, mutexes, atomics... <strong>any</strong> work that can be done in parallel <strong>will</strong> be done in parallel.</li> </ul> <p><strong>Brian #4:</strong> <a href="">The Smartest Way to Learn Python Regular Expressions</a></p> <ul> <li>Christian Mayer, Zohaib Riaz, and Lukas Rieger</li> <li>Self published ebook on Python Regex that utilizes <ul> <li>book form readings, links to video course sections</li> <li>puzzle challenges to complete online</li> </ul></li> <li>It’s a paid resource, but the min is free.</li> </ul> <p><strong>Extras</strong> </p> <p>Brian:</p> <ul> <li><a href="">Replay</a> - A graphic memoir by Prince of Persia creator Jordan Mechner, recounting his own family story of war, exile and new beginnings.</li> </ul> <p>Michael:</p> <ul> <li><a href="">PyCon 2026</a></li> </ul> <p><strong>Joke:</strong> Shells Scripts</p>
Categories: FLOSS Project Planets

Zato Blog: Web scraping as an API service

Mon, 2024-05-27 04:00
2024-05-27, by Dariusz Suchojad

Overview

In systems-to-systems integrations, there comes an inevitable time when we have to employ some kind of a web scraping tool to integrate with a particular application. Despite its not being our first choice, it is good to know what to use at such a time - in this article, I provide a gentle introduction to my favorite tool of this kind, called Playwright, followed by sample Python code that integrates it with an API service.

Naturally, in the context of backend integrations, web scraping should be avoided and, generally, it should be considered the last resort. The basic issue here is that while the UI term contains the "interface" part, it is not really the "Application Programming" Interface that we would like to have.

It is not that the UI cannot be programmed against. After all, a web browser does just that, it takes a web page and renders it as expected. Same goes for desktop or mobile applications. Also, anyone integrating with mainframe computers will recognize that this is basically what 3270 can be used for too.

Rather, the fundamental issue is that web scraping goes against the principles of separation of layers and roles across frontend, middleware and backend, which in turn means that authors of resources (e.g. HTML pages) do not really expect many people to access them in automated ways.

Perhaps they actually should expect it, and web pages should finally start to resemble genuine knowledge graphs, easy to access by humans, be it manually or through automation tools, but the reality today is that it is not the case and, in comparison with backend systems, the whole of the web scraping space is relatively brittle, which is why we shun this approach in integrations.

Yet another part of reality, particularly in enterprise integrations, is that people may sometimes be given access to a frontend application on an internal network and that is it. No API, no REST, no JSON, no POST data, no real data formats, and one is simply supposed to fill out forms as part of a business process.

Typically, such a situation will result in an integration gap. There will be fully automated parts in the business process preceding this gap, with multiple systems coordinated towards a specific goal and there will be subsequent steps in the process, also fully automated.

Or you may be given access only to a specific frontend and only through VPN via a single remote Windows desktop. Getting access to a REST API may take months or may never be realized because of some high-level licensing issues. This is not uncommon in real life.

Such a gap can be a jarring and sore point, truly ruining the whole, otherwise fluid, integration process. This creates a tension and to resolve the tension, we can, should all the attempts to find a real API fail, finally resort to web scraping.

It is mostly in this context that I am looking at Playwright below - the tool is good and it has many other uses that go beyond the scope of this text, and it is well worth knowing it, for instance for frontend testing of your backend systems, but, when we deal with API integrations, we should not overdo with web scraping.

Needless to say, if web scraping is what you do primarily, your perspective will be somewhat different - you will not need any explanation of why it is needed or when, and you may be only looking for a way to enclose up your web scraping code in API services. This article will explain that too.

Introducing Playwright

The nice part of Playwright is that we can use it to visually prepare a draft of Python code that will scrape a given resource. That is, instead of programming it in Python, we go to an address, fill out a form, click buttons and otherwise use everything as usual, and Playwright generates code for us that will be later used in integrations.

That code will require a bit of clean-up work, which I will talk about below, but overall it works very nicely and is certainly useful. The result is not one of these do-not-touch auto-generated pieces of code that are better left alone.

While there are better ways to integrate with Jira, I chose that application as an example of Playwright's usage simply because I cannot show you any internal application in a public blog post.

Below, there are two windows. One is Playwright emulating a BlackBerry device to open a resource. I was clicking around, I provided an email address and then I clicked the same email field once more. To the right, based on my actions, we can find the generated Python code, which I consider quite good and readable.

The Playwright Inspector, the tool that gave us the code, will keep recording all of our actions until we click the "Record" button. We can then click the button next to it, "Copy code to clipboard", save the code to a separate file and run it on demand, automatically.

But first, we will need to install Playwright.

Installing and starting Playwright

The tool is written in TypeScript and can be installed using npx, which in turn is part of NodeJS.

Afterwards, the "playwright install" call is needed as well because that will potentially install runtime dependencies, such as Chrome libraries.

Finally, we install Playwright using pip as well because we want to access it from Python. Note that if you are installing Playwright under Zato, the "/path/to/pip" will typically be "/opt/zato/code/bin/pip".

npx -g --yes playwright install
playwright install
/path/to/pip install playwright

We can now start it as below. I am using BlackBerry as an example of what Playwright is capable of. Also, it is usually more convenient to use a mobile version of a site when the main window and Inspector are opened side by side, but you may prefer to use Chrome, Firefox or anything else.

playwright codegen --device "BlackBerry Z30"

That is practically everything as far as using Playwright to generate code in our context goes. Open the tool, fill out forms, copy code to a Python module, done.

What is still needed, though, is cleaning up the resulting code and embedding it in an API integration process.

Code clean-up

After you keep using Playwright for a while with longer forms and pages, you will note that the generated code tends to accumulate parts that repeat.

For instance, in the module below, which I already cleaned up, the same "[placeholder=\"Enter email\"]" reference to the email field is used twice, even if a programmer developing this code would prefer to introduce a variable for that.

There is not a good answer to the question of what to do about it. On the one hand, obviously, being programmers we would prefer not to repeat that kind of detail. On the other hand, if we clean up the code too much, this may result in too much of a maintenance burden. We need to keep in mind that we do not really want to invest too much in web scraping and, should there be a need to repeat the whole process, we do not want to end up with Playwright's code auto-generated from scratch once more, without any of our clean-up.

A good compromise position is to at least extract any kind of credentials from the code to environment variables or a similar place and to remove some of the code comments that Playwright generates. The result below is what it should look like in the end. Not too much effort, yet without leaving the whole code as it was originally either.
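If the repetition does become a problem, one light-touch option is a single constant plus a tiny helper. The sketch below is only an illustration: EMAIL_FIELD and fill_field are hypothetical names, not something Playwright generates, and the helper relies solely on the locator, click and fill calls that the generated code already uses.

```python
# Hypothetical constant - the selector itself is the one from the generated code
EMAIL_FIELD = '[placeholder="Enter email"]'

def fill_field(page, selector: str, value: str) -> None:
    """Click a field and type a value into it, naming the selector only once."""
    field = page.locator(selector)
    field.click()
    field.fill(value)
```

With such a helper, the two generated lines for the email field could shrink to fill_field(page, EMAIL_FIELD, Config.Email), at the cost of diverging a little further from what Playwright would regenerate.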

Save the code below as "" as this is what the API service below will use.

# -*- coding: utf-8 -*-

# stdlib
import os

# Playwright
from playwright.sync_api import Playwright, sync_playwright

class Config:
    Email = os.environ.get('APP_EMAIL', '')
    Password = os.environ.get('APP_PASSWORD', '')
    Headless = bool(os.environ.get('APP_HEADLESS', False))

def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=Config.Headless) # type: ignore
    context = browser.new_context()

    # Open new page
    page = context.new_page()

    # Open project boards
    page.goto("")
    page.goto("")

    # Fill out the email
    page.locator("[placeholder=\"Enter email\"]").click()
    page.locator("[placeholder=\"Enter email\"]").fill(Config.Email)

    # Click #login-submit
    page.locator("#login-submit").click()

with sync_playwright() as playwright:
    run(playwright)

Web scraping as a standalone activity

We have the generated code so the first thing to do with it is to run it from the command line. This will result in a new Chrome window accessing Jira - it is Chrome, not BlackBerry, because that is the default for Playwright.

The window will close soon enough but this is fine. That code only demonstrates a principle, it is not a full integration task.

python /path/to/

It is also useful that we can run the same Python module from our IDE, giving us the ability to step through the code line by line, observing what changes when and why.
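The standalone run uses a plain Chrome window rather than the emulated BlackBerry device. If the emulation is wanted in the saved module as well, Playwright's Python API exposes a devices dictionary whose entries can be passed to new_context. The helper below is a sketch under that assumption; open_emulated_context is a hypothetical name and the playwright argument is the same object that run(playwright) receives.

```python
def open_emulated_context(playwright, device_name: str = 'BlackBerry Z30', headless: bool = False):
    """Launch Chromium and create a browser context that emulates the named device."""

    # playwright.devices maps device names to keyword options for new_context
    device = playwright.devices[device_name]

    browser = playwright.chromium.launch(headless=headless)
    context = browser.new_context(**device)

    return browser, context
```

Inside run(playwright), a call such as browser, context = open_emulated_context(playwright, headless=Config.Headless) could then stand in for the separate launch and new_context calls.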

Web scraping as an API service

Finally, we are ready to invoke the standalone module from an API service, as in the following code that we are also going to make available as a REST channel.

A couple of notes about the Python service below:

  • We invoke Playwright in a subprocess, as a shell command
  • We accept input through data models although we do not provide any output definition because it is not needed here
  • When we invoke Playwright, we set APP_HEADLESS to True, which ensures that it does not attempt to actually display a Chrome window. After all, we intend for this service to run on Linux servers, in the backend, and such a thing is unlikely to work in this kind of environment.
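One detail about the APP_HEADLESS flag is worth keeping in mind. The module reads it with bool(os.environ.get('APP_HEADLESS', False)) and, in Python, any non-empty string is truthy, so even APP_HEADLESS=False would turn the headless mode on. A stricter reader could look like the sketch below, where env_flag and the set of accepted values are assumptions made for illustration, not part of Playwright or Zato.

```python
import os

# Values treated as "true" - an assumption, adjust to local conventions
_TRUE_VALUES = {'1', 'true', 'yes', 'on'}

def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from an environment variable."""
    raw = os.environ.get(name)

    # The variable is not set at all, so fall back to the default
    if raw is None:
        return default

    return raw.strip().lower() in _TRUE_VALUES
```

In the Config class, Headless = env_flag('APP_HEADLESS') would then behave the same for APP_HEADLESS=True while no longer treating "False" or "0" as true.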

Other than that, this is a straightforward Zato service - it receives input, carries out its work and a reply is returned to the caller (here, empty).

# -*- coding: utf-8 -*-

# stdlib
from dataclasses import dataclass

# Zato
from zato.server.service import Model, Service

# ###########################################################################

@dataclass(init=False)
class WebScrapingDemoRequest(Model):
    email: str
    password: str

# ###########################################################################

class WebScrapingDemo(Service):
    name = 'demo.web-scraping'

    class SimpleIO:
        input = WebScrapingDemoRequest

    def handle(self):

        # Path to a Python installation that Playwright was installed under
        py_path = '/path/to/python'

        # Path to a Playwright module with code to invoke
        playwright_path = '/path/to/'

        # This is a template script that we will invoke in a subprocess
        command_template = """
        APP_EMAIL={app_email} APP_PASSWORD={app_password} APP_HEADLESS=True {py_path} {playwright_path}
        """

        # This is our input data
        input = self.request.input # type: WebScrapingDemoRequest

        # Extract credentials from the input ..
        email = input.email
        password = input.password

        # .. build the full command, taking all the config into account ..
        command = command_template.format(
            app_email = email,
            app_password = password,
            py_path = py_path,
            playwright_path = playwright_path,
        )

        # .. invoke the command in a subprocess ..
        result = self.commands.invoke(command)

        # .. if it was not a success, log the details received ..
        if not result.is_ok:
            self.logger.info('Exit code -> %s', result.exit_code)
            self.logger.info('Stderr -> %s', result.stderr)
            self.logger.info('Stdout -> %s', result.stdout)

# ###########################################################################
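Since the email and password arrive from the caller and are interpolated into a shell command, it may be prudent to quote them before the command is built. The following is only a sketch that uses shlex.quote from the standard library; build_command is a hypothetical helper that mirrors the command template of the service, not an existing Zato function.

```python
import shlex

def build_command(email: str, password: str, py_path: str, playwright_path: str) -> str:
    """Build the shell command, quoting every value that comes from outside."""
    return 'APP_EMAIL={} APP_PASSWORD={} APP_HEADLESS=True {} {}'.format(
        shlex.quote(email),
        shlex.quote(password),
        shlex.quote(py_path),
        shlex.quote(playwright_path),
    )
```

A password containing spaces or shell metacharacters is then passed as a single, literal value instead of being interpreted by the shell.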

Now, the REST channel:

The last thing to do is to invoke the service - I am using curl from the command line below but it could very well be Postman or a similar option.

curl localhost:17010/demo/web-scraping -d '{"email":"", "password":"abc"}' ; echo
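The same call can be made from Python with nothing but the standard library. This is a minimal sketch: build_request and invoke are hypothetical helper names, the address is the one from the curl command with an explicit http:// scheme added, and the JSON content type is an assumption about what the channel expects.

```python
import json
from urllib.request import Request, urlopen

def build_request(address: str, email: str, password: str) -> Request:
    """Prepare a POST request carrying the credentials as a JSON document."""
    payload = json.dumps({'email': email, 'password': password}).encode('utf-8')
    return Request(
        address,
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

def invoke(address: str, email: str, password: str) -> bytes:
    """Send the request and return the raw response body (empty for this service)."""
    with urlopen(build_request(address, email, password)) as response:
        return response.read()
```

Calling invoke('http://localhost:17010/demo/web-scraping', email, password) against a running server would then correspond to the curl command above.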

There will be no Chrome window this time around because we run Playwright in the headless mode. There will be no output from curl either because we do not return anything from the service but in server logs we will find details such as below.

We can learn from the log that the command took close to 4 seconds to complete, that the exit code was 0 (indicating success) and that there is no stdout or stderr at all.

INFO - Command ` APP_PASSWORD=abc APP_HEADLESS=True /path/to/python /path/to/ ` completed in 0:00:03.844157, exit_code -> 0; len-out=0 (0 Bytes); len-err=0 (0 Bytes); cid -> zcmdc5422816b2c6ff9f10742134
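To make the fields of that log entry more tangible, here is a standard-library sketch that runs a command and collects the same kind of details: exit code, output lengths and elapsed time. It is not how Zato's self.commands.invoke is implemented; run_and_report is a hypothetical helper and the command it runs is only a silent stand-in for the Playwright module.

```python
import subprocess
import sys
import time

def run_and_report(args: list) -> dict:
    """Run a command and gather details similar to those in the log entry."""
    start = time.monotonic()
    completed = subprocess.run(args, capture_output=True, text=True)
    elapsed = time.monotonic() - start

    return {
        'exit_code': completed.returncode,
        'len_out': len(completed.stdout),
        'len_err': len(completed.stderr),
        'elapsed': elapsed,
    }

# A stand-in command that succeeds without printing anything
report = run_and_report([sys.executable, '-c', 'pass'])
```

An exit code of 0 with both lengths equal to zero is what a successful, silent run looks like, which matches the entry above.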

We are now ready to continue to work on it - for instance, you will notice that the password is visible in logs and this should not be allowed.
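As one possible direction for the password issue, the value can be masked before a command is written to any log of our own. The sketch below assumes that secrets appear as NAME=value pairs without spaces in the value; mask_secrets and the list of secret names are hypothetical, and the sketch does not change what the platform itself logs.

```python
import re

# Names of variables whose values should never reach the logs - an assumption
_SECRET_NAMES = ('APP_PASSWORD',)

def mask_secrets(command: str) -> str:
    """Replace the values of secret NAME=value pairs with asterisks."""
    for name in _SECRET_NAMES:
        pattern = r'\b{}=\S+'.format(re.escape(name))
        command = re.sub(pattern, name + '=******', command)
    return command
```

Another option, not shown here, would be to hand the credentials to the subprocess through its environment rather than through the command line.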

But all such work is extra in comparison with the main theme - we have Playwright, which is a tool that allows us to quickly integrate with frontend applications, and we can automate it through API services. Just as expected.

Categories: FLOSS Project Planets

Quansight Labs Blog: Dataframe interoperability - what has been achieved, and what comes next?

Sun, 2024-05-26 20:00
An overview of the dataframe landscape, and a solution to the "we only support pandas" problem
Categories: FLOSS Project Planets

Talk Python to Me: #463: Running on Rust: Granian Web Server

Sat, 2024-05-25 04:00
So you've created a web app with Python using Flask, Django, FastAPI, or even Emmett. It works great on your machine. How do you get it out to the world? You'll need a production-ready web server. On this episode, we have Giovanni Barillari to tell us about his relatively-new server named Granian. It promises better performance and much better consistency than many of the more well known ones today.

Episode sponsors

Neo4j
Talk Python Courses

Links from the show

Giovanni: @gi0baro
Follow Talk Python on Mastodon: talkpython
Follow Michael on Mastodon: mkennedy
Categories: FLOSS Project Planets

Real Python: Quiz: How to Create Pivot Tables With pandas

Fri, 2024-05-24 08:00

In this quiz, you’ll test your understanding of how to create pivot tables with pandas.

By working through this quiz, you’ll review your knowledge of pivot tables and also expand beyond what you learned in the tutorial. For some of the questions, you’ll need to do some research outside of the tutorial itself.


Categories: FLOSS Project Planets