Planet Python
PyCoder’s Weekly: Issue #631 (May 28, 2024)
In this video course, you’ll learn the basics of GUI programming with Tkinter, the de facto Python GUI framework. Master GUI programming concepts such as widgets, geometry managers, and event handlers. Then, put it all together by building two applications: a temperature converter and a text editor.
REAL PYTHON course
This article from the developer of pyastgrep introduces the tool, which can now be used as a library. The post talks about how to use it and what kind of linting it does best.
LUKE PLANT
Stop wasting 30% of your team’s sprint on maintaining legacy codebases. Automatically migrate and keep up-to-date on Python versions, so that you can focus on being productive while staying secure, without the risk of breaking changes - Get a code assessment today →
ACTIVESTATE sponsor
Django 5.1 has gone alpha, so the list of features targeting this release has more or less solidified. This article introduces you to what is coming in Django 5.1.
JEFF TRIPLETT
This quiz is designed to push your knowledge of pivot tables a little bit further. You won’t find all the answers by reading the tutorial, so you’ll need to do some investigating on your own. By finding all the answers, you’re sure to learn some other interesting things along the way.
REAL PYTHON
Python Enhancement Proposal 649: Deferred Evaluation Of Annotations Using Descriptors has been re-targeted to the Python 3.14 release.
PYTHON.ORG
This is part 5 of a deep dive into writing automated tests, but also works well as an independent article. This post talks about the taxonomy of testing, like the differences between unit and integration tests, and how nobody can quite agree on a definition of either.
BITECODE
In this tutorial, you’ll get to know some of the most commonly used built-in exceptions in Python. You’ll learn when these exceptions can appear in your code and how to handle them. Finally, you’ll learn how to raise some of these exceptions in your code.
REAL PYTHON
This article is a deep dive on the hiring and firing practices in the software field, and unlike most articles focuses on senior engineering roles. It isn’t a “first job” post, but a “how the decision process works” article.
ED CREWE
Streamlit is a wonderful tool for building dashboards with its peculiar execution model, but using asyncio data sources with it can be a real pain. This article is about how to correctly use those two technologies together.
HANDMADESOFTWARE • Shared by Thorin Schiffer
EuroPython happens in Prague, July 8-14, and as the conference approaches, more and more is happening. This posting from their May newsletter highlights the keynotes and other announcements.
EUROPYTHON
This guide admits to being “yet another”, but unlike most that are out there, spends less time discussing the cosmetic aspects of a good commit message and more time on the content.
SIMON TATHAM
The Python Software Foundation securing this sponsorship affects the entire Python ecosystem, most notably the security and reliability of the Python Package Index (PyPI).
SOCKET.DEV • Shared by Sarah Gooding
Sumana gave the closing keynote address at PyCon US this year and this posting shares all the links and references from the talk.
SUMANA HARIHARESWARA
Learn to use the Python calendar module to create and customize calendars in plain text, HTML or directly in your terminal.
REAL PYTHON
This post is a collection of accessibility resources mostly for web sites, but some tools can be used elsewhere as well.
SARAH ABDEREMANE
GITHUB.COM/APPSILON • Shared by Appsilon
Oven: Explore Python Packages
tkforge: Drag & Drop in Figma to Create a Python GUI
tach: Enforce a Modular, Decoupled Package Architecture

Events

Weekly Real Python Office Hours Q&A (Virtual)
May 29, 2024
REALPYTHON.COM
May 30, 2024
MEETUP.COM
June 1 to June 3, 2024
NOKIDBEHIND.ORG
June 1 to June 2, 2024
DJANGOGIRLS.ORG
June 1, 2024
MEETUP.COM
June 3, 2024
J.MP
June 5 to June 10, 2024
DJANGOCON.EU
June 7 to June 10, 2024
PYCON.CO
Happy Pythoning!
This was PyCoder’s Weekly Issue #631.
[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
Ned Batchelder: One way to fix Python circular imports
In Python, a circular import is when two files each try to import the other, causing a failure when a module isn’t fully initialized. The best way to fix this situation is to organize your code in layers so that the importing relationships naturally flow in just one direction. But sometimes it works to simply change the style of import statement you use. I’ll show you.
Let’s say you have these files:
# one.py
from two import func_two

def func_one():
    func_two()
# two.py
from one import func_one

def do_work():
    func_one()

def func_two():
    print("Hello, world!")
# main.py
from two import do_work
do_work()
If we run main.py, we get this:
% python main.py
Traceback (most recent call last):
  File "main.py", line 2, in <module>
    from two import do_work
  File "two.py", line 2, in <module>
    from one import func_one
  File "one.py", line 2, in <module>
    from two import func_two
ImportError: cannot import name 'func_two' from partially initialized
module 'two' (most likely due to a circular import) (two.py)
When Python imports a module, it executes the file line by line. Every global in the file (top-level name including functions and classes) becomes an attribute on the module object being constructed. In two.py, we import from one.py at line 2. At that moment, the two module has been created, but it has no attributes yet because nothing has been defined yet. It will eventually have do_work and func_two, but we haven’t executed those def statements yet, so they don’t exist. Like a function call, when the import statement is run, it begins executing the imported file, and doesn’t come back to the current file until the import is done.
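The mechanics described above can be reproduced in a few lines (a sketch using a synthetic module, not the post's files): the module object exists, and can be registered, before any of its top-level names do.

```python
import sys
import types

# A module object exists (and can sit in sys.modules) before its body runs;
# attributes only appear as each top-level statement executes.
mod = types.ModuleType("demo")
sys.modules["demo"] = mod
assert not hasattr(mod, "greet")        # nothing defined yet

exec("def greet():\n    return 'hi'", mod.__dict__)
assert mod.greet() == "hi"              # the def has now executed
```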
The import of one.py starts, and its line 2 tries to get a name from the two module. As we just said, the two module exists, but has no names defined yet. That gives us the error.
Instead of importing names from modules, we can import whole modules instead. All we do is change the form of the imports, and how we reference the functions from the imported modules, like this:
# one.py
import two          # was: from two import func_two

def func_one():
    two.func_two()  # was: func_two()
# two.py
import one          # was: from one import func_one

def do_work():
    one.func_one()  # was: func_one()

def func_two():
    print("Hello, world!")
# main.py
from two import do_work
do_work()
Running the fixed code, we get this:
% python main.py
Hello, world!
It works because two.py imports one at line 2, and then one.py imports two at its line 2. That works just fine, because the two module exists. It’s still empty like it was before the fix, but now we aren’t trying to find a name in it during the import. Once all of the imports are done, the one and two modules both have all their names defined, and we can access them from inside our functions.
The key idea here is that “from two import func_two” tries to find func_two during the import, before it exists. Deferring the name lookup to the body of the function by using “import two” lets all of the modules get themselves fully initialized before we try to use them, avoiding the circular import error.
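The deferred lookup can be demonstrated in isolation (a sketch with a synthetic module, not the post's one.py/two.py): the attribute is resolved only when the function runs, so it may be defined after the import statement has executed.

```python
import sys
import types

# Simulate a module that is registered but still empty,
# as during a circular import.
two = types.ModuleType("two")
sys.modules["two"] = two

import two as mod      # succeeds: the module object exists, even with no names

def func_one():
    return mod.func_two()   # attribute looked up only when func_one() runs

two.func_two = lambda: "Hello, world!"  # defined later, but before the call
assert func_one() == "Hello, world!"
```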
As I mentioned at the top, the best way to fix circular imports is to structure your code so that modules don’t have mutual dependencies like this. But that isn’t always easy, and this can buy you a little time to get your code working again.
Go Deh: Recreating the CVM algorithm for estimating distinct elements gives problems
Someone at work posted a link to this Quanta Magazine article. It describes a novel, and seemingly straightforward, way to estimate the number of distinct elements in a data stream.
Quanta describes the algorithm, and as an example gives "counting the number of distinct words in Hamlet".
Following Quanta

I looked at the description and decided to follow their text. They carefully described each round of the algorithm, which I coded up, and then looked for the generalizations and implemented a loop over all items in the stream ....
It did not work! I got silly numbers. I could download Hamlet, split it into words (around 32,000), do len(set(words)) to get the exact number of distinct words (around 7,000), then run it through the algorithm and get a stupid result with tens of digits for the estimated number of distinct words.
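The exact baseline here is just len(set(words)). Hamlet itself isn't reproduced in this post, so this sketch uses a synthetic stream of roughly the same shape (about 32,000 words drawn from about 7,000 distinct ones):

```python
import random

# Synthetic stand-in for the Hamlet word stream.
vocabulary = [f"word{i}" for i in range(7_000)]
words = random.choices(vocabulary, k=32_000)   # the "stream"
exact_distinct = len(set(words))               # the exact answer to estimate
print(exact_distinct)                          # a little under 7_000
```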
I re-checked my implementation of the Quanta-described algorithm and couldn't see any mistake, but I had originally noticed a link to the original paper. I did not follow it at first, as original papers can be heavily into maths notation and I prefer reading algorithms described in code/pseudocode.
I decided to take a look at the original.
The CVM Original Paper

I scanned the paper.
I read the paper.
I looked at Algorithm 1 as a probable candidate to decipher into Python, but the description was cryptic. Here's that description, taken from the paper:
AI to the rescue!?

I had a brainwave 💡 let's chuck it at two AIs and see what they do. I had Gemini and I had Copilot to hand and asked them each to express Algorithm 1 as Python. Gemini did something, and Copilot finally did something, but I first had to open the page in Microsoft Edge.
There followed hours of me reading and cross-comparing between the algorithm and the AIs. If I did not understand where something came from, I would ask the generating AI; if I found an error I would first (and second, and...) try to get the AI to make a fix I suggested.
At this stage I was also trying to get a feel for how the AIs could help me (now way past what I thought the algorithm should be, just to see what it would take to get those AIs to cross T's and dot I's on a good solution).
Not a good use of time! I now know that asking questions to update one of the 20 to 30 lines of the Python function might fix that line, but unfix another line you had fixed before. Code from the AI does not have line numbers, making it difficult to state what needs changing, and where. They can suggest type hints and create the beginnings of docstrings, but, for example, it pulled out the wrong authors for the name of the algorithm.
In line 1 of the algorithm, the initialisation of thresh is clearly shown, I thought, but both AIs had difficulty getting the Python right. Eventually I cut-n-pasted the text into each AI, where they confidently said "Of course...", made a change, and then I had to re-check for any other changes.
I first created this function:
def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """ ... """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            X = {x_item for x_item in X if random.random() < 0.5}
            p /= 2
    return len(X) / p
I tested it with Hamlet data and it made OK estimates.
Elated, I took a break.
Hacker News

The next evening I decided to do a search to see if anyone else was talking about the algorithm and found a thread on Hacker News that was right up my street. People were discussing those same problems found in the Quanta article - and getting similar ginormous answers. They had one of the original authors of the paper making comments! And others had created code from the actual paper and said it was also easier than the Quanta description.
The author mentioned that no less than Donald Knuth had taken an interest in their algorithm and had noted that the expression starting `X = ...` four lines from the end could, theoretically, make no change to X, and the solution was to encase the assignment in a while loop that only exited if len(X) < thresh.
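The reason that line can be a no-op: each element of X survives the halving independently with probability 1/2, so with probability (1/2)^thresh nothing is removed and len(X) stays equal to thresh - rare, but nonzero, hence the while loop. A quick check of the numbers:

```python
# Probability that one halving pass keeps every element, leaving X unchanged:
# each of the thresh elements is independently kept with probability 1/2.
thresh = 10
p_no_change = 0.5 ** thresh
print(p_no_change)  # 0.0009765625, i.e. about 1 in 1_000 for thresh = 10
```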
Code update

I decided to add that change:
def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.
    The stream object must support an initial call to __len__

    Parameters:
        stream (Collection[Any]): The input stream as a collection of
            hashable items.
        epsilon (float): The desired relative error in the estimate.
            It must be in the range (0, 1).
        delta (float): The desired probability of the estimate being
            within the relative error. It must be in the range (0, 1).

    Returns:
        float: An estimate of the number of distinct elements in the
            input stream.
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2
    return len(X) / p
In the code above, the variable thresh (threshold), named from Algorithm 1, is used in the Quanta article to describe the maximum storage available to keep items from the stream that have been seen before. You must know the length of the stream, m, as well as epsilon and delta, to calculate thresh.
If you were to have just the stream and thresh as the arguments you could return both the estimate of the number of distinct items in the stream as well as counting the number of total elements in the stream.
Epsilon could be calculated from the numbers we now know.
def F0_Estimator2(stream: Iterable[Any], thresh: int) -> tuple[float, int]:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.
    The stream object does NOT have to support a call to __len__

    Parameters:
        stream (Iterable[Any]): The input stream as an iterable of
            hashable items.
        thresh (int): The max threshold of stream items used in the
            estimation.

    Returns:
        tuple[float, int]: An estimate of the number of distinct elements
            in the input stream, and the count of the number of items in
            the stream.
    """
    p = 1
    X = set()
    m = 0  # Count of items in stream

    for item in stream:
        m += 1
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2
    return len(X) / p, m
def F0_epsilon(
        thresh: int,
        m: int,
        delta: float = 0.05,  # 0.05 is 95%
        ) -> float:
    """
    Calculate the relative error in the estimate from F0_Estimator2(...)

    Parameters:
        thresh (int): The thresh value used in the call TO F0_Estimator2.
        m (int): The count of items in the stream FROM F0_Estimator2.
        delta (float): The desired probability of the estimate being
            within the relative error. It must be in the range (0, 1)
            and is usually 0.05 to 0.01, (95% to 99% certainty).

    Returns:
        float: The calculated relative error in the estimate
    """
    return math.sqrt(12 / thresh * math.log(8 * m / delta))

Testing

def stream_gen(k: int = 30_000, r: int = 7_000) -> list[int]:
    "Create a randomised list of k ints of up to r different values."
    return random.choices(range(r), k=k)
def stream_stats(s: list[Any]) -> tuple[int, int]:
    length, distinct = len(s), len(set(s))
    return length, distinct

# %%
print("CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM")

stream_size = 2**18
reps = 5
target_uniques = 1
while target_uniques < stream_size:
    the_stream = stream_gen(stream_size + 1, target_uniques)
    target_uniques *= 4
    size, unique = stream_stats(the_stream)

    print(f"\n  Actual:\n    {size = :_}, {unique = :_}\n  Estimations:")

    delta = 0.05
    threshhold = 2
    print(f"    All runs using {delta = :.2f} and with estimate averaged from {reps} runs:")
    while threshhold < size:
        estimate, esize = F0_Estimator2(the_stream.copy(), threshhold)
        estimate = sum([estimate]
                       + [F0_Estimator2(the_stream.copy(), threshhold)[0]
                          for _ in range(reps - 1)]) / reps
        estimate = int(estimate + 0.5)
        epsilon = F0_epsilon(threshhold, esize, delta)
        print(f"      With {threshhold = :7_} -> "
              f"{estimate = :_}, +/-{epsilon*100:.0f}%"
              + (f" {esize = :_}" if esize != size else ""))
        threshhold *= 8
The algorithm generates an estimate based on random sampling, so I run it multiple times for the same input and report the mean estimate from those runs.
Sample output

CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM

  Actual:
    size = 262_145, unique = 1
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 1, +/-1026%
      With threshhold =      16 -> estimate = 1, +/-363%
      With threshhold =     128 -> estimate = 1, +/-128%
      With threshhold =   1_024 -> estimate = 1, +/-45%
      With threshhold =   8_192 -> estimate = 1, +/-16%
      With threshhold =  65_536 -> estimate = 1, +/-6%

  Actual:
    ...

  Actual:
    size = 262_145, unique = 1_024
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 16_384, +/-1026%
      With threshhold =      16 -> estimate = 768, +/-363%
      With threshhold =     128 -> estimate = 1_101, +/-128%
      With threshhold =   1_024 -> estimate = 1_018, +/-45%
      With threshhold =   8_192 -> estimate = 1_024, +/-16%
      With threshhold =  65_536 -> estimate = 1_024, +/-6%

  Actual:
    size = 262_145, unique = 4_096
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 13_107, +/-1026%
      With threshhold =      16 -> estimate = 3_686, +/-363%
      With threshhold =     128 -> estimate = 3_814, +/-128%
      With threshhold =   1_024 -> estimate = 4_083, +/-45%
      With threshhold =   8_192 -> estimate = 4_096, +/-16%
      With threshhold =  65_536 -> estimate = 4_096, +/-6%

  Actual:
    size = 262_145, unique = 16_384
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 0, +/-1026%
      With threshhold =      16 -> estimate = 15_155, +/-363%
      With threshhold =     128 -> estimate = 16_179, +/-128%
      With threshhold =   1_024 -> estimate = 16_986, +/-45%
      With threshhold =   8_192 -> estimate = 16_211, +/-16%
      With threshhold =  65_536 -> estimate = 16_384, +/-6%

  Actual:
    size = 262_145, unique = 64_347
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 26_214, +/-1026%
      With threshhold =      16 -> estimate = 73_728, +/-363%
      With threshhold =     128 -> estimate = 61_030, +/-128%
      With threshhold =   1_024 -> estimate = 64_422, +/-45%
      With threshhold =   8_192 -> estimate = 64_760, +/-16%
      With threshhold =  65_536 -> estimate = 64_347, +/-6%
Looks good!
Wikipedia

Another day, and I decided to start writing this blog post. I searched again and found the Wikipedia article on what it called the Count-distinct problem.
Looking through it, it had this wrong description of the CVM algorithm:
The (or a?) problem with the Wikipedia entry is that it shows
...within the while loop. You need an enclosing if |B| >= s for the while loop, and the assignment to p outside the while loop but inside this new if statement.
It's tough!

Both Quanta Magazine and whoever added the algorithm to Wikipedia got the algorithm wrong.
I've written around two hundred tasks on the site Rosettacode.org over more than a decade. Others had to read my description and create code in their chosen language to implement those tasks. I have learnt from the feedback I got on talk pages to hone that craft, but details matter. Examples matter. Constructive feedback matters.
END.
Real Python: Efficient Iterations With Python Iterators and Iterables
Python’s iterators and iterables are two different but related tools that come in handy when you need to iterate over a data stream or container. Iterators power and control the iteration process, while iterables typically hold data that you want to iterate over one value at a time.
Iterators and iterables are fundamental components of Python programming, and you’ll have to deal with them in almost all your programs. Learning how they work and how to create them is key for you as a Python developer.
In this video course, you’ll learn how to:
- Create iterators using the iterator protocol in Python
- Understand the differences between iterators and iterables
- Work with iterators and iterables in your Python code
- Use generator functions and the yield statement to create generator iterators
- Build your own iterables using different techniques, such as the iterable protocol
- Use the asyncio module and the await and async keywords to create asynchronous iterators
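As a taste of the first point above, here is a minimal sketch of the iterator protocol (an illustrative class, not taken from the course):

```python
class CountDown:
    """A minimal iterator: implements both __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self          # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that iteration is exhausted
        self.current -= 1
        return self.current + 1

print(list(CountDown(3)))  # [3, 2, 1]
```

Because it defines __iter__ and __next__, an instance works anywhere Python expects an iterable: for loops, list(), sum(), and so on.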
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Python Software Foundation: Thinking about running for the Python Software Foundation Board of Directors? Let’s talk!
PSF Board elections are a chance for the community to choose representatives to help the PSF create a vision for and build the future of the Python community. This year there are 3 seats open on the PSF board. Check out who is currently on the PSF Board. (Débora Azevedo, Kwon-Han Bae, and Tania Allard are at the end of their current terms.)
Office Hours Details

This year, the PSF Board is running Office Hours so you can connect with current members to ask questions and learn more about what being a part of the Board entails. There will be two Office Hour sessions:
- June 11th, 4 PM UTC
- June 18th, 12 PM UTC
Make sure to check what time that is for you. We welcome you to join the PSF Discord and navigate to the #psf-elections channel to participate in Office Hours. The server is moderated by PSF Staff and locked between office hours sessions. If you’re new to Discord, check out some Discord Basics to help you get started.
Who runs for the Board? People who care about the Python community, who want to see it flourish and grow, and who also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.
You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, so you have a few weeks to research the role and craft a nomination statement. The nomination period ends on June 25th, 2:00 PM UTC.
Robin Wilson: How to install the Python triangle package on an Apple Silicon Mac
I was recently trying to set up RasterVision on my Apple Silicon Mac (specifically an M1 MacBook Pro, but I’m pretty sure this applies to any Apple Silicon Mac). It all went fine until it came time to install the triangle package, when I got an error. The error output is fairly long, but the key part is at the end:
triangle/core.c:196:12: fatal error: 'longintrepr.h' file not found
  #include "longintrepr.h"
           ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

It took me quite a bit of searching to find the answer (Google just isn’t very good at giving relevant results these days), but actually it turns out to be very simple. The latest version of triangle on PyPI doesn’t work on Apple Silicon, but the code in the Github repository does work, so you can install directly from Github with this command:
pip install git+https://github.com/drufat/triangle.git

and it should all work fine.
Once you’ve done this, install rastervision again and it should recognise that the triangle package is already installed and not try to install it again.
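If you pin dependencies in a requirements file, the same Git source can be recorded there using pip's standard direct-reference syntax (a sketch; the repository URL is the one from the command above):

```
# requirements.txt
triangle @ git+https://github.com/drufat/triangle.git
```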
Real Python: How to Create Pivot Tables With pandas
A pivot table is a data analysis tool that allows you to take columns of raw data from a pandas DataFrame, summarize them, and then analyze the summary data to reveal its insights.
Pivot tables allow you to perform common aggregate statistical calculations such as sums, counts, averages, and so on. Often, the information a pivot table produces reveals trends and other observations your original raw data hides.
Pivot tables were originally implemented in early spreadsheet packages and are still a commonly used feature of the latest ones. They can also be found in modern database applications and in programming languages. In this tutorial, you’ll learn how to implement a pivot table in Python using pandas’ DataFrame.pivot_table() method.
Before you start, you should familiarize yourself with what a pandas DataFrame looks like and how you can create one. Knowing the difference between a DataFrame and a pandas Series will also prove useful.
In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.
The other thing you’ll need for this tutorial is, of course, data. You’ll use the Sales Data Presentation - Dashboards data, which is freely available for you to use under the Apache 2.0 License. The data has been made available for you in the sales_data.csv file that you can download by clicking the link below.
Get Your Code: Click here to download the free sample code you’ll use to create a pivot table with pandas.
This table provides an explanation of the data you’ll use throughout this tutorial:
Column Name        Data Type (PyArrow)  Description
order_number       int64                Order number (unique)
employee_id        int64                Employee’s identifier (unique)
employee_name      string               Employee’s full name
job_title          string               Employee’s job title
sales_region       string               Sales region employee works within
order_date         timestamp[ns]        Date order was placed
order_type         string               Type of order (Retail or Wholesale)
customer_type      string               Type of customer (Business or Individual)
customer_name      string               Customer’s full name
customer_state     string               Customer’s state of residence
product_category   string               Category of product (Bath Products, Gift Basket, Olive Oil)
product_number     string               Product identifier (unique)
product_name       string               Name of product
quantity           int64                Quantity ordered
unit_price         double               Selling price of one product
sale_price         double               Total sale price (unit_price × quantity)

As you can see, the table stores data for a fictional set of orders. Each row contains information about a single order. You’ll become more familiar with the data as you work through the tutorial and try to solve the various challenge exercises contained within it.
Throughout this tutorial, you’ll use the pandas library to allow you to work with DataFrames and the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types pandas uses by default.
If you’re working at the command line, you can install both pandas and pyarrow using python -m pip install pandas pyarrow, perhaps within a virtual environment to avoid clashing with your existing environment. If you’re working within a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. With the libraries in place, you can then read your data into a DataFrame:
>>> import pandas as pd
>>> sales_data = pd.read_csv(
...     "sales_data.csv",
...     parse_dates=["order_date"],
...     dayfirst=True,
... ).convert_dtypes(dtype_backend="pyarrow")

First of all, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the sales_data variable, you used pandas’ read_csv() function. The first parameter refers to the file being read, while parse_dates highlights that the order_date column’s data is intended to be read as the datetime64[ns] type. But there’s an issue that will prevent this from happening.
In your source file, the order dates are in dd/mm/yyyy format, so to tell read_csv() that the first part of each date represents a day, you also set the dayfirst parameter to True. This allows read_csv() to now read the order dates as datetime64[ns] types.
With order dates successfully read as datetime64[ns] types, the .convert_dtypes() method can then successfully convert them to a timestamp[ns][pyarrow] data type, and not the more general string[pyarrow] type it would have otherwise done. Although this may seem a bit circuitous, your efforts will allow you to analyze data by date should you need to do this.
If you want to take a look at the data, you can run sales_data.head(2). This will let you see the first two rows of your DataFrame. When using .head(), it’s preferable to do so in a Jupyter Notebook because all of the columns are shown. Many Python REPLs show only the first and last few columns unless you use pd.set_option("display.max_columns", None) before you run .head().
If you want to verify that PyArrow types are being used, sales_data.dtypes will confirm it for you. As you’ll see, each data type contains [pyarrow] in its name.
Note: If you’re experienced in data analysis, you’re no doubt aware of the need for data cleansing. This is still important as you work with pivot tables, but it’s equally important to make sure your input data is also tidy.
Tidy data is organized as follows:
- Each row should contain a single record or observation.
- Each column should contain a single observable or variable.
- Each cell should contain an atomic value.
If you tidy your data in this way, as part of your data cleansing, you’ll also be able to analyze it better. For example, rather than store address details in a single address field, it’s usually better to split it down into house_number, street_name, city, and country component fields. This allows you to analyze it by individual streets, cities, or countries more easily.
In addition, you’ll also be able to use the data from individual columns more readily in calculations. For example, if you had columns room_length and room_width, they can be multiplied together to give you room area information. If both values are stored together in a single column in a format such as "10 x 5", the calculation becomes more awkward.
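The room example can be shown directly (a tiny illustrative DataFrame; the column names follow the paragraph above):

```python
import pandas as pd

# Tidy layout: one numeric value per column makes arithmetic trivial.
rooms = pd.DataFrame({"room_length": [10, 6], "room_width": [5, 4]})
rooms["room_area"] = rooms["room_length"] * rooms["room_width"]
print(rooms["room_area"].tolist())  # [50, 24]
```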
The data within the sales_data.csv file is already in a suitably clean and tidy format for you to use in this tutorial. However, not all raw data you acquire will be.
It’s now time to create your first pandas pivot table with Python. To do this, first you’ll learn the basics of using the DataFrame’s .pivot_table() method.
Take the Quiz: Test your knowledge with our interactive “How to Create Pivot Tables With pandas” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
How to Create Pivot Tables With pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won't find all the answers by reading the tutorial, so you'll need to do some investigating on your own. By finding all the answers, you're sure to learn some other interesting things along the way.
How to Create Your First Pivot Table With pandas

Now that your learning journey is underway, it’s time to progress toward your first learning milestone and complete the following task:
Calculate the total sales for each type of order for each region.
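A sketch of the kind of call that milestone needs, run on a stand-in DataFrame (the real tutorial uses sales_data.csv, whose column names this mimics):

```python
import pandas as pd

df = pd.DataFrame({
    "sales_region": ["North", "North", "South", "South"],
    "order_type": ["Retail", "Wholesale", "Retail", "Retail"],
    "sale_price": [100.0, 250.0, 80.0, 120.0],
})

# Total sales for each order type within each region.
pivot = df.pivot_table(
    values="sale_price",
    index="sales_region",
    columns="order_type",
    aggfunc="sum",
)
print(pivot)
```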
Read the full article at https://realpython.com/how-to-pandas-pivot-table/ »
Python Bytes: #385 RESTing on Postgres
Zato Blog: Web scraping as an API service
In systems-to-systems integrations, there comes an inevitable time when we have to employ some kind of a web scraping tool to integrate with a particular application. Despite its not being our first choice, it is good to know what to use at such a time - in this article, I provide a gentle introduction to my favorite tool of this kind, called Playwright, followed by sample Python code that integrates it with an API service.
Naturally, in the context of backend integrations, web scraping should be avoided and, generally, considered the last resort. The basic issue is that while the term UI contains the "interface" part, it is not really the "Application Programming" interface that we would like to have.
It is not that the UI cannot be programmed against. After all, a web browser does just that, it takes a web page and renders it as expected. Same goes for desktop or mobile applications. Also, anyone integrating with mainframe computers will recognize that this is basically what 3270 can be used for too.
Rather, the fundamental issue is that web scraping goes against the principle of separation of layers and roles across frontend, middleware and backend, which in turn means that authors of resources (e.g. HTML pages) do not really expect many people to access them in automated ways.
Perhaps they actually should expect it, and web pages should finally start to resemble genuine knowledge graphs, easy to access by humans and automation tools alike. The reality today, however, is different: in comparison with backend systems, the whole web scraping space is relatively brittle, which is why we shun this approach in integrations.
Yet, another part of reality, particularly in enterprise integrations, is that people may be sometimes given access to a frontend application on an internal network and that is it. No API, no REST, no JSON, no POST data, no real data formats, and one is simply supposed to fill out forms as part of a business process.
Typically, such a situation will result in an integration gap. There will be fully automated parts in the business process preceding this gap, with multiple systems coordinated towards a specific goal and there will be subsequent steps in the process, also fully automated.
Or you may be given access only to a specific frontend, and only through a VPN via a single remote Windows desktop. Getting access to a REST API may take months, or may never happen because of some high-level licensing issues. This is not uncommon in real life.
Such a gap can be a jarring and sore point, truly ruining the whole, otherwise fluid, integration process. This creates tension and, to resolve it, we can finally resort to web scraping - should all the attempts to find a real API fail.
It is mostly in this context that I am looking at Playwright below. The tool is good, has many other uses that go beyond the scope of this text, and is well worth knowing, for instance for frontend testing of your backend systems. But when we deal with API integrations, we should not overdo the web scraping.
Needless to say, if web scraping is what you do primarily, your perspective will be somewhat different: you will not need any explanation of why or when it is needed, and you may only be looking for a way to wrap your web scraping code in API services. This article will explain that too.
Introducing Playwright
The nice part of Playwright is that we can use it to visually prepare a draft of Python code that will scrape a given resource. That is, instead of programming it in Python, we go to an address, fill out a form, click buttons and otherwise use everything as usual, and Playwright generates for us the code that we will later use in integrations.
That code will require a bit of clean-up work, which I will talk about below, but overall it works very nicely and is certainly useful. The result is not one of those do-not-touch auto-generated pieces of code that are better left alone.
While there are better ways to integrate with Jira, I chose that application as an example of Playwright's usage simply because I cannot show you any internal application in a public blog post.
Below, there are two windows. One is Playwright emulating a BlackBerry device to open a resource. I clicked around, provided an email address, and then clicked the same email field once more. To the right, based on my actions, we can find the generated Python code, which I consider quite good and readable.
The Playwright Inspector, the tool that gave us the code, keeps recording all of our actions until we click the "Record" button, which then lets us click the button next to it, "Copy code to clipboard". We can then save the code to a separate file and run it on demand, automatically.
But first, we will need to install Playwright.
Installing and starting Playwright
The tool is written in TypeScript and can be installed using npx, which in turn is part of NodeJS.
Afterwards, the "playwright install" call is needed as well because that will potentially install runtime dependencies, such as Chrome libraries.
Finally, we install Playwright using pip as well, because we want to access it from Python. Note that if you are installing Playwright under Zato, the "/path/to/pip" will typically be "/opt/zato/code/bin/pip".
npx -g --yes playwright install
playwright install
/path/to/pip install playwright

We can now start it as below. I am using BlackBerry as an example of what Playwright is capable of. Also, it is usually more convenient to use a mobile version of a site when the main window and Inspector are opened side by side, but you may prefer to use Chrome, Firefox or anything else.
playwright codegen https://example.atlassian.net/jira --device "BlackBerry Z30"

That is practically everything as far as using Playwright to generate code in our context goes. Open the tool, fill out the forms, copy the code to a Python module, done.
What is still needed, though, is cleaning up the resulting code and embedding it in an API integration process.
Code clean-up
After you keep using Playwright for a while with longer forms and pages, you will note that the generated code tends to accumulate parts that repeat.
For instance, in the module below, which I have already cleaned up, the same "[placeholder=\"Enter email\"]" reference to the email field is used twice, even though a programmer writing this code by hand would prefer to introduce a variable for it.
There is no single good answer to the question of what to do about it. On the one hand, being programmers, we would obviously prefer not to repeat that kind of detail. On the other hand, if we clean up the code too much, it may become too much of a maintenance burden. We need to keep in mind that we do not really want to invest too much in web scraping and, should there be a need to repeat the whole process, we do not want to end up with Playwright's code auto-generated from scratch once more, without any of our clean-up.
A good compromise is to at least extract any kind of credentials from the code to environment variables or a similar place, and to remove some of the code comments that Playwright generates. The result below is what it should look like at the end: not too much effort, without leaving the whole code exactly as generated either.
Save the code below as "play1.py" as this is what the API service below will use.
# -*- coding: utf-8 -*-

# stdlib
import os

# Playwright
from playwright.sync_api import Playwright, sync_playwright

class Config:
    Email = os.environ.get('APP_EMAIL', 'zato@example.com')
    Password = os.environ.get('APP_PASSWORD', '')
    Headless = bool(os.environ.get('APP_HEADLESS', False))

def run(playwright: Playwright) -> None:

    browser = playwright.chromium.launch(headless=Config.Headless) # type: ignore
    context = browser.new_context()

    # Open new page
    page = context.new_page()

    # Open project boards
    page.goto("https://example.atlassian.net/jira/software/projects/ABC/boards/1")
    page.goto("https://id.atlassian.com/login?continue=https%3A%2F%2Fexample.atlassian.net%2Flogin%3FredirectCount%3D1%26dest-url%3D%252Fjira%252Fsoftware%252Fprojects%252FABC%252Fboards%252F1%26application%3Djira&application=jira")

    # Fill out the email
    page.locator("[placeholder=\"Enter email\"]").click()
    page.locator("[placeholder=\"Enter email\"]").fill(Config.Email)

    # Click #login-submit
    page.locator("#login-submit").click()

with sync_playwright() as playwright:
    run(playwright)

Web scraping as a standalone activity
We have the generated code, so the first thing to do with it is to run it from the command line. This will result in a new Chrome window accessing Jira - it is Chrome, not BlackBerry, because Chrome is Playwright's default.
The window will close soon enough, but that is fine - this code only demonstrates the principle, it is not a full integration task.
python /path/to/play1.py

It is also useful that we can run the same Python module from our IDE, giving us the ability to step through the code line by line, observing what changes when, and why.
Web scraping as an API service
Finally, we are ready to invoke the standalone module from an API service, as in the following code that we are also going to make available as a REST channel.
A couple of notes about the Python service below:
- We invoke Playwright in a subprocess, as a shell command
- We accept input through data models although we do not provide any output definition because it is not needed here
- When we invoke Playwright, we set the APP_HEADLESS environment variable to True, which ensures that it does not attempt to actually display a Chrome window. After all, we intend for this service to run on Linux servers, in the backend, where such a thing would be unlikely to work.
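A side note on booleans in environment variables: they arrive as strings, and bool() of any non-empty string is True. Passing a literal True, as the service below does, works fine, but a value of "False" would not turn headless mode off. A quick illustration of the pitfall:

```python
import os

# Environment variables are strings: bool() of any non-empty string is True,
# so bool() cannot distinguish "True" from "False" ..
os.environ["APP_HEADLESS"] = "False"
naive = bool(os.environ.get("APP_HEADLESS", False))    # True, despite "False"

# .. whereas comparing against the expected literal behaves as intended.
robust = os.environ.get("APP_HEADLESS", "") == "True"  # False

print(naive, robust)  # True False
```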
Other than that, this is a straightforward Zato service: it receives input, carries out its work, and returns a reply to the caller (here, an empty one).
# -*- coding: utf-8 -*-

# stdlib
from dataclasses import dataclass

# Zato
from zato.server.service import Model, Service

# ###########################################################################

@dataclass(init=False)
class WebScrapingDemoRequest(Model):
    email: str
    password: str

# ###########################################################################

class WebScrapingDemo(Service):
    name = 'demo.web-scraping'

    class SimpleIO:
        input = WebScrapingDemoRequest

    def handle(self):

        # Path to a Python installation that Playwright was installed under
        py_path = '/path/to/python'

        # Path to a Playwright module with code to invoke
        playwright_path = '/path/to/the-playwright-module.py'

        # This is a template script that we will invoke in a subprocess
        command_template = """
        APP_EMAIL={app_email} APP_PASSWORD={app_password} APP_HEADLESS=True
        {py_path} {playwright_path}
        """

        # This is our input data
        input = self.request.input # type: WebScrapingDemoRequest

        # Extract credentials from the input ..
        email = input.email
        password = input.password

        # .. build the full command, taking all the config into account ..
        command = command_template.format(
            app_email = email,
            app_password = password,
            py_path = py_path,
            playwright_path = playwright_path,
        )

        # .. invoke the command in a subprocess ..
        result = self.commands.invoke(command)

        # .. if it was not a success, log the details received ..
        if not result.is_ok:
            self.logger.info('Exit code -> %s', result.exit_code)
            self.logger.info('Stderr -> %s', result.stderr)
            self.logger.info('Stdout -> %s', result.stdout)

# ###########################################################################

Now, the REST channel:
The last thing to do is to invoke the service - I am using curl from the command line below but it could very well be Postman or a similar option.
curl localhost:17010/demo/web-scraping -d '{"email":"hello@example.com", "password":"abc"}' ; echo

There will be no Chrome window this time around because we are running Playwright in headless mode. There will be no output from curl either, because we do not return anything from the service, but in the server logs we will find details such as the ones below.
We can learn from the log that the command took close to 4 seconds to complete, that the exit code was 0 (indicating success), and that there was no stdout or stderr at all.
INFO - Command ` APP_EMAIL=hello@example.com APP_PASSWORD=abc APP_HEADLESS=True /path/to/python /path/to/the-playwright-module.py ` completed in 0:00:03.844157, exit_code -> 0; len-out=0 (0 Bytes); len-err=0 (0 Bytes); cid -> zcmdc5422816b2c6ff9f10742134

We are now ready to continue working on it - for instance, you will notice that the password is visible in the logs, which should not be allowed.
But all such work is extra in comparison with the main theme: we have Playwright, a tool that allows us to quickly integrate with frontend applications, and we can automate it through API services. Just as expected.
Next steps
- Read more about using Python in API integrations
- Start the tutorial, which will guide you through designing and building Python API services for automation and integrations
Quansight Labs Blog: Dataframe interoperability - what has been achieved, and what comes next?
Talk Python to Me: #463: Running on Rust: Granian Web Server
Real Python: Quiz: How to Create Pivot Tables With pandas
In this quiz, you’ll test your understanding of how to create pivot tables with pandas.
By working through this quiz, you’ll review your knowledge of pivot tables and also expand beyond what you learned in the tutorial. For some of the questions, you’ll need to do some research outside of the tutorial itself.
Luke Plant: pyastgrep and custom linting
A while back I released pyastgrep, which is a rewrite of astpath. It’s a tool that allows you to search for specific Python syntax elements using XPath as a query language.
As part of the rewrite, I separated out the layers of code so that it can now be used as a library as well as a command line tool. I haven’t committed to very much API surface area for library usage, but there is enough.
My main personal use of this has been for linting tasks or enforcing of conventions that might be difficult to do otherwise. I don’t always use this – quite often I’d reach for custom Semgrep rules, and at other times I use introspection to enforce conventions. However, there are times when both of these fail or are rather difficult.
Examples
Some examples of the kinds of rules I’m thinking of include:
Boolean arguments to functions/methods should always be “keyword only”.
Keyword-only arguments are a big win in many cases, and especially when it comes to boolean values. For example, forcing delete_thing(True, False) to be written as delete_thing(permanent=True, force=False) is an easy win, and this is common enough that applying it as a default policy across the code base will probably be a good idea.
The pattern can be distinguished easily at syntax level. Good:
def foo(*, my_bool_arg: bool): ...

Bad:
def foo(my_bool_arg: bool): ...

Simple coding conventions like “Don’t use single-letter variables like i or j as loop variables, use index or idx instead”.
This can be found by looking for code like:
for i, val in enumerate(...): ...

You might not care about this, but if you do, you really want the rule to be applied as an automated test, not a nit-picky code review.
A Django-specific one: for inclusion tags, the tag names should match the template file name. This is nice for consistency and code navigation, plus I actually have some custom “jump to definition” code in my editor that relies on it for fast navigation.
The pattern can again be seen quite easily at the syntax level. Good:
@inclusion_tag("something/foo.html")
def foo(): ...

Bad:
@inclusion_tag("something/bar.html")
def foo(): ...

Any ’task’ (something decorated with @task) should be named foo_task, in order to give a clue that it works as an asynchronous call, and that its return value is just a promise object.
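As a hedged sketch, such a rule could be expressed as an XPath 1.0 query for pyastgrep, assuming the decorator appears as a bare @task name (so it shows up under decorator_list in the XML). XPath 1.0 has no ends-with(), so substring() stands in for it:

```python
# Flag functions decorated with a bare @task whose name does not end in
# "_task": FunctionDef carries a @name attribute and its decorators sit
# under decorator_list in pyastgrep's XML rendering of the AST.
suffix = "_task"
xpath = (
    './/FunctionDef[decorator_list/Name[@id="task"]]'
    f'[not(substring(@name, string-length(@name) - {len(suffix) - 1}) = "{suffix}")]'
)
print(xpath)
```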
There are many more examples you’ll come up with once you start thinking like this.
Method
Having identified the bad patterns we want to find and fix, my method for doing so looks as follows. It contains a number of tips and refinements I’ve made over the past few years.
First, I open a test file, e.g. tests/test_conventions.py, and start by inserting some example code – at least one bad example (the kind we are trying to fix), and one good example.
There are a few reasons for this:
First, I need to make sure I can prove life exists on earth, as John D. Cook puts it. I’ll say more about this later on.
Second, it gives me a deliberately simplified bit of code that I can pass to pyastdump.
Third, it provides some explanation for the test you are going to write, and a potentially rather hairy XPath expression.
I’ll use my first example above, keyword-only boolean args. I start by inserting the following text into my test file:
def bad_boolean_arg(foo: bool):
    pass

def good_boolean_arg(*, foo: bool):
    pass

Then, I copy both of these in turn to the clipboard (or both together if there isn’t much code, as in this case), and pass them through pyastdump. From a terminal, I do:
$ xsel | pyastdump -

I’m using the xsel Linux utility; you can also use xclip -out, pbpaste on macOS, or Get-Clipboard in PowerShell.
This gives me some AST to look at, structured as XML:
<Module>
  <body>
    <FunctionDef lineno="1" col_offset="0" type="str" name="bad_boolean_arg">
      <args>
        <arguments>
          <posonlyargs/>
          <args>
            <arg lineno="1" col_offset="20" type="str" arg="foo">
              <annotation>
                <Name lineno="1" col_offset="25" type="str" id="bool">
                  <ctx>
                    <Load/>
                  </ctx>
                </Name>
              </annotation>
            </arg>
          </args>
          <kwonlyargs/>
          <kw_defaults/>
          <defaults/>
        </arguments>
      </args>
      <body>
        <Pass lineno="2" col_offset="4"/>
      </body>
      <decorator_list/>
    </FunctionDef>
  </body>
  <type_ignores/>
</Module>

In this case, the current structure of Python’s AST has helped us out a lot – it has separated out posonlyargs (positional-only arguments), args (positional or keyword), and kwonlyargs (keyword-only args). We can see the offending annotation containing a Name with id="bool" inside the args, when we want it only to be allowed as a keyword-only argument.
(Do we want to disallow boolean-annotated arguments as positional-only? I’m leaning towards “no” here, as positional-only is quite rare and usually a very deliberate choice.)
I now have to construct an XPath expression that will find the offending XML nodes, but not match good examples. It’s pretty straightforward in this case, once you know the basics of XPath. I test it out straight away at the CLI:
pyastgrep './/FunctionDef/args/arguments/args/arg/annotation/Name[@id="bool"]' tests/test_conventions.pyIf I’ve done it correctly, it should print my bad example, and not my good example.
Then I widen the net, omitting tests/test_conventions.py to search everywhere in my current directory.
At this point, I’ve probably got some real results that I want to address, but I might also notice there are other variants of the same thing I need to be able to match, and so I iterate, adding more bad/good examples as necessary.
Now I need to write a test. It’s going to look like this:
def test_boolean_arguments_are_keyword_only():
    assert_expected_pyastgrep_matches(
        """
        .//FunctionDef/args/arguments/args/arg/annotation/Name[@id="bool"]
        """,
        message="Function arguments with type `bool` should be keyword-only",
        expected_count=1,
    )

Of course, the real work is being done inside my assert_expected_pyastgrep_matches utility, which looks like this:
from pathlib import Path

from boltons import iterutils
from pyastgrep.api import Match, search_python_files

SRC_ROOT = Path(__file__).parent.parent.resolve()  # depends on project structure

def assert_expected_pyastgrep_matches(xpath_expr: str, *, expected_count: int, message: str):
    """
    Asserts that the pyastgrep XPath expression matches only `expected_count`
    times, each of which must be marked with a `pyastgrep: expected` comment.

    `message` is a message to be printed on failure.
    """
    xpath_expr = xpath_expr.strip()
    matches: list[Match] = [
        item for item in search_python_files([SRC_ROOT], xpath_expr) if isinstance(item, Match)
    ]

    expected_matches, other_matches = iterutils.partition(
        matches, key=lambda match: "pyastgrep: expected" in match.matching_line
    )
    if len(expected_matches) < expected_count:
        assert False, f"Expected {expected_count} matches but found {len(expected_matches)} for {xpath_expr}"
    assert not other_matches, (
        message
        + "\n Failing examples:\n"
        + "\n".join(
            f" {match.path}:{match.position.lineno}:{match.position.col_offset}:{match.matching_line}"
            for match in other_matches
        )
    )

There is a bit of explaining to do now.
Being sure that you can “find life on earth” is especially important for a negative test like this. It would be very easy to have an XPath query that you thought worked but didn’t, as it might just silently return zero results. In addition, Python’s AST is not stable – so a query that works now might stop working in the future.
It’s like you have a machine that claims to be able to find needles in haystacks – when it comes back and says “no needles found”, do you believe it? To increase your confidence that everything works and continues to work, you place a few needles at locations that you know, then check that the machine is able to find those needles. When it claims “found exactly 2 needles”, and you can account for those, you’ve got much more confidence that it has indeed found the only needles.
So, it’s important to leave my bad examples in there.
But, I obviously don’t want the bad examples to cause the test to fail! In addition, I want a mechanism for exceptions. A simple mechanism I’ve chosen is to add the text pyastgrep: expected as a comment.
So, I need to change my bad example like this:
def bad_boolean_arg(foo: bool):  # pyastgrep: expected
    pass

I also pass expected_count=1 to indicate that I expect to find at least one bad example (or more, if I’ve added more bad examples).
Hopefully that explains everything assert_expected_pyastgrep_matches does. A couple more notes:
- It uses boltons, a pretty useful set of Python utilities.
- It requires a SRC_ROOT folder to be defined, which will depend on your project, and might be different depending on which folder(s) you want to apply the convention to.
Now, everything is set up, and I run the test for real, hopefully locating all the bad usages. I work through them and fix, then leave the test in.
Tips
pyastgrep works strictly at the syntax level, so unlike Semgrep you might get caught out by aliases if you try to match on specific names:
from foo import bar
from foo import bar as foo_bar
import foo

# These all call the same function but look different in the AST:
foo.bar()
bar()
foo_bar()

There is, however, an advantage to this – you don’t need a real import to construct your bad examples; you can just use a Mock. For example, for my inclusion_tag example above, I have code like:
from unittest.mock import Mock

register = Mock()

@register.inclusion_tag(filename="something/not_bad_tag.html")
def bad_tag():  # pyastgrep: expected
    pass

You can see the full code on GitHub.
You might be able to use a mixture of techniques:
- A Semgrep rule bans one set of bad patterns, such as direct use of some thirdparty.func, requiring everyone to use your own wrapper, which is in turn constructed in such a way that a pyastgrep rule is easier to apply.
- Some introspection that produces a list of classes or functions to which some rule applies, then dynamically generates an XPath expression to pass to pyastgrep.
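The second technique can be sketched as follows. The list of banned function names and the source directory are hypothetical, while search_python_files and Match are the pyastgrep.api names used in the utility above:

```python
# Suppose introspection has produced a list of function names that some
# convention applies to (these names are made up for illustration):
deprecated = ["old_helper", "legacy_save"]

def calls_to(names):
    # One union branch per name; Call/func/Name is where a plain call's
    # callee name lives in pyastgrep's XML rendering of the AST.
    return " | ".join(f'.//Call/func/Name[@id="{name}"]' for name in names)

xpath = calls_to(deprecated)
print(xpath)

# The expression can then be handed to the library, e.g.:
#   from pyastgrep.api import Match, search_python_files
#   for item in search_python_files(["src"], xpath):
#       if isinstance(item, Match):
#           print(item.path, item.position.lineno, item.matching_line)
```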
Syntax level searching isn’t right for every job, but it can be a powerful addition to your toolkit, and with a decent query language like XPath, you can do a surprising amount. Have a look at the pyastgrep examples for inspiration!
Mike Driscoll: Episode 41 – Python Packaging and FOSS with Armin Ronacher
In this episode, I chatted with Armin Ronacher about his many amazing Python packages, such as pygments, flask, Jinja, Rye, and Click!
Specifically, we talked about the following:
- How Flask came about
- Favorite Python packages
- Python packaging
- and much more!
The post Episode 41 – Python Packaging and FOSS with Armin Ronacher appeared first on Mouse Vs Python.
Python Anywhere: New help page: Playwright
We’ve had an increasing number of people asking us how to use Playwright on PythonAnywhere. Playwright is a browser automation framework that was developed by Microsoft; like the more established Selenium, it’s really useful for testing and web scraping, and it’s getting a reputation for being a robust and fast library for that kind of work.
Getting it set up to run on PythonAnywhere is pretty easy, but you need to do the installation slightly differently to the way it’s documented on Playwright’s own site; user hcaptcha on our forums worked out the trick to making it work, so now we have a new help page documenting how to do it.
EuroPython: EuroPython May 2024 Newsletter
Hello, Python people! 🐍
Welcome back to our cosy corner of community & code!
It’s absolutely fantastic to have you here again, especially as we count down the days until our reunion in Prague! 🎉 It’s been months since our last gathering in this magical city, and we couldn’t be more excited to be back.
Fun fact: did you know that Czechia has recently been ranked among the world’s top 20 happiest countries? It’s the ideal place to spark joy and inspiration as we come together once again.
A lot of things have happened since our last catch-up. Let’s dive right into it!
📣 Programme
As you might know, we reached a record number of submitted proposals this year. After analyzing 627 submissions, we are glad to announce a sneak peek of what you will see at EuroPython this year.
If you are still not convinced to join us, we have another big announcement! Here are the keynote speakers we have lined up. 🌟
- Carol Willing - Three-time Python Steering Council member, Python Core Developer, and PSF Fellow
- Tereza Iofciu - Seasoned data pro with 15 years of experience, Python community star, and winner of the PSF Community Service Award in 2021
- Łukasz Langa - CPython Developer in Residence, Python 3.8 & 3.9 release manager, original creator of Black, pianist, dad
- Mai Giménez - Google Deepmind senior engineer specializing in large language and multimodal models, former Spanish Python Association board member 🐍
- Armin Ronacher - Creator of popular Python libraries such as Flask, Jinja, and Click, Armin is currently the VP of Platform at Sentry.
- Anna Přistoupilová - Bioinformatics scientist focused on genome analysis techniques and their applications in understanding rare genetic diseases. She received the Bolzano Award for her doctoral thesis.
The schedule is gonna be packed with all your amazing talks, tutorials and posters. The official schedule with the dates and times will be posted soon! Keep those eyes open on our social media channels and website! 📅✨
All of this wouldn’t be real without the great efforts of the EuroPython programme team. Many volunteers from all teams spared their time to reach out to people and review the proposals. ❤️
🙌 Big cheers to those who helped shape the EuroPython programme making EuroPython 2024 the best one yet! 🚀🐍
Let us better introduce the listed EuroPython 2024 keynote speakers! ⚡️🐍❤️
Keynote speaker #1: Carol Willing
Don’t miss your chance to hear from Carol Willing, who is a three-time Python Steering Council member, Python Core Developer, PSF Fellow, and Project Jupyter core contributor. 🔥
In 2019, she was honoured with the Frank Willison Award for her outstanding technical and community contributions to Python. Carol played a pivotal role in Project Jupyter, recognized with the prestigious 2017 ACM Software System Award for its enduring impact. A leading figure in open science and open-source governance, Carol serves on the advisory boards of Quansight Labs, CZI Open Science, and pyOpenSci.
She’s committed to democratizing open science through accessible tools and learning resources, and most recently served as Noteable’s VP of Engineering.
Get ready to be inspired by Carol’s insights at EuroPython 2024!
Keynote speaker #2: Tereza Iofciu
Get ready to be inspired by Tereza Iofciu, a seasoned data practitioner and leadership coach with over 15 years of expertise in Data Science, Data Engineering, Product Management, and Team Management. 🔥
Tereza’s dedication to the Python Community is unmatched; she has worn numerous hats over the years, serving as an organiser for PyLadies Hamburg, a board member of Python Software Verband, a steering committee member of NumFocus DISC, and a team member of the Python Software Foundation Code of Conduct working group. Not stopping there, Tereza is also actively involved in promoting Diversity & Inclusion as a working group member, while also organising PyConDE & PyData Berlin and Python Pizza Hamburg, and co-leading PyPodcats. (If you haven’t heard, PyPodcats is a fantastic new podcast dedicated to highlighting the hidden figures of the Python community, led by the PyPodcats team of Cheuk Ting Ho, Georgi Ker, Mariatta Wijaya, and Tereza Iofciu, and aimed at amplifying the voices of underrepresented group members within the Python community.) 🐈🐱
In recognition of her outstanding contributions, Tereza was honoured with the Python Software Foundation Community Service Award in 2021. Now, if that’s not a sign to catch her awesome keynote, I don’t know what is!
Keynote speaker #3: Łukasz Langa
Introducing Łukasz Langa: a polymath whose impact on the Python ecosystem is as diverse as his array of interests!
As the CPython Developer in Residence and the mastermind behind the Python 3.8 & 3.9 releases, Łukasz plays a pivotal role in shaping the future of Python. He’s the original creator of Black, revolutionising the way we write Python code.
Beyond his coding prowess, Łukasz is a talented pianist and a devoted father.
When he’s not immersed in Python development, you’ll find him indulging his passion for analogue modular synthesisers 😍, immersing himself in captivating single-player role-playing games like Fallout and Elder Scrolls, or relishing the complexity of a fine single malt Scotch whisky.
Brace yourself for an enlightening journey through Łukasz’s experiences and insights! 🚀🎹🥃
Keynote speaker #4: Mai Giménez
Allow us to introduce Mai Giménez, PhD, a senior research engineer at Google DeepMind specialising in large language and multimodal models.
Mai’s passion lies in crafting technology that benefits everyone, with her primary research focus being on language and the sociotechnical impacts of AI in the real world. The impact of her contributions extends beyond her work at Google DeepMind. She’s a former board member of the Spanish Python Association and has played a pivotal role in organising several PyConES conferences.
Additionally, Mai proudly contributes to the Python community as a member of PyLadies. Get ready to be inspired by Mai’s expertise and insights as she graces the stage at EP24! 🌟
Keynote speaker #5: Armin Ronacher
A household name in the open-source world, Armin Ronacher is the creator of popular Python libraries such as Flask, Jinja, and Click. He has left quite a mark on the Python ecosystem, empowering developers worldwide with efficient tools and frameworks.
He is currently the VP of Platform at Sentry, and recently he started an experimental Python package and project manager that attempts to bring Rust’s modern developer experience to Python. We are so excited to hear from him at EuroPython 2024!
Keynote speaker #6: Anna Přistoupilová
Put your hands together for Anna Přistoupilová!! Anna is a bioinformatics scientist focused on genome analysis techniques and their applications in understanding rare genetic diseases. She received the Bolzano Award for her doctoral thesis!
Anna holds a PhD in Molecular and Cell Biology, Genetics, and Virology and two MSc degrees: one in Medical Technology and Informatics, and the other in Molecular Biology and Genetics, all from Charles University.
She has co-authored over 25 publications in peer-reviewed journals and has presented her work at various scientific conferences.
Currently, Anna works as a Senior Bioinformatics Scientist at DNAnexus, where she assists customers with their bioinformatics analyses. She also conducts research at the Research Unit for Rare Diseases at the First Faculty of Medicine, Charles University.
🎟️ Conference Registration
It's time to secure your tickets for the conference!
We've heard you loud and clear: you don't want to miss the opportunity to hear from our incredible keynote speakers and be a part of EuroPython 2024.
Here's the list of tickets available for purchase. 👇
- Conference Tickets: access to the main Conference AND Sprints Weekend
- Tutorial Tickets: access to the two days of Workshops AND Sprints Weekend. NO access to the main event.
- Combined Tickets: access to everything for the seven days! Includes workshops, main Conference and Sprints weekend.
Beyond the ticket types, there are also payment tiers designed to meet each participant's needs:
- Business Tickets (for companies and employees funded by their companies)
- Personal Tickets (for individuals)
- Education Tickets (for students and teachers, an educational ID is required at the registration)
For those who cannot physically join us but still want to support the community, we have the remote ticket option.
- Remote ticket: access to the Live streaming of the talks, Q&A with the speakers and Discord server.
Join us and connect with the delightful community of Pythonistas in Prague. Make your summer more fun!
Need more information regarding tickets? Please visit the website or contact us at helpdesk@europython.eu!
⚖️ Visa Application
Not sure whether you need a visa? Please check the website and consult your local consular office or embassy. 🏫
If you do need a visa to attend EuroPython 2024, you can lodge a visa application issued for Short Stay (C), up to 90 days, for the purpose of “Business /Conference”. Please, do it ASAP!
Make sure you read all the visa pages carefully and prepare all the required documents before making your application. The EuroPython organizers are neither able nor qualified to give visa advice.
However, we’re more than happy to help you with a visa support letter. But before sending your request, please note that you will need to be registered to request the letter. We can only issue visa support letters to confirmed participants.
Hence, we kindly ask you to purchase your ticket before filling in the request form.
For more information, please check https://ep2024.europython.eu/visa or contact us at visa@europython.eu. ✈️
💶 Financial Aid
The first round of our Financial Aid Programme received a record-high number of applications this year, and we are very proud to be supporting so many Pythonistas in attending the conference.
The second round of applications wrapped up on May 19th and now the team is actively working to individually review the applications! More information at https://ep2024.europython.eu/finaid/.
💰 Sponsorship
If you want to support EuroPython and its efforts to make the event accessible to everyone, please consider sponsoring, or asking your employer to do so. More information at: https://ep2024.europython.eu/sponsor 🫂
Sponsoring EuroPython guarantees you highly targeted visibility and the opportunity to present yourself/your company to one of the largest and most diverse Python communities in Europe and beyond!
There are several sponsor tiers and slots are limited. This year, besides our main packages, we offer add-ons as optional extras where companies can opt to support the community in many other ways:
- By directly sponsoring the PyLadies lunch event
- By supporting participants by funding Financial Aid
- By having their logo on all lanyards of the conference
- Or even by improving the event’s accessibility.
Interested? Email us at sponsoring@europython.eu.
🤝 Join us as a Volunteer!
To make the conference an amazing experience for everyone, we need enthusiastic on-site volunteers from July 8-14. Whether you're confident at leading people, love chatting with new folks at registration, are interested in chairing a session, or just want to help out, we've got a role for you. Volunteering is a fantastic way to gain experience, make new connections, and have lots of fun while doing it.
Interested? Have a look at https://ep2024.europython.eu/volunteers to find out more and how to apply.
We're also considering remote volunteers, so if you're interested in helping out but can't make it to Prague, please mention that explicitly in your email.
We can't wait to see you in Prague! 🚀
🎟 Events @EuroPython
This year, we want to make our social event bigger and better for everyone, so we are planning to host a bigger party. Tickets will be available for purchase on the website soon! Stay tuned.
🎉 Community
EuroPython at PyCon Italia 🇮🇹: May 22nd - 25th, 2024
PyCon Italia 2024 will happen in Florence. The birthplace of the Renaissance will receive a wave of Pythonistas looking to geek out this year, including a lot of EuroPython people.
If you are going to PyCon Italia (tickets are sold out), join us to help organise EuroPython 2024!
🎤 First-Time Speaker Workshop
Join us for the Speaker's Mentorship Programme - First-Time Speaker Workshop on Monday, June 3rd, at 19:00 CEST! 🎤
This online panel session features experienced speakers sharing advice for first-time (and other) speakers. Following the panel discussion, there will be a dedicated Q&A session to address all participants' inquiries. The event is open to the public, and registration is required through this form.
As the event date approaches, registered participants will receive further details via email. Don't miss this opportunity to learn and grow as a speaker!
News from the EuroPython Community ❣️
- Check out our phenomenal co-organizer Mia Bajic on a recent podcast where she shared her experiences volunteering in the Python community! 🎙️ Mia is a true pillar of the Python community; she has shared her expertise and passion at multiple PyCons across the globe. 🌍 Her efforts extend beyond borders as she tirelessly works to bring Pythonic people together in Prague, hosting events such as Pyvo and the first-ever Python Pizza in the city! 🍕 Mia's dedication and contributions make both the Czech Python community and EuroPython a better place, and we're beyond grateful to have her on board shaping the EuroPython experience. 🙌 Note: The podcast is in Czech. 🎧
Podcast: https://www.youtube.com/watch?v=-UcHqap89Ac
- Joana is doing a Master's in Medical Imaging and Applications. Originally from Ghana, she joined the Communications team of EuroPython this year, bringing her experience and innovative thinking from PyLadies Ghana.
She wrote an article about her Community involvement and the impact it has had on her career. She says:
I met and saw women who had achieved great things in data science and machine learning, which meant that I could also, through their stories, find a plan to at least help me get close to what they had done.
Full article: https://blog.europython.eu/community-post-invisible-threads/
- GeoPython 2024 will happen in Basel, Switzerland.
For more information about GeoPython 2024, you can visit their website: https://2024.geopython.net/
DjangoCon Europe: June 5-9, 2024
DjangoCon Europe 2024 will happen in Vigo, Spain. You can find more information on their lovely website: https://2024.djangocon.eu/
PyCon Portugal: 17-19 October 2024
PyCon Portugal will happen at the Altice Forum in Braga. More information on the official website: https://2024.pycon.pt/
PyCon Poland: August 29th - September 1st
The 16th edition of PyCon PL is happening in Gliwice! For more information, visit their website: https://pl.pycon.org/2024
PyCon Taiwan 2024: September 21st - September 22nd
PyCon Taiwan will introduce a new activity segment: a Poster Session! The deadline to submit your posters is June 15th, through the submission form.
More information on their website: https://tw.pycon.org/2024/en-us
🤭 Py.Jokes
~ pyjoke
Two threads walk into a bar. The barkeeper looks up and yells, 'Hey, I want don't any conditions race like time last!'
🐣 See You All Next Month
Before saying goodbye, thank you so much for reading. We can't wait to reunite with all you amazing people in beautiful Prague again.
It truly is time to make new Python memories together!
With so much joy and excitement,
EuroPython 2024 Team 🤗
Matt Layman: Export Journal Feature - Building SaaS with Python and Django #191
Ned Batchelder: Echos of the People API user guide
PyCon 2024 just happened, and as always it was a whirlwind of re-connecting with old friends, meeting new friends, talking about anything and everything, juggling, and getting simultaneously energized and exhausted. I won’t write about everything that happened, but one thread sticks with me: the continuing echos of my talk from last year.
If you haven’t seen it yet, I spoke at PyCon 2023 about the ways engineers can use their engineering skills to interact better with people, something engineers stereotypically need help with. I called it People: The API User’s Guide.
A number of people mentioned the talk to me this year, which was very gratifying. It’s good to hear that it stuck with people. A few said they had passed it on to others, which is a real sign that it landed well with them.
On the other hand, at lunch one day, someone asked me if I had done a talk last year, and I sheepishly answered, “um, yeah, I did the opening keynote...” It’s hard to answer that question without it becoming a humble brag!
The most unexpected echo of the talk was at the coverage.py sprint table. Ludovico Bianchi was working away when he turned to me and said, “oh, I forgot to send you this last year!” He showed me this picture he drew during the talk a year ago:
I’ve only ever stayed for one day of sprints. It can be hard to get people started on meaty issues in that short time. We have a good time anyway, and merge a few pull requests. This year, three people came back who sprinted with me in 2023, another sign that something is going right.
Once the sprint was over, Ludovico also sketched Sleepy into the group photo of the sprint gang:
Half the fun of preparing last year’s talk was art-directing the illustrations by my son Ben, similar to how we had worked to make Sleepy Snake. As much as I like hearing that people like my words, as a dad it’s just as good to hear that people like Ben’s art. Seeing other people play with Sleepy in clever ways is extra fun.
Glyph Lefkowitz: A Grand Unified Theory of the AI Hype Cycle
The history of AI goes in cycles, each of which looks at least a little bit like this:
- Scientists do some basic research and develop a promising novel mechanism, N. One important detail is that N has a specific name; it may or may not be carried out under the general umbrella of “AI research” but it is not itself “AI”. N always has a few properties, but the most common and salient one is that it initially tends to require about 3x the specifications of the average computer available to the market at the time; i.e., it requires three times as much RAM, CPU, and secondary storage as is shipped in the average computer.
- Research and development efforts begin to get funded on the hypothetical potential of N. Because N is so resource intensive, this funding is used to purchase more computing capacity (RAM, CPU, storage) for the researchers, which leads to immediate results, as the technology was previously resource constrained.
- Initial successes in the refinement of N hint at truly revolutionary possibilities for its deployment. These revolutionary possibilities include a dimension of cognition that has not previously been machine-automated.
- Leaders in the field of this new development — specifically leaders, like lab administrators, corporate executives, and so on, as opposed to practitioners like engineers and scientists — recognize the sales potential of referring to this newly-“thinking” machine as “Artificial Intelligence”, often speculating about science-fictional levels of societal upheaval (specifically in a period of 5-20 years), now that the “hard problem” of machine cognition has been solved by N.
- Other technology leaders, in related fields, also recognize the sales potential and begin adopting elements of the novel mechanism to combine with their own areas of interest, also referring to their projects as “AI” in order to access the pool of cash that has become available to that label. In the course of doing so, they incorporate N in increasingly unreasonable ways.
- The scope of “AI” balloons to include pretty much all of computing technology. Some things that do not even include N start getting labeled this way.
- There’s a massive economic boom within the field of “AI”, where “the field of AI” means any software development that is plausibly adjacent to N in any pitch deck or grant proposal.
- Roughly 3 years pass, while those who control the flow of money gradually become skeptical of the overblown claims that recede into the indeterminate future, where N precipitates a robot apocalypse somewhere between 5 and 20 years away. Crucially, because of the aforementioned resource-intensiveness, the gold owners’ skepticism grows slowly over this period, because their own personal computers or the ones they have access to do not have the requisite resources to actually run the technology in question, and it is challenging for them to observe its performance directly. Public critics begin to appear.
- Competent practitioners — not leaders — who have been successfully using N in research or industry quietly stop calling their tools “AI”, or at least stop emphasizing the “artificial intelligence” aspect of them, and start getting funding under other auspices. Whatever N does that isn’t “thinking” starts getting applied more seriously as its limitations are better understood. Users begin using more specific terms to describe the things they want, rather than calling everything “AI”.
- Thanks to the relentless march of Moore’s law, the specs of the average computer improve. The CPU, RAM, and disk resources required to actually run the software locally come down in price, and everyone upgrades to a new computer that can actually run the new stuff.
- The investors and grant funders update their personal computers, and they start personally running the software they’ve been investing in. Products with long development cycles are finally released to customers as well, but they are disappointing. The investors quietly get mad. They’re not going to publicly trash their own investments, but they stop loudly boosting them and they stop writing checks. They pivot to biotech for a while.
- The field of “AI” becomes increasingly desperate, as it becomes the label applied to uses of N which are not productive, since the productive uses are marketed under their application rather than their mechanism. Funders lose their patience, and the polarity of the “AI” money magnet rapidly reverses. Here, the AI winter is finally upon us.
- The remaining AI researchers who still have funding via mechanisms less vulnerable to hype, who are genuinely thinking about automating aspects of cognition rather than simply N, quietly move on to the next impediment to a truly thinking machine, and in the course of doing so, they discover a new novel mechanism, M. Go to step 1, with M as the new N, and our current N as a thing that is now “not AI”, called by its own, more precise name.
A non-exhaustive list of previous values of N includes:
- Neural networks and symbolic reasoning in the 1950s.
- Theorem provers in the 1960s.
- Expert systems in the 1980s.
- Fuzzy logic and hidden Markov models in the 1990s.
- Deep learning in the 2010s.
Each of these cycles has been larger and lasted longer than the last, and I want to be clear: each cycle has produced genuinely useful technology. It’s just that each follows the progress of a sigmoid curve that everyone mistakes for an exponential one. There is an initial burst of rapid improvement, followed by gradual improvement, followed by a plateau. Initial promises imply or even state outright “if we pour more {compute, RAM, training data, money} into this, we’ll get improvements forever!” The reality is always that these strategies inevitably have a limit, usually one that does not take too long to find.
Where Are We Now?
So where are we in the current hype cycle?
- Here’s a Computerphile video which explains some recent research into LLM performance. I’d highly encourage you to have a look at the paper itself, particularly Figure 2, “Log-linear relationships between concept frequency and CLIP zero-shot performance”.
- Here’s a series of posts by Simon Willison explaining the trajectory of the practicality of actually-useful LLMs on personal devices. He hasn’t written much about it recently because it is now fairly pedestrian for an AI-using software developer to have a bunch of local models, and although we haven’t quite broken through the price floor of the gear-acquisition-syndrome prosumer market in terms of the requirements of doing so, we are getting close.
- The Rabbit R1 and Humane AI Pin were both released; were they disappointments to their customers and investors? I think we all know how that went at this point.
- I hear Karius just raised a series C, and they’re an “emerging unicorn”.
- It does appear that we are all still resolutely calling these things “AI” for now, though, much as I wish, as a semasiology enthusiast, that we would be more precise.
History does not repeat itself, but it does rhyme. This hype cycle is unlike any that have come before in various ways. There is more money involved now. It’s much more commercial; I had to phrase things above in very general ways because many previous hype waves have been based on research funding, some really being exclusively a phenomenon at one department in DARPA, and not, like, the entire economy.
I cannot tell you when the current mania will end and this bubble will burst. If I could, you’d be reading this in my $100,000 per month subscribers-only trading strategy newsletter and not a public blog. What I can tell you is that computers cannot think, and that the problems of the current instantiation of the nebulously defined field of “AI” will not all be solved within “5 to 20 years”.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. Special thanks also to Ben Chatterton for a brief pre-publication review; any errors remain my own. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “what are we doing that history will condemn us for”. Or, you know, Python programming.
Real Python: The Python calendar Module: Create Calendars With Python
The Python calendar module provides several ways to generate calendars for Python programs. It also includes a variety of functions for working with calendar data as strings, numbers, and datetime objects.
In this tutorial, you’ll learn how to use the calendar module to create and customize calendars with Python.
By the end of this tutorial, you’ll be able to:
- Display calendars in your terminal with Python
- Create plain text and HTML calendars
- Format calendars for specific locales and display conventions
- Use calendar-related functions and methods to access lower-level calendar data in a variety of formats
Get Your Code: Click here to download the free sample code you’ll use to learn about creating calendars with the calendar module in Python.
Take the Quiz: Test your knowledge with our interactive “The Python calendar Module” quiz. You’ll receive a score upon completion to help you track your learning progress:
The Python calendar Module
In this quiz, you'll test your understanding of the calendar module in Python. It'll evaluate your proficiency in manipulating, customizing, and displaying calendars directly within your terminal. By working through this quiz, you'll revisit the fundamental functions and methods provided by the calendar module.
Displaying Calendars in Your Terminal
Unix and Unix-like operating systems such as macOS and Linux include a cal command-line utility for displaying calendars in an interactive console:
$ cal
      May 2024
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31

Python provides a similar tool, which allows you to run the calendar module as a command-line script. To begin exploring the Python calendar module, open up your terminal program and enter the following command:
$ python -m calendar
                                  2024

      January                   February                   March
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7                1  2  3  4                   1  2  3
 8  9 10 11 12 13 14       5  6  7  8  9 10 11       4  5  6  7  8  9 10
15 16 17 18 19 20 21      12 13 14 15 16 17 18      11 12 13 14 15 16 17
22 23 24 25 26 27 28      19 20 21 22 23 24 25      18 19 20 21 22 23 24
29 30 31                  26 27 28 29               25 26 27 28 29 30 31

       April                      May                       June
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7             1  2  3  4  5                      1  2
 8  9 10 11 12 13 14       6  7  8  9 10 11 12       3  4  5  6  7  8  9
15 16 17 18 19 20 21      13 14 15 16 17 18 19      10 11 12 13 14 15 16
22 23 24 25 26 27 28      20 21 22 23 24 25 26      17 18 19 20 21 22 23
29 30                     27 28 29 30 31            24 25 26 27 28 29 30

        July                     August                  September
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7                1  2  3  4                         1
 8  9 10 11 12 13 14       5  6  7  8  9 10 11       2  3  4  5  6  7  8
15 16 17 18 19 20 21      12 13 14 15 16 17 18       9 10 11 12 13 14 15
22 23 24 25 26 27 28      19 20 21 22 23 24 25      16 17 18 19 20 21 22
29 30 31                  26 27 28 29 30 31         23 24 25 26 27 28 29
                                                    30

      October                   November                  December
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6                   1  2  3                         1
 7  8  9 10 11 12 13       4  5  6  7  8  9 10       2  3  4  5  6  7  8
14 15 16 17 18 19 20      11 12 13 14 15 16 17       9 10 11 12 13 14 15
21 22 23 24 25 26 27      18 19 20 21 22 23 24      16 17 18 19 20 21 22
28 29 30 31               25 26 27 28 29 30         23 24 25 26 27 28 29
                                                    30 31

Running python -m calendar with no arguments outputs a full year’s calendar for the current year. To display the full calendar for a different year, pass in the integer representation of a year as the first argument of the calendar command:
$ python -m calendar 1989

To view a single month, pass in both a year and a month, with the month as the second argument:
$ python -m calendar 2054 07
     July 2054
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31

As you can see in these examples, the calendar module can display calendars for both past and future dates. According to the official documentation, the calendar module uses the current Gregorian calendar, extended indefinitely in both directions. It also uses the ISO 8601 standard, which is an international standard for exchanging and communicating date- and time-related data.
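If you'd rather produce this output from Python code than from the shell, the standard library exposes the same formatting as module-level convenience functions. Here's a minimal sketch using calendar.month(), which returns the month grid as a string:

```python
import calendar

# calendar.month(year, month) returns the same grid the
# command-line script prints for a single month
print(calendar.month(2054, 7))

# calendar.calendar(year) does the same for a whole year
print(calendar.calendar(2054))
```

Both functions accept optional width and spacing parameters if you want to tweak the layout.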
Now that you know how to display calendars in your terminal with Python, you can move on and explore other approaches to creating calendars as plain text or HTML markup representations.
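As a small preview of the HTML side, the calendar module's HTMLCalendar class can render a month as an HTML table string that you can embed directly in a web page. A minimal sketch:

```python
import calendar

# HTMLCalendar.formatmonth() returns an HTML <table> snippet
# with one row per week and CSS classes for each weekday
html_calendar = calendar.HTMLCalendar()
print(html_calendar.formatmonth(2024, 5))
```

The generated markup carries class attributes (such as "month" and per-day classes), so you can style it with your own CSS.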
Creating Text-Based Calendars
To generate plain text calendars, the calendar module provides calendar.TextCalendar with methods to format and print monthly and yearly calendars.
TextCalendar.formatyear() accepts a single parameter for the year, like the calendar command-line script. Try it out in your Python REPL by executing the following code:
>>> import calendar
>>> text_calendar = calendar.TextCalendar()
>>> text_calendar.formatyear(2024)
'                                  2024\n\n      January (...)'

Read the full article at https://realpython.com/python-calendar-module/ »
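One detail worth noting: TextCalendar starts weeks on Monday by default, unlike the Unix cal utility shown earlier. A short sketch showing how to switch the first weekday to Sunday for a single month:

```python
import calendar

# TextCalendar accepts a firstweekday argument (0=Monday ... 6=Sunday);
# calendar.SUNDAY reproduces the Unix cal layout
text_calendar = calendar.TextCalendar(firstweekday=calendar.SUNDAY)
print(text_calendar.formatmonth(2024, 5))
```

TextCalendar also offers prmonth() and pryear(), which print the formatted calendar directly instead of returning a string.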