Planet Python

Planet Python - http://planetpython.org/

PyCoder’s Weekly: Issue #631 (May 28, 2024)

Tue, 2024-05-28 15:30

#631 – MAY 28, 2024

Building a Python GUI Application With Tkinter

In this video course, you’ll learn the basics of GUI programming with Tkinter, the de facto Python GUI framework. Master GUI programming concepts such as widgets, geometry managers, and event handlers. Then, put it all together by building two applications: a temperature converter and a text editor.
REAL PYTHON course

pyastgrep and Custom Linting

This article from the developer of pyastgrep introduces you to the tool, which can now be used as a library. The post talks about how to use it and what kind of linting it does best.
LUKE PLANT

Upgrade Python Versions Without the Pain

Stop wasting 30% of your team’s sprint on maintaining legacy codebases. Automatically migrate and keep up-to-date on Python versions, so that you can focus on being productive while staying secure, without the risk of breaking changes - Get a code assessment today →
ACTIVESTATE sponsor

What’s New in Django 5.1

Django 5.1 has gone alpha, so the list of features targeting this release has more or less solidified. This article introduces you to what is coming in Django 5.1.
JEFF TRIPLETT

Quiz: How to Create Pivot Tables With Pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won’t find all the answers by reading the tutorial, so you’ll need to do some investigating on your own. By finding all the answers, you’re sure to learn some other interesting things along the way.
REAL PYTHON

PEP 649 Re-targeted to 3.14

Python Enhancement Proposal 649: Deferred Evaluation Of Annotations Using Descriptors has been re-targeted to the Python 3.14 release.
PYTHON.ORG

JupyterLab 4.2 and Notebook 7.2 Released

JUPYTER

Articles & Tutorials

Testing With Python: The Different Types of Tests

This is part 5 of a deep dive into writing automated tests, but also works well as an independent article. This post talks about the taxonomy of testing, like the differences between unit and integration tests, and how nobody can quite agree on a definition of either.
BITECODE

Python’s Built-in Exceptions: A Walkthrough With Examples

In this tutorial, you’ll get to know some of the most commonly used built-in exceptions in Python. You’ll learn when these exceptions can appear in your code and how to handle them. Finally, you’ll learn how to raise some of these exceptions in your code.
REAL PYTHON

Software Engineering Hiring and Firing

This article is a deep dive into hiring and firing practices in the software field and, unlike most articles, focuses on senior engineering roles. It isn’t a “first job” post, but a “how the decision process works” article.
ED CREWE

Enabling Async MongoDB Operations in Streamlit

Streamlit is a wonderful tool for building dashboards with its peculiar execution model, but using asyncio data sources with it can be a real pain. This article is about how to correctly use those two technologies together.
HANDMADESOFTWARE • Shared by Thorin Schiffer

EuroPython 2024 Announces Keynote Speakers

EuroPython happens in Prague, July 8-14, and as the conference approaches, more and more is happening. This posting from their May newsletter highlights the keynotes and other announcements.
EUROPYTHON

Writing Commit Messages

This guide admits to being “yet another”, but unlike most that are out there, spends less time discussing the cosmetic aspects of a good commit message and more time on the content.
SIMON TATHAM

PSF Announces 5-Year Sponsorship Commitment From Fastly

The Python Software Foundation securing this sponsorship affects the entire Python ecosystem, most notably the security and reliability of the Python Package Index (PyPI).
SOCKET.DEV • Shared by Sarah Gooding

Untold Stories From 6 Years Working on Python Packaging

Sumana gave the closing keynote address at PyCon US this year and this posting shares all the links and references from the talk.
SUMANA HARIHARESWARA

The Python calendar Module: Create Calendars With Python

Learn to use the Python calendar module to create and customize calendars in plain text, in HTML, or directly in your terminal.
REAL PYTHON

TIL: Accessibility Resources #2

This post is a collection of accessibility resources mostly for web sites, but some tools can be used elsewhere as well.
SARAH ABDEREMANE

Projects & Code

PgQueuer: Python & PostgreSQL Job Queuing Library

GITHUB.COM/JANBJORGE

Tapyr: Shiny for Python Application Template

GITHUB.COM/APPSILON • Shared by Appsilon

Oven: Explore Python Packages

FMING.DEV

tkforge: Drag & Drop in Figma to Create a Python GUI

GITHUB.COM/AXORAX

tach: Enforce a Modular, Decoupled Package Architecture

GITHUB.COM/NEVER-OVER

Events

Weekly Real Python Office Hours Q&A (Virtual)

May 29, 2024
REALPYTHON.COM

SPb Python Drinkup

May 30, 2024
MEETUP.COM

Building Python Communities Yaounde

June 1 to June 3, 2024
NOKIDBEHIND.ORG

Django Girls Medellín

June 1 to June 2, 2024
DJANGOGIRLS.ORG

PyDelhi User Group Meetup

June 1, 2024
MEETUP.COM

Melbourne Python Users Group, Australia

June 3, 2024
J.MP

DjangoCon Europe 2024

June 5 to June 10, 2024
DJANGOCON.EU

PyCon Colombia 2024

June 7 to June 10, 2024
PYCON.CO

Happy Pythoning!
This was PyCoder’s Weekly Issue #631.



Ned Batchelder: One way to fix Python circular imports

Tue, 2024-05-28 13:46

In Python, a circular import is when two files each try to import the other, causing a failure when a module isn’t fully initialized. The best way to fix this situation is to organize your code in layers so that the importing relationships naturally flow in just one direction. But sometimes it works to simply change the style of import statement you use. I’ll show you.

Let’s say you have these files:

1  # one.py
2  from two import func_two
3
4  def func_one():
5      func_two()

1  # two.py
2  from one import func_one
3
4  def do_work():
5      func_one()
6
7  def func_two():
8      print("Hello, world!")

1  # main.py
2  from two import do_work
3  do_work()

If we run main.py, we get this:

% python main.py
Traceback (most recent call last):
  File "main.py", line 2, in <module>
    from two import do_work
  File "two.py", line 2, in <module>
    from one import func_one
  File "one.py", line 2, in <module>
    from two import func_two
ImportError: cannot import name 'func_two' from partially initialized
  module 'two' (most likely due to a circular import) (two.py)

When Python imports a module, it executes the file line by line. Every global in the file (that is, every top-level name, including functions and classes) becomes an attribute on the module object being constructed. In two.py, we import from one.py at line 2. At that moment, the two module has been created, but it has no attributes yet because nothing has been defined yet. It will eventually have do_work and func_two, but we haven’t executed those def statements yet, so they don’t exist. Like a function call, when the import statement is run, it begins executing the imported file, and doesn’t come back to the current file until the import is done.

The import of one.py starts, and its line 2 tries to get a name from the two module. As we just said, the two module exists, but has no names defined yet. That gives us the error.

Instead of importing names from modules, we can import whole modules. All we do is change the form of the imports, and how we reference the functions from the imported modules, like this:

1  # one.py
2  import two              # was:  from two import func_two
3
4  def func_one():
5      two.func_two()      # was:  func_two()

1  # two.py
2  import one              # was:  from one import func_one
3
4  def do_work():
5      one.func_one()      # was:  func_one()
6
7  def func_two():
8      print("Hello, world!")

1  # main.py
2  from two import do_work
3  do_work()

Running the fixed code, we get this:

% python main.py
Hello, world!

It works because two.py imports one at line 2, and then one.py imports two at its line 2. That works just fine, because the two module exists. It’s still empty like it was before the fix, but now we aren’t trying to find a name in it during the import. Once all of the imports are done, the one and two modules both have all their names defined, and we can access them from inside our functions.

The key idea here is that “from two import func_two” tries to find func_two during the import, before it exists. Deferring the name lookup to the body of the function by using “import two” lets all of the modules get themselves fully initialized before we try to use them, avoiding the circular import error.
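A third variation, which the post doesn’t cover, is to defer the import statement itself into the function body, so that it only runs once the modules are fully initialized. A minimal sketch of that alternative:

# two.py -- alternative fix: import inside the function that needs it
def do_work():
    from one import func_one    # resolved at call time, not at import time
    func_one()

def func_two():
    print("Hello, world!")

The import statement then executes on every call, although after the first time it is only a cheap lookup in sys.modules, so the module-level import style shown above is usually the cleaner choice.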

As I mentioned at the top, the best way to fix circular imports is to structure your code so that modules don’t have mutual dependencies like this. But that isn’t always easy, and this can buy you a little time to get your code working again.


Go Deh: Recreating the CVM algorithm for estimating distinct elements gives problems

Tue, 2024-05-28 12:15

 

Someone at work posted a link to this Quanta Magazine article. It describes a novel and seemingly straightforward way to estimate the number of distinct elements in a datastream.

Quanta describes the algorithm, and as an example gives "counting the number of distinct words in Hamlet".

Following Quanta

I looked at the description and decided to follow their text. They carefully described each round of the algorithm, which I coded up, and then I looked for the generalizations and implemented a loop over all items in the stream...

It did not work! I got silly numbers. I could download Hamlet, split it into words (around 32,000), do len(set(words)) to get the exact number of distinct words (around 7,000), then run it through the algorithm and get a stupid result with tens of digits for the estimated number of distinct words.
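For reference, the exact count takes only a couple of lines. This sketch assumes Hamlet's text has been saved locally under the hypothetical name hamlet.txt:

# Exact baseline to compare the estimates against
words = open("hamlet.txt").read().lower().split()
print(len(words))       # total words, around 32,000
print(len(set(words)))  # distinct words, around 7,000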
I re-checked my implementation of the Quanta-described algorithm and couldn't see any mistake, but I had originally noticed a link to the original paper. I did not follow it at first as original papers can be heavily into maths notation and I prefer reading algorithms described in code/pseudocode. 

I decided to take a look at the original.

The CVM Original Paper

I scanned the paper.

I read the paper.

I looked at Algorithm 1 as a probable candidate to decipher into Python, but the description was cryptic. Here's that description, taken from the paper:

AI To the rescue!?

I had a brainwave 💡: let's chuck it at two AIs and see what they do. I had Gemini and I had Copilot to hand, and asked them each to express Algorithm 1 as Python. Gemini did something, and Copilot finally did something too, but I first had to open the page in Microsoft Edge.
There followed hours of me reading and cross-comparing between the algorithm and the AIs. If I did not understand where something came from, I would ask the generating AI; if I found an error, I would first (and second, and...) try to get the AI to make a fix I suggested.

At this stage I was also trying to get a feel for how the AIs could help me (now way past what I thought the algorithm should be, just to see what it would take to get those AIs to cross T's and dot I's on a good solution).
Not a good use of time! I now know that asking questions to update one of the 20 to 30 lines of the Python function might fix that line, but unfix another line you had fixed before. Code from the AI does not have line numbers, making it difficult to state what needs changing, and where. They can suggest type hints and create the beginnings of docstrings but, for example, one pulled out the wrong authors for the name of the algorithm.
In line 1 of the algorithm, the initialisation of thresh is clearly shown, I thought, but both AIs had difficulty getting the Python right. Eventually I cut-n-pasted the text into each AI, where they confidently said "Of course...", made a change, and then I had to re-check for any other changes.

My Code

I first created this function:

import math
import random
from typing import Any, Collection, Iterable

def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    ...
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            X = {x_item for x_item in X
                 if random.random() < 0.5}
            p /= 2
    return len(X) / p

I tested it with Hamlet data and it made OK estimates.

Elated, I took a break.

Hacker News

The next evening I decided to do a search to see if anyone else was talking about the algorithm and found a thread on Hacker News that was right up my street. People were discussing those same problems found in the Quanta article - and getting similar ginormous answers. They had one of the original authors of the paper making comments! And others had created code from the actual paper and said it was also easier than the Quanta description.

The author mentioned that no less than Donald Knuth had taken an interest in their algorithm and had noted that the expression starting `X = ...`, four lines from the end, could, theoretically, make no change to X; the solution was to encase the assignment in a while loop that only exits once len(X) < thresh.

Code update

I decided to add that change:

def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.

    The stream object must support an initial call to __len__

    Parameters:
    stream (Collection[Any]): The input stream as a collection of hashable
        items.
    epsilon (float): The desired relative error in the estimate. It must be in
        the range (0, 1).
    delta (float): The desired probability of the estimate being within the
        relative error. It must be in the range (0, 1).

    Returns:
    float: An estimate of the number of distinct elements in the input stream.
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2
    return len(X) / p


thresh

In the code above, the variable thresh (threshold), named from Algorithm 1, is used in the Quanta article to describe the maximum storage available to keep items from the stream that have been seen before. You must know the length of the stream, m, as well as epsilon and delta, to calculate thresh.

If you were to have just the stream and thresh as the arguments, you could return both the estimate of the number of distinct items in the stream and a count of the total number of elements in the stream. Epsilon could then be calculated from the numbers we now know.

def F0_Estimator2(stream: Iterable[Any],
                  thresh: int,
                  ) -> tuple[float, int]:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.

    The stream object does NOT have to support a call to __len__

    Parameters:
    stream (Iterable[Any]): The input stream as an iterable of hashable
        items.
    thresh (int): The max threshold of stream items used in the estimation.

    Returns:
    tuple[float, int]: An estimate of the number of distinct elements in the
        input stream, and the count of the number of items in stream.
    """
    p = 1
    X = set()
    m = 0  # Count of items in stream

    for item in stream:
        m += 1
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2

    return len(X) / p, m


def F0_epsilon(thresh: int,
               m: int,
               delta: float=0.05,  # 0.05 is 95%
               ) -> float:
    """
    Calculate the relative error in the estimate from F0_Estimator2(...)

    Parameters:
    thresh (int): The thresh value used in the call TO F0_Estimator2.
    m (int): The count of items in the stream FROM F0_Estimator2.
    delta (float): The desired probability of the estimate being within the
        relative error. It must be in the range (0, 1) and is usually 0.05
        to 0.01, (95% to 99% certainty).

    Returns:
    float: The calculated relative error in the estimate
    """
    return math.sqrt(12 / thresh * math.log(8 * m / delta))

Testing

def stream_gen(k: int=30_000, r: int=7_000) -> list[int]:
    "Create a randomised list of k ints of up to r different values."
    return random.choices(range(r), k=k)

def stream_stats(s: list[Any]) -> tuple[int, int]:
    length, distinct = len(s), len(set(s))
    return length, distinct

# %%
print("CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM")

stream_size = 2**18
reps = 5
target_uniques = 1
while target_uniques < stream_size:
    the_stream = stream_gen(stream_size+1, target_uniques)
    target_uniques *= 4
    size, unique = stream_stats(the_stream)

    print(f"\n  Actual:\n    {size = :_}, {unique = :_}\n  Estimations:")

    delta = 0.05
    threshhold = 2
    print(f"    All runs using {delta = :.2f} and with estimate averaged from {reps} runs:")
    while threshhold < size:
        estimate, esize = F0_Estimator2(the_stream.copy(), threshhold)
        estimate = sum([estimate] +
                       [F0_Estimator2(the_stream.copy(), threshhold)[0]
                        for _ in range(reps - 1)]) / reps
        estimate = int(estimate + 0.5)
        epsilon = F0_epsilon(threshhold, esize, delta)
        print(f"      With {threshhold = :7_} -> "
              f"{estimate = :_}, +/-{epsilon*100:.0f}%"
              + (f" {esize = :_}" if esize != size else ""))
        threshhold *= 8

The algorithm generates an estimate based on random sampling, so I run it multiple times for the same input and report the mean estimate from those runs.

Sample output

 

CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM
  Actual:
    size = 262_145, unique = 1
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 1, +/-1026%
      With threshhold =      16 -> estimate = 1, +/-363%
      With threshhold =     128 -> estimate = 1, +/-128%
      With threshhold =   1_024 -> estimate = 1, +/-45%
      With threshhold =   8_192 -> estimate = 1, +/-16%
      With threshhold =  65_536 -> estimate = 1, +/-6%

  Actual:
    ...

  Actual:
    size = 262_145, unique = 1_024
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 16_384, +/-1026%
      With threshhold =      16 -> estimate = 768, +/-363%
      With threshhold =     128 -> estimate = 1_101, +/-128%
      With threshhold =   1_024 -> estimate = 1_018, +/-45%
      With threshhold =   8_192 -> estimate = 1_024, +/-16%
      With threshhold =  65_536 -> estimate = 1_024, +/-6%

  Actual:
    size = 262_145, unique = 4_096
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 13_107, +/-1026%
      With threshhold =      16 -> estimate = 3_686, +/-363%
      With threshhold =     128 -> estimate = 3_814, +/-128%
      With threshhold =   1_024 -> estimate = 4_083, +/-45%
      With threshhold =   8_192 -> estimate = 4_096, +/-16%
      With threshhold =  65_536 -> estimate = 4_096, +/-6%

  Actual:
    size = 262_145, unique = 16_384
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 0, +/-1026%
      With threshhold =      16 -> estimate = 15_155, +/-363%
      With threshhold =     128 -> estimate = 16_179, +/-128%
      With threshhold =   1_024 -> estimate = 16_986, +/-45%
      With threshhold =   8_192 -> estimate = 16_211, +/-16%
      With threshhold =  65_536 -> estimate = 16_384, +/-6%

  Actual:
    size = 262_145, unique = 64_347
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 26_214, +/-1026%
      With threshhold =      16 -> estimate = 73_728, +/-363%
      With threshhold =     128 -> estimate = 61_030, +/-128%
      With threshhold =   1_024 -> estimate = 64_422, +/-45%
      With threshhold =   8_192 -> estimate = 64_760, +/-16%
      With threshhold =  65_536 -> estimate = 64_347, +/-6%

 Looks good!

Wikipedia

Another day, and I decided to start writing this blog post. I searched again and found the Wikipedia article on what it calls the Count-distinct problem.

Looking through it, I found it had this wrong description of the CVM algorithm:

The (or at least a) problem with the Wikipedia entry is that it shows

p ← p/2

within the while loop. You need an enclosing if |B| >= s around the while loop, with the assignment to p outside the while loop but inside this new if statement.
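Rendered as Python in the style of the functions above, my reading of that fix, using the Wikipedia entry's B and s names, would be:

if len(B) >= s:                # new enclosing if statement
    while len(B) >= s:         # repeat until the pruning actually shrinks B
        B = {b for b in B if random.random() < 0.5}
    p /= 2                     # halve p once, outside the while loop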

It's tough!

Both Quanta Magazine and whoever added the algorithm to Wikipedia got the algorithm wrong.

I've written around two hundred tasks on the site Rosettacode.org over more than a decade. Others had to read my description and create code in their chosen language to implement those tasks. I have learnt from the feedback I got on talk pages to hone that craft, but details matter. Examples matter. Constructive feedback matters.

END.

 


Real Python: Efficient Iterations With Python Iterators and Iterables

Tue, 2024-05-28 10:00

Python’s iterators and iterables are two different but related tools that come in handy when you need to iterate over a data stream or container. Iterators power and control the iteration process, while iterables typically hold data that you want to iterate over one value at a time.

Iterators and iterables are fundamental components of Python programming, and you’ll have to deal with them in almost all your programs. Learning how they work and how to create them is key for you as a Python developer.

In this video course, you’ll learn how to:

  • Create iterators using the iterator protocol in Python
  • Understand the differences between iterators and iterables
  • Work with iterators and iterables in your Python code
  • Use generator functions and the yield statement to create generator iterators
  • Build your own iterables using different techniques, such as the iterable protocol
  • Use the asyncio module and the await and async keywords to create asynchronous iterators
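As a small taste of the first bullet, a minimal iterator implementing the protocol might look like this (a sketch, not material from the course itself):

class CountDown:
    """An iterator: __iter__ returns self, __next__ produces the values."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

print(list(CountDown(3)))  # [3, 2, 1]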



Python Software Foundation: Thinking about running for the Python Software Foundation Board of Directors? Let’s talk!

Tue, 2024-05-28 06:27

PSF Board elections are a chance for the community to choose representatives to help the PSF create a vision for and build the future of the Python community. This year there are 3 seats open on the PSF board. Check out who is currently on the PSF Board. (Débora Azevedo, Kwon-Han Bae, and Tania Allard are at the end of their current terms.)

Office Hours Details

This year, the PSF Board is running Office Hours so you can connect with current members to ask questions and learn more about what being a part of the Board entails. There will be two Office Hour sessions:

  • June 11th, 4 PM UTC
  • June 18th, 12 PM UTC

Make sure to check what time that is for you. We welcome you to join the PSF Discord and navigate to the #psf-elections channel to participate in Office Hours. The server is moderated by PSF Staff and locked between office hours sessions. If you’re new to Discord, check out some Discord Basics to help you get started.

Who runs for the Board?

People who care about the Python community, who want to see it flourish and grow, and also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.

Nomination info

You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, so you have a few weeks to research the role and craft a nomination statement. The nomination period ends on June 25th, 2:00 PM UTC.


Robin Wilson: How to install the Python triangle package on an Apple Silicon Mac

Tue, 2024-05-28 05:53

I was recently trying to set up RasterVision on my Apple Silicon Mac (specifically a M1 MacBook Pro, but I’m pretty sure this applies to any Apple Silicon Mac). It all went fine until it came time to install the triangle package, when I got an error. The error output is fairly long, but the key part is the end part here:

triangle/core.c:196:12: fatal error: 'longintrepr.h' file not found
#include "longintrepr.h"
         ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

It took me quite a bit of searching to find the answer (Google just isn’t very good at giving relevant results these days), but actually it turns out to be very simple. The latest version of triangle on PyPI doesn’t work on Apple Silicon, but the code in the Github repository does work, so you can install directly from Github with this command:

pip install git+https://github.com/drufat/triangle.git

and it should all work fine.
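If you want to double-check which build you ended up with before moving on, a quick sanity check (my suggestion, not from the original post) is:

pip show triangle
python -c "import triangle"   # no ImportError means the compiled extension loads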

Once you’ve done this, install rastervision again and it should recognise that the triangle package is already installed and not try to install it again.


Real Python: How to Create Pivot Tables With pandas

Mon, 2024-05-27 10:00

A pivot table is a data analysis tool that allows you to take columns of raw data from a pandas DataFrame, summarize them, and then analyze the summary data to reveal its insights.

Pivot tables allow you to perform common aggregate statistical calculations such as sums, counts, averages, and so on. Often, the information a pivot table produces reveals trends and other observations your original raw data hides.

Pivot tables were originally implemented in early spreadsheet packages and are still a commonly used feature of the latest ones. They can also be found in modern database applications and in programming languages. In this tutorial, you’ll learn how to implement a pivot table in Python using pandas’ DataFrame.pivot_table() method.

Before you start, you should familiarize yourself with what a pandas DataFrame looks like and how you can create one. Knowing the difference between a DataFrame and a pandas Series will also prove useful.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.

The other thing you’ll need for this tutorial is, of course, data. You’ll use the Sales Data Presentation - Dashboards data, which is freely available for you to use under the Apache 2.0 License. The data has been made available for you in the sales_data.csv file that you can download by clicking the link below.

Get Your Code: Click here to download the free sample code you’ll use to create a pivot table with pandas.

This table provides an explanation of the data you’ll use throughout this tutorial:

Column Name       Data Type (PyArrow)  Description
order_number      int64                Order number (unique)
employee_id       int64                Employee’s identifier (unique)
employee_name     string               Employee’s full name
job_title         string               Employee’s job title
sales_region      string               Sales region employee works within
order_date        timestamp[ns]        Date order was placed
order_type        string               Type of order (Retail or Wholesale)
customer_type     string               Type of customer (Business or Individual)
customer_name     string               Customer’s full name
customer_state    string               Customer’s state of residence
product_category  string               Category of product (Bath Products, Gift Basket, Olive Oil)
product_number    string               Product identifier (unique)
product_name      string               Name of product
quantity          int64                Quantity ordered
unit_price        double               Selling price of one product
sale_price        double               Total sale price (unit_price × quantity)

As you can see, the table stores data for a fictional set of orders. Each row contains information about a single order. You’ll become more familiar with the data as you work through the tutorial and try to solve the various challenge exercises contained within it.

Throughout this tutorial, you’ll use the pandas library to allow you to work with DataFrames and the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types pandas uses by default.

If you’re working at the command line, you can install both pandas and pyarrow using python -m pip install pandas pyarrow, perhaps within a virtual environment to avoid clashing with your existing environment. If you’re working within a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. With the libraries in place, you can then read your data into a DataFrame:

>>> import pandas as pd
>>> sales_data = pd.read_csv(
...     "sales_data.csv",
...     parse_dates=["order_date"],
...     dayfirst=True,
... ).convert_dtypes(dtype_backend="pyarrow")

First of all, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the sales_data variable, you used pandas’ read_csv() function. The first parameter refers to the file being read, while parse_dates highlights that the order_date column’s data is intended to be read as the datetime64[ns] type. But there’s an issue that will prevent this from happening.

In your source file, the order dates are in dd/mm/yyyy format, so to tell read_csv() that the first part of each date represents a day, you also set the dayfirst parameter to True. This allows read_csv() to now read the order dates as datetime64[ns] types.

With order dates successfully read as datetime64[ns] types, the .convert_dtypes() method can then successfully convert them to a timestamp[ns][pyarrow] data type, and not the more general string[pyarrow] type it would have otherwise done. Although this may seem a bit circuitous, your efforts will allow you to analyze data by date should you need to do this.

If you want to take a look at the data, you can run sales_data.head(2). This will let you see the first two rows of your dataframe. When using .head(), it’s preferable to do so in a Jupyter Notebook because all of the columns are shown. Many Python REPLs show only the first and last few columns unless you use pd.set_option("display.max_columns", None) before you run .head().

If you want to verify that PyArrow types are being used, sales_data.dtypes will confirm it for you. As you’ll see, each data type contains [pyarrow] in its name.
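Put together, those inspection steps look like this (a sketch, with output omitted since it depends on your data and terminal):

>>> pd.set_option("display.max_columns", None)  # make a plain REPL show every column
>>> sales_data.head(2)   # first two rows of the DataFrame
>>> sales_data.dtypes    # each entry's name contains [pyarrow]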

Note: If you’re experienced in data analysis, you’re no doubt aware of the need for data cleansing. This is still important as you work with pivot tables, but it’s equally important to make sure your input data is also tidy.

Tidy data is organized as follows:

  • Each row should contain a single record or observation.
  • Each column should contain a single observable or variable.
  • Each cell should contain an atomic value.

If you tidy your data in this way, as part of your data cleansing, you’ll also be able to analyze it better. For example, rather than store address details in a single address field, it’s usually better to split it down into house_number, street_name, city, and country component fields. This allows you to analyze it by individual streets, cities, or countries more easily.

In addition, you’ll also be able to use the data from individual columns more readily in calculations. For example, if you had columns room_length and room_width, they can be multiplied together to give you room area information. If both values are stored together in a single column in a format such as "10 x 5", the calculation becomes more awkward.
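With the tidy layout, that calculation becomes a one-liner. A sketch, assuming a hypothetical rooms DataFrame with those two columns:

>>> rooms["room_area"] = rooms["room_length"] * rooms["room_width"]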

The data within the sales_data.csv file is already in a suitably clean and tidy format for you to use in this tutorial. However, not all raw data you acquire will be.

It’s now time to create your first pandas pivot table with Python. To do this, first you’ll learn the basics of using the DataFrame’s .pivot_table() method.


Take the Quiz: Test your knowledge with our interactive “How to Create Pivot Tables With pandas” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

How to Create Pivot Tables With pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won't find all the answers by reading the tutorial, so you'll need to do some investigating on your own. By finding all the answers, you're sure to learn some other interesting things along the way.

How to Create Your First Pivot Table With pandas

Now that your learning journey is underway, it’s time to progress toward your first learning milestone and complete the following task:

Calculate the total sales for each type of order for each region.
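If you want to attempt the task before reading on, one plausible shape for the call uses the columns from the table above (a sketch only; the tutorial develops its own solution step by step):

>>> sales_data.pivot_table(
...     values="sale_price",
...     index="sales_region",
...     columns="order_type",
...     aggfunc="sum",
... )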

Read the full article at https://realpython.com/how-to-pandas-pivot-table/ »



Python Bytes: #385 RESTing on Postgres

Mon, 2024-05-27 04:00
Topics covered in this episode:

  • PostgresREST
  • How Python Asyncio Works: Recreating it from Scratch
  • Bend
  • The Smartest Way to Learn Python Regular Expressions
  • Extras
  • Joke

Watch on YouTube: https://www.youtube.com/watch?v=f-tuQBIn1fQ

About the show

Sponsored by Mailtrap: pythonbytes.fm/mailtrap

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org
  • Brian: @brianokken@fosstodon.org
  • Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list at pythonbytes.fm/friends-of-the-show - we'll never share it.

Michael #1: PostgresREST - https://github.com/PostgREST/postgrest

  • PostgREST serves a fully RESTful API from any existing PostgreSQL database. It provides a cleaner, more standards-compliant, faster API than you are likely to write from scratch.
  • Speedy:
    • First, the server is written in Haskell using the Warp HTTP server (aka a compiled language with lightweight threads).
    • Next, it delegates as much calculation as possible to the database.
    • Finally, it uses the database efficiently with the Hasql library.
  • PostgREST handles authentication (via JSON Web Tokens) and delegates authorization to the role information defined in the database. This ensures there is a single declarative source of truth for security.

Brian #2: How Python Asyncio Works: Recreating it from Scratch - https://jacobpadilla.com/articles/recreating-asyncio

  • Jacob Padilla
  • Cool tutorial walking through how async works, including:
    • Generators review
    • The event loop
    • Sleeping
    • Yield to await
    • Await with AsyncIO
  • Another great async resource: Build your Own Async (https://www.youtube.com/watch?v=Y4Gt3Xjd7G8), a David Beazley talk from 2019

Michael #3: Bend - https://higherorderco.com

  • A massively parallel, high-level programming language.
  • With Bend you can write parallel code for multi-core CPUs/GPUs without being a C/CUDA expert with 10 years of experience.
  • It feels just like Python!
  • No need to deal with the complexity of concurrent programming: locks, mutexes, atomics... any work that can be done in parallel will be done in parallel.

Brian #4: The Smartest Way to Learn Python Regular Expressions - https://leanpub.com/regexpython/

  • Christian Mayer, Zohaib Riaz, and Lukas Rieger
  • Self-published ebook on Python regex that utilizes:
    • book-form readings with links to video course sections
    • puzzle challenges to complete online
  • It's a paid resource, but the minimum price is free.

Extras

Brian:

  • Replay (https://www.jordanmechner.com/en/books/replay) - a graphic memoir by Prince of Persia creator Jordan Mechner, recounting his own family story of war, exile and new beginnings.

Michael:

  • PyCon 2026 - https://en.wikipedia.org/wiki/Python_Conference

Joke: Shells Scripts

Zato Blog: Web scraping as an API service

Mon, 2024-05-27 04:00
Web scraping as an API service

2024-05-27, by Dariusz Suchojad

Overview

In systems-to-systems integrations, there comes an inevitable time when we have to employ some kind of a web scraping tool to integrate with a particular application. Despite its not being our first choice, it is good to know what to use at such a time - in this article, I provide a gentle introduction to my favorite tool of this kind, called Playwright, followed by sample Python code that integrates it with an API service.

Naturally, in the context of backend integrations, web scraping should be avoided and, generally, it should be considered the last resort. The basic issue here is that while the UI term contains the "interface" part, it is not really the "Application Programming" Interface that we would like to have.

It is not that the UI cannot be programmed against. After all, a web browser does just that: it takes a web page and renders it as expected. The same goes for desktop or mobile applications. Also, anyone integrating with mainframe computers will recognize that this is basically what 3270 can be used for too.

Rather, the fundamental issue is that web scraping goes against the principles of separation of layers and roles across frontend, middleware and backend, which in turn means that authors of resources (e.g. HTML pages) do not really expect many people to access them in automated ways.

Perhaps they actually should expect it, and web pages should finally start to resemble genuine knowledge graphs, easy to access by humans, be it manually or through automation tools, but the reality today is that this is not the case and, in comparison with backend systems, the whole of the web scraping space is relatively brittle, which is why we shun this approach in integrations.

Yet another part of reality, particularly in enterprise integrations, is that people may sometimes be given access to a frontend application on an internal network and that is it. No API, no REST, no JSON, no POST data, no real data formats, and one is simply supposed to fill out forms as part of a business process.

Typically, such a situation will result in an integration gap. There will be fully automated parts in the business process preceding this gap, with multiple systems coordinated towards a specific goal, and there will be subsequent steps in the process, also fully automated.

Or you may be given access only to a specific frontend, and only through VPN via a single remote Windows desktop. Getting access to a REST API may take months or may never be realized because of some high-level licensing issues. This is not uncommon in real life.

Such a gap can be a jarring and sore point, truly ruining the whole, otherwise fluid, integration process. This creates a tension, and to resolve that tension we can, should all the attempts to find a real API fail, finally resort to web scraping.

It is mostly in this context that I am looking at Playwright below - the tool is good, it has many other uses that go beyond the scope of this text, and it is well worth knowing, for instance for frontend testing of your backend systems. But when we deal with API integrations, we should not overdo it with web scraping.

Needless to say, if web scraping is what you do primarily, your perspective will be somewhat different - you will not need any explanation of why it is needed or when, and you may only be looking for a way to wrap your web scraping code in API services. This article will explain that too.

Introducing Playwright

The nice part of Playwright is that we can use it to visually prepare a draft of Python code that will scrape a given resource. That is, instead of programming it in Python, we go to an address, fill out a form, click buttons and otherwise use everything as usual, and Playwright generates for us code that will later be used in integrations.

That code will require a bit of clean-up work, which I will talk about below, but overall it works very nicely and is certainly useful. The result is not one of these do-not-touch auto-generated pieces of code that are better left to their own.

While there are better ways to integrate with Jira, I chose that application as an example of Playwright's usage simply because I cannot show you any internal application in a public blog post.

Below, there are two windows. One is Playwright's emulating a Blackberry device to open a resource. I was clicking around, I provided an email address and then I clicked the same email field once more. To the right, based on my actions, we can find the generated Python code, which I consider quite good and readable.

The Playwright Inspector, the tool that gave us the code, will keep recording all of our actions until we click the "Record" button, which then allows us to click the button next to "Record", namely "Copy code to clipboard". We can then save the code to a separate file and run it on demand, automatically.

But first, we will need to install Playwright.

Installing and starting Playwright

The tool is written in TypeScript and can be installed using npx, which in turn is part of NodeJS.

Afterwards, the "playwright install" call is needed as well because that will potentially install runtime dependencies, such as Chrome libraries.

Finally, we install Playwright using pip as well, because we want to access it with Python. Note that if you are installing Playwright under Zato, the "/path/to/pip" will typically be "/opt/zato/code/bin/pip".

npx -g --yes playwright install
playwright install
/path/to/pip install playwright

We can now start it as below. I am using BlackBerry as an example of what Playwright is capable of. Also, it is usually more convenient to use a mobile version of a site when the main window and Inspector are opened side by side, but you may prefer to use Chrome, Firefox or anything else.

playwright codegen https://example.atlassian.net/jira --device "BlackBerry Z30"

That is practically everything as far as using Playwright to generate code in our context goes. Open the tool, fill out forms, copy code to a Python module, done.

What is still needed, though, is cleaning up the resulting code and embedding it in an API integration process.

Code clean-up

After you keep using Playwright for a while with longer forms and pages, you will note that the generated code tends to accumulate parts that repeat.

For instance, in the module below, which I already cleaned up, the same "[placeholder=\"Enter email\"]" reference to the email field is used twice, even if a programmer developing this code would prefer to introduce a variable for that.

There is not a good answer to the question of what to do about it. On the one hand, obviously, being programmers we would prefer not to repeat that kind of detail. On the other hand, if we clean up the code too much, this may result in too much of a maintenance burden, because we need to keep in mind that we do not really want to invest too much in web scraping and, should there be a need to repeat the whole process, we do not want to end up with Playwright's code auto-generated from scratch once more, without any of our clean-up.

A good compromise position is to at least extract any kind of credentials from the code to environment variables or a similar place, and to remove some of the code comments that Playwright generates. The result below is what it should look like at the end: not too much effort, without leaving the whole code exactly as it was originally either.

Save the code below as "play1.py" as this is what the API service below will use.

# -*- coding: utf-8 -*-

# stdlib
import os

# Playwright
from playwright.sync_api import Playwright, sync_playwright

class Config:
    Email = os.environ.get('APP_EMAIL', 'zato@example.com')
    Password = os.environ.get('APP_PASSWORD', '')
    Headless = bool(os.environ.get('APP_HEADLESS', False))

def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=Config.Headless) # type: ignore
    context = browser.new_context()

    # Open new page
    page = context.new_page()

    # Open project boards
    page.goto("https://example.atlassian.net/jira/software/projects/ABC/boards/1")
    page.goto("https://id.atlassian.com/login?continue=https%3A%2F%2Fexample.atlassian.net%2Flogin%3FredirectCount%3D1%26dest-url%3D%252Fjira%252Fsoftware%252Fprojects%252FABC%252Fboards%252F1%26application%3Djira&application=jira")

    # Fill out the email
    page.locator("[placeholder=\"Enter email\"]").click()
    page.locator("[placeholder=\"Enter email\"]").fill(Config.Email)

    # Click #login-submit
    page.locator("#login-submit").click()

with sync_playwright() as playwright:
    run(playwright)

Web scraping as a standalone activity

We have the generated code, so the first thing to do with it is to run it from the command line. This will result in a new Chrome window's accessing Jira - it is Chrome, not Blackberry, because that is the default for Playwright.

The window will close soon enough but this is fine, that code only demonstrates a principle, it is not a full integration task.

python /path/to/play1.py

It is also useful that we can run the same Python module from our IDE, giving us the ability to step through the code line by line, observing what changes when and why.

Web scraping as an API service

Finally, we are ready to invoke the standalone module from an API service, as in the following code that we are also going to make available as a REST channel.

A couple of notes about the Python service below:

  • We invoke Playwright in a subprocess, as a shell command
  • We accept input through data models although we do not provide any output definition because it is not needed here
  • When we invoke Playwright, we set APP_HEADLESS to True, which will ensure that it does not attempt to actually display a Chrome window. After all, we intend for this service to run on Linux servers, in the backend, and such a thing would be unlikely to work in that kind of environment.

Other than that, this is a straightforward Zato service - it receives input, carries out its work and a reply is returned to the caller (here, empty).

# -*- coding: utf-8 -*-

# stdlib
from dataclasses import dataclass

# Zato
from zato.server.service import Model, Service

# ###########################################################################

@dataclass(init=False)
class WebScrapingDemoRequest(Model):
    email: str
    password: str

# ###########################################################################

class WebScrapingDemo(Service):
    name = 'demo.web-scraping'

    class SimpleIO:
        input = WebScrapingDemoRequest

    def handle(self):

        # Path to a Python installation that Playwright was installed under
        py_path = '/path/to/python'

        # Path to a Playwright module with code to invoke
        playwright_path = '/path/to/the-playwright-module.py'

        # This is a template script that we will invoke in a subprocess
        command_template = """
        APP_EMAIL={app_email} APP_PASSWORD={app_password} APP_HEADLESS=True {py_path} {playwright_path}
        """

        # This is our input data
        input = self.request.input # type: WebScrapingDemoRequest

        # Extract credentials from the input ..
        email = input.email
        password = input.password

        # .. build the full command, taking all the config into account ..
        command = command_template.format(
            app_email = email,
            app_password = password,
            py_path = py_path,
            playwright_path = playwright_path,
        )

        # .. invoke the command in a subprocess ..
        result = self.commands.invoke(command)

        # .. if it was not a success, log the details received ..
        if not result.is_ok:
            self.logger.info('Exit code -> %s', result.exit_code)
            self.logger.info('Stderr -> %s', result.stderr)
            self.logger.info('Stdout -> %s', result.stdout)

# ###########################################################################

Now, the REST channel:

The last thing to do is to invoke the service - I am using curl from the command line below but it could very well be Postman or a similar option.

curl localhost:17010/demo/web-scraping -d '{"email":"hello@example.com", "password":"abc"}' ; echo

There will be no Chrome window this time around because we run Playwright in headless mode. There will be no output from curl either, because we do not return anything from the service, but in the server logs we will find details such as those below.

We can learn from the log that the command took close to 4 seconds to complete, that the exit code was 0 (indicating success), and that there is no stdout or stderr at all.

INFO - Command ` APP_EMAIL=hello@example.com APP_PASSWORD=abc APP_HEADLESS=True /path/to/python /path/to/the-playwright-module.py ` completed in 0:00:03.844157, exit_code -> 0; len-out=0 (0 Bytes); len-err=0 (0 Bytes); cid -> zcmdc5422816b2c6ff9f10742134

We are now ready to continue to work on it - for instance, you will notice that the password is visible in logs and this should not be allowed.
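One possible direction, sketched here rather than taken from the Zato documentation, is to redact known secrets from any command string before it is handed to a logger:

# Sketch: mask secrets in a command string prior to logging it
def redact(command: str, *secrets: str) -> str:
    for secret in secrets:
        if secret:
            command = command.replace(secret, '******')
    return command

# e.g. self.logger.info('Running -> %s', redact(command, password))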

But all such work is extra in comparison with the main theme - we have Playwright, which is a tool that allows us to quickly integrate with frontend applications, and we can automate it through API services. Just as expected.


Quansight Labs Blog: Dataframe interoperability - what has been achieved, and what comes next?

Sun, 2024-05-26 20:00
An overview of the dataframe landscape, and a solution to the "we only support pandas" problem.

Talk Python to Me: #463: Running on Rust: Granian Web Server

Sat, 2024-05-25 04:00
So you've created a web app with Python using Flask, Django, FastAPI, or even Emmett. It works great on your machine. How do you get it out to the world? You'll need a production-ready web server. On this episode, we have Giovanni Barillari to tell us about his relatively new server named Granian. It promises better performance and much better consistency than many of the more well-known ones today.

Episode sponsors

  • Neo4j: https://talkpython.fm/neo4j-graphstuff
  • Talk Python Courses: https://talkpython.fm/training

Links from the show

  • New spaCy course: https://training.talkpython.fm/courses/getting-started-with-spacy
  • Giovanni: @gi0baro - https://twitter.com/gi0baro
  • Granian: https://github.com/emmett-framework/granian
  • Emmett: https://emmett.sh
  • Renoir: https://github.com/emmett-framework/renoir
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=KwqO7KVEpxs
  • Episode transcripts: https://talkpython.fm/episodes/transcript/463/running-on-rust-granian-web-server

Stay in touch with us

  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: @talkpython
  • Follow Michael on Mastodon: @mkennedy

Real Python: Quiz: How to Create Pivot Tables With pandas

Fri, 2024-05-24 08:00

In this quiz, you’ll test your understanding of how to create pivot tables with pandas.

By working through this quiz, you’ll review your knowledge of pivot tables and also expand beyond what you learned in the tutorial. For some of the questions, you’ll need to do some research outside of the tutorial itself.



Luke Plant: pyastgrep and custom linting

Thu, 2024-05-23 15:07

A while back I released pyastgrep, which is a rewrite of astpath. It’s a tool that allows you to search for specific Python syntax elements using XPath as a query language.

As part of the rewrite, I separated out the layers of code so that it can now be used as a library as well as a command line tool. I haven’t committed to very much API surface area for library usage, but there is enough.

My main personal use of this has been for linting tasks or enforcing of conventions that might be difficult to do otherwise. I don’t always use this – quite often I’d reach for custom Semgrep rules, and at other times I use introspection to enforce conventions. However, there are times when both of these fail or are rather difficult.

Examples

Some examples of the kinds of rules I’m thinking of include:

  • Boolean arguments to functions/methods should always be “keyword only”.

    Keyword-only arguments are a big win in many cases, and especially when it comes to boolean values. For example, forcing delete_thing(True, False) to be something like delete_thing(permanent=True, force=False) is an easy win, and this is common enough that applying this as a default policy across the code base will probably be a good idea.

    The pattern can be distinguished easily at syntax level. Good:

    def foo(*, my_bool_arg: bool): ...

    Bad:

    def foo(my_bool_arg: bool): ...
  • Simple coding conventions like “Don’t use single letter variables like i or j as loop variables, use index or idx instead”.

    This can be found by looking for code like:

    for i, val in enumerate(...): ...

    You might not care about this, but if you do, you really want the rule to be applied as an automated test, not a nit-picky code review.

  • A Django-specific one: for inclusion tags, the tag names should match the template file name. This is nice for consistency and code navigation, plus I actually have some custom “jump to definition” code in my editor that relies on it for fast navigation.

    The pattern can again be seen quite easily at the syntax level. Good:

    @inclusion_tag("something/foo.html") def foo(): ...

    Bad:

    @inclusion_tag("something/bar.html") def foo(): ...
  • Any ’task’ (something decorated with @task) should have a name ending in _task, e.g. foo_task, in order to give a clue that it works as an asynchronous call, and that its return value is just a promise object. (A sketch of this one as a pyastgrep query follows this list.)
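As a sketch of how that last rule might translate into a pyastgrep query (my own guess, not from the original post, and assuming the decorator is applied as a bare @task name; a dotted decorator like @app.task would show up as an Attribute node instead of a Name):

$ pyastgrep './/FunctionDef[decorator_list/Name[@id="task"]][substring(@name, string-length(@name) - 4) != "_task"]'

The substring(...) comparison is the XPath 1.0 idiom for “ends with”: it takes the last five characters of @name and compares them to "_task".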

There are many more examples you’ll come up with once you start thinking like this.

Method

Having identified the bad patterns we want to find and fix, my method for doing so looks as follows. It contains a number of tips and refinements I’ve made over the past few years.

First, I open a test file, e.g. tests/test_conventions.py, and start by inserting some example code – at least one bad example (the kind we are trying to fix), and one good example.

There are a few reasons for this:

  • First, I need to make sure I can prove life exists on earth, as John D. Cook puts it. I’ll say more about this later on.

  • Second, it gives me a deliberately simplified bit of code that I can pass to pyastdump.

  • Third, it provides some explanation for the test you are going to write, and a potentially rather hairy XPath expression.

I’ll use my first example above, keyword-only boolean args. I start by inserting the following text into my test file:

def bad_boolean_arg(foo: bool):
    pass


def good_boolean_arg(*, foo: bool):
    pass

Then, I copy both of these in turn to the clipboard (or both together if there isn’t much code, like in this case), and pass them through pyastdump. From a terminal, I do:

$ xsel | pyastdump -

I’m using the xsel Linux utility; you can also use xclip -out, pbpaste on macOS, or Get-Clipboard in PowerShell.

This gives me some AST to look at, structured as XML:

<Module>
  <body>
    <FunctionDef lineno="1" col_offset="0" type="str" name="bad_boolean_arg">
      <args>
        <arguments>
          <posonlyargs/>
          <args>
            <arg lineno="1" col_offset="20" type="str" arg="foo">
              <annotation>
                <Name lineno="1" col_offset="25" type="str" id="bool">
                  <ctx>
                    <Load/>
                  </ctx>
                </Name>
              </annotation>
            </arg>
          </args>
          <kwonlyargs/>
          <kw_defaults/>
          <defaults/>
        </arguments>
      </args>
      <body>
        <Pass lineno="2" col_offset="4"/>
      </body>
      <decorator_list/>
    </FunctionDef>
  </body>
  <type_ignores/>
</Module>

In this case, the current structure of Python’s AST has helped us out a lot – it has separated out posonlyargs (positional only arguments), args (positional or keyword), and kwonlyargs (keyword only args). We can see the offending annotation containing a Name with id="bool" inside the args, when we want it only to be allowed as a keyword-only argument.

(Do we want to disallow boolean-annotated arguments as positional only? I’m leaning towards “no” here, as positional only is quite rare and usually a very deliberate choice).

I now have to construct an XPath expression that will find the offending XML nodes, but not match good examples. It’s pretty straightforward in this case, once you know the basics of XPath. I test it out straight away at the CLI:

pyastgrep './/FunctionDef/args/arguments/args/arg/annotation/Name[@id="bool"]' tests/test_conventions.py

If I’ve done it correctly, it should print my bad example, and not my good example.

Then I widen the net, omitting tests/test_conventions.py to search everywhere in my current directory.
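That is just the same query with the path argument dropped, since pyastgrep defaults to searching the current directory:

$ pyastgrep './/FunctionDef/args/arguments/args/arg/annotation/Name[@id="bool"]'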

At this point, I’ve probably got some real results that I want to address, but I might also notice there are other variants of the same thing I need to be able to match, and so I iterate, adding more bad/good examples as necessary.

Now I need to write a test. It’s going to look like this:

def test_boolean_arguments_are_keyword_only():
    assert_expected_pyastgrep_matches(
        """
        .//FunctionDef/args/arguments/args/arg/annotation/Name[@id="bool"]
        """,
        message="Function arguments with type `bool` should be keyword-only",
        expected_count=1,
    )

Of course, the real work is being done inside my assert_expected_pyastgrep_matches utility, which looks like this:

from pathlib import Path

from boltons import iterutils
from pyastgrep.api import Match, search_python_files

SRC_ROOT = Path(__file__).parent.parent.resolve()  # depends on project structure


def assert_expected_pyastgrep_matches(xpath_expr: str, *, expected_count: int, message: str):
    """
    Asserts that the pyastgrep XPath expression matches only `expected_count`
    times, each of which must be marked with a `pyastgrep: expected` comment.

    `message` is a message to be printed on failure.
    """
    xpath_expr = xpath_expr.strip()
    matches: list[Match] = [
        item for item in search_python_files([SRC_ROOT], xpath_expr) if isinstance(item, Match)
    ]
    expected_matches, other_matches = iterutils.partition(
        matches, key=lambda match: "pyastgrep: expected" in match.matching_line
    )
    if len(expected_matches) < expected_count:
        assert False, f"Expected {expected_count} matches but found {len(expected_matches)} for {xpath_expr}"
    assert not other_matches, (
        message
        + "\n Failing examples:\n"
        + "\n".join(
            f" {match.path}:{match.position.lineno}:{match.position.col_offset}:{match.matching_line}"
            for match in other_matches
        )
    )

There is a bit of explaining to do now.

Being sure that you can “find life on earth” is especially important for a negative test like this. It would be very easy to have an XPath query that you thought worked but didn’t, as it might just silently return zero results. In addition, Python’s AST is not stable – so a query that works now might stop working in the future.

It’s like you have a machine that claims to be able to find needles in haystacks – when it comes back and says “no needles found”, do you believe it? To increase your confidence that everything works and continues to work, you place a few needles at locations that you know, then check that the machine is able to find those needles. When it claims “found exactly 2 needles”, and you can account for those, you’ve got much more confidence that it has indeed found the only needles.

So, it’s important to leave my bad examples in there.

But, I obviously don’t want the bad examples to cause the test to fail! In addition, I want a mechanism for exceptions. A simple mechanism I’ve chosen is to add the text pyastgrep: expected as a comment.

So, I need to change my bad example like this:

def bad_boolean_arg(foo: bool):  # pyastgrep: expected
    pass

I also pass expected_count=1 to indicate that I expect to find at least one bad example (or more, if I’ve added more bad examples).

Hopefully that explains everything assert_expected_pyastgrep_matches does. A couple more notes:

  • it uses boltons, a pretty useful set of Python utilities

  • it requires an SRC_ROOT folder to be defined, which will depend on your project and might differ depending on which folder(s) you want to apply the convention to.

Now, everything is set up, and I run the test for real, hopefully locating all the bad usages. I work through them and fix, then leave the test in.
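With pytest as the runner (an assumption on my part; any runner that collects tests/test_conventions.py will do), that is just:

$ pytest tests/test_conventions.py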

Tips
  • pyastgrep works strictly at the syntax level, so unlike Semgrep you might get caught out by aliases if you try to match on specific names:

    from foo import bar
    from foo import bar as foo_bar
    import foo

    # These all call the same function but look different in the AST:
    foo.bar()
    bar()
    foo_bar()
  • There is however, an advantage to this – you don’t need a real import to construct your bad examples, you can just use a Mock. e.g. for my inclusion_tag example above, I have code like:

    from unittest.mock import Mock

    register = Mock()


    @register.inclusion_tag(filename="something/not_bad_tag.html")
    def bad_tag():  # pyastgrep: expected
        pass

    You can see the full code on GitHub.

  • You might be able to use a mixture of techniques:

    • A Semgrep rule bans one set of bad patterns, such as direct use of some thirdparty.func, requiring everyone to use your own wrapper, which is in turn constructed in such a way as to make it easier to apply a pyastgrep rule

    • Some introspection that produces a list of classes or functions to which some rule applies, then dynamically generates XPath expression to pass to pyastgrep.

Conclusion

Syntax level searching isn’t right for every job, but it can be a powerful addition to your toolkit, and with a decent query language like XPath, you can do a surprising amount. Have a look at the pyastgrep examples for inspiration!

Categories: FLOSS Project Planets

Mike Driscoll: Episode 41 – Python Packaging and FOSS with Armin Ronacher

Thu, 2024-05-23 12:13

In this episode, I chatted with Armin Ronacher about his many amazing Python packages, such as Pygments, Flask, Jinja, Rye, and Click!

Specifically, we talked about the following:

  • How Flask came about
  • Favorite Python packages
  • Python packaging
  • and much more!

The post Episode 41 – Python Packaging and FOSS with Armin Ronacher appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Python Anywhere: New help page: Playwright

Thu, 2024-05-23 07:00

We’ve had an increasing number of people asking us how to use Playwright on PythonAnywhere. Playwright is a browser automation framework that was developed by Microsoft; like the more-established Selenium it’s really useful for testing and web-scraping, and it’s getting a reputation for being a robust and fast library for that kind of work.

Getting it set up to run on PythonAnywhere is pretty easy, but you need to do the installation slightly differently from the way it’s documented on Playwright’s own site; user hcaptcha on our forums worked out the trick to making it work, so now we have a new help page documenting how to do it.

Categories: FLOSS Project Planets

EuroPython: EuroPython May 2024 Newsletter

Thu, 2024-05-23 06:12

Hello, Python people! 🐍

Welcome back to our cosy corner of community & code!

It's absolutely fantastic to have you here again, especially as we count down the days until our reunion in Prague! 🎉 It's been months since our last gathering in this magical city, and we couldn't be more excited to be back.

Fun fact: did you know that Czechia was recently ranked among the world's top 20 happiest countries? It's the ideal place to spark joy and inspiration as we come together once again.

A lot of things have happened since our last catch-up. Let's dive right into it!

📣 Programme

As you might know, we reached a record number of submitted proposals this year. After analyzing 627 submissions, we are glad to announce a sneak peek of what you will see at EuroPython this year.

If you are still not convinced to join us, we have another big announcement! Here are the keynote speakers we have lined up. 🌟

  • Carol Willing - Three-time Python Steering Council member, Python Core Developer, and PSF Fellow
  • Tereza Iofciu - Seasoned data pro with 15 years' experience, Python community star, and winner of the PSF Community Service Award in 2021
  • Łukasz Langa - CPython Developer in Residence, Python 3.8 & 3.9 release manager, original creator of Black, pianist, dad
  • Mai Giménez - Google DeepMind senior engineer specializing in large language and multimodal models, former Spanish Python Association board member 🐍
  • Armin Ronacher - Creator of popular Python libraries such as Flask, Jinja, and Click, and currently the VP of Platform at Sentry
  • Anna Přistoupilová - Bioinformatics scientist focused on genome analysis techniques and their applications in understanding rare genetic diseases. She received the Bolzano Award for her doctoral thesis.

The schedule is going to be packed with all your amazing talks, tutorials and posters. The official schedule with dates and times will be posted soon! Keep those eyes open on our social media channels and website! 📅✨

All of this wouldn't be possible without the great efforts of the EuroPython programme team. Many volunteers from all teams gave their time to reach out to people and review the proposals. ❤️

🙌 Big cheers to those who helped shape the EuroPython programme, making EuroPython 2024 the best one yet! 🚀🐍

🗃️ Keynote Speakers

Let us better introduce the listed EuroPython 2024 keynote speakers! ⚡️🐍❤️

Keynote speaker #1: Carol Willing

Don't miss your chance to hear from Carol Willing, who is a three-time Python Steering Council member, Python Core Developer, PSF Fellow, and Project Jupyter core contributor. 🔥

In 2019, she was honoured with the Frank Willison Award for her outstanding technical and community contributions to Python. Carol played a pivotal role in Project Jupyter, recognized with the prestigious 2017 ACM Software System Award for its enduring impact. A leading figure in open science and open-source governance, Carol serves on the advisory boards of Quansight Labs, CZI Open Science, and pyOpenSci.

She's committed to democratizing open science through accessible tools and learning resources, and most recently served as Noteable's VP of Engineering.

Get ready to be inspired by Carol's insights at EuroPython 2024!

Keynote speaker #2: Tereza Iofciu

Get ready to be inspired by Tereza Iofciu, a seasoned data practitioner and leadership coach with over 15 years of expertise in Data Science, Data Engineering, Product Management, and Team Management. 🔥

Tereza's dedication to the Python community is unmatched; she has worn numerous hats over the years, serving as an organiser for PyLadies Hamburg, a board member of the Python Software Verband, a steering committee member of NumFOCUS DISC, and a member of the Python Software Foundation Code of Conduct team. Not stopping there, Tereza is also active in the Diversity & Inclusion working group, organises PyConDE & PyData Berlin and Python Pizza Hamburg, and co-leads PyPodcats. (If you haven't heard, PyPodcats is a fantastic new podcast dedicated to highlighting the hidden figures of the Python community. Led by Cheuk Ting Ho, Georgi Ker, Mariatta Wijaya, and Tereza Iofciu, it aims to amplify the voices of underrepresented group members within the Python community.) 🐈🐱

In recognition of her outstanding contributions, Tereza was honoured with the Python Software Foundation Community Service Award in 2021. Now, if that's not a sign to catch her awesome keynote, I don't know what is!

Keynote speaker #3: Łukasz Langa

Introducing Łukasz Langa: a polymath whose impact on the Python ecosystem is as diverse as his array of interests!

As the CPython Developer in Residence and the mastermind behind the Python 3.8 & 3.9 releases, Łukasz plays a pivotal role in shaping the future of Python. He's the original creator of Black, revolutionising the way we write Python code.

Beyond his coding prowess, Łukasz is a talented pianist and a devoted father.

When he's not immersed in Python development, you'll find him indulging his passions for analogue modular synthesisers 😍, immersing himself in captivating single-player role-playing games like Fallout and Elder Scrolls, or relishing the complexity of a fine single malt Scotch whisky.

Brace yourself for an enlightening journey through Łukasz's experiences and insights! 🚀🎹🥃

Keynote speaker #4: Mai Giménez

Allow us to introduce Mai Giménez, Ph.D., a senior research engineer at Google DeepMind specialising in large language and multimodal models.

Mai's passion lies in crafting technology that benefits everyone, with her primary research focus being language and the sociotechnical impacts of AI in the real world. The impact of her contributions extends beyond her work at Google DeepMind: she's a former board member of the Spanish Python Association and has played a pivotal role in organising several PyConES conferences.

Additionally, Mai proudly contributes to the Python community as a member of PyLadies. Get ready to be inspired by Mai's expertise and insights as she graces the stage at EP24! 🌟

Keynote speaker #5: Armin Ronacher

A household name in the open-source world, Armin Ronacher is the creator of popular Python libraries such as Flask, Jinja, and Click. He has left quite a mark on the Python ecosystem, empowering developers worldwide with efficient tools and frameworks.

He is currently the VP of Platform at Sentry, and he recently started an experimental Python package and project manager that attempts to bring Rust's modern developer experience to Python. We are so excited to hear from him at EuroPython 2024!

Keynote speaker #6: Anna Přistoupilová

Put your hands together for Anna Přistoupilová! Anna is a bioinformatics scientist focused on genome analysis techniques and their applications in understanding rare genetic diseases. She received the Bolzano Award for her doctoral thesis!

Anna holds a PhD in Molecular and Cell Biology, Genetics, and Virology and two MSc degrees: one in Medical Technology and Informatics, and the other in Molecular Biology and Genetics, all from Charles University.

She has co-authored over 25 publications in peer-reviewed journals and has presented her work at various scientific conferences.

Currently, Anna works as a Senior Bioinformatics Scientist at DNAnexus, where she assists customers with their bioinformatics analysis. She also conducts research at the Research Unit for Rare Diseases at the First Faculty of Medicine, Charles University.

🎟️ Conference Registration

It's time to secure your tickets for the conference!

We've heard you loud and clear: you don't want to miss the opportunity to hear from our incredible keynote speakers and be a part of EuroPython 2024.

Here's the list of tickets available for purchase. 👇

  • Conference Tickets: access to the main Conference AND Sprints Weekend
  • Tutorial Tickets: access to the two days of Workshops AND Sprints Weekend. NO access to the main event.
  • Combined Tickets: access to everything for the seven days! Includes workshops, main Conference and Sprints weekend.

Other than the ticket types, there are also payment tiers offered to suit each participant's needs:

  • Business Tickets (for companies and employees funded by their companies)
  • Personal Tickets (for individuals)
  • Education Tickets (for students and teachers, an educational ID is required at the registration)
Buzzing registration desk from EuroPython 2023

For those who cannot physically join us but still want to support the community, we have the remote ticket option.

  • Remote ticket: access to the Live streaming of the talks, Q&A with the speakers and Discord server.

Join us and connect with the delightful community of Pythonistas in Prague. Make your summer more fun!

Need more information regarding tickets? Please visit the website or contact us at helpdesk@europython.eu!

⚖️ Visa Application

Not sure if you need one? Please check the website and consult your local consular office or embassy. 🏫

If you do need a visa to attend EuroPython 2024, you can lodge a visa application for a Short Stay (C), up to 90 days, for the purpose of “Business/Conference”. Please do it ASAP!

Make sure you read all the visa pages carefully and prepare all the required documents before making your application. The EuroPython organizers are neither able nor qualified to give visa advice.

However, we're more than happy to help you with a visa support letter. Before sending your request, please note that you will need to be registered: we can only issue visa support letters to confirmed participants.

Hence, we kindly ask you to purchase your ticket before filling in the request form.

For more information, please check https://ep2024.europython.eu/visa or contact us at visa@europython.eu. ✈️

💶 Financial Aid

The first round of our Financial Aid Programme received a record-high number of applications this year, and we are very proud to be supporting so many Pythonistas to attend the conference.

The second round of applications wrapped up on May 19th and now the team is actively working to individually review the applications! More information at https://ep2024.europython.eu/finaid/.

💰 Sponsorship

If you want to support EuroPython and its efforts to make the event accessible to everyone, please consider sponsoring, or asking your employer to do so. More information at: https://ep2024.europython.eu/sponsor 🫂

Sponsoring EuroPython guarantees you highly targeted visibility and the opportunity to present yourself/your company to one of the largest and most diverse Python communities in Europe and beyond!

There are several sponsor tiers and slots are limited. This year, besides our main packages, we offer add-ons as optional extras where companies can opt to support the community in many other ways:

  • By directly sponsoring the PyLadies lunch event
  • By supporting participants by funding Financial Aid
  • By having their logo on all lanyards of the conference
  • Or even by improving the event’s accessibility.

Interested? Email us at sponsoring@europython.eu.

🤝 Join us as a Volunteer!

To make the conference an amazing experience for everyone, we need enthusiastic on-site volunteers from July 8-14. Whether you're confident leading people, love chatting with new folks at registration, are interested in chairing a session, or just want to help out - we've got a role for you. Volunteering is a fantastic way to gain experience, make new connections, and have lots of fun while doing it.

Interested? Have a look at https://ep2024.europython.eu/volunteers to find out more and how to apply.

We're also considering remote volunteers, so if you're interested in helping out but can't make it to Prague, please mention that explicitly in your email.

We can't wait to see you in Prague! 🚀

🎟️ Events @ EuroPython

EuroPython 2023 social event

This year, we want to make our social event bigger and better for everyone, so we are planning to host a bigger party. Tickets will be available for purchase on the website soon! Stay tuned.

🎉 Community

EuroPython at PyCon Italia 🇮🇹 May 22nd - 25th 2024

PyCon Italia 2024 will happen in Florence. The birthplace of the Renaissance will receive a wave of Pythonistas looking to geek out this year, including a lot of EuroPython people.

If you are going to PyCon Italia (tickets are sold out), join us to help organise EuroPython 2024!

🎤 First-Time Speaker Workshop

Join us for the Speaker's Mentorship Programme - First-Time Speaker Workshop on Monday, June 3rd, at 19:00 CEST! 🎤

This online panel session features experienced speakers sharing advice for first-time (and other) speakers. Following the panel discussion, there will be a dedicated Q&A session to address all participants' inquiries. The event is open to the public, and registration is required through this form.

As the event date approaches, registered participants will receive further details via email. Don't miss this opportunity to learn and grow as a speaker!

News from the EuroPython Community ❣️
  • Check out our phenomenal co-organizer Mia Bajic on a recent podcast where she shared her experiences volunteering in the Python community! 🎙️ Mia is a true pillar of the Python community; she has shared her expertise and passion at multiple PyCons across the globe. 🌍 Her efforts extend beyond borders as she tirelessly works to bring Pythonic people together in Prague, hosting events such as Pyvo and the first-ever Python Pizza in the city! 🍕 Mia's dedication and contributions make both the Czech Python community and EuroPython a better place, and we're beyond grateful to have her on board shaping the EuroPython experience. 🙌 Note: the podcast is in Czech. 🎧

Podcast: https://www.youtube.com/watch?v=-UcHqap89Ac

  • Joana is doing a Master's in Medical Imaging and Applications. Originally from Ghana, she joined the Communications team of EuroPython this year, bringing her experience and innovative thinking from PyLadies Ghana.

She wrote an article about her Community involvement and the impact it has had on her career. She says:

I met and saw women who had achieved great things in data science and machine learning, which meant that I could also, through their stories, find a plan to at least help me get close to what they had done.

Full article: https://blog.europython.eu/community-post-invisible-threads/

🐍 Upcoming Events

GeoPython: May 27-29 2024

  • GeoPython 2024 will happen in Basel, Switzerland.

For more information about GeoPython 2024, you can visit their website: https://2024.geopython.net/

Djangocon.eu: June 5-9 2024

DjangoCon Europe 2024 will happen in Vigo, Spain. You can find more information about DjangoCon Europe at their lovely website (https://2024.djangocon.eu/).

PyCon Portugal: 17-19 October 2024

PyCon Portugal will happen at the Altice Forum, Braga. More information on the official website: https://2024.pycon.pt/

PyCon Poland: August 29th - September 1st

The 16th edition of PyCon PL is happening in Gliwice! For more information, visit their website https://pl.pycon.org/2024

PyCon Taiwan 2024: September 21st - September 22nd

PyCon Taiwan will introduce a new activity segment: Poster Session! The deadline to submit your posters is June 15th, through the submission form.

More information on their website: https://tw.pycon.org/2024/en-us

🤭 Py.Jokes

$ pyjoke
Two threads walk into a bar. The barkeeper looks up and yells, 'Hey, I want don't any conditions race like time last!'

🐣 See You All Next Month

Before saying goodbye, thank you so much for reading. We can’t wait to reunite with all you amazing people in beautiful Prague again.

It truly is time to make new Python memories together!

With so much joy and excitement,

EuroPython 2024 Team 🤗

Categories: FLOSS Project Planets

Matt Layman: Export Journal Feature - Building SaaS with Python and Django #191

Wed, 2024-05-22 20:00
In this episode, I started with cleaning up a few small items. After those warmups, we moved on to building an export feature that will allow users to take their journal entries if they want to leave the service.
Categories: FLOSS Project Planets

Ned Batchelder: Echos of the People API user guide

Wed, 2024-05-22 18:58

PyCon 2024 just happened, and as always it was a whirlwind of re-connecting with old friends, meeting new friends, talking about anything and everything, juggling, and getting simultaneously energized and exhausted. I won’t write about everything that happened, but one thread sticks with me: the continuing echos of my talk from last year.

If you haven’t seen it yet, I spoke at PyCon 2023 about the ways engineers can use their engineering skills to interact better with people, something engineers stereotypically need help with. I called it People: The API User’s Guide.

A number of people mentioned the talk to me this year, which was very gratifying. It’s good to hear that it stuck with people. A few said they had passed it on to others, which is a real sign that it landed well with them.

On the other hand, at lunch one day, someone asked me if I had done a talk last year, and I sheepishly answered, “um, yeah, I did the opening keynote...” It’s hard to answer that question without it becoming a humble brag!

The most unexpected echo of the talk was at the coverage.py sprint table. Ludovico Bianchi was working away when he turned to me and said, “oh, I forgot to send you this last year!” He showed me this picture he drew during the talk a year ago:

I’ve only ever stayed for one day of sprints. It can be hard to get people started on meaty issues in that short time. We have a good time anyway, and merge a few pull requests. This year, three people came back who sprinted with me in 2023, another sign that something is going right.

Once the sprint was over, Ludovico also sketched Sleepy into the group photo of the sprint gang:

Half the fun of preparing last year’s talk was art-directing the illustrations by my son Ben, similar to how we had worked to make Sleepy Snake. As much as I like hearing that people like my words, as a dad it’s just as good to hear that people like Ben’s art. Seeing other people play with Sleepy in clever ways is extra fun.

Categories: FLOSS Project Planets

Glyph Lefkowitz: A Grand Unified Theory of the AI Hype Cycle

Wed, 2024-05-22 12:58
The Cycle

The history of AI goes in cycles, each of which looks at least a little bit like this:

  1. Scientists do some basic research and develop a promising novel mechanism, N. One important detail is that N has a specific name; it may or may not be carried out under the general umbrella of “AI research” but it is not itself “AI”. N always has a few properties, but the most common and salient one is that it initially tends to require about 3x the specifications of the average computer available to the market at the time; i.e., it requires three times as much RAM, CPU, and secondary storage as is shipped in the average computer.
  2. Research and development efforts begin to get funded on the hypothetical potential of N. Because N is so resource intensive, this funding is used to purchase more computing capacity (RAM, CPU, storage) for the researchers, which leads to immediate results, as the technology was previously resource constrained.
  3. Initial successes in the refinement of N hint at truly revolutionary possibilities for its deployment. These revolutionary possibilities include a dimension of cognition that has not previously been machine-automated.
  4. Leaders in the field of this new development — specifically leaders, like lab administrators, corporate executives, and so on, as opposed to practitioners like engineers and scientists — recognize the sales potential of referring to this newly-“thinking” machine as “Artificial Intelligence”, often speculating about science-fictional levels of societal upheaval (specifically in a period of 5-20 years), now that the “hard problem” of machine cognition has been solved by N.
  5. Other technology leaders, in related fields, also recognize the sales potential and begin adopting elements of the novel mechanism to combine with their own areas of interest, also referring to their projects as “AI” in order to access the pool of cash that has become available to that label. In the course of doing so, they incorporate N in increasingly unreasonable ways.
  6. The scope of “AI” balloons to include pretty much all of computing technology. Some things that do not even include N start getting labeled this way.
  7. There’s a massive economic boom within the field of “AI”, where “the field of AI” means any software development that is plausibly adjacent to N in any pitch deck or grant proposal.
  8. Roughly 3 years pass, while those who control the flow of money gradually become skeptical of the overblown claims that recede into the indeterminate future, where N precipitates a robot apocalypse somewhere between 5 and 20 years away. Crucially, because of the aforementioned resource-intensiveness, the gold owners’ skepticism grows slowly over this period, because their own personal computers, or the ones they have access to, do not have the requisite resources to actually run the technology in question, and it is challenging for them to observe its performance directly. Public critics begin to appear.
  9. Competent practitioners — not leaders — who have been successfully using N in research or industry quietly stop calling their tools “AI”, or at least stop emphasizing the “artificial intelligence” aspect of them, and start getting funding under other auspices. Whatever N does that isn’t “thinking” starts getting applied more seriously as its limitations are better understood. Users begin using more specific terms to describe the things they want, rather than calling everything “AI”.
  10. Thanks to the relentless march of Moore’s law, the specs of the average computer improve. The CPU, RAM, and disk resources required to actually run the software locally come down in price, and everyone upgrades to a new computer that can actually run the new stuff.
  11. The investors and grant funders update their personal computers, and they start personally running the software they’ve been investing in. Products with long development cycles are finally released to customers as well, but they are disappointing. The investors quietly get mad. They’re not going to publicly trash their own investments, but they stop loudly boosting them and they stop writing checks. They pivot to biotech for a while.
  12. The field of “AI” becomes increasingly desperate, as it becomes the label applied to uses of N which are not productive, since the productive uses are marketed under their application rather than their mechanism. Funders lose their patience, the polarity of the “AI” money magnet rapidly reverses. Here, the AI winter is finally upon us.
  13. The remaining AI researchers who still have funding via mechanisms less vulnerable to hype, who are genuinely thinking about automating aspects of cognition rather than simply N, quietly move on to the next impediment to a truly thinking machine, and in the course of doing so, they discover a new novel mechanism, M. Go to step 1, with M as the new N, and our current N as a thing that is now “not AI”, called by its own, more precise name.
The History

A non-exhaustive list of previous values of N have been:

  • Neural networks and symbolic reasoning in the 1950s.
  • Theorem provers in the 1960s.
  • Expert systems in the 1980s.
  • Fuzzy logic and hidden Markov models in the 1990s.
  • Deep learning in the 2010s.

Each of these cycles has been larger and lasted longer than the last, and I want to be clear: each cycle has produced genuinely useful technology. It’s just that each follows the progress of a sigmoid curve that everyone mistakes for an exponential one. There is an initial burst of rapid improvement, followed by gradual improvement, followed by a plateau. Initial promises imply or even state outright “if we pour more {compute, RAM, training data, money} into this, we’ll get improvements forever!” The reality is always that these strategies inevitably have a limit, usually one that does not take too long to find.

Where Are We Now?

So where are we in the current hype cycle?

Some Qualifications

History does not repeat itself, but it does rhyme. This hype cycle is unlike any that have come before in various ways. There is more money involved now. It’s much more commercial; I had to phrase things above in very general ways because many previous hype waves have been based on research funding, some really being exclusively a phenomenon at one department in DARPA, and not, like, the entire economy.

I cannot tell you when the current mania will end and this bubble will burst. If I could, you’d be reading this in my $100,000 per month subscribers-only trading strategy newsletter and not a public blog. What I can tell you is that computers cannot think, and that the problems of the current instantiation of the nebulously defined field of “AI” will not all be solved within “5 to 20 years”.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. Special thanks also to Ben Chatterton for a brief pre-publication review; any errors remain my own. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “what are we doing that history will condemn us for”. Or, you know, Python programming.

Categories: FLOSS Project Planets

Real Python: The Python calendar Module: Create Calendars With Python

Wed, 2024-05-22 10:00

The Python calendar module provides several ways to generate calendars for Python programs. It also includes a variety of functions for working with calendar data as strings, numbers, and datetime objects.

In this tutorial, you’ll learn how to use the calendar module to create and customize calendars with Python.

By the end of this tutorial, you’ll be able to:

  • Display calendars in your terminal with Python
  • Create plain text and HTML calendars
  • Format calendars for specific locales and display conventions
  • Use calendar-related functions and methods to access lower-level calendar data in a variety of formats

Get Your Code: Click here to download the free sample code you’ll use to learn about creating calendars with the calendar module in Python.

Take the Quiz: Test your knowledge with our interactive “The Python calendar Module” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

The Python calendar Module

In this quiz, you'll test your understanding of the calendar module in Python. It'll evaluate your proficiency in manipulating, customizing, and displaying calendars directly within your terminal. By working through this quiz, you'll revisit the fundamental functions and methods provided by the calendar module.

Displaying Calendars in Your Terminal

Unix and Unix-like operating systems such as macOS and Linux include a cal command-line utility for displaying calendars in an interactive console:

$ cal
      May 2024
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31

Python provides a similar tool, which allows you to run the calendar module as a command-line script. To begin exploring the Python calendar module, open up your terminal program and enter the following command:

$ python -m calendar
                                  2024

      January                   February                   March
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7                1  2  3  4                   1  2  3
 8  9 10 11 12 13 14       5  6  7  8  9 10 11       4  5  6  7  8  9 10
15 16 17 18 19 20 21      12 13 14 15 16 17 18      11 12 13 14 15 16 17
22 23 24 25 26 27 28      19 20 21 22 23 24 25      18 19 20 21 22 23 24
29 30 31                  26 27 28 29               25 26 27 28 29 30 31

       April                      May                       June
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7             1  2  3  4  5                      1  2
 8  9 10 11 12 13 14       6  7  8  9 10 11 12       3  4  5  6  7  8  9
15 16 17 18 19 20 21      13 14 15 16 17 18 19      10 11 12 13 14 15 16
22 23 24 25 26 27 28      20 21 22 23 24 25 26      17 18 19 20 21 22 23
29 30                     27 28 29 30 31            24 25 26 27 28 29 30

        July                     August                  September
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7                1  2  3  4                         1
 8  9 10 11 12 13 14       5  6  7  8  9 10 11       2  3  4  5  6  7  8
15 16 17 18 19 20 21      12 13 14 15 16 17 18       9 10 11 12 13 14 15
22 23 24 25 26 27 28      19 20 21 22 23 24 25      16 17 18 19 20 21 22
29 30 31                  26 27 28 29 30 31         23 24 25 26 27 28 29
                                                    30

      October                   November                  December
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6                   1  2  3                         1
 7  8  9 10 11 12 13       4  5  6  7  8  9 10       2  3  4  5  6  7  8
14 15 16 17 18 19 20      11 12 13 14 15 16 17       9 10 11 12 13 14 15
21 22 23 24 25 26 27      18 19 20 21 22 23 24      16 17 18 19 20 21 22
28 29 30 31               25 26 27 28 29 30         23 24 25 26 27 28 29
                                                    30 31

Running python -m calendar with no arguments outputs a full year’s calendar for the current year. To display the full calendar for a different year, pass in the integer representation of a year as the first argument of the calendar command:

$ python -m calendar 1989

To view a single month, pass in both a year and a month, with the month as the second argument:

$ python -m calendar 2054 07
     July 2054
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31

As you can see in these examples, the calendar module can display calendars for both past and future dates. According to the official documentation, the calendar module uses the current Gregorian calendar, extended indefinitely in both directions. It also uses the ISO 8601 standard, which is an international standard for exchanging and communicating date and time-related data.
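For example, because the calendar is extended backward indefinitely, you can ask for January of year 1, which in the proleptic Gregorian calendar starts on a Monday (output sketched here on that basis):

$ python -m calendar 1 1
     January 1
Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31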

Now that you know how to display calendars in your terminal with Python, you can move on and explore other approaches to creating calendars as plain text or HTML markup representations.

Creating Text-Based Calendars

To generate plain text calendars, the calendar module provides calendar.TextCalendar with methods to format and print monthly and yearly calendars.

TextCalendar.formatyear() accepts the year as its first argument, just like the calendar command-line script. Try it out in your Python REPL by executing the following code:

>>> import calendar
>>> text_calendar = calendar.TextCalendar()
>>> text_calendar.formatyear(2024)
'                                  2024\n\n      January (...)'

Read the full article at https://realpython.com/python-calendar-module/ »
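TextCalendar also has month-level methods, and its constructor lets you change the first weekday. Here's a quick sketch using only documented standard-library calls; formatmonth() returns the month as a string, which reproduces the Unix cal layout shown earlier:

>>> import calendar
>>> sunday_first = calendar.TextCalendar(firstweekday=calendar.SUNDAY)
>>> print(sunday_first.formatmonth(2024, 5))
      May 2024
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31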


Categories: FLOSS Project Planets
