Planet Python

Planet Python - http://planetpython.org/

Real Python: Python Sequences: A Comprehensive Guide

Wed, 2024-05-01 10:00

A phrase you’ll often hear is that everything in Python is an object, and every object has a type. This points to the importance of data types in Python. However, often what an object can do is more important than what it is. So, it’s useful to discuss categories of data types and one of the main categories is Python’s sequence.

In this tutorial, you’ll learn about:

Basic characteristics of a sequence
Operations that are common to most sequences
Special methods associated with sequences
Abstract base classes Sequence and MutableSequence
User-defined mutable and immutable sequences and how to create them

This tutorial assumes that you’re familiar with Python’s built-in data types and with the basics of object-oriented programming.

Building Blocks of Python Sequences

It’s likely you used a Python sequence the last time you wrote Python code, even if you don’t know it. The term sequence doesn’t refer to a specific data type but to a category of data types that share common characteristics.

Characteristics of Python Sequences

A sequence is a data structure that contains items arranged in order, and you can access each item using an integer index that represents its position in the sequence. You can always find the length of a sequence. Here are some examples of sequences from Python’s basic built-in data types:

>>> # List
>>> countries = ["USA", "Canada", "UK", "Norway", "Malta", "India"]
>>> for country in countries:
...     print(country)
...
USA
Canada
UK
Norway
Malta
India
>>> len(countries)
6
>>> countries[0]
'USA'

>>> # Tuple
>>> countries = "USA", "Canada", "UK", "Norway", "Malta", "India"
>>> for country in countries:
...     print(country)
...
USA
Canada
UK
Norway
Malta
India
>>> len(countries)
6
>>> countries[0]
'USA'

>>> # Strings
>>> country = "India"
>>> for letter in country:
...     print(letter)
...
I
n
d
i
a
>>> len(country)
5
>>> country[0]
'I'

Lists, tuples, and strings are among Python’s most basic data types. Even though they’re different types with distinct characteristics, they have some common traits. You can summarize the characteristics that define a Python sequence as follows:

A sequence is an iterable, which means you can iterate through it.
A sequence has a length, which means you can pass it to len() to get its number of elements.
An element of a sequence can be accessed based on its position in the sequence using an integer index. You can use the square bracket notation to index a sequence.

There are other built-in data types in Python that also have all of these characteristics. One of these is the range object:

>>> numbers = range(5, 11)
>>> type(numbers)
<class 'range'>
>>> len(numbers)
6
>>> numbers[0]
5
>>> numbers[-1]
10
>>> for number in numbers:
...     print(number)
...
5
6
7
8
9
10

You can iterate through a range object, which makes it iterable. You can also find its length using len() and fetch items through indexing. Therefore, a range object is also a sequence.

You can also verify that bytes and bytearray objects, two of Python’s built-in data structures, are also sequences. Both are sequences of integers. A bytes sequence is immutable, while a bytearray is mutable.
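
As a quick check of that claim, the sketch below applies the three sequence characteristics to a bytes object and a bytearray object. The variable names are illustrative and don't come from the original tutorial.

```python
# Illustrative names; not from the original tutorial.
data = bytes([72, 105, 33])      # immutable sequence of integers
buffer = bytearray(data)         # mutable copy of the same integers

# Both have a length and support integer indexing.
print(len(data), data[0])        # 3 72
print(len(buffer), buffer[-1])   # 3 33

# Both are iterable, and iteration yields integers.
print(list(data))                # [72, 105, 33]

# Only the bytearray can be changed in place.
buffer[0] = 104
print(buffer)                    # bytearray(b'hi!')

try:
    data[0] = 104
except TypeError as error:
    print(f"bytes is immutable: {error}")
```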

Special Methods Associated With Python Sequences

In Python, the key characteristics of a data type are determined using special methods, which are defined in the class definitions. The special methods associated with the properties of sequences are the following:

.__iter__(): This special method makes an object iterable using Python’s preferred iteration protocol. However, it’s possible for a class without an .__iter__() special method to create iterable objects if the class has a .__getitem__() special method that supports iteration. Most sequences have an .__iter__() special method, but it’s possible to have a sequence without this method.
.__len__(): This special method defines the length of an object, which is normally the number of elements contained within it. The len() built-in function calls an object’s .__len__() special method. Every sequence has this special method.
.__getitem__(): This special method enables you to access an item from a sequence. The square brackets notation can be used to fetch an item. The expression countries[0] is equivalent to countries.__getitem__(0). For sequences, .__getitem__() should accept integer arguments starting from zero. Every sequence has this special method. This method can also ensure an object is iterable if the .__iter__() special method is missing.

Therefore, all sequences have a .__len__() and a .__getitem__() special method and most also have .__iter__().
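
To illustrate the fallback described above, here's a minimal sketch of a class that defines only .__len__() and .__getitem__() and is still iterable. The Countdown class is hypothetical and only supports non-negative indices.

```python
class Countdown:
    """Hypothetical sequence-like class with no .__iter__() method."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        # Raising IndexError past the end is what tells Python's
        # fallback iteration protocol when to stop.
        if not 0 <= index < self._start:
            raise IndexError("Countdown index out of range")
        return self._start - index


countdown = Countdown(3)
print(len(countdown))                  # 3
print(countdown[0])                    # 3
print(list(countdown))                 # [3, 2, 1]
print(hasattr(countdown, "__iter__"))  # False
```

Python calls .__getitem__() with 0, 1, 2, and so on until IndexError is raised, which is why indexing from zero matters for this protocol.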

However, it’s not sufficient for an object to have these special methods to be a sequence. For example, many mappings also have these three methods but mappings aren’t sequences.

A dictionary is an example of a mapping. You can find the length of a dictionary and iterate through its keys using a for loop or other iteration techniques. You can also fetch an item from a dictionary using the square brackets notation.

This characteristic is defined by .__getitem__(). However, .__getitem__() needs arguments that are dictionary keys and returns their matching values. You can’t index a dictionary using integers that refer to an item’s position in the dictionary. Therefore, dictionaries are not sequences.
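
One way to see the difference in code is to check objects against the abstract base classes in collections.abc. This is a small sketch with made-up data, not an example from the original tutorial.

```python
from collections.abc import Mapping, Sequence

capitals = {"Norway": "Oslo", "Malta": "Valletta"}
countries = ["Norway", "Malta"]

# Both objects have a length and support square brackets.
print(len(capitals), len(countries))      # 2 2
print(capitals["Norway"], countries[0])   # Oslo Norway

# Only the list is recognized as a sequence.
print(isinstance(countries, Sequence))    # True
print(isinstance(capitals, Sequence))     # False
print(isinstance(capitals, Mapping))      # True

# An integer is treated as a key, not as a position.
try:
    capitals[0]
except KeyError:
    print("0 is not a key in the dictionary")
```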

Slicing in Python Sequences

Read the full article at https://realpython.com/python-sequences/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #627 (April 30, 2024)

Tue, 2024-04-30 15:30

#627 – APRIL 30, 2024

PEP 686: Make UTF-8 Mode Default

This Python Enhancement Proposal outlines making UTF-8 mode the default throughout Python. It takes the Unicode support introduced in Python 3 to its full extent, applying it to file encoding, pipes, and more. Mechanisms for other encodings are still supported. This PEP is targeted for Python 3.15.
PEPS

What’s Lazy Evaluation in Python?

This tutorial explores lazy evaluation in Python and looks at the advantages and disadvantages of using lazy and eager evaluation methods. By the end of this tutorial, you’ll clearly understand which approach is best for you, depending on your needs.
REAL PYTHON

Build Your Own AI CLI Agent with Open Source by Pieces (OSP)

Unlock the power of Pieces, right in your terminal! Our open-source CLI plugin helps you manage code snippets, chat with your on-device AI copilot, and even auto-generate commit messages. Join our community to refine your Python skills and influence a product used by 1000s of devs. Contribute today →
PIECES sponsor

Serverless Python in 2024

Talk Python interviews Tony Sherman and they discuss the current state of serverless computing in the Python world, including some of the newer tools and best practices.
KENNEDY & SHERMAN podcast

Django Developers Survey 2023 Results

JETBRAINS

Djangonauts Space Session 2 Applications Open!

DJANGONAUTS

PyPy v7.3.16 Release

PYPY

PEP 745: Python 3.14 Release Schedule

PEPS

Quiz: Writing Unit Tests for Your Code With unittest

REAL PYTHON

Discussions

High Quality Python Scripts or Small Libraries to Learn From?

HACKER NEWS

Articles & Tutorials

Filter Sensitive Contents From Django’s Error Reports

Django has the ability to automatically email admins when a 500 error occurs. These kinds of errors can potentially contain sensitive information though, so there are decorators to hide these values. This post covers those as well as how to filter data when using Sentry.
GONÇALO VALÉRIO

Asyncio Coroutine Object Methods in Python

The async and await keywords that form Python’s coroutine mechanism can be used for class methods as well as the more common case of functions. This article shows you how you can use asyncio with your objects.
JASON BROWNLEE

How to Prevent Data Leakage in pandas & scikit-learn

How you impute missing values in machine learning data sets can affect the quality of your training. This article teaches you what data leakage is and what steps you should take to avoid it.
DATASCHOOL

An Open Letter Regarding the DjangoCon Europe CfP

Putting on a conference is a complex matter and an attempt to clarify how future DjangoCons in Europe would be structured has resulted in push-back. This open letter is by a Django board member explaining the situation and a hope of how to move forward.
DJANGO SOFTWARE FOUNDATION

Python Basics: Lists and Tuples

In this video course, you’ll learn about Python lists and tuples, including how to define and manipulate them in your code. By the end of the course, you’ll be ready to effectively use lists and tuples in your programming projects.
REAL PYTHON course

Don’t Lie in Interviews

This strongly worded opinion piece by Nat is in reaction to common advice given on Reddit and similar boards. Nat counters it all with “don’t lie in interviews”. Strong language warning.
NAT BENNETT

Fake Job Interviews Target Devs With New Python Backdoor

“A new campaign tracked as “Dev Popper” is targeting software developers with fake job interviews in an attempt to trick them into installing a Python remote access trojan (RAT).”
BILL TOULAS

Why You Need a “WTF Notebook”

There’s a very specific reputation Nat wants to have on a team: “Nat helps me solve my problems. Nat gets things I care about done.” Keeping a WTF notebook helps him do just that.
NAT BENNETT

Write Unit Tests for Your Python Code With ChatGPT

In this tutorial, you’ll learn how to use ChatGPT to generate tests for your Python code. You’ll use the chat to create doctest, unittest, and pytest tests for your code.
REAL PYTHON

Leibniz Formula for Π in Python, JavaScript, and Ruby

This is a bare-bones, side-by-side comparison of the Leibniz formula for calculating pi in Python, JavaScript, and Ruby, along with performance measurements.
PETER BENGTSSON

Better Test Parametrisation in pytest

This “Things I’ve Learned” post discusses how to take advantage of test parametrisation in pytest.
RODRIGO GIRÃO SERRÃO

Projects & Code

All Python 2023 Conference Talks Google Sheet

HH91

zpy: ZSH Helpers for Python Venvs, With Uv or Pip-Tools

GITHUB.COM/ANDYDECLEYRE

django-typescript-routes: Typescript Routes From a URL Conf

GITHUB.COM/BUTTONDOWN

pipxu: Install in Isolated Environments Using UV

GITHUB.COM/BULLETMARK

PyOptInterface: Interface for Mathematical Optimization

GITHUB.COM/METAB0T

Events

Weekly Real Python Office Hours Q&A (Virtual)

May 1, 2024
REALPYTHON.COM

Canberra Python Meetup

May 2, 2024
MEETUP.COM

Sydney Python User Group (SyPy)

May 2, 2024
SYPY.ORG

TOUFU

May 4 to May 5, 2024
OHTOUFU.COM

PyDelhi User Group Meetup

May 4, 2024
MEETUP.COM

Melbourne Python Users Group, Australia

May 6, 2024
J.MP

Happy Pythoning!
This was PyCoder’s Weekly Issue #627.

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

PyCharm: PyCharm 2024.1.1 Is Here! AI Assistant in Community Edition, Enhanced Endpoints Tool Window, and Navigation and Refactoring Across Notebooks and Scripts

Tue, 2024-04-30 14:28

Enhancements in the Endpoints tool window, extended GitHub gists support for notebooks, and navigation and refactoring across notebooks and scripts – these are just some of the improvements you’ll find in PyCharm 2024.1.1!

You can download the latest version from our download page or update your current version through our free Toolbox App.

Key features

JetBrains AI Assistant in PyCharm Community Edition

JetBrains AI Assistant is now available in version 2024.1.1 of PyCharm Community Edition! With features ranging from smart suggestions to code generation, now Community Edition users can also enhance their coding journey with AI Assistant.

Improvements to the Endpoints tool window

Search URLs faster and more efficiently with the improved Endpoints tool window in PyCharm 2024.1.1. Use the dedicated Endpoints tab in Search Anywhere to have all your endpoints grouped by application with their routes displayed.

Navigation and refactoring across notebooks and scripts

Enjoy navigation and refactoring between notebooks and Python scripts within a single project in PyCharm. Find declarations or usages easily, use the Rename refactoring, and have our full spectrum of code inspections at your disposal. Changes are synchronized across file types, so if you employ any of these features in a notebook, they will automatically be applied to the related script, and vice versa.

Create gists from Jupyter notebooks

You can share Jupyter notebooks seamlessly and quickly now that PyCharm offers full support for GitHub gists. Create a gist for a single notebook or select several files in the Project tool window and create a Git repo with all of them at once.

DataFrame statistics and distribution histograms

Review essential statistics directly within DataFrame headers in both Jupyter notebooks and Python scripts. Gain instant insights into how your data is distributed via the histograms provided in the DataFrame header.

IPython config file in the console

Save time by configuring your IPython console automatically using config files. Eliminate the need to import dependencies manually every time you use the console.

And that’s not all! Please visit our What’s New page to discover other improvements in PyCharm 2024.1.1. You can also check out our full release notes for all the details to ensure you don’t miss out on trying any of the enhancements.

Thank you for your continued support as we strive to improve your PyCharm experience. Please report any bugs through our issue tracker so we can take care of them as soon as possible. Connect with us on X (formerly Twitter) to share your valuable feedback on PyCharm 2024.1.1!

Mike Driscoll: How to Watermark a Graph with Matplotlib

Tue, 2024-04-30 11:01

Matplotlib is one of the most popular data visualization packages for the Python programming language. It allows you to create many different charts and graphs. This tutorial focuses on adding a “watermark” to your graph. If you need to learn the basics, you might want to check out Matplotlib—An Intro to Creating Graphs with Python.

Let’s get started!

Installing Matplotlib

If you don’t have Matplotlib on your computer, you must install it. Fortunately, you can use pip, the Python package manager utility that comes with Python.

Open up your terminal or command prompt and run the following command:

python -m pip install matplotlib

Pip will now install Matplotlib and any dependencies that Matplotlib needs to work properly. Assuming that Matplotlib installs successfully, you are good to go!

Watermarking Your Graph

Adding a watermark to a graph is a fun way to learn how to use Matplotlib. For this example, you will create a simple bar chart and then add some text. The text will be added at an angle across the graph as a watermark.

Open up your favorite Python IDE or text editor and create a new Python file. Then add the following code:

import matplotlib.pyplot as plt


def bar_chart(numbers, labels, pos):
    fig = plt.figure(figsize=(5, 8))
    plt.bar(pos, numbers, color="red")
    # add a watermark
    fig.text(1, 0.15, "Mouse vs Python",
             fontsize=45, color="blue",
             ha="right", va="bottom", alpha=0.4,
             rotation=25)
    plt.xticks(ticks=pos, labels=labels)
    plt.show()


if __name__ == "__main__":
    numbers = [2, 1, 4, 6]
    labels = ["Electric", "Solar", "Diesel", "Unleaded"]
    pos = list(range(4))
    bar_chart(numbers, labels, pos)

Your bar_chart() function takes in some numbers, labels and a list of positions for where the bars should be placed. You then create a figure to put your plot into. Then you create the bar chart using the list of bar positions and the numbers. You also tell the chart that you want the bars to be colored “red”.

The next step is to add a watermark. To do that, you call fig.text() which lets you add text on top of your plot. Here is a quick listing of the arguments that you need to pass in:

x, y – The x/y coordinates for the text (the first two arguments)
fontsize – The size of the font
color – The color of the text
ha – Horizontal alignment
va – Vertical alignment
alpha – How transparent the text should be
rotation – How many degrees to rotate the text

The last bit of code in bar_chart() adds the ticks and labels to the bottom of the plot.

When you run this code, you will see something like this:

Isn’t that neat? You now have a simple plot, and you know how to add semi-transparent text to it, too!

Wrapping Up

Proper attribution is important in academics and business. Knowing how to add a watermark to your data visualization can help you do that. You now have that knowledge when using Matplotlib.

The Matplotlib package can create many other types of plots and provides much more customization than what is covered here. Check out its documentation to learn more!

The post How to Watermark a Graph with Matplotlib appeared first on Mouse Vs Python.

Real Python: Working With Global Variables in Python Functions

Tue, 2024-04-30 10:00

A global variable is a variable that you can use from any part of a program, including within functions. Using global variables inside your Python functions can be tricky. You’ll need to differentiate between accessing and changing the values of the target global variable if you want your code to work correctly.

Global variables can play a fundamental role in many software projects because they enable data sharing across an entire program. However, you should use them judiciously to avoid issues.

In this video course, you’ll:

Understand global variables and how they work in Python
Access global variables within your Python functions directly
Modify and create global variables within functions using the global keyword
Access, create, and modify global variables within your functions with the globals() function
Explore strategies to avoid using global variables in Python code
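
The difference between accessing and changing a global variable can be sketched in a few lines. The names counter, limit, and the three functions below are hypothetical and aren't taken from the video course.

```python
counter = 0  # hypothetical module-level (global) variable


def read_counter():
    # Reading a global needs no special syntax.
    return counter


def increment():
    # Rebinding a global requires the global statement; without it,
    # the augmented assignment would raise UnboundLocalError.
    global counter
    counter += 1


def set_with_globals(name, value):
    # globals() returns the module's namespace as a dictionary,
    # so assigning to a key creates or updates a global variable.
    globals()[name] = value


increment()
increment()
print(read_counter())  # 2

set_with_globals("limit", 10)
print(limit)           # 10
```

Passing values in as arguments and returning results is usually easier to test than either approach, which is the kind of strategy the last bullet refers to.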

Robin Wilson: What’s the largest building in Southampton? Find out with 5 lines of code

Tue, 2024-04-30 07:30

Recently I became suddenly curious about the sizes of buildings in Southampton, UK, where I live. I don’t know what triggered this sudden curiosity, but I wanted to know what the largest buildings in Southampton are. In this context, I’m using “largest” to mean largest in terms of land area covered – i.e. the area of the outline when viewed in plan view. Luckily I know about useful sources of geographic data, and also know how to use GeoPandas, so I could answer this question pretty quickly – in fact, in only five lines of code. I’ll take you through this code below, as an example of a very simple GIS analysis.

First I needed to get hold of the data. I know Ordnance Survey release data on buildings in Great Britain, but to make this even easier we can go to Alastair Rae’s website where he has split the OS data up into Local Authority areas. We need to download the buildings data for Southampton, so we go here and download a GeoPackage file.

Then we need to create a Python environment to do the analysis in. You can do this in various ways – with virtualenvs, conda environments or whatever – but you just need to ensure that Jupyter and GeoPandas are installed. Then create a new notebook and you’re ready to start coding.

First, we import geopandas:

import geopandas as gpd

and then load the buildings data:

buildings = gpd.read_file("Southampton_buildings.gpkg")

The buildings GeoDataFrame has a load of polygon geometries, one for each building. We can calculate the area of a polygon with the .area property – so to create a new ‘area’ column in the GeoDataFrame we can run:

buildings['area'] = buildings.geometry.area

I’m only interested in the largest buildings, so we can now sort by this new area column, and take the first twenty entries:

top20 = buildings.sort_values('area', ascending=False).head(20)
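
To make the area and sorting steps concrete without the Southampton data, here's a pure-Python sketch of the same idea. It assumes simple polygons given as lists of (x, y) vertices in metres and uses the shoelace formula as a stand-in. GeoPandas itself delegates geometry operations to Shapely, and the building names and coordinates below are made up.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Made-up building outlines, coordinates in metres.
buildings = {
    "warehouse": [(0, 0), (60, 0), (60, 40), (0, 40)],
    "shop": [(0, 0), (10, 0), (10, 8), (0, 8)],
    "hall": [(0, 0), (30, 0), (30, 20), (0, 20)],
}

areas = {name: polygon_area(outline) for name, outline in buildings.items()}

# Similar in spirit to sort_values(..., ascending=False).head(2).
top2 = sorted(areas.items(), key=lambda item: item[1], reverse=True)[:2]
print(top2)  # [('warehouse', 2400.0), ('hall', 600.0)]
```

In GeoPandas, the values returned by .area are in the units of the layer's coordinate reference system, so they're only square metres when the data uses a projected, metre-based CRS.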

We can then use the lovely explore function to show these buildings on a map. This will load an interactive map in the Jupyter notebook:

top20.explore()

If you’d like to save the interactive map to a standalone HTML file then you can do this instead:

top20.explore().save("map.html")

I’ve done that, and uploaded that HTML file to my website – and you can view it here.

So, putting all the code together, we have:

import geopandas as gpd
buildings = gpd.read_file("Southampton_buildings.gpkg")
buildings['area'] = buildings.geometry.area
top20 = buildings.sort_values('area', ascending=False).head(20)
top20.explore()

Five lines of code, with a simple analysis, resulting in an interactive map, and all with the power of GeoPandas.

Hopefully in a future post I’ll do a bit more work on this data – I’d like to make a prettier map, and I’d like to try and find some way to test my friends and see if they can work out what buildings they are.

Python Bytes: #381 Python Packages in the Oven

Tue, 2024-04-30 04:00

Topics covered in this episode:

Announcing py2wasm: A Python to Wasm compiler (https://wasmer.io/posts/py2wasm-a-python-to-wasm-compiler)
Exploring Python packages with Oven (https://oven.fming.dev) and PyPI Browser (https://pypi-browser.org)
PyCharm Local LLM (https://www.youtube.com/watch?v=DLBiJ5kYUFg)
Google shedding Python devs (at least in the US)
Extras
Joke

Watch on YouTube: https://www.youtube.com/watch?v=KlLuQ7UT4t8

About the show

Sponsored by ScoutAPM: https://pythonbytes.fm/scout

Connect with the hosts

Michael: @mkennedy@fosstodon.org
Brian: @brianokken@fosstodon.org
Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live (https://pythonbytes.fm/stream/live) to be part of the audience. Usually Tuesdays at 11am PT. Older video versions are available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list (https://pythonbytes.fm/friends-of-the-show). We'll never share it.

Michael #1: Announcing py2wasm: A Python to Wasm compiler

py2wasm converts your Python programs to WebAssembly, running them at 3x faster speeds
Thanks to Nuitka (https://nuitka.net/)

Brian #2: Exploring Python packages with Oven and PyPI Browser

pypi.org is great, but there are some handy alternatives.

Oven (https://oven.fming.dev)

Shows how to install stuff with pip, pdm, rye, and poetry
Similar meta and description as PyPI
Includes README.md view (no tables yet, though)
Nice listing of versions
Ability to look at what files are in wheels and tarballs (very cool)
Can deploy yourself. Node/Remix app.
Really slick.

PyPI Browser (https://pypi-browser.org)

View versions
View wheel and tarball contents
Metadata and contents
No README view
Is a Starlette app that you can deploy on your own with a private registry. So that’s cool.

Michael #3: PyCharm Local LLM

Pretty awesome full line completer based on a local LLM for PyCharm
Requires PyCharm Professional
An example, given this partial function in Flask:

@blueprint.get('/listing')
def listing():
    videos = video_service.all_videos()

Typing ret autocompletes to:

    return flask.render_template('home/listing.html', videos=videos)

Which is pretty miraculous, and correct.

Brian #4: Google shedding Python devs (at least in the US)

Google lays off staff from Flutter, Dart and Python teams weeks before its developer conference (TechCrunch, https://techcrunch.com/2024/04/29/google-lays-off-staff-from-flutter-dart-python-weeks-before-its-developer-conference/)
Python, Flutter teams latest on the Google chopping block (The Register, https://www.theregister.com/2024/04/29/google_python_flutter_layoffs/)
“Despite Alphabet last week reporting (https://www.theregister.com/2024/04/26/register_kettle_ai/) a 57 percent year-on-year jump in net profit to $23.66 billion for calendar Q1, more roles are being expunged as the mega-corp cracks down on costs.”
“As for the Python team, the current positions have reportedly (https://social.coop/@Yhg1s/112332127058328855) been "reduced" in favor of a new team based in Munich.”
MK: Related and timely: How one power-hungry leader destroyed Google search (https://www.wheresyoured.at/the-men-who-killed-google/)

Extras

Brian:

Python Gotcha: strip, lstrip, rstrip can remove more than expected (https://andrewwegner.com/python-gotcha-strip-functions-unexpected-behavior.html)
Reminder: You probably want .removesuffix() and .removeprefix()

Michael:

Using Llama3 (https://lmstudio.ai/blog/llama-3) in LMStudio (https://lmstudio.ai)

Joke: Broken System (https://devhumor.com/media/broken-system)
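
The strip gotcha mentioned in the extras can be shown in a few lines. This is a small sketch with made-up strings, not code from the linked post, and .removesuffix() and .removeprefix() need Python 3.9 or later.

```python
filename = "notes.txt.txt"

# rstrip() treats its argument as a set of characters to remove,
# not as a suffix, so it keeps stripping while characters match.
print("test.txt".rstrip(".txt"))                 # tes
print(filename.removesuffix(".txt"))             # notes.txt

# The same applies on the left-hand side.
print("www.wwwidgets.com".lstrip("w."))          # idgets.com
print("www.wwwidgets.com".removeprefix("www."))  # wwwidgets.com
```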

Zero to Mastery: Python Monthly Newsletter 💻🐍

Tue, 2024-04-30 03:42

53rd issue of Andrei Neagoie's must-read monthly Python Newsletter: Whitehouse Recommends Python, Memory Footprint, Let's Talk About Devin, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.

PyCon: Meet PyCon US Keynote Speakers

Mon, 2024-04-29 10:45

We can’t wait to welcome Jay Miller, Kate Chapman, Simon Willison, and Sumana Harihareswara to our stage as PyCon US keynote speakers this year.

We asked each of our keynote speakers:

What excites them about the Python community?
What they’re looking forward to doing at PyCon US?
What can we expect from their keynote speech?
And any advice they’d like to share with the Python community.

Check out our interviews with each of our keynote speakers below!

Jay Miller

"I fully believe I owe my entire tech career to the Python community. I met so many amazing people that I would become long lasting friends with."

Kate Chapman

"I'm excited to connect with people who are passionate about free and open source software and to learn about technologies that I haven't spent much time with."

Simon Willison

"Take advantage of the fact that so many people from the worldwide Python community are in the same place at the same time for just a couple of days. Everybody here wants to talk to you. And you should assume that anyone who you think is interesting will find you interesting as well and will want to hear from you."

Sumana Harihareswara

"I do stand-up comedy and theater. I take it seriously, my responsibility to educate and entertain if you're sitting in front of me for 40 minutes."

Don’t miss out on meeting our keynote speakers in person! Register now for PyCon US before we sell out. As a note, some of our tutorials are already sold out, as well as our hotel room blocks. There are only a few short weeks left before the conference, so don’t wait, register today!

Stay in the Loop

The PyCon US website has all the information you need to know about attending our conference. In order to catch all the latest news, be sure to:

Subscribe to the PyCon US Blog
Follow @PyCon and @thePSF on Twitter/X
Follow @pycon@fosstodon.org and @ThePSF@fosstodon.org on Mastodon
Follow the PSF on LinkedIn
And subscribe to PyCon US 2024 News!

Engage with our community on social media by using our official hashtag: #PyConUS.

Thank you for supporting the Python community. We can’t wait to meet you all in Pittsburgh in a few short weeks!

Real Python: Python's unittest: Writing Unit Tests for Your Code

Mon, 2024-04-29 10:00

The Python standard library ships with a testing framework named unittest, which you can use to write automated tests for your code. The unittest package has an object-oriented approach where test cases derive from a base class, which has several useful methods.

The framework supports many features that will help you write consistent unit tests for your code. These features include test cases, fixtures, test suites, and test discovery capabilities.

In this tutorial, you’ll learn how to:

Write unittest tests with the TestCase class
Explore the assert methods that TestCase provides
Use unittest from the command line
Group test cases using the TestSuite class
Create fixtures to handle setup and teardown logic

To get the most out of this tutorial, you should be familiar with some important Python concepts, such as object-oriented programming, inheritance, and assertions. Having a good understanding of code testing is a plus.

Testing Your Python Code

Code testing or software testing is a fundamental part of a modern software development cycle. Through code testing, you can verify that a given software project works as expected and fulfills its requirements. Testing enforces code quality and robustness.

You’ll do code testing during the development stage of an application or project. You’ll write tests that isolate sections of your code and verify their correctness. A well-written battery or suite of tests can also serve as documentation for the project at hand.

You’ll find several different concepts and techniques around testing. Most of them surpass the scope of this tutorial. However, unit test is an important and relevant concept. A unit test is a test that operates on an individual unit of software. A unit test aims to validate that the tested unit works as designed.

A unit is often a small part of a program that takes a few inputs and produces an output. Functions, methods, and other callables are good examples of units that you’d need to test.

In Python, there are several tools to help you write, organize, run, and automate your unit tests. In the Python standard library, you’ll find two of these tools:

doctest
unittest

Python’s doctest module is a lightweight testing framework that provides quick and straightforward test automation. It can read the test cases from your project’s documentation and your code’s docstrings. This framework is shipped with the Python interpreter as part of the batteries-included philosophy.
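As a rough illustration of how doctest picks up test cases from docstrings, here is a minimal sketch. The function add() and the helper run_docstring_examples() are hypothetical names chosen for this example and are not taken from the tutorial.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b


def run_docstring_examples():
    """Run the examples found in add()'s docstring.

    Returns a (failed, attempted) pair of counts.
    """
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Collect the interactive examples from the docstring and execute them.
    for test in finder.find(add, name="add", globs={"add": add}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

In everyday use you would more likely run python -m doctest on the file that contains the function; the helper above only makes the mechanism explicit.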

Note: To dive deeper into doctest, check out the Python’s doctest: Document and Test Your Code at Once tutorial.

The unittest package is also a testing framework. However, it provides a more complete solution than doctest. In the following sections, you’ll learn and work with unittest to create suitable unit tests for your Python code.

Getting to Know Python’s unittest

The unittest package provides a unit test framework inspired by JUnit, which is a unit test framework for the Java language. The unittest framework is directly available in the standard library, so you don’t have to install anything to use this tool.

The framework uses an object-oriented approach and supports some essential concepts that facilitate test creation, organization, preparation, and automation:

Test case: An individual unit of testing. It examines the output for a given input set.
Test suite: A collection of test cases, test suites, or both. They’re grouped and executed as a whole.
Test fixture: A group of actions required to set up an environment for testing. It also includes the teardown processes after the tests run.
Test runner: A component that handles the execution of tests and communicates the results to the user.
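To make these four concepts concrete, here is a minimal sketch that uses each of them once. The names TestStack, build_suite(), and run_suite() are hypothetical and only serve this illustration.

```python
import io
import unittest


class TestStack(unittest.TestCase):
    """Test case: each test_* method checks one behavior of a list used as a stack."""

    def setUp(self):
        # Test fixture: runs before every test method.
        self.stack = [1, 2, 3]

    def tearDown(self):
        # Test fixture: runs after every test method.
        self.stack.clear()

    def test_pop_returns_last_item(self):
        self.assertEqual(self.stack.pop(), 3)

    def test_append_grows_stack(self):
        self.stack.append(4)
        self.assertEqual(len(self.stack), 4)


def build_suite():
    # Test suite: a collection of test cases that are executed as a whole.
    suite = unittest.TestSuite()
    suite.addTest(TestStack("test_pop_returns_last_item"))
    suite.addTest(TestStack("test_append_grows_stack"))
    return suite


def run_suite():
    # Test runner: executes the suite and collects the results.
    # The report is written to an in-memory stream instead of the terminal.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    return runner.run(build_suite())
```

Calling run_suite() returns a TestResult object whose testsRun and wasSuccessful() members summarize the run.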

In the following sections, you’ll dive into using the unittest package to create test cases, suites of tests, fixtures, and, of course, run your tests.

Organizing Your Tests With the TestCase Class

The unittest package defines the TestCase class, which is primarily designed for writing unit tests. To start writing your test cases, you just need to import the class and subclass it. Then, you’ll add methods whose names should begin with test. These methods will test a given unit of code using different inputs and check for the expected results.

Here’s a quick test case that tests the built-in abs() function:
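The listing itself isn't included in this excerpt. A minimal sketch of what such a test case might look like follows; the class and method names are hypothetical rather than the tutorial's own.

```python
import unittest


class TestAbsFunction(unittest.TestCase):
    """Checks abs() with positive, negative, and zero inputs."""

    def test_positive_number(self):
        self.assertEqual(abs(10), 10)

    def test_negative_number(self):
        self.assertEqual(abs(-10), 10)

    def test_zero(self):
        self.assertEqual(abs(0), 0)


def run_abs_tests():
    """Load and run the tests programmatically, returning the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAbsFunction)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If the class were saved in a file such as test_abs.py (an assumed file name), running python -m unittest from the same directory should discover and run it as well.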

Read the full article at https://realpython.com/python-unittest/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Nikola: Nikola v8.3.1 is out!

Mon, 2024-04-29 08:11

On behalf of the Nikola team, I am pleased to announce the immediate availability of Nikola v8.3.1. This release fixes some small bugs, including some introduced by the new Nikola Plugin Manager.

The minimum Python version supported is now 3.8, and we have adopted a formal policy to define the Python versions supported by Nikola.

What is Nikola?

Nikola is a static site and blog generator, written in Python. It can use Mako and Jinja2 templates, and input in many popular markup formats, such as reStructuredText and Markdown — and can even turn Jupyter Notebooks into blog posts! It also supports image galleries, and is multilingual. Nikola is flexible, and page builds are extremely fast, courtesy of doit (which is rebuilding only what has been changed).

Find out more at the website: https://getnikola.com/

Downloads

Install using pip install Nikola.

Changes

Features

Support passing --poll to nikola auto to better deal with symlink farms.

Bugfixes

Remove insecure HTTP fallback from nikola plugin
Fix the nikola plugin command not working (Issue #3736, #3737)
Fix nikola new_post --available-formats crashing with TypeError (Issue #3750)
Fix the new plugin manager not loading plugins if the plugin folder is a symlink (Issue #3741)
Fix the nikola plugin command not working (Issue #3736)
Remove no longer used leftovers of annotations support (Issue #3764)

Other

Nikola now requires Python 3.8 or newer.
Nikola has adopted a policy for Python version support, promising support for versions supported by the Python core team, Ubuntu LTS, or Debian stable, and taking into consideration Debian oldstable and PyPy.
Remove polyfill from polyfill.io.

Categories: FLOSS Project Planets

Zato Blog: API Testing in Pure English

Mon, 2024-04-29 03:43

2024-04-29, by Dariusz Suchojad

How to test APIs in pure English

Do you have 20 minutes to learn how to test APIs in pure English, without any programming needed?

Great, the API testing tutorial is here.

Right after you complete it, you'll be able to write API tests like the one below.

Next steps:

Read about how to build APIs that your tests will cover

Django Weblog: Welcome our new OPS member - Baptiste Mispelon

Sun, 2024-04-28 08:45

The DSF Board are pleased to introduce Baptiste Mispelon as a new member of the ops team. Baptiste will join the team that maintains Django’s infrastructure.

Baptiste speaking at Django Under the Hood 2015

Baptiste (IPA pronunciation /ba.tist/) is a long-time Django contributor, having been a member of the community for over a decade now.

He was an initial board member of the Django Girls Foundation, co-created the Django Under the Hood series of conferences, and was chair of the DjangoCon Europe 2016 edition. More recently, he's taken up the maintenance of Django's venerable ticket tracker: code.djangoproject.com.

He currently lives in the Norwegian countryside where he works part time as a Django developer while studying for a degree in linguistics at the local university.

You can learn more about Baptiste on his website.

I’d also like to take this opportunity to thank the OPS team, on behalf of the DSF Board, for their maintenance efforts throughout their time of service.

Please join me in welcoming Baptiste to the OPS team!

Categories: FLOSS Project Planets

ListenData: Run SAS in Python without Installation

Sun, 2024-04-28 05:50

Introduction

In the past few years, Python has gained huge popularity as a programming language in the data science world. Many banks and pharma organisations have started using Python, and some of them are in a transition stage, migrating their SAS syntax libraries to Python.

Many big organisations have been using SAS since the early 2000s and have developed hundreds of SAS programs for tasks ranging from data extraction to model building and validation. Hence it's a marathon task to migrate SAS code to any other programming language. Migration can only be done in phases so that day-to-day tasks are not disrupted by the development and testing of Python code. Since Python is open source, maintaining existing code can sometimes be difficult. Some SAS procedures are very robust and powerful, and their alternatives in Python are still not implemented; they might be doable, but not in a straightforward way for the average developer or analyst.

Do you wish to run both SAS and Python programs in the same environment (IDE)? If yes, you are not the only one. Many analysts have wanted the same. It is now possible via a Python package called saspy, developed by SAS. It lets you transfer data between a pandas DataFrame and a SAS dataset. Imagine a situation where you have data in a pandas DataFrame and you wish to run a SAS statistical procedure on it without switching between the SAS and Python environments.
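The round trip described above can be sketched roughly as follows. This is only an illustration under assumptions: the configuration name "oda", the table name "mydata", and the helper run_proc_means() are hypothetical, and the function needs a working saspy configuration and SAS credentials before it can actually be called.

```python
def run_proc_means(df, cfgname="oda", table="mydata"):
    """Send a pandas DataFrame to SAS, run PROC MEANS on it, and read it back.

    Assumes a saspy configuration named `cfgname` exists (hypothetical here).
    Nothing is executed against SAS until this function is called.
    """
    import saspy  # imported lazily so the sketch can be loaded without saspy

    sas = saspy.SASsession(cfgname=cfgname)
    try:
        # pandas DataFrame -> SAS dataset in the WORK library
        sas.df2sd(df, table=table)
        # Run a SAS procedure; submit() returns a dict with 'LOG' and 'LST' keys
        result = sas.submit(f"proc means data=work.{table}; run;")
        # SAS dataset -> pandas DataFrame
        round_trip = sas.sd2df(table=table)
        return result["LST"], round_trip
    finally:
        # Always close the SAS session, even if a step above fails.
        sas.endsas()
```

The function returns the procedure's listing output together with the DataFrame read back from SAS, so both directions of the transfer are visible in one call.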

Access to SAS Software for Free

First and foremost, you need access to SAS, either via the cloud or via a server/desktop version of the software.

If you don't have SAS software, you don't need to worry. You can get it for free without installation via SAS OnDemand for Academics. It is available for free to everyone (not restricted to students or academics). It includes access to all the commonly used SAS modules like SAS STAT, SAS ETS, SAS SQL etc. You just need to register once, and it does not take more than 5 minutes.

The saspy Python package has the following dependencies:

Python 3.4 or higher
SAS 9.4 or higher

Steps to access SAS in Python (Jupyter)

Please follow the steps below to make SAS run in Jupyter Notebook.

Step 1 : Install Package

To install the saspy package, you can run the following command in a Jupyter Notebook cell.

!pip install saspy

To read this article in full, please click here. This post appeared first on ListenData.

Categories: FLOSS Project Planets

Jeremy Epstein: On FastAPI

Sat, 2024-04-27 20:00

Over the past year or two, I've been heavily using FastAPI in my day job. I've been around the Python web framework block, and I gotta say, FastAPI really succeeds in its mission of building on the strengths of its predecessors (particularly Django and Flask), while feeling more modern and adhering to certain opinionated principles. In my opinion, it's pretty much exactly what the best-in-breed of the next logical generation of web frameworks should look like.

¡Ándale, ándale, arriba!
Image source: The Guardian

Let me start by lauding FastAPI's excellent documentation. Having a track record of rock-solid documentation was (and still is!), in my opinion, Django's most impressive achievement, and I'm pleased to see that it's also becoming Django's most enduring legacy. FastAPI, like Django, includes docs changes together with code changes in a single pull request (as they're called these days); it clearly documents that certain features are deprecated; and its docs often go beyond what is strictly required, by including end-to-end instructions for integrating with various third-party tools and services.

FastAPI's docs raise the bar further still, with more than a dash of humour in many sections, and with a frequent sprinkling of emojis as standard fare. That latter convention I have some reservations about: call me old-fashioned, but you could say that emoji-filled docs are unprofessional and a distraction. However, they seem to enhance rather than detract from overall quality; and, you know what, they put a non-emoji real-life smile on my face. So, they get my tick of approval.

FastAPI more-or-less sits in the Flask camp of being a "microframework", in that it doesn't include an ORM, a template engine, or various other things that Django has always advertised as being part of its "batteries included" philosophy. But, on the other hand, it's more in the Django camp of being highly opinionated, and of consciously including things with which it wants a hassle-free experience. Most notably, it includes Swagger UI and Redoc out-of-the-box. I personally had quite a painful experience generating Swagger docs in Flask, back in the day; and I've been tremendously pleased with how API doc generation Just Works™ in FastAPI.

Much like with Flask, being a microframework means that FastAPI very much stands on the shoulders of giants. Just as Flask is a thin wrapper on top of Werkzeug, with the latter providing all things WSGI; so too is FastAPI a thin wrapper on top of Starlette, with the latter providing all things ASGI. FastAPI also heavily depends on Pydantic for data schemas / validation, for strongly-typed superpowers, for settings handling, and for all things JSON. I think it's fair to say that Pydantic is FastAPI's secret sauce.

My use of FastAPI so far has been rather unusual, in that I've been building apps that primarily talk to an Oracle database (and, indeed, this is unusual for Python dev more generally). I started out by depending on the (now-deprecated) cx_Oracle library, and I've recently switched to its successor python-oracledb. I was pleased to see that the fine folks at Oracle recently released full async support for python-oracledb, which I'm now taking full advantage of in the context of FastAPI. I wrote a little library called fastapi-oracle which I'm using as a bit of glue code, and I hope it's of use to anyone else out there who needs to marry those two particular bits of tech together.

There has been a not-insignificant amount of chit-chat on the interwebz lately, voicing concern that FastAPI is a one-man show (with its BDFL @tiangolo showing no intention of that changing anytime soon), and that the FastAPI issue and pull request queues receive insufficient TLC. Based on my experience so far, I'm not too concerned about this. It is, generally speaking, not ideal if a project has a bus factor of 1, and if support requests and bug fixes are left to rot.

However, in my opinion, the code and the documentation of FastAPI are both high-quality and highly-consistent, and I appreciate that this is largely thanks to @tiangolo continuing to personally oversee every small change, and that loosening the reins would mean a high risk of that deteriorating. And, speaking of quality, I personally have yet to uncover any bugs either in FastAPI or its core dependencies (which I'm pleasantly surprised by, considering how heavily I've been using it) – it would appear that the items languishing in the queue are lower priority, and it would appear that @tiangolo is on top of critical bugs as they arise.

In summary, I'm enjoying coding with FastAPI, I feel like it's a great fit for building Python web apps in 2024, and it will continue to be my Python framework of choice for the foreseeable future.

Categories: FLOSS Project Planets

Trey Hunner: 10 years of Python conferences

Sat, 2024-04-27 14:45

10 years and 10 days ago I flew home from my very first Python conference.

I left a few days into the PyCon US 2014 sprints and I remember feeling a bit like summer camp was ending. I’d played board games, contributed to an open source project, seen tons of talks, and met a ton of people.

My first Python conference: PyCon US 2014

PyCon 2014 was the first Python conference I attended.

At the start of the conference I only knew a handful of San Diegans. I left having met many more folks. Some of the folks I met I knew from online forums, GitHub repos, or videos: I met Kenneth Love, Baptiste Mispelon, Carl Meyer, and Eric Holscher in person, among many others. Most folks I met I had never encountered online, but I was glad to have met them in person.

For the most part, I had no idea who anyone was, what they did with Python, or what they might be interested in talking about. I also had no idea what most of the various non-talk activities were. I found out about the Education Summit and hadn’t realized that it required pre-registration. The open spaces are one of my favorite parts of PyCon and I didn’t even know they existed until PyCon 2015.

I did stay for a couple days of the sprints and I was grateful for that. Most of the memorable human connections I had were during the sprints. I helped PyVideo upgrade their code base from Python 2 to Python 3 (this was before Will and Sheila stepped down as maintainers). Will guided me through the code base and seemed grateful for the help.

I also got the idea to write front-end JavaScript tests for Django during the sprints and eventually started that process after PyCon thanks to Carl Meyer’s guidance.

Attending regional conferences and DjangoCon

In fall 2014, I attended Django BarCamp at the Eventbrite office. That was my first exposure to the idea of an “unconference”… which I kept in mind when I spotted the open spaces board at PyCon 2015.

Before coming back to Montreal for PyCon 2015, I emailed Harry Percival to ask if he could use a teaching assistant during his tutorial on writing tests. His reply was much more enthusiastic than I expected: “YES YES OH GOD YES THANK YOU THANK YOU THANK YOU TREY”. I was very honored to be able to help Harry, as my testing workflow was heavily inspired by many blog posts he’d written about testing best practices in Django.

I coached at my first Django Girls event in 2015 in Ensenada and then my second at DjangoCon 2015 in Austin. I gave my first lightning talk at DjangoCon 2015, comparing modern JavaScript to Python. It was a lightning talk I had given at the San Diego JavaScript and San Diego Python meetups.

In 2016, I attended PyTennessee in Nashville. I remember attending a dinner of about a dozen folks who spoke at the conference. I was grateful to get to chat with so many folks whose talks I’d attended.

Presenting talks and tutorials

I presented my first conference tutorial at PyCon 2016 in Portland and my first talk at DjangoCon US 2016 in Philadelphia. I had been presenting lightning talks every few months at my local Python and JavaScript meetups for a few years by then and I had hosted free workshops at my local meetup and paid workshops for training clients.

Having presented locally helped, but presenting on a big stage is always scary.

Volunteering

I volunteered at some of my first few conferences and found that I really enjoyed it. I especially enjoyed running the registration desk, as you’re often the first helpful face that people see coming into the conference.

During PyCon 2016, 2017, and 2018, I co-chaired the open spaces thanks to Anna Ossowski inviting me to help. I had first attended open spaces during PyCon 2015 and I loved them. Talks are great, but so are discussions!

I also ran for the PSF board of directors in 2016 and ended up serving on the board for a few years before stepping down. After my board terms, I volunteered for the PSF Code of Conduct working group for about 6 years. I didn’t even know what the PSF was until PyCon 2015!

A lot of travel… maybe too much

After DjangoCon 2016, I went a bit conference-wild. I attended PyTennessee 2017, PyCaribbean 2017 in Puerto Rico, PyCon US 2017 in Portland, PyCon Australia 2017 in Melbourne, DjangoCon 2017 in Spokane, PyGotham 2017 in NYC, and North Bay Python 2017 in Petaluma.

In 2018 I sponsored PyTennessee and PyOhio and spoke at both. I passed out chocolate chip cookies at PyTennessee as a way to announce the launch of Python Morsels. I also attended PyCon 2018 in Cleveland, DjangoCon 2018 in San Diego, PyGotham 2018, and North Bay Python 2018.

I slowed down a bit in 2019, with just PyCascades (Seattle), PyCon US (Cleveland), PyCon Australia (Sydney), and DjangoCon US (San Diego, which is home for me).

Since the pandemic

Since the start of the pandemic, I’ve attended PyCon US 2022, DjangoCon 2022 in San Diego (in my city for the third time!) and PyCon US 2023. Traveling is more challenging for me than it used to be, but I hope to attend more regional conferences again soon.

Between client work, I’ve been focusing less on conferences and more on blog posts (over here), screencasts, my weekly Python tips emails, and (of course) on Python Morsels.

My journey started locally

I became part of the Python community before I knew I was part of it.

I started using Python professionally in December 2009 and I attended my first San Diego Python meetup in March 2012. I met the organizers, gave some lightning talks, attended Saturday study group sessions (thanks Carol Willing, Alain Domissy, and others for running these), and volunteered to help organize meetups, study groups, and workshops.

By 2014, I had learned from folks online and in-person and I had helped out at my local Python meetup. I had even made a few contributions to some small Django packages I relied on heavily.

I was encouraged to attend PyCon 2014 by others who were attending (thanks Carol, Micah, and Paul among others). The conference was well-worth the occasional feeling of overwhelm.

We’re all just people

The biggest thing I’ve repeatedly learned over the past decade of Python conferences is that we’re all just people.

Carol Willing keynoted PyCon US 2023. But I met Carol as a kind Python user in San Diego who started the first Python study group meetings in Pangea Bakery on Convoy Street.

Jay Miller will be keynoting PyCon US 2024. But I met Jay as an attendee of the Python study group, who was enthusiastic about both learning and teaching others.

My partner, Melanie Arbor, keynoted DjangoCon 2022 along with Jay Miller. When I met Melanie, she was new to Python and was very eager to both learn and help others.

David Lord has made a huge impact on the maintenance of Flask and other Pallets projects. I met David as a Python study group attendee who was an enthusiastic StackOverflow contributor.

I learned a ton from Brandon Rhodes, Ned Batchelder, Russell Keith-Magee, and many others from online videos, forums, and open source projects before I ever met them. But each of them is also just a Python-loving person like the rest of us. Russell gives good hugs, Ned is an organizer of his local Python meetup, and Brandon wears the same brand of shoes as me.

We all have people we’ve learned from, we suffer from feelings of inadequacy, we get grumpy sometimes, and we care about the Python language and community in big and small ways.

What’s next for you?

Will you attend a local meetup? Or will you attend an online social event?

If so, consider asking the organizers if you can present a 5 minute lightning talk at a future event. As I noted in a DjangoCon 2016 lightning talk, lightning talks are a great way to connect with folks.

Will you attend a Python conference one day? See having a great first PyCon when/if you do.

Remember that we’re all just people though. Some may have a bit more experience (whether at speaking, contributing to open source, or something else), but we’re just people.

Categories: FLOSS Project Planets

Talk Python to Me: #459: I Built A Python SaaS with AI

Sat, 2024-04-27 04:00

We all know that tools like ChatGPT have really empowered developers to tackle bigger problems. Are you using TailwindCSS and need a login page? Try asking Chat "What is the HTML for a login page with the login username, password, and button in its own section in the center of the page?" It will literally give you a first pass version of it. But how far can you push this? Fred Tubiermont may have taken it farther than most. He built a functioning SaaS product with paying customers by only using ChatGPT and Python. It's fascinating to hear his story.

Episode sponsors

Mailtrap: https://talkpython.fm/mailtrap
Talk Python Courses: https://talkpython.fm/training

Links from the show

Frederick Tubiermont: https://www.linkedin.com/in/fredericktubiermont/
The #1 AI Jingle Generator: https://www.aijinglemaker.com
Fred's YouTube Channel: https://www.youtube.com/@callmefred
AI Coding Club: https://aicodingclub.com
No Code: https://www.saashub.com/best-no-code-software
Prompt Engineering 101 - Crash Course & Tips: https://www.youtube.com/watch?v=aOm75o2Z5-o
gpt-engineer: https://github.com/gpt-engineer-org/gpt-engineer
Instant Deployments, Effortless Scale: https://railway.app
Self-hosting with superpowers.: https://coolify.io
The newsletter platform built for growth.: https://www.beehiiv.com
Watch this episode on YouTube: https://www.youtube.com/watch?v=lbX3B04sS1s
Episode transcripts: https://talkpython.fm/episodes/transcript/459/i-built-a-python-saas-with-ai

Stay in touch with us

Subscribe to us on YouTube: https://talkpython.fm/youtube
Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy

Categories: FLOSS Project Planets

Real Python: Quiz: What Is the pycache Folder in Python?

Fri, 2024-04-26 08:00

As your Python project grows, you typically organize your code in modules and packages for easier maintenance and reusability. When you do that, you’ll likely notice the sudden emergence of a __pycache__ folder alongside your original files, popping up in various locations unexpectedly.


Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #202: Pydantic Data Validation & Python Web Security Practices

Fri, 2024-04-26 08:00

How do you verify and validate the data coming into your Python web application? What tools and security best practices should you consider as a developer? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.


Categories: FLOSS Project Planets

Data School: How to prevent data leakage in pandas & scikit-learn ☔

Thu, 2024-04-25 10:51

Let's pretend you're working on a supervised Machine Learning problem using Python's scikit-learn library. Your training data is in a pandas DataFrame, and you discover missing values in a column that you were planning to use as a feature.

After considering your options, you decide to impute the missing values, which means that you're going to fill in the missing values with reasonable values.

How should you perform the imputation?

Option 1 is to fill in the missing values in pandas, and then pass the transformed data to scikit-learn.
Option 2 is to pass the original data to scikit-learn, and then perform all data transformations (including missing value imputation) within scikit-learn.

Option 1 will cause data leakage, whereas option 2 will prevent data leakage.

Here are questions you might be asking:

What is data leakage?
Why is data leakage problematic?
Why would data leakage result from missing value imputation in pandas?
How can I prevent data leakage when using pandas and scikit-learn?

Answers below! 👇

What is data leakage?

Data leakage occurs when you inadvertently include knowledge from testing data when training a Machine Learning model.

Why is data leakage problematic?

Data leakage is problematic because it will cause your model evaluation scores to be less reliable. This may lead you to make bad decisions when tuning hyperparameters, and it will lead you to overestimate how well your model will perform on new data.

It's hard to know whether data leakage will skew your evaluation scores by a negligible amount or a huge amount, so it's best to just avoid data leakage entirely.

Why would data leakage result from missing value imputation in pandas?

Your model evaluation procedure (such as cross-validation) is supposed to simulate the future, so that you can accurately estimate right now how well your model will perform on new data.

But if you impute missing values on your whole dataset in pandas and then pass your dataset to scikit-learn, your model evaluation procedure will no longer be an accurate simulation of reality. That's because the imputation values will be based on your entire dataset (meaning both the training portion and the testing portion), whereas the imputation values should just be based on the training portion.

In other words, imputation based on the entire dataset is like peeking into the future and then using what you learned from the future during model training, which is definitely not allowed.

How can we avoid this in pandas?

You might think that one way around this problem would be to split your dataset into training and testing sets and then impute missing values using pandas. (Specifically, you would need to learn the imputation value from the training set and then use it to fill in both the training and testing sets.)

That would work if you're only ever planning to use train/test split for model evaluation, but it would not work if you're planning to use cross-validation. That's because during 5-fold cross-validation (for example), the rows contained in the training set will change 5 times, and thus it's quite impractical to avoid data leakage if you use pandas for imputation while using cross-validation!
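The difference between the leaky and the leak-free fill value can be seen in a tiny sketch. It uses plain Python lists instead of pandas to stay self-contained, and the data and helper names are hypothetical.

```python
from statistics import mean

# Hypothetical feature column; None marks a missing value.
values = [1.0, 2.0, None, 4.0, 100.0, None]
train, test = values[:4], values[4:]  # a simple train/test split


def observed_mean(column):
    """Mean of the non-missing entries of a column."""
    return mean(v for v in column if v is not None)


def impute(column, fill_value):
    """Replace missing entries with fill_value."""
    return [fill_value if v is None else v for v in column]


# Leaky: the fill value is computed from all rows, including the test rows.
leaky_fill = observed_mean(values)

# Leak-free: the fill value is learned from the training rows only...
train_fill = observed_mean(train)

# ...and is then applied to both the training and the testing portion.
train_imputed = impute(train, train_fill)
test_imputed = impute(test, train_fill)
```

Here the large test value pulls the leaky fill value far away from what the training rows alone suggest. With cross-validation, the training rows change in every fold, so this bookkeeping would have to be repeated per fold.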

How else can data leakage arise?

So far, I've only mentioned data leakage in the context of missing value imputation. But there are other transformations that if done in pandas on the full dataset will also cause data leakage.

For example, feature scaling in pandas would lead to data leakage, and even one-hot encoding (or "dummy encoding") in pandas would lead to data leakage unless there's a known, fixed set of categories.

More generally, any transformation which incorporates information about other rows when transforming a row will lead to data leakage if done in pandas.

How does scikit-learn prevent data leakage?

Now that you've learned how data transformations in pandas can cause data leakage, I'll briefly mention three ways in which scikit-learn prevents data leakage:

First, scikit-learn transformers have separate fit and transform steps, which allow you to base your data transformations on the training set only, and then apply those transformations to both the training set and the testing set.
Second, the fit and predict methods of a Pipeline encapsulate all calls to fit_transform and transform so that they're called at the appropriate times.
Third, cross_val_score splits the data prior to performing data transformations, which ensures that the transformers only learn from the temporary training sets that are created during cross-validation.
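The three points above can be sketched together in a few lines. This is an illustration on hypothetical toy data, assuming NumPy and scikit-learn are installed; the choice of SimpleImputer and LogisticRegression is an assumption made for the example, not something the post prescribes.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one feature with missing values and a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(int)
X[rng.choice(100, size=10, replace=False), 0] = np.nan

# The imputer is a pipeline step, so it is fit as part of the pipeline
# rather than on the full dataset ahead of time.
pipe = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())

# cross_val_score splits the data first and then fits the whole pipeline
# on each temporary training set, so the imputer never sees the held-out rows.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
```

The resulting array holds one accuracy score per fold. Because the imputation value is recomputed inside every fold, no manual per-fold bookkeeping is needed.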

Conclusion

When working on a Machine Learning problem in Python, I recommend performing all of your data transformations in scikit-learn, rather than performing some of them in pandas and then passing the transformed data to scikit-learn.

Besides helping you to prevent data leakage, this enables you to tune the transformer and model hyperparameters simultaneously, which can lead to a better performing model!

One final note...

This post is an excerpt from my upcoming video course, Master Machine Learning with scikit-learn.

Join the waitlist below to get free lessons from the course and a special launch discount 👇

Categories: FLOSS Project Planets
