Planet Python

Subscribe to Planet Python feed
Planet Python - http://planetpython.org/
Updated: 2 hours 53 min ago

Python Morsels: Prompting a user for input

Sat, 2024-09-21 18:38

We can prompt our users for input with Python's built-in input function.

Table of contents

  1. Prompting for user input
  2. Customizing the prompt text
  3. Prompt users with Python's built-in input function

Prompting for user input

Python has a built-in input function that we can use to prompt a user of our program to enter some text:

>>> color = input("Favorite color:") Favorite color:▯

Our Python REPL is now hanging and waiting for us (the user) to input text. Let's type purple:

>>> color = input("Favorite color:") Favorite color:purple▯

After the user has typed the text they'd like to enter, they can hit the Enter key to lock-in the value.

>>> color = input("Favorite color:") Favorite color:purple >>>

Now the color variable contains the string purple:

>>> color 'purple' Customizing the prompt text

Note that the input function …

Read the full article: https://www.pythonmorsels.com/prompting-for-input/
Categories: FLOSS Project Planets

Erik Marsja: Using Pandas to Read JSON from URL

Sat, 2024-09-21 05:01

The post Using Pandas to Read JSON from URL appeared first on Erik Marsja.

When working with data in Python, using Pandas to read JSON from URL is an excellent tool that lets you directly load JSON data from a web source into a Pandas dataframe. This tutorial will teach you the steps to accomplish this task, building upon our previous discussions on reading JSON with Python more generally.

Table of Contents How to use Pandas to Read JSON from URL

First, let us look at a simple example of using Pandas to read JSON from a URL.

import pandas as pd # URL containing JSON data url = "http://api.open-notify.org/astros.json" # Read JSON data from URL into a DataFrame df = pd.read_json(url) # Display the dataframe print(df)

In the code chunk above, we start by importing the Pandas library. The URL variable contains the web address where the JSON data is hosted. The pd.read_json(url) function is then used to read the JSON data from the URL and load it into a Pandas DataFrame, which is a two-dimensional labeled data structure with columns of potentially different types. Finally, print(df) displays the DataFrame, allowing us to see the imported data in tabular format.


Now that we have seen a basic example, let us learn more about the parameters of the pd.read_json() method to understand how we can customize the reading process.

Parameters of the read_json Method

The pd.read_json() method has several parameters that allow you to fine-tune how the JSON data is read and converted into a dataframe. Here is an overview of the most important parameters:

  • path_or_buf: The string containing the URL or the path to the JSON file. This is the source of the JSON data that will be read.
  • orient: Defines the expected JSON string format. Default is ‘columns’. This parameter specifies the orientation of the JSON data. Other options include ‘split’, ‘records’, ‘index’, and ‘values’.
  • typ: Specifies the type of object to be returned. Default is ‘frame’. This parameter can be set to ‘series’ if you want to return a Series instead of a DataFrame.
  • dtype: Determines whether to infer types of objects. Default is ‘None’. This parameter can be used to specify the data type for each column.
  • convert_axes: Whether to convert the axes to another type. Default is ‘True’. This parameter allows you to convert the axes to a specified data type.
  • convert_dates: List of columns to convert to dates. Default is ‘True’. This parameter can be used to specify which columns should be parsed as dates.
  • keep_default_dates: Whether to include default date parsers. Default is ‘True’. This parameter determines whether to use the default date parsers provided by Pandas.
  • precise_float: Whether to use a high precision floating point converter. Default is ‘False’. This parameter can be set to ‘True’ if you need high precision for float values.
  • date_unit: Unit for encoding datetime. Default is ‘None’. This parameter can be used to specify the time unit for encoding datetime objects.
  • encoding: Specifies the encoding to be used. Default is ‘utf-8’. This parameter determines the encoding for reading the JSON data.
  • lines: Whether to read the JSON file as a JSON object per line. Default is ‘False’. This parameter can be set to ‘True’ if the JSON data is in a line-delimited format.

With these parameters allows you to better control how JSON data is read and processed, enabling you to tailor the DataFrame to your needs.

Summary

To summarize, we have learned how to use Pandas to read JSON data from a URL. We explored a practical example and detailed the parameters of the pd.read_json() method, enhancing our ability to customize the data reading process. Handling nested JSON data can be more challenging, but that will be covered in a future post.
I would appreciate it if you could share this post and leave your comments below. Your feedback is invaluable!

Resources

Here are some other reading data-related tutorials:

The post Using Pandas to Read JSON from URL appeared first on Erik Marsja.

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #221: Thriving as a Developer With ADHD

Fri, 2024-09-20 08:00

What are strategies for being a productive developer with ADHD? How can you help your team members with ADHD to succeed and complete projects? This week on the show, we speak with Chris Ferdinandi about his website and podcast "ADHD For the Win!"

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

PyCharm: What’s New in PyCharm 2024.2.2!

Fri, 2024-09-20 07:33

PyCharm 2024.2.2 is here with many key updates, including Python support improvements, new Django features, and enhancements to the Data View tool window! 

Visit our What’s New page for more details on all these features and to explore many others. You can download the latest version from our download page or update your current version through our free Toolbox App

Download PyCharm 2024.2.2 PyCharm 2024.2.2 highlights Django enhancements  PRO New code completion suggestions

When working with models, PyCharm now offers field completion suggestions in a variety of cases, such as Model.save(update_fields[…]), Model.refresh_from_db(fields=[…]), Model.clean_fields(exclude=[…]), and so on.

Quick-fix to create a method for an unresolved ViewSet

If a ViewSet has an unresolved reference, PyCharm suggests a quick-fix to introduce the missing method. Use Alt + Enter to call it.

What’s new! Data View  PRO

You can now look at n-dimensional NumPy arrays in the Data View tool window. Define the array you would like to inspect, along with a specific dimension or slice, in a special field at the bottom of the tool window, and PyCharm will display a table with the results.

Python support improvements Support for default types for type parameters (PEP 696)

Improve typing with PyCharm’s support for the Python 3.13 ability to define the default types for type parameters. The IDE now incorporates default types for type parameters both for old-style and new-style generic classes, functions, and type aliases, and it takes them into account in type inference.

Pattern matching: Foldable match statements

To improve the readability of code with large pattern-matching statements, you can now use folding for entire match statements or for separate cases inside them. 

Download PyCharm 2024.2.2

Visit our What’s New page to learn about other useful features included in this release, or read the release notes for the full breakdown, including more details on the features mentioned here. 

If you encounter any problems, please report them in our issue tracker so we can address them promptly. 

Connect with us on X (formerly Twitter) to share your thoughts on PyCharm 2024.2.2! 

Categories: FLOSS Project Planets

Talk Python to Me: #477: Awesome Text Tricks with NLP and spaCy

Fri, 2024-09-20 04:00
Do you have text that you want to process automatically? Maybe you want to pull out key products or topics of conversation? Maybe you want to get the sentiment? The possibilities are many with this week's topic: NLP with spaCy and Python. Our guest, Vincent D. Warmerdam, has worked on spaCy and other tools at Explosion AI and he's here to give us his tips and tricks for working with text from Python.<br/> <br/> <strong>Episode sponsors</strong><br/> <br/> <a href='https://talkpython.fm/posit'>Posit</a><br> <a href='https://talkpython.fm/training'>Talk Python Courses</a><br/> <br/> <strong>Links from the show</strong><br/> <br/> <div><b>Course: Getting Started with NLP and spaCy</b>: <a href="https://training.talkpython.fm/courses/getting-started-with-spacy" target="_blank" >talkpython.fm</a><br/> <br/> <b>Vincent on X</b>: <a href="https://x.com/fishnets88?featured_on=talkpython" target="_blank" >@fishnets88</a><br/> <b>Vincent on Mastodon</b>: <a href="https://fosstodon.org/@koaning" target="_blank" >@koaning</a><br/> <br/> <b>Programmable Keyboards on CalmCode</b>: <a href="https://www.youtube.com/playlist?list=PLGj5nRqy15j93TD0iReqfLL9lU1lZFEs6" target="_blank" >youtube.com</a><br/> <b>Sample Space Podcast</b>: <a href="https://www.youtube.com/playlist?list=PLSIzlWDI17bRULf7X_55ab7THqA9TJPxd" target="_blank" >youtube.com</a><br/> <br/> <b>spaCy</b>: <a href="https://spacy.io?featured_on=talkpython" target="_blank" >spacy.io</a><br/> <b>Course: Build An Audio AI App</b>: <a href="https://training.talkpython.fm/courses/build-an-audio-ai-app-with-python-and-assemblyai" target="_blank" >talkpython.fm</a><br/> <b>Lemma example</b>: <a href="https://github.com/talkpython/audio-ai-with-assemblyai-course/blob/d5418ec26b4e8d9e524f2dedfc5c93c6edc817b8/code/02_feature_search/src/services/search_service.py#L80" target="_blank" >github.com</a><br/> <b>Code for spaCy course</b>: <a href="https://github.com/talkpython/nlp-with-python-and-spacy-course" target="_blank" >github.com</a><br/> <br/> <b>Python Bytes transcripts</b>: <a href="https://github.com/mikeckennedy/python_bytes_show_notes?featured_on=talkpython" target="_blank" >github.com</a><br/> <b>scikit-lego</b>: <a href="https://github.com/koaning/scikit-lego?featured_on=talkpython" target="_blank" >github.com</a><br/> <b>Projects that import "this"</b>: <a href="https://calmcode.io/til/python-import-this?featured_on=talkpython" target="_blank" >calmcode.io</a><br/> <b>Watch this episode on YouTube</b>: <a href="https://www.youtube.com/watch?v=AE7FPjaVNN0" target="_blank" >youtube.com</a><br/> <b>Episode transcripts</b>: <a href="https://talkpython.fm/episodes/transcript/477/awesome-text-tricks-with-nlp-and-spacy" target="_blank" >talkpython.fm</a><br/> <br/> <b>--- Stay in touch with us ---</b><br/> <b>Subscribe to us on YouTube</b>: <a href="https://talkpython.fm/youtube" target="_blank" >youtube.com</a><br/> <b>Follow Talk Python on Mastodon</b>: <a href="https://fosstodon.org/web/@talkpython" target="_blank" ><i class="fa-brands fa-mastodon"></i>talkpython</a><br/> <b>Follow Michael on Mastodon</b>: <a href="https://fosstodon.org/web/@mkennedy" target="_blank" ><i class="fa-brands fa-mastodon"></i>mkennedy</a><br/></div>
Categories: FLOSS Project Planets

Wingware: Wing Python IDE Version 10.0.6 - September 20, 2024

Thu, 2024-09-19 21:00

Wing 10.0.6 adds support for Python 3.13 and fixes some issues with AI development, code refactoring, and unit testing with pytest.

See the change log for details.

Download Wing 10 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products


What's New in Wing 10

AI Assisted Development

Wing Pro 10 takes advantage of recent advances in the capabilities of generative AI to provide powerful AI assisted development, including AI code suggestion, AI driven code refactoring, description-driven development, and AI chat. You can ask Wing to use AI to (1) implement missing code at the current input position, (2) refactor, enhance, or extend existing code by describing the changes that you want to make, (3) write new code from a description of its functionality and design, or (4) chat in order to work through understanding and making changes to code.

Examples of requests you can make include:

"Add a docstring to this method" "Create unit tests for class SearchEngine" "Add a phone number field to the Person class" "Clean up this code" "Convert this into a Python generator" "Create an RPC server that exposes all the public methods in class BuildingManager" "Change this method to wait asynchronously for data and return the result with a callback" "Rewrite this threaded code to instead run asynchronously"

Yes, really!

Your role changes to one of directing an intelligent assistant capable of completing a wide range of programming tasks in relatively short periods of time. Instead of typing out code by hand every step of the way, you are essentially directing someone else to work through the details of manageable steps in the software development process.

Read More

Support for Python 3.12, 3.13, and ARM64 Linux

Wing 10 adds support for Python 3.12 and 3.13, including (1) faster debugging with PEP 669 low impact monitoring API, (2) PEP 695 parameterized classes, functions and methods, (3) PEP 695 type statements, and (4) PEP 701 style f-strings.

Wing 10 also adds support for running Wing on ARM64 Linux systems.

Poetry Package Management

Wing Pro 10 adds support for Poetry package management in the New Project dialog and the Packages tool in the Tools menu. Poetry is an easy-to-use cross-platform dependency and package manager for Python, similar to pipenv.

Ruff Code Warnings & Reformatting

Wing Pro 10 adds support for Ruff as an external code checker in the Code Warnings tool, accessed from the Tools menu. Ruff can also be used as a code reformatter in the Source > Reformatting menu group. Ruff is an incredibly fast Python code checker that can replace or supplement flake8, pylint, pep8, and mypy.


Try Wing 10 Now!

Wing 10 is a ground-breaking new release in Wingware's Python IDE product line. Find out how Wing 10 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 9 and earlier, and Migrating from Older Versions for a list of compatibility notes.

Categories: FLOSS Project Planets

Matt Layman: Docker Go, JS, Static Files - Building SaaS #203

Thu, 2024-09-19 20:00
In this episode, we continued on the cloud migration path. We need to build a Docker container with all the necessary static files. Some of these come from Go via Hugo for a content blog. Some comes from JavaScript via Tailwind for CSS. Some come from Python via Sphinx for documentation. All need to built into the same image. That’s what we covered on this stream.
Categories: FLOSS Project Planets

Programiz: Getting Started with Python

Thu, 2024-09-19 07:15
In this tutorial, you will learn to write your first Python program.
Categories: FLOSS Project Planets

PyCharm: How to Use FastAPI for Machine Learning

Thu, 2024-09-19 04:59

This is a guest post from Cheuk Ting Ho, a data scientist who contributes to multiple open-source libraries, such as pandas, Polars, and Jupyter Notebook.

FastAPI provides a quick way to build a backend service with Python. With a few decorators, you can turn your Python function into an API application.

It is widely used by many companies including Microsoft, Uber, and Netflix. According to the Python Developers Survey, FastAPI usage has grown from 21% in 2021 to 29% in 2023. For data scientists, it’s the second most popular framework, with 31% using it.

In this blog post, we will cover the basics of FastAPI for data scientists who may want to build a quick prototype for their project. 

What is FastAPI?

FastAPI is a popular web framework for building APIs with Python, based on standard Python type hints. It is intuitive and easy to use, and it can provide a production-ready application in a short period of time. It is fully compatible with OpenAPI and JSON Schema.

Why use FastAPI for machine learning?

Most teams working on machine learning projects consist of data scientists whose domains and professions lie on the statistics side of things. They may not have experience developing software or applications to ship their machine learning projects. FastAPI enables data scientists to easily create APIs for the following projects:

Deploying prediction models

The data science team may have trained a model for the prediction of the sales demand in a warehouse. To make it useful, they have to provide an API interface so other parts of the stock management system can use this new prediction functionality.

Suggestion engines

One of the very common uses of machine learning is as a system that provides suggestions based on the users’ choices. For example, if someone puts certain products in their shopping cart, more items can be suggested to that user. Such an e-commerce system requires an API call to the suggestion engine that takes input parameters.

Dynamic dashboards and reporting systems

Sometimes, reports for data science projects need to be presented as dashboards so users can inspect the results themselves. One possible approach is to have the data model provide an API. Frontend developers can use this API to create applications that allow users to interact with the data.

Advantages of using FastAPI

Compared to other Python web frameworks, FastAPI is simple yet fully functional. Mainly using decorators and type hints, it allows you to build a web application without the complexity of building a whole ORM (object-relational mapping) model and with the flexibility of using any database, including any SQL and NoSQL databases. FastAPI also provides automatic documentation generation, support for additional information and validation for query parameters, and good async support.

Fast development

Creating API calls in FastAPI is as easy as adding decorators in the Python code. Little to no backend experience is needed for anyone who wants to turn a Python function into an application that will respond to API calls.

Fast documentation

FastAPI provides automatic interactive API documentation using Swagger UI, which is an industry standard. No extra effort is required to build clear documentation with API call examples. This creates an advantage for busy data science teams who may not have the energy and expertise to write technical specifications and documentation.

Easy testing

Writing tests is one of the most important steps in software development, but it can also be one of the most tedious, especially when the time of the data science team is valuable. Testing FastAPI is made simple thanks to Starlette and HTTPX. Most of the time no monkey patching is needed and tests are easy to write and understand.

Fast deployment

FastAPI comes with a CLI tool that can bridge development and deployment smoothly. It allows you to switch between development mode and production mode easily. Once development is completed, the code can be easily deployed using a Docker container with images that have Python prebuilt.

How to use FastAPI for a machine learning project

In this example, we will turn a classification prediction model that uses the Nearest Neighbors algorithm to predict the species of various penguins based on their bill and flipper length into a backend application. We will provide an API that takes parameters from the query parameters of a URL and gives back the prediction. This shows how a prototype can be made quickly by any data scientist with no backend development experience.

We will use a simple `KNeighborsClassifier` on the penguin data set as an example. Details of how to build the model will be omitted, but feel free to check out the relevant notebook here. In the following tutorial, we will focus on the usage of FastAPI and explain some fundamental concepts. We will be building a prototype to do so. 

1. Start a FastAPI project with PyCharm

In this blog post, we will be using PyCharm Professional 2024.1. The best way to start using FastAPI is to create a FastAPI project with PyCharm. When you click New Project in PyCharm, you will be presented with a large selection of projects to choose from. Select the FastAPI tab:



From here, you can put in the name of your project and take advantage of other options such as initializing Git and the virtual environment that you want to use.

After doing so, you will see the basic structure of a FastAPI project set up for you.



There is also a `test_main.http` file set up for you to quickly test all the endpoints.


2. Set up environment dependencies

Next, set up our environment dependency with `requirements.txt` by selecting ​​Sync Python Requirements under PyCarm’s Tool menu.



Then you can select the `requirements.txt` file to be used.



You can copy and use this `requirements.txt` file. We will be using pandas and scikit-learn for the machine learning part of the project. Also, add the `penguins.csv` file to your project directory.

3. Set up your machine learning model

Arrange your machine learning code in the `main.py` file. We will start with a script that trains our model:

import pandas as pd from sklearn.model_selection import train_test_split from sklearn import preprocessing from sklearn.neighbors import KNeighborsClassifier from sklearn.pipeline import Pipeline from sklearn.preprocessing import StandardScaler data = pd.read_csv('penguins.csv') data = data.dropna() le = preprocessing.LabelEncoder() X = data[["bill_length_mm", "flipper_length_mm"]] le.fit(data["species"]) y = le.transform(data["species"]) X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0) clf = Pipeline( steps=[("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=11))] ) clf.set_params().fit(X_train, y_train)

We can place the above code after `app = FastAPI()`. All of it will be run when we start the application.



However, there is a better way to run the start-up code we used to set up our model. We will cover that in a later part of the blog post.

4. Request a response

Next we will look at how to add our model to FastAPI functionality. As a first step, we will add a response to the root of the URL and just simply return a message about our model in JSON format. Change the code in `async def root():` from “Hello world” to our message like this:

@app.get("/") async def root(): return { "Name": "Penguins Prediction", "description": "This is a penguins prediction model based on the bill length and flipper length of the bird.", }

Now, test our application. First, we will start our application, which is easy in PyCharm. Just press the arrow button () next to your project name at the top.



If you are using the default settings, your application will run on http://127.0.0.1:8000. You can double-check that by looking at the prompt from the Run window.

Once the process has started, let’s go to `test_main.http` and press the first arrow button () next to `GET`. From the HTTP Client in the Services window, you will see the response message that we put in.



The response JSON file is also saved for future inspection.

5. Request with query parameters

Next, we would like to let users make predictions by providing query parameters in the URL. Let’s add the code below after the `root` function.

@app.get("/predict/") async def predict(bill_length_mm: float = 0.0, flipper_length_mm: float = 0.0): param = { "bill_length_mm": bill_length_mm, "flipper_length_mm": flipper_length_mm } if bill_length_mm <=0.0 or flipper_length_mm <=0.0: return { "parameters": param, "error message": "Invalid input values", } else: result = clf.predict([[bill_length_mm, flipper_length_mm]]) return { "parameters": param, "result": le.inverse_transform(result)[0], }

Here we set the default value of the `bill_length_mm` and `flipper_length_mm` to be 0 if the user didn’t input a value. We also add a check to see if either of the values is 0 and return an error message instead of trying to predict which penguin the input refers to.

If the inputs are not 0, we will use the model to make a prediction and use the encoder to do an inverse transformation to get the label of the predicted target, i.e. the name of the penguin species.

This is not the only way you can verify inputs. You can also consider using Pydantic for input verification.

If you are using the same version of FastAPI as stated in `requirements.txt`, FastAPI automatically refreshes the service and applies changes on save. Now put in a new URL in `test_main.http` to test (separated from the URL before with ###):

### GET http://127.0.0.1:8000/predict/?bill_length_mm=40.3&flipper_length_mm=195 Accept: application/json

Press the arrow button () next to our new URL and see the output.



Next you can try a URL with one or both of the parameters removed to see the error message:

### GET http://127.0.0.1:8000/predict/?bill_length_mm=40.3 Accept: application/json


6. Set up a machine learning model with lifespan events

Last, let’s look at how we can set up our model with FastAPI lifespan events. The advantage of doing that is we can make sure no request will be accepted while the model is still being set up and the memory used will be cleaned up afterward. To do that, we will use an `asynccontextmanager`. Before `app = FastAPI()` we will add:

from contextlib import asynccontextmanager ml_models = {} @asynccontextmanager async def lifespan(app: FastAPI): # Set up the ML model here yield # Clean up the models and release resources ml_models.clear()

Now we will move the import of pandas and scikit-learn to be alongside the other imports. We will also move our setup code inside the `lifespan` function, setting the machine learning model and LabelEncoder inside `ml_models` like this:

from fastapi import FastAPI from contextlib import asynccontextmanager import pandas as pd from sklearn.model_selection import train_test_split from sklearn import preprocessing from sklearn.neighbors import KNeighborsClassifier from sklearn.pipeline import Pipeline from sklearn.preprocessing import StandardScaler ml_models = {} @asynccontextmanager async def lifespan(app: FastAPI): # Set up the ML model here data = pd.read_csv('penguins.csv') data = data.dropna() le = preprocessing.LabelEncoder() X = data[["bill_length_mm", "flipper_length_mm"]] le.fit(data["species"]) y = le.transform(data["species"]) X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0) clf = Pipeline( steps=[("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=11))] ) clf.set_params().fit(X_train, y_train) ml_models["clf"] = clf ml_models["le"] = le yield # Clean up the models and release resources ml_models.clear()

After that we will add the `lifespan=lifespan` parameter in `app = FastAPI()`:

app = FastAPI(lifespan=lifespan)

Now save and test again. Everything should work and we should see the same result as before.

Afterthought: When to train the model?

From our example, you may wonder when the model is trained. Since `clf` is trained at the beginning, i.e. when the service is launched, you may wonder why we do not train the model every time someone makes a prediction.

We do not want the model to be trained every time someone makes a call, because it costs way more resources to re-train everything. Additionally, it may cause race conditions since our FastAPI application is working concurrently. This is especially the case if we use live data that changes all the time.

Technically, we can set up an API to collect data and re-train the model (which we will demonstrate in the next example). Other options would be to schedule a re-train at a certain time when a certain amount of new data has been collected or to let a super user upload new data and trigger the re-training.

So far, we are aiming to build a prototype that runs locally. Check out this article on deploying a FastAPI project on a cloud service for more information.

What is concurrency?

To put it simply, concurrency is like when you are cooking in the kitchen, and while waiting for the water to boil, you go ahead and chop the vegetables. Since, in the web service world, the server is talking to many terminals, and the communication between the server and the terminals is slower than most internal applications, so the server will not talk to and serve the terminals one by one. Instead, it will talk to and serve many of them at the same time while fulfilling their requests. You may want to check out this explanation in the FastAPI documentation.

In Python, this is achieved by using async code. In our FastAPI code, the use of `async def` instead of `def` is obvious evidence that FastAPI is working concurrently. There are other keywords used in Python async code, like `await` and `asyncio.get_event_loop`, but we won’t be able to cover them in this blog post. 

How to use FastAPI for an image classification project

To discover more FastAPI functionality, we will add an image classification model based on the MNIST example in Keras to our application as well (we are using the TensorFlow backend). If you installed the `requirements.txt` provided, you should have Keras and Pillow installed for image processing and building a convolutional neural network (CNN).

1. Refactoring

Before we start, let’s refactor our code. To make the code more organized, we will put the model setup for the penguins prediction in a function:

def penguins_pipeline(): data = pd.read_csv('penguins.csv') data = data.dropna() le = preprocessing.LabelEncoder() X = data[["bill_length_mm", "flipper_length_mm"]] le.fit(data["species"]) y = le.transform(data["species"]) X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0) clf = Pipeline( steps=[("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=11))] ) clf.set_params().fit(X_train, y_train) return clf, le

Then we rewrite the lifespan function. With full-line code completion in PyCharm, it is very easy:

2. Set up a CNN model for MNIST prediction

In similar fashion as the penguin prediction model, we create a function for MNIST prediction (and we will store the meta parameters globally):

# MNIST model meta parameters num_classes = 10 input_shape = (28, 28, 1) batch_size = 128 epochs = 15 def mnist_pipeline(): # Load the data and split it between train and test sets (x_train, y_train), _ = keras.datasets.mnist.load_data() # Scale images to the [0, 1] range x_train = x_train.astype("float32") / 255 # Make sure images have shape (28, 28, 1) x_train = np.expand_dims(x_train, -1) # convert class vectors to binary class matrices y_train = keras.utils.to_categorical(y_train, num_classes) model = keras.Sequential( [ keras.Input(shape=input_shape), layers.Conv2D(32, kernel_size=(3, 3), activation="relu"), layers.MaxPooling2D(pool_size=(2, 2)), layers.Conv2D(64, kernel_size=(3, 3), activation="relu"), layers.MaxPooling2D(pool_size=(2, 2)), layers.Flatten(), layers.Dropout(0.5), layers.Dense(num_classes, activation="softmax"), ] ) model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]) model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1) return model

Then add the model setup in the lifespan function:

ml_models["cnn"] = mnist_pipeline()

Note that since this is added, every time you make changes to `main.py` and save, the model will be trained again. It can take a bit of time. So in development you may want to use a dummy model that requires no training time at all or a pre-trained model instead. After training, the CNN model will be ready to go.

3. Set up a POST endpoint for uploading an image file for prediction

To set up an endpoint that takes an upload file, we have to use UploadFile in FastAPI:

@app.post("/predict-image/") async def predicct_upload_file(file: UploadFile): img = await file.read() # process image for prediction img = Image.open(BytesIO(img)).convert('L') img = np.array(img).astype("float32") / 255 img = np.expand_dims(img, (0, -1)) # predict the result result = ml_models["cnn"].predict(img).argmax(axis=-1)[0] return {"filename": file.filename, "result": str(result)}

Please note that this is a POST endpoint (so far we have only set up GET endpoints).

Don’t forget to import `UploadFile` from `fastapi`:

from fastapi import FastAPI, UploadFile

And `Image` from Pillow. We are also using `BytesIO` from the `io` module:

from PIL import Image from io import BytesIO

To test this using the PyCharm HTTP Client with a test image file, we will make use of the `multipart/form-data` encoding. You can check out the HTTP request syntax here. This is what you will put in the `test_in.http` file:

### POST http://127.0.0.1:8000/predict-image/ HTTP/1.1 Content-Type: multipart/form-data; boundary=boundary --boundary Content-Disposition: form-data; name="file"; filename="test_img0.png" < ./test_img0.png --boundary– 4. Add an API to collect data and trigger retraining

Now, here comes the retraining. We set up a POST endpoint like above to accept a zip file which contains training images and labels. The zip file will then be processed and the training data will be prepared. After that we will fit the CNN model again:

@app.post("/upload-images/") async def retrain_upload_file(file: UploadFile): img_files = [] labels_file = None train_img = None with ZipFile(BytesIO(await file.read()), 'r') as zfile: for fname in zfile.namelist(): if fname[-4:] == '.txt' and fname[:2] != '__': labels_file = fname elif fname[-4:] == '.png': img_files.append(fname) if len(img_files) == 0: return {"error": "No training images (png files) found."} else: for fname in sorted(img_files): with zfile.open(fname) as img_file: img = img_file.read() # process image img = Image.open(BytesIO(img)).convert('L') img = np.array(img).astype("float32") / 255 img = np.expand_dims(img, (0, -1)) if train_img is None: train_img = img else: train_img = np.vstack((train_img, img)) if labels_file is None: return {"error": "No training labels file (txt file) found."} else: with zfile.open(labels_file) as labels: labels_data = labels.read() labels_data = labels_data.decode("utf-8").split() labels_data = np.array(labels_data).astype("int") labels_data = keras.utils.to_categorical(labels_data, num_classes) # retrain model ml_models["cnn"].fit(train_img, labels_data, batch_size=batch_size, epochs=epochs, validation_split=0.1) return {"message": "Model trained successfully."}

Remember to import `ZipFile`:

from zipfile import ZipFile

If we now try the endpoint with this zip file of 1000 retraining images and labels, you will see that it takes a moment for the response to come, as the training is taking a while:

POST http://127.0.0.1:8000/upload-images/ HTTP/1.1 Content-Type: multipart/form-data; boundary=boundary --boundary Content-Disposition: form-data; name="file"; filename="training_data.zip" < ./retrain_img.zip --boundary--

Imagine the zip files contain more training data or you’re retraining a more complicated model. The user would then have to wait for a long time and it would seem like things are not working for them.

5. Retrain the model with BackgroundTasks

A better way to handle retraining is, after receiving the training data, we process it and check if the data is in the right format, then give a response saying that the retraining has restarted and train the model in `BackgroundTasks`. Here is how to do it. First, we will add `BackgroundTasks` to our `upload-images` endpoint:

@app.post("/upload-image/") async def retrain_upload_file(file: UploadFile, background_tasks: BackgroundTasks): ...

Remember to import it from `fastapi`:

from fastapi import FastAPI, UploadFile, BackgroundTasks

Then, we will put the fitting of the model into the `background_tasks`:

# retrain model background_tasks.add_task( ml_models["cnn"].fit, train_img, labels_data, batch_size=batch_size, epochs=epochs, validation_split=0.1 )

Also, we will update the message in the response:

return {"message": "Data received successfully, model training has started."}

Now test the endpoint again. You will see that the response has arrived much quicker, and if you look at the Run window, you’ll see that the training is running after the response has arrived.

At this point, more functionality can be added, for example, an option to notify the user later (e.g. via email) when the training is finished or track the training progress in a dashboard when a full application is built.

Develop ML FastAPI applications with PyCharm

FastAPI provides an easy way to convert your data science project into a working application in several easy steps. It is perfect for data science teams that want to provide an application prototype for their machine learning model which can be further developed into a professional web application if needed. 

PyCharm Professional is the Python IDE that allows you to develop FastAPI applications more easily with a preconfigured project for FastAPI, coding assistance, tailored run/debug configurations, and the Endpoints tool window for managing API endpoints efficiently.

Get a free trial of PyCharm Professional

In this blog post, we showed the process of providing a simple API for a pre-trained prediction model. To learn more about FastAPI, I would suggest checking out the official FastAPI documentation. If you’re choosing between different frameworks, explore how FastAPI differs from Django.

About the author Cheuk Ting Ho

Cheuk has been a Data Scientist at various companies – a job that demands high numerical and programming skills, especially in Python. Following her passion for the tech community, Cheuk has been a Developer Advocate for three years. She also contributes to multiple open-source libraries like Hypothesis, Pytest, pandas, Polars, PyO3, Jupyter Notebook, and Django. Cheuk is currently a consultant and trainer at CMD Limes.

Categories: FLOSS Project Planets

Stack Abuse: Securing Your Email Sending With Python: Authentication and Encryption

Wed, 2024-09-18 22:29

Email encryption and authentication are modern security techniques that you can use to protect your emails and their content from unauthorized access.

Everyone, from individuals to business owners, uses emails for official communication, which may contain sensitive information. Therefore, securing emails is important, especially when cyberattacks like phishing, smishing, etc. are soaring high.

In this article, I'll discuss how to send emails in Python securely using email encryption and authentication.

Setting Up Your Python Environment

Before you start creating the code for sending emails, set up your Python environment first with the configurations and libraries you'll need.

You can send emails in Python using:

  • Simple Mail Transfer Protocol (SMTP): This application-level protocol simplifies the process since Python offers an in-built library or module (smtplib) for sending emails. It's suitable for businesses of all sizes as well as individuals to automate secure email sending in Python. We're using the Gmail SMTP service in this article.

  • An email API: You can leverage a third-party API like Mailtrap Python SDK, SendGrid, Gmail API, etc., to dispatch emails in Python. This method offers more features and high email delivery speeds, although it requires some investment.

In this tutorial, we're opting for the first choice - sending emails in Python using SMTP, facilitated by the smtplib library. This library uses the RFC 821 protocol and interacts with SMTP and mail servers to streamline email dispatch from your applications. Additionally, you should install packages to enable Python email encryption, authentication, and formatting.

Step 1: Install Python

Install the Python programming language on your computer (Windows, macOS, Linux, etc.). You can visit the official Python website and download and install it from there.

If you've already installed it, run this code to verify it:

python --version

Step 2: Install Necessary Modules and Libraries
  • smtplib: This handles SMTP communications. Use the code below to import 'smtplib' and connect with your email server:

    import smtplib
  • email module: This provides classes (Subject, To, From, etc.) to construct and parse emails. It also facilitates email encoding and decoding with Multipurpose Internet Mail Extensions (MIME).

  • MIMEText: It's used for formatting your emails and supports sending emails with text and attachments like images, videos, etc. Import this using the code below:

    import MIMEText
  • MIMEMultipart: Use this library to add attachments and text sections separately in your email.

    import MIMEMultipart
  • ssl: It provides Secure Sockets Layer (SSL) encryption.

Step 3: Create a Gmail Account

To send emails using the Gmail SMTP email service, I recommend creating a test account to develop the code. Delete the account once you've tested the code.

The reason is, you'll need to modify the security settings of your Gmail account to enable access from the Python code for sending emails. This might expose the login details, compromising security. In addition, it will flood your account with too many test emails.

So, instead of using your own Gmail account, create a new one for creating and testing the code. Here's how to do this:

  • Create a fresh Gmail account
  • Set up your app password:
    Google Account > Security > Turn on 2-Step Verification > Security > Set up an App Password
    Next, define a name for the app password and click on "Generate". You'll get a 16-digit password after following some instructions on the screen. Store the password safely.

Use this password while sending emails in Python. Here, we're using Gmail SMTP, but if you want to use another mail service provider, follow the same process. Alternatively, contact your company's IT team to seek support in accessing your SMTP server.

Email Authentication With Python

Email authentication is a security mechanism that verifies the sender's identity, ensuring the emails from a domain are legitimate. If you have no email authentication mechanism in place, your emails might land in spam folders, or malicious actors can spoof or intercept them. This could affect your email delivery rates and the sender's reputation.

This is the reason you must enable Python email authentication mechanisms and protocols, such as:

  • SMTP authentication: If you're sending emails using an SMTP server like Gmail SMTP, you can use this method of authentication. It verifies the sender's authenticity when sending emails via a specific mail server.

  • SPF: Stands for Sender Policy Framework and checks whether the IP address of the sending server is among

  • DKIM: Stands for DomainKeys Identified Mail and is used to add a digital signature to emails to ensure no one can alter the email's content while it's in transmission. The receiver's server will then verify the digital signature. Thus, all your emails and their content stay secure and unaltered.

  • DMARC: Stands for Domain-based Message Authentication, Reporting, and Conformance. DMARC instructs mail servers what to do if an email fails authentication. In addition, it provides reports upon detecting any suspicious activities on your domain.

How to Implement Email Authentication in Python

To authenticate your email in Python using SMTP, the smtplib library is useful. Here's how Python SMTP security works:

import smtplib server = smtplib.SMTP('smtp.domain1.com', 587) server.starttls() # Start TLS for secure connection server.login('my_email@domain1.com', 'my_password') message = "Subject: Test Email." server.sendmail('my_email@domain1.com', 'receiver@domain2.com', message) server.quit()

Implementing email authentication will add an additional layer of security to your emails and protect them from attackers or from being marked as spam.

Encrypting Emails With Python

Encrypting emails enables you to protect your email's content so that only authorized senders and receivers can access or view the content. Encrypting emails with Python is done using encryption techniques to encode the email message and transform it into a secure and unreadable format (also known as ciphertext).

This way, email encryption secures the message from unauthorized access or attackers even if they intercept the email.

Here are different types of email encryption:

  • SSL: This stands for Secure Sockets Layer, one of the most popular and widely used encryption protocols. SSL ensures email confidentiality by encrypting data transmitted between the mail server and the client.

  • TLS: This stands for Transport Layer Security and is a common email encryption protocol today. Many consider it a great alternative to SSL. It encrypts the connection between an email client and the mail server to prevent anyone from intercepting the email during its transmission.

  • E2EE: This stands for end-to-end encryption, ensuring only the intended recipient with valid credentials can decrypt the email content and read it. It aims to prevent email interception and secure the message.

How to Implement Email Encryption in Python

If your mail server requires SSL encryption, here's how to send an email in Python:

import smtplib import ssl context = ssl.create_default_context() server = smtplib.SMTP_SSL('smtp.domain1.com', 465, context=context) # This is for SSL connections, requiring port number 465 server.login('my_email@domain1.com', 'my_password') message = "Subject: SSL Encrypted Email." server.sendmail('my_email@domain1.com', 'receiver@domain2.com', message) server.quit()

For TLS connections, you'll need the smtplib library:

import smtplib server = smtplib.SMTP('smtp.domain1.com', 587) # TLS requires 587 port number server.starttls() # Start TLS encryption server.login('my_email@domain1.com', 'my_password') message = "Subject: TLS Encrypted Email." server.sendmail('my_email@domain1.com', 'receiver@domain2.com', message) server.quit()

For end-to-end encryption, you'll need more advanced libraries or tools such as GnuPG, OpenSSL, Signal Protocol, and more.

Combining Authentication and Encryption

Email Security with Python requires both encryption and authentication. This ensures that mail servers find the email legitimate and it stays safe from cyber attackers and unauthorized access during transmission. For email encryption, you can use either SSL or TLS and combine it with SMTP authentication to establish a robust email connection.

Now that you know how to enable email encryption and authentication in your emails, let's examine some complete code examples to understand how you can send secure emails in Python using Gmail SMTP and email encryption (SSL).

Code Examples

1. Sending a Plain Text Email import smtplib from email.mime.text import MIMEText subject = "Plain Text Email" body = "This is a plain text email using Gmail SMTP and SSL." sender = "sender1@gmail.com" receivers = ["receiver1@gmail.com", "receiver2@gmail.com"] password = "my_password" def send_email(subject, body, sender, receivers, password): msg = MIMEText(body) msg['Subject'] = subject msg['From'] = sender msg['To'] = ', '.join(receivers) with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp_server: smtp_server.login(sender, password) smtp_server.sendmail(sender, receivers, msg.as_string()) print("The plain text email is sent successfully!") send_email(subject, body, sender, receivers, password)

Explanation:

  • sender: This contains the sender's address.
  • receivers: This contains email addresses of receiver 1 and receiver 2.
  • msg: This is the content of the email.
  • sendmail(): This is the SMTP object's instance method. It takes three parameters - sender, receiver, and msg and sends the message.
  • with: This is a context manager that is used to properly close an SMTP connection once an email is sent.
  • MIMEText: This holds only plain text.
2. Sending an Email with Attachments

To send an email in Python with attachments securely, you will need some additional libraries like MIMEBase and encoders. Here's the code for this case:

import smtplib from email import encoders from email.mime.base import MIMEBase from email.mime.multipart import MIMEMultipart from email.mime.text import MIMEText sender = "sender1@gmail.com" password = "my_password" receiver = "receiver1@gmail.com" subject = "Email with Attachments" body = "This is an email with attachments created in Python using Gmail SMTP and SSL." with open("attachment.txt", "rb") as attachment: part = MIMEBase("application", "octet-stream") # Adding the attachment to the email part.set_payload(attachment.read()) encoders.encode_base64(part) part.add_header( "Content-Disposition", # The header indicates that the file name is an attachment. f"attachment; filename='attachment.txt'", ) message = MIMEMultipart() message['Subject'] = subject message['From'] = sender message['To'] = receiver html_part = MIMEText(body) message.attach(html_part) # To attach the file message.attach(part) with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server: server.login(sender, password) server.sendmail(sender, receiver, message.as_string())

Explanation:

  • MIMEMultipart: This library allows you to add text and attachments both to an email separately.
  • 'rb': It represents binary mode for the attachment to be opened and the content to be read.
  • MIMEBase: This object is applicable to any file type.
  • Encode and Base64: The file will be encoded in Base64 for safe email sending.
Sending an HTML Email in Python

To send an HTML email in Python using Gmail SMTP, you need a class - MIMEText.

Here's the full code for Python send HTML email:

import smtplib from email.mime.text import MIMEText sender = "sender1@gmail.com" password = "my_password" receiver = "receiver1@gmail.com" subject = "HTML Email in Python" body = """ <html> <body> <p>HTML email created in Python with SSL and Gmail SMTP.</p> </body> </html> """ message = MIMEText(body, 'html') # To attach the HTML content to the email message['Subject'] = subject message['From'] = sender message['To'] = receiver with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server: server.login(sender, password) server.sendmail(sender, receiver, message.as_string()) Testing Your Email With Authentication and Encryption

Testing your emails before sending them to the recipients is important. It enables you to discover any issues or bugs in sending emails or with the formatting, content, etc.

Thus, always test your emails on a staging server before delivering them to your target recipients, especially when sending emails in bulk. Testing emails provide the following advantages:

  • Ensures the email sending functionality is working fine
  • Emails have proper formatting and no broken links or attachments
  • Prevents flooding the recipient's inbox with a large number of test emails
  • Enhances email deliverability and reduces spam rates
  • Ensures the email and its contents stay protected from attacks and unauthorized access

To test this combined setup of sending emails in Python with authentication and encryption enabled, use an email testing server like Mailtrap Email Testing. This will capture all the SMTP traffic from the staging environment, and detect and debug your emails before sending them. It will also analyze the email content, validate CSS/HTML, and provide a spam score so you can improve your email sending.

To get started:

  • Open Mailtrap Email Testing
  • Go to 'My Inbox'
  • Click on 'Show Credentials' to get your test credentials - login and password details

Here's the Full Code Example for Testing Your Emails:

import smtplib from socket import gaierror port = 2525 # Define the SMTP server separately smtp_server = "sandbox.smtp.mailtrap.io" login = "xyz123" # Paste your Mailtrap login details password = "abc$$" # Paste your Mailtrap password sender = "test_sender@test.com" receiver = "test_receiver@example.com" message = f"""\ Subject: Hello There! To: {receiver} From: {sender} This is a test email.""" try: with smtplib.SMTP(smtp_server, port) as server: # Use Mailtrap-generated credentials for port, server name, login, and password server.login(login, password) server.sendmail(sender, receiver, message) print('Sent') except (gaierror, ConnectionRefusedError): # In case of errors print('Unable to connect to the server.') except smtplib.SMTPServerDisconnected: print('Server connection failed!') except smtplib.SMTPException as e: print('SMTP error: ' + str(e))

If there's no error, you should see this message in the receiver's inbox:

This is a test email. Best Practices for Secure Email Sending

Consider the below Python email best practices for secure email sending:

  • Protect data: Take appropriate security measures to protect your sensitive data such as SMTP credentials, API keys, etc. Store them in a secure, private place like config files or environment variables, ensuring no one can access them publicly.

  • Encryption and authentication: Always use email encryption and authentication so that only authorized individuals can access your emails and their content.

    For authentication, you can use advanced methods like API keys, two-factor authentication, single sign-on (SSO), etc. Similarly, use advanced encryption techniques like SSL, TLS, E2EE, etc.

  • Error handling: Manage network issues, authentication errors, and other issues by handling errors effectively using except/try blocks in your code.

  • Rate-Limiting: Maintain high email deliverability by rate-limiting the email sending functionality to prevent exceeding your service limits.

  • Validate Emails: Validate email addresses from your list and remove invalid ones to enhance email deliverability and prevent your domain from getting marked as spam. You can use an email validation tool to do this.

  • Educate: Keep your team updated with secure email practices and cybersecurity risks. Monitor your spam score and email deliverability rates, and work to improve them.

Wrapping Up

Secure email sending with Python using advanced email encryption methods like SSL, TLS, and end-to-end encryption, as well as authentication protocols and techniques such as SPF, DMARC, 2FA, and API keys.

By combining these security measures, you can protect your confidential email information, improve email deliverability, and maintain trust with your target recipients. In this way, only individuals with appropriate credentials can access it. This will help prevent unauthorized access, data breaches, and other cybersecurity attacks.

Categories: FLOSS Project Planets

The Python Show: 47 - Python Projects of 2024

Wed, 2024-09-18 20:37

I’ve been working on lots of projects this year. Here are the ones I highlighted in this episode:

The Python Show is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Categories: FLOSS Project Planets

Armin Ronacher: Accidental Spending: A Case For an Open Source Tax?

Wed, 2024-09-18 20:00

Both last week at London tech leaders and this week at the Open Source Summit in Vienna I engaged in various discussions about pledging money to Open Source. At Sentry we have been funding our Open Source dependencies for a few years now and we're trying to encourage others to do the same.

It’s not an easy ask, of course. One quite memorable point raised was what I would call “accidental spending”. The story goes like this: an engineering team spins up a bunch of Kubernetes machines. As the fleet grows in scale some inefficiencies creep in. To troubleshoot or optimize, additional services such as load balancers, firewalls, cloud provider log services, etc. are provisioned with minimal discussion. Initially none of that was part of the plan, but ever so slightly for every computing resource, some extra stuff is paid on top creating largely hidden costs. Ideally all of that pays off (after all, hopefully by debugging quicker you reduce that downtime, by having that load balancer you can auto scale and save on unused computing resources etc.). But often, the payoff feels abstract and are hard to quantify.

I call those purchases “accidental” because they are proportional to the deployed infrastructure and largely acting like a tax on top of everything. Only after a while does the scale of that line item become apparent. On the other hand intentionally purchasing a third party system is a very intentional act. It's very deliberate, requiring conversations and more scrutiny is placed for putting a credit card into a new service. Companies providing services understand this and are positioning themselves accordingly. Their play could be to make the case that that their third party solution is better, cheaper etc.

Open Source funding could be seen through both of these lenses. Today, in many ways, pledging money to Open Source is a very intentional decision. It requires discussions, persuasion and justification. The purpose and the pay-off is not entirely clear. Companies are not used to the idea of funding Open Source and they don't have a strong model to reason about these investments. Likewise many Open Source projects themselves also don't have a good way of dealing with money and might lack the governance to handle funds effectively. After all many of these projects are run by individuals and not formal organizations.

Companies are unlikely to fund something without understanding the return on investment. One better understood idea is to turn that one “random person in Nebraska” maintaining a critical dependency into a well-organized team with good op-sec. But for that to happen, funding needs to scale from pennies to dollars, making it really worthwhile.

My colleague Chad Whitacre floated an idea: what if platforms like AWS or GitHub started splitting the check? By adding a line-item to the invoices of their customers to support Open Source finding. It would turn giving to Open Source into more of a tax like thing. That might leverage the general willingness to just pile up on things to do good things. If we all pay 3% on top of our Cloud or SaaS bills to give to Open Source this would quickly add up.

While I’m intrigued by the idea, I also have my doubts that this would work. It goes back to the problem mentioned earlier that some Open Source projects just have no governance or are not even ready to receive money. How much value you put on a dependency is also very individual. Just because an NPM package has a lot of downloads does not necessarily mean it's critical to the mission of the company. rrweb is a good example for us at Sentry. It sits at the core of our session replay product but since we we vendor a pinned fork, you would not see rrweb in your dependency tree. We also value that package more than some algorithm would be able to determine about how important that package is to us.

So the challenge with the tax — as appealing as it is — is that it might make the “purchase decision” of funding Open Source easier, but it would probably make the distribution problem much worse. Deliberate, intentional funding is key. At least for the moment.

Still, it’s worth considering. The “what if” is a powerful idea. Using a restaurant analogy, the “open-source tax” is like the mandatory VAT or health surcharge on your bill: no choice is involved. Another model could be more like the tip suggestions on a receipt offering a choice but also guidance on what’s appropriate to contribute.

The current model we propose with our upcoming Open Source Pledge is to suggest like a tip what you should give in relation to your developer work force. Take the average number of full time engineers you have over a year, multiply this by 2000. That is the amount in US dollars you should give to your Open Source dependencies.

That sounds like a significant amount! But let's put this in relation for a typical developer you employ: that's less than a fifth of what you would pay for FICA (Federal Insurance Contributions Act in the US) in the US. That's less than the communal tax you would pay in Austria. I'm sure you can think of similar payroll taxes in your country.

I believe that after step one of recognizing there is a funding problem follows an obvious step two: having a baseline funding amount that stands in relation to your business (you own or are a part of) of what the amount should be. Using the size of the development team as a metric offers an objective and quantifiable starting point. The beauty in my mind of the developer count in particular is that it's somewhat independently observable from both the outside and inside [1]. The latter is important! It creates a baseline for people within a company to start a conversation about Open Source funding.

If you have feedback on this, particular the pledge I invite you mail me or to leave a comment on the Pledge's issue tracker.

[1]There is an analogy to historical taxation here. For instance the Window Tax was taxation based on the number of Windows in a building. That made enforcement easy because you could count them from street level. The downside of taht was obviously the unintended consequences that this caused. Something to always keep in mind!
Categories: FLOSS Project Planets

Python Engineering at Microsoft: Announcing the new Python Data Science Extension Pack for VS Code

Wed, 2024-09-18 19:53

We’re thrilled to announce the launch of the new Python Data Science Extension Pack for Visual Studio Code! This powerful pack brings together some of the most popular and essential VS Code extensions, making it your one-stop shop for all things data science in Python.

What’s Inside?

Our extension pack is designed to streamline your data science journey from start to finish. Whether you’re preparing data, conducting analysis, visualizing results, or building and training machine learning models, we’ve got you covered.

This Data Science extension pack currently includes four extensions:

  • Python – Provides rich support for the Python language such as IntelliSense, debugging, formatting, linting, code navigation, refactoring, variable explorer, test explorer, and more.
  • Jupyter – Used to create and edit Jupyter Notebooks, add and run code/markdown cells, render plots, create presentation-friendly versions of your notebook by exporting to HTML or PDF and more.
  • GitHub Copilot – An AI pair programmer tool that helps you write code faster and smarter.
  • Data Wrangler – A code-centric data viewing and cleaning tool to explore, visualize, and clean tabular data.
Get started today

Dive into the world of data science by installing the Python Data Science Extension Pack for VS Code from the VS Code extension marketplace.

We encourage you to provide feedback and file issues. Additionally, if there are other VS Code extensions that you feel are essential to the data science workflow, please let us know by creating a ticket in our GitHub repo.

The post Announcing the new Python Data Science Extension Pack for VS Code appeared first on Python.

Categories: FLOSS Project Planets

Real Python: Python 3.13 Preview: Free Threading and a JIT Compiler

Wed, 2024-09-18 10:00

Although the final release of Python 3.13 is scheduled for October 2024, you can download and install a preview version today to explore the new features. Notably, the introduction of free threading and a just-in-time (JIT) compiler are among the most exciting enhancements, both designed to give your code a significant performance boost.

In this tutorial, you’ll:

  • Compile a custom Python build from source using Docker
  • Disable the Global Interpreter Lock (GIL) in Python
  • Enable the Just-In-Time (JIT) compiler for Python code
  • Determine the availability of new features at runtime
  • Assess the performance improvements in Python 3.13
  • Make a C extension module targeting Python’s new ABI

Check out what’s new in the Python changelog for a complete list of the upcoming features and improvements. This document contains a quick summary of the release highlights as well as a detailed breakdown of the planned changes.

To download the sample code and other resources accompanying this tutorial, click the link below:

Get Your Code: Click here to download the free sample code that shows you how to work with the experimental free threading and JIT compiler in Python 3.13.

Take the Quiz: Test your knowledge with our interactive “Python 3.13: Free-Threading and a JIT Compiler” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Python 3.13: Free-Threading and a JIT Compiler

In this quiz, you'll test your understanding of the new features in Python 3.13. You'll revisit how to compile a custom Python build, disable the Global Interpreter Lock (GIL), enable the Just-In-Time (JIT) compiler, and more.

Free Threading and JIT in Python 3.13: What’s the Fuss?

Before going any further, it’s important to note that the majority of improvements in Python 3.13 will remain invisible to the average Joe. This includes free threading (PEP 703) and the JIT compiler (PEP 744), which have already sparked a lot of excitement in the Python community.

Keep in mind that they’re both experimental features aimed at power users, who must take extra steps to enable them at Python’s build time. None of the official channels will distribute Python 3.13 with these additional features enabled by default. This is to maintain backward compatibility and to prevent potential glitches, which should be expected.

Note: Don’t try to use Python 3.13 with the experimental features in a production environment! It may cause unexpected problems, and the Python Steering Council reserves the right to remove these features entirely from future Python releases if they prove to be unstable. Treat them as an experiment to gather real-world data.

In this section, you’ll get a birds-eye view of these experimental features so you can set the right expectations. You’ll find detailed explanations on how to enable them and evaluate their impact on Python’s performance in the remainder of this tutorial.

Free Threading Makes the GIL Optional

Free threading is an attempt to remove the Global Interpreter Lock (GIL) from CPython, which has traditionally been the biggest obstacle to achieving thread-based parallelism when performing CPU-bound tasks. In short, the GIL allows only one thread of execution to run at any given time, regardless of how many cores your CPU is equipped with. This prevents Python from leveraging the available computing power effectively.

There have been many attempts in the past to bypass the GIL in Python, each with varying levels of success. You can read about these attempts in the tutorial on bypassing the GIL. While previous attempts were made by third parties, this is the first time that the core Python development team has taken similar steps with the permission of the steering council, even if some reservations remain.

Note: Python 3.12 approached the GIL obstacle from a different angle by allowing the individual subinterpreters to have their independent GILs. This can improve Python’s concurrency by letting you run different tasks in parallel, but without the ability to share data cheaply between them due to isolated memory spaces. In Python 3.13, you’ll be able to combine subinterpreters with free threading.

The removal of the GIL would have significant implications for the Python interpreter itself and especially for the large body of third-party code that relies on it. Because free threading essentially breaks backward compatibility, the long-term plan for its implementation is as follows:

  1. Experimental: Free threading is introduced as an experimental feature and isn’t a part of the official Python distribution. You must make a custom Python build to disable the GIL.
  2. Enabled: The GIL becomes optional in the official Python distribution but remains enabled by default to allow for a transition period.
  3. Disabled: The GIL is disabled by default, but you can still enable it if needed for compatibility reasons.

There are no plans to completely remove the GIL from the official Python distribution at the moment, as that would cause significant disruption to legacy codebases and libraries. Note that the steps outlined above are just a proposal subject to change. Also, free threading may not pan out at all if it makes single-threaded Python run slower than without it.

Until the GIL becomes optional in the official Python distribution, which may take a few more years, the Python development team will maintain two incompatible interpreter versions. The vanilla Python build won’t support free threading, while the special free-threaded flavor will have a slightly different Application Binary Interface (ABI) tagged with the letter “t” for threading.

This means that C extension modules built for stock Python won’t be compatible with the free-threaded version and the other way around. Maintainers of those external modules will be expected to distribute two packages with each release. If you’re one of them, and you use the Python/C API, then you’ll learn how to target CPython’s new ABI in the final section of this tutorial.

JIT Compiles Python to Machine Code

As an interpreted language, Python takes your high-level code and executes it on the fly without the need for prior compilation. This has both pros and cons. Some of the biggest advantages of interpreted languages include better portability across different hardware architectures and a quick development time due to the lack of a compilation step. At the same time, interpretation is much slower than directly executing code native to your machine.

Note: To be more precise, Python interprets bytecode instructions, an intermediate binary representation between pure Python and machine code. The Python interpreter compiles your code to bytecode when you import a module and stores the resulting bytecode in the __pycache__ folder. This doesn’t inherently make your Python scripts run faster, but loading a pre-processed bytecode can indeed speed up their startup time.

Languages like C and C++ leverage Ahead-of-Time (AOT) compilation to translate your high-level code into machine code before you ship your software. The benefit of this is faster execution since the code is already in the computer’s mother tongue. While you no longer need a separate program to interpret the code, you must compile it separately for all target platforms that you want supported. You should also handle platform-specific differences yourself.

Read the full article at https://realpython.com/python313-free-threading-jit/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Django Weblog: Last call for DjangoCon US 2024 tickets!

Wed, 2024-09-18 08:00

DjangoCon US starts next week in Durham, NC on September 22nd!

If you aren't able to join in person, please consider purchasing an online ticket: https://ti.to/defna/djangocon-us-2024

The conference is full of a variety of talks with excellent keynote speakers! It's shaping up to be an event you'll want to experience live.

If you'd like to learn more about DjangoCon US visit them at their website or reach out to them at hello@djangocon.us.

Categories: FLOSS Project Planets

Spyder IDE: Scientific IDE UX Birds of a Feather session at SciPy 2024

Tue, 2024-09-17 20:00
The Spyder team hosted a Birds of a Feather session at SciPy 2024, this time on the topic of users' experiences (good and bad) with the UI/UX of scientific interfaces and IDEs, and how their developers can better serve users. Here, we share what we learned from the session, as well as a link to the full detailed community notes.
Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #647 (Sept. 17, 2024)

Tue, 2024-09-17 15:30

#647 – SEPTEMBER 17, 2024
View in Browser »

How to Use Conditional Expressions With NumPy where()

This tutorial teaches you how to use the where() function to select elements from your NumPy arrays based on a condition. You’ll learn how to perform various operations on those elements and even replace them with elements from a separate array or arrays.
REAL PYTHON

PythonistR: A Match Made in Data Heaven

In data science you’ll sometimes hear a debate between R and Python. Cosima says ‘why not choose both?’ She outlines a data pipeline that uses the best tool for each job.
COSIMA MEYER

Transcribe Audio in 5 Lines of Code

Build AI apps that understand speech with insanely accurate speech-to-text models. Sign up for a free account and get $50 in credits to try AssemblyAI’s speech recognition models →
ASSEMBLY AI sponsor

Python HTTP Clients: Requests vs. HTTPX vs. AIOHTTP

Learn about the differences between Requests, HTTPX, and AIOHTTP, and when to use each library for your Python projects.
GEORGES HAIDAR

Quiz: Generate Images With DALL·E and the OpenAI API

In this quiz, you’ll test your understanding of generating images with DALL·E by OpenAI using Python. You’ll revisit concepts such as using the OpenAI Python library, making API calls for image generation, creating images from text prompts, and converting Base64 strings to PNG image files.
REAL PYTHON

Quiz: The Walrus Operator: Python’s Assignment Expressions

In this quiz, you’ll test your understanding of the Python Walrus Operator. This operator was introduced in Python 3.8, and understanding it can help you write more concise and efficient code.
REAL PYTHON

Python Releases 3.12.6, 3.11.10, 3.10.15, 3.9.20, and 3.8.20

PYTHON.ORG

Python Release Python 3.13.0rc2

PYTHON.ORG

Articles & Tutorials Python Community Divided Over CoC Enforcement

Over the last few months there has been a lot of back and forth in the Python community, especially on the forums, around changes to bylaws and how the Code of Conduct is enforced. This article covers the history and context of the events.
JAKE EDGE

Django: Rotate Your Secret Key, Fast or Slow

Django’s SECRET_KEY setting is used for cryptographic signing in various places, such as for session storage and password reset tokens. If you need to rotate it you can allow read-only use of the old key to smooth the transition.
ADAM JOHNSON

Posit Connect - Help Your Analytics Team Share and Collaborate

Tired of tediously send files and trying to use general-purpose collaboration tools? Posit Connect makes it easy to share, collaborate, and get feedback on your data science work including Jupyter notebooks, Plotly dashboards, Streamlit, Quarto, Shiny or other interactive analytics applications →
POSIT sponsor

Multiversion Python Thoughts

Armin has played around with enabling multiple versions of a library to be installed for the same instance of Python in the past, and recent feature additions to uv are making it come closer to fruition.
ARMIN RONACHER

How We Made Notebooks Load 10 Times Faster

“When we received feedback our Notebooks UI was taking too long too load, our engineers dove into ways to improve the developer experience — bringing some load times from 30 seconds down to less than one.”
LUIS NEVES

When to Use .__repr__() vs .__str__() in Python

In this video course, you’ll learn the difference between the string representations returned by .__repr__() vs .__str__() and understand how to use them effectively in classes that you define.
REAL PYTHON course

Python Bytes Episode #400

The Python Bytes podcast just delivered show #400. This is a huge accomplishment. This episode celebrates the achievement, and also covers: Python 3.13RC, Docker with uv, the humanize project, and more.
PYTHON BYTES podcast

Improved print Readability With pprint

The pretty print module (pprint) provides more readable output for complex data structures and this post shows you how to use the library and what you can get out of it.
JUHA-MATTI SANTALA

“Next Level Python” Humble Bundle for Charity

Make mastering Python your mission: This mix of online courses, books, exercises, and productivity tools is here to help you succeed—whether you’re a beginner or a skilled Python pro. Support Girls Who Code and get Python books, software, and video courses collectively valued at $1,882 for a pay-what-you-want price →
HUMBLEBUNDLE.COM sponsor

Switching From pyenv to uv

Will has recently switched from using a variety of packaging tools to just using uv. This post is a summary of what needed to change when going from pyenv to uv.
WILL KAHN-GREENE

uv Under Discussion on Mastodon

There is a deep conversation going on about the longevity of uv on Mastodon and for those not on the platform, Simon has summarized it.
SIMON WILLISON

Why Not Comments

This post talks about why you might want to include information in your code comments about why you didn’t take a particular approach.
HILLEL WAYNE

How to Build a Perfect Docker Image for a Poetry Project

This article describes how to build a secure, fast to build, and lightweight Docker image for your Poetry-based project
CODEMAGEDDON • Shared by Sergey

Python macOS Framework Builds

Glyph explains just what a Framework is on macOS and why CPython on macOS should be built that way.
GLYPH LEFKOWITZ

Projects & Code PSP (Python Scaffolding Projects)

GITHUB.COM/MATTEOGUADRINI • Shared by Matteo Guadrini

pocketpy: Portable Python 3.x Interpreter in Modern C

GITHUB.COM/POCKETPY

graphiti: Build Dynamic, Temporally-Aware Knowledge Graphs

GITHUB.COM/GETZEP

picows: Ultra-Fast Websocket Client and Server for Asyncio

GITHUB.COM/TARASKO

django-cotton: Component Based Design to Django Templates

GITHUB.COM/WRABIT

Events PyData Amsterdam 2024

September 18 to September 21, 2024
PYDATA.ORG

Weekly Real Python Office Hours Q&A (Virtual)

September 18, 2024
REALPYTHON.COM

PyCon India 2024

September 20 to September 24, 2024
PYCON.ORG

PyCon TW 2024

September 21 to September 23, 2024
PYCON.ORG

DjangoCon US 2024

September 22 to September 27, 2024
DJANGOCON.US

PyBay 2024

September 23 to September 24, 2024
PYBAY.ORG

PyCon Africa 2024

September 24 to September 29, 2024
PYCON.ORG

PyData Paris 2024

September 25 to September 27, 2024
PYDATA.ORG

PyCon JP 2024

September 27 to September 30, 2024
PYCON.JP

Python Norte 2024

September 27 to September 29, 2024
PYTHONNORTE.ORG

PyCon Niger 2024

September 28 to September 30, 2024
PYCON.ORG

Happy Pythoning!
This was PyCoder’s Weekly Issue #647.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

Python Morsels: Understanding help() in Python

Tue, 2024-09-17 11:10

When using Python's help function, have you ever wondered what the various symbols (/, *, [, and ]) mean? Understanding those symbols will help you better understand how to use the functions and classes you're working with.

Table of contents

  1. What do all the symbols mean in help output?
  2. Multiple function signatures
  3. Pay attention to the nouns
  4. The symbols of help
  5. Default values: the = symbol
  6. Unlimited arguments: the * symbol before an argument name
  7. Keyword-only arguments: a lone * symbol
  8. Positional-only arguments: a lone / symbol
  9. Arbitrary keyword arguments: the ** symbol
  10. Square brackets: optional arguments
  11. Ellipsis (...) and other weird things
  12. The conventions of Python's help function

What do all the symbols mean in help output?

We'll cover what the * and / symbols below mean:

>>> help(sorted) Help on built-in function sorted in module builtins: sorted(iterable, /, *, key=None, reverse=False) Return a new list containing all items from the iterable in ascending order. A custom key function can be supplied to customize the sort order, and the reverse flag can be set to request the result in descending order.

We'll also talk about the different formats that help output comes in. For example, note the square brackets in [x] below and note that there are two different styles noted for calling int:

>>> help(int) Help on class int in module builtins: class int(object) | int([x]) -> integer | int(x, base=10) -> integer

We'll start by giving a name to that line which indicates how a function, method, or class is called.

Multiple function signatures

A function signature notes the …

Read the full article: https://www.pythonmorsels.com/understanding-help/
Categories: FLOSS Project Planets

Real Python: Customizing VS Code Through Color Themes

Tue, 2024-09-17 10:00

A well-designed coding environment not only enhances your focus and productivity but also makes coding sessions more enjoyable. In this Code Conversation, your instructor Philipp Ascany will guide you step-by-step through the process of finding, installing, and adjusting color themes in VS Code. You’ll explore the various options available in VS Code and learn how to make fine adjustments to create a setup that suits your personal preferences.

In this video course, you’ll:

  • Learn about Themes in VS Code
  • Find a VS Code Color Theme
  • Select a Theme
  • Install Your Theme
  • Make Additional Adjustments

By the end of the course, you’ll have a coding environment that not only looks great but also enhances your overall coding experience.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Real Python: Quiz: Using Python's pip to Manage Your Projects' Dependencies

Tue, 2024-09-17 08:00

In this quiz, you’ll test your understanding of Python’s standard package manager, pip. You’ll revisit the concepts behind pip, important commands, and how to install packages.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Pages