Planet Python

Planet Python - http://planetpython.org/

Yasoob Khalid: Running Python in the Browser

Wed, 2019-05-22 14:21

Running Python in the web browser has been getting a lot of attention lately. Shaun Taylor-Morgan knows what he’s talking about here – he works for Anvil, a full-featured application platform for writing full-stack web apps with nothing but Python. So I invited him to give us an overview and comparison of the open-source solutions for running Python code in your web browser.

In the past, if you wanted to build a web UI, your only choice was JavaScript.

That’s no longer true. There are quite a few ways to run Python in your web browser. This is a survey of what’s available.

SIX SYSTEMS

I’m looking at six systems that all take a different approach to the problem. Here’s a diagram that sums up their differences.

The x-axis answers the question: when does Python get compiled? At one extreme, you run a command-line script to compile Python yourself. At the other extreme, the compilation gets done in the user’s browser as they write Python code.

The y-axis answers the question: what does Python get compiled to? Three systems make a direct conversion between the Python you write and some equivalent JavaScript. The other three actually run a live Python interpreter in your browser, each in a slightly different way.

1. TRANSCRYPT

Transcrypt gives you a command-line tool you can run to compile a Python script into a JavaScript file.

You interact with the page structure (the DOM) using a toolbox of specialized Python objects and functions. For example, if you import document, you can find any object on the page by using document like a dictionary. To get the element whose ID is name-box, you would use document["name-box"]. Any readers familiar with jQuery will be feeling very at home.

Here’s a basic example. I wrote a Hello, World page with just an input box and a button:

<input id="name-box" placeholder="Enter your name"> <button id="greet-button">Say Hello</button>

To make it do something, I wrote some Python. When you click the button, an event handler fires that displays an alert with a greeting:

def greet():
    alert("Hello " + document.getElementById("name-box").value + "!")

document.getElementById("greet-button").addEventListener('click', greet)

I wrote this in a file called hello.py and compiled it using transcrypt hello.py. The compiler spat out a JavaScript version of my file, called hello.js.

Transcrypt makes the conversion to JavaScript at the earliest possible time – before the browser is even running. Next we’ll look at Brython, which makes the conversion on page load.

2. BRYTHON

Brython lets you write Python in script tags in exactly the same way you write JavaScript. Just as with Transcrypt, it has a document object for interacting with the DOM.

The same widget I wrote above can be written in a script tag like this:

<script type="text/python">
from browser import document, alert

def greet(event):
    alert("Hello " + document["name-box"].value + "!")

document["greet-button"].bind("click", greet)
</script>

Pretty cool, huh? A script tag whose type is text/python!

There’s a good explanation of how it works on the Brython GitHub page. In short, you run a function when your page loads:

<body onload="brython()">

that transpiles anything it finds in a Python script tag:

<script type="text/python"></script>

which results in some machine-generated JavaScript that it runs using JS’s eval() function.

3. SKULPT

Skulpt sits at the far end of our diagram – it compiles Python to JavaScript at runtime. This means the Python doesn’t have to be written until after the page has loaded.

The Skulpt website has a Python REPL that runs in your browser. It’s not making requests back to a Python interpreter on a server somewhere; it’s actually running on your machine.

Skulpt does not have a built-in way to interact with the DOM. This can be an advantage, because you can build your own DOM manipulation system depending on what you’re trying to achieve. More on this later.

Skulpt was originally created to produce educational tools that need a live Python session on a web page (example: Trinket.io). While Transcrypt and Brython are designed as direct replacements for JavaScript, Skulpt is more suited to building Python programming environments on the web (such as the full-stack app platform, Anvil).

We’ve reached the end of the x-axis in our diagram. Next we head in the vertical direction: our final three technologies don’t compile Python to JavaScript, they actually implement a Python runtime in the web browser.

4. PYPY.JS

PyPy.js is a JavaScript implementation of a Python interpreter. The developers took a C-to-JavaScript compiler called emscripten and ran it on the source code of PyPy. The result is PyPy, but running in your browser.

Advantages: It’s a very faithful implementation of Python, and code gets executed quickly.

Disadvantages: A web page that embeds PyPy.js contains an entire Python interpreter, so it’s pretty big as web pages go (think megabytes).

You import the interpreter using <script> tags, and you get an object called pypyjs in the global JS scope.

There are three main functions for interacting with the interpreter. To execute some Python, run pypyjs.exec(<python code>). To pass values between JavaScript and Python, use pypyjs.set(variable, value) and pypyjs.get(variable).

Here’s a script that uses PyPy.js to calculate the first ten square numbers:

<script type="text/javascript">
pypyjs.exec(
    // Run some Python
    'y = [x**2. for x in range(10)]'
).then(function() {
    // Transfer the value of y from Python to JavaScript
    return pypyjs.get('y')
}).then(function(result) {
    // Display an alert box with the value of y in it
    alert(result)
});
</script>

PyPy.js has a few features that make it feel like a native Python environment – there’s even an in-memory filesystem so you can read and write files. There’s also a document object that gives you access to the DOM from Python.

The project has a great readme if you’re interested in learning more.

5. BATAVIA

Batavia is a bit like PyPy.js, but it runs bytecode rather than Python. Here’s a Hello, World script written in Batavia:

<script id="batavia-helloworld" type="application/python-bytecode">
    7gwNCkIUE1cWAAAA4wAAAAAAAAAAAAAAAAIAAABAAAAAcw4AAABlAABkAACDAQABZAEAUykCegtI
    ZWxsbyBXb3JsZE4pAdoFcHJpbnSpAHICAAAAcgIAAAD6PC92YXIvZm9sZGVycy85cC9uenY0MGxf
    OTc0ZGRocDFoZnJjY2JwdzgwMDAwZ24vVC90bXB4amMzZXJyddoIPG1vZHVsZT4BAAAAcwAAAAA=
</script>

Bytecode is the ‘assembly language’ of the Python virtual machine – if you’ve ever looked at the .pyc files Python generates, that’s what they contain (Yasoob dug into some bytecode in a recent post on this blog). This example doesn’t look like assembly language because it’s base64-encoded.
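If you’d like to see what bytecode looks like in a human-readable form, the standard library’s dis module will disassemble a function for you. Here’s a quick sketch (the exact instructions vary between Python versions):

import dis

def hello():
    print("Hello World")

# Prints the bytecode instructions (LOAD_GLOBAL, CALL_FUNCTION, ...)
# that the CPython virtual machine runs for this function.
dis.dis(hello)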

Batavia is potentially faster than PyPy.js, since it doesn’t have to compile your Python to bytecode. It also makes the download smaller – around 400kB. The disadvantage is that your code needs to be written and compiled in a native (non-browser) environment, as was the case with Transcrypt.

Again, Batavia lets you manipulate the DOM using a Python module it provides (in this case it’s called dom).

The Batavia project is quite promising because it fills an otherwise unfilled niche – ahead-of-time compiled Python in the browser that runs in a full Python VM. Unfortunately, the GitHub repo’s commit rate seems to have slowed in the past year or so. If you’re interested in helping out, here’s their developer guide.

6. PYODIDE

Mozilla’s Pyodide was announced in April 2019. It solves a difficult problem: interactive data visualisation in Python, in the browser.

Python has become a favourite language for data science thanks to libraries such as NumPy, SciPy, Matplotlib and Pandas. We already have Jupyter Notebooks, which are a great way to present a data pipeline online, but they must be hosted on a server somewhere.

If you can put the data processing on the user’s machine, they avoid the round-trip to your server so real-time visualisation is more powerful. And you can scale to so many more users if their own machines are providing the compute.

It’s easier said than done. Fortunately, the Mozilla team came across a version of the reference Python implementation (CPython) that was compiled into WebAssembly. WebAssembly is a low-level complement to JavaScript that performs closer to native speeds, which opens the browser up for performance-critical applications like this.

Mozilla took charge of the WebAssembly CPython project and recompiled NumPy, SciPy, Matplotlib and Pandas into WebAssembly too. The result is a lot like Jupyter Notebooks in the browser – here’s an introductory notebook.

It’s an even bigger download than PyPy.js (that example is around 50MB), but as Mozilla point out, a good browser will cache that for you. And for a data processing notebook, waiting a few seconds for the page to load is not a problem.

You can write HTML, Markdown and JavaScript in Pyodide Notebooks too. And yes, there’s a document object to access the DOM. It’s a really promising project!

MAKING A CHOICE

I’ve given you six different ways to write Python in the browser, and you might be able to find more. Which one to choose? This summary table may help you decide.

There’s a more general point here too: the fact that there is a choice.

As a web developer, it often feels like you have to write JavaScript, you have to build an HTTP API, you have to write SQL and HTML and CSS. The six systems we’ve looked at make JavaScript seem more like a compilation target: you choose the language you write in, and it gets compiled to JavaScript. (And WebAssembly is actually designed to be used this way.)

Why not treat the whole web stack this way? The future of web development is to move beyond the technologies that we’ve always ‘had’ to use. The future is to build abstractions on top of those technologies, to reduce the unnecessary complexity and optimise developer efficiency. That’s why Python itself is so popular – it’s a language that puts developer efficiency first.

ONE UNIFIED SYSTEM

There should be one way to represent data, from the database all the way to the UI. Since we’re Pythonistas, we’d like everything to be a Python object, not an SQL SELECT statement followed by a Python object followed by JSON followed by a JavaScript object followed by a DOM element.

That’s what Anvil does – it’s a full-stack Python environment that abstracts away the complexity of the web. Here’s a 7-minute video that covers how it works.

Remember I said that it can be an advantage that Skulpt doesn’t have a built-in way to interact with the DOM? This is why. If you want to go beyond ‘Python in the browser’ and build a fully-integrated Python environment, your abstraction of the User Interface needs to fit in with your overall abstraction of the web system.

So Python in the browser is just the start of something bigger. I like to live dangerously, so I’m going to make a prediction. In 5 years’ time, more than 50% of web apps will be built with tools that sit one abstraction level higher than JavaScript frameworks such as React and Angular. It has already happened for static sites: most people who want a static site will use WordPress or Wix rather than firing up a text editor and writing HTML. As systems mature, they become unified and their incidental complexity is gradually minimised.

If you’re reading this in 2024, why not get in touch and tell me whether I was right?


Stack Abuse: Python: Check if String Contains Substring

Wed, 2019-05-22 12:38

In this article, we'll examine four ways to use Python to check whether a string contains a substring. Each has its own use cases and pros and cons, some of which we'll briefly cover here:

1) The in Operator

The easiest way to check if a Python string contains a substring is to use the in operator. The in operator is used to check data structures for membership in Python. It returns a Boolean (either True or False) and can be used as follows:

fullstring = "StackAbuse"
substring = "tack"

if substring in fullstring:
    print("Found!")
else:
    print("Not found!")

This operator is shorthand for calling an object's __contains__ method, and also works well for checking if an item exists in a list.
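As a quick illustration (a minimal sketch, not from the original article), these calls are equivalent, and the same operator works on lists:

fullstring = "StackAbuse"

print("tack" in fullstring)             # True
print(fullstring.__contains__("tack"))  # True; this is what `in` calls under the hood
print(2 in [1, 2, 3])                   # True; membership test on a list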

2) The String.index() Method

The String type in Python has a method called index that can be used to find the starting index of the first occurrence of a substring in a string. If the substring is not found, a ValueError exception is thrown, which can be handled with a try-except-else block:

fullstring = "StackAbuse"
substring = "tack"

try:
    fullstring.index(substring)
except ValueError:
    print("Not found!")
else:
    print("Found!")

This method is useful if you need to know the position of the substring, as opposed to just its existence within the full string.
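For example (a small sketch to illustrate the point), you can use the returned index to slice the string from the first match onward:

fullstring = "StackAbuse"
substring = "tack"

start = fullstring.index(substring)  # 1
print(fullstring[start:])            # "tackAbuse"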

3) The String.find() Method

The String type has another method called find which is more convenient to use than index, because we don't need to worry about handling any exceptions. If find doesn't find a match, it returns -1, otherwise it returns the left-most index of the substring in the larger string.

fullstring = "StackAbuse"
substring = "tack"

if fullstring.find(substring) != -1:
    print("Found!")
else:
    print("Not found!")

If you'd prefer to avoid the need to catch errors, then this method should be favored over index.

4) Regular Expressions (REGEX)

Regular expressions provide a more flexible (albeit more complex) way to check strings for pattern matching. Python is shipped with a built-in module for regular expressions, called re. The re module contains a function called search, which we can use to match a substring pattern as follows:

from re import search

fullstring = "StackAbuse"
substring = "tack"

if search(substring, fullstring):
    print("Found!")
else:
    print("Not found!")

This method is best if you need more complex matching, such as case-insensitive matching. Otherwise, the added complexity and slower speed of regex should be avoided for simple substring-matching use cases.
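As a hedged example of that kind of use case, passing re.IGNORECASE to search makes the match case-insensitive:

import re

fullstring = "StackAbuse"

# Matches even though the case differs from the original string
if re.search("stackabuse", fullstring, re.IGNORECASE):
    print("Found!")
else:
    print("Not found!")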

About the Author

This article was written by Jacob Stopak, a software consultant and developer with a passion for helping others improve their lives through code. Jacob is the creator of Initial Commit - a site dedicated to helping curious developers learn how their favorite programs are coded. Its featured project helps people learn Git at the code level.


Made With Mu: Kushal’s Colourful Adafruit Adventures

Wed, 2019-05-22 11:45

Friend of Mu, community hero, Tor core team member, Python core developer and programmer extraordinaire Kushal Das has blogged about the fun he’s been having with Adafruit’s Circuit Playground Express board, CircuitPython and Mu.

A fortnight ago was PyCon, the world’s largest gathering of Python developers (around 3,500 of us arrived in Cleveland for more than a week of Python-related tutorials, presentations, keynotes, open spaces, code sprints and social events). Thanks to the enormous generosity of Adafruit and DigiKey, every attendee found a Circuit Playground Express in their swag bag!

Thank you to everyone who helped make this happen!

At the time I was sorely tempted to blog about this amazing gift, but decided to wait until I found evidence of any friend of mine actually using the board. I figured that if a friend had the unprompted motivation to blog or share their experience with the device, then it would be evidence that it was a worthy sponsorship. After all, for every blogger there will be a large number of folks who simply don’t share what they’ve been up to.

Upon his return to India, Kushal (pictured above, sporting a very “Morpheus” look) had lots of fun and, happily, his experiments resulted in a positive educational outcome: now his young daughter (Py) wants to have a go too..!

Kushal explains his experiment like this:

The goal is to guess a color for the next NeoPixel on the board and then press Button A to see if you guessed right or not. Py and I are continuously playing this for the last weeks.

The idea of CircuitPython, where we can connect the device to a computer and start editing code and see the changes live, is super fantastic and straightforward. It takes almost no time to start working on these, the documentation is also unambiguous and with many examples.

You can see his code for the game in the screenshot below:

I hope you agree that such sponsorship and the ease of CircuitPython are (literally) lighting up the lives of folks all over the world and inspiring the coders of the future.


Nathan Piccini Data Science Dojo Blog: Enhance your AI superpowers with Geospatial Visualization

Wed, 2019-05-22 10:00

There is so much to explore when it comes to spatial visualization using Python's Folium library. For problems related to crime mapping, housing prices or travel route optimization, spatial visualization could be the most resourceful tool in getting a glimpse of how the instances are geographically located. This is beneficial as we are getting massive amounts of data from several sources such as cellphones, smartwatches, trackers, etc. In this case, patterns and correlations, which otherwise might go unrecognized, can be extracted visually.

This blog will attempt to show you the potential of spatial visualization using the Folium library with Python. This tutorial will give you insights into the most important visualization tools that are extremely useful while analyzing spatial data.

Introduction to Folium

Folium is an incredible library that allows you to build Leaflet maps. Using latitude and longitude points, Folium can create a map of any location in the world. Furthermore, Folium creates interactive maps that allow you to zoom in and out after the map is rendered.

We’ll get some hands-on practice with building a few maps using the Seattle Real-time Fire 911 calls dataset. This dataset provides Seattle Fire Department 911 dispatches and every instance of this dataset provides information about the address, location, date/time and type of emergency of a particular incident. It's extensive and we’ll limit the dataset to a few emergency types for the purpose of explanation.

Let's Begin

Folium can be downloaded using the following commands.

Using pip:

$ pip install folium

Using conda:

$ conda install -c conda-forge folium

Start by importing the required libraries.

import pandas as pd
import numpy as np
import folium

Let us now create an object named 'seattle_map' which is defined as a folium.Map object. We can add other folium objects on top of the folium.Map to improve the map rendered. The map has been centered to the longitude and latitude points in the location parameters. The zoom parameter sets the magnification level for the map that's going to be rendered. Moreover, we have also set the tiles parameter to 'OpenStreetMap' which is the default tile for this parameter. You can explore more tiles such as StamenTerrain or Mapbox Control in Folium's documentation.

seattle_map = folium.Map(
    location = [47.6062, -122.3321],
    tiles = 'OpenStreetMap',
    zoom_start = 11
)
seattle_map

We can observe the map rendered above. Let's create another map object with a different tile and zoom_level. Through 'Stamen Terrain' tile, we can visualize the terrain data which can be used for several important applications.

We've also inserted a folium.Marker to our 'seattle_map2' map object below. The marker can be placed to any location specified in the square brackets. The string mentioned in the popup parameter will be displayed once the marker is clicked as shown below.

seattle_map2 = folium.Map(
    location = [47.6062, -122.3321],
    tiles = 'Stamen Terrain',
    zoom_start = 10
)

# inserting marker
folium.Marker(
    [47.6740, -122.1215],
    popup = 'Redmond'
).add_to(seattle_map2)
seattle_map2

We want to use the Seattle 911 calls dataset to visualize only the 911 calls made in 2019. We are also limiting the emergency types to 3 specific emergencies that took place during this time.

We will now import our dataset which is available through this link (in CSV format). The dataset is huge, therefore, we’ll only import the first 10,000 rows using pandas read_csv method. We'll use the head method to display the first 5 rows.

(This process will take some time because the data-set is huge. Alternatively, you can download it to your local machine and then insert the file path below)

path = "https://data.seattle.gov/api/views/kzjm-xkqj/rows.csv?accessType=DOWNLOAD"
seattle911 = pd.read_csv(path, nrows = 10000)
seattle911.head()

Using the code below, we'll convert the datatype of our Datetime variable to Date-time format and extract the year, removing all other instances that occurred before 2019.

seattle911['Datetime'] = pd.to_datetime(seattle911['Datetime'], format='%m/%d/%Y %H:%M', utc=True)
seattle911['Year'] = pd.DatetimeIndex(seattle911['Datetime']).year
seattle911 = seattle911[seattle911.Year == 2019]

We'll now limit the Emergency type to 'Aid Response Yellow', 'Auto Fire Alarm' and 'MVI - Motor Vehicle Incident'. The remaining instances will be removed from the 'seattle911' dataframe.

seattle911 = seattle911[seattle911.Type.isin(['Aid Response Yellow', 'Auto Fire Alarm', 'MVI - Motor Vehicle Incident'])]

We'll remove any instance that has a missing longitude or latitude coordinate. Without these values, the particular instance cannot be visualized and will cause an error while rendering.

# drop rows with missing latitude/longitude values
seattle911.dropna(subset = ['Longitude', 'Latitude'], inplace = True)
seattle911.head()

Now let's move on to the most interesting part. We'll map all the instances onto the map object we created above, 'seattle_map'. Using the code below, we'll loop over all our instances up to the length of the dataframe. Following this, we will create a folium.CircleMarker (which is similar to the folium.Marker we added above). We'll assign the latitude and longitude coordinates to the location parameter for each instance. The radius of the circle has been assigned to 3, whereas the popup will display the address of the particular instance.

As you can notice, the color of the circle depends on the emergency type. We will now render our map.

for i in range(len(seattle911)):
    folium.CircleMarker(
        location = [seattle911.Latitude.iloc[i], seattle911.Longitude.iloc[i]],
        radius = 3,
        popup = seattle911.Address.iloc[i],
        color = '#3186cc' if seattle911.Type.iloc[i] == 'Aid Response Yellow' else
                '#6ccc31' if seattle911.Type.iloc[i] == 'Auto Fire Alarm' else '#ac31cc',
    ).add_to(seattle_map)

seattle_map

Voila! The map above gives us insight into where and what kind of emergencies took place across Seattle during 2019. This can be extremely helpful for the local government to more efficiently place its emergency combating resources.

Advanced Features Provided by Folium

Let us now move towards slightly advanced features provided by Folium. For this, we will use the National Obesity by State dataset which is also hosted on data.gov. There are 2 types of files we'll be using, a csv file containing the list of all states and the percentage of obesity in each state, and a geojson file (based on JSON) that contains geographical features in form of polygons.

Before using our dataset, we'll create a new folium.map object with location parameters including coordinates to center the US on the map, whereas, we've set the 'zoom_start' level to 4 to visualize all the states.

usa_map = folium.Map(
    location = [37.0902, -95.7129],
    tiles = 'Mapbox Bright',
    zoom_start = 4
)
usa_map

We will assign the URLs of our datasets to 'obesity_link' and 'state_boundaries' variables, respectively.

obesity_link = 'http://data-lakecountyil.opendata.arcgis.com/datasets/3e0c1eb04e5c48b3be9040b0589d3ccf_8.csv'
state_boundaries = 'http://data-lakecountyil.opendata.arcgis.com/datasets/3e0c1eb04e5c48b3be9040b0589d3ccf_8.geojson'

We will use the 'state_boundaries' file to visualize the boundaries and areas covered by each state on our folium.Map object. This is an overlay on our original map and similarly, we can visualize multiple layers on the same map. This overlay will assist us in creating our choropleth map that is discussed ahead.

folium.GeoJson(state_boundaries).add_to(usa_map)
usa_map

The 'obesity_data' dataframe can be viewed below. It contains 5 variables. However, for the purpose of this demonstration, we are only concerned with the 'NAME' and 'Obesity' attributes.

obesity_data = pd.read_csv(obesity_link)
obesity_data.head()

Choropleth Map

Now comes the most interesting part! Creating a choropleth map. We'll bind the 'obesity_data' data frame with our 'state_boundaries' geojson file. We have assigned both the data files to our variables 'data' and 'geo_data' respectively. The columns parameter indicates which DataFrame columns to use, whereas, the key_on parameter indicates the layer in the GeoJSON on which to key the data.

We have additionally specified several other parameters that will define the color scheme we're going to use. Colors are generated from Color Brewer's sequential palettes.

By default, linear binning is used between the min and the max of the values. Custom binning can be achieved with the bins parameter.
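For instance, here's a minimal sketch of custom binning (reusing the variables defined above; the bin edges are hypothetical values chosen purely for illustration). The full choropleth we actually render next sticks with the default binning:

folium.Choropleth(
    geo_data = state_boundaries,
    data = obesity_data,
    columns = ['NAME', 'Obesity'],
    key_on = 'feature.properties.NAME',
    fill_color = 'YlOrRd',
    bins = [20, 25, 30, 35, 40]  # hypothetical custom bin edges
).add_to(usa_map)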

folium.Choropleth(
    geo_data = state_boundaries,
    name = 'choropleth',
    data = obesity_data,
    columns = ['NAME', 'Obesity'],
    key_on = 'feature.properties.NAME',
    fill_color = 'YlOrRd',
    fill_opacity = 0.9,
    line_opacity = 0.5,
    legend_name = 'Obesity Percentage'
).add_to(usa_map)

folium.LayerControl().add_to(usa_map)
usa_map

Awesome! We've been able to create a choropleth map using a simple set of functions offered by Folium. We can visualize the obesity pattern geographically and uncover patterns not visible before. It also helped us gain clarity about the data, beyond just simplifying the data itself.

You might now feel powerful enough after attaining the skill to visualize spatial data effectively. Go ahead and explore Folium's documentation to discover the incredible capabilities that this open-source library has to offer.

Thanks for reading! If you want more datasets to play with, check out this blog post. It consists of 30 free datasets with questions for you to solve.


Real Python: Python Logging: A Stroll Through the Source Code

Wed, 2019-05-22 10:00

The Python logging package is a lightweight but extensible package for keeping better track of what your own code does. Using it gives you much more flexibility than just littering your code with superfluous print() calls.

However, Python’s logging package can be complicated in certain spots. Handlers, loggers, levels, namespaces, filters: it’s not easy to keep track of all of these pieces and how they interact.

One way to tie up the loose ends in your understanding of logging is to peek under the hood to its CPython source code. The Python code behind logging is concise and modular, and reading through it can help you get that aha moment.

This article is meant to complement the logging HOWTO document as well as Logging in Python, which is a walkthrough on how to use the package.

By the end of this article, you’ll be familiar with the following:

  • logging levels and how they work
  • Thread-safety versus process-safety in logging
  • The design of logging from an OOP perspective
  • Logging in libraries vs applications
  • Best practices and design patterns for using logging

For the most part, we’ll go line-by-line down the core module in Python’s logging package in order to build a picture of how it’s laid out.

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

How to Follow Along

Because the logging source code is central to this article, you can assume that any code block or link is based on a specific commit in the Python 3.7 CPython repository, namely commit d730719. You can find the logging package itself in the Lib/ directory within the CPython source.

Within the logging package, most of the heavy lifting occurs within logging/__init__.py, which is the file you’ll spend the most time on here:

cpython/
│
├── Lib/
│   ├── logging/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   └── handlers.py
│   ├── ...
├── Modules/
├── Include/
...
[truncated]

With that, let’s jump in.

Preliminaries

Before we get to the heavyweight classes, the top hundred lines or so of __init__.py introduce a few subtle but important concepts.

Preliminary #1: A Level Is Just an int!

Objects like logging.INFO or logging.DEBUG can seem a bit opaque. What are these variables internally, and how are they defined?

In fact, the uppercase constants from Python’s logging are just integers, forming an enum-like collection of numerical levels:

CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0

Why not just use the strings "INFO" or "DEBUG"? Levels are int constants to allow for the simple, unambiguous comparison of one level with another. They are given names as well to lend them semantic meaning. Saying that a message has a severity of 50 may not be immediately clear, but saying that it has a level of CRITICAL lets you know that you’ve got a flashing red light somewhere in your program.

Now, technically, you can pass just the str form of a level in some places, such as logger.setLevel("DEBUG"). Internally, this will call _checkLevel(), which ultimately does a dict lookup for the corresponding int:

_nameToLevel = {
    'CRITICAL': CRITICAL,
    'FATAL': FATAL,
    'ERROR': ERROR,
    'WARN': WARNING,
    'WARNING': WARNING,
    'INFO': INFO,
    'DEBUG': DEBUG,
    'NOTSET': NOTSET,
}

def _checkLevel(level):
    if isinstance(level, int):
        rv = level
    elif str(level) == level:
        if level not in _nameToLevel:
            raise ValueError("Unknown level: %r" % level)
        rv = _nameToLevel[level]
    else:
        raise TypeError("Level not an integer or a valid string: %r" % level)
    return rv

Which should you prefer? I’m not too opinionated on this, but it’s notable that the logging docs consistently use the form logging.DEBUG rather than "DEBUG" or 10. Also, passing the str form isn’t an option in Python 2, and some logging methods such as logger.isEnabledFor() will accept only an int, not its str cousin.
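One place these int levels come in handy is guarding an expensive logging call with logger.isEnabledFor(). This is a small sketch of the pattern, not code from the logging source:

import logging

logger = logging.getLogger("app")

def expensive_state_dump():
    # Stand-in for something costly to compute
    return {"answer": 42}

if logger.isEnabledFor(logging.DEBUG):
    # Only build the expensive message when DEBUG is actually enabled
    logger.debug("State dump: %s", expensive_state_dump())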

Preliminary #2: Logging Is Thread-Safe, but Not Process-Safe

A few lines down, you’ll find the following short code block, which is sneakily critical to the whole package:

import threading

_lock = threading.RLock()

def _acquireLock():
    if _lock:
        _lock.acquire()

def _releaseLock():
    if _lock:
        _lock.release()

The _lock object is a reentrant lock that sits in the global namespace of the logging/__init__.py module. It makes pretty much every object and operation in the entire logging package thread-safe, enabling threads to do read and write operations without the threat of a race condition. You can see in the module source code that _acquireLock() and _releaseLock() are ubiquitous to the module and its classes.

There’s something not accounted for here, though: what about process safety? The short answer is that the logging module is not process safe. This isn’t inherently a fault of logging—generally, two processes can’t write to the same file without a lot of proactive effort on behalf of the programmer first.

This means that you’ll want to be careful before using classes such as a logging.FileHandler with multiprocessing involved. If two processes want to read from and write to the same underlying file concurrently, then you can run into a nasty bug halfway through a long-running routine.

If you want to get around this limitation, there’s a thorough recipe in the official Logging Cookbook. Because this entails a decent amount of setup, one alternative is to have each process log to a separate file based on its process ID, which you can grab with os.getpid().
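Here’s a rough sketch of that per-process alternative; the path pattern and logger name are my own choices rather than anything prescribed by logging:

import logging
import os

def get_process_logger():
    # One log file per process, named after the process ID
    logger = logging.getLogger(f"worker.{os.getpid()}")
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(f"/tmp/worker-{os.getpid()}.log"))
    return logger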

Package Architecture: Logging’s MRO

Now that we’ve covered some preliminary setup code, let’s take a high-level look at how logging is laid out. The logging package uses a healthy dose of OOP and inheritance. Here’s a partial look at the method resolution order (MRO) for some of the most important classes in the package:

object
│
├── LogRecord
├── Filterer
│   ├── Logger
│   │   └── RootLogger
│   └── Handler
│       ├── StreamHandler
│       └── NullHandler
├── Filter
└── Manager

The tree diagram above doesn’t cover all of the classes in the module, just those that are most worth highlighting.

Note: You can use the dunder attribute logging.StreamHandler.__mro__ to see the chain of inheritance. A definitive guide to the MRO can be found in the Python 2 docs, though it is applicable to Python 3 as well.

This litany of classes is typically one source of confusion because there’s a lot going on, and it’s all jargon-heavy. Filter versus Filterer? Logger versus Handler? It can be challenging to keep track of everything, much less visualize how it fits together. A picture is worth a thousand words, so here’s a diagram of a scenario where one logger with two handlers attached to it writes a log message with level logging.INFO:

Flow of logging objects (Image: Real Python)

In Python code, everything above would look like this:

import logging
import sys

logger = logging.getLogger("pylog")
logger.setLevel(logging.DEBUG)

h1 = logging.FileHandler(filename="/tmp/records.log")
h1.setLevel(logging.INFO)
h2 = logging.StreamHandler(sys.stderr)
h2.setLevel(logging.ERROR)

logger.addHandler(h1)
logger.addHandler(h2)
logger.info("testing %d.. %d.. %d..", 1, 2, 3)

There’s a more detailed map of this flow in the Logging HOWTO. What’s shown above is a simplified scenario.

Your code defines just one Logger instance, logger, along with two Handler instances, h1 and h2.

When you call logger.info("testing %d.. %d.. %d..", 1, 2, 3), the logger object serves as a filter because it also has a level associated with it. Only if the message level is severe enough will the logger do anything with the message. Because the logger has level DEBUG, and the message carries a higher INFO level, it gets the go-ahead to move on.

Internally, logger calls logger.makeRecord() to put the message string "testing %d.. %d.. %d.." and its arguments (1, 2, 3) into a bona fide class instance of a LogRecord, which is just a container for the message and its metadata.

The logger object looks around for its handlers (instances of Handler), which may be tied directly to logger itself or to its parents (a concept that we’ll touch on later). In this example, it finds two handlers:

  1. One with level INFO that dumps log data to a file at /tmp/records.log
  2. One that writes to sys.stderr but only if the incoming message is at level ERROR or higher

At this point, there’s another round of tests that kicks in. Because the LogRecord and its message only carry level INFO, the record gets written to Handler 1 (green arrow), but not to Handler 2’s stderr stream (red arrow). For Handlers, writing the LogRecord to their stream is called emitting it, which is captured in their .emit().

Next, let’s further dissect everything from above.

The LogRecord Class

What is a LogRecord? When you log a message, an instance of the LogRecord class is the object you send to be logged. It’s created for you by a Logger instance and encapsulates all the pertinent info about that event. Internally, it’s little more than a wrapper around a dict that contains attributes for the record. A Logger instance sends a LogRecord instance to zero or more Handler instances.

The LogRecord contains some metadata, such as the following:

  1. A name
  2. The creation time as a Unix timestamp
  3. The message itself
  4. Information on what function made the logging call

Here’s a peek into the metadata that it carries with it, which you can introspect by stepping through a logging.error() call with the pdb module:

>>> import logging
>>> import pdb
>>> def f(x):
...     logging.error("bad vibes")
...     return x / 0
...
>>> pdb.run("f(1)")

After stepping through some higher-level functions, you end up at line 1517:

(Pdb) l
1514                     exc_info = (type(exc_info), exc_info, exc_info.__traceback__)
1515                 elif not isinstance(exc_info, tuple):
1516                     exc_info = sys.exc_info()
1517             record = self.makeRecord(self.name, level, fn, lno, msg, args,
1518                                      exc_info, func, extra, sinfo)
1519 ->          self.handle(record)
1520
1521         def handle(self, record):
1522             """
1523             Call the handlers for the specified record.
1524
(Pdb) from pprint import pprint
(Pdb) pprint(vars(record))
{'args': (),
 'created': 1550671851.660067,
 'exc_info': None,
 'exc_text': None,
 'filename': '<stdin>',
 'funcName': 'f',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 2,
 'module': '<stdin>',
 'msecs': 660.067081451416,
 'msg': 'bad vibes',
 'name': 'root',
 'pathname': '<stdin>',
 'process': 2360,
 'processName': 'MainProcess',
 'relativeCreated': 295145.5490589142,
 'stack_info': None,
 'thread': 4372293056,
 'threadName': 'MainThread'}

A LogRecord, internally, contains a trove of metadata that’s used in one way or another.

You’ll rarely need to deal with a LogRecord directly, since the Logger and Handler do this for you. It’s still worthwhile to know what information is wrapped up in a LogRecord, because this is where all that useful info, like the timestamp, comes from when you look at logged messages.

Note: Below the LogRecord class, you’ll also find the setLogRecordFactory(), getLogRecordFactory(), and makeLogRecord() factory functions. You won’t need these unless you want to use a custom class instead of LogRecord to encapsulate log messages and their metadata.
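If you do go down that road, the usual approach (sketched here under the assumption that you only want to bolt an extra attribute onto every record) is to wrap the existing factory rather than subclass LogRecord yourself:

import logging

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.app_name = "my-app"  # hypothetical extra attribute
    return record

logging.setLogRecordFactory(record_factory)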

The Logger and Handler Classes

The Logger and Handler classes are both central to how logging works, and they interact with each other frequently. A Logger, a Handler, and a LogRecord each have a .level associated with them.

The Logger takes the LogRecord and passes it off to the Handler, but only if the effective level of the LogRecord is equal to or higher than that of the Logger. The same goes for the LogRecord versus Handler test. This is called level-based filtering, which Logger and Handler implement in slightly different ways.

In other words, there is an (at least) two-step test applied before the message that you log gets to go anywhere. In order to be fully passed from a logger to a handler and then logged to the end stream (which could be sys.stdout, a file, or an email via SMTP), a LogRecord must have a level at least as high as both the logger and the handler.

PEP 282 describes how this works:

Each Logger object keeps track of a log level (or threshold) that it is interested in, and discards log requests below that level. (Source)

So where does this level-based filtering actually occur for both Logger and Handler?

For the Logger class, it’s a reasonable first assumption that the logger would compare its .level attribute to the level of the LogRecord, and be done there. However, it’s slightly more involved than that.

Level-based filtering for loggers occurs in .isEnabledFor(), which in turn calls .getEffectiveLevel(). Always use logger.getEffectiveLevel() rather than just consulting logger.level. The reason has to do with the organization of Logger objects in a hierarchical namespace. (You’ll see more on this later.)

By default, a Logger instance has a level of 0 (NOTSET). However, loggers also have parent loggers, one of which is the root logger, which functions as the parent of all other loggers. A Logger will walk upwards in its hierarchy and get its effective level vis-à-vis its parent (which ultimately may be root if no other parents are found).

Here’s where this happens in the Logger class:

class Logger(Filterer):
    # ...
    def getEffectiveLevel(self):
        logger = self
        while logger:
            if logger.level:
                return logger.level
            logger = logger.parent
        return NOTSET

    def isEnabledFor(self, level):
        try:
            return self._cache[level]
        except KeyError:
            _acquireLock()
            if self.manager.disable >= level:
                is_enabled = self._cache[level] = False
            else:
                is_enabled = self._cache[level] = level >= self.getEffectiveLevel()
            _releaseLock()
        return is_enabled

Correspondingly, here’s an example that calls the source code you see above:

>>> import logging
>>> logger = logging.getLogger("app")
>>> logger.level  # No!
0
>>> logger.getEffectiveLevel()
30
>>> logger.parent
<RootLogger root (WARNING)>
>>> logger.parent.level
30

Here’s the takeaway: don’t rely on .level. If you haven’t explicitly set a level on your logger object, and you’re depending on .level for some reason, then your logging setup will likely behave differently than you expected it to.

What about Handler? For handlers, the level-to-level comparison is simpler, though it actually happens in .callHandlers() from the Logger class:

class Logger(Filterer):
    # ...
    def callHandlers(self, record):
        c = self
        found = 0
        while c:
            for hdlr in c.handlers:
                found = found + 1
                if record.levelno >= hdlr.level:
                    hdlr.handle(record)

For a given LogRecord instance (named record in the source code above), a logger checks with each of its registered handlers and does a quick check on the .level attribute of that Handler instance. If the .levelno of the LogRecord is greater than or equal to that of the handler, only then does the record get passed on. A docstring in logging refers to this as “conditionally emit[ting] the specified logging record.”

Handlers and the Places They Go

The most important attribute for a Handler subclass instance is its .stream attribute. This is the final destination where logs get written to and can be pretty much any file-like object. Here’s an example with io.StringIO, which is an in-memory stream (buffer) for text I/O.

First, set up a Logger instance with a level of DEBUG. You’ll see that, by default, it has no direct handlers:

>>> import io
>>> import logging
>>> logger = logging.getLogger("abc")
>>> logger.setLevel(logging.DEBUG)
>>> print(logger.handlers)
[]

Next, you can subclass logging.StreamHandler to make the .flush() call a no-op. We would want to flush sys.stderr or sys.stdout, but not the in-memory buffer in this case:

class IOHandler(logging.StreamHandler):
    def flush(self):
        pass  # No-op

Now, declare the buffer object itself and tie it in as the .stream for your custom handler with a level of INFO, and then tie that handler into the logger:

>>> stream = io.StringIO()
>>> h = IOHandler(stream)
>>> h.setLevel(logging.INFO)
>>> logger.addHandler(h)
>>> logger.debug("extraneous info")
>>> logger.warning("you've been warned")
>>> logger.critical("SOS")
>>> try:
...     print(stream.getvalue())
... finally:
...     stream.close()
...
you've been warned
SOS

This last chunk is another illustration of level-based filtering.

Three messages with levels DEBUG, WARNING, and CRITICAL are passed through the chain. At first, it may look as if they don’t go anywhere, but two of them do. All three of them make it out of the gates from logger (which has level DEBUG).

However, only two of them get emitted by the handler because it has a higher level of INFO, which exceeds DEBUG. Finally, you get the entire contents of the buffer as a str and close the buffer to explicitly free up system resources.

The Filter and Filterer Classes

Above, we asked the question, “Where does level-based filtering happen?” In answering this question, it’s easy to get distracted by the Filter and Filterer classes. Paradoxically, level-based filtering for Logger and Handler instances occurs without the help of either of the Filter or Filterer classes.

Filter and Filterer are designed to let you add additional function-based filters on top of the level-based filtering that is done by default. I like to think of it as à la carte filtering.

Filterer is the base class for Logger and Handler because both of these classes are eligible for receiving additional custom filters that you specify. You add instances of Filter to them with logger.addFilter() or handler.addFilter(), which is what self.filters refers to in the following method:

class Filterer(object):
    # ...
    def filter(self, record):
        rv = True
        for f in self.filters:
            if hasattr(f, 'filter'):
                result = f.filter(record)
            else:
                result = f(record)
            if not result:
                rv = False
                break
        return rv

Given a record (which is a LogRecord instance), .filter() returns True or False depending on whether that record gets the okay from this class’s filters.

Here is .handle() in turn, for the Logger and Handler classes:

class Logger(Filterer):
    # ...
    def handle(self, record):
        if (not self.disabled) and self.filter(record):
            self.callHandlers(record)

# ...

class Handler(Filterer):
    # ...
    def handle(self, record):
        rv = self.filter(record)
        if rv:
            self.acquire()
            try:
                self.emit(record)
            finally:
                self.release()
        return rv

Neither Logger nor Handler come with any additional filters by default, but here’s a quick example of how you could add one:

>>> import logging
>>> logger = logging.getLogger("rp")
>>> logger.setLevel(logging.INFO)
>>> logger.addHandler(logging.StreamHandler())
>>> logger.filters  # Initially empty
[]
>>> class ShortMsgFilter(logging.Filter):
...     """Only allow records that contain long messages (> 25 chars)."""
...     def filter(self, record):
...         msg = record.msg
...         if isinstance(msg, str):
...             return len(msg) > 25
...         return False
...
>>> logger.addFilter(ShortMsgFilter())
>>> logger.filters
[<__main__.ShortMsgFilter object at 0x10c28b208>]
>>> logger.info("Reeeeaaaaallllllly long message")  # Length: 31
Reeeeaaaaallllllly long message
>>> logger.info("Done")  # Length: <25, no output

Above, you define a class ShortMsgFilter and override its .filter(). In .addFilter(), you could also just pass a callable, such as a function, a lambda, or a class that defines .__call__().
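Continuing the session above, here’s a quick sketch of the same length-based filter expressed as a plain function instead of a Filter subclass:

>>> def long_messages_only(record):
...     # Same idea as ShortMsgFilter, but as a plain callable
...     return isinstance(record.msg, str) and len(record.msg) > 25
...
>>> logger.addFilter(long_messages_only)
>>> logger.info("Short")  # Filtered out, no output
>>> logger.info("This message is definitely long enough")
This message is definitely long enough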

The Manager Class

There’s one more behind-the-scenes actor of logging that is worth touching on: the Manager class. What matters most is not the Manager class but a single instance of it that acts as a container for the growing hierarchy of loggers that are defined across packages. You’ll see in the next section how just a single instance of this class is central to gluing the module together and allowing its parts to talk to each other.

The All-Important Root Logger

When it comes to Logger instances, one stands out. It’s called the root logger:

class RootLogger(Logger):
    def __init__(self, level):
        Logger.__init__(self, "root", level)

# ...

root = RootLogger(WARNING)
Logger.root = root
Logger.manager = Manager(Logger.root)

The last three lines of this code block are one of the ingenious tricks employed by the logging package. Here are a few points:

  • The root logger is just a no-frills Python object with the identifier root. It has a level of logging.WARNING and a .name of "root". As far as the class RootLogger is concerned, this unique name is all that’s special about it.

  • The root object in turn becomes a class attribute for the Logger class. This means that all instances of Logger, and the Logger class itself, all have a .root attribute that is the root logger. This is another example of a singleton-like pattern being enforced in the logging package.

  • A Manager instance is set as the .manager class attribute for Logger. This eventually comes into play in logging.getLogger("name"). The .manager does all the facilitation of searching for existing loggers with the name "name" and creating them if they don’t exist.

The Logger Hierarchy

Everything is a child of root in the logger namespace, and I mean everything. That includes loggers that you specify yourself as well as those from third-party libraries that you import.

Remember earlier how the .getEffectiveLevel() for our logger instances was 30 (WARNING) even though we had not explicitly set it? That’s because the root logger sits at the top of the hierarchy, and its level is a fallback if any nested loggers have a null level of NOTSET:

>>> root = logging.getLogger()  # Or getLogger("")
>>> root
<RootLogger root (WARNING)>
>>> root.parent is None
True
>>> root.root is root  # Self-referential
True
>>> root is logging.root
True
>>> root.getEffectiveLevel()
30

The same logic applies to the search for a logger’s handlers. The search is effectively a reverse-order search up the tree of a logger’s parents.

A Multi-Handler Design

The logger hierarchy may seem neat in theory, but how beneficial is it in practice?

Let’s take a break from exploring the logging code and foray into writing our own mini-application—one that takes advantage of the logger hierarchy in a way that reduces boilerplate code and keeps things scalable if the project’s codebase grows.

Here’s the project structure:

project/
│
└── project/
    ├── __init__.py
    ├── utils.py
    └── base.py

Don’t worry about the application’s main functions in utils.py and base.py. What we’re paying more attention to here is the interaction in logging objects between the modules in project/.

In this case, say that you want to design a multipronged logging setup:

  • Each module gets a logger with multiple handlers.

  • Some of the handlers are shared between different logger instances in different modules. These handlers only care about level-based filtering, not the module where the log record emanated from. There is a handler for DEBUG messages, one for INFO, one for WARNING, and so on.

  • Each logger is also tied to one more additional handler that only receives LogRecord instances from that lone logger. You can call this a module-based file handler.

Visually, what we’re shooting for would look something like this:

A multipronged logging design (Image: Real Python)

The two turquoise objects are instances of Logger, established with logging.getLogger(__name__) for each module in a package. Everything else is a Handler instance.

The thinking behind this design is that it’s neatly compartmentalized. You can conveniently look at messages coming from a single logger, or look at messages of a certain level and above coming from any logger or module.

The properties of the logger hierarchy make it suitable for setting up this multipronged logger-handler layout. What does that mean? Here’s a concise explanation from the Django documentation:

Why is the hierarchy important? Well, because loggers can be set to propagate their logging calls to their parents. In this way, you can define a single set of handlers at the root of a logger tree, and capture all logging calls in the subtree of loggers. A logging handler defined in the project namespace will catch all logging messages issued on the project.interesting and project.interesting.stuff loggers. (Source)

The term propagate refers to how a logger keeps walking up its chain of parents looking for handlers. The .propagate attribute is True for a Logger instance by default:

>>> logger = logging.getLogger(__name__)
>>> logger.propagate
True

In .callHandlers(), if propagate is True, each successive parent gets reassigned to the local variable c until the hierarchy is exhausted:

class Logger(Filterer):
    # ...
    def callHandlers(self, record):
        c = self
        found = 0
        while c:
            for hdlr in c.handlers:
                found = found + 1
                if record.levelno >= hdlr.level:
                    hdlr.handle(record)
            if not c.propagate:
                c = None
            else:
                c = c.parent

Here’s what this means: because the __name__ dunder variable within a package’s __init__.py module is just the name of the package, a logger there becomes a parent to any loggers present in other modules in the same package.

Here are the resulting .name attributes from assigning to logger with logging.getLogger(__name__):

Module                 .name Attribute
project/__init__.py    'project'
project/utils.py       'project.utils'
project/base.py        'project.base'

Because the 'project.utils' and 'project.base' loggers are children of 'project', they will latch onto not only their own direct handlers but whatever handlers are attached to 'project'.

Let’s build out the modules. First comes __init__.py:

# __init__.py
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

levels = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
for level in levels:
    handler = logging.FileHandler(f"/tmp/level-{level.lower()}.log")
    handler.setLevel(getattr(logging, level))
    logger.addHandler(handler)

def add_module_handler(logger, level=logging.DEBUG):
    handler = logging.FileHandler(
        f"/tmp/module-{logger.name.replace('.', '-')}.log"
    )
    handler.setLevel(level)
    logger.addHandler(handler)

This module is imported when the project package is imported. You add a handler for each level in DEBUG through CRITICAL, then attach it to a single logger at the top of the hierarchy.

You also define a utility function that adds one more FileHandler to a logger, where the filename of the handler corresponds to the module name where the logger is defined. (This assumes the logger is defined with __name__.)

You can then add some minimal boilerplate logger setup in base.py and utils.py. Notice that you only need to add one additional handler with add_module_handler() from __init__.py. You don’t need to worry about the level-oriented handlers because they are already added to their parent logger named 'project':

# base.py
import logging

from project import add_module_handler

logger = logging.getLogger(__name__)
add_module_handler(logger)

def func1():
    logger.debug("debug called from base.func1()")
    logger.critical("critical called from base.func1()")

Here’s utils.py:

# utils.py
import logging

from project import add_module_handler

logger = logging.getLogger(__name__)
add_module_handler(logger)

def func2():
    logger.debug("debug called from utils.func2()")
    logger.critical("critical called from utils.func2()")

Let’s see how all of this works together from a fresh Python session:

>>> from pprint import pprint
>>> import project
>>> from project import base, utils
>>> project.logger
<Logger project (DEBUG)>
>>> base.logger, utils.logger
(<Logger project.base (DEBUG)>, <Logger project.utils (DEBUG)>)
>>> base.logger.handlers
[<FileHandler /tmp/module-project-base.log (DEBUG)>]
>>> pprint(base.logger.parent.handlers)
[<FileHandler /tmp/level-debug.log (DEBUG)>,
 <FileHandler /tmp/level-info.log (INFO)>,
 <FileHandler /tmp/level-warning.log (WARNING)>,
 <FileHandler /tmp/level-error.log (ERROR)>,
 <FileHandler /tmp/level-critical.log (CRITICAL)>]
>>> base.func1()
>>> utils.func2()

You’ll see in the resulting log files that our filtration system works as intended. Module-oriented handlers direct one logger to a specific file, while level-oriented handlers direct multiple loggers to a different file:

$ cat /tmp/level-debug.log
debug called from base.func1()
critical called from base.func1()
debug called from utils.func2()
critical called from utils.func2()

$ cat /tmp/level-critical.log
critical called from base.func1()
critical called from utils.func2()

$ cat /tmp/module-project-base.log
debug called from base.func1()
critical called from base.func1()

$ cat /tmp/module-project-utils.log
debug called from utils.func2()
critical called from utils.func2()

A drawback worth mentioning is that this design introduces a lot of redundancy. One LogRecord instance may go to no less than six files. That’s also a non-negligible amount of file I/O that may add up in a performance-critical application.

Now that you’ve seen a practical example, let’s switch gears and delve into a possible source of confusion in logging.

The “Why Didn’t My Log Message Go Anywhere?” Dilemma

There are two common situations with logging when it’s easy to get tripped up:

  1. You logged a message that seemingly went nowhere, and you’re not sure why.
  2. Instead of being suppressed, a log message appeared in a place that you didn’t expect it to.

Each of these has a reason or two commonly associated with it.

You logged a message that seemingly went nowhere, and you’re not sure why.

Don’t forget that the effective level of a logger for which you don’t otherwise set a custom level is WARNING, because a logger will walk up its hierarchy until it finds the root logger with its own WARNING level:

>>> import logging
>>> logger = logging.getLogger("xyz")
>>> logger.debug("mind numbing info here")
>>> logger.critical("storm is coming")
storm is coming

Because of this default, the .debug() call goes nowhere.

Instead of being suppressed, a log message appeared in a place that you didn’t expect it to.

When you defined your logger above, you didn’t add any handlers to it. So, why is it writing to the console?

The reason for this is that logging sneakily uses a handler called lastResort that writes to sys.stderr if no other handlers are found:

class _StderrHandler(StreamHandler):
    # ...
    @property
    def stream(self):
        return sys.stderr

_defaultLastResort = _StderrHandler(WARNING)
lastResort = _defaultLastResort

This kicks in when a logger goes to find its handlers:

class Logger(Filterer):
    # ...
    def callHandlers(self, record):
        c = self
        found = 0
        while c:
            for hdlr in c.handlers:
                found = found + 1
                if record.levelno >= hdlr.level:
                    hdlr.handle(record)
            if not c.propagate:
                c = None
            else:
                c = c.parent
        if (found == 0):
            if lastResort:
                if record.levelno >= lastResort.level:
                    lastResort.handle(record)

If the logger gives up on its search for handlers (both its own direct handlers and attributes of parent loggers), then it picks up the lastResort handler and uses that.

There’s one more subtle detail worth knowing about. This section has largely talked about the instance methods (methods that a class defines) rather than the module-level functions of the logging package that carry the same name.

If you use the functions, such as logging.info() rather than logger.info(), then something slightly different happens internally. The function calls logging.basicConfig(), which adds a StreamHandler that writes to sys.stderr. In the end, the behavior is virtually the same:

>>> import logging
>>> root = logging.getLogger("")
>>> root.handlers
[]
>>> root.hasHandlers()
False
>>> logging.basicConfig()
>>> root.handlers
[<StreamHandler <stderr> (NOTSET)>]
>>> root.hasHandlers()
True

Taking Advantage of Lazy Formatting

It’s time to switch gears and take a closer look at how messages themselves are joined with their data. While it’s been supplanted by str.format() and f-strings, you’ve probably used Python’s percent-style formatting to do something like this:

>>> print("To iterate is %s, to recurse %s" % ("human", "divine"))
To iterate is human, to recurse divine

As a result, you may be tempted to do the same thing in a logging call:

>>> # Bad! Check out a more efficient alternative below.
>>> logging.warning("To iterate is %s, to recurse %s" % ("human", "divine"))
WARNING:root:To iterate is human, to recurse divine

This uses the entire format string and its arguments as the msg argument to logging.warning().

Here is the recommended alternative, straight from the logging docs:

>>> # Better: formatting doesn't occur until it really needs to.
>>> logging.warning("To iterate is %s, to recurse %s", "human", "divine")
WARNING:root:To iterate is human, to recurse divine

It looks a little weird, right? This seems to defy the conventions of how percent-style string formatting works, but it’s a more efficient function call because the format string gets formatted lazily rather than greedily. Here’s what that means.

The method signature for Logger.warning() looks like this:

def warning(self, msg, *args, **kwargs)

The same applies to the other methods, such as .debug(). When you call warning("To iterate is %s, to recurse %s", "human", "divine"), both "human" and "divine" get caught as *args and, within the scope of the method’s body, args is equal to ("human", "divine").

Contrast this to the first call above:

logging.warning("To iterate is %s, to recurse %s" % ("human", "divine"))

In this form, everything in the parentheses gets immediately merged together into "To iterate is human, to recurse divine" and passed as msg, while args is an empty tuple.

Why does this matter? Repeated logging calls can degrade runtime performance slightly, but the logging package does its very best to control that and keep it in check. By not merging the format string with its arguments right away, logging is delaying the string formatting until the LogRecord is requested by a Handler.

This happens in LogRecord.getMessage(), so only after logging deems that the LogRecord will actually be passed to a handler does it become its fully merged self.

All that is to say that the logging package makes some very fine-tuned performance optimizations in the right places. This may seem like minutia, but if you’re making the same logging.debug() call a million times inside a loop, and the args are function calls, then the lazy nature of how logging does string formatting can make a difference.

Before doing any merging of msg and args, a Logger instance will check its .isEnabledFor() to see if that merging should be done in the first place.
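You can use the same check in your own code when even building the arguments is expensive. Here's a small sketch; expensive_summary() is a hypothetical stand-in for a costly call:

import logging

logger = logging.getLogger(__name__)

def expensive_summary():
    # Hypothetical placeholder for a slow computation
    return "..."

# Only build the argument when DEBUG records would actually be processed
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state dump: %s", expensive_summary())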

Functions vs Methods

Towards the bottom of logging/__init__.py sit the module-level functions that are advertised up front in the public API of logging. You already saw the Logger methods such as .debug(), .info(), and .warning(). The top-level functions are wrappers around the corresponding methods of the same name, but they have two important features:

  1. They always call their corresponding method from the root logger, root.

  2. Before calling the root logger methods, they call logging.basicConfig() with no arguments if root doesn’t have any handlers. As you saw earlier, it is this call that sets a sys.stderr handler for the root logger.

For illustration, here’s logging.error():

def error(msg, *args, **kwargs):
    if len(root.handlers) == 0:
        basicConfig()
    root.error(msg, *args, **kwargs)

You’ll find the same pattern for logging.debug(), logging.info(), and the others as well. Tracing the chain of commands is interesting. Eventually, you’ll end up at the same place, which is where the internal Logger._log() is called.

The calls to debug(), info(), warning(), and the other level-based functions all route to here. _log() primarily has two purposes:

  1. Call self.makeRecord(): Make a LogRecord instance from the msg and other arguments you pass to it.

  2. Call self.handle(): This determines what actually gets done with the record. Where does it get sent? Does it make it there or get filtered out?

Here’s that entire process in one diagram:

Internals of a logging call (Image: Real Python)

You can also trace the call stack with pdb.

Tracing the Call to logging.warning()

>>> import logging
>>> import pdb
>>> pdb.run('logging.warning("%s-%s", "uh", "oh")')
> <string>(1)<module>()
(Pdb) s
--Call--
> lib/python3.7/logging/__init__.py(1971)warning()
-> def warning(msg, *args, **kwargs):
(Pdb) s
> lib/python3.7/logging/__init__.py(1977)warning()
-> if len(root.handlers) == 0:
(Pdb) unt
> lib/python3.7/logging/__init__.py(1978)warning()
-> basicConfig()
(Pdb) unt
> lib/python3.7/logging/__init__.py(1979)warning()
-> root.warning(msg, *args, **kwargs)
(Pdb) s
--Call--
> lib/python3.7/logging/__init__.py(1385)warning()
-> def warning(self, msg, *args, **kwargs):
(Pdb) l
1380            logger.info("Houston, we have a %s", "interesting problem", exc_info=1)
1381            """
1382            if self.isEnabledFor(INFO):
1383                self._log(INFO, msg, args, **kwargs)
1384
1385 ->     def warning(self, msg, *args, **kwargs):
1386            """
1387            Log 'msg % args' with severity 'WARNING'.
1388
1389            To pass exception information, use the keyword argument exc_info with
1390            a true value, e.g.
(Pdb) s
> lib/python3.7/logging/__init__.py(1394)warning()
-> if self.isEnabledFor(WARNING):
(Pdb) unt
> lib/python3.7/logging/__init__.py(1395)warning()
-> self._log(WARNING, msg, args, **kwargs)
(Pdb) s
--Call--
> lib/python3.7/logging/__init__.py(1496)_log()
-> def _log(self, level, msg, args, exc_info=None, extra=None, stack_info=False):
(Pdb) s
> lib/python3.7/logging/__init__.py(1501)_log()
-> sinfo = None
(Pdb) unt 1517
> lib/python3.7/logging/__init__.py(1517)_log()
-> record = self.makeRecord(self.name, level, fn, lno, msg, args,
(Pdb) s
> lib/python3.7/logging/__init__.py(1518)_log()
-> exc_info, func, extra, sinfo)
(Pdb) s
--Call--
> lib/python3.7/logging/__init__.py(1481)makeRecord()
-> def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
(Pdb) p name
'root'
(Pdb) p level
30
(Pdb) p msg
'%s-%s'
(Pdb) p args
('uh', 'oh')
(Pdb) up
> lib/python3.7/logging/__init__.py(1518)_log()
-> exc_info, func, extra, sinfo)
(Pdb) unt
> lib/python3.7/logging/__init__.py(1519)_log()
-> self.handle(record)
(Pdb) n
WARNING:root:uh-oh

What Does getLogger() Really Do?

Also hiding in this section of the source code is the top-level getLogger(), which wraps Logger.manager.getLogger():

def getLogger(name=None):
    if name:
        return Logger.manager.getLogger(name)
    else:
        return root

This is the entry point for enforcing the singleton logger design:

  • If you specify a name, then the underlying .getLogger() does a dict lookup on the string name. What this comes down to is a lookup in the loggerDict of logging.Manager. This is a dictionary of all registered loggers, including the intermediate PlaceHolder instances that are generated when you reference a logger far down in the hierarchy before referencing its parents.

  • Otherwise, root is returned. There is only one root—the instance of RootLogger discussed above.

This feature is what lies behind a trick that can let you peek into all of the registered loggers:

>>> import logging
>>> logging.Logger.manager.loggerDict
{}

>>> from pprint import pprint
>>> import asyncio
>>> pprint(logging.Logger.manager.loggerDict)
{'asyncio': <Logger asyncio (WARNING)>,
 'concurrent': <logging.PlaceHolder object at 0x10d153710>,
 'concurrent.futures': <Logger concurrent.futures (WARNING)>}

Whoa, hold on a minute. What’s happening here? It looks like something changed internally to the logging package as a result of an import of another library, and that’s exactly what happened.

Firstly, recall that Logger.manager is a class attribute, where an instance of Manager is tacked onto the Logger class. The manager is designed to track and manage all of the singleton instances of Logger. These are housed in .loggerDict.

Now, when you initially import logging, this dictionary is empty. But after you import asyncio, the same dictionary gets populated with three loggers. This is an example of one module setting the attributes of another module in-place. Sure enough, inside of asyncio/log.py, you’ll find the following:

import logging

logger = logging.getLogger(__package__)   # "asyncio"

The key-value pair is set in Logger.manager.getLogger() so that the manager can oversee the entire namespace of loggers. This means that the object asyncio.log.logger gets registered in the logger dictionary that belongs to the logging package. Something similar happens in the concurrent.futures package as well, which is imported by asyncio.

You can see the power of the singleton design in an equivalence test:

>>> obj1 = logging.getLogger("asyncio")
>>> obj2 = logging.Logger.manager.loggerDict["asyncio"]
>>> obj1 is obj2
True

This comparison illustrates (glossing over a few details) what getLogger() ultimately does.

Library vs Application Logging: What Is NullHandler?

That brings us to the final hundred or so lines in the logging/__init__.py source, where NullHandler is defined. Here’s the definition in all its glory:

class NullHandler(Handler):
    def handle(self, record):
        pass

    def emit(self, record):
        pass

    def createLock(self):
        self.lock = None

The NullHandler is all about the distinctions between logging in a library versus an application. Let’s see what that means.

A library is an extensible, generalizable Python package that is intended for other users to install and set up. It is built by a developer with the express purpose of being distributed to users. Examples include popular open-source projects like NumPy, dateutil, and cryptography.

An application (or app, or program) is designed for a more specific purpose and a much smaller set of users (possibly just one user). It’s a program or set of programs highly tailored by the user to do a limited set of things. An example of an application is a Django app that sits behind a web page. Applications commonly use (import) libraries and the tools they contain.

When it comes to logging, there are different best practices in a library versus an app.

That’s where NullHandler fits in. It’s basically a do-nothing stub class.

If you’re writing a Python library, you really need to do this one minimalist piece of setup in your package’s __init__.py:

# Place this in your library's uppermost `__init__.py`
# Nothing else!

import logging

logging.getLogger(__name__).addHandler(logging.NullHandler())

This serves two critical purposes.

Firstly, a library logger that is declared with logger = logging.getLogger(__name__) (without any further configuration) will log to sys.stderr by default, even if that’s not what the end user wants. This could be described as an opt-out approach, where the end user of the library has to go in and disable logging to their console if they don’t want it.

Common wisdom says to use an opt-in approach instead: don’t emit any log messages by default, and let the end users of the library determine if they want to further configure the library’s loggers and add handlers to them. Here’s that philosophy worded more bluntly by the author of the logging package, Vinay Sajip:

A third party library which uses logging should not spew logging output by default which may not be wanted by a developer/user of an application which uses it. (Source)

This leaves it up to the library user, not library developer, to incrementally call methods such as logger.addHandler() or logger.setLevel().

The second reason that NullHandler exists is more archaic. In versions of Python before 2.7, trying to log a LogRecord from a logger that has no handler set would emit a one-off warning. Adding the no-op NullHandler averts this.

Here’s what specifically happens in the line logging.getLogger(__name__).addHandler(NullHandler()) from above:

  1. Python gets (creates) the Logger instance with the same name as your package. If you’re designing the calculus package, within __init__.py, then __name__ will be equal to 'calculus'.

  2. A NullHandler instance gets attached to this logger. That means that Python will not default to using the lastResort handler.

Keep in mind that any logger created in any of the other .py modules of the package will be children of this logger in the logger hierarchy and that, because this handler also belongs to them, they won’t need to use the lastResort handler and won’t default to logging to standard error (stderr).

As a quick example, let’s say your library has the following structure:

calculus/
│
├── __init__.py
└── integration.py

In integration.py, as the library developer you are free to do the following:

# calculus/integration.py
import logging

logger = logging.getLogger(__name__)

def func(x):
    logger.warning("Look!")
    # Do stuff
    return None

Now, a user comes along and installs your library from PyPI via pip install calculus. They use from calculus.integration import func in some application code. This user is free to manipulate and configure the logger object from the library like any other Python object, to their heart’s content.
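As a quick sketch of what that might look like on the application side (the handler, level, and file path here are illustrative choices, not something the library dictates):

# Application code, not library code: opt in to the calculus package's logs
import logging

lib_logger = logging.getLogger("calculus")   # the same singleton the library uses
lib_logger.setLevel(logging.DEBUG)
lib_logger.addHandler(logging.FileHandler("/tmp/calculus.log"))   # example destination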

What Logging Does With Exceptions

One thing that you may be wary of is the danger of exceptions that stem from your calls to logging. If you have a logging.error() call that is designed to give you some more verbose debugging information, but that call itself for some reason raises an exception, that would be the height of irony, right?

Cleverly, if the logging package encounters an exception that has to do with logging itself, then it will print the traceback, but not raise the exception itself.

Here’s an example that deals with a common typo: passing two arguments to a format string that is only expecting one argument. The important distinction is that what you see below is not an exception being raised, but rather a prettified printed traceback of the internal exception, which itself was suppressed:

>>> logging.critical("This %s has too many arguments", "msg", "other")
--- Logging error ---
Traceback (most recent call last):
  File "lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "<stdin>", line 1, in <module>
Message: 'This %s has too many arguments'
Arguments: ('msg', 'other')

This lets your program gracefully carry on with its actual program flow. The rationale is that you wouldn’t want an uncaught exception to come from a logging call itself and stop a program dead in its tracks.

Tracebacks can be messy, but this one is informative and relatively straightforward. What enables the suppression of exceptions related to logging is Handler.handleError(). When the handler calls .emit(), which is the method where it attempts to log the record, it falls back to .handleError() if something goes awry. Here’s the implementation of .emit() for the StreamHandler class:

def emit(self, record):
    try:
        msg = self.format(record)
        stream = self.stream
        stream.write(msg + self.terminator)
        self.flush()
    except Exception:
        self.handleError(record)

Any exception related to the formatting and writing gets caught rather than being raised, and handleError gracefully writes the traceback to sys.stderr.
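If you want to silence even that printed traceback, the module-level flag logging.raiseExceptions controls whether .handleError() reports anything at all. A short sketch:

import logging

# True (the default) means logging-related errors get their traceback printed.
# False makes logging swallow those errors silently, which is sometimes done
# in production deployments.
logging.raiseExceptions = False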

Logging Python Tracebacks

Speaking of exceptions and their tracebacks, what about cases where your program encounters them but should log the exception and keep chugging along in its execution?

Let’s walk through a couple of ways to do this.

Here’s a contrived example of a lottery simulator using code that isn’t Pythonic on purpose. You’re developing an online lottery game where users can wager on their lucky number:

import random

class Lottery(object):
    def __init__(self, n):
        self.n = n

    def make_tickets(self):
        for i in range(self.n):
            yield i

    def draw(self):
        pool = self.make_tickets()
        random.shuffle(pool)
        return next(pool)

Behind the frontend application sits the critical code below. You want to make sure that you keep track of any errors caused by the site that may make a user lose their money. The first (suboptimal) way is to use logging.error() and log the str form of the exception instance itself:

try:
    lucky_number = int(input("Enter your ticket number: "))
    drawn = Lottery(n=20).draw()
    if lucky_number == drawn:
        print("Winner chicken dinner!")
except Exception as e:
    # NOTE: See below for a better way to do this.
    logging.error("Could not draw ticket: %s", e)

This will only get you the actual exception message, rather than the traceback. You check the logs on your website’s server and find this cryptic message:

ERROR:root:Could not draw ticket: object of type 'generator' has no len()

Hmm. As the application developer, you’ve got a serious problem, and a user got ripped off as a result. But maybe this exception message itself isn’t very informative. Wouldn’t it be nice to see the lineage of the traceback that led to this exception?

The proper solution is to use logging.exception(), which logs a message with level ERROR and also displays the exception traceback. Replace the two final lines above with these:

except Exception:
    logging.exception("Could not draw ticket")

Now you get a better indication of what’s going on:

ERROR:root:Could not draw ticket
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "<stdin>", line 9, in draw
  File "lib/python3.7/random.py", line 275, in shuffle
    for i in reversed(range(1, len(x))):
TypeError: object of type 'generator' has no len()

Using exception() saves you from having to reference the exception yourself because logging pulls it in with sys.exc_info().

This makes it much clearer that the problem stems from random.shuffle(), which needs to know the length of the object it is shuffling. Because our Lottery class passes a generator to shuffle(), it gets held up and raises before the pool can be shuffled, much less before a winning ticket is drawn.
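For completeness, one possible fix, sketched here rather than taken from the original example, is to materialize the generator into a list before shuffling it:

import random

class FixedLottery:
    # A corrected draw(); mirrors the Lottery class above
    def __init__(self, n):
        self.n = n

    def make_tickets(self):
        for i in range(self.n):
            yield i

    def draw(self):
        pool = list(self.make_tickets())   # shuffle() needs a sequence with a length
        random.shuffle(pool)
        return pool[0]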

In large, full-blown applications, you’ll find logging.exception() to be even more useful when deep, multi-library tracebacks are involved, and you can’t step into them with a live debugger like pdb.

The code for logging.Logger.exception(), and hence logging.exception(), is just a single line:

def exception(self, msg, *args, exc_info=True, **kwargs):
    self.error(msg, *args, exc_info=exc_info, **kwargs)

That is, logging.exception() just calls logging.error() with exc_info=True, which is otherwise False by default. If you want to log an exception traceback but at a level different than logging.ERROR, just call that function or method with exc_info=True.
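For example, here's a minimal sketch that logs a full traceback, but at WARNING severity:

import logging

try:
    {}["missing"]
except KeyError:
    logging.warning("Recoverable problem", exc_info=True)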

Keep in mind that exception() should only be called in the context of an exception handler, inside of an except block:

for i in data:
    try:
        result = my_longwinded_nested_function(i)
    except ValueError:
        # We are in the context of exception handler now.
        # If it's unclear exactly *why* we couldn't process
        # `i`, then log the traceback and move on rather than
        # ditching completely.
        logger.exception("Could not process %s", i)
        continue

Use this pattern sparingly rather than as a means to suppress any exception. It can be most helpful when you’re debugging a long function call stack where you’re otherwise seeing an ambiguous, unclear, and hard-to-track error.

Conclusion

Pat yourself on the back, because you’ve just walked through almost 2,000 lines of dense source code. You’re now better equipped to deal with the logging package!

Keep in mind that this tutorial has been far from exhaustive in covering all of the classes found in the logging package. There’s even more machinery that glues everything together. If you’d like to learn more, then you can look into the Formatter classes and the separate modules logging/config.py and logging/handlers.py.



Catalin George Festila: Python 3.7.3 : Using the win32com - part 001.

Wed, 2019-05-22 07:19
This tutorial is about the win32com Python module, using Python 3.7.3. The first example is simple:

import sys
import win32com.client
from win32com.client import constants

speaker = win32com.client.Dispatch("SAPI.SpVoice")
print("Type word or phrase, then enter.")
while 1:
    text_to_say = input("> ")
    speaker.Speak(text_to_say)

Let's run it. It uses the Microsoft Speech SDK to speak whatever you type in.

EuroPython Society: EuroPython 2019: First batch of accepted sessions

Wed, 2019-05-22 06:25

europython:

Our program work group (WG) has been working hard over the weekend to select the sessions for EuroPython 2019.

We’re now happy to announce the first batch with:

brought to you by 129 speakers.

More advanced talks than in previous EuroPython editions

We are glad that the Python community has heard our call to submit more advanced talks this year. This will make EuroPython 2019 even more interesting than our previous edition.

Waiting List

Some talks are still in the waiting list. We will inform all speakers who have submitted talks about the selection status by email.

PyData EuroPython 2019

As in previous years, we will have lots of PyData talks and trainings:

  • 4 trainings on Monday and Tuesday (July 8-9)
  • 34 talks on Wednesday and Thursday (July 10-11)
  • no PyData talks on Friday (July 12)
Full Schedule

The full schedule will be available early in June.

Training Tickets & Combined Tickets

If you want to attend the trainings offered on the training days (July 8-9), please head over to the registration page in the next couple of days. Sales are going strong and we only have 300 tickets available.

Combined tickets are new this year and allow attending both training and conference days with a single ticket.


Enjoy,

EuroPython 2019 Team
https://ep2019.europython.eu/
https://www.europython-society.org/ 



Kushal Das: Game of guessing colors using CircuitPython

Wed, 2019-05-22 05:54

Every participant of PyCon US 2019 received a Circuit Playground Express (cpx) in the swag bag from Digi-Key and Adafruit, which is one of the best pieces of swag I have received at a conference. The only other thing that comes to mind is the YubiKeys sponsored by Yubico at a Rootconf a few years ago.

I did not play around much with my cpx during PyCon, but decided to go through the documentation and examples over the last week. I used the Mu editor (thank you @ntoll) to write a small game.

The goal is to guess the color of the next NeoPixel on the board and then press Button A to see whether you guessed right. Py and I have been playing this continuously for the last few weeks.
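A rough sketch of the idea, not the original game code, might look like this (it assumes the adafruit_circuitplayground library from the CircuitPython bundle for this board):

import random
import time
from adafruit_circuitplayground.express import cpx

COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
guess = "red"   # the player's guess for the next pixel

while True:
    if cpx.button_a:                        # press Button A to reveal the answer
        answer = random.choice(list(COLORS))
        cpx.pixels[0] = COLORS[answer]      # light the next NeoPixel with the drawn color
        print("Correct!" if guess == answer else "It was " + answer)
        time.sleep(1.0)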

The idea of CircuitPython, where you can connect the device to a computer, start editing code, and see the changes live, is fantastic and straightforward. It takes almost no time to start working with these boards, and the documentation is clear and full of examples. Py (our 4-year-old daughter) is so excited that she now wants to learn programming so that she can build her own things with this board.


Patrick Kennedy: Setting Up GitLab CI for a Python Application

Wed, 2019-05-22 00:28

Introduction

This blog post describes how to configure a Continuous Integration (CI) process on GitLab for a Python application. It uses one of my Python applications (bild) to show how to set up the CI process.

In this blog post, I’ll show how I setup a GitLab CI process to run the following jobs on a python application:

  • Unit and functional testing using pytest
  • Linting using flake8
  • Static analysis using pylint
  • Type checking using mypy

What is CI?

To me, Continuous Integration (CI) means frequently testing your application in an integrated state.  However, the term ‘testing’ should be interpreted loosely as this can mean:

  • Integration testing
  • Unit testing
  • Functional testing
  • Static analysis
  • Style checking (linting)
  • Dynamic analysis

To facilitate running these tests, it’s best to have these tests run automatically as part of your configuration management (git) process.  This is where GitLab CI is awesome!

In my experience, I’ve found it really beneficial to develop a test script locally and then add it to the CI process that gets automatically run on GitLab CI.

Getting Started with GitLab CI

Before jumping into GitLab CI, here are a few definitions:

  • pipeline: a set of tests to run against a single git commit.

  • runner: GitLab uses runners on different servers to actually execute the tests in a pipeline; GitLab provides runners to use, but you can also spin up your own servers as runners.

  • job: a single test being run in a pipeline.

  • stage: a group of related tests being run in a pipeline.

Here’s a screenshot from GitLab CI that helps illustrate these terms:

GitLab utilizes the ‘.gitlab-ci.yml’ file to run the CI pipeline for each project.  The ‘.gitlab-ci.yml’ file should be found in the top-level directory of your project.

While there are different methods of running a test in GitLab CI, I prefer to utilize a Docker container to run each test.  I’ve found the overhead in spinning up a Docker container to be trivial (in terms of execution time) when doing CI testing.

Creating a Single Job in GitLab CI

The first job that I want to add to GitLab CI for my project is to run a linter (flake8).  In my local development environment, I would run this command:

$ flake8 --max-line-length=120 bild/*.py

This command can be transformed into a job on GitLab CI in the ‘.gitlab-ci.yml’ file:

image: "python:3.7" before_script: - python --version - pip install -r requirements.txt stages: - Static Analysis flake8: stage: Static Analysis script: - flake8 --max-line-length=120 bild/*.py

This YAML file tells GitLab CI what to run on each commit pushed up to the repository. Let’s break down each section…

The first line (image: "python:3.7") instructs GitLab CI to utilize Docker for performing ALL of the tests for this project, specifically to use the python:3.7 image that is found on DockerHub.

The second section (before_script) is the set of commands to run in the Docker container before starting each job. This is really beneficial for getting the Docker container in the correct state by installing all the python packages needed by the application.

The third section (stages) defines the different stages in the pipeline. There is only a single stage (Static Analysis) at this point, but later a second stage (Test) will be added. I like to think of stages as a way to group together related jobs.

The fourth section (flake8) defines the job; it specifies the stage (Static Analysis) that the job should be part of and the commands to run in the Docker container for this job. For this job, the flake8 linter is run against the python files in the application.

At this point, the updates to ‘.gitlab-ci.yml’ file should be commited to git and then pushed up to GitLab:

git add .gitlab-ci.yml
git commit -m "Updated .gitlab-ci.yml"
git push origin master

GitLab CI will see that there is a CI configuration file (.gitlab-ci.yml) and use this to run the pipeline:

This is the start of a CI process for a python project!  GitLab CI will run a linter (flake8) on every commit that is pushed up to GitLab for this project.

Running Tests with pytest on GitLab CI

When I run my unit and functional tests with pytest in my development environment, I run the following command in my top-level directory:

$ pytest

My initial attempt at creating a new job to run pytest in ‘.gitlab-ci.yml’ file was:

image: "python:3.7" before_script: - python --version - pip install -r requirements.txt stages: - Static Analysis - Test ... pytest: stage: Test script: - pytest

However, this did not work as pytest was unable to find the ‘bild’ module (ie. the source code) to test:

$ pytest
========================= test session starts ==========================
platform linux -- Python 3.7.3, pytest-4.5.0, py-1.5.4, pluggy-0.11.0
rootdir: /builds/patkennedy79/bild, inifile: pytest.ini
plugins: datafiles-2.0
collected 0 items / 3 errors

============================ ERRORS ====================================
___________ ERROR collecting tests/functional/test_bild.py _____________
ImportError while importing test module '/builds/patkennedy79/bild/tests/functional/test_bild.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/functional/test_bild.py:4: in <module>
    from bild.directory import Directory
E   ModuleNotFoundError: No module named 'bild'
...
==================== 3 error in 0.24 seconds ======================
ERROR: Job failed: exit code 1

The problem encountered here is that the ‘bild’ module is not able to be found by the test_*.py files, as the top-level directory of the project was not being specified in the system path:

$ python -c "import sys;print(sys.path)"
['', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']

The solution that I came up with was to add the top-level directory to the system path within the Docker container for this job:

pytest:
  stage: Test
  script:
    - pwd
    - ls -l
    - export PYTHONPATH="$PYTHONPATH:."
    - python -c "import sys;print(sys.path)"
    - pytest

With the updated system path, this job was able to run successfully:

$ pwd
/builds/patkennedy79/bild
$ export PYTHONPATH="$PYTHONPATH:."
$ python -c "import sys;print(sys.path)"
['', '/builds/patkennedy79/bild', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']

Final GitLab CI Configuration

Here is the final .gitlab-ci.yml file that runs the static analysis jobs (flake8, mypy, pylint) and the tests (pytest):

image: "python:3.7"

before_script:
- python --version
- pip install -r requirements.txt

stages:
- Static Analysis
- Test

mypy:
stage: Static Analysis
script:
- pwd
- ls -l
- python -m mypy bild/file.py
- python -m mypy bild/directory.py

flake8:
stage: Static Analysis
script:
- flake8 --max-line-length=120 bild/*.py

pylint:
stage: Static Analysis
allow_failure: true
script:
- pylint -d C0301 bild/*.py

unit_test:
stage: Test
script:
- pwd
- ls -l
- export PYTHONPATH="$PYTHONPATH:."
- python -c "import sys;print(sys.path)"
- pytest

Here is the resulting output from GitLab CI:

One item that I’d like to point out is that pylint is reporting some warnings, but I find this to be acceptable. However, I still want to have pylint running in my CI process, but I don’t care if it has failures. I’m more concerned with trends over time (are there warnings being created). Therefore, I set the pylint job to be allowed to fail via the ‘allow_failure’ setting:

pylint:
  stage: Static Analysis
  allow_failure: true
  script:
    - pylint -d C0301 bild/*.py

The Digital Cat: Object-Oriented Programming in Python 3 - Abstract Base Classes

Tue, 2019-05-21 21:47

This post is available as an IPython Notebook here

The Inspection Club

As you know, Python leverages polymorphism at its maximum by dealing only with generic references to objects. This makes OOP not an addition to the language but part of its structure from the ground up. Moreover, Python pushes the EAFP approach, which tries to avoid direct inspection of objects as much as possible.

It is however very interesting to read what Guido van Rossum says in PEP 3119: Invocation means interacting with an object by invoking its methods. Usually this is combined with polymorphism, so that invoking a given method may run different code depending on the type of an object. Inspection means the ability for external code (outside of the object's methods) to examine the type or properties of that object, and make decisions on how to treat that object based on that information. [...] In classical OOP theory, invocation is the preferred usage pattern, and inspection is actively discouraged, being considered a relic of an earlier, procedural programming style. However, in practice this view is simply too dogmatic and inflexible, and leads to a kind of design rigidity that is very much at odds with the dynamic nature of a language like Python.

The author of Python recognizes that forcing the use of a pure polymorphic approach sometimes leads to solutions that are too complex or even incorrect. In this section I want to show some of the problems that can arise from a pure polymorphic approach and introduce Abstract Base Classes, which aim to solve them. I strongly suggest reading PEP 3119 (as with any other PEP), since it contains a deeper and better explanation of the whole matter. Indeed, I think that this PEP is so well written that any further explanation is hardly needed. I am, however, used to writing explanations to check how much I understood about a topic, so I am going to try this time too.

E.A.F.P the Extra Test Trial

The EAFP coding style requires you to trust the incoming objects to provide the attributes and methods you need, and to manage the possible exceptions, if you know how to do it. Sometimes, however, you need to test if the incoming object matches a complex behaviour. For example, you could be interested in testing if the object acts like a list, but you quickly realize that the amount of methods a list provides is very big and this could lead to odd EAFP code like

try:
    obj.append
    obj.count
    obj.extend
    obj.index
    obj.insert
    [...]
except AttributeError:
    [...]

where the methods of the list type are accessed (not called) just to force the object to raise the AttributeError exception if they are not present. This code, however, is not only ugly but also wrong. If you recall the "Enter the Composition" section of the third post of this series, you know that in Python you can always customize the __getattr__() method, which is called whenever the requested attribute is not found in the object. So I could write a class that passes the test but actually does not act like a list

class FakeList:
    def fakemethod(self):
        pass

    def __getattr__(self, name):
        if name in ['append', 'count', 'extend', 'index', 'insert', ...]:
            return self.fakemethod

This is obviously just an example, and no one will ever write such a class, but this demonstrates that just accessing methods does not guarantee that a class acts like the one we are expecting.

There are many examples that could be made leveraging the highly dynamic nature of Python and its rich object model. I would summarize them by saying that sometimes you'd better check the type of the incoming object.

In Python you can obtain the type of an object using the type() built-in function, but to check it you'd better use isinstance(), which returns a boolean value. Let us see an example before moving on

>>> isinstance([], list)
True
>>> isinstance(1, int)
True
>>> class Door:
...     pass
...
>>> d = Door()
>>> isinstance(d, Door)
True
>>> class EnhancedDoor(Door):
...     pass
...
>>> ed = EnhancedDoor()
>>> isinstance(ed, EnhancedDoor)
True
>>> isinstance(ed, Door)
True

As you can see, the function can also walk the class hierarchy, so the check is less naive than the one you would obtain by directly comparing the result of type().

The isinstance() function, however, does not completely solve the problem. If we write a class that actually acts like a list but does not inherit from it, isinstance() does not recognize the fact that the two may be considered the same thing. The following code returns False regardless of the content of the MyList class

>>> class MyList:
...     pass
...
>>> ml = MyList()
>>> isinstance(ml, list)
False

since isinstance() does not check the content of the class or its behaviour, it just considers the class and its ancestors.

The problem, thus, may be summed up with the following question: what is the best way to test that an object exposes a given interface? Here, the word interface is used for its natural meaning, without any reference to other programming solutions, which however address the same problem.

A good way to address the problem could be to write inside an attribute of the object the list of interfaces it promises to implement, and to agree that any time we want to test the behaviour of an object we simply have to check the content of this attribute. This is exactly the path followed by Python, and it is very important to understand that the whole system is just about a promised behaviour.

The solution proposed through PEP 3119 is, in my opinion, very simple and elegant, and it perfectly fits the nature of Python, where things are usually agreed upon rather than being enforced. Moreover, the solution follows the spirit of polymorphism, where information is provided by the object itself and not extracted by the calling code.

In the next sections I am going to try to describe this solution in its main building blocks. The matter is complex, so my explanation will lack some details: please refer to the aforementioned PEP 3119 for a complete description.

Who Framed the Metaclasses

As already described, Python provides two built-ins to inspect objects and classes, which are isinstance() and issubclass() and a solution to the inspection problem should allow the programmer to go on with using those two functions.

This means that we need to find a way to inject the "behaviour promise" into both classes and instances. This is the reason why metaclasses come in play. Recall what we said about them in the fifth issue of this series: metaclasses are the classes used to build classes, which means that they are the preferred way to change the structure of a class, and, in consequence, of its instances.

Another way to do the same job would be to leverage the inheritance mechanism, injecting the behaviour through a dedicated parent class. This solution has many downsides, which I am not going to detail. It is enough to say that affecting the class hierarchy may lead to complex situations or subtle bugs. Metaclasses may provide here a different entry point for the introduction of a "virtual base class" (as PEP 3119 specifies, this is not the same concept as in C++).

Overriding Places

As said, isinstance() and issubclass() are built-in functions, not object methods, so we cannot simply override them providing a different implementation in a given class. So the first part of the solution is to change the behaviour of those two functions to first check if the class or the instance contain a special method, which is __instancecheck__() for isinstance() and __subclasscheck__() for issubclass(). So both built-ins try to run the respective special method, reverting to the standard algorithm if it is not present.

A note about naming. Methods must accept the object they belong to as the first argument, so the two special methods shall have the form

def __instancecheck__(cls, inst):
    [...]

def __subclasscheck__(cls, sub):
    [...]

where cls is the class where they are injected, that is the one representing the promised behaviour. The two built-ins, however, have a reversed argument order, where the behaviour comes after the tested object: when you write isinstance([], list) you want to check if the [] instance has the list behaviour. This is the reason behind the name choice: just calling the methods __isinstance__() and __issubclass__() and passing arguments in a reversed order would have been confusing.

This is ABC

The proposed solution is thus called Abstract Base Classes, as it provides a way to attach to a concrete class a virtual class with the only purpose of signaling a promised behaviour to anyone inspecting it with isinstance() or issubclass().

To help programmers implement Abstract Base Classes, the standard library has been given an abc module, that contains the ABCMeta class (and other facilities). This class is the one that implements __instancecheck__() and __subclasscheck__() and shall be used as a metaclass to augment a standard class. The latter will then be able to register other classes as implementations of its behaviour.

Sounds complex? An example may clarify the whole matter. The one from the official documentation is rather simple:

from abc import ABCMeta

class MyABC(metaclass=ABCMeta):
    pass

MyABC.register(tuple)

assert issubclass(tuple, MyABC)
assert isinstance((), MyABC)

Here, the MyABC class is provided the ABCMeta metaclass. This puts the two __instancecheck__() and __subclasscheck__() methods inside MyABC so that, when issuing isinstance(), what Python actually executes is

>>> d = {'a': 1}
>>> isinstance(d, MyABC)
False
>>> MyABC.__class__.__instancecheck__(MyABC, d)
False
>>> isinstance((), MyABC)
True
>>> MyABC.__class__.__instancecheck__(MyABC, ())
True

After the definition of MyABC we need a way to signal that a given class is an instance of the Abstract Base Class and this happens through the register() method, provided by the ABCMeta metaclass. Calling MyABC.register(tuple) we record inside MyABC the fact that the tuple class shall be identified as a subclass of MyABC itself. This is analogous to saying that tuple inherits from MyABC but not quite the same. As already said registering a class in an Abstract Base Class with register() does not affect the class hierarchy. Indeed, the whole tuple class is unchanged.
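To tie this back to the earlier MyList example, here is a small sketch (the ABC name is made up for illustration) showing that registration makes isinstance() succeed without touching the class hierarchy:

from abc import ABCMeta

class ListBehaviour(metaclass=ABCMeta):
    pass

class MyList:
    pass

ListBehaviour.register(MyList)

assert isinstance(MyList(), ListBehaviour)   # the promised behaviour is recognized
assert not issubclass(MyList, list)          # the real hierarchy is untouched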

The current implementation of ABCs stores the registered types inside the _abc_registry attribute. Actually it stores there weak references to the registered types (this part is outside the scope of this article, so I'm not detailing it)

>>> MyABC._abc_registry.data
{<weakref at 0xb682966c; to 'type' at 0x83dcca0 (tuple)>}

Movie Trivia

Section titles come from the following movies: The Breakfast Club (1985), E.T. the Extra-Terrestrial (1982), Who Framed Roger Rabbit (1988), Trading Places (1983), This is Spinal Tap (1984).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series come from those sources.

Feedback

Feel free to use the blog Google+ page to comment the post. The GitHub issues page is the best place to submit corrections.


The Digital Cat: Object-Oriented Programming in Python 3 - Metaclasses

Tue, 2019-05-21 21:47

This post is available as an IPython Notebook here

The Type Brothers

The first step into the most intimate secrets of Python objects comes from two components we already met in the first post: class and object. These two things are the very fundamental elements of Python OOP system, so it is worth spending some time to understand how they work and relate each other.

First of all recall that in Python everything is an object, that is everything inherits from object. Thus, object seems to be the deepest thing you can find digging into Python variables. Let's check this

>>> a = 5
>>> type(a)
<class 'int'>
>>> a.__class__
<class 'int'>
>>> a.__class__.__bases__
(<class 'object'>,)
>>> object.__bases__
()

The variable a is an instance of the int class, and the latter inherits from object, which inherits from nothing. This demonstrates that object is at the top of the class hierarchy. However, as you can see, both int and object are called classes (<class 'int'>, <class 'object'>). Indeed, while a is an instance of the int class, int itself is an instance of another class, a class that is instanced to build classes

>>> type(a)
<class 'int'>
>>> type(int)
<class 'type'>
>>> type(float)
<class 'type'>
>>> type(dict)
<class 'type'>

Since in Python everything is an object, everything is the instance of a class, even classes. Well, type is the class that is instanced to get classes. So remember this: object is the base of every object, type is the class of every type. Sounds puzzling? It is not your fault, don't worry. However, just to strike you with the finishing move, this is what Python is built on

>>> type(object)
<class 'type'>
>>> type.__bases__
(<class 'object'>,)

If you are not about to faint at this point, chances are that you are Guido van Rossum or one of his friends down at the Python core development team (in this case, let me thank you for your beautiful creation). You may get a cup of tea, if you need it.

Jokes apart, at the very base of Python type system there are two things, object and type, which are inseparable. The previous code shows that object is an instance of type, and type inherits from object. Take your time to understand this subtle concept, as it is very important for the upcoming discussion about metaclasses.

When you think you grasped the type/object matter read this and start thinking again

>>> type(type)
<class 'type'>

The Metaclasses Take Python

You are now familiar with Python classes. You know that a class is used to create an instance, and that the structure of the latter is ruled by the source class and all its parent classes (until you reach object).

Since classes are objects too, you know that a class itself is an instance of a (super)class, and this class is type. That is, as already stated, type is the class that is used to build classes.

So for example you know that a class may be instanced, i.e. it can be called and by calling it you obtain another object that is linked with the class. What prepares the class for being called? What gives the class all its methods? In Python the class in charge of performing such tasks is called metaclass, and type is the default metaclass of all classes.

The point of exposing this structure of Python objects is that you may change the way classes are built. As you know, type is an object, so it can be subclassed just like any other class. Once you get a subclass of type you need to instruct your class to use it as the metaclass instead of type, and you can do this by passing it as the metaclass keyword argument in the class definition.

>>> class MyType(type):
...     pass
...
>>> class MySpecialClass(metaclass=MyType):
...     pass
...
>>> msp = MySpecialClass()
>>> type(msp)
<class '__main__.MySpecialClass'>
>>> type(MySpecialClass)
<class '__main__.MyType'>
>>> type(MyType)
<class 'type'>

Metaclasses 2: Singleton Day

Metaclasses are a very advanced topic in Python, but they have many practical uses. For example, by means of a custom metaclass you may log any time a class is instanced, which can be important for applications that shall keep a low memory usage or have to monitor it.
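As a quick sketch of that first use case (illustrative names, with a plain print() standing in for real monitoring code), a metaclass can intercept __call__() to record every instantiation:

class InstanceLogger(type):
    def __call__(cls, *args, **kwargs):
        print("Creating an instance of", cls.__name__)   # or increment a counter, log, etc.
        return super().__call__(*args, **kwargs)

class Monitored(metaclass=InstanceLogger):
    pass

m = Monitored()   # prints: Creating an instance of Monitored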

I am going to show here a very simple example of metaclass, the Singleton. Singleton is a well known design pattern, and many descriptions of it may be found on the Internet. It has also been heavily criticized, mostly because of its bad behaviour when subclassed, but here I do not want to introduce it for its technological value, but for its simplicity (so please do not question the choice, it is just an example).

Singleton has one purpose: to return the same instance every time it is instanced, like a sort of object-oriented global variable. So we need to build a class that does not work like standard classes, which return a new instance every time they are called.

"Build a class"? This is a task for metaclasses. The following implementation comes from Python 3 Patterns, Recipes and Idioms.

class Singleton(type):
    instance = None

    def __call__(cls, *args, **kw):
        if not cls.instance:
            cls.instance = super(Singleton, cls).__call__(*args, **kw)
        return cls.instance

We are defining a new type, which inherits from type to provide all bells and whistles of Python classes. We override the __call__ method, that is a special method invoked when we call the class, i.e. when we instance it. The new method wraps the original method of type by calling it only when the instance attribute is not set, i.e. the first time the class is instanced, otherwise it just returns the recorded instance. As you can see this is a very basic cache class, the only trick is that it is applied to the creation of instances.

To test the new type we need to define a new class that uses it as its metaclass

>>> class ASingleton(metaclass=Singleton):
...     pass
...
>>> a = ASingleton()
>>> b = ASingleton()
>>> a is b
True
>>> hex(id(a))
'0xb68030ec'
>>> hex(id(b))
'0xb68030ec'

By using the is operator we test that the two objects are the very same structure in memory, that is their ids are the same, as explicitly shown. What actually happens is that when you issue a = ASingleton() the ASingleton class runs its __call__() method, which is taken from the Singleton type behind the class. That method recognizes that no instance has been created (Singleton.instance is None) and acts just like any standard class does. When you issue b = ASingleton() the very same things happen, but since Singleton.instance is now different from None its value (the previous instance) is directly returned.

Metaclasses are a very powerful programming tool and leveraging them you can achieve very complex behaviours with a small effort. Their use is a must every time you are actually metaprogramming, that is you are writing code that has to drive the way your code works. Good examples are creational patterns (injecting custom class attributes depending on some configuration), testing, debugging, and performance monitoring.

Coming to Instance

Before introducing you to a very smart use of metaclasses by talking about Abstract Base Classes (read: to save some topics for the next part of this series), I want to dive into the object creation procedure in Python, that is what happens when you instance a class. In the first post this procedure was described only partially, by looking at the __init__() method.

In the first post I recalled the object-oriented concept of constructor, which is a special method of the class that is automatically called when the instance is created. The class may also define a destructor, which is called when the object is destroyed. In languages without a garbage collection mechanism such as C++ the destructor shall be carefully designed. In Python the destructor may be defined through the __del__() method, but it is hardly used.

The constructor mechanism in Python is on the contrary very important, and it is implemented by two methods, instead of just one: __new__() and __init__(). The tasks of the two methods are very clear and distinct: __new__() shall perform actions needed when creating a new instance while __init__ deals with object initialization.

Since in Python you do not need to declare attributes due to its dynamic nature, __new__() is rarely defined by programmers, who may rely on __init__ to perform the majority of the usual tasks. Typical uses of __new__() are very similar to those listed in the previous section, since it allows to trigger some code whenever your class is instanced.

The standard way to override __new__() is

class MyClass():
    def __new__(cls, *args, **kwds):
        obj = super().__new__(cls, *args, **kwds)
        [put your code here]
        return obj

just like you usually do with __init__(). When your class inherits from object you do not need to call the parent method (object.__init__()), because it is empty, but you need to do it when overriding __new__.

Remember that __new__() is not forced to return an instance of the class in which it is defined, even if you shall have very good reasons to break this behaviour. Anyway, __init__() will be called only if you return an instance of the container class. Please also note that __new__(), unlike __init__(), accepts the class as its first parameter. The name is not important in Python, and you can also call it self, but it is worth using cls to remember that it is not an instance.
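A tiny sketch of that corner case: when __new__() returns something that is not an instance of the class, __init__() is silently skipped.

class Weird:
    def __new__(cls, *args, **kwds):
        return 42                 # not an instance of Weird

    def __init__(self, *args, **kwds):
        print("never runs")       # skipped because __new__ returned an int

w = Weird()
print(w)   # 42, and "never runs" was not printed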

Movie Trivia

Section titles come from the following movies: The Blues Brothers (1980), The Muppets Take Manhattan (1984), Terminator 2: Judgement Day (1991), Coming to America (1988).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series come from those sources.

Feedback

Feel free to use the blog Google+ page to comment the post. The GitHub issues page is the best place to submit corrections.


The Digital Cat: Object-Oriented Programming in Python 3 - Composition and inheritance

Tue, 2019-05-21 21:47

This post is available as an IPython Notebook here

The Delegation Run

If classes are objects what is the difference between types and instances?

When I talk about "my cat" I am referring to a concrete instance of the "cat" concept, which is a subtype of "animal". So, despite being both objects, while types can be specialized, instances cannot.

Usually an object B is said to be a specialization of an object A when:

  • B has all the features of A
  • B can provide new features
  • B can perform some or all the tasks performed by A in a different way

Those targets are very general and valid for any system and the key to achieve them with the maximum reuse of already existing components is delegation. Delegation means that an object shall perform only what it knows best, and leave the rest to other objects.

Delegation can be implemented with two different mechanisms: composition and inheritance. Sadly, very often only inheritance is listed among the pillars of OOP techniques, forgetting that it is an implementation of the more generic and fundamental mechanism of delegation; perhaps a better nomenclature for the two techniques could be explicit delegation (composition) and implicit delegation (inheritance).

Please note that, again, when talking about composition and inheritance we are talking about focusing on a behavioural or structural delegation. Another way to think about the difference between composition and inheritance is to consider if the object knows who can satisfy your request or if the object is the one that satisfies the request.

Please, please, please do not forget composition: in many cases, composition can lead to simpler systems, with benefits on maintainability and changeability.

Usually composition is said to be a very generic technique that needs no special syntax, while inheritance and its rules are strongly dependent on the language of choice. Actually, the strong dynamic nature of Python softens the boundary line between the two techniques.

Inheritance Now

In Python a class can be declared as an extension of one or more different classes, through the class inheritance mechanism. The child class (the one that inherits) has the same internal structure as the parent class (the one that is inherited), and in the case of multiple inheritance the language has very specific rules to manage possible conflicts or redefinitions among the parent classes. A very simple example of inheritance is

class SecurityDoor(Door): pass

where we declare a new class SecurityDoor that, at the moment, is a perfect copy of the Door class. Let us investigate what happens when we access attributes and methods. First we instantiate the class

>>> sdoor = SecurityDoor(1, 'closed')

The first check we can do is that class attributes are still global and shared

>>> SecurityDoor.colour is Door.colour
True
>>> sdoor.colour is Door.colour
True

This shows us that Python tries to resolve instance members not only by looking into the class the instance comes from, but also by investigating the parent classes. In this case sdoor.colour becomes SecurityDoor.colour, which in turn becomes Door.colour. SecurityDoor is a Door.

If we investigate the content of __dict__ we can catch a glimpse of the inheritance mechanism in action

>>> sdoor.__dict__
{'number': 1, 'status': 'closed'}
>>> sdoor.__class__.__dict__
mappingproxy({'__doc__': None, '__module__': '__main__'})
>>> Door.__dict__
mappingproxy({'__dict__': <attribute '__dict__' of 'Door' objects>,
    'colour': 'yellow',
    'open': <function Door.open at 0xb687e224>,
    '__init__': <function Door.__init__ at 0xb687e14c>,
    '__doc__': None,
    'close': <function Door.close at 0xb687e1dc>,
    'knock': <classmethod object at 0xb67ff6ac>,
    '__weakref__': <attribute '__weakref__' of 'Door' objects>,
    '__module__': '__main__',
    'paint': <classmethod object at 0xb67ff6ec>})

As you can see the content of __dict__ for SecurityDoor is very narrow compared to that of Door. The inheritance mechanism takes care of the missing elements by climbing up the class tree. Where does Python get the parent classes? A class always contains a __bases__ tuple that lists them

>>> SecurityDoor.__bases__ (<class '__main__.Door'>,)

So an example of what Python does to resolve a class method call through the inheritance tree is

>>> sdoor.__class__.__bases__[0].__dict__['knock'].__get__(sdoor)
<bound method type.knock of <class '__main__.SecurityDoor'>>
>>> sdoor.knock
<bound method type.knock of <class '__main__.SecurityDoor'>>

Please note that this is just an example that does not consider multiple inheritance.
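
For completeness, the full lookup order (including multiple inheritance) is exposed by the __mro__ attribute of the class, the Method Resolution Order. A quick check on our single-inheritance example would look like this (shown only as a sketch of the attribute):

>>> SecurityDoor.__mro__
(<class '__main__.SecurityDoor'>, <class '__main__.Door'>, <class 'object'>)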

Let us try now to override some methods and attributes. In Python you can override (redefine) a parent class member simply by redefining it in the child class.

class SecurityDoor(Door):
    colour = 'gray'
    locked = True

    def open(self):
        if not self.locked:
            self.status = 'open'

As you might expect, the overridden members are now present in the __dict__ of the SecurityDoor class

>>> SecurityDoor.__dict__
mappingproxy({'__doc__': None,
    '__module__': '__main__',
    'open': <function SecurityDoor.open at 0xb6fcf89c>,
    'colour': 'gray',
    'locked': True})

So when you override a member, the one you put in the child class is used instead of the one in the parent class simply because the former is found before the latter while climbing the class hierarchy. This also shows you that Python does not implicitly call the parent implementation when you override a method. So, overriding is a way to block implicit delegation.

If we want to call the parent implementation we have to do it explicitly. In the former example we could write

class SecurityDoor(Door):
    colour = 'gray'
    locked = True

    def open(self):
        if self.locked:
            return
        Door.open(self)

You can easily test that this implementation is working correctly.

>>> sdoor = SecurityDoor(1, 'closed')
>>> sdoor.status
'closed'
>>> sdoor.open()
>>> sdoor.status
'closed'
>>> sdoor.locked = False
>>> sdoor.open()
>>> sdoor.status
'open'

This form of explicit parent delegation is heavily discouraged, however.

The first reason is the very high coupling that results from explicitly naming the parent class again when calling the method. Coupling, in computer science lingo, means linking two parts of a system so that changes in one of them directly affect the other, and it is usually avoided as much as possible. In this case, if you decide to use a new parent class you have to manually propagate the change to every method that calls it. Moreover, since in Python the class hierarchy can be changed dynamically (i.e. at runtime), this form of explicit delegation could be not only annoying but also wrong.

The second reason is that in general you need to deal with multiple inheritance, where you do not know a priori which parent class implements the original form of the method you are overriding.

To solve these issues, Python supplies the super() built-in function, which climbs the class hierarchy and returns a proxy for the correct class to call. The syntax for calling super() is

class SecurityDoor(Door):
    colour = 'gray'
    locked = True

    def open(self):
        if self.locked:
            return
        super().open()

The output of super() is not exactly the Door class. It returns a super object whose representation is <super: <class 'SecurityDoor'>, <SecurityDoor object>>. This object however acts like the parent class, so you can safely ignore its custom nature and use it just as you would use the Door class in this case.

Enter the Composition

Composition means that an object knows another object, and explicitly delegates some tasks to it. While inheritance is implicit, composition is explicit: in Python, however, things are far more interesting than this =).

First of all let us implement classic composition, which simply makes an object part of the other as an attribute

class SecurityDoor:
    colour = 'gray'
    locked = True

    def __init__(self, number, status):
        self.door = Door(number, status)

    def open(self):
        if self.locked:
            return
        self.door.open()

    def close(self):
        self.door.close()

The primary goal of composition is to relax the coupling between objects. This little example shows that now SecurityDoor is an object and no longer a Door, which means that the internal structure of Door is not copied. For this very simple example both Door and SecurityDoor are not big classes, but in a real system objects can be very complex; this means that their allocation consumes a lot of memory, and if a system contains thousands or millions of objects that could be an issue.

The composed SecurityDoor has to redefine the colour attribute since the concept of delegation applies only to methods and not to attributes, doesn't it?

Well, no. Python provides a very high degree of indirection for object manipulation, and attribute access is one of the most useful. As you already discovered, attribute access is ruled by a special method called __getattribute__() that is called whenever an attribute of the object is accessed. Overriding __getattribute__(), however, is overkill; it is a very complex method and, being called on every attribute access, any change makes the whole thing slower.

The method we have to leverage to delegate attribute access is __getattr__(), which is a special method that is called whenever the requested attribute is not found in the object. So basically it is the right place to dispatch all attribute and method access our object cannot handle. The previous example becomes

class SecurityDoor:
    locked = True

    def __init__(self, number, status):
        self.door = Door(number, status)

    def open(self):
        if self.locked:
            return
        self.door.open()

    def __getattr__(self, attr):
        return getattr(self.door, attr)

Using __getattr__() blurs the line between inheritance and composition, since after all the former is a form of automatic delegation of every member access.

class ComposedDoor:
    def __init__(self, number, status):
        self.door = Door(number, status)

    def __getattr__(self, attr):
        return getattr(self.door, attr)

As this last example shows, delegating every member access through __getattr__() is very simple. Pay attention to getattr() which is different from __getattr__(). The former is a built-in that is equivalent to the dotted syntax, i.e. getattr(obj, 'someattr') is the same as obj.someattr, but you have to use it since the name of the attribute is contained in a string.

Composition provides a superior way to manage delegation since it can selectively delegate the access, and even mask some attributes or methods, while inheritance cannot. In Python you also avoid the memory problems that might arise when you put many objects inside another: Python handles everything through references, i.e. through pointers to the memory position of the thing, so the size of an attribute is constant and very limited.
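
As a minimal sketch of that selective masking (the MaskedDoor class name is made up for illustration), you can delegate every member except the ones you explicitly want to hide:

class MaskedDoor:
    def __init__(self, number, status):
        self.door = Door(number, status)

    def __getattr__(self, attr):
        if attr == 'paint':
            # Hide the paint() method of the wrapped Door
            raise AttributeError(attr)
        return getattr(self.door, attr)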

Movie Trivia

Section titles come from the following movies: The Cannonball Run (1981), Apocalypse Now (1979), Enter the Dragon (1973).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series comes from those sources.

Feedback

Feel free to use the blog Google+ page to comment on the post. The GitHub issues page is the best place to submit corrections.

Categories: FLOSS Project Planets

The Digital Cat: Object-Oriented Programming in Python 3 - Classes and members

Tue, 2019-05-21 21:47

This post is available as an IPython Notebook here

Python Classes Strike Again

The Python implementation of classes has some peculiarities. The bare truth is that in Python the class of an object is an object itself. You can check this by calling type() on the class

>>> a = 1
>>> type(a)
<class 'int'>
>>> type(int)
<class 'type'>

This shows that the int class is an object, an instance of the type class.

This concept is not as difficult to grasp as it may seem at first sight: in the real world we deal with concepts as if they were things; for example we can talk about the concept of "door", telling people what a door looks like and how it works. In this case the concept of door is the topic of our discussion, so in our everyday experience the type of an object is an object itself. In Python this can be expressed by saying that everything is an object.

If the class of an object is itself an instance it is a concrete object and is stored somewhere in memory. Let us leverage the inspection capabilities of Python and its id() function to check the status of our objects. The id() built-in function returns the memory position of an object.

In the first post we defined this class

class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

First of all, let's create two instances of the Door class and check that the two objects are stored at different addresses

>>> door1 = Door(1, 'closed')
>>> door2 = Door(1, 'closed')
>>> hex(id(door1))
'0xb67e148c'
>>> hex(id(door2))
'0xb67e144c'

This confirms that the two instances are separate and unrelated. Please note that your values are very likely to be different from the ones I got: being memory addresses, they change at every execution. The second instance was given the same attributes as the first instance to show that the two are different objects regardless of the values of the attributes.

However if we use id() on the class of the two instances we discover that the class is exactly the same

>>> hex(id(door1.__class__))
'0xb685f56c'
>>> hex(id(door2.__class__))
'0xb685f56c'

Well, this is very important. In Python, a class is not just the schema used to build an object. Rather, the class is a shared living object whose code is accessed at run time.
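
A quick sketch shows what "living object" means in practice: you can attach a new function to the class at run time (the knock_twice name here is made up for illustration), and every existing instance immediately sees it.

>>> def knock_twice(self):
...     print("Knock! Knock!")
...
>>> Door.knock_twice = knock_twice
>>> door1.knock_twice()
Knock! Knock!
>>> door2.knock_twice()
Knock! Knock!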

As we already tested, however, attributes are not stored in the class but in every instance, due to the fact that __init__() works on self when creating them. Classes, however, can be given attributes like any other object; with a terrific effort of imagination, let's call them class attributes.

As you can expect, class attributes are shared among the class instances just like their container

class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

Pay attention: the colour attribute here is not created using self, so it is contained in the class and shared among instances

>>> door1 = Door(1, 'closed')
>>> door2 = Door(2, 'closed')
>>> Door.colour
'brown'
>>> door1.colour
'brown'
>>> door2.colour
'brown'

So far, things are no different from the previous case. Let's see if changes to the shared value are reflected in all instances

>>> Door.colour = 'white'
>>> Door.colour
'white'
>>> door1.colour
'white'
>>> door2.colour
'white'
>>> hex(id(Door.colour))
'0xb67e1500'
>>> hex(id(door1.colour))
'0xb67e1500'
>>> hex(id(door2.colour))
'0xb67e1500'

Raiders of the Lost Attribute

Any Python object is automatically given a __dict__ attribute, which contains its list of attributes. Let's investigate what this dictionary contains for our example objects:

>>> Door.__dict__
mappingproxy({'open': <function Door.open at 0xb68604ac>,
    'colour': 'brown',
    '__dict__': <attribute '__dict__' of 'Door' objects>,
    '__weakref__': <attribute '__weakref__' of 'Door' objects>,
    '__init__': <function Door.__init__ at 0xb7062854>,
    '__module__': '__main__',
    '__doc__': None,
    'close': <function Door.close at 0xb686041c>})
>>> door1.__dict__
{'number': 1, 'status': 'closed'}

Leaving aside the difference between a dictionary and a mappingproxy object, you can see that the colour attribute is listed among the Door class attributes, while status and number are listed for the instance.

How come we can call door1.colour if that attribute is not listed for that instance? This is a job performed by the magic __getattribute__() method; in Python the dotted syntax automatically invokes this method, so when we write door1.colour, Python executes door1.__getattribute__('colour'). That method performs the attribute lookup action, i.e. it finds the value of the attribute by looking in different places.

The standard implementation of __getattribute__() first searches the internal dictionary (__dict__) of the object, then the type of the object itself; in this case door1.__getattribute__('colour') executes first door1.__dict__['colour'] and then, since the latter raises a KeyError exception, door1.__class__.__dict__['colour']

>>> door1.__dict__['colour']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'colour'
>>> door1.__class__.__dict__['colour']
'brown'

Indeed, if we compare the objects' equality through the is operator we can confirm that both door1.colour and Door.colour are exactly the same object

>>> door1.colour is Door.colour
True

When we try to assign a value to a class attribute directly on an instance, we just put in the __dict__ of the instance a value with that name, and this value masks the class attribute since it is found first by __getattribute__(). As you can see from the examples of the previous section, this is different from changing the value of the attribute on the class itself.

>>> door1.colour = 'white'
>>> door1.__dict__['colour']
'white'
>>> door1.__class__.__dict__['colour']
'brown'
>>> Door.colour = 'red'
>>> door1.__dict__['colour']
'white'
>>> door1.__class__.__dict__['colour']
'red'

Revenge of the Methods

Let's play the same game with methods. First of all you can see that, just like class attributes, methods are listed only in the class __dict__. Chances are that they behave the same as attributes when we get them

>>> door1.open is Door.open
False

Whoops. Let us further investigate the matter

>>> Door.__dict__['open']
<function Door.open at 0xb68604ac>
>>> Door.open
<function Door.open at 0xb68604ac>
>>> door1.open
<bound method Door.open of <__main__.Door object at 0xb67e162c>>

So, the class method is listed in the members dictionary as a function. So far, so good. The same happens when taking it directly from the class (this is where Python 2 needed to introduce unbound methods, which no longer exist in Python 3). Taking it from the instance, however, returns a bound method.

Well, a function is a procedure you named and defined with the def statement. When you refer to a function as part of a class in Python 3 you get a plain function, without any difference from a function defined outside a class.

When you get the function from an instance, however, it becomes a bound method. The name method simply means "a function inside an object", according to the usual OOP definitions, while bound signals that the method is linked to that instance. Why does Python bother with methods being bound or not? And how does Python transform a function into a bound method?

First of all, if you try to call a class function you get an error

>>> Door.open()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: open() missing 1 required positional argument: 'self'

Yes. Indeed the function was defined to require an argument called 'self', and calling it without an argument raises an exception. This perhaps means that we can give it one instance of the class and make it work

>>> Door.open(door1)
>>> door1.status
'open'

Python does not complain here, and the method works as expected. So Door.open(door1) is the same as door1.open(), and this is the difference between a plain function coming from a class and a bound method: the bound method automatically passes the instance as an argument to the function.
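
A quick way to see this binding at work is to inspect the __self__ and __func__ attributes that bound methods expose (a sketch, just to illustrate the point):

>>> door1.open.__self__ is door1
True
>>> door1.open.__func__ is Door.open
True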

Again, under the hood, __getattribute__() is working to make everything work and when we call door1.open(), Python actually calls door1.__class__.open(door1). However, door1.__class__.open is a plain function, so there is something more that converts it into a bound method that Python can safely call.

When you access a member of an object, Python calls __getattribute__() to satisfy the request. This magic method, however, conforms to a procedure known as the descriptor protocol. For read access __getattribute__() checks if the object has a __get__() method and, if so, calls the latter. So the conversion of a function into a bound method happens through this mechanism. Let us review it by means of an example.

>>> door1.__class__.__dict__['open']
<function Door.open at 0xb68604ac>

This syntax retrieves the function defined in the class; the function knows nothing about objects, but it is an object (remember "everything is an object"). So we can look inside it with the dir() built-in function

>>> dir(door1.__class__.__dict__['open'])
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
 '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__get__', '__getattribute__', '__globals__',
 '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__',
 '__module__', '__name__', '__ne__', '__new__', '__qualname__',
 '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
 '__str__', '__subclasshook__']
>>> door1.__class__.__dict__['open'].__get__
<method-wrapper '__get__' of function object at 0xb68604ac>

As you can see, a __get__ method is listed among the members of the function, and Python recognizes it as a method-wrapper. This method shall connect the open function to the door1 instance, so we can call it passing the instance alone

>>> door1.__class__.__dict__['open'].__get__(door1)
<bound method Door.open of <__main__.Door object at 0xb67e162c>>

and we get exactly what we were looking for. This complex syntax is what happens behind the scenes when we call a method of an instance.

When Methods met Classes

Using type() on functions defined inside classes reveals some other details on their internal representation

>>> Door.open
<function Door.open at 0xb687e074>
>>> door1.open
<bound method Door.open of <__main__.Door object at 0xb6f9834c>>
>>> type(Door.open)
<class 'function'>
>>> type(door1.open)
<class 'method'>

As you can see, Python tells the two apart recognizing the first as a function and the second as a method, where the second is a function bound to an instance.

What if we want to define a function that operates on the class instead of operating on the instance? As we may define class attributes, we may also define class methods in Python, through the classmethod decorator. Class methods are functions that are bound to the class and not to an instance.

class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    @classmethod
    def knock(cls):
        print("Knock!")

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

Such a definition makes the method callable on both the instance and the class

>>> door1.knock()
Knock!
>>> Door.knock()
Knock!

and Python identifies both as (bound) methods

>>> door1.__class__.__dict__['knock']
<classmethod object at 0xb67ff6ac>
>>> door1.knock
<bound method type.knock of <class '__main__.Door'>>
>>> Door.knock
<bound method type.knock of <class '__main__.Door'>>
>>> type(Door.knock)
<class 'method'>
>>> type(door1.knock)
<class 'method'>

As you can see the knock() function accepts one argument, which is called cls just to remember that it is not an instance but the class itself. This means that inside the function we can operate on the class, and the class is shared among instances.

class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    @classmethod
    def knock(cls):
        print("Knock!")

    @classmethod
    def paint(cls, colour):
        cls.colour = colour

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

The paint() classmethod now changes the class attribute colour which is shared among instances. Let's check how it works

>>> door1 = Door(1, 'closed')
>>> door2 = Door(2, 'closed')
>>> Door.colour
'brown'
>>> door1.colour
'brown'
>>> door2.colour
'brown'
>>> Door.paint('white')
>>> Door.colour
'white'
>>> door1.colour
'white'
>>> door2.colour
'white'

The class method can be called on the class, but this affects both the class and the instances, since the colour attribute of instances is taken at runtime from the shared class.

>>> door1.paint('yellow')
>>> Door.colour
'yellow'
>>> door1.colour
'yellow'
>>> door2.colour
'yellow'

Class methods can be called on instances too, however, and their effect is the same as before. The class method is bound to the class, so it works on the latter regardless of the actual object that calls it (class or instance).

Movie Trivia

Section titles come from the following movies: The Empire Strikes Back (1980), Raiders of the Lost Ark (1981), Revenge of the Nerds (1984), When Harry Met Sally (1989).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series comes from those sources.

Feedback

Feel free to use the blog Google+ page to comment on the post. The GitHub issues page is the best place to submit corrections.

Categories: FLOSS Project Planets

The Digital Cat: Object-Oriented Programming in Python 3 - Objects and types

Tue, 2019-05-21 21:47

This post is available as an IPython Notebook here

About this series

Object-oriented programming (OOP) has been the leading programming paradigm for several decades now, starting from the initial attempts back in the 60s to some of the most important languages used nowadays. Being a set of programming concepts and design methodologies, OOP can never be said to be "correctly" or "fully" implemented by a language: indeed there are as many implementations as languages.

So one of the most interesting aspects of OOP languages is understanding how they implement those concepts. In this post I am going to start analyzing the OOP implementation of the Python language. Due to the richness of the topic, however, I consider this attempt just a set of thoughts for Python beginners trying to find their way into this beautiful (and sometimes peculiar) language.

This series of posts aims to introduce the reader to the Python 3 implementation of Object-Oriented Programming concepts. The content of this and the following posts will not be completely different from that of the previous "OOP Concepts in Python 2.x" series, however. The reason is that while some of the internal structures change a lot, the global philosophy doesn't, since Python 3 is an evolution of Python 2 and not a new language.

So I chose to split the previous series and to adapt the content to Python 3 instead of posting a mere list of corrections. I find this way to be more useful for new readers, who would otherwise be forced to read the previous series.

Print

One of the most noticeable changes introduced by Python 3 is the transformation of the print keyword into the print() function. This is indeed a very small change compared to other modifications made to the internal structures, but it is the most visually striking one, and will be the source of 80% of your syntax errors when you start writing Python 3 code.

Remember that print is now a function so write print(a) and not print a.

Back to the Object

Computer science deals with data and with procedures to manipulate that data. Everything, from the earliest Fortran programs to the latest mobile apps, is about data and their manipulation.

So if data are the ingredients and procedures are the recipes, it seems (and can be) reasonable to keep them separate.

Let's do some procedural programming in Python

# This is some data
data = (13, 63, 5, 378, 58, 40)

# This is a procedure that computes the average
def avg(d):
    return sum(d)/len(d)

print(avg(data))

As you can see the code is quite good and general: the procedure (function) operates on a sequence of data, and it returns the average of the sequence items. So far, so good: computing the average of some numbers leaves the numbers untouched and creates new data.

The observation of the everyday world, however, shows that complex data mutate: an electrical device is on or off, a door is open or closed, the content of a bookshelf in your room changes as you buy new books.

You can still manage it keeping data and procedures separate, for example

# These are two numbered doors, initially closed
door1 = [1, 'closed']
door2 = [2, 'closed']

# This procedure opens a door
def open_door(door):
    door[1] = 'open'

open_door(door1)
print(door1)

I described a door as a structure containing a number and the status of the door (as you would do in languages like LISP, for example). The procedure knows how this structure is made and may alter it.

This also works like a charm. Some problems arise, however, when we start building specialized types of data. What happens, for example, when I introduce a "lockable door" data type, which can be opened only when it is not locked? Let's see

# These are two standard doors, initially closed
door1 = [1, 'closed']
door2 = [2, 'closed']

# This is a lockable door, initially closed and unlocked
ldoor1 = [1, 'closed', 'unlocked']

# This procedure opens a standard door
def open_door(door):
    door[1] = 'open'

# This procedure opens a lockable door
def open_ldoor(door):
    if door[2] == 'unlocked':
        door[1] = 'open'

open_door(door1)
print(door1)

open_ldoor(ldoor1)
print(ldoor1)

Everything still works, no surprises in this code. However, as you can see, I had to find a different name for the procedure that opens a locked door since its implementation differs from the procedure that opens a standard door. But, wait... I'm still opening a door, the action is the same, and it just changes the status of the door itself. So why shall I remember that a locked door shall be opened with open_ldoor() instead of open_door() if the verb is the same?

Chances are that this separation between data and procedures doesn't perfectly fit some situations. The key problem is that the "open" action is not actually using the door; rather it is changing its state. So, just like the volume control buttons of your phone, which are on your phone, the "open" procedure should stick to the "door" data.

This is exactly what leads to the concept of object: an object, in the OOP context, is a structure holding data and procedures operating on them.

What About Type?

When you talk about data you immediately need to introduce the concept of type. This concept may have two meanings that are worth being mentioned in computer science: the behavioural and the structural one.

The behavioural meaning represents the fact that you know what something is by describing how it acts. This is the foundation of the so-called "duck typing" (here "typing" means "to give a type" and not "to type on a keyboard"): if it acts like a duck, it is a duck.

The structural meaning identifies the type of something by looking at its internal structure. So two things that act in the same way but are internally different are of different type.

Both points of view can be valid, and different languages may implement and emphasize one meaning of type or the other, and even both.
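
As a minimal sketch of the behavioural point of view (the class and function names here are made up for illustration), duck typing means that a function accepts anything that behaves correctly, without checking its type:

class Duck:
    def quack(self):
        return "Quack!"

class Person:
    def quack(self):
        return "I can quack too!"

def make_it_quack(thing):
    # No type check: anything with a quack() method is accepted
    return thing.quack()

print(make_it_quack(Duck()))
print(make_it_quack(Person()))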

Class Games

Objects in Python may be built describing their structure through a class. A class is the programming representation of a generic object, such as "a book", "a car", "a door": when I talk about "a door" everyone can understand what I'm saying, without the need of referring to a specific door in the room.

In Python, the type of an object is represented by the class used to build the object: that is, in Python the word type has the same meaning as the word class.

For example, one of the built-in classes of Python is int, which represents an integer number

>>> a = 6
>>> print(a)
6
>>> print(type(a))
<class 'int'>
>>> print(a.__class__)
<class 'int'>

As you can see, the built-in function type() returns the content of the magic attribute __class__ (magic here means that its value is managed by Python itself offstage). The type of the variable a, or its class, is int. (This is a very inaccurate description of this rather complex topic, so remember that at the moment we are just scratching the surface).

Once you have a class you can instantiate it to get a concrete object (an instance) of that type, i.e. an object built according to the structure of that class. The Python syntax to instantiate a class is the same as that of a function call

>>> b = int()
>>> type(b)
<class 'int'>

When you create an instance, you can pass some values, according to the class definition, to initialize it.

>>> b = int()
>>> print(b)
0
>>> c = int(7)
>>> print(c)
7

In this example, the int class creates an integer with value 0 when called without arguments, otherwise it uses the given argument to initialize the newly created object.

Let us write a class that represents a door to match the procedural examples done in the first section

class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

The class keyword defines a new class named Door; everything indented under class is part of the class. The functions you write inside the object are called methods and don't differ at all from standard functions; the nomenclature changes only to highlight the fact that those functions now are part of an object.

Methods of a class must accept as first argument a special value called self (the name is a convention but please never break it).

The class can be given a special method called __init__() which is run when the class is instantiated, receiving the arguments passed when calling the class; the general name of such a method, in the OOP context, is constructor, even if the __init__() method is not the only part of this mechanism in Python.

The self.number and self.status variables are called attributes of the object. In Python, methods and attributes are both members of the object and are accessible with the dotted syntax; the difference between attributes and methods is that the latter can be called (in Python lingo you say that a method is a callable).
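
A quick sketch of that distinction, using the built-in callable() function on a throwaway instance:

>>> d = Door(1, 'closed')
>>> callable(d.open)
True
>>> callable(d.number)
False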

As you can see the __init__() method shall create and initialize the attributes since they are not declared elsewhere. This is very important in Python and is strictly linked with the way the language handles the type of variables. I will detail those concepts when dealing with polymorphism in a later post.

The class can be used to create a concrete object

>>> door1 = Door(1, 'closed')
>>> type(door1)
<class '__main__.Door'>
>>> print(door1.number)
1
>>> print(door1.status)
closed

Now door1 is an instance of the Door class; type() returns the class as __main__.Door since the class was defined directly in the interactive shell, that is in the current main module.

To call a method of an object, that is to run one of its internal functions, you just access it as an attribute with the dotted syntax and call it like a standard function.

>>> door1.open()
>>> print(door1.number)
1
>>> print(door1.status)
open

In this case, the open() method of the door1 instance has been called. No arguments have been passed to the open() method, but if you review the class declaration, you see that it was declared to accept an argument (self). When you call a method of an instance, Python automatically passes the instance itself to the method as the first argument.

You can create as many instances as needed and they are completely unrelated to each other. That is, the changes you make to one instance do not reflect on another instance of the same class.
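
A minimal sketch of this independence:

>>> door1 = Door(1, 'closed')
>>> door2 = Door(2, 'closed')
>>> door1.open()
>>> door1.status
'open'
>>> door2.status
'closed'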

Recap

Objects are described by a class, which can generate one or more instances that are unrelated to each other. A class contains methods, which are functions, and they accept at least one argument called self, which is the actual instance on which the method has been called. A special method, __init__(), deals with the initialization of the object, setting the initial values of the attributes.

Movie Trivia

Section titles come from the following movies: Back to the Future (1985) , What About Bob? (1991), Wargames (1983).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series comes from those sources.

Feedback

Feel free to use the blog Google+ page to comment on the post. The GitHub issues page is the best place to submit corrections.

Categories: FLOSS Project Planets

Quansight Labs Blog: Spyder 4.0 takes a big step closer with the release of Beta 2!

Tue, 2019-05-21 16:02

It has been almost two months since I joined Quansight in April, to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of long time Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been working hard on our next release. Over the past year, we've implemented the long awaited full-interface dark theme; overhauled our entire code completion and linting architecture to use the Language Server Protocol, opening the door to supporting many other languages in the future; added a new Plots pane to view and manage the figures generated by your code; and numerous other feature enhancements, bug fixes and internal improvements.

Dark theme

A full-interface dark theme has been a long awaited feature, and is enabled by default in Spyder 4. You can still select the light theme under Preferences > Appearance by either choosing a light-background syntax-highlighting scheme, or changing Interface theme to Light.

Pretty, right :-) ?

Read more… (4 min remaining to read)

Categories: FLOSS Project Planets

Python Engineering at Microsoft: Who put Python in the Windows 10 May 2019 Update?

Tue, 2019-05-21 15:59

Today the Windows team announced the May 2019 Update for Windows 10. In this post we’re going to look at what we, the Python team, have done to make Python easier to install on Windows by helping the community publish to the Microsoft Store and, in collaboration with Windows, adding a default “python.exe” command to help find it. You may have already heard about these on the Python Bytes podcast, at PyCon US, or through Twitter.

As software moves from the PC to the cloud, the browser, and the Internet of Things, development workflows are changing. While Visual Studio remains a great starting point for any workload on Windows, many developers now prefer to acquire tools individually and on-demand.

For other operating systems, the platform-endorsed package manager is the traditional place to find individual tools that have been customized, reviewed, and tested for your system. On Windows we are exploring ways to provide a similar experience for developers without impacting non-developer users or infringing publishers’ ability to manage their own releases. The Windows Subsystem for Linux is one approach, offering developers consistency between their build and deployment environments. But there are other developer tools that also matter.

One such tool is Python. Microsoft has been involved with the Python community for over twelve years, and currently employs four of the key contributors to the language and primary runtime. The growth of Python has been incredible, as it finds homes among data scientists, web developers, system administrators, and students, and roughly half of this work is already happening on Windows. And yet, Python developers on Windows find themselves facing more friction than on other platforms.

Installing Python on Windows

It’s been widely known for many years that Windows is the only mainstream operating system that does not include a Python interpreter out of the box. For many users who are never going to need it, this helps reduce the size and improve the security of the operating system. But for those of us who do need it, Python’s absence has been keenly felt.

Once you discover that you need to get Python, you are quickly faced with many choices. Will you download an installer from python.org? Or perhaps a distribution such as Anaconda? The Visual Studio installer is also an option. And which version? How will you access it after it’s been installed? You quickly find more answers than you need, and depending on your situation, any of them might be correct.

We spent time figuring out why someone would hit the error above and what help they need. If you’re already a Python expert with complex needs, you probably know how to install and use it. It’s much more likely that someone will hit this problem the first time they are trying to use Python. Many of the teachers we spoke to confirmed this hypothesis – students encounter this far more often than experienced developers.

So we made things easier.

First, we helped the community release their distribution of Python to the Microsoft Store. This version of Python is fully maintained by the community, installs easily on Windows 10, and automatically makes common commands such as python, pip and idle available (as well as equivalents with version numbers python3 and python3.7, for all the commands, just like on Linux).

Finally, with the May 2019 Windows Update, we are completing the picture. While Python continues to remain completely independent from the operating system, every install of Windows will include python and python3 commands that take you directly to the Python store page. We believe that the Microsoft Store package is perfect for users starting out with Python, and given our experience with and participation in the Python community we are pleased to endorse it as the default choice.

We hope everyone will be as excited as Scott Hanselman was when he discovered it. Over time, we plan to extend similar integration to other developer tools and reduce the getting started friction. We’d love to hear your thoughts, and suggestions, so feel free to post comments here or use the Windows Feedback app.

 

The post Who put Python in the Windows 10 May 2019 Update? appeared first on Python.

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #369 (May 21, 2019)

Tue, 2019-05-21 15:30

#369 – MAY 21, 2019
View in Browser »

Interactive Data Visualization in Python With Bokeh

This course will get you up and running with the Bokeh library, using examples and a real-world dataset. You’ll learn how to visualize your data, customize and organize your visualizations, and add interactivity.
REAL PYTHON video

Build a Hardware-Based Face Recognition System for $150 With the Nvidia Jetson Nano and Python

“With the Nvidia Jetson Nano, you can build stand-alone hardware systems that run GPU-accelerated deep learning models on a tiny budget. It’s just like a Raspberry Pi, but a lot faster.”
ADAM GEITGEY

Leverage Data Science to Optimize Your Application

PyCharm 2019.1 Professional Edition has all-new Jupyter Notebooks support. You can use the same IDE that you use for building your application to analyze the data to improve it. Try it now →
JETBRAINS sponsor

PEP 581 Accepted (Using GitHub Issues for CPython)

CPython’s issue tracker will be migrated from Roundup to GitHub issues.
PYTHON.ORG

Batteries Included, but They’re Leaking

“Amber Brown of the Twisted project shared her criticisms of the Python standard library [at PyCon 2019]. This proved to be the day’s most controversial talk; Guido van Rossum stormed from the room during Q & A.” Related discussion on Hacker News
PYFOUND.BLOGSPOT.COM

Unicode & Character Encodings in Python: A Painless Guide

Get a Python-centric introduction to character encodings and unicode. Handling character encodings and numbering systems can at times seem painful and complicated, but this guide is here to help with easy-to-follow Python examples.
REAL PYTHON

Python Built-Ins Worth Learning

“Which built-ins should you know about? I estimate most Python developers will only ever need about 30 built-in functions, but which 30 depends on what you’re actually doing with Python.”
TREY HUNNER

PSF Q2 2019 Fundraiser

Support the Python Software Foundation by donating in the quarterly donation drive. Your donations help fund Python conferences, workshops, user groups, community web services, and more.
PYTHON.ORG

Discussions

Black: The Uncompromising Python Code Formatter

“I often dislike autoformatter output too, but then I remember that while no-one likes what the autoformatter does to their code, everyone likes what the autoformatter does to their coworkers’ code, and then I chill out about it. Having a standard is more important than the standard being excellent.”
HACKER NEWS

Python Jobs

SIPS Programmer (Madison, WI)

University of Wisconsin

Senior API Developer (Copenhagen, Denmark)

GameAnalytics Ltd.

Senior Backend Python Developer (Remote)

Kiwi.com

More Python Jobs >>>

Articles & Tutorials

Structuring Your Python Project

“Which functions should go into which modules? How does data flow through the project? What features and functions can be grouped together and isolated? By answering questions like these you can begin to plan, in a broad sense, what your finished product will look like.”
PYTHON-GUIDE.ORG

5 Reasons Why People Are Choosing Masonite Over Django

The creator of Masonite explains why you should consider Masonite for use in your next Python web dev project.
JOSEPH MANCUSO

Join a Community of 3.5 Million Developers on DigitalOcean

Discover why Python developers love self-hosting their apps on DigitalOcean, the simplest cloud platform. Click here to learn more and get started within minutes →
DIGITALOCEAN sponsor

New Features Planned for Python 4.0 (Satire)

“All new libraries and standard lib modules must include the phrase “for humans” somewhere in their title.” “Type-hinting has been extended to provide even fewer tangible benefits. This new and “nerfed” type hinting will be called type whispering.”
CHARLES LEIFER

Announcing the Wolfram Client Library for Python

Get full access to the Wolfram Language from Python.
WOLFRAM.COM

Scalable Python Code With Pandas UDFs

“Pandas UDFs are a feature that enable Python code to run in a distributed environment, even if the library was developed for single node execution.”
BEN WEBER

Three Ways of Storing and Accessing Lots of Images in Python

In this tutorial, you’ll cover three ways of storing and accessing lots of images in Python. You’ll also see experimental evidence for the performance benefits and drawbacks of each one.
REAL PYTHON

Support for Python 2 Ends in 2019, and It’s High Time for Developers to Take Action

Your weekly reminder :-)
REVYUH.COM

Docker Is Different: Configuring Gunicorn for Containers

This article covers preventing slowness due to worker heartbeats, configuring the number of workers, and logging to stdout.
PYTHONSPEED.COM

Come to Mexico for PyCon Latam 2019

Come join us in beautiful Puerto Vallarta in the first installment of this conference. With an all-inclusive ticket that covers food and lodging, you can’t miss this opportunity!
PYCON sponsor

Remote Development Using PyCharm

VUYISILE NDLOVU

Overview of Async IO in Python 3.7

STACKABUSE.COM

Python Project Tooling Explained

SIMONE ROBUTTI

Projects & Code

Stackless Python

“Stackless Python, or Stackless, is a Python programming language interpreter, so named because it avoids depending on the C call stack for its own stack. […] The most prominent feature of Stackless is microthreads, which avoid much of the overhead associated with usual operating system threads.”
WIKIPEDIA.ORG

DeleteFB: Selenium Script to Delete All of Your Facebook Wall Posts

GITHUB.COM/WESKERFOOT

pydebug: Decorators for Debugging Python

GITHUB.COM/BENMEZGER

PyMedPhys: Common, Core Python Package for Medical Physics

PYMEDPHYS.COM

Django-CRM: Sales Dashboard

DJANGO-CRM.READTHEDOCS.IO

Events

PyLadies Bratislava

Thursday, May 23
FACEBOOK.COM • Shared by Filipa Andrade

PyConWeb 2019

May 25 to May 27, 2019
PYCONWEB.COM

Python Toulouse Meetup

June 3 in Toulouse
MEETUP.COM • Shared by Thibault Ducret

PyLondinium 2019

June 14–16 in London, UK
PYLONDINIUM.ORG • Shared by Carlos Pereira Atencio

Dash Conference

July 16–17 in NYC
DASHCON.IO

Happy Pythoning!
This was PyCoder’s Weekly Issue #369.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

CodeGrades: Hello CodeGrades!

Tue, 2019-05-21 12:30

This is a blog about CodeGrades, an experiment to help folks learn about programming (initially in Python). We’ll use it to celebrate the successes, learn from the failures and reflect upon the feedback of participants. We’ll also share project news here too.

So, what are CodeGrades?

At a time when technology is finding its way into every aspect of our lives many folks want to be more than just passive consumers of technology. They feel a desire to become creators of technology. They want to take control of their digital world. They want the skills to make their technology reflect their own needs.

This is where CodeGrades come in…

CodeGrades are a programming version of time-proven techniques like music grades or belts in martial arts. Learners level up by applying the knowledge and skills needed for each grade to their own fun, interesting and challenging coding projects. Learners present their projects to professional software developers who assess the projects against the criteria for the grade being taken and provide a set of marks and written feedback so the learner can see where they’re doing well, what needs to improve and what their next steps may be.

CodeGrades are eight cumulative steps for learning how to write code. The first grade is easy enough for most people to take as a first step into programming. The eighth grade is of equivalent standard to the skills and knowledge needed to be an effective junior professional software developer. The middle grades bridge the way so the skill gaps between each of the grades is achievable. They’re like stepping stones into coding, or perhaps a modern day Gradus ad Parnassum.

The syllabus for CodeGrades is written by professional software developers. The grades reflect current best practice found in the software industry. They offer a framework for sustained and structured long term learning to write code. All the resources associated with CodeGrades are free, learners only pay to take the grading. Grades will be competitively priced and will certainly not cost the many thousands needed to attend a code bootcamp.

Passing a grade is undeniable evidence that an expert programmer believes the learner has attained the level of competence, knowledge and skill for the grade taken. Nobody can take that achievement away. It’s something to be celebrated and gives learners the confidence and momentum to continue on their path to programming mastery.

The professional developers who assess the candidates in a grading (we call them code mentors because that sounds more friendly than examiners) are paid for their time at a level commensurate with that of a senior software engineer. We like to think this may be an alternative source of income for FLOSS developers who want to concentrate on their software projects rather than work in an office.

That’s it in a nutshell.

It’s early days but we have already successfully graduated a first cohort of learners through “grade 1 Python” (with better than expected outcomes). We have just started a second cohort of learners to test the new syllabus (more on that soon) and hope to engage with further test cohorts over the summer. Eventually we will open up our website so learners will be able to book and pay for grading. We expect this to happen by the end of 2019 at the latest.

There is still much to do! If you think you could support our work, or perhaps you have feedback or maybe want to get more involved, please don’t hesitate to get in touch via the email address at the bottom of this page.

Onwards and upwards.

Categories: FLOSS Project Planets

Trey Hunner: Python built-ins worth learning

Tue, 2019-05-21 11:40

In every Intro to Python class I teach, there’s always at least one “how can we be expected to know all this” question.

It’s usually along the lines of either:

  1. Python has so many functions in it, what’s the best way to remember all these?
  2. What’s the best way to learn the functions we’ll need day-to-day like enumerate and range?
  3. How do you know about all the ways to solve problems in Python? Do you memorize them?

There are dozens of built-in functions and classes, hundreds of tools bundled in Python’s standard library, and thousands of third-party libraries on PyPI. There’s no way anyone could ever memorize all of these things.

I recommend triaging your knowledge:

  1. Things I should memorize such that I know them well
  2. Things I should know about so I can look them up more effectively later
  3. Things I shouldn’t bother with at all until/unless I need them one day

We’re going to look through the Built-in Functions page in the Python documentation with this approach in mind.

This will be a very long article, so I’ve linked to 5 sub-sections and 20 specific built-in functions in the next section so you can jump ahead if you’re pressed for time or looking for one built-in in particular.

    Which built-ins should you know about?

    I estimate most Python developers will only ever need about 30 built-in functions, but which 30 depends on what you’re actually doing with Python.

    We’re going to take a look at all 69 of Python’s built-in functions, in a birds eye view sort of way.

    I’ll attempt to categorize these built-ins into five categories:

    1. Commonly known: most newer Pythonistas get exposure to these built-ins pretty quickly out of necessity
    2. Overlooked by beginners: these functions are useful to know about, but they’re easy to overlook when you’re newer to Python
    3. Learn it later: these built-ins are generally useful to know about, but you’ll find them when/if you need them
    4. Maybe learn it eventually: these can come in handy, but only in specific circumstances
    5. You likely don’t need these: you’re unlikely to need these unless you’re doing something fairly specialized

    The built-in functions in categories 1 and 2 are the essential built-ins that nearly all Python programmers should eventually learn about. The built-ins in categories 3 and 4 are the specialized built-ins, which are often very useful but your need for them will vary based on your use for Python. And category 5 are arcane built-ins, which might be very handy when you need them but which many Python programmers are likely to never need.

    Note for pedantic Pythonistas: I will be referring to all of these built-ins as functions, even though 27 of them aren’t actually functions (as discussed in my functions and callables article).

    The commonly known built-in functions (which you likely already know about):

    1. print
    2. len
    3. str
    4. int
    5. float
    6. list
    7. tuple
    8. dict
    9. set
    10. range

    The built-in functions which are often overlooked by newer Python programmers:

    1. sum
    2. enumerate
    3. zip
    4. bool
    5. reversed
    6. sorted
    7. min
    8. max
    9. any
    10. all

    There are also 5 commonly overlooked built-ins which I recommend knowing about solely because they make debugging easier: dir, vars, breakpoint, type, help.

    In addition to the 25 built-in functions above, we'll also briefly see the other 44 built-ins in the learn it later, maybe learn it eventually, and you likely don't need these sections.

    10 Commonly known built-in functions

    If you’ve been writing Python code, these built-ins are likely familiar already.

    print

    You already know the print function. Implementing hello world requires print.

    You may not know about the various keyword arguments accepted by print though:

    >>> words = ["Welcome", "to", "Python"]
    >>> print(words)
    ['Welcome', 'to', 'Python']
    >>> print(*words, end="!\n")
    Welcome to Python!
    >>> print(*words, sep="\n")
    Welcome
    to
    Python

    You can look up print on your own.

    len

    In Python, we don’t write things like my_list.length() or my_string.length; instead we strangely (for new Pythonistas at least) say len(my_list) and len(my_string).

    >>> words = ["Welcome", "to", "Python"]
    >>> len(words)
    3

    Regardless of whether you like this operator-like len function, you’re stuck with it so you’ll need to get used to it.

    str

    Unlike many other programming languages, you cannot concatenate strings and numbers in Python.

    >>> version = 3
    >>> "Python " + version
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only concatenate str (not "int") to str

    Python refuses to coerce that 3 integer to a string, so we need to manually do it ourselves, using the built-in str function (class technically, but as I said, I’ll be calling these all functions):

    >>> version = 3
    >>> "Python " + str(version)
    'Python 3'

    int

    Do you have user input and need to convert it to a number? You need the int function!

    The int function can convert strings to integers:

    >>> program_name = "Python 3"
    >>> version_number = program_name.split()[-1]
    >>> int(version_number)
    3

    You can also use int to truncate a floating point number to an integer:

    >>> from math import sqrt
    >>> sqrt(28)
    5.291502622129181
    >>> int(sqrt(28))
    5

    Note that if you need to truncate while dividing, the // operator is likely more appropriate (though this works differently with negative numbers): int(3 / 2) == 3 // 2.
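
    For example, int() truncates toward zero while // floors toward negative infinity, so the two differ for negative operands:

    >>> int(3 / 2), 3 // 2
    (1, 1)
    >>> int(-3 / 2), -3 // 2
    (-1, -2)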

    float

    Is the string you’re converting to a number not actually an integer? Then you’ll want to use float instead of int for this conversion.

    >>> program_name = "Python 3"
    >>> version_number = program_name.split()[-1]
    >>> float(version_number)
    3.0
    >>> pi_digits = '3.141592653589793238462643383279502884197169399375'
    >>> len(pi_digits)
    50
    >>> float(pi_digits)
    3.141592653589793

    You can also use float to convert integers to floating point numbers.

    In Python 2, we used to use float to convert integers to floating point numbers to force float division instead of integer division. “Integer division” isn’t a thing anymore in Python 3 (unless you’re specifically using the // operator), so we don’t need float for that purpose anymore. So if you ever see float(x) / y in your Python 3 code, you can change that to just x / y.

    list

    Want to make a list out of some other iterable?

    The list function does that:

    >>> numbers = [2, 1, 3, 5, 8]
    >>> squares = (n**2 for n in numbers)
    >>> squares
    <generator object <genexpr> at 0x7fd52dbd5930>
    >>> list_of_squares = list(squares)
    >>> list_of_squares
    [4, 1, 9, 25, 64]

    If you know you’re working with a list, you could use the copy method to make a new copy of a list:

    >>> copy_of_squares = list_of_squares.copy()

    But if you don’t know what the iterable you’re working with is, the list function is the more general way to loop over an iterable and copy it:

    >>> copy_of_squares = list(list_of_squares)

    You could also use a list comprehension for this, but I wouldn’t recommend it.

    Note that when you want to make an empty list, using the list literal syntax (those [] brackets) is recommended:

    >>> my_list = list()  # Don't do this
    >>> my_list = []      # Do this instead

    Using [] is considered more idiomatic since those square brackets ([]) actually look like a Python list.

    tuple

    The tuple function is pretty much just like the list function, except it makes tuples instead:

    >>> numbers = [2, 1, 3, 4, 7]
    >>> tuple(numbers)
    (2, 1, 3, 4, 7)

    If you need a tuple instead of a list (because you’re trying to make a hashable collection to use as a dictionary key, for example), you’ll want to reach for tuple over list.

    dict

    The dict function makes a new dictionary.

    Like list and tuple, the dict function is equivalent to looping over an iterable of key-value pairs and making a dictionary from them.

    Given a list of two-item tuples:

    >>> color_counts = [('red', 2), ('green', 1), ('blue', 3), ('purple', 5)]

    This:

    >>> colors = {}
    >>> for color, n in color_counts:
    ...     colors[color] = n
    ...
    >>> colors
    {'red': 2, 'green': 1, 'blue': 3, 'purple': 5}

    Can instead be done with the dict function:

    >>> colors = dict(color_counts)
    >>> colors
    {'red': 2, 'green': 1, 'blue': 3, 'purple': 5}

    The dict function accepts two types of arguments:

    1. another dictionary (mapping is the generic term), in which case that dictionary will be copied
    2. a list of key-value tuples (more correctly, an iterable of two-item iterables), in which case a new dictionary will be constructed from these

    So this works as well:

    >>> colors
    {'red': 2, 'green': 1, 'blue': 3, 'purple': 5}
    >>> new_dictionary = dict(colors)
    >>> new_dictionary
    {'red': 2, 'green': 1, 'blue': 3, 'purple': 5}

    The dict function can also accept keyword arguments to make a dictionary with string-based keys:

    >>> person = dict(name='Trey Hunner', profession='Python Trainer')
    >>> person
    {'name': 'Trey Hunner', 'profession': 'Python Trainer'}

    But I very much prefer to use a dictionary literal instead:

    >>> person = {'name': 'Trey Hunner', 'profession': 'Python Trainer'}
    >>> person
    {'name': 'Trey Hunner', 'profession': 'Python Trainer'}

    The dictionary literal syntax is more flexible and a bit faster but most importantly I find that it more clearly conveys the fact that we are creating a dictionary.

    Like with list and tuple, an empty dictionary should be made using the literal syntax as well:

    >>> my_dict = dict()  # Don't do this
    >>> my_dict = {}      # Do this instead

    Using {} is slightly more CPU efficient, but more importantly it’s more idiomatic: it’s common to see curly braces ({}) used for making dictionaries but dict is seen much less frequently.

    set

    The set function makes a new set. It takes an iterable of hashable values (strings, numbers, or other immutable types) and returns a set:

    >>> numbers = [1, 1, 2, 3, 5, 8]
    >>> set(numbers)
    {1, 2, 3, 5, 8}

    There’s no way to make an empty set with the {} set literal syntax (plain {} makes a dictionary), so the set function is the only way to make an empty set:

    >>> numbers = set()
    >>> numbers
    set()

    Actually that’s a lie because we have this:

    >>> {*()}  # This makes an empty set
    set()

    But that syntax is confusing (it relies on a lesser-used feature of the * operator), so I don’t recommend it.

    range

    The range function gives us a range object, which represents a range of numbers:

    >>> range(10_000)
    range(0, 10000)
    >>> range(-1_000_000_000, 1_000_000_000)
    range(-1000000000, 1000000000)

    The resulting range of numbers includes the start number but excludes the stop number (range(0, 10) does not include 10).

    The range function is useful when you’d like to loop over numbers.

    >>> for n in range(0, 50, 10):
    ...     print(n)
    ...
    0
    10
    20
    30
    40

    A common use case is to do an operation n times (that’s a list comprehension by the way):

    first_five = [get_things() for _ in range(5)]

    Python 2’s range function returned a list, which means the expressions above would make very very large lists. Python 3’s range works like Python 2’s xrange (though they’re a bit different) in that numbers are computed lazily as we loop over these range objects.

    Built-ins overlooked by new Pythonistas

    If you’ve been programming in Python for a bit, or if you’ve just taken an introductory Python class, you probably already know about the built-in functions above.

    I’d now like to show off 15 built-in functions that are very handy to know about, but are more frequently overlooked by new Pythonistas.

    The first 10 of these functions you’ll find floating around in Python code, but the last 5 you’ll most often use while debugging.

    bool

    The bool function checks the truthiness of a Python object.

    For numbers, truthiness is a question of non-zeroness:

    >>> bool(5)
    True
    >>> bool(-1)
    True
    >>> bool(0)
    False

    For collections, truthiness is usually a question of non-emptiness (whether the collection has a length greater than 0):

    >>> bool('hello')
    True
    >>> bool('')
    False
    >>> bool(['a'])
    True
    >>> bool([])
    False
    >>> bool({})
    False
    >>> bool({1: 1, 2: 4, 3: 9})
    True
    >>> bool(range(5))
    True
    >>> bool(range(0))
    False
    >>> bool(None)
    False

    Truthiness (called truth value testing in the docs) is kind of a big deal in Python.

    Instead of asking questions about the length of a container, many Pythonistas ask questions about truthiness instead:

    # Instead of doing this
    if len(numbers) == 0:
        print("The numbers list is empty")

    # Many of us do this
    if not numbers:
        print("The numbers list is empty")

    You likely won’t see bool used often, but on the occasion that you need to coerce a value to a boolean to ask about its truthiness, you’ll want to know about bool.

    enumerate

    Whenever you need to count upward, one number at a time, while looping over an iterable at the same time, the enumerate function will come in handy.

    That might seem like a very niche task, but it comes up quite often.

    For example we might want to keep track of the line number in a file:

    >>> with open('hello.txt', mode='rt') as my_file:
    ...     for n, line in enumerate(my_file, start=1):
    ...         print(f"{n:03}", line)
    ...
    001 This is the first line of the file
    002 This is the second line
    003 This is the last line of the file

    The enumerate function is also very commonly used to keep track of the index of items in a sequence.

    def palindromic(sequence):
        """Return True if the sequence is the same thing in reverse."""
        for i, item in enumerate(sequence):
            if item != sequence[-(i+1)]:
                return False
        return True

    Note that you may see newer Pythonistas use range(len(sequence)) in Python. If you ever see code with range(len(...)), you’ll almost always want to use enumerate instead.

    def palindromic(sequence):
        """Return True if the sequence is the same thing in reverse."""
        for i in range(len(sequence)):
            if sequence[i] != sequence[-(i+1)]:
                return False
        return True

    If enumerate is news to you (or if you often use range(len(...))), see my article on looping with indexes in Python.

    zip

    The zip function is even more specialized than enumerate.

    The zip function is used for looping over multiple iterables at the same time.

    >>> one_iterable = [2, 1, 3, 4, 7, 11]
    >>> another_iterable = ['P', 'y', 't', 'h', 'o', 'n']
    >>> for n, letter in zip(one_iterable, another_iterable):
    ...     print(letter, n)
    ...
    P 2
    y 1
    t 3
    h 4
    o 7
    n 11

    If you ever have to loop over two lists (or any other iterables) at the same time, zip is preferred over enumerate. The enumerate function is handy when you need indexes while looping, but zip is great when we care specifically about looping over two iterables at once.

    If you’re new to zip, I also talk about it in my looping with indexes article.

    Both enumerate and zip return iterators to us. Iterators are the lazy iterables that power for loops. I have a whole talk on iterators as well as a somewhat advanced article on how to make your own iterators.

    By the way, if you need to use zip on iterables of different lengths, you may want to look up itertools.zip_longest in the Python standard library.
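
    If you’re curious, here’s a small sketch of the difference: plain zip stops at the shortest iterable, while zip_longest keeps going and fills in missing values (the fillvalue here is just for illustration).

    >>> from itertools import zip_longest
    >>> list(zip([1, 2, 3], ['a', 'b']))
    [(1, 'a'), (2, 'b')]
    >>> list(zip_longest([1, 2, 3], ['a', 'b'], fillvalue='?'))
    [(1, 'a'), (2, 'b'), (3, '?')]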

    reversed

    The reversed function, like enumerate and zip, returns an iterator.

    >>> numbers = [2, 1, 3, 4, 7]
    >>> reversed(numbers)
    <list_reverseiterator object at 0x7f3d4452f8d0>

    The only thing we can do with this iterator is loop over it (but only once):

    >>> reversed_numbers = reversed(numbers)
    >>> list(reversed_numbers)
    [7, 4, 3, 1, 2]
    >>> list(reversed_numbers)
    []

    Like enumerate and zip, reversed is a sort of looping helper function. You’ll pretty much see reversed used exclusively in the for part of a for loop:

    >>> for n in reversed(numbers):
    ...     print(n)
    ...
    7
    4
    3
    1
    2

    There are some other ways to reverse Python lists besides the reversed function:

    # Slicing syntax
    for n in numbers[::-1]:
        print(n)

    # In-place reverse method
    numbers.reverse()
    for n in numbers:
        print(n)

    But the reversed function is usually the best way to reverse any iterable in Python.

    Unlike the list reverse method (e.g. numbers.reverse()), reversed doesn’t mutate the list (it returns an iterator of the reversed items instead).

    Unlike the numbers[::-1] slice syntax, reversed(numbers) doesn’t build up a whole new list: the lazy iterator it returns retrieves the next item in reverse as we loop. Also reversed(numbers) is a lot more readable than numbers[::-1] (which just looks weird if you’ve never seen that particular use of slicing before).

    If we combine the non-copying nature of the reversed and zip functions, we can rewrite the palindromic function (from enumerate above) without taking any extra memory (no copying of lists is done here):

    def palindromic(sequence):
        """Return True if the sequence is the same thing in reverse."""
        for n, m in zip(sequence, reversed(sequence)):
            if n != m:
                return False
        return True

    sum

    The sum function takes an iterable of numbers and returns the sum of those numbers.

    >>> sum([2, 1, 3, 4, 7])
    17

    There’s not much more to it than that.

    Python has lots of helper functions that do the looping for you, partly because they pair nicely with generator expressions:

    >>> numbers = [2, 1, 3, 4, 7, 11, 18]
    >>> sum(n**2 for n in numbers)
    524

    If you’re curious about generator expressions, I discuss them in my Comprehensible Comprehensions talk (and my 3 hour tutorial on comprehensions and generator expressions).

    min and max

    The min and max functions do what you’d expect: they give you the minimum and maximum items in an iterable.

    >>> numbers = [2, 1, 3, 4, 7, 11, 18]
    >>> min(numbers)
    1
    >>> max(numbers)
    18

    The min and max functions compare the items given to them by using the < operator. So all values need to be orderable and comparable to each other (fortunately many objects are orderable in Python).

    The min and max functions also accept a key function to allow customizing what “minimum” and “maximum” really mean for specific objects.
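
    For example, passing key=len finds the shortest or longest string instead of the alphabetically smallest or largest one:

    >>> words = ["python", "is", "lovely"]
    >>> min(words, key=len)
    'is'
    >>> max(words, key=len)
    'python'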

    sorted

    The sorted function takes any iterable and returns a new list of all the values in that iterable in sorted order.

    >>> numbers = [1, 8, 2, 13, 5, 3, 1]
    >>> words = ["python", "is", "lovely"]
    >>> sorted(words)
    ['is', 'lovely', 'python']
    >>> sorted(numbers, reverse=True)
    [13, 8, 5, 3, 2, 1, 1]

    The sorted function, like min and max, compares the items given to it by using the < operator, so all values given to it need to be orderable.

    The sorted function also allows customization of its sorting via a key function (just like min and max).
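
    For example, passing key=str.lower gives a case-insensitive sort:

    >>> sorted(["Python", "is", "Lovely"], key=str.lower)
    ['is', 'Lovely', 'Python']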

    By the way, if you’re curious about sorted versus the list.sort method, Florian Dahlitz wrote an article comparing the two.

    any and all

    The any and all functions can be paired with a generator expression to determine whether any or all items in an iterable match a given condition.

    Our palindromic function from earlier checked whether all items were equal to their corresponding item in the reversed sequence (is the first value equal to the last, second to the second from last, etc.).

    We could rewrite palindromic using all like this:

    def palindromic(sequence):
        """Return True if the sequence is the same thing in reverse."""
        return all(
            n == m
            for n, m in zip(sequence, reversed(sequence))
        )

    Negating the condition and the return value from all would allow us to use any equivalently (though this is more confusing in this example):

    def palindromic(sequence):
        """Return True if the sequence is the same thing in reverse."""
        return not any(
            n != m
            for n, m in zip(sequence, reversed(sequence))
        )

    If the any and all functions are new to you, you may want to read my article on them: Checking Whether All Items Match a Condition in Python.

    The 5 debugging functions

    The following 5 functions will be useful for debugging and troubleshooting code.

    breakpoint

    Need to pause the execution of your code and drop into a Python command prompt? You need breakpoint!

    Calling the breakpoint function will drop you into pdb, the Python debugger. There are many tutorials and talks out there on PDB: here’s a short one and here’s a long one.

    This built-in function was added in Python 3.7, but if you’re on older versions of Python you can get the same behavior with import pdb ; pdb.set_trace().
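
    As a minimal sketch of what that looks like in practice (the mean function here is made up purely for illustration):

    def mean(numbers):
        total = sum(numbers)
        breakpoint()  # execution pauses here and drops into the (Pdb) prompt
        return total / len(numbers)

    # At the (Pdb) prompt you can print values (p total), step line by line (n),
    # or continue running (c).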

    dir

    The dir function can be used for two things:

    1. Seeing a list of all your local variables
    2. Seeing a list of all attributes on a particular object

    Here we can see our local variables right after starting a new Python shell, and then again after creating a new variable x:

    >>> dir()
    ['__annotations__', '__doc__', '__name__', '__package__']
    >>> x = [1, 2, 3, 4]
    >>> dir()
    ['__annotations__', '__doc__', '__name__', '__package__', 'x']

    If we pass that x list into dir we can see all the attributes it has:

    >>> dir(x)
    ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

    We can see the typical list methods (append, pop, remove, and more) as well as many dunder methods used for operator overloading.

    vars

    The vars function is sort of a mashup of two related things: checking locals() and testing the __dict__ attribute of objects.

    When vars is called with no arguments, it’s equivalent to calling the locals() built-in function (which shows a dictionary of all local variables and their values).

    >>> vars()
    {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}

    When it’s called with an argument, it accesses the __dict__ attribute on that object (which on many objects represents a dictionary of all instance attributes).

    >>> from itertools import chain
    >>> vars(chain)
    mappingproxy({'__getattribute__': <slot wrapper '__getattribute__' of 'itertools.chain' objects>, '__iter__': <slot wrapper '__iter__' of 'itertools.chain' objects>, '__next__': <slot wrapper '__next__' of 'itertools.chain' objects>, '__new__': <built-in method __new__ of type object at 0x5611ee76fac0>, 'from_iterable': <method 'from_iterable' of 'itertools.chain' objects>, '__reduce__': <method '__reduce__' of 'itertools.chain' objects>, '__setstate__': <method '__setstate__' of 'itertools.chain' objects>, '__doc__': 'chain(*iterables) --> chain object\n\nReturn a chain object whose .__next__() method returns elements from the\nfirst iterable until it is exhausted, then elements from the next\niterable, until all of the iterables are exhausted.'})

    If you ever try to use my_object.__dict__, you can use vars instead.

    I usually reach for dir just before using vars.

    type

    The type function will tell you the type of the object you pass to it.

    The type of a class instance is the class itself:

    >>> x = [1, 2, 3]
    >>> type(x)
    <class 'list'>

    The type of a class is its metaclass, which is usually type:

    >>> type(list)
    <class 'type'>
    >>> type(type(x))
    <class 'type'>

    If you ever see someone reach for __class__, know that they could reach for the higher-level type function instead:

    >>> x.__class__
    <class 'list'>
    >>> type(x)
    <class 'list'>

    The type function is sometimes helpful in actual code (especially object-oriented code with inheritance and custom string representations), but it’s also useful when debugging.

    Note that when type checking, the isinstance function is usually used instead of type (also note that we tend not to type check in Python because we prefer to practice duck typing).

    help

    If you’re in an interactive Python shell (the Python REPL as I usually call it), maybe debugging code using breakpoint, and you’d like to know how a certain object, method, or attribute works, the help function will come in handy.

    Realistically, you’ll likely resort to getting help from your favorite search engine more often than using help. But if you’re already in a Python REPL, it’s quicker to call help(list.insert) than it would be to look up the list.insert method documentation in Google.

    Learn it later

    There are quite a few built-in functions you’ll likely want eventually, but you may not need right now.

    I’m going to mention 14 more built-in functions which are handy to know about, but not worth learning until you actually need to use them.

    open

    Need to open a file in Python? You need the open function!

    Don’t work with files directly? Then you likely don’t need the open function!

    You might think it’s odd that I’ve put open in this section because working with files is so common. While most programmers will read or write to files using open at some point, some Python programmers, such as Django developers, may not use the open function very much (if at all).

    Once you need to work with files, you’ll learn about open. Until then, don’t worry about it.

    By the way, you might want to look into pathlib (which is in the Python standard library) as an alternative to using open. I love the pathlib module so much I’ve considered teaching files in Python by mentioning pathlib first and the built-in open function later.

    input

    The input function prompts the user for input, waits for them to hit the Enter key, and then returns the text they typed.

    Reading from standard input (which is what the input function does) is one way to get inputs into your Python program, but there are so many other ways too! You could accept command-line arguments, read from a configuration file, read from a database, and much more.

    You’ll learn this once you need to prompt the user of a command-line program for input. Until then, you won’t need it. And if you’ve been writing Python for a while and don’t know about this function, you may simply never need it.

    repr

    Need the programmer-readable representation of an object? You need the repr function!

    For many objects, the str and repr representations are the same:

    >>> str(4), repr(4)
    ('4', '4')
    >>> str([]), repr([])
    ('[]', '[]')

    But for some objects, they’re different:

    >>> str('hello'), repr("hello")
    ('hello', "'hello'")
    >>> from datetime import date
    >>> str(date(2020, 1, 1)), repr(date(2020, 1, 1))
    ('2020-01-01', 'datetime.date(2020, 1, 1)')

    The string representation we see at the Python REPL uses repr, while the print function relies on str:

    >>> date(2020, 1, 1)
    datetime.date(2020, 1, 1)
    >>> "hello!"
    'hello!'
    >>> print(date(2020, 1, 1))
    2020-01-01
    >>> print("hello!")
    hello!

    You’ll see repr used when logging, handling exceptions, and implementing dunder methods.

    super

    If you create classes in Python, you’ll likely need to use super. The super function is pretty much essential whenever you’re inheriting from another Python class.

    Many Python users rarely create classes. Creating classes isn’t an essential part of Python, though many types of programming require it. For example, you can’t really use the Django web framework without creating classes.

    If you don’t already know about super, you’ll end up learning this if and when you need it.
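
    For the curious, here’s a minimal sketch of the usual pattern (the class names here are made up for illustration):

    class Animal:
        def __init__(self, name):
            self.name = name

    class Dog(Animal):
        def __init__(self, name, breed):
            super().__init__(name)  # let the parent class set up name
            self.breed = breed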

    property

    The property function is a decorator and a descriptor (only click those weird terms if you’re extra curious) and it’ll likely seem somewhat magical when you first learn about it.

    This decorator allows us to create an attribute which will always seem to contain the return value of a particular function call. It’s easiest to understand with an example.

    Here’s a class that uses property:

    class Circle:
        def __init__(self, radius=1):
            self.radius = radius

        @property
        def diameter(self):
            return self.radius * 2

    Here’s an access of that diameter attribute on a Circle object:

    >>> circle = Circle()
    >>> circle.diameter
    2
    >>> circle.radius = 5
    >>> circle.diameter
    10

    If you’re doing object-oriented Python programming (you’re making classes a whole bunch), you’ll likely want to learn about property at some point. Unlike in other object-oriented programming languages, in Python we use properties instead of getter methods and setter methods.

    issubclass and isinstance

    The issubclass function checks whether a class is a subclass of one or more other classes.

    >>> issubclass(int, bool)
    False
    >>> issubclass(bool, int)
    True
    >>> issubclass(bool, object)
    True

    The isinstance function checks whether an object is an instance of one or more classes.

    >>> isinstance(True, str)
    False
    >>> isinstance(True, bool)
    True
    >>> isinstance(True, int)
    True
    >>> isinstance(True, object)
    True

    You can think of isinstance as delegating to issubclass:

    >>> issubclass(type(True), str)
    False
    >>> issubclass(type(True), bool)
    True
    >>> issubclass(type(True), int)
    True
    >>> issubclass(type(True), object)
    True

    If you’re overloading operators (e.g. customizing what the + operator does on your class) you might need to use isinstance, but in general we try to avoid strong type checking in Python so we don’t see these much.

    In Python we usually prefer duck typing over type checking. These functions actually do a bit more than the strong type checking I noted above (the behavior of both can be customized) so it’s actually possible to practice a sort of isinstance-powered duck typing with abstract base classes like collections.abc.Iterable. But this isn’t seen much either (partly because we tend to practice exception-handling and EAFP a bit more than condition-checking and LBYL in Python).

    The last two paragraphs were filled with confusing jargon that I may explain more thoroughly in a future series of articles if there’s enough interest.
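
    If you’re extra curious though, here’s a tiny glimpse of that isinstance-powered duck typing with collections.abc.Iterable:

    >>> from collections.abc import Iterable
    >>> isinstance([1, 2, 3], Iterable)
    True
    >>> isinstance(3, Iterable)
    False
    >>> isinstance((n**2 for n in [1, 2]), Iterable)
    True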

    hasattr, getattr, setattr, and delattr

    Need to work with an attribute on an object but the attribute name is dynamic? You need hasattr, getattr, setattr, and delattr.

    Say we have a thing object that we want to check for a particular attribute on:

    >>> class Thing: pass
    ...
    >>> thing = Thing()

    The hasattr function allows us to check whether the object has a certain attribute:

    >>> hasattr(thing, 'x')
    False
    >>> thing.x = 4
    >>> hasattr(thing, 'x')
    True

    The getattr function allows us to retrieve the value of that attribute:

    >>> getattr(thing, 'x')
    4

    The setattr function allows for setting the value:

    >>> setattr(thing, 'x', 5)
    >>> thing.x
    5

    And delattr deletes the attribute:

    >>> delattr(thing, 'x')
    >>> thing.x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Thing' object has no attribute 'x'

    These functions allow for a specific flavor of metaprogramming and you likely won’t see them often.

    classmethod and staticmethod

    The classmethod and staticmethod decorators are somewhat magical in the same way the property decorator is somewhat magical.

    If you have a method that should be callable on either an instance or a class, you want the classmethod decorator. Factory methods (alternative constructors) are a common use case for this:

    class RomanNumeral:
        """A Roman numeral, represented as a string and numerically."""

        def __init__(self, number):
            self.value = number

        @classmethod
        def from_string(cls, string):
            return cls(roman_to_int(string))  # function doesn't exist yet

    It’s a bit harder to come up with a good use for staticmethod, since you can pretty much always use a module-level function instead of a static method.

    from itertools import zip_longest

    class RomanNumeral:
        """A Roman numeral, represented as a string and numerically."""

        SYMBOLS = {'M': 1000, 'D': 500, 'C': 100, 'L': 50, 'X': 10, 'V': 5, 'I': 1}

        def __init__(self, number):
            self.value = number

        @classmethod
        def from_string(cls, string):
            return cls(cls.roman_to_int(string))

        @staticmethod
        def roman_to_int(numeral):
            total = 0
            for symbol, next_symbol in zip_longest(numeral, numeral[1:]):
                value = RomanNumeral.SYMBOLS[symbol]
                next_value = RomanNumeral.SYMBOLS.get(next_symbol, 0)
                if value < next_value:
                    value = -value
                total += value
            return total

    The above roman_to_int function doesn’t require access to the instance or the class, so it doesn’t even need to be a @classmethod. There’s no actual need to make this function a staticmethod (instead of a classmethod): staticmethod is just more restrictive to signal the fact that we’re not reliant on the class our function lives on.

    I find that learning these causes folks to think they need them when they often don’t. You can go looking for these if you really need them eventually.

    next

    The next function returns the next item in an iterator.

    I’ve written about iterators before (how for loops work and how to make an iterator) but a very quick summary of iterators you’ll likely run into includes:

    • enumerate objects
    • zip objects
    • the return value of the reversed function
    • files (the thing you get back from the open function)
    • csv.reader objects
    • generator expressions
    • generator functions

    You can think of next as a way to manually loop over an iterator to get a single item and then break.

    >>> numbers = [2, 1, 3, 4, 7, 11]
    >>> squares = (n**2 for n in numbers)
    >>> next(squares)
    4
    >>> for n in squares:
    ...     break
    ...
    >>> n
    1
    >>> next(squares)
    9

    Maybe learn it eventually

    We’ve already covered nearly half of the built-in functions.

    The rest of Python’s built-in functions definitely aren’t useless, but they’re a bit more special-purposed.

    The 15 built-ins I’m mentioning in this section are things you may eventually need to learn, but it’s also very possible you’ll never reach for these in your own code.

    • iter: get an iterator from an iterable: this function powers for loops and it can be very useful when you’re making helper functions for looping lazily
    • callable: return True if the argument is a callable (I talked about this a bit in my article functions and callables)
    • filter and map: as I discuss in my article on overusing lambda functions, I recommend using generator expressions over the built-in map and filter functions
    • id, locals, and globals: these are great tools for teaching Python and you may have already seen them, but you won’t see these much in real Python code
    • round: you’ll look this up if you need to round a number
    • divmod: this function does a floor division (//) and a modulo operation (%) at the same time
    • bin, oct, and hex: if you need to display a number as a string in binary, octal, or hexadecimal form, you’ll want these functions
    • abs: when you need the absolute value of a number, you’ll look this up
    • hash: dictionaries and sets rely on the hash function to test for hashability, but you likely won’t need it unless you’re implementing a clever de-duplication algorithm
    • object: this function (yes it’s a class) is useful for making unique default values and sentinel values, if you ever need those (there’s a small sketch of that just after this list)
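
    Here’s a minimal sketch of that sentinel idea, assuming we want to tell "no argument passed" apart from "None was passed explicitly" (the get_setting function and its settings dictionary are hypothetical):

    SENTINEL = object()  # a unique object, equal only to itself

    def get_setting(name, default=SENTINEL):
        settings = {"debug": None}  # hypothetical settings store
        value = settings.get(name, SENTINEL)
        if value is SENTINEL:
            if default is SENTINEL:
                raise KeyError(name)
            return default
        return value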

    You’re unlikely to need all the above built-ins, but if you write Python code for long enough you’re likely to see nearly all of them.

    You likely don’t need these

    You’re unlikely to need these built-ins. There are sometimes really appropriate uses for a few of these, but you’ll likely be able to get away with never learning about these.

    • ord and chr: these are fun for teaching ASCII tables and unicode code points, but I’ve never really found a use for them in my own code
    • exec and eval: for evaluating a string as if it was code
    • compile: this is related to exec and eval
    • slice: if you’re implementing __getitem__ to make a custom sequence, you may need this (some Python Morsels exercises require this actually), but unless you make your own custom sequence you’ll likely never see slice
    • bytes, bytearray, and memoryview: if you’re working with bytes often, you’ll reach for some of these (just ignore them until then)
    • ascii: like repr but returns an ASCII-only representation of an object; I haven’t needed this in my code yet
    • frozenset: like set, but it’s immutable; neat but not something I’ve reached for in my own code
    • __import__: this function isn’t really meant to be used by you, use importlib instead
    • format: this calls the __format__ method, which is used for string formatting (f-strings and str.format); you usually don’t need to call this function directly
    • pow: the exponentiation operator (**) usually supplants this… unless you’re doing modulo-math (maybe you’re implementing RSA encryption from scratch…?)
    • complex: if you didn’t know that 4j+3 is valid Python code, you likely don’t need the complex function

    There’s always more to learn

    There are 69 built-in functions in Python (technically only 42 of them are actually functions).

    When you’re newer in your Python journey, I recommend focusing on only 20 of these built-in functions in your own code (the 10 commonly known built-ins and the 10 built-ins that are often overlooked), in addition to the 5 debugging functions.

    After that there are 14 more built-ins which you’ll probably learn later (depending on the style of programming you do).

    Then come the 15 built-ins which you may or may not ever end up needing in your own code. Some people love these built-ins and some people never use them: as you get more specific in your coding needs, you’ll likely find yourself reaching for considerably more niche tools.

    After that I mentioned the last 15 built-ins which you’ll likely never need (again, very much depending on how you use Python).

    You don’t need to learn all the Python built-in functions today. Take it slow: focus on those first 20 important built-ins and then work your way into learning about others if and when you eventually need them.
