Feeds
Zero to Mastery: Python Monthly Newsletter đ»đ
Welcome to My Blog
Heyho together!
I am from now on writing my posts on GitHub pages. Apart from it being useful to keep my posts versioned using git, I had some issues with my previous blog. The idea was to simply use write.as and publish a post from time to time. This worked well except for more than a month ago me wanting to do a post about my KRunner plugins. It naturally contained a lot of links and thus the publishing was prevented and even the account blocked due to apparent spam. There was no response via mail for over a month.
So here we are not on another blog where I hopefully write more often and also be able to spent more time on KDE!
Real Python: How to Iterate Through a Dictionary in Python
Python offers several ways to iterate through a dictionary, such as using .items() to access key-value pairs directly and .values() to retrieve values only.
By understanding these techniques, youâll be able to efficiently access and manipulate dictionary data. Whether youâre updating the contents of a dictionary or filtering data, this guide will equip you with the tools you need.
By the end of this tutorial, youâll understand that:
- You can directly iterate over the keys of a Python dictionary using a for loop and access values with dict_object[key].
- You can iterate through a Python dictionary in different ways using the dictionary methods .keys(), .values(), and .items().
- You should use .items() to access key-value pairs when iterating through a Python dictionary.
- The fastest way to access both keys and values when you iterate over a dictionary in Python is to use .items() with tuple unpacking.
To get the most out of this tutorial, you should have a basic understanding of Python dictionaries, know how to use Python for loops, and be familiar with comprehensions. Knowing other tools like the built-in map() and filter() functions, as well as the itertools and collections modules, is also a plus.
Get Your Code: Click here to download the sample code that shows you how to iterate through a dictionary with Python.
Take the Quiz: Test your knowledge with our interactive âPython Dictionary Iterationâ quiz. Youâll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Python Dictionary IterationDictionaries are one of the most important and useful data structures in Python. Learning how to iterate through a Dictionary can help you solve a wide variety of programming problems in an efficient way. Test your understanding on how you can use them better!
Getting Started With Python DictionariesDictionaries are a cornerstone of Python. Many aspects of the language are built around dictionaries. Modules, classes, objects, globals(), and locals() are all examples of how dictionaries are deeply wired into Pythonâs implementation.
Hereâs how the Python official documentation defines a dictionary:
An associative array, where arbitrary keys are mapped to values. The keys can be any object with __hash__() and __eq__() methods. (Source)
There are a couple of points to notice in this definition:
- Dictionaries map keys to values and store them in an array or collection. The key-value pairs are commonly known as items.
- Dictionary keys must be of a hashable type, which means that they must have a hash value that never changes during the keyâs lifetime.
Unlike sequences, which are iterables that support element access using integer indices, dictionaries are indexed by keys. This means that you can access the values stored in a dictionary using the associated key rather than an integer index.
The keys in a dictionary are much like a set, which is a collection of hashable and unique objects. Because the keys need to be hashable, you canât use mutable objects as dictionary keys.
On the other hand, dictionary values can be of any Python type, whether theyâre hashable or not. There are literally no restrictions for values. You can use anything as a value in a Python dictionary.
Note: The concepts and topics that youâll learn about in this section and throughout this tutorial refer to the CPython implementation of Python. Other implementations, such as PyPy, IronPython, and Jython, could exhibit different dictionary behaviors and features that are beyond the scope of this tutorial.
Before Python 3.6, dictionaries were unordered data structures. This means that the order of items typically wouldnât match the insertion order:
Python >>> # Python 3.5 >>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"} >>> likes {'color': 'blue', 'pet': 'dog', 'fruit': 'apple'} Copied!Note how the order of items in the resulting dictionary doesnât match the order in which you originally inserted the items.
In Python 3.6 and greater, the keys and values of a dictionary retain the same order in which you insert them into the underlying dictionary. From 3.6 onward, dictionaries are compact ordered data structures:
Python >>> # Python 3.6 >>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"} >>> likes {'color': 'blue', 'fruit': 'apple', 'pet': 'dog'} Copied!Keeping the items in order is a pretty useful feature. However, if you work with code that supports older Python versions, then you must not rely on this feature, because it can generate buggy behaviors. With newer versions, itâs completely safe to rely on the feature.
Another important feature of dictionaries is that theyâre mutable data types. This means that you can add, delete, and update their items in place as needed. Itâs worth noting that this mutability also means that you canât use a dictionary as a key in another dictionary.
Understanding How to Iterate Through a Dictionary in Python Read the full article at https://realpython.com/iterate-through-dictionary-python/ »[ Improve Your Python With đ Python Tricks đ â Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
This Week in Plasma: Battery Charge Cycles in Info Center
This week we of course continued the customary bug-fixing, but got some nice new features and UI improvements too!
Let me also remind folks about KDE's end-of-year fundraiser. We're 84% of the way to our goal, and it would be amazing to get all the way to 100% before December! Then we can focus on those stretch goals from December to January.
Anyway, enough of the sales pitch, back to the free stuff!
And isn't that amazing? Let's zoom out a bit here and remind ourselves just how incredible it is that this software is made available for free, with no contract or license agreement, to everyone. To you, to your school, to community organizations, businesses, governments, even our direct competitors to study and examine (which goes both ways, and helped me fix a bug in GTK this week; read on for details). It's kind of wild, if you think about it. But, here we are, and we want to keep on being a light in a tech world that sometimes seems to be darkening. Help us keep that light glowing!
Notable New FeaturesInfo Center now shows your battery's cycle count. (Kai Uwe Broulik, 6.3.0. Link 1 and link 2)
Added the ability to convert to and from the CFP franc currency in KRunner-powered searches. (someone going by the pseudonym "Mr. Athozus", Frameworks 6.9. Link)
Notable UI ImprovementsMiddle-clicking on the Brightness and Color widget no longer does anything when the Night Light hasn't been turned on. (Elias Probst, 6.2.4. Link)
Improved some sources of visual awkwardness in System Monitor: now the loading screen no longer sometimes has a scrollbar; and clicking something selected in a table view visibly de-selects it. (Akseli Lahtinen, 6.2.4. Link 1 and link 2)
Improved the way Discover presents external links to be less visually heavy. (Nate Graham, 6.3.0. Link)
Re-did the "Apply Plasma Settings" dialog on System Settings' Login Screen page to look better and more consistent with other dialogs in QML-based software these days. (Oliver Beard, 6.3.0. Link)
Notable Bug FixesFixed a regression in the Power and Battery widget that broke its ability to notice that power-profiles-daemon was installed instead of TLP after some porting work. (MĂ©ven Car, 6.2.4. Link)
Fixed a regression that caused the Disks & Devices widget to not show the correct actions for non-mounted optical discs after some porting work. (Kai Uwe Broulik, 6.2.4. Link)
Fixed an issue that caused screenshots and screen recordings to look too dim while using HDR mode. (Xaver Hugl, 6.2.4. Link)
Fixed a case where Plasma could crash when logging in with an external screen connected to a laptop via HDMI. (Marco Martin, 6.2.4. Link)
Fixed a rare case where Plasma could crash when copying data to the clipboard. (David Edmundson, 6.3.0. Link)
Fixed a bug affecting people using panels in "Fit content" mode that could, under certain circumstances, cause them to be too small until you manually entered Edit Mode once. (NiccolĂČ Venerandi, 6.3.0, Link)
KWin now behaves better when you plug in a weird defective TV that asks for an inappropriate resolution. (Xaver Hugl, 6.3.0. Link)
Discover once again shows update-able "Get New [Stuff]" content on the updates page. (Harald Sitter, 6.3.0. Link)
XWayland-using apps can no longer crash KWin with ludicrously large icon sizes. (David Redondo, Frameworks 6.9. Link)
Fixed a bizarre and annoying bug that caused text displayed at fractional scale factors in Plasma and QtQuick-based KDE apps and to look, for lack of a better term, wobbly. Wobbly windows good, wobbly text bad! Text has now been put on the straight and narrow. (David Edmundson, Frameworks 6.9. Link)
Fixed a strange Qt bug that manifested as Plasma notifications sometimes being vertically squished. (David Edmundson, Qt 6.8.1. Link)
GTK 3 apps once again have the correct icon for their spinboxes' "decrease the value" buttons when using the Breeze icon theme or any other icon theme whose list-remove icon isn't a minus sign. (Nate Graham, GTK 3.24.44. Link)
Other bug information of note:
- 2 Very high priority Plasma bugs (same as last week). Current list of bugs
- 35 15-minute Plasma bugs (down from 36 last week). Current list of bugs
- 94 KDE bugs of all kinds fixed over the last week. Full list of bugs
The feature to let you record the screen without re-approval now also works for virtual outputs. Additionally, virtual outputs now have a better name that indicates which app records them. (David Redondo, 6.3.0. Link)
Fixed a memory leak caused by having a lot of OverlayFS mounts, e.g. from Docker containers. (Joshua Goins, Frameworks 6.9. Link)
How You Can HelpKDE has become important in the world, and your time and contributions have helped us get there. As we grow, we need your support to keep KDE sustainable.
You can help KDE by becoming an active community member and getting involved somehow. Each contributor makes a huge difference in KDE â you are not a number or a cog in a machine!
You donât have to be a programmer, either. Many other opportunities exist:
- Filter and confirm bug reports, maybe even identify their root cause
- Contribute designs for wallpapers, icons, and app interfaces
- Design and maintain websites
- Translate user interface text items into your own language
- Promote KDE in your local community
- âŠAnd a ton more things!
You can also help us by donating to our yearly fundraiser! Any monetary contribution â however small â will help us cover operational costs, salaries, travel expenses for contributors, and in general just keep KDE bringing Free Software to the world.
To get a new Plasma feature or a bugfix mentioned here, feel free to push a commit to the relevant merge request on invent.kde.org.
Eli Bendersky: GoMLX: ML in Go without Python
In the previous post I talked about running ML inference in Go through a Python sidecar process. In this post, let's see how we can accomplish the same tasks without using Python at all.
How ML models are implementedLet's start with a brief overview of how ML models are implemented under the hood [1]. The model is typically written in Python, using one of the ML frameworks like TensorFlow, JAX or PyTorch. The framework takes care of at least 2 high-level concerns for developers:
- Expressive way to describe the model architecture, including auto-differentiation for training.
- Efficient implementation of computational primitives on common HW: CPUs, GPUs and TPUs.
In-between these two concerns there exists a standardized model definition format (or several) that helps multiple tools interoperate. While it's by no means the only solution [2], let's look at the OpenXLA stack as a way to run models on diverse hardware:
- The top layer are the frameworks that provide high-level primitives to define ML models, and translate them to a common interchange format called StableHLO (where "HLO" stands for High-Level Operations). I've added the gopher on the very right - it will soon become clear why.
- The bottom layer is the HW that executes these models efficiently.
- In the middle is the OpenXLA system, which includes two major components: the XLA compiler translating HLO to HW machine code, and PJRT - the runtime component responsible for managing HW devices, moving data (tensors) between the host CPU and these devices, executing tasks, sharding and so on.
There's a huge amount of complexity hidden by the bottom layers of this diagram. Efficient compilation and code generation for diverse HW - including using fixed blocks and libraries (like cuDNN), runtime management etc. All of this is really something one shouldn't try to re-implement unless there's a really, really good reason to do so. And the best part? There's no Python there - this is C and C++; Python only exists on the upper layer - in the high-level ML frameworks.
GoMLXGoMLX is a relatively new Go package for ML that deserves some attention. GoMLX slots in as one of the frameworks, exactly where the Gopher is in the diagram above [3]. This is absolutely the right approach to the problem. There's no point in re-implementing the low-level primitives - whatever works for TF and JAX will work for Go as well! Google, NVIDIA, Intel and several other companies invest huge resources into these systems, and it's a good idea to benefit from these efforts.
In this post I will showcase re-implementations of some of the samples from the previous post, but with no Python in sight. But first, a few words about what GoMLX does.
GoMLX should be familiar if you've used one of the popular Python ML frameworks. You build a computational graph representing your model - the usual operations are supported and sufficient to implement anything from linear regression to cutting-edge transformers. Since GoMLX wraps XLA, it has access to all the same building blocks TF and JAX use (and it adds its own higher-level primitives, similarly to the Python frameworks).
GoMLX supports automatic differentiation to create the backward propagation operations required to update weights in training. It also provides many helpers for training and keeping track of progress, as well as Jupyter notebook support.
An image model for the CIFAR-10 dataset with GoMLXIn the previous post we built a CNN (convolutional neural network) model using TF+Keras in Python, and ran its inference in a sidecar process we could control from Go.
Here, let's build a similar model in Go, without using Python at all; we'll be training it on the same CIFAR-10 dataset we've used before.
The full code for this sample is here; it is heavily based on GoMLX's own example, with some modifications for simplicity and clarity. Here's the code defining the model graph:
func C10ConvModel(mlxctx *mlxcontext.Context, spec any, inputs []*graph.Node) []*graph.Node { batchedImages := inputs[0] g := batchedImages.Graph() dtype := batchedImages.DType() batchSize := batchedImages.Shape().Dimensions[0] logits := batchedImages layerIdx := 0 nextCtx := func(name string) *mlxcontext.Context { newCtx := mlxctx.Inf("%03d_%s", layerIdx, name) layerIdx++ return newCtx } // Convolution / activation layers logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 32, 32, 32) logits = activations.Relu(logits) logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done() logits = activations.Relu(logits) logits = graph.MaxPool(logits).Window(2).Done() logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.3), true) logits.AssertDims(batchSize, 16, 16, 32) logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 16, 16, 64) logits = activations.Relu(logits) logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 16, 16, 64) logits = activations.Relu(logits) logits = graph.MaxPool(logits).Window(2).Done() logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true) logits.AssertDims(batchSize, 8, 8, 64) logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 8, 8, 128) logits = activations.Relu(logits) logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 8, 8, 128) logits = activations.Relu(logits) logits = graph.MaxPool(logits).Window(2).Done() logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true) logits.AssertDims(batchSize, 4, 4, 128) // Flatten logits, and apply dense layer logits = graph.Reshape(logits, batchSize, -1) logits = layers.Dense(nextCtx("dense"), logits, true, 128) logits = activations.Relu(logits) logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true) numClasses := 10 logits = layers.Dense(nextCtx("dense"), logits, true, numClasses) return []*graph.Node{logits} }As you might expect, the Go code is longer and more explicit (nodes are threaded explicitly between builder calls, instead of being magically accumulated). It's not hard to envision a Keras-like high level library on top of this.
Here's a snippet from the classifier (inference):
func main() { flagCheckpoint := flag.String("checkpoint", "", "Directory to load checkpoint from") flag.Parse() mlxctx := mlxcontext.New() backend := backends.New() _, err := checkpoints.Load(mlxctx).Dir(*flagCheckpoint).Done() if err != nil { panic(err) } mlxctx = mlxctx.Reuse() // helps sanity check the loaded context exec := mlxcontext.NewExec(backend, mlxctx.In("model"), func(mlxctx *mlxcontext.Context, image *graph.Node) *graph.Node { // Convert our image to a tensor with batch dimension of size 1, and pass // it to the C10ConvModel graph. image = graph.ExpandAxes(image, 0) // Create a batch dimension of size 1. logits := cnnmodel.C10ConvModel(mlxctx, nil, []*graph.Node{image})[0] // Take the class with highest logit value, then remove the batch dimension. choice := graph.ArgMax(logits, -1, dtypes.Int32) return graph.Reshape(choice) }) // classify takes a 32x32 image and returns a Cifar-10 classification according // to the models. Use C10Labels to convert the returned class to a string // name. The returned class is from 0 to 9. classify := func(img image.Image) int32 { input := images.ToTensor(dtypes.Float32).Single(img) outputs := exec.Call(input) classID := tensors.ToScalar[int32](outputs[0]) return classID } // ...Now classify is a function that takes an image.Image and runs it through the network, returning the index of the most likely label out of the list of CIFAR-10 labels.
The README file in the sample explains how to run it locally on a GPU; the model trains and runs successfully, with similar results to the TF+Keras model we trained in Python earlier.
Gemma2 with GoMLXFor a (much) more involved example, GoMLX has a full implementation of Gemma2 inference. The model implementation itself is in the transformers package. It should look fairly familiar if you've seen a transformer implementation in another language.
The official example in that repository shows how to run it with weights downloaded from HuggingFace; since I've already downloaded the Gemma2 weights from Kaggle for the previous post, here's a simple adaptation:
var ( flagDataDir = flag.String("data", "", "dir with converted weights") flagVocabFile = flag.String("vocab", "", "tokenizer vocabulary file") ) func main() { flag.Parse() ctx := context.New() // Load model weights from the checkpoint downloaded from Kaggle. err := kaggle.ReadConvertedWeights(ctx, *flagDataDir) if err != nil { log.Fatal(err) } // Load tokenizer vocabulary. vocab, err := sentencepiece.NewFromPath(*flagVocabFile) if err != nil { log.Fatal(err) } // Create a Gemma sampler and start sampling tokens. sampler, err := samplers.New(backends.New(), ctx, vocab, 256) if err != nil { log.Fatalf("%+v", err) } start := time.Now() output, err := sampler.Sample([]string{ "Are bees and wasps similar?", }) if err != nil { log.Fatalf("%+v", err) } fmt.Printf("\tElapsed time: %s\n", time.Since(start)) fmt.Printf("Generated text:\n%s\n", strings.Join(output, "\n\n")) }The complete code together with installation and setup instructions is here.
gomlx/gemma demonstrates that GoMLX has sufficiently advanced capabilities to run a real production-grade open LLM, without Python in the loop.
SummaryThe previous post discussed some options for incorporating ML inference into a Go project via a minimal Python sidecar process. Here, we take it a step further and implement ML inference in Go without using Python. We do so by leveraging GoMLX, which itself relies on XLA and PJRT to do the heavy lifting.
If we strip down a framework like TensorFlow to its layers, GoMLX reuses the bottom layers (which is where most of the magic lies), and replaces the model builder library with a Go variant.
Since GoMLX is still a relatively new project, it may be a little risky for production uses at this point. That said, I find this direction very promising and will be following the project's development with interest.
CodeThe full code for the samples in this post is on GitHub.
[1]This assumes you know the basics of neural network graphs, their training, etc. If not, check out this post and some of my other posts in the Machine Learning category. [2]It's likely the most common production solution, and pretty much the only way to access Google's TPUs. [3]It does so by including Go bindings for both XLA and PJRT; these are wrapped in higher-level APIs for users.parallel @ Savannah: GNU Parallel 20241122 ('Ahoo Daryaei') released
GNU Parallel 20241122 ('Ahoo Daryaei') has been released. It is available for download at: lbry://@GnuParallel:4
Quote of the month:
 GNU parallel is so satisfying
   -- James Coman @jcoman.bsky.social
New in this release:
- --pipe --block works similar to --pipepart --block if --block size is negative.
- DBURLs can be written with / instead of %2F for sqlite and CSV.
- Bug fixes and man page updates.
News about GNU Parallel:
- Embarrassingly GNU parallel https://dengin.xyz/blog/2024/10/24/embarrassingly-gnu-parallel/
- GNU Parallel for Your Terminal Tasks https://erolrecep.github.io/posts/gnuparallel_for_your_terminal_tasks/
- How to leverage GNU parallel to utilize multiple cores while running AUGUSTUS https://lifescienceshub.wixsite.com/lifesciencehub/post/how-to-leverage-gnu-parallel-to-utilize-multiple-cores-while-running-augustus
- GNU Parallel: The Good Parts https://diekmeier.de/posts/2024-11-17-gnu-parallel/
- Put your CPU to work with GNU Parallel https://www.redhat.com/en/blog/gnu-parallel
GNU Parallel - For people who live life in the parallel lane.
If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.
GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
For example you can run this to convert all jpeg files into png and gif files and have a progress bar:
 parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif
Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:
 find . -name '*.jpg' |
   parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
   $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
      fetch -o - http://pi.dk/3 ) > install.sh
   $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
   12345678 883c667e 01eed62f 975ad28b 6d50e22a
   $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
   cc21b4c9 43fd03e9 3ae1ae49 e28573c0
   $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
   79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
   fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
   $ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
When using programs that use GNU Parallel to process data for publication please cite:
O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.
If you like GNU Parallel:
- Give a demo at your local user group/team/colleagues
- Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
- Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
- Request or write a review for your favourite blog or magazine
- Request or build a package for your favourite distribution (if it is not already there)
- Invite me for your next conference
If you use programs that use GNU Parallel for research:
- Please cite GNU Parallel in you publications (use --citation)
If GNU Parallel saves you money:
- (Have your company) donate to FSF https://my.fsf.org/donate/
GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.
The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
When using GNU SQL for a publication please cite:
O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.
Matthew Palmer: Your Release Process Sucks
For the past decade-plus, every piece of software I write has had one of two release processes.
Software that gets deployed directly onto servers (websites, mostly, but also the infrastructure that runs Pwnedkeys, for example) is deployed with nothing more than git push prod main. Iâll talk more about that some other day.
Today is about the release process for everything else I maintain â Rust / Ruby libraries, standalone programs, and so forth. To release those, I use the following, extremely intricate process:
-
Create an annotated git tag, where the name of the tag is the software version Iâm releasing, and the annotation is the release notes for that version.
-
Run git release in the repository.
-
There is no step 3.
Yes, it absolutely is that simple. And if your release process is any more complicated than that, then you are suffering unnecessarily.
But donât worry. Iâm from the Internet, and Iâm here to help.
Sidebar: âannotated what-now?!?âThe annotated tag is one gitâs best-kept secrets. Theyâve been available in git for practically forever (Iâve been using them since at least 2014, which is âpractically foreverâ in software development), yet almost everyone I mention them to has never heard of them.
A âtagâ, in git parlance, is a repository-unique named label that points to a single commit (as identified by the commitâs SHA1 hash). Annotating a tag is simply associating a block of free-form text with that tag.
Creating an annotated tag is simple-sauce: git tag -a tagname will open up an editor window where you can enter your annotation, and git tag -a -m "some annotation" tagname will create the tag with the annotation âsome annotationâ. Retrieving the annotation for a tag is straightforward, too: git show tagname will display the annotation along with all the other tag-related information.
Now that we know all about annotated tags, letâs talk about how to use them to make software releases freaking awesome.
Step 1: Create the Annotated Git TagAs I just mentioned, creating an annotated git tag is pretty simple: just add a -a (or --annotate, if you enjoy typing) to your git tag command, and WHAM! annotation achieved.
Releases, though, typically have unique and ever-increasing version numbers, which we want to encode in the tag name. Rather than having to look at the existing tags and figure out the next version number ourselves, we can have software do the hard work for us.
Enter: git-version-bump. This straightforward program takes one mandatory argument: major, minor, or patch, and bumps the corresponding version number component in line with Semantic Versioning principles. If you pass it -n, it opens an editor for you to enter the release notes, and when you save out, the tag is automagically created with the appropriate name.
Because the program is called git-version-bump, you can call it as a git command: git version-bump. Also, because version-bump is long and unwieldy, I have it aliased to vb, with the following entry in my ~/.gitconfig:
[alias] vb = version-bump -nOf course, you donât have to use git-version-bump if you donât want to (although why wouldnât you?). The important thing is that the only step you take to go from âhere is our current codebase in mainâ to âeverything as of this commit is version X.Y.Z of this softwareâ, is the creation of an annotated tag that records the version number being released, and the metadata that goes along with that release.
Step 2: Run git releaseAs I said earlier, Iâve been using this release process for over a decade now. So long, in fact, that when I started, GitHub Actions didnât exist, and so a lot of the things youâd delegate to a CI runner these days had to be done locally, or in a more ad-hoc manner on a server somewhere.
This is why step 2 in the release process is ârun git releaseâ. Itâs because historically, you canât do everything in a CI run. Nowadays, most of my repositories have this in the .git/config:
[alias] release = push --tagsOlder repositories which, for one reason or another, havenât been updated to the new hawtness, have various other aliases defined, which run more specialised scripts (usually just rake release, for Ruby libraries), but theyâre slowly dying out.
The reason why I still have this alias, though, is that it standardises the release process. Whether itâs a Ruby gem, a Rust crate, a bunch of protobuf definitions, or whatever else, I run the same command to trigger a release going out. It means I donât have to think about how I do it for this project, because every project does it exactly the same way.
The Wiring Behind the ButtonIt wasnât the button that was the problem. It was the miles of wiring, the hundreds of miles of cables, the circuits, the relays, the machinery. The engine was a massive, sprawling, complex, mind-bending nightmare of levers and dials and buttons and switches. You couldnât just slap a button on the wall and expect it to work. But there should be a button. A big, fat button that you could press and everything would be fine again. Just press it, and everything would be back to normal.
- Red Dwarf: Better Than Life
Once youâve accepted that your release process should be as simple as creating an annotated tag and running one command, you do need to consider what happens afterwards. These days, with the near-universal availability of CI runners that can do anything you need in an isolated, reproducible environment, the work required to go from âannotated tagâ to ârelease artifactsâ can be scripted up and left to do its thing.
What that looks like, of course, will probably vary greatly depending on what youâre releasing. I canât really give universally-applicable guidance, since I donât know your situation. All I can do is provide some of my open source work as inspirational examples.
For starters, letâs look at a simple Rust crate Iâve written, called strong-box. Itâs a straightforward crate, that provides ergonomic and secure cryptographic functionality inspired by the likes of NaCl. As itâs just a crate, its release script is very straightforward. Most of the complexity is working around Cargoâs inelegant mandate that crate version numbers are specified in a TOML file. Apart from that, itâs just a matter of building and uploading the crate. Easy!
Slightly more complicated is action-validator. This is a Rust CLI tool which validates GitHub Actions and Workflows (how very meta) against a published JSON schema, to make sure you havenât got any syntax or structural errors. As not everyone has a Rust toolchain on their local box, the release process helpfully build binaries for several common OSes and CPU architectures that people can download if they choose. The release process in this case is somewhat larger, but not particularly complicated. Almost half of it is actually scaffolding to build an experimental WASM/NPM build of the code, because someone seemed rather keen on that.
Moving away from Rust, and stepping up the meta another notch, we can take a look at the release process for git-version-bump itself, my Ruby library and associated CLI tool which started me down the âJust Tag It Alreadyâ rabbit hole many years ago. In this case, since gemspecs are very amenable to programmatic definition, the release process is practically trivial. Remove the boilerplate and workarounds for GitHub Actions bugs, and youâre left with about three lines of actual commands.
These approaches can certainly scale to larger, more complicated processes. Iâve recently implemented annotated-tag-based releases in a proprietary software product, that produces Debian/Ubuntu, RedHat, and Windows packages, as well as Docker images, and it takes all of the information it needs from the annotated tag. Iâm confident that this approach will successfully serve them as they expand out to build AMIs, GCP machine images, and whatever else they need in their release processes in the future.
Objection, Your Honour!I can hear the howl of the âbut, actuallysâ coming over the horizon even as I type. People have a lot of Big Feelings about why this release process wonât work for them. Rather than overload this article with them, Iâve created a companion article that enumerates the objections Iâve come across, and answers them. Iâm also available for consulting if youâd like a personalised, professional opinion on your specific circumstances.
DVD Bonus Feature: Pre-releasesUnless youâre addicted to surprises, itâs good to get early feedback about new features and bugfixes before they make it into an official, general-purpose release. For this, you canât go past the pre-release.
The major blocker to widespread use of pre-releases is that cutting a release is usually a pain in the behind. If youâve got to edit changelogs, and modify version numbers in a dozen places, then youâre entirely justified in thinking that cutting a pre-release for a customer to test that bugfix that only occurs in their environment is too much of a hassle.
The thing is, once youâve got releases building from annotated tags, making pre-releases on every push to main becomes practically trivial. This is mostly due to another fantastic and underused Git command: git describe.
How git describe works is, basically, that it finds the most recent commit that has an associated annotated tag, and then generates a string that contains that tagâs name, plus the number of commits between that tag and the current commit, with the current commitâs hash included, as a bonus. That is, imagine that three commits ago, you created an annotated release tag named v4.2.0. If you run git describe now, it will print out v4.2.0-3-g04f5a6f (assuming that the current commitâs SHA starts with 04f5a6f).
You might be starting to see where this is going. With a bit of light massaging (essentially, removing the leading v and replacing the -s with .s), that string can be converted into a version number which, in most sane environments, is considered ânewerâ than the official 4.2.0 release, but will be superceded by the next actual release (say, 4.2.1 or 4.3.0). If youâre already injecting version numbers into the release build process, injecting a slightly different version number is no work at all.
Then, you can easily build release artifacts for every commit to main, and make them available somewhere they wonât get in the way of the âofficialâ releases. For example, in the proprietary product I mentioned previously, this involves uploading the Debian packages to a separate component (prerelease instead of main), so that users that want to opt-in to the prerelease channel simply modify their sources.list to change main to prerelease. Management have been extremely pleased with the easy availability of pre-release packages; theyâve been gleefully installing them willy-nilly for testing purposes since I rolled them out.
In fact, even while Iâve been writing this article, I was asked to add some debug logging to help track down a particularly pernicious bug. I added the few lines of code, committed, pushed, and went back to writing. A few minutes later (next weekâs job is to cut that in-process time by at least half), the person who asked for the extra logging ran apt update; apt upgrade, which installed the newly-built package, and was able to progress in their debugging adventure.
Continuous Delivery: Itâs Not Just For Hipsters.
â+1, InformativeâHopefully, this has spurred you to commit your immortal soul to the Church of the Annotated Tag. You may tithe by buying me a refreshing beverage. Alternately, if youâre really keen to adopt more streamlined release management processes, Iâm available for consulting engagements.
Matthew Palmer: Invalid Excuses for Why Your Release Process Sucks
In my companion article, I made the bold claim that your release process should consist of no more than two steps:
-
Create an annotated Git tag;
-
Run a single command to trigger the release pipeline.
As I have been on the Internet for more than five minutes, Iâm aware that a great many people will have a great many objections to this simple and straightforward idea. In the interests of saving them a lot of wear and tear on their keyboards, I present this list of common reasons why these objections are invalid.
If you have an objection I donât cover here, the comment box is down the bottom of the article. If you think youâve got a real stumper, Iâm available for consulting engagements, and if you turn out to have a release process which cannot feasibly be reduced to the above two steps for legitimate technical reasons, Iâll waive my fees.
âBut I automatically generate my release notes from commit messages!âThis one is really easy to solve: have the release note generation tool feed directly into the annotation. Boom! Headshot.
âBut all these files need to be edited to make a release!âNo, they absolutely donât. But I can see why you might think you do, given how inflexible some packaging environments can seem, and since âthatâs how weâve always done itâ.
Language PackagesMost languages require you to encode the version of the library or binary in a file that you want to revision control. This is teh suck, but Iâm yet to encounter a situation that canât be worked around some way or another.
In Ruby, for instance, gemspec files are actually executable Ruby code, so I call code (thatâs part of git-version-bump, as an aside) to calculate the version number from the git tags. The Rust build tool, Cargo, uses a TOML file, which isnât as easy, but a small amount of release automation is used to take care of that.
Distribution PackagesIf youâre building Linux distribution packages, you can easily apply similar automation faffery. For example, Debian packages take their metadata from the debian/changelog file in the build directory. Donât keep that file in revision control, though: build it at release time. Everything you need to construct a Debian (or RPM) changelog is in the tag â version numbers, dates, times, authors, release notes. Use it for much good.
The Dreaded ChangelogFinally, thereâs the CHANGELOG file. If itâs maintained during the development process, it typically has an archive of all the release notes, under version numbers, with an âUnreleasedâ heading at the top. Itâs one more place to remember to have to edit when making that âpreparing release X.Y.Zâ commit, and it is a gift to the Demon of Spurious Merge Conflicts if you follow the policy of âevery commit must add a changelog entryâ.
My solution: just burn it to the ground. Add a line to the top with a link to wherever the contents of annotated tags get published (such as GitHub Releases, if thatâs your bag) and never open it ever again.
âBut I need to know other things about my release, too!âFor some reason, you might think you need some other metadata about your releases. Youâre probably wrong â itâs amazing how much information you can obtain or derive from the humble tag â so think creatively about your situation before you start making unnecessary complexity for yourself.
But, on the off chance youâre in a situation that legitimately needs some extra release-related information, hereâs the secret: structured annotation. The annotation on a tag can be literally any sequence of octets you like. How that data is interpreted is up to you.
So, require that annotations on release tags use some sort of structured data format (say YAML or TOML â or even XML if you hate your release manager), and mandate that it contain whatever information you need. You can make sure that the annotation has a valid structure and contains all the information you need with an update hook, which can reject the tag push if it doesnât meet the requirements, and youâre sorted.
âBut I have multiple packages in my repo, with different release cadences and versions!âThis one is common enough that I just refer to it as âthe monorepo dramaâ. Personally, Iâm not a huge fan of monorepos, but you do you, boo. Annotated tags can still handle it just fine.
The trick is to include the package name being released in the tag name. So rather than a release tag being named vX.Y.Z, you use foo/vX.Y.Z, bar/vX.Y.Z, and baz/vX.Y.Z. The release automation for each package just triggers on tags that match the pattern for that particular package, and limits itself to those tags when figuring out what the version number is.
âBut we donât semver our releases!âOh, thatâs easy. The tag pattern that marks a release doesnât have to be vX.Y.Z. It can be anything you want.
Relatedly, there is a (rare, but existent) need for packages that donât really have a conception of âreleasesâ in the traditional sense. The example Iâve hit most often is automatically generated âbindingsâ packages, such as protobuf definitions. The source of truth for these is a bunch of .proto files, but to be useful, they need to be packaged into code for the various language(s) youâre using. But those packages need versions, and while someone could manually make releases, the best option is to build new per-language packages automatically every time any of those definitions change.
The versions of those packages, then, can be datestamps (I like something like YYYY.MM.DD.N, where N starts at 0 each day and increments if there are multiple releases in a single day).
This process allows all the code that needs the definitions to declare the minimum version of the definitions that it relies on, and everything is kept in sync and tracked almost like magic.
Th-th-th-th-thatâs all, folks!I hope youâve enjoyed this bit of mild debunking. Show your gratitude by buying me a refreshing beverage, or purchase my professional expertise and Iâll answer all of your questions and write all your CI jobs.
EuroPython Society: 2024 General Assembly Announcement
We’re excited to invite you to this year’s General Assembly meeting! We’ll gather on Sunday, December 1st, 2024, from 20:00 to 21:00 CET. Just like in recent years, we’ll use Zoom, and additional joining instructions will be shared closer to the date.
The General Assembly is the highest decision making body of the society and EPS membership is required to participate. Membership is open to individuals who wish to actively engage in implementing the EPS mission. If you want to become a member of EuroPython Society you can sign-up here: https://www.europython-society.org/application/
You can find more details about the agenda of the meeting, as it is defined in our bylaws here https://www.europython-society.org/bylaws/ (Article 8).
One of the items on the Agenda is electing the new Board.
What does the Board do?The Board consists of a chairperson, a vice chairperson and 2-7 Board members. The duties and responsibilities of the Board are substantial: the board collectively takes up the fiscal and legal responsibility of the Society.
A major topic is the annual EuroPython conference. While we would like to transition to a model with an independent organising team, we are not there yet. Therefore, the Board still needs to be involved in the conference organisation.
Beyond the conference, the Board also manages several critical areas, including:
- Managing EPS membership
- Overseeing finances and budgets
- Running the grant programme
- Maintaining infrastructure and resources
Furthermore, specifically for 2025, and following the recommendation from the previous Board, we would like to focus on four key topics that are important for the Society&aposs future and sustainability:
- Hiring an Event Manager/Coordinator
- Selecting a location for 2026 and possibly 2027
- Strengthen community outreach
- Improving the fiscal and legal framework
The Society is entirely volunteer-driven and serving on the board requires a significant time commitment. Everyone has a different schedule, so most of the work is usually done asynchronously. However, all board members attend the 1.5-hour board call held every two weeks in the evening, CE(S)T timezone. Everyone&aposs time is valuable and please consider that the less time or effort you can dedicate, the more the workload may shift to other Board members.
All things considered you will need a few hours every week.
Who should apply?You want to invest your time and knowledge into building a better structure for the EuroPython Society? Or you want to work on building connections between different Python-based communities? Then this might be for you! Please keep in mind the time commitments mentioned above.
You are not expected to be perfect in any of the skills needed and you will be supported in learning how things work. That being said, having experience in a non-profit organisation, whether within the Python world (such as EPS, PSF, DSF, local Python communities etc.) or any other similar organisation, would be beneficial for onboarding and understanding the organisational structure, culture and dynamics.
In the past having or willing to learn the following skills helped organising the conference:
- Good communication skills
- Organisation skills
- Experience organising events with more than 1000 people
- Working with volunteer-based communities
- Working in big teams
You get the chance to shape and influence the future of EuroPython
You gain skills useful to run non-profits in different European countries - including cross border challenges
You can help grow and empower local communities
You can build relationships and connections with fellow community members
You can build a more diverse and inclusive Python community by serving the mission of EuroPython Society
I am interested, what should I do?If you’re considering running for the Board or nominating another EPS member, we’d love to hear from you! Although the formal deadline is during the General Assembly, we kindly request you send your nomination as early as possible to board@europython.eu. We will publish the initial list of candidates on Tuesday, 26th of November 2024. If you’re not sure if this is a good idea or not – please email anyway and we will help you figure it out! 🙂
If you&aposre on our EPS Organisers&apos Discord, there&aposs a dedicated channel for interested candidates. Please ask in the general channel, and we’ll be happy to add you.
You can find examples of previous nominations here: https://www.europython-society.org/list-of-eps-board-candidates-for-2023-2024/.
Your nomination should highlight why you want to run for the Board. What is your vision for EPS and in which projects you want to be involved. During the General Assembly, you will have the opportunity to introduce yourself and share with our members why you believe they should vote for you. Each candidate will typically be given one minute to present themselves before members cast their votes.
It sounds a lot, I want to help, but I can’t commit to thatThat’s completely understandable! Serving on the Board comes with significant responsibilities, time commitments, and administrative tasks. If that’s not the right fit for you, but you’re still interested in supporting us, we’d love your help! There are many other ways to get involved. We have several teams (see 2024 Teams Description document, as an example) that work on conference preparations during the months leading up to the event, and we also need volunteers to assist onsite during the conference.
Your help does not need to be limited to the conference. Infrastructure and connections need to be maintained all around the year for example. Your time and support would make a big difference! Stay tuned to our social platforms for announcements about these opportunities.
mark.ie: My LocalGov Drupal contributions for week-ending November 22nd, 2024
This week, lots of work on the LocalGov News module.
Web Review, Week 2024-47
Let’s go for my web review for the week 2024-47.
The Big Data Center Water ProblemTags: tech, hardware, ecology, economics, energy, water
We always think about the energy consumption, but large data centers gobble billion liters of water too. This would need to be improved.
https://www.asianometry.com/p/the-big-data-center-water-problem
Tags: tech, vr, hardware, foss
Nice to see open hardware for VR hitting such a price point.
Tags: tech, social-media, fediverse, tools
You’re on the fediverse and you want to reach out bluesky users? This might be the right tool for you (unclear if it’ll scale yet though). At least if and when Bluesky turns bad, people will know where to reach friends next.
Tags: tech, social-media, business, politics
Excellent post showing reasons to be skeptical about Bluesky’s future. Despite all their likely sincere claims I don’t see how they’ll escape enclosure and enshittification when their sketchy VCs will want to see money back.
https://www.tbray.org/ongoing/When/202x/2024/11/15/Not-Bluesky
Tags: tech, social-media, politics, twitter
Sad to see people predominantly jumping from Twitter to other tech moguls walled gardens. This feels more and more like a missed opportunity for the fediverse. That said I’m amazed at how efficient Musk has been at killing the network effect of his platform. This proves it’s actually doable.
Tags: tech, social-media, politics, twitter
This is what we get for refusing to regulate social media and for not auditing their algorithms. Their owners can game and bias the platforms as they see fit for their own gains. They became massive forces of manipulation in the process.
https://eprints.qut.edu.au/253211/
Tags: tech, ai, machine-learning, gpt, vendor-lockin
Good reminder that models shouldn’t be used as a service except maybe for prototyping. This has felt obvious to me since the beginning of this hype cycle… but here we are people are falling in the trap today.
https://adriano.fyi/posts/chatgpt-is-slipping/
Tags: tech, python, performance, pandas, data, data-science
OK, the numbers are indeed impressive. And it’s API is fully compatible apparently, looks like a good replacement if you got Pandas code around.
https://hwisnu.bearblog.dev/fireducks-pandas-but-100x-faster/
Tags: tech, tools, debugging
Looks like a nice tool. Maybe it’ll replace my trusty cgdb in some cases.
https://github.com/epasveer/seer
Tags: tech, c++, security
Will we see more deployments of C++ standard library with bound checking by default? It definitely looks tempting.
https://security.googleblog.com/2024/11/retrofitting-spatial-safety-to-hundreds.html?m=1
Tags: tech, php, security
Seeing the amount of PHP code open on the internet, it’s indeed important to harden the runtime (at long last).
https://dustri.org/b/upcoming-hardening-in-php.html
Tags: tech, graphics, gpu
Really nice in depth post. Everything you ever wanted to know about antialiasing but didn’t dare asking.
https://blog.frost.kiwi/analytical-anti-aliasing/
Tags: tech, framework, career, learning
Good advice, no one should be a “React developer”. Make sure you learn more fundamental skills.
https://www.keithcirkel.co.uk/i-dont-have-time-to-learn-react/
Tags: tech, craftsmanship, learning
If you’re just doing the minimum to deal with a task to “mark it done” you’re probably not doing enough and missing out on learning opportunities.
https://edanparker.hashnode.dev/going-a-little-further
Tags: tech, career, learning, engineering
This can change from organization to organization. This post proposes a career ladder which will work in some contexts. What’s clear is that it’s all about scope and impact.
https://matt.blwt.io/post/what-is-a-senior-engineer-anyway/
Tags: tech, engineering, management, learning
Interesting tips to keep learning on the technical side of the job as you get more managerial responsibilities.
Bye for now!
Real Python: The Real Python Podcast â Episode #229: The Joy of Tinkering & Python Free-Threading Performance
What keeps your spark alive for developing software and learning Python? Do you like to try new frameworks, build toy projects, or collaborate with other developers? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
[ Improve Your Python With đ Python Tricks đ â Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Talk Python to Me: #486: CSnakes: Embed Python code in .NET
Matt Layman: Huey Background Worker - Building SaaS #207
Brian Perry: Two Modules to Help Tame Large Drupal Menus
Stop me if you've heard this one before. At some point in the life of your Drupal site, you have a menu that has gotten out of control. Dragging and dropping is basically a lost cause, your hand hurts from scrolling, and a sense of dread approaches every time you find yourself in the menu administration screen. If it isn't possible to re-structure the menu to address the root cause, you'll need to turn to other solutions to make menu administration more manageable.
I recently used two modules to address this issue for a client. They may not be a huge surprise to those who have run into this problem repeatedly, but it seemed worth documenting for both future me and also our search engine and LLM overlords.
Big MenuThe first module is Big Menu. The project page on this one seems to be describing the Drupal 7 implementation of the module, which is quite a bit different. The 'modern Drupal' version of the module essentially re-works the menu administration page to focus on a single level of the menu tree at a time. Any menu item that has children will have an 'Edit child items' link that you can drill into. This results in more clicks to get to the item you want to edit, but it makes the menu administration page much more manageable and reduces cognitive load quite a bit.
You can also configure the module to use a different depth for the menu tree, which can be useful if wanted to see more of the menu in a single view. Personally I prefer to go all the way with this one and stick with the single level view that is used by default.
Menu SelectThe Menu Select module addresses the experience of selecting a parent menu item in the menu settings for a node or menu item. By default, this is a select list containing the entire menu, which can get very long. Menu Select replaces this with an autocomplete search and a hierarchal collapsible unordered list.
Bonus: Menu FirstchildMenu Firstchild is a little less about the admin experience, but can be useful in cases where a large menu needs some additional grouping but you don't want to turn to a full mega menu style approach. The module provides an option to have a menu item that doesn't have it's own path, but instead links to its first direct child.
Used together, these modules made a substantial difference in addressing the client's menu administration related feedback.
This was also a reminder of the impact that the ongoing work on Drupal CMS will hopefully have. I'm looking forward to a Drupal CMS future that can theoretically pre-package user experience improvements like these. Or in cases where it might not be the right choice for Drupal CMS, opinionated community developed recipes can be created to address common use cases like this one.
Seth Michael Larson: Visualizing the Python package SBOM data flow
Published 2024-11-22 by Seth Larson
Reading time: minutes
TLDR: Skip intro, take me to the visualization!
I'm working on improving measurability of Python packages by allowing Software Bill-of-Materials documents (SBOM) to be included in Python packages so that projects and build tools can record information about a package for downstream use.
This is a cross-functional project where I need input from Python projects, Python packaging tools (build backends+tools and installers), but also from folks completely outside the Python community like SBOM tooling maintainers. With projects like this, it can be difficult to "see the forest through the trees". When you're reviewing the packaging PEP, it can be difficult to imagine how or who is using the new standard. This article is to help visualize the end-to-end data flow.
How SBOM data will be included in Python packagesIn short, the proposal is:
- Allow Python projects to manually specify SBOM documents in pyproject.toml with [project].sbom-files = ["..."]
- Allow Python package archives to include self-describing SBOM documents and reference them in metadata via Sbom-File field.
- Zero-or-more SBOM documents per Python package archive. Each tool adding SBOM data creates a new SBOM inside the archive to avoid conflicts. End-user SBOM tools need to handle multiple SBOMs to "stitch" them together.
There are two Python packages being shown, Package A on the left and Package B on the right. Package A depends on Package B. Package A is a pure-Python package with no bundled dependencies. Package B uses binary extensions and uses auditwheel to bundle shared libraries.
@import url(https://fonts.googleapis.com/css2?family=Inter:wght@400;500);
AuditwheelAuditwheelPython EnvironmentPython EnvironmentBuild BackendBuild BackendPythonPackage
Python...Python
Package B
Python...Source ForgeSource ForgeSource Code BSource Code BSBOM GeneratorSBOM GeneratorSrc
SBOMSrc...Src
SBOMSrc...Build
SBOMBuild...3rd P
Deps3rd P...SO /
DLLsSO /...Build
SBOMBuild...Src
SBOMSrc...Build
SBOMBuild...3rd P
Deps3rd P...Py
Pkg BPy...Build
SBOMBuild...Src
SBOMSrc...Build
SBOMBuild...METADATAMETADATAPython
Package B
Python...METADATAMETADATAOperational SBOM (OBOM)Operational SBOM (OBOM)1122335566Package BPackage BDataDataDataDataDataDataDataDataBuild BackendBuild BackendPython
Package A
Python...Source ForgeSource ForgeSource Code ASource Code AMETADATAMETADATAPackage APackage ADataDataPython
Package A
Python...METADATAMETADATAPython Package IndexPython Package Indexinstall_requiresinstall_re...44DEPENDS_ONDEPENDS_ONrefrefrefrefrefrefText is not SVG - cannot display
How SBOM data flows from Python package source code, build, to an SBOM generation tool
Stage 1: If the Python project bundles third-party software in their own source code then the project may specify one or more SBOM documents through project.sbom-files in pyproject.toml. Build backends copy these documents into source distributions and wheels.
Stage 2: If the Python build-backend pulls dependencies (like Maturin and Cargo) while building a wheel those dependencies can be recorded in another SBOM document in the wheel.
Stage 3: If a tool that modifies wheels by adding dependencies is used (like auditwheel) then that tool can record modifications in an SBOM document. At this point there are three separate SBOM documents included in the Package B archive.
Stage 4: Archives are uploaded to an index like PyPI. The index can do some validation of included SBOM documents, if any.
Stage 5: Installers download and install the Python package archives. The SBOM files are placed into the .dist-info/sboms/ directory in the Python environment and referenced in package metadata.
Stage 6: SBOM generation tools scan the Python environment and using existing Python package metadata and new SBOM documents with per-package data stitch together an Operational SBOM (OBOM) detailing the Python environment.
Who does what?The plan is to allow each "actor" in the system adding SBOM data to a Python package to create their own SBOM document inside the Python package.
This means they can choose any SBOM standard (although we'll recommend sticking to a well-known one like CycloneDX and SPDX) and that intermediate tools won't need to "merge" SBOM data together. Avoiding this merging is extremely important, because cross-standard SBOM data merges are a very hard problem. This problem is deferred to SBOM generation tools which already need to support multiple SBOM standards.
- Pure-Python projects that don't vendor software are easy, there's nothing to do here.
- Python projects that vendor software can annotate that software using an SBOM and specify the SBOM in pyproject.toml. Keeping this up-to-date is a non-zero amount of work, but I am hoping that by providing this PEP it will enable these types of contributions. I'm also hoping to provide a lightweight pre-commit hook to help keeping these SBOM documents up-to-date, similar to what CPython already uses.
- Python project which use a build backend that pull dependencies should be able to annotate what those dependencies are at build time. There will be exceptions, looking into tools like Meson and multibuild to see what can be done.
- Python bundling tools like auditwheel, delocate, etc can annotate shared libraries and DLLs that are pulled into wheels.
My hope is that the most difficult part of this work (manually annotating a package if automatic tools can't) will enable a new type of contribution from users of Python packages to provide SBOM data. Previously there was no standardized method to have SBOM data propagate through Python packages, thus discouraged this type of contribution.
If you're interested in having your use-case covered or you have concerns about the approach, please open a GitHub issue on the project tracker.
That's all for this post! đ If you're interested in more you can read the last report.
Have thoughts or questions? Let's chat over email or social:
sethmichaellarson@gmail.com
@sethmlarson@fosstodon.org
Want more articles like this one? Get notified of new posts by subscribing to the RSS feed or the email newsletter. I won't share your email or send spam, only whatever this is!
Want more content now? This blog's archive has ready-to-read articles. I also curate a list of cool URLs I find on the internet.
Find a typo? This blog is open source, pull requests are appreciated.
Thanks for reading! ⥠This work is licensed under CC BY-SA 4.0
︎ImageX: Instantly Enhance Your Website with Drupal Recipes for Exciting Features
Authored by Nadiia Nykolaichuk.
An exciting recipe is brewing in the Drupal kitchen. Picture a cookbook filled with delightful dishes, each requiring just one simple step. Similarly, Drupal users will soon enjoy the ability to add valuable functionalities to their websites with a single click, thanks to Recipes.
FSF Events: Free Software Directory meeting on IRC: Friday, November 22, starting at 12:00 EST (17:00 UTC)
Metadrop: Artisan Drupal SDC theme: What you need to know
Artisan is a Drupal base theme built on Bootstrap 5 and Sass. It offers easy theme configurations, theme presets (or variants), and extensive use of CSS variables.
Why Artisan?The inspiration for Artisan comes from Radix, a well-known theme we used for a long time. However, once you master something that is not directly tailored to your needs, you may start to wish for changesâsmall ones at first, but larger ones over time. For example, we found ourselves overwriting too many base templates for our Drupal projects. We wanted the templates provided by the base theme to be extensible enough to avoid being discarded based on the needs of specific projects. In the end, we decided to create our own theme.
The main goal of the Artisan base theme is to provide a foundation that allows most of its components to be reused without requiring complete overwrites in the custom theme of a specific project. To achieve this, Artisan offers a functional design base that is easily extensible, as explained below.
Artisan also makes extensive use of CSS custom properties (commonly known as CSS variables) to fully leverage their benefits. By using these variables, you can easily reuse styles across your project, ensuring greater design consistency. Additionally, they simplifyâŠ
Django Weblog: 2024 Django Developers Survey
The DSF is once again partnering with JetBrains to run the 2024 Django Developers Survey đ
Please take a moment to fill it out! It should only take about 10 minutes to complete. Itâs an important metric of Django usage, and is immensely helpful to guide future technical and community decisions.
The survey will be open until December 21st, 2024. After the survey is over, we will publish the aggregated results. JetBrains will also randomly choose 10 winners (from those who complete the survey in its entirety with meaningful answers), who will each receive a $100 Amazon Gift Card or a local equivalent.
How you can helpTake a moment to re-share the survey on socials, and with your respective communities? The more diverse the answers, the better the results for all of us.
Thank you for taking the time to contribute to this community effort, and thank you to JetBrains for their consistent support over the years!