Feeds
Eli Bendersky: GoMLX: ML in Go without Python
In the previous post I talked about running ML inference in Go through a Python sidecar process. In this post, let's see how we can accomplish the same tasks without using Python at all.
How ML models are implemented
Let's start with a brief overview of how ML models are implemented under the hood [1]. The model is typically written in Python, using one of the ML frameworks like TensorFlow, JAX or PyTorch. The framework takes care of at least 2 high-level concerns for developers:
- Expressive way to describe the model architecture, including auto-differentiation for training.
- Efficient implementation of computational primitives on common HW: CPUs, GPUs and TPUs.
In-between these two concerns there exists a standardized model definition format (or several) that helps multiple tools interoperate. While it's by no means the only solution [2], let's look at the OpenXLA stack as a way to run models on diverse hardware:
- The top layer consists of the frameworks that provide high-level primitives to define ML models, and translate them to a common interchange format called StableHLO (where "HLO" stands for High-Level Operations). I've added the gopher on the very right - it will soon become clear why.
- The bottom layer is the HW that executes these models efficiently.
- In the middle is the OpenXLA system, which includes two major components: the XLA compiler translating HLO to HW machine code, and PJRT - the runtime component responsible for managing HW devices, moving data (tensors) between the host CPU and these devices, executing tasks, sharding and so on.
There's a huge amount of complexity hidden by the bottom layers of this diagram. Efficient compilation and code generation for diverse HW - including using fixed blocks and libraries (like cuDNN), runtime management etc. All of this is really something one shouldn't try to re-implement unless there's a really, really good reason to do so. And the best part? There's no Python there - this is C and C++; Python only exists on the upper layer - in the high-level ML frameworks.
GoMLX
GoMLX is a relatively new Go package for ML that deserves some attention. GoMLX slots in as one of the frameworks, exactly where the Gopher is in the diagram above [3]. This is absolutely the right approach to the problem. There's no point in re-implementing the low-level primitives - whatever works for TF and JAX will work for Go as well! Google, NVIDIA, Intel and several other companies invest huge resources into these systems, and it's a good idea to benefit from these efforts.
In this post I will showcase re-implementations of some of the samples from the previous post, but with no Python in sight. But first, a few words about what GoMLX does.
GoMLX should be familiar if you've used one of the popular Python ML frameworks. You build a computational graph representing your model - the usual operations are supported and sufficient to implement anything from linear regression to cutting-edge transformers. Since GoMLX wraps XLA, it has access to all the same building blocks TF and JAX use (and it adds its own higher-level primitives, similarly to the Python frameworks).
GoMLX supports automatic differentiation to create the backward propagation operations required to update weights in training. It also provides many helpers for training and keeping track of progress, as well as Jupyter notebook support.
An image model for the CIFAR-10 dataset with GoMLX
In the previous post we built a CNN (convolutional neural network) model using TF+Keras in Python, and ran its inference in a sidecar process we could control from Go.
Here, let's build a similar model in Go, without using Python at all; we'll be training it on the same CIFAR-10 dataset we've used before.
The full code for this sample is here; it is heavily based on GoMLX's own example, with some modifications for simplicity and clarity. Here's the code defining the model graph:
func C10ConvModel(mlxctx *mlxcontext.Context, spec any, inputs []*graph.Node) []*graph.Node {
	batchedImages := inputs[0]
	g := batchedImages.Graph()
	dtype := batchedImages.DType()
	batchSize := batchedImages.Shape().Dimensions[0]
	logits := batchedImages

	layerIdx := 0
	nextCtx := func(name string) *mlxcontext.Context {
		newCtx := mlxctx.Inf("%03d_%s", layerIdx, name)
		layerIdx++
		return newCtx
	}

	// Convolution / activation layers
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 32, 32, 32)
	logits = activations.Relu(logits)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done()
	logits = activations.Relu(logits)
	logits = graph.MaxPool(logits).Window(2).Done()
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.3), true)
	logits.AssertDims(batchSize, 16, 16, 32)

	logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 16, 16, 64)
	logits = activations.Relu(logits)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 16, 16, 64)
	logits = activations.Relu(logits)
	logits = graph.MaxPool(logits).Window(2).Done()
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true)
	logits.AssertDims(batchSize, 8, 8, 64)

	logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 8, 8, 128)
	logits = activations.Relu(logits)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 8, 8, 128)
	logits = activations.Relu(logits)
	logits = graph.MaxPool(logits).Window(2).Done()
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true)
	logits.AssertDims(batchSize, 4, 4, 128)

	// Flatten logits, and apply dense layer
	logits = graph.Reshape(logits, batchSize, -1)
	logits = layers.Dense(nextCtx("dense"), logits, true, 128)
	logits = activations.Relu(logits)
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true)
	numClasses := 10
	logits = layers.Dense(nextCtx("dense"), logits, true, numClasses)
	return []*graph.Node{logits}
}
As you might expect, the Go code is longer and more explicit (nodes are threaded explicitly between builder calls, instead of being magically accumulated). It's not hard to envision a Keras-like high level library on top of this.
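The AssertDims calls in the model above follow from two rules: a convolution with PadSame keeps height and width and sets the channel count to the number of filters, while a max-pool with window 2 halves height and width (assuming the stride equals the window, which the asserted shapes imply). Here is a minimal plain-Go sketch that replays this arithmetic; it does not use GoMLX, and the shape type and helper names are made up for illustration:

```go
package main

import "fmt"

// shape is a hypothetical helper type for this sketch: height, width and
// channels of one image in the batch (the batch dimension is left out).
type shape struct{ h, w, c int }

// convSame models a convolution with "same" padding: the spatial size is
// unchanged and the channel count becomes the number of filters.
func convSame(s shape, filters int) shape { return shape{s.h, s.w, filters} }

// maxPool models a max-pool whose stride equals its window size.
func maxPool(s shape, window int) shape { return shape{s.h / window, s.w / window, s.c} }

// traceShapes replays the three conv-conv-pool stages of the model and
// returns the shape after each stage.
func traceShapes(in shape) []shape {
	var out []shape
	s := in
	for _, filters := range []int{32, 64, 128} {
		s = convSame(s, filters)
		s = convSame(s, filters)
		s = maxPool(s, 2)
		out = append(out, s)
	}
	return out
}

func main() {
	stages := traceShapes(shape{32, 32, 3}) // CIFAR-10 images are 32x32 RGB
	for i, s := range stages {
		fmt.Printf("after stage %d: %dx%dx%d\n", i+1, s.h, s.w, s.c)
	}
	last := stages[len(stages)-1]
	fmt.Println("flattened features:", last.h*last.w*last.c)
}
```

The last stage comes out as 4x4x128, matching the final AssertDims, so the Reshape with -1 hands 2048 features per image to the first dense layer.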
Here's a snippet from the classifier (inference):
func main() {
	flagCheckpoint := flag.String("checkpoint", "", "Directory to load checkpoint from")
	flag.Parse()

	mlxctx := mlxcontext.New()
	backend := backends.New()

	_, err := checkpoints.Load(mlxctx).Dir(*flagCheckpoint).Done()
	if err != nil {
		panic(err)
	}
	mlxctx = mlxctx.Reuse() // helps sanity check the loaded context
	exec := mlxcontext.NewExec(backend, mlxctx.In("model"), func(mlxctx *mlxcontext.Context, image *graph.Node) *graph.Node {
		// Convert our image to a tensor with batch dimension of size 1, and pass
		// it to the C10ConvModel graph.
		image = graph.ExpandAxes(image, 0) // Create a batch dimension of size 1.
		logits := cnnmodel.C10ConvModel(mlxctx, nil, []*graph.Node{image})[0]
		// Take the class with highest logit value, then remove the batch dimension.
		choice := graph.ArgMax(logits, -1, dtypes.Int32)
		return graph.Reshape(choice)
	})

	// classify takes a 32x32 image and returns a Cifar-10 classification according
	// to the models. Use C10Labels to convert the returned class to a string
	// name. The returned class is from 0 to 9.
	classify := func(img image.Image) int32 {
		input := images.ToTensor(dtypes.Float32).Single(img)
		outputs := exec.Call(input)
		classID := tensors.ToScalar[int32](outputs[0])
		return classID
	}

	// ...
Now classify is a function that takes an image.Image and runs it through the network, returning the index of the most likely label out of the list of CIFAR-10 labels.
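To make the last step concrete, here is what "take the class with the highest logit" amounts to, as a plain-Go sketch outside of any graph. The argMax helper and the c10Labels slice are stand-ins written for this illustration (the sample itself uses graph.ArgMax and C10Labels), and the logit values are made up:

```go
package main

import "fmt"

// c10Labels lists the ten CIFAR-10 classes in their standard index order.
// It is a local stand-in for the C10Labels mentioned in the sample.
var c10Labels = []string{
	"airplane", "automobile", "bird", "cat", "deer",
	"dog", "frog", "horse", "ship", "truck",
}

// argMax returns the index of the largest value in logits, or -1 if the
// slice is empty. Ties resolve to the first occurrence.
func argMax(logits []float32) int {
	best := -1
	for i, v := range logits {
		if best < 0 || v > logits[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Made-up logits for a single image; index 3 holds the largest value.
	logits := []float32{-1.2, 0.3, 0.1, 2.7, -0.5, 1.9, 0.0, -2.1, 0.4, 0.2}
	id := argMax(logits)
	fmt.Println(id, c10Labels[id]) // prints "3 cat"
}
```

Logits don't need to go through a softmax for this: softmax is monotonic, so the largest logit is also the most probable class.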
The README file in the sample explains how to run it locally on a GPU; the model trains and runs successfully, with similar results to the TF+Keras model we trained in Python earlier.
Gemma2 with GoMLX
For a (much) more involved example, GoMLX has a full implementation of Gemma2 inference. The model implementation itself is in the transformers package. It should look fairly familiar if you've seen a transformer implementation in another language.
The official example in that repository shows how to run it with weights downloaded from HuggingFace; since I've already downloaded the Gemma2 weights from Kaggle for the previous post, here's a simple adaptation:
var (
	flagDataDir   = flag.String("data", "", "dir with converted weights")
	flagVocabFile = flag.String("vocab", "", "tokenizer vocabulary file")
)

func main() {
	flag.Parse()
	ctx := context.New()

	// Load model weights from the checkpoint downloaded from Kaggle.
	err := kaggle.ReadConvertedWeights(ctx, *flagDataDir)
	if err != nil {
		log.Fatal(err)
	}

	// Load tokenizer vocabulary.
	vocab, err := sentencepiece.NewFromPath(*flagVocabFile)
	if err != nil {
		log.Fatal(err)
	}

	// Create a Gemma sampler and start sampling tokens.
	sampler, err := samplers.New(backends.New(), ctx, vocab, 256)
	if err != nil {
		log.Fatalf("%+v", err)
	}

	start := time.Now()
	output, err := sampler.Sample([]string{
		"Are bees and wasps similar?",
	})
	if err != nil {
		log.Fatalf("%+v", err)
	}
	fmt.Printf("\tElapsed time: %s\n", time.Since(start))
	fmt.Printf("Generated text:\n%s\n", strings.Join(output, "\n\n"))
}
The complete code together with installation and setup instructions is here.
gomlx/gemma demonstrates that GoMLX has sufficiently advanced capabilities to run a real production-grade open LLM, without Python in the loop.
Summary
The previous post discussed some options for incorporating ML inference into a Go project via a minimal Python sidecar process. Here, we take it a step further and implement ML inference in Go without using Python. We do so by leveraging GoMLX, which itself relies on XLA and PJRT to do the heavy lifting.
If we strip down a framework like TensorFlow to its layers, GoMLX reuses the bottom layers (which is where most of the magic lies), and replaces the model builder library with a Go variant.
Since GoMLX is still a relatively new project, it may be a little risky for production uses at this point. That said, I find this direction very promising and will be following the project's development with interest.
Code
The full code for the samples in this post is on GitHub.
[1] This assumes you know the basics of neural network graphs, their training, etc. If not, check out this post and some of my other posts in the Machine Learning category.
[2] It's likely the most common production solution, and pretty much the only way to access Google's TPUs.
[3] It does so by including Go bindings for both XLA and PJRT; these are wrapped in higher-level APIs for users.
parallel @ Savannah: GNU Parallel 20241122 ('Ahoo Daryaei') released
GNU Parallel 20241122 ('Ahoo Daryaei') has been released. It is available for download at: lbry://@GnuParallel:4
Quote of the month:
GNU parallel is so satisfying
-- James Coman @jcoman.bsky.social
New in this release:
- --pipe --block works similar to --pipepart --block if --block size is negative.
- DBURLs can be written with / instead of %2F for sqlite and CSV.
- Bug fixes and man page updates.
News about GNU Parallel:
- Embarrassingly GNU parallel https://dengin.xyz/blog/2024/10/24/embarrassingly-gnu-parallel/
- GNU Parallel for Your Terminal Tasks https://erolrecep.github.io/posts/gnuparallel_for_your_terminal_tasks/
- How to leverage GNU parallel to utilize multiple cores while running AUGUSTUS https://lifescienceshub.wixsite.com/lifesciencehub/post/how-to-leverage-gnu-parallel-to-utilize-multiple-cores-while-running-augustus
- GNU Parallel: The Good Parts https://diekmeier.de/posts/2024-11-17-gnu-parallel/
- Put your CPU to work with GNU Parallel https://www.redhat.com/en/blog/gnu-parallel
GNU Parallel - For people who live life in the parallel lane.
If you like GNU Parallel, record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.
GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
For example you can run this to convert all jpeg files into png and gif files and have a progress bar:
parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif
Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:
find . -name '*.jpg' |
parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
When using programs that use GNU Parallel to process data for publication please cite:
O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.
If you like GNU Parallel:
- Give a demo at your local user group/team/colleagues
- Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
- Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
- Request or write a review for your favourite blog or magazine
- Request or build a package for your favourite distribution (if it is not already there)
- Invite me for your next conference
If you use programs that use GNU Parallel for research:
- Please cite GNU Parallel in your publications (use --citation)
If GNU Parallel saves you money:
- (Have your company) donate to FSF https://my.fsf.org/donate/
GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.
The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
When using GNU SQL for a publication please cite:
O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.
Matthew Palmer: Your Release Process Sucks
For the past decade-plus, every piece of software I write has had one of two release processes.
Software that gets deployed directly onto servers (websites, mostly, but also the infrastructure that runs Pwnedkeys, for example) is deployed with nothing more than git push prod main. I’ll talk more about that some other day.
Today is about the release process for everything else I maintain – Rust / Ruby libraries, standalone programs, and so forth. To release those, I use the following, extremely intricate process:
- Create an annotated git tag, where the name of the tag is the software version I’m releasing, and the annotation is the release notes for that version.
- Run git release in the repository.
- There is no step 3.
Yes, it absolutely is that simple. And if your release process is any more complicated than that, then you are suffering unnecessarily.
But don’t worry. I’m from the Internet, and I’m here to help.
Sidebar: “annotated what-now?!?”
The annotated tag is one of git’s best-kept secrets. They’ve been available in git for practically forever (I’ve been using them since at least 2014, which is “practically forever” in software development), yet almost everyone I mention them to has never heard of them.
A “tag”, in git parlance, is a repository-unique named label that points to a single commit (as identified by the commit’s SHA1 hash). Annotating a tag is simply associating a block of free-form text with that tag.
Creating an annotated tag is simple-sauce: git tag -a tagname will open up an editor window where you can enter your annotation, and git tag -a -m "some annotation" tagname will create the tag with the annotation “some annotation”. Retrieving the annotation for a tag is straightforward, too: git show tagname will display the annotation along with all the other tag-related information.
Now that we know all about annotated tags, let’s talk about how to use them to make software releases freaking awesome.
Step 1: Create the Annotated Git Tag
As I just mentioned, creating an annotated git tag is pretty simple: just add a -a (or --annotate, if you enjoy typing) to your git tag command, and WHAM! annotation achieved.
Releases, though, typically have unique and ever-increasing version numbers, which we want to encode in the tag name. Rather than having to look at the existing tags and figure out the next version number ourselves, we can have software do the hard work for us.
Enter: git-version-bump. This straightforward program takes one mandatory argument: major, minor, or patch, and bumps the corresponding version number component in line with Semantic Versioning principles. If you pass it -n, it opens an editor for you to enter the release notes, and when you save out, the tag is automagically created with the appropriate name.
Because the program is called git-version-bump, you can call it as a git command: git version-bump. Also, because version-bump is long and unwieldy, I have it aliased to vb, with the following entry in my ~/.gitconfig:
[alias]
	vb = version-bump -n
Of course, you don’t have to use git-version-bump if you don’t want to (although why wouldn’t you?). The important thing is that the only step you take to go from “here is our current codebase in main” to “everything as of this commit is version X.Y.Z of this software”, is the creation of an annotated tag that records the version number being released, and the metadata that goes along with that release.
Step 2: Run git release
As I said earlier, I’ve been using this release process for over a decade now. So long, in fact, that when I started, GitHub Actions didn’t exist, and so a lot of the things you’d delegate to a CI runner these days had to be done locally, or in a more ad-hoc manner on a server somewhere.
This is why step 2 in the release process is “run git release”. It’s because historically, you couldn’t do everything in a CI run. Nowadays, most of my repositories have this in the .git/config:
[alias]
	release = push --tags
Older repositories which, for one reason or another, haven’t been updated to the new hawtness, have various other aliases defined, which run more specialised scripts (usually just rake release, for Ruby libraries), but they’re slowly dying out.
The reason why I still have this alias, though, is that it standardises the release process. Whether it’s a Ruby gem, a Rust crate, a bunch of protobuf definitions, or whatever else, I run the same command to trigger a release going out. It means I don’t have to think about how I do it for this project, because every project does it exactly the same way.
The Wiring Behind the Button
It wasn’t the button that was the problem. It was the miles of wiring, the hundreds of miles of cables, the circuits, the relays, the machinery. The engine was a massive, sprawling, complex, mind-bending nightmare of levers and dials and buttons and switches. You couldn’t just slap a button on the wall and expect it to work. But there should be a button. A big, fat button that you could press and everything would be fine again. Just press it, and everything would be back to normal.
- Red Dwarf: Better Than Life
Once you’ve accepted that your release process should be as simple as creating an annotated tag and running one command, you do need to consider what happens afterwards. These days, with the near-universal availability of CI runners that can do anything you need in an isolated, reproducible environment, the work required to go from “annotated tag” to “release artifacts” can be scripted up and left to do its thing.
What that looks like, of course, will probably vary greatly depending on what you’re releasing. I can’t really give universally-applicable guidance, since I don’t know your situation. All I can do is provide some of my open source work as inspirational examples.
For starters, let’s look at a simple Rust crate I’ve written, called strong-box. It’s a straightforward crate, that provides ergonomic and secure cryptographic functionality inspired by the likes of NaCl. As it’s just a crate, its release script is very straightforward. Most of the complexity is working around Cargo’s inelegant mandate that crate version numbers are specified in a TOML file. Apart from that, it’s just a matter of building and uploading the crate. Easy!
Slightly more complicated is action-validator. This is a Rust CLI tool which validates GitHub Actions and Workflows (how very meta) against a published JSON schema, to make sure you haven’t got any syntax or structural errors. As not everyone has a Rust toolchain on their local box, the release process helpfully builds binaries for several common OSes and CPU architectures that people can download if they choose. The release process in this case is somewhat larger, but not particularly complicated. Almost half of it is actually scaffolding to build an experimental WASM/NPM build of the code, because someone seemed rather keen on that.
Moving away from Rust, and stepping up the meta another notch, we can take a look at the release process for git-version-bump itself, my Ruby library and associated CLI tool which started me down the “Just Tag It Already” rabbit hole many years ago. In this case, since gemspecs are very amenable to programmatic definition, the release process is practically trivial. Remove the boilerplate and workarounds for GitHub Actions bugs, and you’re left with about three lines of actual commands.
These approaches can certainly scale to larger, more complicated processes. I’ve recently implemented annotated-tag-based releases in a proprietary software product, that produces Debian/Ubuntu, RedHat, and Windows packages, as well as Docker images, and it takes all of the information it needs from the annotated tag. I’m confident that this approach will successfully serve them as they expand out to build AMIs, GCP machine images, and whatever else they need in their release processes in the future.
Objection, Your Honour!
I can hear the howl of the “but, actuallys” coming over the horizon even as I type. People have a lot of Big Feelings about why this release process won’t work for them. Rather than overload this article with them, I’ve created a companion article that enumerates the objections I’ve come across, and answers them. I’m also available for consulting if you’d like a personalised, professional opinion on your specific circumstances.
DVD Bonus Feature: Pre-releases
Unless you’re addicted to surprises, it’s good to get early feedback about new features and bugfixes before they make it into an official, general-purpose release. For this, you can’t go past the pre-release.
The major blocker to widespread use of pre-releases is that cutting a release is usually a pain in the behind. If you’ve got to edit changelogs, and modify version numbers in a dozen places, then you’re entirely justified in thinking that cutting a pre-release for a customer to test that bugfix that only occurs in their environment is too much of a hassle.
The thing is, once you’ve got releases building from annotated tags, making pre-releases on every push to main becomes practically trivial. This is mostly due to another fantastic and underused Git command: git describe.
How git describe works is, basically, that it finds the most recent commit that has an associated annotated tag, and then generates a string that contains that tag’s name, plus the number of commits between that tag and the current commit, with the current commit’s hash included, as a bonus. That is, imagine that three commits ago, you created an annotated release tag named v4.2.0. If you run git describe now, it will print out v4.2.0-3-g04f5a6f (assuming that the current commit’s SHA starts with 04f5a6f).
You might be starting to see where this is going. With a bit of light massaging (essentially, removing the leading v and replacing the -s with .s), that string can be converted into a version number which, in most sane environments, is considered “newer” than the official 4.2.0 release, but will be superseded by the next actual release (say, 4.2.1 or 4.3.0). If you’re already injecting version numbers into the release build process, injecting a slightly different version number is no work at all.
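The "light massaging" is small enough to fit in a few lines. This is a sketch in Go rather than whatever the real release scripts use, and describeToVersion is a name invented here; it takes the output of git describe as a string and applies exactly the two transformations described above:

```go
package main

import (
	"fmt"
	"strings"
)

// describeToVersion converts `git describe` output such as
// "v4.2.0-3-g04f5a6f" into a pre-release version string by dropping the
// leading "v" and replacing every "-" with ".".
func describeToVersion(describe string) string {
	v := strings.TrimPrefix(strings.TrimSpace(describe), "v")
	return strings.ReplaceAll(v, "-", ".")
}

func main() {
	// Three commits after the v4.2.0 tag.
	fmt.Println(describeToVersion("v4.2.0-3-g04f5a6f")) // 4.2.0.3.g04f5a6f
	// Exactly on the tag, git describe prints just the tag name.
	fmt.Println(describeToVersion("v4.2.0")) // 4.2.0
}
```

One caveat with this simple version: a tag name that itself contains hyphens (say, v1.0.0-rc1) gets its hyphens replaced too, so a real script may need to be pickier about which separators it rewrites.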
Then, you can easily build release artifacts for every commit to main, and make them available somewhere they won’t get in the way of the “official” releases. For example, in the proprietary product I mentioned previously, this involves uploading the Debian packages to a separate component (prerelease instead of main), so that users that want to opt-in to the prerelease channel simply modify their sources.list to change main to prerelease. Management have been extremely pleased with the easy availability of pre-release packages; they’ve been gleefully installing them willy-nilly for testing purposes since I rolled them out.
In fact, even while I’ve been writing this article, I was asked to add some debug logging to help track down a particularly pernicious bug. I added the few lines of code, committed, pushed, and went back to writing. A few minutes later (next week’s job is to cut that in-process time by at least half), the person who asked for the extra logging ran apt update; apt upgrade, which installed the newly-built package, and was able to progress in their debugging adventure.
Continuous Delivery: It’s Not Just For Hipsters.
“+1, Informative”
Hopefully, this has spurred you to commit your immortal soul to the Church of the Annotated Tag. You may tithe by buying me a refreshing beverage. Alternately, if you’re really keen to adopt more streamlined release management processes, I’m available for consulting engagements.
Matthew Palmer: Invalid Excuses for Why Your Release Process Sucks
In my companion article, I made the bold claim that your release process should consist of no more than two steps:
- Create an annotated Git tag;
- Run a single command to trigger the release pipeline.
As I have been on the Internet for more than five minutes, I’m aware that a great many people will have a great many objections to this simple and straightforward idea. In the interests of saving them a lot of wear and tear on their keyboards, I present this list of common reasons why these objections are invalid.
If you have an objection I don’t cover here, the comment box is down the bottom of the article. If you think you’ve got a real stumper, I’m available for consulting engagements, and if you turn out to have a release process which cannot feasibly be reduced to the above two steps for legitimate technical reasons, I’ll waive my fees.
“But I automatically generate my release notes from commit messages!”
This one is really easy to solve: have the release note generation tool feed directly into the annotation. Boom! Headshot.
“But all these files need to be edited to make a release!”
No, they absolutely don’t. But I can see why you might think you do, given how inflexible some packaging environments can seem, and since “that’s how we’ve always done it”.
Language Packages
Most languages require you to encode the version of the library or binary in a file that you want to revision control. This is teh suck, but I’m yet to encounter a situation that can’t be worked around some way or another.
In Ruby, for instance, gemspec files are actually executable Ruby code, so I call code (that’s part of git-version-bump, as an aside) to calculate the version number from the git tags. The Rust build tool, Cargo, uses a TOML file, which isn’t as easy, but a small amount of release automation is used to take care of that.
Distribution Packages
If you’re building Linux distribution packages, you can easily apply similar automation faffery. For example, Debian packages take their metadata from the debian/changelog file in the build directory. Don’t keep that file in revision control, though: build it at release time. Everything you need to construct a Debian (or RPM) changelog is in the tag – version numbers, dates, times, authors, release notes. Use it for much good.
The Dreaded Changelog
Finally, there’s the CHANGELOG file. If it’s maintained during the development process, it typically has an archive of all the release notes, under version numbers, with an “Unreleased” heading at the top. It’s one more place to remember to have to edit when making that “preparing release X.Y.Z” commit, and it is a gift to the Demon of Spurious Merge Conflicts if you follow the policy of “every commit must add a changelog entry”.
My solution: just burn it to the ground. Add a line to the top with a link to wherever the contents of annotated tags get published (such as GitHub Releases, if that’s your bag) and never open it ever again.
“But I need to know other things about my release, too!”
For some reason, you might think you need some other metadata about your releases. You’re probably wrong – it’s amazing how much information you can obtain or derive from the humble tag – so think creatively about your situation before you start making unnecessary complexity for yourself.
But, on the off chance you’re in a situation that legitimately needs some extra release-related information, here’s the secret: structured annotation. The annotation on a tag can be literally any sequence of octets you like. How that data is interpreted is up to you.
So, require that annotations on release tags use some sort of structured data format (say YAML or TOML – or even XML if you hate your release manager), and mandate that it contain whatever information you need. You can make sure that the annotation has a valid structure and contains all the information you need with an update hook, which can reject the tag push if it doesn’t meet the requirements, and you’re sorted.
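Here is a sketch of the validation such a hook could run. To keep it self-contained it uses JSON as the structured format, purely because Go's standard library ships a JSON parser and not a YAML or TOML one; the same shape applies to whichever format you mandate. The releaseMeta fields are invented for illustration, and fetching the annotation text from git is left out:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// releaseMeta is a hypothetical schema; the field names are illustrative.
type releaseMeta struct {
	Notes    string `json:"notes"`
	Severity string `json:"severity"`
}

// validateAnnotation parses a tag annotation as JSON and checks that the
// required fields are present. A hook could reject the tag push whenever
// this returns a non-nil error.
func validateAnnotation(annotation string) (releaseMeta, error) {
	var m releaseMeta
	if err := json.Unmarshal([]byte(annotation), &m); err != nil {
		return m, fmt.Errorf("annotation is not valid JSON: %w", err)
	}
	if m.Notes == "" {
		return m, errors.New(`annotation is missing "notes"`)
	}
	if m.Severity == "" {
		return m, errors.New(`annotation is missing "severity"`)
	}
	return m, nil
}

func main() {
	_, err := validateAnnotation(`{"notes": "Fix login crash", "severity": "low"}`)
	fmt.Println(err) // <nil>

	_, err = validateAnnotation(`{"notes": "Fix login crash"}`)
	fmt.Println(err) // annotation is missing "severity"
}
```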
“But I have multiple packages in my repo, with different release cadences and versions!”
This one is common enough that I just refer to it as “the monorepo drama”. Personally, I’m not a huge fan of monorepos, but you do you, boo. Annotated tags can still handle it just fine.
The trick is to include the package name being released in the tag name. So rather than a release tag being named vX.Y.Z, you use foo/vX.Y.Z, bar/vX.Y.Z, and baz/vX.Y.Z. The release automation for each package just triggers on tags that match the pattern for that particular package, and limits itself to those tags when figuring out what the version number is.
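A small sketch of that per-package filtering, assuming tags of the form package/vX.Y.Z and a tag list that has already been read from the repository (the function name and sample tags are illustrative):

```python
import re


def latest_version(tags, package):
    """Return the highest X.Y.Z released for `package`, or None if there is none."""
    # Only tags of the form "<package>/vX.Y.Z" count for this package.
    pattern = re.compile(rf"^{re.escape(package)}/v(\d+)\.(\d+)\.(\d+)$")
    versions = []
    for tag in tags:
        match = pattern.match(tag)
        if match:
            # Compare numerically so that 1.10.0 sorts above 1.2.0.
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return None
    return ".".join(str(part) for part in max(versions))


tags = ["foo/v1.2.0", "foo/v1.10.0", "bar/v3.0.1", "baz/v0.9.0", "v2.0.0"]
print(latest_version(tags, "foo"))  # prints 1.10.0
```

Tags belonging to other packages, and un-prefixed tags such as v2.0.0, are simply ignored by the pattern.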
“But we don’t semver our releases!”
Oh, that’s easy. The tag pattern that marks a release doesn’t have to be vX.Y.Z. It can be anything you want.
Relatedly, there is a (rare, but existent) need for packages that don’t really have a conception of “releases” in the traditional sense. The example I’ve hit most often is automatically generated “bindings” packages, such as protobuf definitions. The source of truth for these is a bunch of .proto files, but to be useful, they need to be packaged into code for the various language(s) you’re using. But those packages need versions, and while someone could manually make releases, the best option is to build new per-language packages automatically every time any of those definitions change.
The versions of those packages, then, can be datestamps (I like something like YYYY.MM.DD.N, where N starts at 0 each day and increments if there are multiple releases in a single day).
This process allows all the code that needs the definitions to declare the minimum version of the definitions that it relies on, and everything is kept in sync and tracked almost like magic.
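The datestamp scheme can be sketched in a few lines. This assumes the existing versions are available as plain strings; the function name is made up for the example.

```python
from datetime import date


def next_datestamp_version(existing, today):
    """Return the next YYYY.MM.DD.N version for `today`, with N starting at 0."""
    prefix = today.strftime("%Y.%m.%d.")
    # Collect the N values already used for today's date.
    used = [
        int(version[len(prefix):])
        for version in existing
        if version.startswith(prefix) and version[len(prefix):].isdigit()
    ]
    n = max(used) + 1 if used else 0
    return f"{prefix}{n}"


existing = ["2024.11.21.3", "2024.11.22.0", "2024.11.22.1"]
print(next_datestamp_version(existing, date(2024, 11, 22)))  # prints 2024.11.22.2
```

Zero-padding the month and day keeps every version the same shape.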
Th-th-th-th-that’s all, folks!
I hope you’ve enjoyed this bit of mild debunking. Show your gratitude by buying me a refreshing beverage, or purchase my professional expertise and I’ll answer all of your questions and write all your CI jobs.
EuroPython Society: 2024 General Assembly Announcement
We’re excited to invite you to this year’s General Assembly meeting! We’ll gather on Sunday, December 1st, 2024, from 20:00 to 21:00 CET. Just like in recent years, we’ll use Zoom, and additional joining instructions will be shared closer to the date.
The General Assembly is the highest decision-making body of the society, and EPS membership is required to participate. Membership is open to individuals who wish to actively engage in implementing the EPS mission. If you want to become a member of EuroPython Society, you can sign up here: https://www.europython-society.org/application/
You can find more details about the agenda of the meeting, as it is defined in our bylaws here https://www.europython-society.org/bylaws/ (Article 8).
One of the items on the Agenda is electing the new Board.
What does the Board do?
The Board consists of a chairperson, a vice chairperson and 2-7 Board members. The duties and responsibilities of the Board are substantial: the board collectively takes up the fiscal and legal responsibility of the Society.
A major topic is the annual EuroPython conference. While we would like to transition to a model with an independent organising team, we are not there yet. Therefore, the Board still needs to be involved in the conference organisation.
Beyond the conference, the Board also manages several critical areas, including:
- Managing EPS membership
- Overseeing finances and budgets
- Running the grant programme
- Maintaining infrastructure and resources
Furthermore, specifically for 2025, and following the recommendation from the previous Board, we would like to focus on four key topics that are important for the Society's future and sustainability:
- Hiring an Event Manager/Coordinator
- Selecting a location for 2026 and possibly 2027
- Strengthening community outreach
- Improving the fiscal and legal framework
The Society is entirely volunteer-driven and serving on the board requires a significant time commitment. Everyone has a different schedule, so most of the work is usually done asynchronously. However, all board members attend the 1.5-hour board call held every two weeks in the evening, CE(S)T timezone. Everyone's time is valuable, so please consider that the less time or effort you can dedicate, the more the workload may shift to other Board members.
All things considered, you will need a few hours every week.
Who should apply?
You want to invest your time and knowledge into building a better structure for the EuroPython Society? Or you want to work on building connections between different Python-based communities? Then this might be for you! Please keep in mind the time commitments mentioned above.
You are not expected to be perfect in any of the skills needed and you will be supported in learning how things work. That being said, having experience in a non-profit organisation, whether within the Python world (such as EPS, PSF, DSF, local Python communities etc.) or any other similar organisation, would be beneficial for onboarding and understanding the organisational structure, culture and dynamics.
In the past, having (or being willing to learn) the following skills has helped with organising the conference:
- Good communication skills
- Organisation skills
- Experience organising events with more than 1000 people
- Working with volunteer-based communities
- Working in big teams
You get the chance to shape and influence the future of EuroPython
You gain skills useful to run non-profits in different European countries - including cross border challenges
You can help grow and empower local communities
You can build relationships and connections with fellow community members
You can build a more diverse and inclusive Python community by serving the mission of EuroPython Society
I am interested, what should I do?
If you’re considering running for the Board or nominating another EPS member, we’d love to hear from you! Although the formal deadline is during the General Assembly, we kindly request you send your nomination as early as possible to board@europython.eu. We will publish the initial list of candidates on Tuesday, 26th of November 2024. If you’re not sure if this is a good idea or not – please email anyway and we will help you figure it out! 🙂
If you're on our EPS Organisers' Discord, there's a dedicated channel for interested candidates. Please ask in the general channel, and we’ll be happy to add you.
You can find examples of previous nominations here: https://www.europython-society.org/list-of-eps-board-candidates-for-2023-2024/.
Your nomination should highlight why you want to run for the Board: what your vision for EPS is and which projects you want to be involved in. During the General Assembly, you will have the opportunity to introduce yourself and share with our members why you believe they should vote for you. Each candidate will typically be given one minute to present themselves before members cast their votes.
It sounds like a lot, I want to help, but I can’t commit to that
That’s completely understandable! Serving on the Board comes with significant responsibilities, time commitments, and administrative tasks. If that’s not the right fit for you, but you’re still interested in supporting us, we’d love your help! There are many other ways to get involved. We have several teams (see 2024 Teams Description document, as an example) that work on conference preparations during the months leading up to the event, and we also need volunteers to assist onsite during the conference.
Your help does not need to be limited to the conference. Infrastructure and connections need to be maintained all year round, for example. Your time and support would make a big difference! Stay tuned to our social platforms for announcements about these opportunities.
mark.ie: My LocalGov Drupal contributions for week-ending November 22nd, 2024
This week, lots of work on the LocalGov News module.
Web Review, Week 2024-47
Let’s go for my web review for the week 2024-47.
The Big Data Center Water Problem
Tags: tech, hardware, ecology, economics, energy, water
We always think about the energy consumption, but large data centers gobble billions of liters of water too. This would need to be improved.
https://www.asianometry.com/p/the-big-data-center-water-problem
Tags: tech, vr, hardware, foss
Nice to see open hardware for VR hitting such a price point.
Tags: tech, social-media, fediverse, tools
You’re on the fediverse and you want to reach out to Bluesky users? This might be the right tool for you (unclear if it’ll scale yet though). At least if and when Bluesky turns bad, people will know where to reach friends next.
Tags: tech, social-media, business, politics
Excellent post showing reasons to be skeptical about Bluesky’s future. Despite all their likely sincere claims I don’t see how they’ll escape enclosure and enshittification when their sketchy VCs will want to see money back.
https://www.tbray.org/ongoing/When/202x/2024/11/15/Not-Bluesky
Tags: tech, social-media, politics, twitter
Sad to see people predominantly jumping from Twitter to other tech moguls' walled gardens. This feels more and more like a missed opportunity for the fediverse. That said, I’m amazed at how efficient Musk has been at killing the network effect of his platform. This proves it’s actually doable.
Tags: tech, social-media, politics, twitter
This is what we get for refusing to regulate social media and for not auditing their algorithms. Their owners can game and bias the platforms as they see fit for their own gains. They became massive forces of manipulation in the process.
https://eprints.qut.edu.au/253211/
Tags: tech, ai, machine-learning, gpt, vendor-lockin
Good reminder that models shouldn’t be used as a service except maybe for prototyping. This has felt obvious to me since the beginning of this hype cycle… but here we are: people are falling into the trap today.
https://adriano.fyi/posts/chatgpt-is-slipping/
Tags: tech, python, performance, pandas, data, data-science
OK, the numbers are indeed impressive. And its API is apparently fully compatible, so it looks like a good replacement if you have Pandas code around.
https://hwisnu.bearblog.dev/fireducks-pandas-but-100x-faster/
Tags: tech, tools, debugging
Looks like a nice tool. Maybe it’ll replace my trusty cgdb in some cases.
https://github.com/epasveer/seer
Tags: tech, c++, security
Will we see more deployments of the C++ standard library with bounds checking by default? It definitely looks tempting.
https://security.googleblog.com/2024/11/retrofitting-spatial-safety-to-hundreds.html?m=1
Tags: tech, php, security
Seeing the amount of PHP code open on the internet, it’s indeed important to harden the runtime (at long last).
https://dustri.org/b/upcoming-hardening-in-php.html
Tags: tech, graphics, gpu
Really nice in-depth post. Everything you ever wanted to know about antialiasing but didn’t dare to ask.
https://blog.frost.kiwi/analytical-anti-aliasing/
Tags: tech, framework, career, learning
Good advice, no one should be a “React developer”. Make sure you learn more fundamental skills.
https://www.keithcirkel.co.uk/i-dont-have-time-to-learn-react/
Tags: tech, craftsmanship, learning
If you’re just doing the minimum to deal with a task to “mark it done” you’re probably not doing enough and missing out on learning opportunities.
https://edanparker.hashnode.dev/going-a-little-further
Tags: tech, career, learning, engineering
This can change from organization to organization. This post proposes a career ladder which will work in some contexts. What’s clear is that it’s all about scope and impact.
https://matt.blwt.io/post/what-is-a-senior-engineer-anyway/
Tags: tech, engineering, management, learning
Interesting tips to keep learning on the technical side of the job as you get more managerial responsibilities.
Bye for now!
Real Python: The Real Python Podcast – Episode #229: The Joy of Tinkering & Python Free-Threading Performance
What keeps your spark alive for developing software and learning Python? Do you like to try new frameworks, build toy projects, or collaborate with other developers? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
Talk Python to Me: #486: CSnakes: Embed Python code in .NET
Krita for Android Update
We have updated Krita for Android and ChromeOS in the Google Play Store to 5.2.8, an Android/ChromeOS-only emergency release. This release fixes startup problems that happened on some devices with 5.2.6. Krita 5.2.8 for Android is now available both for beta-track users as well as in the "stable" release track. Note, however, that we still recommend treating Krita on Android as a beta release that might have bugs that impair your work, as well as a user interface that is not optimized for touch devices.
Matt Layman: Huey Background Worker - Building SaaS #207
Brian Perry: Two Modules to Help Tame Large Drupal Menus
Stop me if you've heard this one before. At some point in the life of your Drupal site, you have a menu that has gotten out of control. Dragging and dropping is basically a lost cause, your hand hurts from scrolling, and a sense of dread approaches every time you find yourself in the menu administration screen. If it isn't possible to re-structure the menu to address the root cause, you'll need to turn to other solutions to make menu administration more manageable.
I recently used two modules to address this issue for a client. They may not be a huge surprise to those who have run into this problem repeatedly, but it seemed worth documenting for both future me and also our search engine and LLM overlords.
Big Menu
The first module is Big Menu. The project page on this one seems to be describing the Drupal 7 implementation of the module, which is quite a bit different. The 'modern Drupal' version of the module essentially re-works the menu administration page to focus on a single level of the menu tree at a time. Any menu item that has children will have an 'Edit child items' link that you can drill into. This results in more clicks to get to the item you want to edit, but it makes the menu administration page much more manageable and reduces cognitive load quite a bit.
You can also configure the module to use a different depth for the menu tree, which can be useful if you want to see more of the menu in a single view. Personally I prefer to go all the way with this one and stick with the single level view that is used by default.
Menu Select
The Menu Select module addresses the experience of selecting a parent menu item in the menu settings for a node or menu item. By default, this is a select list containing the entire menu, which can get very long. Menu Select replaces this with an autocomplete search and a hierarchical collapsible unordered list.
Bonus: Menu Firstchild
Menu Firstchild is a little less about the admin experience, but can be useful in cases where a large menu needs some additional grouping but you don't want to turn to a full mega menu style approach. The module provides an option to have a menu item that doesn't have its own path, but instead links to its first direct child.
Used together, these modules made a substantial difference in addressing the client's menu administration related feedback.
This was also a reminder of the impact that the ongoing work on Drupal CMS will hopefully have. I'm looking forward to a Drupal CMS future that can theoretically pre-package user experience improvements like these. Or in cases where it might not be the right choice for Drupal CMS, opinionated community developed recipes can be created to address common use cases like this one.
Seth Michael Larson: Visualizing the Python package SBOM data flow
Published 2024-11-22 by Seth Larson
I'm working on improving measurability of Python packages by allowing Software Bill-of-Materials documents (SBOM) to be included in Python packages so that projects and build tools can record information about a package for downstream use.
This is a cross-functional project where I need input from Python projects, Python packaging tools (build backends+tools and installers), but also from folks completely outside the Python community like SBOM tooling maintainers. With projects like this, it can be difficult to "see the forest through the trees". When you're reviewing the packaging PEP, it can be difficult to imagine how or who is using the new standard. This article is to help visualize the end-to-end data flow.
How SBOM data will be included in Python packages
In short, the proposal is:
- Allow Python projects to manually specify SBOM documents in pyproject.toml with [project].sbom-files = ["..."]
- Allow Python package archives to include self-describing SBOM documents and reference them in metadata via the Sbom-File field.
- Zero-or-more SBOM documents per Python package archive. Each tool adding SBOM data creates a new SBOM inside the archive to avoid conflicts. End-user SBOM tools need to handle multiple SBOMs to "stitch" them together.
There are two Python packages being shown, Package A on the left and Package B on the right. Package A depends on Package B. Package A is a pure-Python package with no bundled dependencies. Package B uses binary extensions and uses auditwheel to bundle shared libraries.
[Figure: data-flow diagram. Package A (Source Code A) and Package B (Source Code B) each move from the Source Forge through a Build Backend to the Python Package Index and into a Python Environment; Package B additionally passes through Auditwheel. An SBOM Generator reads the package METADATA and the source, build and third-party dependency SBOMs to produce an Operational SBOM (OBOM). Package A depends on Package B (install_requires, DEPENDS_ON). Numbered markers 1 to 6 correspond to the stages described below.]
How SBOM data flows from Python package source code, build, to an SBOM generation tool
Stage 1: If the Python project bundles third-party software in its own source code, then the project may specify one or more SBOM documents through project.sbom-files in pyproject.toml. Build backends copy these documents into source distributions and wheels.
Stage 2: If the Python build backend pulls dependencies (like Maturin and Cargo) while building a wheel, those dependencies can be recorded in another SBOM document in the wheel.
Stage 3: If a tool that modifies wheels by adding dependencies is used (like auditwheel) then that tool can record modifications in an SBOM document. At this point there are three separate SBOM documents included in the Package B archive.
Stage 4: Archives are uploaded to an index like PyPI. The index can do some validation of included SBOM documents, if any.
Stage 5: Installers download and install the Python package archives. The SBOM files are placed into the .dist-info/sboms/ directory in the Python environment and referenced in package metadata.
Stage 6: SBOM generation tools scan the Python environment and, using existing Python package metadata and the new SBOM documents with per-package data, stitch together an Operational SBOM (OBOM) detailing the Python environment.
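To illustrate the first half of Stage 6, here is a minimal sketch of how a tool might discover the per-package SBOM documents in an environment. It assumes the .dist-info/sboms/ layout described in Stage 5, which is still a proposal and may change. The function name and file names are hypothetical, and merging the documents into an OBOM is not attempted.

```python
import tempfile
from pathlib import Path


def find_sbom_files(site_packages):
    """Map each installed distribution to the SBOM documents it ships."""
    found = {}
    for dist_info in sorted(Path(site_packages).glob("*.dist-info")):
        sbom_dir = dist_info / "sboms"
        if sbom_dir.is_dir():
            files = sorted(p.name for p in sbom_dir.iterdir() if p.is_file())
            if files:
                found[dist_info.name] = files
    return found


# Build a throwaway environment: Package B carries three SBOMs, Package A none.
with tempfile.TemporaryDirectory() as env:
    sboms = Path(env) / "package_b-1.0.dist-info" / "sboms"
    sboms.mkdir(parents=True)
    for name in ("source.spdx.json", "build.cdx.json", "auditwheel.cdx.json"):
        (sboms / name).write_text("{}")
    (Path(env) / "package_a-1.0.dist-info").mkdir()
    result = find_sbom_files(env)

print(result)
```

A real generator would then parse each document according to its own standard and combine the results with the existing package metadata.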
Who does what?
The plan is to allow each "actor" in the system adding SBOM data to a Python package to create their own SBOM document inside the Python package.
This means they can choose any SBOM standard (although we'll recommend sticking to a well-known one like CycloneDX or SPDX) and that intermediate tools won't need to "merge" SBOM data together. Avoiding this merging is extremely important, because cross-standard SBOM data merges are a very hard problem. This problem is deferred to SBOM generation tools which already need to support multiple SBOM standards.
- Pure-Python projects that don't vendor software are easy: there's nothing to do here.
- Python projects that vendor software can annotate that software using an SBOM and specify the SBOM in pyproject.toml. Keeping this up-to-date is a non-zero amount of work, but I am hoping that by providing this PEP it will enable these types of contributions. I'm also hoping to provide a lightweight pre-commit hook to help keep these SBOM documents up-to-date, similar to what CPython already uses.
- Python projects which use a build backend that pulls dependencies should be able to annotate what those dependencies are at build time. There will be exceptions; I'm looking into tools like Meson and multibuild to see what can be done.
- Python bundling tools like auditwheel, delocate, etc can annotate shared libraries and DLLs that are pulled into wheels.
My hope is that the most difficult part of this work (manually annotating a package if automatic tools can't) will enable a new type of contribution from users of Python packages to provide SBOM data. Previously there was no standardized method to have SBOM data propagate through Python packages, which discouraged this type of contribution.
If you're interested in having your use-case covered or you have concerns about the approach, please open a GitHub issue on the project tracker.
That's all for this post! 👋 If you're interested in more you can read the last report.
Have thoughts or questions? Let's chat over email or social:
sethmichaellarson@gmail.com
@sethmlarson@fosstodon.org
Thanks for reading! ♡ This work is licensed under CC BY-SA 4.0
ImageX: Unlocking Drupal Recipes: Instantly Boost Your Website's Features
Authored by Nadiia Nykolaichuk.
An exciting recipe is brewing in the Drupal kitchen. Picture a cookbook filled with delightful dishes, each requiring just one simple step. Similarly, Drupal users will soon enjoy the ability to add valuable functionalities to their websites with a single click, thanks to Recipes.
FSF Events: Free Software Directory meeting on IRC: Friday, November 22, starting at 12:00 EST (17:00 UTC)
Metadrop: Artisan Drupal SDC theme: What you need to know
Artisan is a Drupal base theme built on Bootstrap 5 and Sass. It offers easy theme configurations, theme presets (or variants), and extensive use of CSS variables.
Why Artisan?
The inspiration for Artisan comes from Radix, a well-known theme we used for a long time. However, once you master something that is not directly tailored to your needs, you may start to wish for changes—small ones at first, but larger ones over time. For example, we found ourselves overwriting too many base templates for our Drupal projects. We wanted the templates provided by the base theme to be extensible enough to avoid being discarded based on the needs of specific projects. In the end, we decided to create our own theme.
The main goal of the Artisan base theme is to provide a foundation that allows most of its components to be reused without requiring complete overwrites in the custom theme of a specific project. To achieve this, Artisan offers a functional design base that is easily extensible, as explained below.
Artisan also makes extensive use of CSS custom properties (commonly known as CSS variables) to fully leverage their benefits. By using these variables, you can easily reuse styles across your project, ensuring greater design consistency. Additionally, they simplify…
Django Weblog: 2024 Django Developers Survey
The DSF is once again partnering with JetBrains to run the 2024 Django Developers Survey 🌈
Please take a moment to fill it out! It should only take about 10 minutes to complete. It’s an important metric of Django usage, and is immensely helpful to guide future technical and community decisions.
The survey will be open until December 21st, 2024. After the survey is over, we will publish the aggregated results. JetBrains will also randomly choose 10 winners (from those who complete the survey in its entirety with meaningful answers), who will each receive a $100 Amazon Gift Card or a local equivalent.
How you can help
Take a moment to re-share the survey on socials, and with your respective communities. The more diverse the answers, the better the results for all of us.
Thank you for taking the time to contribute to this community effort, and thank you to JetBrains for their consistent support over the years!
LN Webworks: Drupal Theming: A Comprehensive Guide For Developers
Drupal's theming system is one of the most flexible and powerful tools available to web developers, especially when it comes to creating visually appealing and highly functional websites. As a content management system (CMS), Drupal provides extensive customization capabilities, making it a top choice for developers worldwide.
In this blog, we'll dive into Drupal’s theming system, its core components, and how LN Webworks, with its expert team, leverages Drupal development services to ensure that every Drupal-based website is not just functional but also visually engaging.
LN Webworks: How To Integrate Pipedrive With Webform: Step By Step Guide
Integrating Pipedrive, a powerful CRM tool, with a Drupal Webform can automate lead capturing, tracking, and data management. By using Webform, we can create a custom form and submit form data directly to Pipedrive, enabling a seamless flow of information from your website to your CRM.
In this post, we’ll walk through the process of creating a Drupal Webform and then show how to configure a submit handler to send form data to Pipedrive.
Prerequisites
Before we begin, ensure that you have the following:
- A Pipedrive account and API access (API key).
- A Drupal installation with the Webform module installed and enabled.
The Webform module allows you to create forms and manage submissions in Drupal. To install the Webform module, follow these steps: