Feeds
Golems GABB: Efficient Token Usage in Drupal: Practical Tips and Examples
If you want hassle-free and efficient content generation and management, Drupal is the right choice. With several modules and tokens, it will help you create a dynamic and versatile data environment to cater to your audience’s needs and search engine guidelines. Lamborghini, Doctors Without Borders, and Nokia illustrate how applicable and productive this system is.
To get started and take your content generation strategy further, explore the solution’s features in more detail. That knowledge will help you take your Drupal project from start to finish without difficulty. Stay tuned to find out more about the token implementation scenarios at your disposal. Mind the gap!
The Drop Times: Inspiring Inclusion: Celebrating the Women in Drupal | #2
ImageX: Boosting Drupal Website Management Workflows: New Administrative Toolbar Is Coming!
Authored by: Nadiia Nykolaichuk
The administrative navigation toolbar is an essential piece of the puzzle when it comes to streamlining website management tasks. It serves as a guiding compass for your team, leading them across the administrative sections quickly and confidently. To achieve this, the toolbar needs to be intuitive, visually clear, straightforward, logically organized, and well-positioned.
Real Python: Reading and Writing WAV Files in Python
There’s an abundance of third-party tools and libraries for manipulating and analyzing audio WAV files in Python. At the same time, the language ships with the little-known wave module in its standard library, offering a quick and straightforward way to read and write such files. Knowing Python’s wave module can help you dip your toes into digital audio processing.
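As a taste of how little code the standard library needs, here is a minimal sketch (not code from the tutorial) that writes a short mono sine tone as 16-bit PCM and reads it back using only wave and struct. The 8 kHz frame rate, the 440 Hz pitch, and the function names are arbitrary choices for illustration:

```python
import math
import struct
import wave


def write_tone(path, frequency=440.0, seconds=0.5, rate=8000):
    """Write a mono, 16-bit PCM sine tone at roughly half of full scale."""
    num_frames = int(seconds * rate)
    frames = b"".join(
        # "<h" = little-endian signed 16-bit integer, the usual WAV sample layout
        struct.pack("<h", int(16383 * math.sin(2 * math.pi * frequency * n / rate)))
        for n in range(num_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)     # mono
        wav.setsampwidth(2)     # 2 bytes per sample, i.e. 16-bit PCM
        wav.setframerate(rate)  # frames per second
        wav.writeframes(frames)


def read_tone(path):
    """Return (frame_rate, samples) for a mono, 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono, 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return rate, list(samples)
```

Calling write_tone("tone.wav") and then read_tone("tone.wav") should give back a frame rate of 8000 and 4,000 samples, since half a second at 8,000 frames per second is 4,000 frames.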
If topics like audio analysis, sound editing, or music synthesis get you excited, then you’re in for a treat, as you’re about to get a taste of them!
In this tutorial, you’ll learn how to:
- Read and write WAV files using pure Python
- Handle the 24-bit PCM encoding of audio samples
- Interpret and plot the underlying amplitude levels
- Record online audio streams like Internet radio stations
- Animate visualizations in the time and frequency domains
- Synthesize sounds and apply special effects
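Since the struct module has no 3-byte integer format, 24-bit samples have to be decoded by hand. A minimal sketch, assuming little-endian signed PCM (the usual byte order in WAV files); the function names are illustrative and not taken from the tutorial:

```python
def decode_pcm24(raw: bytes) -> list:
    """Decode little-endian, signed 24-bit PCM bytes into Python ints."""
    if len(raw) % 3:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes long")
    return [
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


def encode_pcm24(samples) -> bytes:
    """Encode ints in the range [-2**23, 2**23 - 1] as 24-bit PCM bytes."""
    return b"".join(s.to_bytes(3, byteorder="little", signed=True) for s in samples)
```

For example, decode_pcm24(b"\xff\xff\x7f\x00\x00\x80") returns [8388607, -8388608], the largest and smallest 24-bit values. With NumPy you would typically pad each sample to 4 bytes instead, but the int-based version keeps the byte layout easy to see.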
Although not required, you’ll get the most out of this tutorial if you’re familiar with NumPy and Matplotlib, which greatly simplify working with audio data. Additionally, knowing about numeric arrays in Python will help you better understand the underlying data representation in computer memory.
Click the link below to access the bonus materials, where you’ll find sample audio files for practice, as well as the complete source code of all the examples demonstrated in this tutorial:
Get Your Code: Click here to download the free sample code that shows you how to read and write WAV files in Python.
You can also take the quiz to test your knowledge and see how much you’ve learned:
Take the Quiz: Test your knowledge with our interactive “Reading and Writing WAV Files in Python” quiz. Upon completion you will receive a score so you can track your learning progress over time:
Understand the WAV File Format
In the early nineties, Microsoft and IBM jointly developed the Waveform Audio File Format, often abbreviated as WAVE or WAV, which stems from the file’s extension (.wav). Despite its age in computer terms, the format remains relevant today. There are several good reasons for its wide adoption, including:
- Simplicity: The WAV file format has a straightforward structure, making it relatively uncomplicated to decode in software and for humans to understand.
- Portability: Many software systems and hardware platforms support the WAV file format as standard, making it suitable for data exchange.
- High Fidelity: Because most WAV files contain raw, uncompressed audio data, they’re perfect for applications that require the highest possible sound quality, such as with music production or audio editing. On the flipside, WAV files take up significant storage space compared to lossy compression formats like MP3.
It’s worth noting that WAV files are specialized kinds of the Resource Interchange File Format (RIFF), which is a container format for audio and video streams. Other popular file formats based on RIFF include AVI and MIDI. RIFF itself is an extension of an even older IFF format originally developed by Electronic Arts to store video game resources.
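Because WAV is a RIFF container, its bytes follow a simple chunk layout: a 12-byte header (the ASCII tag RIFF, a little-endian 32-bit size, and the form type WAVE), followed by chunks that each start with a 4-byte ID and a 32-bit size, with odd-sized chunks padded to an even length. A minimal sketch that lists those chunks using only struct; the function name is illustrative:

```python
import struct


def list_riff_chunks(data: bytes):
    """Return (chunk_id, size) pairs for the chunks inside a RIFF/WAVE byte string."""
    riff_id, riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    offset = 12                          # skip the 12-byte RIFF header
    end = min(len(data), 8 + riff_size)  # riff_size counts everything after the first 8 bytes
    while offset + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        chunks.append((chunk_id.decode("ascii"), size))
        offset += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks
```

For a plain PCM file written by Python’s wave module, this should list a "fmt " chunk describing the format followed by a "data" chunk holding the samples.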
Before diving in, you’ll deconstruct the WAV file format itself to better understand its structure and how it represents sounds. Feel free to jump ahead if you just want to see how to use the wave module in Python.
The Waveform Part of WAV
What you perceive as sound is a disturbance of pressure traveling through a physical medium, such as air or water. At the most fundamental level, every sound is a wave that you can describe using three attributes:
- Amplitude is the measure of the sound wave’s strength, which you perceive as loudness.
- Frequency is the number of oscillations per second, or the reciprocal of the wave’s period, which corresponds to the pitch.
- Phase is the point in the wave cycle at which the wave starts, not registered by the human ear directly.
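The three attributes map directly onto the parameters of a pure tone, A·sin(2πft + φ), where A is the amplitude, f the frequency in hertz, and φ the phase in radians. A small sketch that samples such a wave at discrete times t = n / rate; the function name and the default sampling rate are illustrative assumptions:

```python
import math


def sine_samples(amplitude, frequency, phase, rate=8000, count=8):
    """Sample amplitude * sin(2*pi*frequency*t + phase) at t = n / rate."""
    return [
        amplitude * math.sin(2 * math.pi * frequency * n / rate + phase)
        for n in range(count)
    ]
```

Scaling amplitude makes the tone louder or quieter, doubling frequency raises the pitch by an octave, and changing phase only shifts where the cycle starts, which is why the ear does not register it on its own.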
The word waveform, which appears in the WAV file format’s name, refers to the graphical depiction of the audio signal’s shape. If you’ve ever opened a sound file using audio editing software, such as Audacity, then you’ve likely seen a visualization of the file’s content that looked something like this:
Waveform in Audacity
That’s your audio waveform, illustrating how the amplitude changes over time.
The vertical axis represents the amplitude at any given point in time. The midpoint of the graph, which is a horizontal line passing through the center, represents the baseline amplitude or the point of silence. Any deviation from this equilibrium corresponds to a higher positive or negative amplitude, which you experience as a louder sound.
As you move from left to right along the graph’s horizontal scale, which is the timeline, you’re essentially moving forward in time through your audio track.
Having such a view can help you visually inspect the characteristics of your audio file. The amplitude’s peaks and valleys reflect the volume changes. Therefore, you can leverage the waveform to identify parts where certain sounds occur or find quiet sections that may need editing.
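The same inspection can be done numerically: split the samples into short windows, measure each window’s loudness, and flag the quiet ones. A minimal sketch, assuming the samples are already decoded into integers (for example, from 16-bit PCM) and using root mean square (RMS) as the loudness measure; the window size, threshold, and function name are arbitrary illustrative choices:

```python
import math


def quiet_windows(samples, window=400, threshold=500.0):
    """Return (start, end) sample indices of windows whose RMS level is below threshold."""
    quiet = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # RMS: square root of the mean squared amplitude, a common loudness proxy
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < threshold:
            quiet.append((start, start + len(chunk)))
    return quiet
```

At an 8 kHz frame rate, a 400-sample window covers 50 milliseconds, so the returned index pairs can be divided by the frame rate to get times in seconds.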
Coming up next, you’ll learn how WAV files store these amplitude levels in digital form.
The Structure of a WAV File
Read the full article at https://realpython.com/python-wav-files/ »
mark.ie: Using the LocalGov Drupal Subsites Extras module
Create subsites with a different look and feel to the rest of your LocalGov Drupal website.
Haruna 1.0.2
Haruna version 1.0.2 is out.
There are not a lot of changes in this release, as the focus was on porting to Qt6 and KF6 and on code refactoring. Some hwdec options have been removed; if you need them, you can set them in the settings under "Custom commands" as set hwdec decoding_method_name and choose "Run at startup".
You can get it now on Flathub:
The Windows version can be found here. Availability of other package formats depends on your distro and the people who package Haruna.
If you like Haruna then support its development: GitHub Sponsors | Liberapay | PayPal
Feature requests and bugs should be posted on bugs.kde.org, but for bugs make sure to fill in the template and provide as much information as possible.
Changelog: 1.0.2
Features:
- Opening items from the playlist is faster
- If the Maximum recent files setting is set to zero, the recent files are removed from the config file
Bugfixes:
- Opening file through Open File action was not playing the file
- Opening playlist file from playlist header was not doing anything
- Hiding/showing Playlist toolbar setting was not working
- Track sub-menus in Audio and Subtitles global menus being empty
- Freeze when opening HamburgerMenu
LN Webworks: How Can Drupal Commerce Drive Your E-Commerce Revenue to New Heights?
Every e-commerce platform is different and comes with its own unique and specific needs. Some can use simple ready-made tools, while others need special software made just for them. New trends like smart personalization, easy shopping on phones, and caring for the environment are also important for online shops.
To help businesses achieve their personalized website goals, Drupal steps onto the scene. It's like a toolbox that lets you build your online store just the way you want. Drupal Commerce is great because it's flexible and lets you try out new ideas. But there’s more to it.
Today, we'll learn more about what makes Drupal Commerce special, like its features and how it's built.
Talking Drupal: Skills Upgrade #4
Welcome back to “Skills Upgrade”, a Talking Drupal mini-series following the journey of a D7 developer learning D10. This is episode 4.
Topics-
Review Chad's goals for the previous week
- Install Drush
- Setup git repo
- Examples module
Review Chad's questions
- .gitignore
- Core file naming
Tasks for the upcoming week
- Reminder of the capstone goal: create MR for new automated test in contrib module.
- Examples module: field_example. New RGB field type with formatter and widgets. Focus on stuff in field_example/src/Plugin/Field
- Background info on Plugins: https://www.drupal.org/docs/drupal-apis/plugin-api
- Focus on the following sections:
Chad's Drupal 10 Learning Curriculum & Journal
Chad's Drupal 10 Learning Notes
The Linux Foundation is offering a 30% discount on e-learning courses, certifications, and bundles with the code DRUPAL24 (all uppercase), valid until June 5th: https://training.linuxfoundation.org/certification-catalog/
Hosts
AmyJune Hineline - @volkswagenchick
Guests
Chad Hester - chadkhester.com @chadkhest
Mike Anello - DrupalEasy.com @ultimike
Python GUIs: Q&A: How Do I Display Images in PySide6? — Using QLabel to easily add images to your applications
Adding images to your application is a common requirement, whether you're building an image/photo viewer, or just want to add some decoration to your GUI. Unfortunately, because of how this is done in Qt, it can be a little bit tricky to work out at first.
In this short tutorial, we will look at how you can insert an external image into your PySide6 application layout, using both code and Qt Designer.
Which widget to use?
Since you want to insert an image, you might expect to use a widget named QImage or similar, but that would make a bit too much sense! QImage is actually Qt's image object type, which is used to store the actual image data for use within your application. The widget you use to display an image is QLabel.
The primary use of QLabel is of course to add labels to a UI, but it can also display an image (a pixmap) instead, covering the entire area of the widget. Below we'll look at how to use QLabel to display an image in your applications.
Using Qt Designer
First, create a MainWindow object in Qt Designer and add a "Label" to it. You can find Label under Display Widgets at the bottom of the left-hand panel. Drag this onto the QMainWindow to add it.
MainWindow with a single QLabel added
Next, with the Label selected, look in the right hand QLabel properties panel for the pixmap property (scroll down to the blue region). From the property editor dropdown select "Choose File…" and select an image file to insert.
As you can see, the image is inserted, but the image is kept at its original size, cropped to the boundaries of the QLabel box. You need to resize the QLabel to be able to see the entire image.
In the same controls panel, click to enable scaledContents.
When scaledContents is enabled, the image is resized to fit the bounding box of the QLabel widget. This shows the entire image at all times, although it does not respect the aspect ratio of the image if you resize the widget.
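If you need the whole image visible without distortion, the displayed size has to keep the image’s width-to-height ratio. Qt can do this for you with QPixmap.scaled() and Qt.KeepAspectRatio; the pure-Python helper below (a hypothetical name, not part of Qt) only sketches the arithmetic behind that mode, so you can see what size the image ends up at:

```python
def fit_keep_aspect(img_w, img_h, box_w, box_h):
    """Largest (width, height) with the image's aspect ratio that fits inside the box."""
    # The tighter of the two constraints decides the scale factor.
    scale = min(box_w / img_w, box_h / img_h)
    return round(img_w * scale), round(img_h * scale)
```

An 800×600 image in a 400×400 label ends up 400×300, leaving empty space above and below instead of stretching. Qt’s own rounding may differ by a pixel from this sketch.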
You can now save your UI to file (e.g. as mainwindow.ui).
To view the resulting UI, we can use the standard application template below. This loads the .ui file we've created (mainwindow.ui), creates the window, and starts up the application.
```python
import sys

from PySide6 import QtWidgets
from PySide6.QtUiTools import QUiLoader

loader = QUiLoader()
app = QtWidgets.QApplication(sys.argv)
window = loader.load("mainwindow.ui", None)
window.show()
app.exec()
```

Running the above code will create a window, with the image displayed in the middle.
QtDesigner application showing a Cat
Using Code
Instead of using Qt Designer, you might also want to show an image in your application through code. As before we use a QLabel widget and add a pixmap image to it. This is done using the QLabel method .setPixmap(). The full code is shown below.

```python
import sys

from PySide6.QtGui import QPixmap
from PySide6.QtWidgets import QMainWindow, QApplication, QLabel


class MainWindow(QMainWindow):
    def __init__(self):
        super(MainWindow, self).__init__()
        self.title = "Image Viewer"
        self.setWindowTitle(self.title)

        label = QLabel(self)
        pixmap = QPixmap('cat.jpg')
        label.setPixmap(pixmap)
        self.setCentralWidget(label)
        self.resize(pixmap.width(), pixmap.height())


app = QApplication(sys.argv)
w = MainWindow()
w.show()
sys.exit(app.exec())
```

The block of code below shows the process of creating the QLabel, creating a QPixmap object from our file cat.jpg (passed as a file path), setting this QPixmap onto the QLabel with .setPixmap() and then finally resizing the window to fit the image.

```python
label = QLabel(self)
pixmap = QPixmap('cat.jpg')
label.setPixmap(pixmap)
self.setCentralWidget(label)
self.resize(pixmap.width(), pixmap.height())
```

Launching this code will show a window with the cat photo displayed and the window sized to the size of the image.
QMainWindow with Cat image displayed
Just as in Qt Designer, you can call .setScaledContents(True) on your QLabel image to enable scaled mode, which resizes the image to fit the available space.

```python
label = QLabel(self)
pixmap = QPixmap('cat.jpg')
label.setPixmap(pixmap)
label.setScaledContents(True)
self.setCentralWidget(label)
self.resize(pixmap.width(), pixmap.height())
```

Notice that you set the scaled state on the QLabel widget and not the image pixmap itself.
Conclusion
In this quick tutorial we've covered how to insert images into your Qt UIs using QLabel both from Qt Designer and directly from PySide6 code.
KTextAddons 1.5.4
Free software wisdom
Every week veteran KDE contributor Kevin Ottens posts a bunch of thought-provoking links on his blog, and last week’s post contained one that I found particularly enlightening:
It’s a collection of wisdom written by someone named Lars Wirzenius, who started his software development career decades ago and has seen it all. While I don’t have 40 years of programming under my belt, I do have 16 years in programming, QA, release engineering, and management, and everything Lars wrote rings true to me. I’d encourage everyone to give it a read!
Here are my favorite takeaways:
- Take care of yourself, or else you’re no good to others.
- Useful software is too big to create alone, so your most important skill is the ability to collaborate.
- Write caveman code anyone can understand, unless complexity can be justified by measurably and consistently better performance.
- Do work in small chunks, and repeat.
- Diversity of perspective is important, or else you’ll end up accidentally making something that only works for a narrow slice of people.
- Know who the intended user is, and try to see things from their point of view.
- Developing software is political. Deal with it.
- Learn to write, and write stuff down.
But do check out the whole thing!
PyCoder’s Weekly: Issue #622 (March 26, 2024)
#622 – MARCH 26, 2024
View in Browser »
In this step-by-step tutorial, you’ll use Python’s turtle module to write a Space Invaders clone. You’ll learn about techniques used in animations and games, and consolidate your knowledge of key Python topics.
REAL PYTHON
When trying to remember just where sleep() was in the Python standard library, Ishaan stumbled through the built-in help and learned how to use it to answer just these kinds of questions.
ISHAAN ARORA
Master concise risk reporting for a stronger partnership with your CISO. Translate technical jargon into actionable insights for your CISO with Snyk’s guide to bridging visibility gaps and providing meaningful risk reports →
SNYK.IO sponsor
Ever wonder just how many special methods there are in Python? This post explains all of Python’s 100+ dunder methods and 50+ dunder attributes.
TREY HUNNER
In this video course, you’ll learn how to store and retrieve data using Python, SQLite, and SQLAlchemy as well as with flat files. Using SQLite with Python brings with it the additional benefit of accessing data with SQL. By adding SQLAlchemy, you can work with data in terms of objects and methods.
REAL PYTHON course
The more flexible the language, the more likely you’re going to have a variety of styles in the code. The larger the project, the harder it is to manage. This opinion piece explains why having someone dictate how code should look at the language level can be valuable.
ADAM GORDON BELL
Discover how to write elegant, efficient Python code with our FREE eBook “Pybites Python Tips”. From basics to advanced techniques, these 250 actionable insights will transform your coding approach. Perfect for Pythonistas aiming for mastery →
PYBITES sponsor
One of the most useful data structures in Python is the dictionary. In this video course, you’ll practice working with Python dictionaries, see how dictionaries differ from lists and tuples, and define and use dictionaries in your own code.
REAL PYTHON course
An opinion piece on the perils of utilizing notebooks in a production system. It highlights some of their inherent challenges and presents an alternative approach where notebooks can co-exist with a production system.
CHASE GRECO • Shared by Chase Greco
This tutorial conceptually explains the Model-View-Controller (MVC) pattern in Python web apps using Lego bricks. Finally understand this important architecture to streamline your web development process.
REAL PYTHON
Correctly parsing a URL can be tough; in fact, the built-in Python functions aren’t fully compliant with the RFC. This post explains why, and covers a library that gets it right.
TYLER KENNEDY
This post talks about using Python as a prototyping language for more complex projects in other languages. Rather than write pseudo-code, write actual code to test your ideas.
AMJITH
Sameer talks about his use of Go, Python, and Rust, and how their approaches affect your application’s safety, along with how that impacts coding for AI systems.
SAMEER AJMANI
An opinionated list of Django third-party packages that Will (author of Django for Beginners) uses to add features to his Django web projects.
WILL VINCENT
Numba can make your numeric code faster, but only if you use it right. Learn what “right” means and what to avoid.
ITAMAR TURNER-TRAURING
This tutorial explains various methods for checking and correcting English language grammatical errors using Python.
DEEPANSHU BHALLA
Progress on WASI and CPython continues. Brett gives a summary of changes since last year’s post.
BRETT CANNON
GITHUB.COM/POMPONCHIK • Shared by Evgeniy Blinov
hancho: A Simple, Pleasant Build System in Python
flect: Python Framework for Full-Stack Web Applications
Events
Weekly Real Python Office Hours Q&A (Virtual)
March 27, 2024
REALPYTHON.COM
March 28, 2024
MEETUP.COM
March 29 to April 2, 2024
PYCAMP.ES
March 29, 2024
MEETUP.COM
March 30, 2024
PYTHON.ORG.BR
April 2 to April 7, 2024
PYCON.LT
April 5 to April 9, 2024
PYCASCADES.COM
Happy Pythoning!
This was PyCoder’s Weekly Issue #622.
PyPy: Fixing a Bug in PyPy's Incremental GC
Since last summer, I've been looking on and off into a weird and hard-to-reproduce crash bug in PyPy. It was manifesting only on CI, and it seemed to always happen in the AST rewriting phase of pytest, the symptom being that PyPy would crash with a segfault. All my attempts to reproduce it locally failed, and my attempts to understand the problem by dumping the involved ASTs led nowhere.
A few weeks ago, we got two more bug reports, the last one by the authors of the nanobind binding generator, with the same symptoms: crash in AST rewriting, only on CI. I decided to make a more serious push to try to find the bug this time. Ultimately the problem turned out to be several bugs in PyPy's garbage collector (GC) that had been there since its inception in 2013. Understanding the situation turned out to be quite involved, additionally complicated by this being the first time that I was working on this particular aspect of PyPy's GC. Since the bug was so much work to find, I thought I'd write a blog post about it.
The blog post consists of three parts: first, a chronological description of what I did to find the bug; second, a technical explanation of what goes wrong; and third, some reflections on the bug (plus a bonus bug I also found in the process).
Finding the Bug
I started from the failing nanobind CI runs that ended with a segfault of the PyPy interpreter. This was only an intermittent problem; not every run failed. When I tried to just run the test suite locally, I couldn't get it to fail. Therefore, at first I tried to learn more about what was happening by looking at the CI runners.
Running on CI
I forked the nanobind repo and hacked the CI script in order to get it to use a PyPy build with full debug information and more assertions turned on. To increase the probability of seeing the crash, I added an otherwise unused matrix variable to the CI script that just contained 32 parameters. This means every build is done 32 times (sorry GitHub for wasting your CPUs 😕). With that amount of repetition, at least one job in every build crashed.
Then I added the -Xfaulthandler option to the PyPy command, which uses the faulthandler module to try to print a Python stack trace if the VM segfaults. This confirmed that PyPy was indeed crashing in the AST rewriting phase of pytest, which pytest uses for nicer assertions. I experimented with hacking our faulthandler implementation to also give me a C-level call stack, but that didn't work as well as I hoped.
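For readers who haven't used it: passing -X faulthandler has the same effect as calling faulthandler.enable() at startup, which installs handlers that dump the Python traceback when the process receives a fatal signal such as SIGSEGV. The sketch below (standard library only; the temporary file and function name are illustrative choices, not the author's setup) uses faulthandler.dump_traceback() to show what such a dump looks like without actually crashing:

```python
import faulthandler
import tempfile

# `python -X faulthandler script.py` is equivalent to calling this at startup;
# it installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL:
# faulthandler.enable()


def current_traceback_dump() -> str:
    """Return the text faulthandler would print for the current thread."""
    with tempfile.TemporaryFile("w+") as log:
        # faulthandler writes straight to the file descriptor, bypassing Python's
        # I/O buffers, which is what lets it work while the interpreter is crashing.
        faulthandler.dump_traceback(file=log, all_threads=False)
        log.seek(0)
        return log.read()
```

The returned text starts with a "Stack (most recent call first):" header followed by one File/line entry per active Python frame, which is the kind of output that pointed at pytest's AST rewriting.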
Then I tried to run gdb on CI to try to get it to print a C call stack at the crash point. You can get gdb to execute commands as if typed at the prompt with the -ex command-line option; I used something like this:

```shell
gdb -ex "set confirm off" -ex "set pagination off" -ex \
    "set debuginfod enabled off" -ex run -ex where -ex quit \
    --args <command> <arguments>
```

But unfortunately the crash never occurred when running in gdb.
Afterwards I tried the next best thing, which was configuring the CI runner to dump a core file and upload it as a build artifact, which worked. Looking at the cores locally only sort of worked, because I am running a different version of Ubuntu than the CI runners. So I used tmate to be able to log into the CI runner after a crash and interactively used gdb there. Unfortunately, what I learned from that was that the bug was some kind of memory corruption, which is always incredibly unpleasant to debug. Basically, the header word of a Python object had been corrupted somehow at the point of the crash, which means that its vtable wasn't usable anymore.
(Sidenote: PyPy doesn't really use a vtable pointer; instead, it uses half a word in the header for the vtable, and the other half for flags that the GC needs to keep track of the state of the object. Corrupting all this is still bad.)
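To make the sidenote concrete, here is a toy model of a 64-bit header word that packs a type identifier (standing in for the vtable) into one half and GC flags into the other. Which half holds what, and the flag names and values, are assumptions for illustration only, not PyPy's actual encoding:

```python
HALF_BITS = 32
HALF_MASK = (1 << HALF_BITS) - 1

# Hypothetical flag bits for illustration (not PyPy's real flag names or values).
FLAG_VISITED = 1 << 0
FLAG_HAS_CARDS = 1 << 1


def make_header(type_id, flags):
    """Pack a type id into the low half-word and GC flags into the high half-word."""
    return ((flags & HALF_MASK) << HALF_BITS) | (type_id & HALF_MASK)


def type_id_of(header):
    return header & HALF_MASK


def flags_of(header):
    return (header >> HALF_BITS) & HALF_MASK


header = make_header(type_id=42, flags=FLAG_VISITED)
print((type_id_of(header), flags_of(header)))  # (42, 1)

# A stray write of some pointer-sized value wrecks both halves at once.
corrupted = 0x00007F3A12345678
print(hex(type_id_of(corrupted)), hex(flags_of(corrupted)))  # 0x12345678 0x7f3a
```

Because type information and GC state share one word, a single bad write leaves the object with a nonsense type and nonsense GC flags at the same time.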
Reproducing Locally
At that point it was clear that I had to push to reproduce the problem on my laptop, to allow me to work on the problem more directly and not to always have to go via the CI runner. Memory corruption bugs often have a lot of randomness (depending on which part of memory gets modified, things might crash or more likely just happily keep running). Therefore I decided to try to brute-force reproducing the crash by simply running the tests many many times. Since the crash happened in the AST rewriting phase of pytest, and that happens only if no pyc files of the bytecode-compiled rewritten ASTs exist, I made sure to delete them before every test run.
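The brute-force loop can also be approximated in plain Python. This is only a sketch of the idea, not the author's setup (he used multitime, described below); the function names, the command, and the choice to delete every .pyc file under the project root are assumptions:

```python
import pathlib
import subprocess


def delete_pyc_files(root) -> int:
    """Remove cached bytecode so pytest has to rewrite (and recompile) test modules again."""
    removed = 0
    for pyc in pathlib.Path(root).rglob("*.pyc"):
        pyc.unlink()
        removed += 1
    return removed


def run_until_failure(command, root, max_runs=1000):
    """Run command repeatedly; return (run_number, returncode) of the first failure, else None."""
    for run in range(1, max_runs + 1):
        delete_pyc_files(root)
        result = subprocess.run(command, cwd=root)
        # On POSIX, a process killed by a signal reports -signum, e.g. -11 for SIGSEGV.
        if result.returncode != 0:
            return run, result.returncode
    return None
```

A call might look like run_until_failure(["pypy3", "-m", "pytest", "tests"], root="."), where the interpreter name and test path are placeholders for your own setup.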
To repeat the test runs I used multitime, which is a simple program that runs a command repeatedly. It's meant for lightweight benchmarking purposes, but it also halts the execution of the command if that command exits with an error (and it sleeps a small random time between runs, which might help with randomizing the situation, maybe). Here's a demo:
(Max pointed out autoclave to me when reviewing this post, which is a more dedicated tool for this job.)
Thankfully, running the tests repeatedly eventually led to a crash, solving my "only happens on CI" problem. I then tried various variants to exclude possible sources of errors. The first source of errors to exclude in PyPy bugs is the just-in-time compiler, so I reran the tests with --jit off to see whether I could still get it to crash, and thankfully I eventually could (JIT bugs are often very annoying).
The next source of bugs to exclude was C extensions. Since these were the tests of nanobind, a framework for creating C-extension modules, I was a bit worried that the bug might be in our emulation of CPython's C API. But running PyPy with the -v option (which prints all imports as they happen) confirmed that at the point of the crash no C extension had been imported yet.
Using rr
I still couldn't get the bug to happen in GDB, so the tool I tried next was rr, the "reverse debugger". rr can record the execution of a program and later replay it arbitrarily often. This gives you a time-traveling debugger that allows you to execute the program backwards in addition to forwards. Eventually I managed to get the crash to happen when running the tests with rr record --chaos (--chaos randomizes some decisions that rr takes, to try to increase the chance of reproducing bugs).
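A similar check can be made from inside a running process: compiled extension modules are loaded from files whose suffixes (such as .so or .pyd) are listed in importlib.machinery.EXTENSION_SUFFIXES. A sketch, assuming every loaded extension module has a __file__ attribute; modules built into the interpreter have none and are simply skipped:

```python
import importlib.machinery
import sys


def loaded_extension_modules():
    """Names of imported modules whose file is a compiled extension (.so/.pyd)."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    names = []
    # Copy the items: importing can mutate sys.modules while we iterate.
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if isinstance(path, str) and path.endswith(suffixes):
            names.append(name)
    return sorted(names)
```

Unlike -v, which logs every import as it happens, this gives a snapshot, so it is only useful if you can call it close to the point of interest.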
Using rr well is quite hard, and I'm not very good at it. The main approach I use with rr to debug memory corruption is to replay the crash, then set a watchpoint for the corrupted memory location, then use the command reverse-continue to find the place in the code that mutated the memory location. reverse-continue is like continue, except that it will execute the program backwards from the current point. Here's a little demo of this:
Doing this for my bug revealed that the object that was being corrupted was erroneously collected by the garbage collector. For some reason the GC had wrongly decided that the object was no longer reachable and therefore put the object into a freelist by writing a pointer to the next entry in the freelist into the first word of the object, overwriting the object's header. The next time the object was used things crashed.
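A toy model shows why this failure mode destroys the header specifically: a freelist allocator that, when a block is freed, stores the link to the next free block in the block's first word, exactly where a live object keeps its header. The word-list heap, block size, and class name are illustrative assumptions, not PyPy's allocator:

```python
class ToyFreelistAllocator:
    """Fixed-size blocks in a flat list of words; freed blocks form a linked freelist."""

    def __init__(self, num_blocks, block_words=4):
        self.block_words = block_words
        self.memory = [0] * (num_blocks * block_words)
        self.free_head = None  # address of the first free block, or None
        for addr in reversed(range(0, len(self.memory), block_words)):
            self.free(addr)

    def free(self, addr):
        # The link to the next free block is written into the block's FIRST word,
        # overwriting whatever header a (supposedly dead) object had there.
        self.memory[addr] = self.free_head
        self.free_head = addr

    def allocate(self, header):
        addr = self.free_head
        if addr is None:
            raise MemoryError("toy heap exhausted")
        self.free_head = self.memory[addr]  # unlink the block from the freelist
        self.memory[addr] = header          # word 0 holds the object header
        return addr
```

With a four-block heap, allocating an object with header 0xABCD gives address 0; wrongly freeing it then leaves memory[0] holding 4 (the address of the next free block) instead of 0xABCD, and any later use of the object reads that pointer as if it were a header.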
Side-quest: wrong GC assertions
At this point in the process, I got massively side-tracked. PyPy's GC has a number of debug modes that you can optionally turn on. Those slow down the program execution a lot, but they should in theory help to understand why the GC goes wrong. When I turned them on, I was getting a failing assertion really early in the test execution, complaining about an invariant violation in the GC logic. At first this made me very happy. I thought that this would help me fix the bug more quickly.
Extremely frustratingly, after two days of work I concluded that the assertion logic itself was wrong. I have fixed that in the meantime too, the details of that are in the bonus section at the end of the post.
Using GDB scripting to find the real bug
After that disaster I went back to the earlier rr recording without GC assertions and tried to understand in more detail why the GC decided to free an object that was still being referenced. To be able to do that I used the GDB Python scripting API to write some helper commands to understand the state of the GC heap (rr is an extension of GDB, so the GDB scripting API works in rr too).
The first (small) helper command I wrote with the GDB scripting API was a way to pretty-print the currently active GC flags of a random PyPy object, starting just from the pointer. The more complex command I wrote was an object tracer, which follows pointers to GC objects starting from a root object to explore the object graph. The object tracer isn't complete; it doesn't deal with all the complexities of PyPy's GC. But it was good enough to help me with my problem: I found out that the corrupted object was stored in an array.
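Stripped of the GDB specifics, an object tracer like this is a graph search from a root over outgoing pointers. A sketch on a plain dictionary heap that maps each address to the addresses it points to; the heap representation and function name are hypothetical, and a real tracer has to decode PyPy's object layouts to find those pointers. Returning the path rather than a yes/no answer is what reveals which container (here, an array) holds the object:

```python
from collections import deque


def find_path(heap, root, target):
    """Breadth-first search from root; return the list of addresses leading to target, or None."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        addr = queue.popleft()
        if addr == target:
            path = []
            while addr is not None:  # walk the parent links back to the root
                path.append(addr)
                addr = parents[addr]
            return path[::-1]
        for ref in heap.get(addr, ()):
            if ref not in parents:  # visit each object once, even with cycles
                parents[ref] = addr
                queue.append(ref)
    return None
```

For heap = {"root": ["frame"], "frame": ["array", "dict"], "array": ["obj"], "dict": ["frame"]}, find_path(heap, "root", "obj") returns ['root', 'frame', 'array', 'obj'].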
As an example, here's a function that uses the GDB API to walk one of the helper data structures of the GC, a stack of pointers:
```python
import gdb  # only available inside GDB's (or rr's) embedded Python interpreter


def walk_addr_stack(obj):
    """ walk an instance of the AddressStack class (which is a linked
    list of arrays of 1019 pointers).

    the first of the arrays is only partially filled with used_in_last_chunk
    items, all the other chunks are full."""
    if obj.type.code == gdb.TYPE_CODE_PTR:
        obj = obj.dereference()
    # lookup() is a helper defined in the author's gist (not shown here)
    used_in_last_chunk = lookup(obj, "used_in_last_chunk")
    chunk = lookup(obj, "inst_chunk").dereference()
    while 1:
        items = lookup(chunk, "items")
        for i in range(used_in_last_chunk):
            yield items[i]
        chunk = lookup(chunk, "next")
        if not chunk:
            break
        chunk = chunk.dereference()
        used_in_last_chunk = 1019
```

The full file of supporting code I wrote can be found in this gist. This is pretty rough throw-away code, however.
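For readers without a GDB session at hand, the layout this generator walks can be modeled in plain Python. The classes below are a hypothetical toy, not RPython's AddressStack: the chunk size is 4 instead of 1019, and the field names only mirror those used above. As the docstring describes, the newest chunk is the only partially filled one, and it links to older, full chunks:

```python
CHUNK_SIZE = 4  # the real AddressStack uses 1019


class Chunk:
    def __init__(self, next_chunk):
        self.items = [None] * CHUNK_SIZE
        self.next = next_chunk


class ToyAddressStack:
    def __init__(self):
        self.chunk = Chunk(None)
        self.used_in_last_chunk = 0

    def append(self, addr):
        if self.used_in_last_chunk == CHUNK_SIZE:
            # current chunk is full: start a new one that links to the old chunk
            self.chunk = Chunk(self.chunk)
            self.used_in_last_chunk = 0
        self.chunk.items[self.used_in_last_chunk] = addr
        self.used_in_last_chunk += 1


def walk(stack):
    """Same traversal as walk_addr_stack: the newest chunk partially, older chunks fully."""
    chunk, used = stack.chunk, stack.used_in_last_chunk
    while chunk is not None:
        for i in range(used):
            yield chunk.items[i]
        chunk, used = chunk.next, CHUNK_SIZE
```

After appending 0 through 9, list(walk(stack)) yields [8, 9, 4, 5, 6, 7, 0, 1, 2, 3]: the two items in the newest chunk first, then each older full chunk in turn.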
In the following recording I show a staged debugging session with some of the extra commands I wrote with the Python API. The details aren't important; I just wanted to give a bit of a flavor of what inspecting objects looks like:
The next step was to understand why the array content wasn't being correctly traced by the GC, which I eventually managed with some conditional breakpoints, more watchpoints, and using reverse-continue. It turned out to be a bug that occurs when the content of one array is memcopied into another array. The technical details of why the array wasn't traced correctly are described in the next section.
Writing a unit test
To try to make sure I really understood the bug correctly I then wrote a GC unit test that shows the problem. Like most of PyPy, our GC is written in RPython, a (somewhat strange) subset/dialect of Python2, which can be compiled to C code. However, since it is also valid Python2 code, it can be unit-tested on top of a Python2 implementation (which is one of the reasons why we keep maintaining PyPy2).
In the GC unit tests you have a lot of control over the order in which things happen, e.g. how objects are allocated, when garbage collection phases happen, etc. After some trying I managed to write a test that crashes with the same kind of memory corruption that my original crash exhibited: an object that is still reachable via an array is collected by the GC. To give you a flavor of what this kind of test looks like, here's a version of the test I eventually managed to write (edited for clarity):
```python
def test_incrementality_bug_arraycopy(self):
    source = self.malloc(VAR, 8) # first array
    # the stackroots list emulates the C stack
    self.stackroots.append(source)
    target = self.malloc(VAR, 8) # second array
    self.stackroots.append(target)
    node = self.malloc(S) # unrelated object, will be collected
    node.x = 5
    # store reference into source array, calling the write barrier
    self.writearray(source, 0, node)
    val = self.gc.collect_step()
    source = self.stackroots[0] # reload arrays, they might have moved
    target = self.stackroots[1]
    # this GC step traces target
    val = self.gc.collect_step()
    # emulate what a memcopy of arrays does
    res = self.gc.writebarrier_before_copy(source, target, 0, 0, 2)
    assert res
    target[0] = source[0] # copy two elements of the arrays
    target[1] = source[1]
    # now overwrite the reference to node in source
    self.writearray(source, 0, lltype.nullptr(S))
    # this GC step traces source
    self.gc.collect_step()
    # some more collection steps, crucially target isn't traced again
    # but node is deleted
    for i in range(3):
        self.gc.collect_step()
    # used to crash, node got collected
    assert target[0].x == 5
```

One of the good properties of testing our GC that way is that all the memory is emulated. The crash in the last line of the test isn't a segfault at all; instead, you get a nice exception saying that you tried to access a freed chunk of memory, and you can then debug this with a Python 2 debugger.
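The general hazard the test exercises can be illustrated with a generic textbook model: incremental tri-color marking with a Dijkstra-style insertion write barrier. This is not PyPy's incminimark, it does not claim to reproduce the exact bug or its fix, and all class and method names are made up. It only shows why copying references into an already-traced (black) array without a barrier can let a still-reachable object be freed:

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


class Obj:
    def __init__(self, name, size=0):
        self.name = name
        self.refs = [None] * size
        self.color = WHITE


class ToyIncrementalGC:
    """Tri-color incremental marking with a Dijkstra-style insertion write barrier."""

    def __init__(self):
        self.heap, self.roots, self.gray = [], [], []

    def alloc(self, name, size=0):
        obj = Obj(name, size)
        self.heap.append(obj)
        return obj

    def _shade(self, obj):
        if obj is not None and obj.color == WHITE:
            obj.color = GRAY
            self.gray.append(obj)

    def start_marking(self):
        for root in self.roots:
            self._shade(root)

    def mark_step(self):
        """Trace one gray object; return False once marking is finished."""
        if not self.gray:
            return False
        obj = self.gray.pop()
        for ref in obj.refs:
            self._shade(ref)
        obj.color = BLACK
        return True

    def write(self, obj, index, value):
        # Invariant: a black (already traced) object must never point to a white one.
        if obj.color == BLACK:
            self._shade(value)
        obj.refs[index] = value

    def array_copy(self, source, target, length, barrier=True):
        for i in range(length):
            if barrier:
                self.write(target, i, source.refs[i])
            else:
                target.refs[i] = source.refs[i]  # raw memcopy: barrier skipped

    def sweep(self):
        freed = [obj for obj in self.heap if obj.color == WHITE]
        self.heap = [obj for obj in self.heap if obj.color != WHITE]
        return freed


def node_gets_freed(barrier):
    collector = ToyIncrementalGC()
    source, target = collector.alloc("source", 8), collector.alloc("target", 8)
    node = collector.alloc("node")
    collector.roots += [source, target]
    collector.write(source, 0, node)     # marking hasn't started: no shading
    collector.start_marking()
    collector.mark_step()                # gray list is LIFO, so target turns black first
    collector.array_copy(source, target, 2, barrier)
    collector.write(source, 0, None)     # source is still gray: no barrier action
    while collector.mark_step():
        pass
    return node in collector.sweep()     # True means a reachable object was freed
```

node_gets_freed(barrier=False) returns True: node ends up referenced only by the black target, which is never traced again. With barrier=True, the copy shades node gray and it survives.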
Fixing the Bug
With the unit test in hand, fixing the bug was relatively straightforward (the diff in its simplest form is only a single-line change anyway). After this first version of my fix, I talked to Armin Rigo, who helped me find a different case in the same area of the code that was still wrong.
I also got help from the developers at PortaOne, who use PyPy on their servers and had recently seen some mysterious PyPy crashes that looked related to the GC. They deployed various stages of my fixes to their servers to see whether stability improved for them. Unfortunately, in the end it turned out that their crashes come from an unrelated GC bug involving object pinning, which we haven't resolved yet.
Writing a GC fuzzer/property-based test
Finding bugs in the GC is always extremely disconcerting, particularly since this one managed to hide for so long (more than ten years!). Therefore I wanted to use these bugs as motivation to try to find more problems in PyPy's GC. Given the ridiculous effectiveness of fuzzing, I used hypothesis to write a property-based test. Every test performs a sequence of randomly chosen steps from the following list:
- allocate an object
- read a random field from a random object
- write a random reference into a random object
- drop a random stack reference
- perform one GC step
- allocate an array
- read a random index from a random array
- write to an array
- memcopy between two arrays
This approach of doing a sequence of steps is pretty close to the stateful testing approach of hypothesis, but I just implemented it manually with the data strategy.
Every one of those steps is always performed on both the tested GC, and on some regular Python objects. The Python objects provide the "ground truth" of what the heap should look like, so we can compare the state of the GC objects with the state of the Python objects to find out whether the GC made a mistake.
In order to check whether the test is actually useful, I reverted my bug fixes and made sure that the test re-finds both the spurious GC assertion error and the problems with memcopying an array.
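The step-based approach can also be sketched without hypothesis. In the sketch below, a seeded `random.Random` picks each step. Every step is applied both to the system under test and to plain Python lists that act as the ground truth, and the two are compared. This is a minimal sketch under stated assumptions. `FlatHeap`, `BuggyHeap` and `run_random_steps` are hypothetical names: the "heap" is just a flat list of cells with a memcopy, not PyPy's GC. The post's real test used hypothesis's data strategy rather than the `random` module. `BuggyHeap` mimics the "revert the fix and check the test re-finds the bug" sanity check.

```python
import random


class FlatHeap:
    """Hypothetical system under test: arrays stored in one flat list of cells."""

    def __init__(self):
        self.cells = []
        self.arrays = []  # handle -> (offset, length)

    def allocate(self, length):
        self.arrays.append((len(self.cells), length))
        self.cells.extend([0] * length)
        return len(self.arrays) - 1

    def read(self, handle, index):
        offset, length = self.arrays[handle]
        assert 0 <= index < length
        return self.cells[offset + index]

    def write(self, handle, index, value):
        offset, length = self.arrays[handle]
        assert 0 <= index < length
        self.cells[offset + index] = value

    def memcopy(self, src, dst, src_start, dst_start, count):
        s_off = self.arrays[src][0] + src_start
        d_off = self.arrays[dst][0] + dst_start
        # the right-hand slice is copied first, so overlapping ranges act like memmove
        self.cells[d_off:d_off + count] = self.cells[s_off:s_off + count]


class BuggyHeap(FlatHeap):
    """Deliberately broken variant, to check that the random test can catch bugs."""

    def memcopy(self, src, dst, src_start, dst_start, count):
        super().memcopy(src, dst, 0, dst_start, count)  # bug: ignores src_start


def run_random_steps(heap, seed, n_steps=200):
    """Apply random steps to `heap` and to a list-of-lists model; fail on divergence."""
    rng = random.Random(seed)
    model = []  # ground truth: one plain Python list per array
    for _ in range(n_steps):
        step = rng.choice(["allocate", "read", "write", "memcopy"])
        if step == "allocate" or not model:
            length = rng.randint(1, 8)
            assert heap.allocate(length) == len(model)
            model.append([0] * length)
        elif step == "read":
            h = rng.randrange(len(model))
            i = rng.randrange(len(model[h]))
            assert heap.read(h, i) == model[h][i], (seed, h, i)
        elif step == "write":
            h = rng.randrange(len(model))
            i = rng.randrange(len(model[h]))
            value = rng.randint(1, 100)
            heap.write(h, i, value)
            model[h][i] = value
        else:  # memcopy between two (possibly identical) arrays
            src, dst = rng.randrange(len(model)), rng.randrange(len(model))
            count = rng.randint(0, min(len(model[src]), len(model[dst])))
            src_start = rng.randint(0, len(model[src]) - count)
            dst_start = rng.randint(0, len(model[dst]) - count)
            heap.memcopy(src, dst, src_start, dst_start, count)
            model[dst][dst_start:dst_start + count] = model[src][src_start:src_start + count]
    # final check: the whole emulated heap must match the model
    for h, expected in enumerate(model):
        assert [heap.read(h, i) for i in range(len(expected))] == expected, seed
```

Running `run_random_steps(FlatHeap(), seed)` for many seeds should pass. Running it with `BuggyHeap()` should raise `AssertionError` for most seeds, which is the same kind of evidence the post describes for the real test re-finding the reverted bugs.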
In addition, the test also found corner cases in my fix. There was a situation that I hadn't accounted for, which the test eventually found. I also plan on adding a bunch of other GC features as steps in the test to stress them too (for example weakrefs, identity hashes, pinning, maybe finalization).
At the time of publishing this post, the fixes have been merged to the 2.7/3.9/3.10 branches of PyPy and will be part of the next release (v7.3.16).
The technical details of the bug
In order to understand the technical details of the bug, I need to give some background explanations about PyPy's GC.
PyPy's incremental GC
PyPy uses an incremental generational mark-sweep GC. It's generational and therefore has minor collections (where only young objects get collected) and major collections (collecting long-lived objects eventually, using a mark-and-sweep algorithm). Young objects are allocated in a nursery using a bump-pointer allocator, which makes allocation quite efficient. They are moved out of the nursery by minor collections. In order to find references from old to young objects the GC uses a write barrier to detect writes into old objects.
The GC is also incremental, which means that its major collections aren't done all at once (which would lead to long pauses). Instead, major collections are sliced up into small steps, which are done directly after a minor collection. (The GC isn't concurrent, though; a concurrent GC would do its work in a separate thread.)
The incremental GC uses tri-color marking to reason about the reachable part of the heap during the marking phase, where every old object can be:
- black: already marked, reachable, definitely survives the collection
- gray: will survive, but still needs to be marked
- white: potentially dead
The color of every object is encoded by setting flags in the object header.
The GC maintains the invariant that black objects must never point to white objects. At the start of a major collection cycle, the stack roots are turned gray. During the mark phase of a major collection cycle, the GC will trace gray objects until none are left. To trace a gray object, every object it references has to be marked gray if it is still white. After a gray object is traced, it can be marked black (because all the referenced objects are now either black or gray). Eventually, there are no gray objects left. At that point (because no white object can be reached from a black one) all the white objects are known to be unreachable and can therefore be freed.
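The marking loop just described fits in a few lines of Python. This is a non-incremental toy that assumes a heap modeled as a dict from object names to the names they reference. `tricolor_mark` is a hypothetical helper, not PyPy's RPython code, and it ignores generations entirely.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


def tricolor_mark(heap, roots):
    """Mark phase on a hypothetical toy heap (dict: name -> list of referenced names).

    Returns the set of white objects, i.e. those that are unreachable and may be freed.
    """
    color = {name: WHITE for name in heap}
    gray_stack = []

    def shade(name):
        # white -> gray; gray and black objects are left alone
        if color[name] == WHITE:
            color[name] = GRAY
            gray_stack.append(name)

    for root in roots:
        shade(root)
    while gray_stack:
        name = gray_stack.pop()
        for ref in heap[name]:
            shade(ref)
        color[name] = BLACK  # everything it references is now gray or black

    # the invariant: no black object points to a white one
    for name, refs in heap.items():
        if color[name] == BLACK:
            assert all(color[ref] != WHITE for ref in refs)
    return {name for name, c in color.items() if c == WHITE}


heap = {"a": ["b"], "b": ["a", "c"], "c": [], "d": ["a"], "e": ["e"]}
print(sorted(tricolor_mark(heap, roots=["a"])))  # ['d', 'e']
```

Note that `d` points *to* a reachable object but is itself unreachable, and `e` is a self-referencing cycle. Neither is reachable from the root, so both stay white.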
The GC is incremental because every collection step will only trace a limited number of gray objects before giving control back to the program. This leads to a problem: between two marking steps, the program can mutate an already traced (black) object and write a new reference into one of its fields. If the newly referenced object is white, this violates the invariant. Therefore, the GC uses the write barrier (which it needs anyway to find references from old to young objects) to mark all modified black objects gray, and then trace them again at one of the later collection steps.
The special write barrier of memcopy
Arrays use a different kind of write barrier than normal objects. Since they can be arbitrarily large, tracing them can take a long time. Therefore it's potentially wasteful to trace them fully at a minor collection. To fix this, the array write barrier keeps more granular information about which parts of the array have been modified since the last collection step. Then only the modified parts of the array need to be traced, not the whole array.
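This kind of granular bookkeeping can be illustrated with a technique commonly called card marking. The array is split into fixed-size "cards", the write barrier only remembers which card was touched, and the next collection step scans just those cards. The class name, card size and API below are hypothetical choices for this sketch. The post does not describe how PyPy actually encodes this information.

```python
class CardMarkedArray:
    """Hypothetical toy array whose write barrier records dirty cards, not the whole array."""

    def __init__(self, length, card_size=128):
        self.items = [None] * length
        self.card_size = card_size
        self.dirty_cards = set()

    def __getitem__(self, index):
        return self.items[index]

    def __setitem__(self, index, value):
        # write barrier: remember only the card that contains `index`
        self.dirty_cards.add(index // self.card_size)
        self.items[index] = value

    def trace_modified(self):
        """Return the references stored in dirty cards and how many slots were scanned."""
        refs, scanned = [], 0
        for card in sorted(self.dirty_cards):
            start = card * self.card_size
            chunk = self.items[start:start + self.card_size]
            scanned += len(chunk)
            refs.extend(v for v in chunk if v is not None)
        self.dirty_cards.clear()  # these cards are clean again until the next write
        return refs, scanned


arr = CardMarkedArray(100_000)
arr[5] = "x"
arr[99_999] = "y"
print(arr.trace_modified())  # (['x', 'y'], 160)
```

Only 160 of the 100,000 slots are scanned: one full card plus the short last card. The trade-off is the extra set update on every write, which is cheap compared to rescanning a large array.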
In addition, there is another optimization for arrays, which is that memcopy is treated specially by the GC. If memcopy is implemented by simply writing a loop that copies the content of one array to the other, that will invoke the write barrier every single loop iteration for the write of every array element, costing a lot of overhead. Here's some pseudo-code:
def arraycopy(source, dest, source_start, dest_start, length):
    for i in range(length):
        value = source[source_start + i]
        dest[dest_start + i] = value  # <- write barrier inserted here
Therefore the GC has a special memcopy-specific write barrier that will perform the GC logic once before the memcopy loop, and then use a regular (typically SIMD-optimized) memcopy implementation from libc. Roughly like this:
def arraycopy(source, dest, source_start, dest_start, length):
    gc_writebarrier_before_array_copy(source, dest, source_start, dest_start, length)
    # raw memcopy works in bytes, so element offsets are scaled by the item size
    itemsize = sizeof(itemtype(source))
    raw_memcopy(cast_to_voidp(source) + source_start * itemsize,
                cast_to_voidp(dest) + dest_start * itemsize,
                itemsize * length)
(This is really a rough sketch. The real code is much more complicated.)
The bug
The bugs turned out to be precisely in this memcopy write barrier. When we implemented the current GC, we adapted our previous GC, which was a generational mark-sweep GC but not incremental. We started with most of the previous GC's code, including the write barriers. The regular write barriers were adapted to the new incremental assumptions, in particular the need for the write barrier to also turn black objects back to gray when they are modified during a marking phase. This was simply not done at all for the memcopy write barrier, at least not in two of its code paths. Fixing this problem fixes the unit tests and stops the crashes.
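The failure mode can be reproduced on a toy model that mirrors the shape of the unit test above. First `target` is traced and turns black. Then it receives a reference to `node` via memcopy, and finally the only other path to `node` is overwritten. This is a hypothetical sketch: `ToyIncrementalGC` and its flag are invented for illustration. It only models the missing re-graying of the destination; the real RPython barrier also deals with old-to-young references and per-card information.

```python
class ToyIncrementalGC:
    """Hypothetical toy incremental marker over arrays (dict: name -> list of names/None)."""

    def __init__(self, heap, roots, fixed_memcopy_barrier=True):
        self.heap = heap
        self.color = {name: "white" for name in heap}
        self.gray = []
        self.fixed = fixed_memcopy_barrier
        for root in roots:
            self._shade(root)

    def _shade(self, name):
        if name is not None and self.color[name] == "white":
            self.color[name] = "gray"
            self.gray.append(name)

    def _regray_if_black(self, name):
        if self.color[name] == "black":
            self.color[name] = "gray"
            self.gray.append(name)

    def collect_step(self):
        """Trace one gray object, then give control back to the program."""
        if self.gray:
            name = self.gray.pop()
            for ref in self.heap[name]:
                self._shade(ref)
            self.color[name] = "black"

    def writearray(self, array, index, value):
        self._regray_if_black(array)  # regular incremental write barrier
        self.heap[array][index] = value

    def arraycopy(self, source, dest, source_start, dest_start, length):
        # memcopy barrier runs once; the buggy variant forgets to re-gray `dest`
        if self.fixed:
            self._regray_if_black(dest)
        chunk = self.heap[source][source_start:source_start + length]
        self.heap[dest][dest_start:dest_start + length] = chunk  # the "raw memcopy"

    def finish_marking(self):
        while self.gray:
            self.collect_step()
        return {name for name, c in self.color.items() if c == "white"}


def replay_test_scenario(fixed):
    heap = {"source": ["node", None], "target": [None, None], "node": []}
    gc = ToyIncrementalGC(heap, roots=["source", "target"], fixed_memcopy_barrier=fixed)
    gc.collect_step()                          # traces target: it is now black
    gc.arraycopy("source", "target", 0, 0, 2)  # target now points to node
    gc.writearray("source", 0, None)           # source is gray, so the barrier does nothing
    freed = gc.finish_marking()
    return freed, heap["target"][0]


print(replay_test_scenario(fixed=True))   # (set(), 'node')
print(replay_test_scenario(fixed=False))  # ({'node'}, 'node'): target points to a freed object
```

In the buggy variant, the black `target` ends up pointing at a white `node`. This is exactly the invariant violation that lets the sweep free an object that is still reachable.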
Reflections
The way the bug was introduced is really typical. A piece of code (the memcopy write barrier) was written under a set of assumptions. Then those assumptions changed later. Not all the code pieces that relied on these assumptions to be correct were updated. It's pretty hard to prevent this in all situations.
I still think we could have done more to prevent the bug from occurring. Writing a property-based test for the GC would have been a good idea given the complexity of the GC, and it is definitely something we did in other parts of our code at the time (mostly just using the random module; we started using hypothesis later).
It's a bit of a mystery to me why this bug managed to go undetected for so long. Memcopy happens in a lot of pretty core operations of e.g. lists in Python (list.extend, to name just one example). To speculate, I suspect that all the other preconditions for the bug occurring made it pretty rare:
- the content of an old list that is not yet marked needs to be copied into another old list that is marked already
- the source of the copy needs to also store an object that has no other references
- the source of the copy then needs to be overwritten with other data
- then the next collection steps need to be happening at the right points
- ...
Given the complexity of the GC logic I also wonder whether some lightweight formal methods would have been a good idea. Formalizing some of the core invariants in B or TLA+ and then model checking them up to some number of objects would have found this problem pretty quickly. There are also correctness proofs for GC algorithms in some research papers, but I don't have a good overview of the literature to point to any that are particularly good or bad. Taking such a formal route might have caught this and probably a whole bunch of other bugs, but of course it's a pretty expensive (and tedious) approach.
While it was super annoying to track this down, it was definitely good to learn a bit more about how to use rr and the GDB scripting interface.
Bonus Section: The Wrong Assertion
Some more technical information about the wrong assertion is in this section.
Background: pre-built objects
PyPy's VM-building bootstrapping process can "freeze" a bunch of heap objects into the final binary. This allows the VM to start up quickly, because those frozen objects are loaded by the OS as part of the binary.
Those frozen pre-built objects are part of the 'roots' of the garbage collector and need to be traced. However, tracing all the pre-built objects at every collection would be very expensive, because there are a lot of them (about 150,000 in a PyPy 3.10 binary). Tracing them all is also not necessary, because most of them are never modified. Unmodified pre-built objects can only reference other pre-built objects, which can never be deallocated anyway. Therefore we have an optimization that uses the write barrier (which we need anyway to find old-to-young pointers) to notice when a pre-built object gets modified for the very first time. If that happens, it gets added to the set of pre-built objects that count as roots, and is traced as a root at collections from then on.
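The optimization can be sketched as a write barrier that appends a pre-built object to a root list the first time it is written to. This is a minimal sketch: `Obj`, `ToyGC` and the `in_prebuilt_roots` flag are hypothetical stand-ins for PyPy's real object header flags, which the post does not name.

```python
class Obj:
    """Hypothetical toy heap object; `prebuilt` marks objects frozen into the binary."""

    def __init__(self, refs=(), prebuilt=False):
        self.refs = list(refs)
        self.prebuilt = prebuilt
        self.in_prebuilt_roots = False  # hypothetical header flag


class ToyGC:
    def __init__(self):
        self.prebuilt_roots = []  # only pre-built objects that were ever modified

    def write_barrier(self, obj):
        # first modification of a pre-built object: treat it as a root from now on
        if obj.prebuilt and not obj.in_prebuilt_roots:
            obj.in_prebuilt_roots = True
            self.prebuilt_roots.append(obj)

    def store(self, obj, index, value):
        self.write_barrier(obj)
        obj.refs[index] = value

    def roots(self, stack_roots):
        """Roots for the next collection: the stack plus modified pre-built objects."""
        return list(stack_roots) + self.prebuilt_roots


frozen = [Obj([None], prebuilt=True) for _ in range(1000)]
gc = ToyGC()
young = Obj()
gc.store(frozen[7], 0, young)
gc.store(frozen[7], 0, None)  # second write: already a root, not added again
print(len(gc.roots(stack_roots=[young])))  # 2
```

The other 999 frozen objects are never traced, which is the whole point. Unmodified, they can only point to other pre-built objects that are never freed.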
The wrong assertion
The assertion that triggered when I turned on the GC debug mode was saying that the GC found a reference from a black to a white object, violating its invariant. Unmodified pre-built objects count as black, and they aren't roots, because they can only ever reference other pre-built objects. However, when a pre-built object gets modified for the first time, it becomes part of the root set and will be marked gray. This logic works fine.
The wrong assertion triggers if a pre-built object is mutated for the very first time in the middle of an incremental marking phase. While the pre-built object gets added to the root set just fine, and will get traced before the marking phase ends, this is encoded slightly differently for pre-built objects, compared to "regular" old objects. Therefore, the invariant checking code wrongly reported a black->white pointer in this situation.
To fix it I also wrote a unit test checking the problem, made sure that the GC hypothesis test also found the bug, and then fixed the wrong assertion to take the color encoding of pre-built objects into account.
The bug managed to stay invisible because we don't turn on the GC assertions very often. We only do that when we find a GC bug, which is of course also when we most need them to be correct.
Acknowledgements
Thanks to Matti Picus, Max Bernstein, Wouter van Heyst for giving me feedback on drafts of the post. Thanks to Armin Rigo for reviewing the code and pointing out holes in my thinking. Thanks to the original reporters of the various forms of the bug, including Lily Foote, David Hewitt, Wenzel Jakob.
Emmanuel Kasper: Adding a private / custom Certificate Authority to the firefox trust store
Today at $WORK I needed to add the private company Certificate Authority (CA) to Firefox, and I found the steps were unnecessarily complex. Time to blog about that. I also turned this post into a Debian wiki article, so that future generations can update the information when Firefox 742 is released on Debian 17.
The CAcert certificate authority is included in neither Debian nor Firefox, and is thus a good example for adding a private CA. Note that this does not mean I specifically endorse that CA.
- Test that SSL connections to a site signed by the private CA are failing
- Download the private CA
- Test that a connection works with the private CA
- Add the private CA to the Debian trust store located in /etc/ssl/certs/ca-certificates.crt
- Verify that we can connect without passing the private CA on the command line
- At that point most applications are able to connect to systems with a certificate signed by the private CA (curl, GNOME built-in browser …). However, Firefox uses its own trust store and will still display a security error when connecting to https://wiki.cacert.org. To make Firefox trust the Debian trust store, we need to add a so-called security device, in fact an extra library wrapping the Debian trust store. The library will wrap the Debian trust store in the PKCS#11 industry format that Firefox supports.
- Install the PKCS#11 wrapping library and command line tools
- Verify that the private CA is accessible via PKCS#11
- Now we need to add a new security device in Firefox pointing to the PKCS#11 trust store. The PKCS#11 trust store is located in /usr/lib/x86_64-linux-gnu/pkcs11/p11-kit-trust.so
- In Firefox (tested in version 115 ESR), go to Settings -> Privacy & Security -> Security -> Security Devices.
  Then click “Load”. In the popup window, use “My local trust” as the module name and /usr/lib/x86_64-linux-gnu/pkcs11/p11-kit-trust.so as the module filename. After adding the module, you should see it in the list of Security Devices, with /etc/ssl/certs/ca-certificates.crt as its description.
- Now restart Firefox, and you should be able to browse https://wiki.cacert.org without security errors
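The "test that the connection fails/works" steps can also be scripted from Python's standard `ssl` module. This is a sketch, and `make_context` and `tls_handshake_ok` are hypothetical helper names. It assumes Python's OpenSSL build reads the system trust store, which on Debian typically points at /etc/ssl/certs. Firefox does not consult that store, which is why the p11-kit security device above is still needed.

```python
import socket
import ssl


def make_context(cafile=None):
    """Verifying TLS client context: system trust store, plus an optional extra CA (PEM)."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    if cafile is not None:
        ctx.load_verify_locations(cafile=cafile)
    return ctx


def tls_handshake_ok(host, port=443, cafile=None, timeout=10.0):
    """Return True if the server certificate verifies, False on a verification error.

    Other failures (DNS errors, refused connections, timeouts) are re-raised,
    so they are not mistaken for a trust problem.
    """
    ctx = make_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

For example, `tls_handshake_ok("wiki.cacert.org")` would be expected to return False before the CA is added to the Debian trust store and True afterwards. Passing `cafile=` with the path of the downloaded CA certificate corresponds to the "test with the private CA" step.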
Real Python: Finding Python Easter Eggs
In this Code Conversation, you’ll follow a chat between Philipp and Bartosz as they go on an Easter egg hunt. Along the way, you’ll:
- Learn about Easter egg hunt traditions
- Uncover the first Easter egg in software
- Explore Easter eggs in Python
There won’t be many code examples in this Code Conversation, so you can lean back and join Philipp and Bartosz on their Easter egg hunt.
Software Engineering Training in the Age of Generative AI
This is a piece I also wrote for the enioka blog, so there is a French version available.
At enioka Haute Couture we started offering trainings a little while ago. True to our DNA, they focus on software engineering practice rather than on a given tool, framework or API. This is why we have courses on topics like software architecture, refactoring, dealing with legacy code, test driven development (TDD), code review and so on.
Also, not every team has the same needs. Some prefer a few intensive days during a given week, others prefer smaller sessions over an extended period. That’s why our courses are designed to be flexible. They can be tailored to build a multi-day session, or all the way down to many one-hour knowledge-building sessions.
Hopefully, this all sounds great to you (it certainly sounds great to me). So why talk about this now? Is it some kind of advertising stunt? Well… not really.
You see, while working on our training offers, something happened. Almost three years ago, GitHub announced GitHub Copilot. It was just a technical preview at the time. Since then, there has been an arms race in the large language model (LLM) domain. Like it or not, generative AI is here to stay, and code assistants based on such models are used more and more.
I’m not one of those doomsayers claiming such models and assistants are going to take over our jobs. Likewise, I don’t think they’re going to double the daily productivity of developers. Still, they will necessarily impact how we work and the code which is produced. So, keeping an eye on development practices, I’m less concerned about disappearing developer jobs and more concerned about a drop in the quality of the code produced.
Indeed, early studies indicate that code assistants, when introduced in an unchecked manner, tend to push code quality down and to increase the number of security issues introduced. Interestingly, the main factors highlighted are behavioral. This means that rather than waiting for a magical new assistant which would code perfectly (spoiler: it won’t happen), we should improve the way we introduce and use those tools.
Which brings me back to the enioka Haute Couture trainings. In this new era, we have to acknowledge coding assistants during our trainings. This permeates all the topics I mentioned previously. There is now a nagging question for all our software development practices: when is a coding assistant the right tool for the job?
If you’re practicing TDD or trying to improve your use of it, is it a good idea to have the coding assistant write the tests for you? Maybe not… since that is where you make important design decisions, you likely want to stay at the helm. It might come in handy for generating the code which must pass the tests, though.
If you’re dealing with a legacy code base which needs to be modernized, for which parts of the process will the coding assistant make you faster? Updating the code to a newer version of the language or dependencies? Extracting clearer modules and functions? Writing approval tests to secure all of that?
There are many more such questions… and you can explore the answers with us during one of our training sessions. We’ll keep talking TDD, legacy code… with a twist!
And of course, just like with any other tool, what we’re proposing is not specific to a given solution. You use GitHub Copilot? Codeium? A specific in-house fine-tuned model? This is fine. We’ll take this into account during the training to adapt it as much as possible to the involved developers and their context.
If you want to discuss this further, feel free to get in touch with us.
Robin Wilson: My geospatial PDF talk at FOSS4G 2021
This is only about 3 years late – but I gave a talk at FOSS4G 2021 on geospatial PDFs. The full title was:
From static PDFs to interactive, geospatial PDFs, or, ‘I never knew PDFs could do that!’
In the talk I cover what a geospatial PDF is, how to export as a geospatial PDF from QGIS, how to import that PDF again to extract the geospatial data from it, how to create geospatial PDFs using GDAL (including styling vector data) – and then take things to the nth degree by showing a fully interactive geospatial PDF, providing a UI within the PDF file. Some people attending the talk described it as "the best talk of the conference"!
Specbee: 7 Most Popular Marketing Automation Drupal Modules - A Marketer's Guide
Python Bytes: #376 Every dunder method in a Python Lockbox
KDE Plasma 6.0.3, Bugfix Release for March
Tuesday, 26 March 2024. Today KDE releases a bugfix update to KDE Plasma 6, versioned 6.0.3.
This release adds two weeks' worth of new translations and fixes from KDE's contributors. The bugfixes are typically small but important and include:
- System Monitor: Colorgrid: Use the same background color as pie/bar charts. Commit. Fixes bug #482664
- Plasma SDK: Fix desktop file name for icon explorer. Commit.
- Startplasma: Use the sound theme setting for startup sound. Commit.