Łukasz Langa: Weekly Report, June 6 - 12

Planet Python - Mon, 2022-06-20 13:35

The week in numbers: 7 closed issues, 1 opened. 47 closed PRs, 4 more reviewed.

Categories: FLOSS Project Planets

John Goerzen: Pipe Issue Likely a Kernel Bug

Planet Debian - Mon, 2022-06-20 12:31

Saturday, I wrote in "Pipes, deadlocks, and strace annoyingly fixing them" about an issue where a certain pipeline seems to have a deadlock. I described tracing it into kernel code. Indeed, it appears to be kernel bug 212295, which has had a patch for over a year that has never been merged.

After continuing to dig into the issue, I eventually reported it as a bug in ZFS. One of the ZFS people connected this to an older issue my searching hadn’t uncovered.

rincebrain summarized:

I believe, if I understand the bug correctly, it only triggers if you F_SETPIPE_SZ when the writer has put nonzero but not a full unit’s worth in yet, which is why the world isn’t on fire screaming about this – you need to either have a very slow but nonzero or otherwise very strange write pattern to hit it, which is why it doesn’t come up in, say, the CI or most of my testbeds, but my poor little SPARC (440 MHz, 1c1t) and Raspberry Pis were not so fortunate.

You might recall in Saturday’s post that I explained that Filespooler reads a few bytes from the gpg/zstdcat pipeline before spawning and connecting it to zfs receive. I think this is the critical piece of the puzzle; it makes it much more likely to encounter the kernel bug. zfs receive calls F_SETPIPE_SZ when it starts. Let’s look at how this could be triggered:

In the pre-Filespooler days, the gpg|zstdcat|zfs pipeline was all being set up at once. There would be no data sent to zfs receive until gpg had initialized and begun to decrypt the data, and then zstdcat had begun to decompress it. Those things almost certainly took longer than zfs receive’s initialization, meaning that usually F_SETPIPE_SZ would have been invoked before any data entered the pipe.

After switching to Filespooler, the particular situation here has Filespooler reading somewhere around 100 bytes from the gpg|zstdcat part of the pipeline before ever invoking zfs receive. zstdcat generally emits more than 100 bytes at a time. Therefore, when Filespooler invokes zfs receive and hooks the pipeline up to it, it has a very high chance of there already being data in the pipeline when zfs receive uses F_SETPIPE_SZ. This means that the chances of encountering the conditions that trigger the particular kernel bug are also elevated.
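To make that ordering concrete, here is a minimal Python sketch of the sequence described above: data is already queued in a pipe when the reading side resizes it. It is not Filespooler's or ZFS's code, it assumes Linux, and it does not reproduce the deadlock; it only shows the order of operations. The helper names, the 100-byte payload and the 128 KiB target size are assumptions made for illustration, and the numeric fallbacks are the Linux values of the two fcntl commands.

```python
import fcntl
import os

# Linux-specific fcntl commands. Python 3.10+ exposes them by name; the
# numeric fallbacks are the values from <linux/fcntl.h>.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def pipe_capacity(fd):
    """Return the pipe's capacity in bytes, or None if it cannot be queried."""
    try:
        return fcntl.fcntl(fd, F_GETPIPE_SZ)
    except OSError:
        return None


def resize_pipe_with_pending_data(pending=b"x" * 100, new_size=128 * 1024):
    """Queue a little data in a pipe, then resize the pipe (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    try:
        # Step 1: the writer has put a nonzero, far-from-full amount in the pipe.
        os.write(write_fd, pending)
        before = pipe_capacity(read_fd)
        # Step 2: the reader resizes the pipe while that data is still queued.
        try:
            fcntl.fcntl(read_fd, F_SETPIPE_SZ, new_size)
        except OSError:
            pass  # the kernel may refuse (limits, non-Linux); ignored in this sketch
        after = pipe_capacity(read_fd)
        # Step 3: the reader consumes the data that was queued before the resize.
        data = os.read(read_fd, len(pending))
        return before, after, data
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

In the pre-Filespooler setup, step 2 would normally have happened before step 1.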

ZFS is integrating a patch to no longer use F_SETPIPE_SZ in zfs receive. I have applied that on my local end to see what happens, and hopefully in a day or two will know for sure if it resolves things.

In the meantime, I hope you enjoyed this little exploration. It resulted in a new bug report to Rust as well as digging up an existing kernel bug. And, interestingly, no bugs in filespooler. Sometimes the thing that changed isn’t the source of the bug!

Categories: FLOSS Project Planets

Mike Herchel's Blog: Pitfalls (and fixes) when lazy-loading images in Drupal

Planet Drupal - Mon, 2022-06-20 12:00
Pitfalls (and fixes) when lazy-loading images in Drupal mherchel Mon, 06/20/2022 - 12:00
Categories: FLOSS Project Planets

OSSummit North America is going to be weird – and I can’t wait

Open Source Initiative - Mon, 2022-06-20 11:35

I’ve stuffed my OSSummit North America 2022 schedule with interesting talks, leaving lots of time for the hallway track. If you’ll be there, let’s catch up.

The post OSSummit North America is going to be weird – and I can’t wait first appeared on Voices of Open Source.

Categories: FLOSS Research

Andy Wingo: blocks and pages and large objects

GNU Planet! - Mon, 2022-06-20 10:59

Good day! In a recent dispatch we talked about the fundamental garbage collection algorithms, also introducing the Immix mark-region collector. Immix mostly leaves objects in place but can move objects if it thinks it would be profitable. But when would it decide that this is a good idea? Are there cases in which it is necessary?

I promised to answer those questions in a followup article, but I didn't say which followup :) Before I get there, I want to talk about paged spaces.

enter the multispace

We mentioned that Immix divides the heap into blocks (32kB or so), and that no object can span multiple blocks. "Large" objects -- defined by Immix to be more than 8kB -- go to a separate "large object space", or "lospace" for short.

Though the implementation of a large object space is relatively simple, I found that it has some points that are quite subtle. Probably the most important of these points relates to heap size. Consider that if you just had one space, implemented using mark-compact maybe, then the procedure to allocate a 16 kB object would go:

  1. Try to bump the allocation pointer by 16kB. Is it still within range? If so we are done.

  2. Otherwise, collect garbage and try again. If after GC there isn't enough space, the allocation fails.

In step (2), collecting garbage could decide to grow or shrink the heap. However when evaluating collector algorithms, you generally want to avoid dynamically-sized heaps.
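As a rough illustration of the two steps above, here is a small Python sketch of a fixed-size, single-space bump allocator. The class and the `collect` callback are hypothetical stand-ins: a real mark-compact collector would trace and slide live objects, whereas here `collect` simply reports where the allocation pointer ends up after collection.

```python
class OutOfMemory(Exception):
    """Raised when an allocation fails even after collecting."""


class BumpSpace:
    def __init__(self, size, collect):
        self.size = size          # fixed heap size in bytes (no growth)
        self.alloc_ptr = 0        # next free byte
        self.collect = collect    # hypothetical: returns alloc_ptr after GC

    def try_bump(self, nbytes):
        # Step 1: bump the pointer and check it is still within range.
        if self.alloc_ptr + nbytes <= self.size:
            addr = self.alloc_ptr
            self.alloc_ptr += nbytes
            return addr
        return None

    def allocate(self, nbytes):
        addr = self.try_bump(nbytes)
        if addr is None:
            # Step 2: collect garbage and try once more.
            self.alloc_ptr = self.collect(self)
            addr = self.try_bump(nbytes)
        if addr is None:
            raise OutOfMemory(nbytes)
        return addr
```

Because `size` never changes, this models the fixed heap that benchmarking calls for; a production collector might instead resize in `collect`.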


Here is where I need to make an embarrassing admission. In my role as co-maintainer of the Guile programming language implementation, I have long noodled around with benchmarks, comparing Guile to Chez, Chicken, and other implementations. It's good fun. However, I only realized recently that I had a magic knob that I could turn to win more benchmarks: simply make the heap bigger. Make it start bigger, make it grow faster, whatever it takes. For a program that does its work in some fixed amount of total allocation, a bigger heap will require fewer collections, and therefore generally take less time. (Some amount of collection may be good for performance as it improves locality, but this is a marginal factor.)

Of course I didn't really go wild with this knob but it now makes me doubt all benchmarks I have ever seen: are we really using benchmarks to select for fast implementations, or are we in fact selecting for implementations with cheeky heap size heuristics? Consider even any of the common allocation-heavy JavaScript benchmarks, DeltaBlue or Earley or the like; to win these benchmarks, web browsers are incentivised to have large heaps. In the real world, though, a more parsimonious policy might be more appreciated by users.

Java people have known this for quite some time, and are therefore used to fixing the heap size while running benchmarks. For example, people will measure the minimum amount of memory that can allow a benchmark to run, and then configure the heap to be a constant multiplier of this minimum size. The MMTK garbage collector toolkit can't even grow the heap at all currently: it's an important feature for production garbage collectors, but as they are just now migrating out of the research phase, heap growth (and shrinking) hasn't yet been a priority.


So now consider a garbage collector that has two spaces: an Immix space for allocations of 8kB and below, and a large object space for, well, larger objects. How do you divide the available memory between the two spaces? Could the balance between immix and lospace change at run-time? If you never had large objects, would you be wasting space at all? Conversely is there a strategy that can also work for only large objects?

Perhaps the answer is obvious to you, but it wasn't to me. After much reading of the MMTK source code and pondering, here is what I understand the state of the art to be.

  1. Arrange for your main space -- Immix, mark-sweep, whatever -- to be block-structured, and able to dynamically decommission or recommission blocks, perhaps via MADV_DONTNEED. This works if the blocks are even multiples of the underlying OS page size.

  2. Keep a counter of however many bytes the lospace currently has.

  3. When you go to allocate a large object, increment the lospace byte counter, and then round that up to the number of blocks that must be decommissioned from the main paged space. If this is more than are currently decommissioned, find some empty blocks and decommission them.

  4. If no empty blocks were found, collect, and try again. If the second try doesn't work, then the allocation fails.

  5. Now that the paged space has shrunk, lospace can allocate. You can use the system malloc, but probably better to use mmap, so that if these objects are collected, you can just MADV_DONTNEED them and keep them around for later re-use.

  6. After GC runs, explicitly return the memory for any object in lospace that wasn't visited when the object graph was traversed. Decrement the lospace byte counter and possibly return some empty blocks to the paged space.
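The bookkeeping in these six steps can be sketched in Python. This is only a model of the accounting, not MMTK's or Guile's code: it counts blocks instead of calling mmap or madvise, the class and method names are made up for illustration, `collect` is a hypothetical callback that may lower `in_use_blocks`, and decommissioning is all-or-nothing for simplicity.

```python
import math

BLOCK_SIZE = 32 * 1024  # assumed block size, matching the "32kB or so" above


class AllocationFailed(Exception):
    """Raised when lospace cannot get enough blocks even after collecting."""


class Heap:
    def __init__(self, total_blocks, in_use_blocks=0, collect=None):
        self.total_blocks = total_blocks      # fixed overall heap size, in blocks
        self.in_use_blocks = in_use_blocks    # paged-space blocks holding live data
        self.decommissioned = 0               # blocks given up on behalf of lospace
        self.lospace_bytes = 0                # step 2: the lospace byte counter
        self.collect = collect or (lambda heap: None)

    @property
    def empty_blocks(self):
        return self.total_blocks - self.in_use_blocks - self.decommissioned

    def _blocks_needed(self):
        return math.ceil(self.lospace_bytes / BLOCK_SIZE)

    def _try_decommission(self):
        shortfall = self._blocks_needed() - self.decommissioned
        if shortfall > self.empty_blocks:
            return False
        self.decommissioned += max(shortfall, 0)
        return True

    def allocate_large(self, nbytes):
        # Step 3: bump the counter, then shrink the paged space to match.
        self.lospace_bytes += nbytes
        if not self._try_decommission():
            # Step 4: collect and retry once before giving up.
            self.collect(self)
            if not self._try_decommission():
                self.lospace_bytes -= nbytes
                raise AllocationFailed(nbytes)
        # Step 5 (obtaining the memory itself, e.g. via mmap) is not modelled.

    def free_large(self, nbytes):
        # Step 6: drop the counter and hand surplus blocks back to the paged space.
        self.lospace_bytes -= nbytes
        self.decommissioned = min(self.decommissioned, self._blocks_needed())
```

Note that the model never asks whether the decommissioned blocks are adjacent; only their count matters.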

There are some interesting aspects about this strategy. One is, the memory that you return to the OS doesn't need to be contiguous. When allocating a 50 MB object, you don't have to find 50 MB of contiguous free space, because any set of blocks that adds up to 50 MB will do.

Another aspect is that this adaptive strategy can work for any ratio of large to non-large objects. The user doesn't have to manually set the sizes of the various spaces.

This strategy does assume that address space is larger than heap size, but only by a factor of 2 (modulo fragmentation for the large object space). Therefore our risk of running afoul of user resource limits and kernel overcommit heuristics is low.

The one underspecified part of this algorithm is... did you see it? "Find some empty blocks". If the main paged space does lazy sweeping -- only scanning a block for holes right before the block will be used for allocation -- then after a collection we don't actually know very much about the heap, and notably, we don't know what blocks are empty. (We could know it, of course, but it would take time; you could traverse the line mark arrays for all blocks while the world is stopped, but this increases pause time. The original Immix collector does this, however.) In the system I've been working on, instead I have it so that if a mutator finds an empty block, it puts it on a separate list, and then takes another block, only allocating into empty blocks once all blocks are swept. If the lospace needs blocks, it sweeps eagerly until it finds enough empty blocks, throwing away any nonempty blocks. This causes the next collection to happen sooner, but that's not a terrible thing; this only occurs when rebalancing lospace versus paged-space size, because if you have a constant allocation rate on the lospace side, you will also have a complementary rate of production of empty blocks by GC, as they are recommissioned when lospace objects are reclaimed.

What if your main paged space has ample space for allocating a large object, but there are no empty blocks, because live objects are equally peppered around all blocks? In that case, often the application would be best served by growing the heap, but maybe not. In any case in a strict-heap-size environment, we need a solution.

But for that... let's pick up another day. Until then, happy hacking!

Categories: FLOSS Project Planets

Peoples BLOG: Usage of PHPCS on Github via Pull Request for Drupal Applications

Planet Drupal - Mon, 2022-06-20 10:30
In this article, we are going to see how some tools & libraries will make people's lives easier during the development & code review process. And to make developer life easier, developers look for tools or libraries which can automate code review and, if needed, make any corrections in the code automatically. Here comes the PHP codesniffer and Drupal coder module. If you are maintaini
Categories: FLOSS Project Planets

Real Python: How Can You Emulate Do-While Loops in Python?

Planet Python - Mon, 2022-06-20 10:00

If you came to Python from a language like C, C++, Java, or JavaScript, then you may be missing their do-while loop construct. A do-while loop is a common control flow statement that executes its code block at least once, regardless of whether the loop condition is true or false. This behavior relies on the fact that the loop condition is evaluated at the end of each iteration. So, the first iteration always runs.

One of the most common use cases for this type of loop is accepting and processing the user’s input. Consider the following example written in C:

    #include <stdio.h>

    int main() {
        int number;
        do {
            printf("Enter a positive number: ");
            scanf("%d", &number);
            printf("%d\n", number);
        } while (number > 0);
        return 0;
    }

This small program runs a do … while loop that asks the user to enter a positive number. The input is then stored in number and printed to the screen. The loop keeps running these operations until the user enters a non-positive number.

If you compile and run this program, then you’ll get the following behavior:

    Enter a positive number: 1
    1
    Enter a positive number: 4
    4
    Enter a positive number: -1
    -1

The loop condition, number > 0, is evaluated at the end of the loop, which guarantees that the loop’s body will run at least once. This characteristic distinguishes do-while loops from regular while loops, which evaluate the loop condition at the beginning. In a while loop, there’s no guarantee of running the loop’s body. If the loop condition is false from the start, then the body won’t run at all.

Note: In this tutorial, you’ll refer to the condition that controls a while or do-while loop as the loop condition. This concept shouldn’t be confused with the loop’s body, which is the code block that’s sandwiched between curly brackets in languages like C or indented in Python.

One reason for having a do-while loop construct is efficiency. For example, if the loop condition implies costly operations and the loop must run n times (n ≥ 1), then the condition will run n times in a do-while loop. In contrast, a regular while loop will run the costly condition n + 1 times.
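The n versus n + 1 claim can be checked with a small counting experiment. This is a sketch, not code from the tutorial: the function names are made up, and because Python has no do-while, the second function emulates one with an infinite while loop and a break at the end of the body. Both functions assume n ≥ 1.

```python
def checks_with_while(n):
    """Run a loop body n times with a regular while loop; count condition checks."""
    checks = 0
    remaining = n

    def condition():
        nonlocal checks
        checks += 1          # stands in for the "costly" condition
        return remaining > 0

    while condition():
        remaining -= 1       # loop body
    return checks


def checks_with_do_while(n):
    """Same loop, but with the condition tested at the end of each iteration."""
    checks = 0
    remaining = n

    def condition():
        nonlocal checks
        checks += 1
        return remaining > 0

    while True:
        remaining -= 1       # loop body runs before the first check
        if not condition():
            break
    return checks
```

For n = 3, the regular while loop evaluates the condition four times (three that pass and a final one that fails), while the emulated do-while evaluates it three times.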

Python doesn’t have a do-while loop construct. Why? Apparently, the core developers never found a good syntax for this type of loop. Probably, that’s the reason why Guido van Rossum rejected PEP 315, which was an attempt to add do-while loops to the language. Some core developers would prefer to have a do-while loop and are looking to revive the discussion around this topic.

In the meantime, you’ll explore the alternatives available in Python. In short, how can you emulate do-while loops in Python? In this tutorial, you’ll learn how you can create loops with while that behave like do-while loops.


In Short: Use a while Loop and the break Statement

The most common technique to emulate a do-while loop in Python is to use an infinite while loop with a break statement wrapped in an if statement that checks a given condition and breaks the iteration if that condition becomes true:

    while True:
        # Do some processing...
        # Update the condition...
        if condition:
            break

Read the full article at https://realpython.com/python-do-while/ »
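As an illustration of this pattern, here is one possible Python counterpart of the C program shown earlier. It is a sketch rather than the tutorial's own code: the function name and its `read_number` and `echo` parameters are hypothetical, introduced so the loop can be driven by any source of numbers instead of the keyboard.

```python
def read_until_non_positive(read_number, echo=print):
    """Emulate `do { ... } while (number > 0);` with while True and break."""
    seen = []
    while True:
        number = read_number()   # the body always runs at least once
        echo(number)
        seen.append(number)
        if not number > 0:       # the C loop condition, negated, checked last
            break
    return seen
```

Interactive use would look like `read_until_non_positive(lambda: int(input("Enter a positive number: ")))`, which stops after echoing the first non-positive number.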


Categories: FLOSS Project Planets

Iustin Pop: Experiment: A week of running

Planet Debian - Mon, 2022-06-20 09:17

My sports friends know that I wasn’t able to really run in many, many years, due to a recurring injury that was not fully diagnosed and which, after many sessions with the doctor, ended up with OK-ish state for day-to-day life but also with these words: “Maybe, running is just not for you?”

The year 2012 was my “running year”. I went to a number of races, wrote blog posts, then slowly started running only rarely, then a few years later I was really only running once in a while, and coupled with a number of bad ideas of the type “lets run today after a long break, but a lot”, I started injuring my foot.

Add a few more years, some more kilograms on my body, one incident of jumping with a kid on my shoulders and landing on my bad foot, and the setup was complete.

Doctor visits, therapy, slow improvements, but not really solving the problem. 6 months breaks, small attempts at running, pain again, repeat, pain again, etc. It ended up with me acknowledging that yes, maybe running is not for me, and I should really give it up.

Incidentally, in 2021, as part of me trying to improve my health/diet, I tried something that is not important for this post and for the first time in a long time, I was fully, 100%, pain free in my leg during day-to-day activities. Huh, maybe this is not purely related to running? From that point on, my foot became, very slowly, better. I started doing short runs (2-3km), especially on holidays where I can’t bike, and if I was careful, it didn’t go too badly. But I knew I couldn’t run, so these were rare events.

In April this year, on vacation, I ran a couple of times - 20km distance. In May, 12km. Then, there was a Garmin Badge I really wanted, so against my good judgement, I did a run/walk (2:1 ratio) the previous weekend, and to my surprise, no unwanted side-effect. And I got an idea: what if I do short run/walks an entire week? When does my foot “break”?

I mean, by now I knew that a short (3-4, maybe 5km) run that has pauses doesn’t negatively impact my foot. What about the 2nd one? Or the 3rd one? When does it break? Is it distance, or something else?

The other problem was - when to run? I mean, on top of hybrid work model. When working from home, all good, but when working from the office? So the other, somewhat more impossible task for me, was to wake up early and run before 8 AM. Clearly destined to fail!

But, the following day (Monday), I did wake up and ran 3km. Then Tuesday again, 3.3km (and later, one hour of biking). Wed - 3.3km. Thu - 4.40km, at 4:1 ratio (2m:30s). Friday, 3.7km (4:1), plus a very long for me (112km) bike ride.

By this time, I was physically dead. Not my foot, just my entire body. On Saturday morning, Training Peaks said my form is -52, and it starts warning below -15. I woke up late and groggy, and I had to extra motivate myself to go for the last, 5.3km run, to round up the week.

On Friday and Saturday, my problem leg did start to… how to say, remind me it is problematic? But not like previously, no waking in the morning with a stiff tendon. No, just… not fully happy. And, to my surprise, correlated again with my consumption of problematic food (I was getting hungrier and hungrier, and eating too much of things I should keep an eye on).

At this point, with the week behind me:

  • am ultra-surprised that my foot is not in pieces (yet?)
  • am still pretty tired (form: -48), but I did manage to run again after a day of pause from running (and my foot is still OK-ish).
  • am confused as to what are really my problems…
  • am convinced that I have some way of running a bit, if I take it carefully (which is hard!)
  • am really, really hungry; well, not anymore, I ate like a pig for the last two days.
  • beat my all-time Garmin record for “weekly intensity minutes” (1174, damn, 1 more minute and would have been rounder number)…

Did my experiment make me wiser? Not really. Happier? Yes, 100%. I plan to buy some new running clothes, my current ones are really old.

But did I really understand how my body functions? A loud no. Sigh.

The next challenge will be, how to manage my time across multiple sports (and work, and family, and other hobbies). Still, knowing that I can anytime go for 25-35 minutes of running, without preparation, is very reassuring.

Freedom, health and injury-free sports to everyone!

Categories: FLOSS Project Planets

Python for Beginners: Read CSV Into a List of Lists in Python

Planet Python - Mon, 2022-06-20 09:00

We often need to process csv files to analyze data related to a business problem. In this article, we will discuss how we can read a csv file into a list of lists in python.

Read CSV Into a List of Lists Using CSV.reader()

Python provides us with the csv module to work with csv files in python. To access data from a csv file, we often use a reader object created with the help of the csv.reader() function.

After creating a reader object, we can read the csv file into a list of lists. For this, we will first open the csv file using the open() function in the read mode. The open() function takes the filename of the csv file as its first input argument and the literal “r” as its second input argument to denote that the file will be opened in the read-only mode. After execution, the open() function returns a file object that refers to the csv file.

Now, we will pass the file object to the csv.reader() function to create a reader object. The reader object is an iterator that yields each row of the csv file as a list of strings. We can access each row of the csv file using a for loop and read it into a list of lists as follows.

    import csv

    myFile = open('Demo.csv', 'r')
    reader = csv.reader(myFile)
    myList = []
    for record in reader:
        myList.append(record)
    print("The list of lists is:")
    print(myList)


    The list of lists is:
    [['Roll', 'Name', 'Language'], ['1', 'Aditya', 'Python'], ['2', 'Sam', ' Java'], ['3', ' Chris', ' C++']]

If you want to skip the header of the csv file, you can do it using the next() function. The next() function, when executed on an iterator, returns an element from the iterator and moves the iterator to the next element. Outside the for loop, you can use the next() function once to read the csv file into a list without the header as follows.

    import csv

    myFile = open('Demo.csv', 'r')
    reader = csv.reader(myFile)
    print("The header is:")
    print(next(reader))
    myList = []
    for record in reader:
        myList.append(record)
    print("The list of lists is:")
    print(myList)


    The header is:
    ['Roll', 'Name', 'Language']
    The list of lists is:
    [['1', 'Aditya', 'Python'], ['2', 'Sam', ' Java'], ['3', ' Chris', ' C++']]

Instead of using the for loop to read the csv from the reader object, you can use the list() constructor. Here, we will pass the reader object to the list() constructor and it will return a list of lists as shown below.

    import csv

    myFile = open('Demo.csv', 'r')
    reader = csv.reader(myFile)
    myList = list(reader)
    print("The list of lists is:")
    print(myList)


    The list of lists is:
    [['Roll', 'Name', 'Language'], ['1', 'Aditya', 'Python'], ['2', 'Sam', ' Java'], ['3', ' Chris', ' C++']]
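The programs above never close the file, and the printed rows keep the spaces that follow some commas (' Java', ' Chris'). A possible variant, offered as a sketch rather than as part of the original article, opens the file in a with block so it is closed automatically, passes newline="" as the csv module documentation recommends, and sets skipinitialspace=True to drop whitespace after each delimiter. The function name read_csv_rows and its skip_header parameter are hypothetical.

```python
import csv


def read_csv_rows(path, skip_header=False):
    """Read a csv file into a list of lists, closing the file afterwards."""
    with open(path, "r", newline="") as csv_file:
        # skipinitialspace=True ignores spaces right after each comma
        reader = csv.reader(csv_file, skipinitialspace=True)
        if skip_header:
            next(reader, None)   # discard the first row; None avoids an error on empty files
        return list(reader)
```

Calling read_csv_rows('Demo.csv') returns every row including the header, and read_csv_rows('Demo.csv', skip_header=True) returns only the data rows.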

Similarly, you can use the list() constructor along with the next() function and the csv.reader() function to read a csv file without its header as follows.

    import csv

    myFile = open('Demo.csv', 'r')
    reader = csv.reader(myFile)
    print("The header is:")
    print(next(reader))
    myList = list(reader)
    print("The list of lists is:")
    print(myList)


    The header is:
    ['Roll', 'Name', 'Language']
    The list of lists is:
    [['1', 'Aditya', 'Python'], ['2', 'Sam', ' Java'], ['3', ' Chris', ' C++']]

Conclusion

In this article, we have discussed different approaches to read a csv file into a list of lists in python. To know more about lists, you can read this article on list comprehension in python. You might also like this article on dictionary comprehension in python.

I hope you enjoyed reading this article.

Happy Learning!

The post Read CSV Into a List of Lists in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

Mike Driscoll: PyDev of the Week: Jürgen Gmach

Planet Python - Mon, 2022-06-20 08:30

This week we welcome Jürgen Gmach (@jugmac00) as our PyDev of the Week! Jürgen is a maintainer of the tox automation project. You can see what else Jürgen is up to over on his website. You can also check out Jürgen's code over on GitHub or Launchpad.

Let's spend some time getting to know Jürgen better!

Can you tell us a little about yourself (hobbies, education, etc):

Hi, I'm Jürgen. I am a software engineer at Canonical. I live in southern Germany, just between the beautiful Danube river and the Bavarian Forest.

I have been into computers since my earliest childhood, at first playing computer games on my Commodore C64, later writing simple applications in Basic.

A very passionate teacher at school piqued my interest in economics, so I decided to study that.

A couple of years into my studies, I was more and more sucked into this new thing called the Internet.

I created websites with HTML, most notably a quite successful online pool billiards community, which I later ported to PHPNuke, for which I had to learn PHP and how to write patches.

At one point I decided I needed to follow my heart, so I started working as a software engineer at a local company.

In my spare time, I love to be outside. Depending on the weather and the season, I love to hike, bike, swim or hunt mushrooms, sometimes alone, but most of the time with my lovely family.

Why did you start using Python?

For my first engineering job, I was hired to work on a major intranet application, based on Python and Zope. So I had to learn Python on the job.

There is a little background story about this tech stack. My colleague at the time first tried to create his own application server in Ruby, but his attempts always segfaulted, so at one point he picked up Zope and with that Python.

Since then Python has always been in my life.

I am forever in my colleague's debt.

What other programming languages do you know and which is your favorite?

As already mentioned, I started programming in Basic, I learned Bash and Pascal at University, I created static websites with HTML before CSS and JavaScript were a thing, dynamic websites with Perl, I created smaller and larger websites with PHP, I created command line applications in Python, Rust, Bash and Go, I wrote and maintained a fair bit of JavaScript, I contributed fixes to projects using Java or C, I debugged Lua and Sieve scripts, but I am certainly most familiar with Python, which I also like the most.

What projects are you working on now?

I joined Canonical in October 2021 to work on the Launchpad project, which consists of many pieces, most notably a code hosting system similar to GitHub, and a build farm, where all the fine packages get built for Ubuntu and other systems.

My team is currently building a CI system from scratch, which is a super interesting task. While I contribute to all involved systems, for the most part, I work on the CI runner. And one of the best parts - this is all open-source.

I also spend some of my spare time working on multiple open-source projects.

That would be tox, the task automation tool, the almost 300 projects of the Zope Foundation, the Morepath web framework, Flask-Reuploaded, which I forked and so saved from being unmaintained. I also do many more drive-by contributions.

Which Python libraries are your favorite (core or 3rd party)?

I certainly would not want to maintain the 300 Zope repositories without tox, which offers a standard interface to testing, running linters, and building documentation.

Speaking of linters, I never go without pre-commit and flake8, and some more depending on the project.

When I need to create a command-line application, argparse is my first choice. I especially love its versatility and that it comes with the standard library.

all-repos is a fantastic niche application and library, which I use when I need to update dozens or in the case of Zope, even hundreds of repositories with a single command. I gave a quick introduction at PyConUS.

How did you become involved with the tox project?

Oh, this is a fun one. I even have blogged about it in "Testing the tox 4 Pre-Release at Scale".

The short form:
In order to be able to maintain the 300 Zope projects with only a couple of people, we need uniform interfaces, so we use tox for testing. Just clone it and run `tox` - no setting up a virtual environment, no reading documentation, no fiddling with test paths.

As Bernát Gabor, the core maintainer of tox, announced on Twitter that he plans to release tox 4, which would be a complete rewrite, I thought it would be a good idea to run tox 4 alpha against all 300 projects. For that, I used all-repos to drive tox. I found and reported quite a couple of edge cases, and at one point I tried to fix some of them myself - which, with some help from Bernát, worked out pretty well.

As I enjoyed working with tox so much, I not only contributed code but also answered questions on StackOverflow and triaged new bug reports.

One day, out of the blue, Bernát asked me to become a maintainer - of my favorite open-source project!!! Crazy!

What are the top 3 things you have learned as a maintainer of an open-source package?

You cannot help everybody. Let's take tox as an example. It is not a big project, but with more than 5 million downloads per month, and thousands of users, two things regularly happen. Users ask questions that even I as a maintainer cannot answer, as maybe the original poster uses a different IDE, a different operating system, or some very specific software... but it is ok to not know everything.

Also, with so many users you will be asked to implement things that are only helpful for very few users on the one hand, and on the other hand, will make maintenance harder in the long term. So you need to learn to say "no".

Also, don't be the single maintainer of a project. It is much more fun with some fellow maintainers. You can split the workload, learn how others maintain a project, and most importantly, you can have a break whenever you want. Life happens! I am currently building a house for my family so I cannot spend too much time on my projects - but this is ok!

And finally, and maybe most important. Let loose. When you no longer feel joy in maintaining your project, pass it on to your fellow maintainers, or look for new maintainers. That is ok. It is also ok to declare your project as unmaintained. You do not owe anybody anything, except yourself.

I think the above things are not only valid for open-source projects, but also for work, and possibly also for life in general.

Is there anything else you’d like to say?

If I could give my younger self three tips, these would be:

Take down notes of things you learn, write a developer journal or even a public blog. That way you reinforce what you learn and you can always look it up later on.

Go to conferences!!! You will pick up so many new things, and most importantly, you will meet and get to know so many great people from all over the world.

Shut down your computer. Go outside and have some fresh air and some fun!

Thanks for doing the interview, Jürgen!

The post PyDev of the Week: Jürgen Gmach appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Petter Reinholdtsen: My free software activity of late (2022)

Planet Debian - Mon, 2022-06-20 08:30

I guess it is time to bring some light on the various free software and open culture activities and projects I have worked on or been involved in the last year and a half.

First, let's mention the book releases I managed to publish. The Cory Doctorow book "Hvordan knuse overvåkningskapitalismen" argues that it is not the magic machine learning of the big technology companies that causes surveillance capitalism to thrive, it is the lack of trust busting to enforce existing anti-monopoly laws. I also published a family of dictionaries for machinists, one sorted on the English words, one sorted on the Norwegian and the last sorted on the North Sámi words. A bit on the back burner but not forgotten is the Debian Administrators Handbook, where a new edition is being worked on. I have not spent as much time as I want to help bring it to completion, but hope I will get more spare time to look at it before the end of the year.

With my Debian hat on, I have spent time on several projects, both updating existing packages, helping to bring in new packages and working with upstream projects to try to get them ready to go into Debian. The list is rather long, and I will only mention my own isenkram, openmotor, vlc bittorrent plugin, xprintidle, norwegian letter style for latex, bs1770gain, and recordmydesktop. In addition to these I have sponsored several packages into Debian, like audmes.

Over the last year I have looked at several infrastructure projects for collecting meter data and video surveillance recordings. This includes several ONVIF related tools like onvifviewer and zoneminder as well as rtl-433, wmbusmeters and rtl-wmbus.

In parallel with this I have looked at fabrication related free software solutions like pycam and LinuxCNC. The latter recently gained improved translation support using po4a and weblate, which was a harder nut to crack than I had anticipated when I started.

Several hours have been spent translating free software to Norwegian Bokmål on the Weblate hosted service. I do not have a complete list, but you will find my contributions in at least gnucash, minetest and po4a.

I also spent quite some time on the Norwegian archiving specification Noark 5, and its companion project Nikita implementing the API specification for Noark 5.

Recently I have been looking into free software tools to do company accounting here in Norway, which presents an interesting mix of law, rules, regulations, format specifications and API interfaces.

I guess I should also mention the Norwegian community driven government interfacing projects Mimes Brønn and Fiksgatami, which have ended up in a kind of limbo while the future of the projects is being worked out.

These are just a few of the projects I have been involved in, and would like to give more visibility. I'll stop here to avoid delaying this post.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Categories: FLOSS Project Planets

Jamie McClelland: A very liberal spam assassin rule

Planet Debian - Mon, 2022-06-20 08:27

I just sent myself a test message via Powerbase (a hosted CiviCRM project for community organizers) and it didn’t arrive. Wait, nope, there it is in my junk folder with a spam score of 6!

X-Spam-Status: Yes, score=6.093 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, DMARC_MISSING=0.1, HTML_MESSAGE=0.001, KAM_WEBINAR=3.5, KAM_WEBINAR2=3.5, NO_DNS_FOR_FROM=0.001, SPF_HELO_NONE=0.001, ST_KGM_DEALS_SUB_11=1.1, T_SCC_BODY_TEXT_LINE=-0.01] autolearn=no autolearn_force=no

What just happened?

A careful look at the scores suggests that the KAM_WEBINAR and KAM_WEBINAR2 rules killed me. I’ve never heard of them (this email came through a system I’m not administering). So, I did some searching and found a page with the rules:
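To make that "careful look" repeatable, here is a minimal Python sketch that parses the tests=[...] part of the X-Spam-Status header quoted above and ranks the rules by contribution. The function name parse_tests is made up for this illustration, and the parsing assumes the header layout shown in this post rather than every format SpamAssassin or Amavis can emit.

```python
import re
from typing import Dict

# The header value from the message above, copied verbatim.
HEADER = (
    "Yes, score=6.093 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, "
    "DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, "
    "DMARC_MISSING=0.1, HTML_MESSAGE=0.001, KAM_WEBINAR=3.5, KAM_WEBINAR2=3.5, "
    "NO_DNS_FOR_FROM=0.001, SPF_HELO_NONE=0.001, ST_KGM_DEALS_SUB_11=1.1, "
    "T_SCC_BODY_TEXT_LINE=-0.01] autolearn=no autolearn_force=no"
)


def parse_tests(header: str) -> Dict[str, float]:
    """Return a {rule_name: score} mapping from the tests=[...] section."""
    match = re.search(r"tests=\[(.*?)\]", header, re.DOTALL)
    if match is None:
        return {}
    pairs = re.findall(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)", match.group(1))
    return {name: float(value) for name, value in pairs}


tests = parse_tests(HEADER)
total = round(sum(tests.values()), 3)
top_two = sorted(tests, key=tests.get, reverse=True)[:2]

print(f"total={total}")        # the per-rule scores add up to the header's score
print(f"top rules={top_two}")  # the two rules that contribute the most
```

Summing the individual scores reproduces the reported 6.093, and the two webinar rules alone account for 7.0 of it.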

    # SEMINARS AND WORKSHOPS SPAM
    header __KAM_WEBINAR1 From =~ /education|career|manage|learning|webinar|project|efolder/i
    header __KAM_WEBINAR2 Subject =~ /last chance|increase productivity|workplace morale|payroll dept|trauma.training|case.study|issues|follow.up|service.desk|vip.(lunch|breakfast)|manage.your|private.business|professional.checklist|customers.safer|great.timesaver|prep.course|crash.course|hunger.to.learn|(keys|tips).(to|for).smarter/i
    header __KAM_WEBINAR3 Subject =~ /webinar|strateg|seminar|owners.meeting|webcast|our.\d.new|sales.video/i
    body __KAM_WEBINAR4 /executive.education|contactid|register now|\d+.minute webinar|management.position|supervising.skills|discover.tips|register.early|take.control|marketing.capabilit|drive.more.sales|leveraging.cloud|solution.provider|have.a.handle|plan.to.divest|being.informed|upcoming.webinar|spearfishing.email|increase.revenue|industry.podcast|\d+.in.depth.tips|early.bird.offer|pmp.certified|lunch.briefing/i

    meta KAM_WEBINAR (__KAM_WEBINAR1 + __KAM_WEBINAR2 + __KAM_WEBINAR3 + __KAM_WEBINAR4 >= 3)
    describe KAM_WEBINAR Spam for webinars
    score KAM_WEBINAR 3.5

    meta KAM_WEBINAR2 (__KAM_WEBINAR1 + __KAM_WEBINAR2 + __KAM_WEBINAR3 + __KAM_WEBINAR4 >= 4)
    describe KAM_WEBINAR2 Spam for webinars
    score KAM_WEBINAR2 3.5

For those of you who don’t care to parse those regular expressions, here’s a summary:

  • There are four tests. If you fail 3 or more, you get 3.5 points; if you fail all 4, you get another 3.5 points (my email failed all 4).
  • Here is how I failed them:
    • The from address can’t have a bunch of words, including “project.” My from address includes my organization’s name: The Progressive Technology Project.
    • The subject line cannot include a number of strings, including “last chance.” My subject line was “Last chance to register for our webinar.”
    • The subject line cannot include a number of other strings, including “webinar” (and also webcast and even strategy). My subject line was “Last chance to register for our webinar.”
    • The body of the message cannot include a bunch of strings, including “register now.” Well, you won’t be surprised to know that my email contained the string “Register now.”
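The summary above can be checked with a rough Python approximation of the two meta rules. This is only a sketch: the regular expressions are copied from the rules quoted earlier, but SpamAssassin matches body rules against its own rendered version of the message, so plain re.search on a string is an assumption. The function name kam_webinar_score and the example address are made up for illustration.

```python
import re

# Patterns copied from the KAM_WEBINAR sub-rules; all are case-insensitive.
FROM_RE = r"education|career|manage|learning|webinar|project|efolder"
SUBJECT_RE_A = (
    r"last chance|increase productivity|workplace morale|payroll dept|"
    r"trauma.training|case.study|issues|follow.up|service.desk|"
    r"vip.(lunch|breakfast)|manage.your|private.business|"
    r"professional.checklist|customers.safer|great.timesaver|prep.course|"
    r"crash.course|hunger.to.learn|(keys|tips).(to|for).smarter"
)
SUBJECT_RE_B = r"webinar|strateg|seminar|owners.meeting|webcast|our.\d.new|sales.video"
BODY_RE = (
    r"executive.education|contactid|register now|\d+.minute webinar|"
    r"management.position|supervising.skills|discover.tips|register.early|"
    r"take.control|marketing.capabilit|drive.more.sales|leveraging.cloud|"
    r"solution.provider|have.a.handle|plan.to.divest|being.informed|"
    r"upcoming.webinar|spearfishing.email|increase.revenue|industry.podcast|"
    r"\d+.in.depth.tips|early.bird.offer|pmp.certified|lunch.briefing"
)


def kam_webinar_score(from_header: str, subject: str, body: str) -> float:
    """Approximate the combined KAM_WEBINAR + KAM_WEBINAR2 score."""
    hits = sum(
        bool(re.search(pattern, text, re.IGNORECASE))
        for pattern, text in (
            (FROM_RE, from_header),
            (SUBJECT_RE_A, subject),
            (SUBJECT_RE_B, subject),
            (BODY_RE, body),
        )
    )
    score = 0.0
    if hits >= 3:  # meta KAM_WEBINAR
        score += 3.5
    if hits >= 4:  # meta KAM_WEBINAR2
        score += 3.5
    return score


# The address is a placeholder; the display name and subject come from the post.
my_score = kam_webinar_score(
    "The Progressive Technology Project <info@example.org>",
    "Last chance to register for our webinar",
    "Register now",
)
print(my_score)
```

With all four sub-rules matching, both meta rules fire, which is where the 7 points in the header come from.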

Hm. I’m glad I can now fix our email, but this doesn’t work so well for people with a name that includes “project” who like to organize webinars for which you have to register.

Categories: FLOSS Project Planets

KDE e.V. votes 2022Q2

Planet KDE - Sun, 2022-06-19 21:30

KDE e.V. makes it known that two votes took place in 2022Q2 (April-June 2022): A change to the rules of online voting, and accepting the FLA 2.0.

  • Change rules for online voting, expanding section 2.1 to allow for more phrases (email subject lines) to start a vote. Associated merge request !43.
  • Allow the use of the Fiduciary Licensing Agreement 2.0 (for individual or entity use). Associated merge request !5.
Categories: FLOSS Project Planets

Python⇒Speed: Why new Macs break your Docker build, and how to fix it

Planet Python - Sun, 2022-06-19 20:00

One of the promises of Docker is reproducibility: you can build an image on a different machine, and assuming you’ve done the appropriate setup, get the same result. So it can be a little confusing when you try to build your Python-based Dockerfile on a new Mac, and everything starts failing. What used to work before—on an older Mac, or on a Linux machine—fails in completely unexpected ways.

The problem is that the promise of reproducibility relies on certain invariants that don’t apply on newer Macs. The symptoms can be non-obvious, though, so in this article we’ll cover:

  • Common symptoms of the problem when using Python.
  • The cause of the problem: a different CPU instruction set.
  • Solving the problem by ensuring the code is installable or compilable.
  • Solving the problem with CPU emulation, some of the downsides of this solution, and future improvements to look forward to.
  • A takeaway for maintainers of open source Python packages.
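
As a quick way to see which CPU instruction set a given environment reports, the following Python sketch uses the standard library's platform.machine(). The describe_arch helper and its labels are made up for this illustration; the set of names it recognizes is an assumption, not an exhaustive list.

```python
import platform

# Common values returned by platform.machine(), lowercased.
KNOWN_ARCHITECTURES = {
    "x86_64": "Intel/AMD 64-bit",
    "amd64": "Intel/AMD 64-bit",
    "arm64": "ARM 64-bit (the name macOS uses on newer Macs)",
    "aarch64": "ARM 64-bit (the name Linux uses)",
}


def describe_arch(machine: str) -> str:
    """Map a platform.machine() value to a human-readable label."""
    return KNOWN_ARCHITECTURES.get(machine.lower(), f"unrecognized ({machine})")


current = platform.machine()
print(f"{current}: {describe_arch(current)}")
```

Running this both on the host and inside a container shows whether the two agree on the architecture, which is the invariant the article says newer Macs break.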
Categories: FLOSS Project Planets

Zero-with-Dot (Oleg Żero): Convenient scheduler in python

Planet Python - Sun, 2022-06-19 18:00

Python has become an all-purpose language. It is especially commonly used in analytics and solving algorithmic problems within data science but is also popular in web development. This combination makes it a reasonable choice for various extract-transform-load (ETL) tasks.

However, many of these tasks are rather small and don’t require large frameworks such as Airflow or Luigi. When polling one or several web pages for data, a simple python script plus crontab is more than sufficient. Still, when a project gets a little bigger, managing multiple jobs using cron may become cumbersome. At the same time, a bare installation of Airflow for “small jobs” needs at least 4GB RAM and 2 CPUs. Thinking about AWS costs, it is at least a t2.small instance running at all times.

Is there anything in between? Small enough to use, say t2.nano (very cheap) and fairly “maintainable” and “extendable”?

In this post, I would like to share with you a simple approach that uses python’s schedule package with a few modifications.

Python scheduler

Python's schedule library offers simple task scheduling. It is installable using pip, and fairly easy to use. Unfortunately, the documentation doesn’t provide examples of using it within a larger project:

    import schedule
    import time


    def job():
        print("I'm working...")


    # Run job every 3 second/minute/hour/day/week,
    # Starting 3 second/minute/hour/day/week from now
    schedule.every(3).seconds.do(job)
    schedule.every(3).minutes.do(job)
    schedule.every(3).hours.do(job)
    schedule.every(3).days.do(job)
    schedule.every(3).weeks.do(job)

    # Run job every minute at the 23rd second
    schedule.every().minute.at(":23").do(job)

    # Run job every hour at the 42nd minute
    schedule.every().hour.at(":42").do(job)

    # Run jobs every 5th hour, 20 minutes and 30 seconds in.
    # If current time is 02:00, first execution is at 06:20:30
    schedule.every(5).hours.at("20:30").do(job)

    # Run job every day at specific HH:MM and next HH:MM:SS
    schedule.every().day.at("10:30").do(job)
    schedule.every().day.at("10:30:42").do(job)

    # Run job on a specific day of the week
    schedule.every().monday.do(job)
    schedule.every().wednesday.at("13:15").do(job)
    schedule.every().minute.at(":17").do(job)

    while True:
        schedule.run_pending()
        time.sleep(1)

As you can see, all functions are called at the level of the module, which is OK for placing it in a script. However, if you have several different jobs, the code quickly becomes cluttered, especially if different callables require different parameters.

In other words, it may be preferable to take advantage of the object-oriented approach and define some “architecture” around it.

Using it in a project

Let’s say, for the sake of an argument, that we have a set of dedicated ETL tasks, modeled using the following abstract class:

    from abc import ABC, abstractmethod
    from typing import Any, TypeVar

    # Type variable for "some BaseETL subclass"; extract() and transform()
    # return the object itself so that the calls in run() can be chained.
    E = TypeVar("E", bound="BaseETL")


    class BaseETL(ABC):
        def __init__(self, **kwargs: Any) -> None:
            self.raw_data = None
            self.transformed_data = None

        @abstractmethod
        def extract(self: E, **kwargs: Any) -> E:
            ...

        @abstractmethod
        def transform(self: E, **kwargs: Any) -> E:
            ...

        @abstractmethod
        def load(self, **kwargs: Any) -> Any:
            ...

        def run(self, **kwargs: Any) -> None:
            self.extract(**kwargs).transform(**kwargs).load(**kwargs)

Any class that would implement an ETL process would inherit from this base class. The extract method could, for example, fetch a website. Then transform would transform the raw HTML into a format acceptable by a database. Finally, the load would save the data to the database. All methods, executed in this order can be wrapped using the run method.

Now, after the ETL classes are defined, we would like to schedule each of them through the schedule module in a nice fashion.

Two example ETL tasks

For brevity, in the following examples, let’s skip the inheritance and only focus on the run method. Assume that their extract, transform and load methods are implemented elsewhere.

etl.py

    class DummyETL:  # normally DummyETL(BaseETL)
        def __init__(self, init_param: int) -> None:
            # super().__init__()  # - not needed here
            self.init_param = init_param

        def run(self, p1: int, p2: int) -> None:
            print(f"{self.__class__.__name__}({self.init_param}, p1={p1}, p2={p2})")


    class EvenDummierETL:  # same...
        def __init__(self, init_param: int) -> None:
            # super().__init__()  # - same
            self.init_param = init_param

        def run(self, p1: int) -> None:
            print(f"{self.__class__.__name__}({self.init_param}, p1={p1})")

The constructors’ parameters can, for instance, specify the URLs of the pages for scraping. The run methods’ parameters, for a change, can be used to pass secrets.

Now that we have the ETL classes defined, let’s create a separate registry to associate the processes with some sort of schedule.

registry.py

    import schedule

    from etl import DummyETL, EvenDummierETL


    def get_registry():
        dummy_etl = DummyETL(init_param=13)
        dummier_etl = EvenDummierETL(init_param=15)

        return [
            (dummy_etl, schedule.every(1).seconds),
            (dummier_etl, schedule.every(1).minutes.at(":05")),
        ]

The get_registry function is a place to define the schedule. Although the parameters’ values are hard-coded, you can think of a situation where the function loads them from a config file. Either way, it returns a list of tuples that matches the ETL objects with Jobs (from schedule). Note that this is our convention. The jobs are not yet associated with any particular Scheduler (again, from schedule). However, the convention allows us to do so in any other part of the project. We don’t have to bind them with the module-level object, as shown in the documentation example.

Our scheduler-based scheduler

Finally, let’s create a new class that will activate the whole mechanism.

scheduler.py

    import time
    from typing import Any, Dict, List, Tuple, TypeVar

    from schedule import Job, Scheduler

    # Stand-in for the ETL type; in a full project it could be imported
    # from e.g. etl.base instead of being defined here.
    E = TypeVar("E")
    S = TypeVar("S", bound="TaskScheduler")


    class TaskScheduler:
        def __init__(self, registry: List[Tuple[E, Job]]) -> None:
            self.scheduler = Scheduler()
            self.registry = []

            for task, job in registry:
                self.registry.append(task)
                self.scheduler.jobs.append(job)

        def register(self: S, run_params: Dict[str, Dict[str, Any]]) -> S:
            jobs = self.scheduler.get_jobs()
            for task, job in zip(self.registry, jobs):
                # Fall back to no parameters if the task has no entry.
                params = run_params.get(task.__class__.__name__, {})
                job.do(task.run, **params)

            return self

        def run(self, polling_seconds: int = 1) -> None:
            while True:
                time.sleep(polling_seconds)
                self.scheduler.run_pending()

Our TaskScheduler uses composition to create a single Scheduler instance and add previously registered jobs to it. Although not enforced, we use typing to give a strong hint on what should be provided to the constructor to properly register the jobs. Then, the register method is a separate method that provides the binding. Last, but not least, run activates the machinery.

A script that uses this implementation would look like this:

run.py

    from registry import get_registry
    from scheduler import TaskScheduler


    if __name__ == "__main__":
        run_params = {
            "DummyETL": dict(p1=1, p2=2),  # e.g. from environment variables
            "EvenDummierETL": dict(p1=3),
        }

        registry = get_registry()  # e.g. from script's args or config file
        task_scheduler = TaskScheduler(registry).register(run_params)
        task_scheduler.run(polling_seconds=1)  # check for pending jobs every second

Probably the weakest point of this solution is the convention that uses the __class__.__name__ as keys in the run_params dictionary. However, considering the simplicity of the approach, it may be OK, especially if these parameters would be defined at runtime. There are many alternatives, one of which could be creating an additional abstraction layer with e.g. objects like DummyTask that would serve as a bridge between ETL objects and the registry.
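One possible shape for such a bridge object is sketched below. Everything here is hypothetical: ScheduledTask, SupportsDo, RecordingJob and EchoETL are names invented for this illustration. The only thing assumed about the schedule package is that a Job exposes do(job_func, *args, **kwargs); the demo uses a small stand-in job so the snippet runs without schedule installed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Protocol


class SupportsDo(Protocol):
    """Anything with a schedule.Job-like do() method."""

    def do(self, job_func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        ...


@dataclass
class ScheduledTask:
    """Bundles an ETL object, its job and its run parameters in one place,
    so no lookup by class name is needed."""

    etl: Any
    job: SupportsDo
    run_params: Dict[str, Any] = field(default_factory=dict)

    def bind(self) -> Any:
        # Attach the ETL's run method, with its parameters, to the job.
        return self.job.do(self.etl.run, **self.run_params)


class RecordingJob:
    """Stand-in for schedule.Job that only records what was bound to it."""

    def __init__(self) -> None:
        self.calls: List[Callable[[], Any]] = []

    def do(self, job_func: Callable[..., Any], *args: Any, **kwargs: Any) -> "RecordingJob":
        self.calls.append(lambda: job_func(*args, **kwargs))
        return self


class EchoETL:
    """Toy ETL object that remembers the parameters it was run with."""

    def __init__(self) -> None:
        self.seen: List[Any] = []

    def run(self, p1: int, p2: int) -> None:
        self.seen.append((p1, p2))


etl = EchoETL()
job = RecordingJob()
ScheduledTask(etl, job, {"p1": 1, "p2": 2}).bind()
job.calls[0]()  # simulate the scheduler firing the job once
print(etl.seen)
```

With this layout the registry could return a list of ScheduledTask objects, and two instances of the same ETL class could carry different parameters, which the class-name convention cannot express.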

Another approach to TaskScheduler

Coming back to the TaskScheduler, we can also define it through inheritance as opposed to composition (as before). That would mean expanding the functionality of the schedule’s native Scheduler class. In this case, the TaskScheduler would be the following:

    import time
    from typing import Any, Dict, List, Tuple, TypeVar

    from schedule import Job, Scheduler

    E = TypeVar("E")  # stand-in for the ETL type
    S = TypeVar("S", bound="TaskScheduler")


    class TaskScheduler(Scheduler):  # <- here
        def __init__(self, registry: List[Tuple[E, Job]]) -> None:
            super().__init__()  # <- here
            self.registry = []

            for task, job in registry:
                self.registry.append(task)
                self.jobs.append(job)  # <- here

        def register(self: S, run_params: Dict[str, Dict[str, Any]]) -> S:
            jobs = self.get_jobs()  # <- here
            for task, job in zip(self.registry, jobs):
                params = run_params.get(task.__class__.__name__, {})
                job.do(task.run, **params)

            return self

        def run(self, polling_seconds: int = 1) -> None:
            while True:
                time.sleep(polling_seconds)
                self.run_pending()  # <- and here

You decide which way is better if any ;).


In this brief article, we have shown how the simple schedule module can be expanded to create a small ETL working machine. Most importantly, the approach allows you to better organize the code within a small project without having to bring out the big guns.

Categories: FLOSS Project Planets

"Morphex's Blogologue": Some more easy sunday accounting hacks

Planet Python - Sun, 2022-06-19 16:53
I added some code to the ravencoin-taxman tool today:


which enables the calculation of earnings in a local currency. So if you're mining for example, it is possible to get the entire earnings in a year, into the local currency, which should be sufficient for accounting and tax purposes for that year.

The commit also included a bit of refactoring of what I guess you could call standard debug stuff, to reduce the amount of code that needs to be written later. Support for CSV files using ; as a separator was also added, since the Norwegian bank uses that as a separator in its data files, as , is the decimal separator in the Norwegian locale.
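
For readers who have not met this CSV flavour, here is a minimal Python sketch of reading it with the standard library. It is not the code from ravencoin-taxman; the column names, the sample rows and the parse_amount helper are made up for illustration.

```python
import csv
import io
from decimal import Decimal

# Made-up sample in the style described above: ; separates the fields
# and , is the decimal separator.
SAMPLE = (
    "date;description;amount\n"
    "2022-06-01;Mining payout;1234,56\n"
    "2022-06-02;Exchange fee;-7,50\n"
)


def parse_amount(text: str) -> Decimal:
    """Convert an amount with a decimal comma, such as '1234,56', to Decimal."""
    return Decimal(text.replace(" ", "").replace(",", "."))


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
amounts = [parse_amount(row["amount"]) for row in reader]
total = sum(amounts, Decimal("0"))
print(total)
```

Decimal is used instead of float so that sums of currency amounts stay exact, which matters for accounting.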

I also added a test, to illustrate the command line and how to operate it:


Categories: FLOSS Project Planets

#! code: Drupal 9: Removing Base64 Encoded Files From Content

Planet Drupal - Sun, 2022-06-19 15:07

Occasionally, I have come across Drupal sites that have base64 encoded images embedded into content fields. This is the approach of taking the binary data contained in a file and converting it into a string of characters. The original binary data can then be re-created using this string and the data is understood by lots of different technologies (including web browsers).

Whilst this is technically possible, it massively balloons the size of the database and can often slow down page load times due to the database being slow to respond to the request. Instead of fetching a few kilobytes of data from the table the database is forced to fetch many megabytes of data, which can create a bottleneck for other requests.
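
To make the size cost concrete, here is a small Python sketch of the encoding round trip. The payload is placeholder bytes rather than a real image, and the image/png label in the data URI is only there to show the shape of what ends up inside a content field.

```python
import base64

# Placeholder binary payload (not a real image): 3072 bytes.
raw = bytes(range(256)) * 12

# Base64 turns every 3 bytes of input into 4 ASCII characters.
encoded = base64.b64encode(raw).decode("ascii")

# The form such data typically takes when embedded in HTML.
data_uri = f"data:image/png;base64,{encoded}"

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)

print(len(raw), len(encoded), decoded == raw)
```

The encoded string is a third larger than the binary it represents, and all of it sits in the database row and in every page response instead of in a separately cacheable file.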

When you download a file from the web your browser can make a decision on whether to fetch that file a second time. By injecting files into the content you are forcing your users to download very large pages every time they want to request a page. It isn't possible for the browser to make that decision any more and that can lead to more slowdown for the user.

If you can't tell, I really dislike this method of image storage. Whilst it is technically possible, it creates more problems than it solves and even sites with a couple of thousand nodes can have databases of many gigabytes in size due to this issue. It can also put unnecessary strain on the database due to the increased time taken to return data.

Let's say that you embed an image into some copy on a Drupal site using the normal media or file embed features. You might see an image element that looks like this.

In certain situations it is possible to embed images directly into content. The image element would look something like this.

Read more.

Categories: FLOSS Project Planets

Dirk Eddelbuettel: #38: Faster Feedback Systems

Planet Debian - Sun, 2022-06-19 11:46

Engineers build systems. Good engineers always stress and focus on the efficiency of these systems.

Two recent examples of engineering thinking follow. One was in a video / podcast interview with Martin Thompson (who is a noted high-performance code expert) I came across recently. The overall focus of the hour-long interview is on ‘managing software complexity’. Around minute twenty-two, the conversation turns to feedback loops and systems, and a strong preference for simple and fast systems for more immediate feedback. An important topic indeed.

The second example connects to this and permeates many tweets and other writings by Erik Bernhardsson. He had an earlier 2017 post on ‘Optimizing for iteration speed’, as well as a 17 May 2022 tweet on minimizing feedback loop size, another 28 Mar 2022 tweet reply on shorter feedback loops, then a 14 Feb 2022 post on problems with slow feedback loops, as well as a 13 Jan 2022 post on a priority for tighter feedback loops, and lastly a 23 Jul 2021 post on fast feedback cycles. You get the idea: Erik really digs faster feedback loops. Nobody likes to wait: immediacy wins each time.

A few years ago, I had touched on this topic with two posts on how to make (R) package compilation (and hence installation) faster. One idea (which I still use whenever I must compile) was in post #11 on caching compilation. Another idea was in post #13: make it faster by not doing it, in this case via binary installation which skips the need for compilation (and which is what I aim for with, say, CI dependencies). Several subsequent posts can be found by scrolling down the r^4 blog section: we stressed the use of the amazing Rutter PPA ‘c2d4u’ for CRAN binaries (often via Rocker containers), the (post #28) promise of RSPM, and the (post #29) awesomeness of bspm. And then in the more recent post #34 from last December we got back to a topic which ties all these things together: Dependencies. We quoted Mies van der Rohe: Less is more. Especially when it comes to dependencies as these elongate the feedback loop and thereby delay feedback.

Our most recent post #37 on r2u connects these dots. Access to a complete set of CRAN binaries with full-dependency resolution accelerates use and installation. This of course also covers testing and continuous integration. Why wait minutes to recompile the same packages over and over when you can install the full Tidyverse in 18 seconds or the brms package and all it needs in 13 seconds, as shown in the two gifs also on the r2u documentation site?

You can even power up the example setup of the second gif via this gitpod link giving you a full Ubuntu 22.04 session in your browser to try this: so go forth and install something from CRAN with ease! The benefit of a system such as our r2u CRAN binaries is clear: faster feedback loops. This holds whether you work with few or many dependencies, tiny or tidy. Faster matters, and feedback can be had sooner.

And with the title of this post we now get a rallying cry to advocate for faster feedback systems: “FFS”.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Python Software Foundation: The PSF Board Election is Open!

Planet Python - Sun, 2022-06-19 04:04

It’s time to cast your vote! Voting takes place from Monday, June 20 AoE, through Friday, June 30, 2022 AoE. Check here to see how much time you have left to vote. If you are a voting member of the PSF, you will get an email from “Helios Voting Bot <no-reply@mail.heliosvoting.org>” with your ballot; the subject line will read “Vote: Python Software Foundation Board of Directors Election 2022”. If you haven’t seen your ballot by Tuesday, please 1) check your spam folder for a message from “no-reply@mail.heliosvoting.org” and if you don’t see anything 2) get in touch by emailing psf-elections@python.org so we can make sure we have the most up-to-date email for you.

This might be the largest number of nominees we’ve ever had! Make sure you schedule some time to look at all their statements. We’re overwhelmed by how many of you are willing to contribute to the Python community by serving on the PSF board.

Who can vote? You need to be a Contributing, Managing, Supporting, or Fellow member as of June 15, 2022 to vote in this election. Read more about our membership types here or, if you have questions about your membership status, please email psf-elections@python.org.


Categories: FLOSS Project Planets