Feeds

Armin Ronacher: Playground Wisdom: Threads Beat Async/Await

Planet Python - Sun, 2024-11-17 19:00

It's been a few years since I wrote about my challenges with async/await-based systems and how they just seem to not support back pressure well. A few years later, I do not think that this problem has subsided much, but my thinking and understanding have perhaps evolved a bit. I'm now convinced that async/await is, in fact, a bad abstraction for most languages, and that we should be aiming for something better instead, which I believe to be threads.

In this post, I'm also going to rehash many arguments from very clever people that came before me. Nothing here is new; I just hope to bring it to a new group of readers. In particular, you should really read the highly influential pieces that this post builds on.

Your Child Loves Actor Frameworks

As programmers, we are so used to how things work that we make some implicit assumptions that really cloud our ability to think freely. Let me present you with a piece of code that demonstrates this:

def move_mouse():
    while mouse.x < 200:
        mouse.x += 5
        sleep(10)

def move_cat():
    while cat.x < 200:
        cat.x += 10
        sleep(10)

move_mouse()
move_cat()

Read that code and then answer this question: do the mouse and cat move at the same time, or one after another? I guarantee you that 10 out of 10 programmers will correctly state that they move one after another. It makes sense because we know Python and the concept of threads, scheduling and whatnot. But if you speak to a group of children familiar with Scratch, they are likely to conclude that mouse and cat move simultaneously.

The reason is that if you are exposed to programming via Scratch you are exposed to a primitive form of actor programming. The cat and the mouse are both actors. In fact, the UI makes this pretty damn clear, just that the actors are called “sprites”. You attach logic to a sprite on the screen and all these pieces of logic run at the same time. Mind-blowing. You can even send messages from sprite to sprite.

The reason I want you to think about this for a moment is that I think this is rather profound. Scratch is a very, very simple system and it's intended to teach programming to young kids. Yet the model it promotes is an actor system! If you were to foray into programming via a traditional book on Python, C#, or some other language, it's quite likely that you will only learn about threads at the very end. Not just that, the book will likely make them sound really complex and scary. Worse, you will probably only learn about actor patterns in some advanced book that will bombard you with all the complexities of large-scale applications.

There is something else, though, that you should keep in mind: Scratch will not talk about threads, it will not talk about monads, it will not talk about async/await, it will not talk about schedulers. As far as you are concerned as a programmer, it's an imperative (though colorful and visual) language with some basic “syntax” support for message passing. Concurrency comes naturally. A child can program it. It's not something to be afraid of.

Imperative Programming Is Not Inferior

The second thing I want you to take away is that imperative languages are not inferior to functional ones.

While probably most of us are using imperative programming languages to solve problems, I think we have all been exposed to the notion that they are inferior and not particularly pure. There is this world of functional programming, with monads and other things. That world has these nice things involving composition, logic, maths, and fancy-looking theorems. If you program in it, you're almost transcending to a higher plane, looking down on the folks who are stitching together if statements and for loops, making side effects everywhere, and doing highly inappropriate things with IO.

Okay, maybe it's not quite as bad, but I don't think I'm completely wrong with those vibes. And look, I get it. I feel happy chaining together lambdas in Rust and JavaScript. But we should also be aware that these constructs are, in many languages, bolted on. Go, for instance, gets away without most of this, and that does not make it an inferior language!

So what you should keep in mind here is that there are different paradigms, and mentally you should try to stop thinking for a moment that functional programming has all its stuff figured out, and imperative programming does not.

Instead, I want to talk about how functional languages and imperative languages are dealing with “waiting”.

The first thing I want to go back to is the example from above. Both of the functions (for the cat and the mouse) can be seen as separate threads of execution. When the code calls sleep(10), there's clearly an expectation by the programmer that the computer will temporarily pause the execution and continue later. I don't want to bore you with monads, so as my “functional” programming language I will use JavaScript and promises. I think that's an abstraction that most readers will be sufficiently familiar with:

function moveMouseBlocking() {
  while (mouse.x < 200) {
    mouse.x += 5;
    sleep(10); // a blocking sleep
  }
}

function moveMouseAsync() {
  return new Promise((resolve) => {
    function iterate() {
      if (mouse.x < 200) {
        mouse.x += 5;
        sleep(10).then(iterate); // non blocking sleep
      } else {
        resolve();
      }
    }
    iterate();
  });
}

You can immediately see a challenge here: it's very hard to translate the blocking example into a non-blocking example, because all of a sudden we need to find a way to express our loop (or really any control flow). We need to manually decompose it into a form of recursive function calling, and we need the help of a scheduler and executor here to do the waiting.

This style obviously eventually became annoying enough to deal with that async/await was introduced to mostly restore the sanity of the old code. So it now can look more like this:

async function moveMouseAsync() {
  while (mouse.x < 200) {
    mouse.x += 5;
    await sleep(10);
  }
}

Behind the scenes though, nothing has really changed, and in particular, when you call that function, you just get an object that encompasses the “composition of the computation”. That object is a promise which will eventually hold the resulting value. In fact, in some languages like C#, the compiler will really just transpile this into chained function calls. With the promise in hand, you can await the result, or register a callback with then which gets invoked if this thing ever runs to completion.

For a programmer, I think async/await is clearly understood as some sort of neat abstraction — an abstraction over promises and callbacks. However strictly speaking, it's just worse than where we started out, because in terms of expressiveness, we have lost an important affordance: we cannot freely suspend.

In the original blocking code, when we invoked sleep we suspended for 10 milliseconds implicitly; we cannot do the same with the async call. Here we have to “await” the sleep operation. This is the crucial aspect of why we're having these “colored functions”. Only an async function can call another async function, as you cannot await in a sync function.
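To make that restriction concrete, here is a minimal Python sketch (the function names are made up for illustration): a plain function cannot await, so calling an async function from it only produces a coroutine object that has to be handed to an event loop instead.

import asyncio

async def fetch_value():
    await asyncio.sleep(0.01)   # awaiting is only legal inside an async function
    return 42

def sync_caller():
    coro = fetch_value()        # no work happens yet; we only get a coroutine object
    return asyncio.run(coro)    # a sync function must delegate to an event loop instead of awaiting

print(sync_caller())            # 42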

Halting Problems

The above example shows another problem that async/await causes: what if we never resolve? A normal function call eventually returns, the stack unwinds, and we're ready to receive the result. In an async world, someone has to call resolve at the very end. What if that is never called? Now in theory, that does not seem all that different from someone calling sleep() with a large number to suspend for a very long time, or waiting on a pipe that never gets data sent into. But it is different! In one case, we keep the call stack and everything that relates to it alive; in another case, we just have a promise and are waiting for independent garbage collection with everything already unwound.

Contract wise, there is absolutely nothing that says one has to call resolve. As we know from theory, the halting problem is undecidable, so it's actually impossible to know whether someone will call resolve or not.

That sounds pedantic, but it's very important because promises/futures and async/await are making something strictly worse than not having them. Let's consider a JavaScript promise to be the most canonical example of what this looks like. A promise is created by an anonymous function, that is invoked to eventually call resolve. Take this example:

let neverSettle = new Promise((resolve) => {
  // this function ends, but we never called resolve
});

Let me clarify first that this is not a JavaScript specific problem, but it's nice to show it this way. This is a completely legal thing! It's a promise, that never resolves. That is not a bug! The anonymous function in the promise itself will return, the stack will unwind, and we are left with a “pending” promise that will eventually get garbage collected. That is a bit of a problem because since it will never resolve, you can also never await it.

Think of the following example, which demonstrates this problem a bit. In practice you might want to reduce how many things can work at once, so let's imagine a system that can handle up to 10 things that run concurrently. So we might want to use a semaphore to give out 10 tokens so up to 10 things can run at once; otherwise, it applies back pressure. So the code looks like this:

const semaphore = new Semaphore(10);

async function execute(f) {
  let token = await semaphore.acquire();
  try {
    await f();
  } finally {
    await semaphore.release(token);
  }
}

But now we have a problem. What if the function passed to the execute function returns neverSettle? Well, clearly we will never release the semaphore token. This is strictly worse compared to blocking functions! The closest equivalent would be a stupid function that calls a very long running sleep. But it's different! In one case, we keep the call stack and everything that relates to it alive; in the other case, we just have a promise that will eventually get garbage collected, and we will never see it again. In the promise case, we have effectively decided that the stack is not useful.

There are ways to fix this, like making promise finalization available so we can get informed if a promise gets garbage collected etc. However I want to point out that as per contract, what this promise is doing is completely acceptable and we have just caused a new problem, one that we did not have before.

And if you think Python does not have that problem, it does too. Just await Future() and you will be waiting until the heat death of the universe (or really when you shut down your interpreter).
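A small asyncio sketch of that situation, guarded with a timeout purely so the example terminates:

import asyncio

async def main():
    try:
        # A bare Future that nobody ever resolves: awaiting it directly would hang forever.
        await asyncio.wait_for(asyncio.Future(), timeout=0.1)
    except asyncio.TimeoutError:
        print("still pending - nobody ever called set_result()")

asyncio.run(main())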

The promise that sits there unresolved has no call stack. But that problem also comes back in other ways, even if you use it correctly. The decomposed functions calling functions via the scheduler flow means that now you need extra affordances to stitch these async calls together into full call stacks. This all creates extra problems that did not exist before. Call stacks are really, really important. They help with debugging and are also crucial for profiling.

Blocking is an Abstraction

Okay, so we know there is at least some challenge with the promise model. What other abstractions are there? I will make the argument that a function being able to “suspend” a thread of execution is a bloody great capability and abstraction. Think of it for a moment: no matter where I am, I can say I need to wait for something and continue later where I left off. This is particularly crucial for applying back pressure if you decide you need it later. The biggest footgun in Python asyncio remains that write is non-blocking. That function will stay problematic forever, and you need to follow up with await s.drain() to avoid buffer bloat.
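For readers unfamiliar with that asyncio pattern, here is a minimal sketch of the write/drain pairing (the function name and arguments are illustrative):

import asyncio

async def send_all(writer: asyncio.StreamWriter, chunks):
    for chunk in chunks:
        writer.write(chunk)   # non-blocking: the bytes only go into an internal buffer
        await writer.drain()  # suspends when the peer is slow, which is what applies back pressure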

In particular, it's an important abstraction because in the real world we are constantly faced with things that are in fact not async all the time, and some of the things we think might not block will in fact block. Just like Python did not think that write should be able to block when it was designed. I want to give you a colorful example of this. Why is the following code blocking, and what is it that blocks?

def decode_object(idx):
    header = indexes[idx]
    object_buf = buffer[header.start:header.start + header.size]
    return brotli.decompress(object_buf)

It's a bit of a trick question, but not really. The reason it's blocking is because memory access can be blocking! You might not think of it this way, but there are many reasons why just touching a memory region can take time. The most obvious one is memory-mapped files. If you're touching a page that hasn't been loaded yet, the operating system will have to shovel it into memory before returning back to you. There is no “await touching this memory” expression, because if there were, we would have to await everywhere. That might sound petty but blocking memory reads were at the source of a series of incidents at Sentry [1].

The trade-off that async/await makes today is the idea that not everything needs to block or suspend. The reality, however, has shown me that many more things really want to suspend, and if a random memory access is a case for suspending, then is the abstraction worth anything?

So maybe allowing any function call to block and suspend really was the right abstraction to begin with.

But then we need to talk about spawning threads next, because a single thread is not worth much. The one affordance that async/await systems give you that you don't have otherwise is actually telling two things to run concurrently. You get that by starting the async operation and deferring the awaiting to later. This is where I will have to concede that async/await has something going for it. It moves the reality of concurrent execution right into the language. The reason concurrency comes so naturally to a Scratch programmer is that it's right there, so async/await serves a very similar purpose here.

In a traditional imperative language based on threads, the act of spawning a thread is usually hidden behind an (often convoluted) standard library function. More annoyingly, threads very much feel bolted on and completely inadequate for even the most basic of operations. Because not only do we want to spawn threads, we want to join on them, we want to send values across thread boundaries (including errors!). We want to wait for either a task to be done, or a keyboard input, messages being passed, etc.

Classic Threading

So let's focus on threads for a second. As said before, what we are looking for is the ability for any function to yield / suspend. That's what threads allow us to do!

When I am talking about “threads” here, I'm not necessarily referring to a specific kind of implementation of threads. Think of the example of promises from above for a moment: we had the concept of “sleeping”, but we did not really say how that is implemented. There is clearly some underlying scheduler that can enable that, but how that takes place is outside the scope of the language. Threads can be like that. They could be real OS threads, they could be virtual and be implemented with fibers or coroutines. At the end of the day, we don't necessarily have to care about it as developers if the language gets it right.

The reason this matters is that when I talk about “suspending” or “continuing somewhere else,” immediately the thought of coroutines and fibers comes to mind. That's because many languages that support them give you those capabilities. But it's good to step back for a second and just think about the general affordances that we want, and not how they are implemented.

We need a way to say: run this concurrently, but don't wait for it to return, we want to wait later (or never!). Basically, the equivalent in some languages of calling an async function but not awaiting it. In other words: to schedule a function call. And that is, in essence, just what spawning a thread is. If we think about Scratch: one of the reasons concurrency comes naturally there is because it's really well integrated, and a core affordance of the language. There is a real programming language that works very much the same: Go, with its goroutines. There is syntax for it!
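In Python, the closest everyday equivalent to that affordance is spawning a thread, which really is just "schedule this function call and don't wait for it"; a minimal sketch:

import threading

def background_job():
    print("running concurrently")

t = threading.Thread(target=background_job)
t.start()   # the call is now scheduled to run concurrently
# ... do other work here ...
t.join()    # wait for it later (or never, if we don't care about the result)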

So now we can spawn, and that thing runs. But now we have more problems to solve: synchronization, waiting, message passing and all that jazz are not solved. Even Scratch has answers to that! So clearly there is something else missing to make this work. And what even does that spawn call return?

A Detour: What is Async Even

There is an irony in async/await, and that irony is that it exists in multiple languages, it looks completely the same on the surface, but it works completely differently under the hood. Not only that, the origin stories of async/await in different languages are not even the same.

I mentioned earlier that code that can arbitrarily block is an abstraction of sorts. That abstraction for many applications really only makes sense if the CPU time while you're blocking can be used in other useful ways. On the one hand, because the computer would be pretty bored if it was only doing things in sequence; on the other hand, because we might need things to run in parallel. At times, as programmers, we need two things to make progress simultaneously before we can continue. Enter creating more threads. But if threads are so great, why all that talking about coroutines and promises that underpin so much of async/await in different languages?

I think this is the point where the story actually becomes confusing quickly. For instance JavaScript has entirely different challenges than Python, C# or Rust. Yet somehow all those languages ended up with a form of async/await.

Let's start with JavaScript. JavaScript is a single-threaded language where a function scope cannot yield. There is no affordance in the language to do that, and threads do not exist. So before async/await, the best you could do was different forms of callback hell. The first iteration of improving that experience was adding promises; async/await only became sugar for that afterward. The reason that JavaScript did not have much choice here is that promises were the only thing that could be accomplished without language changes, and async/await is something that can be implemented as a transpilation step. So really: there are no threads in JavaScript. But here is an interesting thing that happens: JavaScript on the language level has the concept of concurrency. If you call setTimeout, you tell the runtime to schedule a function to be called later. This is crucial! In particular, it also means that a promise, once created, will be scheduled automatically. Even if you forget about it, it will run!

Python on the other hand had a completely different origin story. In the days before async/await, Python already had threads — real, operating system level threads. What it did not have, however, was the ability for multiple of those threads to run in parallel. The reason for this is obviously the GIL (Global Interpreter Lock). However, that “just” means things don't scale to more than one core, so let's ignore it for a second. Because it had threads, it also rather early on had people experiment with implementing virtual threads in Python. Back in the day (and to some extent today) the cost of an OS level thread was pretty high, so virtual threads were seen as a fast way to spawn more of these concurrent things. There were two ways in which Python got virtual threads. One was the Stackless Python project, which was an alternative implementation of Python (really a set of patches for cpython) that implemented what's called a “stackless VM” (basically a VM that does not maintain a C stack). In short, what that enabled is implementing something that Stackless called “tasklets”, which were functions that could be suspended and resumed. Stackless did not have a bright future because the stackless nature meant that you could not have interleaving Python -> C -> Python calls and suspend with them on the stack.

There was a second attempt in Python called “greenlet”. The way greenlet worked was implementing coroutines in a custom extension module. It is pretty gnarly in its implementation, but it does allow for cooperative multi tasking. However, like stackless, that did not win out. Instead, what actually happened is that the generator system that Python had for years was gradually upgraded into a coroutine system with syntax support, and the async system was built on top of that.

One of the consequences of this is that it requires syntax support to suspend from a coroutine. This means that you cannot implement a function like sleep that, when called, yields to the scheduler. You need to await it (or, in earlier times, use yield from). So we ended up with async/await because of how coroutines work in Python under the hood. The motivation for this was that it was seen as a positive thing that you know when something suspends.

One interesting consequence of the Python coroutine model is that, at least at the coroutine level, it can transcend OS level threads. I could make a coroutine on one thread, ship it off to another, and continue it there. In practice, that does not work because once a coroutine is hooked up with the IO system, it cannot travel to another event loop on another thread any more. But you can already see that fundamentally it does something quite different from JavaScript. It can travel between threads, at least in theory; there are threads; there is syntax to yield. A coroutine in Python will also start out not running, unlike in JavaScript where it's effectively always scheduled. This is also in part because the scheduler in Python can be swapped out, and there are competing and incompatible implementations.
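A tiny sketch of that difference in behavior: in Python, constructing the coroutine does nothing until something schedules it, whereas constructing a JavaScript promise already runs its body.

import asyncio

async def work():
    print("started")
    return 1

coro = work()               # nothing is printed yet; the coroutine has not started
print(asyncio.run(coro))    # only now does "started" appear, followed by 1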

Lastly let's talk about C#. Here the origin story is once again entirely different. C# has real threads. Not only does it have real threads, it also has per-object locks and absolutely no problems with dealing with multiple threads running in parallel. But that does not mean that it does not have other issues. The reality is that threads alone are just not enough. You need to synchronize and talk between threads quite often and sometimes you just need to wait. For instance you need to wait for user input. You still want to do something, while you're stuck there processing that input. So over time .NET introduced “tasks” which are an abstraction over async operations. They are part of the .NET threading system and the way you interact with them is that you write your code in there, you can suspend from tasks with syntax. .NET will run the task on the current thread, and if you do some blocking you stay blocked. This is in that sense, quite different from JavaScript where while no new “thread” is created, you pend the execution in the scheduler. The reason it works this way in .NET is that some of the motivation of this system was to allow UI triggered code to access the main UI thread without blocking it. But the consequence again is, that if you block for real, you just screwed something up. That however is also why at least at one point what C# did was just to splice functions into chained closures whenever it hit an await. It just decomposes one logical piece of code into many separate functions.

I really don't want to go into Rust, but Rust's async system is probably the weirdest of them all because it's polling-based. In short: unless you actively “wait” for a task to complete, it will not make progress. So the purpose of a scheduler there is to make sure that a task actually can make progress. Why did Rust end up with async/await? Primarily because they wanted something that works without a runtime and a scheduler, and because of the limitations of the borrow checker and memory model.

Of all those languages, I think the argument for async/await is the strongest for Rust and JavaScript. Rust because it's a systems language and they wanted a design that works with a limited runtime. JavaScript to me also makes sense because the language does not have real threads, so the only alternative to async/await is callbacks. But for C# the argument seems much weaker. Even the problem of having to force code to run on the UI thread could be solved by just having a scheduling policy for virtual threads. The worst offender here in my mind is Python. async/await has produced a really complex system where the language now has coroutines and real threads, different synchronization primitives for each, and async tasks that end up being pinned to one OS thread. The language even has different futures in the standard library for threads and async tasks!

The reason I wanted you to understand all this is that all these different languages share the same syntax, yet what you can do with it is completely different. What they all have in common is that async functions can only be called by async functions (or the scheduler).

What Async Isn't

Over the years I have heard a lot of arguments about why, for instance, Python ended up with async/await, and some of the arguments presented don't hold up to scrutiny from my perspective. One argument that I have heard repeatedly is that if you control when you suspend, you don't need to deal with locking or synchronization. While there is some truth to that (you don't randomly suspend), you still end up having to lock. There is still concurrency, so you still need to protect all your stuff. In Python this is particularly frustrating because not only do you have colored functions, you also have colored locks. There are locks for threads and there are locks for async code, and they are different.
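A minimal illustration of the two lock colors in Python's standard library:

import asyncio
import threading

thread_lock = threading.Lock()   # protects code running on threads
async_lock = asyncio.Lock()      # protects coroutines; acquiring it requires async syntax

def thread_worker(items):
    with thread_lock:            # blocks the OS thread if contended
        items.append(1)

async def async_worker(items):
    async with async_lock:       # suspends the task if contended
        items.append(1)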

There is a very good reason why I showed the example above of the semaphore: semaphores are real in async programming. They are very often needed to protect a system from taking on too much work. In fact, one of the core challenges that many async/await-based programs suffer from is bloating buffers because there is an inability to exert back pressure (I once again point you to my post on that). Why can they not? Because unless an API is async, it is forced to buffer or fail. What it cannot do, is block.

Async also does not magically solve the issues with the GIL in Python. It does not magically make real threads appear in JavaScript, and it does not solve issues when random code starts blocking (and remember, even memory access can block), or when you very slowly calculate a large Fibonacci number.

Threads are the Answer, Not Coroutines

I already alluded to this above a few times, but when we think about being able to “suspend” from an arbitrary point in time, we often immediately think of coroutines as programmers. For good reasons: coroutines are amazing, they are fun, and every programming language should have them!

Coroutines are an important building block, and if any future language designer is looking at this post: you should put them in.

But coroutines should be very lightweight, and they can be abused in ways that make it very hard to follow what's going on. Lua, for instance, gives you coroutines, but it does not give you the necessary structure to do something with them easily. You will end up building your own scheduler, your own threading system, etc.

So what we really want is where we started out with: threads! Good old threads!

The irony in all of this is that the language that I think actually got this right is modern Java. Project Loom in Java has coroutines and all the bells and whistles under the hood, but what it exposes to the developer is good old threads. There are virtual threads, which are mounted on carrier OS threads, and these virtual threads can travel from thread to thread. If you end up issuing a blocking call on a virtual thread, it yields to the scheduler.

Now I happen to think that threads alone are not good enough! Threads require synchronization, they require communication primitives etc. Scratch has message passing! So there is more that needs to be built to make them work well.

I want to follow up with another blog post about what is needed to make threads easier to work with. Because what async/await clearly innovated is bringing some of these core capabilities closer to the user of the language, and often modern async/await code looks easier to read than traditional code using threads.

Structured Concurrency and Channels

Lastly I do want to say something nice about async/await and celebrate the innovations that it has brought up. I believe that this language feature singlehandedly drove some crucial innovation about concurrent programming by making it widely accessible. In particular it moved many developers from a basic “single thread per request” model to breaking down tasks into smaller chunks, even in languages like Python. For me, the biggest innovation here goes to Trio, which introduced the concept of structured concurrency via its nursery. That concept has eventually found a home even in asyncio with the concept of the TaskGroup API and is finding its way into Java.

I recommend you to read Nathaniel J. Smith's Notes on structured concurrency, or: Go statement considered harmful for a much better introduction. However if you are unfamiliar with it, here is my attempt of explaining it:

  • There is a clear start and end of work: every thread or task has a clear beginning and end, which makes it easier to follow what each thread is doing. All threads spawned in the context of a thread are known to that thread. Think of it like creating a small team to work on a task: they start together, finish together, and then report back.
  • Threads don't outlive their parent: if for whatever reason the parent is done before the child threads, it automatically awaits them before returning.
  • Errors propagate and cause cancellations: if something goes wrong in one thread, the error is passed back to the parent. But more importantly, it also automatically causes the other child threads to cancel. Cancellations are a core part of the system!

I believe that structured concurrency needs to become a thing in a threaded world. Threads must know their parents and children. Threads also need to find convenient ways to pass their success values back. Lastly, context should flow from thread to thread implicitly through context locals.
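As a concrete reference point, here is a minimal asyncio.TaskGroup sketch (Python 3.11+; the fetch function and its failure are made up for illustration) showing the properties listed above: children end with the parent, and one failure cancels the siblings.

import asyncio

async def fetch(i):
    await asyncio.sleep(0.01)
    if i == 3:
        raise ValueError("boom")
    return i

async def main():
    try:
        async with asyncio.TaskGroup() as tg:
            for i in range(5):
                tg.create_task(fetch(i))
        # leaving the block implicitly awaits every child
    except* ValueError:
        print("one child failed; its siblings were cancelled")

asyncio.run(main())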

The second part is that async/await made it much more apparent that tasks / threads need to talk with each other. In particular the concept of channels and selecting on channels became more prevalent. This is an essential building block which I think can be further improved upon. As food for thought: if you have structured concurrency, in principle each thread's return value really can be represented as a buffered channel attached to the thread, holding up to a single value (successful return value or error) that you can select on.
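As food-for-thought code, here is a hedged sketch of that idea with plain threads: each spawned function gets a one-slot queue.Queue acting as its result channel, carrying either a success value or an error.

import queue
import threading

def spawn(fn, *args):
    result_chan = queue.Queue(maxsize=1)   # the thread's "return value channel"

    def runner():
        try:
            result_chan.put(("ok", fn(*args)))
        except Exception as exc:           # errors travel on the same channel
            result_chan.put(("err", exc))

    threading.Thread(target=runner).start()
    return result_chan

chan = spawn(sum, [1, 2, 3])
status, value = chan.get()                 # receive the result whenever we choose
print(status, value)                       # ok 6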

Today, although no language has perfected this model, thanks to many years of experimentation, the solution seems clearer than ever, with structured concurrency at its core.

Conclusion

I hope I was able to demonstrate to you that async/await has been a mixed bag. It brought some relief from callback hell, but it also saddled us with new issues like colored functions and new back-pressure challenges, and it introduced entirely new problems, such as promises that can just sit around forever without resolving. It has also taken away a lot of the utility that call stacks brought, in particular for debugging and profiling. These aren't minor hiccups; they're real obstacles that get in the way of the straightforward, intuitive concurrency we should be aiming for.

If we take a step back, it seems pretty clear to me that we have veered off course by adopting async/await in languages that have real threads. Innovations like Java's Project Loom feel like the right fit here. Virtual threads can yield when they need to, switch contexts when blocked, and even work with message-passing systems that make concurrency feel natural. If we free ourselves from the idea that the functional, promise-based system has figured out all the problems, we can look at threads properly again.

However, at the same time, async/await has moved concurrent programming to the forefront and has resulted in real innovation. Making concurrency a core feature of the language (via syntax, even!) is a good thing. Maybe the increased adoption, and people struggling with it, was what made structured concurrency a real thing in the Python async/await world.

Future language design should rethink concurrency once more: Instead of adopting async/await, new languages should model themselves more like Java's Project Loom but with more user friendly primitives. But like Scratch, it should give programmers really good APIs that make concurrency natural. I don't think actor frameworks are the right fit, but a combination of structured concurrency, channels, syntax support for spawning/joining/selecting will go a long way. Watch this space for a future blog post about some things I found to work better than others.

[1] Sentry works with large debug information files such as PDB or DWARF. These files can be gigabytes in size, and we memory map terabytes of preprocessed files into memory during processing. That memory mapped files can block is hardly a surprise, but what we learned in the process is that, thanks to containerization and memory limits, you can easily navigate yourself into a situation where you spend much more time on page faults than you expected and the system crawls to a halt.
Categories: FLOSS Project Planets

Django Weblog: 2025 DSF Board Election Results

Planet Python - Sun, 2024-11-17 18:56

The 2025 DSF Board Election has closed, and the following candidates have been elected:

  • Abigail Gbadago
  • Jeff Triplett
  • Paolo Melchiorre
  • Tom Carrick

They will each serve a two-year term.

Directors elected for the 2024 DSF Board, Jacob, Sarah, and Thibaud, are continuing with one year left to serve on the board.

Therefore, the combined 2025 DSF Board of Directors are:

  • Jacob Kaplan-Moss
  • Sarah Abderemane
  • Thibaud Colas
  • Abigail Gbadago*
  • Jeff Triplett*
  • Paolo Melchiorre*
  • Tom Carrick*

* Elected to a two (2) year term

Congratulations to our winners, and a huge thank you to our departing board members Çağıl Uluşahin Sonmez, Chaim Kirby, Kátia Yoshime Nakamura, Katie McLaughlin.

Thank you again to everyone who nominated themselves. Even if you were not successful, you gave our community the chance to make their voices heard in who they wanted to represent them.

Categories: FLOSS Project Planets

Go Deh: There's the easy way...

Planet Python - Sun, 2024-11-17 18:44

 

Best seen on a larger than landscape phone

Someone blogged about a particular problem:

From: https://theweeklychallenge.org/blog/perl-weekly-challenge-294/#TASK1
Given an unsorted array of integers, `ints`, write a script to return the length of the longest consecutive elements sequence. Return -1 if none found. *The algorithm must run in O(n) time.*

The solution they blogged used a sort which meant it could not be O(n) in time, but the problem looked good so I gave it some thought.

Sets! Set lookups are O(1) in Python, and sets are good for looking things up.

What if, when looking at the inputted numbers one at a time, you also looked for other ints in the input that would extend the int you have to form a longer range? Keep tabs on the longest range so far, and if you remove ints from the pool as they form ranges, then when the pool is empty, you should know the longest range.

I added the printout of the longest range too.

My code

def consec_seq(ints) -> tuple[int, int, int]:
    "Extract longest_seq_length, its_min, its_max"
    pool = set(ints)
    longest, longest_mn, longest_mx = 0, 1, 0
    while pool:
        this = start = pool.pop()
        ln = 1
        # check down
        while (this := (this - 1)) in pool:
            ln += 1
            pool.remove(this)
        mn = this + 1
        # check up
        this = start
        while (this := (this + 1)) in pool:
            ln += 1
            pool.remove(this)
        mx = this - 1
        # check longest
        if ln > longest:
            longest, longest_mn, longest_mx = ln, mn, mx

    return longest, longest_mn, longest_mx
def _test():
    for ints in [(),
            (69,),
            (-20, 78, 79, 1, 100),
            (10, 4, 20, 1, 3, 2),
            (0, 6, 1, 8, 5, 2, 4, 3, 0, 7),
            (10, 30, 20),
            (2,4,3,1,0, 10,12,11,8,9),     # two runs of five
            (10,12,11,8,9, 2,4,3,1,0),     # two runs of five - reversed
            (2,4,3,1,0,-1, 10,12,11,8,9),  # runs of 6 and 5
            (2,4,3,1,0, 10,12,11,8,9,7),   # runs of 5 and 6
            ]:
        print(f"Input {ints = }")
        longest, longest_mn, longest_mx = consec_seq(ints)

        if longest < 2:
            print("  -1")
        else:
            print(f"  The/A longest sequence has {longest} elements {longest_mn}..{longest_mx}")

# %%
if __name__ == '__main__':
    _test()

Sample output

Input ints = ()
  -1
Input ints = (69,)
  -1
Input ints = (-20, 78, 79, 1, 100)
  The/A longest sequence has 2 elements 78..79
Input ints = (10, 4, 20, 1, 3, 2)
  The/A longest sequence has 4 elements 1..4
Input ints = (0, 6, 1, 8, 5, 2, 4, 3, 0, 7)
  The/A longest sequence has 9 elements 0..8
Input ints = (10, 30, 20)
  -1
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9)
  The/A longest sequence has 5 elements 0..4
Input ints = (10, 12, 11, 8, 9, 2, 4, 3, 1, 0)
  The/A longest sequence has 5 elements 0..4
Input ints = (2, 4, 3, 1, 0, -1, 10, 12, 11, 8, 9)
  The/A longest sequence has 6 elements -1..4
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9, 7)
  The/A longest sequence has 6 elements 7..12

Another Algorithm

What if you kept and extended ranges until you had amassed all the ranges, then chose the longest? I need to keep the hash lookup. dict key lookup should also be O(1). What to look up? Look up ints that would extend a range!

If you have an existing (integer) range, say 1..3 inclusive of end points, then finding 0 would extend the range to 0..3, or finding one more than the range maximum, 4, would extend the original range to 1..4.

So if you have ranges, then they could be extended by finding rangemin - 1 or rangemax + 1. I call them extends.

If you do find that the next int from the input ints is also an extends value, then you need to find the range that it extends (by lookup), so you can modify that range. Use a dict to map extends to their ranges; checking if an int is in the extends dict keys should also take O(1) time.

I took that sketch of an algorithm and started to code. It took two evenings to finally get something that worked, and I had to work out several details that were trying. The main problem was: what about coalescing ranges? If you have ranges 1..2 and 4..5, what happens when you see a 3? The result is the single range 1..5. It took particular test cases and extensive debugging to work out that the extends2ranges mapping should map to potentially more than one range, and that you need to combine ranges if two of them are present for any extend value being hit.


So for 1..2 the extends being looked for are 0 and 3. For 4..5 the extends being looked for are 3, again, and 6. The extends2ranges data structure for just this should look like:

{0: [[1, 2]], 3: [[1, 2], [4, 5]], 6: [[4, 5]]}

The Code #2

from collections import defaultdict

def combine_ranges(min1, max1, min2, max2):
    "Combine two overlapping ranges, return the new range as [min, max], and a set of limits unused in the result"
    assert (min1 <= max1 and min2 <= max2           # Well formed
            and (min1 <= max2 and min2 <= max1))    # and ranges touch or overlap
    range_limits = set([min1, max1, min2, max2])
    new_mnmx = [min(range_limits), max(range_limits)]
    unused_limits = range_limits - set(new_mnmx)

    return new_mnmx, unused_limits
def consec_seq2(ints) -> tuple[int, int, int]:
    "Extract longest_seq_length, its_min, its_max"
    if not ints:
        return -1, 1, -1
    seen = set()  # numbers seen so far
    extends2ranges = defaultdict(list)  # map extends to its ranges
    for this in ints:
        if this in seen:
            continue
        else:
            seen.add(this)

        if this not in extends2ranges:
            # Start new range
            mnmx = [this, this]    # Range of one int
            # add in the extend points
            extends2ranges[this + 1].append(mnmx)
            extends2ranges[this - 1].append(mnmx)
        else:
            # Extend an existing range
            ranges = extends2ranges[this]  # The range(s) that could be extended by this
            if len(ranges) == 2:
                # this joins the two ranges
                extend_and_join_ranges(extends2ranges, this, ranges)
            else:
                # extend one range, copied
                extend_and_join_ranges(extends2ranges, this, [ranges[0], ranges[0].copy()])

    all_ranges = sum(extends2ranges.values(), start=[])
    longest_mn, longest_mx = max(all_ranges, key=lambda mnmx: mnmx[1] - mnmx[0])

    return (longest_mx - longest_mn + 1), longest_mn, longest_mx
def extend_and_join_ranges(extends2ranges, this, ranges):
    mnmx, mnmx2 = ranges
    mnmx_orig, mnmx2_orig = mnmx.copy(), mnmx2.copy()  # keep copy of originals
    mn, mx = mnmx
    mn2, mx2 = mnmx2
    if this == mn - 1:
        mnmx[0] = mn = this    # Extend lower limit of the range
    if this == mn2 - 1:
        mnmx2[0] = mn2 = this  # Extend lower limit of the range
    if this == mx + 1:
        mnmx[1] = mx = this    # Extend upper limit of the range
    if this == mx2 + 1:
        mnmx2[1] = mx2 = this  # Extend upper limit of the range
    new_mnmx, _unused_limits = combine_ranges(mn, mx, mn2, mx2)

    remove_merged_from_extends(extends2ranges, this, mnmx, mnmx2)
    add_combined_range_to_extends(extends2ranges, new_mnmx)

def add_combined_range_to_extends(extends2ranges, new_mnmx):
    "Add in the combined of two ranges's extends"
    new_mn, new_mx = new_mnmx
    for extend in (new_mn - 1, new_mx + 1):
        r = extends2ranges[extend]  # ranges at new limit extension
        if new_mnmx not in r:
            r.append(new_mnmx)
def remove_merged_from_extends(extends2ranges, this, mnmx, mnmx2):
    "Remove original ranges that were merged from extends"
    for lohi in (mnmx, mnmx2):
        lo, hi = lohi
        for extend in (lo - 1, hi + 1):
            if extend in extends2ranges:
                r = extends2ranges[extend]
                for r_old in (mnmx, mnmx2):
                    if r_old in r:
                        r.remove(r_old)
                if not r:
                    del extends2ranges[extend]
    # remove joining extend, this
    del extends2ranges[this]

def _test():
    for ints in [
            (),
            (69,),
            (-20, 78, 79, 1, 100),
            (4, 1, 3, 2),
            (10, 4, 20, 1, 3, 2),
            (0, 6, 1, 8, 5, 2, 4, 3, 0, 7),
            (10, 30, 20),
            (2,4,3,1,0, 10,12,11,8,9),     # two runs of five
            (10,12,11,8,9, 2,4,3,1,0),     # two runs of five - reversed
            (2,4,3,1,0,-1, 10,12,11,8,9),  # runs of 6 and 5
            (2,4,3,1,0, 10,12,11,8,9,7),   # runs of 5 and 6
            ]:
        print(f"Input {ints = }")
        longest, longest_mn, longest_mx = consec_seq2(ints)

        if longest < 2:
            print("  -1")
        else:
            print(f"  The/A longest sequence has {longest} elements {longest_mn}..{longest_mx}")

# %%
if __name__ == '__main__':
    _test()


Its Output
Input ints = ()
  -1
Input ints = (69,)
  -1
Input ints = (-20, 78, 79, 1, 100)
  The/A longest sequence has 2 elements 78..79
Input ints = (4, 1, 3, 2)
  The/A longest sequence has 4 elements 1..4
Input ints = (10, 4, 20, 1, 3, 2)
  The/A longest sequence has 4 elements 1..4
Input ints = (0, 6, 1, 8, 5, 2, 4, 3, 0, 7)
  The/A longest sequence has 9 elements 0..8
Input ints = (10, 30, 20)
  -1
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9)
  The/A longest sequence has 5 elements 0..4
Input ints = (10, 12, 11, 8, 9, 2, 4, 3, 1, 0)
  The/A longest sequence has 5 elements 8..12
Input ints = (2, 4, 3, 1, 0, -1, 10, 12, 11, 8, 9)
  The/A longest sequence has 6 elements -1..4
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9, 7)
  The/A longest sequence has 6 elements 7..12


This second algorithm gives correct results but is harder to develop and explain. It's a testament to my stubbornness, as I thought there was a solution there, and debugging was me flexing my skills to keep them honed.


END.


Categories: FLOSS Project Planets

Paolo Melchiorre: Thoughts on my election as a DSF board member

Planet Python - Sun, 2024-11-17 18:00

My thoughts on my election as a member of the Django Software Foundation (DSF) board of directors.

Categories: FLOSS Project Planets

Real Python: Using the Python zip() Function for Parallel Iteration

Planet Python - Sun, 2024-11-17 09:00

Python’s zip() function combines elements from multiple iterables. Calling zip() generates an iterator that yields tuples, each containing elements from the input iterables. This function is essential for tasks like parallel iteration and dictionary creation, offering an efficient way to handle multiple sequences in Python programming.

By the end of this tutorial, you’ll understand that:

  • zip() in Python aggregates elements from multiple iterables into tuples, facilitating parallel iteration.
  • dict(zip()) creates dictionaries by pairing keys and values from two sequences.
  • zip() is lazy in Python, meaning it returns an iterator instead of a list.
  • There’s no unzip() function in Python, but the same zip() function can reverse the process using the unpacking operator *.
  • Alternatives to zip() include itertools.zip_longest() for handling iterables of unequal lengths.

In this tutorial, you’ll explore how to use zip() for parallel iteration. You’ll also learn how to handle iterables of unequal lengths and discover the convenience of using zip() with dictionaries. Whether you’re working with lists, tuples, or other data structures, understanding zip() will enhance your coding skills and streamline your Python projects.
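Here's a minimal REPL preview of those behaviors before diving in:

>>> from itertools import zip_longest

>>> dict(zip(["name", "job"], ["Ada", "engineer"]))
{'name': 'Ada', 'job': 'engineer'}

>>> pairs = [(1, "a"), (2, "b"), (3, "c")]
>>> numbers, letters = zip(*pairs)   # "unzipping" with the unpacking operator
>>> numbers, letters
((1, 2, 3), ('a', 'b', 'c'))

>>> list(zip_longest([1, 2, 3], "ab", fillvalue=None))
[(1, 'a'), (2, 'b'), (3, None)]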

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you’ll need to take your Python skills to the next level.

Understanding the Python zip() Function

zip() is available in the built-in namespace. If you use dir() to inspect __builtins__, then you’ll see zip() at the end of the list:

>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', ..., 'zip']

You can see that 'zip' is the last entry in the list of available objects.

According to the official documentation, Python’s zip() function behaves as follows:

Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator. (Source)

You’ll unpack this definition throughout the rest of the tutorial. As you work through the code examples, you’ll see that Python zip operations work just like the physical zipper on a bag or pair of jeans. Interlocking pairs of teeth on both sides of the zipper are pulled together to close an opening. In fact, this visual analogy is perfect for understanding zip(), since the function was named after physical zippers!

Using zip() in Python

The signature of Python’s zip() function is zip(*iterables, strict=False). You’ll learn more about strict later. The function takes in iterables as arguments and returns an iterator. This iterator generates a series of tuples containing elements from each iterable. zip() can accept any type of iterable, such as files, lists, tuples, dictionaries, sets, and so on.

Passing n Arguments

If you use zip() with n arguments, then the function will return an iterator that generates tuples of length n. To see this in action, take a look at the following code block:

>>> numbers = [1, 2, 3]
>>> letters = ["a", "b", "c"]
>>> zipped = zip(numbers, letters)
>>> zipped  # Holds an iterator object
<zip object at 0x7fa4831153c8>
>>> type(zipped)
<class 'zip'>
>>> list(zipped)
[(1, 'a'), (2, 'b'), (3, 'c')]

Here, you use zip(numbers, letters) to create an iterator that produces tuples of the form (x, y). In this case, the x values are taken from numbers and the y values are taken from letters. Notice how the Python zip() function returns an iterator. To retrieve the final list object, you need to use list() to consume the iterator.

If you’re working with sequences like lists, tuples, or strings, then your iterables are guaranteed to be evaluated from left to right. This means that the resulting list of tuples will take the form [(numbers[0], letters[0]), (numbers[1], letters[1]),..., (numbers[n], letters[n])]. However, for other types of iterables (like sets), you might see some weird results:

>>> s1 = {2, 3, 1}
>>> s2 = {"b", "a", "c"}
>>> list(zip(s1, s2))
[(1, 'a'), (2, 'c'), (3, 'b')]

In this example, s1 and s2 are set objects, which don’t keep their elements in any particular order. This means that the tuples returned by zip() will have elements that are paired up randomly. If you’re going to use the Python zip() function with unordered iterables like sets, then this is something to keep in mind.

Passing No Arguments

You can call zip() with no arguments as well. In this case, you’ll simply get an empty iterator:

>>> zipped = zip()
>>> zipped
<zip object at 0x7f196294a488>
>>> list(zipped)
[]

Here, you call zip() with no arguments, so your zipped variable holds an empty iterator. If you consume the iterator with list(), then you’ll see an empty list as well.

Read the full article at https://realpython.com/python-zip-function/ »


Categories: FLOSS Project Planets

Russ Allbery: Review: Dark Deeds

Planet Debian - Sun, 2024-11-17 00:55

Review: Dark Deeds, by Michelle Diener

Series: Class 5 #2
Publisher: Eclipse
Copyright: January 2016
ISBN: 0-6454658-4-4
Format: Kindle
Pages: 340

Dark Deeds is the second book of the self-published Class 5 science fiction romance series. It is a sequel to Dark Horse and will spoil the plot of that book, but it follows the romance series convention of switching to a new protagonist in the same universe and telling a loosely-connected story.

Fiona, like Rose in the previous book, was kidnapped by the Tecran in one of their Class 5 ships, although that's not entirely obvious at the start of the story. The book opens with her working as a slave on a Garmman trading ship while its captain works up the nerve to have her killed. She's spared this fate when the ship is raided by Krik pirates. Some brave fast-talking, and a touch of honor among thieves, lets her survive the raid and be rescued by a pursuing Grih battleship, with a useful electronic gadget as a bonus.

The author uses the nickname "Fee" for Fiona throughout this book and it was like nails on a chalkboard every time. I had to complain about that before getting into the review.

If you've read Dark Horse, you know the formula: lone kidnapped human woman, major violations of the laws against mistreatment of sentient beings that have the Grih furious on her behalf, hunky Grih starship captain who looks like a space elf, all the Grih are fascinated by her musical voice, she makes friends with a secret AI... Diener found a formula that worked well enough that she tried it again, and it would not surprise me if the formula repeated through the series. You should not go into this book expecting to be surprised.

That said, the formula did work the first time, and it largely does work again. I thoroughly enjoyed Dark Horse and wanted more, and this is more, delivered on cue. There are worse things, particularly if you're a Kindle Unlimited reader (I am not) and are therefore getting new installments for free. The Tecran fascination with kidnapping human women is explained sufficiently in Fiona's case, but I am mildly curious how Diener will keep justifying it through the rest of the series. (Maybe the formula will change, but I doubt it.)

To give Diener credit, this is not a straight repeat of the first book. Fiona is similar to Rose but not identical; Rose had an unshakable ethical calm, and Fiona is more of a scrapper. The Grih are not stupid and, given the amount of chaos Rose unleashed in the previous book, treat the sudden appearance of another human woman with a great deal more caution and suspicion. Unfortunately, this also means far less of my favorite plot element of the first book: the Grih being constantly scandalized and furious at behavior the protagonist finds sadly unsurprising.

Instead, this book has quite a bit more action. Dark Horse was mostly character interactions and tense negotiations, with most of the action saved for the end. Dark Deeds replaces a lot of the character work with political plots and infiltrating secret military bases and enemy ships. The AI (named Eazi this time) doesn't show up until well into the book and isn't as much of a presence as Sazo. Instead, there's a lot more of Fiona being drafted into other people's fights, which is entertaining enough while it's happening but which wasn't as delightful or memorable as Rose's story.

The writing continues to be serviceable but not great. It's a bit cliched and a bit awkward.

Also, Diener uses paragraph breaks for emphasis.

It's hard to stop noticing it once you see it.

Thankfully, once the story gets going and there's more dialogue, she tones that down, or perhaps I stopped noticing. It's that kind of book (and that kind of series): it's a bit rough to get started, but then there's always something happening, the characters involve a whole lot of wish-fulfillment but are still people I like reading about, and it's the sort of unapologetic "good guys win" type of light science fiction that is just the thing when one simply wants to be entertained. Once I get into the book, it's easy to overlook its shortcomings.

I spent Dark Horse knowing roughly what would happen but wondering about the details. I spent Dark Deeds fairly sure of the details and wondering when they would happen. This wasn't as fun of an experience, but the details were still enjoyable and I don't regret reading it. I am hoping that the next book will be more of a twist, or will have a character more like Rose (or at least a character with a better nickname). Sort of recommended if you liked Dark Horse and really want more of the same.

Followed by Dark Minds, which I have already purchased.

Rating: 6 out of 10

Categories: FLOSS Project Planets

Test and Code: 223: Writing Stuff Down is a Super Power

Planet Python - Sat, 2024-11-16 20:55

Taking notes well can help to listen better, remember things, show respect, be more accountable, free up mind space to solve problems.

This episode discusses

  • the benefits of writing things down
  • preparing for a meeting
  • taking notes in meetings
  • reviewing notes for action items, todo items, things to follow up on, etc.
  • taking notes to allow for better focus
  • writing well structured emails
  • writing blog posts and books

 Learn pytest

  • pytest is the number one test framework for Python.
  • Learn the basics super fast with Hello, pytest! (https://courses.pythontest.com/hello-pytest)
  • Then later you can become a pytest expert with The Complete pytest Course (https://courses.pythontest.com/the-complete-pytest-course)
  • Both courses are at courses.pythontest.com
Categories: FLOSS Project Planets

Real Python: Using the len() Function in Python

Planet Python - Sat, 2024-11-16 09:00

The len() function in Python is a powerful and efficient tool used to determine the number of items in objects, such as sequences or collections. You can use len() with various data types, including strings, lists, dictionaries, and third-party types like NumPy arrays and pandas DataFrames. Understanding how len() works with different data types helps you write more efficient and concise Python code.

Using len() in Python is straightforward for built-in types, but you can extend it to your custom classes by implementing the .__len__() method. This allows you to customize what length means for your objects. For example, with pandas DataFrames, len() returns the number of rows. Mastering len() not only enhances your grasp of Python’s data structures but also empowers you to craft more robust and adaptable programs.
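As a minimal sketch of that idea (the Playlist class and its attribute names are invented for this example), a class only needs a .__len__() method that returns a non-negative integer for len() to work on its instances:

class Playlist:
    """A tiny container whose length is the number of queued tracks."""

    def __init__(self, tracks=None):
        # Keep the tracks in a plain list internally.
        self._tracks = list(tracks or [])

    def add(self, track):
        self._tracks.append(track)

    def __len__(self):
        # len(playlist) delegates to this method.
        return len(self._tracks)

playlist = Playlist(["intro.mp3", "main-theme.mp3"])
print(len(playlist))  # 2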

By the end of this tutorial, you’ll understand that:

  • The len() function in Python returns the number of items in an object, such as strings, lists, or dictionaries.
  • To get the length of a string in Python, you use len() with the string as an argument, like len("example").
  • To find the length of a list in Python, you pass the list to len(), like len([1, 2, 3]).
  • The len() function operates in constant time, O(1), as it accesses a length attribute in most cases.

In this tutorial, you’ll learn when to use the len() Python function and how to use it effectively. You’ll discover which built-in data types are valid arguments for len() and which ones you can’t use. You’ll also learn how to use len() with third-party types like ndarray in NumPy and DataFrame in pandas, and with your own classes.
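For instance, assuming NumPy and pandas are installed (the array shape and column values below are just for illustration), a quick sketch of that third-party behavior looks like this: len() reports the size of an array's first axis and the number of rows of a DataFrame.

import numpy as np
import pandas as pd

arr = np.zeros((3, 4))  # 3 rows, 4 columns
print(len(arr))         # 3, the size of the first axis

df = pd.DataFrame({"name": ["Ada", "Guido", "Grace"]})
print(len(df))          # 3, the number of rows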

Free Bonus: Click here to get a Python Cheat Sheet and learn the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.

Getting Started With Python’s len()

The function len() is one of Python’s built-in functions. It returns the length of an object. For example, it can return the number of items in a list. You can use the function with many different data types. However, not all data types are valid arguments for len().

You can start by looking at the help for this function:

>>> help(len)
Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.

The function takes an object as an argument and returns the length of that object. The documentation for len() goes a bit further:

Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set). (Source)
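To also illustrate the collection half of that list (the particular values below are just for demonstration), len() works the same way on dictionaries, sets, and frozen sets:

>>> len({"a": 1, "b": 2})
2
>>> len({1, 2, 3, 4})
4
>>> len(frozenset([1, 2]))
2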

When you use built-in data types and many third-party types with len(), the function doesn’t need to iterate through the data structure. The length of a container object is stored as an attribute of the object. The value of this attribute is modified each time items are added to or removed from the data structure, and len() returns the value of the length attribute. This ensures that len() works efficiently.
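As a rough sanity check of that constant-time behavior (not part of the original article), you could time len() on a large and a small list and confirm that both calls take roughly the same amount of time:

import timeit

big = list(range(1_000_000))
small = [1, 2, 3]

# len() just reads the stored length, so the list size shouldn't matter.
print(timeit.timeit(lambda: len(big), number=100_000))
print(timeit.timeit(lambda: len(small), number=100_000))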

In the following sections, you’ll learn about how to use len() with sequences and collections. You’ll also learn about some data types that you cannot use as arguments for the len() Python function.

Using len() With Built-in Sequences

A sequence is a container with ordered items. Lists, tuples, and strings are three of the basic built-in sequences in Python. You can find the length of a sequence by calling len():

>>> greeting = "Good Day!"
>>> len(greeting)
9
>>> office_days = ["Tuesday", "Thursday", "Friday"]
>>> len(office_days)
3
>>> london_coordinates = (51.50722, -0.1275)
>>> len(london_coordinates)
2

When finding the length of the string greeting, the list office_days, and the tuple london_coordinates, you use len() in the same manner. All three data types are valid arguments for len().

The function len() always returns an integer as it’s counting the number of items in the object that you pass to it. The function returns 0 if the argument is an empty sequence:

Python >>> len("") 0 >>> len([]) 0 >>> len(()) 0 Copied!

In the examples above, you find the length of an empty string, an empty list, and an empty tuple. The function returns 0 in each case.

A range object is also a sequence that you can create using range(). A range object doesn’t store all the values but generates them when they’re needed. However, you can still find the length of a range object using len():

>>> len(range(1, 20, 2))
10

This range of numbers includes the integers from 1 to 19 with increments of 2. The length of a range object can be determined from the start, stop, and step values.
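For a positive step, that calculation can be sketched in pure Python like this (a back-of-the-envelope reconstruction, not the actual CPython implementation):

def range_length(start, stop, step):
    # Number of values in range(start, stop, step), assuming step > 0.
    if stop <= start:
        return 0
    return (stop - start + step - 1) // step

print(range_length(1, 20, 2))  # 10
print(len(range(1, 20, 2)))    # 10, the same result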

In this section, you’ve used the len() Python function with strings, lists, tuples, and range objects. However, you can also use the function with any other built-in sequence.

Read the full article at https://realpython.com/len-python-function/ »


Categories: FLOSS Project Planets

Testing application first use experience

Planet KDE - Sat, 2024-11-16 05:15

When working on an application, it’s not uncommon to be testing with your own configuration and data, and often in more of a power-user setup of that application. While that has advantages, it’s easy to lose sight of how the application looks and behaves when first opened in a clean environment.

Testing in a clean environment

Testing the first use experience is technically easy: you just have to delete the entire application state and configuration, or create a new user account. However, that’s very cumbersome and thus won’t be done regularly.

Fortunately there are more convenient and less invasive shortcuts.

Isolated XDG environment

For many applications we get very far already by separating the XDG directories. That includes configuration files, application data and state as well as cached data.

This means creating four new directories and pointing each of the following environment variables to one of them:

  • XDG_CACHE_HOME (cached data)
  • XDG_CONFIG_HOME (configuration files)
  • XDG_DATA_HOME (application data)
  • XDG_STATE_HOME (application state)

Running an application in such an environment will make it not see any of its existing state and configuration (without destroying that). That is, as long as the entire state and configuration is actually stored in those locations.

A somewhat common exception is credential storage in a platform service like Secret Service or KWallet. Those won’t be isolated, and depending on the application you might not get a clean first-use state, or you might risk damaging the existing state.

Multi-instance Akonadi

Other services used by an application might need special attention as well. A particularly complex one in this context is Akonadi, as it contains a lot of configuration, state and data.

Fortunately Akonadi has built-in support for running multiple isolated instances for exactly that reason. All we need is to set the AKONADI_INSTANCE environment variable to a unique name and we get our own separate instance.

Automation

Given the above building blocks we can create a little wrapper script that launches a given application in a clean ephemeral environment:

import os
import subprocess
import sys
import tempfile

# Create a throwaway directory tree for the isolated XDG locations.
xdgHome = tempfile.TemporaryDirectory(prefix='testing-')
for d in ['CACHE', 'CONFIG', 'DATA', 'STATE']:
    os.mkdir(os.path.join(xdgHome.name, d))
    os.environ[f"XDG_{d}_HOME"] = os.path.join(xdgHome.name, d)

# Use a separate, named Akonadi instance as well.
os.environ['AKONADI_INSTANCE'] = 'testing'

# Run the command given on the command line inside the isolated environment.
subprocess.call(sys.argv[1:])

# Shut down the testing Akonadi instance and clean up the temporary data.
subprocess.call(['akonadictl', '--instance', 'testing', 'stop', '--wait'])
xdgHome.cleanup()
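If you save the script as, say, clean-env.py (the file name is just an example), launching an application in a throwaway environment becomes a single command such as python3 clean-env.py itinerary, and everything the application wrote is deleted again once it exits.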

I’ve been using this on Itinerary for some time, and it became additionally useful with the introduction of Appium-based UI tests, as those run in a similarly isolated environment.

If you need something slightly longer-lived, launching a shell with this wrapper is also possible. In that shell you can then launch your application multiple times, e.g. for testing whether changes are persisted correctly.

Limitations

There’s one unsolved issue with how this isolates applications though: D-Bus. Applications claiming a unique D-Bus service name won’t be able to run alongside a second instance this way, so you will have to shut down an already running instance during testing. In most cases that’s not a big deal, but it is quite inconvenient when working on one of your main communication apps.

I looked at two possible ways to isolate D-Bus (both relatively easy to integrate in a wrapper script):

  • xdg-dbus-proxy: This can limit access to certain host services, but has no way of having a second isolated instance of a service.
  • Running a separate D-Bus session bus: Having a second instance of a service is no problem then, but we have no way to access host services anymore (which means also no credential storage service etc).

Neither of those help with the applications I work on, but they might nevertheless be viable in other scenarios.

Overall, the more entangled an application is in platform state, the harder it becomes to achieve this kind of isolation, and the more you’ll need to customize how to do it. It quickly pays off though: an easy and always-available way to quickly test things in a clean state has been super helpful.

Categories: FLOSS Project Planets

This Week in Plasma: Discover and System Monitor with a side of WINE

Planet KDE - Fri, 2024-11-15 23:00

This week no major new features were merged, so we focused on polishing up what we already have and fixing bugs. That's right, Phoronix readers; we do in fact regularly do this! And let me also remind folks about our ongoing 2024 fundraiser: in it, you can adopt a KDE app to have your name displayed as an official supporter of that app. If you love KDE or its apps, this is a great way to show your appreciation.

We're almost halfway to our year-end goal with 6 weeks to go. That's not bad, but I know we can get there quickly and unlock the stretch goals. So check it out! And after that, check out this stuff too:

Notable UI Improvements

When using a color scheme with Header colors such as Breeze Light and Breeze Dark, the color scheme editor no longer confusingly offers the opportunity to edit the Titlebar colors, which aren't used for such color schemes. Instead, you need to edit the Header colors. (Akseli Lahtinen, 6.2.4. Link)

The System Tray no longer shows tooltips for items in the hidden/expanded view that would be identical to the visible text of the item being hovered with the pointer. (Nate Graham, 6.2.4. Link)

The first time you use Plasma to create a network hotspot, it gets assigned a random password by default, rather than no password. (Albert Astals Cid, 6.3.0. Link)

In KRunner-powered searches, you can now jump between categories using the Page Up/Page Down and Ctrl+Up/Ctrl+Down keys. (Alexander Lohnau, 6.3.0. Link 1 and link 2)

Implemented support for the "Highlight changed settings" feature for most of System Settings' Drawing Tablet page. (Joshua Goins, 6.3.0. Link)

Discover now shows installation progress more accurately when downloading an app that also requires downloading any new Flatpak runtimes. (Harald Sitter, 6.3.0. Link)

When you have multiple Brightness and Color widgets, adjusting the screen brightness in one of them now mirrors this change to all of them, so they stay in sync. (Jakob Petsovits, 6.2.4. Link)

Added a new symbolic icon for WINE, which allows the category that WINE creates in Kickoff to use a symbolic icon that matches all the others. Also improved the existing colorful icon to better match the upstream branding. (Andy Betts, Frameworks 6.9. Link)

Notable Bug Fixes

Speaking of WINE, we fixed a recent regression that caused WINE windows to display black artifacts around them. (Vlad Zahorodnii, 6.2.4. Link)

The feature to save a customized Plasma System Monitor widget as a new preset once again works. And we added an autotest to make sure it doesn't break again! (Arjen Hiemstra, 6.2.4. Link)

Fixed an extremely strange issue that could cause an actively focused XWayland window to lose the ability to receive keyboard and pointer input when the screen was locked using the Meta+L keyboard shortcut. (Adam Nydahl, 6.2.4. Link)

Fixed a recent regression that caused System Monitor to stop gathering statistics for some ARM-based CPUs. (Hector Martin, 6.2.4. Link)

Discover once again allows you to update update-able add-ons acquired using the "Get New [thing]" windows, which had gotten broken in the initial release of Plasma 6. (Harald Sitter, 6.3.0. Link 1 and link 2)

Fixed a case where the real session restoration feature in the X11 session wouldn't restore everything correctly. (David Edmundson, 6.3.0. Link)

Fixed a visual glitch affecting Kirigami's SwipeListItem component which would give it the wrong background color when using Breeze Dark and other similar color schemes, and could be prominently seen on Discover's Settings page. (Marco Martin, Frameworks 6.9. Link)

Fixed a major Qt regression that caused the lock and login screens to become non-functional under various circumstances. (Olivier De Cannière, Qt 6.8.1, but distros will be back-porting it to their Qt 6.8.0 packages soon, if they haven't already. Link)

Fixed a Qt regression that caused the error dialog on "Get New [Thing]" windows to be visually broken until the window was resized. (David Edmundson, Qt 6.8.1. Link)

Fixed another Qt regression that caused clicking on a virtual desktop to switch to it in KWin's overview effect to stop working after you use the Desktop Cube at least once. (David Edmundson, Qt 6.8.1. Link)

Other bug information of note:

Notable in Performance & Technical

We've re-enabled the ability to turn on HDR mode when using version 565.57.1 or later of the NVIDIA driver for NVIDIA GPU users, or version 6.11 or later of the Linux kernel for Intel GPU users. These are the versions of those pieces of software that have fixed the worst bugs affecting HDR on those GPUs. (Xaver Hugl, 6.2.4. Link 1 and link 2)

Fixed a performance issue that affected users of multi-monitor setups while using a VR headset. (Xaver Hugl, 6.2.4. Link)

Reduced the slowness and lag that you could experience when drag-selecting over a hundred items on the desktop. (Akseli Lahtinen, 6.3.0. Link)

Implemented support for the xdg_toplevel_icon Wayland protocol in KWin. (David Edmundson, 6.3.0. Link)

How You Can Help

KDE has become important in the world, and your time and contributions have helped us get there. As we grow, we need your support to keep KDE sustainable.

You can help KDE by becoming an active community member and getting involved somehow. Each contributor makes a huge difference in KDE — you are not a number or a cog in a machine!

You don’t have to be a programmer, either. Many other opportunities exist:

You can also help us by donating to our yearly fundraiser! Any monetary contribution — however small — will help us cover operational costs, salaries, travel expenses for contributors, and in general just keep KDE bringing Free Software to the world.

To get a new Plasma feature or a bugfix mentioned here, feel free to push a commit to the relevant merge request on invent.kde.org.

Categories: FLOSS Project Planets

Oliver Davies' daily list: An interesting thing I spotted about the Override Node Options module

Planet Drupal - Fri, 2024-11-15 19:00

Before my remote talk for the Drupal London meetup, I'm updating the usage statistics for the Override Node Options module - one of the modules I maintain on Drupal.org.

In my slides for DrupalCamp Belgium, I showed the usage figures from October 2023, which showed 38,096 installations and it being the 173rd most installed module.

This week, the number of installations has slightly increased to 38,223.

What's interesting is that whilst the number of installations has been consistent, there are a lot fewer Drupal 7 websites using the module and a lot more Drupal 8+ sites using it.

October 2023
  • 5.x-1.x: 1
  • 6.x-1.x: 297
  • 7.x-1.x: 13,717
  • 8.x-2.x: 24,081
  • Total: 38,096
November 2024
  • 5.x-1.x: 4
  • 6.x-1.x: 202
  • 7.x-1.x: 10,429
  • 8.x-2.x: 27,588
  • Total: 38,223

Assuming these numbers are correct, this makes me feel very positive and happy about the adoption of newer versions of Drupal and that people are upgrading their D7 websites to Drupal 10 or 11.

Categories: FLOSS Project Planets

digiKam 8.5.0 is released

Planet KDE - Fri, 2024-11-15 19:00

Dear digiKam fans and users,

After five months of active maintenance and many weeks triaging bugs, the digiKam team is proud to present version 8.5.0 of its open source digital photo manager.

Generalities

More than 160 bugs have been fixed and we spent a lot of time contacting users to validate changes in pre-release versions to confirm fixes before deploying the program to production.

Categories: FLOSS Project Planets

FSF Blogs: FSD meeting recap 2024-11-15

GNU Planet! - Fri, 2024-11-15 16:52
Check out the important work our volunteers accomplished at today's Free Software Directory (FSD) IRC meeting.
Categories: FLOSS Project Planets

FSD meeting recap 2024-11-15

FSF Blogs - Fri, 2024-11-15 16:52
Check out the important work our volunteers accomplished at today's Free Software Directory (FSD) IRC meeting.
Categories: FLOSS Project Planets

Keep warm with GNU winter swag

FSF Blogs - Fri, 2024-11-15 16:30
Categories: FLOSS Project Planets

Web Review, Week 2024-46

Planet KDE - Fri, 2024-11-15 10:47

Let’s go for my web review for the week 2024-46.

No GPS required: our app can now locate underground trains

Tags: tech, mobile, sensors, gps, transportation

Now this is definitely a smart trick to estimate position in tunnels.

https://blog.transitapp.com/go-underground/


OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI - Bloomberg

Tags: tech, ai, machine-learning, gpt

More signs of the generative AI companies hitting a plateau…

https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai


Releasing the largest multilingual open pretraining dataset

Tags: tech, ai, machine-learning, gpt, data, copyright, licensing

It shouldn’t be a big deal, but it is. Having such a training corpus openly available is one of the big missing pieces for building models.

https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/#atom-blogmarks


Everything I’ve learned so far about running local LLMs

Tags: tech, ai, machine-learning, gpt, foss

This is an interesting and balanced view. Also nice to see that local inference is really getting closer. This is mostly a UI problem now.

https://nullprogram.com/blog/2024/11/10/


When Machine Learning Tells the Wrong Story

Tags: tech, cpu, hardware, security, privacy, research

Fascinating research about side-channel attacks. I learned a lot about them and about website fingerprinting here. Also interesting are the explanations of how the use of machine learning models can actually get in the way of properly understanding the side channel really used by an attack, which can prevent developing actually useful countermeasures.

https://jackcook.com/2024/11/09/bigger-fish.html


Abusing Ubuntu 24.04 features for root privilege escalation

Tags: tech, linux, security

Nice chain of attacks. This shows that more than one vulnerability needs to be leveraged to reach root access. It provides valuable lessons.

https://snyk.io/blog/abusing-ubuntu-root-privilege-escalation/


Way too many ways to wait on a child process with a timeout

Tags: tech, unix, linux, system

The title says it all. This is very fragmented and there are several options to fulfill the task. Knowing the tradeoffs can be handy.

https://gaultier.github.io/blog/way_too_many_ways_to_wait_for_a_child_process_with_a_timeout.html


The CVM Algorithm

Tags: tech, databases, algorithm

This is a nice view into how a query planner roughly works and a nice algorithm which can be used internally to properly estimate the number of distinct values in a column.

https://buttondown.com/jaffray/archive/the-cvm-algorithm/


Mergiraf

Tags: tech, version-control, git, tools, conflict

Looks like a nice way to improve handling of merge conflicts. I’ll test this one out.

https://mergiraf.org/


Opposite of Cloud Native is?

Tags: tech, cloud, complexity, vendor-lockin, self-hosting

Definitely a good post. No, you don’t have to go all in with cloud providers and sign with your blood. It’s often much more expensive for little gain, with much more complexity and vendor lock-in.

https://mkennedy.codes/posts/opposite-of-cloud-native-is-stack-native/


Booleans Are a Trap

Tags: tech, design, type-systems

Avoiding boolean parameters in library APIs should be well-known advice by now. Still, they should probably be avoided when modeling domain types as well.

https://katafrakt.me/2024/11/09/booleans-are-a-trap/


Complex for Whom?

Tags: tech, design, complexity

Good musing about complexity. Very often we need to move it around; the important question is where it should appear. For sure you don’t want it scattered everywhere.

https://notes.billmill.org/link_blog/2024/11/Complex_forWhom.html


What makes concurrency so hard?

Tags: tech, distributed, complexity

Interesting reasoning about what is hard in systems with concurrency. It’s definitely about the state space of the system and the structure of that space.

https://buttondown.com/hillelwayne/archive/what-makes-concurrency-so-hard/


Algorithms we develop software by

Tags: tech, programming, craftsmanship, engineering, problem-solving

Interesting musing on the heuristics we use when solving problems. There is good advice in there on making progress and becoming a better developer.

https://grantslatton.com/software-pathfinding


Bye for now!

Categories: FLOSS Project Planets

KDE Gear 24.12 Beta Testing

Planet KDE - Fri, 2024-11-15 09:26

KDE Gear is our release service for many apps such as mail and calendaring supremo Kontact, geographers dream Marble, social media influencing Kdenlive and dozens of others. KDE needs you to test that your favourite feature has been added and your worst bug has been squished.

You can do this with KDE neon Testing edition, built from the Git branches that releases are made from. You can download the ISO and try it on spare hardware or in a virtual machine to test them out.

But maybe you don’t want the faff of installing a distro. Containers give an easier way to test thanks to Distrobox.

Install Distrobox on your normal computer. Make sure Docker or Podman is working.

Download the container with

distrobox create -i invent-registry.kde.org/neon/docker-images/plasma:testing-all -n all-testing

Then start it with
distrobox enter all-testing
And voila it will mount the necessary bits to get Wayland connections working and keep your home directory available and you can run say

kontact

and test the beta for the mail app.

Categories: FLOSS Project Planets

Metadrop: Local tasks hierarchy on Drupal 10

Planet Drupal - Fri, 2024-11-15 07:29

Recently, in one of our projects with Drupal 10, we faced an interesting challenge: implementing two-level "local tasks" for a specific functionality of our module. Despite the amount of documentation related to local tasks in Drupal, setting up two levels of these tasks proved challenging, as we couldn't get them to display in the way we needed. However, after exhaustive research, we found an example in an existing module that helped us solve the problem.

Exploring the Problem

The need was to add a main "local task" and three associated subtasks that would show up when viewing or editing a node. Initially, the main obstacle was finding the right way to implement two levels of local tasks.

The Solution: Inspiration from Contributed Modules

During our search among existing contributed modules, we found…

Categories: FLOSS Project Planets

1xINTERNET blog: Reunited in Berlin - DrupalCamp Berlin 2024

Planet Drupal - Fri, 2024-11-15 07:00

10 years after DrupalCity Berlin 2014 the community kicked-off another DrupalCamp in the heart of Europe uniting the global Drupal community. Learn what's behind this triumphant return.

Categories: FLOSS Project Planets

Pages