Feeds

Junichi Uekawa: Been very busy with real life.

Planet Debian - Sun, 2024-05-05 06:00
Been very busy with real life. Hardly any time to get things done.

Categories: FLOSS Project Planets

Doug Hellmann: sphinxcontrib-sqltable 2.1.0 - SQLAlchemy 2.0 support

Planet Python - Sun, 2024-05-05 05:17
What’s new in 2.1.0?

  • Update packaging to use pyproject.toml
  • Update sqltable.py to support SQLAlchemy 2.0 (contributions by Gabriel Gaona)
  • Pin SQLAlchemy>=2.0 to prevent backwards incompatibility with changes to the query execution interface (contributions by Gabriel Gaona)
  • Fix up the Python version support list
  • Fix the obsolete link to Bitbucket (contributions by kbaikov)
  • Find the sample schema file depending on how Sphinx is invoked
  • Move the sample database to /tmp for rtd.org
Categories: FLOSS Project Planets

Django Weblog: Last call for DjangoCon Europe 2025 organizers

Planet Python - Sun, 2024-05-05 03:16

Note: This text is an updated and clarified version of the previous call. We have opened up more 2025 event dates, targeting January to April and now also June 2025.

DjangoCon Europe is a major pillar of the Django community, as people from across the world meet and share. This includes many qualities that make it a unique event - unconventional and conventional venues, creative happenings, a feast of talks and a dedication to inclusion and diversity.

Ahead of DjangoCon Europe 2024, we are looking for the next group of organizers to own and lead the 2025 conference. Could your town - or your football stadium, circus tent, private island or city hall - host this wonderful community event?

Hosting a DjangoCon is an ambitious undertaking. It's hard work, but each year it has been successfully run by a team of community volunteers, not all of whom have had previous experience - more important is enthusiasm, organizational skills, the ability to plan and manage budgets, time and people - and plenty of time to invest in the project.

Step 1: Submit your expression of interest

If you’re considering organizing DjangoCon Europe (🙌 great!), fill in our DjangoCon Europe 2025 expression of interest form with your contact details. No need to fill in all the information at this stage; we’ll reach out and help you figure it out.

Express your interest in organizing

Step 2: We’re here to help!

We've set up a DjangoCon Europe support working group of previous organizers that you can reach out to with questions about organizing and running a DjangoCon Europe.

The group will be in touch with everyone submitting the expression of interest form, or you can reach out to them directly: european-organizers-support@djangoproject.com

We'd love to hear from you as soon as possible, so your proposal can be finalized and sent to the DSF board by June 2nd. If time permits, the selected hosts will be publicly announced at this year's DjangoCon Europe by the current organizers.

Step 3: Submitting the proposal

The more detailed and complete your final proposal is, the better. Basic details include:

  • Organizing committee members: You probably won’t have a full team yet; naming just some core team members is enough.
  • The legal entity that is intended to run the conference: Even if the entity does not exist yet, please share how you are planning to set it up.
  • Dates: We must avoid conflicts with major holidays, EuroPython, DjangoCon US, and PyCon US.
  • Venue(s), including size, number of possible attendees, pictures, accessibility concerns, catering, etc. Possible dates for 2025 are January 5th to April 30th, and June 1st to June 30th.
  • Transport links and accommodation: Can your venue be reached by international travelers?
  • Budgets and ticket prices: Talk to the DjangoCon Europe Support group to get help with this, including information on past event budgets.

We also like to see:

  • Timelines
  • Pictures
  • Draft agreements with providers
  • Alternatives you have considered

Submit your completed proposal via our DjangoCon Europe 2025 expression of interest form, this time filling in as many fields as possible. We look forward to reviewing great proposals that continue the excellence the whole community associates with DjangoCon Europe.

Q&A Can I organize a conference alone?

We strongly recommend that a team of people submit an application.

I/we don’t have a legal entity yet, is that a problem?

Depending on your jurisdiction, this is usually not a problem. But please share your plans about the entity you will use or form in your application.

Do I/we need experience with organizing conferences?

The support group is here to help you succeed. From experience, we know that many core groups of 2-3 people have been able to run a DjangoCon with guidance from previous organizers and help from volunteers.

What is required in order to announce an event?

Ultimately, a contract with the venue confirming the dates is crucial, since announcing a conference prompts people to block out calendars, book holidays, and buy transportation and accommodation. This, however, would only be relevant after the DSF board has concluded the application process. Naturally, the application itself cannot contain any guarantees, but it’s good to check concrete dates with your venues to ensure they are actually open and currently available, before suggesting these dates in the application.

Do we have to do everything ourselves?

No. You will definitely be offered lots of help by the community. Typically, conference organizers will divide responsibilities into different teams, making it possible for more volunteers to join. Local organizers are free to choose which areas they want to invite the community to help out with, and a call will go out through a blog post announcement on djangoproject.com and social media.

What kind of support can we expect from the Django Software Foundation?

The DSF regularly provides grant funding to DjangoCon organizers, to the extent of $6,000 in recent editions. We also offer support via specific working groups.

In addition, a lot of Individual Members of the DSF regularly volunteer at community events. If your team members aren’t Individual Members, we can reach out to the membership on your behalf to find volunteers.

What dates are possible in 2025?

For 2025, DjangoCon Europe should ideally happen between January 5th and April 30th, or June 1st and June 30th. This is to avoid the following community events’ provisional dates:

  • PyCon US 2025: 14 May through 22 May, 2025
  • EuroPython 2025: July 2025
  • DjangoCon US 2025: September - October 2025
  • DjangoCon Africa 2025: August - September 2025

Here are the holidays to avoid:

  • New Year's Day: Wednesday 1st January 2025
  • Chinese New Year: Wednesday 29th January 2025
  • Eid Al-Fitr: Sunday 30th March 2025
  • Passover: Saturday 12th - Sunday 20th April 2025
  • Easter: Sunday 20th April 2025
  • Eid Al-Adha: Friday 6th - Monday 9th June 2025
  • Rosh Hashanah: Monday 22nd - Wednesday 24th September 2025
  • Yom Kippur: Wednesday 1st - Thursday 2nd October 2025
What cities or countries are possible?

Any city in Europe. This can be a city or country where DjangoCon Europe has happened in the past (Edinburgh, Porto, Copenhagen, Heidelberg, Florence, Budapest, Cardiff, Toulon, Warsaw, Zurich, Amsterdam, Berlin), or a new locale.

Categories: FLOSS Project Planets

Eli Bendersky: My favorite prime number generator

Planet Python - Sat, 2024-05-04 22:46

Many years ago I re-posted a Stack Overflow answer with Python code for a terse prime sieve function that generates a potentially infinite sequence of prime numbers ("potentially" because it will run out of memory eventually). Since then, I've used this code many times - mostly because it's short and clear. In this post I will explain how this code works, where it comes from (I didn't come up with it), and some potential optimizations. If you want a teaser, here it is:

    def gen_primes():
        """Generate an infinite sequence of prime numbers."""
        D = {}
        q = 2
        while True:
            if q not in D:
                D[q * q] = [q]
                yield q
            else:
                for p in D[q]:
                    D.setdefault(p + q, []).append(p)
                del D[q]
            q += 1

The sieve of Eratosthenes

To understand what this code does, we should first start with the basic Sieve of Eratosthenes; if you're familiar with it, feel free to skip this section.

The Sieve of Eratosthenes is a well-known algorithm from ancient Greek times for finding all the primes below a certain number reasonably efficiently using a tabular representation. This animation from Wikipedia explains it pretty well:

Starting with the first prime (2) it marks all its multiples until the requested limit. It then takes the next unmarked number, assumes it's a prime (because it is not a multiple of a smaller prime), and marks its multiples, and so on until all the multiples below the limit are marked. The remaining unmarked numbers are primes.

Here's a well-commented, basic Python implementation:

    import math

    def gen_primes_upto(n):
        """Generates a sequence of primes < n.

        Uses the full sieve of Eratosthenes with O(n) memory.
        """
        if n == 2:
            return

        # Initialize table; True means "prime", initially assuming all numbers
        # are prime.
        table = [True] * n
        sqrtn = int(math.ceil(math.sqrt(n)))

        # Starting with 2, for each True (prime) number I in the table, mark all
        # its multiples as composite (starting with I*I, since earlier multiples
        # should have already been marked as multiples of smaller primes).
        # At the end of this process, the remaining True items in the table are
        # primes, and the False items are composites.
        for i in range(2, sqrtn):
            if table[i]:
                for j in range(i * i, n, i):
                    table[j] = False

        # Yield all the primes in the table.
        yield 2
        for i in range(3, n, 2):
            if table[i]:
                yield i

When we want a list of all the primes below some known limit, gen_primes_upto is great, and performs fairly well. There are two issues with it, though:

  1. We have to know what the limit is ahead of time; this isn't always possible or convenient.
  2. Its memory usage is high - O(n); this can be significantly optimized, however; see the bonus section at the end of the post for details.
The infinite prime generator

Back to the infinite prime generator that's in the focus of this post. Here is its code again, now with some comments:

    def gen_primes():
        """Generate an infinite sequence of prime numbers."""
        # Maps composites to primes witnessing their compositeness.
        D = {}

        # The running integer that's checked for primeness
        q = 2

        while True:
            if q not in D:
                # q is a new prime.
                # Yield it and mark its first multiple that isn't
                # already marked in previous iterations
                D[q * q] = [q]
                yield q
            else:
                # q is composite. D[q] holds some of the primes that
                # divide it. Since we've reached q, we no longer
                # need it in the map, but we'll mark the next
                # multiples of its witnesses to prepare for larger
                # numbers
                for p in D[q]:
                    D.setdefault(p + q, []).append(p)
                del D[q]

            q += 1

The key to the algorithm is the map D. It holds all the primes encountered so far, but not as keys! Rather, they are stored as values, with the keys being the next composite number they divide. This lets the program avoid having to divide each number it encounters by all the primes known so far - it can simply look in the map. A number that's not in the map is a new prime, and the way the map updates is not unlike the sieve of Eratosthenes - when a composite is removed, we add the next composite multiple of the same prime(s). This is guaranteed to cover all the composite numbers, while prime numbers should never be keys in D.

I highly recommend instrumenting this function with some printouts and running through a sample invocation - it makes it easy to understand how the algorithm makes progress.
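As a minimal sketch of such instrumentation - just the function above with a print added, and itertools.islice to bound the run:

    import itertools

    def gen_primes_verbose():
        """gen_primes with a printout of D at every step, for tracing."""
        D = {}
        q = 2
        while True:
            print(f'q={q}, D={D}')
            if q not in D:
                D[q * q] = [q]
                yield q
            else:
                for p in D[q]:
                    D.setdefault(p + q, []).append(p)
                del D[q]
            q += 1

    # Watch the map evolve while producing the first 8 primes.
    for p in itertools.islice(gen_primes_verbose(), 8):
        print('  -> prime:', p)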

Compared to the full sieve gen_primes_upto, this function doesn't require us to know the limit ahead of time - it will keep producing prime numbers ad infinitum (but will run out of memory eventually). As for memory usage, the D map has all the primes in it somewhere, but each one appears only once. So its size is O(\pi(n)), where \pi(n) is the Prime-counting function, the number of primes smaller or equal to n. This can be approximated by O(\frac{n}{ln(n)}) [1].
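A quick way to see this approximation in action, reusing gen_primes_upto from above (a throwaway sketch, not part of the original post's code):

    import math

    # Compare the true prime count pi(n) with the n/ln(n) approximation.
    for n in [1_000, 100_000, 1_000_000]:
        actual = sum(1 for _ in gen_primes_upto(n))
        print(f'n={n}: pi(n)={actual}, n/ln(n)={n / math.log(n):.0f}')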

I don't remember where I first saw this approach mentioned, but all the breadcrumbs lead to this ActiveState Recipe by David Eppstein from way back in 2002.

Optimizing the generator

I really like gen_primes; it's short, easy to understand and gives me as many primes as I need without forcing me to know what limit to use, and its memory usage is much more reasonable than the full-blown sieve of Eratosthenes. It is, however, also quite slow, over 5x slower than gen_primes_upto.

The aforementioned ActiveState Recipe thread has several optimization ideas; here's a version that incorporates ideas from Alex Martelli, Tim Hochberg and Wolfgang Beneicke:

    import itertools

    def gen_primes_opt():
        yield 2
        D = {}
        for q in itertools.count(3, step=2):
            p = D.pop(q, None)
            if not p:
                D[q * q] = q
                yield q
            else:
                x = q + p + p  # get odd multiples
                while x in D:
                    x += p + p
                D[x] = p

The optimizations are:

  1. Instead of holding a list as the value of D, just have a single number. In cases where we need more than one witness to a composite, find the next multiple of the witness and assign that instead (this is the while x in D inner loop in the else clause). This is a bit like using linear probing in a hash table instead of having a list per bucket.
  2. Skip even numbers by starting with 2 and then proceeding from 3 in steps of 2.
  3. The loop assigning the next multiple of witnesses may land on even numbers (when p and q are both odd). So instead jump to q + p + p directly, which is guaranteed to be odd.

With these in place, the function is more than 3x faster than before, and is now only within 40% or so of gen_primes_upto, while remaining short and reasonably clear.
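The timing numbers above are from the post's own benchmarks; if you want to reproduce the relative ordering yourself, a minimal harness along these lines should do (the 1,000,000 limit is an arbitrary choice of mine):

    import timeit

    LIMIT = 1_000_000

    def consume(gen):
        # Pull primes from an infinite generator until we pass LIMIT.
        for p in gen():
            if p >= LIMIT:
                break

    print('gen_primes_upto:', timeit.timeit(
        lambda: list(gen_primes_upto(LIMIT)), number=1))
    print('gen_primes:     ', timeit.timeit(
        lambda: consume(gen_primes), number=1))
    print('gen_primes_opt: ', timeit.timeit(
        lambda: consume(gen_primes_opt), number=1))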

There are even fancier algorithms that use interesting mathematical tricks to do less work. Here's an approach by Will Ness and Tim Peters (yes, that Tim Peters) that's reportedly faster. It uses the wheels idea from this paper by Sorenson. Some additional details on this approach are available here. This algorithm is both faster and consumes less memory; on the other hand, it's no longer short and simple.

To be honest, it always feels a bit odd to me to painfully optimize Python code, when switching languages provides vastly bigger benefits. For example, I threw together the same algorithms using Go and its experimental iterator support; it's 3x faster than the Python version, with very little effort (even though the new Go iterators and yield functions are still in the proposal stage and aren't optimized). I can't try to rewrite it in C++ or Rust for now, due to the lack of generator support; the yield statement is what makes this code so nice and elegant, and alternative idioms are much less convenient.

Bonus: segmented sieve of Eratosthenes

The Wikipedia article on the sieve of Eratosthenes mentions a segmented approach, which is also described in the Sorenson paper in section 5.

The main insight is that we only need the primes up to \sqrt{n} to be able to sieve a table all the way to n. This results in a sieve that uses only O(\sqrt{n}) memory. Here's a commented Python implementation:

    def gen_primes_upto_segmented(n):
        """Generates a sequence of primes < n.

        Uses the segmented sieve of Eratosthenes algorithm with O(√n) memory.
        """
        # Simplify boundary cases by hard-coding some small primes.
        if n < 11:
            for p in [2, 3, 5, 7]:
                if p < n:
                    yield p
            return

        # We break the range [0..n) into segments of size √n
        segsize = int(math.ceil(math.sqrt(n)))

        # Find the primes in the first segment by calling the basic sieve on that
        # segment (its memory usage will be O(√n)). We'll use these primes to
        # sieve all subsequent segments.
        baseprimes = list(gen_primes_upto(segsize))
        for bp in baseprimes:
            yield bp

        for segstart in range(segsize, n, segsize):
            # Create a new table of size √n for each segment; the old table
            # is thrown away, so the total memory use here is √n
            # seg[i] represents the number segstart+i
            seg = [True] * segsize
            for bp in baseprimes:
                # The first multiple of bp in this segment can be calculated
                # using modulo.
                first_multiple = (
                    segstart if segstart % bp == 0
                    else segstart + bp - segstart % bp
                )
                # Mark all multiples of bp in the segment as composite.
                for q in range(first_multiple, segstart + segsize, bp):
                    seg[q % len(seg)] = False

            # Sieving is done; yield all the primes in the segment (iterating
            # only over the odd ones).
            start = 1 if segstart % 2 == 0 else 0
            for i in range(start, len(seg), 2):
                if seg[i]:
                    if segstart + i >= n:
                        break
                    yield segstart + i

Code

The full code for this post - along with tests and benchmarks - is available on GitHub.

[1] While this is a strong improvement over O(n) (e.g. for a billion primes, memory usage here is only 5% of the full sieve version), it still depends on the size of the input. In the unlikely event that you need to generate truly gigantic primes starting from 2, even the square-root-space solutions become infeasible. In this case, the whole approach should be changed; instead, one would just generate random huge numbers and use probabilistic primality testing to check for their primality. This is what real libraries like Go's crypto/rand.Prime do.
Categories: FLOSS Project Planets

Eli Bendersky: Faster XML stream processing in Go

Planet Python - Sat, 2024-05-04 22:46

XML processing was all the rage 15 years ago; while it's less prominent these days, it's still an important task in some application domains. In this post I'm going to compare the speed of stream-processing huge XML files in Go, Python and C, and finish up with a new, minimal module that uses C to accelerate this task for Go. All the code shown throughout this post is available in this GitHub repository; the new Go module is here.

What does XML stream processing mean?

First, let's define the problem at hand in more detail. Roughly speaking, there are two ways we can process data from a file:

  1. Read the whole file into memory at once, and then process the data in memory.
  2. Read the file in chunks, processing each chunk, without having the whole data in memory at any given time.

In many ways, (1) is more convenient because we can easily go back to any part of the file. However, in some situations (2) is essential; specifically, when the file is very large. This is where stream processing comes in. If our input file is 500 GiB, we're unlikely to be able to read it into memory and have to process it in parts. Even for smaller files that would theoretically fit into RAM, it's not always a good idea to read them wholly; this dramatically increases the active heap size and can cause performance issues in garbage-collected languages.
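As a toy illustration of approach (2) - not XML-aware at all - here's how one might count occurrences of a byte string in a huge file while holding only one chunk in memory at a time. count_in_stream is a hypothetical helper of mine, not part of the benchmark code:

    def count_in_stream(path, needle=b'Africa', chunk_size=1 << 20):
        """Count occurrences of needle in a file, reading it chunk by chunk."""
        count = 0
        tail = b''
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                buf = tail + chunk
                count += buf.count(needle)
                # Keep the last len(needle)-1 bytes so a match straddling two
                # chunks isn't missed (it can't be double-counted, since a
                # full match never fits inside the kept tail).
                tail = buf[-(len(needle) - 1):]
        return count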

The task

For this benchmark, I'm using xmlgen to create a 230 MiB XML file [1]. A tiny fragment of the file may look like this:

    <?xml version="1.0" standalone="yes"?>
    <site>
      <regions>
        <asia>
          <item id="item0">
            <location>United States</location>
            <quantity>1</quantity>
            <name>duteous nine eighteen </name>
            <payment>Creditcard</payment>
            ...
          </item>
        </asia>
      </regions>
    </site>

The task is to find how many times "Africa" appears in the data of the <location> tag throughout the document.

Baseline - using the Go standard library

Let's start with a baseline implementation - using the standard library's encoding/xml package. While the package's Unmarshal mode will parse the whole file in one go, it can also be used to process XML token by token and selectively parse interesting elements. Here is the code:

    package main

    import (
        "encoding/xml"
        "fmt"
        "io"
        "log"
        "os"
        "strings"
    )

    type location struct {
        Data string `xml:",chardata"`
    }

    func main() {
        f, err := os.Open(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        d := xml.NewDecoder(f)
        count := 0
        for {
            tok, err := d.Token()
            if tok == nil || err == io.EOF {
                // EOF means we're done.
                break
            } else if err != nil {
                log.Fatalf("Error decoding token: %s", err)
            }

            switch ty := tok.(type) {
            case xml.StartElement:
                if ty.Name.Local == "location" {
                    // If this is a start element named "location", parse this
                    // element fully.
                    var loc location
                    if err = d.DecodeElement(&loc, &ty); err != nil {
                        log.Fatalf("Error decoding item: %s", err)
                    }
                    if strings.Contains(loc.Data, "Africa") {
                        count++
                    }
                }
            default:
            }
        }
        fmt.Println("count =", count)
    }

I made sure to double check that the memory usage of this program stays bounded and low while processing a large file - the maximum RSS was under 7 MiB while processing our 230 MiB input file. I'm verifying this for all the programs presented in this post using /usr/bin/time -v on Linux.

This program takes 6.24 seconds to process the whole file and print out the result.

Python implementation

The first Python implementation uses the xml.etree.ElementTree module from the standard library:

    import sys
    import xml.etree.ElementTree as ET

    count = 0
    for event, elem in ET.iterparse(sys.argv[1], events=("end",)):
        if event == "end":
            if elem.tag == 'location' and elem.text and 'Africa' in elem.text:
                count += 1
            elem.clear()
    print('count =', count)

The key here is the elem.clear() call. It ensures that each element gets discarded after parsing it fully, so the memory usage won't grow linearly with the size of the file (unless the file is pathological). This program takes 3.7 seconds to process the whole file - much faster than our Go program. Why is that?

While the Go program uses 100% Go code for the task (encoding/xml is implemented entirely in Go), the Python program is using a C extension (most of ElementTree is written in C) wrapping a fast XML parser in C - libexpat. The bulk of the work here is done in C, which is faster than Go. The performance of encoding/xml is further discussed in this issue, though it's an old one and the performance has been somewhat optimized since then.

An alternative XML parsing library for Python is lxml, which uses libxml underneath. Here's a Python version using lxml:

    import sys
    from lxml import etree

    count = 0
    for event, elem in etree.iterparse(sys.argv[1], events=("end",)):
        if event == "end":
            if elem.tag == 'location' and elem.text and 'Africa' in elem.text:
                count += 1
            elem.clear()
    print('count =', count)

This looks very similar to the previous version, and that's on purpose. lxml has an etree-compatible API to make transition from the standard library smoother. This version also takes around 3.7 seconds for our 230 MiB file.

The reason I'm including lxml here is that it will run faster than xml.etree.ElementTree when slurping the whole file, for our particular file size. I want to highlight that this is outside the scope of my experiment, because I only care about streaming processing. The only way (that I'm aware of!) to successfully process a 500 GiB file with lxml would be by using iterparse.

How fast can it run?

Based on the measurements presented here, Go is about 68% slower than Python for parsing a large XML file in a streaming fashion. While Go usually compiles to much faster code than pure Python, the Python implementations have the backing of efficient C libraries with which it's difficult to compete. I was curious to know how fast it could be, in theory [2].

To answer this question, I implemented the same program using pure C with libxml, which has a SAX API. I won't paste it wholly here because it's longer, but you can find the full source code on GitHub. It takes just 0.56 seconds to process our 230 MiB input file, which is very impressive given the other results, but also not very surprising. This is C, after all.

You may wonder - if lxml uses libxml underneath, why is it so much slower than the pure C version? The answer is Python call overhead. The lxml version calls back into Python for every parsed element, which incurs a significant cost [3]. Another reason is that my C implementation doesn't actually parse an element - it's just a simple event-based state machine, so there's less extra work being done.

Using libxml from Go

To recap where we are so far:

  1. Python libraries based on underlying C implementations are faster than pure Go.
  2. Pure C is much faster still.

We have two options: we can either try to optimize Go's encoding/xml package, or we can try to wrap a fast C library with Go. While the former is a worthy goal, it involves a large effort and should be a topic for a separate post. Here, I'll go for the latter.

Searching around the web, I found a few wrappers around libxml. Two that seemed moderately popular and maintained are https://github.com/lestrrat-go/libxml2 and https://github.com/moovweb/gokogiri. Unfortunately, neither of these (nor the other bindings I found) exposes the SAX API of libxml; instead, they focus on the DOM API, where the whole document is parsed by the underlying library and a tree is returned. As mentioned above, we need the SAX interface to process huge files.

gosax

It's time to roll our own :-) I wrote the gosax module, which uses Cgo to call into libxml and exposes a SAX interface [4]. Implementing it was an interesting exercise in Cgo, because it requires some non-trivial concepts like registering Go callbacks with C.

Here's a version of our program using gosax:

    package main

    import (
        "fmt"
        "os"
        "strings"

        "github.com/eliben/gosax"
    )

    func main() {
        counter := 0
        inLocation := false

        scb := gosax.SaxCallbacks{
            StartElement: func(name string, attrs []string) {
                if name == "location" {
                    inLocation = true
                } else {
                    inLocation = false
                }
            },

            EndElement: func(name string) {
                inLocation = false
            },

            Characters: func(contents string) {
                if inLocation && strings.Contains(contents, "Africa") {
                    counter++
                }
            },
        }

        err := gosax.ParseFile(os.Args[1], scb)
        if err != nil {
            panic(err)
        }

        fmt.Println("counter =", counter)
    }

As you can see, it implements a state machine that remembers being inside a location element, where the character data is checked. This program takes 4.03 seconds to process our input file. Not bad! But we can do a bit better, and with a couple of optimizations I managed to bring it down to 3.68 seconds - about the same speed as the Python implementations!

IMHO the roughly similar run times here are a coincidence, because the Python programs are different from my approach in that they expose a higher-level API than pure SAX. Recall that iterparse returns a parsed element, and we can access its text attribute, etc. In gosax, we have to do this much more manually. Since the cost of calls between Cgo and Go is rather high, there is an optimization opportunity here for gosax. We could do more work in C - parsing a full element, and returning it wholly to Go. This would move work from the Go side to the C side, as well as reduce the number of cross-language calls. But this is a task for another day.

Conclusion

Well, this was fun :-) There are 5 different implementations of the same simple task described here, in 3 different programming languages. Here is a summary of the speed measurements we got:

    Implementation                     Time to process the 230 MiB file
    Go (encoding/xml)                  6.24 s
    Python (xml.etree.ElementTree)     3.7 s
    Python (lxml)                      3.7 s
    Go (gosax, after optimizations)    3.68 s
    C (libxml SAX)                     0.56 s

Python's performance story has always been - "it's probably fast enough, and in the rare cases where it isn't, use a C extension". In Go the narrative is somewhat different: in most cases, the Go compiler produces fairly fast code. Pure Go code is significantly faster than Python and often faster than Java. Even so, every once in a while it may be useful to dip into C or C++ for performance, and in these cases Cgo is a good approach.

It's obvious that encoding/xml needs some work w.r.t. performance, but until that happens - there are good alternatives! Leveraging the speed of libxml has been possible for the DOM API, and now is possible for the SAX API as well. In the long run, I believe that serious performance work on encoding/xml can make it go faster than the libxml wrappers because it would eliminate the high cost of C-to-Go calls.

[1] This size will easily fit in RAM, but it's good enough to provide a meaningful benchmarking duration.

[2] When working on optimizations, it's often useful to know "the speed of light" of some computation. Say we want to optimize some function in our program. It's worth asking - how much faster will the program be if this function takes 0 time? If the overall change is tiny, the function is not worth optimizing, most likely. This is just a practical application of Amdahl's law.

[3] We can test this hypothesis by timing how long it takes the non-streaming API in lxml to parse the same file. Since it parses the whole XML file in C before returning the parsed structure to Python, we expect the Python call overhead to be much smaller. Indeed, for files that fit into memory this is faster. But once again, in this post we return our attention to streaming APIs - assuming this is our only choice for gigantic files.

[4] gosax is very minimal, only providing the most common SAX callbacks. The decision to create a new module was just for convenience and speed; the more correct thing would have likely been to contribute to one of the existing libxml wrappers. I don't see gosax as production-quality at this stage - I just hacked it together to be able to experiment for this post.
Categories: FLOSS Project Planets

Eli Bendersky: Type inference

Planet Python - Sat, 2024-05-04 22:46

Type inference is a major feature of several programming languages, most notably languages from the ML family like Haskell. In this post I want to provide a brief overview of type inference, along with a simple Python implementation for a toy ML-like language.

Uni-directional type inference

While static typing is very useful, one of its potential downsides is verbosity. The programmer has to annotate values with types throughout the code, which results in more effort and clutter. What's really annoying, though, is that in many cases these annotations feel superfluous. Consider this classical C++ example from pre-C++11 times:

    std::vector<Blob*> blobs;
    std::vector<Blob*>::iterator iter = blobs.begin();

Clearly when the compiler sees blobs.begin(), it knows the type of blobs, so it also knows the type of the begin() method invoked on it because it is familiar with the declaration of begin. Why should the programmer be burdened with spelling out the type of the iterator? Indeed, one of the most welcome changes in C++11 was lifting this burden by repurposing auto for basic type inference:

    std::vector<Blob*> blobs;
    auto iter = blobs.begin();

Go has a similar capability with the := syntax. Given some function:

func parseThing(...) (Node, error) { }

We can simply write:

node, err := parseThing(...)

Without having to explicitly declare that node has type Node and err has type error.

These features are certainly useful, and they involve some degree of type inference from the compiler. Some functional programming proponents say this is not real type inference, but I think the difference is just a matter of degree. There's certainly some inference going on here, with the compiler calculating and assigning the right types for expressions without the programmer's help. Since this calculation flows in one direction (from the declaration of the vector::begin method to the auto assignment), I'll call it uni-directional type inference [1].

Bi-directional type inference (Hindley-Milner)

If we define a new map function in Haskell to map a function over a list, we can do it as follows:

    mymap f [] = []
    mymap f (first:rest) = f first : mymap f rest

Note that we did not specify the types for either the arguments of mymap, or its return value. The Haskell compiler can infer them on its own, using the definition provided:

    > :t Main.mymap
    Main.mymap :: (t1 -> t) -> [t1] -> [t]

The compiler has determined that the first argument of mymap is a generic function, assigning its argument the type t1 and its return value the type t. The second argument of mymap has the type [t1], which means "list of t1"; then the return value of mymap has the type "list of t". How was this accomplished?

Let's start with the second argument. From the [] = [] variant, and also from the (first:rest) deconstruction, the compiler infers it has a list type. But there's nothing else in the code constraining the element type, so the compiler chooses a generic type specifier - t1. f first applies f to an element of this list, so f has to take t1; nothing constrains its return value type, so it gets the generic t. The result is that f has type (t1 -> t), which in Haskell parlance means "a function from t1 to t".

Here is another example, written in a toy language I put together for the sake of this post. The language is called microml, and its implementation is described at the end of the post:

foo f g x = if f(x == 1) then g(x) else 20

Here foo is declared as a function with three arguments. What is its type? Let's try to run type inference manually. First, note that the body of the function consists of an if expression. As is common in programming languages, this one has some strict typing rules in microml; namely, the type of the condition is boolean (Bool), and the types of the then and else clauses must match.

So we know that f(x == 1) has to return a Bool. Moreover, since x is compared to an integer, x is an Int. What is the type of g? Well, it has an Int argument, and its return value must match the type of the else clause, which is an Int as well.

To summarize:

  • The type of x is Int
  • The type of f is Bool -> Bool
  • The type of g is Int -> Int

So the overall type of foo is:

((Bool -> Bool), (Int -> Int), Int) -> Int

It takes three arguments, the types of which we have determined, and returns an Int.

Note how this type inference process is not just going in one direction, but seems to be "jumping around" the body of the function figuring out known types due to typing rules. This is why I call it bi-directional type inference, but it's much better known as Hindley-Milner type inference, since it was independently discovered by Roger Hindley in 1969 and Robin Milner in 1978.

How Hindley-Milner type inference works

We've seen a couple of examples of manually running type inference on some code above. Now let's see how to translate it to an implementable algorithm. I'm going to present the process in several separate stages, for simplicity. Some other presentations of the algorithm combine several of these stages, but seeing them separately is more educational, IMHO.

The stages are:

  1. Assign symbolic type names (like t1, t2, ...) to all subexpressions.
  2. Using the language's typing rules, write a list of type equations (or constraints) in terms of these type names.
  3. Solve the list of type equations using unification.

Let's use this example again:

foo f g x = if f(x == 1) then g(x) else 20

Starting with stage 1, we'll list all subexpressions in this declaration (starting with the declaration itself) and assign unique type names to them:

    foo                                t0
    f                                  t1
    g                                  t2
    x                                  t3
    if f(x == 1) then g(x) else 20     t4
    f(x == 1)                          t5
    x == 1                             t6
    x                                  t3
    g(x)                               t7
    20                                 Int

Note that every subexpression gets a type, and we de-duplicate them (e.g. x is encountered twice and gets the same type name assigned). Constant nodes get known types.
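To make stage 1 concrete, here's a minimal sketch of how the typename assignment could be implemented. The AST class names (Identifier, IntConstant, Ifx) are my stand-ins; microml's real implementation lives in typing.py and differs in detail:

    import itertools

    class Node:
        def children(self):
            return []

    class Identifier(Node):
        def __init__(self, name):
            self.name = name

    class IntConstant(Node):
        def __init__(self, value):
            self.value = value

    class Ifx(Node):
        def __init__(self, cond, then, els):
            self.cond, self.then, self.els = cond, then, els

        def children(self):
            return [self.cond, self.then, self.els]

    _counter = itertools.count()

    def assign_typenames(node, symtab):
        """Assign a type name to node and every subexpression.

        symtab maps identifier names to already-assigned type names, so
        repeated occurrences of the same identifier (like x above) share
        one name; constants get concrete types.
        """
        if isinstance(node, Identifier):
            if node.name not in symtab:
                symtab[node.name] = f't{next(_counter)}'
            node.type = symtab[node.name]
        elif isinstance(node, IntConstant):
            node.type = 'Int'
        else:
            node.type = f't{next(_counter)}'
            for child in node.children():
                assign_typenames(child, symtab)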

In stage 2, we'll use the language's typing rules to write down equations involving these type names. Usually books and papers use slightly scary formal notation for typing rules; for example, for if:

\[\frac{\Gamma \vdash e_0 : Bool, \Gamma \vdash e_1 : T, \Gamma \vdash e_2 : T}{\Gamma \vdash if\: e_0\: then\: e_1\: else\: e_2 : T}\]

All this means is the intuitive typing of if we've described above: the condition is expected to be boolean, and the types of the then and else clauses are expected to match, and their type becomes the type of the whole expression.

To unravel the notation, prepend "given that" to the expression above the line and "we can derive" to the expression below the line; \Gamma \vdash e_0 : Bool means that e_0 is typed to Bool in the set of typing assumptions called \Gamma.

Similarly, a typing rule for single-argument function application would be:

\[\frac{\Gamma \vdash e_0 : T, \Gamma \vdash f : T \rightarrow U}{\Gamma \vdash f(e_0) : U}\]

The real trick of type inference is running these typing rules in reverse. The rule tells us how to assign types to the whole expression given its constituent types, but we can also use it as an equation that works both ways and lets us infer constituent types from the whole expression's type.

Let's see what equations we can come up with, looking at the code:

From f(x == 1) we infer t1 = (t6 -> t5), because t1 is the type of f, t6 is the type of x == 1, and t5 is the type of f(x == 1). Note that we're using the typing rules for function application here. Moreover, we can infer that t3 is Int and t6 is Bool because of the typing rule of the == operator.

Similarly, from g(x) we infer t2 = (t3 -> t7).

From the if expression, we infer that t6 is Bool (since it's the condition of the if) and that t4 = Int, because the then and else clauses must match.
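Stage 2 can then be a post-order walk over the same AST, appending one equation per typing rule. Continuing the sketch from stage 1 (TypeEquation and the Ifx case are again my stand-ins for what typing.py does):

    from dataclasses import dataclass

    @dataclass
    class TypeEquation:
        left: str    # a type name like 't4'
        right: str   # another type name, or a concrete type like 'Int'

    def generate_equations(node, equations):
        # Children first, so their equations come before this node's.
        for child in node.children():
            generate_equations(child, equations)
        if isinstance(node, Ifx):
            # The condition must be Bool, and both branches must have the
            # same type as the whole if expression.
            equations.append(TypeEquation(node.cond.type, 'Bool'))
            equations.append(TypeEquation(node.then.type, node.type))
            equations.append(TypeEquation(node.els.type, node.type))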

Now we have a list of equations, and our task is to find the most general solution, treating the equations as constraints. This is done by using the unification algorithm which I described in detail in the previous post. The solution we're seeking here is precisely the most general unifier.

For our expression, the algorithm will find the type of foo to be:

((Bool -> Bool), (Int -> Int), Int) -> Int

As expected.

If we make a slight modification to the expression to remove the comparison of x with 1:

foo f g x = if f(x) then g(x) else 20

Then we can no longer constrain the type of x, since all we know about it is that it's passed into functions f and g, and nothing else constrains the arguments of these functions. The type inference process will thus calculate this type for foo:

((a -> Bool), (a -> Int), a) -> Int

It assigns x the generic type name a, and uses it for the arguments of f and g as well.

The implementation

An implementation of microml is available here, as a self-contained Python program that parses a microml declaration and infers its type. The best starting point is main.py, which spells out the stages of type inference:

    code = 'foo f g x = if f(x == 1) then g(x) else 20'
    print('Code', '----', code, '', sep='\n')

    # Parse the microml code snippet into an AST.
    p = parser.Parser()
    e = p.parse_decl(code)
    print('Parsed AST', '----', e, '', sep='\n')

    # Stage 1: Assign symbolic typenames
    typing.assign_typenames(e.expr)
    print('Typename assignment', '----',
          typing.show_type_assignment(e.expr), '', sep='\n')

    # Stage 2: Generate a list of type equations
    equations = []
    typing.generate_equations(e.expr, equations)
    print('Equations', '----', sep='\n')
    for eq in equations:
        print('{:15} {:20} | {}'.format(str(eq.left), str(eq.right), eq.orig_node))

    # Stage 3: Solve equations using unification
    unifier = typing.unify_all_equations(equations)
    print('', 'Inferred type', '----',
          typing.get_expression_type(e.expr, unifier, rename_types=True),
          sep='\n')

This will print out:

    Code
    ----
    foo f g x = if f(x == 1) then g(x) else 20

    Parsed AST
    ----
    Decl(foo, Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20)))

    Typename assignment
    ----
    Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20))   t0
    If(App(f, [(x == 1)]), App(g, [x]), 20)                      t4
    App(f, [(x == 1)])                                           t5
    f                                                            t1
    (x == 1)                                                     t6
    x                                                            t3
    1                                                            Int
    App(g, [x])                                                  t7
    g                                                            t2
    x                                                            t3
    20                                                           Int

    Equations
    ----
    Int             Int                  | 1
    t3              Int                  | (x == 1)
    Int             Int                  | (x == 1)
    t6              Bool                 | (x == 1)
    t1              (t6 -> t5)           | App(f, [(x == 1)])
    t2              (t3 -> t7)           | App(g, [x])
    Int             Int                  | 20
    t5              Bool                 | If(App(f, [(x == 1)]), App(g, [x]), 20)
    t4              t7                   | If(App(f, [(x == 1)]), App(g, [x]), 20)
    t4              Int                  | If(App(f, [(x == 1)]), App(g, [x]), 20)
    t0              ((t1, t2, t3) -> t4) | Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20))

    Inferred type
    ----
    (((Bool -> Bool), (Int -> Int), Int) -> Int)

There are many more examples of type-inferred microml code snippets in the test file test_typing.py. Here's another example which is interesting:

    > foo f x = if x then lambda t -> f(t) else lambda j -> f(x)
    ((Bool -> a), Bool) -> (Bool -> a)

The actual inference is implemented in typing.py, which is fairly well commented and should be easy to understand after reading this post. The trickiest part is probably the unification algorithm, but that one is just a slight adaptation of the algorithm presented in the previous post.

[1] After this post was published, it was pointed out that another type checking / inference technique is already called bi-directional (see this paper for example); while it's related to Hindley-Milner (HM), it's a distinct method. Therefore, my terminology here can create confusion.

I'll emphasize that my only use of the term "bi-directional" is to distinguish what HM does from the simpler "uni-directional" inference described at the beginning.

Categories: FLOSS Project Planets

Eli Bendersky: Unification

Planet Python - Sat, 2024-05-04 22:46

In logic and computer science, unification is a process of automatically solving equations between symbolic terms. Unification has several interesting applications, notably in logic programming and type inference. In this post I want to present the basic unification algorithm with a complete implementation.

Let's start with some terminology. We'll be using terms built from constants, variables and function applications:

  • A lowercase letter represents a constant (could be any kind of constant, like an integer or a string)
  • An uppercase letter represents a variable
  • f(...) is an application of function f to some parameters, which are terms themselves

This representation is borrowed from first-order logic and is also used in the Prolog programming language. Some examples:

  • V: a single variable term
  • foo(V, k): function foo applied to variable V and constant k
  • foo(bar(k), baz(V)): a nested function application
Pattern matching

Unification can be seen as a generalization of pattern matching, so let's start with that first.

We're given a constant term and a pattern term. The pattern term has variables. Pattern matching is the problem of finding a variable assignment that will make the two terms match. For example:

  • Constant term: f(a, b, bar(t))
  • Pattern term: f(a, V, X)

Trivially, the assignment V=b and X=bar(t) works here. Such an assignment is also called a substitution, which maps variables to their assigned values. In a less trivial case, variables can appear multiple times in a pattern:

  • Constant term: f(top(a), a, g(top(a)), t)
  • Pattern term: f(V, a, g(V), t)

Here the right substitution is V=top(a).

Sometimes, no valid substitutions exist. If we change the constant term in the latest example to f(top(b), a, g(top(a)), t), then there is no valid substitution because V would have to match top(b) and top(a) simultaneously, which is not possible.

Unification

Unification is just like pattern matching, except that both terms can contain variables. So we can no longer say one is the pattern term and the other the constant term. For example:

  • First term: f(a, V, bar(D))
  • Second term: f(D, k, bar(a))

Given two such terms, finding a variable substitution that will make them equivalent is called unification. In this case the substitution is {D=a, V=k}.

Note that there is an infinite number of possible unifiers for some solvable unification problem. For example, given:

  • First term: f(X, Y)
  • Second term: f(Z, g(X))

We have the substitution {X=Z, Y=g(X)} but also something like {X=K, Z=K, Y=g(K)} and {X=j(K), Z=j(K), Y=g(j(K))} and so on. The first substitution is the simplest one, and also the most general. It's called the most general unifier or mgu. Intuitively, the mgu can be turned into any other unifier by performing another substitution. For example {X=Z, Y=g(X)} can be turned into {X=j(K), Z=j(K), Y=g(j(K))} by applying the substitution {Z=j(K)} to it. Note that the reverse doesn't work, as we can't turn the second into the first by using a substitution. So we say that {X=Z, Y=g(X)} is the most general unifier for the two given terms, and it's the mgu we want to find.
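To make "performing another substitution" concrete, here's a small helper that rewrites a term under a substitution. It's a sketch of mine written against the Term/Var/App/Const classes shown in the implementation section below, not part of the original code:

    def apply_substitution(term, subst):
        """Rewrite term, replacing variables bound in subst by their values."""
        if isinstance(term, Var):
            if term.name in subst:
                # The replacement may itself contain bound variables; recurse.
                return apply_substitution(subst[term.name], subst)
            return term
        elif isinstance(term, App):
            return App(term.fname,
                       [apply_substitution(arg, subst) for arg in term.args])
        else:
            # Constants are left untouched.
            return term

Applying {Z: j(K)} this way to each binding of the mgu {X=Z, Y=g(X)} yields the less general unifier discussed above.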

An algorithm for unification

Solving unification problems may seem simple, but there are a number of subtle corner cases to be aware of. In his 1991 paper Correcting a Widespread Error in Unification Algorithms, Peter Norvig noted a common error that exists in many books presenting the algorithm, including SICP.

The correct algorithm is based on J.A. Robinson's 1965 paper "A machine-oriented logic based on the resolution principle". More efficient algorithms have been developed over time since it was first published, but our focus here will be on correctness and simplicity rather than performance.

The following implementation is based on Norvig's, and the full code (with tests) is available on GitHub. This implementation uses Python 3, while Norvig's original is in Common Lisp. There's a slight difference in representations too, as Norvig uses the Lisp-y (f X Y) syntax to denote an application of function f. The two representations are isomorphic, and I'm picking the more classical one which is used in most papers on the subject. In any case, if you're interested in the more Lisp-y version, I have some Clojure code online that ports Norvig's implementation more directly.

We'll start by defining the data structure for terms:

    class Term:
        pass

    class App(Term):
        def __init__(self, fname, args=()):
            self.fname = fname
            self.args = args

        # Not shown here: __str__ and __eq__, see full code for the details...

    class Var(Term):
        def __init__(self, name):
            self.name = name

    class Const(Term):
        def __init__(self, value):
            self.value = value

An App represents the application of function fname to a sequence of arguments.

    def unify(x, y, subst):
        """Unifies term x and y with initial subst.

        Returns a subst (map of name->term) that unifies x and y, or None if
        they can't be unified. Pass subst={} if no subst are initially known.
        Note that {} means valid (but empty) subst.
        """
        if subst is None:
            return None
        elif x == y:
            return subst
        elif isinstance(x, Var):
            return unify_variable(x, y, subst)
        elif isinstance(y, Var):
            return unify_variable(y, x, subst)
        elif isinstance(x, App) and isinstance(y, App):
            if x.fname != y.fname or len(x.args) != len(y.args):
                return None
            else:
                for i in range(len(x.args)):
                    subst = unify(x.args[i], y.args[i], subst)
                return subst
        else:
            return None

unify is the main function driving the algorithm. It looks for a substitution, which is a Python dict mapping variable names to terms. When either side is a variable, it calls unify_variable which is shown next. Otherwise, if both sides are function applications, it ensures they apply the same function (otherwise there's no match) and then unifies their arguments one by one, carefully carrying the updated substitution throughout the process.

    def unify_variable(v, x, subst):
        """Unifies variable v with term x, using subst.

        Returns updated subst or None on failure.
        """
        assert isinstance(v, Var)
        if v.name in subst:
            return unify(subst[v.name], x, subst)
        elif isinstance(x, Var) and x.name in subst:
            return unify(v, subst[x.name], subst)
        elif occurs_check(v, x, subst):
            return None
        else:
            # v is not yet in subst and can't simplify x. Extend subst.
            return {**subst, v.name: x}

The key idea here is recursive unification. If v is bound in the substitution, we try to unify its definition with x to guarantee consistency throughout the unification process (and vice versa when x is a variable). There's another function being used here - occurs_check; I'm retaining its classical name from early presentations of unification. Its goal is to guarantee that we don't have self-referential variable bindings like X=f(X) that would lead to potentially infinite unifiers.

    def occurs_check(v, term, subst):
        """Does the variable v occur anywhere inside term?

        Variables in term are looked up in subst and the check is applied
        recursively.
        """
        assert isinstance(v, Var)
        if v == term:
            return True
        elif isinstance(term, Var) and term.name in subst:
            return occurs_check(v, subst[term.name], subst)
        elif isinstance(term, App):
            return any(occurs_check(v, arg, subst) for arg in term.args)
        else:
            return False

Let's see how this code handles some of the unification examples discussed earlier in the post. Starting with the pattern matching example, where variables are on just one side:

    >>> unify(parse_term('f(a, b, bar(t))'), parse_term('f(a, V, X)'), {})
    {'V': b, 'X': bar(t)}

Now the examples from the Unification section:

    >>> unify(parse_term('f(a, V, bar(D))'), parse_term('f(D, k, bar(a))'), {})
    {'D': a, 'V': k}

    >>> unify(parse_term('f(X, Y)'), parse_term('f(Z, g(X))'), {})
    {'X': Z, 'Y': g(X)}

Finally, let's try one where unification will fail due to two conflicting definitions of variable X.

    >>> unify(parse_term('f(X, Y, X)'), parse_term('f(r, g(X), p)'), {})
    None

Lastly, it's instructive to trace through the execution of the algorithm for a non-trivial unification to see how it works. Let's unify the terms f(X,h(X),Y,g(Y)) and f(g(Z),W,Z,X):

  • unify is called, sees the root is an App of function f and loops over the arguments.
    • unify(X, g(Z)) invokes unify_variable because X is a variable, and the result is augmenting subst with X=g(Z)
    • unify(h(X), W) invokes unify_variable because W is a variable, so the subst grows to {X=g(Z), W=h(X)}
    • unify(Y, Z) invokes unify_variable; since neither Y nor Z are in subst yet, the subst grows to {X=g(Z), W=h(X), Y=Z} (note that the binding between two variables is arbitrary; Z=Y would be equivalent)
    • unify(g(Y), X) invokes unify_variable; here things get more interesting, because X is already in the subst, so now we call unify on g(Y) and g(Z) (what X is bound to)
      • The functions match for both terms (g), so there's another loop over arguments, this time only for unifying Y and Z
      • unify_variable for Y and Z leads to lookup of Y in the subst and then unify(Z, Z), which returns the unmodified subst; the result is that nothing new is added to the subst, but the unification of g(Y) and g(Z) succeeds, because it agrees with the existing bindings in subst
  • The final result is {X=g(Z), W=h(X), Y=Z}
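Running this example through the code (with parse_term as in the earlier snippets) confirms the trace:

    >>> unify(parse_term('f(X,h(X),Y,g(Y))'), parse_term('f(g(Z),W,Z,X)'), {})
    {'X': g(Z), 'W': h(X), 'Y': Z}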
Efficiency

The algorithm presented here is not particularly efficient, and when dealing with large unification problems it's wise to consider more advanced options. It does too much copying around of subst, and also too much work is repeated because we don't try to cache terms that have already been unified.

For a good overview of the efficiency of unification algorithms, I recommend checking out two papers:

  • "An Efficient Unificaiton algorithm" by Martelli and Montanari
  • "Unification: A Multidisciplinary survey" by Kevin Knight
Categories: FLOSS Project Planets

Eli Bendersky: Elegant Python code for a Markov chain text generator

Planet Python - Sat, 2024-05-04 22:46

While preparing the post on minimal char-based RNNs, I coded a simple Markov chain text generator to serve as a comparison for the quality of the RNN model. That code turned out to be concise and quite elegant (IMHO!), so it seemed like I should write a few words about it.

It's so short I'm just going to paste it here in its entirety, but this link should have it in a Python file with some extra debugging information for tinkering, along with a sample input file.

    from collections import defaultdict, Counter
    import random
    import sys

    # This is the length of the "state" the current character is predicted from.
    # For Markov chains with memory, this is the "order" of the chain. For
    # n-grams, n is STATE_LEN+1 since it includes the predicted character as
    # well.
    STATE_LEN = 4

    data = sys.stdin.read()
    model = defaultdict(Counter)

    print('Learning model...')
    for i in range(len(data) - STATE_LEN):
        state = data[i:i + STATE_LEN]
        next = data[i + STATE_LEN]
        model[state][next] += 1

    print('Sampling...')
    state = random.choice(list(model))
    out = list(state)
    for i in range(400):
        out.extend(random.choices(list(model[state]), model[state].values()))
        state = state[1:] + out[-1]
    print(''.join(out))

Without going into too much detail, a Markov Chain is a model describing the probabilities of events based on the current state only (without having to recall all past states). It's very easy to implement and "train".

In the code shown above, the most important part to grok is the data structure model. It's a dictionary mapping a string state to the probabilities of characters following this state. The size of that string is configurable, but let's just assume it's 4 for the rest of the discussion. This is the order of the Markov chain. For every string seen in the input, we look at the character following it and increment a counter for that character; the end result is a dictionary mapping the alphabet to integers. For example, we may find that for the state "foob", 'a' appeared 75 times right after it, 'b' appeared 25 times, 'e' 44 times and so on.
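For instance, using those hypothetical counts, peeking into the trained model would look something like this:

    >>> model['foob']
    Counter({'a': 75, 'e': 44, 'b': 25})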

The learning process is simply sliding a "window" of 4 characters over the input, recording these appearances.

The learning loop is extremely concise; this is made possible by the right choice of Python data structures. First, we use a defaultdict for the model itself; this lets us avoid existence checks or try/except blocks for states that don't appear in the model at all.

Second, the objects contained inside model are of type Counter, which is a subclass of dict with some special sauce. In its most basic usage, a counter is meant to store an integer count for its keys - exactly what we need here. So a lot of power is packed into this simple statement:

model[state][next] += 1

If you try to rewrite it with model being a dict of dicts, it will become much more complicated to keep track of the corner cases.
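For comparison, here's roughly what the learning loop becomes with a plain dict of dicts - a sketch just to show the extra bookkeeping that defaultdict and Counter eliminate:

    # Same loop as above, but every access needs an explicit existence check.
    model = {}
    for i in range(len(data) - STATE_LEN):
        state = data[i:i + STATE_LEN]
        next_char = data[i + STATE_LEN]
        if state not in model:
            model[state] = {}
        if next_char not in model[state]:
            model[state][next_char] = 0
        model[state][next_char] += 1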

With the learning loop completed, we have in model every 4-letter string encountered in the text, mapped to its Counter of occurrences for the character immediately following it. We're ready to generate text, or "sample from the model".

We start by picking a random state that was seen in the training text. Then, we loop for an arbitrary bound and at every step we randomly select the following character, and update the current state. The following character is selected using weighted random selection - precisely the right idiom here, as we already have in each counter the "weights" - the more often some char was observed after a given state, the higher the chance to select it for sampling will be.

Starting with Python 3.6, the standard library has random.choices to implement weighted random selection. Before Python 3.6 we'd have to write that function on our own (Counter has the most_common() method that would make it easier to write an efficient version).
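A pre-3.6 version might have looked something like this sketch (weighted_choice is my name for it; the actual code above just uses random.choices):

    import bisect
    import itertools
    import random

    def weighted_choice(counter):
        """Pick a random key from a Counter, weighted by its counts."""
        keys, weights = zip(*counter.items())
        cumdist = list(itertools.accumulate(weights))
        # Draw a point in [0, total) and find the bucket it falls into.
        x = random.random() * cumdist[-1]
        return keys[bisect.bisect(cumdist, x)]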

Categories: FLOSS Project Planets

Revamping Plasma Mobile Navigation Gestures

Planet KDE - Sat, 2024-05-04 20:00
Chronicles of my odyssey revamping navigation gestures for Plasma Mobile
Categories: FLOSS Project Planets

Daily driving Plasma Mobile

Planet KDE - Sat, 2024-05-04 19:00

So, it's been a while since I last blogged. A lot has happened in the mobile Linux world since I made the post sharing the State of Linux on mobile. We're 5 years further along now, some distro options have disappeared and others have popped up, and although I've always been really optimistic about what Linux on mobile promises and can become, I've never actually used it as my daily driver. Even though I work on postmarketOS and would say I know a fair bit about its shortcomings and possibilities, I've been relying on an Android phone all this time to get me through life. And I've noticed that this is the case for a lot of people, and especially developers, in the Linux world.

Recently I decided this should change. How can we ever get Linux on mobile up to a state where we can use it as a proper replacement for the duopoly that is Android and iOS if nobody, including the developers, actually uses it even though it's already out there and available? I'm of course a KDE fan and would love to use Plasma Mobile specifically, but it has a ton of papercut issues that could easily be solved if the developers actually noticed them by using it! So about two weeks ago I decided to get myself a new (second-hand) phone to actually daily drive postmarketOS with Plasma Mobile, and I'll tell you about my experiences with it so far.

My setup

Although a lot of people still seem to think Pine64's PinePhone (Pro) or Purism's Librem 5 are the best options for a Linux phone out there, I would argue this is not the case any more. Besides being awfully slow hardware, the PinePhone suffers from a lack of software support: the kernel is maintained by a single person in their spare time and has a ton of things not actually upstreamed to mainline. The Librem 5 is way too expensive for what it offers and is made by a company that currently seems to have severe financial problems (they recently laid off most of their developers). Instead I would recommend a well supported (former) Android device; the postmarketOS wiki has a good list of well supported devices, and specifically I would recommend getting an SDM845 device, namely the OnePlus 6/6T, SHIFT6mq (you can buy these brand new, even) or the Xiaomi Pocophone F1. These are fully supported by mainline Linux and are easy to buy second-hand through platforms like eBay for not too much.

I however decided to get a Pixel 3A. This device is also fully supported by the mainline kernel, and I think it’s good not to have all attention focused on a few specific devices but to broaden the support that’s available. The Pixel 3A was a popular phone when it was new, so a lot of people might actually still have one lying around.

postmarketOS also supports a bunch of tablets which would actually be a really good use-case for Plasma Mobile.

Software

So as I mentioned earlier, I’m a KDE fan, so of course I opt to run Plasma Mobile. I maintain a (semi-)nightly repository of all KDE packages that tries to build the entirety of KDE from git master every day. This has proven very useful for Plasma Mobile development, as newly merged changes can be tested quickly by users, and it also reduces the need to compile the entirety of Plasma on your phone if you want to change just a few lines of code. It was made to support the transition to Qt 6, but I find it so useful that I’ll keep it around in the future. So on my new phone I enabled this repository, executed the upgrade and rebooted into a fresh Plasma 6 installation built straight from the previous day’s git master.

It is good to note that although I’m definitely daily driving postmarketOS now, my SIM card is actually still in my Android phone. I do not currently trust the stability of phone calls, mostly when it comes to audio. We (postmarketOS) are working hard to improve the situation, especially by hopefully switching to PipeWire for audio soon, but for now I’m carrying two devices around, with my Android phone sharing a hotspot to the postmarketOS one. The camera also doesn’t work yet, so having the Android phone around is still very useful for that as well. Camera support is making good progress, however, with projects like libcamera making it possible to create camera applications and to use the camera in browsers like Firefox.

So, what do I actually use this phone for? These are some of the use-cases I have and the applications I use for them:

  • browsing the web with Angelfish
    • I would actually prefer to use Firefox, so as not to support Google’s monopoly on the web by using a Chromium-based browser, but I currently think Angelfish’s experience on mobile is better than Firefox’s
  • make and keep track of notes with Marknote, for example shopping lists
  • watch YouTube with PlasmaTube
  • sync files and pictures from my NextCloud server with GhostCloud
    • I actually used this years ago when I still used SailfishOS. I was very glad to see it’s still around and nowadays even supports Ubuntu Touch and regular desktop Linux (and thus Plasma Mobile) as well. A Kirigami-based UI so it fits in better would be nice, but it’s very usable as is
  • chat on Matrix with NeoChat
  • listen to my music with Elisa
  • manage my local files with Index
  • manage my calendar with Merkuro
  • do offline turn-by-turn navigation with Osmin
    • I find this application incredible. In my mind a navigation app is difficult to create, but Osmin works well and calculates routes very quickly (completely offline!). It’s not yet as feature-full as, say, OSMand on Android, but it’s very usable
  • browse the fediverse (Mastodon) with Tokodon
  • check the weather with KWeather

This actually covers a huge part of what I would do on Android as well; now I just do these things on postmarketOS instead, using almost exclusively KDE applications! I’m still missing a few (for me) important things, most notably a KeePass-compatible client to get the passwords needed for the aforementioned applications. I worked around that, however, by using KDEConnect to share my clipboard from my desktop, where I just use KeePassXC. And of course there are the few Android applications that just don’t have FOSS replacements, like WhatsApp or my bank’s app, but I’m hoping either Waydroid or the (in my opinion more promising) android-translation-layer (basically Wine for Android apps) can be used for those in the future.

Now although this is a very usable setup for me, when actually using it you’ll quickly notice a lot of small but annoying issues. Each of them is small and would easily be fixed if there were developers around to experience it, but together they accumulate to make the whole experience still a bit frustrating. I’ve been reporting everything I could find so far (for example, see all the issues I’ve filed on just the Plasma Mobile shell). A few examples:

  • several applications have multiple actions on the same button press, like NeoChat both opening a room and opening a context menu for it when pressing on a room in the room list, BUG: 486545
  • Angelfish automatically switches to a newly opened tab, unlike any other mobile browser, unexpectedly throwing the user off whatever they’re reading, and shows an unnecessary message telling the user the new tab has been opened, which blocks the button to go back to the previous tab, BUG: 486463
  • QMLKonsole (the mobile alternative to Konsole) for some reason has its own button to open and close the keyboard, but pressing it triggers various buggy behaviours, BUG: 355
  • various applications have context hints that are meant to be shown on desktop when hovering over an element but show up when pressing buttons or input fields on mobile, BUG: 360
  • various desktop widgets are completely broken, BUG: 354
  • sometimes (not often) the shell just crashes; I haven’t been able to find a good reproducer yet
Development

To a lot of KDE developers, developing for Plasma Mobile is unknown (and possibly scary) territory, and they might not know how to do it easily. Will you compile everything on your device or cross-compile from your desktop instead? Use kde-builder or do it manually? Of course this all comes down to personal preference in the end, but let me tell you how I do it.

Like I mentioned earlier I’m maintaining a nightly repository shipping the entirety of KDE from git master. I highly recommend using this repository so you can quickly test and use new features and bug fixes. Instructions to set this up are on the postmarketOS wiki.

Although compilation on the device (e.g. using kde-builder) is most definitely possible (this is just regular good ol’ Linux after all), it’s a slow process due to the limited performance of a phone and might warm it up more than is safe for the device. I would instead recommend using the lovely pmbootstrap tool we use for postmarketOS development and building on your way more performant PC. This tool builds software using Alpine’s simple APKBUILD format. You don’t have to worry too much about learning this format; these APKBUILDs already exist for basically every KDE package out there and you can just reuse them. After setting up pmbootstrap (pmbootstrap init) you can get such an APKBUILD for a KDE package either by manually downloading it from the upstream repository and placing it in the location pmbootstrap expects, or by running pmbootstrap aportgen --fork-alpine <package name>. This makes pmbootstrap get the APKBUILD from Alpine Linux and put it in your local checkout of postmarketOS packages.

The APKBUILD you just downloaded can be used as is, and you can now build the package with pmbootstrap build --arch <CPU architecture of the target device> <package name>, but you probably want to use your local checkout of the code with your changes instead. For this, pmbootstrap supports the --src argument, which makes it build the same APKBUILD but with the source replaced by your local checkout. If your changes require any dependencies beyond what the package currently declares, you can edit the $makedepends or $depends variables (build dependencies and runtime dependencies, respectively) in the APKBUILD ($depends might not exist; just create it if it doesn’t).

When you’ve successfully built the package, you can send it to your device using pmbootstrap sideload <package name>. This will send it to a device running postmarketOS and have the apk package manager install it there. Restart the application in question and your changes will be ready to test! pmbootstrap supports many more fancy features, and I recommend you read its documentation to see what you can do.
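
Putting those steps together, a typical build-and-test cycle might look something like this; the aarch64 architecture and the ~/src path are placeholders for illustration, and the commands are the ones described above:

```
pmbootstrap init                                   # one-time setup
pmbootstrap aportgen --fork-alpine <package name>  # fetch Alpine's APKBUILD
pmbootstrap build --arch aarch64 --src ~/src/<package name> <package name>
pmbootstrap sideload <package name>                # install on the phone
```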

Conclusion

In my opinion, Plasma Mobile on postmarketOS is very usable right now but still suffers from a lot of papercuts. I hope that since I now daily drive the system I can find all these papercuts, report them and possibly even fix one or two of them. But even more I hope I can convince KDE developers to pick up a phone themselves and start using the system they’re in fact already developing for (95% of the Plasma Mobile stack is the same as Plasma Desktop, and all the applications used were made to be used on desktop as well!). The system has a lot of potential and is already great to use; it just needs developers! Get a cheap second-hand phone, flash postmarketOS on it and start using it!

I’m dreaming of a day when I’m not the only one at Akademy who has Plasma not only on their laptop but also on their phone!

Categories: FLOSS Project Planets

Python People: Rob Ludwick - Getting the most out of PyCon, including juggling

Planet Python - Sat, 2024-05-04 17:57

PyCon US is just around the corner.  I've asked Rob Ludwick to come on the show to discuss how to get the most out of your PyCon experience. There's a lot to do. A lot of activities to juggle, including actual juggling, which is where we start the conversation.

Even if you never get a chance to go to PyCon, I hope this interview helps you get a feel for the welcoming aspect of the Python community.

We talk about: 
- Juggling at PyCon
- How to get the most out of PyCon
    - Watching talks
    - Hallway track
    - Open spaces
    - Lightning talks
    - Expo hall / vendor space
    - Poster sessions
    - Job fair
    - A welcoming community
    - Tutorials 
    - Sprints
    - But mostly about the people of Python and PyCon.

"Python enables smart people to work faster" - Rob Ludwick


The Complete pytest Course

  • Level up your testing skills and save time during coding and maintenance.
  • Check out courses.pythontest.com

★ Support this podcast on Patreon ★
Categories: FLOSS Project Planets

Test and Code: 220: Getting the most out of PyCon, including juggling - Rob Ludwick

Planet Python - Sat, 2024-05-04 17:54

PyCon US is just around the corner.  I've asked Rob Ludwick to come on the show to discuss how to get the most out of your PyCon experience. There's a lot to do. A lot of activities to juggle, including actual juggling, which is where we start the conversation.

Even if you never get a chance to go to PyCon, I hope this interview helps you get a feel for the welcoming aspect of the Python community.

I recorded this interview as an episode for one of my other podcasts, Python People. But I think it's got some great pre-conference advice, so I'm sharing it here on Python Test as well.

We talk about: 
- Juggling at PyCon
- How to get the most out of PyCon
    - Watching talks
    - Hallway track
    - Open spaces
    - Lightning talks
    - Expo hall / vendor space
    - Poster sessions
    - Job fair
    - A welcoming community
    - Tutorials 
    - Sprints
    - But mostly about the people of Python and PyCon.

"Python enables smart people to work faster" - Rob Ludwick


Sponsored by Mailtrap.io

  • An Email Delivery Platform that developers love. 
  • An email-sending solution with industry-best analytics, SMTP, and email API, SDKs for major programming languages, and 24/7 human support. 
  • Try for Free at MAILTRAP.IO

Sponsored by PyCharm Pro

  • Use code PYTEST for 20% off PyCharm Professional at jetbrains.com/pycharm
  • Now with Full Line Code Completion
  • See how easy it is to run pytest from PyCharm at pythontest.com/pycharm

The Complete pytest Course

  • For the fastest way to learn pytest, go to courses.pythontest.com
  • Whether you're new to testing or pytest, or just want to maximize your efficiency and effectiveness when testing.
Categories: FLOSS Project Planets

Promet Source: DrupalCon 2024: Sessions to Watch for Government

Planet Drupal - Sat, 2024-05-04 15:20
Note: This blog was first published on May 24, 2023, and has been updated to reflect new information and insights. Takeaway: DrupalCon Portland 2024 offers valuable sessions, summits, and exhibitors useful to State and local government agencies. As an attendee, you'll have the opportunity to learn from industry experts, discover powerful tools and solutions, and network with peers facing similar challenges in the government space.
Categories: FLOSS Project Planets

postmarketOS podcast

Planet KDE - Sat, 2024-05-04 14:00

Our friends at postmarketOS hosted Plasma Mobile's lead developer Devin Lin on their podcast. You can find it on the postmarketos website

Categories: FLOSS Project Planets

Sven Hoexter: vym - view your mind

Planet Debian - Sat, 2024-05-04 10:59

Had a need for a mindmapping application and found view your mind in the archive. Works, but the version is a bit rusty. Sadly my Debian packaging skills are a bit rusty as well, especially when it comes to bigger GUI applications. Thus I spent a good chunk of yesterday afternoon ripping out cdbs and packaging the last source release on GitHub, which is right now 2.9.22 (the release branch already has 2.9.27; still sorting that out).

Git repository and an amd64 build of the current state. It still deserves some additional love, e.g. creating a -common package for arch-indep content.

Proposed a few changes upstream:

Also pinged pollux@, who uploaded vym up to 2019, to ask if he'd be fine with me picking it up. If someone else is interested, I'm also fine with putting it up on salsa in the general "Debian" group for shared maintenance. I guess I will use it in the future, but time is still a scarce resource for all of us.

Categories: FLOSS Project Planets

The Drop Times: Best of Both Worlds: Thor Andre Gretland on Gutenberg and Drupal's Synergy

Planet Drupal - Sat, 2024-05-04 08:43
Discover the innovative journey of Drupal Gutenberg through the insights of Thor Andre Gretland, Head of Sales and Business Advisor at Frontkom. In an exclusive interview with The DropTimes, Thor Andre unveils how Gutenberg is revolutionizing the Drupal ecosystem, enhancing content creation, and bridging communities. Learn about the groundbreaking collaboration between WordPress and Drupal, the challenges addressed, and the future of open-source CMS development. From improving user experience to addressing digital marketing needs, this interview is a deep dive into the evolving world of content management.
Categories: FLOSS Project Planets

Send your talks for Akademy NOW!

Planet KDE - Sat, 2024-05-04 06:35

Akademy 2024 (the annual world summit for KDE) is happening in Würzburg, Saturday 7th – Thursday 12th September. (I hope you knew that)


First of all, if you're reading this and thinking, "Should I go to Akademy?" 


The answer is [most probably] YES! Akademy has something for everyone, be it coders, translators, promoters, designers, enthusiasts, etc.


Now, with this out of the way: one of the many things that define Akademy is the talks on the weekend, and you know who has something to say? *YOU*


Yes, *YOU*. I'm sure you've been working on something interesting, or have a great idea to share.


*YOU* may think that your idea is not that great or that the things you work on are not interesting, but that's seldom the case: when someone explains to me the "boring" thing they've been working on, I always think "Wow, that's great".


Ok, so now that I've convinced you to send a talk proposal, when better than *TODAY* to send it?


Yes, I know the Call for Participation is open until the 24th of May, but by sending your talk today you make sure you don't forget to send it later, and also [more important for me] you help those of us on the Program Committee avoid the scramble when the final date approaches and we don't have lots of talks yet because you all prefer sending talks at the very last minute.


So stop reading and send your talk today ;-)

Categories: FLOSS Project Planets

This week in KDE: Looking towards Plasma 6.1

Planet KDE - Sat, 2024-05-04 00:57

This week we put some of the final Plasma 6.0 bugs to rest, and continued working towards Plasma 6.1 with a variety of UI improvements. Nothing ground-breaking this week, just a slow grind of useful work towards a solid release!

UI Improvements

Kate now considers a file as recent when it’s saved or closed, not just when it’s opened. This means your recent files list will no longer omit files you kept open for a long time while working on them (Christoph Cullmann, Kate 24.05. Link)

The panel icons for the Kickoff (Application Launcher) and Kicker (Application Menu) widgets are now capped in size so they can’t grow ridiculously huge on a thicccc panel (Akseli Lahtinen and me: Nate Graham, Plasma 6.0.5. Link 1 and link 2)

System Settings no longer lets you choose GNOME’s Adwaita or High Contrast icon themes as your systemwide icon theme, because despite registering themselves as FreeDesktop-compatible icon themes, they are no longer actually designed to be used this way and will break everything from KDE if you try anyway (me: Nate Graham, Plasma 6.0.5. Link)

The screen that KWin considers active for the purpose of determining which screen to open new windows on is now determined by “last user interaction”, which includes things like mouse movement and keyboard focus. Hopefully this should better match people’s expectations (Xaver Hugl, Plasma 6.1. Link)

Made the wallpaper chooser views frameless, matching the current styling of most other settings pages in System Settings and Plasma (me: Nate Graham, Plasma 6.1. Link 1 and link 2):

Plasma’s notifications now use a more appropriate icon for canceling jobs, and also elide long title text in the middle rather than on the left (Ivan Tkachenko, Plasma 6.1. Link 1 and link 2):

Ok, so maybe “plasma-brows…gration-host” is not a work of towering genius. The fact that a long ugly technical name is shown there is another bug we’ll investigate.

Refined the UI shown when changing global themes to make it clear what will happen and what’s potentially dangerous (me: Nate Graham, Plasma 6.1. Link 1 and link 2):

https://i.imgur.com/626VvQ2.mp4

When you use the command-line powerprofilesctl tool to change power profiles, the new state is now reflected in the Power and Battery widget (Natalie Clarius, Plasma 6.1. Link)

Several Breeze icons (folder-encrypted, folder-decrypted, and folder-music) now have proper symbolic versions at their 16px and 22px sizes (me: Nate Graham, Frameworks 6.2. Link)

Bug Fixes

Gwenview no longer fails to open large images; now its Qt 6 version can open the same size of image that the Qt 5 version could (Méven Car, Gwenview 24.05. Link)

On Wayland, KWin no longer crashes when it’s unable to open a socket to XWayland for some reason (Vlad Zahorodnii, Plasma 6.0.5. Link)

Fixed a case where Plasma could crash while modifying the set of favorite apps in Kickoff (Application Launcher), Kicker (Application Menu), or another launcher menu using the same backend infrastructure (Fushan Wen, Plasma 6.0.5. Link)

When using Qt 6.7, the System Tray popup is no longer sometimes inappropriately resized to a tiny nub, and also clicking a System Monitor widget showing GPU sensors no longer causes Plasma to freeze (Marco Martin, Plasma 6.0.5. Link 1 and link 2)

Fixed an extremely strange issue that could be triggered by opening any windows of IntelliJ IDE apps, and would cause other windows and Plasma panels to become transparent to clicks (Vyacheslav Mayorov, Plasma 6.0.5. Link)

When waking the system from sleep, quick-tiled windows no longer sometimes disappear, and vertically-maximized windows are no longer sometimes mis-positioned (Xaver Hugl, Plasma 6.0.5. Link 1 and link 2)

On X11, forcing tablet mode on when using a multi-screen setup with global scaling no longer causes one of the screens to scale everything incorrectly (Xaver Hugl, Plasma 6.0.5 Link)

Applied a workaround in KWin for an AMD GPU driver bug, which should reduce instances of random visual glitchiness (Xaver Hugl, Plasma 6.1. Link)

Fixed another case of Korners, this time for menus in QtWidgets-based apps (Ivan Tkachenko, Plasma 6.1. Link)

Resizing a window with a wallpaper chooser grid in it no longer sometimes causes the grid view’s header to disturbingly detach and appear in the middle of the view (me: Nate Graham, Plasma 6.1. Link)

More audio and video files now have appropriate icons, and when no suitable format-specific icon is found, the system will no longer fall back to an inappropriate symbolic speaker or filmstrip icon (Kai Uwe Broulik and me: Nate Graham, Frameworks 6.2. Link 1 and link 2)

Fixed a case in Kirigami where some UI elements would have incorrect colors when using mixed light/dark color schemes (Evgeniy Harchenko, Frameworks 6.2. Link)

After we fixed the “Pick your installation option popup” in the “Get new [thing]” windows, Qt 6.7 broke it again, so we fixed it again! This time moar betterer (Akseli Lahtinen and Ivan Tkachenko, Frameworks 6.3. Link)

Fixed an issue that caused apps with System Tray icons to inappropriately quit when deleting their tray icons (Tor Arne, Qt 6.7.2. Link)

Other bug information of note:

Performance & Technical

Implemented a bunch of security hardening for our crash reporting system based on feedback from SUSE’s security team (Harald Sitter, Plasma 6.0.5. Link)

Automation & Systematization

Added multiple autotests to ensure that mounting different types of mountable filesystems works as intended (Stefan Brüns, Frameworks 6.2. Link)

Added an autotest to make sure that Plasma-themes UI elements that should have the same height—such as text fields and buttons—still do even if the Plasma style is changed (Fushan Wen, Plasma 6.1. Link)

…And Everything Else

This blog only covers the tip of the iceberg! If you’re hungry for more, check out https://planet.kde.org, where you can find more news from other KDE contributors.

How You Can Help

The KDE organization has become important in the world, and your time and labor have helped to bring it there! But as we grow, it’s going to be equally important that this stream of labor be made sustainable, which primarily means paying for it. Right now the vast majority of KDE runs on labor not paid for by KDE e.V. (the nonprofit foundation behind KDE, of which I am a board member), and that’s a problem. We’ve taken steps to change this with paid technical contractors—but those steps are small due to growing but still limited financial resources. If you’d like to help change that, consider donating today!

Otherwise, visit https://community.kde.org/Get_Involved to discover other ways to be part of a project that really matters. Each contributor makes a huge difference in KDE; you are not a number or a cog in a machine! You don’t have to already be a programmer, either. I wasn’t when I got started. Try it, you’ll like it! We don’t bite!

Categories: FLOSS Project Planets
