FLOSS Project Planets

Cusy: Python Pattern Matching in Linux Magazine 05/2021

Planet Python - Mon, 2021-04-12 08:57

Linux Magazin 05/2021

Python, originally an object-oriented programming language, is to receive a new feature in version 3.10 that is known mainly from functional languages: pattern matching. The change is controversial in the Python community and has triggered a heated debate.

Pattern matching is a symbol-processing method that uses a pattern to identify discrete structures or subsets, e.g. strings, trees or graphs. This procedure is found in functional or logical programming languages where a match expression is used to process data based on its structure, e.g. in Scala, Rust and F#. A match statement takes an expression and compares it to successive patterns specified as one or more cases. This is superficially similar to a switch statement in C, Java or JavaScript, but much more powerful.

Python 3.10 is now also to receive such a match expression. The implementation is described in PEP (Python Enhancement Proposal) 634. [1] Further information on the plans can be found in PEP 635 [2] and PEP 636 [3]. How pattern matching is supposed to work in Python 3.10 is shown by this very simple example, where a value is compared with several literals:

def http_error(status):
    match status:
        case 400:
            return "Bad request"
        case 401:
            return "Unauthorized"
        case 403:
            return "Forbidden"
        case 404:
            return "Not found"
        case 418:
            return "I'm a teapot"
        case _:
            return "Something else"

In the last case of the match statement, an underscore _ acts as a wildcard that matches anything. This has caused irritation among developers, because in Python a leading underscore is conventionally used to mark a name as intended for internal use. While Python does not distinguish between private and public variables as strictly as Java does, it is still a very widely used convention, one that is also specified in the Style Guide for Python Code [4].

However, the proposed match statement can not only check patterns, i.e. detect a match between the value of a variable and a given pattern, it also rebinds the variables that match the given pattern.

This leads to the fact that in Python we suddenly have to deal with Schrödinger constants, which only remain constant until we take a closer look at them in a match statement. The following example is intended to explain this:

NOT_FOUND = 404
retcode = 200

match retcode:
    case NOT_FOUND:
        print('not found')

print(f"Current value of {NOT_FOUND=}")

This results in the following output:

not found
Current value of NOT_FOUND=200

This behaviour leads to harsh criticism of the proposal from experienced Python developers such as Brandon Rhodes, author of «Foundations of Python Network Programming»:

If this poorly-designed feature is really added to Python, we lose a principle I’ve always taught students: “if you see an undocumented constant, you can always name it without changing the code’s meaning.” The Substitution Principle, learned in algebra? It’ll no longer apply.

— Brandon Rhodes on 12 February 2021, 2:55 pm on Twitter [5]

Many long-time Python developers, however, are not only grumbling about the structural pattern matching that is to come in Python 3.10. More generally, they regret developments of recent years in which more and more syntactic sugar has been sprinkled over the language: original principles, as laid down in the Zen of Python [6], are being forgotten, they argue, and functional stability is being lost.

Although Python has defined a sophisticated process with the Python Enhancement Proposals (PEPs) [7] that can be used to collaboratively steer the further development of Python, there is always criticism on Twitter and other social media, as is the case now with structural pattern matching. In fact, the topic has already been discussed intensively in the Python community. The Python Steering Council [8] recommended adoption of the Proposals as early as December 2020. Nevertheless, the topic only really boiled up with the adoption of the Proposals. The reason for this is surely the size and diversity of the Python community. Most programmers are probably only interested in discussions about extensions that solve their own problems. The other developments are overlooked until the PEPs are accepted. This is probably the case with structural pattern matching. It opens up solutions to problems that were hardly possible in Python before. For example, it allows data scientists to write matching parsers and compilers for which they previously had to resort to functional or logical programming languages.

With the adoption of the PEP, the discussion has now been taken into the wider Python community. Incidentally, Brett Cannon, a member of the Python Steering Council, pointed out in an interview [9] that the last word has not yet been spoken: until the first beta version, there is still time for changes if problems arise in practically used code. He also held out the possibility of changing the meaning of _ once again.

So maybe we will be spared Schrödinger’s constants.

[1] PEP 634: Specification
[2] PEP 635: Motivation and Rationale
[3] PEP 636: Tutorial
[4] https://pep8.org/#descriptive-naming-styles
[5] @brandon_rhodes
[6] PEP 20 – The Zen of Python
[7] Index of Python Enhancement Proposals (PEPs)
[8] Python Steering Council
[9] Python Bytes Episode #221
Categories: FLOSS Project Planets

Russell Coker: Riverdale

Planet Debian - Mon, 2021-04-12 08:35

I’ve been watching the show Riverdale on Netflix recently. It’s an interesting modern take on the Archie comics. Having watched Josie and the Pussycats in Outer Space when I was younger I was anticipating something aimed towards a similar audience. As solving mysteries and crimes was apparently a major theme of the show I anticipated something along similar lines to Scooby Doo, some suspense and some spooky things, but then a happy ending where criminals get arrested and no-one gets hurt or killed while the vast majority of people are nice. Instead the first episode has a teen being murdered and Ms Grundy being obsessed with 15yo boys and sleeping with Archie (who’s supposed to be 15 but played by a 20yo actor).

Everyone in the show has some dark secret. The filming has a dark theme, the sky is usually overcast and it’s generally gloomy. This is a significant contrast to Veronica Mars which has some similarities in having a young cast, a sassy female sleuth, and some similar plot elements. Veronica Mars has a bright theme and a significant comedy element in spite of dealing with some dark issues (murder, rape, child sex abuse, and more). But Riverdale is just dark. Anyone who watches this with their kids expecting something like Scooby Doo is in for a big surprise.

There are lots of interesting stylistic elements in the show. Lots of clothing and uniform designs that seem to date from the 1940’s. It seems like some alternate universe where kids have smartphones and laptops while dressing in the style of the 1940s. One thing that annoyed me was construction workers using tools like sledge-hammers instead of excavators. A society that has smart phones but no earth-moving equipment isn’t plausible.

On the upside there is a racial mix in the show that more accurately reflects American society than the original Archie comics and homophobia is much less common than in most parts of our society. For both race issues and gay/lesbian issues the show treats them in an accurate way (portraying some bigotry) while the main characters aren’t racist or homophobic.

I think it’s generally an OK show and recommend it to people who want a dark show. It’s a good show to watch while doing something on a laptop so you can check Wikipedia for the references to 1940s stuff (like when bikinis were invented). I’m halfway through season 3, which isn’t as good as the first two; I don’t know whether it will get better later in the season or whether I should have stopped after season 2.

I don’t usually review fiction, but the interesting aesthetics of the show made it deserve a review.

Categories: FLOSS Project Planets

Stack Abuse: Borůvka's Algorithm in Python - Theory and Implementation

Planet Python - Mon, 2021-04-12 08:30

Borůvka's Algorithm is a greedy algorithm published by Otakar Borůvka, a Czech mathematician best known for his work in graph theory. Its most famous application helps us find the minimum spanning tree in a graph.

A thing worth noting about this algorithm is that it's the oldest minimum spanning tree algorithm, on record. Borůvka came up with it in 1926, before computers as we know them today even existed. It was published as a method of constructing an efficient electricity network.

In this guide, we'll take a refresher on graphs, and what minimum spanning trees are, and then jump into Borůvka's algorithm and implement it in Python:

Graphs and Minimum Spanning Trees

A graph is an abstract structure that represents a group of objects called nodes (also known as vertices), in which certain pairs of nodes are connected or related. Each one of these connections is called an edge.

A tree is an example of a graph:

In the image above, the first graph has 4 nodes and 4 edges, while the second graph (a binary tree) has 7 nodes and 6 edges.

Graphs can be applied to many problems, from geospatial locations to social network graphs and neural networks. Conceptually, graphs like these are all around us. For example, say we'd like to plot a family tree, or explain to someone how we met our significant other. We might introduce a large number of people and their relationships to make the story as interesting to the listener as it was to us.

Since this is really just a graph of people (nodes) and their relationships (edges) - graphs are a great way to visualize this:

Types of Graphs

Depending on the types of edges a graph has, we have two distinct categories of graphs:

  • Undirected graphs
  • Directed graphs

An undirected graph is a graph in which the edges do not have orientations. All edges in an undirected graph are, therefore, considered bidirectional.

Formally, we can define an undirected graph as G = (V, E), where V is the set of all the graph's nodes, and E is a set that contains unordered pairs of elements from V, which represent edges.

Unordered pairs here means that the relationship between two nodes is always two-sided, so if we know there's an edge that goes from A to B, we know for sure that there's an edge that goes from B to A.

A directed graph is a graph in which the edges have orientations.

Formally, we can define a directed graph as G = (V, E), where V is the set of all the graph's nodes, and E is a set that contains ordered pairs of elements from V.

Ordered pairs imply that the relationship between two nodes can be either one- or two-sided: if there's an edge that goes from A to B, we can't assume that there's an edge that goes from B to A.
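A quick way to see the difference is to model unordered pairs as sets and ordered pairs as tuples; the Python representation below is our own choice, purely for illustration:

```python
# Undirected edges: unordered pairs, so a frozenset works well
undirected = {frozenset({'A', 'B'})}

# Directed edges: ordered pairs, so a tuple is natural
directed = {('A', 'B')}

print(frozenset({'B', 'A'}) in undirected)  # True - the same edge either way
print(('B', 'A') in directed)               # False - the reverse edge is distinct
```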

The direction of an edge is denoted with an arrow. Keep in mind that two-sided relationships can be shown either by drawing two distinct arrows or just drawing two arrow points on either side of the same edge:

Another way to differentiate graphs based on their edges is regarding the weight of those edges. Based on that, a graph can be:

  • Weighted
  • Unweighted

A weighted graph is a graph in which every edge is assigned a number - its weight. These weights can represent the distance between nodes, capacity, price et cetera, depending on the problem we're solving.

Weighted graphs are used pretty often, for example in problems where we need to find the shortest path or, as we will soon see, a minimum spanning tree.

An unweighted graph does not have weights on its edges.

Note: In this article, we will focus on undirected, weighted graphs.

A graph can also be connected or disconnected. A graph is connected if there is a path (which consists of one or more edges) between each pair of nodes. On the other hand, a graph is disconnected if there is a pair of nodes that isn't connected by any path of edges.
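Since Borůvka's algorithm, as we'll see below, expects a connected graph as its input, connectivity is worth checking up front. A sketch using a breadth-first traversal (the helper name and the [u, v, weight] edge format are our own assumptions):

```python
from collections import deque

def is_connected(num_of_nodes, edges):
    # Build an adjacency list from [u, v, weight] triples
    adjacency = {node: [] for node in range(num_of_nodes)}
    for u, v, _weight in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Breadth-first traversal starting from node 0
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    # Connected if and only if every node was reached
    return len(seen) == num_of_nodes

print(is_connected(3, [[0, 1, 4], [1, 2, 7]]))  # True
print(is_connected(4, [[0, 1, 4], [2, 3, 7]]))  # False
```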

Trees and Minimum Spanning Trees

There's a fair bit to be said about trees, subgraphs and spanning trees, though here's a really quick and concise breakdown:

  • A tree is an undirected graph where each two nodes have exactly one path connecting them, no more, no less.

  • A subgraph of a graph A is a graph composed of a subset of graph A's nodes and edges.

  • A spanning tree of graph A is a subgraph of graph A that is a tree, whose set of nodes is the same as graph A's.

  • A minimum spanning tree is a spanning tree such that the sum of all its edge weights is the smallest possible. Since it's a tree (and the edge weight sum should be minimal), there shouldn't be any cycles.

Note: In case all edge weights in a graph are distinct, the minimum spanning tree of that graph is going to be unique. However, if the edge weights are not distinct, there can be multiple minimum spanning trees for only one graph.

Now that we've covered the graph theory we need, we can tackle the algorithm itself.

Borůvka's Algorithm

The idea behind this algorithm is pretty simple and intuitive. We mentioned before that this was a greedy algorithm.

When an algorithm is greedy, it constructs a globally "optimal" solution out of smaller, locally optimal solutions to subproblems. Usually it converges on a good-enough solution, since following local optima doesn't guarantee a globally optimal outcome.

Simply put, greedy algorithms make the optimal choice (out of currently known choices) at each step of the problem, aiming to get to the overall most optimal solution when all of the smaller steps add up.

You could think of greedy algorithms as a musician who's improvising at a concert and will in every moment play what sounds the best. On the other hand, non-greedy algorithms are more like a composer, who'll think about the piece they're about to perform, and take their time to write it out as sheet music.

Now, we will break down the algorithm in a couple of steps:

  1. We initialize all nodes as individual components.
  2. We initialize the minimum spanning tree S as an empty set that'll contain the solution.
  3. While there is more than one component:
    • For each component, we find the minimum-weight edge that connects it to any other component.
    • If this edge isn't in the minimum spanning tree S, we add it.
  4. Once only one component is left, we have reached the end of the tree.

This algorithm takes a connected, weighted and undirected graph as an input, and its output is the graph's corresponding minimum spanning tree.

Let's take a look at the following graph and find its minimum spanning tree using Borůvka's algorithm:

At the start, every node represents an individual component. That means that we will have 9 components. Let's see what the smallest-weight edges that connect these components to any other component would be:

Component   Smallest weight edge to another component   Edge weight
{0}         0 - 1                                       4
{1}         0 - 1                                       4
{2}         2 - 4                                       2
{3}         3 - 5                                       5
{4}         4 - 7                                       1
{5}         3 - 5                                       5
{6}         6 - 7                                       1
{7}         4 - 7                                       1
{8}         7 - 8                                       3

Now, our graph is going to be in this state:

The green edges in this graph represent the edges that bind together its closest components. As we can see, now we have three components: {0, 1}, {2, 4, 6, 7, 8} and {3, 5}. We repeat the algorithm and try to find the minimum-weight edges that can bind together these components:

Component         Smallest weight edge to another component   Edge weight
{0, 1}            0 - 6                                       7
{2, 4, 6, 7, 8}   2 - 3                                       6
{3, 5}            2 - 3                                       6

Now, our graph is going to be in this state:

As we can see, we are left with only one component in this graph, which represents our minimum spanning tree! The weight of this tree is 29, which we got after summing all of the edges:

Now, the only thing left to do is implement this algorithm in Python.


We are going to implement a Graph class, which will be the main data structure we'll be working with. Let's start off with the constructor:

class Graph:
    def __init__(self, num_of_nodes):
        self.m_v = num_of_nodes
        self.m_edges = []
        self.m_component = {}

In this constructor, we provided the number of nodes in the graph as an argument, and we initialized three fields:

  • m_v - the number of nodes in the graph.
  • m_edges - the list of edges.
  • m_component - the dictionary which stores the index of the component which a node belongs to.

Now, let's make a helper function that we can use to add an edge to a graph's nodes:

def add_edge(self, u, v, weight):
    self.m_edges.append([u, v, weight])

This function is going to add an edge in the format [first, second, edge weight] to our graph.

Because we want to ultimately make a method that unifies two components, we'll first need a method that propagates a new component throughout a given component. And secondly, we'll need a method that finds the component index of a given node:

def find_component(self, u):
    if self.m_component[u] == u:
        return u
    return self.find_component(self.m_component[u])

def set_component(self, u):
    if self.m_component[u] == u:
        return
    else:
        for k in self.m_component.keys():
            self.m_component[k] = self.find_component(k)

In this method, we will artificially treat the dictionary as a tree. We ask whether or not we've found the root of our component (because only root components will always point to themselves in the m_component dictionary). If we haven't found the root node, we recursively search the current node's parent.

Note: The reason we don't assume that m_component already points to the correct component is that, once we start unifying components, the only entries we know for sure won't change their component index are the root components.

For example, in our graph in the example above, in the first iteration, the dictionary is going to look like this:

index   value
0       0
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8

We've got 9 components, and each node is a component by itself. In the second iteration, it's going to look like this:

index   value
0       0
1       0
2       2
3       3
4       2
5       3
6       7
7       4
8       7

Now, tracing back to the roots, we'll see that our new components will be: {0, 1}, {2, 4, 7, 6, 8} and {3, 5}.

The last method we're going to need before implementing the algorithm itself is the method that unifies two components into one, given two nodes which belong to their respective components:

def union(self, component_size, u, v):
    if component_size[u] <= component_size[v]:
        self.m_component[u] = v
        component_size[v] += component_size[u]
        self.set_component(u)
    elif component_size[u] >= component_size[v]:
        self.m_component[v] = self.find_component(u)
        component_size[u] += component_size[v]
        self.set_component(v)
    print(self.m_component)

In this method, we find the roots of the components for the two nodes (which are their component indexes at the same time). Then, we compare the components in terms of size, and attach the smaller one to the larger one. Then, we just add the size of the smaller one to the size of the larger one, because they are now one component.

Finally, if the components are of same size, we just unite them together however we want - in this particular example we did it by adding the second one to the first one.

Now that we've implemented all the utility methods we need, we can finally dive into Borůvka's algorithm:

def boruvka(self):
    component_size = []
    mst_weight = 0

    minimum_weight_edge = [-1] * self.m_v

    for node in range(self.m_v):
        self.m_component.update({node: node})
        component_size.append(1)

    num_of_components = self.m_v

    print("---------Forming MST------------")
    while num_of_components > 1:
        for i in range(len(self.m_edges)):
            u = self.m_edges[i][0]
            v = self.m_edges[i][1]
            w = self.m_edges[i][2]

            u_component = self.m_component[u]
            v_component = self.m_component[v]

            if u_component != v_component:
                if minimum_weight_edge[u_component] == -1 or \
                        minimum_weight_edge[u_component][2] > w:
                    minimum_weight_edge[u_component] = [u, v, w]
                if minimum_weight_edge[v_component] == -1 or \
                        minimum_weight_edge[v_component][2] > w:
                    minimum_weight_edge[v_component] = [u, v, w]

        for node in range(self.m_v):
            if minimum_weight_edge[node] != -1:
                u = minimum_weight_edge[node][0]
                v = minimum_weight_edge[node][1]
                w = minimum_weight_edge[node][2]

                u_component = self.m_component[u]
                v_component = self.m_component[v]

                if u_component != v_component:
                    mst_weight += w
                    self.union(component_size, u_component, v_component)
                    print("Added edge [" + str(u) + " - " + str(v) + "]\n"
                          + "Added weight: " + str(w) + "\n")
                    num_of_components -= 1

        minimum_weight_edge = [-1] * self.m_v

    print("----------------------------------")
    print("The total weight of the minimal spanning tree is: " + str(mst_weight))

The first thing we did in this algorithm was initialize additional lists we would need in the algorithm:

  • The component record (each node starts off as its own component).
  • A list that keeps each component's size (initialized to 1), as well as the list of minimum weight edges (-1 at first, since we don't know what the minimum weight edges are yet).

Then, we go through all of the edges in the graph, and we find the root of components on both sides of those edges.

After that, we are looking for the minimum weight edge that connects these two components using a couple of if clauses:

  • If the current minimum weight edge of component u doesn't exist (is -1), or if it's greater than the edge we're observing right now, we will assign the value of the edge we're observing to it.
  • If the current minimum weight edge of component v doesn't exist (is -1), or if it's greater than the edge we're observing right now, we will assign the value of the edge we're observing to it.

After we've found the cheapest edges for each component, we add them to the minimum spanning tree, and decrease the number of components accordingly.

Finally, we reset the list of minimum weight edges back to -1, so that we can do all of this again. We keep iterating as long as there is more than one component in the list of components.

Let's put the graph we used in the example above as the input of our implemented algorithm:

g = Graph(9)
g.add_edge(0, 1, 4)
g.add_edge(0, 6, 7)
g.add_edge(1, 6, 11)
g.add_edge(1, 7, 20)
g.add_edge(1, 2, 9)
g.add_edge(2, 3, 6)
g.add_edge(2, 4, 2)
g.add_edge(3, 4, 10)
g.add_edge(3, 5, 5)
g.add_edge(4, 5, 15)
g.add_edge(4, 7, 1)
g.add_edge(4, 8, 5)
g.add_edge(5, 8, 12)
g.boruvka()

Chucking it in the algorithm's implementation will result in:

---------Forming MST------------
{0: 1, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}
Added edge [0 - 1]
Added weight: 4

{0: 1, 1: 1, 2: 4, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}
Added edge [2 - 4]
Added weight: 2

{0: 1, 1: 1, 2: 4, 3: 5, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}
Added edge [3 - 5]
Added weight: 5

{0: 1, 1: 1, 2: 4, 3: 5, 4: 4, 5: 5, 6: 6, 7: 4, 8: 8}
Added edge [4 - 7]
Added weight: 1

{0: 1, 1: 1, 2: 4, 3: 5, 4: 4, 5: 5, 6: 4, 7: 4, 8: 8}
Added edge [6 - 7]
Added weight: 1

{0: 1, 1: 1, 2: 4, 3: 5, 4: 4, 5: 5, 6: 4, 7: 4, 8: 4}
Added edge [7 - 8]
Added weight: 3

{0: 4, 1: 4, 2: 4, 3: 5, 4: 4, 5: 5, 6: 4, 7: 4, 8: 4}
Added edge [0 - 6]
Added weight: 7

{0: 4, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4}
Added edge [2 - 3]
Added weight: 6

----------------------------------
The total weight of the minimal spanning tree is: 29

The time complexity of this algorithm is O(E log V), where E represents the number of edges and V represents the number of nodes.

The space complexity of this algorithm is O(V + E), since we have to keep a couple of lists whose sizes are equal to the number of nodes, as well as keep all the edges of a graph inside of the data structure itself.


Even though Borůvka's algorithm is not as well known as some other minimum spanning tree algorithms like Prim's or Kruskal's minimum spanning tree algorithms, it gives us pretty much the same result - they all find the minimum spanning tree, and the time complexity is approximately the same.

One advantage that Borůvka's algorithm has over the alternatives is that it doesn't need to pre-sort the edges or maintain a priority queue in order to find the minimum spanning tree. Even though that doesn't improve its asymptotic complexity, since it still passes over the edges O(log V) times, it is a bit simpler to code.

Categories: FLOSS Project Planets

Community posts: #DrupalMemorial

Planet Drupal - Mon, 2021-04-12 07:19

The last 18 months have been difficult for many in our global community. The global COVID-19 pandemic has taken loved ones from us too soon. Social and political upheaval around the world have fractured civil discourse, and set back the cause of civil rights. Economic uncertainty has affected our jobs and our prospects for the future. For some, the present crises have brought up memories of more distant loss.

In a year in which we have all experienced loss, remembrance and reflection helps us heal.

We encourage the Drupal community to share memories of lost friends, colleagues, family, and loved ones- whether that loss was recent or many years past. We encourage you to share and remember the good they brought to our lives.

We encourage you to share your words of hope for civil and societal change.

Whatever your words of hope and remembrance we encourage you to lift each other up.

How to share your words of remembrance:
The comments on this post have been opened, or you can use the hashtag #drupalmemorial on social media to tag your posts to the embedded memorial wall below

Categories: FLOSS Project Planets

Zato Blog: Understanding WebSocket API timeouts

Planet Python - Mon, 2021-04-12 06:35

Zato WebSocket channels let you accept long-running API connections and, as such, they have a few settings to fine tune their usage of timeouts. Let's discover what they are and how to use them.

WebSocket channels

The four timeout settings are listed below. All of the WebSocket clients using a particular channel will use the same timeouts configuration - this means that a different channel is needed if particular clients require different settings.

  • New token wait time
  • Token TTL
  • Ping interval
  • Threshold
  • New token wait time - when a new WebSocket connection is established to Zato, it has that many seconds to open a session and to send its credentials. If that is not done, Zato immediately closes the connection.

  • Token TTL - once a session is established and a session token is returned to the client, the token's time-to-live (TTL) will be that many seconds. If there is no message from the client within TTL seconds, Zato considers the token expired and it cannot be used any longer although it is not guaranteed that the connection will be closed immediately after the token becomes expired.

    In this context, a message that can extend TTL means one of:

    • A request sent by the client
    • A response to a request previously sent by Zato
    • A response to a ping message sent by Zato
Ping messages
  • Ping interval - Zato sends WebSocket ping messages once in that many seconds. Each time a response to a ping request is received from the client, the session token's TTL is extended by the same number of seconds.

    For instance, suppose a new session token was issued to a client at 15:00:00 with a TTL of 3600 (i.e. until 16:00:00) and the ping interval is 30 seconds.

    First, at 15:00:30 Zato will send a ping message.

    If the client responds successfully, the token's validity is extended from the moment the response arrives, e.g. if the response arrives at 15:00:30,789 (after 789 milliseconds), the token will be valid up to 16:00:30,789, that is, TTL seconds from the time the response was received by the server.
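    The arithmetic above can be sketched in a few lines; the variable names are our own and Zato performs this bookkeeping internally, so no such code is needed in practice:

```python
from datetime import datetime, timedelta

ttl = timedelta(seconds=3600)

# Token issued at 15:00:00, initially valid until 16:00:00
token_issued = datetime(2021, 4, 12, 15, 0, 0)
expires_at = token_issued + ttl

# The response to the ping arrives 789 ms after 15:00:30 ...
pong_received = datetime(2021, 4, 12, 15, 0, 30, 789000)

# ... so the token is now valid until 16:00:30,789
expires_at = pong_received + ttl

print(expires_at)  # 2021-04-12 16:00:30.789000
```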

  • Threshold - the number of consecutive missed ping messages after which Zato will close the connection. For instance, if the threshold is 5 and the ping interval is 10, Zato pings the client once every 10 seconds; if 5 pings in a row go unanswered (a total of 50 seconds in this case), the connection is closed immediately.

    Note that only pings missed consecutively are counted towards the threshold. For instance, if a client missed 2 out of 5 pings but then replies on the 3rd attempt, its counter of messages missed is reset and it starts from 0 once more as though it never missed a single ping.
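    The consecutive-miss rule can be sketched as a small function; this is an illustration of the logic described above, not Zato's actual implementation:

```python
def exceeds_threshold(ping_replies, threshold):
    # ping_replies: True where the client answered a ping, False where it did not
    missed = 0
    for answered in ping_replies:
        if answered:
            missed = 0  # any reply resets the counter
        else:
            missed += 1
            if missed >= threshold:
                return True  # that many consecutive misses: close the connection
    return False

print(exceeds_threshold([False] * 5, 5))                         # True
print(exceeds_threshold([False, False, True, False, False], 5))  # False
```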

A note about firewalls

A great advantage of using WebSocket connections is that they are bidirectional and let one easily send messages to and from clients using the same TCP connection over a longer time.

However, particularly in the relation to ping messages, it needs to be remembered that stateful firewalls in data centers may have their requirements as to how often peers should communicate. This is especially true if the communication is over the Internet rather than in the same data center.

On one hand, this means that the ping interval should be set to a value small enough to ensure that firewalls will not break connections in the belief that Zato has nothing more to send. Yet it should not be too small either, lest, with a huge number of connections, the overhead of pings becomes too burdensome. For instance, pinging each client once a second is almost certainly too much, and 20-40 seconds is usually a much better choice.

On the other hand, firewalls may also require the side which initiated the TCP connection (i.e. the WebSocket client) to periodically send some data to keep the connection active, otherwise the firewalls will drop the connection. This means that clients should also be configured to send ping messages, and how often they should do it may depend on what the applicable firewalls expect - otherwise, with only Zato pinging the client, firewalls may not recognise that the connection is still active.

Python code

Finally, it is worth keeping in mind that all the timeouts, TTLs and pings are managed by the platform automatically and no programming is needed for them to work.

For instance, the service below, once assigned to a WebSocket channel, will focus on the business functionality rather than on low-level management of timeouts - in other words, there is no additional code required.

# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class MyService(Service):
    def handle(self):
        self.logger.info('My request is %s', self.request.input)

Next steps
  • Start the tutorial to learn more technical details about Zato, including its architecture, installation and usage. After completing it, you will have a multi-protocol service representing a sample scenario often seen in banking systems with several applications cooperating to provide a single and consistent API to its callers.

  • Visit the support page if you would like to discuss anything about Zato with its creators

  • Para aprender más sobre las integraciones de Zato y API en español, haga clic aquí

Categories: FLOSS Project Planets

Russell Coker: Storage Trends 2021

Planet Debian - Mon, 2021-04-12 06:01
The Viability of Small Disks

Less than a year ago I wrote a blog post about storage trends [1]. My main point in that post was that disks smaller than 2TB weren’t viable then and 2TB disks wouldn’t be economically viable in the near future.

Now MSY has 2TB disks for $72 and 2TB SSD for $245, saving $173 if you get a hard drive (compared to saving $240 10 months ago). Given the difference in performance and noise 2TB hard drives won’t be worth using for most applications nowadays.


Last year NVMe prices were very comparable to SSD prices; I was hoping that trend would continue and SATA SSDs would go away. Now for sizes of 1TB and smaller, NVMe and SSD prices are very similar, but for 2TB the NVMe prices are twice those of SSDs - presumably partly due to poor demand for 2TB NVMe. There are also no NVMe devices larger than 2TB on sale at MSY (a store which caters to home users, not special server equipment), but SSDs go up to 8TB.

It seems that NVMe is only really suitable for workstation storage and for cache etc on a server. So SATA SSDs will be around for a while.

Small Servers

There are a range of low end servers which support a limited number of disks. Dell has 2-disk servers and 4-disk servers. If one of those had 8TB SSDs you could have 8TB of RAID-1 or 24TB of RAID-Z storage in a low end server. That covers the vast majority of servers (small business or workgroup servers tend to have less than 8TB of storage).

Larger Servers

Anandtech has an article on Seagate’s roadmap to 120TB disks [2]. They currently sell 20TB disks using HAMR technology.

Currently the biggest disks that MSY sells are 10TB for $395, which was also the biggest disk they were selling last year. Last year MSY only sold SSDs up to 2TB in size (larger ones were available from other companies at much higher prices), now they sell 8TB SSDs for $949 (4* capacity increase in less than a year). Seagate is planning 30TB disks for 2023, if SSDs continue to increase in capacity by 4* per year we could have 128TB SSDs in 2023. If you needed a server with 100TB of storage then having 2 or 3 SSDs in a RAID array would be much easier to manage and faster than 4*30TB disks in an array.

When you have a server with many disks you can expect to have more disk failures due to vibration. One time I built a server with 18 disks and took disks from 2 smaller servers that had 4 and 5 disks. The 9 disks which had been working reliably for years started having problems within weeks of running in the bigger server. This is one of the many reasons for paying extra for SSD storage.

Seagate is apparently planning 50TB disks for 2026 and 100TB disks for 2030. If that’s the best they can do then SSD vendors should be able to sell larger products sooner at prices that are competitive. Matching hard drive prices is not required, getting to less than 4* the price should be enough for most customers.
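A back-of-the-envelope check with the MSY prices quoted above (illustrative arithmetic only; prices as listed earlier in this post) suggests the largest drives on sale are already inside that margin:

```python
# Price per TB for the biggest hard drive (10TB for $395)
# vs the biggest SSD (8TB for $949) mentioned above.
hdd_per_tb = 395 / 10      # dollars per TB for the hard drive
ssd_per_tb = 949 / 8       # dollars per TB for the SSD

ratio = ssd_per_tb / hdd_per_tb
print(f"SSD costs {ratio:.1f}x as much per TB")  # roughly 3.0x
```

At about 3x the price per TB, the largest SSDs already clear the "less than 4x" bar, at least at the top end of the consumer range.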

The Anandtech article is worth reading, it mentions some interesting features that Seagate are developing such as having 2 actuators (which they call Mach.2) so the drive can access 2 different tracks at the same time. That can double the performance of a disk, but that doesn’t change things much when SSDs are more than 100* faster. Presumably the Mach.2 disks will be SAS and incredibly expensive while providing significantly less performance than affordable SATA SSDs.

Computer Cases

In my last post I speculated on the appearance of smaller cases designed to not have DVD drives or 3.5″ hard drives. Such cases still haven’t appeared apart from special purpose machines like the NUC that were available last year.

It would be nice if we could get a new industry standard for smaller power supplies. Currently power supplies are expected to be almost 5 inches wide (due to the expectation of a 5.25″ DVD drive mounted horizontally). We need some industry standards for smaller PCs that aren’t like the NUC; the NUC is very nice, but most people who build their own PC need more space than that. I still think that planning on USB DVD drives is the right way to go. I’ve got 4 PCs in my home that are regularly used, and CDs and DVDs are used so rarely that sharing a single DVD drive among all 4 wouldn’t be a problem.


I’m tempted to get a couple of 4TB SSDs for my home server, which cost $487 each; it currently has 2*500G SSDs and 3*4TB disks. I would have to remove some unused files but that’s probably not too hard to do as I have lots of old backups etc on there. Another possibility is to use 2*4TB SSDs for most stuff and 2*4TB disks for backups.

I’m recommending that all my clients only use SSDs for their storage. I only have one client with enough storage that disks are the only option (100TB of storage) but they moved all the functions of that server to AWS and use S3 for the storage. Now I don’t have any clients doing anything with storage that can’t be done in a better way on SSD for a price difference that’s easy for them to afford.

Affordable SSD also makes RAID-1 in workstations more viable. 2 disks in a PC is noisy if you have an office full of them and produces enough waste heat to be a reliability issue (most people don’t cool their offices adequately on weekends). 2 SSDs in a PC is no problem at all. As 500G SSDs are available for $73 it’s not a significant cost to install 2 of them in every PC in the office (more cost for my time than hardware). I generally won’t recommend that hard drives be replaced with SSDs in systems that are working well. But if a machine runs out of space then replacing it with SSDs in a RAID-1 is a good choice.

Moore’s law might cover SSDs, but it definitely doesn’t cover hard drives. Hard drives have fallen way behind developments of most other parts of computers over the last 30 years, hopefully they will go away soon.

Related posts:

  1. Storage Trends In considering storage trends for the consumer side I’m looking...
  2. New Storage Developments Eweek has an article on a new 1TB Seagate drive....
  3. Finding Storage Performance Problems Here are some basic things to do when debugging storage...
Categories: FLOSS Project Planets

Agiledrop.com Blog: Top Drupal blog posts from March 2021

Planet Drupal - Mon, 2021-04-12 05:35

We’re bringing you our latest recap of top Drupal blog posts from last month. Get ready for some really great posts this time!

Categories: FLOSS Project Planets

Scene Items in KWin

Planet KDE - Mon, 2021-04-12 04:36

If your background includes game development, the concept of a scene should sound familiar. A scene is a way to organize the contents of the screen using a tree, where parent nodes affect their child nodes. In a game, a scene would typically consist of elements such as lights, actors, terrain, etc.

KWin also has a scene. With this blog post, I want to provide a quick glimpse at the current scene design, and the plan for how it can be improved for Wayland.

Current state

Since compositing functionality in KWin predates Wayland, the scene is relatively simple: it’s just a list of windows sorted in the stacking order. After all, on X11, a compositing window manager only needs to take window buffers and compose them into a single image.

With the introduction of Wayland support, we started hitting limitations of the current scene design. wl_surface is quite a universal thing. It can be used to represent the contents of a window, or a cursor, or a drag-and-drop icon, etc.

Since the scene thinks of the screen in terms of windows, it needs to have custom code paths to cover all potential usages of the wl_surface interface. But doing that has its own problems. For example, if an application renders cursors using a graphics API such as OpenGL or Vulkan, KWin won’t be able to display such cursors because the code path that renders cursors doesn’t handle hardware-accelerated client buffers.

Another limitation of the current scene is that it doesn’t allow tracking damage per individual wl_surface, which is needed to avoid repainting areas of the screen that haven’t changed and thus keep power usage low.

Introducing scene items

The root cause of our problems is that the scene thinks of the contents of the screen in terms of windows. What if we stop viewing a window as a single, indivisible object? What if we start viewing every window as something that’s made of several other items, e.g. a surface item with window contents, a server-side decoration item, and a nine-tile patch drop shadow item?

A WindowItem is composed of several other items – a ShadowItem, a DecorationItem, and a SurfaceItem

With such a design, the scene won’t be limited only to windows; for example, we could start putting drag-and-drop icons in it. In addition to that, it will be possible to reuse the code that paints wl_surface objects and track damage per individual surface.
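To make the idea concrete, here is a toy sketch in Python (KWin itself is C++; everything here is invented for illustration, only the item class names come from the post) of a tree of items where painting a parent recursively paints its children:

```python
class Item:
    """A node in the scene tree; parents paint their children recursively."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def paint(self, painted):
        painted.append(self.name)      # paint this item...
        for child in self.children:    # ...then its children
            child.paint(painted)

# A window is no longer an indivisible object: it is composed of items.
scene = Item("Scene")
window = Item("WindowItem", scene)
Item("ShadowItem", window)       # nine-tile patch drop shadow
Item("DecorationItem", window)   # server-side decoration
Item("SurfaceItem", window)      # wl_surface contents

# Non-window content can live in the same tree.
Item("DragAndDropIconItem", scene)
Item("SoftwareCursorItem", scene)

order = []
scene.paint(order)
print(order)
```

In this model the code that paints a wl_surface exists once, as an item type, instead of being duplicated in every window/cursor/icon code path.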

Besides windows, the scene contains a drag-and-drop icon and a software cursor

Another advantage of the item-based design is that it will provide a convenient path towards migration to a scene/render graph, which is crucial for performing compositing on different threads or a less painful transition to Vulkan.

Work done so far

At the end of March, an initial batch of changes to migrate to the item-based design was merged. We still have a lot of work ahead of us, but even with those initial changes, you will already see some improvements in the Wayland session. For example, there should be fewer visual artifacts in applications that utilize sub-surfaces, e.g. Firefox.

The end goal of the transition to the item-based design is to have a more flexible and extensible scene. So far, the plan is to continue doing refactorings and avoid rewriting the entire compositing machinery, if possible. You can find out more about the scene redesign progress by visiting https://invent.kde.org/plasma/kwin/-/issues/30.


In short, we still have some work to do to make rendering abstractions in KWin fit well all the cases that there are on Wayland. However, even with the work done so far, the results are very promising!

Categories: FLOSS Project Planets

Remote work tips: availability heat map

Planet KDE - Mon, 2021-04-12 03:00

When your team goes remote or when you are creating a new remote or distributed team, you need to reconsider the most basic ground rules. Most are a given when colocated. One of these ground rules to reconsider is people’s availability.

At the office, you expect people to be available at more or less similar times; even if your organization promotes flexi-time or core hours, that expectation is mostly still there. But when you go remote, or even in the case of companies moving towards flexi-days (many will after COVID-19), availability is something that needs to be carefully considered and agreed within the context of the team or department.

This article will focus on one of those ground rules, availability, including a simple but powerful way of starting the conversation with your team members about it, which has a major impact on scheduling.

I have written before about the need to redefine those ground rules when going remote in several articles. I list them at the end of this article, in the References section. I mentioned in one of those articles that my former colleague back during my Linaro days, Serge Broslavsky, showed me a visualization to start the conversation about availability that I found so useful that I have used it ever since. I have refined it over time, used it frequently and even given it a name: availability heat map. But before describing what it is, let me start by justifying why you should focus energy on reconsidering availability.

In remote environments, be explicit about availability

When remote, each team member works in a different environment, even if they are located in the same geographic area or time zone, or share the same lifestyle. I always assume as a starting point that their environments might be very different from each other, so their availability might be too. It needs to be agreed on, which requires a careful conversation.

Some people live with others at home (friends, partner, etc.); they might have different responsibilities towards them, and in some cases those around them affect the environment in a way that makes it impossible to assume their availability will not be affected. In some cases, people work in cafes, coworking spaces, etc., which involve other constraints.

Another typical case where availability becomes a topic is when having team members from different cultures. Different cultures have different approaches to lunch, for instance. Northern Europeans tend to have lunch very early; Central Europeans usually take no more than one hour for lunch (the British even less, in general). There are plenty of cultures out there that love to kill themselves slowly by eating fast and poorly at lunch :-). There are others, though, that take lunch seriously, so they take more time. It is a social activity that, in some cases, is very important for families and at work. Latins tend to fall in that category. At the office, the environment influences these habits, making them more homogeneous, but that is not necessarily the case when working remotely, at least not on a daily basis.

I have managed teams where the availability in summer changed compared to winter for people who live up north or in very cold or warm areas. They might want to take advantage of the daylight around noon in winter, or prefer to work during mid-day because it is too warm outside.

An interesting consequence of revisiting availability that I pay attention to is the expectations, outside office hours, related to communication channels. I have worked with people used to phoning colleagues who are not at the office when they themselves are: if they are at the office and most of the team is too, they consider it fine to call whoever is not. The heat map also helps to open a conversation about the consequences of not being available and what to expect. It helps this kind of people understand which channel should be used to reach out to you, and when.

A third interesting case is people who multitask or work on more than one project, as well as teams with dependencies on other teams that have a different understanding of availability. This case is very frequent in remote environments. Discussing and agreeing on availability becomes a ground rule that should be taken seriously from day one.

What is the advantage of working from home if you cannot make your personal and work life compatible to some extent? A better life balance is a big win for both the employee and the employer, and having a serious thought about availability is essential for achieving it. As a manager, I have had cases in which remote workers were in coworking spaces instead of at home because the company did not provide them the tools to create such balance. That should be avoided when possible.

My point is that going remote requires a conversation about availability that you most likely do not need to have at the office, so managers or teams inexperienced in remote work often take it for granted. Once they realize the problem, it might be hard to redefine availability, or even impossible. In extreme cases, you might only find out when burnout is getting closer. Funny enough, I have found more of these cases among managers and freelancers than employed developers throughout my career. It has to do with team protection.

The availability heat map

In order to start such a conversation, ask each member of your team or department to fill out the availability heat map as a first step, ideally right after they join your organization. After analysing it, you will have a better idea of the impact that living in different environments, as well as other factors like time zones and personal preferences, will have on people’s availability. You will be in a much better position to discuss the team or department schedule, which will be reflected in the calendar (if possible), making it compatible with company policy or business needs.

In summary, make availability explicit; in colocated environments, availability is generally implicit. The availability heat map is a simple initial step to do so.

Who is it for

I have used the availability heat map with the following groups. I assume this extremely simple activity can work for additional groups:

  • Teams with members in different time zones.
  • Multicultural teams.
  • Large remote teams.
  • Teams with members who belong or support more than one team.
  • Teams with strong dependencies on people from other teams.
  • Teams with people with small kids.

Color scheme

I tend to use four colors in the availability heat map. Each color has a specific meaning. The goal is to assign a color to each hour of the day, as shown in the example. I came to this scheme over time; you can adapt it to your experience or environment:

  • Green: you are in front of the computer and available to the rest of the team on a regular basis.
  • Yellow: you might be available, although it cannot be assumed by default. It might depend on the day, time of year, or workload.
  • Amber: you are usually unreachable at these hours unless it is planned in advance. It is an undesired time slot for you by default.
  • Red: you are available only in an emergency or under very unusual circumstances.

The usual ratios of hours I have worked with in the past are 4-6 green hours, 2-6 yellow hours, 2-4 amber and 8-12 red ones. Do not try to show many green hours at first. This exercise is not about demonstrating that you work 8 or more hours a day, which is a common mistake among newcomers to remote work when they join a new organization or team. The price in your schedule might be very high and eventually unsustainable over time.
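The color scheme lends itself to a very small script. Below is a sketch where each member's day is encoded as a 24-character string in one shared time zone (G=green, Y=yellow, A=amber, R=red); the names and hours are invented for illustration, and in practice the heat map is usually just a shared spreadsheet:

```python
# Each string has one character per hour of the day (hour 0 to hour 23).
team = {
    "Ana":   "RRRRRRRRGGGGYYGGGGRRRRRR",
    "Marc":  "RRRRRRRGGGGGYGGGGRRRRGGR",
    "Kenta": "RRRRRRRRRGGGGYGGGGGRRRRR",
}

def overlapping_hours(team, level="G"):
    """Hours (0-23) at which every team member is at the given level."""
    return [hour for hour in range(24)
            if all(day[hour] == level for day in team.values())]

print(overlapping_hours(team))  # the team's common green hours
```

The output is exactly the conversation starter: the hours everyone is green are the candidates for ceremonies and real-time communication, and everything else should default to asynchronous work.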

Explain the exercise

My recommendation is that you explain the exercise face to face (video chat) to the affected people, with your own availability already filled in, before asking others to fill out theirs. People from different cultures and backgrounds respond differently to this activity based on cultural factors or prior experience with managers and remote work. In my experience, some people at first take this exercise as a control measure, especially if you are the manager or PO instead of the Scrum Master or facilitator.

The goal is to find out the ideal time slots for scheduling activities, but at the same time, as a manager, you can take this opportunity to learn about people’s constraints and desires when it comes to working hours. I use this action as a starting point for some 1:1 conversations. I mentioned before that when remote, each employee works in a different environment, and that environment affects their performance. As a remote manager, you have to learn about it and provide guidance on how to establish a good balance so they maximize work efficiency in a sustainable way. It is not about interfering in their personal lives. The line is thin.

The example

In this example, we have five team members, where the last two live in different time zones, UTC-5 and UTC+2. After each member fills out their desired/expected availability, the conversation about scheduling becomes easier. Each team member, as well as managers and other supporting roles, has a simple way to understand what kind of sacrifices each member might have to make to be available to their colleagues, making their availability compatible with the business needs as well as their team needs (ideally those should be very similar). The scheduling of team and department ceremonies and other company activities hopefully becomes easier now. Understanding when real-time communication is effective and when the work should become asynchronous also becomes simpler.

In this case, thanks to the fact that Kentavious is an early bird and that Kyle is used to working with people in Europe, from the US East Coast and Brazil, they have already adapted their availability to work with those on these time zones. As you can see, the approach to lunch is different for each team member. In addition, Anthony has to finish work early and Marc prefers to work before going to bed, which is a common pattern among parents with small kids.

According to the map, there are two overlapping hours. If I were the manager or part of this team, I would talk to them as a group to explain that increasing the number of overlapping hours brings benefits to the overall performance of the team. I would then talk individually with each member to find a way to gain one or two additional overlapping hours. In general, I would consider three or four hours of overlapping availability enough as a starting point in this case. I always favor a homogeneous expectation of availability throughout the week over having “special” days where your schedule changes. In a previous job I had my “Tuesdays for Asia” and my “Thursdays for US” and believe me, it was not fun.

After a conversation and decision process, it would be good to update the availability heat map. I suggest making it available to others. If your organization or project is formed by many teams, you might want to add the availability heat map to your team landing page. In my experience, it helps when people who do not work with a specific team on a regular basis need to schedule activities with it.

If you have a tool where you can create and maintain a team calendar, try to add the common available hours there and make them visible to others. If your team is a service or support team to other teams, you might want a more powerful tool to communicate your availability, but the availability heat map might do the job at a high level.

There are tools out there to accomplish the same goal as the availability heat map, but I like simplicity and I have never needed anything more complex, assuming you have a powerful corporate calendaring tool.

Finally, please keep in mind that the availability heat map is a dynamic visualization. Revisit this ground rule on a regular basis, at least in summer and winter. Small but significant changes might apply.


In a variety of use cases, especially related to remote work, there are basic ground rules that need to be reconsidered. Availability is one of them.

The availability heat map is an extremely simple action that can provide a first overview of the overlapping times and can trigger a conversation to increase or adapt those hours, as a previous step to defining when team ceremonies might or should take place, how and when communication should happen, etc. It is also an interesting action to trigger 1:1 conversations with your reports or colleagues. It is simple and easy to adapt to many use cases.

If you have a different way to reach the same goal, please let me know. If you like this idea and will adopt it, please let me know how it goes and what adaptations you made. I am always interested in improving the availability heat map.

Thanks Serge.


Previous articles I wrote related with remote work:

Categories: FLOSS Project Planets

Walled gardens

Planet KDE - Mon, 2021-04-12 01:30

Just yesterday someone joined the Plasma Mobile Matrix room, asking for help and support for developing a native Signal client.

The post was immediately met by some of my fellow developers with responses that were basically:

  • Why signal when matrix is available and superior to it
  • While signal is open-source it does not provide all freedom to modify/redistribute
  • People should stop using signal

Some of these are important concerns but it made me think about the very initial Plasma Mobile announcement,

Important bits being,

The goal for Plasma Mobile is to give the user full use of the device. It is designed as an inclusive system, intended to support all kinds of apps. Native apps are developed using Qt; it will also support apps written in GTK, Android apps, Ubuntu apps, and many others, if the license allows and the app can be made to work at a technical level.


Most offerings on mobile devices lack openness and trust. In a world of walled gardens, Plasma Mobile is intended to be a platform that respects and protects user privacy. It provides a fully open base that others can help develop and use for themselves, or in their products.

Plasma Mobile aims not to be a walled garden, and provides full control/freedom to users, which interestingly also comes with the freedom to use a walled garden inside your open garden.

If the user cannot have this freedom, or is actively being pushed towards the ecosystem the developers prefer, then what we have created is a walled garden with the illusion of being an open garden.

There is also the question of the mission of Plasma Mobile,

As a Free software community, it is our mission to give users the option of retaining full control over their data. The choice for a mobile operating system should not be a choice between missing functions or forsaken privacy of user data and personal information. Plasma Mobile offers the ability to choose the services that are allowed to integrate deeply into the system. It will not share any data unless that is explicitly requested.

In it, we aim for users to have full control over their data and not to be forced into closed systems.

This is why we need to find a balance between both of these goals. We need to make sure that our default user experience does not make use of closed-ecosystem software and, at the same time, that if users or developers have a preference for or requirement of using other systems, we enable them to do so to the best of our capability.

Day #6 of the #100DaysToOffload series.

After a long break due to some personal stuff I am back to writing for #100DaysToOffload

Categories: FLOSS Project Planets

Mike Driscoll: PyDev of the Week: Will McGugan

Planet Python - Mon, 2021-04-12 01:05

This week we welcome Will McGugan (@willmcgugan) as our PyDev of the Week! Will is the author of the Rich package, which is for rich text and beautiful formatting in the terminal. If you have a moment, you should check out Will’s blog. Will is also the author of Beginning Game Development with Python and Pygame. You can see what other projects he contributes to over on GitHub.

Let’s spend some time getting to know Will better!

Can you tell us a little about yourself (hobbies, education, etc):

I grew up in a small town in North East Scotland. My career took me around the UK, including some years in Oxford and London. I’ve since returned to Scotland where I live in Edinburgh with my wife. I’m quite fortunate to have been working from home as a freelance software developer long before the pandemic started.

I’m mostly self-taught, having dropped out of University to work in video games. Although I think by the time you reach my age all developers are self-taught. In such a fast-moving industry learning on the job is a must.

My main hobby outside software development is photography—in particular, wildlife photography. I once spent a night in a Finnish forest shooting wild Eurasian bears. That was quite an experience! As soon as the world returns to normal I plan to do way more traveling and photography.

I post many of my photographs on my blog and if you prompt me I’ll talk at length about focal lengths and bokeh.

Why did you start using Python?

I discovered Python back in the early 2000s when I worked in video games. I was looking for a scripting language I could compile into a game engine to manage the game mechanics while C++ handled the heavy lifting and graphics. I considered Python, Ruby, and Lua. After some research and experimentation, I settled on not Python, but Lua.

Lua was probably the best choice for that task but I found myself turning to Python for scripts and tools. I viewed Python then as more of an upgrade to Windows batch files and not as a real programming language. Only when the scripts I was writing grew more sophisticated did I begin to appreciate the expressiveness of Python and the batteries included approach. It was a refreshing change from C++ where so much had to be written from scratch.

Fast forward a few years and I made the switch to working with Python full-time, writing a chess interface for the Internet Chess Club. Python has been the focus of my career ever since, and even though I spent years learning C++, I don’t regret the switch!

What other programming languages do you know and which is your favorite?

The main other languages I use day-to-day are JavaScript and TypeScript (if that counts as another language), often in the context of a web application with a backend written in Python.

It’s been a while, but I did a lot of work with C and C++ back in the day. I also wrote a fair amount of 80x86 assembly language, at a time when hand-tuning instructions was a sane thing to do.

My favorite language is of course Python. I love the language itself and the ecosystem that has grown around it.

What projects are you working on now?

My side project is currently Rich, a library for fancy terminal rendering. I’ll talk more about that later.

My day job has me building technology for dataplicity.com, which is a remote administration tool targeted at the Raspberry Pi single-board computer (but works with any Linux). Other than the front-end, the stack is entirely running on Python.

Which Python libraries are your favorite (core or 3rd party)?

I’m a huge fan of asyncio, as a lot of the work I do requires concurrency of some sort or another. I’ve used Twisted and Tornado to do very similar things in the past, but asyncio with the async and await keywords has made for a much more pleasant experience. Related is aiohttp, a web framework on top of asyncio, which I’ve used in the day job to build a highly-scalable websocket server.

Two libraries I like right now are PyDantic and Typer. I really like the way they use typing to create objects that can be statically checked with Mypy and related tools. The authors are pioneers and I hope to see more of this approach in the future!

How did the Rich package come about?

Some time ago my side-project was a web application framework called Moya. While building the command line interface for Moya I put together a “Console” class which turned out to be the prototypal version of Rich. The Moya Console class was not terribly well thought out and hard to separate from the main project, but there were some really good ideas there and I always thought I should build a standalone version of it.

I would revisit this idea in my mind every time I struggled to read some ugly badly formatted terminal output (often implemented by myself). I wished that this Uber-console I was formulating in my head already existed fully-fledged and documented, but it wasn’t going to write itself. Sometime in late 2019, I started work on it.

The core features came together quite quickly. Incidentally, I was in Wuhan, China just a few weeks before the pandemic hit when the first rich output was generated (naturally a bold magenta underlined blinking “Hello, World!”).

The first core feature was rich text, which is where the package name was derived. I could associate styles with a range of characters in a string, much like the way you can markup text in HTML. And that marked-up string could then be further manipulated while preserving the styles. That one feature made so many others possible, like syntax highlighting and markdown rendering.

The v1.0.0 release came out in May 2020 and it really took off, way more than I was expecting. There were bugs of course and plenty of feedback. I had intended to leave it there and maybe just maintain it for a while, but there were so many good suggestions for features that I kept working on it. At the moment I’m considering more TUI (Text User Interface) features to make terminal-based applications.

What are Rich’s strengths and weaknesses?

The main strength is probably the composability of the renderables (renderable is my term for anything that generates output in the terminal). For instance, a table cell may contain a panel (or any other renderable) that may itself contain another table, carrying on ad-infinitum, or at least until you run out of characters. It’s a model that allows you to quickly create elegant formatting in the terminal more like a web page than a stream of characters. One user even wrote his CV (résumé) using Rich, and it looks great!
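That composability can be shown in a few lines. The sketch below assumes a recent version of Rich; the table contents are invented, but the nesting pattern (a table cell holding a panel holding another table) is exactly the one described above:

```python
from io import StringIO

from rich.console import Console
from rich.panel import Panel
from rich.table import Table

# An inner table, nested inside a panel, nested inside an outer table's cell.
inner = Table(title="Inner")
inner.add_column("Key")
inner.add_row("value")

outer = Table(title="Outer")
outer.add_column("A cell holding a renderable")
outer.add_row(Panel(inner))  # any renderable can go in a cell

# Render to a string so the result can be inspected programmatically;
# normally you would just call Console().print(outer).
buffer = StringIO()
Console(file=buffer, width=60).print(outer)
output = buffer.getvalue()
print(output)
```

Each level handles its own borders and sizing, which is what makes terminal layouts feel closer to a web page than to a stream of characters.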

One weakness may be the emoji support. Everyone loves emoji in terminal output. But what many don’t realize is that terminal support for emojis is spotty. Not all terminals display emojis with the same width, so the very same output may look neat on one terminal but have broken alignment on another and, to make matters worse, there is no way for Rich to detect how emoji are rendered.

Is there anything else you’d like to say?

I post Python-related articles on my blog (https://www.willmcgugan.com) from time to time. I’m @willmcgugan on twitter.

Thanks for doing the interview, Will!

The post PyDev of the Week: Will McGugan appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

hussainweb.me: Running (testing) Drupal in CI pipeline

Planet Drupal - Sun, 2021-04-11 23:53
Here's a quick post to show how we can run Drupal in a CI environment easily so that we can test the site. Regardless of how you choose to run the tests (e.g. PHPUnit, Behat, etc), you still need to run the site somewhere. It is not a great idea to test on an actual environment (unless it is isolated and designated for testing). You need to set up a temporary environment just for the CI pipeline where you run the tests and then tear it down.
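As a rough illustration of that pattern, a hypothetical GitLab CI job might install a throwaway Drupal site inside the job container, run the tests, and let the whole environment vanish when the job ends. The image name, SQLite db-url, and test command below are assumptions for the sketch, not taken from the post:

```yaml
# Hypothetical CI sketch; adapt the image, database, and test runner
# to your project. Everything here lives and dies with the CI job.
test:
  image: php:8.0-cli
  script:
    - composer install
    - vendor/bin/drush site:install minimal --yes --db-url=sqlite://sites/default/files/.ht.sqlite
    - vendor/bin/phpunit --testsuite functional
```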
Categories: FLOSS Project Planets

Kristen Pol: Join us this week for DrupalCon contribution days!

Planet Drupal - Sun, 2021-04-11 23:23

Source: Drupal Contributions Platform

DrupalCon North America 2021's main program starts tomorrow! Hope to see you in Hopin for the keynotes, sessions, BoFs, Expo Hall, Driesnote, Drupal Trivia, and more.

This year, instead of being only on Friday, Drupal contribution has been spread across the whole week. I'm excited about the new scheduling and hope to see you this week in the contribution areas. For those new to Drupal or new to contribution, I'd like to convince you that Drupal contribution is worth your time... starting with the fact that the contribution event is *free*!

read more

Categories: FLOSS Project Planets

Chris Hager: PDFx update and new version release (v1.4.1)

Planet Python - Sun, 2021-04-11 20:00

PDFx is a tool to extract text, links, references and metadata from PDF files and URLs. Thanks to several contributors the project received a thorough update and was brought into 2021. The new release of today is PDFx v1.4.1 🎉

PDFx works like this:

Categories: FLOSS Project Planets

GNU Guix: New Supported Platform: powerpc64le-linux

GNU Planet! - Sun, 2021-04-11 20:00

It is a pleasure to announce that support for powerpc64le-linux (PowerISA v.2.07 and later) has now been merged to the master branch of GNU Guix!

This means that GNU Guix can be used immediately on this platform from a Git checkout. Starting with the next release (Guix v1.2.1), you will also be able to download a copy of Guix pre-built for powerpc64le-linux. Regardless of how you get it, you can run the new powerpc64le-linux port of GNU Guix on top of any existing powerpc64le GNU/Linux distribution.

This new platform is available as a "technology preview". This means that although it is supported, substitutes are not yet available from the build farm, and some packages may fail to build. Although powerpc64le-linux support is nascent, the Guix community is actively working on improving it, and this is a great time to get involved!

Why Is This Important?

This is important because it means that GNU Guix now works on the Talos II, Talos II Lite, and Blackbird mainboards sold by Raptor Computing Systems. This modern, performant hardware uses IBM POWER9 processors, and it is designed to respect your freedom. The Talos II and Talos II Lite have recently received Respects Your Freedom (RYF) certification from the FSF, and Raptor Computing Systems is currently pursuing RYF certification for the more affordable Blackbird, too. All of this hardware can run without any non-free code, even the bootloader and firmware. In other words, this is a freedom-friendly hardware platform that aligns well with GNU Guix's commitment to software freedom.

How is this any different from existing RYF hardware, you might ask? One reason is performance. The existing RYF laptops, mainboards, and workstations can only really be used with Intel Core Duo or AMD Opteron processors. Those processors were released over 15 years ago. Since then, processor performance has increased drastically. People should not have to choose between performance and freedom, but for many years that is exactly what we were forced to do. However, the POWER9 machines sold by Raptor Computing Systems have changed this: the free software community now has an RYF-certified option that can compete with the performance of modern Intel and AMD systems.

Although the performance of POWER9 processors is competitive with modern Intel and AMD processors, the real advantage of the Talos II, Talos II Lite, and Blackbird is that they were designed from the start to respect your freedom. Modern processors from both Intel and AMD include back doors over which you are given no control. Even though the back doors can be removed with significant effort on older hardware in some cases, this is an obstacle that nobody should have to overcome just to control their own computer. Many of the existing RYF-certified options (e.g., the venerable Lenovo x200) use hardware that can only be considered RYF-certified after someone has gone through the extra effort of removing those back doors. No such obstacles exist when using the Talos II, Talos II Lite, or Blackbird. In fact, although Intel and AMD both go out of their way to keep you from understanding what is going on in your own computer, Raptor Computing Systems releases all of the software and firmware used in their boards as free software. They even include circuit diagrams when they ship you the machine!

Compared to the existing options, the Talos II, Talos II Lite, and Blackbird are a breath of fresh air that the free software community really deserves. Raptor Computing Systems' commitment to software freedom and owner control is an inspiring reminder that it is possible to ship a great product while still respecting the freedom of your customers. And going forward, the future looks bright for the open, royalty-free Power ISA stewarded by the OpenPOWER Foundation, which is now a Linux Foundation project (see also the same announcement from the OpenPOWER Foundation).

In the rest of this blog post, we will discuss the steps we took to port Guix to powerpc64le-linux, the issues we encountered, and the steps we can take going forward to further solidify support for this exciting new platform.

Bootstrapping powerpc64le-linux: A Journey

To build software, you need software. How can one port Guix to a platform before support for that platform exists? This is a bootstrapping problem.

In Guix, all software for a given platform (e.g., powerpc64le-linux) is built starting from a small set of "bootstrap binaries". These are binaries of Guile, GCC, Binutils, libc, and a few other packages, pre-built for the relevant platform. It is intended that the bootstrap binaries are the only pieces of software in the entire package collection that Guix cannot build from source. In practice, additional bootstrap roots are possible, but introducing them in Guix is highly discouraged, and our community actively works to reduce our overall bootstrap footprint. There is one set of bootstrap binaries for each platform that Guix supports.

This means that to port Guix to a new platform, you must first build the bootstrap binaries for that platform. In theory, you can do this in many ways. For example, you might try to manually compile them on an existing system. However, Guix has package definitions that you can use to build them - using Guix, of course!

Commonly, the first step in porting Guix to a new platform is to use Guix to cross-compile the bootstrap binaries for that new platform from a platform on which Guix is already supported. This can be done by running a command like the following on a system where Guix is already installed:

guix build --target=powerpc64le-linux-gnu bootstrap-tarballs

This is the route that we took when building the powerpc64le-linux bootstrap binaries, as described in commit 8a1118a. You might wonder why the target above is "powerpc64le-linux-gnu" even though the new Guix platform is called "powerpc64le-linux". This is because "powerpc64le-linux-gnu" is a GNU triplet identifying the new platform, but "powerpc64le-linux" is the name of a "system" (i.e., a platform) in Guix. Guix contains code that converts between the two as needed (see nix-system->gnu-triplet and gnu-triplet->nix-system in guix/utils.scm). When cross-compiling, you only need to specify the GNU triplet.

Note that before you can even do this, you must first update the glibc-dynamic-linker and system->linux-architecture procedures in Guix's code, as described in Porting. In addition, the versions of packages in Guix that make up the GNU toolchain (gcc, glibc, etc.) must already support the target platform. This pre-existing toolchain support needs to be good enough so that Guix can (1) build, on some already-supported platform, a cross-compilation toolchain for the target platform, (2) use, on the already-supported platform, the cross-compilation toolchain to cross-compile the bootstrap binaries for the target platform, and (3) use, on the target platform, the bootstrap binaries to natively build the rest of the Guix package collection. The above guix build command takes care of steps (1) and (2) automatically.

Step (3) is a little more involved. Once the bootstrap binaries for the target platform have been built, they must be published online for anyone to download. After that, Guix's code must be updated so that (a) it recognizes the "system" name (e.g., "powerpc64le-linux") that will be used to identify the new platform and (b) it fetches the new platform's bootstrap binaries from the right location. After all that is done, you just have to try building things and see what breaks. For example, you can run ./pre-inst-env guix build hello from your Git checkout to try building GNU Hello.

The actual bootstrap binaries for powerpc64le-linux are stored on the alpha.gnu.org FTP server. Chris Marusich built these bootstrap binaries in an x86_64-linux Guix System VM which was running on hardware owned by Léo Le Bouter. Chris then signed the binaries and provided them to Ludovic Courtès, who in turn verified their authenticity, signed them, and uploaded them to alpha.gnu.org. After that, we updated the code to use the newly published bootstrap binaries in commit 8a1118a. Once all that was done, we could begin bootstrapping the rest of the system - or trying to, at least.

There were many stumbling blocks. For example, to resolve some test failures, we had to update the code in Guix that enables it to make certain syscalls from scheme. In another example, we had to patch GCC so that it looks for the 64-bit libraries in /lib, rather than /lib64, since that is where Guix puts its 64-bit libraries by convention. In addition, some packages required in order to build Guix failed to build, so we had to debug those build failures, too.

For a list of all the changes, see the patch series or the actual commits, which are:

$ git log --oneline --no-decorate 8a1118a96c9ae128302c3d435ae77cb3dd693aea^..65c46e79e0495fe4d32f6f2725d7233fff10fd70
65c46e79e0 gnu: sed: Make it build on SELinux-enabled kernels.
93f21e1a35 utils: Fix target-64bit? on powerpc64le-linux.
8d9aece8c4 ci: %cross-targets: Add powerpc64le-linux-gnu.
c29bfbfc78 syscalls: Fix RNDADDTOENTCNT on powerpc64le-linux.
b57de27d03 syscalls: Fix clone on powerpc64le-linux.
a16eb6c5f9 Add powerpc64le-linux as a supported Guix architecture.
b50f426803 gnu: libelf: Fix compilation for powerpc64le-linux.
1a0f4013d3 gnu: texlive-latex-base: Fix compilation on powerpc64le*.
e9938dc8f0 gnu: texlive-bin: Fix compilation on powerpc64le*.
69b3907adf gnu: guile-avahi: Fix compilation on powerpc64le-linux.
4cc2d2aa59 gnu: bdb-4.8: Fix configure on powerpc64le-linux.
be4b1cf53b gnu: binutils-final: Support more Power architectures.
060478c32c gnu: binutils-final: Provide bash for binary on powerpc-linux.
b2135b5d57 gnu: gcc-boot0: Enable 128-bit long double for POWER9.
6e98e9ca92 gnu: glibc: Fix ldd path on powerpc*.
cac88b28b8 gnu: gcc-4.7: On powerpc64le, fix /lib64 references.
fc7cf0c1ec utils: Add target-powerpc? procedure.
8a1118a96c gnu: bootstrap: Add support for powerpc64le-linux.

In the end, through the combined efforts of multiple people, we slowly worked through the issues until we reached a point where we could do all of the following things successfully:

  • Build Guix manually on a Debian GNU/Linux ppc64el machine (this is Debian's name for a system using the powerpc64le-linux-gnu triplet), and verify that its make check tests passed.
  • Build GNU Hello using Guix and run it.
  • Run guix pull to build and install the most recent version of Guix, with powerpc64le-linux support.
  • Build a release binary tarball for powerpc64le-linux via: make guix-binary.powerpc64le-linux.tar.xz
  • Use that binary to install a version of Guix that could build/run GNU Hello and run guix pull successfully.

This was an exciting moment! But there was still more work to be done.

Originally, we did this work on the wip-ppc64le branch, with the intent of merging it into core-updates. By convention, the "core-updates" branch in Guix is where changes are made if they cause too many rebuilds. Since we were updating package definitions so deep in the dependency graph of the package collection, we assumed it wouldn't be possible to avoid rebuilding the world. For this reason, we had based the wip-ppc64le branch on core-updates.

However, Efraim Flashner proved us wrong! He created a separate branch, wip-ppc64le-for-master, where he adjusted some of the wip-ppc64le commits to avoid rebuilding the world on other platforms. Thanks to his work, we were able to merge the changes directly to master! This meant that we would be able to include it in the next release (Guix v1.2.1).

In short, the initial porting work is done, and it is now possible for anyone to easily try out Guix on this new platform. Because guix pull works, too, it is also easy to iterate on what we have and work towards improving support for the platform. It took a lot of cooperation and effort to get this far, but there are multiple people actively contributing to this port in the Guix community who want to see it succeed. We hope you will join us in exploring the limits of this exciting new freedom-friendly platform!

Other Porting Challenges

Very early in the porting process, there were some other problems that stymied our work.

First, we actually thought we would try to port to powerpc64-linux (big-endian). However, this did not prove to be any easier than the little-endian port. In addition, other distributions (e.g., Debian and Fedora) have recently dropped their big-endian powerpc64 ports, so the little-endian variant is more likely to be tested and supported in the community. For these reasons, we decided to focus our efforts on the little-endian variant, and so far we haven't looked back.

In both the big-endian and little-endian case, we were saddened to discover that the bootstrap binaries are not entirely reproducible. This fact is documented in bug 41669, along with our extensive investigations.

In short, if you build the bootstrap binaries on two separate machines without using any substitutes, you will find that the derivation which cross-compiles %gcc-static (the bootstrap GCC, version 5.5.0) produces different output on the two systems. However, if you build %gcc-static twice on the same system, it builds reproducibly. This suggests that something in the transitive closure of inputs of %gcc-static is perhaps contributing to its non-reproducibility. There is an interesting graph toward the end of the bug report, shown below:

This graph shows the derivations that produce differing outputs across two Guix System machines, when everything is built without substitutes. It starts from the derivation that cross-compiles %gcc-static for powerpc64-linux-gnu (from x86_64-linux) using Guix at commit 1ced8379c7641788fa607b19b7a66d18f045362b. Then, it walks the graph of derivation inputs, recording only those derivations which produce differing output on the two different machines. If the non-reproducibility (across systems) of %gcc-static is caused by a non-reproducible input, then it is probably caused by one or more of the derivations shown in this graph.
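The idea behind such a graph can be sketched as a simple comparison of output hashes from the two machines. This is an illustrative Python fragment with invented build names and contents, not the actual analysis, which walks Guix's derivation graph:

```python
# Hypothetical sketch: given each derivation's output bytes from two
# machines, keep only the derivations whose outputs differ.
import hashlib

def out_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

builds_machine_a = {"gcc-static": b"bits-1", "mpfr": b"same-bits"}
builds_machine_b = {"gcc-static": b"bits-2", "mpfr": b"same-bits"}

differing = sorted(
    name for name in builds_machine_a
    if out_hash(builds_machine_a[name]) != out_hash(builds_machine_b[name])
)
print(differing)  # ['gcc-static']
```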

At some point, you have to cut your losses and move on. After months of investigation without resolving the reproducibility issue, we finally decided to move forward with the bootstrap binaries produced earlier. If necessary, we can always go back and try to fix this issue. However, it seemed more important to get started with the bootstrapping work.

Anyone who is interested in solving this problem is welcome to comment on the bug report and help us to figure out the mystery. We are very interested in solving it, but at the moment we are more focused on building the rest of the Guix package collection on the powerpc64le-linux platform using the existing bootstrap binaries.

Next Steps

It is now possible to install Guix on a powerpc64le-linux system and use it to build some useful software - in particular, Guix itself. So Guix is now "self-hosted" on this platform, which gives us a comfortable place to begin further work.

The following tasks still need to be done. Anyone can help, so please get in touch if you want to contribute!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

Categories: FLOSS Project Planets

Jelmer Vernooij: The upstream ontologist

Planet Debian - Sun, 2021-04-11 18:40

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor.

The upstream ontologist is a project that extracts metadata about upstream projects in a consistent format. It does this with a combination of heuristics and reading ecosystem-specific metadata files, such as Python’s setup.py and Rust’s Cargo.toml, as well as by scanning README files.

Supported Data Sources

It will extract information from a wide variety of sources, including:

Supported Fields

Fields that it currently provides include:

  • Homepage: homepage URL
  • Name: name of the upstream project
  • Contact: contact address of some sort of the upstream (e-mail, mailing list URL)
  • Repository: VCS URL
  • Repository-Browse: Web URL for viewing the VCS
  • Bug-Database: Bug database URL (for web viewing, generally)
  • Bug-Submit: URL to use to submit new bugs (either on the web or an e-mail address)
  • Screenshots: List of URLs with screenshots
  • Archive: Archive used - e.g. SourceForge
  • Security-Contact: e-mail or URL with instructions for reporting security issues
  • Documentation: Link to documentation on the web
  • Wiki: Wiki URL
  • Summary: one-line description of the project
  • Description: longer description of the project
  • License: Single line license description (e.g. "GPL 2.0") as declared in the metadata[1]
  • Copyright: List of copyright holders
  • Version: Current upstream version
  • Security-MD: URL to markdown file with security policy

All data fields have a “certainty” associated with them (“certain”, “confident”, “likely” or “possible”), which gets set depending on how the data was derived or where it was found. If multiple possible values were found for a specific field, then the value with the highest certainty is taken.
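That selection rule can be sketched in a few lines of Python. This is illustrative only, not the upstream ontologist's actual code; the function name and candidate data are invented:

```python
# Pick the value with the highest certainty for a field, using the
# four certainty levels named above.
CERTAINTY_RANK = {"certain": 3, "confident": 2, "likely": 1, "possible": 0}

def resolve_field(candidates):
    """candidates: (value, certainty) pairs found for a single field."""
    return max(candidates, key=lambda vc: CERTAINTY_RANK[vc[1]])[0]

homepage = resolve_field([
    ("https://example.org/wiki", "possible"),
    ("https://example.org/home", "certain"),
])
print(homepage)  # https://example.org/home
```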


The ontologist provides a high-level Python API as well as two command-line tools that can write output in two different formats:

For example, running guess-upstream-metadata on dulwich:

% guess-upstream-metadata
<string>:2: (INFO/1) Duplicate implicit target name: "contributing".
Name: dulwich
Repository: https://www.dulwich.io/code/
X-Security-MD: https://github.com/dulwich/dulwich/tree/HEAD/SECURITY.md
X-Version: 0.20.21
Bug-Database: https://github.com/dulwich/dulwich/issues
X-Summary: Python Git Library
X-Description: |
  This is the Dulwich project.
  It aims to provide an interface to git repos (both local and remote)
  that doesn't call out to git directly but instead uses pure Python.
X-License: Apache License, version 2 or GNU General Public License, version 2 or later.
Bug-Submit: https://github.com/dulwich/dulwich/issues/new

Lintian-Brush

lintian-brush can update DEP-12-style debian/upstream/metadata files that hold information about the upstream project that is packaged as well as the Homepage in the debian/control file based on information provided by the upstream ontologist. By default, it only imports data with the highest certainty - you can override this by specifying the --uncertain command-line flag.

[1] Obviously this won't be able to describe the full licensing situation for many projects. Projects like scancode-toolkit are more appropriate for that.
Categories: FLOSS Project Planets

On finishing Season of KDE: improving Kirigami docs

Planet KDE - Sun, 2021-04-11 16:14

I wrote my first Season of KDE blog-post 3 months ago… and have since forgotten to write any updates. It’s time to address that!

Since January, I’ve been working mainly on improving the documentation for Kirigami. Back then, the Develop wiki had some pages teaching newcomers how to create a Kirigami application, but these were a little disjointed and didn’t really lead readers towards any specific goal.

There were also a lot of aspects and components of Kirigami that weren’t properly documented. Some of the existing materials also needed revising in terms of style, structure, and clarity.

Tutorials

Kirigami documentation in the KDE Develop site

Before Season of KDE I’d recently started tinkering with QML and Kirigami. I wanted to create a simple application that would let you count down the days towards a date, like those you can get on your phone, but without all the obnoxious ads. Since I had no real knowledge of these tools, I started following the tutorials on the KDE Develop wiki, which was a great way of finding out what the problems were with these tutorials.

I went with the idea of the date countdown app and used this as the final goal of the tutorial. If you read through the tutorials now, you’ll find that each page builds towards creating such an app. The new tutorials go over all the essentials that you would need to know to create a basic Kirigami application, covering everything from setting up the development environment, to how to use Kirigami components, to how QML signals work, and so on. Care has also been taken to explain concepts that a beginner developer might not know much about, such as the aforementioned signals. I point this out because I, as a beginner, did not know how signals worked.

These new tutorials should make it quite a bit easier for new developers to come in and learn how a chunk of KDE development works. Hopefully we’ll soon have an influx of enthusiastic new developers bringing new applications to KDE, or helping out with our existing apps!

These new tutorials can be found in the Kirigami section of the KDE Develop site. The project I had initially begun before SoK is now called DayKountdown and is part of the Plasma Mobile namespace!

DayKountdown in its starting view.

Beginners

Also helpful to beginners is a new page placed at the end of the new tutorials. This page has been designed to contain everything a newcomer might need or be interested in after creating their first Kirigami application.

Kirigami-based applications for newcomers

Taking a page out of GNOME’s newcomer guide, we have a dedicated section for new contributors. Provided is a summarised list of contribution guidelines, along with active projects that we recommend new developers can contribute to. These projects are organised in terms of complexity and feature useful links where readers can learn more about them. I hope these will encourage readers to become contributors!

There are now also a number of handy links to resources readers can use to learn more about the various tools used in KDE development. We’ve linked to some of the other tutorials available on the Develop wiki, as well as more general resources available elsewhere tackling C++ and Qt. Whereas before readers would have had to search for their own resources, now they will have an index of handpicked websites where they can go and learn more.

This page can be found here.

Component pages

Another big effort has been to expand the number of component pages in the Kirigami documentation. Previously, there have only been a limited number of components explained in the wiki, and as a result, new developers were never made aware of the breadth of components offered by Kirigami. A large part of the work in this Season of KDE project has been to address this problem.

With my last SoK merge request, we will go from having 3 component pages in the wiki to having 12! A range of cool Kirigami components now have their own pages, from inline messages to overlay sheets to form layouts and more. Carl Schwan and I are still working on polishing the merge request and getting it ready, but once it lands, it will really help the documentation take shape. The wiki should become much more useful for those interested in learning more about what they can create with Kirigami.

That’s not to say Kirigami is fully documented yet. It isn’t! But I think it’s a step in the right direction.

My time as a Season of KDE participant

6 months ago, I really didn’t know how to code at all. I’d written a lot about open source software in the past, and I’ve advocated for it for a long time, but I never really knew how any of it worked.

I still don’t know how most things work, but I can definitely say I have learned a lot about KDE. Working on the Kirigami docs has been a very fun experience, partly because creating apps is fun in and of itself, but also because I can now grasp at how some of the applications on my computer have been made. That feels like a big-brain moment.

I must also thank my mentor Carl Schwan, who has been super helpful throughout these 3 months. Whether it has been combing over my ungainly merge requests, or reviewing the code for DayKountdown, his advice has been great and it has helped me become a (slightly) better coder.

Finally, it’s extremely fulfilling to have contributed to a software project that I have been using for the longest time. Thank you for merging my MRs!!!! I am sure there will be more of them to come, and I am looking forward to refactoring lots and lots of my code.



Categories: FLOSS Project Planets

Andy Wingo: guile's reader, in guile

GNU Planet! - Sun, 2021-04-11 15:51

Good evening! A brief(ish?) note today about some Guile nargery.

the arc of history

Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.

Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.

These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.

But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.

Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers exist on a system. When a fiber needs to block on an I/O operation (read or write), it instead suspends its continuation and arranges to restart it when the operation becomes possible.
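A rough Python/asyncio analogy for that suspend-and-resume behaviour (an analogy only; fibers are built on delimited continuations, not async/await):

```python
# Analogy for the fiber model: a task suspends at an await point
# instead of blocking a thread, and the scheduler resumes it when the
# awaited operation (here, a timer standing in for I/O) completes.
import asyncio

async def fiber(name, delay, finished):
    await asyncio.sleep(delay)  # suspension point, like a blocked read
    finished.append(name)

async def main():
    finished = []
    await asyncio.gather(fiber("slow", 0.1, finished),
                         fiber("fast", 0.01, finished))
    return finished

print(asyncio.run(main()))  # ['fast', 'slow']: resumed in completion order
```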

A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.

Scheme read, in Scheme

Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.

One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.

Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.
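The failure mode of the side-table approach can be illustrated in Python, where small integers are interned much as Guile's immediates and symbols are globally shared (this is an analogy, not Guile code):

```python
# A side table keyed on object identity loses locations for shared
# objects: both occurrences of 42 are the same interned int in
# CPython, so the second recorded location clobbers the first.
locations = {}

def record(datum, line):
    locations[id(datum)] = line
    return datum

a = record(42, line=1)   # "42" read on line 1
b = record(42, line=7)   # the same interned object, read on line 7
print(a is b)            # True
print(locations[id(a)])  # 7; the line-1 location is gone
```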

A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.

With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.

Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)

As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.

An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.
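The quirk can be mimicked with a toy reader fragment in Python (illustrative only; this is not how Guile's reader is written), matching boolean literals greedily and never demanding a terminating delimiter:

```python
# Toy reader fragment: try "#false" first, then "#t"/"#f", and carry
# on without requiring a delimiter, mirroring the behaviour above.
def read_tokens(text):
    out, i = [], 0
    while i < len(text):
        if text.startswith("#false", i):
            out.append(False); i += len("#false")
        elif text.startswith("#t", i):
            out.append(True); i += 2
        elif text.startswith("#f", i):
            out.append(False); i += 2
        else:
            out.append(text[i]); i += 1
    return out

print(read_tokens("#t1"))      # [True, '1']
print(read_tokens("#falsa1"))  # [False, 'a', 'l', 's', 'a', '1']
```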

All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.

OK enough rambling for this evening. Happy hacking to all and to all a good night!

Categories: FLOSS Project Planets