Anarcat: Matrix notes

Planet Python - Fri, 2022-06-17 11:34

I have some concerns about Matrix (the protocol, not the movie that came out recently, although I do have concerns about that as well). I've been watching the project for a long time, and it seems like a promising alternative to many protocols like IRC, XMPP, and Signal.

This review may sound a bit negative, because it focuses on those concerns. I am the operator of an IRC network and people keep asking me to bridge it with Matrix. I have myself considered just giving up on IRC and converting to Matrix. This page is a living document exploring my research of that problem space. The TL;DR is that no, I'm not setting up a bridge just yet, and I'm still on IRC.

This article was written over the course of the last three months, but I have been watching the Matrix project for years (my logs seem to say 2016 at least). The article is rather long. It will likely take you half an hour to read, so copy this over to your ebook reader, your tablet, or dead trees, and lean back and relax as I show you around the Matrix. Or, alternatively, just jump to a section that interests you, most likely the conclusion.

Introduction to Matrix

Matrix is an "open standard for interoperable, decentralised, real-time communication over IP. It can be used to power Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere you need a standard HTTP API for publishing and subscribing to data whilst tracking the conversation history".

It's also (when compared with XMPP) "an eventually consistent global JSON database with an HTTP API and pubsub semantics - whilst XMPP can be thought of as a message passing protocol."

According to their FAQ, the project started in 2014, has about 20,000 servers, and millions of users. Matrix works over HTTPS but, for federation between servers, over a special port by default: 8448.

Security and privacy

I have some concerns about the security promises of Matrix. It's advertised as "secure", with "E2E [end-to-end] encryption", but how does it actually work?

Data retention defaults

One of my main concerns with Matrix is data retention, which is a key part of security in a threat model where (for example) a hostile state actor wants to surveil your communications and can seize your devices.

On IRC, servers don't actually keep messages all that long: they pass them along to other servers and clients as fast as they can, only keep them in memory, and move on to the next message. There are no concerns about data retention on messages (and their metadata) other than the network layer. (I'm ignoring the issues with user registration, which is a separate, if valid, concern.) Obviously, a hostile server could log everything passing through it, but IRC federations are normally tightly controlled. So, if you trust your IRC operators, you should be fairly safe. Obviously, clients can (and often do, even if OTR is configured!) log all messages, but this is generally not the default. Irssi, for example, does not log by default. IRC bouncers are more likely to log to disk, of course, to be able to do what they do.

Compare this to Matrix: when you send a message to a Matrix homeserver, that server first stores it in its internal SQL database. Then it will transmit that message to all clients connected to that server and room, and to all other servers that have clients connected to that room. Those remote servers, in turn, will keep a copy of that message and all its metadata in their own database, by default forever. In encrypted rooms those messages are encrypted, but not their metadata.

There is a mechanism to expire entries in Synapse, but it is not enabled by default. So one should generally assume that a message sent on Matrix is never expired.
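To make the missing default concrete, here is a minimal Python sketch of what a "max lifetime" expiry pass does to stored events. It is not Synapse's implementation; the `Event` type and the `purge_expired` helper are hypothetical, and only the `origin_server_ts` field (milliseconds since the epoch) is borrowed from real Matrix events.

```python
from dataclasses import dataclass

# Hypothetical, simplified event record: real Matrix events carry much more.
@dataclass
class Event:
    event_id: str
    origin_server_ts: int  # milliseconds since the epoch, as in Matrix events

DAY_MS = 24 * 60 * 60 * 1000

def purge_expired(events, now_ms, max_lifetime_ms):
    """Keep only events younger than max_lifetime_ms.

    max_lifetime_ms=None models "no expiry": nothing is ever dropped.
    """
    if max_lifetime_ms is None:
        return list(events)
    cutoff = now_ms - max_lifetime_ms
    return [e for e in events if e.origin_server_ts >= cutoff]

now = 400 * DAY_MS
events = [Event("$old", 10 * DAY_MS), Event("$recent", 390 * DAY_MS)]
print([e.event_id for e in purge_expired(events, now, 30 * DAY_MS)])  # ['$recent']
print([e.event_id for e in purge_expired(events, now, None)])  # ['$old', '$recent']
```

The point of the second call is that, without a configured lifetime, every message stays on disk indefinitely.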

GDPR in the federation

But even if that setting was enabled by default, how do you control it? This is a fundamental problem of the federation: if any user is allowed to join a room (which is the default), those users' servers will log all content and metadata from that room. That includes private, one-on-one conversations, since those are essentially rooms as well.

In the context of the GDPR, this is really tricky: who is the responsible party (known as the "data controller") here? It's basically any yahoo who fires up a home server and joins a room.

In a federated network, one has to wonder whether GDPR enforcement is even possible at all. But in Matrix in particular, if you want to enforce your right to be forgotten in a given room, you would have to:

  1. enumerate all the users that ever joined the room while you were there
  2. discover all their home servers
  3. start a GDPR procedure against all those servers
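Step 2 is at least mechanical, because a Matrix user ID embeds its home server (`@localpart:server`). A minimal sketch, assuming you already collected the member IDs from step 1; the sample IDs and helper names are made up for illustration:

```python
def home_server(user_id: str) -> str:
    """Extract the server name from a Matrix user ID like @alice:example.org.

    The localpart cannot contain a colon, so everything after the first
    colon is the server name (which may itself include a port).
    """
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a Matrix user ID: {user_id!r}")
    return user_id[1:].split(":", 1)[1]

def servers_to_contact(member_ids):
    """Deduplicated, sorted list of home servers behind a room's members."""
    return sorted({home_server(uid) for uid in member_ids})

members = ["@anarcat:matrix.org", "@alice:example.com", "@carol:example.com"]
print(servers_to_contact(members))  # ['example.com', 'matrix.org']
```

Step 3, of course, does not reduce to code.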

I recognize this is a hard problem to solve while still keeping an open ecosystem. But I believe that Matrix should have much stricter defaults towards data retention than right now. Message expiry should be enforced by default, for example. (Note that there are also redaction policies that could be used to implement part of the GDPR automatically, see the privacy policy discussion below on that.)

Also keep in mind that, in the brave new peer-to-peer world that Matrix is heading towards, the boundary between server and client is likely to be fuzzier, which would make applying the GDPR even more difficult.

In fact, maybe Synapse should be designed so that there's no configurable flag to turn off the data retention policy. A bit like how most system loggers in UNIX (e.g. syslog) come with a log retention system that typically rotates logs after a few weeks or months. Historically, this was designed to keep hard drives from filling up, but it also has the added benefit of limiting the amount of personal information kept on disk in this modern day. (Arguably, syslog doesn't rotate logs on its own, but, say, Debian GNU/Linux, as an installed system, does have log retention policies well defined for installed packages, and those can be discussed. And "no expiry" is definitely a bug.)

Matrix.org privacy policy

When I first looked at Matrix, five years ago, Element.io was called Riot.im and had a rather dubious privacy policy:

We currently use cookies to support our use of Google Analytics on the Website and Service. Google Analytics collects information about how you use the Website and Service.

[...]

This helps us to provide you with a good experience when you browse our Website and use our Service and also allows us to improve our Website and our Service.

When I asked Matrix people about why they were using Google Analytics, they explained this was for development purposes and they were aiming for velocity at the time, not privacy (paraphrasing here).

They also included a "free to snitch" clause:

If we are or believe that we are under a duty to disclose or share your personal data, we will do so in order to comply with any legal obligation, the instructions or requests of a governmental authority or regulator, including those outside of the UK.

Those are really broad terms, above and beyond what is typically expected legally.

Like the current retention policies, such user tracking and ... "liberal" collaboration practices with the state set a bad precedent for other home servers.

Thankfully, since the above policy was published (2017), the GDPR was "implemented" (2018) and it seems like both the Element.io privacy policy and the Matrix.org privacy policy have been somewhat improved since.

Notable points of the new privacy policies:

  • 2.3.1.1: the "federation" section actually outlines that "Federated homeservers and Matrix clients which respect the Matrix protocol are expected to honour these controls and redaction/erasure requests, but other federated homeservers are outside of the span of control of Element, and we cannot guarantee how this data will be processed"
  • 2.6: users under the age of 16 should not use the matrix.org service
  • 2.10: Upcloud, Mythic Beasts, Amazon, and CloudFlare possibly have access to your data (it's nice to at least mention this in the privacy policy: many providers don't even bother admitting to this kind of delegation)
  • Element 2.2.1: mentions many more third parties (Twilio, Stripe, Quaderno, LinkedIn, Twitter, Google, Outplay, PipeDrive, HubSpot, Posthog, Sentry, and Matomo (phew!)) used when you are paying Matrix.org for hosting

I'm not super happy with all the trackers they have on the Element platform, but then again you don't have to use that service. Your favorite homeserver (assuming you are not on Matrix.org) probably has their own Element deployment, hopefully without all that garbage.

Overall, this is all a huge improvement over the previous privacy policy, so hats off to the Matrix people for figuring out a reasonable policy in such a tricky context. I particularly like this bit:

We will forget your copy of your data upon your request. We will also forward your request to be forgotten onto federated homeservers. However - these homeservers are outside our span of control, so we cannot guarantee they will forget your data.

It's great they implemented those mechanisms and, after all, if there's a hostile party in there, nothing can prevent them from using screenshots to just exfiltrate your data away from the client side anyways, even with services typically seen as more secure, like Signal.

As an aside, I also appreciate that Matrix.org has a fairly decent code of conduct, based on the TODO CoC which checks all the boxes in the geekfeminism wiki.

Metadata handling

Overall, privacy protections in Matrix mostly concern message contents, not metadata. In other words, who's talking with who, when and from where is not well protected. Compared to a tool like Signal, which goes to great lengths to anonymize that data with features like private contact discovery, disappearing messages, sealed senders, and private groups, Matrix is definitely behind.

This is a known issue (opened in 2019) in Synapse, but this is not just an implementation issue, it's a flaw in the protocol itself. Home servers keep join/leave events for all rooms, which gives clear text information about who is talking to whom. Synapse logs may also contain personally identifiable information that home server admins might not be aware of in the first place. Those log rotation policies are separate from the server-level retention policy, which may be confusing for a novice sysadmin.

Combine this with the federation: even if you trust your home server to do the right thing, the second you join a public room with third-party home servers, those ideas kind of get thrown out because those servers can do whatever they want with that information. Again, a problem that is hard to solve in any federation.

To be fair, IRC doesn't have a great story here either: any client knows not only who's talking to who in a room, but also typically their client IP address. Servers can (and often do) obfuscate this, but often that obfuscation is trivial to reverse. Some servers do provide "cloaks" (sometimes automatically), but that's kind of a "slap-on" solution that actually moves the problem elsewhere: now the server knows a little more about the user.

Overall, I would worry much more about a Matrix home server seizure than an IRC or Signal server seizure. Signal does get subpoenas, and they can only give out a tiny bit of information about their users: their phone number, their registration date, and their last connection date. Matrix carries a lot more information in its database.

Amplification attacks on URL previews

I (still!) run an Icecast server and sometimes share links to it on IRC which, obviously, also ends up on (more than one!) Matrix home servers because some people connect to IRC using Matrix. This, in turn, means that Matrix will connect to that URL to generate a link preview.

I feel this points to a security issue, especially because those sockets would be kept open seemingly forever. I tried to warn the Matrix security team but somehow, I don't think this issue was taken very seriously. Here's the disclosure timeline:

  • January 18: contacted Matrix security
  • January 19: response: already reported as a bug
  • January 20: response: can't reproduce
  • January 31: timeout added, considered solved
  • January 31: I respond that I believe the security issue is underestimated, ask for clearance to disclose
  • February 1: response: asking for two weeks delay after the next release (1.53.0) including another patch, presumably in two weeks' time
  • February 22: Synapse 1.53.0 released
  • April 14: I notice the release, ask for clearance again
  • April 14: response: referred to the public disclosure

There are several problems here:

  1. the bug was publicly disclosed in September 2020, and not considered a security issue until I notified them, and even then, I had to insist

  2. no clear disclosure policy timeline was proposed or seems established in the project (there is a security disclosure policy but it doesn't include any predefined timeline)

  3. I wasn't informed of the disclosure

  4. the actual solution is a size limit (10MB, already implemented), a time limit (30 seconds, implemented in PR 11784), and a content type allow list (HTML, "media" or JSON, implemented in PR 11936), and I'm not sure it's adequate

  5. (pure vanity:) I did not make it to their Hall of fame
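For reference, here is a rough Python sketch of the three mitigations listed in item 4: a size cap, a time cap, and a content type allow list. It is not Synapse's code; the limits mirror the numbers above, but the function names and the exact allow list (including treating audio as "media") are assumptions. It reads from any file-like object so it runs offline.

```python
import io
import time

MAX_BYTES = 10 * 1024 * 1024   # 10MB size limit
MAX_SECONDS = 30.0             # 30 second time limit
# Hypothetical allow list: HTML, "media" (images, audio, video) and JSON.
ALLOWED_PREFIXES = ("text/html", "image/", "audio/", "video/", "application/json")

def content_type_allowed(content_type: str) -> bool:
    """Check the MIME type (ignoring parameters like charset) against the list."""
    base = content_type.split(";", 1)[0].strip().lower()
    return base.startswith(ALLOWED_PREFIXES)

def read_limited(stream, max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS, chunk=64 * 1024):
    """Read a file-like stream, giving up past the size or time budget.

    The deadline is only checked between reads; a real fetcher also needs
    socket-level timeouts so a single stalled read cannot hang forever.
    """
    deadline = time.monotonic() + max_seconds
    buf = bytearray()
    while True:
        if time.monotonic() > deadline:
            raise TimeoutError("preview fetch took too long")
        data = stream.read(chunk)
        if not data:
            return bytes(buf)
        buf += data
        if len(buf) > max_bytes:
            raise ValueError("preview body exceeds size limit")

print(content_type_allowed("text/html; charset=utf-8"))  # True
print(read_limited(io.BytesIO(b"<html>hi</html>")))  # b'<html>hi</html>'
```

An endless Icecast stream would be cut off by either cap, but note that every fetcher is still allowed to pull up to those limits, which is the crux of the next paragraph.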

I'm not sure those solutions are adequate because they all seem to assume a single home server will pull that one URL for a little while then stop. But in a federated network, many (possibly thousands) home servers may be connected in a single room at once. If an attacker drops a link into such a room, all those servers would connect to that link all at once. This is an amplification attack: a small amount of traffic will generate a lot more traffic to a single target. It doesn't matter that there are size or time limits: the amplification is what matters here.

It should also be noted that clients that generate link previews have more amplification because they are more numerous than servers. And of course, the default Matrix client (Element) does generate link previews as well.
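The amplification is easy to put in numbers. A back-of-the-envelope sketch, where the number of fetchers (home servers or clients, whichever generate previews) and the message size are made-up inputs, not measurements:

```python
def amplification(link_bytes, fetchers, bytes_per_fetch):
    """Bytes the target may serve per byte the attacker posts.

    Assumes each fetcher retrieves the link once and pulls up to the
    per-fetch size cap.
    """
    return fetchers * bytes_per_fetch / link_bytes

# Hypothetical room: 6,000 previewing fetchers, a 100-byte message,
# and fetches capped at 10MB each.
factor = amplification(100, 6_000, 10 * 1024 * 1024)
print(f"{factor:,.0f}x")  # 629,145,600x
```

The size cap bounds each fetch, but the multiplier is the number of fetchers, which the cap does not touch.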

That said, this is possibly not a problem specific to Matrix: any federated service that generates link previews may suffer from this.

I'm honestly not sure what the solution is here. Maybe moderation? Maybe link previews are just evil? All I know is there was this weird bug in my Icecast server and I tried to ring the bell about it, and it feels like it was swept under the rug. Somehow I feel this is bound to blow up again in the future, even with the current mitigation.

Moderation

In Matrix, as elsewhere, moderation is a hard problem. There is a detailed moderation guide and much of this problem space is actively worked on in Matrix right now. A fundamental problem with moderating a federated space is that a user banned from a room can rejoin the room from another server. This is why spam is such a problem in email, and why IRC networks stopped federating ages ago (see the IRC history for that fascinating story).

The mjolnir bot

The mjolnir moderation bot is designed to help with some of those things. It can kick and ban users, redact all of a user's messages (as opposed to one by one), all of this across multiple rooms. It can also subscribe to a federated block list published by matrix.org to block known abusers (users or servers). Bans are pretty flexible and can operate at the user, room, or server level.
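User- and server-level bans of this kind are naturally expressed as glob patterns over Matrix IDs and server names. A minimal sketch of that matching with Python's `fnmatch`; the `BANS` rule format is a simplified assumption for illustration, not mjolnir's actual data model:

```python
from fnmatch import fnmatchcase

# Hypothetical ban list: (kind, glob pattern) pairs.
BANS = [
    ("user", "@spammer*:example.org"),
    ("server", "evil.example"),
    ("server", "*.badhost.example"),
]

def server_of(user_id: str) -> str:
    """Server part of a Matrix user ID (@localpart:server)."""
    return user_id[1:].split(":", 1)[1]

def is_banned(user_id: str, bans=BANS) -> bool:
    """True if the user ID, or its home server, matches any ban rule."""
    server = server_of(user_id)
    for kind, pattern in bans:
        target = user_id if kind == "user" else server
        if fnmatchcase(target, pattern):
            return True
    return False

print(is_banned("@spammer42:example.org"))    # True: user pattern
print(is_banned("@alice:example.org"))        # False
print(is_banned("@bob:irc.badhost.example"))  # True: whole server banned
```

Server-level rules are what make these lists useful against the "just register on another server" problem, at least until the abuser sets up yet another server.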

Matrix people suggest making the bot admin of your channels, because you can't take back admin from a user once given.

The command-line tool

There's also a new command line tool designed to do things like:

  • System notify users (all users/users from a list, specific user)
  • delete sessions/devices not seen for X days
  • purge the remote media cache
  • select rooms with various criteria (external/local/empty/created by/encrypted/cleartext)
  • purge history of these rooms
  • shut down rooms

This tool and Mjolnir are based on the admin API built into Synapse.

Rate limiting

Synapse has pretty good built-in rate-limiting which blocks repeated login, registration, joining, or messaging attempts. It may also end up throttling servers on the federation based on those settings.
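Synapse's limits are configured with `per_second` and `burst_count` settings (for example under `rc_message` or `rc_login`). The token bucket sketch below uses the same two knobs to show how such a limiter behaves; it is an illustration, not Synapse's own rate limiter, and the key format is arbitrary:

```python
class RateLimiter:
    """Token bucket: refills per_second tokens per second, holds up to burst_count."""

    def __init__(self, per_second: float, burst_count: int):
        self.rate = per_second
        self.capacity = burst_count
        self.buckets = {}  # key (user, IP, remote server...) -> (tokens, last_time)

    def allow(self, key, now: float) -> bool:
        tokens, last = self.buckets.get(key, (float(self.capacity), now))
        # Refill for the time elapsed since the last request, up to capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False

limiter = RateLimiter(per_second=0.2, burst_count=3)
results = [limiter.allow("@flooder:example.org", now=0.0) for _ in range(5)]
print(results)  # [True, True, True, False, False]
print(limiter.allow("@flooder:example.org", now=6.0))  # True: a token refilled
```

The burst lets normal users fire off a few messages at once, while a flooder is throttled down to the steady refill rate.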

Fundamental federation problems

Because users joining a room may come from another server, room moderators are at the mercy of the registration and moderation policies of those servers. Matrix is like IRC's +R mode ("only registered users can join") by default, except that anyone can register their own homeserver, which makes this protection limited.

Server admins can block IP addresses and home servers, but those tools are not currently available to room admins. It would be nice for room admins to have that capability too, just like IRC channel admins can block users based on their IP address.

Matrix has the concept of guest accounts, but it is not used very much, and virtually no client supports it. This contrasts with the way IRC works: by default, anyone can join an IRC network even without authentication. Some channels require registration, but in general you are free to join and look around (until you get blocked, of course).

I have heard anecdotal evidence that "moderating bridges is hell", and I can imagine why. Moderation is already hard enough on one federation; when you bridge a room with another network, you inherit all the problems from that network but without all the abuse control tools from the original network's API...

Room admins

Matrix, in particular, has the problem that room administrators (who have the power to redact messages, ban users, and promote other users) are bound to their Matrix ID which is, in turn, bound to their home servers. This implies that a home server administrator could (1) impersonate a given user and (2) use that to hijack the room. So in practice, the home server is the trust anchor for rooms, not the user themselves.

That said, if the administrator of server B hijacks user joe on server B, they will hijack that room on that specific server. This will not (necessarily) affect users on the other servers, as servers could refuse parts of the updates or ban the compromised account (or server).

It does seem like a major flaw that room credentials are bound to Matrix identifiers, as opposed to the E2E encryption credentials. In an encrypted room even with fully verified members, a compromised or hostile home server can still take over the room by impersonating an admin. That admin (or even a newly minted user) can then send events or listen on the conversations.

This is even more frustrating when you consider that Matrix events are actually signed and therefore have some authentication attached to them, acting like some sort of Merkle tree (as each event contains a link to previous events). That signature, however, is made from the homeserver PKI keys, not the client's E2E keys, which makes E2E feel like it has been "bolted on" later.
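To illustrate the point, here is a toy model of hash-linked, server-signed events. It only shows the structure: real Matrix events use canonical JSON, reference hashes, and ed25519 signatures made with the home server's signing key, whereas this sketch uses SHA-256 and an HMAC from Python's standard library as a stand-in signature. All keys, field subsets, and helper names are hypothetical.

```python
import hashlib
import hmac
import json

def canonical(obj) -> bytes:
    # Simplified stand-in for Matrix canonical JSON: sorted keys, no whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def make_event(content, prev_events, server_key: bytes, origin: str):
    """Build an event that links to its parents and is signed by the *server*."""
    event = {"origin": origin, "content": content, "prev_events": prev_events}
    event_id = "$" + hashlib.sha256(canonical(event)).hexdigest()[:16]
    # The signature covers the event, but it comes from the home server's key,
    # not from the sending user's E2E device keys.
    sig = hmac.new(server_key, canonical(event), hashlib.sha256).hexdigest()
    return {**event, "event_id": event_id, "signature": sig}

def verify(event, server_key: bytes) -> bool:
    body = {k: event[k] for k in ("origin", "content", "prev_events")}
    expected = hmac.new(server_key, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, event["signature"])

key_b = b"server-b-secret"  # hypothetical signing key for server B
first = make_event({"body": "hello"}, [], key_b, "b.example")
second = make_event({"body": "I am the admin now"}, [first["event_id"]], key_b, "b.example")
print(verify(second, key_b))  # True: whoever holds server B's key can sign as joe
```

The chain protects history against tampering in transit, but nothing in it involves the user's own E2E keys, which is exactly the complaint above.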

Availability

While Matrix has a strong advantage over Signal in that it's decentralized (so anyone can run their own homeserver), I couldn't find an easy way to run a "multi-primary" setup, or even a "redundant" setup (even if with a single primary backend), short of going full-on "replicate PostgreSQL and Redis data", which is typically not for the faint of heart.

How this works in IRC

On IRC, it's quite easy to setup redundant nodes. All you need is:

  1. a new machine (with its own public address with an open port)

  2. a shared secret (or certificate) between that machine and an existing one on the network

  3. a connect {} block on both servers

That's it: the node will join the network and people can connect to it as usual and share the same user/namespace as the rest of the network. The servers take care of synchronizing state: you do not need to worry about replicating a database server.

(Now, experienced IRC people will know there's a catch here: IRC doesn't have authentication built in, and relies on "services" which are basically bots that authenticate users (I'm simplifying, don't nitpick). If that service goes down, the network still works, but then people can't authenticate, and they can start doing nasty things like steal people's identity if they get knocked offline. But still: basic functionality still works: you can talk in rooms and with users that are on the reachable network.)

User identities

Matrix is more complicated. Each "home server" has its own identity namespace: a specific user (say @anarcat:matrix.org) is bound to that specific home server. If that server goes down, that user is completely disconnected. They could register a new account elsewhere and reconnect, but then they basically lose all their configuration: contacts and joined channels are all lost.

(Also notice how Matrix IDs don't look like a typical user address, like an email address in XMPP. They did at least do their homework and get the allocation for the matrix: URI scheme.)

Rooms

Users talk to each other in "rooms", even in one-to-one communications. (Rooms are also used for other things like "spaces"; they're basically used for everything, in an "everything is a file" kind of way.) For rooms, home servers act more like IRC nodes in that they keep a local state of the chat room and synchronize it with other servers. Users can keep talking inside a room if the server that originally hosts the room goes down. Rooms can have a local, server-specific "alias" so that, say, #room:matrix.org is also visible as #room:example.com on the example.com home server. Both addresses refer to the same underlying room.

(Finding this in the Element settings is not obvious though, because those "aliases" are actually called "local addresses" there. So to create such an alias (in Element), you need to go to the room settings' "General" section, click "Show more" in "Local address", then add the alias name (e.g. foo), and then that room will be available on your example.com homeserver as #foo:example.com.)

So a room doesn't belong to a server, it belongs to the federation, and anyone can join the room from any server (if the room is public, or if invited otherwise). You can create a room on server A and when a user from server B joins, the room will be replicated on server B as well. If server A fails, server B will keep relaying traffic to connected users and servers.

A room is therefore not fundamentally addressed with the above alias; instead, it has an internal Matrix ID, which is basically a random string. It has a server name attached to it, but that was made just to avoid collisions. That can get a little confusing. For example, the #fractal:gnome.org room is an alias on the gnome.org server, but the room ID is !hwiGbsdSTZIwSRfybq:matrix.org. That's because the room was created on matrix.org, but the preferred branding is gnome.org now.

As an aside, rooms, by default, live forever, even after the last user quits. There's an admin API to delete rooms and a tombstone event to redirect to another one, but neither has a GUI yet. The latter is part of MSC1501 ("Room version upgrades") which allows a room admin to close a room, with a message and a pointer to another room.

Spaces

Discovering rooms can be tricky: there is a per-server room directory, but Matrix.org people are trying to deprecate it in favor of "Spaces". Room directories were ripe for abuse: anyone can create a room, so anyone can show up in there. It's possible to restrict who can add aliases, but anyways directories were seen as too limited.

In contrast, a "Space" is basically a room that's an index of other rooms (including other spaces), so existing moderation and administration mechanisms that work in rooms can (somewhat) work in spaces as well. This enables a room directory that works across federation, regardless of which server the rooms were originally created on.
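Because a space is just a room pointing at other rooms (and other spaces), building a directory from it is a graph walk. A minimal sketch, assuming the parent/child relationships have already been fetched into a dictionary (in Matrix they are stored as `m.space.child` state events); the room IDs are made up:

```python
from collections import deque

def walk_space(root, children):
    """Breadth-first list of every room reachable from a space.

    `children` maps a room ID to the IDs of its children. Spaces can nest,
    and nothing prevents cycles, so visited rooms are tracked.
    """
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        room = queue.popleft()
        order.append(room)
        for child in children.get(room, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

children = {
    "!space:example.org": ["!general:example.org", "!subspace:example.com"],
    "!subspace:example.com": ["!dev:matrix.org", "!space:example.org"],  # cycle
}
print(walk_space("!space:example.org", children))
# ['!space:example.org', '!general:example.org', '!subspace:example.com', '!dev:matrix.org']
```

Note that the child rooms live on different servers: the space only references them, which is what lets the directory span the federation.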

New users can be added to a space or room automatically in Synapse. (Existing users can be told about the space with a server notice.) This gives admins a way to pre-populate a list of rooms on a server, which is useful to build clusters of related home servers, providing some sort of redundancy, at the room -- not user -- level.

Home servers

So while you can work around a home server going down at the room level, there's no such thing at the home server level, for user identities. So if you want those identities to be stable in the long term, you need to think about high availability. One limitation is that the domain name (e.g. matrix.example.com) must never change in the future, as renaming home servers is not supported.

The documentation used to say you could "run a hot spare" but that has been removed. Last I heard, it was not possible to run a high-availability setup where multiple, separate locations could replace each other automatically. You can have high performance setups where the load gets distributed among workers, but those are based on a shared database (Redis and PostgreSQL) backend.

So my guess is it would be possible to create a "warm" spare of a Matrix home server with regular PostgreSQL replication, but that is not documented in the Synapse manual. This sort of setup would also not be useful to deal with networking issues or denial of service attacks, as you will not be able to spread the load over multiple network locations easily. Redis and PostgreSQL heroes are welcome to provide their multi-primary solution in the comments. In the meantime, I'll just point out this is a problem that's handled somewhat more gracefully in IRC, by having the possibility of delegating the authentication layer.

Delegations

If you do not want to run a Matrix server yourself, it's possible to delegate the entire thing to another server. There's a server discovery API which uses the .well-known pattern (or SRV records, but that's "not recommended" and a bit confusing) to delegate that service to another server. Be warned that the server still needs to be explicitly configured for your domain. You can't just put:

{ "m.server": "matrix.org:443" }

... on https://example.com/.well-known/matrix/server and start using @you:example.com as a Matrix ID. That's because Matrix doesn't support "virtual hosting" and you'd still be connecting to rooms and people with your matrix.org identity, not example.com as you would normally expect. This is also why you cannot rename your home server.

The server discovery API is what allows servers to find each other. Clients, on the other hand, use the client-server discovery API: this is what allows a given client to find your home server when you type your Matrix ID on login.
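Both discovery documents are small JSON files. A sketch of how a server or client might read them, working on already-downloaded bodies so it runs offline. The domain names are examples, and the port fallback is simplified: the full server name resolution rules also involve SRV lookups and other cases this ignores.

```python
import json

DEFAULT_FEDERATION_PORT = 8448

def parse_server_wellknown(body: str):
    """Return (host, port) from /.well-known/matrix/server.

    Simplification: a missing port falls back straight to 8448, skipping
    the SRV lookup step of the real resolution algorithm.
    """
    delegated = json.loads(body)["m.server"]
    host, sep, port = delegated.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return delegated, DEFAULT_FEDERATION_PORT

def parse_client_wellknown(body: str) -> str:
    """Return the homeserver base URL from /.well-known/matrix/client."""
    return json.loads(body)["m.homeserver"]["base_url"]

print(parse_server_wellknown('{ "m.server": "matrix.org:443" }'))  # ('matrix.org', 443)
print(parse_server_wellknown('{ "m.server": "matrix.example.com" }'))  # ('matrix.example.com', 8448)
print(parse_client_wellknown('{"m.homeserver": {"base_url": "https://matrix.example.com"}}'))
```

This only tells other parties where to connect; as explained above, the delegated server must still be configured to serve your domain.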

Performance

The high availability discussion brushed over the performance of Matrix itself, but let's now dig into that.

Horizontal scalability

There were serious scalability issues with the main Matrix server, Synapse, in the past, so the Matrix team has been working hard to improve its design. Since Synapse 1.22, the home server can scale horizontally to multiple workers (see this blog post for details), which can make it easier to scale large servers.

Other implementations

There are other promising home server implementations from a performance standpoint (dendrite, Golang, entered beta in late 2020; conduit, Rust, beta; others), but none of those are feature-complete, so there's a trade-off to be made there. Synapse is also adding a lot of features fast, so it's an open question whether the others will ever catch up. (I have heard that Dendrite might actually surpass Synapse in features within a few years, which would put Synapse in a more "LTS" situation.)

Latency

Matrix can feel slow sometimes. For example, joining the "Matrix HQ" room in Element (from matrix.debian.social) takes a few minutes and then fails. That is because the home server has to sync the entire room state when you join the room. There was promising work on this announced in the lengthy 2021 retrospective, and some of that work landed (partial sync) in the 1.53 release already. Other improvements coming include sliding sync, lazy loading over federation, and fast room joins. So that's actually something that could be fixed in the fairly short term.

But in general, communication in Matrix doesn't feel as "snappy" as on IRC or even Signal. It's hard to quantify this without instrumenting a full latency test bed (for example the tools I used in the terminal emulators latency tests), but even just typing in a web browser feels slower than typing in an xterm or Emacs for me.

Even in conversations, I "feel" people don't immediately respond as fast. In fact, this could be an interesting double-blind experiment to make: have people guess whether they are talking to a person on Matrix, XMPP, or IRC, for example. My theory would be that people could notice that Matrix users are slower, if only because of the TCP round-trip time each message has to take.

Transport

Some courageous person actually made some tests of various messaging platforms on a congested network. His evaluation was basically:

  • Briar: uses Tor, so unusable except locally
  • Matrix: "struggled to send and receive messages", joining a room takes forever as it has to sync all history, "took 20-30 seconds for my messages to be sent and another 20 seconds for further responses"
  • XMPP: "worked in real-time, full encryption, with nearly zero lag"

So that was interesting. I suspect IRC would have also fared better, but that's just a feeling.

Other improvements to the transport layer include support for websocket and the CoAP proxy work from 2019 (targeting 100bps links), but both seem stalled at the time of writing. The Matrix people have also announced the pinecone p2p overlay network which aims at solving large, internet-scale routing problems. See also this talk at FOSDEM 2022.

Usability

Onboarding and workflow

The workflow for joining a room, when you use Element web, is not great:

  1. click on a link in a web browser
  2. land on (say) https://matrix.to/#/#matrix-dev:matrix.org
  3. offers "Element", yeah that sounds great, let's click "Continue"
  4. land on https://app.element.io/#/room%2F%23matrix-dev%3Amatrix.org and then you need to register, aaargh

As you might have guessed by now, there is a specification to solve this, but web browsers need to adopt it as well, so that's far from actually being solved. At least browsers generally know about the matrix: scheme, it's just not exactly clear what they should do with it, especially when the handler is just another web page (e.g. Element web).
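For the curious, mapping a Matrix identifier to a `matrix:` URI is mostly a matter of swapping the sigil for a path prefix. A sketch covering only user IDs, room aliases, and room IDs, based on my reading of the URI scheme in the spec; event links, query parameters, and other edge cases are left out, and the helper name is hypothetical:

```python
from urllib.parse import quote

# Sigil -> path segment in the matrix: URI scheme.
SIGIL_TO_TYPE = {"@": "u", "#": "r", "!": "roomid"}

def to_matrix_uri(identifier: str) -> str:
    sigil, rest = identifier[0], identifier[1:]
    try:
        kind = SIGIL_TO_TYPE[sigil]
    except KeyError:
        raise ValueError(f"unsupported identifier: {identifier!r}") from None
    # Keep the ':' separating localpart and server readable; escape the rest.
    return f"matrix:{kind}/{quote(rest, safe=':')}"

print(to_matrix_uri("#matrix-dev:matrix.org"))  # matrix:r/matrix-dev:matrix.org
print(to_matrix_uri("@anarcat:matrix.org"))  # matrix:u/anarcat:matrix.org
```

Producing the URI is the easy part; the hard part, as noted above, is getting browsers to hand it to something useful.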

In general, when compared with tools like Signal or WhatsApp, Matrix doesn't fare so well in terms of user discovery. I probably have some of my normal contacts that have a Matrix account as well, but there's really no way to know. It's kind of creepy when Signal tells you "this person is on Signal!" but it's also pretty cool that it works, and they actually implemented it pretty well.

Registration is also less obvious: in Signal, the app confirms your phone number automatically. It's friction-less and quick. In Matrix, you need to learn about home servers, pick one, register (with a password! aargh!), and then setup encryption keys (not default), etc. It's a lot more friction.

And look, I understand: giving away your phone number is a huge trade-off. I don't like it either. But it solves a real problem and makes encryption accessible to a ton more people. Matrix does have "identity servers" that can serve that purpose, but I don't feel confident sharing my phone number there. It doesn't help that the identity servers don't have private contact discovery: giving them your phone number is a more serious security compromise than with Signal.
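Identity servers do offer hashed lookups, which beats sending phone numbers in the clear but is not private contact discovery: the space of phone numbers is small enough to brute-force. A sketch of the hashing step as I understand the v2 lookup API (SHA-256 over "address medium pepper", encoded as unpadded URL-safe base64), followed by the obvious attack; the pepper, phone number, and helper names are invented:

```python
import base64
import hashlib

def lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Hash an address the way a client would before a hashed lookup."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode()).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")

def brute_force(target_hash, pepper, candidates):
    """Why hashing is not private discovery: just try every plausible number."""
    for number in candidates:
        if lookup_hash(number, "msisdn", pepper) == target_hash:
            return number
    return None

pepper = "somepepper"  # hypothetical; real servers publish their own
h = lookup_hash("15145550142", "msisdn", pepper)
# A real attacker would enumerate whole area codes; ten thousand suffixes here.
found = brute_force(h, pepper, (f"1514555{n:04d}" for n in range(10_000)))
print(found)  # 15145550142
```

Whoever runs the identity server (or sees the hashes) can reverse them this way, which is why handing over a phone number there is a bigger compromise than with Signal's private contact discovery.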

There's a catch-22 here too: because no one feels like giving away their phone numbers, no one does, and everyone assumes that stuff doesn't work anyways. Like it or not, Signal forcing people to divulge their phone number actually gives it critical mass, which means a lot of my relatives are actually on Signal and I don't have to install crap like WhatsApp to talk with them.

5-minute client evaluation

Throughout all my tests I evaluated a handful of Matrix clients, mostly from Flathub because almost none of them are packaged in Debian.

Right now I'm using Element, the flagship client from Matrix.org, in a web browser window, with the PopUp Window extension. This makes it look almost like a native app, and opens links in my main browser window (instead of a new tab in that separate window), which is nice. But I'm tired of buying memory to feed my web browser, so this indirection has to stop. Furthermore, I'm often getting completely logged out of Element, which means logging in again, recovering my security keys, and reconfiguring my settings. That is extremely annoying.

Coming from Irssi, Element is really "GUI-y" (pronounced "gooey"). Lots of clickety happening. To mark conversations as read, in particular, I need to click-click-click on all the tabs that have some activity. There's no "jump to latest message" or "mark all as read" functionality as far as I could tell. In Irssi the former is built-in (alt-a) and I made a custom /READ command for the latter:

/ALIAS READ script exec \$_->activity(0) for Irssi::windows

And yes, that's a Perl script in my IRC client. I am not aware of any Matrix client that does stuff like that, except maybe Weechat, if we can call it a Matrix client, or Irssi itself, now that it has a Matrix plugin (!).
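For what it's worth, the building block for a "mark all as read" command does exist in the client-server API: the `read_markers` endpoint. Here is a minimal Python sketch, assuming you already know the latest event ID of each room (for example from a `/sync` response). The homeserver URL, token and IDs are placeholders, and the hypothetical helpers only build the requests; they do not send them.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def mark_read_request(homeserver: str, token: str, room_id: str, event_id: str) -> urllib.request.Request:
    """Build (but do not send) a request moving both read markers to event_id."""
    room = urllib.parse.quote(room_id, safe="")  # "!abc:example.org" -> "%21abc%3Aexample.org"
    body = json.dumps({"m.fully_read": event_id, "m.read": event_id}).encode("utf-8")
    return urllib.request.Request(
        f"{homeserver}/_matrix/client/v3/rooms/{room}/read_markers",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def mark_all_read(homeserver: str, token: str, latest_events: Dict[str, str]) -> List[urllib.request.Request]:
    """latest_events maps each room ID to the ID of its most recent event."""
    return [mark_read_request(homeserver, token, room, event) for room, event in latest_events.items()]
```

Each request could then be sent with `urllib.request.urlopen()`; a real client would also need error handling and retries when rate-limited.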

As for other clients, I have looked through the Matrix Client Matrix (confusing, right?) to try to figure out which one to try, and, even after selecting Linux as a filter, the chart is just too wide to figure out anything. So I tried these, kind of randomly:

  • Fractal
  • Mirage
  • Nheko
  • Quaternion

Unfortunately, I lost my notes on those, so I don't actually remember which one did what. I still have a session open with Mirage, so I guess that means it's the one I preferred, but I remember they were also all very GUI-y.

Maybe I need to look at weechat-matrix or gomuks. At least Weechat is scriptable so I could continue playing the power-user. Right now my strategy with messaging (and that includes microblogging like Twitter or Mastodon) is that everything goes through my IRC client, so Weechat could actually fit well in there. Going with gomuks, on the other hand, would mean running it in parallel with Irssi or ... ditching IRC, which is a leap I'm not quite ready to take just yet.

Oh, and basically none of those clients (except Nheko and Element) support VoIP, which is still kind of a second-class citizen in Matrix. Matrix does not support large multimedia rooms, for example: FOSDEM used Jitsi instead of the native videoconferencing system.

Bots

This falls a little outside the "usability" section, but I didn't know where to put this... There are a few Matrix bots out there, and you are likely going to be able to replace your existing bots with Matrix bots. It's true that IRC has a long and impressive history with lots of bots doing various things, but given how young Matrix is, there's still a good variety:

  • maubot: generic bot with tons of usual plugins like sed, dice, karma, xkcd, echo, rss, reminder, translate, react, exec, gitlab/github webhook receivers, weather, etc
  • opsdroid: framework to implement "chat ops" in Matrix, connects with Matrix, GitHub, GitLab, Shell commands, Slack, etc
  • matrix-nio: another framework, used to build lots more bots like:
    • hemppa: generic bot with various functionality like weather, RSS feeds, calendars, cron jobs, OpenStreetMap lookups, URL title snarfing, Wolfram Alpha, astronomy pic of the day, Mastodon bridge, room bridging, oh dear
    • devops: ping, curl, etc
    • podbot: play podcast episodes from AntennaPod
    • cody: Python, Ruby, JavaScript REPL
    • eno: generic bot, "personal assistant"
  • mjolnir: moderation bot
  • hookshot: bridge with GitLab/GitHub
  • matrix-monitor-bot: latency monitor

One thing I haven't found an equivalent for is Debian's MeetBot. There's an archive bot but it doesn't have topics or a meeting chair, or HTML logs.

Working on Matrix

As a developer, I find Matrix kind of intimidating. The specification is huge. The official specification itself looks somewhat digestible: it's only 6 APIs so that looks, at first, kind of reasonable. But whenever you start asking complicated questions about Matrix, you quickly fall into the Matrix Spec Change specification (which, yes, is a separate specification). And there are literally hundreds of MSCs flying around. It's hard to tell what's been adopted and what hasn't, and even harder to figure out if your specific client has implemented it.

(One trendy answer to this problem is to "rewrite it in rust": Matrix are working on implementing a lot of those specifications in a matrix-rust-sdk that's designed to take the implementation details away from users.)

Just taking the latest weekly Matrix report, you find that three new MSCs were proposed just last week! There's even a graph that shows the number of MSCs is progressing steadily, at 600+ proposals total, with the majority (300+) "new". I would guess the "merged" ones are at about 150.

That's a lot of text which includes stuff like 3D worlds which, frankly, I don't think you should be working on when you have such important security and usability problems. (The internet as a whole, arguably, doesn't fare much better. RFC600 is a really obscure discussion about "INTERFACING AN ILLINOIS PLASMA TERMINAL TO THE ARPANET". Maybe that's how many MSCs will end up as well, left forgotten in the pits of history.)

And that's the thing: maybe the Matrix people have a different objective than I have. They want to connect everything to everything, and make Matrix a generic transport for all sorts of applications, including virtual reality, collaborative editors, and so on.

I just want secure, simple messaging. Possibly with good file transfers, and video calls. That it works with existing stuff is good, and it should be federated to remove the "Signal point of failure". So I'm a bit worried about the direction all those MSCs are taking, especially when you consider that clients other than Element are still struggling to keep up with basic features like end-to-end encryption or room discovery, never mind voice or spaces...

Conclusion

Overall, Matrix is somehow in the space XMPP was a few years ago. It has a ton of features, pretty good clients, and a large community. It seems to have gained some of the momentum that XMPP has lost. It may have the most potential to replace Signal if something bad were to happen to it (like, I don't know, getting banned or going nuts with cryptocurrency)...

But it's really not there yet, and I don't see Matrix trying to get there either, which is a bit worrisome.

Looking back at history

I'm also worried that we are repeating the errors of the past. The history of federated services is really fascinating: IRC, FTP, HTTP, and SMTP were all created in the early days of the internet, and are all still around (except, arguably, FTP, which was removed from major browsers recently). All of them had to face serious challenges in growing their federation.

IRC had numerous conflicts and forks, both at the technical level but also at the political level. The history of IRC is really something that anyone working on a federated system should study in detail, because they are bound to make the same mistakes if they are not familiar with it. The "short" version is:

  • 1988: Finnish researcher publishes first IRC source code
  • 1989: 40 servers worldwide, mostly universities
  • 1990: EFnet ("eris-free network") fork, which blocks the "open relay" server named Eris; followers of Eris form the A-net, which promptly dissolves itself, with only EFnet remaining
  • 1992: Undernet fork, which offered authentication ("services"), routing improvements and timestamp-based channel synchronisation
  • 1994: DALnet fork, from Undernet, again on a technical disagreement
  • 1995: Freenode founded
  • 1996: IRCnet forks from EFnet, following a flame war of historical proportion, splitting the network between Europe and the Americas
  • 1997: Quakenet founded
  • 1999: (XMPP founded)
  • 2001: 6 million users, OFTC founded
  • 2002: DALnet peaks at 136,000 users
  • 2003: IRC as a whole peaks at 10 million users, EFnet peaks at 141,000 users
  • 2004: (Facebook founded), Undernet peaks at 159,000 users
  • 2005: Quakenet peaks at 242,000 users, IRCnet peaks at 136,000 (YouTube founded)
  • 2006: (Twitter founded)
  • 2009: (WhatsApp, Pinterest founded)
  • 2010: (TextSecure AKA Signal, Instagram founded)
  • 2011: (Snapchat founded)
  • ~2013: Freenode peaks at ~100,000 users
  • 2016: IRCv3 standardisation effort started (TikTok founded)
  • 2021: Freenode self-destructs, Libera chat founded
  • 2022: Libera peaks at 50,000 users, OFTC peaks at 30,000 users

(The numbers were taken from the Wikipedia page and Netsplit.de. Note that I also include the launches of other networks in parentheses for context.)

Pretty dramatic, don't you think? Eventually, somehow, IRC became irrelevant for most people: few people are even aware of it now. With fewer than a million active users, it's smaller than Mastodon, XMPP, or Matrix at this point.1 If I were to venture a guess, I'd say that infighting, lack of a standardization body, and a somewhat annoying protocol meant the network could not grow. It's also possible that the decentralised yet centralised structure of IRC networks limited their reliability and growth.

But large social media companies have also taken over the space: observe how IRC numbers peak around the time the wave of large social media companies emerges, especially Facebook (2.9B users!!) and Twitter (400M users).

Where the federated services are in history

Right now, Matrix and Mastodon (and email!) are at the "pre-EFnet" stage: anyone can join the federation. Mastodon has started working on a global block list of fascist servers, which is interesting, but it's still an open federation. Matrix is also totally open, but matrix.org publishes a (federated) block list of hostile servers (#matrix-org-coc-bl:matrix.org, yes, of course it's a room).

Interestingly, email is also in that stage, where there are block lists of spammers, and it's a race between those blockers and spammers. Large email providers, obviously, are getting closer to the EFnet stage: you could argue they only accept email from themselves or between themselves. It's getting increasingly hard to deliver mail to Outlook and Gmail for example, partly because of bias against small providers, but also because they are including more and more machine-learning tools to sort through email and those systems are, fundamentally, unknowable. It's not quite the same as splitting the federation the way EFnet did, but the effect is similar.

HTTP has somehow managed to live in a parallel universe, as it's technically still completely federated: anyone can start a web server if they have a public IP address and anyone can connect to it. The catch, of course, is how you find the darn thing. Which is how Google became one of the most powerful corporations on earth, and how they became the gatekeepers of human knowledge online.

I have only briefly mentioned XMPP here, and my XMPP fans will undoubtedly comment on that, but I think it's somewhere in the middle of all of this. It was co-opted by Facebook and Google, and both corporations have abandoned it to its fate. I remember fondly the days where I could do instant messaging with my contacts who had a Gmail account. Those days are gone, and I don't talk to anyone over Jabber anymore, unfortunately. And this is a threat that Matrix still has to face.

It's also the threat Email is currently facing. On the one hand corporations like Facebook want to completely destroy it and have mostly succeeded: many people just have an email account to register on things and talk to their friends over Instagram or (lately) TikTok (which, I know, is not Facebook, but they started that fire).

On the other hand, you have corporations like Microsoft and Google who are still using and providing email services — because, frankly, you still do need email for stuff, just like fax is still around — but they are more and more isolated in their own silo. At this point, it's only a matter of time before they reach critical mass and just decide that the risk of allowing external mail coming in is not worth the cost. They'll simply flip the switch and work on an allow-list principle. Then we'll have closed the loop and email will be dead, just like IRC is "dead" now.

I wonder which path Matrix will take. Could it liberate us from these vicious cycles?

  1. According to Wikipedia, there are currently about 500 distinct IRC networks operating, on about 1,000 servers, serving over 250,000 users. In contrast, Mastodon seems to be around 5 million users, Matrix.org claimed at FOSDEM 2021 to have about 28 million globally visible accounts, and Signal lays claim to over 40 million souls. XMPP claims to have "millions" of users on the xmpp.org homepage but the FAQ says they don't actually know. On the proprietary silo side of the fence, this page says

    • Facebook: 2.9 billion users
    • WhatsApp: 2B
    • Instagram: 1.4B
    • TikTok: 1B
    • Snapchat: 500M
    • Pinterest: 480M
    • Twitter: 397M

    Notable omission from that list: YouTube, with its mind-boggling 2.6 billion users...

    Those are not the kind of numbers you just "need to convince a brother or sister" to grow the network...

Categories: FLOSS Project Planets

Antoine Beaupré: Matrix notes

Planet Debian - Fri, 2022-06-17 11:34

I have some concerns about Matrix (the protocol, not the movie that came out recently, although I do have concerns about that as well). I've been watching the project for a long time, and it seems like a promising alternative to many protocols like IRC, XMPP, and Signal.

This review may sound a bit negative, because it focuses on those concerns. I am the operator of an IRC network and people keep asking me to bridge it with Matrix. I have myself considered just giving up on IRC and converting to Matrix. This space is a living document exploring my research of that problem space. The TL;DR is that no, I'm not setting up a bridge just yet, and I'm still on IRC.

This article was written over the course of the last three months, but I have been watching the Matrix project for years (my logs seem to say 2016 at least). The article is rather long. It will likely take you half an hour to read, so copy this over to your ebook reader, your tablet, or dead trees, and lean back and relax as I show you around the Matrix. Or, alternatively, just jump to a section that interests you, most likely the conclusion.

Introduction to Matrix

Matrix is an "open standard for interoperable, decentralised, real-time communication over IP. It can be used to power Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere you need a standard HTTP API for publishing and subscribing to data whilst tracking the conversation history".

It's also (when compared with XMPP) "an eventually consistent global JSON database with an HTTP API and pubsub semantics - whilst XMPP can be thought of as a message passing protocol."

According to their FAQ, the project started in 2014, has about 20,000 servers, and millions of users. Matrix works over HTTPS, but federation between servers uses a special port by default: 8448.
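In practice, port 8448 is only the fallback: a server can advertise a different host and port for federation through a `/.well-known/matrix/server` file. Here is a simplified Python sketch of that resolution order, assuming the well-known document has already been fetched and parsed. It deliberately ignores SRV records and IPv6 literals, which the specification also covers, and the function name is my own.

```python
from typing import Optional, Tuple

DEFAULT_FEDERATION_PORT = 8448


def resolve_federation_target(server_name: str, well_known: Optional[dict] = None) -> Tuple[str, int]:
    """Return the (host, port) to contact for federation (simplified).

    well_known is the parsed JSON of https://<server_name>/.well-known/matrix/server,
    or None if it could not be fetched.
    """
    host, sep, port = server_name.rpartition(":")
    if sep and port.isdigit():
        # An explicit port in the server name always wins
        return host, int(port)
    if well_known and "m.server" in well_known:
        # Delegation: the well-known file points at another host (and maybe port)
        delegated = well_known["m.server"]
        host, sep, port = delegated.rpartition(":")
        if sep and port.isdigit():
            return host, int(port)
        return delegated, DEFAULT_FEDERATION_PORT
    # Nothing else configured: same host, default federation port
    return server_name, DEFAULT_FEDERATION_PORT
```

For example, `resolve_federation_target("example.org")` gives `("example.org", 8448)`, while a well-known document of `{"m.server": "matrix.example.org:443"}` moves federation to port 443 on another host.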

Security and privacy

I have some concerns about the security promises of Matrix. It's advertised as "secure" with "E2E [end-to-end] encryption", but how does it actually work?

Data retention defaults

One of my main concerns with Matrix is data retention, which is a key part of security in a threat model where (for example) a hostile state actor wants to surveil your communications and can seize your devices.

On IRC, servers don't actually keep messages all that long: they pass them along to other servers and clients as fast as they can, only keep them in memory, and move on to the next message. There are no concerns about data retention on messages (and their metadata) other than the network layer. (I'm ignoring the issues with user registration, which is a separate, if valid, concern.) Obviously, a hostile server could log everything passing through it, but IRC federations are normally tightly controlled. So, if you trust your IRC operators, you should be fairly safe. Obviously, clients can (and often do, even if OTR is configured!) log all messages, but this is generally not the default. Irssi, for example, does not log by default. IRC bouncers are more likely to log to disk, of course, to be able to do what they do.

Compare this to Matrix: when you send a message to a Matrix homeserver, that server first stores it in its internal SQL database. Then it will transmit that message to all clients connected to that server and room, and to all other servers that have clients connected to that room. Those remote servers, in turn, will keep a copy of that message and all its metadata in their own database, by default forever. On encrypted rooms those messages are encrypted, but not their metadata.
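To make this concrete, here is an illustrative encrypted event as a remote homeserver might store it, written as a Python dict. All identifiers and values are made up, and the content fields follow the Megolm event format as I understand it (some clients also include `device_id` and `sender_key`). The hypothetical `visible_metadata` helper shows what any server in the room can read without the session keys: who sent which type of event, in which room, and when.

```python
# Illustrative m.room.encrypted event; every identifier and value is made up.
encrypted_event = {
    "type": "m.room.encrypted",
    "event_id": "$hypothetical-event-id",
    "room_id": "!hypothetical:example.org",
    "sender": "@anarcat:example.org",
    "origin_server_ts": 1655480040000,  # milliseconds since the epoch, in cleartext
    "content": {
        "algorithm": "m.megolm.v1.aes-sha2",
        "ciphertext": "hypothetical-ciphertext",  # the only part the server cannot read
        "device_id": "ABCDEFGHIJ",
        "sender_key": "hypothetical-curve25519-key",
        "session_id": "hypothetical-megolm-session",
    },
}


def visible_metadata(event: dict) -> dict:
    """Return everything a server can read without the Megolm session keys."""
    readable = {key: value for key, value in event.items() if key != "content"}
    readable["content"] = {key: value for key, value in event["content"].items() if key != "ciphertext"}
    return readable


if __name__ == "__main__":
    for key, value in visible_metadata(encrypted_event).items():
        print(key, value)
```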

There is a mechanism to expire entries in Synapse, but it is not enabled by default. So one should generally assume that a message sent on Matrix is never expired.
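The logic of such an expiry policy is simple enough. Here is a minimal Python sketch of it, not Synapse's actual code: a per-room maximum lifetime (as far as I know, Synapse reads it from an `m.room.retention` state event) overrides a server-wide default from the `retention` section of its configuration, and events older than that lifetime become eligible for purging. If neither is set, which is the default, nothing ever expires. The function names are hypothetical.

```python
from typing import Iterable, List, Optional

DAY_MS = 24 * 60 * 60 * 1000  # Matrix timestamps are in milliseconds


def effective_max_lifetime(room_max_ms: Optional[int], default_ms: Optional[int]) -> Optional[int]:
    """A room-level policy overrides the server default; None means "keep forever"."""
    return room_max_ms if room_max_ms is not None else default_ms


def expired_events(events: Iterable[dict], max_lifetime_ms: Optional[int], now_ms: int) -> List[str]:
    """Return the IDs of events older than the maximum lifetime."""
    if max_lifetime_ms is None:
        return []  # no policy at all: messages are never expired
    cutoff = now_ms - max_lifetime_ms
    return [event["event_id"] for event in events if event["origin_server_ts"] < cutoff]
```

With a 7-day default and no room policy, an event from day 0 is eligible for purging on day 12, but one from day 10 is not.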

GDPR in the federation

But even if that setting was enabled by default, how do you control it? This is a fundamental problem of the federation: if any user is allowed to join a room (which is the default), those users' servers will log all content and metadata from that room. That includes private, one-on-one conversations, since those are essentially rooms as well.

In the context of the GDPR, this is really tricky: who is the responsible party (known as the "data controller") here? It's basically any yahoo who fires up a home server and joins a room.

In a federated network, one has to wonder whether GDPR enforcement is even possible at all. But in Matrix in particular, if you want to enforce your right to be forgotten in a given room, you would have to:

  1. enumerate all the users that ever joined the room while you were there
  2. discover all their home servers
  3. start a GDPR procedure against all those servers
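Steps 1 and 2 are at least mechanical, because every Matrix user ID embeds its home server. Here is a minimal Python sketch, assuming you already have the list of members who were in the room with you (reconstructing that membership history is the hard part). The helper names are hypothetical.

```python
from typing import Iterable, Set


def server_name(user_id: str) -> str:
    """Extract the home server from a Matrix user ID like @alice:example.org."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a Matrix user ID: {user_id!r}")
    # The localpart cannot contain ":", so everything after the first colon is the
    # server name, which may itself include a port (e.g. example.org:8448)
    return user_id.split(":", 1)[1]


def homeservers_to_contact(members: Iterable[str], own_server: str) -> Set[str]:
    """Steps 1 and 2: every remote home server that was in the room with you."""
    return {server_name(member) for member in members} - {own_server}


if __name__ == "__main__":
    members = ["@anarcat:matrix.org", "@joe:example.com", "@bob:chat.example.net:8448", "@al:example.com"]
    print(sorted(homeservers_to_contact(members, "matrix.org")))
```

Step 3, of course, cannot be automated.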

I recognize this is a hard problem to solve while still keeping an open ecosystem. But I believe that Matrix should have much stricter defaults towards data retention than right now. Message expiry should be enforced by default, for example. (Note that there are also redaction policies that could be used to implement part of the GDPR automatically, see the privacy policy discussion below on that.)

Also keep in mind that, in the brave new peer-to-peer world that Matrix is heading towards, the boundary between server and client is likely to be fuzzier, which would make applying the GDPR even more difficult.

In fact, maybe Synapse should be designed so that there's no configurable flag to turn off the data retention policy. A bit like how most system loggers in UNIX (e.g. syslog) come with a log retention system that typically rotates logs after a few weeks or months. Historically, this was designed to keep hard drives from filling up, but it also has the added benefit of limiting the amount of personal information kept on disk in this modern day. (Arguably, syslog doesn't rotate logs on its own, but, say, Debian GNU/Linux, as an installed system, does have log retention policies well defined for installed packages, and those can be discussed.) And "no expiry" is definitely a bug.

Matrix.org privacy policy

When I first looked at Matrix, five years ago, Element.io was called Riot.im and had a rather dubious privacy policy:

We currently use cookies to support our use of Google Analytics on the Website and Service. Google Analytics collects information about how you use the Website and Service.

[...]

This helps us to provide you with a good experience when you browse our Website and use our Service and also allows us to improve our Website and our Service.

When I asked Matrix people about why they were using Google Analytics, they explained this was for development purposes and they were aiming for velocity at the time, not privacy (paraphrasing here).

They also included a "free to snitch" clause:

If we are or believe that we are under a duty to disclose or share your personal data, we will do so in order to comply with any legal obligation, the instructions or requests of a governmental authority or regulator, including those outside of the UK.

Those are really broad terms, above and beyond what is typically expected legally.

Like the current retention policies, such user tracking and ... "liberal" collaboration practices with the state set a bad precedent for other home servers.

Thankfully, since the above policy was published (2017), the GDPR was "implemented" (2018) and it seems like both the Element.io privacy policy and the Matrix.org privacy policy have been somewhat improved since.

Notable points of the new privacy policies:

  • 2.3.1.1: the "federation" section actually outlines that "Federated homeservers and Matrix clients which respect the Matrix protocol are expected to honour these controls and redaction/erasure requests, but other federated homeservers are outside of the span of control of Element, and we cannot guarantee how this data will be processed"
  • 2.6: users under the age of 16 should not use the matrix.org service
  • 2.10: Upcloud, Mythic Beasts, Amazon, and CloudFlare possibly have access to your data (it's nice to at least mention this in the privacy policy: many providers don't even bother admitting to this kind of delegation)
  • Element 2.2.1: mentions many more third parties (Twilio, Stripe, Quaderno, LinkedIn, Twitter, Google, Outplay, PipeDrive, HubSpot, Posthog, Sentry, and Matomo (phew!)) used when you are paying Matrix.org for hosting

I'm not super happy with all the trackers they have on the Element platform, but then again you don't have to use that service. Your favorite homeserver (assuming you are not on Matrix.org) probably has their own Element deployment, hopefully without all that garbage.

Overall, this is all a huge improvement over the previous privacy policy, so hats off to the Matrix people for figuring out a reasonable policy in such a tricky context. I particularly like this bit:

We will forget your copy of your data upon your request. We will also forward your request to be forgotten onto federated homeservers. However - these homeservers are outside our span of control, so we cannot guarantee they will forget your data.

It's great they implemented those mechanisms and, after all, if there's a hostile party in there, nothing can prevent them from using screenshots to just exfiltrate your data away from the client side anyway, even with services typically seen as more secure, like Signal.

As an aside, I also appreciate that Matrix.org has a fairly decent code of conduct, based on the TODO CoC which checks all the boxes in the geekfeminism wiki.

Metadata handling

Overall, privacy protections in Matrix mostly concern message contents, not metadata. In other words, who's talking with whom, when and from where is not well protected. Compared to a tool like Signal, which goes to great lengths to anonymize that data with features like private contact discovery, disappearing messages, sealed senders, and private groups, Matrix is definitely behind.

This is a known issue (opened in 2019) in Synapse, but this is not just an implementation issue, it's a flaw in the protocol itself. Home servers keep the join and leave events of all rooms, which gives cleartext information about who is talking to whom. Synapse logs may also contain personally identifiable information that home server admins might not be aware of in the first place. Those log rotation policies are separate from the server-level retention policy, which may be confusing for a novice sysadmin.

Combine this with the federation: even if you trust your home server to do the right thing, the second you join a public room with third-party home servers, those ideas kind of get thrown out because those servers can do whatever they want with that information. Again, a problem that is hard to solve in any federation.

To be fair, IRC doesn't have a great story here either: any client knows not only who's talking to who in a room, but also typically their client IP address. Servers can (and often do) obfuscate this, but often that obfuscation is trivial to reverse. Some servers do provide "cloaks" (sometimes automatically), but that's kind of a "slap-on" solution that actually moves the problem elsewhere: now the server knows a little more about the user.

Overall, I would worry much more about a Matrix home server seizure than an IRC or Signal server seizure. Signal does get subpoenas, and they can only give out a tiny bit of information about their users: their phone number, their registration date, and their last connection date. Matrix carries a lot more information in its database.

Amplification attacks on URL previews

I (still!) run an Icecast server and sometimes share links to it on IRC which, obviously, also ends up on (more than one!) Matrix home servers because some people connect to IRC using Matrix. This, in turn, means that Matrix will connect to that URL to generate a link preview.

I feel this outlines a security issue, especially because those sockets would be kept open seemingly forever. I tried to warn the Matrix security team but somehow, I don't think this issue was taken very seriously. Here's the disclosure timeline:

  • January 18: contacted Matrix security
  • January 19: response: already reported as a bug
  • January 20: response: can't reproduce
  • January 31: timeout added, considered solved
  • January 31: I respond that I believe the security issue is underestimated, ask for clearance to disclose
  • February 1: response: asking for two weeks delay after the next release (1.53.0) including another patch, presumably in two weeks' time
  • February 22: Matrix 1.53.0 released
  • April 14: I notice the release, ask for clearance again
  • April 14: response: referred to the public disclosure

There are a couple of problems here:

  1. the bug was publicly disclosed in September 2020, and not considered a security issue until I notified them, and even then, I had to insist

  2. no clear disclosure policy timeline was proposed or seems established in the project (there is a security disclosure policy but it doesn't include any predefined timeline)

  3. I wasn't informed of the disclosure

  4. the actual solution is a size limit (10MB, already implemented), a time limit (30 seconds, implemented in PR 11784), and a content type allow list (HTML, "media" or JSON, implemented in PR 11936), and I'm not sure it's adequate

  5. (pure vanity:) I did not make it to their Hall of fame
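Here is roughly what those mitigations amount to on a single server, as a hypothetical Python sketch rather than Synapse's actual code. The allow list below is my own reading of "HTML, media or JSON", and the 30-second time limit would be enforced by the HTTP client's timeout, which is not shown. Note that an Icecast stream never ends and typically sends no `Content-Length`, so the size cap has to be enforced while reading; and if "media" includes audio types, such a stream passes the content type check anyway.

```python
import io
from typing import BinaryIO, Optional

MAX_PREVIEW_BYTES = 10 * 1024 * 1024  # the 10MB size limit
# Assumption: my interpretation of the "HTML, media or JSON" allow list
ALLOWED_TYPES = ("text/html", "application/json", "image/", "audio/", "video/")


def type_allowed(content_type: Optional[str]) -> bool:
    """Check a Content-Type header against the allow list, ignoring parameters like charset."""
    if not content_type:
        return False
    mime = content_type.split(";", 1)[0].strip().lower()
    return any(mime == t or (t.endswith("/") and mime.startswith(t)) for t in ALLOWED_TYPES)


def read_capped(stream: BinaryIO, limit: int = MAX_PREVIEW_BYTES, chunk: int = 64 * 1024) -> bytes:
    """Read at most `limit` bytes, so an endless stream is cut off instead of read forever."""
    buf = bytearray()
    while len(buf) < limit:
        data = stream.read(min(chunk, limit - len(buf)))
        if not data:
            break  # the response ended before hitting the cap
        buf.extend(data)
    return bytes(buf)


if __name__ == "__main__":
    endless = io.BytesIO(b"\0" * (20 * 1024 * 1024))  # stand-in for a stream larger than the cap
    print(type_allowed("audio/mpeg"), len(read_capped(endless)))
```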

I'm not sure those solutions are adequate because they all seem to assume a single home server will pull that one URL for a little while then stop. But in a federated network, many (possibly thousands) home servers may be connected in a single room at once. If an attacker drops a link into such a room, all those servers would connect to that link all at once. This is an amplification attack: a small amount of traffic will generate a lot more traffic to a single target. It doesn't matter that there are size or time limits: the amplification is what matters here.

It should also be noted that clients that generate link previews have more amplification because they are more numerous than servers. And of course, the default Matrix client (Element) does generate link previews as well.
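A quick back-of-the-envelope calculation shows why per-server limits do not address the amplification itself. This Python sketch assumes a hypothetical room where 1,000 home servers each fetch a link up to the 10MB size limit (treated here as 10 MiB); preview-generating clients can be added to the count.

```python
MIB = 1024 * 1024


def amplified_bytes(servers: int, clients: int = 0, bytes_per_fetch: int = 10 * MIB) -> int:
    """Worst-case bytes pulled from a single URL after one message is posted."""
    return (servers + clients) * bytes_per_fetch


if __name__ == "__main__":
    total = amplified_bytes(servers=1000)
    print(f"{total / 1024**3:.2f} GiB")  # prints "9.77 GiB" for one chat message
```

The attacker's cost is a single message of a few hundred bytes; the cost to the target grows linearly with the number of participating servers and clients, regardless of the per-fetch cap.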

That said, this is possibly not a problem specific to Matrix: any federated service that generates link previews may suffer from this.

I'm honestly not sure what the solution is here. Maybe moderation? Maybe link previews are just evil? All I know is there was this weird bug in my Icecast server and I tried to ring the bell about it, and it feels like it was swept under the rug. Somehow I feel this is bound to blow up again in the future, even with the current mitigation.

Moderation

In Matrix like elsewhere, moderation is a hard problem. There is a detailed moderation guide and much of this problem space is actively worked on in Matrix right now. A fundamental problem with moderating a federated space is that a user banned from a room can rejoin the room from another server. This is why spam is such a problem in email, and why IRC networks stopped federating ages ago (see the IRC history for that fascinating story).

The mjolnir bot

The mjolnir moderation bot is designed to help with some of those things. It can kick and ban users, redact all of a user's messages (as opposed to one by one), all of this across multiple rooms. It can also subscribe to a federated block list published by matrix.org to block known abusers (users or servers). Bans are pretty flexible and can operate at the user, room, or server level.

Matrix people suggest making the bot admin of your channels, because you can't take back admin from a user once given.
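The reason is the power level rules. Here is a simplified Python sketch of the check, as I understand the room authorization rules: you cannot grant a level higher than your own, and you cannot change the level of another user whose current level is equal to or higher than yours. Two admins at level 100 therefore cannot demote each other, although anyone can lower their own level. The function name and arguments are my own.

```python
def can_set_power_level(sender_level: int, target_current: int, target_new: int, target_is_sender: bool) -> bool:
    """Simplified check for changing one user's entry in m.room.power_levels."""
    if target_new > sender_level:
        return False  # cannot grant more power than you have
    if target_current > sender_level:
        return False  # cannot touch users above your own level
    if not target_is_sender and target_current == sender_level:
        return False  # cannot demote your peers, only yourself
    return True


if __name__ == "__main__":
    print(can_set_power_level(100, 100, 0, False))  # admin demoting another admin
    print(can_set_power_level(100, 50, 0, False))   # admin demoting a moderator
```

So if the bot is the only account at level 100 and humans sit below it, a compromised moderator account can still be demoted by the bot.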

The command-line tool

There's also a new command line tool designed to do things like:

  • System notify users (all users/users from a list, specific user)
  • delete sessions/devices not seen for X days
  • purge the remote media cache
  • select rooms with various criteria (external/local/empty/created by/encrypted/cleartext)
  • purge history of these rooms
  • shut down rooms

This tool and Mjolnir are based on the admin API built into Synapse.
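As an example of that admin API, purging old history in a room is a single authenticated POST. This Python sketch only builds the request; the endpoint and body fields are based on my reading of the Synapse admin API documentation, so check them against your Synapse version. The homeserver URL, token and room ID are placeholders, and the helper name is hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional


def purge_history_request(homeserver: str, admin_token: str, room_id: str,
                          older_than_days: int, now: Optional[float] = None) -> urllib.request.Request:
    """Build (but do not send) a request purging remote events older than N days."""
    now = time.time() if now is None else now
    cutoff_ms = int((now - older_than_days * 86400) * 1000)  # Synapse expects milliseconds
    room = urllib.parse.quote(room_id, safe="")
    body = {"delete_local_events": False, "purge_up_to_ts": cutoff_ms}
    return urllib.request.Request(
        f"{homeserver}/_synapse/admin/v1/purge_history/{room}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}", "Content-Type": "application/json"},
    )
```

Setting `delete_local_events` to true would also remove events sent by local users, which is closer to what a retention policy would do.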

Rate limiting

Synapse has pretty good built-in rate-limiting which blocks repeated login, registration, joining, or messaging attempts. It may also end up throttling servers on the federation based on those settings.
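Synapse configures these limits per action with a sustained rate (`per_second`) and a `burst_count`. Here is a minimal leaky-bucket sketch in Python of what those two parameters mean; it illustrates the idea and is not Synapse's implementation. The key could be a user ID, an IP address or a remote server name.

```python
from typing import Dict


class LeakyBucket:
    """Allow bursts of `burst_count` actions, draining at `per_second` actions per second."""

    def __init__(self, per_second: float, burst_count: int) -> None:
        self.per_second = per_second
        self.burst_count = burst_count
        self.level: Dict[str, float] = {}  # current fill level per key
        self.last: Dict[str, float] = {}   # time of the last check per key

    def allow(self, key: str, now: float) -> bool:
        elapsed = now - self.last.get(key, now)
        # Drain the bucket for the time elapsed since the last action
        level = max(0.0, self.level.get(key, 0.0) - elapsed * self.per_second)
        self.last[key] = now
        if level + 1 > self.burst_count:
            self.level[key] = level
            return False  # bucket full: throttle this action
        self.level[key] = level + 1
        return True


if __name__ == "__main__":
    bucket = LeakyBucket(per_second=0.2, burst_count=3)
    print([bucket.allow("@spammer:example.org", 0.0) for _ in range(4)])
```

The first three messages go through, the fourth is refused, and one more slot frees up every 5 seconds.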

Fundamental federation problems

Because users joining a room may come from another server, room moderators are at the mercy of the registration and moderation policies of those servers. Matrix is like IRC's +R mode ("only registered users can join") by default, except that anyone can register their own homeserver, which makes this protection limited.

Server admins can block IP addresses and home servers, but those tools are not currently available to room admins. It would be nice for room admins to have that capability, just like IRC channel admins can block users based on their IP address.

Matrix has the concept of guest accounts, but it is not used very much, and virtually no client supports it. This contrasts with the way IRC works: by default, anyone can join an IRC network even without authentication. Some channels require registration, but in general you are free to join and look around (until you get blocked, of course).

I have heard anecdotal evidence that "moderating bridges is hell", and I can imagine why. Moderation is already hard enough on one federation; when you bridge a room with another network, you inherit all the problems from that network but without the full set of abuse control tools from the original network's API...

Room admins

Matrix, in particular, has the problem that room administrators (which have the power to redact messages, ban users, and promote other users) are bound to their Matrix ID which is, in turn, bound to their home servers. This implies that a home server administrator could (1) impersonate a given user and (2) use that to hijack the room. So in practice, the home server is the trust anchor for rooms, not the user themselves.

That said, if the administrator of server B hijacks user joe on server B, they will hijack that room on that specific server. This will not (necessarily) affect users on the other servers, as servers could refuse parts of the updates or ban the compromised account (or server).

It does seem like a major flaw that room credentials are bound to Matrix identifiers, as opposed to the E2E encryption credentials. In an encrypted room even with fully verified members, a compromised or hostile home server can still take over the room by impersonating an admin. That admin (or even a newly minted user) can then send events or listen on the conversations.

This is even more frustrating when you consider that Matrix events are actually signed and therefore have some authentication attached to them, acting like some sort of Merkle tree (as each event contains links to previous events). That signature, however, is made with the homeserver's PKI keys, not the client's E2E keys, which makes E2E feel like it has been "bolted on" later.
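To make the "Merkle tree" analogy concrete, here is a toy Python sketch. It is not Matrix's actual algorithm: real events form a DAG with possibly several prev_events, are hashed with the spec's own canonical JSON and reference-hash rules, and are signed with the homeserver's Ed25519 key. Here an HMAC is a loudly labelled stand-in for that signature, and the user IDs are made up. What it does show is the property described above: tampering with an earlier event breaks every later link, and the signing key belongs to the server, not to the user.

```python
import hashlib
import hmac
import json


def event_hash(event: dict) -> str:
    """SHA-256 over the event as compact JSON with sorted keys (a toy canonical form)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(graph: list, sender: str, body: str) -> dict:
    """Append an event that links back to the hash of the previous event."""
    prev_events = [event_hash(graph[-1])] if graph else []
    event = {"sender": sender, "body": body, "prev_events": prev_events}
    graph.append(event)
    return event


def verify_links(graph: list) -> bool:
    """True if every event still points at the current hash of its predecessor."""
    return all(
        graph[i]["prev_events"] == [event_hash(graph[i - 1])]
        for i in range(1, len(graph))
    )


def server_sign(event: dict, server_key: bytes) -> str:
    """Stand-in for the homeserver's signature (HMAC here, NOT Ed25519).

    Only the *server* key goes in: nothing ties the signature to the
    sender's own E2E keys, so whoever holds this key can sign for any
    user on that server.
    """
    digest = event_hash(event).encode("utf-8")
    return hmac.new(server_key, digest, hashlib.sha256).hexdigest()


graph: list = []
append_event(graph, "@joe:b.example", "hello")
append_event(graph, "@ann:a.example", "hi joe")
print(verify_links(graph))  # True
graph[0]["body"] = "edited after the fact"
print(verify_links(graph))  # False
```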

Availability

While Matrix has a strong advantage over Signal in that it's decentralized (so anyone can run their own homeserver), I couldn't find an easy way to run a "multi-primary" setup, or even a "redundant" setup (even with a single primary backend), short of going full-on "replicate PostgreSQL and Redis data", which is typically not for the faint of heart.

How this works in IRC

On IRC, it's quite easy to set up redundant nodes. All you need is:

  1. a new machine (with its own public address with an open port)

  2. a shared secret (or certificate) between that machine and an existing one on the network

  3. a connect {} block on both servers

That's it: the node will join the network and people can connect to it as usual and share the same user/namespace as the rest of the network. The servers take care of synchronizing state: you do not need to worry about replicating a database server.

(Now, experienced IRC people will know there's a catch here: IRC doesn't have authentication built in, and relies on "services" which are basically bots that authenticate users (I'm simplifying, don't nitpick). If that service goes down, the network still works, but then people can't authenticate, and they can start doing nasty things like steal people's identity if they get knocked offline. But still: basic functionality still works: you can talk in rooms and with users that are on the reachable network.)

User identities

Matrix is more complicated. Each "home server" has its own identity namespace: a specific user (say @anarcat:matrix.org) is bound to that specific home server. If that server goes down, that user is completely disconnected. They could register a new account elsewhere and reconnect, but then they basically lose all their configuration: contacts and joined channels are all lost.

(Also notice how Matrix IDs don't look like a typical user address, such as the email-like addresses used in XMPP. They at least did their homework and got the matrix: URI scheme allocated.)

Rooms

Users talk to each other in "rooms", even in one-to-one communications. (Rooms are also used for other things like "spaces"; they're basically used for everything, in the spirit of "everything is a file".) For rooms, home servers act more like IRC nodes in that they keep a local state of the chat room and synchronize it with other servers. Users can keep talking inside a room if the server that originally hosts the room goes down. Rooms can have a local, server-specific "alias" so that, say, #room:matrix.org is also visible as #room:example.com on the example.com home server. Both addresses refer to the same underlying room.

(Finding this in the Element settings is not obvious though, because those aliases are actually called "local addresses" there. So to create such an alias (in Element), you need to go to the "General" section of the room settings, click "Show more" under "Local address", then add the alias name (e.g. foo), and that room will then be available on your example.com homeserver as #foo:example.com.)

So a room doesn't belong to a server, it belongs to the federation, and anyone can join the room from any server (if the room is public, or if invited otherwise). You can create a room on server A and when a user from server B joins, the room will be replicated on server B as well. If server A fails, server B will keep relaying traffic to connected users and servers.

A room is therefore not fundamentally addressed with the above alias; instead, it has an internal Matrix ID, which is basically a random string. It has a server name attached to it, but that was made just to avoid collisions. That can get a little confusing. For example, the #fractal:gnome.org room is an alias on the gnome.org server, but the room ID is !hwiGbsdSTZIwSRfybq:matrix.org. That's because the room was created on matrix.org, but the preferred branding is gnome.org now.
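A minimal Python sketch of how the identifiers mentioned here break down by their leading character (@ for users, ! for room IDs, # for room aliases). It is simplified (no validation of allowed characters) and is not taken from any Matrix SDK; the function and type names are made up for illustration. Applied to the example above, it shows that the server part of a room ID only records where the room was created:

```python
from typing import NamedTuple

# Leading character ("sigil") of each identifier type discussed above.
SIGILS = {"@": "user", "!": "room_id", "#": "room_alias"}


class MatrixId(NamedTuple):
    kind: str
    localpart: str
    server_name: str


def parse_matrix_id(identifier: str) -> MatrixId:
    """Split '@user:server', '!room:server' or '#alias:server' into its parts."""
    sigil, rest = identifier[:1], identifier[1:]
    if sigil not in SIGILS:
        raise ValueError(f"unknown sigil in {identifier!r}")
    # Split on the first colon only: the server name may carry a port,
    # as in example.com:8448.
    localpart, sep, server_name = rest.partition(":")
    if not (sep and localpart and server_name):
        raise ValueError(f"malformed identifier {identifier!r}")
    return MatrixId(SIGILS[sigil], localpart, server_name)


# The example above: the alias lives on gnome.org, the room ID on matrix.org.
print(parse_matrix_id("#fractal:gnome.org").server_name)              # gnome.org
print(parse_matrix_id("!hwiGbsdSTZIwSRfybq:matrix.org").server_name)  # matrix.org
```

Resolving the alias to the room ID is a lookup done by the home server; the sketch only shows why the two server names can differ.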

As an aside, rooms, by default, live forever, even after the last user quits. There's an admin API to delete rooms and a tombstone event to redirect to another one, but neither has a GUI yet. The latter is part of MSC1501 ("Room version upgrades") which allows a room admin to close a room, with a message and a pointer to another room.

Spaces

Discovering rooms can be tricky: there is a per-server room directory, but Matrix.org people are trying to deprecate it in favor of "Spaces". Room directories were ripe for abuse: anyone can create a room, so anyone can show up in there. It's possible to restrict who can add aliases, but anyway directories were seen as too limited.

In contrast, a "Space" is basically a room that's an index of other rooms (including other spaces), so existing moderation and administration mechanisms that work in rooms can (somewhat) work in spaces as well. This enables a room directory that works across the federation, regardless of which server the rooms were originally created on.
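Since a space is a room that indexes other rooms, the hierarchy can be pictured as a graph. The Python sketch below walks it breadth-first. It assumes the parent-to-child links have already been collected into a plain dict (in Matrix they are stored as state events in the space room, which this sketch does not fetch), and the room IDs are hypothetical. The seen set matters because spaces can contain other spaces and may even point back at each other:

```python
from collections import deque


def walk_space(root: str, children: dict[str, list[str]]) -> list[str]:
    """Return the root and every room reachable from it, breadth-first, each once."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        room = queue.popleft()
        order.append(room)
        for child in children.get(room, []):
            # Spaces can nest, and can even point back at a parent space.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


# Hypothetical space hierarchy spanning three servers, including a loop.
children = {
    "!space:a.example": ["!chat:a.example", "!sub:b.example"],
    "!sub:b.example": ["!dev:c.example", "!space:a.example"],
}
print(walk_space("!space:a.example", children))
# ['!space:a.example', '!chat:a.example', '!sub:b.example', '!dev:c.example']
```

Note that the walk does not care which server each room was created on, which is the point of a federation-wide directory.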

New users can be added to a space or room automatically in Synapse. (Existing users can be told about the space with a server notice.) This gives admins a way to pre-populate a list of rooms on a server, which is useful to build clusters of related home servers, providing some sort of redundancy, at the room -- not user -- level.

Home servers

So while you can work around a home server going down at the room level, there's no such thing at the home server level, for user identities. So if you want those identities to be stable in the long term, you need to think about high availability. One limitation is that the domain name (e.g. matrix.example.com) must never change in the future, as renaming home servers is not supported.

The documentation used to say you could "run a hot spare" but that has been removed. Last I heard, it was not possible to run a high-availability setup where multiple, separate locations could replace each other automatically. You can have high performance setups where the load gets distributed among workers, but those are based on a shared database (Redis and PostgreSQL) backend.

So my guess is it would be possible to create a "warm" spare of a Matrix home server with regular PostgreSQL replication, but that is not documented in the Synapse manual. This sort of setup would also not be useful to deal with networking issues or denial of service attacks, as you will not be able to spread the load over multiple network locations easily. Redis and PostgreSQL heroes are welcome to provide their multi-primary solution in the comments. In the meantime, I'll just point out this is a solution that's handled somewhat more gracefully in IRC, by having the possibility of delegating the authentication layer.

Delegations

If you do not want to run a Matrix server yourself, it's possible to delegate the entire thing to another server. There's a server discovery API which uses the .well-known pattern (or SRV records, but that's "not recommended" and a bit confusing) to delegate that service to another server. Be warned that the server still needs to be explicitly configured for your domain. You can't just put:

{ "m.server": "matrix.org:443" }

... on https://example.com/.well-known/matrix/server and start using @you:example.com as a Matrix ID. That's because Matrix doesn't support "virtual hosting" and you'd still be connecting to rooms and people with your matrix.org identity, not example.com as you would normally expect. This is also why you cannot rename your home server.

The server discovery API is what allows servers to find each other. Clients, on the other hand, use the client-server discovery API: this is what allows a given client to find your home server when you type your Matrix ID on login.
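The two lookups can be sketched in Python as parsing the JSON returned by https://example.com/.well-known/matrix/server and https://example.com/.well-known/matrix/client. This is a simplified sketch that only parses response bodies (no HTTP requests, no SRV lookup, no special handling of IP literals) and uses example.com as a placeholder. As explained above, it only tells others where to connect; it does not make the target server answer for your domain.

```python
import json

# Federation port used when the delegation does not name one.
DEFAULT_FEDERATION_PORT = 8448


def parse_server_wellknown(body: str) -> tuple[str, int]:
    """(host, port) from a /.well-known/matrix/server response body.

    Simplified: the full resolution rules also involve IP literals and SRV
    records; here a missing port just falls back to 8448.
    """
    delegated = json.loads(body)["m.server"]
    host, sep, port = delegated.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return delegated, DEFAULT_FEDERATION_PORT


def parse_client_wellknown(body: str) -> str:
    """Homeserver base URL from a /.well-known/matrix/client response body."""
    return json.loads(body)["m.homeserver"]["base_url"].rstrip("/")


# The delegation example from the text, as served at
# https://example.com/.well-known/matrix/server:
print(parse_server_wellknown('{ "m.server": "matrix.org:443" }'))  # ('matrix.org', 443)
```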

Performance

The high availability discussion brushed over the performance of Matrix itself, but let's now dig into that.

Horizontal scalability

There were serious scalability issues with the main Matrix server, Synapse, in the past, so the Matrix team has been working hard to improve its design. Since Synapse 1.22, the home server can scale horizontally to multiple workers (see this blog post for details), which can make it easier to scale large servers.

Other implementations

There are other promising home server implementations from a performance standpoint (dendrite, Golang, entered beta in late 2020; conduit, Rust, beta; others), but none of those are feature-complete, so there's a trade-off to be made there. Synapse is also adding a lot of features fast, so it's an open question whether the others will ever catch up. (I have heard that Dendrite might actually surpass Synapse in features within a few years, which would put Synapse in a more "LTS" situation.)

Latency

Matrix can feel slow sometimes. For example, joining the "Matrix HQ" room in Element (from matrix.debian.social) takes a few minutes and then fails. That is because the home server has to sync the entire room state when you join the room. There was promising work on this announced in the lengthy 2021 retrospective, and some of that work landed (partial sync) in the 1.53 release already. Other improvements coming include sliding sync, lazy loading over federation, and fast room joins. So that's actually something that could be fixed in the fairly short term.

But in general, communication in Matrix doesn't feel as "snappy" as on IRC or even Signal. It's hard to quantify this without instrumenting a full latency test bed (for example the tools I used in the terminal emulators latency tests), but even just typing in a web browser feels slower than typing in an xterm or Emacs for me.

Even in conversations, I "feel" people don't immediately respond as fast. In fact, this could be an interesting double-blind experiment to make: have people guess whether they are talking to a person on Matrix, XMPP, or IRC, for example. My theory would be that people could notice that Matrix users are slower, if only because of the TCP round-trip time each message has to take.

Transport

Some courageous person actually ran tests of various messaging platforms on a congested network. His evaluation was basically:

  • Briar: uses Tor, so unusable except locally
  • Matrix: "struggled to send and receive messages", joining a room takes forever as it has to sync all history, "took 20-30 seconds for my messages to be sent and another 20 seconds for further responses"
  • XMPP: "worked in real-time, full encryption, with nearly zero lag"

So that was interesting. I suspect IRC would have also fared better, but that's just a feeling.

Other improvements to the transport layer include support for websocket and the CoAP proxy work from 2019 (targeting 100bps links), but both seem stalled at the time of writing. The Matrix people have also announced the pinecone p2p overlay network which aims at solving large, internet-scale routing problems. See also this talk at FOSDEM 2022.

Usability

Onboarding and workflow

The workflow for joining a room, when you use Element web, is not great:

  1. click on a link in a web browser
  2. land on (say) https://matrix.to/#/#matrix-dev:matrix.org
  3. offers "Element", yeah that sounds great, let's click "Continue"
  4. land on https://app.element.io/#/room%2F%23matrix-dev%3Amatrix.org and then you need to register, aaargh

As you might have guessed by now, there is a specification to solve this, but web browsers need to adopt it as well, so that's far from actually being solved. At least browsers generally know about the matrix: scheme, it's just not exactly clear what they should do with it, especially when the handler is just another web page (e.g. Element web).

In general, when compared with tools like Signal or WhatsApp, Matrix doesn't fare so well in terms of user discovery. I probably have some of my normal contacts that have a Matrix account as well, but there's really no way to know. It's kind of creepy when Signal tells you "this person is on Signal!" but it's also pretty cool that it works, and they actually implemented it pretty well.

Registration is also less obvious: in Signal, the app confirms your phone number automatically. It's frictionless and quick. In Matrix, you need to learn about home servers, pick one, register (with a password! aargh!), and then set up encryption keys (not default), etc. It's a lot more friction.

And look, I understand: giving away your phone number is a huge trade-off. I don't like it either. But it solves a real problem and makes encryption accessible to a ton more people. Matrix does have "identity servers" that can serve that purpose, but I don't feel confident sharing my phone number there. It doesn't help that the identity servers don't have private contact discovery: giving them your phone number is a more serious security compromise than with Signal.

There's a catch-22 here too: because no one feels like giving away their phone numbers, no one does, and everyone assumes that stuff doesn't work anyway. Like it or not, Signal forcing people to divulge their phone number actually gives it critical mass, which means a lot of my relatives are actually on Signal and I don't have to install crap like WhatsApp to talk with them.

5 minute clients evaluation

Throughout all my tests I evaluated a handful of Matrix clients, mostly from Flathub because almost none of them are packaged in Debian.

Right now I'm using Element, the flagship client from Matrix.org, in a web browser window, with the PopUp Window extension. This makes it look almost like a native app, and opens links in my main browser window (instead of a new tab in that separate window), which is nice. But I'm tired of buying memory to feed my web browser, so this indirection has to stop. Furthermore, I'm often getting completely logged off from Element, which means re-logging in, recovering my security keys, and reconfiguring my settings. That is extremely annoying.

Coming from Irssi, Element is really "GUI-y" (pronounced "gooey"). Lots of clickety happening. To mark conversations as read, in particular, I need to click-click-click on all the tabs that have some activity. There's no "jump to latest message" or "mark all as read" functionality as far as I could tell. In Irssi the former is built-in (alt-a) and I made a custom /READ command for the latter:

/ALIAS READ script exec \$_->activity(0) for Irssi::windows

And yes, that's a Perl script in my IRC client. I am not aware of any Matrix client that does stuff like that, except maybe Weechat, if we can call it a Matrix client, or Irssi itself, now that it has a Matrix plugin (!).

As for other clients, I have looked through the Matrix Client Matrix (confusing, right?) to try to figure out which one to try, and, even after selecting Linux as a filter, the chart is just too wide to figure out anything. So I tried these, kind of randomly:

  • Fractal
  • Mirage
  • Nheko
  • Quaternion

Unfortunately, I lost my notes on those, and I don't actually remember which one did what. I still have a session open with Mirage, so I guess that means it's the one I preferred, but I remember they were also all very GUI-y.

Maybe I need to look at weechat-matrix or gomuks. At least Weechat is scriptable so I could continue playing the power-user. Right now my strategy with messaging (and that includes microblogging like Twitter or Mastodon) is that everything goes through my IRC client, so Weechat could actually fit well in there. Going with gomuks, on the other hand, would mean running it in parallel with Irssi or ... ditching IRC, which is a leap I'm not quite ready to take just yet.

Oh, and basically none of those clients (except Nheko and Element) support VoIP, which is still kind of a second-class citizen in Matrix. It does not support large multimedia rooms, for example: Jitsi was used for FOSDEM instead of the native videoconferencing system.

Bots

This falls a little outside the "usability" section, but I didn't know where to put it... There are a few Matrix bots out there, and you are likely going to be able to replace your existing bots with Matrix bots. It's true that IRC has a long and impressive history with lots of various bots doing various things, but given how young Matrix is, there's still a good variety:

  • maubot: generic bot with tons of usual plugins like sed, dice, karma, xkcd, echo, rss, reminder, translate, react, exec, gitlab/github webhook receivers, weather, etc
  • opsdroid: framework to implement "chat ops" in Matrix, connects with Matrix, GitHub, GitLab, Shell commands, Slack, etc
  • matrix-nio: another framework, used to build lots more bots like:
    • hemppa: generic bot with various functionality like weather, RSS feeds, calendars, cron jobs, OpenStreetMap lookups, URL title snarfing, Wolfram Alpha, astronomy pic of the day, Mastodon bridge, room bridging, oh dear
    • devops: ping, curl, etc
    • podbot: play podcast episodes from AntennaPod
    • cody: Python, Ruby, Javascript REPL
    • eno: generic bot, "personal assistant"
  • mjolnir: moderation bot
  • hookshot: bridge with GitLab/GitHub
  • matrix-monitor-bot: latency monitor

One thing I haven't found an equivalent for is Debian's MeetBot. There's an archive bot but it doesn't have topics or a meeting chair, or HTML logs.

Working on Matrix

As a developer, I find Matrix kind of intimidating. The specification is huge. The official specification itself looks somewhat digestible: it's only 6 APIs, so that looks, at first, kind of reasonable. But whenever you start asking complicated questions about Matrix, you quickly fall into the Matrix Spec Change specification (which, yes, is a separate specification). And there are literally hundreds of MSCs flying around. It's hard to tell what's been adopted and what hasn't, and even harder to figure out if your specific client has implemented it.

(One trendy answer to this problem is to "rewrite it in Rust": the Matrix people are working on implementing a lot of those specifications in a matrix-rust-sdk that's designed to take the implementation details away from users.)

Just taking the latest weekly Matrix report, you find three new MSCs proposed just last week! There's even a graph that shows the number of MSCs growing steadily, at 600+ proposals total, with the majority (300+) "new". I would guess the "merged" ones are at about 150.

That's a lot of text which includes stuff like 3D worlds which, frankly, I don't think you should be working on when you have such important security and usability problems. (The internet as a whole, arguably, doesn't fare much better. RFC600 is a really obscure discussion about "INTERFACING AN ILLINOIS PLASMA TERMINAL TO THE ARPANET". Maybe that's how many MSCs will end up as well, left forgotten in the pits of history.)

And that's the thing: maybe the Matrix people have a different objective than I have. They want to connect everything to everything, and make Matrix a generic transport for all sorts of applications, including virtual reality, collaborative editors, and so on.

I just want secure, simple messaging. Possibly with good file transfers, and video calls. That it works with existing stuff is good, and it should be federated to remove the "Signal point of failure". So I'm a bit worried with the direction all those MSCs are taking, especially when you consider that clients other than Element are still struggling to keep up with basic features like end-to-end encryption or room discovery, never mind voice or spaces...

Conclusion

Overall, Matrix is somehow in the space XMPP was a few years ago. It has a ton of features, pretty good clients, and a large community. It seems to have gained some of the momentum that XMPP has lost. It may have the most potential to replace Signal if something bad would happen to it (like, I don't know, getting banned or going nuts with cryptocurrency)...

But it's really not there yet, and I don't see Matrix trying to get there either, which is a bit worrisome.

Looking back at history

I'm also worried that we are repeating the errors of the past. The history of federated services is really fascinating. IRC, FTP, HTTP, and SMTP were all created in the early days of the internet, and are all still around (except, arguably, FTP, which was removed from major browsers recently). All of them had to face serious challenges in growing their federation.

IRC had numerous conflicts and forks, both at the technical level but also at the political level. The history of IRC is really something that anyone working on a federated system should study in detail, because they are bound to make the same mistakes if they are not familiar with it. The "short" version is:

  • 1988: Finnish researcher publishes first IRC source code
  • 1989: 40 servers worldwide, mostly universities
  • 1990: EFnet ("eris-free network") fork which blocks the "open relay", named Eris - followers of Eris form the A-net, which promptly dissolves itself, with only EFnet remaining
  • 1992: Undernet fork, which offered authentication ("services"), routing improvements and timestamp-based channel synchronisation
  • 1994: DALnet fork, from Undernet, again on a technical disagreement
  • 1995: Freenode founded
  • 1996: IRCnet forks from EFnet, following a flame war of historical proportion, splitting the network between Europe and the Americas
  • 1997: Quakenet founded
  • 1999: (XMPP founded)
  • 2001: 6 million users, OFTC founded
  • 2002: DALnet peaks at 136,000 users
  • 2003: IRC as a whole peaks at 10 million users, EFnet peaks at 141,000 users
  • 2004: (Facebook founded), Undernet peaks at 159,000 users
  • 2005: Quakenet peaks at 242,000 users, IRCnet peaks at 136,000 (Youtube founded)
  • 2006: (Twitter founded)
  • 2009: (WhatsApp, Pinterest founded)
  • 2010: (TextSecure AKA Signal, Instagram founded)
  • 2011: (Snapchat founded)
  • ~2013: Freenode peaks at ~100,000 users
  • 2016: IRCv3 standardisation effort started (TikTok founded)
  • 2021: Freenode self-destructs, Libera chat founded
  • 2022: Libera peaks at 50,000 users, OFTC peaks at 30,000 users

(The numbers were taken from the Wikipedia page and Netsplit.de. Note that I also include other networks' launches in parentheses for context.)

Pretty dramatic, don't you think? Eventually, somehow, IRC became irrelevant for most people: few people are even aware of it now. With fewer than a million active users, it's smaller than Mastodon, XMPP, or Matrix at this point.[1] If I were to venture a guess, I'd say that infighting, lack of a standardization body, and a somewhat annoying protocol meant the network could not grow. It's also possible that the decentralised yet centralised structure of IRC networks limited their reliability and growth.

But large social media companies have also taken over the space: observe how IRC numbers peak around the time the wave of large social media companies emerge, especially Facebook (2.9B users!!) and Twitter (400M users).

Where the federated services are in history

Right now, Matrix and Mastodon (and email!) are at the "pre-EFnet" stage: anyone can join the federation. Mastodon has started working on a global block list of fascist servers which is interesting, but it's still an open federation. Right now, Matrix is totally open, but matrix.org publishes a (federated) block list of hostile servers (#matrix-org-coc-bl:matrix.org, yes, of course it's a room).

Interestingly, Email is also in that stage, where there are block lists of spammers, and it's a race between those blockers and spammers. Large email providers, obviously, are getting closer to the EFnet stage: you could consider they only accept email from themselves or between themselves. It's getting increasingly hard to deliver mail to Outlook and Gmail for example, partly because of bias against small providers, but also because they are including more and more machine-learning tools to sort through email and those systems are, fundamentally, unknowable. It's not quite the same as splitting the federation the way EFnet did, but the effect is similar.

HTTP has somehow managed to live in a parallel universe, as it's technically still completely federated: anyone can start a web server if they have a public IP address and anyone can connect to it. The catch, of course, is how you find the darn thing. Which is how Google became one of the most powerful corporations on earth, and how they became the gatekeepers of human knowledge online.

I have only briefly mentioned XMPP here, and my XMPP fans will undoubtedly comment on that, but I think it's somewhere in the middle of all of this. It was co-opted by Facebook and Google, and both corporations have abandoned it to its fate. I remember fondly the days where I could do instant messaging with my contacts who had a Gmail account. Those days are gone, and I don't talk to anyone over Jabber anymore, unfortunately. And this is a threat that Matrix still has to face.

It's also the threat Email is currently facing. On the one hand corporations like Facebook want to completely destroy it and have mostly succeeded: many people just have an email account to register on things and talk to their friends over Instagram or (lately) TikTok (which, I know, is not Facebook, but they started that fire).

On the other hand, you have corporations like Microsoft and Google who are still using and providing email services (because, frankly, you still do need email for stuff, just like fax is still around), but they are more and more isolated in their own silo. At this point, it's only a matter of time before they reach critical mass and just decide that the risk of allowing external mail coming in is not worth the cost. They'll simply flip the switch and work on an allow-list principle. Then we'll have closed the loop and email will be dead, just like IRC is "dead" now.

I wonder which path Matrix will take. Could it liberate us from these vicious cycles?

  1. According to Wikipedia, there are currently about 500 distinct IRC networks operating, on about 1,000 servers, serving over 250,000 users. In contrast, Mastodon seems to be around 5 million users, Matrix.org claimed at FOSDEM 2021 to have about 28 million globally visible accounts, and Signal lays claim to over 40 million souls. XMPP claims to have "millions" of users on the xmpp.org homepage but the FAQ says they don't actually know. On the proprietary silo side of the fence, this page says

    • Facebook: 2.9 billion users
    • WhatsApp: 2B
    • Instagram: 1.4B
    • TikTok: 1B
    • Snapchat: 500M
    • Pinterest: 480M
    • Twitter: 397M

    Notable omission from that list: Youtube, with its mind-boggling 2.6 billion users...

    Those are not the kind of numbers you just "need to convince a brother or sister" to grow the network...

Categories: FLOSS Project Planets

PyBites: What I have learned from an open-source project

Planet Python - Fri, 2022-06-17 10:30
What preceded it

I like the Carbon images Pybites posts on Twitter. Out of curiosity, I took a look at the code on GitHub, but it was pretty overwhelming and intimidating, so I quickly moved on to something I did “understand.”

I often follow a tutorial or collect items I might need one day. I had so many Udemy courses that I was ashamed of it. Taking a course gives a sense of security: they take you by the hand, and you get the feeling that you are learning something because you can do the exercises they present.

However, in practice, I did not learn to program in Python. The basics stuck, but I couldn’t build an app with them, and I couldn’t apply my knowledge; I wasn’t even able to write a simple rock-paper-scissors game.

To get out of this tutorial paralysis, I started with PDM (the Pybites Developer Mindset program). Signing up for one-on-one guidance from someone who would assess my work was an exciting step, but now I’m glad I took it, because it has brought me to the point where I can even dig into the code of an open-source project and modify it to my liking.

The code on GitHub and my plan

Before I could get started with the code from PyBites-Open-Source on GitHub, I had to figure out what I wanted to do. So I started making a list of what I wanted, what I had to do, and what resources I needed.

The list of tasks I had prepared for myself included the following:

  • Find out what happens in the code.
  • List the things I want to change.
  • Analyze the link in the browser to see which values I have to give.
  • Analyze the JSON file to see what the values look like.
  • Place the values in the code, that is, adjust the URL by passing in the variables.
  • Rename the image to date-time-carbon.png.

The URL

To start, I created a nice image on Carbon, which I could use for learning from this open-source project. Then I exported all the desired settings (a JSON file) so that I would always have the data at hand. Finally, I copied the link from the address bar. The URL contained all the data needed to create the image.

Sample snippet to test

To be able to test whether everything was going according to plan, I used a sample snippet to ensure everything looked how I envisioned it. I kept this snippet the same throughout development, so I always had the same test case to compare against.

# using slicing
s = "some random string"
# move "som" to the right
print(s[3:] + s[:3])
# move "ing" to the left
print(s[-3:] + s[:-3])

# using collections deque
from collections import deque

s = "some random string"
deq = deque(s)
deq.rotate(3)
print(deq)
# make it a string again
print("".join(deq))

The options I would like to see in my result.

The images of Pybites are very beautiful, but I wanted something different. For this I had to adjust a number of options. After examining the JSON file and URL, I found that I needed the following data:

  • windowTheme (boxy);
  • width (fixed width of the image = 680);
  • widthAdjustment (false, because I give it a fixed width);
  • dropShadow(true);
  • dropShadowOffsetY(“20px”);
  • dropShadowBlurRadius(“68px”);
  • fontFamily (“JetBrains Mono”, must be installed on the computer);
  • fontSize(“14px”);
  • lineHeight(“155%”);
  • paddingVertically(“35px”, how much padding at the bottom and top of the image);
  • paddingHorizontally(“34px”, how much padding on the left and right side of the image).
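Since the copied URL carries all these settings, one way to rebuild it in code is to percent-encode a dict of them with urllib.parse.urlencode from the standard library. This is a sketch, not the Pybites code: it assumes Carbon accepts the same key names as the exported JSON file (as listed above) plus a hypothetical code parameter for the snippet. Check both against the link copied from your own address bar, since Carbon may use shorter names in its URLs.

```python
from urllib.parse import urlencode

# Carbon's web app; the query-string keys below are an assumption.
CARBON_BASE_URL = "https://carbon.now.sh/"

# Settings from the list above, keyed as in the exported JSON file.
CARBON_SETTINGS = {
    "windowTheme": "boxy",
    "width": 680,
    "widthAdjustment": False,
    "dropShadow": True,
    "dropShadowOffsetY": "20px",
    "dropShadowBlurRadius": "68px",
    "fontFamily": "JetBrains Mono",
    "fontSize": "14px",
    "lineHeight": "155%",
    "paddingVertically": "35px",
    "paddingHorizontally": "34px",
}


def to_query_value(value) -> str:
    """Render JSON-style booleans ("true"/"false") instead of Python's "True"/"False"."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_carbon_url(code: str, settings: dict) -> str:
    """Percent-encode the settings plus the snippet into one Carbon link."""
    params = {key: to_query_value(value) for key, value in settings.items()}
    params["code"] = code  # hypothetical key for the snippet itself
    return CARBON_BASE_URL + "?" + urlencode(params)


print(build_carbon_url('s = "some random string"', CARBON_SETTINGS))
```

Converting booleans explicitly matters because str(True) would put "True" into the link, while the JSON file spells it true.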

Once I had this list of desired settings, I started adding them one by one to the code, each time checking whether everything still worked. Because it’s easy to make a typo, I was lucky that the error messages were helpful and even offered hints: “Did you mean …?”

Where do I want to save the images?

The code saved the images in the current working directory (os.getcwd()), but I didn’t want that. I had created a dedicated folder on my computer for the images so that they would all be in the same place and easy to find.

For this, I had to change the target directory. Because this was new to me, I had to use Google heavily and read many answers on Stack Overflow. Finally, I found my answer, and what a fantastic feeling it gave me when I got it to work!

import os

os.chdir("C:/Users/<username>/carbon-snippets")

No default image name

I didn’t want the image’s default name (carbon.png) because that means that the new image overwrites the old image. I had to make sure that the images had unique names. After brainstorming about the best name I could give as a default, I concluded that it would be best to let the name consist of the date and time. Thus the idea arose to build the name from date-time-carbon.png.

However, changing the default name was difficult because it came from a form field on the Carbon website, where it was set as the placeholder value. Asking the right question on Google will get you closer to the answer on Stack Overflow. After that, it’s a matter of adapting the given answers and examples to your own problem. The additional code that did it:

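# Note: Selenium 4.3 and later removed find_element_by_xpath; there, use
# driver.find_element(By.XPATH, ...) with: from selenium.webdriver.common.by import By
new_file_name = driver.find_element_by_xpath("//input[@placeholder='carbon']")
new_file_name.clear()
new_file_name.send_keys(image_name_png)

Convert the date to your format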

For naming the images, I used datetime.now(), but the raw result was neither nice nor practical for an image name. I wanted the format 20220608-174726-carbon.png, so datetime.now() had to be converted to a string. This can be achieved with datetime.now().strftime("%Y%m%d-%H%M%S"). I saved the resulting image name in a variable that send_keys (code above) could use:

from datetime import datetime

image_date_time = datetime.now().strftime("%Y%m%d-%H%M%S")
image_name_png = f"{image_date_time}-carbon"
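One limitation of a timestamp-only name is that two snippets created within the same second would get the same name, so the second image could overwrite the first. A minimal sketch, assuming you check the dedicated folder before typing the name into Carbon's form and that Carbon appends .png itself (as the default carbon.png suggests), could add a counter. The helper unique_image_name and the -1, -2 suffix are hypothetical and not part of Pybites' code:

```python
import tempfile
from datetime import datetime
from pathlib import Path


def unique_image_name(folder: Path, moment: datetime, suffix: str = "carbon") -> str:
    """Return a timestamped base name (no .png) that is not yet taken in folder."""
    stamp = moment.strftime("%Y%m%d-%H%M%S")
    name = f"{stamp}-{suffix}"
    counter = 1
    # Append -1, -2, ... while a PNG with this name already exists in the folder.
    while (folder / f"{name}.png").exists():
        name = f"{stamp}-{suffix}-{counter}"
        counter += 1
    return name


def demo() -> tuple:
    """Simulate two snippets saved within the same second."""
    moment = datetime(2022, 6, 8, 17, 47, 26)
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        first = unique_image_name(folder, moment)
        (folder / f"{first}.png").touch()  # pretend Carbon saved the first image
        second = unique_image_name(folder, moment)
    return first, second


if __name__ == "__main__":
    print(demo())  # ('20220608-174726-carbon', '20220608-174726-carbon-1')
```

Like the author's image_name_png, the returned name has no extension, so it can be passed straight to send_keys.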

Converting the time was the last step in the process of modifying Pybites’ code to allow me to do the following:

  1. I can run the code from the terminal.
  2. Running the code will create an image on Carbon.
  3. This image has my adjustments.
  4. When the image is ready, it goes into the folder I specified.
  5. The image gets a customized name.

Result:

The resulting carbon image, customized to my preferences.

Conclusion of this way of working and learning

The main takeaway: it’s incredible how much you can learn from working with an open-source project!

And the feeling of euphoria every time you complete a task from your to-do list and thus get closer to the end goal is amazing.

Often it was a matter of daring: incorporate the code I found online into my code, see what it did, then test the result. If the code didn’t work, I found out why. I started looking for a new solution with the newly acquired knowledge until it worked!

My final conclusion from working with an open-source project: you can learn things from courses, but I am convinced that JIT Learning is a much better way of learning for me. In other words, Just In Time Learning has stolen my heart.

Keep calm and code in Python!

Leonieke

Categories: FLOSS Project Planets

Agaric Collective: Drupal 9.4 installation with existing configuration fails because "unable to uninstall the MySQL module"!?

Planet Drupal - Fri, 2022-06-17 09:57

Here is how to deal with the surprising-to-impossible-seeming error "Unable to uninstall the MySQL module because: The module 'MySQL' is providing the database driver 'mysql'.."

Like, why is it trying to uninstall anything when you are installing? Well, it is because you are installing with existing configuration, and your configuration is out of date. The same problem will happen on configuration import on a Drupal website, too.

Really this error message is a strong reminder to always run database updates and then commit any resulting configuration changes after updating Drupal core or module code.

And so the solution is to roll back the code to Drupal 9.3, do your installation from configuration, and then run the database updates, export configuration, and commit the result.

For example:

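# Roll back to the Drupal 9.3 code (the commit or tag to check out is not given here)
git checkout
composer install
drush -y site:install drutopia --existing-config
# Return to the updated code, run database updates, and export the resulting configuration
git checkout main
composer install
drush -y updb
drush -y cex
git commit -m "Apply configuration updates from Drupal 9.4 upgrade"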

The system update enable_provider_database_driver is the post-update hook that is doing the work here to "Enable the modules that are providing the listed database drivers." Pretty cool feature and a strong reminder to always, always run database updates and commit any configuration changes immediately after any code updates!

Read more and discuss at agaric.coop.

Categories: FLOSS Project Planets

PyBites: Is it time to step back and look at the greater design?

Planet Python - Fri, 2022-06-17 09:56


This week we talk about an important topic: how to prevent yourself as a programmer from getting into tunnel vision when coding. 

We share a practical example of how we hit this last week (and many times before, for that matter) in one of our solutions, and we relate it to similar experiences people go through when working with us.

You have to take a step back from time to time (“creating space”) to think about the bigger picture design.

We hope it helps you and prepares you better when you hit this issue which (as we like to say) can be insidious.

Pybites merch store (t-shirts, mugs and stickers): 
https://pybit.es/shop/

PDM (Pybites Developer Mindset) program:
https://pybit.es/catalogue/the-pdm-program/

Related episodes:
– Getting unstuck with your code
– Sunk cost bias

Books we’re reading:
– Fluent Python 2nd ed
– The Secret Commonwealth
– All books mentioned on our podcast

Thanks for listening, for any feedback hit us up via email: info@pybit.es

Categories: FLOSS Project Planets

Python for Beginners: Copy a List in Python

Planet Python - Fri, 2022-06-17 09:00

While programming in Python, we sometimes need to store the same data in multiple places, for example, because we need to preserve the original data. In this article, we will discuss different ways to copy a list in Python.

Table of Contents
  1. Copy a List in Python
    1. The id() Function in Python
    2. Copy a List Using the list() Constructor in Python
    3. Copy a List Using the append() Method in Python
    4. Copy a List Using the extend() Method in Python
    5. Copy a List Using Slicing in Python
    6. Copy a List Using List Comprehension in Python
    7. Copy a List Using the copy() Method in Python
  2. Copy List of Lists in Python
    1. Copy List of Lists Using the append() Method in Python
    2. Copy List of Lists Using the extend() Method in Python
    3. Copy List of Lists Using the copy() Method in Python
    4. Copy List of Lists Using the copy Module in Python
  3. Conclusion
Copy a List in Python

When we need to copy an integer in python, we simply assign a variable to another variable as shown below.

num1 = 10
print("The first number is:", num1)
num2 = num1
print("The copied number is:", num2)

Output:

The first number is: 10
The copied number is: 10

Here, we have created a variable num1 with the value 10. Then, we have assigned num1 to another variable num2. After the assignment, even if we assign a new value to num1, the value of num2 remains unaffected. You can observe this in the following example.

num1 = 10
print("The first number is:", num1)
num2 = num1
print("The copied number is:", num2)
num1 = 15
print("The first number after modification:", num1)
print("The copied number is:", num2)

Output:

The first number is: 10
The copied number is: 10
The first number after modification: 15
The copied number is: 10

In the above example, you can see that the value in num2 hasn’t been modified after modifying num1.

Now, let us copy a list using the assignment operation.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
list2 = list1
print("The copied list is:", list2)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The copied list is: [1, 2, 3, 4, 5, 6, 7]

When we change the original list in this case, the copied list also gets modified. 

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
list2 = list1
print("The copied list is:", list2)
list1.append(23)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list2)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 23]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7, 23]

Why does this happen?

Copy by assignment seems to work for integers because integers are immutable objects. When we assign one integer variable to another, both variables refer to the same object. When we later assign a new value to either variable, that variable is bound to a different integer object, and the original object remains unaffected. You can observe this in the following example.

num1 = 10
print("The id of first number is:", id(num1))
num2 = num1
print("The id of copied number is:", id(num2))
num1 = 15
print("The id of first number after modification:", id(num1))
print("The id of copied number after modification is:", id(num2))

Output:

The id of first number is: 9789248
The id of copied number is: 9789248
The id of first number after modification: 9789408
The id of copied number after modification is: 9789248

Here, you can see that num1 and num2 have the same id after we assign num1 to num2 using the assignment operator. However, when we assign a new value to num1, the id of num1 changes.

Lists, on the other hand, are mutable objects. When we modify a list in place, for example with the append() method, the same list object is changed. No new Python object is created, so the change is visible through both the original and the copied list variable. You can observe this in the following example.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The id of original list is:", id(list1))
list2 = list1
print("The id of copied list is:", id(list2))
list1.append(23)
print("The id of original list after modification is:", id(list1))
print("The id of copied list after modification is:", id(list2))

Output:

The id of original list is: 139879630222784
The id of copied list is: 139879630222784
The id of original list after modification is: 139879630222784
The id of copied list after modification is: 139879630222784

Here, you can see that the id of both lists remains the same even after modification in list1. Therefore, it is confirmed that the original list and the copied list refer to the same object even after modification.
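What really matters is whether a name is rebound to a new object or an existing object is changed in place. In this small sketch (the variable names are reused for illustration), the + operator builds a new list instead of calling append(), so list1 is rebound to the new list and list2 keeps referring to the old one, just like in the integer case:

```python
list1 = [1, 2, 3]
list2 = list1            # both names refer to the same list object

list1.append(4)          # in-place mutation: visible through both names
print(list2)             # [1, 2, 3, 4]

list1 = list1 + [5]      # `+` builds a new list, and list1 is rebound to it
print(list1)             # [1, 2, 3, 4, 5]
print(list2)             # [1, 2, 3, 4]
print(list1 is list2)    # False
```

By contrast, list1 += [5] would extend the existing list in place, much like extend(), so the change would also show up through list2.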

As the assignment operation doesn’t work for copying lists in python, we will discuss different ways to copy a list in python. Before doing so, let us first discuss the id() function in python.

The id() Function in Python

Each Python object has an identifier that is unique among all objects that exist at the same time. We can use the id() function to obtain the identifier of any Python object, as shown in the above examples.

If two variables, when passed as input to the id() function, give the same output, both the variables refer to the same object.

If the output of the id() function is different for different variables, they refer to different objects.

In the upcoming sections, we will use the id() function to check if a list is copied or not. If a list is successfully copied, the identifiers of both lists will be different. In such a case, when we modify a list, it won’t affect the other list. 

If the identifier of both the variables referring to lists is the same, the variables refer to the same list. In this case, any change made in the list associated with a variable will be reflected in the list associated with another variable.
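Python's is operator performs the same identity check as comparing id() values, while == compares contents. A short illustration (the names original, alias, and copied exist only for this example):

```python
original = [1, 2, 3]
alias = original           # same object, just another name
copied = list(original)    # new object with equal contents

print(alias is original, id(alias) == id(original))    # True True
print(copied is original, id(copied) == id(original))  # False False
print(copied == original)                               # True: the contents are equal
```

Because identifiers are only guaranteed to be unique among objects that exist at the same time, compare them while both objects are still alive, as the examples in this article do.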

With this background, let us now discuss the different ways to copy a list in python.

Copy a List Using the list() Constructor in Python

The list() constructor is used to create a new list from any iterable object like a list, tuple, or set. It takes an iterable object as its input argument and returns a new list containing the elements of the input iterable object.

To copy a list using the list() constructor in python, we will pass the original list as the input argument to the list() constructor.

After execution, the list() constructor will return a new list containing the elements from the original list. The new list and the original list will have different identifiers that can be obtained using the id() function. Hence, any change made to one of the lists will not affect the other list. You can observe this in the following example.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = list(list1)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1.append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The id of original list is: 140137673798976
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The id of copied list is: 140137673851328
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 10]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7]

In the above example, you can see that the id of the original list and the copied list is different. Therefore, making changes to the original list will not impact the copied list.
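Because list() accepts any iterable, the same call also builds a new list from other kinds of collections. A few illustrations (the variable names are only for this example); note that iterating over a dictionary yields its keys:

```python
from_tuple = list((1, 2, 3))          # from a tuple
from_string = list("abc")             # from a string: one element per character
from_range = list(range(4))           # from a range object
from_dict = list({"x": 1, "y": 2})    # from a dict: its keys, in insertion order

print(from_tuple)   # [1, 2, 3]
print(from_string)  # ['a', 'b', 'c']
print(from_range)   # [0, 1, 2, 3]
print(from_dict)    # ['x', 'y']
```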

Copy a List Using the append() Method in Python

The append() method is used to add a new element to a list. When invoked on a list, it takes an element as its input argument and adds it to the last position of the list.

To copy a list using the append() method in Python, we will first create an empty list named list_copy. After that, we will iterate through the original list using a for loop. During the iteration, we will add each element of the original list to list_copy using the append() method.

After executing the for loop, we will get the copied list in the list_copy variable. You can observe this in the following example.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = []
for element in list1:
    list_copy.append(element)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1.append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The id of original list is: 140171559405056
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The id of copied list is: 140171560708032
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 10]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7]

You can observe that the new list and the original list have different identifiers, which we obtained using the id() function. Hence, any change made to one of the lists doesn’t affect the other list.

Copy a List Using the extend() Method in Python

The extend() method is used to add multiple elements at once to a list. When invoked on a list, the extend() method takes an iterable object as its input argument. After execution, it appends all the elements of the input iterable object to the list. 

To copy a list using the extend() method, we will first create an empty list named list_copy. After that, we will invoke the extend() method on list_copy with the original list as its input argument. After execution, we will get the copied list in the list_copy variable.

The new list and the original list will have different identifiers that can be obtained using the id() function. Hence, any change made to one of the lists will not affect the other list. You can observe this in the following example.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = []
list_copy.extend(list1)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1.append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The id of original list is: 139960369243648
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The id of copied list is: 139960370546624
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 10]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7]

Copy a List Using Slicing in Python

Slicing in python is used to create a copy of a part of a list. The general syntax for slicing is as follows.

new_list=original_list[start_index:end_index]

Here, 

  • new_list is the list created after the execution of the statement.
  • original_list is the given input list.
  • start_index is the index of the leftmost element that has to be included in new_list. If we leave start_index empty, the default value 0 is taken and the list is copied starting from the first element.
  • end_index is the index just past the rightmost element to be included in new_list; the element at end_index itself is not included. If we leave end_index empty, the default value is taken to be the length of the list. Hence, the list is copied till the last element.
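To make the start and end rules concrete, here is a small sketch with a hypothetical list nums. The element at end_index is not part of the result, and leaving out both indices produces a copy of the whole list:

```python
nums = [10, 20, 30, 40, 50]

print(nums[1:3])   # [20, 30]  index 3 (value 40) is excluded
print(nums[:2])    # [10, 20]  start defaults to 0
print(nums[3:])    # [40, 50]  end defaults to len(nums)
print(nums[-2:])   # [40, 50]  negative indices count from the end

whole = nums[:]    # both defaults: a new list with every element
print(whole == nums, whole is nums)  # True False
```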

You can copy a list using slicing as shown in the following example.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = list1[:]
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1.append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The id of original list is: 139834922264064
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The id of copied list is: 139834923567040
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 10]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7]

You can observe that the new list and the original list have different identifiers, which we obtained using the id() function. Hence, any change made to one of the lists doesn’t affect the other list.

Copy a List Using List Comprehension in Python

List comprehension is used to create a new list from the elements of an existing list, optionally keeping only the elements that satisfy a condition. The general syntax of list comprehension is as follows.

new_list=[element for element in existing_list if condition]

Here, 

  • new_list is the list created after the execution of the statement.
  • existing_list is the input list.
  • The element represents elements of the existing list. This variable is used to iterate over the existing_list.
  • condition is a condition imposed on the element.
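For illustration, here is the condition part in action on a sample list (the name evens is hypothetical): only the elements for which the condition is true end up in the new list, and the input list is left unchanged.

```python
existing_list = [1, 2, 3, 4, 5, 6, 7]

# Keep only the elements for which the condition is true.
evens = [element for element in existing_list if element % 2 == 0]

print(evens)          # [2, 4, 6]
print(existing_list)  # [1, 2, 3, 4, 5, 6, 7]
```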

Here, we need to copy a given list using list comprehension, so we want every element and will not use any condition. The following code copies a list using list comprehension.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = [element for element in list1]
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1.append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

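The original list is: [1, 2, 3, 4, 5, 6, 7]
The id of original list is: 139720431945088
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The id of copied list is: 139720433248128
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 10]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7]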

You can observe that the new list and the original list have different identifiers, which we obtained using the id() function. Hence, any change made to one of the lists does not affect the other list.

Copy a List Using the copy() Method in Python

Python also provides the copy() method to copy a list. The copy() method, when invoked on a list, returns a copy of the original list. The new list and the original list will have different identifiers that can be obtained using the id() function. Hence, any change made to one of the lists will not affect the other list. You can observe this in the following example.

list1 = [1, 2, 3, 4, 5, 6, 7]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = list1.copy()
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1.append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [1, 2, 3, 4, 5, 6, 7]
The id of original list is: 140450078000576
The copied list is: [1, 2, 3, 4, 5, 6, 7]
The id of copied list is: 140450079303616
The original list after modification is: [1, 2, 3, 4, 5, 6, 7, 10]
The copied list after modification is: [1, 2, 3, 4, 5, 6, 7]

Copy List of Lists in Python

Copying a list of lists in Python is not as straightforward as copying a flat list. For instance, let us copy a list of lists using the copy() method.

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = list1.copy()
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))

Output:

The original list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of original list is: 139772961010560
The copied list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of copied list is: 139772961063040

Here, you can see that the ids of the original list and the copied list are different. This means that the two lists are different Python objects. Despite this, when we modify one of the inner lists of the original list, the change is also reflected in the copied list. You can observe this in the following example.

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = list1.copy()
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1[2].append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of original list is: 139948423344896
The copied list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of copied list is: 139948423397504
The original list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
The copied list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]

This is usually not what we want. It happens because of how a list of lists is stored in memory: a list of lists contains references to the inner lists, not the inner lists themselves.

When we copy a list of lists using the copy() method, only the outer list is copied, along with the references to the inner lists; the inner lists themselves aren’t copied. This is called a shallow copy. Due to this, when we make changes to any of the inner lists in the original list, they are also reflected in the copied list.
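The is operator makes this sharing visible. A minimal sketch using the same list of lists: the outer lists are different objects, but each inner list in the copy is the very same object as in the original.

```python
list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
list_copy = list1.copy()

print(list_copy is list1)        # False: the outer lists are different objects
print(list_copy[2] is list1[2])  # True: the inner list is shared
# zip() pairs each inner list of the copy with the one at the same position in the original.
print(all(a is b for a, b in zip(list_copy, list1)))  # True for every inner list
```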

To avoid this situation, we can iteratively copy the elements of the list of lists.

Copy List of Lists Using the append() Method in Python

To copy a list of lists using the append() method in python, we will use the following steps.

  • First, we will create an empty list named list_copy.
  • After that, we will iterate over the list of lists using a for loop. For each inner list in the list of lists, we will perform the following tasks.
    • First, we will create an empty list named temp. After that, we will iterate over the elements of the inner list using another for loop.
    • During the iteration, we will append the elements of the current inner list to temp using the append() method. The append() method, when invoked on temp, takes an element from the inner list as its input argument and appends it to temp.
  • After traversing each element of the current inner list, we will append temp to list_copy. After that, we will move to the next inner list and repeat the previous steps.

Once we iterate over all the inner lists of the input list, we will get the copied list of lists in the list_copy variable. You can observe this in the following example.

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = []
for inner_list in list1:
    temp = []
    for element in inner_list:
        temp.append(element)
    list_copy.append(temp)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1[2].append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of original list is: 139893771608960
The copied list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of copied list is: 139893771661952
The original list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
The copied list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

After iteratively copying the list elements, you can see that modifications in the original list do not affect the copied list.
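The same one-level-deeper copy can also be written compactly as a list comprehension that slices each inner list. This variant is not part of the original walkthrough, and, like the loop above, it only copies two levels: anything nested deeper is still shared, as the second part of the sketch shows (the names nested and two_level are hypothetical).

```python
list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
list_copy = [inner_list[:] for inner_list in list1]  # a new list for each inner list

list1[2].append(10)
print(list1)      # [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
print(list_copy)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A third level of nesting is not copied by this approach.
nested = [[[1, 2]], [[3]]]
two_level = [inner_list[:] for inner_list in nested]
two_level[0][0].append(99)
print(nested[0][0])  # [1, 2, 99]: the innermost list is still shared
```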

Copy List of Lists Using the extend() Method in Python

Instead of using the append() method, we can also use the extend() method to copy a list of lists in python. For this, we will use the following steps.

  • First, we will create an empty list named list_copy.
  • After that, we will iterate over the list of lists using a for loop. For each inner list in the list of lists, we will perform the following tasks.
    • First, we will create an empty list named temp. After that, we will invoke the extend() method on temp with the inner list as its input argument.
    • Then, we will append temp to list_copy. After that, we will move to the next inner list.

Once we iterate over all the inner lists of the input list, we will get the copied list of lists in the list_copy variable. You can observe this in the following example.

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = []
for inner_list in list1:
    temp = []
    temp.extend(inner_list)
    list_copy.append(temp)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1[2].append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of original list is: 140175921128448
The copied list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of copied list is: 140175921181312
The original list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
The copied list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Copy List of Lists Using the copy() Method in Python

We can also use the copy() method to copy a list of lists in python. For this, we will use the following steps.

  • First, we will create an empty list named list_copy.
  • After that, we will iterate over the list of lists using a for loop. For each inner list in the list of lists, we will perform the following tasks.
    • We will invoke the copy() method on the inner list. We will assign the output of the copy() method to a variable temp.
    • Then, we will append temp to list_copy. After that, we will move to the next inner list.

Once we iterate over all the inner lists of the input list, we will get the copied list of lists in the list_copy variable. You can observe this in the following example.

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = []
for inner_list in list1:
    temp = inner_list.copy()
    list_copy.append(temp)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1[2].append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of original list is: 140468123341760
The copied list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of copied list is: 140468123394560
The original list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
The copied list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Copy List of Lists Using the copy Module in Python

The copy module provides the deepcopy() function, which we can use to copy nested objects. The deepcopy() function takes a Python object as its input argument and recursively copies all the elements of the input object.

We can copy a list of lists in Python using the deepcopy() function as shown in the following example.

import copy

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("The original list is:", list1)
print("The id of original list is:", id(list1))
list_copy = copy.deepcopy(list1)
print("The copied list is:", list_copy)
print("The id of copied list is:", id(list_copy))
list1[2].append(10)
print("The original list after modification is:", list1)
print("The copied list after modification is:", list_copy)

Output:

The original list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of original list is: 139677987171264
The copied list is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The id of copied list is: 139677987171776
The original list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
The copied list after modification is: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

You can observe that the new list and the original list have different identifiers, which we obtained using the id() function. Because deepcopy() also copies the inner lists, any change made to one list, including changes to its inner lists, does not affect the other list.
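deepcopy() copies every level of nesting, and it also keeps track of the objects it has already copied, so an inner list that appears twice in the original is copied once and shared the same way in the copy. A small sketch with hypothetical names:

```python
import copy

shared = [1, 2]
original = [shared, shared, [[3, [4]]]]  # the same inner list appears twice
deep = copy.deepcopy(original)

deep[2][0][1].append(5)     # modify the most deeply nested list in the copy
print(original[2][0][1])    # [4]: every level of the original was left alone
print(deep[0] is deep[1])   # True: the shared reference is preserved in the copy
print(deep[0] is shared)    # False: but it points to a new list
```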

Conclusion

In this article, we have discussed different ways to copy a list in python. We also looked at different ways to copy a list of lists in python.
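As a quick side-by-side check, the sketch below applies several of the techniques from this article to a small list of lists and reports whether the inner lists are shared with the original. It also includes copy.copy(), the standard library's generic shallow copy, which the article does not cover. The append() and extend() loops are left out for brevity; for a list of lists, they share the inner lists just like list() does.

```python
import copy

list1 = [[1, 2], [3, 4]]

copies = {
    "list()": list(list1),
    "slicing": list1[:],
    "comprehension": [element for element in list1],
    "copy()": list1.copy(),
    "copy.copy()": copy.copy(list1),
    "copy.deepcopy()": copy.deepcopy(list1),
}

for name, result in copies.items():
    # Every technique returns a new outer list with equal contents.
    assert result == list1 and result is not list1
    # Only deepcopy() also creates new inner lists.
    kind = "shared" if result[0] is list1[0] else "independent"
    print(f"{name:<16} inner lists: {kind}")
```

Only the copy.deepcopy() line reports independent inner lists; every other technique makes a shallow copy.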

To learn more about Python programming, you can read this article on how to check for a sorted list in Python. You might also like this article on how to save a dictionary to a file in Python.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Copy a List in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

PyCharm: PyCharm 2022.1.3 Release Candidate Is Available

Planet Python - Fri, 2022-06-17 08:33

The release candidate for PyCharm 2022.1.3 is ready for you to spot-check before we roll out the release.

Important! If you have PyCharm 2022.1.2 already installed, you will need to update to PyCharm 2022.1.3 RC manually via the Toolbox App or the website.

Download PyCharm 2022.1.3 RC!

Here’s what’s inside:

  • Added the ability to auto-select the external diff tool based on the file extension [IDEA-69499].
  • Fixed the issue causing a misleading error message when using $var in calc() function in SCSS files [WEB-54056].
  • Fixed the issue causing an “Unexpected term” error when using a variable in the argument list of min() and max() [WEB-52057].
  • Fixed a regression with the debug console that was truncating outputs [PY-53983].
  • Fixed a regression with .pth files being ignored inside venv site-packages [PY-54321].
  • Fixed an issue with the console: it now reads multiple inputs correctly [IDEA-293951].
  • Fixed the issue causing clickable file paths to not work in the VCS tool window [IDEA-292405].
  • ESLint 8.0 works again with YarnPnP [WEB-55858].

If you notice any bugs, please submit them to our issue tracker.

The PyCharm team

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #114: Getting Started in Python Cybersecurity and Forensics

Planet Python - Fri, 2022-06-17 08:00

Are you interested in a career in security using Python? Would you like to stay ahead of potential vulnerabilities in your Python applications? This week on the show, James Pleger talks about Python information security, incident response, and forensics.


Categories: FLOSS Project Planets

EuroPython: The Humans of EuroPython: Naomi Ceder ✨

Planet Python - Fri, 2022-06-17 05:59

I’m Naomi Ceder, and I’ve been involved in Python communities since I first learned Python (at LinuxWorld in a tutorial given by Guido) in 2001. Over those years I’ve taught Python in schools, at meetups, and at conferences, and I’ve been a conference organiser at PyCon US (the poster session, the education summit, the intro to sprinting tutorials, and the Spanish track, Las PyCon Charlas) and at PyCon UK.

What do you do?

I’m a past chair of the Python Software Foundation, the author of The Quick Python Book, and I do Python training for businesses. I’m also the founder of Trans*Code, and I speak internationally about inclusion and diversity in tech communities and enterprises.

What does your community do?

Trans*Code is a hackday centered on the trans and non-binary communities, and we are partnering with EuroPython this year to hold our first in-person event since 2019. This will be our 14th event, and our first in Ireland.

What motivates you to organise Trans*Code?

Trans and non-binary folk have come to be one of the chief targets as the culture wars have ratcheted up. Many would deny our rights, our humanity, even the opportunity to experience hope and joy. Trans*Code was founded specifically to bring trans and non-binary folk (and their friends) together to build community and help reclaim that hope and that joy. I believe that optimism is a revolutionary act.

What do you wish non-trans people knew or understood?

Personally I wish they understood that expressing shock, outrage, and sadness at transphobia and discrimination does no one any good. An expression of shock tells me that they have largely been able to ignore the countless other instances of transphobia that happen everywhere on a daily basis. Trans folk, like other marginalised groups, are certainly not shocked at discrimination and transphobia - we deal with it every. single. day.

Likewise I’m afraid that sadness or outrage that such cruel things happen is useless to us. Trans and NB folk don’t need pity or outrage; what we need is to be allowed to live our lives without harassment, to be able to get jobs and access needed medical care. In short, what we need is for people to have our backs - not so much as “allies”, but (as I’m fond of saying) as co-conspirators and even friends.

What do you think conferences should know about trans and non-binary people?

It’s not just about toilets. Okay, so yes, we’d very much like to have the same right to use the toilet without harassment as everyone else. But it’s just as important to feel safe, welcome, and included at the conference. It doesn’t hurt for a conference to be explicit that trans folk will be respected and their safety ensured. Does the conference have any trans/NB organisers? Any other trans speakers? Does it have a solid code of conduct, with enforcement?

Beyond that, trans and non-binary folk want the same conference experience as anyone else - to be welcomed and included as part of a community of shared interests, and not continually called out (explicitly or implicitly) for being different. One of my most wretched conference experiences was a social event where “friends” spent the evening continually bringing up my trans experiences. I suppose they satisfied their curiosity and gained some education, but it left me feeling miserable, exhausted, and totally alone.

Any final words?

I’d like to invite anyone who is interested in technology and who can make it to join us - you can find out more about Trans*Code at https://trans.tech and specifically about our EuroPython event at https://ep2022.europython.eu/trans_code. If you are not trans/NB, you will gain an understanding you didn’t have before. Several people who had no prior contact with trans folk before attending one of our events have expressed surprise and delight at the understanding they’ve gained, and the awesome friends that they’ve made.

And if you are trans/non-binary it will be a day to breathe free and just be with others like you, and an opportunity to rekindle hope and reclaim our joy.

Categories: FLOSS Project Planets

Qt Creator 8 Beta released

Planet KDE - Fri, 2022-06-17 05:41

We are happy to announce the release of Qt Creator 8 Beta!

Categories: FLOSS Project Planets

Web Review, Week 2022-24

Planet KDE - Fri, 2022-06-17 04:12

Let’s go for my web review for the week 2022-24.

The Cult in Google

Tags: google, cult, surprising

Such cults are freaky… I’m always baffled at their ability to put their fangs into an organization and grow in there.

https://medium.com/@kwilliamlloyd/the-cult-in-google-3c1a910214d1


Firefox Rolls Out Total Cookie Protection By Default To All Users

Tags: tech, browser, firefox, surveillance

This is definitely a welcome move to improve protection from tracking.

https://blog.mozilla.org/en/products/firefox/firefox-rolls-out-total-cookie-protection-by-default-to-all-users-worldwide/


Don’t be that open-source user, don’t be me

Tags: tech, foss, maintenance, economics

This is definitely well put: users shouldn’t feel entitled. Maintainers do what they can (even if there’s a company backing your favorite FOSS project), and if you use the software for free with no support contract… things will be done when they’re done.

https://jacobtomlinson.dev/posts/2022/dont-be-that-open-source-user-dont-be-me/


Cool desktops don’t change 😎

Tags: tech, desktop, low-tech

I didn’t know about the Lindy effect, this is an interesting point. Obviously I have a different setup (Plasma has been around longer than XMonad after all) but the overall advice is good.

https://tylercipriani.com/blog/2022/06/15/choose-boring-desktop-technology/


Symbiote, a nearly-impossible-to-detect Linux malware - Security Affairs

Tags: tech, linux, security

Alright, this one looks somewhat concerning…

https://securityaffairs.co/wordpress/132113/malware/symbiote-linux-malware.html


⚡️ The computers are fast, but you don’t know it

Tags: tech, programming, python, c++, optimization, performance, data-science

And this is why you likely need to optimize your data pipelines at some point. There are plenty of levers available.

http://shvbsle.in/computers-are-fast-but-you-dont-know-it-p1/


Product Backlog Building Canvas

Tags: tech, agile, product-management

Good ideas to improve your user stories. I often see incomplete stories; a story doesn’t stop at its title, there’s more to do. The proposed canvas is interesting and definitely helps.

https://martinfowler.com/articles/product-backlog-building-canvas.html


Cargo Culting Software Engineering Practices

Tags: tech, agile, engineering, craftsmanship

As always, what really matters in the end is the context.

https://isthisit.nz/posts/2022/cargo-culting-software-engineering-practices/


Tune Software Development for Rate of Change, not Rate of Progress

Tags: tech, agile, project-management

Interesting take as usual. Utilization doesn’t matter, throughput is what you need to keep in mind.

https://medium.com/@kentbeck_7670/tune-software-development-for-rate-of-change-not-rate-of-progress-56f93c15a769


The Floppotron 3.0 » Silent’s Homepage

Tags: tech, funny

Alright, this definitely escalated beyond imagination. Still it’s a fun project.

http://silent.org.pl/home/2022/06/13/the-floppotron-3-0/


Bye for now!

Categories: FLOSS Project Planets

Dima Kogan: Ricoh GR IIIx 802.11 reverse engineering

Planet Debian - Fri, 2022-06-17 01:04

I just got a fancy new camera: Ricoh GR IIIx. It's pretty great, and I strongly recommend it to anyone who wants a truly pocketable camera with fantastic image quality and full manual controls. One annoyance is the connectivity. It does have both Bluetooth and 802.11, but the only official method of using them is some dinky closed phone app. This is silly. I just did some reverse-engineering, and I now have a functional shell script to download the last few images via 802.11. This is more convenient than plugging in a wire or pulling out the memory card. Fortunately, Ricoh didn't bend over backwards to make the reversing difficult, so to figure it out I didn't even need to download the phone app and sniff its traffic.

When you turn on the 802.11 on the camera, it says stuff about an ESSID and password, so clearly the camera runs its own access point. Not ideal, but it's good enough. I connected and ran nmap to find hosts and open ports: only port 80 on 192.168.0.1 is open. Pointing curl at it yields an error, so I need to figure out the valid endpoints. I downloaded the firmware binary and tried to figure out what's in it:

dima@shorty:/tmp$ binwalk fwdc243b.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
3036150       0x2E53F6        Cisco IOS microcode, for "8"
3164652       0x3049EC        Certificate in DER format (x509 v3), header length: 4, sequence length: 5412
5472143       0x537F8F        Copyright string: "Copyright ("
6128763       0x5D847B        PARity archive data - file number 90
10711634      0xA37252        gzip compressed data, maximum compression, from Unix, last modified: 2022-02-15 05:47:23
13959724      0xD5022C        MySQL ISAM compressed data file Version 11
24829873      0x17ADFB1       MySQL MISAM compressed data file Version 4
24917663      0x17C369F       MySQL MISAM compressed data file Version 4
24918526      0x17C39FE       MySQL MISAM compressed data file Version 4
24921612      0x17C460C       MySQL MISAM compressed data file Version 4
24948153      0x17CADB9       MySQL MISAM compressed data file Version 4
25221672      0x180DA28       MySQL MISAM compressed data file Version 4
25784158      0x1896F5E       Cisco IOS microcode, for "\"
26173589      0x18F6095       MySQL MISAM compressed data file Version 4
28297588      0x1AFC974       MySQL ISAM compressed data file Version 6
28988307      0x1BA5393       MySQL ISAM compressed data file Version 3
28990184      0x1BA5AE8       MySQL MISAM index file Version 3
29118867      0x1BC5193       MySQL MISAM index file Version 3
29449193      0x1C15BE9       JPEG image data, JFIF standard 1.01
29522133      0x1C278D5       JPEG image data, JFIF standard 1.08
29522412      0x1C279EC       Copyright string: "Copyright ("
29632931      0x1C429A3       JPEG image data, JFIF standard 1.01
29724094      0x1C58DBE       JPEG image data, JFIF standard 1.01

The gzip chunk looks like what I want:

dima@shorty:/tmp$ tail -c+10711635 fwdc243b.bin > /tmp/tst.gz
dima@shorty:/tmp$ < /tmp/tst.gz gunzip | file -
/dev/stdin: ASCII cpio archive (SVR4 with no CRC)
dima@shorty:/tmp$ < /tmp/tst.gz gunzip > tst.cpio
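For anyone who would rather script this step, here is a minimal Python sketch of the same carve-and-decompress operation. It assumes the firmware image fits in memory, and `find_gzip_offsets` and `carve_gzip` are hypothetical helper names, not part of binwalk. It also makes the off-by-one in the shell commands explicit: binwalk prints 0-based offsets, while `tail -c+N` starts output at byte N counting from 1, hence 10711634 + 1 = 10711635. Python slicing is 0-based, so the binwalk offset is used as-is.

```python
import zlib

# Gzip member header: ID1=0x1f, ID2=0x8b, CM=8 (deflate). Scanning for these
# bytes roughly mirrors binwalk's gzip signature; matches can be false
# positives, so a candidate is only trusted once it actually decompresses.
GZIP_MAGIC = b"\x1f\x8b\x08"


def find_gzip_offsets(blob: bytes) -> list:
    """Return every 0-based offset where a gzip header might start."""
    offsets = []
    pos = blob.find(GZIP_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(GZIP_MAGIC, pos + 1)
    return offsets


def carve_gzip(blob: bytes, offset: int) -> bytes:
    """Decompress the single gzip member that starts at `offset`.

    Roughly equivalent to `tail -c+(offset + 1) fw.bin | gunzip`: whatever
    follows the member (the rest of the firmware) ends up in
    `unused_data` instead of being treated as part of the stream.
    """
    # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    data = decomp.decompress(blob[offset:])
    if not decomp.eof:
        raise ValueError(f"gzip stream at offset {offset} is truncated")
    return data
```

With the file from above, `carve_gzip(open("fwdc243b.bin", "rb").read(), 10711634)` should yield the same bytes as tst.cpio. Using `decompressobj` rather than `gzip.decompress` is deliberate: the latter expects the input to contain only gzip members and would typically reject the trailing firmware bytes.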

OK, we have some .cpio thing. It's plain-text. I grep around in it, looking for GET and POST and such, and I see various URI-looking things at /v1/..... Grepping for that, I see

dima@shorty:/tmp$ strings tst.cpio | grep /v1/
GET /v1/debug/revisions
GET /v1/ping
GET /v1/photos
GET /v1/props
PUT /v1/params/device
PUT /v1/params/lens
PUT /v1/params/camera
GET /v1/liveview
GET /v1/transfers
POST /v1/device/finish
POST /v1/device/wlan/finish
POST /v1/lens/focus
POST /v1/camera/shoot
POST /v1/camera/shoot/compose
POST /v1/camera/shoot/cancel
GET /v1/photos/{}/{}
GET /v1/photos/{}/{}/info
PUT /v1/photos/{}/{}/transfer
/v1/photos/<string>/<string>
/v1/photos/<string>/<string>/info
/v1/photos/<string>/<string>/transfer
/v1/device/finish
/v1/device/wlan/finish
/v1/lens/focus
/v1/camera/shoot
/v1/camera/shoot/compose
/v1/camera/shoot/cancel
/v1/changes
/v1/changes message received.
/v1/changes issue event.
/v1/changes new websocket connection.
/v1/changes websocket connection closed. reason({})
/v1/transfers, transferState({}), afterIndex({}), limit({})
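The `strings | grep /v1/` step can also be approximated in Python. This is only a sketch: `printable_strings` imitates the default behaviour of GNU `strings` (runs of at least four printable ASCII characters) loosely, and the split into "METHOD /path" routes versus bare paths and log messages is a heuristic read off the output above, not anything Ricoh documents. All function names here are hypothetical.

```python
import re

# Runs of 4+ printable ASCII bytes (tab, or space through tilde), similar to
# what GNU strings prints by default.
PRINTABLE_RUN = re.compile(rb"[\t\x20-\x7e]{4,}")
# An HTTP method followed by a /v1/ path, e.g. "GET /v1/photos/{}/{}".
ROUTE = re.compile(r"^(GET|PUT|POST|DELETE) (/v1/\S*)$")


def printable_strings(data: bytes) -> list:
    """Extract printable ASCII runs from binary data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def api_strings(data: bytes, marker: str = "/v1/") -> list:
    """Rough equivalent of `strings FILE | grep MARKER`."""
    return [s for s in printable_strings(data) if marker in s]


def parse_routes(lines) -> list:
    """Keep only "METHOD /v1/..." lines, split into (method, path) pairs."""
    routes = []
    for line in lines:
        match = ROUTE.match(line)
        if match:
            routes.append((match.group(1), match.group(2)))
    return routes
```

Feeding the contents of tst.cpio through `parse_routes(api_strings(...))` should give a list of (method, path) pairs like ("GET", "/v1/ping"), leaving out the websocket log messages.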

Jackpot. I pointed curl at most of these, and they do interesting things. Generally they all spit out JSON. /v1/liveview sends out a sequence of JPEG images. The things I care about are /v1/photos/DIRECTORY/FILE and /v1/photos/DIRECTORY/FILE/info. The result is a script I just wrote to connect to the camera, download N images, and connect back to the original access point:

https://github.com/dkogan/ricoh-download

Kinda crude, but works for now. I'll improve it with time.
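A minimal Python sketch of the download step is shown below. It is not the ricoh-download script. It assumes the camera is reachable at 192.168.0.1 on port 80 as found above, that /v1/photos/DIRECTORY/FILE returns the raw image, and that .../info returns JSON. The post only says the endpoints generally return JSON, so the sketch does not assume any particular fields. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# The camera's own access point; only port 80 was found open.
CAMERA = "http://192.168.0.1"


def photo_url(directory: str, filename: str, info: bool = False) -> str:
    """Build /v1/photos/DIRECTORY/FILE or /v1/photos/DIRECTORY/FILE/info."""
    path = "/v1/photos/{}/{}".format(
        urllib.parse.quote(directory, safe=""),
        urllib.parse.quote(filename, safe=""),
    )
    if info:
        path += "/info"
    return CAMERA + path


def photo_info(directory: str, filename: str, timeout: float = 10.0):
    """Fetch and parse the JSON metadata for one photo (schema not assumed)."""
    url = photo_url(directory, filename, info=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def download_photo(directory: str, filename: str, dest: str,
                   timeout: float = 60.0) -> None:
    """Stream one image to `dest` in chunks instead of loading it all at once."""
    url = photo_url(directory, filename)
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(dest, "wb") as out:
        while True:
            chunk = resp.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
```

With placeholder names such as `download_photo("100RICOH", "R0000123.JPG", "R0000123.JPG")` (both names hypothetical, not taken from the camera), this would issue the same GET that curl does. Streaming in chunks keeps memory use flat for large raw files.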

After I did this I found an old thread from 2015 where somebody was using an apparently-compatible camera, and wrote a fancier tool:

https://www.pentaxforums.com/forums/184-pentax-k-s1-k-s2/295501-k-s2-wifi-laptop-2.html

Categories: FLOSS Project Planets

Lullabot: Lullabot Podcast: The New Olivero Theme – Awesome to the Core

Planet Drupal - Thu, 2022-06-16 23:30

A group of Lullabots (and former 'bot and podcast co-host Mike Herchel) get together to discuss the new default theme in Drupal 9 and 10 that they helped build.

The theme called "Olivero" is as beautiful as it is flexible and accessible.

The team talks about the immense amount of work it took for a project of such high visibility in the Drupal community.

Categories: FLOSS Project Planets

Event Organizers: Camp Debrief: Stanford WebCamp 2022

Planet Drupal - Thu, 2022-06-16 22:09

This is the first in a series of “Camp Debriefs” by the Drupal Event Organizer Working Group. In this debrief, Fei Lauren (feilauren) interviews Irina Zaks (irinaz) about Stanford WebCamp 2022. If you would like your Drupal event to be featured in a Camp Debrief, contact the EOWG.

Irina first got involved with Drupal around 2006. After attending BADcamp in 2007, she decided the South Bay Area needed its own camp and Stanford could be a host. “We have to have this [DrupalCamp] experience here without driving that far”.

Still relatively new to the community, she started reaching out to people who could help put together a proposal to Stanford University. It was approved and the university agreed to provide the space at a minimal cost. Stanford Camp was born and appeared to the public on Jan 23, 2010. 

What is the biggest challenge involved in starting a new camp or event? 

“If you want to do this camp, do it for yourself… do it because you feel that it is important. For you. For your friends. For the people.” 

In spite of the enthusiasm people may have about helping, there will be times when they simply don’t or can’t show up. We can’t always expect others to have the same passion or inspiration that we have, but keep going, Irina says. Do it not because other people think it’s a good idea, but because it’s important to you.

“And then getting the word out there. Outreach is the most challenging part - reaching out to people who are struggling to work with Drupal, and they aren’t even aware that they can come to the camp and get support.”

It’s important to keep momentum and meet regularly. An ideal format for organizers might be one smaller meeting in the fall to connect with other organizers, then starting regular meetings 2 - 3 months before the event to start working and planning.  

What have you learned about doing events in-person vs online? 

A huge obstacle has always been to find enough rooms. This is true in many cases - even for casual local meetups, finding an appropriate venue can be a challenge. But for a large camp, Irina warns, the difficulty and cost scale.

In the wake of the worst of the pandemic, we understand that both remote and in-person events have value. On one hand, humans are deeply social creatures and we need to connect in person. But when sessions are broadcast online, many more people can be reached for a fraction of the cost. So Irina and the other Stanford WebCamp organizers explored what aspects of each are most valuable and came to the conclusion that a hybrid event would be the most successful.

Hybrid models are great in theory, but can feel isolating for remote attendees - how can we keep people engaged?


The Stanford WebCamp 2021 organizing team

Stanford WebCamp’s solution is remarkably simple: everyone joins sessions remotely. In person, there is a reception or a lunch so that people who can make it have an opportunity to network and socialize. But it’s not required - even if folks are in the same room, attendees for sessions are each on Zoom.

“We all have the same experience”, Irina says. 

What are some of the things that go wrong?

“There are things that go wrong - and that’s okay”.

She plainly points out that in many cases, people are reasonably patient and folks will work together to get things back on track. The Drupal community is all about collaboration, after all.

But something that helps is having multiple people ready to help keep things on track. For example, a moderator and a backup moderator.

What can we do to push through the burnout and wrap everything up?

“Have joy in what you do”, she advises. 

But she also talks about how important it is to set ourselves up for success by ensuring we find meaning in our work, but also that we aren’t taking on more than we can sustain.

“Pick the mountain that’s right for you.”

Don’t reinvent wheels and make anything more difficult than it needs to be. Re-use wording for sessions that are similar, don’t rebuild the website, but most importantly - remember that it should be meaningful.

But even so, we can sometimes lose sight of what matters to us. We forget why we are doing the work. In this case, Irina goes back to the values she wrote in 2014 with Tori Lewis, Director of Projects, when she started Fibonacci Web Studio.

Okay - we are inspired. What’s next?

Well, Stanford WebCamp is free. I think it’s safe to say it’s worth checking out their website and signing up to get updates for next year’s event.

Learn more about the Event Organizer’s Working Group, join their monthly meetings, or read up on some of the steps you might want to take to organize a camp of your own.

Categories: FLOSS Project Planets

Wingware: Wing Python IDE Version 8.3.2 - June 17, 2022

Planet Python - Thu, 2022-06-16 21:00

Wing 8.3.2 fixes several code intelligence issues for f-string expressions, avoids problems when using ~ or a non-default Base Directory with remote hosts, allows running a pytest parameterized test with a float value, adds the option Use Fixed Width Font for Output to the Testing tool's right-click context menu, scrolls correctly to the current line in version control blame/praise output, fixes intermittent PyQt & PySide autocompletion problems, and makes a number of other usability improvements.

See the change log for details.

Download Wing 8.3 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products


What's New in Wing 8.3

Support for Containers and Clusters

Wing 8 adds support for developing, testing, and debugging Python code that runs inside containers, such as those provided by Docker and LXC/LXD, and clusters of containers managed by a container orchestration system like Docker Compose. A new Containers tool can be used to start, stop, and monitor container services, and new Docker container environments may be created during project creation.

For details, see Working with Containers and Clusters.

New Package Management Tool

Wing 8 adds a new Packages tool that provides the ability to install, remove, and update packages found in the Python environment used by your project. This supports pipenv, pip, and conda as the underlying package manager. Packages may be selected manually from PyPI or by package specifications found in a requirements.txt or Pipfile.

For details, see Package Manager.

Improved Project Creation

Wing 8 redesigns New Project support so that the host, project directory, Python environment, and project type may all be selected independently. New projects may use either an existing or newly created source directory, optionally cloning code from a revision control repository. An existing or newly created Python environment may be selected, using virtualenv, pipenv, conda, or Docker.

Improved Python Code Analysis and Warnings

Wing 8 expands the capabilities of Wing's static analysis engine, by improving its support for f-strings, named tuples, and other language constructs. Find Uses, Refactoring, and auto-completion now work within f-string expressions, Wing's built-in code warnings work with named tuples, the Source Assistant displays more detailed and complete value type information, and code warning indicators are updated more cleanly during edits.

Improved Remote Development

Wing 8 makes it easier to configure remote development and provides more flexibility in conforming to local security policies. The new built-in SSH implementation may be used instead of separately configuring OpenSSH or PuTTY. Remote development can now also work without access to an SSH agent: Wing will prompt for passwords, private key passphrases, and other input needed to establish an SSH connection.

And More

Wing 8 also adds support for Python 3.10, including match/case statements, native executable for Apple Silicon (M1) hardware, a new Nord style display theme, reduced application startup time, support for Unreal Engine, Delete Symbol and Rename Current Module refactoring operations, improved debug stepping and exception handling in async code, remote development without an SSH agent, expanded support for branching, stashing/shelving and other operations for Git and Mercurial, and much more.

For a complete list of new features in Wing 8, see What's New in Wing 8.


Try Wing 8.3 Now!

Wing 8.3 is an exciting new step for Wingware's Python IDE product line. Find out how Wing 8 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 7 and earlier, and Migrating from Older Versions for a list of compatibility notes.

Categories: FLOSS Project Planets

Adding Spaces Horizontal Bar to NeoChat - GSoC'22 post #3

Planet KDE - Thu, 2022-06-16 20:00

Hi!

This is my third post during Google Summer of Code 2022.

During the first week of the coding period, I tried my hand at adding a horizontally scrolling bar on top of the room list, which would show the user's joined spaces.

The first attempt ended in failure, because I was used to using setContext() for controlling QML via C++. NeoChat uses a different method of exposing classes, though. Tobias helped me understand the method NeoChat uses.

I gave the thing another try and got some success this time.

I added a new role in roomlistmodel, named IsSpaceRole. This calls the function isSpace() from neochatroom. The function checks the room creation event and determines whether a given room is a space or not.
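In the Matrix protocol, a space is an ordinary room whose m.room.create state event carries `"type": "m.space"` in its content. That is the field a check like isSpace() has to inspect. The minimal Python sketch below illustrates the rule. It assumes the room's state is available as a list of event dicts in the client-server API's JSON shape, and it is not NeoChat's code, which is C++.

```python
def is_space(state_events) -> bool:
    """Return True if the room's creation event marks it as a space."""
    for event in state_events:
        # The creation event is the m.room.create state event (empty state_key).
        if event.get("type") == "m.room.create" and event.get("state_key", "") == "":
            return event.get("content", {}).get("type") == "m.space"
    # No creation event in the supplied state: cannot be identified as a space.
    return False
```

Because the type is fixed when the room is created, the result never changes for a given room, so a model role like IsSpaceRole can safely compute it once and cache it.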

On the UI part, there was a horizontal scrolling UI module used elsewhere, which I reused.

When it came to integrating the UI component into actual Room List Page, things again took a hit. My first try was to wrap Room List and Space List into a Row layout. That made the Room List not show rooms, and only the categories.

Carl suggested that I put the Space list as the header of the Scrollable Page. Doing so gave a better result, apart from the fact that the Space list now overlaps the Room List.

Tobias suggested that specifying the height of the Space List should fix that. I also need to fix the issue of invisible rooms taking up width in the Space list.

For the coming week, I plan to implement room filtering, so that when the user clicks on a certain Space, only the rooms corresponding to that Space are visible.
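Room filtering could build on the same state data. A space lists its children as m.space.child state events, with the child room's ID as the state_key. A child is removed by replacing that event with empty content, so this sketch treats any entry without a non-empty `via` list as absent. It assumes the selected space's state events and the user's joined room IDs are already at hand. `space_children` and `rooms_in_space` are hypothetical helpers, not NeoChat API.

```python
def space_children(space_state) -> set:
    """Room IDs that a space lists as children via m.space.child events."""
    children = set()
    for event in space_state:
        if event.get("type") != "m.space.child":
            continue
        via = event.get("content", {}).get("via")
        # Removed children have empty content, so skip entries without "via".
        if isinstance(via, list) and via:
            children.add(event["state_key"])
    return children


def rooms_in_space(joined_room_ids, space_state) -> list:
    """Keep only joined rooms that the selected space lists as children."""
    children = space_children(space_state)
    return [room_id for room_id in joined_room_ids if room_id in children]
```

This only looks one level deep. Rooms inside sub-spaces would need a recursive walk, or the server-side hierarchy endpoint.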

Categories: FLOSS Project Planets

FSF Events: Friday Free Software Directory on IRC: June 17 starting at 12:00 p.m. EDT/16:00 UTC

GNU Planet! - Thu, 2022-06-16 15:50
Join the FSF and friends this Friday, June 17, from 12:00 p.m. to 3 p.m. EDT (16:00 to 19:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

Agaric Collective: Uniting Visions: Kicking off Thursday 3pm ET planning & building sessions for democratic conversation scaling platform, Visions Unite

Planet Drupal - Thu, 2022-06-16 12:28

Hi friends and collaborators, join us today at 3pm ET (or any subsequent Thursday at 3) as we kick off a series of research, planning, discussion, and building sessions for Visions Unite.

As our primary pro bono project, Agaric is working on Visions Unite, "where people seeking to make the world more whole can share ideas and information and gather the commitment and resources to build power to be the change we need". A dozen projects have tried to do this; what makes Visions Unite different is sharing power via democratic mass communication.

Here are some initial user stories for Visions Unite.

Help plan and build the interface and underlying technology! (Drupal friends, we have been leaning against Drupal but might do it for the MVP— would love to hear your thoughts for or against.)

Connection info will always be up-to-date at agaric.coop/show (for these sessions we are taking over most of our Show & Tell hour, which is weekly on Thursdays 3pm Eastern).

Read more and discuss at agaric.coop.

Categories: FLOSS Project Planets
