Planet Debian

Planet Debian - https://planet.debian.org/

Joerg Jaspert: Rust? Munin? munin-plugin…

Thu, 2022-05-19 16:33

My first Rust crate: munin-plugin

Sooo, some time ago I had to rewrite a munin plugin from Shell to Rust, because the shell version went crazy after some runtime, using up a whole CPU for itself. Sure, it only did that on systems with Oracle Database installed, so that monster seems to be the culprit (who would have guessed?), but somehow I had to fix up this plugin and wasn’t allowed to drop that wannabe-database.

A while later I wrote a plugin to graph Fibre Channel Host data, and then network interface statistics, all with a one-second resolution for the graphs, to allow one to zoom in and see every spike, rather than have RRD round off the interesting parts.

As one can imagine, that turns out to be a lot of very similar code - after all, most of the difference is in the graph config statements and the actual data gathering; the rest of the code is just the same.

As I already know there are more plugins (hello, rsyslog statistics) I have to (sometimes re-)write in Rust, I took some time and wrote myself a Rust library to make writing munin plugins in Rust easier. Yay, my first crate on crates.io (and I wrote lots of docs for it).

By now I have made my one-second-resolution CPU load plugin and the one-second-resolution network interface plugin use this lib. To test the lib on less complicated plugins, I took the munin default plugin “load” (Linux variant) and made a Rust version of it, mostly to see that something as simple as that is also easy to implement: Munin load
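For illustration, here is a minimal sketch of what such a plugin boils down to, written against the bare munin plugin protocol rather than against the crate’s actual API (whose trait and function names may differ):

    use std::env;
    use std::fs;

    fn main() -> std::io::Result<()> {
        // Munin calls a plugin with "config" to learn the graph
        // definition, and with no argument to fetch the current values.
        if env::args().nth(1).as_deref() == Some("config") {
            println!("graph_title Load average");
            println!("graph_vlabel load");
            println!("graph_category system");
            println!("load.label load");
        } else {
            // The first field of /proc/loadavg is the 1-minute load average.
            let loadavg = fs::read_to_string("/proc/loadavg")?;
            let one_min = loadavg.split_whitespace().next().unwrap_or("U");
            println!("load.value {}", one_min);
        }
        Ok(())
    }

A library can then factor out the shared argument handling and output plumbing, leaving only the graph config and the data gathering to be written per plugin.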

I have some ideas on how to provide a useful default implementation of the fetch function, so one can write even less code when using this library.

It is my first library in Rust, so if you see something bad or missing in there, feel free to open issues or pull requests.

Now, having done this, one thing is missing: someone to (re)write munin itself in something that is actually fast… Not munin-node, but munin. Or maybe just the RRD usage: with a few hundred nodes and loads of graphs, we had to adjust the munin code and change some timeout or it would commit suicide regularly. And make another code change so it does not always check for a rename, or something like that. And run parts of the default cronjob only once an hour, not on every update run. And switch to fetching data over ssh (and munin-async on the nodes). And use rrdcached with loads of caching for the trillions of files (currently ~800G of data). And it still needs way more CPU than it should. Soo, lots of possible optimizations hidden in there, though I bet a rewrite in a non-scripting language might gain the most. (Except, of course, someone needs to do it… :) )


Gunnar Wolf: I do have a full face

Wed, 2022-05-18 10:52

I have been a bearded subject since I was 18, back in 1994. Yes, during 1999-2000, I shaved for my military service, and I briefly tried the goatee look in 2008… Few people nowadays can imagine my face without a forest of hair.

But sometimes, life happens. And, unlike my good friend Bdale, I didn’t get Linus to do the honors… But, all in all, here I am:

Turns out, I have been suffering from quite bad skin infections for a couple of years already. Last Friday, I checked in to the hospital with an ugly, swollen face (I won’t put you through that), and the hospital staff decided it was in my best interests to trim my beard. And then some more. And then shave me. I sat in the hospital for four days, getting soaked (medical term) with antibiotics and other stuff, got my prescriptions for the next few days, and… well, I really hope that’s the end of the infections. We shall see!

So, this is the result of the loving and caring work of three different nurses. Yes, not clean-shaven (I should not trim it further, as shaving blades are a risk of reinfection).

Anyway… I guess the bits of hair you see all over the place will not take too long to become a beard again, and maybe even a somewhat respectable one. But I thought some of you would like to see the real me™ 😉

PS- Thanks to all who have reached out with good wishes. All is fine!


Reproducible Builds: Supporter spotlight: Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix

Wed, 2022-05-18 06:00

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do.

This is the fourth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project.

We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC and the Google Open Source Security Team (GOSST). Today, however, we will be talking with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix.


Chris Lamb: Hi Jan, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself?

Jan: Thanks for the chat; it’s been a while! Well, I’ve always been trying to find something new and interesting that is just asking to be created but is mostly being overlooked. That’s how I came to work on GNU Guix and create GNU Mes to address the bootstrapping problem that we have in free software. It’s also why I have been working on releasing Dezyne, a programming language and set of tools to specify and formally verify concurrent software systems as free software.

Briefly summarised, compilers are often written in the language they are compiling. This creates a chicken-and-egg problem which leads users and distributors to rely on opaque, pre-built binaries of those compilers that they use to build newer versions of the compiler. To gain trust in our computing platforms, we need to be able to tell how each part was produced from source, and opaque binaries are a threat to user security and user freedom since they are not auditable. The goal of bootstrappability (and the bootstrappable.org project in particular) is to minimise the amount of these “bootstrap” binaries.

Anyway, after studying Physics at Eindhoven University of Technology (TU/e), I worked for digicash.com, a startup trying to create a digital and anonymous payment system – sadly, however, a traditional account-based system won. Separate to this, as there was no software (either free or proprietary) to automatically create beautiful music notation, together with Han-Wen Nienhuys, I created GNU LilyPond. Ten years ago, I took the initiative to co-found a democratic school in Eindhoven based on the principles of sociocracy. And last Christmas I finally went vegan, after being mostly vegetarian for about 20 years!


Chris: For those who have not heard of it before, what is GNU Guix? What are the key differences between Guix and other Linux distributions?

Jan: GNU Guix is both a package manager and a full-fledged GNU/Linux distribution. In both forms, it provides state-of-the-art package management features such as transactional upgrades and package roll-backs, hermetically-sealed build environments, unprivileged package management, as well as per-user profiles. One obvious difference is that Guix forgoes the usual Filesystem Hierarchy Standard (i.e. /usr, /lib, etc.), but there are other significant differences too, such as Guix being scriptable using Guile/Scheme, as well as Guix’s dedication and focus on free software.


Chris: How does GNU Guix relate to GNU Mes? Or, rather, what problem is Mes attempting to solve?

Jan: GNU Mes was created to address the security concerns that arise from bootstrapping an operating system such as Guix. Even if this process involves only free software (i.e. the source code is, at least, available), it commonly relies on large and unauditable binary blobs.

Mes is a Scheme interpreter written in a simple subset of C and a C compiler written in Scheme, and it comes with a small, bootstrappable C library. Twice, the Mes bootstrap has halved the size of opaque binaries that were needed to bootstrap GNU Guix. These reductions were achieved by first replacing GNU Binutils, GNU GCC and the GNU C Library with Mes, and then replacing Unix utilities such as awk, bash, coreutils, grep, sed, etc., with Gash and Gash-Utils. The final goal of Mes is to help create a full-source bootstrap for any interested UNIX-like operating system.


Chris: What is the current status of Mes?

Jan: Mes supports all that is needed from ‘R5RS’ and GNU Guile to run MesCC with Nyacc, the C parser written for Guile, for 32-bit x86 and ARM. The next step for Mes would be to become more compatible with Guile, e.g., to have Guile-module support and to support running Gash and Gash-Utils.

In working to create a full-source bootstrap, I have disregarded the kernel and Guix build system for now, but otherwise, all packages should be built from source, and obviously, no binary blobs should go in. We still need a Guile binary to execute some scripts, and it will take at least another one to two years to remove that binary. I’m using the 80/20 approach, cutting corners initially to get something working and useful early.

Another metric would be how many architectures we have. We have come quite a way with ARM: tinycc now works, but there are still problems with GCC and Glibc. RISC-V is coming, too, which could be another metric. Someone has looked into picking up NixOS this summer. “How many distros do anything about reproducibility or bootstrappability?” The bootstrappability community is so small that we don’t ‘need’ metrics, sadly. The number of bytes of binary seed is a nice metric, but running the whole thing on a full-fledged Linux system is tough to put into a metric. Also, it is worth noting that I’m developing on a modern Intel machine (i.e. a platform with a management engine), and that is another key component that doesn’t have metrics.


Chris: From your perspective as a Mes/Guix user and developer, what does ‘reproducibility’ mean to you? Are there any related projects?

Jan: From my perspective, I’m more into the problem of bootstrapping, and reproducibility is a prerequisite for bootstrappability. Reproducibility clearly takes a lot of effort to achieve, however. It’s relatively easy to install some Linux distribution and be happy, but if you look at communities that really care about security, they are investing in reproducibility and other ways of improving the security of their supply chain. Projects I believe are complementary to Guix and Mes include NixOS, Debian and — on the hardware side — the RISC-V platform shares many of our core principles and goals.


Chris: Well, what are these principles and goals?

Jan: Reproducibility and bootstrappability often feel like the “next step” in the frontier of free software. If you have all the sources and you can’t reproduce a binary, that just doesn’t “feel right” anymore. We should start to desire (and demand) transparent, elegant and auditable software stacks. To a certain extent, that’s always been a low-level intent since the beginning of free software, but something clearly got lost along the way.

On the other hand, if you look at the NPM or Rust ecosystems, we see a world where people directly install binaries. As they are not as supportive of copyleft as the rest of the free software community, you can see that movement, and people in our area are doing more in response, so that what we have continues to remain free, and to prevent us from falling asleep and waking up in a couple of years to see, for example, Rust in the Linux kernel and (more importantly) big binary blobs being required to use our systems. It’s an excellent time to advance right now, so we should gain a foothold and ensure we don’t lose any more.


Chris: What would be your ultimate reproducibility goal? And what would the key steps or milestones be to reach that?

Jan: The “ultimate” goal would be to have a system built with open hardware, with all software on it fully bootstrapped from its source. This bootstrap path should be easy to replicate and straightforward to inspect and audit. All fully reproducible, of course! In essence, we would have solved the supply chain security problem.

Jan: Our biggest challenge is ignorance. There is much unawareness about the importance of what we are doing. As it is rather technical and doesn’t really affect everyday computer use, that is not surprising. This unawareness can be a great force driving us in the opposite direction. Think of Rust being allowed in the Linux kernel, or Python being required to build a recent GNU C Library (glibc). There is also the fact that companies like Google and Apple still want to play “us” vs “them”, and are not willing to support GPL software; they are not yet ready to truly support user freedom.

Take the infamous log4j bug — everyone is using “open source” these days, but nobody wants to take responsibility and help develop or nurture the community. Not “ecosystem”, as that’s how it’s being approached right now: live and let live (or die), and see what happens without taking any responsibility. We are growing and we are strong and we can do a lot… but if we have to work against those powers, it can become problematic. So, let’s spread our great message and get more people involved!


Chris: What has been your biggest win?

Jan: From a technical point of view, the “full-source” bootstrap has been our biggest win. A talk by Carl Dong at the 2019 Breaking Bitcoin conference stated that connecting Jeremiah Orians’ Stage0 project to Mes would be the “holy grail” of bootstrapping, and we recently managed to achieve just that: in other words, starting from hex0, a 357-byte binary, we can now build the entire Guix system.

This past year we have not made significant visible progress, however, as our funding was unfortunately not there. The Stage0 project has advanced its RISC-V support. A month ago, though, I secured NLnet funding for another year, and thanks to NLnet, Ekaitz Zarraga and Timothy Sample will work on GNU Mes and the Guix bootstrap as well. Separate to this, the bootstrappable community has grown a lot from the two people it was six years ago: there are currently over 100 people in the #bootstrappable IRC channel, for example. The enlarged community is possibly an even more important win going forward.


Chris: How realistic is a 100% bootstrappable toolchain? And from someone who has been working in this area for a while, is “solving Trusting Trust” actually feasible in reality?

Jan: Two answers: Yes and no, it really depends on your definition. One great thing is that the whole Stage0 project can also run on the Knight virtual machine, a hardware platform that was designed, I think, in the 1970s. I believe we can and must do better than we are doing today, and that there’s a lot of value in it.

The core issue is not the trust; we can probably all trust each other. On the other hand, we shouldn’t need to trust each other, or even ourselves. I am not, personally, going to inspect my RISC-V laptop, and the people who create the hardware probably do not want to inspect the software. The answer comes back to being conscientious and doing what is right. Inserting GCC as a binary blob is not right. I think we can do better, and that’s what I’d like to do. The security angle is interesting, but I don’t like the paranoid part of that; I like the beauty of what we are creating together and stepwise improving on that.


Chris: If someone wanted to get in touch or learn more about GNU Guix or Mes, where might they go?

Jan: Sure! First, check out the GNU Mes (https://www.gnu.org/software/mes/) and GNU Guix (https://guix.gnu.org/) websites, as well as https://bootstrappable.org/.

I’m also on Twitter (@janneke_gnu) and on octodon.social (@janneke@octodon.social).


Chris: Thanks for taking the time to talk to us today.

Jan: No problem. :)



For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.


Louis-Philippe Véronneau: Clojure Team 2022 Sprint Report

Tue, 2022-05-17 15:15

This is the report for the Debian Clojure Team remote sprint that took place on May 13-14th.

Looking at my previous blog entries, this was my first Debian sprint since July 2020! Crazy how fast time flies...

Many thanks to those who participated, namely:

  • Rob Browning (rlb)
  • Elana Hashman (ehashman)
  • Jérôme Charaoui (lavamind)
  • Leandro Doctors (allentiak)
  • Louis-Philippe Véronneau (pollo)

Sadly, Utkarsh Gupta — although he had planned on participating — ended up not being able to, and worked on DebConf Bursary paperwork instead.

rlb

Rob mostly worked on creating a dh-clojure tool to help make packaging Clojure libraries easier.

At the moment, most of the packaging is done manually, by invoking build tools by hand. Having a tool to automate many of the steps required to build Clojure packages would go a long way in making them more uniform.

His work (although still very much a WIP) can be found here: https://salsa.debian.org/rlb/dh-clojure/

ehashman

Elana:

  • Finished the Java Team VCS migration to the Clojure Team namespace.
  • Worked on updating Leiningen to 2.9.8.
  • Proposed an upstream dependency update in Leiningen to match Debian's most recent version.
  • Gave pollo Owner access on the Clojure Team namespace and added lavamind as a Developer.
  • Uploaded Clojure 1.10.3-1.
  • Updated sjacket-clojure to version 0.1.1.1 and uploaded it to experimental.
  • Added build tests to spec-alpha-clojure.
  • Filed bug #1010995 for missing test dependency for Clojure.
  • Closed bugs #976151, #992735 and #992736.

lavamind

It was Jérôme's first time working on Clojure packages, and things went great! During the sprint, he:

  • Joined the Clojure Team on salsa.
  • Identified missing dependencies to update puppetdb to the 7.x release.
  • Learned how to package Clojure libraries in Debian.
  • Packaged murphy-clojure, truss-clojure and encore-clojure and uploaded them to NEW.
  • Began to package nippy-clojure.

allentiak

Leandro joined us on Saturday, since he couldn't get off work on Friday. He mostly continued working on replacing our in-house scripts for /usr/bin/clojure with upstream's, a task he had already started during GSoC 2021.

Sadly, none of us were familiar with Debian's mechanism for alternatives. If you (yes you, dear reader) are familiar with it, I'm sure he would warmly welcome feedback on his development branch.

pollo

As for me, I:

  • Fixed a classpath bug in core-async-clojure that was breaking other libraries.
  • Added meaningful autopkgtests to core-async-clojure.
  • Uploaded new versions of tools-analyzer-clojure and trapperkeeper-clojure with autopkgtests.
  • Updated pomegranate-clojure and nrepl-clojure to the latest upstream version and revamped the way they were packaged.
  • Assisted lavamind with Clojure packaging.

Overall, it was quite a productive sprint!

Thanks to Debian for sponsoring our food during the sprint. It was nice to be able to concentrate on fixing things instead of making food :)

Here's a bonus picture of the nice sushi platter I ended up getting for dinner on Saturday night:


Matthew Garrett: Can we fix bearer tokens?

Mon, 2022-05-16 03:48
Last month I wrote about how bearer tokens are just awful, and a week later Github announced that someone had managed to exfiltrate bearer tokens from Heroku that gave them access to, well, a lot of Github repositories. This has inevitably resulted in a whole bunch of discussion about a number of things, but people seem to be largely ignoring the fundamental issue that maybe we just shouldn't have magical blobs that grant you access to basically everything even if you've copied them from a legitimate holder to Honest John's Totally Legitimate API Consumer.

To make it clearer what the problem is here, let's use an analogy. You have a safety deposit box. To gain access to it, you simply need to be able to open it with a key you were given. Anyone who turns up with the key can open the box and do whatever they want with the contents. Unfortunately, the key is extremely easy to copy - anyone who is able to get hold of your keyring for a moment is in a position to duplicate it, and then they have access to the box. Wouldn't it be better if something could be done to ensure that whoever showed up with a working key was someone who was actually authorised to have that key?

To achieve that we need some way to verify the identity of the person holding the key. In the physical world we have a range of ways to achieve this, from simply checking whether someone has a piece of ID that associates them with the safety deposit box all the way up to invasive biometric measurements that supposedly verify that they're definitely the same person. But computers don't have passports or fingerprints, so we need another way to identify them.

When you open a browser and try to connect to your bank, the bank's website provides a TLS certificate that lets your browser know that you're talking to your bank instead of someone pretending to be your bank. The spec allows this to be a bi-directional transaction - you can also prove your identity to the remote website. This is referred to as "mutual TLS", or mTLS, and a successful mTLS transaction ends up with both ends knowing who they're talking to, as long as they have a reason to trust the certificate they were presented with.

That's actually a pretty big constraint! We have a reasonable model for the server - it's something that's issued by a trusted third party and it's tied to the DNS name for the server in question. Clients don't tend to have stable DNS identity, and that makes the entire thing sort of awkward. But, thankfully, maybe we don't need to? We don't need the client to be able to prove its identity to arbitrary third party sites here - we just need the client to be able to prove it's a legitimate holder of whichever bearer token it's presenting to that site. And that's a much easier problem.

Here's the simple solution - clients generate a TLS cert. This can be self-signed, because all we want to do here is be able to verify whether the machine talking to us is the same one that had a token issued to it. The client contacts a service that's going to give it a bearer token. The service requests mTLS auth without being picky about the certificate that's presented. The service embeds a hash of that certificate in the token before handing it back to the client. Whenever the client presents that token to any other service, the service ensures that the mTLS cert the client presented matches the hash in the bearer token. Copy the token without copying the mTLS certificate and the token gets rejected. Hurrah hurrah hats for everyone.
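To make the verification step concrete, here is a hedged sketch in Rust; the token layout is hypothetical (RFC 8705, mentioned below, encodes the same idea as a confirmation claim holding the certificate's SHA-256 thumbprint), and a production implementation would use a constant-time comparison:

    use sha2::{Digest, Sha256};

    /// Hypothetical token: alongside its other claims, it carries the
    /// SHA-256 hash of the client certificate it was issued to.
    struct BearerToken {
        cert_sha256: [u8; 32],
        // ... scopes, expiry, issuer signature, etc.
    }

    /// Accept the token only if the certificate presented during the
    /// mTLS handshake hashes to the value embedded in the token.
    fn token_bound_to_cert(token: &BearerToken, client_cert_der: &[u8]) -> bool {
        let presented = Sha256::digest(client_cert_der);
        presented.as_slice() == &token.cert_sha256[..]
    }

A token copied without the corresponding certificate (and its private key) then fails this check at every service it is presented to.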

Well except for the obvious problem that if you're in a position to exfiltrate the bearer tokens you can probably just steal the client certificates and keys as well, and now you can pretend to be the original client and this is not adding much additional security. Fortunately pretty much everything we care about has the ability to store the private half of an asymmetric key in hardware (TPMs on Linux and Windows systems, the Secure Enclave on Macs and iPhones, either a piece of magical hardware or Trustzone on Android) in a way that avoids anyone being able to just steal the key.

How do we know that the key is actually in hardware? Here's the fun bit - it doesn't matter. If you're issuing a bearer token to a system then you're already asserting that the system is trusted. If the system is lying to you about whether or not the key it's presenting is hardware-backed then you've already lost. If it lied and the system is later compromised then sure all your apes get stolen, but maybe don't run systems that lie and avoid that situation as a result?

Anyway. This is covered in RFC 8705 so why aren't we all doing this already? From the client side, the largest generic issue is that TPMs are astonishingly slow in comparison to doing a TLS handshake on the CPU. RSA signing operations on TPMs can take around half a second, which doesn't sound too bad, except your browser is probably establishing multiple TLS connections to subdomains on the site it's connecting to and performance is going to tank. Fixing this involves doing whatever's necessary to convince the browser to pipe everything over a single TLS connection, and that's just not really where the web is right at the moment. Using EC keys instead helps a lot (~0.1 seconds per signature on modern TPMs), but it's still going to be a bottleneck.

The other problem, of course, is that ecosystem support for hardware-backed certificates is just awful. Windows lets you stick them into the standard platform certificate store, but the docs for this are hidden in a random PDF in a Github repo. Macs require you to do some weird bridging between the Secure Enclave API and the keychain API. Linux? Well, the standard answer is to do PKCS#11, and I have literally never met anybody who likes PKCS#11 and I have spent a bunch of time in standards meetings with the sort of people you might expect to like PKCS#11 and even they don't like it. It turns out that loading a bunch of random C bullshit that has strong feelings about function pointers into your security critical process is not necessarily something that is going to improve your quality of life, so instead you should use something like this and just have enough C to bridge to a language that isn't secretly plotting to kill your pets the moment you turn your back.

And, uh, obviously none of this matters at all unless people actually support it. Github has no support at all for validating the identity of whoever holds a bearer token. Most issuers of bearer tokens have no support for embedding holder identity into the token. This is not good! As of last week, all three of the big cloud providers support virtualised TPMs in their VMs - we should be running CI on systems that can do that, and tying any issued tokens to the VMs that are supposed to be making use of them.

So sure this isn't trivial. But it's also not impossible, and making this stuff work would improve the security of, well, everything. We literally have the technology to prevent attacks like Github suffered. What do we have to do to get people to actually start working on implementing that?


Dirk Eddelbuettel: RcppArmadillo 0.11.1.1.0 on CRAN: Updates

Sun, 2022-05-15 17:21

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has syntax deliberately close to Matlab and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language–and is widely used by (currently) 978 other packages on CRAN, downloaded over 24 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 469 times according to Google Scholar.

This release brings the first new upstream fix in the new 11.* release series. In particular, the treatment of ill-conditioned matrices is further strengthened. We once again tested this very rigorously via three different RC releases, each of which got a full reverse-dependencies run (for which results are always logged here). A minor issue with old g++ compilers was found once 11.1.0 was tagged, so this upstream release is now 11.1.1. Also fixed is an OpenMP setup issue where Justin Silverman noticed that we did not propagate the -fopenmp setting correctly.

The full set of changes (since the last CRAN release 0.11.0.0.0) follows.

Changes in RcppArmadillo version 0.11.1.1.0 (2022-05-15)
  • Upgraded to Armadillo release 11.1.1 (Angry Kitchen Appliance)

    • added inv_opts::no_ugly option to inv() and inv_sympd() to disallow inverses of poorly conditioned matrices

    • more efficient handling of rank-deficient matrices via inv_opts::allow_approx option in inv() and inv_sympd()

    • better detection of rank deficient matrices by solve()

    • faster handling of symmetric and diagonal matrices by cond()

  • The configure script again propagates the 'found' case, thanks to Justin Silverman for the heads-up and suggested fix (Dirk and Justin in #376 and #377, fixing #375).

Changes in RcppArmadillo version 0.11.0.1.0 (2022-04-14)
  • Upgraded to Armadillo release 11.0.1 (Creme Brulee)

    • fix miscompilation of inv() and inv_sympd() functions when using inv_opts::allow_approx and inv_opts::tiny options
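As a hedged illustration of the new no_ugly option (going by Armadillo's documented inv() interface; the wrapper function here is just an example, not part of the package):

    #include <RcppArmadillo.h>
    // [[Rcpp::depends(RcppArmadillo)]]

    // [[Rcpp::export]]
    arma::mat safe_inverse(const arma::mat& A) {
        arma::mat B;
        // With inv_opts::no_ugly, inv() refuses to produce an inverse of a
        // poorly conditioned matrix and reports failure instead.
        if (!arma::inv(B, A, arma::inv_opts::no_ugly)) {
            Rcpp::stop("matrix is poorly conditioned; no inverse computed");
        }
        return B;
    }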

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


Antoine Beaupré: NVMe/SSD disk failure

Fri, 2022-05-13 16:19

Yesterday, my workstation (curie) was hung when I came into the office. After a "skinny elephant", the box rebooted, but it couldn't find the primary disk (in the BIOS). Instead, it booted on the secondary HDD, still running an old Fedora 27 install which somehow survived to this day, possibly because BTRFS is incomprehensible.

Somehow, I blindly accepted the Fedora prompt asking me to upgrade to Fedora 28, not realizing that:

  1. Fedora is now at release 36, not 28
  2. major upgrades take about an hour...
  3. ... and happen at boot time, blocking the entire machine (I'll remember this next time I laugh at Windows and Mac OS users stuck on updates on boot)
  4. you can't skip more than one major upgrade

All of which means that upgrading to the latest release would take over 4 hours. Thankfully, it's mostly automated and seems to work pretty well (which is not exactly the case for Debian). It still seems like a lot of wasted time -- it would probably be better to just reinstall the machine at this point -- and not what I had planned to do that morning at all.

In any case, after waiting all that time, the machine booted (in Fedora) again, and now it could detect the SSD disk. The BIOS could find the disk too, so after I reinstalled grub (from Fedora) and fixed the boot order, it rebooted, but secureboot failed, so I turned that off (!?), and I was back in Debian.

I did an emergency backup with ddrescue from the running system, which probably doesn't really work as a backup (because the filesystem is likely to be corrupt), but it was fast enough (20 minutes) and gave me some peace of mind. My offsite backups have been down for a while, and since I treat my workstations as "cattle" (not "pets"), I don't have a solid recovery scenario for those situations other than "just reinstall and run Puppet", which takes a while.
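For reference, a minimal invocation for that kind of emergency copy might look like this (device and file names are illustrative):

    # first pass: copy the failing disk to an image, recording bad areas in a map file
    ddrescue -d /dev/nvme0n1 /backup/curie.img /backup/curie.map
    # optional second pass: retry the bad areas a few more times
    ddrescue -d -r3 /dev/nvme0n1 /backup/curie.img /backup/curie.map

The map file lets ddrescue resume and refine the copy without re-reading the healthy parts.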

Now I'm wondering what the next step is: probably replace the disk anyways (the new one is bigger: 1TB instead of 500GB), or keep the new one as a hot backup somehow. Too bad I don't have a snapshotting filesystem on there... (Technically, I have LVM, but LVM snapshots are heavy and slow, and can't atomically cover the entire machine.)

It's kind of scary how this thing failed: totally dropped off the bus, just not in the BIOS at all. I prefer the way spinning rust fails: clickety sounds, tons of warnings beforehand, partial recovery possible. With this new flashy junk, you just lose everything all at once. Not fun.


Antoine Beaupré: BTRFS notes

Fri, 2022-05-13 16:04

I'm not a fan of BTRFS. This page serves as a reminder of why, but also a cheat sheet to figure out basic tasks in a BTRFS environment because those are not obvious to me, even after repeatedly having to deal with them.

Content warning: there might be mentions of ZFS.

Stability concerns

I'm worried about BTRFS stability, which has been historically ... changing. RAID-5 and RAID-6 are still marked unstable, for example. It's kind of a lucky guess whether your current kernel will behave properly with your planned workload. For example, in Linux 4.9, RAID-1 and RAID-10 were marked as "mostly OK" with a note that says:

Needs to be able to create two copies always. Can get stuck in irreversible read-only mode if only one copy can be made.

Even as of now, RAID-1 and RAID-10 has this note:

The simple redundancy RAID levels utilize different mirrors in a way that does not achieve the maximum performance. The logic can be improved so the reads will spread over the mirrors evenly or based on device congestion.

Granted, that's not a stability concern anymore, just performance. A reviewer of a draft of this article actually claimed that BTRFS only reads from one of the drives, which hopefully is inaccurate, but goes to show how confusing all this is.

There are other warnings in the Debian wiki that are quite scary. Even the legendary Arch wiki has a warning on top of their BTRFS page, still.

Even if those issues are now fixed, it can be hard to tell when they were fixed. There is a changelog by feature but it explicitly warns that it doesn't know "which kernel version it is considered mature enough for production use", so it's also useless for this.

It would have been much better if BTRFS was released into the world only when those bugs were being completely fixed. Or that, at least, features were announced when they were stable, not just "we merged to mainline, good luck". Even now, we get mixed messages even in the official BTRFS documentation which says "The Btrfs code base is stable" (main page) while at the same time clearly stating unstable parts in the status page (currently RAID56).

There are much harsher BTRFS critics than me out there so I will stop here, but let's just say that I feel a little uncomfortable trusting server data with full RAID arrays to BTRFS. But surely, for a workstation, things should just work smoothly... Right? Well, let's see the snags I hit.

My BTRFS test setup

Before I go any further, I should probably clarify how I am testing BTRFS in the first place.

The reason I tried BTRFS is that I was ... let's just say "strongly encouraged" by the LWN editors to install Fedora for the terminal emulators series. That, in turn, meant the setup was done with BTRFS, because that was somewhat the default in Fedora 27 (or did I want to experiment? I don't remember, it's been too long already).

So Fedora was setup on my 1TB HDD and, with encryption, the partition table looks like this:

    NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    sda                8:0    0 931,5G  0 disk
    ├─sda1             8:1    0   200M  0 part  /boot/efi
    ├─sda2             8:2    0     1G  0 part  /boot
    ├─sda3             8:3    0   7,8G  0 part
    │ └─fedora_swap  253:5    0   7.8G  0 crypt [SWAP]
    └─sda4             8:4    0 922,5G  0 part
      └─fedora_crypt 253:4    0 922,5G  0 crypt /

(This might not entirely be accurate: I rebuilt this from the Debian side of things.)

This is pretty straightforward, except for the swap partition: normally, I just treat swap like any other filesystem and create it in a logical volume. This is now just speculation, but I bet it was set up this way because swap support was only added to BTRFS in Linux 5.0.

I fully expect BTRFS experts to yell at me now because this is an old setup and BTRFS is so much better now, but that's exactly the point here. That setup is not that old (2018? old? really?), and migrating to a new partition scheme isn't exactly practical right now. But let's move on to more practical considerations.

No builtin encryption

BTRFS aims at replacing the entire mdadm, LVM, and ext4 stack with a single entity, and adding new features like deduplication, checksums and so on.

Yet there is one feature it is critically missing: encryption. See, my typical stack is actually mdadm, LUKS, and then LVM and ext4. This is convenient because I have only a single volume to decrypt.

If I were to use BTRFS on servers, I'd need to have one LUKS volume per-disk. For a simple RAID-1 array, that's not too bad: one extra key. But for large RAID-10 arrays, this gets really unwieldy.

The obvious BTRFS alternative, ZFS, supports encryption out of the box and mixes it above the disks so you only have one passphrase to enter. The main downside of ZFS encryption is that it happens above the "pool" level so you can typically see filesystem names (and possibly snapshots, depending on how it is built), which is not the case with a more traditional stack.

Subvolumes, filesystems, and devices

I find BTRFS's architecture to be utterly confusing. In the traditional LVM stack (which is itself kind of confusing if you're new to that stuff), you have those layers:

  • disks: let's say /dev/nvme0n1 and nvme1n1
  • RAID arrays with mdadm: let's say the above disks are joined in a RAID-1 array in /dev/md1
  • volume groups or VG with LVM: the above RAID device (technically a "physical volume" or PV) is assigned into a VG, let's call it vg_tbbuild05 (multiple PVs can be added to a single VG which is why there is that abstraction)
  • LVM logical volumes: out of that volume group actually "virtual partitions" or "logical volumes" are created, that is where your filesystem lives
  • filesystem, typically with ext4: that's your normal filesystem, which treats the logical volume as just another block device

A typical server setup would look like this:

    NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    nvme0n1                   259:0    0   1.7T  0 disk
    ├─nvme0n1p1               259:1    0     8M  0 part
    ├─nvme0n1p2               259:2    0   512M  0 part
    │ └─md0                     9:0    0   511M  0 raid1 /boot
    ├─nvme0n1p3               259:3    0   1.7T  0 part
    │ └─md1                     9:1    0   1.7T  0 raid1
    │   └─crypt_dev_md1       253:0    0   1.7T  0 crypt
    │     ├─vg_tbbuild05-root 253:1    0    30G  0 lvm   /
    │     ├─vg_tbbuild05-swap 253:2    0 125.7G  0 lvm   [SWAP]
    │     └─vg_tbbuild05-srv  253:3    0   1.5T  0 lvm   /srv
    └─nvme0n1p4               259:4    0     1M  0 part

I stripped the other nvme1n1 disk because it's basically the same.

Now, if we look at my BTRFS-enabled workstation, which doesn't even have RAID, we have the following:

  • disk: /dev/sda with, again, /dev/sda4 being where BTRFS lives
  • filesystem: fedora_crypt, which is, confusingly, kind of like a volume group. it's where everything lives. i think.
  • subvolumes: home, root, /, etc. those are actually the things that get mounted. you'd think you'd mount a filesystem, but no, you mount a subvolume. that is backwards.

It looks something like this to lsblk:

    NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    sda                8:0    0 931,5G  0 disk
    ├─sda1             8:1    0   200M  0 part  /boot/efi
    ├─sda2             8:2    0     1G  0 part  /boot
    ├─sda3             8:3    0   7,8G  0 part  [SWAP]
    └─sda4             8:4    0 922,5G  0 part
      └─fedora_crypt 253:4    0 922,5G  0 crypt /srv

Notice how we don't see all the BTRFS volumes here? Maybe it's because I'm mounting this from the Debian side, but lsblk definitely gets confused here. I frankly don't quite understand what's going on, even after repeatedly looking around the rather dismal documentation. But that's what I gather from the following commands:

    root@curie:/home/anarcat# btrfs filesystem show
    Label: 'fedora'  uuid: 5abb9def-c725-44ef-a45e-d72657803f37
        Total devices 1 FS bytes used 883.29GiB
        devid    1 size 922.47GiB used 916.47GiB path /dev/mapper/fedora_crypt
    root@curie:/home/anarcat# btrfs subvolume list /srv
    ID 257 gen 108092 top level 5 path home
    ID 258 gen 108094 top level 5 path root
    ID 263 gen 108020 top level 258 path root/var/lib/machines

I only got to that point through trial and error. Notice how I use an existing mountpoint to list the related subvolumes. If I try to use the filesystem path, the one that's listed in filesystem show, I fail:

    root@curie:/home/anarcat# btrfs subvolume list /dev/mapper/fedora_crypt
    ERROR: not a btrfs filesystem: /dev/mapper/fedora_crypt
    ERROR: can't access '/dev/mapper/fedora_crypt'

Maybe I just need to use the label? Nope:

    root@curie:/home/anarcat# btrfs subvolume list fedora
    ERROR: cannot access 'fedora': No such file or directory
    ERROR: can't access 'fedora'

This is really confusing. I don't even know if I understand this right, and I've been staring at this all afternoon. Hopefully, the lazyweb will correct me eventually.

(As an aside, why are they called "subvolumes"? If something is a "sub" of "something else", that "something else" must exist, right? But no, BTRFS doesn't have "volumes", it only has "subvolumes". Go figure. Presumably the filesystem still holds "files", though; at least empirically it doesn't seem like it has lost anything so far.)

In any case, at least I can refer to this section in the future, the next time I fumble around the btrfs commandline, as I surely will. I will possibly even update this section as I get better at it, or based on my reader's judicious feedback.

Mounting BTRFS subvolumes

So how did I even get to that point? I have this in my /etc/fstab, on the Debian side of things:

UUID=5abb9def-c725-44ef-a45e-d72657803f37 /srv btrfs defaults 0 2

This thankfully ignores all the subvolume nonsense because it relies on the UUID. mount tells me that's actually the root subvolume (subvol=/, subvolid 5):

    root@curie:/home/anarcat# mount | grep /srv
    /dev/mapper/fedora_crypt on /srv type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)

Let's see if I can mount the other volumes I have on there. Remember that subvolume list showed I had home, root, and var/lib/machines. Let's try root:

mount -o subvol=root /dev/mapper/fedora_crypt /mnt

Interestingly, root is not the same as /, it's a different subvolume! It seems to be the Fedora root (/, really) filesystem. No idea what is happening here. I also have a home subvolume, let's mount it too, for good measure:

mount -o subvol=home /dev/mapper/fedora_crypt /mnt/home

Note that lsblk doesn't notice those two new mountpoints, and that's normal: it only lists block devices and subvolumes (rather inconveniently, I'd say) do not show up as devices:

    root@curie:/home/anarcat# lsblk
    NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    sda                8:0    0 931,5G  0 disk
    ├─sda1             8:1    0   200M  0 part
    ├─sda2             8:2    0     1G  0 part
    ├─sda3             8:3    0   7,8G  0 part
    └─sda4             8:4    0 922,5G  0 part
      └─fedora_crypt 253:4    0 922,5G  0 crypt /srv

This is really, really confusing. Maybe I did something wrong in the setup. Maybe it's because I'm mounting it from outside Fedora. Either way, it just doesn't feel right.
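For the record, those subvolume mounts could be made permanent in /etc/fstab with the subvol mount option, reusing the UUID from above (mountpoints as in the experiment; a sketch, not a recommendation):

    UUID=5abb9def-c725-44ef-a45e-d72657803f37 /mnt      btrfs defaults,subvol=root 0 2
    UUID=5abb9def-c725-44ef-a45e-d72657803f37 /mnt/home btrfs defaults,subvol=home 0 2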

No disk usage per volume

If you want to see what's taking up space in one of those subvolumes, tough luck:

    root@curie:/home/anarcat# df -h /srv /mnt /mnt/home
    Filesystem                Size  Used Avail Use% Mounted on
    /dev/mapper/fedora_crypt  923G  886G   31G  97% /srv
    /dev/mapper/fedora_crypt  923G  886G   31G  97% /mnt
    /dev/mapper/fedora_crypt  923G  886G   31G  97% /mnt/home

(Notice, in passing, that it looks like the same filesystem is mounted in different places. In that sense, you'd expect /srv and /mnt (and /mnt/home?!) to be exactly the same, but no: they are entirely different directory structures, which I will not call "filesystems" here because everyone's head will explode in sparks of confusion.)

Yes, disk space is shared (that's the Size and Avail columns, which makes sense). But nope, no cookie for you: they all show the same Used column, so you need to actually walk the entire filesystem to figure out how much each subvolume takes.

(For future reference, that's basically:

    root@curie:/home/anarcat# time du -schx /mnt/home /mnt /srv
    124M    /mnt/home
    7.5G    /mnt
    875G    /srv
    883G    total

    real    2m49.080s
    user    0m3.664s
    sys     0m19.013s

And yes, that was painfully slow.)

ZFS actually has some oddities in that regard, but at least it tells me how much disk each volume (and snapshot) takes:

    root@tubman:~# time df -t zfs -h
    Filesystem         Size  Used Avail Use% Mounted on
    rpool/ROOT/debian  3.5T  1.4G  3.5T   1% /
    rpool/var/tmp      3.5T  384K  3.5T   1% /var/tmp
    rpool/var/spool    3.5T  256K  3.5T   1% /var/spool
    rpool/var/log      3.5T  2.0G  3.5T   1% /var/log
    rpool/home/root    3.5T  2.2G  3.5T   1% /root
    rpool/home         3.5T  256K  3.5T   1% /home
    rpool/srv          3.5T   80G  3.5T   3% /srv
    rpool/var/cache    3.5T  114M  3.5T   1% /var/cache
    bpool/BOOT/debian  571M   90M  481M  16% /boot

    real    0m0.003s
    user    0m0.002s
    sys     0m0.000s

That's 56360 times faster, by the way.

But yes, that's not fair: those in the know will know there's a different command to do what df does with BTRFS filesystems, the btrfs filesystem usage command:

    root@curie:/home/anarcat# time btrfs filesystem usage /srv
    Overall:
        Device size:                 922.47GiB
        Device allocated:            916.47GiB
        Device unallocated:            6.00GiB
        Device missing:                  0.00B
        Used:                        884.97GiB
        Free (estimated):             30.84GiB  (min: 27.84GiB)
        Free (statfs, df):            30.84GiB
        Data ratio:                       1.00
        Metadata ratio:                   2.00
        Global reserve:              512.00MiB  (used: 0.00B)
        Multiple profiles:                  no

    Data,single: Size:906.45GiB, Used:881.61GiB (97.26%)
       /dev/mapper/fedora_crypt     906.45GiB

    Metadata,DUP: Size:5.00GiB, Used:1.68GiB (33.58%)
       /dev/mapper/fedora_crypt      10.00GiB

    System,DUP: Size:8.00MiB, Used:128.00KiB (1.56%)
       /dev/mapper/fedora_crypt      16.00MiB

    Unallocated:
       /dev/mapper/fedora_crypt       6.00GiB

    real    0m0,004s
    user    0m0,000s
    sys     0m0,004s

Almost as fast as ZFS's df! Good job. But wait. That doesn't actually tell me usage per subvolume. Notice it's filesystem usage, not subvolume usage, which unhelpfully refuses to exist. That command only shows that one "filesystem"'s internal statistics, which are pretty opaque. You can also appreciate that it's wasting 6GB of "unallocated" disk space there: I probably did something Very Wrong and should be punished by Hacker News. I also wonder why it has 1.68GB of "metadata" used...

At this point, I just really want to throw that thing out of the window and restart from scratch. I don't really feel like learning the BTRFS internals, as they seem oblique and completely bizarre to me. It feels a little like the state of PHP now: it's actually pretty solid, but built upon so many layers of cruft that I still feel it corrupts my brain every time I have to deal with it (needle or haystack first? anyone?)...

Conclusion

I find BTRFS utterly confusing and I'm worried about its reliability. I think a lot of work is needed on usability and coherence before I even consider running this anywhere else than a lab, and that's really too bad, because there are really nice features in BTRFS that would greatly help my workflow. (I want to use filesystem snapshots as high-performance, high frequency backups.)

So now I'm experimenting with OpenZFS. It's so much simpler, just works, and it's rock solid. After this 8 minute read, I had a good understanding of how ZFS worked. Here's the 30 seconds overview:

  • vdev: a RAID array
  • pool (a "zpool"): a volume group of vdevs
  • datasets: normal filesystems (or block device, if you want to use another filesystem on top of ZFS)

There's also other special volumes, like caches and logs, that you can (really easily, compared to LVM caching) use to tweak your setup. You might also want to look at recordsize or ashift to tune the filesystem to better fit your workload (or to deal with drives lying about their sector size, I'm looking at you Samsung), but that's it.
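As a sketch of how little ceremony is involved (device names and pool layout are illustrative, not a recommendation):

    # a mirrored pool; ashift=12 forces 4K sectors for drives that lie about theirs
    zpool create -o ashift=12 rpool mirror /dev/nvme0n1 /dev/nvme1n1
    # datasets are filesystems; properties such as recordsize are set per dataset
    zfs create -o recordsize=1M rpool/srv

Each dataset then shows up in df with its own usage, as in the earlier comparison.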

Running ZFS on Linux currently involves building kernel modules from scratch on every host, which I think is pretty bad. But I was able to setup a ZFS-only server using this excellent documentation without too much problem.

I'm hoping some day the copyright issues are resolved and we can at least ship binary packages, but the politics (e.g. convincing Debian that this is the right thing to do) and the logistics (e.g. DKMS auto-builders? is that even a thing? how about signed DKMS packages? fun-fun-fun!) seem really impractical. Who knows, maybe hell will freeze over (again) and Oracle will fix the CDDL. I personally think that we should just completely ignore this problem (which wasn't even supposed to be a problem) and ship binary packages directly, but I'm a pragmatist and do not always fit well with the free software fundamentalists.

All of this to say that, short term, we don't have a reliable, advanced filesystem/logical disk manager in Linux. And that's really too bad.


Bits from Debian: New Debian Developers and Maintainers (March and April 2022)

Fri, 2022-05-13 11:00

The following contributors got their Debian Developer accounts in the last two months:

  • Henry-Nicolas Tourneur (hntourne)
  • Nick Black (dank)

The following contributors were added as Debian Maintainers in the last two months:

  • Jan Mojžíš
  • Philip Wyett
  • Thomas Ward
  • Fabio Fantoni
  • Mohammed Bilal
  • Guilherme de Paula Xavier Segundo

Congratulations!


Arturo Borrero González: Toolforge GridEngine Debian 10 Buster migration

Fri, 2022-05-13 05:42

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.


As discussed in the previous post, one of the most important and successful services provided by the Wikimedia Cloud Services team at the Wikimedia Foundation is Toolforge. Toolforge is a platform that allows users and developers to run and use a variety of applications with the ultimate goal of helping the Wikimedia mission from the technical side.

As you may know already, all Wikimedia Foundation servers are powered by Debian, and this includes Toolforge and Cloud VPS. The Debian Project mostly follows a two-year cadence for releases, and Toolforge has been using Debian Stretch for some years now, a release that is nowadays considered “old-old-stable”. In accordance with our operating system upgrade policy, we should migrate our servers to Debian Buster.

Toolforge’s two different backend engines, Kubernetes and Grid Engine, are impacted by this upgrade policy. Grid Engine is notably tied to the underlying Debian release, and the execution environment offered to tools running in the grid is limited to what the Debian archive contains for a given release. This is unlike in Kubernetes, where tool developers can leverage container images and decouple the runtime environment selection from the base operating system.

Since the Toolforge grid’s original conception, we have been doing the same operation over and over again:

  • Prepare a parallel grid deployment with the new operating system.
  • Ask our users (tool developers) to evaluate a newer version of their runtime and programming languages.
  • Introduce a migration window and coordinate a quick migration.
  • Finally, drop the old operating system from grid servers.

We’ve done this type of migration several times before. The last two were Ubuntu Precise to Ubuntu Trusty, and Ubuntu Trusty to Debian Stretch. But this time around we had some special angles to consider.

So, you are upgrading the Debian release
  • You are migrating to Debian 11 Bullseye, no?
  • No, we’re migrating to Debian 10 Buster
  • Wait, but Debian 11 Bullseye exists!
  • Yes, we know! Let me explain…

We’re migrating the grid from Debian 9 Stretch to Debian 10 Buster, but perhaps we should be migrating from Debian 9 Stretch to Debian 11 Bullseye directly. This is a legitimate concern, and we discussed it in September 2021.

Back then, our reasoning was that skipping to Debian 11 Bullseye would be more difficult for our users, especially because of the greater jump in version numbers for the underlying runtimes. Additionally, all the migration work started before Debian 11 Bullseye was released. Our original intention was for the migration to be completed before the release. For a couple of reasons the project was delayed, and when it was time to restart it we decided to continue with the original idea.

We had already done some work to get Debian 10 Buster working correctly with the grid, and supporting Debian 11 Bullseye would have required an additional effort. We didn’t even check whether Grid Engine could be installed in the latest Debian release. For the grid, in general, the engineering effort to do an N+1 upgrade is lower than that of an N+2 upgrade. If we had tried an N+2 upgrade directly, things would have been much slower and more difficult for us, and for our users.

In that sense, our conclusion was to not skip Debian 10 Buster.

We no longer want to run Grid Engine

In a previous blog post we shared information about our desired future for Grid Engine in Toolforge. Our intention is to discontinue our usage of this technology.

No grid? What about my tools?

Traditionally there have been two main workflows or use cases that were supported in the grid, but not in our Kubernetes backend:

  • Running jobs, long-running bots and other scheduled tasks.
  • Mixing runtime environments (for example, a nodejs app that runs some python code).

The good news is that work to handle the continuity of such use cases has already started. This takes the form of two main efforts:

  • The Toolforge buildpacks project — to support arbitrary runtime environments.
  • The Toolforge Jobs Framework — to support jobs, scheduled tasks, etc.

In particular, the Toolforge Jobs Framework has been available for a while in an open beta phase. We did some initial design and implementation, then deployed it in Toolforge for some users to try it and report bugs, report missing features, etc.

These are complex, feature-rich projects, and they deserve a dedicated blog post; more information on each will be shared in the future. For now, it is worth noting that both initiatives have already seen some degree of development.

The conclusion

Knowing all the moving parts, we were faced with a few hard questions when deciding how to approach the Debian 9 Stretch deprecation:

  • Should we not upgrade the grid, and focus on Kubernetes instead? Let Debian 9 Stretch be the last supported version on the grid?
  • What is the impact of these decisions on the technical community? What is best for our users?

The choices we made are already known in the community. A couple of weeks ago we announced the Debian 9 Stretch Grid Engine deprecation. In parallel to this migration, we decided to promote the new Toolforge Jobs Framework, even if it’s still in beta phase. This new option should help users to future-proof their tool, and reduce maintenance effort. An early migration to Kubernetes now will avoid any more future grid problems.

We truly hope that Debian 10 Buster is the last version we have to run on the grid but, as they say, hope is not a good strategy when it comes to engineering. What we will do is work really hard to bring Toolforge to the service level we want, and that means continuing to develop and enable more Kubernetes-based functionality.

Stay tuned for more upcoming blog posts with additional information about Toolforge.



Reproducible Builds (diffoscope): diffoscope 212 released

Thu, 2022-05-12 20:00

The diffoscope maintainers are pleased to announce the release of diffoscope version 212. This version includes the following changes:

* Add support for extracting vmlinuz/vmlinux Linux kernel images. (Closes: reproducible-builds/diffoscope#304)
* Some Python .pyc files report as "data", so support ".pyc" as a fallback extension.

You can find out more by visiting the project homepage.


Jonathan Dowland: Scalable Computing seminar

Thu, 2022-05-12 06:19

Last week I delivered a seminar for the research group I belong to, Scalable Computing. This was a slightly-expanded version of the presentation I gave at uksystems21. The most substantial change is the addition of a fourth example to describe recent work on optimising for a second non-functional requirement: Bandwidth.


Raphaël Hertzog: Debian 9 soon out of (free) security support

Wed, 2022-05-11 09:13

Organizations that are still running Debian 9 servers should be aware that the security support of the Debian LTS team will end on June 30th 2022.

If upgrading to a newer Debian release is not an option for them, then they should consider subscribing to Freexian’s Extended LTS to get security support for the packages that they are using on their servers.

It’s worth pointing out that we made some important changes to Freexian’s Extended LTS offering:

  • we are now willing to support each Debian release for up to 10 years (so 5 years of ELTS support after the 5 initial years), provided that we have customers willing to pay the required amount;
  • we have changed our pricing scheme so that we can announce up-front the (increasing) cost over the 5 years of ELTS; and
  • we have dropped the requirement to subscribe to the Debian LTS sponsorship, though it’s still a good idea to contribute to the funding of that project to ensure that one’s packages are properly monitored/maintained during the LTS period.

This means that we have again extended the life of Debian 8 Jessie, this time until June 30th 2025. And that Debian 9 Stretch – that will start its “extended” life on July 1st 2022 – can be maintained up to June 30th 2027.

Organizations using Debian 10 should consider sponsoring the Debian LTS team since security support for that Debian release will soon transition from the regular security team to the LTS team.

Categories: FLOSS Project Planets

Ben Hutchings: Debian LTS work, April 2022

Tue, 2022-05-10 17:41

In April I was assigned 16 hours of work by Freexian's Debian LTS initiative and carried over 8 hours from March. I worked 11 hours, and will carry over the remaining time to May.

I spent most of my time triaging security issues for Linux, working out which of them were fixed upstream and which actually applied to the versions provided in Debian 9 "stretch". I also rebased the Linux 4.9 (linux) package on the latest stable update, but did not make an upload this month.

Categories: FLOSS Project Planets

Daniel Kahn Gillmor: 2022 Digital Rights Job Fair

Tue, 2022-05-10 16:39

I'm lucky enough to work at the intersection between information communications technology and civil rights/civil liberties. I get to combine technical interests and social/political interests.

I've talked with many folks over the years who are interested in doing similar work. Some come from a technical background, and some from an activist background (and some from both). Are you one of them? Are you someone who works as an activist or in a technical field and wants to explore different ways of merging these interests?

Some great organizers maintain a job board for Digital Rights. Next month they'll host a Digital Rights Job Fair, which offers an opportunity to talk with good people at organizations that fight in different ways for a better world. You need to RSVP to attend.

Categories: FLOSS Project Planets

Russell Coker: Elon and Free Speech

Tue, 2022-05-10 07:53

Elon Musk has made the news for spending billions to buy a share of Twitter for the alleged purpose of providing free speech. The problem with this claim is that having any company controlling a large portion of the world’s communication is inherently bad for free speech. The same applies to Facebook, but that’s not a hot news item at the moment.

If Elon wanted to provide free speech he would want to have decentralised messaging systems so that someone who breaks rules on one platform could find another with different rules. Among other things, free speech ideally permits people to debate issues related to different laws with residents of other countries. If advocates for the Russian government get kicked off Twitter as part of the American sanctions against Russia then American citizens can’t debate the issue with Russian citizens via Twitter. Mastodon is one example of a federated competitor to Twitter [1]. With a federated messaging system each host could make independent decisions about interpretation of sanctions. Someone who used a Mastodon instance based in the US could get a second account in another country if they wanted to communicate with people in countries that are sanctioned by the US.

The problem with Mastodon at the moment is lack of use. It’s got a good set of features and support for different platforms: there are apps for Android and iPhone as well as lots of other software using the API. But if the people you want to communicate with aren’t on it then it’s less useful. Elon could solve that problem by creating a Tesla Mastodon server and giving a free account to everyone who buys a new Tesla, which is the sort of thing that a lot of Tesla buyers would like. It’s quite likely that other companies selling prestige products would follow that example. Everyone has seen evidence of people sharing photos on social media with someone else’s expensive car; a Mastodon account on ferrari.com or mercedes.com would be proof of buying the cars in question. The number of people who buy expensive cars new is a very small portion of the world population, but it’s a group of people who are more influential than average, and others would join Mastodon servers to follow them.

The next thing that Elon could do to kill Twitter would be to have all his companies (which between them have more than a dozen verified Twitter accounts) use Mastodon accounts for their primary PR releases and then send the same content to Twitter with a 48-hour delay. That would force journalists and people who want to discuss those companies on social media to follow the Mastodon accounts. Again this wouldn’t be a significant number of people, but they would be influential people. Getting journalists to use a communications system increases its importance.

The question is whether Elon is lacking the vision necessary to plan a Mastodon deployment or whether he just wants to allow horrible people to run wild on Twitter.

The Verge has an interesting article from 2019 about Gab using Mastodon [2]. The fact that over the last 2.5 years I didn’t even hear of Gab using Mastodon suggests that the fears of some people significantly exceeded the problem. I’m sure that some Gab users managed to harass some Mastodon users, but generally they were apparently banned quickly. As an aside the Mastodon server I use doesn’t appear to ban Gab, a search for Gab on it gave me a user posting about being “pureblood” at the top of the list.

Gab claims to have 4 million accounts and has an estimated 100,000 active users. If 5.5% of Tesla owners became active users on a hypothetical Tesla server that would be the largest Mastodon server. Elon could demonstrate his commitment to free speech by refusing to ban Gab in any way. The Wikipedia page about Gab [3] has a long list of horrible people and activities associated with it. Is that the “free speech” to associate with Tesla? Polestar makes some nice electric cars that appear quite luxurious [4] and doesn’t get negative PR from the behaviour of its owner; that’s something Elon might want to consider.

Is this really about bragging rights? Buying a controlling interest in a company that has a partial monopoly on Internet communication is something to boast about. Could users of commercial social media be considered serfs who serve their billionaire overlord?

Categories: FLOSS Project Planets

Melissa Wen: Multiple syncobjs support for V3D(V) (Part 2)

Tue, 2022-05-10 05:00

In the previous post, I described how we enable multiple syncobjs capabilities in the V3D kernel driver. Now I will tell you what was changed on the userspace side, where we reworked the V3DV sync mechanisms to use Vulkan multiple wait and signal semaphores directly. This change represents greater adherence to the Vulkan submission framework.

I was not familiar with Vulkan concepts or the V3DV driver. Fortunately, I counted on the guidance of Igalia’s Graphics team, mainly Iago Toral (thanks!), to understand the Vulkan Graphics Pipeline, sync scopes, and submission order. Therefore, we changed the original V3DV implementation of vkQueueSubmit and all related functions to allow direct mapping of multiple semaphores from V3DV to the V3D kernel interface.

Disclaimer: Here’s a brief and probably inaccurate background, which we’ll go into more detail later on.

In Vulkan, GPU work submissions are described as command buffers. These command buffers, with GPU jobs, are grouped in a command buffer submission batch, specified by vkSubmitInfo, and submitted to a queue for execution. vkQueueSubmit is the command called to submit command buffers to a queue. Besides command buffers, vkSubmitInfo also specifies semaphores to wait before starting the batch execution and semaphores to signal when all command buffers in the batch are complete. Moreover, a fence in vkQueueSubmit can be signaled when all command buffer batches have completed execution.
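To make this concrete, here is a minimal sketch of a single-batch submission with one wait and one signal semaphore, using only core Vulkan 1.0 API; the queue, command buffer, semaphores, and fence are assumed to have been created elsewhere:

#include <vulkan/vulkan.h>

/* Submit one command buffer that waits on wait_sem before the
 * color-attachment-output stage and signals signal_sem when all its
 * commands complete; fence is signaled once the whole batch finishes. */
static VkResult submit_with_semaphores(VkQueue queue, VkCommandBuffer cmd,
                                       VkSemaphore wait_sem,
                                       VkSemaphore signal_sem, VkFence fence)
{
    VkPipelineStageFlags wait_stage =
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo info = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &wait_sem,
        .pWaitDstStageMask = &wait_stage,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &signal_sem,
    };
    return vkQueueSubmit(queue, 1, &info, fence);
}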

From this sequence, we can see some implicit ordering guarantees. Submission order defines the start order of execution between command buffers; in other words, it is determined by the order in which pSubmits appear in VkQueueSubmit and pCommandBuffers appear in VkSubmitInfo. However, we don’t have any completion guarantees for jobs submitted to different GPU queues, which means they may overlap and complete out of order. Of course, jobs submitted to the same GPU engine follow start and finish order. In terms of signal operation order, a fence is ordered after all semaphore signal operations. In addition to implicit sync, we also have some explicit sync resources, such as semaphores, fences, and events.

Considering these implicit and explicit sync mechanisms, we reworked the V3DV implementation of queue submissions to better use the multiple syncobjs capabilities from the kernel. In this merge request, you can find this work: v3dv: add support to multiple wait and signal semaphores. In this blog post, we walk through each scope of change of this merge request for a V3D driver-guided description of the multisync support implementation.

Groundwork and basic code clean-up:

As the original V3D kernel interface allowed only one semaphore, V3DV resorted to booleans to “translate” multiple semaphores into one. Consequently, if a command buffer batch had at least one semaphore, it needed to wait on all previously submitted jobs to complete before starting its execution. So, instead of just a boolean, we created and changed the structs that store semaphore information to accept the actual list of wait semaphores.
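A hypothetical sketch of that kind of change (these names are illustrative, not taken from the v3dv source):

#include <stdbool.h>
#include <stdint.h>
#include <vulkan/vulkan.h>

/* Before: a single boolean collapsed every wait semaphore into
 * "wait for all previously submitted jobs". */
struct v3dv_job_sync_old {
    bool wait_on_previous_jobs;
};

/* After: the struct carries the actual list of wait semaphores, so a
 * job waits on exactly what the application asked for. */
struct v3dv_job_sync_new {
    uint32_t wait_sem_count;
    const VkSemaphore *wait_sems;
};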

Expose multisync kernel interface to the driver:

In the two commits below, we basically updated the DRM V3D interface from that one defined in the kernel and verified if the multisync capability is available for use.

Handle multiple semaphores for all GPU job types:

At this point, we were only changing the submission design to consider multiple wait semaphores. Before supporting multisync, V3DV waited for the last submitted job to be signaled when at least one wait semaphore was defined, even when serialization wasn’t required. V3DV handles GPU jobs according to the GPU queue in which they are submitted:

  • Control List (CL) for binning and rendering
  • Texture Formatting Unit (TFU)
  • Compute Shader Dispatch (CSD)

Therefore, we changed their submission setup so that jobs submitted to any of the GPU queues can handle more than one wait semaphore.

These commits created all mechanisms to set arrays of wait and signal semaphores for GPU job submissions:

  • Checking the conditions to define the wait_stage.
  • Wrapping them in a multisync extension.
  • Configuring the generic extension as a multisync extension, according to the kernel interface (described in the previous blog post).

Finally, we extended the ability of GPU jobs to handle multiple signal semaphores, but at this point, no GPU job is actually in charge of signaling them. With this in place, we could rework part of the code that tracks CPU and GPU job completions by verifying the GPU status and threads spawned by Event jobs.

Rework the QueueWaitIdle mechanism to track the syncobj of the last job submitted in each queue:

As we had only single in/out syncobj interfaces for semaphores, we used a single last_job_sync to synchronize job dependencies of the previous submission. Although the DRM scheduler guarantees the order in which jobs on the same queue start executing in kernel space, the order of completion isn’t predictable. On the other hand, we still needed to use syncobjs to follow job completion since we have event threads on the CPU side. Therefore, a more accurate implementation requires last_job syncobjs to track when each engine (CL, TFU, and CSD) is idle. We also needed to keep the driver working on previous versions of the v3d kernel driver with single semaphores, so we kept tracking an ANY last_job_sync to preserve the previous implementation.
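A hypothetical sketch of this per-queue tracking (field names are illustrative; v3dv’s real structs differ):

#include <stdbool.h>
#include <stdint.h>

/* One syncobj handle per GPU engine, updated on every submission;
 * QueueWaitIdle waits on all of them instead of on a single last job. */
enum v3dv_gpu_queue { V3DV_QUEUE_CL, V3DV_QUEUE_TFU, V3DV_QUEUE_CSD,
                      V3DV_QUEUE_COUNT };

struct v3dv_last_job_syncs {
    uint32_t by_queue[V3DV_QUEUE_COUNT];
    /* Kept for older kernels that only support a single in/out syncobj. */
    uint32_t any;
    /* True until the first job of a command buffer is submitted. */
    bool first_job_in_cmd_buffer;
};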

Rework synchronization and submission design to let the jobs handle wait and signal semaphores:

With multiple semaphores support, the conditions for waiting and signaling semaphores changed according to the particularities of each GPU job (CL, CSD, TFU) and CPU job restrictions (Events, CSD indirect, etc.). In this sense, we redesigned V3DV semaphore handling and job submissions for command buffer batches in vkQueueSubmit.

We scrutinized possible scenarios for submitting command buffer batches to change the original implementation carefully. This resulted in three more commits:

We keep track of whether we have submitted a job to each GPU queue (CSD, TFU, CL) and a CPU job for each command buffer. We use syncobjs to track the last job submitted to each GPU queue and a flag that indicates if this represents the beginning of a command buffer.

The first GPU job submitted to a GPU queue in a command buffer should wait on wait semaphores. The first CPU job submitted in a command buffer should call v3dv_QueueWaitIdle() to do the waiting and ignore semaphores (because it is waiting for everything).

If the job is not the first but has the serialize flag set, it should wait on the completion of all last jobs submitted to any GPU queue before running. In practice, this means using syncobjs to track the last job submitted per queue and adding these syncobjs as job dependencies of the serialized job.

If this job is the last job of a command buffer batch, it may be used to signal semaphores if the batch has only one type of GPU job (because then we have execution-ordering guarantees). Otherwise, we emit a no-op job just to signal semaphores: it waits on the completion of all last jobs submitted to any GPU queue and then signals the semaphores. Note: we changed this approach at some point to correctly deal with ordering changes caused by event threads. Whenever we have an event job in the command buffer, we cannot rely on the last-job-in-the-last-command-buffer assumption; we have to wait for all event threads to complete before signaling.

After submitting all command buffers, we emit a no-op job to wait on the completion of all last jobs per queue and signal the fence. Note: at some point, we changed this approach to correctly deal with ordering changes caused by event threads, as mentioned before.

Final considerations

With many changes and many rounds of reviews, the patchset was merged. After more validations and code review, we polished and fixed the implementation together with external contributions:

Also, multisync capabilities enabled us to add new features to V3DV and switch the driver to the common synchronization and submission framework:

  • v3dv: expose support for semaphore imports

    This was waiting for multisync support in the v3d kernel, which is already available. Exposing this feature however enabled a few more CTS tests that exposed pre-existing bugs in the user-space driver so we fix those here before exposing the feature.

  • v3dv: Switch to the common submit framework

    This should give you emulated timeline semaphores for free and kernel-assisted sharable timeline semaphores for cheap once you have the kernel interface wired in.

We used a set of games to ensure no performance regression in the new implementation. For this, we used GFXReconstruct to capture Vulkan API calls while playing those games. Then, we compared results with and without multisync capabilities in the kernel space and also with multisync enabled in v3dv. We didn’t observe any performance penalty, and we even saw improvements when replaying scenes of the vkQuake game.

Categories: FLOSS Project Planets

Melissa Wen: Multiple syncobjs support for V3D(V) (Part 1)

Tue, 2022-05-10 04:00

As you may already know, we at Igalia have been working on several improvements to the 3D rendering drivers of Broadcom Videocore GPU, found in Raspberry Pi 4 devices. One of our recent works focused on improving V3D(V) drivers adherence to Vulkan submission and synchronization framework. We had to cross various layers from the Linux Graphics stack to add support for multiple syncobjs to V3D(V), from the Linux/DRM kernel to the Vulkan driver. We have delivered bug fixes, a generic gate to extend job submission interfaces, and a more direct sync mapping of the Vulkan framework. These changes did not impact the performance of the tested games and brought greater precision to the synchronization mechanisms. Ultimately, support for multiple syncobjs opened the door to new features and other improvements to the V3DV submission framework.

DRM Syncobjs

But, first, what are DRM sync objs?

DRM synchronization objects (syncobj, see struct &drm_syncobj) provide a container for a synchronization primitive which can be used by userspace to explicitly synchronize GPU commands, can be shared between userspace processes, and can be shared between different DRM drivers. Their primary use-case is to implement Vulkan fences and semaphores. [...] At its core, a syncobj is simply a wrapper around a pointer to a struct &dma_fence which may be NULL.

And Jason Ekstrand well-summarized dma_fence features in a talk at the Linux Plumbers Conference 2021:

A struct that represents a (potentially future) event:

  • Has a boolean “signaled” state
  • Has a bunch of useful utility helpers/concepts, such as refcount, callback wait mechanisms, etc.

Provides two guarantees:

  • One-shot: once signaled, it will be signaled forever
  • Finite-time: once exposed, it is guaranteed to signal in a reasonable amount of time
What does multiple semaphores support mean for Raspberry Pi 4 GPU drivers?

For our main purpose, the multiple syncobjs support means that V3DV can submit jobs with more than one wait and signal semaphore. In the kernel space, wait semaphores become explicit job dependencies to wait on before executing the job. Signal semaphores (or post dependencies), in turn, work as fences to be signaled when the job completes its execution, unblocking subsequent jobs that depend on its completion.

The development of multisync support comprised many decision-making points and steps, summarized as follows:

  • added the capability to handle multiple syncobjs to the v3d kernel driver;
  • exposed multisync capabilities to userspace through a generic extension;
  • reworked the synchronization mechanisms of the V3DV driver to benefit from this feature;
  • enabled the simulator to work with multiple semaphores; and
  • tested on Vulkan games to verify correctness and possible performance enhancements.

We decided to refactor parts of the V3D(V) submission design in kernel space and userspace during this development. We improved job scheduling in the V3D kernel driver and the V3DV job submission design. We also delivered more accurate synchronization mechanisms and further updates to the Broadcom Vulkan driver running on Raspberry Pi 4. Here we summarize the changes in the kernel space, describing the previous state of the driver, the decisions taken, side improvements, and fixes.

From single to multiple binary in/out syncobjs:

Initially, V3D was very limited in the number of syncobjs per job submission. The V3D job interfaces (CL, CSD, and TFU) only supported one syncobj (in_sync) to be added as an execution dependency and one syncobj (out_sync) to be signaled when a submission completes. The CL submission was a partial exception, accepting two in_syncs (one for the binner job and one for the render job), but the options were still limited.

Meanwhile, in the userspace, the V3DV driver followed alternative paths to meet Vulkan’s synchronization and submission framework. It needed to handle multiple wait and signal semaphores, but the V3D kernel-driver interface only accepted one in_sync and one out_sync. In short, V3DV had to fit multiple semaphores into one when submitting every GPU job.

Generic ioctl extension

The first decision was how to extend the V3D interface to accept multiple in and out syncobjs. We could extend each ioctl with two syncobj-array entries and two entries for their counters, or we could create new ioctls with multiple in/out syncobjs. But after examining how other drivers extended their submission interfaces, we decided to extend the V3D ioctls (v3d_cl_submit_ioctl, v3d_csd_submit_ioctl, v3d_tfu_submit_ioctl) with a generic ioctl extension.

I found a curious commit message when I was examining how other developers handled the issue in the past:

Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Mar 22 09:23:22 2019 +0000

drm/i915: Introduce the i915_user_extension_method

An idea for extending uABI inspired by Vulkan's extension chains. Instead of expanding the data struct for each ioctl every time we need to add a new feature, define an extension chain instead. As we add optional interfaces to control the ioctl, we define a new extension struct that can be linked into the ioctl data only when required by the user. The key advantage being able to ignore large control structs for optional interfaces/extensions, while being able to process them in a consistent manner.

In comparison to other extensible ioctls, the key difference is the use of a linked chain of extension structs vs an array of tagged pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32 chunk_id;
        __u32 length_dw;
        __u64 chunk_data;
};

[...]

So, inspired by amdgpu_cs_chunk and i915_user_extension, we opted to extend the V3D interface through a generic interface. After applying some suggestions from Iago Toral (Igalia) and Daniel Vetter, we reached the following struct:

struct drm_v3d_extension {
        __u64 next;
        __u32 id;
#define DRM_V3D_EXT_ID_MULTI_SYNC 0x01
        __u32 flags; /* mbz */
};

This generic extension has an id to identify the feature/extension we are adding to an ioctl (that maps the related struct type), a pointer to the next extension, and flags (if needed). Whenever we need to extend the V3D interface again for another specific feature, we subclass this generic extension into the specific one instead of extending ioctls indefinitely.

Multisync extension

For the multiple syncobjs extension, we define a multi_sync extension struct that subclasses the generic extension struct. It has arrays of in and out syncobjs, the respective number of elements in each of them, and a wait_stage value used in CL submissions to determine which job needs to wait for syncobjs before running.

struct drm_v3d_multi_sync {
        struct drm_v3d_extension base;
        /* Array of wait and signal semaphores */
        __u64 in_syncs;
        __u64 out_syncs;
        /* Number of entries */
        __u32 in_sync_count;
        __u32 out_sync_count;
        /* set the stage (v3d_queue) to sync */
        __u32 wait_stage;
        __u32 pad; /* mbz */
};

And if a multisync extension is defined, the V3D driver ignores the previous interface of single in/out syncobjs.
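Here is a hedged userspace sketch of how such an extension chain could be filled in and hooked into a CL submission; the extensions field on the submit struct and the V3D_RENDER wait_stage value are assumptions based on the description above, not verified against the uAPI header:

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/v3d_drm.h> /* uAPI header with the structs shown above */

/* Fill a multisync extension with syncobj handle arrays and chain it
 * into a CL submission; field names follow the structs shown above. */
static int submit_cl_multisync(int fd, struct drm_v3d_submit_cl *submit,
                               uint32_t *in_handles, uint32_t n_in,
                               uint32_t *out_handles, uint32_t n_out)
{
    struct drm_v3d_multi_sync ms;

    memset(&ms, 0, sizeof(ms));
    ms.base.id = DRM_V3D_EXT_ID_MULTI_SYNC;
    ms.base.next = 0;                    /* end of the extension chain */
    ms.in_syncs = (uintptr_t)in_handles; /* arrays of __u32 handles */
    ms.out_syncs = (uintptr_t)out_handles;
    ms.in_sync_count = n_in;
    ms.out_sync_count = n_out;
    ms.wait_stage = V3D_RENDER;          /* render job waits (assumed enum) */

    submit->extensions = (uintptr_t)&ms; /* assumed field on the ioctl */
    return drmIoctl(fd, DRM_IOCTL_V3D_SUBMIT_CL, submit);
}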

Once we had the interface to support multiple in/out syncobjs, the v3d kernel driver needed to handle it. As V3D uses the DRM scheduler for job execution, changing from a single syncobj to multiple is quite straightforward. V3D copies the in syncobjs from userspace and uses drm_syncobj_find_fence() + drm_sched_job_add_dependency() to add all in_syncs (wait semaphores) as job dependencies, i.e. syncobjs to be checked by the scheduler before running the job. On CL submissions, we have the bin and render jobs, so V3D follows the value of wait_stage to determine which job depends on those in_syncs to start its execution.
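A hedged sketch of that dependency step (the helper name and parameters are illustrative, not copied from the v3d source; drm_syncobj_find_fence() and drm_sched_job_add_dependency() are the real kernel APIs):

static int v3d_job_add_wait_deps(struct drm_file *file_priv,
                                 struct drm_sched_job *job,
                                 u32 __user *handles, u32 count)
{
        struct dma_fence *fence;
        u32 handle, i;
        int ret;

        for (i = 0; i < count; i++) {
                if (get_user(handle, handles + i))
                        return -EFAULT;
                /* Resolve the syncobj handle to its current dma_fence. */
                ret = drm_syncobj_find_fence(file_priv, handle, 0, 0, &fence);
                if (ret)
                        return ret;
                /* The scheduler won't start the job before this signals. */
                ret = drm_sched_job_add_dependency(job, fence);
                if (ret)
                        return ret;
        }
        return 0;
}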

When V3D defines the last job in a submission, it replaces the dma_fence of each out_sync with the done_fence from that last job, using drm_syncobj_find() + drm_syncobj_replace_fence(). Therefore, when the job completes its execution and signals done_fence, all out_syncs are signaled too.
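And a hedged sketch of the signaling side (again with illustrative names; drm_syncobj_find(), drm_syncobj_replace_fence(), and drm_syncobj_put() are the real kernel APIs):

static int v3d_attach_out_syncs(struct drm_file *file_priv,
                                struct dma_fence *done_fence,
                                u32 *handles, u32 count)
{
        struct drm_syncobj *syncobj;
        u32 i;

        for (i = 0; i < count; i++) {
                syncobj = drm_syncobj_find(file_priv, handles[i]);
                if (!syncobj)
                        return -ENOENT;
                /* Install done_fence: the syncobj signals when the job does. */
                drm_syncobj_replace_fence(syncobj, done_fence);
                drm_syncobj_put(syncobj);
        }
        return 0;
}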

Other improvements to v3d kernel driver

This work also made possible some improvements to the original implementation. Following Iago’s suggestions, we refactored the job initialization code to allocate memory and initialize a job in one go. With this, we started to clean up resources more cohesively, clearly distinguishing cleanups in case of failure from job completion. We also fixed the resource cleanup for when a job is aborted before the DRM scheduler arms it; at that point, drm_sched_job_arm() had recently been introduced to job initialization. Finally, we prepared the semaphore interface to implement timeline syncobjs in the future.

Going Up

The patchset that adds multiple syncobjs support and improvements to V3D is available here and comprises four patches:

  • drm/v3d: decouple adding job dependencies steps from job init
  • drm/v3d: alloc and init job in one shot
  • drm/v3d: add generic ioctl extension
  • drm/v3d: add multiple syncobjs support

After extending the V3D kernel interface to accept multiple syncobjs, we worked on V3DV to benefit from V3D multisync capabilities. In the next post, I will describe a little of this work.

Categories: FLOSS Project Planets

Utkarsh Gupta: FOSS Activities in April 2022

Tue, 2022-05-10 01:41

Here’s my (thirty-first) monthly but brief update about the activities I’ve done in the F/L/OSS world.

Debian

This was my 40th month of actively contributing to Debian. I became a DM in late March 2019 and a DD on Christmas ‘19! \o/

There’s a bunch of things I did this month but mostly non-technical, now that DC22 is around the corner. Here are the things I did:

Debian Uploads
  • Helped Andrius w/ FTBFS for php-text-captcha, reported via #977403.
    • I fixed the same in Ubuntu a couple of months ago and they copied over the patch here.
Other $things:
  • Volunteering for DC22 Content team.
  • Leading the Bursary team w/ Paulo.
  • Answering a bunch of questions of referees and attendees around bursary.
  • Being an AM for Arun Kumar, process #1024.
  • Mentoring for newcomers.
  • Moderation of -project mailing list.
Ubuntu

This was my 15th month of actively contributing to Ubuntu. Now that I joined Canonical to work on Ubuntu full-time, there’s a bunch of things I do! \o/

I mostly worked on different things, I guess.

I was too lazy to maintain a list of things I worked on so there’s no concrete list atm. Maybe I’ll get back to this section later or will start to list stuff from the fall, as I was doing before. :D

Debian (E)LTS

Debian Long Term Support (LTS) is a project to extend the lifetime of all Debian stable releases to (at least) 5 years. Debian LTS is not handled by the Debian security team, but by a separate group of volunteers and companies interested in making it a success.

And Debian Extended LTS (ELTS) is its sister project, extending support to the Jessie release (+2 years after LTS support).

This was my thirty-first month as a Debian LTS and twentieth month as a Debian ELTS paid contributor.
I worked for 23.25 hours for LTS and 20.00 hours for ELTS.

LTS CVE Fixes and Announcements:
  • Issued DLA 2976-1, fixing CVE-2022-1271, for gzip.
    For Debian 9 stretch, these problems have been fixed in version 1.6-5+deb9u1.
  • Issued DLA 2977-1, fixing CVE-2022-1271, for xz-utils.
    For Debian 9 stretch, these problems have been fixed in version 5.2.2-1.2+deb9u1.
  • Working on src:tiff and src:mbedtls to fix the issues, still waiting for more issues to be reported, though.
  • Looking at src:mutt CVEs. Haven’t had the time to complete but shall roll out next month.
ELTS CVE Fixes and Announcements:
  • Issued ELA 593-1, fixing CVE-2022-1271, for gzip.
    For Debian 8 jessie, these problems have been fixed in version 1.6-4+deb8u1.
  • Issued ELA 594-1, fixing CVE-2022-1271, for xz-utils.
    For Debian 8 jessie, these problems have been fixed in version 5.1.1alpha+20120614-2+deb8u1.
  • Issued ELA 598-1, fixing CVE-2019-16935, CVE-2021-3177, and CVE-2021-4189, for python2.7.
    For Debian 8 jessie, these problems have been fixed in version 2.7.9-2-ds1-1+deb8u9.
  • Working on src:tiff and src:beep to fix the issues, still waiting for more issues to be reported for src:tiff and src:beep is a bit of a PITA, though. :)
Other (E)LTS Work:
  • Triaged gzip, xz-utils, tiff, beep, python2.7, python-django, and libgit2.
  • Signed up to be a Freexian Collaborator! \o/
  • Read through some bits around that.
  • Helped and assisted new contributors joining Freexian.
  • Answered questions (& discussions) on IRC (#debian-lts and #debian-elts).
  • General and other discussions on LTS private and public mailing list.
  • Attended monthly Debian meeting. Held on Jitsi this month.
Debian LTS Survey

I’ve spent 18 hours on the LTS survey on the following bits:

  • Rolled out the announcement. Started the survey.
  • Answered a bunch of queries, people asked via e-mail.
  • Looked at another bunch of tickets: https://salsa.debian.org/freexian-team/project-funding/-/issues/23.
  • Sent a reminder and fixed a few things here and there.
  • Gave a status update during the meeting.
  • Extended the duration of the survey.

Until next time.
:wq for today.

Categories: FLOSS Project Planets

Robert McQueen: Evolving a strategy for 2022 and beyond

Mon, 2022-05-09 10:01

As a board, we have been working on several initiatives to make the Foundation a better asset for the GNOME Project. We’re working on a number of threads in parallel, so I wanted to explain the “big picture” a bit more to try and connect together things like the new ED search and the bylaw changes.

We’re all here to see free and open source software succeed and thrive, so that people can be truly empowered with agency over their technology, rather than being passive consumers. We want to bring GNOME to as many people as possible so that they have computing devices that they can inspect, trust, share and learn from.

In previous years we’ve tried to boost the relevance of GNOME (or technologies such as GTK) or solicit donations from businesses and individuals with existing engagement in FOSS ideology and technology. The problem with this approach is that we’re mostly addressing people and organisations who are already supporting or contributing FOSS in some way. To truly scale our impact, we need to look to the outside world, build better awareness of GNOME outside of our current user base, and find opportunities to secure funding to invest back into the GNOME project.

The Foundation supports the GNOME project with infrastructure, arranging conferences, sponsoring hackfests and travel, design work, legal support, managing sponsorships, advisory board, being the fiscal sponsor of GNOME, GTK, Flathub… and we will keep doing all of these things. What we’re talking about here are additional ways for the Foundation to support the GNOME project – we want to go beyond these activities, and invest into GNOME to grow its adoption amongst people who need it. This has a cost, and that means in parallel with these initiatives, we need to find partners to fund this work.

Neil has previously talked about themes such as education, advocacy, privacy, but we’ve not previously translated these into clear specific initiatives that we would establish in addition to the Foundation’s existing work. This is all a work in progress and we welcome any feedback from the community about refining these ideas, but here are the current strategic initiatives the board is working on. We’ve been thinking about growing our community by encouraging and retaining diverse contributors, and addressing evolving computing needs which aren’t currently well served on the desktop.

Initiative 1: Welcoming newcomers. The community is already spending a lot of time welcoming newcomers and teaching them the best practices. Those activities are as time consuming as they are important, but currently a handful of individuals are running initiatives such as GSoC, Outreachy and outreach to Universities. These activities help bring diverse individuals and perspectives into the community, and help them develop skills and experience of collaborating to create Open Source projects. We want to make those efforts more sustainable by finding sponsors for these activities. With funding, we can hire people to dedicate their time to operating these programs, including paid mentors and creating materials to support newcomers in future, such as developer documentation, examples and tutorials. This is the initiative that needs to be refined the most before we can turn it into something real.

Initiative 2: Diverse and sustainable Linux app ecosystem. I spoke at the Linux App Summit about the work that GNOME and Endless have been supporting in Flathub, but this is an example of something which has a great overlap between commercial, technical and mission-based advantages. The key goal here is to improve the financial sustainability of participating in our community, which in turn has an impact on the diversity of who we can expect to afford to enter and remain in our community. We believe the existence of this is critically important for individual developers and contributors to unlock earning potential from our ecosystem, through donations or app sales. In turn, a healthy app ecosystem also improves the usefulness of the Linux desktop as a whole for potential users. We believe that we can build a case for commercial vendors in the space to join an advisory board alongside GNOME, KDE, etc., to input into the governance and contribute to the costs of growing Flathub.

Initiative 3: Local-first applications for the GNOME desktop. This is what Thib has been starting to discuss on Discourse, in this thread. There are many different threats to free access to computing and information in today’s world. The GNOME desktop and apps need to give users convenient and reliable access to technology which works similarly to the tools they already use everyday, but keeps them and their data safe from surveillance, censorship, filtering or just being completely cut off from the Internet. We believe that we can seek both philanthropic and grant funding for this work. It will make GNOME a more appealing and comprehensive offering for the many people who want to protect their privacy.

The idea is that these initiatives all sit on the boundary between the GNOME community and the outside world. If the Foundation can grow and deliver these kinds of projects, we are reaching to new people, new contributors and new funding. These contributions and investments back into GNOME represent a true “win-win” for the newcomers and our existing community.

(Originally posted to GNOME Discourse, please feel free to join the discussion there.)

Categories: FLOSS Project Planets
