GNU Planet!

Planet GNU - https://planet.gnu.org/

www-zh-cn @ Savannah: It is easy to contribute to GNU

Wed, 2024-04-17 19:30

I will be delivering my talk, "It is easy to contribute to GNU," on Saturday, May 4, 2024, 12:15-13:00 EDT (16:15-17:00 UTC), at the LibrePlanet 2024 conference, and I hope you’ll check it out!

LibrePlanet is a conference about software freedom, happening on May 4 & 5, 2024. The event is hosted by the Free Software Foundation (FSF), and brings together software developers, law and policy experts, activists, students, and computer users to learn skills, celebrate free software accomplishments, and face upcoming challenges. Newcomers are always welcome, and LibrePlanet 2024 will feature programming for all ages and experience levels.

*Please register in advance at <https://libreplanet.org/2024/>.*

wxie

Categories: FLOSS Project Planets

GNU Taler news: GNU Taler v0.10 released

Sun, 2024-04-14 18:00
We are happy to announce the release of GNU Taler v0.10.
Categories: FLOSS Project Planets

Simon Josefsson: Reproducible and minimal source-only tarballs

Sat, 2024-04-13 12:44

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive, which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs is tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn’t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are hopefully reproducible on several architectures.

What does that even mean? Why should you care? How can you do the same for your project? What are the open issues? Read on, dear reader…

This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that led to a discussion on Fosstodon, and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to do some practical work.

Let’s look at how a maintainer releases some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most projects and build systems. The following illustrates the steps a maintainer would take to prepare a release:

    git clone https://gitlab.com/gsasl/libntlm.git
    cd libntlm
    git checkout v1.8
    ./bootstrap
    ./configure
    make distcheck
    gpg -b libntlm-1.8.tar.gz

The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project has been doing releases since the late 1980s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files: typically shell scripts generated by autoconf, makefile templates generated by automake, and documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that has happened.

The XZUtils incident illustrates that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier about how to mitigate this risk by using signed minimal source-only tarballs.
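One practical way to act on this is to list which files the full `make dist` tarball adds on top of the git-archive output, since those are the files a reviewer would otherwise have to audit by hand. Below is a minimal Python sketch of that comparison, assuming both tarballs are gzip-compressed and each wraps its files in a single top-level directory (such as libntlm-1.8/ and libntlm-v1.8/). The function names are hypothetical, and `build_tarball` only exists to make the example self-contained.

```python
import io
import tarfile


def build_tarball(prefix: str, files: dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an in-memory .tar.gz with files under prefix/."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(f"{prefix}/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_paths(tar_gz: bytes) -> set[str]:
    """Regular-file paths with the top-level directory stripped."""
    paths = set()
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # "libntlm-1.8/lib/ntlm.h" -> "lib/ntlm.h"
                _, _, rest = member.name.partition("/")
                paths.add(rest)
    return paths


def files_not_in_source(full_tar_gz: bytes, src_tar_gz: bytes) -> list[str]:
    """Files shipped in the full tarball but absent from the source-only one."""
    return sorted(member_paths(full_tar_gz) - member_paths(src_tar_gz))


if __name__ == "__main__":
    full = build_tarball("libntlm-1.8", {"ntlm.c": b"...", "configure": b"#!/bin/sh\n"})
    src = build_tarball("libntlm-v1.8", {"ntlm.c": b"...", "bootstrap": b"#!/bin/sh\n"})
    print(files_not_in_source(full, src))  # ['configure']
```

In a real Autotools project this list would typically contain generated files such as configure and the Makefile.in templates, which are exactly the files that deserve extra scrutiny.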

The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts were not rebuilt through a typical autoreconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as patching the relevant source code and rebuilding the package may not be sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs, although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages, and especially for generated content such as HTML, PDF, and shell scripts, this happens more than you would like.

Publishing minimal source-only tarballs enables easier auditing of a project’s code, avoiding the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub, etc. offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub have a security incident that causes generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer’s PGP signature using GnuPG, this can lead to backdoor scenarios similar to the one we had for XZUtils, but originating with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted against selected IP addresses rather than against everyone, making it harder to discover.

With all that discussion and rationale out of the way, let’s return to the release process. I have added another step here:

    make srcdist
    gpg -b libntlm-1.8-src.tar.gz

Now the release is ready. I publish these four files in Libntlm’s Savannah download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:

    91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
    ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz

So how can you reproduce my artifacts? Here is how to reproduce them in an Ubuntu 22.04 container:

    podman run -it --rm ubuntu:22.04
    # the remaining commands run inside the container
    apt-get update
    apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
    git clone https://gitlab.com/gsasl/libntlm.git
    cd libntlm
    git checkout v1.8
    ./bootstrap
    ./configure
    make dist srcdist
    sha256sum libntlm-*.tar.gz

You should see the exact same SHA256 checksum values. Hooray!
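The comparison can also be scripted rather than done by eye. The Python sketch below is a stand-in for `sha256sum -c`, under the assumption that the published list uses the usual `sha256sum` output format (digest, whitespace, filename): it hashes the files found in a directory and reports a per-file match. The published values are copied from the checksums listed earlier in this post; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums published for Libntlm 1.8 (from the release description).
PUBLISHED_SUMS = """\
91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz
"""


def parse_sums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {filename: lowercase hex digest}."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'
        sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory: str, expected: dict[str, str]) -> dict[str, bool]:
    """Map each expected filename to True if it exists and its digest matches."""
    results = {}
    for name, digest in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_file(path) == digest
    return results


if __name__ == "__main__":
    for name, ok in verify(".", parse_sums(PUBLISHED_SUMS)).items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```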

This works because Trisquel 11 and Ubuntu 22.04 use the same versions of git, autoconf, automake, and libtool. These tools do not guarantee the same output content across all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed.

Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in an AlmaLinux 8 container; replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:

    podman run -it --rm almalinux:8
    # the remaining commands run inside the container
    dnf update -y
    dnf install -y make wget gcc
    wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
    tar xfa libntlm-1.8.tar.gz
    cd libntlm-1.8
    ./configure
    make dist
    sha256sum libntlm-1.8.tar.gz

The source-only minimal tarball can be regenerated on Debian 11:

    podman run -it --rm debian:11
    # the remaining commands run inside the container
    apt-get update
    apt-get install -y --no-install-recommends make git ca-certificates
    git clone https://gitlab.com/gsasl/libntlm.git
    cd libntlm
    git checkout v1.8
    make -f cfg.mk srcdist
    sha256sum libntlm-1.8-src.tar.gz

As the magnum opus or chef-d’œuvre, let’s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11; replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.

    podman run -it --rm docker.io/kpengboy/trisquel:11.0
    # the remaining commands run inside the container
    apt-get update
    apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
    wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
    tar xfa libntlm-1.8-src.tar.gz
    cd libntlm-v1.8
    ./bootstrap
    ./configure
    make dist
    sha256sum libntlm-1.8.tar.gz

Yay! You should now have great confidence that the release artifacts correspond to what’s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts.

I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot.

Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e., PROJECT-TAG/) and I’ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab’s exports, and you can verify this with tools like diffoscope. GitLab names the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE), which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately it has logic that removes the ‘v’ in the pathname, so you will get a tarball with pathname libntlm-1.8/ instead of the libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive name differ. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/; otherwise the produced archive is bit-by-bit identical, including timestamps. Savannah’s CGIT interface uses the archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise the file content is identical. Savannah’s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to getting the SHA256 checksums to match, but fail on the pathname within the archive. I’ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion.
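Since the archives from GitLab, GitHub, Codeberg and Savannah differ mainly in the top-level directory name, a content-only comparison can be a useful complement to a full diffoscope run. The following Python sketch, with hypothetical function names, hashes every regular file after stripping the first path component and compares the resulting manifests. It deliberately ignores directory entries, timestamps and other metadata, so it answers only "same file content?", not "same bytes?".

```python
import hashlib
import io
import tarfile


def make_tar_gz(prefix: str, files: dict[str, bytes]) -> bytes:
    """Hypothetical helper for the example: files stored under prefix/."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(f"{prefix}/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def content_manifest(tar_gz: bytes) -> dict[str, str]:
    """Map path-without-top-level-dir -> SHA256 of that file's content."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            _, _, rest = member.name.partition("/")
            data = tar.extractfile(member).read()
            manifest[rest] = hashlib.sha256(data).hexdigest()
    return manifest


def top_level_dirs(tar_gz: bytes) -> set[str]:
    """First path components used inside the archive, e.g. {'libntlm-v1.8'}."""
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        return {m.name.split("/", 1)[0] for m in tar.getmembers()}


def same_content(a: bytes, b: bytes) -> bool:
    return content_manifest(a) == content_manifest(b)


if __name__ == "__main__":
    gitlab_style = make_tar_gz("libntlm-v1.8", {"README": b"hello\n"})
    github_style = make_tar_gz("libntlm-1.8", {"README": b"hello\n"})
    print(top_level_dirs(gitlab_style), top_level_dirs(github_style),
          same_content(gitlab_style, github_style))
    # {'libntlm-v1.8'} {'libntlm-1.8'} True
```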

Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The versions of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The versions of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS Homebrew, and the upcoming Ubuntu 24.04 behave another way. Hopefully this will not change often, but any such change would invalidate the reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appear to be using modern git, so the download tarballs from them would not match my tarballs, even though the content would.

Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also produces different output depending on the time zone of the person using it, which arguably is a simple bug that can be fixed. However, this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm’s ChangeLog file died on the surgery table here.
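To see how a ChangeLog generated from git history can depend on the time zone, note that a commit timestamp is a single instant, but the calendar date printed for it depends on which UTC offset it is rendered in. The Python sketch below is not the gitlog-to-changelog code, just an illustration with a hypothetical helper name: it formats the same Unix timestamp in two zones and gets two different dates. Rendering every entry in one fixed zone such as UTC is one assumed way to make such output deterministic.

```python
from datetime import datetime, timedelta, timezone


def changelog_date(commit_epoch: int, tz: timezone) -> str:
    """Render a commit timestamp as the YYYY-MM-DD date used in ChangeLog entries."""
    return datetime.fromtimestamp(commit_epoch, tz).strftime("%Y-%m-%d")


COMMIT = 1700000000  # 2023-11-14 22:13:20 UTC

if __name__ == "__main__":
    print(changelog_date(COMMIT, timezone.utc))                  # 2023-11-14
    print(changelog_date(COMMIT, timezone(timedelta(hours=2))))  # 2023-11-15
```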

So how would a distribution build these minimal source-only tarballs? I happen to help with the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuilding all packages that rely on that particular code. To change this, the Debian libntlm package needs to add Debian’s gnulib package to its Build-Depends. But there was one problem: similar to most projects that use gnulib, Libntlm depends on a particular git commit of gnulib, and Debian only ships one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and added a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allows a no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm’s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and which can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa, and there is continuous integration testing of it as well, thanks to the Salsa CI team.

Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work — see my notes in gnulib’s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately.

One open question is how to deal with the increased build dependencies that are triggered by this approach. Some people are surprised by this, but I don’t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We’ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn’t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern were used for other packages that use gnulib, such as coreutils, gzip, tar, bison, etc.: they would all need git and gnulib in their Build-Depends. Cross-building those packages for a new architecture will therefore require git on that architecture first, which gets circular quickly. The dependency on gnulib is real, so I don’t see that going away, and gnulib is an Architecture: all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that don’t require the git tool to extract the necessary files, but none that I found practical. Ideas welcome!

Finally, some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib’s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and have complicated interactions with CI/CD. The reason I gave up git submodules now is that the particular commit to use is not recorded in the git archive output when git submodules are used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer makes sense to continue using a gnulib git submodule. If there is some practical problem with pointing GNULIB_URL at a git bundle together with the GNULIB_REVISION in bootstrap.conf, one alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir.
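Because the pinned commit now lives in bootstrap.conf rather than in submodule metadata, a CI job could check that it really is a fixed commit and not, say, a branch name. The Python sketch below is a hypothetical check of that kind. It assumes GNULIB_REVISION is set by a plain shell assignment in bootstrap.conf and that a full 40-character hex commit ID is what you want; it does not evaluate shell, so computed values are treated as unpinned.

```python
import re
from typing import Optional

# Matches e.g.  GNULIB_REVISION=abc123...  or  GNULIB_REVISION="abc123..."  # comment
ASSIGNMENT = re.compile(r"""^\s*GNULIB_REVISION=(['"]?)([^'"\s#]*)\1\s*(?:#.*)?$""")
FULL_COMMIT = re.compile(r"[0-9a-f]{40}")


def pinned_gnulib_revision(bootstrap_conf: str) -> Optional[str]:
    """Return the pinned commit ID, or None if it is missing or not a full hash."""
    revision = None
    for line in bootstrap_conf.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            revision = match.group(2)  # a later assignment overrides an earlier one
    if revision is not None and FULL_COMMIT.fullmatch(revision):
        return revision
    return None
```

A CI step could read bootstrap.conf, call `pinned_gnulib_revision` on its text, and fail the job when the result is None.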

The srcdist make rule is simple:

    git archive --prefix=libntlm-v1.8/ -o libntlm-v1.8.tar.gz HEAD

Making the make dist generated tarball reproducible can be more complicated; however, for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to a timestamp found in the git repository. Interestingly, there seem to be a couple of different ways to accomplish this: Guix doesn’t support minimal source-only tarballs but relies on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I’m using now is fairly similar to the one I suggested over a year ago.
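The core idea, pinning every piece of metadata that would otherwise vary between builds, can be illustrated outside of Automake. The Python sketch below is not Libntlm’s implementation; it assumes the timestamp has already been taken from the repository (for example with `git log -1 --format=%ct`) and shows that fixing member order, modification times, ownership, permissions and the gzip header timestamp makes two independent runs produce identical bytes, at least with the same Python and zlib versions.

```python
import gzip
import hashlib
import io
import tarfile


def reproducible_tar_gz(prefix: str, files: dict[str, bytes], mtime: int) -> bytes:
    """Build a .tar.gz whose bytes depend only on prefix, files, and mtime."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed member order, independent of input order
            data = files[name]
            info = tarfile.TarInfo(f"{prefix}/{name}")
            info.size = len(data)
            info.mtime = mtime          # e.g. the timestamp of the release commit
            info.mode = 0o644
            info.uid = info.gid = 0     # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    out = io.BytesIO()
    # The gzip header also stores a timestamp; pin it and omit the file name.
    with gzip.GzipFile(filename="", mode="wb", fileobj=out, mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


if __name__ == "__main__":
    commit_time = 1700000000  # stand-in for the output of `git log -1 --format=%ct`
    first = reproducible_tar_gz("pkg-1.0", {"b.c": b"int b;\n", "a.c": b"int a;\n"}, commit_time)
    second = reproducible_tar_gz("pkg-1.0", {"a.c": b"int a;\n", "b.c": b"int b;\n"}, commit_time)
    print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())  # True
```

Changing only the timestamp changes the output, which is why the source of that timestamp has to be something every rebuilder can recover from the repository.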

Doing continuous testing of all this is critical to make sure things don’t regress. Libntlm’s pipeline definition now produces the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job, which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insist that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and that AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in the pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa.
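The comparison logic of such a job is simple to express. The Python sketch below is a hypothetical re-creation, not the actual 000-reproducability job: it assumes each build uploads a mapping from tarball name to SHA256 digest, keyed by a platform label chosen for this sketch, and it reports every required pair whose digests differ or whose artifacts are missing, using the platform pairs listed above.

```python
import sys

# Platform pairs that must produce identical tarballs, as listed in the text.
# The label strings themselves are hypothetical.
REQUIRED_PAIRS = [
    ("trisquel11", "ubuntu2204"),
    ("pureos10", "debian11"),
    ("almalinux8", "rockylinux8"),
    ("almalinux9", "rockylinux9"),
]


def mismatches(checksums: dict[str, dict[str, str]],
               pairs: list[tuple[str, str]]) -> list[str]:
    """checksums maps platform -> {tarball filename -> sha256 hex digest}."""
    problems = []
    for left, right in pairs:
        a, b = checksums.get(left), checksums.get(right)
        if a is None or b is None:
            problems.append(f"{left} vs {right}: missing build artifact")
            continue
        for name in sorted(set(a) | set(b)):
            if a.get(name) != b.get(name):
                problems.append(f"{left} vs {right}: {name} differs")
    return problems


def main(checksums: dict[str, dict[str, str]]) -> int:
    problems = mismatches(checksums, REQUIRED_PAIRS)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit status fails the CI job
```

Partial reproducibility shows up naturally here as a pair where only one of the two tarball names is reported as differing.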

If this way of working plays out well, I hope to implement it in other projects too.

What do you think? Happy Hacking!

Categories: FLOSS Project Planets

FSF Blogs: Meet the locals: Come to LibrePlanet and connect with free software supporters in New England

Wed, 2024-04-10 14:11
New England free software supporters: we invite you to come socialize with other local free software supporters at LibrePlanet 2024.
Categories: FLOSS Project Planets

stow @ Savannah: GNU Stow 2.4.0 released

Sun, 2024-04-07 19:22

Stow 2.4.0 has been released. This release contains some much-wanted bug-fixes — specifically, fixing the --dotfiles option to work with dot-foo directories, and avoiding a spurious warning when unstowing. There were also very many clean-ups and improvements, mostly internal and not visible to users. See http://git.savannah.gnu.org/cgit/stow.git/tree/NEWS for more details.

Categories: FLOSS Project Planets

FSF Blogs: There are plenty of ways to socialize at LibrePlanet 2024: Cultivating Community

Wed, 2024-04-03 11:10
In this blog, we're sharing with you all the ways you can socialize and participate in LibrePlanet 2024: Cultivating Community outside of the official program.
Categories: FLOSS Project Planets

Simon Josefsson: Towards reproducible minimal source code tarballs? On *-src.tar.gz

Mon, 2024-04-01 06:28

While the work to analyze the xz backdoor is in progress, several ideas have been suggested to improve the entire software supply chain ecosystem. Some of those ideas are good, some are at best irrelevant and harmless, and some suggestions are plain bad. I’d like to attempt to formalize one idea (it remains to be seen in which category it belongs), which has been discussed before, but the context in which the idea can be appreciated has not been as clear as it is today.

  1. Reproducible source tarballs. The idea is that published source tarballs should be possible to reproduce independently somehow, and that this should be continuously tested and verified, preferably as part of the upstream project’s continuous integration system (e.g., GitHub action or GitLab pipeline). While nominally this looks easy to achieve, there are some complex matters in this, for example: what timestamps to use for files in the tarball? I’ve brought up this aspect before.
  2. Minimal source tarballs without generated vendor files. Most GNU Autoconf/Automake-based tarballs include pre-generated files which are important for bootstrapping on exotic systems that do not have the required dependencies. For the bootstrapping story to succeed, this approach is important to support. However, it has become clear that this practice raises significant costs and risks. Most modern GNU/Linux distributions have all the required dependencies and actually prefer to rebuild everything from source code. These pre-generated extra files introduce uncertainty to that process.

My strawman proposal to improve things is to define a new tarball format, *-src.tar.gz, with at least the following properties:

  1. The tarball should allow users to build the project, which is the entire purpose of all this. This means that at least all source code for the project has to be included.
  2. The tarballs should be signed, for example with PGP or minisign.
  3. The tarball should be possible to reproduce bit-by-bit by a third party using upstream’s version controlled sources and a pointer to which revision was used (e.g., git tag or git commit).
  4. The tarball should not require an Internet connection to download things.
    • Corollary: every external dependency either has to be explicitly documented as such (e.g., gcc and GnuTLS), or included in the tarball.
    • Observation: This means including all *.po gettext translations which are normally downloaded when building from version controlled sources.
  5. The tarball should contain everything required to build the project from source using as much externally released versioned tooling as possible. This is the “minimal” property lacking today.
    • Corollary: This means including a vendored copy of OpenSSL or libz is not acceptable: link to them as external projects.
    • Open question: How about non-released external tooling such as gnulib or autoconf archive macros? This is a bit more delicate: most distributions just package one current version of gnulib or autoconf archive, not previous versions. This could change: distributions could package the gnulib git repository (up to some current version) and the autoconf archive git repository, and packages could be set up to extract the version they need (gnulib’s ./bootstrap already supports this via the --gnulib-refdir parameter), but this is not normally in place.
    • Suggested Corollary: The tarball should contain content from git submodules such as gnulib and the necessary Autoconf archive M4 macros required by the project.
  6. Similar to how the GNU project specifies the ./configure interface, we need a documented interface for how to bootstrap the project. I suggest using the already well-established idiom of running ./bootstrap to set up the package so that it can later be built via ./configure. Of course, some projects are not using the autotools ./configure interface and will not follow this aspect either, but just as most build systems that compete with autotools have instructions on how to build the project, they should document similar interfaces for bootstrapping the source tarball to allow building.
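Only some of these properties can be checked mechanically, and even those only heuristically. As an illustration, the Python sketch below inspects a candidate *-src.tar.gz for three of them: a single top-level directory, a ./bootstrap script at the top (property 6), and the absence of files that Autotools usually generates (property 5). The list of "usually generated" names is an assumption made for this sketch and would need tuning per project; signatures, reproducibility and offline builds are out of scope, and all function names are hypothetical.

```python
import io
import tarfile

# Heuristic, project-specific assumption: names that Autotools usually generates.
LIKELY_GENERATED = {"configure", "Makefile.in", "aclocal.m4", "config.h.in"}


def make_tar_gz(prefix: str, files: dict[str, bytes]) -> bytes:
    """Hypothetical helper for the example: files stored under prefix/."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(f"{prefix}/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def check_src_tarball(tar_gz: bytes) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    problems = []
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        members = tar.getmembers()
    files = {m.name for m in members if m.isfile()}
    tops = {m.name.split("/", 1)[0] for m in members}
    if len(tops) != 1:
        problems.append(f"expected one top-level directory, found {sorted(tops)}")
        return problems
    top = tops.pop()
    if f"{top}/bootstrap" not in files:
        problems.append("no ./bootstrap script at the top level")
    for path in sorted(files):
        if path.rsplit("/", 1)[-1] in LIKELY_GENERATED:
            problems.append(f"looks generated: {path}")
    return problems
```

For example, a git-archive tarball containing bootstrap and configure.ac yields no findings, while a classic make dist tarball would be flagged for its configure script and Makefile.in files.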

If tarballs that achieve the above goals were available from popular upstream projects, distributions could more easily use them instead of current tarballs that include pre-generated content. The advantage would be that the build process is not tainted by “unnecessary” files. We need to develop tools for maintainers to create these tarballs, similar to the make dist rule that generates today’s foo-1.2.3.tar.gz files.

I think one common argument against this approach will be: Why bother with all that, and not just use git-archive outputs? Or avoid the entire tarball approach and move directly towards version controlled checkouts, referring to upstream releases by git URL and commit tag or id. My counter-argument is that this optimizes for packagers’ benefit at the cost of upstream maintainers: most upstream maintainers do not want to store gettext *.po translations in their source code repository. A compromise between the needs of maintainers and packagers is useful, and this *-src.tar.gz tarball approach is the indirection we need to solve that.

What do you think?

Categories: FLOSS Project Planets

parallel @ Savannah: GNU Parallel 20240322 ('Sweden') released [stable]

Sun, 2024-03-31 17:11

GNU Parallel 20240322 ('Sweden') has been released. It is available for download at: lbry://@GnuParallel:4

Quote of the month:

   GNU parallel ftw
    -- hostux.social/@rmpr @_paulmairo@twitter

New in this release:

  • Bug fixes and man page updates.


GNU Parallel - For people who live life in the parallel lane.

If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.


About GNU Parallel


GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

For example you can run this to convert all jpeg files into png and gif files and have a progress bar:

  parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif

Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:

  find . -name '*.jpg' |
    parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200
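For readers new to the replacement strings, the Python sketch below mimics what the two commands above expand to, without running anything. It relies only on the documented meaning of {n} (value from the n-th input source), {n.} (extension removed), {n/} (basename), {n//} (dirname) and {n/.} (basename without extension), treats multiple input sources as every combination of their values, and ignores shell quoting and job scheduling entirely. It is an illustration, not how GNU Parallel is implemented.

```python
import itertools
import os
import re

# Positional replacement strings: {n}, {n.}, {n/}, {n//}, {n/.}
POSITIONAL = re.compile(r"\{(\d+)(//|/\.|/|\.)?\}")


def transform(value: str, op: str) -> str:
    if op == "":
        return value
    if op == ".":
        return os.path.splitext(value)[0]       # remove extension
    if op == "/":
        return os.path.basename(value)          # basename
    if op == "//":
        return os.path.dirname(value) or "."    # dirname, like dirname(1)
    if op == "/.":
        return os.path.splitext(os.path.basename(value))[0]
    raise ValueError(f"unsupported replacement string operator: {op!r}")


def expand(template: str, *sources: list[str]) -> list[str]:
    """One command line per combination of input-source values."""
    commands = []
    for args in itertools.product(*sources):
        commands.append(POSITIONAL.sub(
            lambda m: transform(args[int(m.group(1)) - 1], m.group(2) or ""),
            template))
    return commands


if __name__ == "__main__":
    for cmd in expand("convert {1} {1.}.{2}", ["a.jpg", "b.jpg"], ["png", "gif"]):
        print(cmd)
    # convert a.jpg a.png
    # convert a.jpg a.gif
    # convert b.jpg b.png
    # convert b.jpg b.gif
```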

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

    $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
    $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
    12345678 883c667e 01eed62f 975ad28b 6d50e22a
    $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
    cc21b4c9 43fd03e9 3ae1ae49 e28573c0
    $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
    79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
    fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
    $ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

If you like GNU Parallel:

  • Give a demo at your local user group/team/colleagues
  • Post the intro videos on Reddit/Diaspora*/forums/blogs/Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
  • Request or write a review for your favourite blog or magazine
  • Request or build a package for your favourite distribution (if it is not already there)
  • Invite me for your next conference


If you use programs that use GNU Parallel for research:

  • Please cite GNU Parallel in your publications (use --citation)


If GNU Parallel saves you money:



About GNU SQL


GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
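A DBURL looks like an ordinary URL, so its parts can be pulled apart with standard URL parsing. The Python sketch below assumes the general shape vendor://[[user][:password]@][host][:port]/[database][?query]; it is not GNU sql’s own parser, it ignores any vendor-specific defaults, and the example URL and credentials are made up.

```python
from urllib.parse import unquote, urlsplit


def parse_dburl(dburl: str) -> dict[str, object]:
    """Split a DBURL-like string into its components (None when absent)."""
    parts = urlsplit(dburl)
    return {
        "vendor": parts.scheme,
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        "database": unquote(parts.path.lstrip("/")) or None,
        "query": unquote(parts.query) or None,
    }


if __name__ == "__main__":
    print(parse_dburl("mysql://scott:tiger@db.example.com:3306/inventory"))
```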

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.


About GNU Niceload


GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

Categories: FLOSS Project Planets

poke @ Savannah: poke-elf 1.0 released

Sat, 2024-03-30 15:08

I am happy to announce the first release of poke-elf, version 1.0.

The tarball poke-elf-1.0.tar.gz is now available at
https://ftp.gnu.org/gnu/poke/poke-elf-1.0.tar.gz.

> poke-elf (https://jemarch.net/poke-elf) is a full-fledged GNU poke pickle for editing ELF object
> files, executables, shared libraries and core dumps.  It supports
> many architectures and extensions.
>
> This pickle is part of the GNU poke project.
>
> GNU poke (https://jemarch.net/poke) is an interactive, extensible
> editor for binary data.  Not limited to editing basic entities such
> as bits and bytes, it provides a full-fledged procedural,
> interactive programming language designed to describe data
> structures and to operate on them.


Please send us comments, suggestions, bug reports, patches,
questions, complaints, bitcoins, or whatever, to poke-devel@gnu.org.

Happy ELF poking!

---

Jose E. Marchesi
Frankfurt am Main
30 March 2024


Categories: FLOSS Project Planets

poke @ Savannah: GNU poke 4.0 released

Sat, 2024-03-30 14:15

I am happy to announce a new major release of GNU poke, version 4.0.

This release is the result of a year of development.  A lot of things
have changed and improved with respect to the 3.x series; we have
fixed many bugs and added quite a lot of new exciting and useful
features.  See below for a description of many of them.

The tarball poke-4.0.tar.gz is now available at
https://ftp.gnu.org/gnu/poke/poke-4.0.tar.gz.

> GNU poke (http://www.jemarch.net/poke) is an interactive, extensible
> editor for binary data.  Not limited to editing basic entities such
> as bits and bytes, it provides a full-fledged procedural,
> interactive programming language designed to describe data
> structures and to operate on them.


Thanks to the people who contributed with code and/or documentation to
this release.

Once again, our special thanks to Bruno Haible for his invaluable advice and his help in thoroughly testing this new release on many different platforms and configurations.

What is new in this release:

User interface updates


  • The `dump' command now accepts an argument :val.  This argument is a mapped value, and makes `dump' dump the bytes corresponding to the value, using colors for the different fields.  This command is useful in order to get a visual representation of the constituents of the value and their corresponding bytes.


  • It is now possible to compare Poke values of type `any' using the equality and inequality operators == and !=.


  • GNU poke now acknowledges the POKE_LOAD_PATH environment variable whose value, if defined, gets prepended to the load_path when poke starts.


  • When the poke compiler finds an error in an inline asm template it now emits a proper parse error.


  • The poked program now recognizes the -S command line option properly.


  • The poked program now uses a socket in /tmp/poked-UID.ipc where UID is the user ID of the effective user running the program.  This is better than the previous behavior of always using /tmp/poked.ipc, since it allows several poked instances to run on the same system.


  • The poke program now allows referring to IO spaces by name/handler with $<STR>, where STR is a non-ambiguous substring of some open IO space handler.  Examples are $</bin/ls> and $<*0*>, which could be referred to as $<ls> and $<0> respectively.
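The matching rule can be pictured with a small Python sketch. It is not poke's implementation: the function name `resolve_ios` is hypothetical, and the sketch assumes a $<STR> reference resolves only when STR is a substring of exactly one open handler.

```python
def resolve_ios(handlers, needle):
    """Return the single open IO space handler that contains `needle`.

    Hypothetical sketch: raises LookupError when nothing matches or when
    the substring is ambiguous (it matches more than one handler).
    """
    matches = [h for h in handlers if needle in h]
    if not matches:
        raise LookupError(f"no IO space matches $<{needle}>")
    if len(matches) > 1:
        raise LookupError(f"$<{needle}> is ambiguous: {matches}")
    return matches[0]


open_handlers = ["/bin/ls", "*0*"]        # the handlers from the example
print(resolve_ios(open_handlers, "ls"))   # /bin/ls
print(resolve_ios(open_handlers, "0"))    # *0*
```

With both "/bin/ls" and "/bin/lsblk" open, $<ls> would be rejected as ambiguous, while $<blk> would still resolve.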


  • A new utility called pokefmt has been added to the GNU poke distribution, which implements a simple template system.  See the manual for details on how to use this utility.


  • The poke prompt can now be customized by the user.  This is done by re-defining a function called pk_prompt.  The default value for this function just returns "(poke)", but it can be made as complex as desired.


  • The poke prompt can now be styled using the `prompt' styling class.


  • The new dot-command `.compiler ast EXPR' will compile EXPR and then print its abstract syntax tree (AST).  This is useful for debugging the compiler.


  • The dot-command `.info type' now accepts either an expression or a Poke type specifier as argument.  In the first case it prints information about the type of the value to which the expression evaluates.  In the second case it prints information about the type denoted by the given specifier.


  • The dot-command `.info type' no longer shows field pretty-printer methods, nor anonymous fields in the list of methods and fields.


  • A dot-command `.mmap FILENAME, BASE, SIZE' is now available to poke at devices and files that require mmap.  This is the case for many devices provided by kernel drivers.


Poke Language updates


  • The Poke language now supports using the `t' and `T' suffixes to denote the uint<1> (bit) values 0t and 1t.


  • It is now possible to specify pretty-printers for particular fields in struct type definitions, rather than having to pretty-print the whole value.  To pretty-print a field FNAME, just define the pretty-printer as a method called _print_FNAME.


  • A new immutable variable pk_version is made available, that contains a string with the version of the running poke.


  • A new struct type Pk_Version is defined, that denotes the version of a GNU poke system, or of a pickle.  Accompanying functions pk_version_parse and pk_vercmp are available for parsing Pk_Version values from strings and for comparing versions, respectively.  The version comparing function accepts versions either as Pk_Version values or as formatted strings, interchangeably.
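A version comparison of this kind usually splits versions into numeric components and compares them in order. The Python sketch below is not poke's pk_vercmp: it assumes purely numeric dotted versions such as "4.0", and its -1/0/1 return convention is an assumption. It does mirror the "string or parsed value" convenience described above.

```python
def parse_version(text):
    """Split a dotted version such as "4.0" or "3.3.1" into integers."""
    return tuple(int(part) for part in text.split("."))


def vercmp(a, b):
    """Compare two versions and return -1, 0 or 1.

    Either argument may be a string or an already parsed tuple.
    """
    ta = parse_version(a) if isinstance(a, str) else tuple(a)
    tb = parse_version(b) if isinstance(b, str) else tuple(b)
    width = max(len(ta), len(tb))
    # Pad with zeros so that "4.0" and "4.0.0" compare equal.
    ta += (0,) * (width - len(ta))
    tb += (0,) * (width - len(tb))
    return (ta > tb) - (ta < tb)


print(vercmp("4.0", "3.3"))     # 1
print(vercmp((3, 2), "3.10"))   # -1
```

Comparing the components as integers is what makes "3.10" newer than "3.2", which a plain string comparison would get wrong.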


  • The new built-in function `rtrace' prints out the current call stack in the PVM, a function name in each line.  It makes use of the new PVM instruction of the same name.


  • The new built-in function `iosearch' allows searching for IO spaces by name/handler from Poke programs.


Standard Poke Library updates


  • The new built-in `openmmap' function allows creating MMAP-operated IO spaces in Poke programs.


  • New functions `isdigit' and `isxdigit' have been added to the standard library, that check whether a given character is a decimal digit or a hexadecimal digit, respectively.


  • New function `strrchr' has been added to the standard library, that finds the last occurrence of a character in a string and returns either its index or, if the character is not found, minus one.


  • New function `strtoi' has been added to the standard library, that parses a numeric denotation on a string and returns the result and the number of parsed characters.
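The documented return value (the parsed number plus how many characters were consumed) is what makes `strtoi' handy for scanning. Below is a Python sketch of that contract. It is not the Poke definition: the signature, and the choice to ignore signs and prefixes such as 0x, are assumptions. It parses only unsigned digits in bases up to 36.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def strtoi(text, base=10):
    """Parse the leading digits of `text` in `base` (2..36).

    Returns (value, number_of_characters_parsed); (0, 0) means that no
    digit was found at the start of `text`.
    """
    valid = DIGITS[:base]
    value = 0
    count = 0
    for ch in text:
        digit = valid.find(ch.lower())
        if digit < 0:
            break  # first character that is not a digit in this base
        value = value * base + digit
        count += 1
    return value, count


print(strtoi("123abc"))    # (123, 3)
print(strtoi("ff;", 16))   # (255, 2)
```

Because the character count is returned, a caller can slice off the consumed prefix and continue parsing the rest of the string.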


  • The function `atoi' has been refactored to be defined in terms of `strtoi'.


  • New function `strtok' has been added to the standard library, that helps tokenize strings.


  • New function `strstr' has been added to the standard library, that searches for a sub-string in some given string.


  • The standard function `stoca' has been changed so it doesn't always require passing an array to it.  If no array is passed then it allocates and returns an array by itself.  This is backwards compatible.


libpoke updates


  • A new service pk_keyword_p is available in libpoke, that tells whether a given name is a keyword in the Poke language.


  • When calling pk_load specifying a module that has already been loaded, it is now loaded again and all the definitions in it are re-defined.  This makes the libpoke service match the behavior of the `load' Poke language construction.


  • The libpoke library now supports the handling of delimited alien tokens with the form $<[^>]*>.
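The token shape $<[^>]*> reads as: "$<", then any run of characters other than ">", then ">". A Python sketch using the same regular expression shows what such a token covers. The input line is made up, and a real lexer would also have to skip string literals and comments, which this sketch does not attempt.

```python
import re

# "$<", then any run of characters other than ">", then ">".
ALIEN_TOKEN = re.compile(r"\$<[^>]*>")


def alien_tokens(line):
    """Return the delimited alien tokens found in a line, in order."""
    return ALIEN_TOKEN.findall(line)


print(alien_tokens("$</bin/ls> and $<0>"))   # ['$</bin/ls>', '$<0>']
```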


  • New services pk_register_thread and pk_unregister_thread have been added in order to allow using libpoke in multi-threaded programs.


  • We have done more work to remove global state from libpoke, with the goal that someday it shall be possible for a single program to have several instances of the poke incremental compiler.  We are not there yet, but getting near.


  • New services pk_set_debug_p and pk_get_last_ast_str have been added to libpoke, which respectively put the incremental compiler in debug mode and make it possible to get a printable representation of the AST (abstract syntax tree) corresponding to the last compiled expression.


  • The pk_ios_search service now gets a flag argument, enabling the user to select between exact or partial matching of the handler while searching for the IOS.


  • New services pk_set_user_data and pk_get_user_data are added in order to set a user-defined payload that gets passed back in several libpoke callbacks.


  • The terminal interface in libpoke has been updated so a reference to the pk_compiler incremental compiler is passed to all the callbacks.


Pickles updates


  • A new pickle `srec' has been added for editing, encoding and decoding Motorola SREC files.


  • A new pickle `orc' has been added for poking at ORC data, which is the stack unwinding format used within the Linux kernel.


  • A new pickle `gcov' has been added for editing GCOV data (.gcda) and notes (.gcno) files.


  • A new pickle `base64' has been added to poke, that provides functions to encode and decode data in base64 as defined by RFC 4648.


  • A new pickle `iscan' has been added to poke, that provides a framework implementing Icon-like scanning contexts.


  • A new pickle `iscan-str' has been added to poke, that provides Icon-like scanning capabilities in Poke strings.
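Icon's string scanning keeps a subject string and a current position, and builtins such as upto(), many(), tab() and move() compute or advance that position. The following Python toy only illustrates that model. It is not the `iscan' pickle's API, and its class and method names are hypothetical. Positions are 0-based, and upto() returns only the first match instead of generating every match as Icon does.

```python
from typing import Optional


class Scanner:
    """Toy Icon-style scanning context: a subject string and a cursor.

    As in Icon, upto() and many() only compute positions; tab() and
    move() are what actually advance the cursor.
    """

    def __init__(self, subject: str):
        self.subject = subject
        self.pos = 0

    def upto(self, chars: str) -> Optional[int]:
        """First position at or after the cursor holding one of `chars`."""
        for i in range(self.pos, len(self.subject)):
            if self.subject[i] in chars:
                return i
        return None

    def many(self, chars: str) -> Optional[int]:
        """Position just past the longest non-empty run of `chars`."""
        end = self.pos
        while end < len(self.subject) and self.subject[end] in chars:
            end += 1
        return end if end > self.pos else None

    def tab(self, i: Optional[int]) -> str:
        """Move the cursor to position i, returning the text in between."""
        if i is None or not 0 <= i <= len(self.subject):
            raise ValueError("scan failed")  # Icon would simply fail here
        piece = self.subject[min(self.pos, i):max(self.pos, i)]
        self.pos = i
        return piece

    def move(self, n: int) -> str:
        """Advance the cursor by n characters, returning them."""
        return self.tab(self.pos + n)


s = Scanner("width=42;")
name = s.tab(s.upto("="))             # "width"
s.move(1)                             # skip over "="
value = s.tab(s.many("0123456789"))   # "42"
```

Separating "find a position" from "move to it" is what lets scanning expressions be combined freely, which is the main attraction of the Icon model.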


  • A new pickle `gpt' has been added for poking at GUID partition tables.


  • A new pickle `jojodiff' has been added to generate and apply JojoDiff binary patches.  An accompanying pk-jojopatch utility is also provided.


  • A new pickle `linux' has been added to poke, that provides internal data structures used by the Linux kernel.




  • The sframe pickle has been updated to reflect AArch64 PAuth information.


  • The PE pickle now supports BASE64 encoded names, which is a Microsoft extension.


  • The BTF pickle now performs more data integrity checks, and also now supports BTF_KIND_ENUM64 entries.


  • All the pickles distributed with GNU poke have been modified so they don't use standard types like `int' or `long' anymore.  This is to make it possible to use them in non-poke applications integrating with libpoke, like GDB.


Build system updates


  • poke, libpoke and pokefmt now build and run natively on Windows.


  • Different components in the source tree (poked, pokefmt) can now be disabled using the --disable-poked and --disable-pokefmt configure options.


  • A file poke.m4 is now installed, that provides the macros PK_PROG_POKE and PK_CHECK_PICKLE.  These macros are to be used by projects and packages that install GNU poke pickles.  The first macro checks for a particular version of poke, whereas the second checks for the availability of some particular pickle.


Documentation updates


  • The manual has been fixed to refer to `gettime' instead of `get_time'.  This function changed name in 3.0.


  • The GNU poke manual in `info' format is now installed under its own directory category (GNU poke) rather than under Editors.  This is because other poke related projects like poke-elf and poke-dwarf also install manuals under this new directory category.


---

Jose E. Marchesi
Frankfurt am Main
30 March 2024


Categories: FLOSS Project Planets

Parabola GNU/Linux-libre: [arch-announce] The xz package has been backdoored

Fri, 2024-03-29 15:32

From: "Arch Linux: Recent news updates: David Runge" arch-announce@lists.archlinux.org

TL;DR: Upgrade your systems and container images now!

As many of you may have already read [1], the upstream release tarballs for xz versions 5.6.0 and 5.6.1 contain malicious code which adds a backdoor.

This vulnerability is tracked in the Arch Linux security tracker [2].

The xz packages prior to version 5.6.1-2 (specifically 5.6.0-1 and 5.6.1-1) contain this backdoor.

We strongly advise against using affected release artifacts and instead recommend downloading the latest available version!

Upgrading the system

It is strongly advised to do a full system upgrade right away if your system currently has xz version 5.6.0-1 or 5.6.1-1 installed:

pacman -Syu

Regarding sshd authentication bypass/code execution

From the upstream report [1]:

> openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma.

Arch does not directly link openssh to liblzma, and thus this attack vector is not possible. You can confirm this by issuing the following command:

ldd "$(command -v sshd)"

However, out of an abundance of caution, we advise users to remove the malicious code from their system by upgrading either way. This is because other yet-to-be discovered methods to exploit the backdoor could exist.
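For readers who prefer a scripted version of the ldd check, here is a minimal Python sketch that wraps the same invocation. The helper names are hypothetical, and it assumes ldd and sshd are installed, with sshd on PATH (it often lives in /usr/sbin). Since ldd lists indirect dependencies as well, an sshd that pulls in liblzma through libsystemd would also be flagged. A negative result says nothing about undiscovered attack paths, so upgrading remains the advice.

```python
import shutil
import subprocess


def mentions_liblzma(ldd_output):
    """True if any library line printed by ldd refers to liblzma."""
    return any("liblzma" in line for line in ldd_output.splitlines())


def sshd_loads_liblzma():
    """Run ldd on the sshd found in PATH and inspect its library list."""
    sshd = shutil.which("sshd")
    if sshd is None:
        raise FileNotFoundError("sshd not found in PATH")
    result = subprocess.run(["ldd", sshd], capture_output=True,
                            text=True, check=True)
    return mentions_liblzma(result.stdout)
```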

URL: https://archlinux.org/news/the-xz-package-has-been-backdoored/

Categories: FLOSS Project Planets

wget @ Savannah: GNU Wget 1.24.5 Released

Fri, 2024-03-29 07:28

Noteworthy changes in release 1.24.5 (2024-03-10) [stable]

  • Fix how subdomain matches are checked for HSTS. Fixes a minor issue where cookies may be leaked to the wrong domain
  • Wget will now also parse the srcset attribute in <source> HTML tags
  • Support reading fetchmail style "user" and "passwd" fields from netrc
  • In some cases, prevent the confusing "Cannot write to... (success)" error messages
  • Support extremely fast download speeds (TB/s). Previously this would cause Wget to crash when printing the speed
  • Improve portability on OpenBSD to run the test suite
  • Ensure that CSS URLs are correctly quoted (Bug: 64082)
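The changelog does not spell out the exact HSTS flaw, so the following is only a general Python sketch of a label-aware subdomain check in the spirit of the includeSubDomains directive from RFC 6797. It is not Wget's code, and the function name is hypothetical. The essential point is to compare on a dot boundary rather than with a bare suffix test.

```python
def hsts_applies(host, known_host, include_subdomains):
    """Decide whether an HSTS entry for `known_host` covers `host`.

    Comparison is case-insensitive and ignores a trailing dot.  A
    subdomain match requires a label boundary: "www.example.com" is a
    subdomain of "example.com", but "notexample.com" is not.
    """
    host = host.lower().rstrip(".")
    known_host = known_host.lower().rstrip(".")
    if host == known_host:
        return True
    return include_subdomains and host.endswith("." + known_host)
```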
Categories: FLOSS Project Planets

coreutils @ Savannah: coreutils-9.5 released [stable]

Thu, 2024-03-28 11:39


This is to announce coreutils-9.5, a stable release.
See the NEWS below for a summary of changes.

There have been 187 commits by 18 people in the 30 weeks since 9.4.
Thanks to everyone who has contributed!
The following people contributed changes to this release:

  Aearil (1)                      Petr Malat (1)
  Bruno Haible (3)                Pádraig Brady (75)
  Christian Göttsche (1)          Samuel Tardieu (1)
  Collin Funk (4)                 Stephane Chazelas (1)
  Daan De Meyer (1)               Stephen Kitt (1)
  Greg Wooledge (1)               Sylvestre Ledru (3)
  Grisha Levit (2)                Ville Skyttä (1)
  Michel Lind (1)                 dann frazier (1)
  Paul Eggert (89)                lvgenggeng (1)

Pádraig [on behalf of the coreutils maintainers]
==================================================================

Here is the GNU coreutils home page:
    https://gnu.org/s/coreutils/

For a summary of changes and contributors, see:
  https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=shortlog;h=v9.5
or run this command from a git-cloned coreutils directory:
  git shortlog v9.4..v9.5

Here are the compressed sources:
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.5.tar.gz   (15MB)
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.5.tar.xz   (5.8MB)

Here are the GPG detached signatures:
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.5.tar.gz.sig
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.5.tar.xz.sig

Use a mirror for higher download bandwidth:
  https://www.gnu.org/order/ftp.html

Here are the SHA1 and SHA256 checksums:

  3285114d93b39e5e4643b0846f570203a5e4c97b  coreutils-9.5.tar.gz
  dnrmoilQ7ELzul98Heed0ngA7o6bhkLaXe21l0oXQeU=  coreutils-9.5.tar.gz
  867fed7ce2ee15c5150a355a5f3a3b50578cf78d  coreutils-9.5.tar.xz
  zTKO3qyS9qZl3p8yPJO3Eq8YWLwuDYjz9xAEaUcKG4o=  coreutils-9.5.tar.xz

Verify the base64 SHA256 checksum with cksum -a sha256 --check
from coreutils-9.2 or OpenBSD's cksum since 2007.
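Where a recent cksum is not at hand, the same base64 SHA256 value can be computed with Python's standard hashlib and base64 modules. This is a minimal sketch, not an official verification tool, and it does not replace the GPG signature check. The helper names are hypothetical.

```python
import base64
import hashlib


def sha256_base64(path, chunk_size=1 << 20):
    """Hash a file with SHA-256 and return the digest base64 encoded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_announcement(path, published):
    # Announcements may print the checksum with or without the final "=".
    return sha256_base64(path).rstrip("=") == published.rstrip("=")


# Example, assuming the tarball was downloaded to the current directory:
# matches_announcement("coreutils-9.5.tar.xz",
#                      "zTKO3qyS9qZl3p8yPJO3Eq8YWLwuDYjz9xAEaUcKG4o=")
```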

Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify coreutils-9.5.tar.gz.sig

The signature should match the fingerprint of the following key:

  pub   rsa4096/0xDF6FD971306037D9 2011-09-23 [SC]
        Key fingerprint = 6C37 DC12 121A 5006 BC1D  B804 DF6F D971 3060 37D9
  uid                   [ultimate] Pádraig Brady <P@draigBrady.com>
  uid                   [ultimate] Pádraig Brady <pixelbeat@gnu.org>

If that command fails because you don't have the required public key,
or that public key has expired, try the following commands to retrieve
or refresh it, and then rerun the 'gpg --verify' command.

  gpg --locate-external-key P@draigBrady.com

  gpg --recv-keys DF6FD971306037D9

  wget -q -O- 'https://savannah.gnu.org/project/release-gpgkeys.php?group=coreutils&download=1' | gpg --import -

As a last resort to find the key, you can try the official GNU
keyring:

  wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
  gpg --keyring gnu-keyring.gpg --verify coreutils-9.5.tar.gz.sig

This release was bootstrapped with the following tools:
  Autoconf 2.72c.32-cb6fb
  Automake 1.16.5
  Gnulib v0.1-7293-g259829e78b
  Bison 3.8.2

NEWS

* Noteworthy changes in release 9.5 (2024-03-28) [stable]

** Bug fixes

  chmod -R now avoids a race where an attacker may replace a traversed file
  with a symlink, causing chmod to operate on an unintended file.
  [This bug was present in "the beginning".]

  cp, mv, and install no longer issue spurious diagnostics like "failed
  to preserve ownership" when copying to GNU/Linux CIFS file systems.
  They do this by working around some Linux CIFS bugs.

  cp --no-preserve=mode will correctly maintain set-group-ID bits
  for created directories.  Previously on systems that didn't support ACLs,
  cp would have reset the set-group-ID bit on created directories.
  [bug introduced in coreutils-8.20]

  join and uniq now support multi-byte characters better.
  For example, 'join -tX' now works even if X is a multi-byte character,
  and both programs now treat multi-byte characters like U+3000
  IDEOGRAPHIC SPACE as blanks if the current locale treats them so.

  numfmt options like --suffix no longer have an arbitrary 127-byte limit.
  [bug introduced with numfmt in coreutils-8.21]

  mktemp with --suffix now better diagnoses templates with too few X's.
  Previously it conflated the insignificant --suffix in the error.
  [bug introduced in coreutils-8.1]

  sort again handles thousands grouping characters in single-byte locales
  where the grouping character is greater than CHAR_MAX, for example on
  signed character platforms with a 0xA0 (aka &nbsp;) grouping character.
  [bug introduced in coreutils-9.1]

  split --line-bytes with a mixture of very long and short lines
  no longer overwrites the heap (CVE-2024-0684).
  [bug introduced in coreutils-9.2]

  tail no longer mishandles input from files in /proc and /sys file systems,
  on systems with a page size larger than the stdio BUFSIZ.
  [This bug was present in "the beginning".]

  timeout avoids a narrow race condition, where it might kill arbitrary
  processes after a failed process fork.
  [bug introduced with timeout in coreutils-7.0]

  timeout avoids a narrow race condition, where it might fail to
  kill monitored processes immediately after forking them.
  [bug introduced with timeout in coreutils-7.0]

  wc no longer fails to count unprintable characters as parts of words.
  [bug introduced in textutils-2.1]

** Changes in behavior

  base32 and base64 no longer require padding when decoding.
  Previously an error was given for non padded encoded data.

  base32 and base64 have improved detection of corrupted encodings.
  Previously encodings with non zero padding bits were accepted.
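Both behaviors can be illustrated with Python's standard base64 module. This is a minimal sketch, not how coreutils implements them, and the helper name is hypothetical. Missing "=" padding is restored before decoding, and input whose unused padding bits are non-zero (such as "QR==", which a lenient decoder turns into the same byte as "QQ==") is rejected by checking that the decoded bytes re-encode to exactly the padded input.

```python
import base64
import binascii


def decode_base64(text):
    """Decode single-line base64 that may omit "=" but must be canonical."""
    text = text.strip()
    padded = text + "=" * (-len(text) % 4)   # restore missing padding
    data = base64.b64decode(padded, validate=True)
    # Non-zero padding bits decode "successfully" but re-encode
    # differently, so treat them as corruption.
    if base64.b64encode(data).decode("ascii") != padded:
        raise binascii.Error(f"corrupted base64 input: {text!r}")
    return data


print(decode_base64("QQ"))   # b'A'
```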

  basenc --base16 -d now supports lower case hexadecimal characters.
  Previously an error was given for lower case hex digits.

  cp --no-clobber, and mv -n no longer exit with failure status if
  existing files are encountered in the destination.  Instead they revert
  to the behavior from before v9.2, silently skipping existing files.

  ls --dired now implies long format output without hyperlinks enabled,
  and will take precedence over previously specified formats or hyperlink mode.

  numfmt will accept lowercase 'k' to indicate Kilo or Kibi units on input,
  and uses lowercase 'k' when outputting such units in '--to=si' mode.

  pinky no longer tries to canonicalize the user's login location by default,
  rather requiring the new --lookup option to enable this often slow feature.

  wc no longer ignores encoding errors when counting words.
  Instead, it treats them as non white space.
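The word-counting rule described in these entries can be sketched in Python as a simplified, byte-oriented model. It assumes the C locale's six white-space characters and ignores multi-byte white space. A word is a maximal run of bytes that are not white space, so control characters and invalid bytes such as 0xFF are counted as word characters.

```python
ASCII_SPACE = b" \t\n\v\f\r"   # the C locale's white-space characters


def count_words(data):
    """Count maximal runs of non-white-space bytes in `data`."""
    words = 0
    in_word = False
    for byte in data:
        if byte in ASCII_SPACE:
            in_word = False
        elif not in_word:
            words += 1   # first byte of a new word
            in_word = True
    return words


print(count_words(b"a \x01b c\xff"))   # 3
```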

** New features

  chgrp now accepts the --from=OWNER:GROUP option to restrict changes to files
  with matching current OWNER and/or GROUP, as already supported by chown(1).

  chmod adds support for -h, -H, -L, -P, and --dereference options, providing
  more control over symlink handling.  This supports more secure handling of
  CLI arguments, and is more consistent with chown, and chmod on other systems.

  cp now accepts the --keep-directory-symlink option (like tar), to preserve
  and follow existing symlinks to directories in the destination.

  cp and mv now accept the --update=none-fail option, which is similar
  to the --no-clobber option, except that existing files are diagnosed,
  and the command exits with failure status if existing files are
  encountered.  The -n,--no-clobber option is best avoided due to
  platform differences.

  env now accepts the -a,--argv0 option to override the zeroth argument
  of the command being executed.

  mv now accepts an --exchange option, which causes the source and
  destination to be exchanged.  It should be combined with
  --no-target-directory (-T) if the destination is a directory.
  The exchange is atomic if source and destination are on a single
  file system that supports atomic exchange; --exchange is not yet
  supported in other situations.

  od now supports printing IEEE half precision floating point with -t fH,
  or brain 16 bit floating point with -t fB, where supported by the compiler.
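Both formats are 16 bits wide but split those bits differently. A Python sketch using the standard struct module shows how each is decoded. It assumes little-endian input, as on x86; od itself reads in the host's byte order unless --endian is given.

```python
import struct


def decode_half(raw):
    """IEEE 754 binary16 value (od -t fH) from 2 little-endian bytes."""
    return struct.unpack("<e", raw)[0]


def decode_bfloat16(raw):
    """bfloat16 value (od -t fB) from 2 little-endian bytes.

    bfloat16 keeps the upper 16 bits of an IEEE 754 binary32 value, so
    shifting those bits left by 16 recreates the full float32 pattern.
    """
    (bits,) = struct.unpack("<H", raw)
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


print(decode_half(b"\x00\x3c"))       # 1.0
print(decode_bfloat16(b"\x49\x40"))   # 3.140625
```

bfloat16 keeps float32's 8-bit exponent and gives up mantissa precision, which is why 0x4049 only approximates pi as 3.140625.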

  tail now supports following multiple processes, with repeated --pid options.

** Improvements

  cp,mv,install,cat,split now read and write a minimum of 256KiB at a time.
  This was previously 128KiB and increasing to 256KiB was seen to increase
  throughput by 10-20% when reading cached files on modern systems.
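The buffer size determines how much data each iteration of a block-copy loop moves. The Python sketch below, with a hypothetical copy_stream helper, shows where such a size fits into a read/write loop. It does not reproduce the throughput measurements quoted above.

```python
BUFFER_SIZE = 256 * 1024   # 256 KiB, the new minimum I/O size


def copy_stream(src, dst, bufsize=BUFFER_SIZE):
    """Copy binary file objects in bufsize chunks; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:   # empty read means end of input
            return total
        dst.write(chunk)
        total += len(chunk)
```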

  env,kill,timeout now support unnamed signals. kill(1) for example now
  supports sending such signals, and env(1) will list them appropriately.

  SELinux operations in file copy operations are now more efficient,
  avoiding unneeded MCS/MLS label translation.

  sort no longer dynamically links to libcrypto unless -R is used.
  This decreases startup overhead in the typical case.

  wc is now much faster in single-byte locales and somewhat faster in
  multi-byte locales.


Categories: FLOSS Project Planets

FSF News: Alyssa Rosenzweig, who spearheaded the reverse-engineering of Apple's GPU, to keynote LibrePlanet

Wed, 2024-03-27 12:50
BOSTON, Massachusetts, USA -- March 27, 2024 -- The Free Software Foundation (FSF) today announced Alyssa Rosenzweig, who reverse-engineered Apple's current line of graphics processing units (GPU), as keynote speaker for LibrePlanet 2024. LibrePlanet 2024: Cultivating Community is the sixteenth edition of the FSF's conference on ethical technology and user freedom and will be held on May 4 and 5 at the Wentworth Institute of Technology in Boston, MA, as well as online.
Categories: FLOSS Project Planets

GNUnet News: libgnunetchat 0.3.1

Fri, 2024-03-22 19:00
libgnunetchat 0.3.1 released

This is mostly a bugfix release for libgnunetchat 0.3.0 to reduce build issues.

Download links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional shortly after the release. For direct access try http://ftp.gnu.org/gnu/gnunet/

Categories: FLOSS Project Planets

pspp @ Savannah: PSPP 2.0.1 has been released

Thu, 2024-03-21 19:42

I'm very pleased to announce the release of a new version of GNU PSPP.  PSPP is a program for statistical analysis of sampled data.  It is a free replacement for the proprietary program SPSS.

Changes from 2.0.0 to 2.0.1:

  • Bug fixes.
  • Translation updates.

Please send PSPP bug reports to bug-gnu-pspp@gnu.org.

Categories: FLOSS Project Planets

GNUnet News: GNUnet 0.21.1

Thu, 2024-03-14 19:00
GNUnet 0.21.1

This is a bugfix release for gnunet 0.21.0. It primarily addresses some connectivity issues introduced with our new transport subsystem.

Links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional shortly after the release. For direct access try https://ftp.gnu.org/gnu/gnunet/

Categories: FLOSS Project Planets

a2ps @ Savannah: a2ps 4.15.6 released [stable]

Wed, 2024-03-13 14:24


I am delighted to announce version 4.15.6 of GNU a2ps, the Anything to
PostScript converter.

This release fixes a couple of bugs, in particular with printing (the -P
flag). See below for details.


Here are the compressed sources and a GPG detached signature:
  https://ftpmirror.gnu.org/a2ps/a2ps-4.15.6.tar.gz
  https://ftpmirror.gnu.org/a2ps/a2ps-4.15.6.tar.gz.sig

Use a mirror for higher download bandwidth:
  https://www.gnu.org/order/ftp.html

Here are the SHA1 and SHA256 checksums:

e20e8009d8812c8d960884b79aab95f235c725c0  a2ps-4.15.6.tar.gz
h/+dgByxGWkYHVuM+LZeZeWyS7DHahuCXoCY8pBvvfQ  a2ps-4.15.6.tar.gz

The SHA256 checksum is base64 encoded, instead of the
hexadecimal encoding that most checksum tools default to.
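Tools such as sha256sum print hexadecimal, so a published base64 value can be converted before comparing. Here is a minimal Python sketch using only the standard base64 module. Note that the a2ps value above is printed without the trailing "=" padding, which the helper restores.

```python
import base64


def base64_to_hex(checksum):
    """Turn a base64 checksum (padding optional) into lowercase hex."""
    padded = checksum + "=" * (-len(checksum) % 4)
    return base64.b64decode(padded).hex()


# Compare the result against the output of: sha256sum a2ps-4.15.6.tar.gz
print(base64_to_hex("h/+dgByxGWkYHVuM+LZeZeWyS7DHahuCXoCY8pBvvfQ"))
```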

Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify a2ps-4.15.6.tar.gz.sig

The signature should match the fingerprint of the following key:

  pub   rsa2048 2013-12-11 [SC]
        2409 3F01 6FFE 8602 EF44  9BB8 4C8E F3DA 3FD3 7230
  uid   Reuben Thomas <rrt@sc3d.org>
  uid   keybase.io/rrt <rrt@keybase.io>

If that command fails because you don't have the required public key,
or that public key has expired, try the following commands to retrieve
or refresh it, and then rerun the 'gpg --verify' command.

  gpg --locate-external-key rrt@sc3d.org

  gpg --recv-keys 4C8EF3DA3FD37230

  wget -q -O- 'https://savannah.gnu.org/project/release-gpgkeys.php?group=a2ps&download=1' | gpg --import -

As a last resort to find the key, you can try the official GNU
keyring:

  wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
  gpg --keyring gnu-keyring.gpg --verify a2ps-4.15.6.tar.gz.sig


This release was bootstrapped with the following tools:
  Autoconf 2.71
  Automake 1.16.5
  Gnulib v0.1-7186-g5aa8eafc0e

NEWS

* Noteworthy changes in release 4.15.6 (2024-03-13) [stable]
 * Bug fixes:
   - Fix a2ps-lpr-wrapper to work with no arguments, as a2ps requires.
   - Minor fixes & improvements to sheets.map for image types and PDF.
 * Build system:
   - Minor fixes and improvements.


Categories: FLOSS Project Planets

GNU Guix: Adventures on the quest for long-term reproducible deployment

Wed, 2024-03-13 10:05

Rebuilding software five years later, how hard can it be? It can’t be that hard, especially when you pride yourself on having a tool that can travel in time and that does a good job at ensuring reproducible builds, right?

In hindsight, we can tell you: it’s more challenging than it seems. Users attempting to travel 5 years back with guix time-machine are (or were) unavoidably going to hit bumps on the road—a real problem because that’s one of the use cases Guix aims to support well, in particular in a reproducible research context.

In this post, we look at some of the challenges we face while traveling back, how we are overcoming them, and open issues.

The vision

First of all, one clarification: Guix aims to support time travel, but we’re talking about a time scale measured in years, not in decades. We know all too well that this is already very ambitious; it’s something that probably nobody except Nix and Guix is even trying. More importantly, software deployment at the scale of decades calls for very different, more radical techniques; it’s the work of archivists.

Concretely, Guix 1.0.0 was released in 2019 and our goal is to allow users to travel as far back as 1.0.0 and redeploy software from there, as in this example:

$ guix time-machine -q --commit=v1.0.0 -- \
     environment --ad-hoc python2 -- python
> guile: warning: failed to install locale
Python 2.7.15 (default, Jan 1 1970, 00:00:01)
[GCC 5.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

(The command above uses guix environment, the predecessor of guix shell, which didn’t exist back then.) It’s only 5 years ago but it’s pretty much remote history on the scale of software evolution—in this case, that history comprises major changes in Guix itself and in Guile. How well does such a command work? Well, it depends.

The project has two build farms; bordeaux.guix.gnu.org has been keeping substitutes (pre-built binaries) of everything it built since roughly 2021, while ci.guix.gnu.org keeps substitutes for roughly two years, but there is currently no guarantee on the duration substitutes may be retained. Time traveling to a period where substitutes are available is fine: you end up downloading lots of binaries, but that’s OK, you rather quickly have your software environment at hand.

Bumps on the build road

Things get more complicated when targeting a period in time for which substitutes are no longer available, as was the case for v1.0.0 above. (And really, we should assume that substitutes won’t remain available forever: fellow NixOS hackers recently had to seriously consider trimming their 20-year-long history of substitutes because the costs are not sustainable.)

Apart from the long build times, the first problem that arises in the absence of substitutes is source code unavailability. I’ll spare you the details for this post—that problem alone would deserve a book. Suffice to say that we’re lucky that we started working on integrating Guix with Software Heritage years ago, and that there has been great progress over the last couple of years to get closer to full package source code archival (more precisely: 94% of the source code of packages available in Guix in January 2024 is archived, versus 72% of the packages available in May 2019).

So what happens when you run the time-machine command above? It brings you to May 2019, a time for which none of the official build farms had substitutes until a few days ago. Ideally, thanks to isolated build environments, you’d build things for hours or days, and in the end all those binaries would be there just as they were 5 years ago. In practice though, there are several problems that isolation as currently implemented does not address.

Among those, the most frequent problem is time traps: software build processes that fail after a certain date (these are also referred to as “time bombs” but we’ve had enough of these and would rather call for a ceasefire). This plagues a handful of packages out of almost 30,000 but unfortunately we’re talking about packages deep in the dependency graph. Here are some examples:

  • OpenSSL unit tests fail after a certain date because some of the X.509 certificates they use have expired.
  • GnuTLS had similar issues; newer versions rely on datefudge to fake the date while running the tests and thus avoid that problem altogether.
  • Python 2.7, found in Guix 1.0.0, also had that problem with its TLS-related tests.
  • OpenJDK would fail to build at some point with this interesting message: Error: time is more than 10 years from present: 1388527200000 (the build system would consider that its data about currencies is likely outdated after 10 years).
  • Libgit2, a dependency of Guix, had (has?) time-dependent tests.
  • MariaDB tests started failing in 2019.
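The OpenJDK timestamp is easier to read once converted: it counts milliseconds since the Unix epoch. The Python sketch below decodes it and then models a time trap. The freshness check is a hypothetical reconstruction of this kind of check, not OpenJDK's code, and it approximates a year as 365 days. Its only purpose is to show how the outcome flips with the build date, and why passing the date in explicitly, rather than reading the system clock, keeps such a check reproducible.

```python
from datetime import datetime, timedelta, timezone

# The OpenJDK error prints milliseconds since the Unix epoch.
stamp = datetime.fromtimestamp(1388527200000 / 1000, tz=timezone.utc)
print(stamp.isoformat())   # 2013-12-31T22:00:00+00:00


def data_still_fresh(data_time, now, years=10):
    """Hypothetical time trap: data older than ~`years` is "outdated"."""
    return now - data_time <= timedelta(days=365 * years)


print(data_still_fresh(stamp, datetime(2019, 5, 1, tzinfo=timezone.utc)))   # True
print(data_still_fresh(stamp, datetime(2024, 3, 13, tzinfo=timezone.utc)))  # False
```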

Someone traveling to v1.0.0 will hit several of these, preventing guix time-machine from completing. A serious bummer, especially to those who’ve come to Guix from the perspective of making their research workflow reproducible.

Time traps are the main road block, but there’s more! In rare cases, there’s software influenced by kernel details not controlled by the build daemon:

In a handful of cases, but important ones, builds might fail when performed on certain CPUs. We’re aware of at least two cases:

Neither time traps nor those obscure hardware-related issues can be avoided with the isolation mechanism currently used by the build daemon. This harms time traveling when substitutes are unavailable. Giving up is not in the ethos of this project though.

Where to go from here?

There are really two open questions here:

  1. How can we tell which packages need to be “fixed”, and how: building at a specific date, on a specific CPU?
  2. How can we keep those aspects of the build environment (time, CPU variant) under control?

Let’s start with #2. Before looking for a solution, it’s worth remembering where we come from. The build daemon runs build processes with a separate root file system, under dedicated user IDs, and in separate Linux namespaces, thereby minimizing interference with the rest of the system and ensuring a well-defined build environment. This technique was implemented by Eelco Dolstra for Nix in 2007 (with namespace support added in 2012), at a time when the word container had to do with boats and before “Docker” became the name of a software tool. In short, the approach consists in controlling the build environment in every detail (it’s at odds with the strategy that consists in achieving reproducible builds in spite of high build environment variability). That these are mere processes with a bunch of bind mounts makes this approach inexpensive and appealing.

Realizing we’d also want to control the build environment’s date, we naturally turn to Linux namespaces to address that—Dolstra, Löh, and Pierron already suggested something along these lines in the conclusion of their 2010 Journal of Functional Programming paper. Turns out there is now a time namespace. Unfortunately it’s limited to CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks; the manual page states:

Note that time namespaces do not virtualize the CLOCK_REALTIME clock. Virtualization of this clock was avoided for reasons of complexity and overhead within the kernel.

I hear you say: What about datefudge and libfaketime? These rely on the LD_PRELOAD environment variable to trick the dynamic linker into pre-loading a library that provides symbols such as gettimeofday and clock_gettime. This is a fine approach in some cases, but it’s too fragile and too intrusive when targeting arbitrary build processes.

That leaves us with essentially one viable option: virtual machines (VMs). Full-system QEMU (qemu-system-x86_64 and friends) lets you specify the initial real-time clock of the VM with the -rtc flag, which is exactly what we need (“user-land” QEMU such as qemu-x86_64 does not support it). And of course, it lets you specify the CPU model to emulate.
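To make the QEMU side concrete, here is a minimal Python sketch that assembles a qemu-system-x86_64 command line pinning the CPU model, the core count, the memory size, and the initial clock. The helper name qemu_time_travel_args is hypothetical, and this is not how the Guix service actually launches its VM; a usable invocation would also need a disk image, networking, and more. The values mirror the default build VM described in this post (Skylake-Client, 4 cores, 2048 MiB, January 2020).

```python
from datetime import datetime


def qemu_time_travel_args(cpu: str, cores: int, memory_mib: int,
                          initial_date: datetime) -> list[str]:
    """Return a (hypothetical) qemu-system-x86_64 argument list for a
    build VM with a pinned CPU model and a pinned initial wall clock.

    Only the standard -cpu, -smp, -m and -rtc options are shown; a usable
    VM would also need a disk image, networking, and so on.
    """
    return [
        "qemu-system-x86_64",
        "-cpu", cpu,               # CPU model to emulate
        "-smp", str(cores),        # number of virtual CPU cores
        "-m", str(memory_mib),     # guest memory size, in MiB
        # Initial real-time clock, in the date format QEMU accepts
        # for -rtc base=... (e.g. 2006-06-17T16:01:21).
        "-rtc", "base=" + initial_date.strftime("%Y-%m-%dT%H:%M:%S"),
    ]


# Values mirroring the default build VM described in this post.
args = qemu_time_travel_args("Skylake-Client", 4, 2048, datetime(2020, 1, 1))
print(" ".join(args))
# qemu-system-x86_64 -cpu Skylake-Client -smp 4 -m 2048 -rtc base=2020-01-01T00:00:00
```

Because -rtc sets the guest’s hardware clock, software inside the VM that reads the wall clock (date, certificate checks, file timestamps) starts out at the chosen date, assuming the guest does not resynchronize its clock over the network. That is precisely what a time namespace cannot provide.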

News from the past

Now, the question is: where does the VM fit? The author considered writing a package transformation that would change a package such that it’s built in a well-defined VM. However, that wouldn’t really help: this option didn’t exist in past revisions, and it would lead to a different build anyway from the perspective of the daemon—a different derivation.

The best strategy appeared to be offloading: the build daemon can offload builds to different machines over SSH; we just need to let it send builds to a suitably configured VM. To do that, we can reuse some of the machinery initially developed for childhurds that takes care of setting up offloading to the VM: creating substitute signing keys and SSH keys, exchanging secret key material between the host and the guest, and so on.

The end result is a service for Guix System users that can be configured in a few lines:

(use-modules (gnu services virtualization))

(operating-system
  ;; …
  (services (append (list (service virtual-build-machine-service-type))
                    %base-services)))

The default setting above provides a 4-core VM whose initial date is January 2020, emulating a Skylake CPU from that time—the right setup for someone willing to reproduce old binaries. You can check the configuration like this:

$ sudo herd configuration build-vm
CPU: Skylake-Client
number of CPU cores: 4
memory size: 2048 MiB
initial date: Wed Jan 01 00:00:00Z 2020

To enable offloading to that VM, one has to explicitly start it, like so:

$ sudo herd start build-vm

From there on, every native build is offloaded to the VM. The key part is that with almost no configuration, you get everything set up to build packages “in the past”. It’s a Guix System-only solution; if you run Guix on another distro, you can set up a similar build VM, but you’ll have to go through the cumbersome process that is taken care of automatically here.

Of course it’s possible to choose different configuration parameters:

(service virtual-build-machine-service-type
         (virtual-build-machine
          (date (make-date 0 0 00 00 01 10 2017 0)) ;further back in time
          (cpu "Westmere")
          (cpu-count 16)
          (memory-size (* 8 1024))
          (auto-start? #t)))
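The make-date call in that configuration is easy to misread. Assuming it is SRFI-19’s make-date (as provided by Guile’s (srfi srfi-19) module), the arguments run from the smallest unit to the largest: nanosecond, second, minute, hour, day, month, year, then the zone offset in seconds east of UTC. So 01 10 2017 means 1 October 2017, not 10 January. The hypothetical Python helper below only mirrors that argument order to make the mapping explicit; it is not part of Guix.

```python
from datetime import datetime, timedelta, timezone


def srfi19_make_date(nanosecond, second, minute, hour,
                     day, month, year, zone_offset):
    """Hypothetical Python mirror of SRFI-19's make-date argument order.

    Fields go from the smallest unit to the largest, followed by the
    zone offset in seconds east of UTC.
    """
    tz = timezone(timedelta(seconds=zone_offset))
    return datetime(year, month, day, hour, minute, second,
                    nanosecond // 1000,  # datetime only has microseconds
                    tzinfo=tz)


# Same arguments as (make-date 0 0 00 00 01 10 2017 0) above.
d = srfi19_make_date(0, 0, 0, 0, 1, 10, 2017, 0)
print(d.isoformat())  # 2017-10-01T00:00:00+00:00
```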

With a build VM whose date is set to January 2020, we have been able to rebuild Guix and its dependencies along with a bunch of packages such as emacs-minimal from v1.0.0, overcoming all the time traps and other challenges described earlier. As a side effect, substitutes are now available from ci.guix.gnu.org so you can even try this at home without having to rebuild the world:

$ guix time-machine -q --commit=v1.0.0 -- build emacs-minimal --dry-run
guile: warning: failed to install locale
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
38.5 MB would be downloaded:
   /gnu/store/53dnj0gmy5qxa4cbqpzq0fl2gcg55jpk-emacs-minimal-26.2

For the fun of it, we went as far as v0.16.0, released in December 2018:

guix time-machine -q --commit=v0.16.0 -- \
  environment --ad-hoc vim -- vim --version

This is the furthest we can go since channels and the underlying mechanisms that make time travel possible did not exist before that date.

There’s one “interesting” case we stumbled upon in that process: in OpenSSL 1.1.1g (released April 2020 and packaged in December 2020), some of the test certificates are not valid before April 2020, so the build VM needs to have its clock set to May 2020 or thereabouts. Booting the build VM with a different date can be done without reconfiguring the system:

$ sudo herd stop build-vm
$ sudo herd start build-vm -- -rtc base=2020-05-01T00:00:00

The -rtc … flags are passed straight to QEMU, which is handy when exploring workarounds…
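The OpenSSL case is a classic time trap: a test that validates a certificate succeeds only if the build clock falls inside the certificate’s validity window. The minimal Python sketch below models that check. The validity dates are hypothetical, chosen only to resemble the “not valid before April 2020” situation rather than read from the actual OpenSSL test certificates, and cert_valid_at is an illustrative name. It shows why a January 2020 VM fails, why moving the clock to May 2020 helps, and why building much later on an unpinned host clock can fail too, once the certificate has expired.

```python
from datetime import datetime, timezone


def cert_valid_at(not_before: datetime, not_after: datetime,
                  now: datetime) -> bool:
    """X.509-style validity check: a certificate is accepted only when
    `now` falls within its [notBefore, notAfter] window."""
    return not_before <= now <= not_after


# Hypothetical validity window, NOT read from the actual OpenSSL 1.1.1g
# test certificates: valid from mid-April 2020 for ten years.
NOT_BEFORE = datetime(2020, 4, 15, tzinfo=timezone.utc)
NOT_AFTER = datetime(2030, 4, 15, tzinfo=timezone.utc)

candidate_clocks = {
    "default build VM (January 2020)": datetime(2020, 1, 1, tzinfo=timezone.utc),
    "-rtc base=2020-05-01T00:00:00": datetime(2020, 5, 1, tzinfo=timezone.utc),
    "hypothetical host clock in 2035": datetime(2035, 1, 1, tzinfo=timezone.utc),
}

for label, clock in candidate_clocks.items():
    ok = cert_valid_at(NOT_BEFORE, NOT_AFTER, clock)
    print(f"{label}: {'tests pass' if ok else 'tests fail'}")
```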

The time-travel continuous integration jobset has been set up to check that we can, at any time, travel back to one of the past releases. This at least ensures that Guix itself and its dependencies have substitutes available at ci.guix.gnu.org.

Reproducible research workflows reproduced

Incidentally, this effort to rebuild 5-year-old packages has allowed us to fix embarrassing problems. Software that accompanies research papers that followed our reproducibility guidelines could no longer be deployed, at least not without this clock-twiddling effort.

It’s good news that we can now re-deploy these 5-year-old software environments with minimum hassle; it’s bad news that keeping this promise took extra effort.

The ability to reproduce the environment of software that accompanies research work should not be dismissed as mundane or as an exercise that’s “overkill”. The ability to rerun, inspect, and modify software is the natural extension of the scientific method. Without a companion reproducible software environment, research papers are merely the advertisement of scholarship, to paraphrase Jon Claerbout.

The future

The astute reader surely noticed that we didn’t answer question #1 above:

How can we tell which packages need to be “fixed”, and how: building at a specific date, on a specific CPU?

It’s a fact that Guix so far lacks information about the date, kernel, or CPU model that should be used to build a given package. Derivations purposefully lack that information on the grounds that it cannot be enforced in user land and is rarely necessary, which is true, but “rarely” is not the same as “never”, as we saw. Should we create a catalog of date, CPU, and/or kernel annotations for packages found in past revisions? Should we define, for the long term, an all-encompassing derivation format? If we did and effectively required virtual build machines, what would that mean from a bootstrapping standpoint?

Here’s another option: build packages in VMs running in the year 2100, say, and on a baseline CPU. We don’t need to require all users to set up a virtual build machine—that would be impractical. It may be enough to set up the project build farms so they build everything that way. This would allow us to catch time traps and year 2038 bugs before they bite.
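A back-of-the-envelope illustration of the year 2038 bug mentioned above: a signed 32-bit time_t can count at most 2^31 - 1 seconds past the Unix epoch, and one second later it wraps around to a date in 1901. The Python sketch below emulates that wrap-around with plain integer arithmetic (Python integers never overflow, so to_int32 is an illustrative helper) and checks that a VM clock set to 2100 lies well past the limit, which is what would make such bugs surface at build time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_int32(n: int) -> int:
    """Wrap an integer the way a signed 32-bit time_t would overflow."""
    return (n + 2**31) % 2**32 - 2**31


def as_utc(seconds: int) -> datetime:
    """Interpret a count of seconds since the Unix epoch as a UTC date."""
    return EPOCH + timedelta(seconds=seconds)


print(as_utc(INT32_MAX))                # 2038-01-19 03:14:07+00:00
print(as_utc(to_int32(INT32_MAX + 1)))  # 1901-12-13 20:45:52+00:00

# A build VM whose clock starts in 2100 is far beyond the 32-bit limit.
vm_seconds = int((datetime(2100, 1, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())
print(vm_seconds > INT32_MAX)           # True
```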

Before we can do that, the virtual-build-machine service needs to be optimized. Right now, offloading to build VMs is as heavyweight as offloading to a separate physical build machine: data is transferred back and forth over SSH over TCP/IP. The first step will be to run SSH over a paravirtualized transport such as AF_VSOCK sockets instead. Another avenue would be to make /gnu/store in the guest VM an overlay over the host store so that inputs do not need to be transferred and copied.

Until then, happy software (re)deployment!

Acknowledgments

Thanks to Simon Tournier for insightful comments on a previous version of this post.
