GNU Planet!

Planet GNU - https://planet.gnu.org/

GNU Taler news: NGI Taler project launched

Sun, 2024-01-14 18:00
We are excited to announce the creation of an EU-funded consortium with the central objective to launch GNU Taler as a privacy-preserving payment system across Europe. You can find more information on the consortium page.
Categories: FLOSS Project Planets

cpio @ Savannah: GNU cpio version 2.15

Sun, 2024-01-14 07:21

GNU cpio version 2.15 is available for download. This is a bug-fixing release.  Short summary of changes:

  • Fix the operation of --no-absolute-filenames --make-directories.
  • Restore access and modification times of symlinks in copy-in and copy-pass modes.
Categories: FLOSS Project Planets

GNU Guix: Building packages targeting psABIs

Sun, 2024-01-14 07:00

Starting with version 2.33, the GNU C library (glibc) grew the capability to search for shared libraries using additional paths, based on the hardware capabilities of the machine running the code. This was a great boon for x86_64, which was first released in 2003, and has seen many changes in the capabilities of the hardware since then. While it is extremely common for Linux distributions to compile for a baseline which encompasses all of an architecture, there is performance being left on the table by targeting such an old specification and not one of the newer revisions.

One option, used internally in glibc and in some other performance-critical libraries, is indirect functions, or IFUNCs. The loader, ld.so, uses them to pick function implementations optimized for the available CPU at load time. GCC's functional multi-versioning (FMV) (https://gcc.gnu.org/wiki/FunctionMultiVersioning) generates several optimized versions of functions, using the IFUNC mechanism so the appropriate one is selected at load time. These are strategies which most performance-sensitive libraries employ, but not all of them do.
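To make FMV concrete, here is a minimal sketch (an illustration of the GCC feature, not an excerpt from any of the libraries mentioned): with the target_clones attribute, GCC emits one clone of the function per listed target plus a default, together with an IFUNC resolver that picks among them at load time.

#include <stdio.h>

/* GCC generates an AVX2 clone, an SSE4.2 clone, and a default
   fallback, plus an IFUNC resolver that selects one at load time. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
double dot(const double *a, const double *b, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];   /* the AVX2 clone can vectorize this loop */
  return sum;
}

int main(void) {
  double a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1};
  printf("%g\n", dot(a, b, 4));  /* prints 20 */
  return 0;
}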

With the --tune package transformation option, Guix implements so-called package multi-versioning, which creates package variants using compiler flags set to use optimizations targeted at a specific CPU.

Finally - and we're getting to the central topic of this post! - since version 2.33 glibc supports another approach: ld.so searches not just the /lib folder, but also the glibc-hwcaps folders, which for x86_64 include /lib/glibc-hwcaps/x86-64-v2, /lib/glibc-hwcaps/x86-64-v3 and /lib/glibc-hwcaps/x86-64-v4, corresponding to the psABI micro-architectures of the x86_64 architecture. This means that a library compiled against the baseline of the architecture should be installed in /lib, but if it is compiled a second time, this time using (depending on the build instructions) -march=x86-64-v2, then that build can be installed in /lib/glibc-hwcaps/x86-64-v2, and glibc, via ld.so, will choose the correct library at runtime.

These micro-architectures aren't a perfect match for the different hardware available; it is often the case that a particular CPU satisfies the requirements of one tier plus part of the next, but it can then only use the optimizations provided by the first tier, not the added features that the CPU also supports.

This of course shouldn't be a problem in Guix; it's possible, and even encouraged, to adjust packages to be more useful for one's needs. The problem comes from the search paths: ld.so will only search the glibc-hwcaps directory if it has already found the base library in the preceding /lib directory. This isn't a problem for distributions following the Filesystem Hierarchy Standard (FHS), but for Guix we need to ensure that all the different versions of the library end up in the same output.

With a little bit of planning this turns out to be not as hard as it sounds. Let's take, for example, the GNU Scientific Library (gsl), a math library which helps with all sorts of numerical analysis. First we create a procedure to generate our three additional packages, corresponding to the psABIs that are searched for in the glibc-hwcaps directory.

(define (gsl-hwabi psabi)
  (package/inherit gsl
    (name (string-append "gsl-" psabi))
    (arguments
     (substitute-keyword-arguments (package-arguments gsl)
       ((#:make-flags flags #~'())
        #~(append (list (string-append "CFLAGS=-march=" #$psabi)
                        (string-append "CXXFLAGS=-march=" #$psabi))
                  #$flags))
       ((#:configure-flags flags #~'())
        #~(append (list (string-append "--libdir=" #$output
                                       "/lib/glibc-hwcaps/" #$psabi))
                  #$flags))
       ;; The building machine can't necessarily run the code produced.
       ((#:tests? _ #t) #f)
       ((#:phases phases #~%standard-phases)
        #~(modify-phases #$phases
            (add-after 'install 'remove-extra-files
              (lambda _
                (for-each (lambda (dir)
                            (delete-file-recursively
                             (string-append #$output dir)))
                          (list (string-append "/lib/glibc-hwcaps/"
                                               #$psabi "/pkgconfig")
                                "/bin" "/include" "/share"))))))))
    (supported-systems '("x86_64-linux" "powerpc64le-linux"))
    (properties `((hidden? . #t)
                  (tunable? . #f)))))

We remove some directories and any binaries since we only want the libraries produced from the package; we want to use the headers and any other bits from the main package. We then combine all of the pieces together to produce a package which can take advantage of the hardware on which it is run:

(define-public gsl-hwcaps
  (package/inherit gsl
    (name "gsl-hwcaps")
    (arguments
     (substitute-keyword-arguments (package-arguments gsl)
       ((#:phases phases #~%standard-phases)
        #~(modify-phases #$phases
            (add-after 'install 'install-optimized-libraries
              (lambda* (#:key inputs outputs #:allow-other-keys)
                (let ((hwcaps "/lib/glibc-hwcaps/"))
                  (for-each
                   (lambda (psabi)
                     (copy-recursively
                      (string-append (assoc-ref inputs
                                                (string-append "gsl-" psabi))
                                     hwcaps psabi)
                      (string-append #$output hwcaps psabi)))
                   '("x86-64-v2" "x86-64-v3" "x86-64-v4")))))))))
    (native-inputs
     (modify-inputs (package-native-inputs gsl)
       (append (gsl-hwabi "x86-64-v2")
               (gsl-hwabi "x86-64-v3")
               (gsl-hwabi "x86-64-v4"))))
    (supported-systems '("x86_64-linux"))
    (properties `((tunable? . #f)))))

In this case the size of the final package is increased by about 13 MiB, from 5.5 MiB to 18 MiB. It is up to you if the speed-up from providing an optimized library is worth the size trade-off.

To use this package as a replacement build input in another package, package-input-rewriting/spec is a handy tool:

(define use-glibc-hwcaps
  (package-input-rewriting/spec
   ;; Replace some packages with ones built targeting custom
   ;; builds with glibc-hwcaps support.
   `(("gsl" . ,(const gsl-hwcaps)))))

(define-public inkscape-with-hwcaps
  (package
    (inherit (use-glibc-hwcaps inkscape))
    (name "inkscape-with-hwcaps")))

Of the Guix supported architectures, x86_64-linux and powerpc64le-linux can both benefit from this new capability.

Through the magic of newer versions of GCC and LLVM it is safe to use these libraries in place of the standard libraries while compiling packages; these compilers know about the glibc-hwcap directories and will purposefully link against the base library during build time, with glibc's ld.so choosing the optimized library at runtime.

One possible use case for these libraries is creating guix packs of packages to run on other systems. By substituting these libraries it becomes possible to create a guix pack which will have better performance than a standard package used in a guix pack. This works even when the included libraries don't make use of the IFUNCs from glibc or functional multi-versioning from GCC. Providing optimized yet portable pre-compiled binaries is a great way to take advantage of this feature.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

Categories: FLOSS Project Planets

FSF News: FSF job opportunity: Outreach and communications coordinator

Fri, 2024-01-12 15:49
The Free Software Foundation (FSF), a Massachusetts 501(c)(3) charity with a worldwide mission to protect computer user freedom, seeks a motivated and talented individual, if possible Boston-based, to be our full-time outreach and communications coordinator.
Categories: FLOSS Project Planets

FSF Events: Free Software Directory meeting on IRC: Friday, January 19, starting at 12:00 EST (17:00 UTC)

Fri, 2024-01-12 14:31
Join the FSF and friends on Friday, January 19, from 12:00 to 15:00 EST (17:00 to 20:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

Andy Wingo: micro macro story time

Thu, 2024-01-11 09:10

Today, a tiny tale: about 15 years ago I was working on Guile’s macro expander. Guile inherited this code from an early version of Kent Dybvig’s portable syntax expander. It was... not easy to work with.

Some difficulties were essential. Scope is tricky, after all.

Some difficulties were incidental, but deep. The expander is ultimately a function that translates Scheme-with-macros to Scheme-without-macros. However, it is itself written in Scheme-with-macros, so to load it on a substrate without macros requires a pre-expanded copy of itself, whose data representations need to be compatible with any incremental change, so that you will be able to use the new expander to produce a fresh pre-expansion. This difficulty could have been avoided by incrementally bootstrapping the library. It works once you are used to it, but it’s gnarly.

But then, some difficulties were just superfluously egregious. Dybvig is a totemic developer and researcher, but a generation or two removed from me, and when I was younger, it never occurred to me to just email him to ask why things were this way. (A tip to the reader: if someone is doing work you are interested in, you can just email them. Probably they will write you back! If they don’t respond, it’s not you; they’re probably just busy and their inbox leaks.) Anyway, in my totally speculative reconstruction of events, when Dybvig goes to submit his algorithm for publication, he gets annoyed that “expand” doesn’t sound fancy enough. In a way it’s similar to the original SSA developers thinking that “phony functions” wouldn’t get published.

So Dybvig calls the expansion function “χ”, because the Greek chi looks like the X in “expand”. Fine for the paper, whatever paper that might be, but then in psyntax, there are all these functions named chi and chi-lambda and all sorts of nonsense.

In early years I was often confused by these names; I wasn’t in on the pun, and I didn’t feel like I had enough responsibility for this code to think what the name should be. I finally broke down and changed all instances of “chi” to “expand” back in 2011, and never looked back.

Anyway, this is a story with a very specific moral: don’t name your functions chi.

Categories: FLOSS Project Planets

micron @ Savannah: Version 1.4

Mon, 2024-01-08 16:08

GNU micron version 1.4 is available for download.

Categories: FLOSS Project Planets

FSF Events: Free Software Directory meeting on IRC: Friday, January 12, starting at 12:00 EST (17:00 UTC)

Mon, 2024-01-08 14:15
Join the FSF and friends on Friday, January 12, from 12:00 to 15:00 EST (17:00 to 20:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

Andy Wingo: missing the point of webassembly

Mon, 2024-01-08 06:45

I find most descriptions of WebAssembly to be uninspiring: if you start with a phrase like “assembly-like language” or a “virtual machine”, we have already lost the plot. That’s not to say that these descriptions are incorrect, but it’s like explaining what a dog is by starting with its circulatory system. You’re not wrong, but you should probably lead with the bark.

I have a different preferred starting point which is less descriptive but more operational: WebAssembly is a new fundamental abstraction boundary. WebAssembly is a new way of dividing computing systems into pieces and of composing systems from parts.

This all may sound high-falutin’, but it’s for real: this is the actually interesting thing about Wasm.

fundamental & abstract

It’s probably easiest to explain what I mean by example. Consider the Linux ABI: Linux doesn’t care what code it’s running; Linux just handles system calls and schedules process time. Programs that run against the x86-64 Linux ABI don’t care whether they are in a container or a virtual machine or “bare metal” or whether the processor is AMD or Intel or even a Mac M3 with Docker and Rosetta 2. The Linux ABI interface is fundamental in the sense that either side can implement any logic, subject to the restrictions of the interface, and abstract in the sense that the universe of possible behaviors has been simplified to a limited language, in this case that of system calls.
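To make the boundary concrete, here is a tiny sketch (illustrative, not from the original post): as far as Linux is concerned, a program is just something that issues system calls conforming to the ABI, however it was produced.

#include <sys/syscall.h>
#include <unistd.h>

/* Whatever produced this code -- a C compiler, a JIT, hand-written
   assembly -- the kernel only sees the system calls it makes. */
int main(void) {
  const char msg[] = "hello from the other side of the ABI\n";
  syscall(SYS_write, 1, msg, sizeof msg - 1);
  return 0;
}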

Or take HTTP: when you visit wingolog.org, you don’t have to know (but surely would be delighted to learn) that it’s Scheme code that handles the request. I don’t have to care if the other side of the line is curl or Firefox or Wolvic. HTTP is such a successful fundamental abstraction boundary that at this point it is the default for network endpoints; whether you are a database or a golang microservice, if you don’t know that you need a custom protocol, you use HTTP.

Or, to rotate our metaphorical compound microscope to high-power magnification, consider the SYS-V amd64 C ABI: almost every programming language supports some form of extern C {} to access external libraries, and the best language implementations can produce artifacts that implement the C ABI as well. The standard C ABI splits programs into parts, and allows works from separate teams to be composed into a whole. Indeed, one litmus test of a fundamental abstraction boundary is, could I reasonably define an interface and have an implementation of it be in Scheme or OCaml or what-not: if the answer is yes, we are in business.
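As a sketch of that litmus test (a hypothetical interface, not a real library), consider a header defined purely in terms of the C ABI; nothing about it commits the implementation, or the callers, to any particular language:

/* image.h -- hypothetical plug-in interface, defined only at the
   C ABI level. The symbols could be provided by C, Rust, OCaml,
   Scheme, or anything else that can emit C-compatible artifacts. */
typedef struct image image;   /* opaque to callers */

image *image_open(const char *path);
int    image_width(const image *img);
int    image_height(const image *img);
void   image_close(image *img);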

It is in this sense that WebAssembly is a new fundamental abstraction boundary.

WebAssembly shares many of the concrete characteristics of other abstractions. Like the Linux syscall interface, WebAssembly defines an interface language in which programs rely on host capabilities to access system features. Like the C ABI, calling into WebAssembly code has a predictable low cost. Like HTTP, you can arrange for WebAssembly code to have no shared state with its host, by construction.

But WebAssembly is a new point in this space. Unlike the Linux ABI, there is no fixed set of syscalls: WebAssembly imports are named, typed, and without pre-defined meaning, more like the C ABI. Unlike the C ABI, WebAssembly modules have only the shared state that they are given; neither side has a license to access all of the memory in the “process”. And unlike HTTP, WebAssembly modules are “in the room” with their hosts: close enough that hosts can allow themselves the luxury of synchronous function calls, and to allow WebAssembly modules to synchronously call back into their hosts.

applied teleology

At this point, you are probably nodding along, but also asking yourself, what is it for? If you arrive at this question from the “WebAssembly is a virtual machine” perspective, I don’t think you’re well-equipped to answer. But starting as we did by the interface, I think we are better positioned to appreciate how WebAssembly fits into the computing landscape: the narrative is generative, in that you can explore potential niches by identifying existing abstraction boundaries.

Again, let’s take a few examples. Say you ship some “smart cities” IoT device, consisting of a microcontroller that runs some non-Linux operating system. The system doesn’t have an MMU, so you don’t have hardware memory protections, but you would like to be able to enforce some invariants on the software that this device runs; and you would also like to be able to update that software over the air. WebAssembly is getting used in these environments; I wish I had a list of deployments at hand, but perhaps we can at least take this article from last year by a WebAssembly IoT vendor as proof of commercial interest.

Or, say you run a function-as-a-service cloud, meaning that you run customer code in response to individual API requests. You need to limit the allowable set of behaviors from the guest code, so you choose some abstraction boundary. You could use virtual machines, but that would be quite expensive in terms of memory. You could use containers, but you would like more control over the guest code. You could have these functions written in JavaScript, but that means that your abstraction is no longer fundamental; you limit your applicability. WebAssembly fills an interesting niche here, and there are a number of products in this space, for example Fastly Compute or Fermyon Spin.

Or to go smaller, consider extensible software, like the GIMP image editor or VS Code: in the past you would use loadable plug-in modules via the C ABI, which can be quite gnarly, or you lean into a particular scripting language, which can be slow, inexpressive, and limit the set of developers that can write extensions. It’s not a silver bullet, but WebAssembly can have a role here. For example, the Harfbuzz text shaping library supports fonts with an embedded (em-behdad?) WebAssembly extension to control how strings of characters are mapped to positioned glyphs.

aside: what boundaries do

They say that good fences make good neighbors, and though I am not quite sure it is true—since my neighbor put up a fence a few months ago, our kids don’t play together any more—boundaries certainly facilitate separation of functionality. Conway’s law is sometimes applied as a descriptive observation—ha-ha, isn’t that funny, they just shipped their org chart—but this again misses the point, in that boundaries facilitate separation, but also composition: if I know that I can fearlessly allow a font to run code because I have an appropriate abstraction boundary between host application and extension, I have gained in power. I no longer need to be responsible for every part of the product, and my software can scale up to solve harder problems by composing work from multiple teams.

There is little point in using WebAssembly if you control both sides of a boundary, just as (unless you have chickens) there is little point in putting up a fence that runs through the middle of your garden. But where you want to compose work from separate teams, the boundaries imposed by WebAssembly can be a useful tool.

narrative generation

WebAssembly is enjoying a tail-wind of hype, so I think it’s fair to say that wherever you find a fundamental abstraction boundary, someone is going to try to implement it with WebAssembly.

Again, some examples: back in 2022 I speculated that someone would “compile” Docker containers to WebAssembly modules, and now that is a thing.

I think at some point someone will attempt to replace eBPF with Wasm in the Linux kernel; eBPF is just not as good a language as Wasm, and the toolchains that produce it are worse. eBPF has clunky calling-conventions about what registers are saved and spilled at call sites, a decision that can be made more efficiently for the program and architecture at hand when register-allocating WebAssembly locals. (Sometimes people lean on the provably-terminating aspect of eBPF as its virtue, but that could apply just as well to Wasm if you prohibit the loop opcode (and the tail-call instructions) at verification-time.) And why don’t people write whole device drivers in eBPF? Or rather, targeting eBPF from C or what-have-you. It’s because eBPF is just not good enough. WebAssembly is, though! Anyway I think Linux people are too chauvinistic to pick this idea up but I bet Microsoft could do it.

I was thinking today, you know, it actually makes sense to run a WebAssembly operating system, one which runs WebAssembly binaries. If the operating system includes the Wasm run-time, it can interpose itself at syscall boundaries, sometimes allowing it to avoid context switches. You could start with something like the Linux ABI, perhaps via WALI, but for a subset of guest processes that conform to particular conventions, you could build purpose-built composition that can allocate multiple WebAssembly modules to a single process, eliding inter-process context switches and data copies for streaming computations. Or, focussing on more restricted use-cases, you could make a microkernel; googling around I found this article from a couple days ago where someone is giving this a go.

wwwhat about the wwweb

But let’s go back to the web, where you are reading this. In one sense, WebAssembly is a massive web success, being deployed to literally billions of user agents. In another, it is marginal: people do not write web front-ends in WebAssembly. Partly this is because the kind of abstraction supported by linear-memory WebAssembly 1.0 isn’t a good match for the garbage-collected DOM API exposed by web browsers. As a corollary, languages that are most happy targeting this linear-memory model (C, Rust, and so on) aren’t good for writing DOM applications either. WebAssembly is used in auxiliary modules where you want to run legacy C++ code on user devices, or to speed up a hot leaf function, but isn’t a huge success.

This will change with the recent addition of managed data types to WebAssembly, but not necessarily in the way that you might think. Like, now that it will be cheaper and more natural to pass data back and forth with JavaScript, are we likely to see Wasm/GC progressively occupying more space in web applications? For me, I doubt that progressive is the word. In the same way that you wouldn’t run a fence through the middle of your front lawn, you wouldn’t want to divide your front-end team into JavaScript and WebAssembly sub-teams. Instead I think that we will see more phase transitions, in which whole web applications switch from JavaScript to Wasm/GC, compiled from Dart or Elm or what have you. The natural fundamental abstraction boundary in a web browser is between the user agent and the site’s code, not within the site’s code itself.

conclusion

So, friends, if you are talking to a compiler engineer, by all means: keep describing WebAssembly as a virtual machine. It will keep them interested. But for everyone else, the value of WebAssembly is what it does, which is to be a different way of breaking a system into pieces. Armed with this observation, we can look at current WebAssembly uses to understand the nature of those boundaries, and look at new boundaries to see if WebAssembly can have a niche there. Happy hacking, and may your components always compose!

Categories: FLOSS Project Planets

mailutils @ Savannah: GNU mailutils version 3.17

Sat, 2024-01-06 10:20

GNU mailutils version 3.17 is available for download. This is a maintenance release, including some new features:

Use of TLS in pop3d and imap4d

If not explicitly specified, the TLS mode to use (ondemand, connect, etc.) is derived from the configured port. E.g., for imap4d, port 143 implies ondemand mode, and port 993 implies connection mode.

The global tls-mode setting is used only when the mode cannot be determined otherwise, i.e. when no per-server tls-mode is given and the port gives no clue as to the TLS mode to use.

Categories: FLOSS Project Planets

anubis @ Savannah: GNU anubis version 4.3

Sat, 2024-01-06 06:42

GNU anubis version 4.3 is available for download. This is a maintenance release, including some new features:

anubisusr requires GnuTLS

New configuration statement: use-pam

Used in the CONTROL section, this boolean statement enables or disables the use of the Pluggable Authentication Modules (PAM) interface for accounting and session management.

New configuration statement: identd-keyfile

Sets the name of the file with shared keys used for decrypting replies from the auth service. It is used in traditional mode if anubis receives an encrypted response from the client's identd server (e.g. if they are running pidentd with encryption).

Categories: FLOSS Project Planets

FSF Blogs: FSD meeting recap 2024-01-05

Fri, 2024-01-05 16:34
Check out the important work our volunteers accomplished at today's Free Software Directory (FSD) IRC meeting.
Categories: FLOSS Project Planets

Andy Wingo: scheme modules vs whole-program compilation: fight

Fri, 2024-01-05 15:43

In a recent dispatch, I explained the whole-program compilation strategy used in Whiffle and Hoot. Today’s note explores what a correct solution might look like.

being explicit

Consider a module that exports an increment-this-integer procedure. We’ll use syntax from the R6RS standard:

(library (inc)
  (export inc)
  (import (rnrs))
  (define (inc n) (+ n 1)))

If we then have a program:

(import (rnrs) (inc))
(inc 42)

Then the meaning of this program is clear: it reduces to (+ 42 1), then to 43. Fine enough. But how do we get there? How does the compiler compose the program with the modules that it uses (transitively), to produce a single output?

In Whiffle (and Hoot), the answer is, sloppily. There is a standard prelude that initially has a number of bindings from the host compiler, Guile. One of these is +, exposed under the name %+, where the % in this case is just a warning to the reader that this is a weird primitive binding. Using this primitive, the prelude defines a wrapper:

... (define (+ x y) (%+ x y)) ...

At compilation-time, Guile’s compiler recognizes %+ as special, and therefore compiles the body of + as consisting of a primitive call (primcall), in this case to the addition primitive. The Whiffle (and Hoot, and native Guile) back-ends then avoid referencing an imported binding when compiling %+, and instead produce backend-specific code: %+ disappears. Most uses of the + wrapper get inlined so %+ ends up generating code all over the program.

The prelude is lexically splatted into the compilation unit via a pre-expansion phase, so you end up with something like:

(let () ; establish lexical binding contour
  ...
  (define (+ x y) (%+ x y))
  ...
  (let () ; new nested contour
    (define (inc n) (+ n 1))
    (inc 42)))

This program will probably optimize (via partial evaluation) to just 43. (What about let and define? Well. Perhaps we’ll get to that.)

But, again here I have taken a short-cut, which is about modules. Hoot and Whiffle don’t really do modules, yet anyway. I keep telling Spritely colleagues that it’s complicated, and rightfully they keep asking why, so this article gets into it.

is it really a big letrec?

Firstly you have to ask, what is the compilation unit anyway? I mean, given a set of modules A, B, C and so on, you could choose to compile them separately, relying on the dynamic linker to compose them at run-time, or all together, letting the compiler gnaw on them all at once. Or, just A and B, and so on. One good-enough answer to this problem is library-group form, which explicitly defines a set of topologically-sorted modules that should be compiled together. In our case, to treat the (inc) module together with our example program as one compilation unit, we would have:

(library-group
  ;; start with sequence of libraries
  ;; to include in compilation unit...
  (library (inc) ...)
  ;; then the tail is the program that
  ;; might use the libraries
  (import (rnrs) (inc))
  (inc 42))

In this example, the (rnrs) base library is not part of the compilation unit. Presumably it will be linked in, either as a build step or dynamically at run-time. For Hoot we would want the whole prelude to be included, because we don’t want any run-time dependencies. Anyway hopefully this would expand out to something like the set of nested define forms inside nested let lexical contours.

And that was my instinct: somehow we are going to smash all these modules together into a big nested letrec, and the compiler will go to town. And this would work, for a “normal” programming language.

But with Scheme, there is a problem: macros. Scheme is a “programmable programming language” that allows users to extend its syntax as well as its semantics. R6RS defines a procedural syntax transformer (“macro”) facility, in which the user can define functions that run on code at compile-time (specifically, during syntax expansion). Scheme macros manage to compose lexical scope from the macro definition with the scope at the macro instantiation site, by annotating these expressions with source location and scope information, and making syntax transformers mostly preserve those annotations.

“Macros are great!”, you say: well yes, of course. But they are a problem too. Consider this incomplete library:

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      ...)) ;; ***

The idea is to define a version of inc, but at compile-time: a (ctinc 42) form should expand directly to 43, not a call to inc (or even +, or %+). We define syntax transformers with define-syntax instead of define. The right-hand-side of the definition ((lambda (stx) ...)) should be a procedure of one argument, which returns one value: so far so good. Or is it? How do we actually evaluate what (lambda (stx) ...) means? What should we fill in for ...? When evaluating the transformer value, what definitions are in scope? What does lambda even mean in this context?

Well... here we butt up against the phasing wars of the mid-2000s. R6RS defines a whole system to explicitly declare what bindings are available when, then carves out a huge exception to allow for so-called implicit phasing, in which the compiler figures it out on its own. In this example we imported (rnrs) for the default phase, and this is the module that defines lambda (and indeed define and define-syntax). The standard defines that (rnrs) makes its bindings available both at run-time and expansion-time (compilation-time), so lambda means what we expect that it does. Whew! Let’s just assume implicit phasing, going forward.

The operand to the syntax transformer is a syntax object: an expression annotated with source and scope information. To pick it apart, R6RS defines a pattern-matching helper, syntax-case. In our case ctinc is unary, so we can begin to flesh out the syntax transformer:

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      (syntax-case stx ()
        ((ctinc n) (inc n)))))) ;; ***

But here there’s a detail, which is that when syntax-case destructures stx to its parts, those parts themselves are syntax objects which carry the scope and source location annotations. To strip those annotations, we call the syntax->datum procedure, exported by (rnrs).

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      (syntax-case stx ()
        ((ctinc n) (inc (syntax->datum #'n)))))))

And with this, voilà our program:

(library-group
  (library (inc) ...)
  (library (ctinc) ...)
  (import (rnrs) (ctinc))
  (ctinc 42))

This program should pre-expand to something like:

(let ()
  (define (inc n) (+ n 1))
  (let ()
    (define-syntax ctinc
      (lambda (stx)
        (syntax-case stx ()
          ((ctinc n) (inc (syntax->datum #'n))))))
    (ctinc 42)))

And then expansion should transform (ctinc 42) to 43. However, our naïve pre-expansion is not good enough for this to be possible. If you ran this in Guile you would get an error:

Syntax error: unknown file:8:12: reference to identifier outside its scope in form inc

Which is to say, inc is not available as a value within the definition of ctinc. ctinc could residualize an expression that refers to inc, but it can’t use it to produce the output.

modules are not expressible with local lexical binding

This brings us to the heart of the issue: with procedural macros, modules impose a phasing discipline on the expansion process. Definitions from any given module must be available both at expand-time and at run-time. In our example, ctinc needs inc at expand-time, which is an early part of the compiler that is unrelated to any later partial evaluation by the optimizer. We can’t make inc available at expand-time just using let / letrec bindings.

This is an annoying result! What do other languages do? Well, mostly they aren’t programmable, in the sense that they don’t have macros. There are some ways to get programmability using e.g. eval in JavaScript, but these systems are not very amenable to “offline” analysis of the kind needed by an ahead-of-time compiler.

For those declarative languages with macros, Scheme included, I understand the state of the art is to expand module-by-module and then stitch together the results of expansion later, using a kind of link-time optimization. You visit a module’s definitions twice: once to evaluate them while expanding, resulting in live definitions that can be used by further syntax expanders, and once to residualize an abstract syntax tree, which will eventually be spliced into the compilation unit.

Note that in general the expansion-time and the residual definitions don’t need to be the same, and indeed during cross-compilation they are often different. If you are compiling with Guile as host and Hoot as target, you might implement cons one way in Guile and another way in Hoot, choosing between them with cond-expand.

lexical scope regained?

What is to be done? Glad you asked, Vladimir. But, I don’t really know. The compiler wants a big blob of letrec, but the expander wants a pearl-string of modules. Perhaps we try to satisfy them both? The library-group paper suggests that modules should be expanded one by one, then stitched into a letrec by AST transformations. It’s not that lexical scope is incompatible with modules and whole-program compilation; the problems arise when you add in macros. So by expanding first, in units of modules, we reduce high-level Scheme to a lower-level language without syntax transformers, but still on the level of letrec.

I was unreasonably pleased by the effectiveness of the “just splat in a prelude” approach, and I will miss it. I even pled for a kind of stop-gap fat-fingered solution to sloppily parse module forms and keep on splatting things together, but colleagues helpfully talked me away from the edge. So good-bye, sloppy: I repent my ways and will make amends, with 40 hail-maries and an alpha renaming thrice daily and more often if in moral distress. Further bulletins as events warrant. Until then, happy scheming!

Categories: FLOSS Project Planets

Andy Wingo: v8's precise field-logging remembered set

Fri, 2024-01-05 04:44

A remembered set is used by a garbage collector to identify graph edges between partitioned sub-spaces of a heap. The canonical example is in generational collection, where you allocate new objects in newspace, and eventually promote survivor objects to oldspace. If most objects die young, we can focus GC effort on newspace, to avoid traversing all of oldspace all the time.

Collecting a subspace instead of the whole heap is sound if and only if we can identify all live objects in the subspace. We start with some set of roots that point into the subspace from outside, and then traverse all links in those objects, but only to other objects within the subspace.

The roots are, like, global variables, and the stack, and registers; and in the case of a partial collection in which we identify live objects only within newspace, also any link into newspace from other spaces (oldspace, in our case). This set of inbound links is a remembered set.

There are a few strategies for maintaining a remembered set. Generally speaking, you start by implementing a write barrier that intercepts all stores in a program. Instead of:

obj[slot] := val;

You might abstract this away:

write_slot(obj, sizeof obj, &obj[slot], val);

As you can see, it’s quite an annoying transformation to do by hand; typically you will want some sort of language-level abstraction that lets you keep the more natural syntax. C++ can do this pretty well, or if you are implementing a compiler, you just add this logic to the code generator.

Then the actual write barrier... well its implementation is twingled up with implementation of the remembered set. The simplest variant is a card-marking scheme, whereby the heap is divided into equal-sized power-of-two-sized cards, and each card has a bit. If the heap is also divided into blocks (say, 2 MB in size), then you might divide those blocks into 256-byte cards, yielding 8192 cards per block. A barrier might look like this:

void write_slot(ObjRef obj, size_t size,
                SlotAddr slot, ObjRef val) {
  obj[slot] := val; // Start with the store.

  uintptr_t block_size = 1<<21;
  uintptr_t card_size = 1<<8;
  uintptr_t cards_per_block = block_size / card_size;

  uintptr_t obj_addr = obj;
  uintptr_t card_idx = (obj_addr / card_size) % cards_per_block;

  // Assume remset allocated at block start.
  void *block_start = obj_addr & ~(block_size-1);
  uint32_t *cards = block_start;

  // Set the bit.
  cards[card_idx / 32] |= 1 << (card_idx % 32);
}

Then when marking the new generation, you visit all cards, and for all marked cards, trace all outbound links in all live objects that begin on the card.
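Sketched in code, that scanning pass might look like this (the object-walking helpers are hypothetical, assumed to be provided by the collector):

#include <stddef.h>
#include <stdint.h>

typedef struct Obj Obj;
/* Hypothetical helpers from the collector. */
Obj *first_object_on_card(uintptr_t card_start);
Obj *next_object(Obj *obj);
void trace_outbound_links(Obj *obj);

/* For each marked card, trace outbound links in the live objects
   that begin on that card, then clear the mark for the next cycle. */
void scan_marked_cards(uint32_t *cards, size_t cards_per_block,
                       uintptr_t block_start, uintptr_t card_size) {
  for (size_t idx = 0; idx < cards_per_block; idx++) {
    if (!(cards[idx / 32] & (UINT32_C(1) << (idx % 32))))
      continue;
    uintptr_t card_start = block_start + idx * card_size;
    for (Obj *obj = first_object_on_card(card_start);
         obj && (uintptr_t)obj < card_start + card_size;
         obj = next_object(obj))
      trace_outbound_links(obj);
    cards[idx / 32] &= ~(UINT32_C(1) << (idx % 32));
  }
}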

Card-marking is simple to implement and simple to statically allocate as part of the heap. Finding marked cards takes time proportional to the size of the heap, but you hope that the constant factors and SIMD minimize this cost. However iterating over objects within a card can be costly. You hope that there are few old-to-new links but what do you know?

In Whippet I have been struggling a bit with sticky-mark-bit generational marking, in which new and old objects are not spatially partitioned. Sometimes generational collection is a win, but in benchmarking I find that often it isn’t, and I think Whippet’s card-marking barrier is at fault: it is simply too imprecise. Consider firstly that our write barrier applies to stores to slots in all objects, not just those in oldspace; a store to a new object will mark a card, but that card may contain old objects which would then be re-scanned. Or consider a store to an old object in a more dense part of oldspace; scanning the card may incur more work than needed. It could also be that Whippet is being too aggressive at re-using blocks for new allocations, where it should be limiting itself to blocks that are very sparsely populated with old objects.

what v8 does

There is a tradeoff in write barriers between the overhead imposed on stores, the size of the remembered set, and the precision of the remembered set. Card-marking is relatively low-overhead and usually small as a fraction of the heap, but not very precise. It would be better if a remembered set recorded objects, not cards. And it would be even better if it recorded slots in objects, not just objects.

V8 takes this latter strategy: it has per-block remembered sets which record slots containing “interesting” links. All of the above words were to get here, to take a brief look at its remembered set.

The main operation is RememberedSet::Insert. It takes the MemoryChunk (a block, in our language from above) and the address of a slot in the block. Each block has a remembered set; in fact, six remembered sets for some reason. The remembered set itself is a SlotSet, whose interesting operations come from BasicSlotSet.

The structure of a slot set is a bitvector partitioned into equal-sized, possibly-empty buckets. There is one bit per slot in the block, so in the limit the size overhead for the remembered set may be 3% (1/32, assuming compressed pointers). Currently each bucket is 1024 bits (128 bytes), plus the 4 bytes for the bucket pointer itself.

Inserting into the slot set will first allocate a bucket (using C++ new) if needed, then load the “cell” (32-bit integer) containing the slot. There is a template parameter declaring whether this is an atomic or normal load. Finally, if the slot bit in the cell is not yet set, V8 will set the bit, possibly using atomic compare-and-swap.
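In code, the insertion path might look something like the following single-threaded sketch (a simplification of the scheme described above, not V8's actual implementation; V8 uses atomic loads and compare-and-swap where concurrency demands it):

#include <stdint.h>
#include <stdlib.h>

enum {
  CELL_BITS    = 32,
  BUCKET_CELLS = 32,                 /* 1024 bits per bucket */
  BUCKET_SLOTS = BUCKET_CELLS * CELL_BITS
};

typedef struct {
  size_t nbuckets;
  uint32_t **buckets;   /* possibly-empty buckets, allocated on demand */
} slot_set;

/* Record that slot number slot_idx within the block holds an
   interesting link. */
void slot_set_insert(slot_set *set, size_t slot_idx) {
  size_t bucket_idx = slot_idx / BUCKET_SLOTS;
  size_t cell_idx   = (slot_idx % BUCKET_SLOTS) / CELL_BITS;
  uint32_t bit      = UINT32_C(1) << (slot_idx % CELL_BITS);

  uint32_t *bucket = set->buckets[bucket_idx];
  if (!bucket)          /* first slot recorded in this bucket */
    bucket = set->buckets[bucket_idx] =
      calloc(BUCKET_CELLS, sizeof(uint32_t));

  if (!(bucket[cell_idx] & bit))   /* load the cell, test the bit... */
    bucket[cell_idx] |= bit;       /* ...and set it if not yet set */
}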

In the language of Blackburn’s Design and analysis of field-logging write barriers, I believe this is a field-logging barrier, rather than the bit-stealing slot barrier described by Yang et al in the 2012 Barriers Reconsidered, Friendlier Still!. Unlike Blackburn’s field-logging barrier, however, this remembered set is implemented completely on the side: there is no in-object remembered bit, nor remembered bits for the fields.

On the one hand, V8’s remembered sets are precise. There are some tradeoffs, though: they require off-managed-heap dynamic allocation for the buckets, and traversing the remembered sets takes time proportional to the whole heap size. And, should V8 ever switch its minor mark-sweep generational collector to use sticky mark bits, the lack of a spatial partition could lead to similar problems as I am seeing in Whippet. I will be interested to see what they come up with in this regard.

Well, that’s all for today. Happy hacking in the new year!

Categories: FLOSS Project Planets

FSF Blogs: FSD meeting recap 2023-12-29

Wed, 2024-01-03 15:58
Check out the important work our volunteers accomplished at today's Free Software Directory (FSD) IRC meeting.
Categories: FLOSS Project Planets

FSF Blogs: LibrePlanet 2024: May 4 and 5, Wentworth Institute of Technology, Boston, MA

Tue, 2024-01-02 20:00
The dates and location of LibrePlanet 2024 have been announced!
Categories: FLOSS Project Planets

FSF Events: Free Software Directory meeting on IRC: Friday, January 05, starting at 12:00 EST (17:00 UTC)

Tue, 2024-01-02 16:42
Join the FSF and friends on Friday, January 05, from 12:00 to 15:00 EST (17:00 to 20:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets
