GNU Planet!
Applied Pokology: Interesting poke idiom: sparse tables
During tonight's poke online office hours our friend hdzki came up with an interesting use case. He is poking at binary structures that are like sparse tables whose entries are distributed in the file in an arbitrary way. Each sparse table is characterized by an array of consecutive non-NULL pointers. Each pointer points to an entry in the table. The table entries can be anywhere in the IO space; they are not necessarily consecutive, nor in any particular order.
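To make the shape concrete, here is a minimal C sketch of that access pattern; the 64-bit offsets and the entry layout are invented for illustration, not taken from hdzki's actual format:

  /* Walk a sparse table: an array of consecutive non-NULL (here, nonzero)
     pointers, each the file offset of an entry that can live anywhere. */
  #include <stdio.h>
  #include <stdint.h>

  struct entry { uint32_t tag; uint32_t payload; };

  static int read_entry (FILE *f, uint64_t offset, struct entry *out)
  {
    if (fseek (f, (long) offset, SEEK_SET) != 0)
      return -1;
    return fread (out, sizeof *out, 1, f) == 1 ? 0 : -1;
  }

  static void walk_table (FILE *f, const uint64_t *ptrs, size_t n)
  {
    for (size_t i = 0; i < n && ptrs[i] != 0; i++)
      {
        struct entry e;
        if (read_entry (f, ptrs[i], &e) == 0)
          printf ("entry %zu at offset %llu: tag=%u\n",
                  i, (unsigned long long) ptrs[i], e.tag);
      }
  }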
Andy Wingo: three approaches to heap sizing
How much memory should a program get? Tonight, a quick note on sizing for garbage-collected heaps. There are a few possible answers, depending on what your goals are for the system.
you: doctor science
Sometimes you build a system and you want to study it: to identify its principal components and see how they work together, or to isolate the effect of altering a single component. In that case, what you want is a fixed heap size. You run your program a few times and determine a heap size that is sufficient for your problem, and then in future run the program with that fixed heap size. This allows you to concentrate on the other components of the system.
A good approach to choosing the fixed heap size for a program is to determine the minimum heap size the program can have, by bisection, and then multiply that size by a constant factor. Garbage collection is a space/time tradeoff: the factor you choose represents a point on the space/time tradeoff curve. I would choose 1.5 in general, but this is arbitrary; I'd go more with 3 or even 5 if memory isn't scarce and I'm really optimizing for throughput.
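As a sketch, here is what that bisection could look like in C; GC_HEAP_SIZE and ./foo are invented stand-ins for your runtime's fixed-heap knob and your workload, and we assume a run that doesn't fit exits nonzero:

  /* Bisect for the minimum viable fixed heap size, then apply a factor. */
  #include <stdio.h>
  #include <stdlib.h>

  static int runs_ok (size_t heap_bytes)
  {
    char cmd[128];
    snprintf (cmd, sizeof cmd,
              "GC_HEAP_SIZE=%zu ./foo >/dev/null 2>&1", heap_bytes);
    return system (cmd) == 0;   /* nonzero exit: didn't fit (or worse) */
  }

  int main (void)
  {
    size_t lo = 1 << 20, hi = 1 << 30;   /* search 1 MiB .. 1 GiB */
    while (lo < hi)
      {
        size_t mid = lo + (hi - lo) / 2;
        if (runs_ok (mid)) hi = mid; else lo = mid + 1;
      }
    printf ("minimum heap: %zu bytes; with a 1.5 factor: %zu\n",
            lo, lo + lo / 2);
    return 0;
  }

Bisection works here because fitting is monotonic in the heap size: if the program completes with some heap, it completes with any larger one.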
Note that a fixed-size heap is not generally what you want. It's not good user experience for running ./foo at the command line, for example. The reason for this is that program memory use is usually a function of the program's input, and only in some cases do you know what the input might look like, and until you run the program you don't know what the exact effect of input on memory is. Still, if you have a team of operations people that knows what input patterns look like and has experience with a GC-using server-side process, fixed heap sizes could be a good solution there too.
you: average josé/fina
On the other end of the spectrum is the average user. You just want to run your program. The program should have the memory it needs! Not too much of course; that would be wasteful. Not too little either; I can tell you, my house is less than 100m², and I spend way too much time shuffling things from one surface to another. If I had more space I could avoid this wasted effort, and in a similar way, you don't want to be too stingy with a program's heap. Do the right thing!
Of course, you probably have multiple programs running on a system that are making similar heap sizing choices at the same time, and the relative needs and importances of these programs could change over time, for example as you switch tabs in a web browser. So the right thing really refers to overall system performance, whereas what you are controlling is just one process' heap size; what is the Right Thing, anyway?
My corner of the GC discourse agrees that something like the right solution was outlined by Kirisame, Shenoy, and Panchekha in a 2022 OOPSLA paper, in which the optimum heap size depends on the allocation rate and the gc cost for a process, both of which you measure on an ongoing basis. Interestingly, their heap size calculation can be performed by each process without coordination, yet results in a whole-system optimum.
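As I recall, the headline result is a square-root rule: a process with live heap size L, allocation rate g, and collection speed s should set its heap limit M to something like

  M = L + \sqrt{\frac{L \, g}{c \, s}}

where c is the constant that picks your point on the space/time tradeoff curve. Consult the paper for the precise formulation; the point here is just that the extra space scales with the square root of the live size and allocation rate, not linearly.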
There are some details but you can imagine some instinctive results: for example, when a program stops allocating because it's waiting for some external event like user input, it doesn't need so much memory, so it can start shrinking its heap. After all, it might be quite a while before the program has new input. If the program starts allocating again, perhaps because there is new input, it can grow its heap rapidly, and might then shrink again later. The mechanism by which this happens is pleasantly simple, and I salute (again!) the authors for identifying the practical benefits that an abstract model brings to the problem domain.
you: a damaged, suspicious individual
Hoo, friends-- I don't know. I've seen some things. Not to exaggerate, I like to think I'm a well-balanced sort of fellow, but there's some suspicion too, right? So when I imagine a background thread determining that my web server hasn't gotten so much action in the last 100ms and that really what it needs to be doing is shrinking its heap, kicking off additional work to mark-compact it or whatever, when the whole point of the virtual machine is to run that web server and not much else, only to have to probably give it more heap 50ms later, I-- well, again, I exaggerate. The MemBalancer paper has a heartbeat period of 1 Hz and a smoothing function for the heap size, but it just smells like danger. Do I need danger? I mean, maybe? Probably in most cases? But maybe it would be better to avoid danger if I can. Heap growth is usually both necessary and cheap when it happens, but shrinkage is never necessary and is sometimes expensive because you have to shuffle around data.
So, I think there is probably a case for a third mode: not fixed, not adaptive like the MemBalancer approach, but just growable: grow the heap when and if its size is less than a configurable multiplier (e.g. 1.5) of live data. Never shrink the heap. If you ever notice that a process is taking too much memory, manually kill it and start over, or whatever. Default to adaptive, of course, but when you start to troubleshoot a high GC overhead in a long-lived process, perhaps switch to growable to see its effect.
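The growable policy is pleasantly small; a sketch in C, with invented names (no particular collector's interface):

  #include <stddef.h>

  struct heap {
    size_t size;        /* current heap size, in bytes */
    double multiplier;  /* space/time factor, e.g. 1.5 */
  };

  /* Called after each collection, once live_bytes is known. */
  static void resize_heap_growable (struct heap *h, size_t live_bytes)
  {
    size_t target = (size_t) (live_bytes * h->multiplier);
    if (h->size < target)
      h->size = target;  /* grow to the target... */
    /* ...and never shrink: no compaction work, no danger. */
  }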
unavoidable badness
There is some heuristic badness that one cannot avoid: even with the adaptive MemBalancer approach, you have to choose a point on the space/time tradeoff curve. Regardless of what you do, your system will grow a hairy nest of knobs and dials, and if your system is successful there will be a lively aftermarket industry of tuning articles: "Are you experiencing poor object transit? One knob you must know"; "Four knobs to heaven"; "It's raining knobs"; "GC engineers DO NOT want you to grab this knob!!"; etc. (I hope that my British readers are enjoying this.)
These ad-hoc heuristics are just part of the domain. What I want to say though is that having a general framework for how you approach heap sizing can limit knob profusion, and can help you organize what you have into a structure of sorts.
At least, this is what I tell myself; inshallah. Now I have told you too. Until next time, happy hacking!
GNU Taler news: GNU Taler v0.9.1 released
a2ps @ Savannah: a2ps 4.14.93 released [alpha]
I am happy to announce another pre-release of what will eventually be the
first release of GNU a2ps since 2007.
I have had very little feedback about previous pre-releases, so I intend to
make a stable release soon. If you’re interested in GNU a2ps, please try
this pre-release! I hope that once I make a full release it will quickly be
packaged for distributions.
Here are the compressed sources and a GPG detached signature:
https://alpha.gnu.org/gnu/a2ps/a2ps-4.14.93.tar.gz
https://alpha.gnu.org/gnu/a2ps/a2ps-4.14.93.tar.gz.sig
Use a mirror for higher download bandwidth:
https://www.gnu.org/order/ftp.html
Here are the SHA1 and SHA256 checksums:
8eb28d7a8ca933a08918d706f231978a91e42d3f a2ps-4.14.93.tar.gz
VoCuvBKrC1y5P/wZbx92C6O28jvtCfs9ZnskCjx/xmM a2ps-4.14.93.tar.gz
The SHA256 checksum is base64 encoded, instead of the
hexadecimal encoding that most checksum tools default to.
Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact. First, be sure to download both the .sig file
and the corresponding tarball. Then, run a command like this:
gpg --verify a2ps-4.14.93.tar.gz.sig
The signature should match the fingerprint of the following key:
pub rsa2048 2013-12-11 [SC]
2409 3F01 6FFE 8602 EF44 9BB8 4C8E F3DA 3FD3 7230
uid Reuben Thomas <rrt@sc3d.org>
uid keybase.io/rrt <rrt@keybase.io>
If that command fails because you don't have the required public key,
or that public key has expired, try the following commands to retrieve
or refresh it, and then rerun the 'gpg --verify' command.
gpg --locate-external-key rrt@sc3d.org
gpg --recv-keys 4C8EF3DA3FD37230
wget -q -O- 'https://savannah.gnu.org/project/release-gpgkeys.php?group=a2ps&download=1' | gpg --import -
As a last resort to find the key, you can try the official GNU
keyring:
wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
gpg --keyring gnu-keyring.gpg --verify a2ps-4.14.93.tar.gz.sig
This release was bootstrapped with the following tools:
Autoconf 2.71
Automake 1.16.5
Gnulib v0.1-5639-g80b225fe1e
NEWS
* Noteworthy changes in release 4.14.93 (2023-01-26) [alpha]
* Features:
- Use libpaper's paper sizes. This includes user-defined paper sizes
when using libpaper 2. It is still possible to define custom margins
using "Medium:" specifications in the configuration file, and the
one size defined by a2ps that libpaper does not know about, Quarto, is
retained for backwards compatibility, and as an example.
* Bug fixes:
- Avoid a crash when a medium is not specified; instead, use the default
libpaper size (configured by the user or sysadmin, or the locale
default).
- Fix some other potential crashes and compiler warnings.
* Documentation:
- Reformat --help output consistently to 80 columns.
* Build:
- Require autoconf 2.71.
- Require libpaper.
FSF Blogs: Thank you and a very warm welcome to our new members
poke @ Savannah: GNU poke 3.0 released
I am happy to announce a new major release of GNU poke, version 3.0.
This release is the result of a year of development. A lot of things have changed and improved with respect to the 2.x series; we have fixed many bugs and added quite a lot of new exciting and useful features. See below for a description of many of them.
From now on, we intend to make not one but two major releases of poke every year. What moves us to change this is the realization that users otherwise have to wait too long to enjoy new features, which are continuously being added in a project this young and active.
The tarball poke-3.0.tar.gz is now available at
https://ftp.gnu.org/gnu/poke/poke-3.0.tar.gz.
GNU poke (http://www.jemarch.net/poke) is an interactive, extensible editor for binary data. Not limited to editing basic entities such as bits and bytes, it provides a full-fledged procedural, interactive programming language designed to describe data structures and to operate on them.
Thanks to the people who contributed with code and/or documentation to this release. In no particular order, they are:
Mohammad-Reza Nabipoor
Arsen Arsenović
Luca Saiu
Bruno Haible
apache2
Indu Bhagat
Agathe Porte
Alfred M. Szmidt
Daiki Ueno
Darshit Shah
Jan Seeger
Sergio Durigan Junior
... and yours truly
As always, thank you all!
But wait, this time we also have special thanks:
To Bruno Haible for his invaluable advice and his help in thoroughly testing this new release on many different platforms and configurations.
To the Sourceware overseers, Mark Wielaard, Arsen Arsenović, and Sam James for their help in setting up the buildbots we are using for CI at sourceware.
What is new in this release:
- A screen pager has been added to the poke application. If enabled with the `.set pager yes' option, output will be paged one screenful at a time.
- A tracer has been added to libpoke and the poke application. If enabled with the `.set tracer yes' option, subsequently loaded Poke types will be instrumented so that calls to user-defined handlers are executed when certain events happen:
- Every time a field gets mapped.
- Every time a struct/union gets mapped.
- Every time a field gets constructed.
- Every time a struct/union gets constructed.
- Every time an optional field is omitted when mapping or constructing.
- A new command sdiff (for "structured diff") has been added to the poke application, that provides a way to generate patchable diffs of mapped structured Poke values. This command is an interface to the structured diffs provided by the new diff.pk pickle.
- When no name is passed to the .mem command, a unique name of the form N, where N is a positive integer, will be used automatically for the memory IOS.
- Auto-completion of 'attributes is now available in the poke application.
- Constraint errors now contain details on the location (which field) where the constraint error happens, along with the particular expression that failed.
- Inline assembler expressions and statements are now supported:
,----
| asm (TEMPLATE [: OUTPUTS [: INPUTS]])
| asm TYPE: (TEMPLATE [: INPUTS])
`----
- Both `printf' and `format' now support printing values of type `any'.
- Both `printf' and `format' now support printing integral values interpreted as floating-point values encoded in IEEE 754. Format tags %f, %g and %e are supported. This feature, along with the new ieee754.pk pickle, eases dealing with floating-point data in binary data.
- Pre-conditional optional fields are added to complement the currently supported post-conditional optional fields. A pre-conditional optional field like the following makes FNAME optional based on the evaluation of CONDITION. But the field itself is not mapped if the condition evaluates to false:
,----
| if (CONDITION)
| TYPE FNAME;
`----
- A new option `.set autoremap no' can be used in order to tell poke to not remap mapped values automatically. This greatly speeds up things, but assumes that the contents of the IO space are not updated out of the control of the user. See the manual for details.
- The :to argument to the `extract' command is now optional, and defaults to the empty string.
- ${XDG_CONFIG_HOME:-$HOME/.config} is now preferred to XDG_CONFIG_DIRS.
- Array and struct constructors are now primaries in the Poke syntax. This means that it is no longer necessary to enclose them in parentheses in constructions like:
,----
| (Packet {}).field
`----
and this is now accepted:
,----
| Packet {}.field
`----
- Bit-concatenation is now supported in l-values. After executing the following code the value of `a' is 0x1N and the value of `b' is (uint<28>)0x2345678:
,----
| var a = 0 as int<4>;
| var b = 0 as uint<28>;
|
| a:::b = 0x12345678;
`----
- Arrays can now be indexed by offset, by specifying an offset as an index. This is particularly useful for accessing structures such as string tables without having to explicitly iterate over the array's elements.
- Union types can now be declared as "integral". The same features of integral structs are now available for unions: integration, deintegration, the ability to be used in contexts where an integer is expected, etc.
- Support for "computed fields" has been added to struct and union types. Computed fields are accessed just like regular fields, but the semantics of referring to them and of assigning to them are specified by the user by the way of defining getter and setter methods.
- This version introduces three new Poke attributes that work on values of type `any':
,----
| VAL'elem (N)
| evaluates to the Nth element in VAL, as a value of type `any'.
|
| VAL'eoffset (N)
| evaluates to the offset of the Nth element in VAL.
|
| VAL'esize (N)
| evaluates to the size of the Nth element in VAL.
|
| VAL'ename (N)
| evaluates to the name of the Nth element in VAL.
`----
- Two new operators have been introduced to facilitate operating on Poke arrays as stacks in an efficient way: apush and apop. Since these operators change the size of the involved arrays, they are only allowed on unbounded arrays.
- Poke programs can now hook into the IO subsystem by installing functions that will be invoked when certain operations on IO spaces are performed:
,----
| ios_open_hook
| Functions in this hook are invoked once a new IO space has been
| opened.
|
| ios_set_hook
| Functions in this hook are invoked once the current IO space
| changes.
|
| ios_close_pre_hook
| ios_close_hook
| Functions in these hooks are invoked before and after an IO space is
| closed, respectively.
`----
- The 'length attribute is now valid in values of type `any'.
- Poke declarations can now be annotated as `immutable'. It is not allowed to re-define immutable definitions.
- A new compiler built-in `iolist' has been introduced, that returns an array with the IO space identifiers of currently open IOS.
- We have changed the logic of the EXCOND operator ?!. It now evaluates to 1 (true) if the execution of the first operand raises the specified exception, and to 0 (false) otherwise. We profusely apologize for the backwards incompatibility, but this is way better than the previous (reversed) logic.
- The containing struct or union value can now be referred to as SELF in the body of methods. SELF is of type `any'.
- Integer literal suffixes (B, H, U, etc) are case-insensitive, but until now lowercase `b' wasn't being recognized as such. Now `1B' is the same as `1b'.
- Casting to union types now raises a compile-time error.
- If no explicit message is specified in calls to `assert', a default one showing the source code of the failing condition is constructed and used instead.
- An operator `remap' has been introduced in order to force a re-map of some mapped Poke value.
- Signed integral types of one bit are not allowed. How could they be, in two's complement?
- The built-in function get_time has been renamed to gettime, to follow the usual naming of the corresponding standard C function.
- New standard functions:
,----
| eoffset (V, N)
| Given a value of type `any' and a name, returns the offset of
| the element having that name.
|
| openset (HANDLER, [FLAGS])
| Open an IO space and make it the current IO space.
|
| with_temp_ios ([HANDLER], [FLAGS], [DO], [ENDIAN])
| Execute some code with a temporary IO space.
|
| with_cur_ios (IOS, [DO], [ENDIAN])
| Execute some code on some given IO space.
`----
- New API function pk_struct_ref_set_field_value.
- New API function pk_type_name.
- New pickles provided in the poke distribution:
,----
| diff.pk
| Useful binary diffing utilities. In particular, it implements
| the "structured diff" format as described in
| https://binary-tools.net/bindiff.pdf.
|
| io.pk
| Facilities to dump data to the terminal.
|
| pk-table.pk
| Convenient facilities for Poke programs to print tabulated data.
|
| openpgp.pk
| Pickle to poke at OpenPGP RFC 4880 data.
|
| sframe.pk
| sframe-dump.pk
| Pickles for the SFrame unwinding format, and related dump
| utilities.
|
| search.pk
| Utility for searching data in IO spaces that conform to some
| given Poke type.
|
| riscv.pk
| Pickle to poke at instructions encoded in the RISC-V instruction
| set (RV32I). It also provides methods to generate assembly
| language.
|
| coff.pk
| coff-aarch64.pk
| coff-i386.pk
| COFF object files.
|
| pe.pk
| pe-amd64.pk
| pe-arm.pk
| pe-arm64.pk
| pe-debug.pk
| pe-i386.pk
| pe-ia64.pk
| pe-m32r.pk
| pe-mips.pk
| pe-ppc.pk
| pe-riscv.pk
| pe-sh3.pk
| PE/COFF object files.
|
| pcap.pk
| Capture file format.
|
| uuid.pk
| Universally Unique Identifier (UUID) as defined by RFC4122.
|
| redoxfs.pk
| RedoxFS file system of Redox OS.
|
| ieee754.pk
| IEEE Standard for Floating-Point Arithmetic.
`----
- The ELF pickle now provides functions implementing ELF hashing.
- It is now possible to configure the poke sources with --disable-hserver.
- Documentation for the `format' language construction has been added to the poke manual.
- A new program poked, for "poke daemon", has been contributed to the poke distribution by Mohammad-Reza Nabipoor. poked links with libpoke and uses Unix sockets to act as a broker to communicate with an instance of a Poke incremental compiler. This is already used by several user interfaces to poke.
- The machine-interface subsystem has been removed from poke, in favor of the poked approach.
- The example GUI that was intended to be a test tool for the machine interface has been removed from the poke distribution.
- Many bugs have been fixed.
--
Jose E. Marchesi
Frankfurt am Main
26 January 2023
GNU Guile: GNU Guile 3.0.9 released
We are pleased to announce the release of GNU Guile 3.0.9! This release fixes a number of bugs and adds several new features, among which:
- New bindings for POSIX functionality, including bindings for the at family of functions (openat, statat, etc.), a new spawn procedure that wraps posix_spawn and that system* now uses, and the ability to pass flags such as O_CLOEXEC to the pipe procedure.
- A new bytevector-slice procedure.
- Reduced memory consumption for the linker and assembler.
For full details, see the NEWS entry, and check out the download page.
Happy Guile hacking!
FSF News: FSF board adopts updated by-laws to protect copyleft
Andy Wingo: parallel ephemeron tracing
Hello all, and happy new year. Today's note continues the series on implementing ephemerons in a garbage collector.
In our last dispatch we looked at a serial algorithm to trace ephemerons. However, production garbage collectors are parallel: during collection, they trace the object graph using multiple worker threads. Our problem is to extend the ephemeron-tracing algorithm with support for multiple tracing threads, without introducing stalls or serial bottlenecks.
Recall that we ended up having to define a table of pending ephemerons:
  struct gc_pending_ephemeron_table {
    struct gc_ephemeron *resolved;
    size_t nbuckets;
    struct gc_ephemeron *buckets[0];
  };
This table holds pending ephemerons that have been visited by the graph tracer but whose keys haven't been found yet, as well as a singly-linked list of resolved ephemerons that are waiting to have their values traced. As a global data structure, the pending ephemeron table is a point of contention between tracing threads that we need to design around.
a confession
Allow me to confess my sins: things would be a bit simpler if I didn't allow tracing workers to race.
As background, if your GC supports marking in place instead of always evacuating, then there is a mark bit associated with each object. To reduce the overhead of contention, a common strategy is to actually use a whole byte for the mark bit, and to write to it using relaxed atomics (or even raw stores). This avoids the cost of a compare-and-swap, but at the cost that multiple marking threads might see that an object's mark was unset, go to mark the object, and think that they were the thread that marked the object. As far as the mark byte goes, that's OK because everybody is writing the same value. The object gets pushed on the to-be-traced grey object queues multiple times, but that's OK too because tracing should be idempotent.
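In C11-atomics terms, the racy mark looks something like this (a sketch; the mark-byte representation is an assumption):

  #include <stdatomic.h>
  #include <stdint.h>

  /* Relaxed load + relaxed store instead of a CAS.  Two threads can both
     see 0 and both "win"; that's fine, because both write the same value
     and tracing is idempotent. */
  static inline int try_mark (_Atomic uint8_t *mark_byte)
  {
    if (atomic_load_explicit (mark_byte, memory_order_relaxed))
      return 0;                 /* already marked */
    atomic_store_explicit (mark_byte, 1, memory_order_relaxed);
    return 1;                   /* marked it -- though perhaps not alone */
  }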
This is a common optimization for parallel marking, and it doesn't have any significant impact on other parts of the GC--except ephemeron marking. For ephemerons, because the state transition isn't simply from unmarked to marked, we need more coordination.
high level
The parallel ephemeron marking algorithm modifies the serial algorithm in just a few ways:
- We have an atomically-updated state field in the ephemeron, used to know if e.g. an ephemeron is pending or resolved;
- We use separate fields for the pending and resolved links, to allow for concurrent readers across a state change;
- We introduce "traced" and "claimed" states to resolve races between parallel tracers on the same ephemeron, and track the "epoch" at which an ephemeron was last traced;
- We remove resolved ephemerons from the pending ephemeron hash table lazily, and use atomic swaps to pop from the resolved ephemerons list;
- We have to re-check key liveness after publishing an ephemeron to the pending ephemeron table.
Regarding the first point, there are four possible values for the ephemeron's state field:
  enum { TRACED, CLAIMED, PENDING, RESOLVED };
The state transition diagram looks like this:
     ,----->TRACED<-----.
    ,         | ^        .
   ,          v |         .
  |        CLAIMED         |
  |  ,-----/     \---.     |
  |  v               v     |
  PENDING--------->RESOLVED
With this information, we can start to flesh out the ephemeron object itself:
  struct gc_ephemeron {
    uint8_t state;
    uint8_t is_dead;
    unsigned epoch;
    struct gc_ephemeron *pending;
    struct gc_ephemeron *resolved;
    void *key;
    void *value;
  };
The state field holds one of the four state values; is_dead indicates if a live ephemeron was ever proven to have a dead key, or if the user explicitly killed the ephemeron; and epoch is the GC count at which the ephemeron was last traced. Ephemerons are born TRACED in the current GC epoch, and the collector is responsible for incrementing the current epoch before each collection.
algorithm: tracing ephemerons
When the collector first finds an ephemeron, it does a compare-and-swap (CAS) on the state from TRACED to CLAIMED. If that succeeds, we check the epoch; if it's current, we revert to the TRACED state: there's nothing to do.
(Without marking races, you wouldn't need either TRACED or CLAIMED states, or the epoch; it would be implicit in the fact that the ephemeron was being traced at all that you had a TRACED ephemeron with an old epoch.)
So now we have a CLAIMED ephemeron with an out-of-date epoch. We update the epoch and clear the pending and resolved fields, setting them to NULL. If, then, the ephemeron is_dead, we are done, and we go back to TRACED.
Otherwise we check if the key has already been traced. If so we forward it (if evacuating) and then trace the value edge as well, and transition to TRACED.
Otherwise we have a live E but we don't know about K; this ephemeron is pending. We transition E's state to PENDING and add it to the front of K's hash bucket in the pending ephemerons table, using CAS to avoid locks.
We then have to re-check if K is live, after publishing E, to account for other threads racing to mark K while we mark E; if indeed K is live, then we transition to RESOLVED and push E on the global resolved ephemeron list, using CAS, via the resolved link.
So far, so good: either the ephemeron is fully traced, or it's pending and published, or (rarely) published-then-resolved and waiting to be traced.
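Putting those steps into code, here is a sketch of the per-ephemeron logic; it is not the actual implementation, and key_is_marked, trace_edge, push_pending, push_resolved and the atomic field types are stand-ins:

  #include <stdatomic.h>
  #include <stdint.h>

  enum { TRACED, CLAIMED, PENDING, RESOLVED };

  struct gc_ephemeron {
    _Atomic uint8_t state;
    uint8_t is_dead;
    unsigned epoch;
    _Atomic(struct gc_ephemeron *) pending;
    _Atomic(struct gc_ephemeron *) resolved;
    void *key;
    void *value;
  };

  extern int  key_is_marked (void *key);               /* stand-in */
  extern void trace_edge (void **edge);                /* stand-in */
  extern void push_pending (struct gc_ephemeron *e);   /* CAS onto bucket */
  extern void push_resolved (struct gc_ephemeron *e);  /* CAS onto list */
  extern unsigned current_epoch;

  static void trace_ephemeron (struct gc_ephemeron *e)
  {
    uint8_t expected = TRACED;
    /* Claim the ephemeron; losing the CAS means another tracer has it. */
    if (!atomic_compare_exchange_strong (&e->state, &expected, CLAIMED))
      return;
    if (e->epoch == current_epoch)
      { atomic_store (&e->state, TRACED); return; }  /* nothing to do */
    e->epoch = current_epoch;
    e->pending = NULL;
    e->resolved = NULL;
    if (e->is_dead)
      { atomic_store (&e->state, TRACED); return; }
    if (key_is_marked (e->key))
      {
        trace_edge (&e->key);    /* forward the key if evacuating */
        trace_edge (&e->value);
        atomic_store (&e->state, TRACED);
        return;
      }
    atomic_store (&e->state, PENDING);
    push_pending (e);
    /* Re-check after publishing: another thread may have marked K. */
    expected = PENDING;
    if (key_is_marked (e->key)
        && atomic_compare_exchange_strong (&e->state, &expected, RESOLVED))
      {
        trace_edge (&e->key);    /* moving GC: forward the key on RESOLVED */
        push_resolved (e);
      }
  }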
algorithm: tracing objects
The annoying thing about tracing ephemerons is that it potentially impacts tracing of all objects: any object could be the key that resolves a pending ephemeron.
When we trace an object, we look it up in the pending ephemeron hash table. But, as we traverse the chains in a bucket, we also load each node's state. If we find a node that's not in the PENDING state, we atomically forward its predecessor to point to its successor. This is correct for concurrent readers because the end of the chain is always reachable: we only skip nodes that are not PENDING, nodes never become PENDING after they transition away from being PENDING, and we only add PENDING nodes to the front of the chain. We even leave the pending field in place, so that any concurrent reader of the chain can still find the tail, even when the ephemeron has gone on to be RESOLVED or even TRACED.
(I had thought I would need Tim Harris' atomic list implementation, but it turns out that since I only ever insert items at the head, having annotated links is not necessary.)
If we find a PENDING ephemeron that has K as its key, then we CAS its state from PENDING to RESOLVED. If this works, we CAS it onto the front of the resolved list. (Note that we also have to forward the key at this point, for a moving GC; this was a bug in my original implementation.)
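Continuing the sketch, the per-object check against the pending table could look like this; bucket_for and forward_key are again invented stand-ins:

  extern _Atomic(struct gc_ephemeron *) *bucket_for (void *key);
  extern void forward_key (struct gc_ephemeron *e);

  /* Called when object K is traced: resolve any ephemerons pending on K. */
  static void resolve_pending_for (void *k)
  {
    _Atomic(struct gc_ephemeron *) *loc = bucket_for (k);
    struct gc_ephemeron *e;
    while ((e = atomic_load (loc)) != NULL)
      {
        if (atomic_load (&e->state) != PENDING)
          {
            /* Lazy removal: splice out non-PENDING nodes.  e->pending is
               left in place, so concurrent readers still reach the tail. */
            atomic_compare_exchange_strong (loc, &e, e->pending);
            continue;
          }
        if (e->key == k)
          {
            uint8_t expected = PENDING;
            if (atomic_compare_exchange_strong (&e->state, &expected,
                                                RESOLVED))
              {
                forward_key (e);   /* moving GC: key must be post-visit */
                push_resolved (e);
              }
          }
        loc = &e->pending;
      }
  }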
algorithm: resolved ephemerons
Periodically a thread tracing the graph will run out of objects to trace (its mark stack is empty). That's a good time to check if there are resolved ephemerons to trace. We atomically exchange the global resolved list with NULL, and if there were resolved ephemerons, we trace their values and transition them to TRACED.
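In code, the pump is little more than an atomic exchange (same sketch, same invented names as above):

  extern _Atomic(struct gc_ephemeron *) resolved_list;

  /* Drain the global resolved list; returns nonzero if work was done. */
  static int trace_resolved_ephemerons (void)
  {
    struct gc_ephemeron *e = atomic_exchange (&resolved_list, NULL);
    int any = (e != NULL);
    while (e)
      {
        struct gc_ephemeron *next = e->resolved;  /* link is left in place */
        trace_edge (&e->value);
        atomic_store (&e->state, TRACED);
        e = next;
      }
    return any;
  }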
At the very end of the GC cycle, we sweep the pending ephemeron table, marking any ephemeron that's still there as is_dead, transitioning them back to TRACED, clearing the buckets of the pending ephemeron table as we go.
nits
So that's it. There are some drawbacks, for example that this solution takes at least three words per ephemeron. Oh well.
There is also an annoying point of serialization, which is related to the lazy ephemeron resolution optimization. Consider that checking the pending ephemeron table on every object visit is overhead; it would be nice to avoid this. So instead, we start in "lazy" mode, in which pending ephemerons are never resolved by marking; then once the mark stack / grey object worklist fully empties, we sweep through the pending ephemeron table, checking each ephemeron's key to see if it was visited in the end, and resolving those ephemerons; we then switch to "eager" mode in which each object visit could potentially resolve ephemerons. In this way the cost of ephemeron tracing is avoided for the part of the graph that is strongly reachable. However, with parallel markers, would you switch to eager mode when any thread runs out of objects to mark, or when all threads run out of objects? You would get the greatest parallelism with the former, but you run the risk of some workers running out of data prematurely, while there is still a significant part of the strongly-reachable graph to traverse. If you wait for all threads to be done, you introduce a serialization point. There is a related question of when to pump the resolved ephemerons list. But these are engineering details.
Speaking of details, there are some gnarly pitfalls, particularly that you have to be very careful about pre-visit versus post-visit object addresses: for a semi-space collector, visiting an object will move it, so in the pending ephemeron table, which by definition is keyed by pre-visit (fromspace) object addresses, you need to be sure to trace the ephemeron key for any transition to RESOLVED. There are a few places this happens: the re-check after publish, sweeping the table after transitioning from lazy to eager, and resolving eagerly.
implementation
If you've read this far, you may be interested in the implementation; it's only a few hundred lines long. It took me quite a while to whittle it down!
Ephemerons are challenging from a software engineering perspective, because they are logically a separate module, but they interact both with users of the GC and with the collector implementations. It's tricky to find the abstractions that work for all GC algorithms, whether they mark in place or move their objects, and whether they mark the heap precisely or if there are some conservative edges. But if this is the sort of thing that interests you, voilà the API for users and the API to and from collector implementations.
And, that's it! I am looking forward to climbing out of this GC hole, one blog at a time. There are just a few more features before I can seriously attack integrating this into Guile. Until the next time, happy hacking :)
texinfo @ Savannah: Texinfo 7.0.2 released
We have released version 7.0.2 of Texinfo, the GNU documentation format. This is a minor bug-fix release.
It's available via a mirror (xz is much smaller than gz, but gz is available too just in case):
http://ftpmirror.gnu.org/texinfo/texinfo-7.0.2.tar.xz
http://ftpmirror.gnu.org/texinfo/texinfo-7.0.2.tar.gz
Please send any comments to bug-texinfo@gnu.org.
Full announcement:
https://lists.gnu.org/archive/html/info-gnu/2023-01/msg00008.html
GNU Guix: Meet Guix at FOSDEM
GNU Guix will be present at FOSDEM next week, February 4th and 5th. This is the first time since the pandemic that FOSDEM takes place again “in the flesh” in Brussels, which is exciting to those of us lucky enough to get there! Everything will be live-streamed and recorded thanks to the amazing FOSDEM crew, so everyone can enjoy wherever they are; some of the talks this year will be “remote” too: pre-recorded videos followed by live Q&A sessions with the speaker.
Believe it or not, it’s the 9th year Guix is represented at FOSDEM, with more than 30 talks given in past editions! This year brings several talks that will let you learn more about different areas of the joyful Hydra Guix has become.
This all starts on Saturday, in particular with the amazing declarative and minimalistic computing track:
- “Bringing RISC-V to Guix's bootstrap” (remote), as a continuation of last year’s talk, will be Ekaitz Zarraga’s account of the successful port of the full-source bootstrap to RISC-V—no less!
- In “Using GNU Guix Containers with FHS (Filesystem Hierarchy Standard) Support” (remote) John Kehayias will present the recently-added guix shell --container --emulate-fhs.
- “Declaring just what is necessary” (remote) will show how to create system images that contain just what you need, by Efraim Flashner.
- In “GNU Guix and Open Science, a crush?”, Simon Tournier will illustrate ways in which Guix can be beneficial to “open science”.
- “How Replicant, a 100% free software Android distribution, uses (or doesn't use) Guix” will showcase an unusual and exciting use case for Guix, by one of Replicant’s core developers, Denis “GNUtoo” Carikli.
- “An Introduction to Guix Home” will be given on Sunday (remote) by David Wilson of System Crafters fame—a must if you want to understand this newfangled Guix Home thing!
There are many other exciting talks in this track, some of which closely related to Guix and Guile; check it out!
You can also discover Guix in other tracks:
- On Saturday, “Guix, toward practical transparent, verifiable and long-term reproducible research” will be an introduction to Guix (by Simon Tournier) for an audience of scientists interested in coming up with scientific practices that improve verifiability and transparency.
- On Saturday in the security track, “Where does that code come from?” (by Ludovic Courtès) will talk about Git checkout authentication in Guix and how it fits into the broader picture of “software supply chain” security.
- On Sunday, Efraim Flashner will talk about “Porting RISC-V to GNU Guix” in the RISC-V track.
- On Sunday, in the high-performance computing (HPC) track, Ludovic Courtès will give a lightning talk about CPU tuning in Guix entitled “Reproducibility and performance: why choose?”.
As was the case pre-pandemic, we are also organizing the Guix Days as a FOSDEM fringe event, a two-day Guix workshop where contributors and enthusiasts will meet. The workshop takes place on Thursday Feb. 2nd and Friday Feb. 3rd at the Institute of Cultural Affairs (ICAB) in Brussels.
Again this year there will be few talks; instead, the event will consist primarily of “unconference-style” sessions focused on specific hot topics about Guix, the Shepherd, continuous integration, and related tools and workflows.
Attendance to the workshop is free and open to everyone, though you are invited to register (there are few seats left!). Check out the workshop’s wiki page for registration and practical info. Hope to see you in Brussels!
About GNU Guix
GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64, and POWER9 machines.
In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.
parallel @ Savannah: GNU Parallel 20230122 ('Bolsonaristas') released [stable]
GNU Parallel 20230122 ('Bolsonaristas') has been released. It is available for download at: lbry://@GnuParallel:4
Quote of the month:
Colorful output
parallel, with --color flag
tasks more vibrant now
-- ChatGPT
New in this release:
- Bug fixes and man page updates.
News about GNU Parallel:
- The Best Ethical Hacking Tools of 2023 (and their basic usage) https://www.purevpn.com/blog/the-best-hacking-tools-of-2023/#11_GNU_Parallel
- GNU Parallel: criando atividades em paralelo com shell script https://www.vivaolinux.com.br/artigo/GNU-Parallel-criando-atividades-em-paralelo-com-shell-script/
GNU Parallel - For people who live life in the parallel lane.
If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.
GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
For example you can run this to convert all jpeg files into png and gif files and have a progress bar:
parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif
Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:
find . -name '*.jpg' |
parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
When using programs that use GNU Parallel to process data for publication please cite:
O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.
If you like GNU Parallel:
- Give a demo at your local user group/team/colleagues
- Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
- Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
- Request or write a review for your favourite blog or magazine
- Request or build a package for your favourite distribution (if it is not already there)
- Invite me for your next conference
If you use programs that use GNU Parallel for research:
- Please cite GNU Parallel in your publications (use --citation)
If GNU Parallel saves you money:
- (Have your company) donate to FSF https://my.fsf.org/donate/
GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.
The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
When using GNU SQL for a publication please cite:
O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.
Simon Josefsson: Understanding Trisquel
Ever wondered how Trisquel and Ubuntu differ, and what’s behind the curtain from a developer perspective? I have. Sharing what I’ve learnt will allow you to increase your knowledge of, and trust in, Trisquel too.
The scripts to convert an Ubuntu archive into a Trisquel archive are available in the ubuntu-purge repository. The easy-to-read purge-focal script lists the packages to remove from Ubuntu 20.04 Focal when it is imported into Trisquel 10.0 Nabia. The purge-jammy script provides the same for Ubuntu 22.04 Jammy and (the not yet released) Trisquel 11.0 Aramo. The list of packages is interesting, and by researching the reasons for each exclusion you can learn a lot about different attitudes towards free software and understand the desire to improve matters. I wish there were a wiki page that, for each removed package, summarized relevant links to earlier discussions. At the end of the script there is a bunch of packages that are removed for branding purposes, which are less interesting to review.
Trisquel adds a couple of Trisquel-specific packages. The source code for these packages is in the trisquel-packages repository, with sub-directories for each release: see 10.0/ for Nabia and 11.0/ for Aramo. These packages appear to be mostly for branding purposes.
Trisquel modifies a set of packages, and here it starts to get interesting. Probably the most important modification is to use GNU Linux-libre instead of Linux as the kernel. The scripts to modify packages are in the package-helpers repository. The relevant scripts are in the helpers/ sub-directory. There is a branch for each Trisquel release; see helpers/ for Nabia and helpers/ for Aramo. To see how Linux is replaced with Linux-libre you can read the make-linux script.
This covers the basics of approaching Trisquel from a developer’s perspective. As a user, I have identified some areas that need more work to improve trust in Trisquel:
- Auditing the Trisquel archive to confirm that the intended changes covered above are the only changes that are published.
- Rebuild all packages that were added or modified by Trisquel and publish diffoscope output comparing them to what’s in the Trisquel archive. The goal would be to have reproducible builds of all Trisquel-related packages.
- Publish an audit log of the Trisquel archive to allow auditing of what packages are published. This boils down to trust of the OpenPGP key used to sign the Trisquel archive.
- Trisquel archive mirror auditing to confirm that mirrors publish only what comes from the official archive, and that they do so in a timely manner.
I hope to publish more about my work into these areas. Hopefully this will inspire similar efforts in related distributions like PureOS and the upstream distributions Ubuntu and Debian.
Happy hacking!
FSF Blogs: Associate members are invited: Nominate new candidates to the FSF board
FSF News: FSF now accepting board nominations from associate members
FSF Events: Free Software Directory meeting on IRC: Friday, January 27, starting at 12:00 EST (17:00 UTC)
FSF Events: Free Software Directory meeting on IRC: Friday, January 20, starting at 12:00 EST (17:00 UTC)
diffutils @ Savannah: diffutils-3.9 released [stable]
This is to announce diffutils-3.9, a stable release.
There have been 51 commits by 3 people in the 76 weeks since 3.8.
See the NEWS below for a brief summary.
Thanks to everyone who has contributed!
The following people contributed changes to this release:
Bruno Haible (1)
Jim Meyering (14)
Paul Eggert (36)
Jim [on behalf of the diffutils maintainers]
==================================================================
Here is the GNU diffutils home page:
http://gnu.org/s/diffutils/
For a summary of changes and contributors, see:
http://git.sv.gnu.org/gitweb/?p=diffutils.git;a=shortlog;h=v3.9
or run this command from a git-cloned diffutils directory:
git shortlog v3.8..v3.9
To summarize the 931 gnulib-related changes, run these commands
from a git-cloned diffutils directory:
git checkout v3.9
git submodule summary v3.8
Here are the compressed sources and a GPG detached signature:
https://ftp.gnu.org/gnu/diffutils/diffutils-3.9.tar.xz
https://ftp.gnu.org/gnu/diffutils/diffutils-3.9.tar.xz.sig
Use a mirror for higher download bandwidth:
https://ftpmirror.gnu.org/diffutils/diffutils-3.9.tar.xz
https://ftpmirror.gnu.org/diffutils/diffutils-3.9.tar.xz.sig
Here are the SHA1 and SHA256 checksums:
35905d7c3d1ce116e6794be7fe894cd25b2ded74 diffutils-3.9.tar.xz
2A076QogGGjeg9eNrTQTrYgWDMU7zDbrnq98INvwI/E diffutils-3.9.tar.xz
The SHA256 checksum is base64 encoded, instead of the
hexadecimal encoding that most checksum tools default to.
Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact. First, be sure to download both the .sig file
and the corresponding tarball. Then, run a command like this:
gpg --verify diffutils-3.9.tar.xz.sig
The signature should match the fingerprint of the following key:
pub rsa4096/0x7FD9FCCB000BEEEE 2010-06-14 [SCEA]
Key fingerprint = 155D 3FC5 00C8 3448 6D1E EA67 7FD9 FCCB 000B EEEE
uid [ unknown] Jim Meyering <jim@meyering.net>
uid [ unknown] Jim Meyering <meyering@fb.com>
uid [ unknown] Jim Meyering <meyering@gnu.org>
If that command fails because you don't have the required public key,
or that public key has expired, try the following commands to retrieve
or refresh it, and then rerun the 'gpg --verify' command.
gpg --locate-external-key jim@meyering.net
gpg --recv-keys 7FD9FCCB000BEEEE
wget -q -O- 'https://savannah.gnu.org/project/release-gpgkeys.php?group=diffutils&download=1' | gpg --import -
As a last resort to find the key, you can try the official GNU
keyring:
wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
gpg --keyring gnu-keyring.gpg --verify diffutils-3.9.tar.xz.sig
This release was bootstrapped with the following tools:
Autoconf 2.72a.65-d081
Automake 1.16i
Gnulib v0.1-5689-g83adc2f722
==================================================================
NEWS
* Noteworthy changes in release 3.9 (2023-01-15) [stable]
** Bug fixes
diff -c and -u no longer output incorrect timezones in headers
on platforms like Solaris where struct tm lacks tm_gmtoff.
[bug#51228 introduced in 3.4]
GNU Health: Jérôme Lejeune Foundation adopts GNU Health
We start 2023 with exciting news for the medical and scientific community!
GNU Health has been adopted by the Jérôme Lejeune Foundation, a leading organization in the research and management of trisomy 21 (Down Syndrome) and other intellectual disabilities of genetic origin.
The Lejeune Foundation has its headquarters in France, with offices in Argentina, the United States and Spain.
In December 2022, the faculty of engineering of the University of Entre Rios, represented by the dean, Diego Campana, and the head of the school of Public Health, Fernando Sassetti, formalized the agreement with the president of the Lejeune Foundation in Argentina, Luz Morano.
The same month, I met in Madrid with the medical director and IT team of the Lejeune foundation Spain.
Luz Morano declared: “[GNU Health] goes beyond the Foundation, providing the health professionals the specific features to manage a patient with trisomy 21. We are putting a project in the hands of humanity.”
Morano also stated: “GNU Health will pave the road for the medical management, and let us focus on our two other missions: research and the defense of patient rights.”
The agreement is in the context of the GNU Health Alliance of Academic and Research Institutions that UNER has with GNU Solidario. In this sense, Fernando Sassetti explained: “It provides tools for an integrative approach to those people with certain pathologies that, due to their reduced number, are not managed in the best way. This will benefit the organizations and health professionals, that today lack the means to do so in the best way and timely manner. It benefits the patients, in their right to have an integral health record.”
Research and Open Science
The adoption of GNUHealth by the Jérôme Lejeune Foundation opens new exciting avenues for the scientific community. In addition to the clinical management and medical history, GNU Health will enable scientists to dive into the fields of genomics, epigenetics and exposomics, gathering and processing information from multiple contexts and subjects, thanks to the distributed nature of the GNU Health Federation.
The GNU Health HMIS provides many packages and features, some of them of special interest for this project. In addition to the specific customizations for the foundation, the packages already present in GNUHealth, such as obstetrics, pediatrics, genomics, socioeconomics or lifestyle, will provide a holistic approach to the person with trisomy 21 and other related conditions.
All of this will be done using exclusively Free/Libre software and open science.
People before Patients
Trisomy 21 poses challenges for the individual, their family, health professionals and the society. The scientific community needs to push the research to shed light on the etiology, physiopathology and associated clinical manifestations, such as heart defects, blood disorders or Alzheimer’s.
Most importantly, as part of the scientific community, we must put a stop to the discrimination and stigmatization. We must tear down the barriers and walls built on our societies that prevent the inclusion of individuals with trisomy 21.
As part of this effort, GNU Health provides the WHO International Classification of Functioning, Disability and Health (ICF). In other words, what matters is not just the health condition or disorder we may have, but how environmental factors and barriers influence our normal functioning and integration as individuals in society. Many times, those physical, artificial barriers present in our daily lives are far more pernicious than the condition itself.
The strong focus of GNU Health on Social Medicine, and the way we perceive medicine as a social science, will help improve the lives of people living with trisomy 21, and contribute to the much needed healing process in our societies. We need to work on the molecular basis of health conditions, but little can be done without empathetic, inclusive and supportive societies in which people can live and enjoy life with dignity, no matter their health or socioeconomic status.
Projects like this represent the spirit of GNU Health and make me immensely proud to be part of this community.
Happy and healthy hacking!
Luis Falcon, MD
President, GNU Solidario
Links:
- Convenio con la Fundación Jérôme Lejeune para implementación del sistema de software libre GNU Health – UNER: http://ingenieria.uner.edu.ar/boletin/index.php/noticias/956-convenio-con-la-fundacion-jerome-lejeune-para-implementacion-de-gnu-health
- Fundación Lejeune Argentina : https://fundacionlejeune.org/
- (French/English/Spanish) Fondation Jérôme Lejeune: https://www.fondationlejeune.org
- GNU Health : https://www.gnuhealth.org
- GNU Solidario: https://www.gnusolidario.org