FLOSS Project Planets

Michael Prokop: A Ceph war story

Planet Debian - Fri, 2021-04-09 08:39

It all started with the big bang! We nearly lost 33 of 36 disks on a Proxmox/Ceph Cluster; this is the story of how we recovered them.

At the end of 2020, we eventually had a long outstanding maintenance window for taking care of system upgrades at a customer. During this maintenance window, which involved reboots of server systems, the involved Ceph cluster unexpectedly went into a critical state. What was planned to be a few hours of checklist work in the early evening turned out to be an emergency case; let’s call it a nightmare (not only because it included a big part of the night). Since we have learned a few things from our post mortem and RCA, it’s worth sharing those with others. But first things first, let’s step back and clarify what we had to deal with.

The system and its upgrade

One part of the upgrade included 3 Debian servers (we’re calling them server1, server2 and server3 here), running on Proxmox v5 + Debian/stretch with 12 Ceph OSDs each (65.45TB in total), a so-called Proxmox Hyper-Converged Ceph Cluster.

First, we went for upgrading the Proxmox v5/stretch system to Proxmox v6/buster, before updating Ceph Luminous v12.2.13 to the latest v14.2 release, supported by Proxmox v6/buster. The Proxmox upgrade included updating corosync from v2 to v3. As part of this upgrade, we had to apply some configuration changes, like adjust ring0 + ring1 address settings and add a mon_host configuration to the Ceph configuration.

During the first two servers’ reboots, we noticed configuration glitches. After fixing those, we went for a reboot of the third server as well. Then we noticed that several Ceph OSDs were unexpectedly down. The NTP service wasn’t working as expected after the upgrade. The underlying issue is a race condition of ntp with systemd-timesyncd (see #889290). As a result, we had clock skew problems with Ceph, indicating that the Ceph monitors’ clocks aren’t running in sync (which is essential for proper Ceph operation). We initially assumed that our Ceph OSD failure derived from this clock skew problem, so we took care of it. After yet another round of reboots, to ensure the systems are running all with identical and sane configurations and services, we noticed lots of failing OSDs. This time all but three OSDs (19, 21 and 22) were down:

% sudo ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 65.44138 root default -2 21.81310 host server1 0 hdd 1.08989 osd.0 down 1.00000 1.00000 1 hdd 1.08989 osd.1 down 1.00000 1.00000 2 hdd 1.63539 osd.2 down 1.00000 1.00000 3 hdd 1.63539 osd.3 down 1.00000 1.00000 4 hdd 1.63539 osd.4 down 1.00000 1.00000 5 hdd 1.63539 osd.5 down 1.00000 1.00000 18 hdd 2.18279 osd.18 down 1.00000 1.00000 20 hdd 2.18179 osd.20 down 1.00000 1.00000 28 hdd 2.18179 osd.28 down 1.00000 1.00000 29 hdd 2.18179 osd.29 down 1.00000 1.00000 30 hdd 2.18179 osd.30 down 1.00000 1.00000 31 hdd 2.18179 osd.31 down 1.00000 1.00000 -4 21.81409 host server2 6 hdd 1.08989 osd.6 down 1.00000 1.00000 7 hdd 1.08989 osd.7 down 1.00000 1.00000 8 hdd 1.63539 osd.8 down 1.00000 1.00000 9 hdd 1.63539 osd.9 down 1.00000 1.00000 10 hdd 1.63539 osd.10 down 1.00000 1.00000 11 hdd 1.63539 osd.11 down 1.00000 1.00000 19 hdd 2.18179 osd.19 up 1.00000 1.00000 21 hdd 2.18279 osd.21 up 1.00000 1.00000 22 hdd 2.18279 osd.22 up 1.00000 1.00000 32 hdd 2.18179 osd.32 down 1.00000 1.00000 33 hdd 2.18179 osd.33 down 1.00000 1.00000 34 hdd 2.18179 osd.34 down 1.00000 1.00000 -3 21.81419 host server3 12 hdd 1.08989 osd.12 down 1.00000 1.00000 13 hdd 1.08989 osd.13 down 1.00000 1.00000 14 hdd 1.63539 osd.14 down 1.00000 1.00000 15 hdd 1.63539 osd.15 down 1.00000 1.00000 16 hdd 1.63539 osd.16 down 1.00000 1.00000 17 hdd 1.63539 osd.17 down 1.00000 1.00000 23 hdd 2.18190 osd.23 down 1.00000 1.00000 24 hdd 2.18279 osd.24 down 1.00000 1.00000 25 hdd 2.18279 osd.25 down 1.00000 1.00000 35 hdd 2.18179 osd.35 down 1.00000 1.00000 36 hdd 2.18179 osd.36 down 1.00000 1.00000 37 hdd 2.18179 osd.37 down 1.00000 1.00000

Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back?

We stumbled upon this beauty in our logs:

kernel: [ 73.697957] XFS (sdl1): SB stripe unit sanity check failed kernel: [ 73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff kernel: [ 73.698799] XFS (sdl1): Unmount and run xfs_repair kernel: [ 73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer: kernel: [ 73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00 XFSB..........b. kernel: [ 73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ kernel: [ 73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8 bD+.."@..=..e... kernel: [ 73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00 ......@......... kernel: [ 73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02 ................ ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning. ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5 ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32 kernel: [ 73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00 ................ kernel: [ 73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00 ...H............ kernel: [ 73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19 ................ kernel: [ 73.703373] XFS (sdl1): SB validate failed with error -117.

The same issue was present for the other failing OSDs. We hoped, that the data itself was still there, and only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2 with the OSDs on filestore (nowadays being a legacy approach to storing objects in Ceph). However, we migrated the disks to bluestore since then (with ceph-disk and not yet via ceph-volume what’s being used nowadays). Using ceph-disk introduces these 100MB XFS partitions containing basic metadata for the OSD.

Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share how working XFS partitions looked like for them. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:

synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=128 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 cache_node_purge: refcount was 1, not zero (node=0x1d3c400) SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000 SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000 SB stripe unit sanity check failed Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000 releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!found dirty buffer (bulk) on free list!bad magic number bad magic number Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 releasing dirty buffer (bulk) to free list!mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.

Ouch. This very much looked related to the actual issue we’re seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using ‘-d sunit=512 -d swidth=512‘ at least worked then, so we decided to force its usage in the creation of our OSD XFS partition. This brought us a working XFS partition. Please note, sunit must not be larger than swidth (more on that later!).

Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami). To identify the UUID, we can read the data from ‘ceph --format json osd dump‘, like this for all our OSDs (Zsh syntax ftw!):

synpromika@server1 ~ % for f in {0..37} ; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$f) | .uuid")" osd-0: 4568c300-ad83-4288-963e-badcd99bf54f osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c [...]

Identifying the corresponding raw device for each OSD UUID is possible via:

synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f" synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"${UUID}" /dev/sdc1

The OSD’s key ID can be retrieved via:

synpromika@server1 ~ % OSD_ID=0 synpromika@server1 ~ % sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key' AQCKFpZdm0We[...]

Now we also need to identify the underlying block device:

synpromika@server1 ~ % OSD_ID=0 synpromika@server1 ~ % sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path' /dev/sdc2

With all of this, we reconstructed the keyring, fsid, whoami, block + block_uuid files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD – hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure.

We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with one single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server.

Thanks to this, we managed to get the Ceph cluster up and running again. We didn’t want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA!

Root Cause Analysis

So all but three OSDs on server2 failed, and the problem seems to be related to XFS. Therefore, our starting point for the RCA was, to identify what was different on server2, as compared to server1 + server3. My initial assumption was that this was related to some firmware issues with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:

synpromika@server1 ~ % sudo storcli64 /c0 show all | grep '^Firmware' Firmware Package Build = 24.16.0-0092 Firmware Version = 4.660.00-8156 synpromika@server2 ~ % sudo storcli64 /c0 show all | grep '^Firmware' Firmware Package Build = 24.21.0-0112 Firmware Version = 4.680.00-8489 synpromika@server3 ~ % sudo storcli64 /c0 show all | grep '^Firmware' Firmware Package Build = 24.16.0-0092 Firmware Version = 4.660.00-8156

This looked very promising, as server2 indeed runs with a different firmware version on the controller. But how so? Well, the motherboard of server2 got replaced by a Lenovo/IBM technician in January 2020, as we had a failing memory slot during a memory upgrade. As part of this procedure, the Lenovo/IBM technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore->bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller. All the other OSDs were created with an older controller firmware. But what difference does this make?

Now let’s check firmware changelogs. For the 24.21.0-0097 release we found this:

- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20(SCGCQ02027889) - xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0(SCGCQ02056038)

Our XFS problem certainly was related to the controller’s firmware. We also recalled that our monitoring system reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:

WARN server2.example.org Mount options of /var/lib/ceph/osd/ceph-21 WARN - Missing: sunit=1024, Exceeding: sunit=512

We compared the new OSD 21 with an existing one (OSD 25 on server3):

synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount | grep sunit Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount | grep sunit Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota

Thanks to our documentation, we could compare execution logs of their creation:

% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log -synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25 +synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21 [...] -command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1 -meta-data=/dev/sdj1 isize=2048 agcount=4, agsize=6272 blks [...] +command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1 +meta-data=/dev/sdi1 isize=2048 agcount=4, agsize=6336 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0 -data = bsize=4096 blocks=25088, imaxpct=25 - = sunit=128 swidth=64 blks +data = bsize=4096 blocks=25344, imaxpct=25 + = sunit=64 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 [...]

So back then, we even tried to track this down but couldn’t make sense of it yet. But now this sounds very much like it is related to the problem we saw with this Ceph/XFS failure. We follow Occam’s razor, assuming the simplest explanation is usually the right one, so let’s check the disk properties and see what differs:

synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk 4685545472 2398999281664 512 4096 524288 262144 synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk 4685545472 2398999281664 512 4096 262144 262144

See the difference between server1 and server2 for identical disks? The getiomin option now reports something different for them:

synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk 524288 synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size 524288 synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk 262144 synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size 262144

It doesn’t make sense that the minimum I/O size (iomin, AKA BLKIOMIN) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT). This leads us to Bug 202127 – cannot mount or create xfs on a 597T device, which matches our findings here. But why did this XFS partition work in the past and fails now with the newer kernel version?

The XFS behaviour change

Now given that we have backups of all the XFS partition, we wanted to track down, a) when this XFS behaviour was introduced, and b) whether, and if so how it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you would have no working Ceph OSD or backups left).

Let’s look at such a failing XFS partition with the Grml live system:

root@grml ~ # grml-version grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24] root@grml ~ # uname -a Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux root@grml ~ # grml-hostname grml-2020-06 Setting hostname to grml-2020-06: done root@grml ~ # exec zsh root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description +++-==============-============-============-========================================= ii util-linux 2.35.2-4 amd64 miscellaneous system utilities ii xfsprogs 5.6.0-1+b2 amd64 Utilities for managing the XFS filesystem

There it’s failing, no matter which mount option we try:

root@grml-2020-06 ~ # mount ./sdd1.dd /mnt mount: /mnt: mount(2) system call failed: Structure needs cleaning. root@grml-2020-06 ~ # dmesg | tail -30 [...] [ 64.788640] XFS (loop1): SB stripe unit sanity check failed [ 64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff [ 64.788671] XFS (loop1): Unmount and run xfs_repair [ 64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer: [ 64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00 XFSB..........b. [ 64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36 2..5S.D..c0..+h6 [ 64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00 ......@......... [ 64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02 ................ [ 64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00 ................ [ 64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00 ...H............ [ 64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19 ................ [ 64.788679] XFS (loop1): SB validate failed with error -117. root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/ mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error. 32 root@grml-2020-06 ~ # dmesg | tail -1 [ 66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024) root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/ mount: /mnt: mount(2) system call failed: Structure needs cleaning. 32 root@grml-2020-06 ~ # dmesg | tail -14 [ 66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024) [ 80.751277] XFS (loop1): SB stripe unit sanity check failed [ 80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff [ 80.751324] XFS (loop1): Unmount and run xfs_repair [ 80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer: [ 80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00 XFSB..........b. [ 80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36 2..5S.D..c0..+h6 [ 80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00 ......@......... [ 80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02 ................ [ 80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00 ................ [ 80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00 ...H............ [ 80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19 ................ [ 80.751338] XFS (loop1): SB validate failed with error -117.

Also xfs_repair doesn’t help either:

root@grml-2020-06 ~ # xfs_info ./sdd1.dd meta-data=./sdd1.dd isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=0, rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=128 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 root@grml-2020-06 ~ # xfs_repair ./sdd1.dd Phase 1 - find and verify superblock... bad primary superblock - bad stripe width in superblock !!! attempting to find secondary superblock... ..............................................................................................Sorry, could not find valid secondary superblock Exiting now.

With the “SB stripe unit sanity check failed” message, we could easily track this down to the following commit fa4ca9c:

% git show fa4ca9c5574605d1e48b7e617705230a0640b6da | cat commit fa4ca9c5574605d1e48b7e617705230a0640b6da Author: Dave Chinner <dchinner@redhat.com> Date: Tue Jun 5 10:06:16 2018 -0700 xfs: catch bad stripe alignment configurations When stripe alignments are invalid, data alignment algorithms in the allocator may not work correctly. Ensure we catch superblocks with invalid stripe alignment setups at mount time. These data alignment mismatches are now detected at mount time like this: XFS (loop0): SB stripe unit sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00 XFSB............ 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2 .27...F=...3.... 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0 ................ 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2 ................ 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00 ................ 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00 ...`.4.......... 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19 ................ XFS (loop0): SB validate failed with error -117. And the mount fails. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c index b5dca3c8c84d..c06b6fc92966 100644 --- fs/xfs/libxfs/xfs_sb.c +++ fs/xfs/libxfs/xfs_sb.c @@ -278,6 +278,22 @@ xfs_mount_validate_sb( return -EFSCORRUPTED; } + if (sbp->sb_unit) { + if (!xfs_sb_version_hasdalign(sbp) || + sbp->sb_unit > sbp->sb_width || + (sbp->sb_width % sbp->sb_unit) != 0) { + xfs_notice(mp, "SB stripe unit sanity check failed"); + return -EFSCORRUPTED; + } + } else if (xfs_sb_version_hasdalign(sbp)) { + xfs_notice(mp, "SB stripe alignment sanity check failed"); + return -EFSCORRUPTED; + } else if (sbp->sb_width) { + xfs_notice(mp, "SB stripe width sanity check failed"); + return -EFSCORRUPTED; + } + + if (xfs_sb_version_hascrc(&mp->m_sb) && sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) { xfs_notice(mp, "v5 SB sanity check failed");

This change is included in kernel versions 4.18-rc1 and newer:

% git describe --contains fa4ca9c5574605d1e48 v4.18-rc1~37^2~14

Now let’s try with an older kernel version (4.9.0), using old Grml 2017.05 release:

root@grml ~ # grml-version grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31] root@grml ~ # uname -a Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux root@grml ~ # lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 9.0 (stretch) Release: 9.0 Codename: stretch root@grml ~ # grml-hostname grml-2017-05 Setting hostname to grml-2017-05: done root@grml ~ # exec zsh root@grml-2017-05 ~ # root@grml-2017-05 ~ # xfs_info ./sdd1.dd xfs_info: ./sdd1.dd is not a mounted XFS filesystem 1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd Phase 1 - find and verify superblock... bad primary superblock - bad stripe width in superblock !!! attempting to find secondary superblock... ..............................................................................................Sorry, could not find valid secondary superblock Exiting now. 1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt root@grml-2017-05 ~ # mount -t xfs /root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota) root@grml-2017-05 ~ # ls /mnt activate.monmap active block block_uuid bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done ready require_osd_release systemd type whoami root@grml-2017-05 ~ # xfs_info /mnt meta-data=/dev/loop1 isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1 spinodes=0 rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=128 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0

Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:

root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/ root@grml-2017-05 ~ # umount /mnt/

And indeed, mounting this rewritten filesystem then also works with newer kernels:

root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/ root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten meta-data=/dev/loop1 isize=2048 agcount=4, agsize=6272 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=0, rmapbt=0 = reflink=0 data = bsize=4096 blocks=25088, imaxpct=25 = sunit=64 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=1608, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 root@grml-2020-06 ~ # mount -t xfs /root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)

FTR: The ‘sunit=512,swidth=512‘ from the xfs mount option is identical to xfs_info’s output ‘sunit=64,swidth=64‘ (because mount.xfs’s sunit value is given in 512-byte block units, see man 5 xfs, and the xfs_info output reported here is in blocks with a block size (bsize) of 4096, so ‘sunit = 512*512 := 64*4096‘).

mkfs uses minimum and optimal sizes for stripe unit and stripe width; you can check this e.g. via (note that server2 with fixed firmware version reports proper values, whereas server3 with broken controller firmware reports non-sense):

synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done [...] /sys/block/sdc/queue/: 262144 262144 /sys/block/sdd/queue/: 262144 262144 /sys/block/sde/queue/: 262144 262144 /sys/block/sdf/queue/: 262144 262144 /sys/block/sdg/queue/: 262144 262144 /sys/block/sdh/queue/: 262144 262144 /sys/block/sdi/queue/: 262144 262144 /sys/block/sdj/queue/: 262144 262144 /sys/block/sdk/queue/: 262144 262144 /sys/block/sdl/queue/: 262144 262144 /sys/block/sdm/queue/: 262144 262144 /sys/block/sdn/queue/: 262144 262144 [...] synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done [...] /sys/block/sdc/queue/: 524288 262144 /sys/block/sdd/queue/: 524288 262144 /sys/block/sde/queue/: 524288 262144 /sys/block/sdf/queue/: 524288 262144 /sys/block/sdg/queue/: 524288 262144 /sys/block/sdh/queue/: 524288 262144 /sys/block/sdi/queue/: 524288 262144 /sys/block/sdj/queue/: 524288 262144 /sys/block/sdk/queue/: 524288 262144 /sys/block/sdl/queue/: 524288 262144 /sys/block/sdm/queue/: 524288 262144 /sys/block/sdn/queue/: 524288 262144 [...]

This is the underlying reason why the initially created XFS partitions were created with incorrect sunit/swidth settings. The broken firmware of server1 and server3 was the cause of the incorrect settings – they were ignored by old(er) xfs/kernel versions, but treated as an error by new ones.

Make sure to also read the XFS FAQ regarding “How to calculate the correct sunit,swidth values for optimal performance”. We also stumbled upon two interesting reads in RedHat’s knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947.

Am I affected? How to work around it?

To check whether your XFS mount points are affected by this issue, the following command line should be useful:

awk '$3 == "xfs"{print $2}' /proc/self/mounts | while read mount ; do echo -n "$mount " ; xfs_info $mount | awk '$0 ~ "swidth"{gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' | awk '{ if ($1 > $2) print "impacted"; else print "OK"}' ; done

If you run into the above situation, the only known solution to get your original XFS partition working again, is to boot into an older kernel version again (4.17 or older), mount the XFS partition with correct sunit/swidth settings and then boot back into your new system (kernel version wise).

Lessons learned
  • document everything and ensure to have all relevant information available (including actual times of changes, used kernel/package/firmware/… versions. The thorough documentation was our most significant asset in this case, because we had all the data and information we needed during the emergency handling as well as for the post mortem/RCA)
  • if something changes unexpectedly, dig deeper
  • know who to ask, a network of experts pays off
  • including timestamps in your shell makes reconstruction easier (the more people and documentation involved, the harder it gets to wade through it)
  • keep an eye on changelogs/release notes
  • apply regular updates and don’t forget invisible layers (e.g. BIOS, controller/disk firmware, IPMI/OOB (ILO/RAC/IMM/…) firmware)
  • apply regular reboots, to avoid a possible delta becoming bigger (which makes debugging harder)

Thanks: Darshaka Pathirana, Chris Hofstaedtler and Michael Hanscho.

Looking for help with your IT infrastructure? Let us know!

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #55: Getting Started With Refactoring Your Python Code

Planet Python - Fri, 2021-04-09 08:00

Do you think it's time to refactor your Python code? What should you think about before starting this task? This week on the show, we have Brendan Maginnis and Nick Thapen from Sourcery. Sourcery is an automated refactoring tool that integrates into your IDE and suggests improvements to your code.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Python for Beginners: Convert a list containing float numbers to string in Python

Planet Python - Fri, 2021-04-09 07:58

There may be situations where we want to convert a list containing float numbers to string while working in python. In this article, we will look at different ways to convert the list containing float numbers to a string which contains the elements of the list as space separated sub strings.

Important functions for the task to convert a list containing float numbers to string

To convert a list of floating point numbers to string, we will need several string methods which we will discuss first and then perform the operations.

str() function

The str() function converts an integer literal,a float literal or any other given  input to a string literal and returns the string literal of the input after conversion. This can be done as follows.

float_num=10.0 str_num=str(float_num)

join() method

join() method is invoked on a separator and an iterable object of strings is passed to the method as input. It joins each string in the iterable object with the separator and returns a new string. 

The syntax for join() method is as separator.join(iterable) where separator can be any character and iterable can be a list or tuple etc. This can be understood as follows.

str_list=["I","am","Python","String"] print("List of strings is:") print(str_list) separator=" " str_output=separator.join(str_list) print("String output is:") print(str_output)

Output:

List of strings is: ['I', 'am', 'Python', 'String'] String output is: I am Python String

map() function

The map() function takes as input a function(functions are first class objects and can be passed as a parameter in python) and an iterable as argument and executes the function on each element of the iterable object and returns  the output map object which can be converted into any iterable.

The syntax for map() function is map(function,iterable). This can be understood as follows.

float_list=[10.0,11.2,11.7,12.1] print("List of floating point numbers is:") print(float_list) str_list=list(map(str,float_list)) print("List of String numbers is:") print(str_list)

Output:

List of floating point numbers is: [10.0, 11.2, 11.7, 12.1] List of String numbers is: ['10.0', '11.2', '11.7', '12.1']

Now we will see how to convert a list containing floating point numbers to string using the above functions.

Convert a list containing float numbers to string using for loop

We can convert the list of float numbers to string by declaring an empty string and then performing string concatenation to add the elements of the list to the string as follows.

float_list=[10.0,11.2,11.7,12.1] print("List of floating point numbers is:") print(float_list) float_string="" for num in float_list: float_string=float_string+str(num)+" " print("Output String is:") print(float_string.rstrip())

Output:

List of floating point numbers is: [10.0, 11.2, 11.7, 12.1] Output String is: 10.0 11.2 11.7 12.1

 In the above program, the output string contains an extra space at the end which has to be removed using rstrip() method.To avoid this additional operation, We can use the join() method instead of performing concatenation operation by adding the strings to create the output string as follows.

float_list=[10.0,11.2,11.7,12.1] print("List of floating point numbers is:") print(float_list) float_string="" for num in float_list: float_string=" ".join([float_string,str(num)]) print("Output String is:") print(float_string.lstrip())

Output

List of floating point numbers is: [10.0, 11.2, 11.7, 12.1] Output String is: 10.0 11.2 11.7 12.1

In the above method, an extra space is added at left of the output string which has to be removed using lstrip() method. To avoid this,Instead of applying str() function on every element of the list, we can use map() function  to convert the list of float numbers to a list of strings and then perform the string concatenation using join() method to get the output string as follows.

float_list=[10.0,11.2,11.7,12.1] print("List of floating point numbers is:") print(float_list) str_list=list(map(str,float_list)) print("List of String numbers is:") print(str_list) float_string="" for num in float_list: float_string=" ".join(str_list) print("Output String is:") print(float_string)

Output:

List of floating point numbers is: [10.0, 11.2, 11.7, 12.1] List of String numbers is: ['10.0', '11.2', '11.7', '12.1'] Output String is: 10.0 11.2 11.7 12.1 Convert a list containing float numbers to string using list comprehension

Instead of for loop, we can use list comprehension to perform the conversion of a list of floating point numbers to string as follows. We can use the join() method with list comprehension as follows.

float_list=[10.0,11.2,11.7,12.1] print("List of floating point numbers is:") print(float_list) str_list=[str(i) for i in float_list] print("List of floating string numbers is:") print(str_list) float_string=" ".join(str_list) print("Output String is:") print(float_string)

Output:

List of floating point numbers is: [10.0, 11.2, 11.7, 12.1] List of floating string numbers is: ['10.0', '11.2', '11.7', '12.1'] Output String is: 10.0 11.2 11.7 12.1 Conclusion

In this article, we have seen how to convert a list containing floating point numbers into string using different string methods like str(), join() and for loop or list comprehension in python. Stay tuned for more informative articles.

The post Convert a list containing float numbers to string in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

Advanced Comic Book Linking

Planet KDE - Fri, 2021-04-09 06:44

In my previous post about Peruse's support for ACBF Textareas, i talked about how the formatting system there allowed for all manner of niftiness with languages, and rotating text, and making it coloured, and styled, and all of that visual fanciness. In this one, i'm going to talk a bit more in depth about a quite powerful feature the Textareas have that i kind of glossed over in that one: Hyperlinks.

Yes, hyperlinks. Links, like the one you just clicked on above to work out what that last post actually said about textareas (thanks for reading! :) ). Except that these links are inside any of the things in an ACBF document which can hold paragraphs of text, being primarily Textareas, References, and Annotations. The reason this is a separate post (apart from the fact the other one had a different focus) is that ACBF recently gained the ability to add a resource target (that is to say, a href property) to the Jump element.

Linking to some URL, and protecting people

For a simple starter, the most usual suspects for hyperlinks work for this: Websites (http(s)://) and email links (mailto:), primarily. If you click on one of those in Peruse, don't worry, we're not going to send you anywhere without making sure you want to actually go there. On the web, you usually would be expected to trust whatever you're clicking on, but inside a book, you'd reasonably expect to be constrained, so before sending you anywhere that isn't inside the book, you get a cute little popup that shows you the address of the link you clicked on, and asks if you really do want to follow it.

Additionally, and very possibly more interesting when creating a book: Internal links work as well. If an object in the book has been given an ID (which is now also more widely possible), you can link to it by prepending it with a # and using that as your link target (which is similar to how you might link to a named anchor in a simple html document). In its simplest form, this allows you to jump from page to page (which also is what the Jump element was originally designed to do), but with this system, much more power is added, by being able to link to not only pages, but also individual frames, and the reference and binary objects.

Linking to a page or a frame will make Peruse navigate to that location in the book, as you would expect (the way Jumps previously would link directly to a page). The idea there is to allow you to create branching narratives and other interactions of that type, like those you would find in a Choose Your Own Adventure book. Of course, ACBF is not a scripting system, and Peruse doesn't have one either; this is "just" markup, so it's not like you can create a full blown game here, in the modern sense, but even then, engaging narratives do not require score keeping for their interest :D

Say you've got a lovely CC-BY banner, and want to show the license? Pop the text in a Reference, and Jump away!

Linking to a reference or a binary object will make Peruse display that object in a popup or a page. An author can use this to create a glossary of terms, for example, or character reference art, a location map, or a world encyclopedia built right into the book itself. As someone who does writing and world building in their spare time, this in itself is something that i'm hoping to expand on greatly (how does having a pile of world-building focused templates for references sound to you? Sounds kind of neat to me, and i'd really like to see that happen :) ).

Something else Peruse does is keep track of all these internal links. Or, rather, the libacbf code does - we built that library as a non-gui Qt style library to be consumable without being bound to Peruse, but more on that at some later date (Tier 1 ACBF KDE Framework anybody? ;) ). What that means is that if you are at a location in a book which is linked to by something else, Peruse knows about it.

Right now, that information isn't used, but we might imagine things like building two-way navigation in the book (say you click a link, and you want to go back to where you came from; well with this we can make sure that happens), or even building entire navigation maps for a book (technically this tracking system for the internal links constitutes a directional graph, where each node is a location in the book, and the edges are the links). How such a thing would be presented visually, i'm frankly unsure, but as a basic idea it feels like it could be fun!

If this sounds like something you'd like to work with, get in touch - either here, or anywhere else you find me - i'm leinir basically everywhere, and we've got our own little corner over on Matrix as well. In short, get in touch, we'd love to hear from you! :)

The word of the day is: Emergent. Because that's the kind of behaviour these bits of functionality promises

Categories: FLOSS Project Planets

Web Review, Week 2021-14

Planet KDE - Fri, 2021-04-09 06:06

Let’s go for my web review for the week 2021-14.

What are Insecure Direct Object References (IDOR)? | Hacker Noon

Tags: tech, security

A way to common mistake which can blow the security of your service

https://hackernoon.com/what-are-insecure-direct-object-references-idor-hz1j33e0


This blog is now hosted on a GPS/LTE modem

Tags: tech, pinephone, hacking

Now, this is a really cool hack on the PinePhone!

https://nns.ee/blog/2021/04/01/modem-blog.html


Adactio: Journal—The principle of most availability

Tags: tech, architecture, web, frontend, sms

I recognize myself quite a bit into that: boring but pervasive technologies is generally good for users.

https://adactio.com/journal/17987


What problems do people solve with strace?

Tags: tech, system, strace, backend

Excellent reminder of how awesome strace is. This is one tool every developer needs to know, it often saves the day when everything else fails.

https://jvns.ca/blog/2021/04/03/what-problems-do-people-solve-with-strace/


10 Things I Hate About PostgreSQL | by Rick Branson | Medium

Tags: tech, sql, postgresql

Because when things are presented a bit too rosy I get suspicious… it’s nice to have such counterpoints to realize PostgreSQL is not perfect which means there are scenarios where you might not want it.

https://rbranson.medium.com/10-things-i-hate-about-postgresql-20dbab8c2791


Down on Scrum

Tags: tech, scrum, agile

Excellent take from Ron Jeffries about the current state of the Scrum industry. So many certified Scrum Masters that it’s not fun anymore and likely useless… if not outright harmful. That’s in part why I always refused to be certified. This kind of schemes tend to lead to such abuses.

https://ronjeffries.com/articles/020-01ff/down-on-scrum/


What Movie Studios Refuse to Understand About Streaming | Electronic Frontier Foundation

Tags: streaming, movie, licensing, vendor-lockin

It exactly explains my beef toward the streaming platforms. It just creates the old movie studios cartel and push them to hoard intellectual property to have exclusive content. Things would probably have been less dire with a global licensing system.

https://www.eff.org/deeplinks/2021/04/what-movie-studios-refuse-understand-about-streaming


Screw it, I’ll host it myself • Marko Živanović

Tags: tech, backup, self-hosting

If you’re a bit technical this is completely doable. It’s somewhat similar to what I’m doing at home. Gave me a couple of ideas on what to improve too.

https://www.markozivanovic.com/screw-it-ill-host-it-myself/


All C++20 core language features with examples | Oleksandr Koval’s blog

Tags: tech, c++

A comprehensive catalog of C++20 language features.

https://oleksandrkvl.github.io/2021/04/02/cpp-20-overview.html


Embrace the Grind - Jacob Kaplan-Moss

Tags: tech, craftsmanship

Very good advice, there’s a lot in programming which is really just mundane and boring. That doesn’t make it easy but you might end up doing what everyone else tried to avoid.

https://jacobian.org/2021/apr/7/embrace-the-grind/


CPU algorithm trains deep neural nets up to 15 times faster than top GPU trainers

Tags: tech, ai, machine-learning

Interesting, could be a another breakthrough in training performance.

https://techxplore.com/news/2021-04-rice-intel-optimize-ai-commodity.html


Idempotence Now Prevents Pain Later by Eric Lathrop

Tags: tech, programming

Good reminder of why idempotence is a very important property.

https://ericlathrop.com/2021/04/idempotence-now-prevents-pain-later/


How to make an awesome Python package in 2021 | Anton Zhiyanov

Tags: tech, python

Interesting advices on how to package your python tools.

https://antonz.org/python-packaging/


Bye for now!

Categories: FLOSS Project Planets

Jacob Rockowitz: To Drupal or not to Drupal…The Webform module’s pickle of a sustainability problem

Planet Drupal - Fri, 2021-04-09 04:59

My pickle of a problem

Previously, I have discussed my current work situation, which revolves around the fact that my organization is moving away from Drupal. This situation has led me to ask the question to Drupal or not to Drupal. To date, I’m committed to helping them migrate from Drupal to Sitecore.

I’ve also decided to remain committed to the sustainability of the Webform module. The “sustainability” of the Webform module has different meanings to different people and organizations within the Drupal community. Individuals want to know that there are resources available to provide guidance, review patches, and commit code to a project. An organization using Drupal and the Webform module wants to know that the code is stable, maintained, and secure. For me, I’d like to see compensation for my time in return for my continued commitment to the sustainability of the Webform module.

Being compensated to contribute and maintain code within an open source project can take on many forms. An organization could sponsor my work on the Webform module. At the same time, these opportunities are rare within our community. Frankly, assuming that a single organization could take on the responsibility of something like the Webform module might not be financially feasible.

Stepping back from my problem, there might be a more general solution for improving the Webform module’s sustainability. Still, sustainability is a pickle of a problem for an open source project like the Webform module.

The Webform module’s pickle of a problem

The pickle of a problem around the Webform module, Drupal, and Open Source is that people assume that everything is free. Open source code is free to use and distribute. The ecosystem built around the Webform module reinforces the notion that everything is free because we rarely put a “price tag” next to our...Read More

Categories: FLOSS Project Planets

CubicWeb: CubicWeb Monthly news february/march 2021

Planet Python - Fri, 2021-04-09 03:38

It has been quite a time since the last public news ; we will try in the letter to summarize what we did during february and march 2021. During this period, we tried to fix issues, migrate a lot of our code base to latest version of CubicWeb. We also did some archeology ; we have been looking at all the old heads in CubicWeb's repository, and we have tried to rebase them on the latest public head. A lot of merge requests have been created and are still under review.

CubicWeb 3.30 is out!

We released Cubicweb 3.30 on march 16th. There are no “big” changes in this release. A lot of issues have been fixed, the documentation have been improved − a new tuto is coming ! − and a few new features have been implemented.

Here is an extract of the changelog:

  • it is now possible to use smtp authentification to send an email (#221)
  • required variables can be read from the environment (#85)
  • one can specify scripts attributes when using the add_js function (#210)
  • GROUP_CONCAT was not returning the expected results when NULL values were encountered (#109)
  • it is now possible not to drop table indexes when using the MassiveStore (#219)

One can find the full changelog on Cubicweb release page here.

Adding python types to RQL

In the past months, python types have progressively been included in RQL. This is a step forward to python types in CubicWeb itself.

A lot of work has been done, and there is still a lot of work to do. Adding types to RQL help us to simplify the refactorings. As RQL's API is not clearly defined, we will run CubicWeb's tests on each merge request of RQL, to make sure nothing breaks.

Separate Back and Front

For some time now, the trend has been towards a clear separation between front and back. We are adding more features to use React in the front.

CwClientLibJS, a library allowing to use rql in the browser. CwClientLibJS is now in version 1.1.0 and is able to generate query on entity state (see !20).

react-admin-cubicweb, new tools to generate admin pages. react-admin-cubicweb is still at an early stage but can display entity attribute and relation. It also allows edition as well. The version 0.3.0 has been released !

Release-new

Release-new is one of our latest utility to release new versions of our cubes. Assuming that you are using conventional commits, you should really like this tool to release new version of your cubes.

Release-new takes care to:

  • update the version of the cube in __pkginfo__.py
  • update the debian/control if there is one
  • update the changelog
  • create a new commit with theses changes
  • tag the commit

By default, it will try to guess if it is a major, minor or patch release (you can specify the release type if need be). The changelog will be updated and you will be able to edit it before the commit is done. The update of the changelog will be eased if you use conventionnal commit as your commit history will be dispatched in the appropriate sections. For instance, the last changelog of CubicWeb has been produced by this tool !

Docker images

During our last hackathon, a team worked on our CubicWeb Docker images. The code to generate them has been rewritten to be simpler. During this refactoring, we took care to generate docker images for minor versions of CubicWeb as well. Therefore, on https://hub.docker.com/r/logilab/cubicweb/tags you can find docker images for major and minor versions of CubicWeb with different versions of python, to suit your needs the best we can :)

See you next month!

Categories: FLOSS Project Planets

Janusworx: I Can’t Do This … Yet.

Planet Python - Thu, 2021-04-08 23:20

I’ve been “soft” looking for a job, since the end of last year when I learnt the basics of Python.
(Want me to come work you as a junior developer? Here’s my resume!)

Read more… (2 min remaining to read)

Categories: FLOSS Project Planets

hussainweb.me: Why you shouldn’t decouple Drupal and why you should

Planet Drupal - Thu, 2021-04-08 23:11
Today's post is going to be a quick one; not because it is an easy topic but because a lot has been said about it already. Today, I want to share my thoughts on decoupling Drupal; thoughts that are mainly a mix of borrowed thoughts from several places. I will try to link where I can but I can't separate my thoughts from borrowed ones now. Anyway, by the end of the post, you might have read a lot of things you already knew and hopefully, one or two new things about Decoupling Drupal.
Categories: FLOSS Project Planets

Sean Whitton: consfigurator-live-build

Planet Debian - Thu, 2021-04-08 19:35

One of my goals for Consfigurator is to make it capable of installing Debian to my laptop, so that I can stop booting to GRML and manually partitioning and debootstrapping a basic system, only to then turn to configuration management to set everything else up. My configuration management should be able to handle the partitioning and debootstrapping, too.

The first stage was to make Consfigurator capable of debootstrapping a basic system, chrooting into it, and applying other arbitrary configuration, such as installing packages. That’s been in place for some weeks now. It’s sophisticated enough to avoid starting up newly installed services, but I still need to add some bind mounting.

Another significant piece is teaching Consfigurator how to partition block devices. That’s quite tricky to do in a sufficiently general way – I want to cleanly support various combinations of LUKS, LVM and regular partitions, including populating /etc/crypttab and /etc/fstab. I have some ideas about how to do it, but it’ll probably take a few tries to get the abstractions right.

Let’s imagine that code is all in place, such that Consfigurator can be pointed at a block device and it will install a bootable Debian system to it. Then to install Debian to my laptop I’d just need to take my laptop’s disk drive out and plug it into another system, and run Consfigurator on that system, as root, pointed at the block device representing my laptop’s disk drive. For virtual machines, it would be easy to write code which loop-mounts an empty disk image, and then Consfigurator could be pointed at the loop-mounted block device, thereby making the disk image file bootable.

This is adequate for virtual machines, or small single-board computers with tiny storage devices (not that I actually use any of those, but I want Consfigurator to be able to make disk images for them!). But it’s not much good for my laptop. I casually referred to taking out my laptop’s disk drive and connecting it to another computer, but this would void my laptop’s warranty. And Consfigurator would not be able to update my laptop’s NVRAM, as is needed on UEFI systems.

What’s wanted here is a live system which can run Consfigurator directly on the laptop, pointed at the block device representing its physical disk drive. Ideally this live system comes with a chroot with the root filesystem for the new Debian install already built, so that network access is not required, and all Consfigurator has to do is partition the drive and copy in the contents of the chroot. The live system could be set up to automatically start doing that upon boot, but another option is to just make Consfigurator itself available to be used interactively. The user boots the live system, starts up Emacs, starts up Lisp, and executes a Consfigurator deployment, supplying the block device representing the laptop’s disk drive as an argument to the deployment. Consfigurator goes off and partitions that drive, copies in the contents of the chroot, and executes grub-install to make the laptop bootable. This is also much easier to debug than a live system which tries to start partitioning upon boot. It would look something like this:

;; melete.silentflame.com is a Consfigurator host object representing the ;; laptop, including information about the partitions it should have (deploy-these :local ... (chroot:partitioned-and-installed melete.silentflame.com "/srv/chroot/melete" "/dev/nvme0n1"))

Now, building live systems is a fair bit more involved than installing Debian to a disk drive and making it bootable, it turns out. While I want Consfigurator to be able to completely replace the Debian Installer, I decided that it is not worth trying to reimplement the relevant parts of the Debian Live tool suite, because I do not need to make arbitrary customisations to any live systems. I just need to have some packages installed and some files in place. Nevertheless, it is worth teaching Consfigurator how to invoke Debian Live, so that the customisation of the chroot which isn’t just a matter of passing options to lb_config(1) can be done with Consfigurator. This is what I’ve ended up with – in Consfigurator’s source code:

(defpropspec image-built :lisp (config dir properties) "Build an image under DIR using live-build(7), where the resulting live system has PROPERTIES, which should contain, at a minimum, a property from CONSFIGURATOR.PROPERTY.OS setting the Debian suite and architecture. CONFIG is a list of arguments to pass to lb_config(1), not including the '-a' and '-d' options, which Consfigurator will supply based on PROPERTIES. This property runs the lb_config(1), lb_bootstrap(1), lb_chroot(1) and lb_binary(1) commands to build or rebuild the image. Rebuilding occurs only when changes to CONFIG or PROPERTIES mean that the image is potentially out-of-date; e.g. if you just add some new items to PROPERTIES then in most cases only lb_chroot(1) and lb_binary(1) will be re-run. Note that lb_chroot(1) and lb_binary(1) both run after applying PROPERTIES, and might undo some of their effects. For example, to configure /etc/apt/sources.list, you will need to use CONFIG not PROPERTIES." (:desc (declare (ignore config properties)) #?"Debian Live image built in ${dir}") (let* (...) ;; ... `(eseqprops ;; ... (on-change (eseqprops (on-change (file:has-content ,auto/config ,(auto/config config) :mode #o755) (file:does-not-exist ,@clean) (%lbconfig ,dir) (%lbbootstrap t ,dir)) (%lbbootstrap nil ,dir) (deploys ((:chroot :into ,chroot)) ,host)) (%lbchroot ,dir) (%lbbinary ,dir)))))

Here, %lbconfig is a property running lb_config(1), %lbbootstrap one which runs lb_bootstrap(1), etc. Those properties all just change directory to the right place and run the command, essentially, with a little extra code to handle failed debootstraps and the like.

The ON-CHANGE and ESEQPROPS combinators work together to sequence the interaction of the Debian Live suite and Consfigurator.

  • In the innermost ON-CHANGE expression: create the file auto/config and populate it with the call to lb_config(1) that we need to make, as described in the Debian Live manual, chapter 6.

    • If doing so resulted in a change to the auto/config file – e.g. the user added some more options – ensure that lb_config(1) and lb_bootstrap(1) both get rerun.
  • Now in the inner ESEQPROPS expression, use DEPLOYS to configure the chroot, essentially by forking into the chroot and recursively reinvoking Consfigurator.

  • Finally, if any of the above resulted in a change being made, call lb_chroot(1) and lb_binary(1).

This way, we only rebuild the chroot if the configuration changed, and we only rebuild the image if the chroot changed.

Now over in my personal consfig:

(try-register-data-source :git-snapshot :name "consfig" :repo #P"src/cl/consfig/" ...) (defproplist hybrid-live-iso-built :lisp () "Build a Debian Live system in /srv/live/spw. Typically this property is not applied in a DEFHOST form, but rather run as needed at the REPL. The reason for this is that otherwise the whole image will get rebuilt each time a commit is made to my dotfiles repo or to my consfig." (:desc "Sean's Debian Live system image built") (live-build:image-built. '("--archive-areas" "main contrib non-free" ...) "/srv/live/spw" (os:debian-stable "buster" :amd64) (basic-props) (apt:installed "whatever" "you" "want") (git:snapshot-extracted "/etc/skel/src" "dotfiles") (file:is-copy-of "/etc/skel/.bashrc" "/etc/skel/src/dotfiles/.bashrc") (git:snapshot-extracted "/root/src/cl" "consfig")))

The first argument to LIVE-BUILD:IMAGE-BUILT. is additional arguments to lb_config(1). The third argument onwards are the properties for the live system. The cool thing is GIT:SNAPSHOT-EXTRACTED – the calls to this ensure that a copy of my Emacs configuration and my consfig end up in the live image, ready to be used interactively to install Debian, as described above. I’ll need to add something like (chroot:host-chroot-bootstrapped melete.silentflame.com "/srv/chroot/melete") too.

As with everything Consfigurator-related, Joey Hess’s Propellor is the giant upon whose shoulders I’m standing.

Categories: FLOSS Project Planets

SoK Final Status Report

Planet KDE - Thu, 2021-04-08 15:30
For the Season of KDE 2021, I decided to work on Okular’s website. Okular is a multifaceted program that I use almost every day for my PDF reading and annotating needs, although it can do much more. Sadly, its website was a bit outdated and not mobile friendly. I thus proposed to rewrite the Okular website using the HUGO framework, in a similar way as was done with the kde.org main website, and keeping consistency with other KDE applications such as Konsole.
Categories: FLOSS Project Planets

DrupalCon News: Designed to fit your day: DrupalCon North America 2021

Planet Drupal - Thu, 2021-04-08 15:11

Learn, connect, and build at DrupalCon and make it work with your schedule.

Categories: FLOSS Project Planets

Andy Wingo: sign of the times

GNU Planet! - Thu, 2021-04-08 15:09

Hello all! There is a mounting backlog of things that landed in Guile recently and to avoid having to eat the whole plate in one bite, I'm going to try to send some shorter missives over the next weeks.

Today's is about a silly thing, dynamic-link. This interface is dlopen, but "portable". See, back in the day -- like, 1998 -- there were lots of kinds of systems and how to make and load a shared library portably was hard. You'd have people with AIX and Solaris and all kinds of weird compilers and linkers filing bugs on your project if you hard-coded a GNU toolchain invocation when creating loadable extensions, or hard-coded dlopen or similar to use them.

Libtool provided a solution to create portable loadable libraries, which involved installing .la files alongside the .so files. You could use libtool to link them to a library or an executable, or you could load them at run-time via the libtool-provided libltdl library.

But, the .la files were a second source of truth, and thus a source of bugs. If a .la file is present, so is an .so file, and you could always just use the .so file directly. For linking against an installed shared library on modern toolchains, the .la files are strictly redundant. Therefore, all GNU/Linux distributions just delete installed .la files -- Fedora, Debian, and even Guix do so.

Fast-forward to today: there has been a winnowing of platforms, and a widening of the GNU toolchain (in which I include LLVM as well as it has a mostly-compatible interface). The only remaining toolchain flavors are GNU and Windows, from the point of view of creating loadable shared libraries. Whether you use libtool or not to create shared libraries, the result can be loaded either way. And from the user side, dlopen is the universally supported interface, outside of Windows; even Mac OS fixed their implementation a few years back.

So in Guile we have been in an unstable equilibrium: creating shared libraries by including a probably-useless libtool into the toolchain, and loading them by using a probably-useless libtool-provided libltdl.

But the use of libltdl has not been without its costs. Because libltdl intends to abstract over different platforms, it encourages you to leave off the extension when loading a library, instead promising to try a platform-specific set such as .so, .dll, .dylib etc as appropriate. In practice the abstraction layer was under-maintained and we always had some problems on Mac OS, for example.

Worse, as ltdl would search through the path for candidates, it would only report the last error it saw from the underlying dlopen interface. It was almost always the case that if A and B were in the search path, and A/foo.so failed to load because of a missing dependency, the error you would get as a user would instead be "file not found", because ltdl swallowed the first error and kept trucking to try to load B/foo.so which didn't exist.

In summary, this is a case where the benefits of an abstraction layer decline over time. For a few years now, libltdl hasn't been paying for itself. Libtool is dead, for all intents and purposes (last release in 2015); best to make plans to migrate away, somehow.

In the case of the dlopen replacement, in Guile we ended up rewriting the functionality in Scheme. The underlying facility is now just plain dlopen, for which we shim a version of dlopen on Windows, inspired by the implementation in cygwin. There are still platform-specific library extensions, but that is handled easily on the Scheme layer.

Looking forward, I think it's probably time to replace Guile's use of libtool to create its libraries and executables. I loathe the fact that libtool puts shell scripts in the place of executables in build directories and stashes the actual executables elsewhere -- like, visceral revulsion. There is no need for that nowadays. Not sure what to replace it with, nor on what timeline.

And what about autotools? That, my friends, would be a whole nother blog post. Until then, & probably sooner, happy hacking!

Categories: FLOSS Project Planets

GNU Guix: Outreachy 'guix git log' internship wrap-up

GNU Planet! - Thu, 2021-04-08 13:00

Magali Lemes joined Guix in December for a three-month internship with Outreachy. Magali implemented a guix git log command to browse the history of packaging changes, with mentoring from Simon Tournier and Gábor Boskovits. In this blog post, Magali and Simon wrap up on what's been accomplished.

Magali

The first tasks I had to do were pretty simple and were mainly meant to both get me acquainted with the source code and set the building blocks of the project. They were very important so that I could gradually build the subcommand. For starters, I created a Guix repository on Gitlab, so that I could push all the work I had done there, tweaked a little bit of the source code, and then proceeded to have the guix git log subcommand show the default channel checkout path.

From there on, I started adding options to the subcommand. The oneline option was the first and simplest option, and it pretty much emulates what git log --oneline does: it displays the commit short hash id and title. Afterwards, other options such as format, pretty, and grep came along. The possibility of retrieving commits from other channels---and not only from the default one---was also implemented. An example of invoking the subcommand would be:

guix git log --oneline --grep=guile-git

The road to doing all of this wasn't always smooth. Right at the beginning of the internship, for instance, I struggled with getting a segmentation fault error. It was a known bug, and I was able to overcome it.

I also got to participate in the one-day Guix workshop---a FOSDEM 2021 fringe event---and presented the work I had done so far. It was quite nice demonstrating the subcommand, receiving feedback and questions, and I could also get to know other things that were being worked on in Guix.

In the post I wrote three months ago, I mention that I wish I could gain meaningful experience and improve my communication skills. I'm glad to say that I feel like I was able to achieve that. Sending emails, explaining what I had done, and asking questions about what I had to do during the weekly meetings were a few of the situations I had to face, and that made me improve these skills. I also had a taste of what it's like to take part in a free software project, got to know a few people, and learned quite a lot about Guile Scheme.

One thing, though, that I wasn't able to implement in due time was the commit limiting options, such as guix git log --after=YYYY-MM-DD and guix git log --before=YYYY-MM-DD.

Hopefully, soon users will be able to invoke guix git log, and have the commit history from all Guix channels they have.

Simon

It was my first experience mentoring for the Outreachy program and now I am glad I did it. I have learnt a lot on various topics. I had already mentored interns occasionally, though it was the first time fully remote, not on the same timezone, and with code on which I am not expert. Thanks Gábor, Ludovic and Ricardo for pushing me to jump in this journey.

Reading back the initial proposal coming from a 2019 question, I am happy by the insofar Magali's internship. It paves the way for future proposals or the implementation of other guix git subcommands.

Next Outreachy round & acknowledgment

Guix is participating in the upcoming Outreachy round. If you are eligible, please get in touch with us and consider applying by April 30th!

In light of recent changes at the Free Software Foundation (FSF) and Outreachy’s subsequent decision to refuse funds coming from the FSF, we are grateful to Software Freedom Conservancy (SFC) for their decision to sponsor our upcoming internship. We are working on a longer-term solution so we can keep participating in Outreachy. In the meantime, thanks a lot, Conservancy!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

Categories: FLOSS Project Planets

Patrick Kennedy: What is Werkzeug?

Planet Python - Thu, 2021-04-08 11:51

I wrote a blog post on TestDriven.io explaining what Werkzeug is and how Flask uses it for its core HTTP functionality. In this blog post, you’ll develop your own WSGI-compatible application using Werkzeug to create a Flask-like web framework:

https://testdriven.io/blog/what-is-werkzeug/

This blog post looks at explores the following functionality provided by Werkzeug (which Flask uses):

  1. Request processing
  2. Response handling
  3. URL routing
  4. Middleware
  5. HTTP utilities
  6. Exception handling
  7. Development server (with hot reloading)

To help explain this functionality, a web application is built using Werkzeug:

https://gitlab.com/patkennedy79/werkzeug_movie_app

Categories: FLOSS Project Planets

Qt for MCUs 1.8 released

Planet KDE - Thu, 2021-04-08 10:12

A new feature update of Qt for MCUs is now available. Download version 1.8 to get access to additional MCU platforms, more options to limit the memory footprint, new APIs to create advanced and scalable user interfaces, and a solution to simplify the integration of Qt into existing projects.

Categories: FLOSS Project Planets

Janusworx: A Love Letter to Books

Planet Python - Thu, 2021-04-08 09:08

Despite financial troubles there’s a sense in which my childhood was immensely privileged — a pauper in the material world, I was a sultan in the world of ideas.

Erik Hoel

I started by wanting to share that quote and link on my microblog and then my thoughts turned into a blog post sized comment.
I decided to post it here too, then.

Read more… (2 min remaining to read)

Categories: FLOSS Project Planets

Mike Driscoll: Python 101 is Pay What You Want for 24 Hours

Planet Python - Thu, 2021-04-08 08:39
For the next 24 hours, Python 101 will be Pay What You Want, $3 minimum (usually $25).

The second edition of Python 101 is completely rewritten from the ground up. In this book, you will learn the Python programming language and lots more.

This book is split up into four sections:

  1. The Python Language
  2. Intermediate Topics
  3. Creating Sample Applications
  4. Distributing Your Code
 

The post Python 101 is Pay What You Want for 24 Hours appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

1xINTERNET blog: 1xINTERNET at DrupalCon NA 2021

Planet Drupal - Thu, 2021-04-08 08:00
1xINTERNET is participating in DrupalCon NA this year as Hallway Track sponsor, as well as hosting few sessions. Read this blog to get information about the talks our team is doing during this online DrupalCon.
Categories: FLOSS Project Planets

Thorsten Alteholz: My Debian Activities in March 2021

Planet Debian - Thu, 2021-04-08 06:11

FTP master

Things never turn out the way you expect, so this month I was only able to accept 38 packages and rejected none. Due to the freeze, the overall number of packages that got accepted was 88.

Debian LTS

This was my eighty-first month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

This month my all in all workload has been 30h. During that time I did LTS and normal security uploads of:

  • [DLA 2606-1] lxml security update for one CVE
  • [DSA 4880-1] lxml security update for one CVE
  • [DLA 2611-1] ldb security update for two CVEs
  • [DLA 2612-1] leptonlib security update for four CVEs

I also prepared debdiffs for unstable and/or buster for leptonlib and libebml, which for one reason or another did not result in an upload yet.

Last but not least I did some days of frontdesk duties.

Debian ELTS

This month was the thirty-third ELTS month.

During my allocated time I uploaded:

  • ELA-388-1 for zeromq3
  • ELA-390-1 for lxml
  • ELA-391-1 for jasper
  • ELA-393-1 for ldb
  • ELA-394-1 for leptonlib

Last but not least I did some days of frontdesk duties.

Other stuff

On my neverending golang challenge I uploaded (or sponsored for thola dependencies):
golang-github-tombuildsstuff-giovanni, golang-github-apparentlymart-go-userdirs, golang-github-apparentlymart-go-shquot, golang-github-likexian-gokit, olang-gopkg-mail.v2, golang-gopkg-redis.v5, golang-github-facette-natsort, golang-github-opentracing-contrib-go-grpc, golang-github-felixge-fgprof, golang-ithub-gogo-status, golang-github-leanovate-gopter, golang-github-opentracing-basictracer-go, golang-github-lightstep-lightstep-tracer-common, golang-github-o-sourcemap-sourcemap, golang-github-igm-pubsub, golang-github-igm-sockjs-go, golang-github-centrifugal-protocol, golang-github-mna-redisc, golang-github-fzambia-eagle, golang-github-centrifugal-centrifuge, golang-github-chromedp-sysutil, golang-github-client9-misspell, golang-github-knq-snaker, cdproto-gen, golang-github-mattermost-xml-roundtrip-validator, golang-github-crewjam-saml, ssllabs-scan, golang-uber-automaxprocs, golang-uber-goleak, golang-github-k0kubun-go-ansi, golang-github-schollz-progressbar, golang-github-komkom-toml, golang-github-labstack-echo, golang-github-inexio-go-monitoringplugin

Categories: FLOSS Project Planets

Pages