Feeds

FSF Events: Free Software Directory meeting on IRC: Friday, December 20, starting at 12:00 EST (17:00 UTC)

GNU Planet! - Fri, 2024-12-13 10:37
Join the FSF and friends on Friday, December 13 from 12:00 to 15:00 EST (17:00 to 20:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

Emanuele Rocca: Murder Mystery: GCC Builds Failing After sbuild Refactoring

Planet Debian - Fri, 2024-12-13 10:31

This is the story of an investigation conducted by Jochen Sprickerhof, Helmut Grohne, and myself. It was true teamwork, and we would not have reached the bottom of the issue working individually. We think you will find it as interesting and fun as we did, so here is a brief writeup. A few of the steps mentioned here took several days, others just a few minutes. What is described as a natural progression of events did not always look so obvious in the moment.

Let us go through the Six Stages of Debugging together.

Stage 1: That cannot happen

Official Debian GCC builds start failing on multiple architectures in late November.

The build error happens on the build servers when running the testsuite, but we know this cannot happen. GCC builds are not meant to fail in case of testsuite failures! Return codes do not make the build fail, make is being called with -k, it just cannot happen.

In fact, a lot of the GCC tests always fail, and an extensive log of the results is posted to the debian-gcc mailing list, but the packages always build fine regardless.

On the build daemons, build failures take several hours.

Stage 2: That does not happen on my machine

Building on my machine running Bookworm works just fine. The build daemons run Bookworm and use a Sid chroot for the build environment, just like I do. Same kernel.

Debian packages are built by a network of autobuilding machines using a program called sbuild. In my last blog post I mentioned the transition from the schroot backend to a new one based on unshare.

The only obvious difference between my setup and the Debian buildds is that I am using sbuild 0.85.0 from bookworm, and the buildds have 0.86.3~bpo12+1 from bookworm-backports. Trying again with 0.86.3~bpo12+1, the build fails on my system too. The build daemons were updated to the bookworm-backports version of sbuild at some point in late November. Ha.

Stage 3: That should not happen

There are quite a few sbuild versions in between 0.85.0 and 0.86.3~bpo12+1, but looking at recent sbuild bugs shows that sbuild 0.86.0 was breaking "quite a number of packages". Indeed, with 0.86.0 the build still fails. Trying the version immediately before, 0.85.11, the build finishes correctly. This took more time than it sounds: one run including the tests takes several hours. We need a way to shorten this somehow.

The Debian packaging of GCC allows you to specify which languages to skip, and by default it builds Ada, Go, C, C++, D, Fortran, Objective C, Objective C++, M2, and Rust. When running the tests sequentially, the build logs stop roughly around the tests of libphobos, the runtime library for D. So can we still reproduce the failure by skipping everything except D? With DEB_BUILD_OPTIONS=nolang=ada,go,c,c++,fortran,objc,obj-c++,m2,rust the build still fails, and it fails faster than before. Several minutes, not hours. This is progress, and time to file a bug. The report contains massive spoilers, so no link. :-)
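
For the record, a minimal sketch of such a reduced rebuild. The source tree path and the plain dpkg-buildpackage invocation are my assumptions here: the actual runs went through sbuild.

# rebuild only the D parts of gcc, skipping the other language frontends
cd gcc-source-tree/    # placeholder: the unpacked gcc source directory
DEB_BUILD_OPTIONS="nolang=ada,go,c,c++,fortran,objc,obj-c++,m2,rust" dpkg-buildpackage -b -uc -us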

Stage 4: Why does that happen?

Something is causing the build to end prematurely. It’s not the OOM killer, and the kernel does not have anything useful to say in the logs. Could it be that the D language tests are sending signals to some process, and that is what’s killing make? We start tracing signals sent with bpftrace by writing the following script, signals.bt:

tracepoint:signal:signal_generate {
    printf("%s (PID: %d) sent signal %d to PID %d\n", comm, pid, args->sig, args->pid);
}

And executing it with sudo bpftrace signals.bt.

The build takes its sweet time, and it fails. Looking at the trace output there’s a suspicious process.exe terminating stuff.

process.exe (PID: 2868133) sent signal 15 to PID 711826

That looks interesting, but we have no clue what PID 711826 may be. Let’s change the script a bit, and trace signals received as well.

tracepoint:signal:signal_generate {
    printf("PID %d (%s) sent signal %d to %d\n", pid, comm, args->sig, args->pid);
}

tracepoint:signal:signal_deliver {
    printf("PID %d (%s) received signal %d\n", pid, comm, args->sig);
}

The working version of sbuild was using dumb-init, whereas the new one features a little init written in perl. We patch the current version of sbuild to make it use dumb-init instead, and trace two builds: one with the perl init, one with dumb-init.

Here are the signals observed when building with dumb-init.

PID 3590011 (process.exe) sent signal 2 to 3590014
PID 3590014 (sleep) received signal 9
PID 3590011 (process.exe) sent signal 15 to 3590063
PID 3590063 (std.process tem) received signal 9
PID 3590011 (process.exe) sent signal 9 to 3590065
PID 3590065 (std.process tem) received signal 9

And this is what happens with the new init in perl:

PID 3589274 (process.exe) sent signal 2 to 3589291
PID 3589291 (sleep) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589338
PID 3589338 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 9 to 3589340
PID 3589340 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589341
PID 3589274 (process.exe) sent signal 15 to 3589323
PID 3589274 (process.exe) sent signal 15 to 3589320
PID 3589274 (process.exe) sent signal 15 to 3589274
PID 3589274 (process.exe) received signal 9
PID 3589341 (sleep) received signal 9
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589320
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589323

There are a few additional SIGTERMs being sent when using the perl init; that’s helpful. At this point we are fairly convinced that process.exe is worth additional inspection. The source code of process.d shows something interesting:

1221 @system unittest
1222 {
[...]
1247     auto pid = spawnProcess(["sleep", "10000"],
[...]
1260     // kill the spawned process with SIGINT
1261     // and send its return code
1262     spawn((shared Pid pid) {
1263         auto p = cast() pid;
1264         kill(p, SIGINT);

So yes, there’s our sleep and the SIGINT (signal 2) right in the unit tests of process.d, just like we have observed in the bpftrace output.

Can we study the behavior of process.exe in isolation, separately from the build? Indeed we can. Let’s take the executable from a failed build, and try running it under /usr/libexec/sbuild-usernsexec.

First, we prepare a chroot inside a suitable user namespace:

unshare --map-auto --setuid 0 --setgid 0 mkdir /tmp/rootfs
cd /tmp/rootfs
cat /home/ema/.cache/sbuild/unstable-arm64.tar | unshare --map-auto --setuid 0 --setgid 0 tar xf -
unshare --map-auto --setuid 0 --setgid 0 mkdir /tmp/rootfs/whatever
unshare --map-auto --setuid 0 --setgid 0 cp process.exe /tmp/rootfs/

Now we can run process.exe on its own using the perl init, and trace signals at will:

/usr/libexec/sbuild-usernsexec --pivotroot --nonet u:0:100000:65536 g:0:100000:65536 /tmp/rootfs ema /whatever -- /process.exe

We can compare the behavior of the perl init vis-a-vis the one using dumb-init in milliseconds instead of minutes.

Stage 5: Oh, I see.

Why process.exe sends more SIGTERMs when using the perl init is now the big question. We have a simple reproducer, so this is where using strace becomes possible.

sudo strace --user ema --follow-forks -o sbuild-dumb-init.strace ./sbuild-usernsexec-dumb-init --pivotroot --nonet u:0:100000:65536 g:0:100000:65536 /tmp/dumbroot ema /whatever -- /process.exe
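
The perl-init run can be traced the same way. Here is a sketch assembled from the two commands above; only the output file name sbuild-perl-init.strace is mine.

sudo strace --user ema --follow-forks -o sbuild-perl-init.strace /usr/libexec/sbuild-usernsexec --pivotroot --nonet u:0:100000:65536 g:0:100000:65536 /tmp/rootfs ema /whatever -- /process.exe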

We start comparing the strace output of dumb-init with that of perl-init, looking in particular for different calls to kill.

Here is what process.exe does under dumb-init:

3593883 kill(-2, SIGTERM) = -1 ESRCH (No such process)

No such process. Under perl-init instead:

3593777 kill(-2, SIGTERM <unfinished ...>

The process is there under perl-init!

That is a kill with negative pid. From the kill(2) man page:

If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.
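
In shell terms the same operation looks like the following sketch. The process group ID 12345 is a made-up example, and the -- is needed so the negative number is not parsed as an option.

kill -s TERM -- -12345    # send SIGTERM to every process in (hypothetical) process group 12345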

It would have been very useful to see this kill with a negative pid in the output of bpftrace, so why didn’t we? The tracepoint used, tracepoint:signal:signal_generate, shows when signals are actually being sent, not the syscall being called. To confirm, one can trace tracepoint:syscalls:sys_enter_kill and see the negative PIDs, for example:

PID 312719 (bash) sent signal 2 to -312728
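
Output like the above can be produced with a one-liner along these lines; a sketch relying on the sys_enter_kill tracepoint exposing the target pid and the signal number as arguments.

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_kill { printf("PID %d (%s) sent signal %d to %d\n", pid, comm, args->sig, args->pid); }'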

The obvious question at this point is: why is there no process group 2 when using dumb-init?

Stage 6: How did that ever work?

We know that process.exe sends a SIGTERM to every process in the process group with ID 2. To find out what this process group may be, we spawn a shell with dumb-init and observe the PIDs under /proc: 1, 16, and 17. With perl-init we have 1, 2, and 17. When running dumb-init there are a few forks before launching the program, which explains the difference. Looking at /proc/2/cmdline we see that it’s bash, i.e. the program we are running under perl-init. When building a package, that is dpkg-buildpackage itself.
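
The inspection itself needs nothing more than standard tools. A rough sketch, run from inside the spawned shell:

ls -d /proc/[0-9]*                     # list the PIDs visible in this PID namespace
tr '\0' ' ' < /proc/2/cmdline; echo    # show what PID 2 is running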

The test is accidentally killing its own process group.

Now where does this -2 come from in the test?

2363     // Special values for _processID.
2364     enum invalid = -1, terminated = -2;

Oh. -2 is used as a special value for PID, meaning "terminated". And there’s a call to kill() later on:

2694     do { s = tryWait(pid); } while (!s.terminated);
[...]
2697     assertThrown!ProcessException(kill(pid));

What sets pid to terminated you ask?

Here is tryWait:

2568 auto tryWait(Pid pid) @safe
2569 {
2570     import std.typecons : Tuple;
2571     assert(pid !is null, "Called tryWait on a null Pid.");
2572     auto code = pid.performWait(false);

And performWait:

2306 _processID = terminated;

The solution, dear reader, is not to kill.

Categories: FLOSS Project Planets

Freelock Blog: Change the display of an event after it happens

Planet Drupal - Fri, 2024-12-13 10:00

Event Calendars seem to be very common on the Drupal sites we build. One of the best ways of improving engagement on a site is to add content about the event after it happens. People who attended an event might come back for a recap, or to see pictures or notes from other participants, while people who did not attend can get a sense of what a future event might be like based on your past events.

Categories: FLOSS Project Planets

Droptica: Top 8 Challenges When Migrating from Drupal 7 to Drupal 10 or 11

Planet Drupal - Fri, 2024-12-13 07:29

Migrating from Drupal 7 to Drupal 10 or 11 can be quite challenging. Common issues, such as neglecting a detailed website analysis or failing to prioritize user training, frequently result in delays, increased costs, and frustration. In this blog post, we’ll explore the top pitfalls in Drupal migration and provide tips on how to avoid them, helping you make the transition smoother and more predictable.

Categories: FLOSS Project Planets

Web Review, Week 2024-50

Planet KDE - Fri, 2024-12-13 07:24

Let’s go for my web review for the week 2024-50.

Census III of Free and Open Source Software

Tags: tech, foss, supply-chain

Interesting report, some findings are kind of unexpected. It’s interesting to see how much npm and maven dominate the supply chain. Clearly there’s a need for a global scheme to identify dependencies, hopefully we’ll get there.

https://www.linuxfoundation.org/research/census-iii


Open Source Archetypes: A Framework For Purposeful Open Source

Tags: tech, foss, business, strategy

An important white paper which probably went unnoticed. It gives a nice overview of the strategies one can build around Open Source components.

https://blog.mozilla.org/wp-content/uploads/2018/05/MZOTS_OS_Archetypes_report_ext_scr.pdf


Fool Me Twice We Don’t Get Fooled Again

Tags: tech, social-media, fediverse

Excellent post from Cory Doctorow about why he is only on Mastodon. Not being federated should indeed just be a deal breaker by now. Empty promises should be avoided.

https://pluralistic.net/2023/08/06/fool-me-twice-we-dont-get-fooled-again/


Firefox is the superior browser

Tags: tech, web, browser, firefox

Obviously I agree with this. It’s time people stop jumping on chromium based browsers.

https://asindu.xyz/posts/switching-to-firefox/


TRELLIS: Structured 3D Latents for Scalable and Versatile 3D Generation

Tags: tech, 3d, ai, machine-learning, generator

Looks like a nice model to produce 3D assets. It should speed up the work of artists producing background elements a bit; I guess there will still be manual adjustments needed in the end.

https://trellis3d.github.io/


Who and What comprise AI Skepticism? - by Benjamin Riley

Tags: tech, ai, machine-learning, gpt, criticism

Excellent post showing all the nuances of AI skepticism. Can you find in which category you are? I definitely match several of them.

https://buildcognitiveresonance.substack.com/p/who-and-what-comprises-ai-skepticism


Reverse engineering of the Pentium FDIV bug

Tags: tech, cpu, hardware

It’s interesting to see such a reverse engineering of this infamous bug straight from the gates layout.

https://oldbytes.space/@kenshirriff/113606898880486330


How to Think About Time

Tags: tech, time

A good summary on the various concepts needed to reason about time.

https://errorprone.info/docs/time


Galloping Search - blag

Tags: tech, algorithm

Nice principle for a search in a sorted list when you don’t know the upper bound.

https://avi.im/blag/2024/galloping-search/


I’m daily driving Jujutsu, and maybe you should too

Tags: tech, version-control, git

Jujutsu is indeed alluring… but its long-term support is questionable; that’s what keeps me away from it for now.

https://drewdevault.com/2024/12/10/2024-12-10-Daily-driving-jujutsu.html


mise-en-place

Tags: tech, tools, developer-experience

A single tool to manage your environment and dev tools across projects? Seems a bit young and needs a proper community still. I’m surely tempted to give it a spin though.

https://mise.jdx.dev/


Raw loops vs. STL algorithms

Tags: tech, c++, algorithm

An old one now, but since I keep giving this advice it seems relevant still. If you’re using raw loops, think again: there is likely a good alternative in the STL.

https://www.meetingcpp.com/blog/items/raw-loops-vs-stl-algorithms.html


Generic programming to fight the rigidity in the C++ projects

Tags: tech, architecture, type-systems, generics, c++

A good reminder that genericity can help fight against the rigidity one can accumulate using purely object oriented couplings… but it comes at a price in terms of complexity.

https://codergears.com/Blog/?p=945


Nobody Gets Fired for Picking JSON, but Maybe They Should? · mcyoung

Tags: tech, json, safety, type-systems

JSON is full of pitfalls. Here is a good summary. Still it is very widespread.

https://mcyoung.xyz/2024/12/10/json-sucks/


JSON5 – JSON for Humans

Tags: tech, json

Interesting JSON superset which makes it more usable for humans. I wonder if it’ll see more parsers appear.

https://json5.org/


Improving my desktop’s responsiveness with the cgroup V2 ‘cpu.idle’ setting

Tags: tech, systemd, cgroups

Nice little systemd trick, definitely an alias to add to your setup.

https://utcc.utoronto.ca/~cks/space/blog/linux/CgroupV2CpuIdleForResponsiveness


“Rules” that terminal programs follow

Tags: tech, shell, tools, unix

Good list of the undocumented rules terminal programs tend to follow. It’s nice to have this kind of consistency even though a bit by accident.

https://jvns.ca/blog/2024/11/26/terminal-rules/


htmy

Tags: tech, web, backend, frontend, python, htmx

The idea is interesting even though it probably needs to mature. It’s interesting to see this kind of library pop up though; there’s clearly some kind of “backend - frontend split” fatigue going on.

https://volfpeter.github.io/htmy/


The errors of TeX (1989)

Tags: tech, latex, history, estimates, craftsmanship

A very precious document. Shows great organization in the work of Knuth, of course, but the self-reflection has profound lessons pertaining to estimates, the types of errors we make, etc.

https://yurichev.com/mirrors/knuth1989.pdf


An Undefeated Pull Request Template

Tags: tech, codereview

This is indeed a nice template for submitting changes for review. It’s very thorough and helps reviewers.

https://ashleemboyer.com/blog/pull-request-template/


On the criteria to be used in decomposing systems into modules

Tags: tech, design, architecture, research

We’re still struggling with how to modularize our code. Sometimes we should go back to the basics: this paper by Parnas from 1972 basically gave us the core insights needed to modularize programs properly.

https://dl.acm.org/doi/pdf/10.1145/361598.361623


TDD as the crack cocaine of software

Tags: tech, tdd, flow

Indeed, it is often overlooked that TDD can really help with finding a state of flow. Unlike the other addictive activities presented in this article it requires a non-negligible initial effort, though, which is why I wouldn’t describe it as an addiction.

https://jefclaes.be/2014/12/tdd-as-crack-cocaine-of-software.html


Demo Driven Development

Tags: tech, agile, product-management

A good reminder of what agile is about from the product management perspective. If you can regularly demo your work you ensure a feeling of progress.

https://oanasagile.blogspot.com/2013/12/demo-driven-development.html


The 6 Mistakes You’re Going to Make as a New Manager

Tags: tech, leadership, management

Good points, this is indeed often where we are struggling when we move to a leadership role. This changes the nature of the work at least in part and we need to adjust to it.

https://terriblesoftware.org/2024/12/04/the-6-mistakes-youre-going-to-make-as-a-new-manager/


Bye for now!

Categories: FLOSS Project Planets

LostCarPark Drupal Blog: Drupal Advent Calendar day 13 - Accessibility Tools track

Planet Drupal - Fri, 2024-12-13 04:00

Welcome back to the Drupal Advent Calendar. For our thirteenth door we are joined by Gareth Alexander, who is leading the Drupal CMS Accessibility Tools track.

When creating content there are so many things to consider: Target Audience, SEO issues like keyword relevance, making content that is actually engaging and relevant, and then there is the accessibility of your content as well.

With the Drupal CMS accessibility tools track we hope to provide a way to help with one part of that. These tools will help guide a content author to make and keep their content as accessible as possible with…

Categories: FLOSS Project Planets

Spyder IDE: Spyder 6 under the hood: Editor migration, remote dev QA, test overhaul and more!

Planet Python - Thu, 2024-12-12 19:00
Beyond the headline features, there's a lot more new and improved under the hood in Spyder 6. Daniel Althviz, Spyder's release manager and co-maintainer, was at the forefront of much of it, and we're here to share the highlights with all of you and what he plans to work on next!
Categories: FLOSS Project Planets

Matt Layman: 1Password and DigitalOcean Droplet - Building SaaS #208

Planet Python - Thu, 2024-12-12 19:00
In this episode, I continued a migration of my JourneyInbox app from Heroku to DigitalOcean. We configured the secrets using 1Password and created the droplet that will host the app.
Categories: FLOSS Project Planets

Freexian Collaborators: Monthly report about Debian Long Term Support, November 2024 (by Roberto C. Sánchez)

Planet Debian - Thu, 2024-12-12 19:00

Like each month, have a look at the work funded by Freexian’s Debian LTS offering.

Debian LTS contributors

In November, 20 contributors were paid to work on Debian LTS; their reports are available:

  • Abhijith PA did 14.0h (out of 6.0h assigned and 8.0h from previous period).
  • Adrian Bunk did 53.0h (out of 15.0h assigned and 85.0h from previous period), thus carrying over 47.0h to the next month.
  • Andrej Shadura did 7.0h (out of 7.0h assigned).
  • Arturo Borrero Gonzalez did 1.0h (out of 10.0h assigned), thus carrying over 9.0h to the next month.
  • Bastien Roucariès did 20.0h (out of 20.0h assigned).
  • Ben Hutchings did 0.0h (out of 24.0h assigned), thus carrying over 24.0h to the next month.
  • Chris Lamb did 18.0h (out of 18.0h assigned).
  • Daniel Leidert did 17.0h (out of 26.0h assigned), thus carrying over 9.0h to the next month.
  • Emilio Pozuelo Monfort did 40.5h (out of 60.0h assigned), thus carrying over 19.5h to the next month.
  • Guilhem Moulin did 7.25h (out of 7.5h assigned and 12.5h from previous period), thus carrying over 12.75h to the next month.
  • Jochen Sprickerhof did 3.5h (out of 10.0h assigned), thus carrying over 6.5h to the next month.
  • Lee Garrett did 14.75h (out of 15.25h assigned and 44.75h from previous period), thus carrying over 45.25h to the next month.
  • Lucas Kanashiro did 10.0h (out of 54.0h assigned and 10.0h from previous period), thus carrying over 54.0h to the next month.
  • Markus Koschany did 20.0h (out of 40.0h assigned), thus carrying over 20.0h to the next month.
  • Roberto C. Sánchez did 6.75h (out of 9.75h assigned and 14.25h from previous period), thus carrying over 17.25h to the next month.
  • Santiago Ruano Rincón did 24.75h (out of 23.5h assigned and 1.5h from previous period), thus carrying over 0.25h to the next month.
  • Sean Whitton did 2.0h (out of 6.0h assigned), thus carrying over 4.0h to the next month.
  • Sylvain Beucler did 21.5h (out of 9.5h assigned and 50.5h from previous period), thus carrying over 38.5h to the next month.
  • Thorsten Alteholz did 11.0h (out of 11.0h assigned).
  • Tobias Frost did 12.0h (out of 10.5h assigned and 1.5h from previous period).
Evolution of the situation

In November, we have released 38 DLAs.

The LTS coordinators, Roberto and Santiago, delivered a talk at the Mini-DebConf event in Toulouse, France. The title of the talk was “How LTS goes beyond LTS”. The talk covered work done by the LTS Team during the past year. This included contributions related to individual packages in Debian (such as tomcat, jetty, radius, samba, apache2, ruby, and many others); improvements to tooling and documentation useful to the Debian project as a whole; and contributions to upstream work (apache2, freeimage, node-dompurify, samba, and more). Additionally, several contributors external to the LTS Team were highlighted for their contributions to LTS. Readers are encouraged to watch the video of the presentation for a more detailed review of various ways in which the LTS team has contributed more broadly to the Debian project and to the free software community during the past year.

We wish to specifically thank Salvatore (of the Debian Security Team) for swiftly handling during November the updates of needrestart and libmodule-scandeps-perl, both of which involved arbitrary code execution vulnerabilities. We are happy to see increased involvement in LTS work by contributors from outside the formal LTS Team.

The work of the LTS Team in November was otherwise unremarkable, encompassing the customary triage, development, testing, and release of numerous DLAs, along with some associated contributions to related packages in stable and unstable.

Thanks to our sponsors

Sponsors that joined recently are in bold.

Categories: FLOSS Project Planets

KDE Ships Frameworks 6.9.0

Planet KDE - Thu, 2024-12-12 19:00

Friday, 13 December 2024

KDE today announces the release of KDE Frameworks 6.9.0.

KDE Frameworks are 72 addon libraries to Qt which provide a wide variety of commonly needed functionality in mature, peer reviewed and well tested libraries with friendly licensing terms. For an introduction see the KDE Frameworks release announcement.

This release is part of a series of planned monthly releases making improvements available to developers in a quick and predictable manner.

New in this version
Attica
  • It compiles fine without deprecated methods. Commit.
Baloo
  • [termgeneratortest] Rework unit test for negative numbers. Commit.
  • Remove unneeded qOverload statements. Commit.
  • Ci: use suse-qt68 image for clang-format. Commit.
  • [balooctl] Refactor the "index" and "clear" code. Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Don't include quiet packages in feature_summary. Commit.
Bluez Qt
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
Breeze Icons
  • Bring back directory symlinks for breeze-dark. Commit. Fixes bug #482648
  • Add symbolic version of system-software-update for small sizes. Commit. Fixes bug #399139
  • Also link to 22px versions for Duplicate icons. Commit.
  • Add symbolic version of preferences-desktop-keyboard-shortcut. Commit.
  • Add transport-mode-flight icon. Commit.
  • Add symbolic version of preferences-system-users. Commit.
  • Add symbolic version of preferences-desktop-notification-symbolic. Commit.
  • Add symbolic version of preferences-desktop-theme-global. Commit.
  • Generate index.theme unconditionally to fix qrc/rcc. Commit.
  • Make qrc generation fail if no *.theme file was found. Commit.
  • Add missing CSS properties for blur and pixelate icons. Commit. Fixes bug #495755
  • Fix class attribute for places/32/folder-{log,podcast}.svg. Commit.
  • Add boost and boost-boosted icons. Commit.
  • Update WINE app icons to match new symbolic versions. Commit.
  • Add wine-symbolic icon. Commit. Fixes bug #494450
  • Add favorite-favorited, change favorite to non-filled. Commit.
  • Make base donate and help-donate icons be hearts. Commit.
  • Add love. Commit.
  • Improve README with more guidelines and contributing information. Commit.
  • Add icon for keyboard shortcut preferences. Commit. See bug #426753
  • Add dialog-password icon. Commit.
  • Optimize-svg: Clarify that you need to install svgo globally. Commit.
  • Add laser printer icon. Commit.
Extra CMake Modules
  • Align multi-language catalog loading with KI18n. Commit.
  • EGPF: Handle case where INTERFACE_INCLUDE_DIRECTORIES is empty. Commit. Fixes bug #496781
  • KDEClangFormat: Avoid spammy warnings with cmake >= 3.31.0. Commit. Fixes bug #496537
  • Consider all QLocale::uiLanguages for QM catalog loading. Commit.
  • ECMGeneratePythonBindings: Build without system isolation. Commit.
  • ECMGeneratePythonBindings: Remove broken RPATH settings. Commit.
  • Include Qt's translations in what we bundle on Android. Commit.
  • Fix FindLibMount without pkgconfig. Commit.
  • Don't use KDEInstallDirs6 for the new ECMGeneratePkgConfigFile test. Commit.
  • Fix reproducible build issue with ECMGeneratedHeaders. Commit.
Framework Integration
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KArchive
  • Kzip: fix reading of ZIP64 fields on certain architectures. Commit.
  • K7zip: fix/simplify GetUi*() functions. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Handle device open error. Commit.
  • Remove usage of QMutableListIterator. Commit.
KAuth
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
KBookmarks
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
KCalendarCore
  • Use isEmpty() vs "count() > 0". Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KCMUtils
  • We depend against kf6.8.0. Commit.
  • Split Quick library and QML module into different folders. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
KCodecs
  • It compiles fine without deprecated methods. Commit.
KColorScheme
  • Fix isKdePlatformTheme for Flatpaks. Commit. Fixes bug #494734
  • It compiles fine without deprecated methods. Commit.
  • Now we depend against qt6.6. Commit.
  • Ci: add Alpine/musl job. Commit.
KCompletion
  • Remove declaration of KLineEdit::setUrlDropsEnabled. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
  • Add missing find_dependency calls for private dependencies. Commit.
KConfig
  • Add QML_REGISTRATION option to the config macro documentation. Commit.
  • KWindowStateSaver: Increase the rate limit on the slow part of config saving. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Now we depend against qt6.6.0. Commit.
  • Fix restoration of maximization state for QtQuick windows (for real). Commit. Fixes bug #494359
KConfigWidgets
  • Combine doc comments. Commit.
  • KRecentFilesAction: allow to specify mimeType for urls. Commit. See bug #496179
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
KContacts
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
KCoreAddons
  • Provide more license for KAboutLicense. Commit.
  • Add 7d4a6f31521 to git-blame-ignore-revs. Commit.
  • Add Python bindings. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Kaboutdata: Add overload taking KAboutPerson and KAboutComponent. Commit.
  • Kfileutils: compare to basename in makeSuggestedName. Commit. Fixes bug #493270
  • Apply 1 suggestion(s) to 1 file(s). Commit.
  • Link with libnetwork on Haiku. Commit.
  • Don't put copyright statements if the license is not BSD. Commit.
KCrash
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
  • Add static CI build. Commit.
  • Disable X11 and link with libnetwork on Haiku. Commit.
KDav
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
  • Use const reference for headers. Commit.
  • Compare HTTP headers types case-insensitively. Commit.
KDBusAddons
  • It compiles fine without deprecated methods. Commit.
  • Disable X11 on Haiku also. Commit.
  • Extend timeout for --replace option. Commit.
KDeclarative
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
KDE Daemon
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
KDE SU
  • Build with POSITION_INDEPENDENT_CODE. Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
KDNSSD
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
KDocTools
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
KFileMetaData
  • Don't include quiet packages in feature_summary. Commit.
KGlobalAccel
  • It compiles fine without deprecated methods. Commit.
KGuiAddons
  • Add Python bindings. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Kmodifierkeyinfo: Update to v5 of the Wayland protocol. Commit. Fixes bug #483657. Fixes bug #488870
  • Don't try to access QDBusMessage if not successful reply. Commit.
KHolidays
  • Update holiday_cn_zh-cn: add newline. Commit.
  • Update holiday_cn_zh-cn for 2025 holidays. Commit.
  • Adds public holiday for Nigeria. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Holiday_bj_fr - update Benin holidays. Commit. Fixes bug #496260
  • Document HolidayRegion::rawHolidaysWithAstroSeasons(). Commit.
KI18n
  • Handle multiple country-specific locales for the same language correctly. Commit.
  • Look up Qt translations catalogs ourselves. Commit.
  • Add auto tests for Qt catalog loading. Commit.
  • Improve fallback handling for Qt translation catalog loading. Commit.
  • Fix license identifier. Commit.
  • Remove obsolete Qt translation catalogs. Commit.
  • Fix loading of Qt's translation catalogs on Android. Commit.
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
KIconThemes
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Now we depend against qt6.6.0. Commit.
KIdletime
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
  • Disable X11 and Wayland on Haiku also. Commit.
KImageformats
  • Jxl: Disable color conversion for animations. Commit.
  • Improve CMYK writing support. Commit.
  • Improved write test. Commit.
  • JXL: load error with some lossless file. Commit. See bug #496350
  • JXR: jxrlib cannot write HDP and WDP formats. Commit.
  • Heif: avoid crash in heif_image_handle_has_alpha_channel. Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
  • RGB: avoid to read wrong data. Commit.
  • JXL: Fix OSS Fuzz issue 377971416. Commit.
  • Fix compilation warnings. Commit.
  • JXR: Fix libraries link under FreeBSD. Commit.
  • JXL: fixed bug when saving grayscale images without color profile. Commit.
  • PFM: extended to half float format. Commit.
  • Rename SCT plugin for OSS-FUZZ. Commit.
  • PCX: support for more formats. Commit.
  • SCT: added read only support. Commit.
KIO
  • Adapt test to new error code. Commit.
  • [ftp] Give better error message when creating files is not allowed. Commit.
  • File_unix: check chown return when setting owner. Commit.
  • CommandLauncherJob: fail when launch an non-existing executable. Commit.
  • Don't static cast qobjects. Commit.
  • Kpropertiesdialog: fix user display to actually use the user data. Commit. Fixes bug #496745
  • Add autotest for parsing bug and actually report error status. Commit.
  • Fix out of bounds for KRunMX1::expandEscapedMacro. Commit. Fixes bug #495606
  • Kcoredirlister: Remove iterator assert, use if instead. Commit. Fixes bug #493319
  • Haiku support: Disable SHM, link to libnetwork, further fixes. Commit.
  • It compiles fine without deprecated methods. Commit.
  • KDirOperator: improve handling of forward/back mouse buttons. Commit. See bug #443169
  • KUrlNavigator: Fix Tab order. Commit.
  • Haiku build fixes. Commit.
  • [previewjob] Assert that path is absolute. Commit. See bug #490827
  • Deprecate http_update_cache. Commit.
  • KUrlNavigatorDropDownButton: Add text and tooltip. Commit.
Kirigami
  • Chip: Add visible hover state. Commit.
  • Fix accessibility of InlineMessage. Commit.
  • ActionMenuItem: make a11y press work. Commit.
  • PrivateActionToolButton: make a11y press work. Commit.
  • SelectableLabel: Allow disabling the built-in context menu. Commit.
  • Always use a ToolBar for pages on the stack. Commit.
  • SelectableLabel: Make selection persistent. Commit. Fixes bug #496214
  • PlaceholderMessage: Let use overwrite icon color. Commit.
  • PlaceholderMessage: Forward icon.width/icon.height to internal Icon. Commit.
  • NavigationTabBar: Fix warning related to assigning a Repeater instead of a AbstractButton. Commit.
  • Use border for keyboard active focus in NavigationTabButton. Commit.
  • SelectableLabel: Remove onLinkActivated. Commit.
  • [SelectableLabel] restore font property. Commit.
  • Add missing REQUIRED for ECM. Commit.
  • Remove Useless empty contentItem. Commit.
  • Fix mobile mode. Commit.
  • Fix doc for PlatformTheme::ColorSet. Commit.
  • ColumnView: Note that FixedColumns is the default value for columnResizeMode. Commit.
  • Add optional Breeze style import also for static builds. Commit.
KItemModels
  • It compiles fine without deprecated methods. Commit.
KItemViews
  • It compiles fine without deprecated methods. Commit.
  • Add Linux static CI build. Commit.
KJobWidgets
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Add a first basic autotest. Commit.
  • JobView: expose elapsedTime. Commit.
  • Disable X11 on Haiku also. Commit.
KNewStuff
  • Transaction: use cache2 not the deprecated legacy cache. Commit.
  • Do not finish the transaction before it actually did anything. Commit. Fixes bug #496551
  • Cache: become a facade for Cache2. Commit.
  • Use isEmpty() vs count() > 0. Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Providerbase: split done signal from loaded signal. Commit.
  • Staticxmlprovider: remove unused member. Commit.
  • Ci: add Alpine/musl job. Commit.
  • ResultsStream: Restore the providers upon ::fetchMore. Commit.
  • Fixup! the grand API refactor of 2024. Commit.
  • The grand API refactor of 2024. Commit.
  • Port test away from deprecated API. Commit.
  • Add missing KNEWSTUFFCORE_BUILD_DEPRECATED_SINCE. Commit.
  • Fix random timeouts in attica test. Commit.
  • Transaction: deprecate ambiguous install function. Commit.
KNotifications
  • Add Python bindings. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
  • Disable Canberra check for Haiku also. Commit.
KNotifyConfig
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KPackage
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
  • Fix copyright utils. Commit.
KParts
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KPlotting
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KPTY
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
KQuickCharts
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KRunner
  • Allow to set RunnerManager instance in model from outside. Commit. See bug #483147
  • It compiles fine without deprecated methods. Commit.
KService
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
KStatusNotifieritem
  • It compiles fine without deprecated methods. Commit.
  • Ci: add Alpine/musl job. Commit.
KSVG
  • It compiles fine without deprecated methods. Commit.
KTextEditor
  • Sort and remove duplicates in outRanges in Kate::TextBuffer::rangesForLine. Commit.
  • Add test case for line unwrapping crash. Commit.
  • Don't leave non-multiblock Kate::TextRange in m_buffer->m_multilineRanges. Commit.
  • Don't crash on insert at lastLine + 1. Commit. Fixes bug #496612
  • Avoid closeUrl() call. Commit.
  • Clear all references/uses of aboutToDeleteMovingInterfaceContent. Commit.
  • Align completion with the word being completed. Commit. Fixes bug #485885
  • Try to relax unstable test. Commit.
  • Use a QLabel for scrollbar linenumbers tooltip. Commit.
  • Add functions for jumping to next/prev blank line. Commit.
  • Disable ENABLE_KAUTH_DEFAULT on Haiku also. Commit.
  • Remove misleading dead code. Commit.
  • Fix crash if feedback or dyn attr is cleared before deletion. Commit. Fixes bug #495925
  • Fix ranges with dynamic attribute dont notify deletion. Commit.
  • Deprecate aboutToDeleteMovingInterfaceContent. Commit.
  • Remove m_ranges from buffer. Commit.
  • Dont take ownership of the MovingRange/MovingCursor. Commit.
  • Buffer: Remove m_invalidCursors. Commit.
  • Allow shifted numbers for Dvorak and Co. Commit. Fixes bug #388138
  • Keep hinting as set by the user. Commit. Fixes bug #482659
KTextTemplate
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
KTextWidgets
  • It compiles fine without deprecated methods. Commit.
  • Add missing find_dependency calls for private dependencies. Commit.
  • Add linux-qt6-static CI. Commit.
KUnitConversion
  • Add missing since documentation for Xpf currency. Commit.
  • Fix Xpf enum value. Commit.
  • Install python bindings into site-packages dir. Commit.
  • Add CFP franc to currencies list. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Add Python bindings. Commit.
KUserFeedback
  • Don't include quiet packages in feature_summary. Commit.
KWallet
  • It compiles fine without deprecated methods. Commit.
  • Link with libnetwork on Haiku. Commit.
  • Add global option to disable X11. Commit.
KWidgetsAddons
  • Fix since version for KAdjustingScrollArea. Commit.
  • Kmessagebox: Add option to force label to be plain text. Commit.
  • Install python bindings into site-packages dir. Commit.
  • Add python examples. Commit.
  • It compiles fine without deprecated methods. Commit.
  • Add Python bindings. Commit.
  • Ci: add Alpine/musl job. Commit.
  • KPageView: Strip mnemonics before matching search text. Commit.
  • KPasswordDialog: Vertically center prompt. Commit.
  • Introduce KAdjustingScrollArea. Commit.
  • Kratingwidget: Draw icon at native resolution. Commit.
KWindowSystem
  • Xcb: Be more strict about icon sizes. Commit.
  • Add manual test for activating window. Commit.
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
  • Disable Wayland and X11 on Haiku also. Commit.
  • Make use of QWaylandWindow::surfaceRoleCreated for setMainWindow. Commit.
KXMLGUI
  • KXmlGuiWindow: Create KHelpMenu without application data. Commit.
  • KHelpMenu: Use up-to-date application data if not set explicitly. Commit.
  • Add Python bindings. Commit.
  • Kbugreport: Specify what the second version number refers too. Commit.
  • AboutDialog: Add copy button for components info. Commit.
  • It compiles fine without deprecated methods. Commit.
  • About dialog: Put app specific components before generic components. Commit.
  • KHelpMenu: Allow showing and hiding the What's This menu entry. Commit.
  • KHelpMenu: Deprecate second constructor with bool parameter. Commit.
  • KHelpMenu: Deprecate constructor with unused parameter. Commit.
  • KHelpMenu: Remove unnecessary member variables. Commit.
  • KHelpMenu: Remove long dead support for a simple About text. Commit.
  • Ensure action insertion order is preserved. Commit.
  • Skip first column when resizing columns. Commit.
  • Simplify action storage in KActionCollection. Commit.
  • Add Linux static CI build. Commit.
  • Add component description to default components. Commit.
  • Simplify about data dialog. Commit.
  • [kactioncategory] Add new-style connect variants for addAction. Commit.
  • [kactioncategory] Deprecate functions that use KStandardAction. Commit.
  • [kactioncategory] Add overloads for KStandardActions. Commit.
Modem Manager Qt
  • It compiles fine without deprecated methods. Commit.
Network Manager Qt
  • Stop spamming about Unhandled property "VersionId". Commit.
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
Prison
  • It compiles fine without deprecated methods. Commit.
Purpose
  • [imgur] Improve error reporting. Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Clipboard: Also allow to export to the clipboard. Commit. Fixes bug #477553
  • Introduce a clipboard plugin. Commit.
  • AlternativesModel: Don't filter by fields that don't pertain to the current plugintype. Commit.
QQC2 Desktop Style
  • Org.kde.desktop: Add null contentItem checks to check/radio/switch controls. Commit.
  • Use null contentItem instead of empty Item. Commit.
  • Use Qt text rendering when high DPI scaling. Commit. Fixes bug #479891
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Add a SwipeDelegate. Commit.
  • Don't include quiet packages in feature_summary. Commit.
Solid
  • Battery: Add cycleCount. Commit.
  • Bump KF and QT versions in cem_set_disabled_deprecation_versions. Commit.
  • Fstab: Fix memory leak when a network or overlay mount has changed. Commit.
  • Fix build on Haiku. Commit.
  • Consistenly use correct include statements for libmount. Commit.
  • Add support for rclone mounts and fstab entries. Commit.
Sonnet
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
Syndication
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Ci: add Alpine/musl job. Commit.
Syntax Highlighting
  • Update odin highlighting. Commit.
  • The lua indenter was removed long ago in ktexteditor. Commit.
  • Odin: Fix numbers getting highlighted in the middle of words. Commit.
  • Highlight odin 'context' keyword differently. Commit.
  • Improve odin lang highlighting. Commit.
  • Bump KF and QT versions in ecm_set_disabled_deprecation_versions. Commit.
  • Cmake.xml: updates for the recently released CMake 3.31. Commit.
Threadweaver
  • It compiles fine without deprecated methods. Commit.
  • Build fix for Haiku. Commit.
Categories: FLOSS Project Planets

Dirk Eddelbuettel: #44: r2u For ML and MLops Talk

Planet Debian - Thu, 2024-12-12 17:02

Welcome to the 44th post in the R^4 series.

A few weeks ago, and following an informal ‘call for talks’ by James Lamb, I had an opportunity to talk about r2u to the Chicago ML and MLops meetup groups. You can find the slides here.

Over the last two and a half years, r2u has become a widely deployed mechanism in a number of settings, including (but not limited to) software testing via continuous integration and deployment on cloud servers, besides of course more standard use on local laptops or workstations. Thirty million downloads illustrate this. My thesis for the talk was that this extends equally to ML(ops), where automated deployments with no surprises and no hiccups are key for large-scale model training, evaluation, and of course production deployments.

In this context, I introduce r2u while giving credit to what came before it, to the existing alternatives (or ‘competitors’ for mindshare if one prefers that phrasing), and of course to what lies underneath it.

The central takeaway, I argue, is that r2u can and does take advantage of a unique situation: we can ‘join’ the package management task for the underlying (operating) system and the application domain, here R and its unique CRAN repository network. Other approaches can, and of course do, provide binaries, but by doing this outside the realm of the system package manager they can only arrive at a lesser integration (and I show a common error arising in that case). So where r2u is feasible, it dominates the alternatives (while the alternatives may well provide deployment on more platforms which, even when less integrated, may be of greater importance for some). As always, it all depends.

But the talk, and its slides, motivate and illustrate why we keep calling r2u by its slogan of r2u: Fast. Easy. Reliable. Pick All Three.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

FSF Blogs: IDAD 2024 - Dec. 20: For freedom, against restriction

GNU Planet! - Thu, 2024-12-12 15:50
Don't let computers go to waste and join us in fighting restriction on December 20 for the eighteenth International Day Against Digital Restrictions Management (IDAD).
Categories: FLOSS Project Planets

Drupal Association blog: New Critical Security Updates for Drupal 7 Highlight Importance of Drupal 7 Extended Support by Tag1

Planet Drupal - Thu, 2024-12-12 15:00

This blog post is published on behalf of Tag1.

As we count down to the end-of-life (EOL) for Drupal 7 on 5 January 2025, the Drupal Security Team has just released what is likely to be the final D7 updates from the community.

This latest security release includes important fixes for two D7 vulnerabilities: an XSS (cross-site scripting) vulnerability in Drupal core’s Overlay module and a potential object injection vulnerability, which, when combined with other vulnerabilities in Drupal core, contrib, or custom modules, could lead to Remote Code Execution. Tag1’s Ra Mänd and Fabian Franz both contributed to getting the security release out. The Drupal security team also issued multiple security releases for Drupal 7 contributed modules on the same day.
 

Starting January 2025, the Drupal Security team will no longer review reported issues or release security updates for Drupal 7 core or contrib modules. To address this, the Drupal Association has authorized Tag1 to be a D7 Extended Support Partner, ensuring your D7 sites stay protected with Tag1's Drupal 7 Extended Support (D7ES). We will continue to monitor for security vulnerabilities and provide updates and support to ensure your site remains safe and secure beyond January 2025.

The Critical Role of Drupal 7 Extended Support (D7ES)

This security release illustrates why the Drupal community established the Drupal 7 Extended Support program (D7ES) and authorized Tag1 to become a D7 Extended Support Partner in order to commercially assume the responsibilities of the Drupal Security Team. Simply put, the question isn't whether new security issues will be found but when. 

Through Tag1 D7ES, Tag1 will ensure that organizations can continue operating their Drupal 7 sites securely beyond the official EOL date, providing the critical security updates that every D7 site will inevitably need.

Why Tag1 is Your Optimal D7ES Partner

Tag1 stands apart in several crucial ways:

  • We have more people on the Drupal Security team than any other Drupal consulting company or D7ES provider and you have always relied on our team to fix security issues, including these latest updates.

  • We are responsible for much of the Drupal 7 codebase. Our team includes many of the key contributors to Drupal 7, including one of only a few core committers responsible for the platform's overall architecture and many of the core component and module maintainers.

  • We are the only D7ES provider with proven experience running Drupal Extended Support, having successfully managed D6 support for over 6 years post-EOL.

  • We created and will continue to maintain the QA and testing systems for Drupal 7, a critical component that ensures the reliability you expect from Drupal updates. You can trust that our updates will work on your operating system, version of php, database, etc. - the same way that you do today.

  • By choosing Tag1, you maintain as much continuity as possible - our experts will continue operating using processes similar to what we use to build and release Drupal today, minimizing changes to your workflows and release procedures.

The Path Forward

As we approach the EOL date, organizations running Drupal 7 sites must take proactive steps to ensure they remain secure. Enrolling in Tag1's D7ES program isn't just about maintaining security - it's about partnering with the team that has been integral to Drupal 7's security and stability from the beginning. We'll continue to provide the same level of expertise and attention to security that your organization has come to expect from Drupal.

Categories: FLOSS Project Planets

Matt Glaman: phpstan-drupal now supports PHPStan 2.0

Planet Drupal - Thu, 2024-12-12 11:42

PHPStan 2.0 was released a month ago, a massive milestone for the project. To learn about all the changes, I recommend reading the release announcement. phpstan-drupal now has a PHPStan 2.0 compatible release: https://github.com/mglaman/phpstan-drupal/releases/tag/2.0.0. The 1.x branch will be maintained as long as a version of Drupal Core uses it, at least until Drupal 10's end-of-life near the end of 2026. If applicable, I will backport bug fixes and features to 1.x.

Categories: FLOSS Project Planets

Matthew Garrett: When should we require that firmware be free?

Planet Debian - Thu, 2024-12-12 10:57
The distinction between hardware and software has historically been relatively easy to understand - hardware is the physical object that software runs on. This is made more complicated by the existence of programmable logic like FPGAs, but by and large things tend to fall into fairly neat categories if we're drawing that distinction.

Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, Firmware is software that provides low-level control of computing device hardware, and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.

How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.

But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.

So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?

I think there are two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The maximalist position is not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.

Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.

This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).

RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.

The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.

For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.

But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.

As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.

Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.

[1] Yes yes SMM

comments
Categories: FLOSS Project Planets

Matthew Garrett: Android privacy improvements break key attestation

Planet Debian - Thu, 2024-12-12 07:16
Sometimes you want to restrict access to something to a specific set of devices - for instance, you might want your corporate VPN to only be reachable from devices owned by your company. You can't really trust a device that self attests to its identity, for instance by reporting its MAC address or serial number, for a couple of reasons:
  • These aren't fixed - MAC addresses are trivially reprogrammable, and serial numbers are, at their most protected, typically stored in reprogrammable flash
  • A malicious device could simply lie about them
If we want a high degree of confidence that the device we're talking to really is the device it claims to be, we need something that's much harder to spoof. For devices with a TPM this is the TPM itself. Every TPM has an Endorsement Key (EK) that's associated with a certificate that chains back to the TPM manufacturer. By verifying that certificate path and having the TPM prove that it's in possession of the private half of the EK, we know that we're communicating with a genuine TPM[1].

Android has a broadly equivalent thing called ID Attestation. Android devices can generate a signed attestation that they have certain characteristics and identifiers, and this can be chained back to the manufacturer. Obviously providing signed proof of the device identifier is kind of problematic from a privacy perspective, so the short version[2] is that only apps installed using a corporate account rather than a normal user account are able to do this.

But that's still not ideal - the device identifiers involved included the IMEI and serial number of the device, and those could potentially be used to correlate devices across privacy boundaries since they're static[3] identifiers that are the same both inside a corporate work profile and in the normal user profile, and also remain static if you move between different employers and use the same phone[4]. So, since Android 12, ID Attestation includes an "Enterprise Specific ID" or ESID. The ESID is based on a hash of device-specific data plus the enterprise that the corporate work profile is associated with. If a device is enrolled with the same enterprise then this ID will remain static, if it's enrolled with a different enterprise it'll change, and it just doesn't exist outside the work profile at all. The other device identifiers are no longer exposed.

But device ID verification isn't enough to solve the underlying problem here. When we receive a device ID attestation we know that someone at the far end has possession of a device with that ID, but we don't know that that device is where the packets are originating. If our VPN simply has an API that asks for an attestation from a trusted device before routing packets, we could relay that request to a trusted device we have access to and then simply forward the resulting attestation to the VPN server[5]. We need some way to prove that the device trying to authenticate is actually that device.

The answer to this is key provenance attestation. If we can prove that an encryption key was generated on a trusted device, and that the private half of that key is stored in hardware and can't be exported, then using that key to establish a connection proves that we're actually communicating with a trusted device. TPMs are able to do this using the attestation keys generated in the Credential Activation process, giving us proof that a specific keypair was generated on a TPM that we've previously established is trusted.

Android again has an equivalent called Key Attestation. This doesn't quite work the same way as the TPM process - rather than being tied back to the same unique cryptographic identity, Android key attestation chains back through a separate cryptographic certificate chain but contains a statement about the device identity - including the IMEI and serial number. By comparing those to the values in the device ID attestation we know that the key is associated with a trusted device and we can now establish trust in that key.

"But Matthew", those of you who've been paying close attention may be saying, "Didn't Android 12 remove the IMEI and serial number from the device ID attestation?" And, well, congratulations, you were apparently paying more attention than Google. The key attestation no longer contains enough information to tie back to the device ID attestation, making it impossible to prove that a hardware-backed key is associated with a specific device ID attestation and its enterprise enrollment.

I don't think this was any sort of deliberate breakage, and it's probably more an example of shipping the org chart - my understanding is that device ID attestation and key attestation are implemented by different parts of the Android organisation and the impact of the ESID change (something that appears to be a legitimate improvement in privacy!) on key attestation was probably just not realised. But it's still a pain.

[1] Those of you paying attention may realise that what we're doing here is proving the identity of the TPM, not the identity of device it's associated with. Typically the TPM identity won't vary over the lifetime of the device, so having a one-time binding of those two identities (such as when a device is initially being provisioned) is sufficient. There's actually a spec for distributing Platform Certificates that allows device manufacturers to bind these together during manufacturing, but I last worked on those a few years back and don't know what the current state of the art there is

[2] Android has a bewildering array of different profile mechanisms, some of which are apparently deprecated, and I can never remember how any of this works, so you're not getting the long version

[3] Nominally, anyway. Cough.

[4] I wholeheartedly encourage people not to put work accounts on their personal phones, but I am a filthy hypocrite here

[5] Obviously if we have the ability to ask for attestation from a trusted device, we have access to a trusted device. Why not simply use the trusted device? The answer there may be that we've compromised one and want to do as little as possible on it in order to reduce the probability of triggering any sort of endpoint detection agent, or it may be because we want to run on a device with different security properties than those enforced on the trusted device.

comments
Categories: FLOSS Project Planets

Top articles at OpenSource.net in 2024

Open Source Initiative - Thu, 2024-12-12 06:46

OpenSource.net, a platform designed to foster knowledge sharing, was launched in September 2023. Led by Editor-in-Chief Nicole Martinelli, this platform has become a space for diverse perspectives and contributions. Here are some of the top articles published at OpenSource.net in 2024:

  • Business with Open Source
  • Community and projects
  • Sustainability and environment
  • Open Source AI

A special thank you to the authors who have contributed articles and to Cisco for sponsoring OpenSource.net. If you are interested in contributing articles on Open Source software, hardware, open culture, and open knowledge, please submit a proposal.

Categories: FLOSS Research

GNU Guix: The Shepherd 1.0.0 released!

GNU Planet! - Thu, 2024-12-12 06:02

Finally, twenty-one years after its inception (twenty-one!), the Shepherd leaves ZeroVer territory to enter a glorious 1.0 era. This 1.0.0 release is published today because we think Shepherd has become a solid tool, meeting user experience standards one has come to expect since systemd changed the game of free init systems and service managers alike. It’s also a major milestone for Guix, which has been relying on the Shepherd from a time when doing so counted as dogfooding.

To celebrate this release, the amazing Luis Felipe López Acevedo designed a new logo, available under CC-BY-SA, and the project got a proper web site!

Let’s first look at what the Shepherd actually is and what it can do for you.

At a glance

The Shepherd is a minimalist but featureful service manager and as such, it herds services: it keeps track of services, their state and their dependencies, and it can start, stop, and restart them when needed. It’s a simple job; doing it right and providing users with insight and control over services is a different story.

The Shepherd consists of two commands: shepherd is the daemon that manages services, and herd is the command that lets you interact with it to inspect and control the status of services. The shepherd command can run as the first process (PID 1) and serve as the “init system”, as is the case on Guix System; or it can manage services for unprivileged users, as is the case with Guix Home. For example, running herd status ntpd as root allows me to know what the Network Time Protocol (NTP) daemon is up to:

$ sudo herd status ntpd
● Status of ntpd:
  It is running since Fri 06 Dec 2024 02:08:08 PM CET (2 days ago).
  Main PID: 11359
  Command: /gnu/store/s4ra0g0ym1q1wh5jrqs60092x1nrb8h9-ntp-4.2.8p18/bin/ntpd -n -c /gnu/store/7ac2i2c6dp2f9006llg3m5vkrna7pjbf-ntpd.conf -u ntpd -g
  It is enabled.
  Provides: ntpd
  Requires: user-processes networking
  Custom action: configuration
  Will be respawned.
  Log file: /var/log/ntpd.log
  Recent messages (use '-n' to view more or less):
    2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 25 tun0 128.93.179.24:123
    2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 26 tun0 [fe80::e6b7:4575:77ef:eaf4%12]:123
    2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: new interface(s) found: waking up resolver
    2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 25 tun0, [128.93.179.24]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
    2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 26 tun0, [fe80::e6b7:4575:77ef:eaf4%12]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs

It’s running, and it’s logging messages: the latest ones are shown here and I can open /var/log/ntpd.log to view more. Running herd stop ntpd would terminate the ntpd process, and there’s also a start and a restart action.

Services can also have custom actions; in the example above, we see there’s a configuration action. As it turns out, that action is a handy way to get the file name of the ntpd configuration file:

$ head -2 $(sudo herd configuration ntpd)
driftfile /var/run/ntpd/ntp.drift
pool 2.guix.pool.ntp.org iburst

Of course a typical system runs quite a few services, many of which depend on one another. The herd graph command returns a representation of that service dependency graph that can be piped to dot or xdot to visualize it; here’s what I get on my laptop:

It’s quite a big graph (you can zoom in for details!) but we can learn a few things from it. Each node in the graph is a service; rectangles are for “regular” services (typically daemons like ntpd), round nodes correspond to one-shot services (services that perform one action and immediately stop), and diamonds are for timed services (services that execute code periodically).

Blurring the user/developer line

A unique feature of the Shepherd is that you configure and extend it in its own implementation language: in Guile Scheme. That does not mean you need to be an expert in that programming language to get started. Instead, we try to make sure anyone can start simple for their configuration file and gradually get to learn more if and when they feel the need for it. With this approach, we keep the user in the loop, as Andy Wingo put it.

A Shepherd configuration file is a Scheme snippet that goes like this:

(register-services
 (list (service '(ntpd) …)
       …))
(start-in-the-background '(ntpd …))

Here we define ntpd and get it started as soon as shepherd has read the configuration file. The ellipses can be filled in with more services.

As an example, our ntpd service is defined like this:

(service '(ntpd)
  #:documentation "Run the Network Time Protocol (NTP) daemon."
  #:requirement '(user-processes networking)
  #:start (make-forkexec-constructor
           (list "…/bin/ntpd" "-n" "-c" "/…/…-ntpd.conf"
                 "-u" "ntpd" "-g")
           #:log-file "/var/log/ntpd.log")
  #:stop (make-kill-destructor)
  #:respawn? #t)

The important parts here are the #:start bit, which says how to start the service, and the #:stop bit, which says how to stop it. In this case we're just spawning the ntpd program, but other startup mechanisms are supported by default: inetd, socket activation à la systemd, and timers. Check out the manual for examples and a reference.

There’s no limit to what #:start and #:stop can do. In Guix System you’ll find services that run daemons in containers, that mount/unmount file systems (as can be guessed from the graph above), that set up/tear down a static networking configuration, and a variety of other things. The Swineherd project goes as far as extending the Shepherd to turn it into a tool to manage system containers—similar to what the Docker daemon does.

Note that when writing service definitions for Guix System and Guix Home, you’re targeting a thin layer above the Shepherd programming interface. As is customary in Guix, this is multi-stage programming: G-expressions specified in the start and stop fields are staged and make it into the resulting Shepherd configuration file.

New since 0.10.x

For those of you who were already using the Shepherd, here are the highlights compared to the 0.10.x series:

  • Support for timed services has been added: these services spawn a command or run Scheme code periodically according to a predefined calendar.
  • herd status SERVICE now shows high-level information about services (main PID, command, addresses it is listening to, etc.) instead of its mere “running value”. It also shows recently-logged messages.
  • To make it easier to discover functionality, that command also displays custom actions applicable to the service, if any. It also lets you know if a replacement is pending, in which case you can restart the service to upgrade it.
  • herd status root is no longer synonymous with herd status; instead it shows information about the shepherd process itself.
  • On Linux, reboot --kexec lets you reboot straight into a new Linux kernel previously loaded with kexec --load.

The service collection has grown:

  • The new log rotation service is responsible for periodically rotating log files, compressing them, and eventually deleting them. It’s very much like similar log rotation tools from the 80’s since shepherd logs to plain text files like in the good ol’ days.

    There’s a couple of be benefits that come from its integration into the Shepherd. First, it already knows all the files that services log to, so no additional configuration is needed to teach it about these files. Second, log rotation is race free: no single line of log can be lost in the process.

  • The new system log service does what's traditionally delegated to a separate syslogd program. The advantage of having it in shepherd is that it can start logging earlier and integrates nicely with the rest of the system.

  • The timer service provides functionality similar to the venerable at command, allowing you to run a command at a particular time:

herd schedule timer at 07:00 -- mpg123 alarm.mp3
  • The transient service maker lets you run a command in the background as a transient service (it is similar in spirit to the systemd-run command):
herd spawn transient -d $PWD -- make -j4
  • The GOOPS interface that was deprecated in 0.10.x is now gone.

As always, the NEWS file has additional details.

In the coming weeks, we will most likely gradually move service definitions in Guix from mcron to timed services and similarly replace Rottlog and syslogd. This should be an improvement for Guix users and system administrators!

Cute code

I did mention that the Shepherd is minimalist, and it really is: 7.4K lines of Scheme, excluding tests, according to SLOCCount. This is in large part thanks to the use of a high-level memory-safe language and due to the fact that it’s extensible—peripheral features can live outside the Shepherd.

Significant benefits also come from the concurrency framework: the concurrent sequential processes (CSP) model and Fibers. Internally, the state of each service is encapsulated in a fiber. Accessing a service’s state amounts to sending a message to its fiber. This way to structure code is itself very much inspired by the actor model. This results in simpler code (no dreaded event loop, no callback hell) and better separation of concern.

Using a high-level framework like Fibers does come with its challenges. For example, we had the case of a memory leak in Fibers under certain conditions, and we certainly don’t want that in PID 1. But the challenge really lies in squashing those low-level bugs so that the foundation is solid. The Shepherd itself is free from such low-level issues; its logic is easy to reason about and that alone is immensely helpful, it allows us to extend the code without fear, and it avoids concurrency bugs that plague programs written in the more common event-loop-with-callbacks style.

In fact, thanks to all this, the Shepherd is probably the coolest init system to hack on. It even comes with a REPL for live hacking!

What’s next

There’s a number of down-to-earth improvements that can be made in the Shepherd, such as adding support for dynamically-reconfigurable services (being able to restart a service but with different options), integration with control groups (“cgroups”) on Linux, proper integration for software suspend, etc.

In the longer run, we envision an exciting journey towards a distributed and capability-style Shepherd. Spritely Goblins provides the foundation for this; using it looks like a natural continuation of the design work of the Shepherd: Goblins is an actor model framework! Juliana Sims has been working on adapting the Shepherd to Goblins and we’re eager to see what comes out of it in the coming year. Stay tuned!

Enjoy!

In the meantime, we hope you enjoy the Shepherd 1.0 as much as we enjoyed making it. Four people contributed code that led to this release, but there are other ways to help: through graphics and web design, translation, documentation, and more. Join us!

Originally published on the Shepherd web site.

Categories: FLOSS Project Planets

PyCharm: Introduction to Sentiment Analysis in Python

Planet Python - Thu, 2024-12-12 05:01

Sentiment analysis is one of the most popular ways to analyze text. It allows us to see at a glance how people are feeling across a wide range of areas and has useful applications in fields like customer service, market and product research, and competitive analysis.

Like any area of natural language processing (NLP), sentiment analysis can get complex. Luckily, Python has excellent packages and tools that make this branch of NLP much more approachable.

In this blog post, we’ll explore some of the most popular packages for analyzing sentiment in Python, how they work, and how you can train your own sentiment analysis model using state-of-the-art techniques. We’ll also look at some PyCharm features that make working with these packages easier and faster.

What is sentiment analysis?

Sentiment analysis is the process of analyzing a piece of text to determine its emotional tone. As you can probably see from this definition, sentiment analysis is a very broad field that incorporates a wide variety of methods within the field of natural language processing.

There are many ways to define “emotional tone”. The most commonly used methods determine the valence or polarity of a piece of text – that is, how positive or negative the sentiment expressed in a text is. Emotional tone is also usually treated as a text classification problem, where text is categorized as either positive or negative.

Take the following Amazon product review:

This is obviously not a happy customer, and sentiment analysis techniques would classify this review as negative.

Contrast this with a much more satisfied buyer:

This time, sentiment analysis techniques would classify this as positive.

Different types of sentiment analysis

There are multiple ways of extracting emotional information from text. Let’s review a few of the most important ones.

Ways of defining sentiment

First, sentiment analysis approaches have several different ways of defining sentiment or emotion.

Binary: This is where the valence of a document is divided into two categories, either positive or negative, as with the SST-2 dataset. Related to this are classifications of valence that add a neutral class (where a text expresses no sentiment about a topic) or even a conflict class (where a text expresses both positive and negative sentiment about a topic).

Some sentiment analyzers use a related measure to classify texts into subjective or objective.

Fine-grained: This term describes several different ways of approaching sentiment analysis, but here it refers to breaking down positive and negative valence into a Likert scale. A well-known example of this is the SST-5 dataset, which uses a five-point Likert scale with the classes very positive, positive, neutral, negative, and very negative.

Continuous: The valence of a piece of text can also be measured continuously, with scores indicating how positive or negative the sentiment of the writer was. For example, the VADER sentiment analyzer gives a piece of text a score between –1 (strongly negative) and 1 (strongly positive), with scores close to 0 indicating a neutral sentiment.

Emotion-based: Also known as emotion detection or emotion identification, this approach attempts to detect the specific emotion being expressed in a piece of text. You can approach this in two ways. Categorical emotion detection tries to classify the sentiment expressed by a text into one of a handful of discrete emotions, usually based on the Ekman model, which includes anger, disgust, fear, joy, sadness, and surprise. A number of datasets exist for this type of emotion detection. Dimensional emotional detection is less commonly used in sentiment analysis and instead tries to measure three emotional aspects of a piece of text: polarity, arousal (how exciting a feeling is), and dominance (how restricted the emotional expression is).

Levels of analysis

We can also consider different levels at which we can analyze a piece of text. To understand this better, let’s consider another review of the coffee maker:

Document-level: This is the most basic level of analysis, where one sentiment for an entire piece of text will be returned. Document-level analysis might be fine for very short pieces of text, such as Tweets, but can give misleading answers if there is any mixed sentiment. For example, if we based the sentiment analysis for this review on the whole document, it would likely be classified as neutral or conflict, as we have two opposing sentiments about the same coffee machine.

Sentence-level: This is where the sentiment for each sentence is predicted separately. For the coffee machine review, sentence-level analysis would tell us that the reviewer felt positively about some parts of the product but negatively about others. However, this analysis doesn’t tell us what things the reviewer liked and disliked about the coffee machine.

Aspect-based: This type of sentiment analysis dives deeper into a piece of text and tries to understand the sentiment of users about specific aspects. For our review of the coffee maker, the reviewer mentioned two aspects: appearance and noise. By extracting these aspects, we have more information about what the user specifically did and did not like. They had a positive sentiment about the machine’s appearance but a negative sentiment about the noise it made.

Coupling sentiment analysis with other NLP techniques

Intent-based: In this final type of sentiment analysis, the text is classified in two ways: in terms of the sentiment being expressed, and the topic of the text. For example, if a telecommunication company receives a ticket complaining about how often their service goes down, they could classify the text intent or topic as service reliability and the sentiment as negative. As with aspect-based sentiment analysis, this analysis gives the company much more information than knowing whether their customers are generally happy or unhappy.

Applications of sentiment analysis

By now, you can probably already think of some potential use cases for sentiment analysis. Basically, it can be used anywhere that you could get text feedback or opinions about a topic. Organizations or individuals can use sentiment analysis to do social media monitoring and see how people feel about a brand, government entity, or topic.

Customer feedback analysis can be used to find out the sentiments expressed in feedback or tickets. Product reviews can be analyzed to see how satisfied or dissatisfied people are with a company’s products. Finally, sentiment analysis can be a key component in market research and competitive analysis, where how people feel about emerging trends, features, and competitors can help guide a company’s strategies.

How does sentiment analysis work?

At a general level, sentiment analysis operates by linking words (or, in more sophisticated models, the overall tone of a text) to an emotion. The most common approaches to sentiment analysis fall into one of the three methods below.

Lexicon-based approaches

These methods rely on a lexicon that includes sentiment scores for a range of words. They combine these scores using a set of rules to get the overall sentiment for a piece of text. These methods tend to be very fast and also have the advantage of yielding more fine-grained continuous sentiment scores. However, as the lexicons need to be handcrafted, they can be time-consuming and expensive to produce.
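
To make the idea concrete, here is a toy sketch of a lexicon-based scorer. The word scores and the booster rule are invented for illustration and are far simpler than what real lexicons and rule sets such as VADER's use.

# Toy lexicon-based scorer, for illustration only.
lexicon = {"love": 3.0, "great": 2.5, "good": 2.0, "bad": -2.0, "awful": -3.0, "hate": -3.0}
boosters = {"really", "extremely", "very"}

def toy_sentiment(text):
    total, count, boost = 0.0, 0, 1.0
    for word in text.lower().split():
        if word in boosters:
            boost = 1.5  # amplify the next sentiment-bearing word
            continue
        if word in lexicon:
            total += lexicon[word] * boost
            count += 1
        boost = 1.0
    return total / count if count else 0.0

print(toy_sentiment("I really love this coffee maker"))  # boosted positive
print(toy_sentiment("The carafe is awful"))              # negative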

Machine learning models

These methods train a machine learning model, most commonly a Naive Bayes classifier, on a dataset that contains text and their sentiment labels, such as movie reviews. In this model, texts are generally classified as positive, negative, and sometimes neutral. These models also tend to be very fast, but as they usually don’t take into account the relationship between words in the input, they may struggle with more complex texts that involve qualifiers and negations.

Large language models

These methods rely on fine-tuning a pre-trained transformer-based large language model on the same datasets used to train the machine learning classifiers mentioned earlier. These sophisticated models are capable of modeling complex relationships between words in a piece of text but tend to be slower than the other two methods.

Sentiment analysis in Python

Python has a rich ecosystem of packages for NLP, meaning you are spoiled for choice when doing sentiment analysis in this language.

Let’s review some of the most popular Python packages for sentiment analysis.

The best Python libraries for sentiment analysis

VADER

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a popular lexicon-based sentiment analyzer. Built into the powerful NLTK package, this analyzer returns four sentiment scores: the degree to which the text was positive, neutral, or negative, as well as a compound sentiment score. The positive, neutral, and negative scores range from 0 to 1 and indicate the proportion of the text that was positive, neutral, or negative. The compound score ranges from –1 (extremely negative) to 1 (extremely positive) and indicates the overall sentiment valence of the text.

Let’s look at a basic example of how it works:

from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

We first need to download the VADER lexicon.

nltk.download('vader_lexicon')

We can then instantiate the VADER SentimentIntensityAnalyzer() and extract the sentiment scores using the polarity_scores() method.

analyzer = SentimentIntensityAnalyzer()
sentence = "I love PyCharm! It's my favorite Python IDE."
sentiment_scores = analyzer.polarity_scores(sentence)
print(sentiment_scores)

{'neg': 0.0, 'neu': 0.572, 'pos': 0.428, 'compound': 0.6696}

We can see that VADER has given this piece of text an overall sentiment score of 0.67 and classified its contents as 43% positive, 57% neutral, and 0% negative.

VADER works by looking up the sentiment scores for each word in its lexicon and combining them using a nuanced set of rules. For example, qualifiers can increase or decrease the intensity of a word’s sentiment, so a qualifier such as “a bit” before a word would decrease the sentiment intensity, but “extremely” would amplify it.
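
You can see this behavior by scoring a few variations of the same sentence (using the analyzer object created above). Whether a particular qualifier dampens or amplifies the score depends on the lexicon and rules shipped with your VADER version, so treat the comparison, not the exact numbers, as the point.

for text in [
    "The coffee maker is good.",
    "The coffee maker is slightly good.",
    "The coffee maker is extremely good.",
]:
    print(text, analyzer.polarity_scores(text)["compound"])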

VADER’s lexicon includes abbreviations such as “smh” (shaking my head) and emojis, making it particularly suitable for social media text. VADER’s main limitation is that it doesn’t work for languages other than English, but you can use projects such as vader-multi as an alternative. I wrote about how VADER works if you’re interested in taking a deeper dive into this package.

NLTK

Additionally, you can use NLTK to train your own machine learning-based sentiment classifier, using classifiers from scikit-learn.

There are many ways of processing the text to feed into these models, but the simplest way is doing it based on the words that are present in the text, a type of text modeling called the bag-of-words approach. The most straightforward type of bag-of-words modeling is binary vectorization, where each word is treated as a feature, with the value of that feature being either 0 or 1 (whether the word is absent or present in the text, respectively).
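
As a minimal sketch of binary vectorization, here is what it looks like with scikit-learn's CountVectorizer; the two example sentences are arbitrary.

from sklearn.feature_extraction.text import CountVectorizer

texts = ["I love this IDE", "I do not love this bug"]
vectorizer = CountVectorizer(binary=True)  # 1 if a word occurs in the text, 0 otherwise
features = vectorizer.fit_transform(texts)

print(vectorizer.get_feature_names_out())
print(features.toarray())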

If you’re new to working with text data and NLP, and you’d like more information about how text can be converted into inputs for machine learning models, I gave a talk on this topic that provides a gentle introduction.

You can see an example in the NLTK documentation, where a Naive Bayes classifier is trained to predict whether a piece of text is subjective or objective. In this example, they add an additional negation qualifier to some of the terms based on rules which indicate whether that word or character is likely involved in negating a sentiment expressed elsewhere in the text. Real Python also has a sentiment analysis tutorial on training your own classifiers using NLTK, if you want to learn more about this topic.
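
If you want to try this yourself, here is a rough sketch (not the NLTK documentation example) that trains a scikit-learn Naive Bayes classifier on the NLTK movie_reviews corpus using binary bag-of-words features; the test split size and random seed are arbitrary choices.

import nltk
from nltk.corpus import movie_reviews
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

nltk.download("movie_reviews")

# The corpus labels each review as "pos" or "neg".
texts = [movie_reviews.raw(fid) for fid in movie_reviews.fileids()]
labels = [movie_reviews.categories(fid)[0] for fid in movie_reviews.fileids()]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

vectorizer = CountVectorizer(binary=True)
classifier = MultinomialNB()
classifier.fit(vectorizer.fit_transform(X_train), y_train)

predictions = classifier.predict(vectorizer.transform(X_test))
print(accuracy_score(y_test, predictions))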

Pattern and TextBlob

The Pattern package provides another lexicon-based approach to analyzing sentiment. It uses the SentiWordNet lexicon, where each synonym group (synset) from WordNet is assigned a score for positivity, negativity, and objectivity. The positive and negative scores for each word are combined using a series of rules to give a final polarity score. Similarly, the objectivity score for each word is combined to give a final subjectivity score.

As WordNet contains part-of-speech information, the rules can take into account whether adjectives or adverbs preceding a word modify its sentiment. The ruleset also considers negations, exclamation marks, and emojis, and even includes some rules to handle idioms and sarcasm.

However, Pattern as a standalone library is only compatible with Python 3.6. As such, the most common way to use Pattern is through TextBlob. By default, the TextBlob sentiment analyzer uses its own implementation of the Pattern library to generate sentiment scores.

Let’s have a look at this in action:

from textblob import TextBlob

You can see that we run the TextBlob method over our text, and then extract the sentiment using the sentiment attribute.

pattern_blob = TextBlob("I love PyCharm! It's my favorite Python IDE.")
sentiment = pattern_blob.sentiment

print(f"Polarity: {sentiment.polarity}")
print(f"Subjectivity: {sentiment.subjectivity}")

Polarity: 0.625
Subjectivity: 0.6

For our example sentence, Pattern in TextBlob gives us a polarity score of 0.625 (relatively close to the score given by VADER), and a subjectivity score of 0.6.

But there’s also a second way of getting sentiment scores in TextBlob. This package also includes a pre-trained Naive Bayes classifier, which will label a piece of text as either positive or negative, and give you the probability of the text being either positive or negative.

To use this method, we first need to download both the punkt module and the movie-reviews dataset from NLTK, which is used to train this model.

import nltk
nltk.download('movie_reviews')
nltk.download('punkt')

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

Once again, we need to run TextBlob over our text, but this time we add the argument analyzer=NaiveBayesAnalyzer(). Then, as before, we use the sentiment attribute to extract the sentiment scores.

nb_blob = TextBlob("I love PyCharm! It's my favorite Python IDE.", analyzer=NaiveBayesAnalyzer())
sentiment = nb_blob.sentiment
print(sentiment)

Sentiment(classification='pos', p_pos=0.5851800554016624, p_neg=0.4148199445983381)

This time we end up with a label of pos (positive), with the model predicting that the text has a 59% probability of being positive and a 41% probability of being negative.

spaCy

Another option is to use spaCy for sentiment analysis. spaCy is another popular package for NLP in Python, and has a wide range of options for processing text.

The first method is by using the spacytextblob plugin to use the TextBlob sentiment analyzer as part of your spaCy pipeline. Before you can do this, you’ll first need to install both spacy and spacytextblob and download the appropriate language model.

import spacy
import spacy.cli
from spacytextblob.spacytextblob import SpacyTextBlob

spacy.cli.download("en_core_web_sm")

We then load in this language model and add spacytextblob to our text processing pipeline. TextBlob can be used through spaCy’s pipe method, which means we can include it as part of a more complex text processing pipeline, including preprocessing steps such as part-of-speech tagging, lemmatization, and named-entity recognition. Preprocessing can normalize and enrich text, helping downstream models to get the most information out of the text inputs.

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

For now, we’ll just analyze our sample sentence without preprocessing:

doc = nlp("I love PyCharm! It's my favorite Python IDE.")

print('Polarity: ', doc._.polarity)
print('Subjectivity: ', doc._.subjectivity)

Polarity: 0.625
Subjectivity: 0.6

We get the same results as when using TextBlob above.

A second way we can do sentiment analysis in spaCy is by training our own model using the TextCategorizer class. This allows you to train a range of spaCy-created models using a sentiment analysis training set. Again, as this can be used as part of the spaCy pipeline, you have many options for pre-processing your text before training your model.
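
As a rough sketch, training a small textcat component in spaCy 3 can look like the following. The two training sentences, label names, and number of epochs are placeholders, and a real project would use a proper dataset and spaCy's config-driven training workflow.

import random

import spacy
from spacy.training import Example

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

# Placeholder training data; a real model needs far more examples.
train_data = [
    ("I love this product", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("This was a waste of money", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

def get_examples():
    return [Example.from_dict(nlp.make_doc(text), annotations) for text, annotations in train_data]

optimizer = nlp.initialize(get_examples)
for epoch in range(20):
    random.shuffle(train_data)
    for text, annotations in train_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

print(nlp("My favorite Python IDE").cats)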

Finally, you can use large language models to do sentiment analysis through spacy-llm. This allows you to prompt a variety of proprietary large language models (LLMs) from OpenAI, Anthropic, Cohere, and Google to perform sentiment analysis over your texts.

This approach works slightly differently from the other methods we’ve discussed. Instead of training the model, we can use generalist models like GPT-4 to predict the sentiment of a text. You can do this either through zero-shot learning (where a prompt but no examples are passed to the model) or few-shot learning (where a prompt and a number of examples are passed to the model).

Transformers

The final Python package for sentiment analysis we’ll discuss is Transformers from Hugging Face.

Hugging Face hosts all major open-source LLMs for free use (among other models, including computer vision and audio models), and provides a platform for training, deploying, and sharing these models. Its Transformers package offers a wide range of functionality (including sentiment analysis) for working with the LLMs hosted by Hugging Face.
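
As a minimal sketch, sentiment analysis through the Transformers pipeline API can look like this. If you don't pass a model name, the library downloads a default English sentiment model on first use, so for anything beyond experimentation you would normally pin a specific model.

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default model on first use

print(classifier("I love PyCharm! It's my favorite Python IDE."))
print(classifier("This coffee maker is frustratingly difficult to use."))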

Understanding the results of sentiment analyzers

Now that we’ve covered all of the ways you can do sentiment analysis in Python, you might be wondering, “How can I apply this to my own data?”

To understand this, let’s use PyCharm to compare two packages, VADER and TextBlob. Their multiple sentiment scores offer us a few different perspectives on our data. We’ll use these packages to analyze the Amazon reviews dataset.

PyCharm Professional is a powerful Python IDE for data science that supports advanced Python code completion, inspections and debugging, rich databases, Jupyter, Git, Conda, and more – all out of the box. In addition to these, you’ll also get incredibly useful features like our DataFrame Column Statistics and Chart View, as well as Hugging Face integrations that make working with LLMs much quicker and easier. In this blog post, we’ll explore PyCharm’s advanced features for working with dataframes, which will allow us to get a quick overview of how our sentiment scores are distributed between the two packages.

If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24. You’ll then receive an activation code via email.

Activate your 3-month subscription

The first thing we need to do is load in the data. We can use the load_dataset() method from the Datasets package to download this data from the Hugging Face Hub.

from datasets import load_dataset

amazon = load_dataset("fancyzhx/amazon_polarity")

You can hover over the name of the dataset to see the Hugging Face dataset card right inside PyCharm, providing you with a convenient way to get information about Hugging Face assets without leaving the IDE.

We can see the contents of this dataset here:

amazon

DatasetDict({
    train: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 3600000
    })
    test: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 400000
    })
})

The training dataset has 3.6 million observations, and the test dataset contains 400,000. We’ll be working with the training dataset in this tutorial.

We’ll now load in the VADER SentimentIntensityAnalyzer and the TextBlob method.

from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

nltk.download("vader_lexicon")
analyzer = SentimentIntensityAnalyzer()

from textblob import TextBlob

The training dataset has too many observations to comfortably visualize, so we’ll take a random sample of 1,000 reviews to represent the general sentiment of all the reviewers.

from random import sample

sample_reviews = sample(amazon["train"]["content"], 1000)

Let’s now get the VADER and TextBlob scores for each of these reviews. We’ll loop over each review text, run them through the sentiment analyzers, and then attach the scores to a dedicated list.

vader_neg = []
vader_neu = []
vader_pos = []
vader_compound = []
textblob_polarity = []
textblob_subjectivity = []

for review in sample_reviews:
    vader_sent = analyzer.polarity_scores(review)
    vader_neg += [vader_sent["neg"]]
    vader_neu += [vader_sent["neu"]]
    vader_pos += [vader_sent["pos"]]
    vader_compound += [vader_sent["compound"]]

    textblob_sent = TextBlob(review).sentiment
    textblob_polarity += [textblob_sent.polarity]
    textblob_subjectivity += [textblob_sent.subjectivity]

We’ll then pop each of these lists into a pandas DataFrame as a separate column:

import pandas as pd

sent_scores = pd.DataFrame({
    "vader_neg": vader_neg,
    "vader_neu": vader_neu,
    "vader_pos": vader_pos,
    "vader_compound": vader_compound,
    "textblob_polarity": textblob_polarity,
    "textblob_subjectivity": textblob_subjectivity
})

Now, we’re ready to start exploring our results.

Typically, this would be the point where we’d start creating a bunch of code for exploratory data analysis. This might be done using pandas’ describe method to get summary statistics over our columns, and writing Matplotlib or seaborn code to visualize our results. However, PyCharm has some features to speed this whole thing up.
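
For comparison, a code-only version of that exploration might look like this (assuming the sent_scores DataFrame from above, and that seaborn and Matplotlib are installed):

import matplotlib.pyplot as plt
import seaborn as sns

# Summary statistics for every score column.
print(sent_scores.describe())

# Distribution of the VADER compound score.
sns.histplot(sent_scores["vader_compound"])
plt.show()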

Let’s go ahead and print our DataFrame.

sent_scores

We can see a button in the top right-hand corner, called Show Column Statistics. Clicking this gives us two different options: Compact and Detailed. Let’s select Detailed.

Now we have summary statistics provided as part of our column headers! Looking at these, we can see the VADER compound score has a mean of 0.4 (median = 0.6), while the TextBlob polarity score provides a mean of 0.2 (median = 0.2).

This result indicates that, on average, VADER tends to estimate the same set of reviews more positively than TextBlob does. It also shows that for both sentiment analyzers, we likely have more positive reviews than negative ones – we can dive into this in more detail by checking some visualizations.

Another PyCharm feature we can use is the DataFrame Chart View. The button for this function is in the top left-hand corner.

When we click on the button, we switch over to the chart editor. From here, we can create no-code visualizations straight from our DataFrame.

Let’s start with VADER’s compound score. To start creating this chart, go to Show Series Settings in the top right-hand corner.

Remove the default values for X Axis and Y Axis. Replace the X Axis value with vader_compound, and the Y Axis value with vader_compound. Click on the arrow next to the variable name in the Y Axis field, and select count.

Finally, select Histogram from the chart icons, just under Series Settings. We likely have a bimodal distribution for the VADER compound score, with a slight peak around –0.8 and a much larger one around 0.9. This peak likely represents the split of negative and positive reviews. There are also far more positive reviews than negative.

Let’s repeat the same exercise and create a histogram to see the distribution of the TextBlob polarity scores.

In contrast, TextBlob tends to rate most reviews as neutral, with very few reviews being strongly positive or negative. To understand why we have a discrepancy in the scores these two sentiment analyzers provide, let’s look at a review VADER rated as strongly positive and another that VADER rated strongly negative but that TextBlob rated as neutral.

We’ll get the index of the first review where VADER rated them as positive but TextBlob rated them as neutral:

sent_scores[(sent_scores["vader_compound"] >= 0.8) & (sent_scores["textblob_polarity"].between(-0.1, 0.1))].index[0] 42

Next, we get the index of the first review where VADER rated them as negative but TextBlob as neutral:

sent_scores[(sent_scores["vader_compound"] <= -0.8) & (sent_scores["textblob_polarity"].between(-0.1, 0.1))].index[0] 0

Let’s first retrieve the positive review:

sample_reviews[42]

"I love carpet sweepers for a fast clean up and a way to conserve energy. The Ewbank Multi-Sweep is a solid, well built appliance. However, if you have pets, you will find that it takes more time cleaning the sweeper than it does to actually sweep the room. The Ewbank does pick up pet hair most effectively but emptying it is a bit awkward. You need to take a rag to clean out both dirt trays and then you need a small tooth comb to pull the hair out of the brushes and the wheels. To do a proper cleaning takes quite a bit of time. My old Bissell is easier to clean when it comes to pet hair and it does a great job. If you do not have pets, I would recommend this product because it is definitely well made and for small cleanups, it would suffice. For those who complain about appliances being made of plastic, unfortunately, these days, that's the norm. It's not great and plastic definitely does not hold up but, sadly, product quality is no longer a priority in business."

This review seems mixed, but is overall somewhat positive.

Now, let’s look at the negative review:

sample_reviews[0]

'The only redeeming feature of this Cuisinart 4-cup coffee maker is the sleek black and silver design. After that, it rapidly goes downhill. It is frustratingly difficult to pour water from the carafe into the chamber unless it\'s done extremely slow and with accurate positioning. Even then, water still tends to dribble out and create a mess. The lid, itself, is VERY poorly designed with it\'s molded, round "grip" to supposedly remove the lid from the carafe. The only way I can remove it is to insert a sharp pointed object into one of the front pouring holes and pry it off! I\'ve also occasionally had a problem with the water not filtering down through the grounds, creating a coffee ground lake in the upper chamber and a mess below. I think the designer should go back to the drawing-board for this one.'

This review is unambiguously negative. From comparing the two, VADER appears more accurate, but it does tend to overly prioritize positive terms in a piece of text.

The final thing we can consider is how subjective versus objective each review is. We’ll do this by creating a histogram of TextBlob’s subjectivity score.

Interestingly, there is a good distribution of subjectivity in the reviews, with most reviews being a mixture of subjective and objective writing. A small number of reviews are also very subjective (close to 1) or very objective (close to 0).

These scores between them give us a nice way of cutting up the data. If you need to know the objective things that people did and did not like about the products, you could look at the reviews with a low subjectivity score and VADER compound scores close to 1 and –1, respectively.

In contrast, if you want to know what people's emotional reactions to the products are, you could take those with a high subjectivity score and high and low VADER compound scores.
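
As a sketch of that slicing (the 0.3, 0.7, and 0.8 cut-offs are arbitrary thresholds chosen for illustration, not values derived from the analysis above):

# Objective reviews with strong polarity: factual likes and dislikes.
objective_strong = sent_scores[
    (sent_scores["textblob_subjectivity"] < 0.3)
    & (sent_scores["vader_compound"].abs() > 0.8)
].index

# Subjective reviews with strong polarity: emotional reactions.
emotional_strong = sent_scores[
    (sent_scores["textblob_subjectivity"] > 0.7)
    & (sent_scores["vader_compound"].abs() > 0.8)
].index

print([sample_reviews[i] for i in objective_strong[:3]])
print([sample_reviews[i] for i in emotional_strong[:3]])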

Things to consider

As with any problem in natural language processing, there are a number of things to watch out for when doing sentiment analysis.

One of the biggest considerations is the language of the texts you’re trying to analyze. Many of the lexicon-based methods only work for a limited number of languages, so if you’re working with languages not supported by these lexicons, you may need to take another approach, such as using a fine-tuned LLM or training your own model(s).

As texts increase in complexity, it can also be difficult for lexicon-based analyzers and bag-of-words-based models to correctly detect sentiment. Sarcasm or more subtle context indicators can be hard for simpler models to detect, and these models may not be able to accurately classify the sentiment of such texts. LLMs may be able to handle more complex texts, but you would need to experiment with different models.

Finally, when doing sentiment analysis, the same issues also come up as when dealing with any machine learning problem. Your models will only be as good as the training data you use. If you cannot get high-quality training and testing datasets suitable to your problem domain, you will not be able to correctly predict the sentiment of your target audience.

You should also make sure that your targets are appropriate for your business problem. It might seem attractive to build a model to know whether your products make your customers “sad”, “angry”, or “disgusted”, but if this doesn’t help you make a decision about how to improve your products, then it isn’t solving your problem.

Wrapping up

In this blog post, we dove deeply into the fascinating area of Python sentiment analysis and showed how this complex field is made more approachable by a range of powerful packages.

We covered the potential applications of sentiment analysis, different ways of assessing sentiment, and the main methods of extracting sentiment from a piece of text. We also saw some helpful features in PyCharm that make working with models and interpreting their results simpler and faster.

While the field of natural language processing is currently focused intently on large language models, the older techniques of using lexicon-based analyzers or traditional machine learning models, like Naive Bayes classifiers, still have their place in sentiment analysis. These techniques shine when analyzing simpler texts, or when speed of prediction or ease of deployment is a priority. LLMs are best suited to more complex or nuanced texts.

Now that you’ve grasped the basics, you can learn how to do sentiment analysis with LLMs in our tutorial. The step-by-step guide helps you discover how to select the right model for your task, use it for sentiment analysis, and even fine-tune it yourself.

If you’d like to continue learning about natural language processing or machine learning more broadly after finishing this blog post, here are some resources:

Get started with sentiment analysis in PyCharm today

If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24. You’ll then receive an activation code via email.

Activate your 3-month subscription
Categories: FLOSS Project Planets
