Feeds
Drupal Association blog: New Critical Security Updates for Drupal 7 Highlight Importance of Drupal 7 Extended Support by Tag1
This blog post is published on behalf of Tag1.
As we count down to the end-of-life (EOL) for Drupal 7 on 5 January 2025, the Drupal Security Team has just released what is likely to be the final D7 updates from the community.
This latest security release includes important fixes for two D7 vulnerabilities: an XSS (cross-site scripting) vulnerability in Drupal core’s Overlay module and a potential object injection vulnerability, which, when combined with other vulnerabilities in Drupal core, contrib, or custom modules, could lead to Remote Code Execution. Tag1’s Ra Mänd and Fabian Franz both contributed to getting the security release out. The Drupal security team also issued multiple security releases for Drupal 7 contributed modules on the same day.
Starting January 2025, the Drupal Security team will no longer review reported issues or release security updates for Drupal 7 core or contrib modules. To address this, the Drupal Association has authorized Tag1 to be a D7 Extended Support Partner, ensuring your D7 sites stay protected with Tag1's Drupal 7 Extended Support (D7ES). We will continue to monitor for security vulnerabilities and provide updates and support to ensure your site remains safe and secure beyond January 2025.
The Critical Role of Drupal 7 Extended Support (D7ES)
This security release illustrates why the Drupal community established the Drupal 7 Extended Support program (D7ES) and authorized Tag1 to become a D7 Extended Support Partner in order to commercially assume the responsibilities of the Drupal Security Team. Simply put, the question isn't whether new security issues will be found but when.
Through Tag1 D7ES, Tag1 will ensure that organizations can continue operating their Drupal 7 sites securely beyond the official EOL date, providing the critical security updates that every D7 site will inevitably need.
Why Tag1 is Your Optimal D7ES Partner
Tag1 stands apart in several crucial ways:
- We have more people on the Drupal Security team than any other Drupal consulting company or D7ES provider and you have always relied on our team to fix security issues, including these latest updates.
- We are responsible for much of the Drupal 7 codebase. Our team includes many of the key contributors to Drupal 7, including one of only a few core committers responsible for the platform's overall architecture and many of the core component and module maintainers.
- We are the only D7ES provider with proven experience running Drupal Extended Support, having successfully managed D6 support for over 6 years post-EOL.
- We created and will continue to maintain the QA and testing systems for Drupal 7, a critical component that ensures the reliability you expect from Drupal updates. You can trust that our updates will work on your operating system, version of PHP, database, etc. - the same way that you do today.
- By choosing Tag1, you maintain as much continuity as possible - our experts will continue operating using processes similar to what we use to build and release Drupal today, minimizing changes to your workflows and release procedures.
As we approach the EOL date, organizations running Drupal 7 sites must take proactive steps to ensure they remain secure. Enrolling in Tag1's D7ES program isn't just about maintaining security - it's about partnering with the team that has been integral to Drupal 7's security and stability from the beginning. We'll continue to provide the same level of expertise and attention to security that your organization has come to expect from Drupal.
Matt Glaman: phpstan-drupal now supports PHPStan 2.0
PHPStan 2.0 was released a month ago, a massive milestone for the project. To learn about all the changes, I recommend reading the release announcement. phpstan-drupal now has a PHPStan 2.0 compatible release: https://github.com/mglaman/phpstan-drupal/releases/tag/2.0.0. The 1.x branch will be maintained as long as a version of Drupal Core uses it, at least until Drupal 10's end-of-life near the end of 2026. If applicable, I will backport bug fixes and features to 1.x.
Matthew Garrett: When should we require that firmware be free?
Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, Firmware is software that provides low-level control of computing device hardware, and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.
How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.
But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.
So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?
I think there's two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The maximalist position is not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.
Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.
This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).
RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.
The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.
For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.
But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.
As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.
Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.
[1] Yes yes SMM
Matthew Garrett: Android privacy improvements break key attestation
- These aren't fixed - MAC addresses are trivially reprogrammable, and serial numbers are typically stored in reprogrammable flash at their most protected
- A malicious device could simply lie about them
Android has a broadly equivalent thing called ID Attestation. Android devices can generate a signed attestation that they have certain characteristics and identifiers, and this can be chained back to the manufacturer. Obviously providing signed proof of the device identifier is kind of problematic from a privacy perspective, so the short version[2] is that only apps installed using a corporate account rather than a normal user account are able to do this.
But that's still not ideal - the device identifiers involved included the IMEI and serial number of the device, and those could potentially be used to correlate devices across privacy boundaries since they're static[3] identifiers that are the same both inside a corporate work profile and in the normal user profile, and that also remain static if you move between different employers and use the same phone[4]. So, since Android 12, ID Attestation includes an "Enterprise Specific ID" or ESID. The ESID is based on a hash of device-specific data plus the enterprise that the corporate work profile is associated with. If a device is enrolled with the same enterprise then this ID will remain static, if it's enrolled with a different enterprise it'll change, and it just doesn't exist outside the work profile at all. The other device identifiers are no longer exposed.
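The scoping behaviour of an identifier like this can be illustrated with a small sketch. This is not Android's actual derivation, which isn't spelled out here; the function name `derive_scoped_id`, the choice of HMAC-SHA-256, and the placeholder inputs are all assumptions, picked only to show how hashing device-specific data together with an enterprise identifier gives an ID that is stable within one enterprise and different across enterprises.

```python
import hashlib
import hmac


def derive_scoped_id(device_secret: bytes, enterprise_id: str) -> str:
    """Hypothetical enterprise-scoped identifier (not Android's real algorithm).

    The device-specific data is used as the HMAC key and the enterprise
    identifier as the message, so neither input alone determines the result.
    """
    digest = hmac.new(device_secret, enterprise_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Placeholder inputs, invented for this illustration.
device_secret = b"example-device-specific-data"

id_a_first = derive_scoped_id(device_secret, "enterprise-a")
id_a_again = derive_scoped_id(device_secret, "enterprise-a")
id_b = derive_scoped_id(device_secret, "enterprise-b")

print(id_a_first == id_a_again)  # same enterprise: the ID stays the same
print(id_a_first == id_b)        # different enterprise: the ID changes
```

Because the enterprise is an input to the hash, two employers that enrol the same phone cannot correlate it by comparing the identifiers they each see.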
But device ID verification isn't enough to solve the underlying problem here. When we receive a device ID attestation we know that someone at the far end has possession of a device with that ID, but we don't know that that device is where the packets are originating. If our VPN simply has an API that asks for an attestation from a trusted device before routing packets, we could pass that on to said trusted device and then simply forward the attestation to the VPN server[5]. We need some way to prove that the device trying to authenticate is actually that device.
The answer to this is key provenance attestation. If we can prove that an encryption key was generated on a trusted device, and that the private half of that key is stored in hardware and can't be exported, then using that key to establish a connection proves that we're actually communicating with a trusted device. TPMs are able to do this using the attestation keys generated in the Credential Activation process, giving us proof that a specific keypair was generated on a TPM that we've previously established is trusted.
Android again has an equivalent called Key Attestation. This doesn't quite work the same way as the TPM process - rather than being tied back to the same unique cryptographic identity, Android key attestation chains back through a separate cryptographic certificate chain but contains a statement about the device identity - including the IMEI and serial number. By comparing those to the values in the device ID attestation we know that the key is associated with a trusted device and we can now establish trust in that key.
"But Matthew", those of you who've been paying close attention may be saying, "Didn't Android 12 remove the IMEI and serial number from the device ID attestation?" And, well, congratulations, you were apparently paying more attention than Google. The key attestation no longer contains enough information to tie back to the device ID attestation, making it impossible to prove that a hardware-backed key is associated with a specific device ID attestation and its enterprise enrollment.
I don't think this was any sort of deliberate breakage, and it's probably more an example of shipping the org chart - my understanding is that device ID attestation and key attestation are implemented by different parts of the Android organisation and the impact of the ESID change (something that appears to be a legitimate improvement in privacy!) on key attestation was probably just not realised. But it's still a pain.
[1] Those of you paying attention may realise that what we're doing here is proving the identity of the TPM, not the identity of the device it's associated with. Typically the TPM identity won't vary over the lifetime of the device, so having a one-time binding of those two identities (such as when a device is initially being provisioned) is sufficient. There's actually a spec for distributing Platform Certificates that allows device manufacturers to bind these together during manufacturing, but I last worked on those a few years back and don't know what the current state of the art there is.
[2] Android has a bewildering array of different profile mechanisms, some of which are apparently deprecated, and I can never remember how any of this works, so you're not getting the long version
[3] Nominally, anyway. Cough.
[4] I wholeheartedly encourage people not to put work accounts on their personal phones, but I am a filthy hypocrite here
[5] Obviously if we have the ability to ask for attestation from a trusted device, we have access to a trusted device. Why not simply use the trusted device? The answer there may be that we've compromised one and want to do as little as possible on it in order to reduce the probability of triggering any sort of endpoint detection agent, or it may be because we want to run on a device with different security properties than those enforced on the trusted device.
Top articles at OpenSource.net in 2024
OpenSource.net, a platform designed to foster knowledge sharing, was launched in September 2023. Led by Editor-in-Chief Nicole Martinelli, this platform has become a space for diverse perspectives and contributions. Here are some of the top articles published at OpenSource.net in 2024:
Business with Open Source
- Open Source projects vs products: A strategic approach (Thomas Di Giacomo)
- Open Source visibility hacks — No icky marketing needed (Olga Rusakova)
- So, You Have Your 20-Page Open Source Strategy Doc. Now What? (Amanda Katona)
- Pajamas to profit: Launch your Open Source empire (Gaël Duval)
- Demystifying Open Source as a Business (Julia Machado)
- Why single vendor is the new proprietary (Thierry Carrez)
- Open code for closed services: The Open Source paradox of the cloud (Vittorio Bertola)
- Beyond the binary: The nuances of Open Source innovation (Roberto Galoppini)
- From data to action: Using metrics to improve Open Source communities (Dawn Foster)
- Diversity, Equity and Inclusion (DEI) metrics: Breaking barriers in Open Source (Anita Ihuman)
- How to make reviewing pull requests a better experience (Alya Abbott)
- Steady in a shifting Open Source world: FreeBSD’s enduring stability (Jason Perlow)
- Celebrating 30 years of Open Source with FreeDOS (Jim Hall)
- Sustain Open Source, sustain the planet: A new conversation (Tobias Augspurger)
- Closing the Gap: Accelerating environmental Open Source (Tobias Augspurger)
- Preserving Open Values in artificial intelligence (Mia Lykou Lund)
A special thank you to the authors who have contributed with articles and Cisco for sponsoring OpenSource.net. If you are interested in contributing with articles on Open Source software, hardware, open culture, and open knowledge, please submit a proposal.
GNU Guix: The Shepherd 1.0.0 released!
Finally, twenty-one years after its inception (twenty-one!), the Shepherd leaves ZeroVer territory to enter a glorious 1.0 era. This 1.0.0 release is published today because we think Shepherd has become a solid tool, meeting user experience standards one has come to expect since systemd changed the game of free init systems and service managers alike. It’s also a major milestone for Guix, which has been relying on the Shepherd from a time when doing so counted as dogfooding.
To celebrate this release, the amazing Luis Felipe López Acevedo designed a new logo, available under CC-BY-SA, and the project got a proper web site!
Let’s first look at what the Shepherd actually is and what it can do for you.
At a glance
The Shepherd is a minimalist but featureful service manager and as such, it herds services: it keeps track of services, their state and their dependencies, and it can start, stop, and restart them when needed. It’s a simple job; doing it right and providing users with insight and control over services is a different story.
The Shepherd consists of two commands: shepherd is the daemon that manages services, and herd is the command that lets you interact with it to inspect and control the status of services. The shepherd command can run as the first process (PID 1) and serve as the “init system”, as is the case on Guix System; or it can manage services for unprivileged users, as is the case with Guix Home. For example, running herd status ntpd as root allows me to know what the Network Time Protocol (NTP) daemon is up to:
    $ sudo herd status ntpd
    ● Status of ntpd:
      It is running since Fri 06 Dec 2024 02:08:08 PM CET (2 days ago).
      Main PID: 11359
      Command: /gnu/store/s4ra0g0ym1q1wh5jrqs60092x1nrb8h9-ntp-4.2.8p18/bin/ntpd -n -c /gnu/store/7ac2i2c6dp2f9006llg3m5vkrna7pjbf-ntpd.conf -u ntpd -g
      It is enabled.
      Provides: ntpd
      Requires: user-processes networking
      Custom action: configuration
      Will be respawned.
      Log file: /var/log/ntpd.log

    Recent messages (use '-n' to view more or less):
      2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 25 tun0 128.93.179.24:123
      2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 26 tun0 [fe80::e6b7:4575:77ef:eaf4%12]:123
      2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: new interface(s) found: waking up resolver
      2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 25 tun0, [128.93.179.24]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
      2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 26 tun0, [fe80::e6b7:4575:77ef:eaf4%12]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs

It’s running, and it’s logging messages: the latest ones are shown here and I can open /var/log/ntpd.log to view more. Running herd stop ntpd would terminate the ntpd process, and there’s also a start and a restart action.
Services can also have custom actions; in the example above, we see there’s a configuration action. As it turns out, that action is a handy way to get the file name of the ntpd configuration file:
    $ head -2 $(sudo herd configuration ntpd)
    driftfile /var/run/ntpd/ntp.drift
    pool 2.guix.pool.ntp.org iburst

Of course a typical system runs quite a few services, many of which depend on one another. The herd graph command returns a representation of that service dependency graph that can be piped to dot or xdot to visualize it; here’s what I get on my laptop:
It’s quite a big graph (you can zoom in for details!) but we can learn a few things from it. Each node in the graph is a service; rectangles are for “regular” services (typically daemons like ntpd), round nodes correspond to one-shot services (services that perform one action and immediately stop), and diamonds are for timed services (services that execute code periodically).
Blurring the user/developer line
A unique feature of the Shepherd is that you configure and extend it in its own implementation language: in Guile Scheme. That does not mean you need to be an expert in that programming language to get started. Instead, we try to make sure anyone can start simple for their configuration file and gradually get to learn more if and when they feel the need for it. With this approach, we keep the user in the loop, as Andy Wingo put it.
A Shepherd configuration file is a Scheme snippet that goes like this:
    (register-services
      (list (service '(ntpd) …)
            …))

    (start-in-the-background '(ntpd …))

Here we define ntpd and get it started as soon as shepherd has read the configuration file. The ellipses can be filled in with more services.
As an example, our ntpd service is defined like this:
    (service '(ntpd)
      #:documentation "Run the Network Time Protocol (NTP) daemon."
      #:requirement '(user-processes networking)
      #:start (make-forkexec-constructor
               (list "…/bin/ntpd" "-n" "-c" "/…/…-ntpd.conf" "-u" "ntpd" "-g")
               #:log-file "/var/log/ntpd.log")
      #:stop (make-kill-destructor)
      #:respawn? #t)

The important parts here are the #:start bit, which says how to start the service, and #:stop, which says how to stop it. In this case we’re just spawning the ntpd program but other startup mechanisms are supported by default: inetd, socket activation à la systemd, and timers. Check out the manual for examples and a reference.
There’s no limit to what #:start and #:stop can do. In Guix System you’ll find services that run daemons in containers, that mount/unmount file systems (as can be guessed from the graph above), that set up/tear down a static networking configuration, and a variety of other things. The Swineherd project goes as far as extending the Shepherd to turn it into a tool to manage system containers—similar to what the Docker daemon does.
Note that when writing service definitions for Guix System and Guix Home, you’re targeting a thin layer above the Shepherd programming interface. As is customary in Guix, this is multi-stage programming: G-expressions specified in the start and stop fields are staged and make it into the resulting Shepherd configuration file.
New since 0.10.x
For those of you who were already using the Shepherd, here are the highlights compared to the 0.10.x series:
- Support for timed services has been added: these services spawn a command or run Scheme code periodically according to a predefined calendar.
- herd status SERVICE now shows high-level information about services (main PID, command, addresses it is listening to, etc.) instead of its mere “running value”. It also shows recently-logged messages.
- To make it easier to discover functionality, that command also displays custom actions applicable to the service, if any. It also lets you know if a replacement is pending, in which case you can restart the service to upgrade it.
- herd status root is no longer synonymous with herd status; instead it shows information about the shepherd process itself.
- On Linux, reboot --kexec lets you reboot straight into a new Linux kernel previously loaded with kexec --load.
The service collection has grown:
The new log rotation service is responsible for periodically rotating log files, compressing them, and eventually deleting them. It’s very much like similar log rotation tools from the 80’s since shepherd logs to plain text files like in the good ol’ days.
There’s a couple of benefits that come from its integration into the Shepherd. First, it already knows all the files that services log to, so no additional configuration is needed to teach it about these files. Second, log rotation is race free: no single line of log can be lost in the process.
The new system log service does what’s traditionally devoted to a separate syslogd program. The advantage of having it in shepherd is that it can start logging earlier and integrates nicely with the rest of the system.
The timer service provides functionality similar to the venerable at command, allowing you to run a command at a particular time:
- The transient service maker lets you run a command in the background as a transient service (it is similar in spirit to the systemd-run command):
- The GOOPS interface that was deprecated in 0.10.x is now gone.
As always, the NEWS file has additional details.
In the coming weeks, we will most likely gradually move service definitions in Guix from mcron to timed services and similarly replace Rottlog and syslogd. This should be an improvement for Guix users and system administrators!
Cute code
I did mention that the Shepherd is minimalist, and it really is: 7.4K lines of Scheme, excluding tests, according to SLOCCount. This is in large part thanks to the use of a high-level memory-safe language and due to the fact that it’s extensible: peripheral features can live outside the Shepherd.
Significant benefits also come from the concurrency framework: the concurrent sequential processes (CSP) model and Fibers. Internally, the state of each service is encapsulated in a fiber. Accessing a service’s state amounts to sending a message to its fiber. This way of structuring code is itself very much inspired by the actor model. This results in simpler code (no dreaded event loop, no callback hell) and better separation of concerns.
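To make the pattern concrete, here is a minimal sketch of the same idea in Python. It is not how the Shepherd is implemented: the Shepherd uses Guile Fibers, whereas this sketch stands in an OS thread and `queue.Queue` for a fiber and its channel, and the `ServiceActor` class and its message names are hypothetical. What it does show is the shape of the design: one loop owns the service's state, and everything else reads or changes that state only by sending it a message.

```python
import queue
import threading


class ServiceActor:
    """Owns one service's state; other code only talks to it via messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._inbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        status = "stopped"  # private state, touched only by this thread
        while True:
            message, reply = self._inbox.get()
            if message == "start":
                status = "running"
            elif message == "stop":
                status = "stopped"
            elif message == "terminate":
                reply.put(status)
                return
            # Any other message (such as "status") leaves the state unchanged.
            reply.put(status)

    def send(self, message: str) -> str:
        """Send a message and wait for the actor to reply with its status."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._inbox.put((message, reply))
        return reply.get()


ntpd = ServiceActor("ntpd")
before = ntpd.send("status")
after_start = ntpd.send("start")
after_stop = ntpd.send("stop")
ntpd.send("terminate")
print(before, after_start, after_stop)
```

Since `status` is a local variable of the actor's loop, there is no lock to forget and no callback ordering to reason about: messages are handled one at a time, in the order they arrive.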
Using a high-level framework like Fibers does come with its challenges. For example, we had the case of a memory leak in Fibers under certain conditions, and we certainly don’t want that in PID 1. But the challenge really lies in squashing those low-level bugs so that the foundation is solid. The Shepherd itself is free from such low-level issues; its logic is easy to reason about and that alone is immensely helpful, it allows us to extend the code without fear, and it avoids concurrency bugs that plague programs written in the more common event-loop-with-callbacks style.
In fact, thanks to all this, the Shepherd is probably the coolest init system to hack on. It even comes with a REPL for live hacking!
What’s next
There’s a number of down-to-earth improvements that can be made in the Shepherd, such as adding support for dynamically-reconfigurable services (being able to restart a service but with different options), integration with control groups (“cgroups”) on Linux, proper integration for software suspend, etc.
In the longer run, we envision an exciting journey towards a distributed and capability-style Shepherd. Spritely Goblins provides the foundation for this; using it looks like a natural continuation of the design work of the Shepherd: Goblins is an actor model framework! Juliana Sims has been working on adapting the Shepherd to Goblins and we’re eager to see what comes out of it in the coming year. Stay tuned!
Enjoy!
In the meantime, we hope you enjoy the Shepherd 1.0 as much as we enjoyed making it. Four people contributed code that led to this release, but there are other ways to help: through graphics and web design, translation, documentation, and more. Join us!
Originally published on the Shepherd web site.
PyCharm: Introduction to Sentiment Analysis in Python
Sentiment analysis is one of the most popular ways to analyze text. It allows us to see at a glance how people are feeling across a wide range of areas and has useful applications in fields like customer service, market and product research, and competitive analysis.
Like any area of natural language processing (NLP), sentiment analysis can get complex. Luckily, Python has excellent packages and tools that make this branch of NLP much more approachable.
In this blog post, we’ll explore some of the most popular packages for analyzing sentiment in Python, how they work, and how you can train your own sentiment analysis model using state-of-the-art techniques. We’ll also look at some PyCharm features that make working with these packages easier and faster.
What is sentiment analysis?
Sentiment analysis is the process of analyzing a piece of text to determine its emotional tone. As you can probably see from this definition, sentiment analysis is a very broad field that incorporates a wide variety of methods within the field of natural language processing.
There are many ways to define “emotional tone”. The most commonly used methods determine the valence or polarity of a piece of text – that is, how positive or negative the sentiment expressed in a text is. Emotional tone is also usually treated as a text classification problem, where text is categorized as either positive or negative.
Take the following Amazon product review:
This is obviously not a happy customer, and sentiment analysis techniques would classify this review as negative.
Contrast this with a much more satisfied buyer:
This time, sentiment analysis techniques would classify this as positive.
Different types of sentiment analysis
There are multiple ways of extracting emotional information from text. Let’s review a few of the most important ones.
Ways of defining sentiment
First, sentiment analysis approaches have several different ways of defining sentiment or emotion.
Binary: This is where the valence of a document is divided into two categories, either positive or negative, as with the SST-2 dataset. Related to this are classifications of valence that add a neutral class (where a text expresses no sentiment about a topic) or even a conflict class (where a text expresses both positive and negative sentiment about a topic).
Some sentiment analyzers use a related measure to classify texts into subjective or objective.
Fine-grained: This term describes several different ways of approaching sentiment analysis, but here it refers to breaking down positive and negative valence into a Likert scale. A well-known example of this is the SST-5 dataset, which uses a five-point Likert scale with the classes very positive, positive, neutral, negative, and very negative.
Continuous: The valence of a piece of text can also be measured continuously, with scores indicating how positive or negative the sentiment of the writer was. For example, the VADER sentiment analyzer gives a piece of text a score between –1 (strongly negative) and 1 (strongly positive), with scores close to 0 indicating a neutral sentiment.
Emotion-based: Also known as emotion detection or emotion identification, this approach attempts to detect the specific emotion being expressed in a piece of text. You can approach this in two ways. Categorical emotion detection tries to classify the sentiment expressed by a text into one of a handful of discrete emotions, usually based on the Ekman model, which includes anger, disgust, fear, joy, sadness, and surprise. A number of datasets exist for this type of emotion detection. Dimensional emotion detection is less commonly used in sentiment analysis and instead tries to measure three emotional aspects of a piece of text: polarity, arousal (how exciting a feeling is), and dominance (how restricted the emotional expression is).
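To make the difference between these labeling schemes concrete, here is a minimal sketch that converts a continuous valence score into binary, three-class, and five-point labels. The function names and every threshold are assumptions chosen for illustration; they are not taken from SST-2, SST-5, or VADER.

```python
def to_binary(score: float) -> str:
    """Collapse a continuous valence score in [-1, 1] into two classes."""
    return "positive" if score >= 0 else "negative"


def to_three_class(score: float, neutral_band: float = 0.05) -> str:
    """Add a neutral class for scores close to zero (band width is an assumption)."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical upper edges for a five-point, Likert-style scale.
FIVE_POINT_EDGES = [
    (-0.6, "very negative"),
    (-0.2, "negative"),
    (0.2, "neutral"),
    (0.6, "positive"),
]


def to_five_point(score: float) -> str:
    """Bucket a continuous score into five ordered classes."""
    for upper_edge, label in FIVE_POINT_EDGES:
        if score < upper_edge:
            return label
    return "very positive"


for s in (-0.9, -0.03, 0.67):
    print(s, to_binary(s), to_three_class(s), to_five_point(s))
```

The coarser the scheme, the more information is thrown away: a score of –0.03 is "negative" in the binary scheme but "neutral" in the other two.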
Levels of analysis
We can also consider different levels at which we can analyze a piece of text. To understand this better, let’s consider another review of the coffee maker:
Document-level: This is the most basic level of analysis, where one sentiment for an entire piece of text will be returned. Document-level analysis might be fine for very short pieces of text, such as Tweets, but can give misleading answers if there is any mixed sentiment. For example, if we based the sentiment analysis for this review on the whole document, it would likely be classified as neutral or conflict, as we have two opposing sentiments about the same coffee machine.
Sentence-level: This is where the sentiment for each sentence is predicted separately. For the coffee machine review, sentence-level analysis would tell us that the reviewer felt positively about some parts of the product but negatively about others. However, this analysis doesn’t tell us what things the reviewer liked and disliked about the coffee machine.
Aspect-based: This type of sentiment analysis dives deeper into a piece of text and tries to understand the sentiment of users about specific aspects. For our review of the coffee maker, the reviewer mentioned two aspects: appearance and noise. By extracting these aspects, we have more information about what the user specifically did and did not like. They had a positive sentiment about the machine’s appearance but a negative sentiment about the noise it made.
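The three levels can be compared side by side with a toy example. This is only a sketch: the review text, the word scores, and the aspect keywords below are all invented for illustration, and real aspect extraction is considerably more involved than keyword matching.

```python
import re

# Toy lexicon and aspect keywords, invented for this illustration only.
LEXICON = {"beautiful": 1.0, "sleek": 1.0, "loud": -1.0, "annoying": -1.0}
ASPECTS = {"appearance": {"looks", "design"}, "noise": {"noise", "loud", "grinding"}}

# Hypothetical review; not the one discussed in the text.
review = "The design is beautiful and sleek. The grinding is loud and annoying."


def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def score(text):
    """Sum the lexicon scores of the words in the text."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokens(text))


sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]

document_score = score(review)                        # one score for everything
sentence_scores = [(s, score(s)) for s in sentences]  # one score per sentence

# Aspect level: attribute each sentence's score to the aspects it mentions.
aspect_scores = {}
for sentence, value in sentence_scores:
    words = set(tokens(sentence))
    for aspect, keywords in ASPECTS.items():
        if words & keywords:
            aspect_scores[aspect] = aspect_scores.get(aspect, 0.0) + value

print(document_score)
print(sentence_scores)
print(aspect_scores)
```

At the document level the positive and negative words cancel out, while the sentence and aspect levels keep the two opinions separate.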
Coupling sentiment analysis with other NLP techniques
Intent-based: In this final type of sentiment analysis, the text is classified in two ways: in terms of the sentiment being expressed, and the topic of the text. For example, if a telecommunication company receives a ticket complaining about how often their service goes down, they could classify the text intent or topic as service reliability and the sentiment as negative. As with aspect-based sentiment analysis, this analysis gives the company much more information than knowing whether their customers are generally happy or unhappy.
Applications of sentiment analysis
By now, you can probably already think of some potential use cases for sentiment analysis. Basically, it can be used anywhere that you could get text feedback or opinions about a topic. Organizations or individuals can use sentiment analysis to do social media monitoring and see how people feel about a brand, government entity, or topic.
Customer feedback analysis can be used to find out the sentiments expressed in feedback or tickets. Product reviews can be analyzed to see how satisfied or dissatisfied people are with a company’s products. Finally, sentiment analysis can be a key component in market research and competitive analysis, where how people feel about emerging trends, features, and competitors can help guide a company’s strategies.
How does sentiment analysis work?
At a general level, sentiment analysis operates by linking words (or, in more sophisticated models, the overall tone of a text) to an emotion. The most common approaches to sentiment analysis fall into one of the three methods below.
Lexicon-based approaches
These methods rely on a lexicon that includes sentiment scores for a range of words. They combine these scores using a set of rules to get the overall sentiment for a piece of text. These methods tend to be very fast and also have the advantage of yielding more fine-grained continuous sentiment scores. However, as the lexicons need to be handcrafted, they can be time-consuming and expensive to produce.
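A minimal sketch of the idea, assuming a four-word lexicon and a single negation rule (both made up for this example; real lexicons and rule sets are far larger):

```python
import re

# Tiny hand-made lexicon; real lexicons contain thousands of scored entries.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "terrible": -3.0}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text: str) -> float:
    """Average the word scores, flipping the sign of a word that follows a negation."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue
        value = LEXICON[word]
        if i > 0 and words[i - 1] in NEGATIONS:
            value = -value  # rule: "not good" counts as negative
        scores.append(value)
    # Texts without any lexicon words are treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


print(lexicon_score("The food was great"))
print(lexicon_score("The food was not good"))
```

Because the result is a lookup plus some arithmetic, scoring is fast and the output is continuous rather than a class label.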
Machine learning models
These methods train a machine learning model, most commonly a Naive Bayes classifier, on a dataset that contains text and their sentiment labels, such as movie reviews. In this model, texts are generally classified as positive, negative, and sometimes neutral. These models also tend to be very fast, but as they usually don’t take into account the relationship between words in the input, they may struggle with more complex texts that involve qualifiers and negations.
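To show what such a classifier does under the hood, here is a small Naive Bayes sketch written with the standard library only. The four training sentences are hypothetical, and add-one smoothing is one common choice rather than the only one; a real project would use a tested library implementation and far more data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny training set; a real model needs thousands of examples.
TRAIN = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible acting and a boring story", "neg"),
]


def train_naive_bayes(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, word_counts, label_counts, vocab):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label in label_counts:
        logp = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                logp += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train_naive_bayes(TRAIN)
print(predict("a great film", *model))
print(predict("a boring story", *model))
```

Each word contributes independently of its neighbors, which is exactly why a phrase like "not great" is hard for this kind of model: "great" still counts as positive evidence.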
Large language models
These methods rely on fine-tuning a pre-trained transformer-based large language model on the same datasets used to train the machine learning classifiers mentioned earlier. These sophisticated models are capable of modeling complex relationships between words in a piece of text but tend to be slower than the other two methods.
Sentiment analysis in Python
Python has a rich ecosystem of packages for NLP, meaning you are spoiled for choice when doing sentiment analysis in this language.
Let’s review some of the most popular Python packages for sentiment analysis.
The best Python libraries for sentiment analysis
VADER
VADER (Valence Aware Dictionary and Sentiment Reasoner) is a popular lexicon-based sentiment analyzer. Built into the powerful NLTK package, this analyzer returns four sentiment scores: the degree to which the text was positive, neutral, or negative, as well as a compound sentiment score. The positive, neutral, and negative scores range from 0 to 1 and indicate the proportion of the text that was positive, neutral, or negative. The compound score ranges from –1 (extremely negative) to 1 (extremely positive) and indicates the overall sentiment valence of the text.
Let’s look at a basic example of how it works:
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    import nltk

We first need to download the VADER lexicon.
    nltk.download('vader_lexicon')

We can then instantiate the VADER SentimentIntensityAnalyzer() and extract the sentiment scores using the polarity_scores() method.
    analyzer = SentimentIntensityAnalyzer()
    sentence = "I love PyCharm! It's my favorite Python IDE."
    sentiment_scores = analyzer.polarity_scores(sentence)
    print(sentiment_scores)

    {'neg': 0.0, 'neu': 0.572, 'pos': 0.428, 'compound': 0.6696}

We can see that VADER has given this piece of text an overall sentiment score of 0.67 and classified its contents as 43% positive, 57% neutral, and 0% negative.
VADER works by looking up the sentiment scores for each word in its lexicon and combining them using a nuanced set of rules. For example, qualifiers can increase or decrease the intensity of a word’s sentiment, so a qualifier such as “a bit” before a word would decrease the sentiment intensity, but “extremely” would amplify it.
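The qualifier rule can be sketched as a multiplier applied to the score of the following word. Everything below is hypothetical: the word scores and multipliers are not VADER's actual values, and this toy version only handles single-word qualifiers (so "slightly" stands in for "a bit").

```python
# Hypothetical word scores and multipliers; these are NOT VADER's actual values.
WORD_SCORES = {"good": 1.9, "annoying": -1.7}
BOOSTERS = {"extremely": 1.3, "very": 1.2}
DAMPENERS = {"slightly": 0.8, "somewhat": 0.8}


def adjusted_score(phrase: str) -> float:
    """Score the last word of a phrase, scaled by a qualifier directly before it."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    score = WORD_SCORES.get(words[-1], 0.0)
    if len(words) >= 2:
        qualifier = words[-2]
        # Boosters amplify the score, dampeners shrink it, anything else leaves it alone.
        score *= BOOSTERS.get(qualifier, DAMPENERS.get(qualifier, 1.0))
    return score


for phrase in ("good", "extremely good", "slightly good", "very annoying"):
    print(phrase, adjusted_score(phrase))
```

Because the multiplier scales the score rather than adding to it, the same qualifier makes positive words more positive and negative words more negative.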
VADER’s lexicon includes abbreviations such as “smh” (shaking my head) and emojis, making it particularly suitable for social media text. VADER’s main limitation is that it doesn’t work for languages other than English, but you can use projects such as vader-multi as an alternative. I wrote about how VADER works if you’re interested in taking a deeper dive into this package.
NLTK
Additionally, you can use NLTK to train your own machine learning-based sentiment classifier, using classifiers from scikit-learn.
There are many ways of processing the text to feed into these models, but the simplest way is doing it based on the words that are present in the text, a type of text modeling called the bag-of-words approach. The most straightforward type of bag-of-words modeling is binary vectorisation, where each word is treated as a feature, with the value of that feature being either 0 or 1 (whether the word is absent or present in the text, respectively).
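Binary vectorisation is simple enough to write by hand. The sketch below uses only the standard library and two made-up sentences; the helper names are our own, and in practice you would more likely rely on a library vectorizer.

```python
def build_vocabulary(texts):
    """Map every distinct word in the corpus to a column index."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(vocab)}


def binary_vectorize(text, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(text.lower().split())
    return [1 if word in present else 0 for word in vocabulary]


corpus = ["I love this IDE", "I dislike this bug"]
vocabulary = build_vocabulary(corpus)
vectors = [binary_vectorize(text, vocabulary) for text in corpus]

print(list(vocabulary))
for vector in vectors:
    print(vector)
```

Each column corresponds to one word of the vocabulary, so word order and word counts are both discarded.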
If you’re new to working with text data and NLP, and you’d like more information about how text can be converted into inputs for machine learning models, I gave a talk on this topic that provides a gentle introduction.
You can see an example in the NLTK documentation, where a Naive Bayes classifier is trained to predict whether a piece of text is subjective or objective. In this example, they add an additional negation qualifier to some of the terms based on rules which indicate whether that word or character is likely involved in negating a sentiment expressed elsewhere in the text. Real Python also has a sentiment analysis tutorial on training your own classifiers using NLTK, if you want to learn more about this topic.
Pattern and TextBlob
The Pattern package provides another lexicon-based approach to analyzing sentiment. It uses the SentiWordNet lexicon, where each synonym group (synset) from WordNet is assigned a score for positivity, negativity, and objectivity. The positive and negative scores for each word are combined using a series of rules to give a final polarity score. Similarly, the objectivity score for each word is combined to give a final subjectivity score.
As WordNet contains part-of-speech information, the rules can take into account whether adjectives or adverbs preceding a word modify its sentiment. The ruleset also considers negations, exclamation marks, and emojis, and even includes some rules to handle idioms and sarcasm.
However, Pattern as a standalone library is only compatible with Python 3.6. As such, the most common way to use Pattern is through TextBlob. By default, the TextBlob sentiment analyzer uses its own implementation of the Pattern library to generate sentiment scores.
Let’s have a look at this in action:
    from textblob import TextBlob

You can see that we run the TextBlob method over our text, and then extract the sentiment using the sentiment attribute.
    pattern_blob = TextBlob("I love PyCharm! It's my favorite Python IDE.")
    sentiment = pattern_blob.sentiment

    print(f"Polarity: {sentiment.polarity}")
    print(f"Subjectivity: {sentiment.subjectivity}")

    Polarity: 0.625
    Subjectivity: 0.6

For our example sentence, Pattern in TextBlob gives us a polarity score of 0.625 (relatively close to the score given by VADER), and a subjectivity score of 0.6.
But there’s also a second way of getting sentiment scores in TextBlob. This package also includes a pre-trained Naive Bayes classifier, which will label a piece of text as either positive or negative, and give you the probability of the text being either positive or negative.
To use this method, we first need to download both the punkt module and the movie-reviews dataset from NLTK, which is used to train this model.
    import nltk
    nltk.download('movie_reviews')
    nltk.download('punkt')

    from textblob import TextBlob
    from textblob.sentiments import NaiveBayesAnalyzer

Once again, we need to run TextBlob over our text, but this time we add the argument analyzer=NaiveBayesAnalyzer(). Then, as before, we use the sentiment attribute to extract the sentiment scores.
    nb_blob = TextBlob("I love PyCharm! It's my favorite Python IDE.", analyzer=NaiveBayesAnalyzer())
    sentiment = nb_blob.sentiment
    print(sentiment)

    Sentiment(classification='pos', p_pos=0.5851800554016624, p_neg=0.4148199445983381)

This time we end up with a label of pos (positive), with the model predicting that the text has a 59% probability of being positive and a 41% probability of being negative.
spaCy
Another option is to use spaCy for sentiment analysis. spaCy is another popular package for NLP in Python, and has a wide range of options for processing text.
The first method is by using the spacytextblob plugin to use the TextBlob sentiment analyzer as part of your spaCy pipeline. Before you can do this, you’ll first need to install both spacy and spacytextblob and download the appropriate language model.
    import spacy
    import spacy.cli
    from spacytextblob.spacytextblob import SpacyTextBlob

    spacy.cli.download("en_core_web_sm")

We then load in this language model and add spacytextblob to our text processing pipeline. TextBlob can be used through spaCy’s pipe method, which means we can include it as part of a more complex text processing pipeline, including preprocessing steps such as part-of-speech tagging, lemmatization, and named-entity recognition. Preprocessing can normalize and enrich text, helping downstream models to get the most information out of the text inputs.
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')

For now, we’ll just analyze our sample sentence without preprocessing:
    doc = nlp("I love PyCharm! It's my favorite Python IDE.")

    print('Polarity: ', doc._.polarity)
    print('Subjectivity: ', doc._.subjectivity)

    Polarity: 0.625
    Subjectivity: 0.6

We get the same results as when using TextBlob above.
A second way we can do sentiment analysis in spaCy is by training our own model using the TextCategorizer class. This allows you to train a range of spaCy-created models using a sentiment analysis training set. Again, as this can be used as part of the spaCy pipeline, you have many options for preprocessing your text before training your model.
Finally, you can use large language models to do sentiment analysis through spacy-llm. This allows you to prompt a variety of proprietary large language models (LLMs) from OpenAI, Anthropic, Cohere, and Google to perform sentiment analysis over your texts.
This approach works slightly differently from the other methods we’ve discussed. Instead of training the model, we can use generalist models like GPT-4 to predict the sentiment of a text. You can do this either through zero-shot learning (where a prompt but no examples are passed to the model) or few-shot learning (where a prompt and a number of examples are passed to the model).
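The difference between the two prompting styles is easiest to see in the prompt text itself. The sketch below only builds the strings; the instruction wording and helper names are assumptions of ours, this is not spacy-llm's configuration format, and sending the prompt to a model is left out.

```python
INSTRUCTION = "Classify the sentiment of the review as positive or negative."


def zero_shot_prompt(text: str) -> str:
    """Instruction plus the text to classify, with no worked examples."""
    return f"{INSTRUCTION}\n\nReview: {text}\nSentiment:"


def few_shot_prompt(text: str, examples: list[tuple[str, str]]) -> str:
    """Same instruction, preceded by a handful of labeled examples."""
    shots = "\n\n".join(
        f"Review: {example}\nSentiment: {label}" for example, label in examples
    )
    return f"{INSTRUCTION}\n\n{shots}\n\nReview: {text}\nSentiment:"


# Hypothetical labeled examples used as "shots".
EXAMPLES = [
    ("Arrived broken and support never replied.", "negative"),
    ("Works perfectly, would buy again.", "positive"),
]

print(zero_shot_prompt("I love PyCharm! It's my favorite Python IDE."))
print()
print(few_shot_prompt("I love PyCharm! It's my favorite Python IDE.", EXAMPLES))
```

In both cases the model's weights stay untouched; the few-shot variant simply gives it a pattern to imitate.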
Transformers
The final Python package for sentiment analysis we’ll discuss is Transformers from Hugging Face.
Hugging Face hosts all major open-source LLMs for free use (among other models, including computer vision and audio models), and provides a platform for training, deploying, and sharing these models. Its Transformers package offers a wide range of functionality (including sentiment analysis) for working with the LLMs hosted by Hugging Face.
Understanding the results of sentiment analyzers
Now that we’ve covered all of the ways you can do sentiment analysis in Python, you might be wondering, “How can I apply this to my own data?”
To understand this, let’s use PyCharm to compare two packages, VADER and TextBlob. Their multiple sentiment scores offer us a few different perspectives on our data. We’ll use these packages to analyze the Amazon reviews dataset.
PyCharm Professional is a powerful Python IDE for data science that supports advanced Python code completion, inspections and debugging, rich databases, Jupyter, Git, Conda, and more – all out of the box. In addition to these, you’ll also get incredibly useful features like our DataFrame Column Statistics and Chart View, as well as Hugging Face integrations that make working with LLMs much quicker and easier. In this blog post, we’ll explore PyCharm’s advanced features for working with dataframes, which will allow us to get a quick overview of how our sentiment scores are distributed between the two packages.
If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24. You’ll then receive an activation code via email.
Activate your 3-month subscription
The first thing we need to do is load in the data. We can use the load_dataset() method from the Datasets package to download this data from the Hugging Face Hub.
    from datasets import load_dataset

    amazon = load_dataset("fancyzhx/amazon_polarity")

You can hover over the name of the dataset to see the Hugging Face dataset card right inside PyCharm, providing you with a convenient way to get information about Hugging Face assets without leaving the IDE.
We can see the contents of this dataset here:
    amazon

    DatasetDict({
        train: Dataset({
            features: ['label', 'title', 'content'],
            num_rows: 3600000
        })
        test: Dataset({
            features: ['label', 'title', 'content'],
            num_rows: 400000
        })
    })

The training dataset has 3.6 million observations, and the test dataset contains 400,000. We’ll be working with the training dataset in this tutorial.
We’ll now load in the VADER SentimentIntensityAnalyzer and the TextBlob method.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    import nltk

    nltk.download("vader_lexicon")

    analyzer = SentimentIntensityAnalyzer()

    from textblob import TextBlob

The training dataset has too many observations to comfortably visualize, so we’ll take a random sample of 1,000 reviews to represent the general sentiment of all the reviewers.
    from random import sample

    sample_reviews = sample(amazon["train"]["content"], 1000)

Let’s now get the VADER and TextBlob scores for each of these reviews. We’ll loop over each review text, run them through the sentiment analyzers, and then attach the scores to a dedicated list.
    vader_neg = []
    vader_neu = []
    vader_pos = []
    vader_compound = []
    textblob_polarity = []
    textblob_subjectivity = []

    for review in sample_reviews:
        vader_sent = analyzer.polarity_scores(review)
        vader_neg += [vader_sent["neg"]]
        vader_neu += [vader_sent["neu"]]
        vader_pos += [vader_sent["pos"]]
        vader_compound += [vader_sent["compound"]]

        textblob_sent = TextBlob(review).sentiment
        textblob_polarity += [textblob_sent.polarity]
        textblob_subjectivity += [textblob_sent.subjectivity]

We’ll then pop each of these lists into a pandas DataFrame as a separate column:
    import pandas as pd

    sent_scores = pd.DataFrame({
        "vader_neg": vader_neg,
        "vader_neu": vader_neu,
        "vader_pos": vader_pos,
        "vader_compound": vader_compound,
        "textblob_polarity": textblob_polarity,
        "textblob_subjectivity": textblob_subjectivity
    })

Now, we’re ready to start exploring our results.
Typically, this would be the point where we’d start creating a bunch of code for exploratory data analysis. This might be done using pandas’ describe method to get summary statistics over our columns, and writing Matplotlib or seaborn code to visualize our results. However, PyCharm has some features to speed this whole thing up.
Let’s go ahead and print our DataFrame.
    sent_scores

We can see a button in the top right-hand corner, called Show Column Statistics. Clicking this gives us two different options: Compact and Detailed. Let’s select Detailed.
Now we have summary statistics provided as part of our column headers! Looking at these, we can see the VADER compound score has a mean of 0.4 (median = 0.6), while the TextBlob polarity score provides a mean of 0.2 (median = 0.2).
This result indicates that, on average, VADER tends to estimate the same set of reviews more positively than TextBlob does. It also shows that for both sentiment analyzers, we likely have more positive reviews than negative ones – we can dive into this in more detail by checking some visualizations.
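A gap between the mean and the median, like the one we see for the VADER compound score, is what you would expect when most scores are high and a smaller group of strongly negative scores pulls the average down. The numbers below are made up to illustrate the effect; they are not taken from the Amazon sample.

```python
from statistics import mean, median

# Made-up compound scores: mostly positive, with a couple of strongly negative ones.
scores = [0.9, 0.8, 0.85, 0.7, 0.6, 0.5, -0.8, -0.7, 0.95, 0.2]

avg = mean(scores)
mid = median(scores)

print(f"mean = {avg:.2f}, median = {mid:.2f}")

# The median sits above the mean because the few negative values drag the mean down.
print("median above mean:", mid > avg)
```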
Another PyCharm feature we can use is the DataFrame Chart View. The button for this function is in the top left-hand corner.
When we click on the button, we switch over to the chart editor. From here, we can create no-code visualizations straight from our DataFrame.
Let’s start with VADER’s compound score. To start creating this chart, go to Show Series Settings in the top right-hand corner.
Remove the default values for X Axis and Y Axis. Replace the X Axis value with vader_compound, and the Y Axis value with vader_compound. Click on the arrow next to the variable name in the Y Axis field, and select count.
Finally, select Histogram from the chart icons, just under Series Settings. We likely have a bimodal distribution for the VADER compound score, with a slight peak around –0.8 and a much larger one around 0.9. These two peaks likely represent the split between negative and positive reviews. There are also far more positive reviews than negative.
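If you are working outside an IDE with a chart view, the same kind of histogram can be approximated in a few lines of plain Python. This is a rough sketch with made-up scores and an arbitrary bin width of 0.25; it only illustrates the binning step, not PyCharm's implementation.

```python
from collections import Counter

# Made-up compound scores in [-1, 1], for illustration only.
scores = [0.95, 0.91, 0.88, 0.93, 0.72, 0.4, 0.1, -0.3, -0.78, -0.85, 0.97, 0.9]

BIN_WIDTH = 0.25
NUM_BINS = int(2 / BIN_WIDTH)  # the range [-1, 1] has a width of 2


def bin_start(score: float) -> float:
    """Left edge of the bin that contains the score (a score of 1.0 goes in the last bin)."""
    index = min(int((score + 1) / BIN_WIDTH), NUM_BINS - 1)
    return -1 + index * BIN_WIDTH


counts = Counter(bin_start(s) for s in scores)

for edge in sorted(counts):
    print(f"{edge:+.2f} to {edge + BIN_WIDTH:+.2f} | {'#' * counts[edge]}")
```

With these numbers, the tallest bar sits in the highest bin and a smaller cluster appears in the lowest one, which is the shape a bimodal distribution takes in a histogram.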
Let’s repeat the same exercise and create a histogram to see the distribution of the TextBlob polarity scores.
In contrast, TextBlob tends to rate most reviews as neutral, with very few reviews being strongly positive or negative. To understand why the scores from these two sentiment analyzers differ, let’s look at one review that VADER rated as strongly positive and another that VADER rated as strongly negative, both of which TextBlob rated as neutral.
We’ll get the index of the first review that VADER rated as positive but TextBlob rated as neutral:
    sent_scores[(sent_scores["vader_compound"] >= 0.8) & (sent_scores["textblob_polarity"].between(-0.1, 0.1))].index[0]

    42

Next, we get the index of the first review that VADER rated as negative but TextBlob rated as neutral:
    sent_scores[(sent_scores["vader_compound"] <= -0.8) & (sent_scores["textblob_polarity"].between(-0.1, 0.1))].index[0]

    0

Let’s first retrieve the positive review:
    sample_reviews[42]

    "I love carpet sweepers for a fast clean up and a way to conserve energy. The Ewbank Multi-Sweep is a solid, well built appliance. However, if you have pets, you will find that it takes more time cleaning the sweeper than it does to actually sweep the room. The Ewbank does pick up pet hair most effectively but emptying it is a bit awkward. You need to take a rag to clean out both dirt trays and then you need a small tooth comb to pull the hair out of the brushes and the wheels. To do a proper cleaning takes quite a bit of time. My old Bissell is easier to clean when it comes to pet hair and it does a great job. If you do not have pets, I would recommend this product because it is definitely well made and for small cleanups, it would suffice. For those who complain about appliances being made of plastic, unfortunately, these days, that's the norm. It's not great and plastic definitely does not hold up but, sadly, product quality is no longer a priority in business."

This review seems mixed, but is overall somewhat positive.
Now, let’s look at the negative review:
    sample_reviews[0]

    'The only redeeming feature of this Cuisinart 4-cup coffee maker is the sleek black and silver design. After that, it rapidly goes downhill. It is frustratingly difficult to pour water from the carafe into the chamber unless it\'s done extremely slow and with accurate positioning. Even then, water still tends to dribble out and create a mess. The lid, itself, is VERY poorly designed with it\'s molded, round "grip" to supposedly remove the lid from the carafe. The only way I can remove it is to insert a sharp pointed object into one of the front pouring holes and pry it off! I\'ve also occasionally had a problem with the water not filtering down through the grounds, creating a coffee ground lake in the upper chamber and a mess below. I think the designer should go back to the drawing-board for this one.'

This review is unambiguously negative. From comparing the two, VADER appears more accurate, but it does tend to overly prioritize positive terms in a piece of text.
The final thing we can consider is how subjective versus objective each review is. We’ll do this by creating a histogram of TextBlob’s subjectivity score.
Interestingly, there is a good distribution of subjectivity in the reviews, with most reviews being a mixture of subjective and objective writing. A small number of reviews are also very subjective (close to 1) or very objective (close to 0).
These scores between them give us a nice way of cutting up the data. If you need to know the objective things that people did and did not like about the products, you could look at the reviews with a low subjectivity score and VADER compound scores close to 1 and –1, respectively.
In contrast, if you want to know what people’s emotional reactions to the products are, you could take those with a high subjectivity score and high and low VADER compound scores.
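This way of slicing the data boils down to two conditions per review. Here is a minimal sketch using plain dictionaries; the rows, the helper name, and all three cutoffs (0.8 for a strong compound score, 0.3 and 0.7 for low and high subjectivity) are assumptions you would tune for your own data.

```python
# Hypothetical rows that mirror the column names used in this post.
reviews = [
    {"id": 1, "vader_compound": 0.90, "textblob_subjectivity": 0.15},
    {"id": 2, "vader_compound": -0.85, "textblob_subjectivity": 0.20},
    {"id": 3, "vader_compound": 0.95, "textblob_subjectivity": 0.90},
    {"id": 4, "vader_compound": -0.90, "textblob_subjectivity": 0.85},
    {"id": 5, "vader_compound": 0.10, "textblob_subjectivity": 0.50},
]


def select(rows, *, subjective, positive, strong=0.8, low=0.3, high=0.7):
    """Return the ids of rows with a strong compound score in the requested
    direction and a subjectivity score at the requested end of the scale."""
    chosen = []
    for row in rows:
        compound = row["vader_compound"]
        subjectivity = row["textblob_subjectivity"]
        strong_enough = compound >= strong if positive else compound <= -strong
        right_tone = subjectivity >= high if subjective else subjectivity <= low
        if strong_enough and right_tone:
            chosen.append(row["id"])
    return chosen


print("objective likes:   ", select(reviews, subjective=False, positive=True))
print("objective dislikes:", select(reviews, subjective=False, positive=False))
print("emotional praise:  ", select(reviews, subjective=True, positive=True))
print("emotional anger:   ", select(reviews, subjective=True, positive=False))
```

Reviews in the middle of both scales, like the fifth row here, fall into none of the four groups.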
Things to consider
As with any problem in natural language processing, there are a number of things to watch out for when doing sentiment analysis.
One of the biggest considerations is the language of the texts you’re trying to analyze. Many of the lexicon-based methods only work for a limited number of languages, so if you’re working with languages not supported by these lexicons, you may need to take another approach, such as using a fine-tuned LLM or training your own model(s).
As texts increase in complexity, it can also be difficult for lexicon-based analyzers and bag-of-words-based models to correctly detect sentiment. Sarcasm or more subtle context indicators can be hard for simpler models to detect, and these models may not be able to accurately classify the sentiment of such texts. LLMs may be able to handle more complex texts, but you would need to experiment with different models.
Finally, when doing sentiment analysis, the same issues also come up as when dealing with any machine learning problem. Your models will only be as good as the training data you use. If you cannot get high-quality training and testing datasets suitable to your problem domain, you will not be able to correctly predict the sentiment of your target audience.
You should also make sure that your targets are appropriate for your business problem. It might seem attractive to build a model to know whether your products make your customers “sad”, “angry”, or “disgusted”, but if this doesn’t help you make a decision about how to improve your products, then it isn’t solving your problem.
Wrapping up
In this blog post, we dove deeply into the fascinating area of Python sentiment analysis and showed how this complex field is made more approachable by a range of powerful packages.
We covered the potential applications of sentiment analysis, different ways of assessing sentiment, and the main methods of extracting sentiment from a piece of text. We also saw some helpful features in PyCharm that make working with models and interpreting their results simpler and faster.
While the field of natural language processing is currently focused intently on large language models, the older techniques of using lexicon-based analyzers or traditional machine learning models, like Naive Bayes classifiers, still have their place in sentiment analysis. These techniques shine when analyzing simpler texts, or when speed, predictions, or ease of deployment are priorities. LLMs are best suited to more complex or nuanced texts.
Now that you’ve grasped the basics, you can learn how to do sentiment analysis with LLMs in our tutorial. The step-by-step guide helps you discover how to select the right model for your task, use it for sentiment analysis, and even fine-tune it yourself.
If you’d like to continue learning about natural language processing or machine learning more broadly after finishing this blog post, here are some resources:
- Learn how to do sentiment analysis with large language models
- Start studying machine learning with PyCharm
- Explore machine learning methods in software engineering
If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24. You’ll then receive an activation code via email.
Activate your 3-month subscription
LostCarPark Drupal Blog: Drupal Advent Calendar day 12 - Dashboard track
We are half way through our Advent Calendar, and we open with some exciting news. The first Drupal CMS Release Candidate is now available. We have been busy trying it out, but managed to take some time out to prepare today’s Advent Calendar, with some help from Matthew Tift. Over to you, Matthew.
The first page a user encounters after logging into a Drupal site is pivotal. It sets the tone for their entire experience, often defining how they will interact with the system.
The current Drupal user page
But with the introduction of the Dashboard initiative, that first page is about to change.
This initiative, inspired by a core…
Mariatta: Generating (and Sending) Conference Certificates Using Python
I’m not sure how common the practice of giving out certificates to conference attendees is. I’ve mostly been attending Python-related conferences in North America, and we don’t usually get any certificates here. However, when I went to Python Brasil in Manaus in 2022, they gave me a certificate of attendance. And as a conference organizer, I occasionally receive requests from a few attendees and volunteers for such a certificate, saying that their employer or school requires it as proof of attendance.
Talk Python to Me: #488: Multimodal data with LanceDB
Dirk Eddelbuettel: RcppCCTZ 0.2.13 on CRAN: Maintenance
A new release 0.2.13 of RcppCCTZ is now on CRAN.
RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries. One for dealing with civil time: human-readable dates and times, and one for converting between absolute and civil times via time zones. And while CCTZ is made by Google(rs), it is not an official Google product. The RcppCCTZ page has a few usage examples and details. This package was the first CRAN package to use CCTZ; by now several other packages (four the last time we counted) include its sources too. Not ideal, but beyond our control.
This version includes mostly routine package maintenance as well as one small contributed code improvement. The changes since the last CRAN release are summarised below.
Changes in version 0.2.13 (2024-12-11)
- No longer set compilation standard as recent R versions set a sufficiently high minimum
- Qualify a call to cctz::format (Michael Quinn in #44)
- Routine updates to continuous integration and badges
- Switch to Authors@R in DESCRIPTION
Courtesy of my CRANberries, there is a diffstat report relative to the previous version. More details are at the RcppCCTZ page; code, issue tickets etc at the GitHub repository. If you like this or other open-source work I do, you can sponsor me at GitHub.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
Matt Layman: UV and Ruff: Next-gen Python Tooling
KDE ⚙️ Gear 24.12
View and annotate documents
Okular is much more than a PDF reader: it can open all sorts of files, sign and verify the signatures of official documents, and annotate and fill in embedded forms.
Speaking of which, we implemented support for more types of items in comboboxes of PDF forms, and improved the speed and correctness of printing.
We also made it easier to digitally sign a document, and the signing window now stays open until the signing process is actually finished.
Kleopatra
Certificate manager and cryptography app
Kleopatra keeps track of your digital signatures, encryption keys, and certificates. It helps you sign, encrypt, and decrypt emails and confidential messages.
We redesigned Kleopatra's notepad and signing encryption dialog, and made the resulting messages and errors clearer. In the notepad, the text editor and the recipients view are now also shown side-by-side.
Which brings us to…
Merkuro
Manage your tasks, events and contacts with speed and ease
…Where the OpenPGP and S/MIME certificates of a contact are now displayed directly in Merkuro Contact. Clicking on them will open Kleopatra and show additional information.
Create
Kdenlive
Video editor
Kdenlive, KDE's acclaimed video editor, keeps adding features and now lets you resize multiple items on the timeline at the same time.
Kwave
Sound editor
Kwave, KDE's native audio editor, has long been on the development backburner, but is now receiving updates again.
First it was ported to Qt6, which means it will work natively with Plasma 6. After that, the interface received some visual improvements in the way of new and more modern icons and a better visual indication when playback is paused.
Manage
Dolphin
Manage your files
The latest changes to KDE's file explorer/manager tend heavily towards accessibility* and usability.
For starters, the main view of Dolphin was completely overhauled to make it work with screen readers, and keyboard navigation has improved: pressing Ctrl+L multiple times will switch back and forth between focusing and selecting the location bar path and focusing the view. Pressing Escape in the location bar will now move the focus to the active view. The keyboard navigation in the toolbar has also been improved, as the elements are now focused in the right order.
Dolphin's sorting of files is more natural and "human" in this version: a file called "a.txt", for example, will appear before "a 2.txt", and you can also sort your videos by duration.
When it comes to your safety and checking your files, Dolphin has overhauled the checksum and permissions tab in the Properties dialog to make it easier for you. You will see this improvement in other KDE applications too.
Finally… Dolphin goes mobile! Dolphin now includes a mobile-optimized interface for Plasma Mobile. After the addition of a selection mode and improvements to touchscreen-compatibility, Dolphin works surprisingly well on phones! That said, more work is still needed and planned over time to more closely align the user interface with typical expectations for mobile apps.
* Many of the accessibility improvements made to Dolphin 24.12 were possible thanks to funding provided by the NGI0 Entrust Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program.
KCron
Task Scheduler
A less well-known utility, but also very useful, is KCron. UNIX old-timers will recognize this as a frontend for the venerable cron command of yore. For the rest of you, it lets you schedule any kind of jobs to run at any time on your machine.
Once installed, you will find it in System Settings under Session > Task Scheduler. In the new version, KCron's configuration page was ported to QML and given a fancy new look.
KDE Connect
Seamless connection of your devices
KDE Connect is our popular app for connecting your desktop with your phone and, indeed, all your other devices. It allows you to share files, clipboards, and resources, as well as providing a remote control for media players, input devices, and presentations.
Great news: Bluetooth support for KDE Connect now works! Plus KDE Connect starts up much faster on macOS, dropping from 3s to 100ms!
In the looks department, the list of devices you can connect to now shows the connected and remembered devices separately, and the list of plugins can be filtered and comes with icons.
KRDC
Connect with RDP or VNC to another computer
If you need to access a remote desktop from your computer, you can start KRDC by opening a .rdp file containing the RDP connection configuration. KRDC now works much better on Wayland too.
Travel
KDE Itinerary
Digital travel assistant
The biggest change to your KDE travel assistant is how it handles concert, train, bus, and flight tickets, as well as hotel reservations. Itinerary now groups entries into individual trips, with each of them having their own timeline.
Itinerary suggests an appropriate existing trip when importing a new ticket, and displays some statistics about your trip, like the CO2 emissions, the distance traveled, and the costs (if available). Whole trips can also be exported directly and displayed on a map.
Itinerary can now handle geo:// URLs by opening the "Plan Trip" page with a pre-selected arrival location. This is supported both on Android and Linux.
Itinerary now supports search for places (e.g. street names) in addition to stops, and can show the date of the connection when searching for a public transport connection.
New services supported by Itinerary include:
- GoOut tickets (an event platform in Poland, Czechia and Slovakia)
- The Luma and Dimedis Fairmate event ticket sale systems
- Colosseum Ticket in Rome
- Droplabs, a Polish online ticket sale system
- The Leo Express train booking platform
- Google Maps links
- European Sleeper seat reservations in French
- Thai state railway tickets
- VietJet Air
- planway.com
- Koleo
- Reisnordland ferries
- Reservix
…And more.
Kongress
Conference companion
Kongress is an app which helps you navigate conferences and events.
The newest version will display more information in the event list. This includes whether the event is in your bookmarked events and the locations within the event (e.g. the rooms).
Marble
Virtual Globe
Marble is a virtual globe and world atlas. It has recently been ported to Qt6 and its very old Kirigami looks were largely rewritten and modernized.
Marble Behaim — a special version of Marble that lets you explore the oldest globe representation of the Earth known to exist — now also works.
Communicate
Tokodon
Browse the Fediverse
Tokodon is your gateway into the Fediverse.
Developers of KDE's desktop and phone app have worked hard to improve your experience when accessing Mastodon for the first time. We have redesigned the welcome page, and, more importantly, Tokodon now fetches a list of public servers to simplify the registration process.
We have also focused on safety, so now you can forcibly remove users from your followers list. A safety page has been added to the Tokodon settings to manage the list of muted and blocked users.
So you can travel further through the Fediverse, Tokodon has improved the support for alternative server implementations, such as GoToSocial, Iceshrimp.NET, and Pixelfed. Tokodon has also added "News" and "Users" tabs to the Explore page.
We also added a new "Following" feed, to quickly page through your follows and their feeds. It's now easier to start private conversations or mention users right from their profile page.
Tokodon now supports quoting posts, and when you are writing a post, your user info is on display, which is useful if you post from multiple accounts. Right clicking on a link on a post will show a context menu allowing users to copy or share the URL directly.
Finally, a proper grid view for the media tab has been added in the profile page.
NeoChat
Chat on Matrix
NeoChat gives you a convenient way to interact with users on the Matrix chat network.
As your trust and safety are important when talking with strangers, you now have the option to block images and videos by default, and we implemented a Matrix Spec that redirects searches for harmful and potentially illegal content to a support message.
Besides that, when replying to users you ignored, your message will not be shown, avoiding accidentally interacting with disagreeable people. We have also improved the Security settings page to be more relevant and useful to normal users.
NeoChat's looks and usability have also improved and include a nicer emoji picker, more room list sorting options, a more complete message context menu, and better-looking polls.
Develop
Kate
Advanced text editor
Instead of big features, devs have concentrated on the small things this time around, aiming to improve the overall experience. For example, Kate now starts up faster and gives visual cues of the Git status ("modified" or "staged") within the Project tree.
The order of the tabs is correctly remembered when restoring a previous session, and the options of the LSP Servers are more easily discoverable as they are no longer only available via a context menu, but also within a menu button at the top.
Kate's inline code formatting tooltips have been improved and can now also be displayed in a special context tool view, plus plugins now work on Windows, and have been expanded to include an out-of-the-box support for Flutter debugging.
The Quick Open tool lets you search and browse the projects open in the current session, and a Reopen latest closed documents option has been added to the tab context menu.
And all this too…
- Francis, the app that helps you plan your work sessions and avoid fatigue, lets you skip the current phase of work or break time in its new version.
- Konqueror, our venerable file explorer/web browser, comes with improved auto-filling of login information.
- The Elisa music player supports loading lyrics from .lrc files sitting alongside the song files.
- Falkon comes with a context menu for Greasemonkey. Greasemonkey lets you run little scripts that make on-the-fly changes to web page content.
- The Alligator RSS feed reader offers bookmarks for your favorite posts.
- Telly Skout, one of the newcomer apps for scheduling your TV viewing, comes with a redesigned display that lists your favorite TV channels and the TV shows that are currently airing.
Full changelog here
Where to get KDE Apps
Although we fully support distributions that ship our software, KDE Gear 24.12 apps will also be available on these Linux app stores shortly:
- Flathub
- Snapcraft
If you'd like to help us get more KDE applications into the app stores, support more app stores and get the apps better integrated into our development process, come say hi in our All About the Apps chat room.
Resolve to have a freer 2025
Python Engineering at Microsoft: Python in Visual Studio Code – December 2024 Release
We’re excited to announce the December 2024 release of the Python, Pylance and Jupyter extensions for Visual Studio Code!
This release includes the following announcements:
- Docstring generation features using Pylance and Copilot
- Python Environments extension in preview
- Pylance “full” language server mode
If you’re interested, you can check the full list of improvements in our changelogs for the Python, Jupyter and Pylance extensions.
Docstring generation using Pylance and Copilot
A docstring is a string literal that appears right after the definition of a function, method, class, or module, and documents the purpose and usage of the code it describes. Docstrings are essential for understanding and maintaining code, as they provide a clear explanation of what the code does, including parameters and return values. Writing docstrings manually can be time-consuming and prone to inconsistencies. Automating this process helps keep your code well-documented, making it easier for others, and yourself, to understand and maintain. Automated docstring generation can also help enforce documentation standards across your codebase.
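As a quick illustration of the definition above, here is a minimal sketch of a hand-written docstring and two standard ways to read it back at runtime. The `area` function and its Google-style layout are made up for this example; they are not taken from the extension.

```python
import inspect


def area(width: float, height: float) -> float:
    """Return the area of a rectangle.

    Args:
        width: Horizontal size.
        height: Vertical size.

    Returns:
        The product of width and height.
    """
    return width * height


# The raw docstring is stored on the function object itself.
summary_line = area.__doc__.splitlines()[0]

# inspect.getdoc() returns the same text with the indentation cleaned up,
# which is roughly what help() and editor hovers display.
cleaned = inspect.getdoc(area)

print(summary_line)
print(cleaned)
```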
How to enable docstring generation
To start, open the Command Palette (Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (macOS)) and select Preferences: Open Settings (JSON).
Add the following Pylance setting to enable support for generating docstring templates automatically within VS Code:
    "python.analysis.supportDocstringTemplate": true
Add the following settings to enable generation with AI code actions:
    "python.analysis.aiCodeActions": { "generateDocstring": true }
Triggering docstring templates
- Define Your Function or Method:
    def my_function(param1: int, param2: str) -> bool:
        pass
- Add an Empty Docstring:
  - Directly below the function definition, add triple quotes for a docstring.
    def my_function(param1: int, param2: str) -> bool:
        """"""
        pass
- Place the Cursor Inside the Docstring:
  - Place your cursor between the triple quotes.
    def my_function(param1: int, param2: str) -> bool:
        """| # Place cursor here
        """
        pass
When using Pylance, there are different ways you can request that docstring templates are added to your code.
Using IntelliSense Completion
- Press Ctrl+Space (Windows/Linux) or Cmd+Space (macOS) to trigger IntelliSense completion suggestions.
- Open the Context Menu:
- Right-click inside the docstring or press Ctrl+. (Windows/Linux) or Cmd+. (macOS).
- Select Generate Docstring:
- From the context menu, select Generate Docstring.
- Pylance will suggest a docstring template based on the function signature.
- Select Generate Docstring With Copilot:
- From the context menu, select Generate Docstring With Copilot.
- Accept Suggestions:
- GitHub Copilot chat will appear. Press Accept to take the suggestions or continue to iterate with Copilot.
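To make the idea of a template "based on the function signature" more concrete, here is a rough sketch using the standard `inspect` module. This is not how Pylance builds its templates, and the placeholder names (`_summary_`, `_type_`, `_description_`) and the Google-style layout are assumptions chosen for illustration; the helper `docstring_template` is hypothetical.

```python
import inspect


def docstring_template(func) -> str:
    """Build a docstring skeleton from a function's signature (illustrative only)."""
    sig = inspect.signature(func)
    lines = ["_summary_", ""]

    params = list(sig.parameters.values())
    if params:
        lines.append("Args:")
        for p in params:
            if p.annotation is inspect.Parameter.empty:
                type_name = "_type_"  # no annotation to draw on
            else:
                type_name = getattr(p.annotation, "__name__", str(p.annotation))
            lines.append(f"    {p.name} ({type_name}): _description_")
        lines.append("")

    ret = sig.return_annotation
    if ret is not inspect.Signature.empty:
        lines.append("Returns:")
        lines.append(f"    {getattr(ret, '__name__', str(ret))}: _description_")

    return "\n".join(lines).rstrip()


def my_function(param1: int, param2: str) -> bool:
    pass


template = docstring_template(my_function)
print(template)
```

The point of the sketch is only that parameter names, annotations, and the return type are all available from the signature, so a tool can pre-fill the structure and leave the descriptions to you (or to Copilot).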
Python Environments extension in preview
We’re excited to introduce the new Python Environments extension, now available in preview on the Marketplace.
This extension simplifies Python environment management with an Environments view accessible via the VS Code Activity Bar. Here you can create, delete, and switch between environments, and manage packages within the selected environment. It also uniquely supports specifying environments for specific files or entire Python projects, including multi-root and mono-repo scenarios.
By default, the extension uses the venv environment manager and pip package manager to determine how environments and packages are handled. You can customize these defaults by setting python-envs.defaultEnvManager and python-envs.defaultPackageManager to your preferred environment and package managers. Furthermore, if you have uv installed, the extension will use it for quick and efficient environment creation and package installation.
Designed to integrate seamlessly with your preferred environment managers via various APIs, it supports Global Python interpreters, venv, and Conda out of the box. Developers can build extensions to add support for their favorite Python environment managers and integrate with our extension UI, enhancing functionality and user experience.
This extension is poised to eventually replace the environment functionality in the main Python extension and will be installed alongside it by default. In the meantime, you can download the Python Environments extension from the Marketplace and use it in VS Code – Insiders (v1.96 or greater) and with the pre-release version of the Python extension (v2024.23 or greater). We look forward to hearing your feedback on improvements; please open issues in the vscode-python-environments repository.
Pylance “full” language server mode
The python.analysis.languageServerMode setting now supports full mode, allowing you to take advantage of the complete range of Pylance’s functionality and the most comprehensive IntelliSense experience. It’s worth noting that this comes at the cost of lower performance, as it can cause Pylance to be resource-intensive, particularly in large codebases.
The python.analysis.languageServerMode setting now changes the default values of the following settings, depending on whether it’s set to light, default or full:
Setting | light | default | full
python.analysis.exclude | ["**"] | [] | []
python.analysis.useLibraryCodeForTypes | false | true | true
python.analysis.enablePytestSupport | false | true | true
python.analysis.indexing | false | true | true
python.analysis.autoImportCompletions | false | false | true
python.analysis.showOnlyDirectDependenciesInAutoImport | false | false | true
python.analysis.packageIndexDepths | [ { "name": "sklearn", "depth": 2 }, { "name": "matplotlib", "depth": 2 }, { "name": "scipy", "depth": 2 }, { "name": "django", "depth": 2 }, { "name": "flask", "depth": 2 }, { "name": "fastapi", "depth": 2 } ] | [ { "name": "sklearn", "depth": 2 }, { "name": "matplotlib", "depth": 2 }, { "name": "scipy", "depth": 2 }, { "name": "django", "depth": 2 }, { "name": "flask", "depth": 2 }, { "name": "fastapi", "depth": 2 } ] | { "name": "", "depth": 4, "includeAllSymbols": true }
python.analysis.regenerateStdLibIndices | false | false | true
python.analysis.userFileIndexingLimit | 2000 | 2000 | -1
python.analysis.includeAliasesFromUserFiles | false | false | true
python.analysis.functionReturnTypes | false | false | true
python.analysis.pytestParameters | false | false | true
python.analysis.supportRestructuredText | false | false | true
python.analysis.supportDocstringTemplate | false | false | true
Other Changes and Enhancements
We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python and Jupyter Notebooks in Visual Studio Code. Some notable changes include:
- The testing rewrite nearing default status: This release addresses the final known issue in the testing rewrite, and we plan to turn off the rewrite experiment and set it to the default in early 2025
- Python Native REPL handles window reload in @vscode-python#24021
- Leave focus on editor after Smart Send to Native REPL in @vscode-python#23843
- Add error communication around dynamic adapter activation in @vscode-python#23234
- The --rootdir argument for pytest is now dynamically adjusted based on the presence of a python.testing.cwd setting in your workspace in @vscode-python#9553
- Add support for interpreter paths with spaces in the debugger extension in @vscode-python-debugger#233
- pytest-describe plugin is supported with test detection and execution in the UI in @vscode-python#21705
- Test coverage support updated to handle NoSource exceptions in @vscode-python#24366
- Restarting a test debugging session now reruns only the specified tests in @vscode-python-debugger#338
- The testing rewrite now leverages FIFO instead of UDS for inter-process communication allowing users to harness pytest plugins like pytest_socket in their own testing design in @vscode-python#23279
We would also like to extend special thanks to this month’s contributors:
- @joar Ruff 0.8.0 fixes in @vscode-python#24488
- @renan-r-santos Add native pixi locator in @vscode-python#244420
- @tomoki Fix the wrong Content-Length in python-server.py for non-ascii characters in @vscode-python#24480
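The Content-Length fix in the list above touches on a common pitfall that is worth a short illustration: the header counts bytes, not characters, so the two diverge as soon as the body contains non-ASCII text. The sketch below is not the code from python-server.py; `frame_message` is a hypothetical helper, and the header-plus-blank-line framing is assumed for the example.

```python
import json


def frame_message(payload: dict) -> bytes:
    """Frame a JSON payload with a Content-Length header (illustrative only)."""
    # Encode first, then measure: the header must describe the encoded bytes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


text = json.dumps({"text": "café"}, ensure_ascii=False)
char_count = len(text)                  # characters in the string
byte_count = len(text.encode("utf-8"))  # bytes on the wire ("é" takes two)

message = frame_message({"text": "café"})
print(char_count, byte_count)
print(message)
```

Using `len()` on the string rather than on the encoded bytes would under-report the length here, and a reader that trusts the header would stop one byte short.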
Try out these new improvements by downloading the Python extension and the Jupyter extension from the Marketplace, or install them directly from the extensions view in Visual Studio Code (Ctrl + Shift + X or ⌘ + ⇧ + X). You can learn more about Python support in Visual Studio Code in the documentation. If you run into any problems or have suggestions, please file an issue on the Python VS Code GitHub page.
The post Python in Visual Studio Code – December 2024 Release appeared first on Python.
Freelock Blog: Cache-bust pages containing embedded content
The saying goes, there are two hard problems in computer science: caching, naming things, and off-by-1 errors. While Drupal certainly has not solved naming things, it has made a valiant attempt at a decent caching strategy. And for the most part it works great, allowing millions of lines of code to load up quickly the vast majority of the time.
This is more a tip about our favorite automation tool, the Events, Conditions, and Actions (ECA) module, and how it can get you out of a bind when Drupal caching goes too far.
The Drop Times: Jay Callicot on DrupalX, Decoupled Architectures, and the Future of Drupal Development
Divine Attah-Ohiemi: From Sisterly Wisdom to Debian Dreams: My Outreachy Journey
Discovering Open Source: How I Got Introduced
Hey there! I’m Divine Attah-Ohiemi, a sophomore studying Computer Science. My journey into the world of open source was anything but grand. It all started with a simple question to my sister: “How do people get jobs without experience?” Her answer? Open source! I dove into this vibrant community, and it felt like discovering a hidden treasure chest filled with knowledge and opportunities.
Choosing Debian: Why This Community?
Why Debian, you ask? Well, I applied to Outreachy twice, and both times, I chose Debian. It’s not just my first operating system; it feels like home. The Debian community is incredibly welcoming, like a big family gathering where everyone supports each other. Whether I was updating my distro or poring over documentation, the care and consideration in this community were palpable. It reminded me of the warmth of homeschooling with relatives. Plus, knowing that Debian's name comes from its creator Ian and his wife Debra adds a personal touch that makes me feel even more honored to contribute to making the website better!
Why I Applied to Outreachy: What Inspired Me
Outreachy is my golden ticket to the open source world! As a 19-year-old, I see this internship as a unique opportunity to gain invaluable experience while contributing to something meaningful. It’s the perfect platform for me to learn, grow, and connect with like-minded individuals who share my passion for technology and community.
I’m excited for this journey and can’t wait to see where it takes me! 🌟