Feeds
Top articles at OpenSource.net in 2024
OpenSource.net, a platform designed to foster knowledge sharing, was launched in September 2023. Led by Editor-in-Chief Nicole Martinelli, this platform has become a space for diverse perspectives and contributions. Here are some of the top articles published at OpenSource.net in 2024:
Business with Open Source

- Open Source projects vs products: A strategic approach (Thomas Di Giacomo)
- Open Source visibility hacks — No icky marketing needed (Olga Rusakova)
- So, You Have Your 20-Page Open Source Strategy Doc. Now What? (Amanda Katona)
- Pajamas to profit: Launch your Open Source empire (Gaël Duval)
- Demystifying Open Source as a Business (Julia Machado)
- Why single vendor is the new proprietary (Thierry Carrez)
- Open code for closed services: The Open Source paradox of the cloud (Vittorio Bertola)
- Beyond the binary: The nuances of Open Source innovation (Roberto Galoppini)
- From data to action: Using metrics to improve Open Source communities (Dawn Foster)
- Diversity, Equity and Inclusion (DEI) metrics: Breaking barriers in Open Source (Anita Ihuman)
- How to make reviewing pull requests a better experience (Alya Abbott)
- Steady in a shifting Open Source world: FreeBSD’s enduring stability (Jason Perlow)
- Celebrating 30 years of Open Source with FreeDOS (Jim Hall)
- Sustain Open Source, sustain the planet: A new conversation (Tobias Augspurger)
- Closing the Gap: Accelerating environmental Open Source (Tobias Augspurger)
- Preserving Open Values in artificial intelligence (Mia Lykou Lund)
A special thank you to the authors who contributed articles, and to Cisco for sponsoring OpenSource.net. If you are interested in contributing articles on Open Source software, hardware, open culture, and open knowledge, please submit a proposal.
GNU Guix: The Shepherd 1.0.0 released!
Finally, twenty-one years after its inception (twenty-one!), the Shepherd leaves ZeroVer territory to enter a glorious 1.0 era. This 1.0.0 release is published today because we think Shepherd has become a solid tool, meeting user experience standards one has come to expect since systemd changed the game of free init systems and service managers alike. It’s also a major milestone for Guix, which has been relying on the Shepherd from a time when doing so counted as dogfooding.
To celebrate this release, the amazing Luis Felipe López Acevedo designed a new logo, available under CC-BY-SA, and the project got a proper web site!
Let’s first look at what the Shepherd actually is and what it can do for you.
At a glance

The Shepherd is a minimalist but featureful service manager and, as such, it herds services: it keeps track of services, their state, and their dependencies, and it can start, stop, and restart them when needed. It's a simple job; doing it right and providing users with insight and control over services is a different story.
The Shepherd consists of two commands: shepherd is the daemon that manages services, and herd is the command that lets you interact with it to inspect and control the status of services. The shepherd command can run as the first process (PID 1) and serve as the “init system”, as is the case on Guix System; or it can manage services for unprivileged users, as is the case with Guix Home. For example, running herd status ntpd as root allows me to know what the Network Time Protocol (NTP) daemon is up to:
```
$ sudo herd status ntpd
● Status of ntpd:
  It is running since Fri 06 Dec 2024 02:08:08 PM CET (2 days ago).
  Main PID: 11359
  Command: /gnu/store/s4ra0g0ym1q1wh5jrqs60092x1nrb8h9-ntp-4.2.8p18/bin/ntpd -n -c /gnu/store/7ac2i2c6dp2f9006llg3m5vkrna7pjbf-ntpd.conf -u ntpd -g
  It is enabled.
  Provides: ntpd
  Requires: user-processes networking
  Custom action: configuration
  Will be respawned.
  Log file: /var/log/ntpd.log
  Recent messages (use '-n' to view more or less):
    2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 25 tun0 128.93.179.24:123
    2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 26 tun0 [fe80::e6b7:4575:77ef:eaf4%12]:123
    2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: new interface(s) found: waking up resolver
    2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 25 tun0, [128.93.179.24]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
    2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 26 tun0, [fe80::e6b7:4575:77ef:eaf4%12]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
```

It's running, and it's logging messages: the latest ones are shown here and I can open /var/log/ntpd.log to view more. Running herd stop ntpd would terminate the ntpd process, and there's also a start and a restart action.
Services can also have custom actions; in the example above, we see there’s a configuration action. As it turns out, that action is a handy way to get the file name of the ntpd configuration file:
```
$ head -2 $(sudo herd configuration ntpd)
driftfile /var/run/ntpd/ntp.drift
pool 2.guix.pool.ntp.org iburst
```

Of course, a typical system runs quite a few services, many of which depend on one another. The herd graph command returns a representation of that service dependency graph that can be piped to dot or xdot to visualize it; here's what I get on my laptop:
It’s quite a big graph (you can zoom in for details!) but we can learn a few things from it. Each node in the graph is a service; rectangles are for “regular” services (typically daemons like ntpd), round nodes correspond to one-shot services (services that perform one action and immediately stop), and diamonds are for timed services (services that execute code periodically).
Blurring the user/developer line

A unique feature of the Shepherd is that you configure and extend it in its own implementation language: Guile Scheme. That does not mean you need to be an expert in that programming language to get started. Instead, we try to make sure anyone can start with a simple configuration file and gradually learn more if and when they feel the need for it. With this approach, we keep the user in the loop, as Andy Wingo put it.
A Shepherd configuration file is a Scheme snippet that goes like this:
```scheme
(register-services
 (list (service '(ntpd) …)
       …))
(start-in-the-background '(ntpd …))
```

Here we define ntpd and get it started as soon as shepherd has read the configuration file. The ellipses can be filled in with more services.
As an example, our ntpd service is defined like this:
```scheme
(service '(ntpd)
  #:documentation "Run the Network Time Protocol (NTP) daemon."
  #:requirement '(user-processes networking)
  #:start (make-forkexec-constructor
           (list "…/bin/ntpd" "-n" "-c" "/…/…-ntpd.conf" "-u" "ntpd" "-g")
           #:log-file "/var/log/ntpd.log")
  #:stop (make-kill-destructor)
  #:respawn? #t)
```

The important parts here are the #:start bit, which says how to start the service, and #:stop, which says how to stop it. In this case we're just spawning the ntpd program, but other startup mechanisms are supported by default: inetd, socket activation à la systemd, and timers. Check out the manual for examples and a reference.
There’s no limit to what #:start and #:stop can do. In Guix System you’ll find services that run daemons in containers, that mount/unmount file systems (as can be guessed from the graph above), that set up/tear down a static networking configuration, and a variety of other things. The Swineherd project goes as far as extending the Shepherd to turn it into a tool to manage system containers—similar to what the Docker daemon does.
Note that when writing service definitions for Guix System and Guix Home, you’re targeting a thin layer above the Shepherd programming interface. As is customary in Guix, this is multi-stage programming: G-expressions specified in the start and stop fields are staged and make it into the resulting Shepherd configuration file.
New since 0.10.x

For those of you who were already using the Shepherd, here are the highlights compared to the 0.10.x series:
- Support for timed services has been added: these services spawn a command or run Scheme code periodically according to a predefined calendar.
- herd status SERVICE now shows high-level information about a service (main PID, command, addresses it is listening to, etc.) instead of its mere “running value”. It also shows recently logged messages.
- To make it easier to discover functionality, that command also displays custom actions applicable to the service, if any. It also lets you know if a replacement is pending, in which case you can restart the service to upgrade it.
- herd status root is no longer synonymous with herd status; instead it shows information about the shepherd process itself.
- On Linux, reboot --kexec lets you reboot straight into a new Linux kernel previously loaded with kexec --load.
The service collection has grown:
The new log rotation service is responsible for periodically rotating log files, compressing them, and eventually deleting them. It’s very much like similar log rotation tools from the 80’s since shepherd logs to plain text files like in the good ol’ days.
A couple of benefits come from its integration into the Shepherd. First, it already knows all the files that services log to, so no additional configuration is needed to teach it about those files. Second, log rotation is race free: not a single line of log can be lost in the process.
The new system log service does what's traditionally devoted to a separate syslogd program. The advantage of having it in shepherd is that it can start logging earlier and integrates nicely with the rest of the system.
- The timer service provides functionality similar to the venerable at command, allowing you to run a command at a particular time.
- The transient service maker lets you run a command in the background as a transient service (it is similar in spirit to the systemd-run command).
- The GOOPS interface that was deprecated in 0.10.x is now gone.
As always, the NEWS file has additional details.
In the coming weeks, we will most likely gradually move service definitions in Guix from mcron to timed services and similarly replace Rottlog and syslogd. This should be an improvement for Guix users and system administrators!
Cute code

I did mention that the Shepherd is minimalist, and it really is: 7.4K lines of Scheme, excluding tests, according to SLOCCount. This is in large part thanks to the use of a high-level, memory-safe language and to the fact that it's extensible: peripheral features can live outside the Shepherd.
Significant benefits also come from the concurrency framework: the concurrent sequential processes (CSP) model and Fibers. Internally, the state of each service is encapsulated in a fiber. Accessing a service's state amounts to sending a message to its fiber. This way of structuring code, itself very much inspired by the actor model, results in simpler code (no dreaded event loop, no callback hell) and better separation of concerns.
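As a rough analogue of this message-passing structure, here is a minimal Python sketch in which threads and queues stand in for fibers and channels; the message names and state fields are invented for illustration and are not the Shepherd's actual protocol:

```python
import queue
import threading

class ServiceActor:
    """Each service's state lives in one worker loop; callers interact
    with it only by sending messages to its inbox queue."""

    def __init__(self, name):
        self.inbox = queue.Queue()
        self.state = {"name": name, "running": False}
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            message, reply = self.inbox.get()
            if message == "start":
                self.state["running"] = True
            elif message == "stop":
                self.state["running"] = False
            reply.put(dict(self.state))  # only this loop ever touches state

    def send(self, message):
        # Synchronous request/reply: post a message, wait for the answer.
        reply = queue.Queue()
        self.inbox.put((message, reply))
        return reply.get()

ntpd = ServiceActor("ntpd")
print(ntpd.send("start"))  # {'name': 'ntpd', 'running': True}
```

Because all mutation happens inside the single loop, no locks are needed: concurrent callers are serialized by the queue, which is the property that makes this style easy to reason about.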
Using a high-level framework like Fibers does come with its challenges. For example, we had the case of a memory leak in Fibers under certain conditions, and we certainly don't want that in PID 1. The challenge really lies in squashing those low-level bugs so that the foundation is solid. The Shepherd itself is free from such low-level issues; its logic is easy to reason about, and that alone is immensely helpful: it allows us to extend the code without fear, and it avoids the concurrency bugs that plague programs written in the more common event-loop-with-callbacks style.
In fact, thanks to all this, the Shepherd is probably the coolest init system to hack on. It even comes with a REPL for live hacking!
What's next

There are a number of down-to-earth improvements that can be made to the Shepherd, such as adding support for dynamically reconfigurable services (being able to restart a service, but with different options), integration with control groups (“cgroups”) on Linux, proper integration for software suspend, etc.
In the longer run, we envision an exciting journey towards a distributed and capability-style Shepherd. Spritely Goblins provides the foundation for this; using it looks like a natural continuation of the design work of the Shepherd: Goblins is an actor model framework! Juliana Sims has been working on adapting the Shepherd to Goblins and we’re eager to see what comes out of it in the coming year. Stay tuned!
Enjoy!

In the meantime, we hope you enjoy the Shepherd 1.0 as much as we enjoyed making it. Four people contributed code that led to this release, but there are other ways to help: through graphics and web design, translation, documentation, and more. Join us!
Originally published on the Shepherd web site.
PyCharm: Introduction to Sentiment Analysis in Python
Sentiment analysis is one of the most popular ways to analyze text. It allows us to see at a glance how people are feeling across a wide range of areas and has useful applications in fields like customer service, market and product research, and competitive analysis.
Like any area of natural language processing (NLP), sentiment analysis can get complex. Luckily, Python has excellent packages and tools that make this branch of NLP much more approachable.
In this blog post, we’ll explore some of the most popular packages for analyzing sentiment in Python, how they work, and how you can train your own sentiment analysis model using state-of-the-art techniques. We’ll also look at some PyCharm features that make working with these packages easier and faster.
What is sentiment analysis?

Sentiment analysis is the process of analyzing a piece of text to determine its emotional tone. As you can probably see from this definition, sentiment analysis is a very broad field that incorporates a wide variety of methods within the field of natural language processing.
There are many ways to define “emotional tone”. The most commonly used methods determine the valence or polarity of a piece of text – that is, how positive or negative the sentiment expressed in a text is. Emotional tone is also usually treated as a text classification problem, where text is categorized as either positive or negative.
Take the following Amazon product review:
This is obviously not a happy customer, and sentiment analysis techniques would classify this review as negative.
Contrast this with a much more satisfied buyer:
This time, sentiment analysis techniques would classify this as positive.
Different types of sentiment analysis

There are multiple ways of extracting emotional information from text. Let's review a few of the most important ones.
Ways of defining sentiment

First, sentiment analysis approaches have several different ways of defining sentiment or emotion.
Binary: This is where the valence of a document is divided into two categories, either positive or negative, as with the SST-2 dataset. Related to this are classifications of valence that add a neutral class (where a text expresses no sentiment about a topic) or even a conflict class (where a text expresses both positive and negative sentiment about a topic).
Some sentiment analyzers use a related measure to classify texts into subjective or objective.
Fine-grained: This term describes several different ways of approaching sentiment analysis, but here it refers to breaking down positive and negative valence into a Likert scale. A well-known example of this is the SST-5 dataset, which uses a five-point Likert scale with the classes very positive, positive, neutral, negative, and very negative.
Continuous: The valence of a piece of text can also be measured continuously, with scores indicating how positive or negative the sentiment of the writer was. For example, the VADER sentiment analyzer gives a piece of text a score between –1 (strongly negative) and 1 (strongly positive), with scores close to 0 indicating a neutral sentiment.
Emotion-based: Also known as emotion detection or emotion identification, this approach attempts to detect the specific emotion being expressed in a piece of text. You can approach this in two ways. Categorical emotion detection tries to classify the sentiment expressed by a text into one of a handful of discrete emotions, usually based on the Ekman model, which includes anger, disgust, fear, joy, sadness, and surprise. A number of datasets exist for this type of emotion detection. Dimensional emotional detection is less commonly used in sentiment analysis and instead tries to measure three emotional aspects of a piece of text: polarity, arousal (how exciting a feeling is), and dominance (how restricted the emotional expression is).
Levels of analysis

We can also consider different levels at which we can analyze a piece of text. To understand this better, let's consider another review of the coffee maker:
Document-level: This is the most basic level of analysis, where one sentiment for an entire piece of text will be returned. Document-level analysis might be fine for very short pieces of text, such as Tweets, but can give misleading answers if there is any mixed sentiment. For example, if we based the sentiment analysis for this review on the whole document, it would likely be classified as neutral or conflict, as we have two opposing sentiments about the same coffee machine.
Sentence-level: This is where the sentiment for each sentence is predicted separately. For the coffee machine review, sentence-level analysis would tell us that the reviewer felt positively about some parts of the product but negatively about others. However, this analysis doesn’t tell us what things the reviewer liked and disliked about the coffee machine.
Aspect-based: This type of sentiment analysis dives deeper into a piece of text and tries to understand the sentiment of users about specific aspects. For our review of the coffee maker, the reviewer mentioned two aspects: appearance and noise. By extracting these aspects, we have more information about what the user specifically did and did not like. They had a positive sentiment about the machine’s appearance but a negative sentiment about the noise it made.
Coupling sentiment analysis with other NLP techniques

Intent-based: In this final type of sentiment analysis, the text is classified in two ways: in terms of the sentiment being expressed, and the topic of the text. For example, if a telecommunication company receives a ticket complaining about how often their service goes down, they could classify the text intent or topic as service reliability and the sentiment as negative. As with aspect-based sentiment analysis, this analysis gives the company much more information than knowing whether their customers are generally happy or unhappy.
Applications of sentiment analysis

By now, you can probably already think of some potential use cases for sentiment analysis. Basically, it can be used anywhere that you could get text feedback or opinions about a topic. Organizations or individuals can use sentiment analysis to do social media monitoring and see how people feel about a brand, government entity, or topic.
Customer feedback analysis can be used to find out the sentiments expressed in feedback or tickets. Product reviews can be analyzed to see how satisfied or dissatisfied people are with a company’s products. Finally, sentiment analysis can be a key component in market research and competitive analysis, where how people feel about emerging trends, features, and competitors can help guide a company’s strategies.
How does sentiment analysis work?

At a general level, sentiment analysis operates by linking words (or, in more sophisticated models, the overall tone of a text) to an emotion. The most common approaches to sentiment analysis fall into one of the three methods below.
Lexicon-based approaches

These methods rely on a lexicon that includes sentiment scores for a range of words. They combine these scores using a set of rules to get the overall sentiment for a piece of text. These methods tend to be very fast and also have the advantage of yielding more fine-grained continuous sentiment scores. However, as the lexicons need to be handcrafted, they can be time-consuming and expensive to produce.
Machine learning models

These methods train a machine learning model, most commonly a Naive Bayes classifier, on a dataset that contains texts and their sentiment labels, such as movie reviews. In this model, texts are generally classified as positive, negative, and sometimes neutral. These models also tend to be very fast, but as they usually don't take into account the relationships between words in the input, they may struggle with more complex texts that involve qualifiers and negations.
Large language models

These methods rely on fine-tuning a pre-trained transformer-based large language model on the same datasets used to train the machine learning classifiers mentioned earlier. These sophisticated models are capable of modeling complex relationships between words in a piece of text but tend to be slower than the other two methods.
Sentiment analysis in Python

Python has a rich ecosystem of packages for NLP, meaning you are spoiled for choice when doing sentiment analysis in this language.
Let’s review some of the most popular Python packages for sentiment analysis.
The best Python libraries for sentiment analysis

VADER

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a popular lexicon-based sentiment analyzer. Built into the powerful NLTK package, this analyzer returns four sentiment scores: the degree to which the text was positive, neutral, or negative, as well as a compound sentiment score. The positive, neutral, and negative scores range from 0 to 1 and indicate the proportion of the text that was positive, neutral, or negative. The compound score ranges from –1 (extremely negative) to 1 (extremely positive) and indicates the overall sentiment valence of the text.
Let’s look at a basic example of how it works:
```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
```

We first need to download the VADER lexicon.
```python
nltk.download('vader_lexicon')
```

We can then instantiate the VADER SentimentIntensityAnalyzer() and extract the sentiment scores using the polarity_scores() method.
```python
analyzer = SentimentIntensityAnalyzer()
sentence = "I love PyCharm! It's my favorite Python IDE."
sentiment_scores = analyzer.polarity_scores(sentence)
print(sentiment_scores)
```

```
{'neg': 0.0, 'neu': 0.572, 'pos': 0.428, 'compound': 0.6696}
```

We can see that VADER has given this piece of text an overall sentiment score of 0.67 and classified its contents as 43% positive, 57% neutral, and 0% negative.
VADER works by looking up the sentiment scores for each word in its lexicon and combining them using a nuanced set of rules. For example, qualifiers can increase or decrease the intensity of a word’s sentiment, so a qualifier such as “a bit” before a word would decrease the sentiment intensity, but “extremely” would amplify it.
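The general lexicon-plus-rules idea can be sketched in a few lines of Python. The word scores and booster multipliers below are invented for illustration; they are not VADER's actual lexicon values or rule set, and a single-word "slightly" stands in for multiword qualifiers like "a bit":

```python
# Toy lexicon-based scorer: hypothetical word scores and booster
# multipliers, for illustration only (not VADER's real data).
LEXICON = {"good": 2.0, "bad": -3.0, "love": 3.0}
BOOSTERS = {"extremely": 1.5, "slightly": 0.5}

def toy_score(text):
    score, booster = 0.0, 1.0
    for word in text.lower().split():
        if word in BOOSTERS:
            booster = BOOSTERS[word]  # scales the next sentiment word
        else:
            score += LEXICON.get(word, 0.0) * booster
            booster = 1.0  # boosters apply to one word only
    return score

print(toy_score("good"))            # 2.0
print(toy_score("extremely good"))  # 3.0
print(toy_score("slightly good"))   # 1.0
```

Real analyzers layer many more rules on top of this (negation, punctuation emphasis, capitalization), but the core mechanism of looking up scores and combining them with modifiers is the same.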
VADER’s lexicon includes abbreviations such as “smh” (shaking my head) and emojis, making it particularly suitable for social media text. VADER’s main limitation is that it doesn’t work for languages other than English, but you can use projects such as vader-multi as an alternative. I wrote about how VADER works if you’re interested in taking a deeper dive into this package.
NLTK

Additionally, you can use NLTK to train your own machine learning-based sentiment classifier, using classifiers from scikit-learn.
There are many ways of processing the text to feed into these models, but the simplest is based on the words present in the text, a type of text modeling called the bag-of-words approach. The most straightforward type of bag-of-words modeling is binary vectorization, where each word is treated as a feature whose value is either 0 or 1 (the word is absent from or present in the text, respectively).
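A tiny sketch of binary vectorization (toy code for illustration, not what NLTK or scikit-learn do internally):

```python
# Binary bag-of-words: each word in the vocabulary is a feature whose
# value is 1 if it appears in the text, 0 otherwise.
def build_vocabulary(texts):
    return sorted({word for text in texts for word in text.lower().split()})

def binary_vectorize(text, vocab):
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocab]

texts = ["I love this coffee maker", "I hate the noise"]
vocab = build_vocabulary(texts)
# vocab is ['coffee', 'hate', 'i', 'love', 'maker', 'noise', 'the', 'this']
print(binary_vectorize("I love the noise", vocab))  # [0, 0, 1, 1, 0, 1, 1, 0]
```

Each text becomes a fixed-length vector, which is exactly the kind of input a Naive Bayes classifier expects; word order and repetition are deliberately discarded.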
If you’re new to working with text data and NLP, and you’d like more information about how text can be converted into inputs for machine learning models, I gave a talk on this topic that provides a gentle introduction.
You can see an example in the NLTK documentation, where a Naive Bayes classifier is trained to predict whether a piece of text is subjective or objective. In this example, they add an additional negation qualifier to some of the terms based on rules which indicate whether that word or character is likely involved in negating a sentiment expressed elsewhere in the text. Real Python also has a sentiment analysis tutorial on training your own classifiers using NLTK, if you want to learn more about this topic.
Pattern and TextBlob

The Pattern package provides another lexicon-based approach to analyzing sentiment. It uses the SentiWordNet lexicon, where each synonym group (synset) from WordNet is assigned a score for positivity, negativity, and objectivity. The positive and negative scores for each word are combined using a series of rules to give a final polarity score. Similarly, the objectivity score for each word is combined to give a final subjectivity score.
As WordNet contains part-of-speech information, the rules can take into account whether adjectives or adverbs preceding a word modify its sentiment. The ruleset also considers negations, exclamation marks, and emojis, and even includes some rules to handle idioms and sarcasm.
However, Pattern as a standalone library is only compatible with Python versions up to 3.6. As such, the most common way to use Pattern is through TextBlob. By default, the TextBlob sentiment analyzer uses its own implementation of the Pattern library to generate sentiment scores.
Let’s have a look at this in action:
```python
from textblob import TextBlob
```

You can see that we run the TextBlob method over our text, and then extract the sentiment using the sentiment attribute.
```python
pattern_blob = TextBlob("I love PyCharm! It's my favorite Python IDE.")
sentiment = pattern_blob.sentiment
print(f"Polarity: {sentiment.polarity}")
print(f"Subjectivity: {sentiment.subjectivity}")
```

```
Polarity: 0.625
Subjectivity: 0.6
```

For our example sentence, Pattern in TextBlob gives us a polarity score of 0.625 (relatively close to the score given by VADER), and a subjectivity score of 0.6.
But there’s also a second way of getting sentiment scores in TextBlob. This package also includes a pre-trained Naive Bayes classifier, which will label a piece of text as either positive or negative, and give you the probability of the text being either positive or negative.
To use this method, we first need to download both the punkt module and the movie-reviews dataset from NLTK, which is used to train this model.
```python
import nltk
nltk.download('movie_reviews')
nltk.download('punkt')

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
```

Once again, we need to run TextBlob over our text, but this time we add the argument analyzer=NaiveBayesAnalyzer(). Then, as before, we use the sentiment attribute to extract the sentiment scores.
```python
nb_blob = TextBlob("I love PyCharm! It's my favorite Python IDE.",
                   analyzer=NaiveBayesAnalyzer())
sentiment = nb_blob.sentiment
print(sentiment)
```

```
Sentiment(classification='pos', p_pos=0.5851800554016624, p_neg=0.4148199445983381)
```

This time we end up with a label of pos (positive), with the model predicting that the text has a 59% probability of being positive and a 41% probability of being negative.
spaCy

Another option is to use spaCy for sentiment analysis. spaCy is another popular package for NLP in Python, and has a wide range of options for processing text.
The first method is by using the spacytextblob plugin to use the TextBlob sentiment analyzer as part of your spaCy pipeline. Before you can do this, you’ll first need to install both spacy and spacytextblob and download the appropriate language model.
```python
import spacy
import spacy.cli
from spacytextblob.spacytextblob import SpacyTextBlob

spacy.cli.download("en_core_web_sm")
```

We then load in this language model and add spacytextblob to our text processing pipeline. TextBlob can be used through spaCy's pipe method, which means we can include it as part of a more complex text processing pipeline, including preprocessing steps such as part-of-speech tagging, lemmatization, and named-entity recognition. Preprocessing can normalize and enrich text, helping downstream models to get the most information out of the text inputs.
```python
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
```

For now, we'll just analyze our sample sentence without preprocessing:
```python
doc = nlp("I love PyCharm! It's my favorite Python IDE.")
print('Polarity: ', doc._.polarity)
print('Subjectivity: ', doc._.subjectivity)
```

```
Polarity:  0.625
Subjectivity:  0.6
```

We get the same results as when using TextBlob above.
A second way we can do sentiment analysis in spaCy is by training our own model using the TextCategorizer class. This allows you to train a range of spaCy-created models using a sentiment analysis training set. Again, as this can be used as part of the spaCy pipeline, you have many options for preprocessing your text before training your model.
Finally, you can use large language models to do sentiment analysis through spacy-llm. This allows you to prompt a variety of proprietary large language models (LLMs) from OpenAI, Anthropic, Cohere, and Google to perform sentiment analysis over your texts.
This approach works slightly differently from the other methods we’ve discussed. Instead of training the model, we can use generalist models like GPT-4 to predict the sentiment of a text. You can do this either through zero-shot learning (where a prompt but no examples are passed to the model) or few-shot learning (where a prompt and a number of examples are passed to the model).
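The difference between the two prompting styles is easiest to see by building the prompts themselves. The wording below is illustrative, not spacy-llm's actual prompt templates:

```python
# Sketch of zero-shot vs. few-shot prompts for sentiment classification.
# The prompt text is made up for illustration.
def zero_shot_prompt(text):
    return ("Classify the sentiment of this text as positive or negative.\n"
            f"Text: {text}\nSentiment:")

def few_shot_prompt(text, examples):
    # Labeled examples are prepended so the model can imitate the pattern.
    shots = "\n".join(f"Text: {t}\nSentiment: {label}" for t, label in examples)
    return ("Classify the sentiment of this text as positive or negative.\n"
            f"{shots}\nText: {text}\nSentiment:")

examples = [("I love it", "positive"), ("Total waste of money", "negative")]
print(few_shot_prompt("The machine is so noisy", examples))
```

Either prompt would then be sent to the LLM, whose completion (ideally "positive" or "negative") becomes the predicted label; few-shot prompting usually improves consistency of the output format.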
Transformers

The final Python package for sentiment analysis we'll discuss is Transformers from Hugging Face.
Hugging Face hosts all major open-source LLMs for free use (among other models, including computer vision and audio models), and provides a platform for training, deploying, and sharing these models. Its Transformers package offers a wide range of functionality (including sentiment analysis) for working with the LLMs hosted by Hugging Face.
Understanding the results of sentiment analyzers

Now that we've covered all of the ways you can do sentiment analysis in Python, you might be wondering, “How can I apply this to my own data?”
To understand this, let’s use PyCharm to compare two packages, VADER and TextBlob. Their multiple sentiment scores offer us a few different perspectives on our data. We’ll use these packages to analyze the Amazon reviews dataset.
PyCharm Professional is a powerful Python IDE for data science that supports advanced Python code completion, inspections and debugging, rich databases, Jupyter, Git, Conda, and more – all out of the box. In addition to these, you’ll also get incredibly useful features like our DataFrame Column Statistics and Chart View, as well as Hugging Face integrations that make working with LLMs much quicker and easier. In this blog post, we’ll explore PyCharm’s advanced features for working with dataframes, which will allow us to get a quick overview of how our sentiment scores are distributed between the two packages.
If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24. You’ll then receive an activation code via email.
Activate your 3-month subscription

The first thing we need to do is load in the data. We can use the load_dataset() method from the Datasets package to download this data from the Hugging Face Hub.
```python
from datasets import load_dataset

amazon = load_dataset("fancyzhx/amazon_polarity")
```

You can hover over the name of the dataset to see the Hugging Face dataset card right inside PyCharm, providing you with a convenient way to get information about Hugging Face assets without leaving the IDE.
We can see the contents of this dataset here:
```python
amazon
```

```
DatasetDict({
    train: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 3600000
    })
    test: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 400000
    })
})
```

The training dataset has 3.6 million observations, and the test dataset contains 400,000. We'll be working with the training dataset in this tutorial.
We’ll now load in the VADER SentimentIntensityAnalyzer and the TextBlob method.
```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

nltk.download("vader_lexicon")
analyzer = SentimentIntensityAnalyzer()

from textblob import TextBlob
```

The training dataset has too many observations to comfortably visualize, so we’ll take a random sample of 1,000 reviews to represent the general sentiment of all the reviewers.
```python
from random import sample

sample_reviews = sample(amazon["train"]["content"], 1000)
```

Let’s now get the VADER and TextBlob scores for each of these reviews. We’ll loop over each review text, run them through the sentiment analyzers, and then attach the scores to a dedicated list.
```python
vader_neg = []
vader_neu = []
vader_pos = []
vader_compound = []
textblob_polarity = []
textblob_subjectivity = []

for review in sample_reviews:
    vader_sent = analyzer.polarity_scores(review)
    vader_neg += [vader_sent["neg"]]
    vader_neu += [vader_sent["neu"]]
    vader_pos += [vader_sent["pos"]]
    vader_compound += [vader_sent["compound"]]
    textblob_sent = TextBlob(review).sentiment
    textblob_polarity += [textblob_sent.polarity]
    textblob_subjectivity += [textblob_sent.subjectivity]
```

We’ll then pop each of these lists into a pandas DataFrame as a separate column:
```python
import pandas as pd

sent_scores = pd.DataFrame({
    "vader_neg": vader_neg,
    "vader_neu": vader_neu,
    "vader_pos": vader_pos,
    "vader_compound": vader_compound,
    "textblob_polarity": textblob_polarity,
    "textblob_subjectivity": textblob_subjectivity
})
```

Now, we’re ready to start exploring our results.
Typically, this would be the point where we’d start creating a bunch of code for exploratory data analysis. This might be done using pandas’ describe method to get summary statistics over our columns, and writing Matplotlib or seaborn code to visualize our results. However, PyCharm has some features to speed this whole thing up.
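For context, the manual route might look something like this. This is a minimal sketch using stand-in scores rather than the real 1,000-review sample, so the numbers are purely illustrative:

```python
import pandas as pd

# Stand-in scores for illustration; in practice this would be the
# sent_scores DataFrame built above.
sent_scores = pd.DataFrame({
    "vader_compound": [0.9, -0.8, 0.6],
    "textblob_polarity": [0.2, -0.1, 0.3],
})

# describe() gives count, mean, std, min, quartiles, and max per column
summary = sent_scores.describe()
print(summary.loc[["mean", "50%"]])
```

PyCharm's Column Statistics surface the same numbers without any of this boilerplate.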
Let’s go ahead and print our DataFrame.
```python
sent_scores
```

We can see a button in the top right-hand corner, called Show Column Statistics. Clicking this gives us two different options: Compact and Detailed. Let’s select Detailed.
Now we have summary statistics provided as part of our column headers! Looking at these, we can see the VADER compound score has a mean of 0.4 (median = 0.6), while the TextBlob polarity score provides a mean of 0.2 (median = 0.2).
This result indicates that, on average, VADER tends to estimate the same set of reviews more positively than TextBlob does. It also shows that for both sentiment analyzers, we likely have more positive reviews than negative ones – we can dive into this in more detail by checking some visualizations.
Another PyCharm feature we can use is the DataFrame Chart View. The button for this function is in the top left-hand corner.
When we click on the button, we switch over to the chart editor. From here, we can create no-code visualizations straight from our DataFrame.
Let’s start with VADER’s compound score. To start creating this chart, go to Show Series Settings in the top right-hand corner.
Remove the default values for X Axis and Y Axis. Replace the X Axis value with vader_compound, and the Y Axis value with vader_compound. Click on the arrow next to the variable name in the Y Axis field, and select count.
Finally, select Histogram from the chart icons, just under Series Settings. We likely have a bimodal distribution for the VADER compound score, with a slight peak around –0.8 and a much larger one around 0.9. This peak likely represents the split of negative and positive reviews. There are also far more positive reviews than negative.
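If you'd rather build the same histogram in code (for example, outside PyCharm's chart view), a minimal Matplotlib sketch could look like this, again with stand-in scores in place of the real sample:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in scores for illustration; use the real sent_scores DataFrame.
sent_scores = pd.DataFrame({"vader_compound": [0.9, -0.8, 0.6, 0.95, -0.7]})

counts, bins, _ = plt.hist(sent_scores["vader_compound"], bins=10)
plt.xlabel("VADER compound score")
plt.ylabel("Number of reviews")
plt.title("Distribution of VADER compound scores")
plt.savefig("vader_compound_hist.png")
```

The no-code chart view produces the equivalent result with a couple of clicks.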
Let’s repeat the same exercise and create a histogram to see the distribution of the TextBlob polarity scores.
In contrast, TextBlob tends to rate most reviews as neutral, with very few reviews being strongly positive or negative. To understand why we have a discrepancy in the scores these two sentiment analyzers provide, let’s look at a review VADER rated as strongly positive and another that VADER rated strongly negative but that TextBlob rated as neutral.
We’ll get the index of the first review where VADER rated them as positive but TextBlob rated them as neutral:
```python
sent_scores[
    (sent_scores["vader_compound"] >= 0.8)
    & (sent_scores["textblob_polarity"].between(-0.1, 0.1))
].index[0]
```

```
42
```

Next, we get the index of the first review where VADER rated them as negative but TextBlob as neutral:
```python
sent_scores[
    (sent_scores["vader_compound"] <= -0.8)
    & (sent_scores["textblob_polarity"].between(-0.1, 0.1))
].index[0]
```

```
0
```

Let’s first retrieve the positive review:
```python
sample_reviews[42]
```

```
"I love carpet sweepers for a fast clean up and a way to conserve energy. The Ewbank Multi-Sweep is a solid, well built appliance. However, if you have pets, you will find that it takes more time cleaning the sweeper than it does to actually sweep the room. The Ewbank does pick up pet hair most effectively but emptying it is a bit awkward. You need to take a rag to clean out both dirt trays and then you need a small tooth comb to pull the hair out of the brushes and the wheels. To do a proper cleaning takes quite a bit of time. My old Bissell is easier to clean when it comes to pet hair and it does a great job. If you do not have pets, I would recommend this product because it is definitely well made and for small cleanups, it would suffice. For those who complain about appliances being made of plastic, unfortunately, these days, that's the norm. It's not great and plastic definitely does not hold up but, sadly, product quality is no longer a priority in business."
```

This review seems mixed, but is overall somewhat positive.
Now, let’s look at the negative review:
```python
sample_reviews[0]
```

```
'The only redeeming feature of this Cuisinart 4-cup coffee maker is the sleek black and silver design. After that, it rapidly goes downhill. It is frustratingly difficult to pour water from the carafe into the chamber unless it\'s done extremely slow and with accurate positioning. Even then, water still tends to dribble out and create a mess. The lid, itself, is VERY poorly designed with it\'s molded, round "grip" to supposedly remove the lid from the carafe. The only way I can remove it is to insert a sharp pointed object into one of the front pouring holes and pry it off! I\'ve also occasionally had a problem with the water not filtering down through the grounds, creating a coffee ground lake in the upper chamber and a mess below. I think the designer should go back to the drawing-board for this one.'
```

This review is unambiguously negative. From comparing the two, VADER appears more accurate, but it does tend to overly prioritize positive terms in a piece of text.
The final thing we can consider is how subjective versus objective each review is. We’ll do this by creating a histogram of TextBlob’s subjectivity score.
Interestingly, there is a good distribution of subjectivity in the reviews, with most reviews being a mixture of subjective and objective writing. A small number of reviews are also very subjective (close to 1) or very objective (close to 0).
These scores between them give us a nice way of cutting up the data. If you need to know the objective things that people did and did not like about the products, you could look at the reviews with a low subjectivity score and VADER compound scores close to 1 and –1, respectively.
In contrast, if you want to know what people’s emotional reaction to the products are, you could take those with a high subjectivity score and high and low VADER compound scores.
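As a sketch of how those cuts could be expressed in pandas (the 0.3/0.7 subjectivity thresholds and the ±0.8 compound cutoff are illustrative assumptions, as are the stand-in scores):

```python
import pandas as pd

# Stand-in scores for illustration; in practice use the sent_scores
# DataFrame built earlier.
sent_scores = pd.DataFrame({
    "vader_compound":        [0.95, -0.9, 0.1, 0.85],
    "textblob_subjectivity": [0.2,   0.1, 0.9, 0.8],
})

# Objective feedback: low subjectivity, strongly positive compound score
objective_positive = sent_scores[
    (sent_scores["textblob_subjectivity"] < 0.3)
    & (sent_scores["vader_compound"] >= 0.8)
]

# Emotional reactions: high subjectivity, strong score in either direction
emotional = sent_scores[
    (sent_scores["textblob_subjectivity"] > 0.7)
    & (sent_scores["vader_compound"].abs() >= 0.8)
]
```

On real data you would tune these thresholds by inspecting the histograms first.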
Things to consider

As with any problem in natural language processing, there are a number of things to watch out for when doing sentiment analysis.
One of the biggest considerations is the language of the texts you’re trying to analyze. Many of the lexicon-based methods only work for a limited number of languages, so if you’re working with languages not supported by these lexicons, you may need to take another approach, such as using a fine-tuned LLM or training your own model(s).
As texts increase in complexity, it can also be difficult for lexicon-based analyzers and bag-of-words-based models to correctly detect sentiment. Sarcasm or more subtle context indicators can be hard for simpler models to detect, and these models may not be able to accurately classify the sentiment of such texts. LLMs may be able to handle more complex texts, but you would need to experiment with different models.
Finally, the same issues come up in sentiment analysis as in any machine learning problem: your models will only be as good as the training data you use. If you cannot get high-quality training and testing datasets suitable to your problem domain, you will not be able to correctly predict the sentiment of your target audience.
You should also make sure that your targets are appropriate for your business problem. It might seem attractive to build a model to know whether your products make your customers “sad”, “angry”, or “disgusted”, but if this doesn’t help you make a decision about how to improve your products, then it isn’t solving your problem.
Wrapping up

In this blog post, we dove deeply into the fascinating area of Python sentiment analysis and showed how this complex field is made more approachable by a range of powerful packages.
We covered the potential applications of sentiment analysis, different ways of assessing sentiment, and the main methods of extracting sentiment from a piece of text. We also saw some helpful features in PyCharm that make working with models and interpreting their results simpler and faster.
While the field of natural language processing is currently focused intently on large language models, the older techniques of using lexicon-based analyzers or traditional machine learning models, like Naive Bayes classifiers, still have their place in sentiment analysis. These techniques shine when analyzing simpler texts, or when speed and ease of deployment are priorities. LLMs are best suited to more complex or nuanced texts.
Now that you’ve grasped the basics, you can learn how to do sentiment analysis with LLMs in our tutorial. The step-by-step guide helps you discover how to select the right model for your task, use it for sentiment analysis, and even fine-tune it yourself.
If you’d like to continue learning about natural language processing or machine learning more broadly after finishing this blog post, here are some resources:
- Learn how to do sentiment analysis with large language models
- Start studying machine learning with PyCharm
- Explore machine learning methods in software engineering
If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24. You’ll then receive an activation code via email.
Activate your 3-month subscription

LostCarPark Drupal Blog: Drupal Advent Calendar day 12 - Dashboard track
We are halfway through our Advent Calendar, and we open with some exciting news. The first Drupal CMS Release Candidate is now available. We have been busy trying it out, but managed to take some time out to prepare today’s Advent Calendar, with some help from Matthew Tift. Over to you, Matthew.
The first page a user encounters after logging into a Drupal site is pivotal. It sets the tone for their entire experience, often defining how they will interact with the system.
The current Drupal user page

But with the introduction of the Dashboard initiative, that first page is about to change.
This initiative, inspired by a core…
Mariatta: Generating (and Sending) Conference Certificates Using Python
I’m not sure how common the practice of giving out certificates to conference attendees is. I’ve been attending mostly Python-related conferences in North America, and we don’t usually get any certificates here. However, when I went to Python Brasil in Manaus in 2022, they gave me a certificate of attendance. And as a conference organizer, I’d occasionally receive requests from a few attendees and volunteers for such a certificate, saying that their employer or school requires it as proof of attendance.
Talk Python to Me: #488: Multimodal data with LanceDB
Dirk Eddelbuettel: RcppCCTZ 0.2.13 on CRAN: Maintenance
A new release 0.2.13 of RcppCCTZ is now on CRAN.
RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries: one for dealing with civil time (human-readable dates and times) and one for converting between absolute and civil times via time zones. And while CCTZ is made by Google(rs), it is not an official Google product. The RcppCCTZ page has a few usage examples and details. This package was the first CRAN package to use CCTZ; by now several other packages (four the last time we counted) include its sources too. Not ideal, but beyond our control.
This version includes mostly routine package maintenance as well as one small contributed code improvement. The changes since the last CRAN release are summarised below.
Changes in version 0.2.13 (2024-12-11)

- No longer set a compilation standard, as recent R versions set a sufficiently high minimum
- Qualify a call to cctz::format (Michael Quinn in #44)
- Routine updates to continuous integration and badges
- Switch to Authors@R in DESCRIPTION
Courtesy of my CRANberries, there is a diffstat report relative to the previous version. More details are at the RcppCCTZ page; code, issue tickets etc. at the GitHub repository. If you like this or other open-source work I do, you can sponsor me at GitHub.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
Matt Layman: UV and Ruff: Next-gen Python Tooling
KDE ⚙️ Gear 24.12
View and annotate documents
Okular is much more than a PDF reader: it can open all sorts of files, sign and verify the signatures of official documents, and annotate and fill in embedded forms.
Speaking of which, we implemented support for more types of items in comboboxes of PDF forms, and improved the speed and correctness of printing.
We also made it easier to digitally sign a document, and no longer hide the signing window prematurely until the signing process is actually finished.
Kleopatra
Certificate manager and cryptography app
Kleopatra keeps track of your digital signatures, encryption keys, and certificates. It helps you sign, encrypt, and decrypt emails and confidential messages.
We redesigned Kleopatra's notepad and signing/encryption dialog, and made the resulting messages and errors clearer. In the notepad, the text editor and the recipients view are now also shown side by side.
Which brings us to…
Merkuro
Manage your tasks, events and contacts with speed and ease
…Where the OpenPGP and S/MIME certificates of a contact are now displayed directly in Merkuro Contact. Clicking on them will open Kleopatra and show additional information.
Create

Kdenlive
Video editor
Kdenlive, KDE's acclaimed video editor, keeps adding features and now lets you resize multiple items on the timeline at the same time.
Kwave
Sound editor
Kwave, KDE's native audio editor, has long been on the development backburner, but is now receiving updates again.
First it was ported to Qt6, which means it will work natively with Plasma 6. After that, the interface received some visual improvements in the way of new and more modern icons and a better visual indication when playback is paused.
Manage

Dolphin
Manage your files
The latest changes to KDE's file explorer/manager tend heavily towards accessibility* and usability.
For starters, the main view of Dolphin was completely overhauled to make it work with screen readers, and keyboard navigation was improved: pressing Ctrl+L multiple times will switch back and forth between focusing and selecting the location bar path and focusing the view. Pressing Escape in the location bar will now move the focus to the active view. Keyboard navigation in the toolbar has also been improved, as the elements are now focused in the right order.
Dolphin's sorting of files is more natural and "human" in this version: a file called "a.txt", for example, will appear before "a 2.txt", and you can also sort your videos by duration.
When it comes to your safety and checking your files, Dolphin has overhauled the checksum and permissions tab in the Properties dialog to make it easier for you. You will see this improvement in other KDE applications too.
Finally… Dolphin goes mobile! Dolphin now includes a mobile-optimized interface for Plasma Mobile. After the addition of a selection mode and improvements to touchscreen-compatibility, Dolphin works surprisingly well on phones! That said, more work is still needed and planned over time to more closely align the user interface with typical expectations for mobile apps.
* Many of the accessibility improvements made to Dolphin 24.12 were possible thanks to funding provided by the NGI0 Entrust Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program.
KCron
Task Scheduler
A less well-known utility, but also very useful, is KCron. UNIX old-timers will recognize this as a frontend for the venerable cron command of yore. For the rest of you, it lets you schedule any kind of jobs to run at any time on your machine.
Once installed, you will find it in System Settings under Session > Task Scheduler. In the new version, KCron's configuration page was ported to QML and given a fancy new look.
KDE Connect
Seamless connection of your devices
KDE Connect is our popular app for connecting your desktop with your phone and, indeed, all your other devices. It allows you to share files, clipboards, and resources, as well as providing a remote control for media players, input devices, and presentations.
Great news: Bluetooth support for KDE Connect now works! Plus, KDE Connect starts up much faster on macOS, dropping from 3s to 100ms!
In the looks department, the list of devices you can connect to now shows the connected and remembered devices separately, and the list of plugins can be filtered and comes with icons.
KRDC
Connect with RDP or VNC to another computer
If you need to access a remote desktop from your computer, you can start KRDC by opening a .rdp file containing the RDP connection configuration. KRDC now works much better on Wayland too.
Travel

KDE Itinerary
Digital travel assistant
The biggest change to your KDE travel assistant is how it handles concert, train, bus, and flight tickets, as well as hotel reservations. Itinerary now groups entries into individual trips, with each of them having their own timeline.
Itinerary suggests an appropriate existing trip when importing a new ticket, and displays some statistics about your trip, like the CO2 emissions, the distance traveled, and the costs (if available). Whole trips can also be exported directly and displayed on a map.
Itinerary can now handle geo:// URLs by opening the "Plan Trip" page with a pre-selected arrival location. This is supported both on Android and Linux.
Itinerary now supports search for places (e.g. street names) in addition to stops, and can show the date of the connection when searching for a public transport connection.
New services supported by Itinerary include:
- GoOut tickets (an event platform in Poland, Czechia and Slovakia)
- The Luma and Dimedis Fairmate event ticket sale systems
- Colosseum Ticket in Rome
- Droplabs, a Polish online ticket sale system
- The Leo Express train booking platform
- Google Maps links
- European Sleeper seat reservations in French
- Thai state railway tickets
- VietJet Air
- planway.com
- Koleo
- Reisnordland ferries
- Reservix
…And more.
Kongress
Conference companion
Kongress is an app which helps you navigate conferences and events.
The newest version will display more information in the event list. This includes whether the event is in your bookmarked events and the locations within the event (e.g. the rooms).
Marble
Virtual Globe
Marble is a virtual globe and world atlas. It has recently been ported to Qt6 and its very old Kirigami looks were largely rewritten and modernized.
Marble Behaim — a special version of Marble that lets you explore the oldest globe representation of the Earth known to exist — now also works.
Communicate

Tokodon
Browse the Fediverse
Tokodon is your gateway into the Fediverse.
Developers of KDE's desktop and phone app have worked hard to improve your experience when accessing Mastodon for the first time. We have redesigned the welcome page, and, more importantly, Tokodon now fetches a list of public servers to simplify the registration process.
We have also focused on safety, so now you can forcibly remove users from your followers list. A safety page has been added to the Tokodon settings to manage the list of muted and blocked users.
So you can travel further through the Fediverse, Tokodon has improved the support for alternative server implementations, such as GoToSocial, Iceshrimp.NET, and Pixelfed. Tokodon has also added "News" and "Users" tabs to the Explore page.
We also added a new "Following" feed, to quickly page through your follows and their feeds. It's now easier to start private conversations or mention users right from their profile page.
Tokodon now supports quoting posts, and when you are writing a post, your user info is on display, which is useful if you post from multiple accounts. Right clicking on a link on a post will show a context menu allowing users to copy or share the URL directly.
Finally, a proper grid view for the media tab has been added in the profile page.
NeoChat
Chat on Matrix
NeoChat gives you a convenient way to interact with users on the Matrix chat network.
As your trust and safety are important when talking with strangers, you now have the option to block images and videos by default, and we implemented a Matrix Spec that redirects searches for harmful and potentially illegal content to a support message.
Besides that, when replying to users you ignored, your message will not be shown, avoiding accidentally interacting with disagreeable people. We have also improved the Security settings page to be more relevant and useful to normal users.
NeoChat's looks and usability have also improved and include a nicer emoji picker, more room list sorting options, a more complete message context menu, and better-looking polls.
Develop

Kate
Advanced text editor
Instead of big features, devs have concentrated on the small things this time around, aiming to improve the overall experience. For example, Kate now starts up faster and gives visual cues of the Git status ("modified" or "staged") within the Project tree.
The order of the tabs is correctly remembered when restoring a previous session, and the options of the LSP Servers are more easily discoverable as they are no longer only available via a context menu, but also within a menu button at the top.
Kate's inline code formatting tooltips have been improved and can now also be displayed in a special context tool view. Plugins now work on Windows and have been expanded to include out-of-the-box support for Flutter debugging.
The Quick Open tool lets you search and browse the projects open in the current session, and a Reopen latest closed documents option has been added to the tab context menu.
And all this too…

- Francis, the app that helps you plan your work sessions and avoid fatigue, lets you skip the current phase of work or break time in its new version.
- Konqueror, our venerable file explorer/web browser, comes with improved auto-filling of login information.
- The Elisa music player supports loading lyrics from .lrc files sitting alongside the song files.
- Falkon comes with a context menu for Greasemonkey. Greasemonkey lets you run little scripts that make on-the-fly changes to web page content.
- The Alligator RSS feed reader offers bookmarks for your favorite posts.
- Telly Skout, one of the newcomer apps for scheduling your TV viewing, comes with a redesigned display that lists your favorite TV channels and the TV shows that are currently airing.
Full changelog here

Where to get KDE Apps
Although we fully support distributions that ship our software, KDE Gear 24.12 apps will also be available on these Linux app stores shortly:
Flathub Snapcraft

If you'd like to help us get more KDE applications into the app stores, support more app stores and get the apps better integrated into our development process, come say hi in our All About the Apps chat room.
Resolve to have a freer 2025
Python Engineering at Microsoft: Python in Visual Studio Code – December 2024 Release
We’re excited to announce the December 2024 release of the Python, Pylance and Jupyter extensions for Visual Studio Code!
This release includes the following announcements:
- Docstring generation features using Pylance and Copilot
- Python Environments extension in preview
- Pylance “full” language server mode
If you’re interested, you can check the full list of improvements in our changelogs for the Python, Jupyter and Pylance extensions.
Docstring generation using Pylance and Copilot

A docstring is a string literal that appears right after the definition of a function, method, class, or module used to document the purpose and usage of the code it describes. Docstrings are essential for understanding and maintaining code, as they provide a clear explanation of what the code does, including parameters and return values. Writing docstrings manually can be time-consuming and prone to inconsistencies; however, automating this process can ensure your code is well-documented, making it easier for others, and yourself, to understand and maintain. Automated docstring generation can also help enforce documentation standards across your codebase.
How to enable docstring generation

To start, open the Command Palette (Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (macOS)) and select Preferences: Open Settings (JSON).
Add the following Pylance setting to enable support for generating docstring templates automatically within VS Code:
```json
"python.analysis.supportDocstringTemplate": true
```

Add the following settings to enable generation with AI code actions:
```json
"python.analysis.aiCodeActions": {
    "generateDocstring": true
}
```

Triggering docstring templates

- Define your function or method:

```python
def my_function(param1: int, param2: str) -> bool:
    pass
```

- Add an empty docstring: directly below the function definition, add triple quotes for a docstring.

```python
def my_function(param1: int, param2: str) -> bool:
    """"""
    pass
```

- Place the cursor inside the docstring: place your cursor between the triple quotes.

```python
def my_function(param1: int, param2: str) -> bool:
    """|  # Place cursor here
    """
    pass
```
When using Pylance, there are different ways you can request that docstrings templates are added to your code.
Using IntelliSense Completion
- Press Ctrl+Space (Windows/Linux) or Cmd+Space (macOS) to trigger IntelliSense completion suggestions.
- Open the Context Menu:
- Right-click inside the docstring or press Ctrl+. (Windows/Linux) or Cmd+. (macOS).
- Select Generate Docstring:
- From the context menu, select Generate Docstring.
- Pylance will suggest a docstring template based on the function signature.
- Select Generate Docstring With Copilot:
- From the context menu, select Generate Docstring With Copilot.
- Accept Suggestions:
- GitHub Copilot chat will appear. Press Accept to take the suggestions or continue to iterate with Copilot.
We’re excited to introduce the new Python Environments extension, now available in preview on the Marketplace.
This extension simplifies Python environment management with an Environments view accessible via the VS Code Activity Bar. Here you can create, delete, and switch between environments, and manage packages within the selected environment. It also uniquely supports specifying environments for specific files or entire Python projects, including multi-root and mono-repo scenarios.
By default, the extension uses the venv environment manager and pip package manager to determine how environments and packages are handled. You can customize these defaults by setting python-envs.defaultEnvManager and python-envs.defaultPackageManager to your preferred environment and package managers. Furthermore, if you have uv installed the extension will use it for quick and efficient environment creation and package installation.
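For example, these defaults could be overridden in settings.json like so. The two setting names come from the announcement above; the identifier values shown are hypothetical placeholders, not documented defaults:

```json
{
    // Hypothetical values — substitute the identifiers of your
    // preferred environment and package managers.
    "python-envs.defaultEnvManager": "ms-python.python:venv",
    "python-envs.defaultPackageManager": "ms-python.python:pip"
}
```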
Designed to integrate seamlessly with your preferred environment managers via various APIs, it supports Global Python interpreters, venv, and Conda out of the box. Developers can build extensions to add support for their favorite Python environment managers and integrate with our extension UI, enhancing functionality and user experience.
This extension is poised to eventually replace the environment functionality in the main Python extension and will be installed alongside it by default. In the meantime, you can download the Python Environments extensions from the Marketplace and use it in VS Code – Insiders (v1.96 or greater) and with the pre-release version of the Python extension (v2024.23 or greater). We are looking forward to hearing your feedback on improvements by opening issues in the vscode-python-environments repository.
Pylance “full” language server modeThe python.analysis.languageServerMode setting now supports full mode, allowing you to take advantage of the complete range of Pylance’s functionality and the most comprehensive IntelliSense experience. It’s worth noting that this comes at the cost of lower performance, as it can cause Pylance to be resource-intensive, particularly in large codebases.
The python.analysis.languageServerMode setting now changes the default values of the following settings, depending on whether it’s set to light, default or full:
| Setting | light | default | full |
|---|---|---|---|
| python.analysis.exclude | ["**"] | [] | [] |
| python.analysis.useLibraryCodeForTypes | false | true | true |
| python.analysis.enablePytestSupport | false | true | true |
| python.analysis.indexing | false | true | true |
| python.analysis.autoImportCompletions | false | false | true |
| python.analysis.showOnlyDirectDependenciesInAutoImport | false | false | true |
| python.analysis.packageIndexDepths | [{"name": "sklearn", "depth": 2}, {"name": "matplotlib", "depth": 2}, {"name": "scipy", "depth": 2}, {"name": "django", "depth": 2}, {"name": "flask", "depth": 2}, {"name": "fastapi", "depth": 2}] | [{"name": "sklearn", "depth": 2}, {"name": "matplotlib", "depth": 2}, {"name": "scipy", "depth": 2}, {"name": "django", "depth": 2}, {"name": "flask", "depth": 2}, {"name": "fastapi", "depth": 2}] | {"name": "", "depth": 4, "includeAllSymbols": true} |
| python.analysis.regenerateStdLibIndices | false | false | true |
| python.analysis.userFileIndexingLimit | 2000 | 2000 | -1 |
| python.analysis.includeAliasesFromUserFiles | false | false | true |
| python.analysis.functionReturnTypes | false | false | true |
| python.analysis.pytestParameters | false | false | true |
| python.analysis.supportRestructuredText | false | false | true |
| python.analysis.supportDocstringTemplate | false | false | true |

Other Changes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python and Jupyter Notebooks in Visual Studio Code. Some notable changes include:
- The testing rewrite nearing default status: This release addresses the final known issue in the testing rewrite, and we plan to turn off the rewrite experiment and set it to the default in early 2025
- Python Native REPL handles window reload in @vscode-python#24021
- Leave focus on editor after Smart Send to Native REPL in @vscode-python#23843
- Add error communication around dynamic adapter activation in @vscode-python#23234
- The --rootdir argument for pytest is now dynamically adjusted based on the presence of a python.testing.cwd setting in your workspace in @vscode-python#9553
- Add support for interpreter paths with spaces in the debugger extension in @vscode-python-debugger#233
- pytest-describe plugin is supported with test detection and execution in the UI in @vscode-python#21705
- Test coverage support updated to handle NoSource exceptions in @vscode-python#24366
- Restarting a test debugging session now reruns only the specified tests in @vscode-python-debugger#338
- The testing rewrite now leverages named pipes (FIFOs) instead of Unix domain sockets (UDS) for inter-process communication, allowing users to harness pytest plugins like pytest_socket in their own testing design in @vscode-python#23279
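Returning to the language server modes listed earlier: those per-mode defaults can still be overridden individually per workspace. As an illustration (a hypothetical settings.json; the specific values here are examples, not recommendations), a project might pick the lightweight mode but re-enable one feature:

```json
{
    // Pick a language server mode; "light" keeps resource usage minimal
    "python.analysis.languageServerMode": "light",

    // Any setting from the defaults list can still be overridden explicitly
    "python.analysis.autoImportCompletions": true,
    "python.analysis.userFileIndexingLimit": 5000
}
```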
We would also like to extend special thanks to this month’s contributors:
- @joar Ruff 0.8.0 fixes in @vscode-python#24488
- @renan-r-santos Add native pixi locator in @vscode-python#244420
- @tomoki Fix the wrong Content-Length in python-server.py for non-ASCII characters in @vscode-python#24480
Try out these new improvements by downloading the Python extension and the Jupyter extension from the Marketplace, or install them directly from the extensions view in Visual Studio Code (Ctrl + Shift + X or ⌘ + ⇧ + X). You can learn more about Python support in Visual Studio Code in the documentation. If you run into any problems or have suggestions, please file an issue on the Python VS Code GitHub page.
The post Python in Visual Studio Code – December 2024 Release appeared first on Python.
Freelock Blog: Cache-bust pages containing embedded content
The saying goes, there are two hard problems in computer science: caching, naming things, and off-by-one errors. While Drupal certainly hasn't solved the naming-things problem, it has made a valiant attempt at a decent caching strategy. And for the most part it works great, allowing sites built on millions of lines of code to load quickly the vast majority of the time.
This is more a tip about our favorite automation tool, the Events, Conditions, and Actions (ECA) module, and how it can get you out of a bind when Drupal caching goes too far.
The Drop Times: Jay Callicot on DrupalX, Decoupled Architectures, and the Future of Drupal Development
Divine Attah-Ohiemi: From Sisterly Wisdom to Debian Dreams: My Outreachy Journey
Discovering Open Source: How I Got Introduced
Hey there! I’m Divine Attah-Ohiemi, a sophomore studying Computer Science. My journey into the world of open source was anything but grand. It all started with a simple question to my sister: “How do people get jobs without experience?” Her answer? Open source! I dove into this vibrant community, and it felt like discovering a hidden treasure chest filled with knowledge and opportunities.
Choosing Debian: Why This Community?
Why Debian, you ask? Well, I applied to Outreachy twice, and both times, I chose Debian. It’s not just my first operating system; it feels like home. The Debian community is incredibly welcoming, like a big family gathering where everyone supports each other. Whether I was updating my distro or poring over documentation, the care and consideration in this community were palpable. It reminded me of the warmth of homeschooling with relatives. Plus, knowing that Debian's name comes from its creator Ian and his wife Debra adds a personal touch that makes me feel even more honored to contribute to making the website better!
Why I Applied to Outreachy: What Inspired Me
Outreachy is my golden ticket to the open source world! As a 19-year-old, I see this internship as a unique opportunity to gain invaluable experience while contributing to something meaningful. It’s the perfect platform for me to learn, grow, and connect with like-minded individuals who share my passion for technology and community.
I’m excited for this journey and can’t wait to see where it takes me! 🌟
Consensus Enterprises: make targets, Droplets, and Aegir, oh my!
Real Python: Python Set Comprehensions: How and When to Use Them
Python set comprehensions provide a concise way to create and manipulate sets in your code. They generate sets with a clean syntax, making your code more readable and Pythonic. With set comprehensions, you can create, transform, and filter sets, which are great skills to add to your Python programming toolkit.
In this tutorial, you’ll learn the syntax and use cases of set comprehensions, ensuring you can decide when and how to use them in your code. Understanding set comprehensions will help you write cleaner, more efficient Python code.
By the end of this tutorial, you’ll understand that:
- Python has set comprehensions, which allow you to create sets with a concise syntax.
- Python has four types of comprehensions: list, set, dictionary, and generator expressions.
- A set comprehension can be written as {expression for item in iterable [if condition]}.
- Sets can’t contain duplicates, as they ensure that all their elements are unique.
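To make that syntax concrete, here's a minimal sketch (the variable names are illustrative, not from the tutorial): the expression squares each item, the condition keeps only even numbers, and duplicates collapse automatically because the result is a set:

```python
numbers = [1, 2, 2, 3, 4, 4, 5]

# {expression for item in iterable [if condition]}
squares_of_evens = {n ** 2 for n in numbers if n % 2 == 0}

print(squares_of_evens)  # {4, 16} (element order may vary)
```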
To get the most out of this tutorial, you should be familiar with basic Python concepts such as for loops, iterables, list comprehensions, and dictionary comprehensions.
Get Your Code: Click here to download the free sample code that you’ll use to learn about Python set comprehensions.
Take the Quiz: Test your knowledge with our interactive “Python Set Comprehensions: How and When to Use Them” quiz. You’ll receive a score upon completion to help you track your learning progress:
In this quiz, you'll test your understanding of Python set comprehensions. Set comprehensions are a concise and quick way to create, transform, and filter sets in Python. They can significantly enhance your code's conciseness and readability compared to using regular for loops to process your sets.
Creating and Transforming Sets in Python

In Python programming, you may need to create, populate, and transform sets. To do this, you can use set literals, the set() constructor, and for loops. In the following sections, you’ll take a quick look at how to use these tools. You’ll also learn about set comprehensions, which are a powerful way to manipulate sets in Python.
Creating Sets With Literals and set()

To create new sets, you can use literals. A set literal is a series of elements enclosed in curly braces. The syntax of a set literal is shown below:
    {element_1, element_2, ..., element_N}

The elements must be hashable objects. The objects in the literal might be duplicated, but only one instance will be stored in the resulting set. Sets don’t allow duplicate elements. Here’s a quick example of a set:
    >>> colors = {"blue", "red", "green", "orange", "green"}
    >>> colors
    {'red', 'green', 'orange', 'blue'}
    >>> colors.add("purple")
    >>> colors
    {'red', 'green', 'orange', 'purple', 'blue'}

In this example, you create a set containing color names. The elements in your resulting set are unique string objects. You can add new elements using the .add() method. Remember that sets are unordered collections, so the order of elements in the resulting set won’t match the insertion order in most cases.
Note: To learn more about sets, check out the Sets in Python tutorial.
You can also create a new set using the set() constructor and an iterable of objects:
    >>> numbers = [2, 2, 1, 4, 2, 3]
    >>> set(numbers)
    {1, 2, 3, 4}

In this example, you create a new set using set() with a list of numeric values. Note how the resulting set doesn’t contain duplicate elements. In practice, the set() constructor is a great tool for eliminating duplicate values in iterables.
To create an empty set, you use the set() constructor without arguments:
    >>> set()
    set()

You can’t create an empty set with a literal because a pair of curly braces {} represents an empty dictionary, not a set. To create an empty set, you must use the set() constructor.
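A quick check of this distinction (a minimal sketch):

```python
# Empty braces create a dictionary, not a set
print(type({}))      # <class 'dict'>

# The set() constructor is the only way to get an empty set
print(type(set()))   # <class 'set'>
```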
Using for Loops to Populate Sets

Sometimes, you need to start with an empty set and populate it with elements dynamically. To do this, you can use a for loop. For example, say that you want to create a set of unique words from a text. Here’s how to do this with a loop:
    >>> unique_words = set()
    >>> text = """
    ... Beautiful is better than ugly
    ... Explicit is better than implicit
    ... Simple is better than complex
    ... Complex is better than complicated
    ... """.lower()
    >>> for word in text.split():
    ...     unique_words.add(word)
    ...
    >>> unique_words
    {'beautiful', 'ugly', 'better', 'implicit', 'complicated', 'than', 'explicit', 'is', 'complex', 'simple'}

Read the full article at https://realpython.com/python-set-comprehension/ »
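Since the tutorial's subject is set comprehensions, it's worth noting that the loop above collapses into a single expression; a minimal sketch:

```python
text = """
Beautiful is better than ugly
Explicit is better than implicit
Simple is better than complex
Complex is better than complicated
""".lower()

# One comprehension replaces the empty set plus the for loop
unique_words = {word for word in text.split()}

print(len(unique_words))  # 10
```

When no transformation or filter is needed, set(text.split()) does the same job even more directly; comprehensions pay off once you add an expression or an if clause.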
ClearlyDefined: 2024 in review – milestones, growth and community impact
As 2024 draws to a close, it’s time to reflect on a transformative year for the ClearlyDefined project. From technical advancements to community growth, this year has been nothing short of extraordinary. Here’s a recap of our key milestones and how we’ve continued to bring clarity to the Open Source ecosystem.
ClearlyDefined 2.0: expanding license coverage

This year, we launched ClearlyDefined v2.0, a major milestone in improving license data quality. By integrating support for LicenseRefs, we expanded beyond the SPDX License List, enabling organizations to navigate complex licensing scenarios with ease. Thanks to contributions from the community and leadership from GitHub and SAP, this release brought over 2,000 new licenses into scope. Dive into the details here.
New harvester for Conda packages

In response to the growing needs of the data science and machine learning communities, we introduced a new harvester for Conda packages. This implementation ensures comprehensive metadata coverage for one of the most popular package managers. Kudos to Basit Ayantunde and our collaborators for making this a reality. Learn more about this update here.
Integration with GUAC for supply chain transparency

Our partnership with GUAC (Graph for Understanding Artifact Composition) from OpenSSF took supply chain observability to new heights. By integrating ClearlyDefined’s license metadata, GUAC users now have access to enriched data for compliance and security. This collaboration underscores the importance of a unified Open Source supply chain. Read about the integration here.
Community growth and governance

In 2024, we took significant steps toward a more open governance model by electing leaders to the Steering and Outreach Committees. These committees are pivotal in driving technical direction and expanding community engagement. Meet our new leaders here.
Showcasing ClearlyDefined globally

We showcased ClearlyDefined’s mission and impact across three continents:
- At FOSS Backstage and ORT Community Days in Berlin, we connected with industry leaders to discuss best practices for software compliance.
- At SOSS Fusion 2024 in Atlanta, we presented our collaborative approach to license compliance alongside GitHub and SAP.
- At Open Compliance Summit in Tokyo, we showcased how Bloomberg leverages ClearlyDefined to detect and manage Open Source licenses.
Each event reinforced the global importance of a transparent Open Source ecosystem. Explore our conference highlights here and here.
A Revamped Online Presence

To welcome new contributors and support existing ones, we unveiled a new website featuring comprehensive documentation and resources. Whether you’re exploring our guides, engaging in forums, or diving into the project roadmap, the platform is designed to foster collaboration. Take a tour here.
Looking ahead to 2025

As we celebrate these achievements, we’re already planning for an even more impactful 2025. From enhancing our tools to expanding our community, the future of ClearlyDefined looks brighter than ever.
Thank you to everyone who contributed to our success this year. A special thank you to Microsoft for hosting and curating ClearlyDefined, GitHub and SAP for their technical leadership, and Bloomberg and Cisco for their sponsorship. Your dedication ensures that Open Source continues to thrive with clarity and confidence.
Tag1 Consulting: Migrating Your Data from D7 to D10: User and taxonomy term migrations
In this follow-up to migrating files, we focus on migrating users and taxonomy terms. Key topics include preventing entity ID conflicts, handling watermarks, and decoupling content migrations from configuration migrations. We’ll also create migration scripts for both entities and explore stylistic tips for cleaner, more compact migration files.
mauricio Wed, 12/11/2024 - 05:20

Droptica: How to Effectively Manage Product Data on a Drupal Website for Manufacturers?
A manufacturing company's website is often the place where a lot of detailed product information lives. Efficiently managing this data can be a challenge, especially with a large product assortment and extensive technical information. In this article, I'll show you how Drupal - an advanced CMS - enables you to conveniently manage and present your products on your website. I encourage you to read the article or watch the video in the “Nowoczesny Drupal” series.