Feeds

PyCharm: How to Do Sentiment Analysis With Large Language Models

Planet Python - Thu, 2024-12-05 05:49

Sentiment analysis is a powerful tool for understanding emotions in text. While there are many ways to approach sentiment analysis, including more traditional lexicon-based and machine learning approaches, today we’ll be focusing on one of the most cutting-edge ways of working with text – large language models (LLMs). We’ll explain how you can use these powerful models to predict the sentiment expressed in a text.

As a practical tutorial, this post will introduce you to the types of LLMs most suited for sentiment analysis tasks and then show you how to choose the right model for your specific task.

We’ll cover using models that other people have fine-tuned for sentiment analysis and how to fine-tune one yourself. We’ll also look at some of the powerful tools and resources available that can help you work with these models easily, while demystifying what can feel like an overly complex and overwhelming topic.

To get the most out of this blog post, we’d recommend you have some experience training machine learning or deep learning models and be confident using Python. That said, you don’t necessarily need to have a background in large language models to enjoy it.

Let’s get started!

What are large language models?

Large language models are some of the latest and most powerful tools for solving natural language problems. In brief, they are generalist language models that can complete a range of natural language tasks, from named entity recognition to question answering. LLMs are based on the transformer architecture, a type of neural network that uses a mechanism called attention to represent complex and nuanced relationships between words in a piece of text. This design allows LLMs to accurately represent the information being conveyed in a piece of text.
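To make the idea of attention a little more concrete, here is a minimal pure-Python sketch of scaled dot-product attention. It is a simplification under stated assumptions: real transformer layers use learned projection matrices and multiple attention heads, which are left out here, and the token vectors are made-up numbers.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this token's query to every token's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical 2-dimensional token vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of weights says how much one token "attends" to every other token, which is how the model mixes context into each word's representation.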

The full transformer model architecture consists of two blocks. Encoder blocks are designed to receive text inputs and build a representation of them, creating a feature set based on the text corpus over which the model is trained. Decoder blocks take the features generated by the encoder and other inputs and attempt to generate a sequence based on these.

Transformer models can be divided up based on whether they contain encoder blocks, decoder blocks, or both.

  • Encoder-only models tend to be good at tasks requiring a detailed understanding of the input to do downstream tasks, like text classification and named entity recognition.
  • Decoder-only models are best for tasks such as text generation.
  • Encoder-decoder, or sequence-to-sequence models are mainly used for tasks that require the model to evaluate an input and generate a different output, such as translation. In fact, translation was the original task that transformer models were designed for!

This Hugging Face table (also featured below), which I took from their course on natural language processing, gives an overview of what each model tends to be strongest at.

Once you’ve finished this blog post and discovered what other natural language tasks you can perform with the Transformers library, I recommend taking the course if you’d like to learn more about LLMs. It strikes an excellent balance between accessibility and technical depth.

Model type | Examples | Tasks
Encoder-only | ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa | Sentence classification, named entity recognition, extractive question answering
Decoder-only | CTRL, GPT, GPT-2, Transformer XL | Text generation
Encoder-decoder | BART, T5, Marian, mBART | Summarization, translation, generative question answering

Sentiment analysis is usually treated as a text or sentence classification problem with LLMs, meaning that encoder-only models such as RoBERTa, BERT, and ELECTRA are most often used for this task. However, there are some exceptions. For example, the top scoring model for aspect-based sentiment analysis, InstructABSA, is based on a fine-tuned version of T5, an encoder-decoder model.

Using large language models for sentiment analysis

With all of the background out of the way, we can now get started with using LLMs to do sentiment analysis.

Install PyCharm to get started with sentiment analysis

We’ll use PyCharm Professional for this demo, but you can follow along with any other IDE that supports Python development.

PyCharm Professional is a powerful Python IDE for data science. It supports advanced Python code completion, inspections and debugging, rich databases, Jupyter, Git, Conda, and more right out of the box. You can try out great features such as our DataFrame Column Statistics and Chart View, as well as Hugging Face integrations, which make working with LLMs much simpler and faster.

If you’d like to follow along with this tutorial, you can activate your free three-month subscription to PyCharm using this special promo code: PCSA. Click on the link below, and enter the code. You’ll then receive an activation code through your email.

Activate your free three-month subscription

Import the required libraries

There are two parts to this tutorial: using an LLM that someone else has fine-tuned for sentiment analysis, and fine-tuning a model ourselves.

In order to run both parts of this tutorial, we need to import the following packages:

  • Transformers: As described, this will allow us to use fine-tuned LLMs for sentiment analysis and fine-tune our own models.
  • PyTorch, TensorFlow, or Flax: Transformers acts as a high-level interface for deep learning frameworks, reusing their functionality for building, training, and running neural networks. In order to actually work with LLMs using the Transformers package, you will need to install your choice of PyTorch, TensorFlow, or Flax. PyTorch supports the largest number of models of the three frameworks, so that’s the one we’ll use in this tutorial.
  • Datasets: This is another package from Hugging Face that allows you to easily work with the datasets hosted on Hugging Face Hub. We’ll need this package to get a dataset to fine-tune an LLM for sentiment analysis.

In order to fine-tune our own model, we also need to import these additional packages:

  • NumPy: NumPy allows us to work with arrays. We’ll need this to do some post-processing on the predictions generated by our LLM.
  • scikit-learn: This package contains a huge range of functionality for machine learning. We’ll use it to evaluate the performance of our model.
  • Evaluate: This is another package from Hugging Face. Evaluate adds a convenient interface for measuring the performance of models. It will give us an alternative way of measuring our model’s performance.
  • Accelerate: This final package from Hugging Face, Accelerate, takes care of distributed model training.

We can easily find and install these in PyCharm. Make sure you’re using a Python 3.7 or higher interpreter. For this demo, we’ll be using Python 3.11.7.
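If you’d like to confirm that everything is in place before continuing, a small check like the one below can help. It is only a sketch: the helper name missing_packages is made up for this example, and the list uses each package’s import name (scikit-learn is imported as sklearn).

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["transformers", "torch", "datasets", "numpy", "sklearn", "evaluate", "accelerate"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("Python version OK:", sys.version_info >= (3, 7))
print("Missing packages:", missing_packages(REQUIRED))
```

An empty list means all of the packages can be imported by the interpreter you have selected.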

Pick the right model

The next step is picking the right model. Before we get into that, we need to cover some terminology.

LLMs are made up of two components: an architecture and a checkpoint. The architecture is like the blueprint of the model, and describes what will be contained in each layer and each operation that takes place within the model.

The checkpoint refers to the weights that will be used within each layer. Each of the pretrained models will use an architecture like T5 or GPT, and obtain the specific weights (the model checkpoint) by training the model over a huge corpus of text data.

Fine-tuning will adjust the weights in the checkpoint by retraining the last layer(s) on a dataset specialized in a certain task or domain. To make predictions (called inference), an architecture will load in the checkpoint and use this to process text inputs, and together this is called a model.

If you’ve ever looked at the models available on Hugging Face, you might have been overwhelmed by the sheer number of them (even when we narrow them down to encoder-only models).

So, how do you know which one to use for sentiment analysis?

One useful place to start is the sentiment analysis page on Papers With Code. This page includes a very helpful overview of this task and a Benchmarks table that includes the top-performing models for each sentiment analysis benchmarking dataset. From this page, we can see that some of the commonly appearing models are those based on BERT and RoBERTa architectures.

While we may not be able to access these exact model checkpoints on Hugging Face (as not all of them will be uploaded there), it can give us a guide for what sorts of models might perform well at this task. Papers With Code also has similar pages for a range of other natural language tasks: If you search for the task in the upper left-hand corner of the site, you can navigate to these.

Now that we know what kinds of architectures are likely to do well for this problem, we can start searching for a specific model.

PyCharm has a built-in integration with Hugging Face that allows us to search for models directly. Simply right-click anywhere in your Jupyter notebook or Python script, and select Insert HF model. You’ll be presented with the following window:

You can see that we can find Hugging Face models either by the task type (which we can select from the menu on the left-hand side), by keyword search in the search box at the top of the window, or by a combination of both. Models are ranked by the number of likes by default, but we can also select models based on downloads or when the model was created or last modified.

When you use a model for a task, the checkpoint is downloaded and cached, making it faster the next time you need to use that model. You can see all of the models you’ve downloaded in the Hugging Face tool window.

Once we’ve downloaded the model, we can also look at its model card again by hovering over the model name in our Jupyter notebook or Python script. We can do the same thing with dataset cards.

Use a fine-tuned LLM for sentiment analysis

Let’s move on to how we can use a model that someone else has already fine-tuned for sentiment analysis.

As mentioned, sentiment analysis is usually treated as a text classification problem for LLMs. This means that in our Hugging Face model selection window, we’ll select Text Classification, which can be found under Natural Language Processing on the left-hand side. To narrow the results down to sentiment analysis models, we’ll type “sentiment” in the search box in the upper left-hand corner.

We can see various fine-tuned models, and as expected from what we saw on the Papers With Code Benchmarks table, most of them use RoBERTa or BERT architectures. Let’s try out the top ranked model, Twitter-roBERTa-base for Sentiment Analysis.

You can see that after we select Use Model in the Hugging Face model selection window, code is automatically generated at the caret in our Jupyter notebook or Python script to allow us to start working with this model.

    from transformers import pipeline

    pipe = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment-latest")

Before we can do inference with this model, we’ll need to modify this code.

The first thing we can check is whether we have a GPU available, which will make the model run faster. We’ll check for two types: NVIDIA GPUs, which support CUDA, and Apple GPUs, which support MPS.

    import torch

    print(f"CUDA available: {torch.cuda.is_available()}")
    print(f"MPS available: {torch.backends.mps.is_available()}")

My computer supports MPS, so we can add a device argument to the pipeline and set it to "mps". If your computer supports CUDA, you can instead use device=0.

    from transformers import pipeline

    pipe = pipeline(
        "text-classification",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
        device="mps",
    )
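If you want the same script to run on different machines, one option is to choose the device from the two availability checks rather than hard-coding it. The helper below is a hypothetical sketch (pick_device is not part of Transformers or PyTorch); it takes plain booleans so the logic is easy to follow.

```python
def pick_device(cuda_available, mps_available):
    """Choose a value for the pipeline's device argument."""
    if cuda_available:
        return 0       # first CUDA GPU, as in device=0
    if mps_available:
        return "mps"   # Apple GPU
    return "cpu"       # fall back to the CPU


# With PyTorch installed, the two flags come from the checks shown earlier:
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
#   pipe = pipeline("text-classification",
#                   model="cardiffnlp/twitter-roberta-base-sentiment-latest",
#                   device=device)
```

Preferring CUDA over MPS here is just a design choice for the sketch; a machine will normally only have one of the two.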

Finally, we can get the fine-tuned LLM to run inference over our example text.

    result = pipe("I love PyCharm! It's my favorite Python IDE.")
    result

    [{'label': 'positive', 'score': 0.9914802312850952}]

You can see that this model predicts that the text will be positive, with 99% probability.

Fine-tune your own LLM for sentiment analysis

The other way we can use LLMs for sentiment analysis is to fine-tune our own model.

You might wonder why you’d bother doing this, given the huge number of fine-tuned models that already exist on Hugging Face Hub. The main reason you might want to fine-tune a model is so that you can tailor it to your specific use case.

Most models are fine-tuned on public datasets, especially social media posts and movie reviews, and you might need your model to be more sensitive to your specific domain or use case.

Model fine-tuning can be quite a complex topic, so in this demonstration, I’ll explain how to do it at a more general level. However, if you want to understand this in more detail, you can read more about it in Hugging Face’s excellent NLP course, which I recommended earlier. In their tutorial, they explain in detail how to process data for fine-tuning models and two different approaches to fine-tuning: with the trainer API and without it.

To demonstrate how to fine-tune a model, we’ll use the SST-2 dataset, which is composed of single lines pulled from movie reviews that have been annotated as either negative or positive.

As mentioned earlier, BERT models consistently show up as top performers on the Papers With Code benchmarks, so we’ll fine-tune a BERT checkpoint.

We can again search for these models in PyCharm’s Hugging Face model selection window.

We can see that the most popular BERT model is bert-base-uncased. This is perfect for our use case, as this was also trained on lowercase text, so it will match the casing of our dataset.

We could have used the popular bert-large-uncased, but the base model has only 110 million parameters compared to BERT large, which has 340 million, so the base model is a bit friendlier for fine-tuning on a local machine.

If you want an even smaller model, you could also try this with a DistilBERT model, which has far fewer parameters but still preserves most of the performance of the original BERT models.

Let’s start by reading in our dataset. We can do so using the load_dataset() function from the Datasets package. SST-2 is part of the GLUE dataset, which is designed to see how well a model can complete a range of natural language tasks.

    from datasets import load_dataset

    sst_2_raw = load_dataset("glue", "sst2")
    sst_2_raw

    DatasetDict({
        train: Dataset({
            features: ['sentence', 'label', 'idx'],
            num_rows: 67349
        })
        validation: Dataset({
            features: ['sentence', 'label', 'idx'],
            num_rows: 872
        })
        test: Dataset({
            features: ['sentence', 'label', 'idx'],
            num_rows: 1821
        })
    })

This dataset has already been split into the train, validation, and test sets. We have 67,349 training examples – quite a modest number for fine-tuning such a large model.

Here’s an example from this dataset.

    sst_2_raw["train"][1]

    {'sentence': 'contains no wit , only labored gags ', 'label': 0, 'idx': 1}

We can see what the labels mean by calling the features attribute on the training set.

    sst_2_raw["train"].features

    {'sentence': Value(dtype='string', id=None),
     'label': ClassLabel(names=['negative', 'positive'], id=None),
     'idx': Value(dtype='int32', id=None)}

0 indicates a negative sentiment, and 1 indicates a positive one.

Let’s look at the number in each class:

    print(f'Number of negative examples: {sst_2_raw["train"]["label"].count(0)}')
    print(f'Number of positive examples: {sst_2_raw["train"]["label"].count(1)}')

    Number of negative examples: 29780
    Number of positive examples: 37569

The classes in our training data are a tad unbalanced, but they aren’t excessively skewed.
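To put a number on “a tad unbalanced”, we can turn the counts into proportions. The sketch below rebuilds a label list with the same counts as the output above rather than loading the dataset, and class_shares is a helper name made up for this example.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of examples that fall into each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Rebuild a label list with the same counts as the SST-2 training set above.
labels = [0] * 29780 + [1] * 37569
shares = class_shares(labels)
print({label: round(share, 3) for label, share in shares.items()})
```

This works out to roughly 44% negative and 56% positive examples.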

We now need to tokenize our data, transforming the raw text into a form that our model can use. To do this, we need to use the same tokenizer that was used to train the bert-base-uncased model in the first place. The AutoTokenizer class will take care of all of the under-the-hood details for us.

    from transformers import AutoTokenizer

    checkpoint = "google-bert/bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Once we’ve loaded in the correct tokenizer, we can apply this to the training data.

tokenised_sentences = tokenizer(sst_2_raw["train"]["sentence"])

Finally, we need to add a function to pad our tokenized sentences. This will make sure all of the inputs in a training batch are the same length – text inputs are rarely the same length and models require a consistent number of features for each input.

    from transformers import DataCollatorWithPadding

    def tokenize_function(example):
        return tokenizer(example["sentence"])

    tokenized_datasets = sst_2_raw.map(tokenize_function, batched=True)
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
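To see what padding a batch means in practice, here is a simplified pure-Python stand-in. It is not the library’s implementation: pad_batch is a made-up helper, the token IDs are hypothetical, and pad_id=0 is an assumption (the real collator takes the padding token from the tokenizer).

```python
def pad_batch(batch, pad_id=0):
    """Pad token-ID lists to the length of the longest one in the batch."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token IDs for two sentences of different lengths.
batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
```

Padding each batch only to its own longest sentence, rather than to one global maximum, keeps the inputs as short as possible.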

Now that we’ve prepared our dataset, we need to determine how well the model is fitting to the data as it trains. To do this, we need to decide which metrics to use to evaluate the model’s prediction performance.

As we’re dealing with a binary classification problem, we have a few choices of metrics, the most popular of which are accuracy, precision, recall, and the F1 score. In the “Evaluate the model” section, we’ll discuss the pros and cons of using each of these measures.

We have two ways of creating an evaluation function for our model. The first is using the Evaluate package. This package allows us to use the specific evaluator for the SST-2 dataset, meaning we’ll evaluate the model fine-tuning using the specific metrics for this task. In the case of SST-2, the metric used is accuracy.

    import evaluate
    import numpy as np

    def compute_metrics(eval_preds):
        metric = evaluate.load("glue", "sst2")
        logits, labels = eval_preds
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

However, if we want to customize the metrics used, we can also create our own evaluation function. 

In this case, I’ve imported the accuracy, precision, recall, and F1 score metrics from scikit-learn. I’ve then created a function which takes in the predicted labels versus actual labels for each sentence and calculates the four required metrics. We’ll use this function, as it gives us a wider variety of metrics we can check our model performance against.

    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
    import numpy as np

    def compute_metrics(eval_preds):
        logits, labels = eval_preds
        predictions = np.argmax(logits, axis=-1)
        return {
            'accuracy': accuracy_score(labels, predictions),
            'f1': f1_score(labels, predictions, average='macro'),
            'precision': precision_score(labels, predictions, average='macro'),
            'recall': recall_score(labels, predictions, average='macro')
        }
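Both versions of compute_metrics start by turning the model’s raw outputs (logits) into predicted labels with np.argmax. As a rough illustration of that step, here is a pure-Python equivalent. The logits are made up and argmax_rows is a hypothetical helper.

```python
def argmax_rows(logits):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Hypothetical logits for three sentences; columns are (negative, positive).
logits = [[2.1, -1.3], [-0.4, 0.9], [0.2, 0.1]]
predictions = argmax_rows(logits)
```

Each sentence gets the label whose column holds the larger value, so these three sentences would be predicted as negative, positive, and negative.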

Now that we’ve done all of the setup, we’re ready to train the model. The first thing we need to do is define some parameters that will control the training process using the TrainingArguments class. We’ve only specified a few parameters here, but this class has an enormous number of possible arguments allowing you to calibrate your model training to a high degree of specificity.

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="sst2-bert-fine-tuning",
        eval_strategy="epoch",
        num_train_epochs=3,
    )

In our case, we’ve used the following arguments:

  • output_dir: The output directory where we want our model predictions and checkpoints saved.
  • eval_strategy="epoch": This ensures that the evaluation is performed at the end of each training epoch. Other possible values are “steps” (meaning that evaluation is done at regular step intervals) and “no” (meaning that evaluation is not done during training).
  • num_train_epochs=3: This sets the number of training epochs (or the number of times the training loop will repeat over all of the data). In this case, it’s set to train on the data three times.
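To get a feel for how much work three epochs involves, it can help to estimate the number of optimizer steps. This is a rough sketch that assumes a single device, no gradient accumulation, and a batch size of 8. We did not set a batch size above; 8 is my understanding of the default, so treat it as an assumption.

```python
import math


def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps for a single device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# 67,349 training sentences, three epochs, and an assumed batch size of 8.
steps = total_training_steps(num_examples=67349, batch_size=8, num_epochs=3)
```

Under these assumptions, training runs for 25,257 steps, with an evaluation pass after every 8,419 of them.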

The next step is to load in our pre-trained BERT model.

    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Let’s break this down step-by-step:

  • The AutoModelForSequenceClassification class does two things. First, it automatically identifies the appropriate model architecture from the Hugging Face model hub given the provided checkpoint string. In our case, this would be the BERT architecture. Second, it converts this model into one we can use for classification. It does this by replacing the model’s pretraining head with a new, randomly initialized classification layer, which we then train using our sentiment analysis dataset.
  • The from_pretrained() method loads in our selected checkpoint, which in this case is bert-base-uncased.
  • The argument num_labels=2 indicates that we have two classes to predict in our model: positive and negative.

We get a message telling us that some model weights were not initialized when we ran this code. This message is exactly the one we want – it tells us that the AutoModelForSequenceClassification class reset the final model weights in preparation for our fine-tuning.

The last step is to set up our Trainer object. This stage takes in the model, the training arguments, the train and validation datasets, our tokenizer and padding function, and our evaluation function. It uses all of these to fine-tune the BERT model, including the newly initialized classification head, evaluating the performance of the model after each epoch on the validation set.

    from transformers import Trainer

    trainer = Trainer(
        model,
        training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

We can now kick off the training. The Trainer class gives us a nice timer that tells us both the elapsed time and how much longer the training is estimated to take. We can also see the metrics after each epoch, as we requested when creating the TrainingArguments.

    trainer.train()

Evaluate the model

Classification metrics

Before we have a look at how our model performed, let’s first discuss the evaluation metrics we used in more detail:

  • Accuracy: As mentioned, this is the default evaluation metric for the SST-2 dataset. Accuracy is the simplest metric for evaluating classification models, being the ratio of correct predictions to all predictions. Accuracy is a good choice when the target classes are well balanced, meaning each class has an approximately equal number of instances.
  • Precision: Precision calculates the percentage of the correctly predicted positive observations to the total predicted positives. It is important when the cost of a false positive is high. For example, in spam detection, you would rather miss a spam email (false negative) than have non-spam emails land in your spam folder (false positive).
  • Recall (also known as sensitivity): Recall calculates the percentage of correctly predicted positive observations out of all observations that are actually positive. It is of interest when the cost of false negatives is high, meaning a positive instance is incorrectly classified as negative. For example, in disease diagnosis, you would rather have false alarms (false positives) than miss someone who is actually ill (false negatives).
  • F1-score: The F1-score is the harmonic mean of precision and recall. It tries to find the balance between both measures. It is a more reliable metric than accuracy when dealing with imbalanced classes.
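The definitions above can be worked through by hand on a tiny example. The sketch below uses made-up labels and predictions and a hypothetical helper, binary_metrics. The last lines show one way to get a macro-averaged F1, which is the average='macro' option used in our evaluation function: compute the score for each class in turn and take the mean.

```python
def binary_metrics(labels, predictions, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions for eight sentences (1 = positive).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 1, 0, 0, 0]
scores = binary_metrics(labels, predictions)

# Macro F1: score each class as the "positive" one, then average.
f1_positive = binary_metrics(labels, predictions, positive=1)["f1"]
f1_negative = binary_metrics(labels, predictions, positive=0)["f1"]
macro_f1 = (f1_positive + f1_negative) / 2
```

Here the model finds only two of the four positive sentences (recall of 0.5), while two of its three positive predictions are right (precision of about 0.67), which shows how the two measures can diverge.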

In our case, we had slightly imbalanced classes, so it’s a good idea to check both accuracy and the F1 score. If they differ, the F1 score is likely to be more trustworthy. However, if they are roughly the same, it is nice to be able to use accuracy, as it is easily interpretable.

Knowing whether your model is better at predicting one class versus the other is also useful. Depending on your application, capturing all customers who are unhappy with your service may be more important, even if you sometimes get false positives. In this case, a model with high recall would be a priority over high precision.

Model predictions

Now that we’ve trained our model, we need to evaluate it. Normally, we would use the test set to get a final, unbiased evaluation, but the SST-2 test set does not have labels, so we cannot use it for evaluation. In this case, we’ll use the validation set accuracy scores for our final evaluation. We can do this using the following code:

    trainer.evaluate(eval_dataset=tokenized_datasets["validation"])

    {'eval_loss': 0.4223457872867584,
     'eval_accuracy': 0.9071100917431193,
     'eval_f1': 0.9070209502998072,
     'eval_precision': 0.9074841225920363,
     'eval_recall': 0.9068472678285763,
     'eval_runtime': 3.9341,
     'eval_samples_per_second': 221.649,
     'eval_steps_per_second': 27.706,
     'epoch': 3.0}

We see that the model has around 91% accuracy on the validation set, comparable to other BERT models trained on SST-2. If we wanted to improve our model performance, we could investigate a few things:

  • Check whether the model is overfitting: While small by LLM standards, the BERT model we used for fine-tuning is still very large, and our training set was quite modest. In such cases, overfitting is quite common. To check this, we should compare our validation set metrics with our training set metrics. If the training set metrics are much higher than the validation set metrics, then we have overfit the model. You can adjust a range of parameters during model training to help mitigate this.
  • Train on more epochs: In this example, we only trained the model for three epochs. If the model is not overfitting, continuing to train it for longer may improve its performance.
  • Check where the model has misclassified: We could dig into where the model is classifying correctly and incorrectly to see if we could spot a pattern. This may allow us to spot any issues with ambiguous cases or mislabelled data. Perhaps the fact this is a binary classification problem with no label for “neutral” sentiment means there is a subset of sentences that the model cannot properly classify.
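As a starting point for the last suggestion, a small helper can collect the sentences the model got wrong so we can read through them. This is only a sketch with made-up sentences and predictions, and misclassified is a hypothetical function name.

```python
def misclassified(sentences, labels, predictions):
    """Return (sentence, true label, predicted label) for every wrong prediction."""
    return [
        (sentence, label, pred)
        for sentence, label, pred in zip(sentences, labels, predictions)
        if label != pred
    ]


# Made-up validation examples; 0 = negative, 1 = positive.
sentences = ["a joy to watch", "not bad at all", "flat and lifeless"]
labels = [1, 1, 0]
predictions = [1, 0, 0]
errors = misclassified(sentences, labels, predictions)
```

With the real model, the predictions could come from taking the argmax of the logits returned by trainer.predict() on the validation set.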

To finish our section on evaluating this model, let’s see how it goes with our test sentence. We’ll pass our fine-tuned model and tokenizer to a TextClassificationPipeline, then pass our sentence to this pipeline:

    from transformers import TextClassificationPipeline

    pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
    predictions = pipeline("I love PyCharm! It's my favourite Python IDE.")
    print(predictions)

    [[{'label': 'LABEL_0', 'score': 0.0006891043740324676}, {'label': 'LABEL_1', 'score': 0.9993108510971069}]]

Our model assigns LABEL_0 (negative) a probability of 0.0007 and LABEL_1 (positive) a probability of 0.999, indicating it predicts that the sentence has a positive sentiment with 99% certainty. This result is similar to the one we got from the fine-tuned RoBERTa model we used earlier in the post.
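Because our fine-tuned model only knows its classes as LABEL_0 and LABEL_1, it can be handy to translate the output into readable names. The sketch below hard-codes the printed output from above and uses a hand-written mapping based on the SST-2 label names. Both top_label and LABEL_NAMES are made up for this example.

```python
# Assumed mapping, based on the SST-2 label names shown earlier.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def top_label(scores, names=LABEL_NAMES):
    """Pick the highest-scoring entry and translate its label."""
    best = max(scores, key=lambda item: item["score"])
    return names.get(best["label"], best["label"]), best["score"]


# The pipeline output printed above, for a single input sentence.
predictions = [[
    {"label": "LABEL_0", "score": 0.0006891043740324676},
    {"label": "LABEL_1", "score": 0.9993108510971069},
]]
label, score = top_label(predictions[0])
```

For our test sentence, this gives the label "positive" with a score of about 0.999.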

Sentiment analysis benchmarks

Instead of evaluating the model on only the dataset it was trained on, we could also assess it on other datasets.

As you can see from the Papers With Code benchmarking table, you can use a wide variety of labeled datasets to assess the performance of your sentiment classifiers. These datasets include the SST-5 fine-grained classification, IMDB dataset, Yelp binary and fine-grained classification, Amazon review polarity, TweetEval, and the SemEval Aspect-based sentiment analysis dataset.

When evaluating your model, the main thing is to ensure that the datasets represent your problem domain.

Most of the benchmarking datasets contain either reviews or social media texts, so if your problem is in either of these domains, you may find an existing benchmark that mirrors your business domain closely enough. However, suppose you are applying sentiment analysis to a more specialized problem. In that case, it may be necessary to create your own benchmarks to ensure your model can generalize to your problem domain properly.

Since there are multiple ways of measuring sentiment, it’s also necessary to make sure that any benchmarks you use to assess your model have the same target as the dataset you trained your model on.

For example, it wouldn’t be a fair measure of a model’s performance to fine-tune it on the SST-2 with a binary target, and then test it on the SST-5. As the model has never seen the very positive, very negative, and neutral categories, it will not be able to accurately predict texts with these labels and hence will perform poorly.

Wrapping up

In this blog post, we saw how LLMs can be a powerful way of classifying the sentiment expressed in a piece of text and took a hands-on approach to fine-tuning an LLM for this purpose.

We saw how understanding which types of models are best suited to sentiment analysis, together with resources like Papers With Code that list the top-performing models on different benchmarks, can help you narrow down which models to use.

We also learned how Hugging Face’s powerful tooling for using these models and their integration into PyCharm makes using LLMs for sentiment analysis approachable for anyone with a background in machine learning.

If you’d like to continue learning about large language models, check out our guest blog post by Dido Grigorov, who explains how to build a chatbot using the LangChain package.

Get started with sentiment analysis with PyCharm today

If you’re ready to get started on your own sentiment analysis project, you can activate your free three-month subscription of PyCharm. Click on the link below, and enter this promo code: PCSA. You’ll then receive an activation code through your email.

Activate your free three-month subscription
Categories: FLOSS Project Planets

LostCarPark Drupal Blog: Drupal Advent Calendar day 5 - Blog Recipe

Planet Drupal - Thu, 2024-12-05 04:00
Drupal Advent Calendar day 5 - Blog Recipe james Thu, 12/05/2024 - 09:00

In the early days of Drupal, it was a popular blogging platform. Nowadays, while it is rare to use Drupal for a pure blog site, it is still quite common for Drupal sites to include a blog. There even used to be a dedicated blog module in Drupal, but it was largely superseded by Drupal’s core functionality.

The ‘Blog’ recipe for Drupal Starshot is designed to facilitate the creation and management of blog posts on a website. It will create the ‘Blog’ content type, equipped with the necessary fields and features that enable content creators to produce rich, informative, and engaging blog entries…

Categories: FLOSS Project Planets

KStars v3.7.4 is Released

Planet KDE - Thu, 2024-12-05 01:04

KStars v3.7.4 is released on 2024.12.05 for Windows, macOS & Linux. It's a bi-monthly bug-fix release with a couple of exciting features.

Imaging Planner

Hy Murveit added a brand new Imaging Planner in KStars to facilitate imaging.

The Imaging Planner tool helps users choose which objects to image. Users can download catalogs of recommended objects, or possibly create and share their own catalogs. The tool computes when the objects in a read-in catalog may be imaged on the selected night given constraints such as minimum altitude, terrain and moon separation.

It can sort the objects along several different dimensions including the number of hours an object may be imaged tonight (given the user's geography, constraints and possibly artificial horizon), its peak altitude, distance from the moon, constellation, name and type. Objects can also be filtered out for several reasons (e.g. type of object, whether it was previously imaged, keywords the user has added, whether the object has been selected, user not interested, etc).

This tool helps users research the objects by showing small images of the objects, showing the objects' sky locations on the skymap, and by providing links to follow to internet sites with more information and images. It allows users to attach notes and links to objects, and select certain of them for further consideration. This tool can be used in conjunction with the Ekos imager or any other imaging tool. It does not currently directly interact with the actual imager; it only helps the user decide what to image.

Simbad Integration with FITSViewer

John Evans added a new, experimental feature to the FITSViewer that allows the user to dynamically query the SIMBAD astronomical database and highlight the results on the image in the FITSViewer. The user draws a circle on the image and the objects within that circle are then displayed in a table and on the image.
It is possible to filter by object type and click through to the Simbad / CDS or NED websites for more information about the objects.

This is an interesting tool to see what is in your image, be it a subframe whilst you are imaging or a completed image that you have reloaded into the FITSViewer.

In order to use the feature you will need an internet connection to access the online Simbad database, and the image must have WCS enabled within the FITSViewer. For the most accurate results, plate solve the image with the built-in FITSViewer plate solver. The feature is controlled by a toggle in the FITSViewer options.

New Focus Measures

John Evans introduced a new contrast-based focusing algorithm suited for solar and planetary imaging.

Four new focus measures have been added to the Focus Module to complement the existing measures of HFR, FWHM, etc.

  • StdDev. This is conceptually similar to the Fourier Algorithm but simpler. It uses the standard deviation of the pixels in the image as the measure of focus. It can be used on star fields.
  • Contrast-based measures. These use algorithms found in other areas of image processing and take the contrast of texture in the image, computed in various ways, as the measure of focus. The following measures are available:
      • Sobel
      • Laplacian
      • Canny

These measures require some form of extended object in the image so will not work on star fields. They are intended for Solar, Lunar and planetary focusing.
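To make the idea behind these measures concrete, here is a minimal sketch in pure Python. It is not the KStars implementation (which relies on OpenCV); the function names and the tiny test images are made up for illustration. It scores an image by the standard deviation of its pixels and by the variance of a simple 4-neighbour Laplacian, and in both cases a sharper image should score higher.

```python
from statistics import pstdev, pvariance

def stddev_measure(image):
    """Focus score: population standard deviation of all pixel values."""
    return pstdev([pixel for row in image for pixel in row])

def laplacian_measure(image):
    """Focus score: variance of the 4-neighbour Laplacian over interior pixels."""
    height, width = len(image), len(image[0])
    responses = [
        image[y - 1][x] + image[y + 1][x] + image[y][x - 1] + image[y][x + 1]
        - 4 * image[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)

# Hypothetical 4x4 greyscale images: a hard edge versus a smooth ramp.
sharp = [[0, 0, 255, 255] for _ in range(4)]
blurred = [[0, 85, 170, 255] for _ in range(4)]

print(stddev_measure(sharp) > stddev_measure(blurred))        # True
print(laplacian_measure(sharp) > laplacian_measure(blurred))  # True
```

A focus routine would move the focuser, recompute the chosen score on each new frame, and look for the position where the score peaks.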


 

These algorithms can be used on the whole image or with the existing mask features, or with a user-defined region-of-interest that is used in single-star mode for star based focusing measures.
 
This new feature requires the OpenCV library to be installed (a standard installation is fine). This library is not installed by default with KStars, so anyone wishing to use these features will need to first install OpenCV and then rebuild KStars on their system. It will not be available with pre-built executables.

Categories: FLOSS Project Planets

Russ Allbery: Review: Paladin's Hope

Planet Debian - Wed, 2024-12-04 22:56

Review: Paladin's Hope, by T. Kingfisher

Series: The Saint of Steel #3
Publisher: Red Wombat Studio
Copyright: 2021
ISBN: 1-61450-613-2
Format: Kindle
Pages: 303

Paladin's Hope is a fantasy romance novel and the third book of The Saint of Steel series. Each book of that series features different protagonists in closer to the romance series style than the fantasy series style and stands alone reasonably well. There are a few spoilers for the previous books here, so you probably want to read the series in order.

Galen is one of the former paladins of the Saint of Steel, left bereft and then adopted by the Temple of the Rat after their god dies. Even more than the paladin protagonists of the previous two books, he reacted very badly to that death and has ongoing problems with nightmares and going into berserker rages when awakened. As the book opens, he's the escort for a lich-doctor named Piper who is examining a corpse found in the river.

The last of the five was the only one who did not share a certain martial quality. He was slim and well-groomed and would be considered handsome, but he was also extraordinarily pale, as if he lived his life underground.

It was this fifth man who nudged the corpse with the toe of his boot and said, "Well, if you want my professional opinion, this great goddamn hole in his chest is probably what killed him."

As it turns out, slim and well-groomed and exceedingly pale is Galen's type.

This is another paladin romance, this time between two men. It's almost all romance; the plot is barely worth mentioning. About half of the book is an exploration of a puzzle dungeon of the sort that might be fun in a video game or tabletop RPG, but that I found rather boring and monotonous in a novel. This creates a lot more room for the yearning and angst.

Kingfisher tends towards slow-burn romances. This romance is a somewhat faster burn than some of her other books, but instead implodes into one of the most egregiously stupid third-act break-ups that I've read in a romance plot. Of all the Kingfisher paladin books, I think this one was hurt the most by my basic difference in taste from the author. Kingfisher finds constant worrying and despair over being good enough for the romantic partner to be an enjoyable element, and I find it incredibly annoying. I think your enjoyment of this book will heavily depend on where you fall on that taste divide.

The saving grace of this book is the gnoles, who are by far the best part of this world. Earstripe, a gnole constable, is the one who found the body that the book opens with and he drives most of the plot, such as it is. He's also the source of the best banter in the book, which is full of pointed and amused gnole observations about humans and their various stupidities. Given that I was also grumbling about human stupidities for most of the book, the gnole viewpoint and I got along rather well.

"God's stripes." Earstripe shook his head in disbelief. "Bone-doctor would save some gnole, yes? If some gnole was hurt."

"Of course," said Piper. "If I could."

"And tomato-man would save some gnole?" He swung his muzzle toward Galen. "If some gnole needed big human with sword?"

"Yes, of course."

Earstripe spread his hands, claws gleaming. "A gnole saves some human. Same thing." He took a deep breath, clearly choosing his words carefully. "A gnole's compassion does not require fur."

We learn a great deal more about gnole culture, all of which I found fascinating, and we get a rather satisfying amount of gnole acerbic commentary. Kingfisher is very good at banter, and dialogue in general, which also smoothes over the paucity of detailed plot. There was no salvaging the romance, at least for me, but I did at least like Piper, and Galen wasn't too bad when he wasn't being annoyingly self-destructive.

I had been wondering a little if gay romance would, like sapphic romance, avoid my dislike of heterosexual gender roles. I think the jury is still out, but it did not work in this book because Galen is so committed to being the self-sacrificing protector who is unable to talk about his feelings that he single-handedly introduced a bunch of annoying pieces of the male gender role anyway. I will have to try that experiment with a book that doesn't involve hard-headed paladins.

I have yet to read a bad T. Kingfisher novel, but I thought this one was on the weaker side. The gnoles are great and kept me reading, but I wish there had been a more robust plot, a lot less of the romance, and no third-act break-up. As is, I recommend the other Saint of Steel books over this one. Ah well.

Followed by Paladin's Faith.

Rating: 6 out of 10

Categories: FLOSS Project Planets

Dirk Eddelbuettel: corels 0.0.5 on CRAN: Maintenance

Planet Debian - Wed, 2024-12-04 18:13

An updated version of the corels package is now on CRAN! The ‘Certifiably Optimal RulE ListS (Corels)’ learner provides interpretable decision rules with an optimality guarantee—a nice feature which sets it apart in machine learning. You can learn more about corels at its UBC site.

The changes mostly concern maintenance at both the repository level (such as continuous integration setup, badges, documentation links, …) and the package level (such as removing the no-longer-required C++ compilation standard setter, which now triggers a NOTE at CRAN).

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Hacked and tis the season for surgeries

Planet KDE - Wed, 2024-12-04 12:30

I am still here. Sadly while I battle this insane infection from my broken arm I got back in July, the hackers got my blog. I am slowly building it back up. Further bad news is I have more surgeries, first one tomorrow. Furthering my current struggles I cannot start my job search due to hospitalization and recovery. Please consider a donation. https://gofund.me/6e99345d

On the open source work front, I am still working on stuff, mostly snaps ( Apps 24.08.3 released )

Thank you everyone that voted me into the Ubuntu Community Council!

I am trying to stay positive, but it seems I can’t catch a break. I will have my computer in the hospital and will work on what I can. Have a blessed day and see you soon.

Scarlett

Categories: FLOSS Project Planets

Scarlett Gately Moore: Hacked and tis the season for surgeries

Planet Debian - Wed, 2024-12-04 12:30

I am still here. Sadly while I battle this insane infection from my broken arm I got back in July, the hackers got my blog. I am slowly building it back up. Further bad news is I have more surgeries, first one tomorrow. Furthering my current struggles I cannot start my job search due to hospitalization and recovery. Please consider a donation. https://gofund.me/6e99345d

On the open source work front, I am still working on stuff, mostly snaps ( Apps 24.08.3 released )

Thank you everyone that voted me into the Ubuntu Community Council!

I am trying to stay positive, but it seems I can’t catch a break. I will have my computer in the hospital and will work on what I can. Have a blessed day and see you soon.

Scarlett

Categories: FLOSS Project Planets

Sven Hoexter: Looking at x509 Certificate Chains

Planet Debian - Wed, 2024-12-04 12:23

Sometimes you have to look at the content of x509 certificate chains. Usually one finds them PEM-encoded and concatenated in a text file.

Since the openssl x509 subcommand only decodes the first certificate it will find in a file, I did something like this:

csplit -z -f 'cert' fullchain.pem '/-----BEGIN CERTIFICATE-----/' '{*}'
for x in cert*; do openssl x509 -in "$x" -noout -text; done

Apparently that's the "wrong" way, and the more appropriate way is to use the openssl crl2pkcs7 subcommand, even though we are not parsing a revocation list here.

openssl crl2pkcs7 -nocrl -certfile fullchain.pem | \
    openssl pkcs7 -print_certs -noout

Learned that one in a webinar presented by Victor Dukhovni. If you're new to the topic, it's worth watching.
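If the chain has to be processed from a script rather than a shell, the splitting step can be sketched in Python as well. This is only an illustration of the csplit idea, not something from the webinar: the function name is made up, and the sample payloads are placeholders rather than real certificates.

```python
import re

# Non-greedy match so each BEGIN/END pair becomes its own block.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)

def split_pem_chain(text):
    """Return every PEM certificate block found in text, in file order."""
    return PEM_CERT.findall(text)

# Placeholder payloads standing in for the contents of fullchain.pem.
sample = (
    "-----BEGIN CERTIFICATE-----\nQUJD\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nREVG\n-----END CERTIFICATE-----\n"
)

blocks = split_pem_chain(sample)
print(len(blocks))  # 2
```

Each returned block could then be written to its own file and inspected with openssl x509 -in FILE -noout -text, as in the loop above.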

Categories: FLOSS Project Planets

Enrico Zini: How to right click

Planet Debian - Wed, 2024-12-04 11:51

I climbed on top of a mountain with a beautiful view, and when I started readying my new laptop for a work call (as one does on top of mountains), I realised that I couldn't right click and it kind of spoiled the mood.

Clicking on the bottom right corner of my touchpad left-clicked. Clicking with two fingers left-clicked. Alt-clicking, Super-clicking, and Control-clicking all left-clicked.

There are two ways to simulate mouse buttons with touchpads in Wayland:

  • clicking on different areas at the bottom of the touchpad
  • double or triple-tapping, as long as the fingers are not too far apart

Skippable digression:

I'm not sure why Gnome insists on following Macs for defaults, which is what people with non-Mac hardware are less likely to be used to.

In my experience, Macs are as arbitrarily awkward to use as anything else, but they managed to build a community where if you don't understand how it works you get told you're stupid. All other systems (including Gnome) have communities where instead you get told (as is generally the case) that the system design is stupid, which at least gives you some amount of validation in your suffering.

Oh well.

How to configure right click

Surprisingly, this is not available in Gnome Shell settings. It can be found in gnome-tweaks: under "Keyboard & Mouse", "Mouse Click Emulation", one can choose between "Fingers" or "Area".

I tried both and went for "Area": I use right-drag a lot to resize windows, and I couldn't find a way, at least with this touchpad, to make it work consistently in "Fingers" mode.
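For scripting the same choice, the sketch below builds the equivalent gsettings command. It rests on an assumption not stated in the post: that the gnome-tweaks option maps to the click-method key of the org.gnome.desktop.peripherals.touchpad schema, with the values areas and fingers. The helper names are hypothetical, and nothing is executed unless apply_click_method() is called explicitly.

```python
import subprocess

# Assumed GSettings location of the "Mouse Click Emulation" option.
SCHEMA = "org.gnome.desktop.peripherals.touchpad"
KEY = "click-method"
ALLOWED = {"areas", "fingers"}

def click_method_command(method):
    """Build the gsettings argv that selects the given click emulation mode."""
    if method not in ALLOWED:
        raise ValueError(f"unknown click method: {method!r}")
    return ["gsettings", "set", SCHEMA, KEY, method]

def apply_click_method(method):
    """Run the command; this changes the desktop setting for the current user."""
    subprocess.run(click_method_command(method), check=True)

print(" ".join(click_method_command("areas")))
```

Calling apply_click_method("areas") would correspond to picking "Area" in gnome-tweaks, under the assumption above.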

Categories: FLOSS Project Planets

The Drop Times: Contribution Day at DrupalCon Singapore 2024: A Day of Collaboration and Innovation

Planet Drupal - Wed, 2024-12-04 11:42
As a leading open-source project, Drupal thrives on contributions from its global community. Contribution Day is a focused event at DrupalCon where individuals collaborate to improve Drupal. It’s a space for sharing expertise, mentoring others, and making tangible progress on various projects.
Categories: FLOSS Project Planets

The Drop Times: Meet the Speakers: DrupalCon Singapore 2024 Part 1

Planet Drupal - Wed, 2024-12-04 11:04
As part of the Meet the Speakers: DrupalCon Singapore 2024 series, The DropTimes highlights sessions by Yas Naoi on Behavior-Driven Development, Ajit Shinde on PHP 8 features, and Alexey Murz Korepov on observability in decoupled Drupal. This series provides a glimpse into the conference’s rich lineup and offers insights into what speakers are bringing to the event, happening December 9-11 at PARKROYAL COLLECTION Marina Bay.
Categories: FLOSS Project Planets

The Drop Times: TDT Is the Official Media Partner for DrupalCon Singapore 2024

Planet Drupal - Wed, 2024-12-04 11:04
The DropTimes will provide in-depth coverage of DrupalCon Singapore 2024, happening from December 9 to 11 at PARKROYAL COLLECTION Marina Bay. Follow for updates, insights, and highlights from Asia’s first DrupalCon in nearly a decade.
Categories: FLOSS Project Planets

Django Weblog: Django security releases issued: 5.1.4, 5.0.10, and 4.2.17

Planet Python - Wed, 2024-12-04 10:40

In accordance with our security release policy, the Django team is issuing releases for Django 5.1.4, Django 5.0.10, and Django 4.2.17. These releases address the security issues detailed below. We encourage all users of Django to upgrade as soon as possible.

CVE-2024-53907: Potential denial-of-service in django.utils.html.strip_tags()

The strip_tags() method and striptags template filter are subject to a potential denial-of-service attack via certain inputs containing large sequences of nested incomplete HTML entities.

Thanks to jiangniao for the report.

This issue has severity "moderate" according to the Django security policy.

CVE-2024-53908: Potential SQL injection in HasKey(lhs, rhs) on Oracle

Direct usage of the django.db.models.fields.json.HasKey lookup on Oracle is subject to SQL injection if untrusted data is used as a lhs value. Applications that use the jsonfield.has_key lookup through the __ syntax are unaffected.

Thanks to Seokchan Yoon for the report.

This issue has severity "high" according to the Django security policy.

Affected supported versions

  • Django main
  • Django 5.1
  • Django 5.0
  • Django 4.2

Resolution

Patches to resolve the issue have been applied to Django's main, 5.1, 5.0, and 4.2 branches. The patches may be obtained from the following changesets.

CVE-2024-53907: Potential denial-of-service in django.utils.html.strip_tags()

CVE-2024-53908: Potential SQL injection in HasKey(lhs, rhs) on Oracle

The following releases have been issued

The PGP key ID used for this release is Sarah Boyce: 3955B19851EA96EF
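As a rough aid for checking a deployment against the release numbers above, here is a small sketch. The helper is hypothetical and only handles plain numeric version strings; it also assumes that series newer than 5.1 already contain the fixes (since main was patched) and that series older than 4.2 are unsupported and therefore unpatched.

```python
# Minimum patched release per supported series, taken from the announcement.
PATCHED = {(5, 1): 4, (5, 0): 10, (4, 2): 17}

def is_patched(version):
    """Return True if a numeric Django version string includes these fixes."""
    parts = [int(piece) for piece in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat "5.1" as "5.1.0"
    series, patch = (parts[0], parts[1]), parts[2]
    if series in PATCHED:
        return patch >= PATCHED[series]
    # Assumption: newer series descend from the patched main branch,
    # older series are unsupported and did not receive the patches.
    return series > max(PATCHED)

for candidate in ("5.1.3", "5.1.4", "5.0.10", "4.2.16"):
    print(candidate, is_patched(candidate))
```

In a real project the string would typically come from django.get_version().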

General notes regarding security reporting

As always, we ask that potential security issues be reported via private email to security@djangoproject.com, and not via Django's Trac instance, nor via the Django Forum, nor via the django-developers list. Please see our security policies for further information.

Categories: FLOSS Project Planets

Tag1 Consulting: Migrating your Data from D7 to D10: Public and private file migrations

Planet Drupal - Wed, 2024-12-04 10:20

After discussing how to avoid entity ID conflicts in the previous article, we are finally ready to start migrating content. The first entity we will focus on is files, covering both public and private file migrations. We will share tips and hacks related to performance optimizations and discuss how to handle files hosted outside of Drupal.

mauricio Wed, 12/04/2024 - 07:20
Categories: FLOSS Project Planets

Freelock Blog: Automatically send notifications to Matrix

Planet Drupal - Wed, 2024-12-04 10:00
Automatically send notifications to Matrix Anonymous (not verified) Wed, 12/04/2024 - 07:00 Tags Development Open Source ECA Matrix Automation Drupal Planet

When you work on a team, it's useful to have notifications go to a chat room where you can coordinate any necessary action. Reviewing a comment before publishing, seeing stories as they are written, and getting notified of new orders are the kinds of things we like having in a shared room.

Categories: FLOSS Project Planets

Droptica: What to Do When You Forgot or Lost Your Drupal Admin Password?

Planet Drupal - Wed, 2024-12-04 09:17

Losing access to your Drupal admin account can be a stressful experience. Your admin password is the key to maintaining and managing your website and being locked out can halt your ability to make critical updates or manage content. Fortunately, there are several methods to regain access, whether you're using Drupal 7 or the latest version of the system. This guide will walk you through the possible options to reset your password and return to your web page.

Categories: FLOSS Project Planets

Real Python: Expression vs Statement in Python: What's the Difference?

Planet Python - Wed, 2024-12-04 09:00

After working with Python for a while, you’ll eventually come across two seemingly similar terms: expression and statement. When you browse the official documentation or dig through a Python-related thread on an online forum, you may get the impression that people use these terms interchangeably. That’s often true, but confusingly enough, there are cases when the expression vs statement distinction becomes important.

So, what’s the difference between expressions and statements in Python?

In Short: Expressions Have Values and Statements Cause Side Effects

When you open the Python glossary, you’ll find the following two definitions:

Expression: A piece of syntax which can be evaluated to some value. (…) (Source)

Statement: A statement is part of a suite (a “block” of code). A statement is either an expression or one of several constructs with a keyword, (…) (Source)

Well, that isn’t particularly helpful, is it? Fortunately, you can summarize the most important facts about expressions and statements in as little as three points:

  1. All instructions in Python fall under the broad category of statements.
  2. By this definition, all expressions are also statements—sometimes called expression statements.
  3. Not every statement is an expression.

In a technical sense, every line or block of code is a statement in Python. That includes expressions, which represent a special kind of statement. What makes an expression special? You’ll find out now.

Expressions: Statements With Values

Essentially, you can substitute all expressions in your code with the computed values, which they’d produce at runtime, without changing the overall behavior of your program. Statements, on the other hand, can’t be replaced with equivalent values unless they’re expressions.

Consider the following code snippet:

>>> x = 42
>>> y = x + 8
>>> print(y)
50

In this example, all three lines of code contain statements. The first two are assignment statements, while the third one is a call to the print() function.

When you look at each line more closely, you can start disassembling the corresponding statement into subcomponents. For example, the assignment operator (=) consists of the parts on the left and the right. The part to the left of the equal sign indicates the variable name, such as x or y, and the part on the right is the value assigned to that variable.

The word value is the key here. Notice that the variable x is assigned a literal value, 42, that’s baked right into your code. In contrast, the following line assigns an arithmetic expression, x + 8, to the variable y. Python must first calculate or evaluate such an expression to determine the final value for the variable when your program is running.

Arithmetic expressions are just one example of Python expressions. Others include logical expressions, conditional expressions, and more. What they all have in common is a value to which they evaluate, although each value will generally be different. As a result, you can safely substitute any expression with the corresponding value:

>>> x = 42
>>> y = 50
>>> print(y)
50

This short program gives the same result as before and is functionally identical to the previous one. You’ve calculated the arithmetic expression by hand and inserted the resulting value in its place.

Note that you can evaluate x + 8, but you can’t do the same with the assignment y = x + 8, even though it incorporates an expression. The whole line of code represents a pure statement with no intrinsic value. So, what’s the point of having such statements? It’s time to dive into Python statements and find out.
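One way to see this distinction in practice, offered here as an illustrative sketch rather than something from the article, is to ask Python's parser whether a piece of source is acceptable in "eval" mode, which only admits expressions. The is_expression() helper is a made-up name.

```python
import ast

def is_expression(source):
    """Return True if source parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True

# An expression can be evaluated to a value...
value = eval("x + 8", {"x": 42})
print(value)                       # 50

# ...while an assignment statement has no value to evaluate.
print(is_expression("x + 8"))      # True
print(is_expression("y = x + 8"))  # False
```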

Statements: Instructions With Side Effects

Statements that aren’t expressions cause side effects, which change the state of your program or affect an external resource, such as a file on disk. For example, when you assign a value to a variable, you define or redefine that variable somewhere in Python’s memory. Similarly, when you call print(), you effectively write to the standard output stream (stdout), which, by default, displays text on the screen.
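The side effect of print() can be observed directly by redirecting standard output into a buffer. The following is a small sketch using only the standard library; the variable names are arbitrary.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    # The call writes to stdout (the side effect) and returns None.
    result = print("hello")

captured = buffer.getvalue()
print(repr(result))    # None
print(repr(captured))  # 'hello\n'
```

The text ends up in the buffer instead of on the screen, while the call itself produces nothing useful as a value.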

Note: While statements encompass expressions, most people use the word statement informally when they refer to pure statements or instructions with no value.

Okay. You’ve covered statements that are expressions and statements that aren’t expressions. From now on, you can refer to them as pure expressions and pure statements, respectively. But it turns out there’s a middle ground here.

Read the full article at https://realpython.com/python-expression-vs-statement/ »

Categories: FLOSS Project Planets

Drupal life hack's: How can AI help in understanding regular expressions using Drupal examples?

Planet Drupal - Wed, 2024-12-04 08:54
How can AI help in understanding regular expressions using Drupal examples? admin Wed, 12/04/2024 - 15:54
Categories: FLOSS Project Planets

The Drop Times: Axelerant’s IDMC Digital Overhaul Earns Splash Awards Asia Nomination

Planet Drupal - Wed, 2024-12-04 08:00
In the second episode of The DropTimes' "Splash Award Finalists" series, explore how Axelerant’s innovative work for the Internal Displacement Monitoring Centre (IDMC) has transformed its digital platform. Learn how this project tackles global displacement challenges and why it’s a contender for the 'Not-for-Profit' category at the Splash Awards Asia.
Categories: FLOSS Project Planets

Bits from Debian: "Ceratopsian" will be the default theme for Debian 13

Planet Debian - Wed, 2024-12-04 07:30

The theme "Ceratopsian" by Elise Couper has been selected as the default theme for Debian 13 "trixie". The theme is inspired by Trixie's (the fictional character from Toy Story) frill and is also influenced by a previously used theme called "futurePrototype" by Alex Makas.

After the Debian Desktop Team made the call for proposing themes, a total of six choices were submitted. The desktop artwork poll was open to the public, and we received 2817 responses ranking the different choices, with Ceratopsian ranked as the winner.

We'd like to thank all the designers that have participated and have submitted their excellent work in the form of wallpapers and artwork for Debian 13.

Congratulations, Elise, and thank you very much for your contribution to Debian!

Categories: FLOSS Project Planets
