FSF Events: Free Software Directory meeting on IRC: Friday, June 09, starting at 12:00 EDT (16:00 UTC)

GNU Planet! - Thu, 2023-06-01 11:45
Join the FSF and friends on Friday, June 09, from 12:00 to 15:00 EDT (16:00 to 19:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

FSF Events: Free Software Directory meeting on IRC: Friday, June 02, starting at 12:00 EDT (16:00 UTC)

GNU Planet! - Thu, 2023-06-01 11:41
Join the FSF and friends on Friday, June 02, from 12:00 to 15:00 EDT (16:00 to 19:00 UTC) to help improve the Free Software Directory.

PyCharm: PyCharm 2023.2 EAP 2: Live Templates for Django Forms and Models, Support for Polars DataFrames

Planet Python - Thu, 2023-06-01 11:06

The second Early Access Program build brings a bunch of features for both web developers and data scientists. Try new, time-saving live templates for Django forms, models, and views, as well as support for Polars, a super-fast DataFrame library, and initial GitLab integration.

You can get the latest build from our website, the free Toolbox App, or via snaps for Ubuntu.

If you want to catch up on the updates from the previous EAP build, you can refer to this blog post for more details.

UX: Text search in Search Everywhere

The Search Everywhere (Double ⇧ / Double Shift) functionality, primarily utilized for searching through files, classes, methods, actions, and settings, now includes text search capabilities similar to Find in Files. With this enhancement, text search results are displayed when there are few or no other search results available for a given query. The feature is enabled by default and can be managed in Settings/Preferences | Advanced Settings | Search Everywhere.

Dedicated syntax highlighting for Python local variables

PyCharm 2023.2 will provide a dedicated syntax highlighting option for local variables. To use it, go to Settings | Editor | Color Scheme | Python and choose Local variables from the list of available options. 

By default, the highlighting inherits the values used for identifiers in Language Defaults. Clear the Inherit values from checkbox to choose the highlighting scheme that works best for you.

Syntax highlighting in inspection descriptions 

In Settings / Preferences | Editor | Inspections, we’ve implemented syntax highlighting for code samples, which facilitates comprehension of any given inspection and its purpose.

Support for Polars DataFrames

PyCharm 2023.2 will allow you to work with Polars, a new, blazingly fast DataFrame library written in Rust.

In PyCharm, you can work with interactive Polars tables in Jupyter notebooks. In the Python console, you can inspect Polars DataFrames via the View as DataFrame option in the Special Variables list. Both Python and Jupyter debuggers work with Polars as well.  

PyCharm will show the type and dimensions of the tables and the full names and types of the columns, and will let you sort the tables.

Note that Polars DataFrames are not supported in Scientific mode.

Please try Polars support and share your feedback with us in the comments section, on Twitter, or in our issue tracker.

Web development: New live templates for Django forms and models

As part of Django support, PyCharm has traditionally provided a list of live templates for Django template files. PyCharm 2023.2 will extend this functionality to Django forms, models, generic views, and admin. Live templates will let you insert common fields for Django views, forms, and models by typing short abbreviations.

You can find the new templates and settings for them in Settings | Editor | Live Templates | Django. To edit the existing templates or create a new one, refer to the PyCharm help page.

The list of live templates that can be used to quickly create Django tags in template files has also been expanded. You can find the updated list via Settings | Editor | Live Templates | Django Templates.

Frontend development: Volar support for Vue

We have some great news for those using Vue in PyCharm! We’ve implemented Volar support for Vue to support the changes in TypeScript 5.0. This should provide more accurate error detection, aligned with the Vue compiler. The new integration is still in early development and we would appreciate it if you could give it a try and provide us with any feedback you have.

To set the Vue service to use Volar integration on all TypeScript versions, go to Settings | Languages & Frameworks | TypeScript | Vue. By default, Volar will be used for TypeScript versions 5.0 and higher, and our own implementation will be used for TypeScript versions lower than 5.0.

In the future, we’ll consider enabling the Volar integration by default instead of our own implementation used for Vue and TypeScript.

CSS: Convert color to LCH and OKLCH

In PyCharm 2022.3, we added support for the new CSS color modification functions. This provided PyCharm users with a number of color conversion actions. For instance, you can change RGB to HSL, and vice versa. We are expanding this support in PyCharm 2023.2 to include conversion between LCH or OKLCH and the other color functions.

Next.js custom documentation support

Next.js 13.1 now includes a plugin for the TypeScript Language Service specifically for the new app directory. This plugin offers suggestions for configuring pages and layouts, as well as helpful hints for using both Server and Client Components. It also comes with custom documentation, which adds extra information to the output of the TypeScript Language Service. It’s now possible to view this custom documentation in PyCharm.

VCS: GitLab integration

PyCharm 2023.2 EAP 2 introduces initial integration with GitLab, allowing you to work with the Merge Request functionality right from the IDE, streamlining your development workflow. To add your GitLab account, go to Settings | Version Control | GitLab.

Notable bug fixes

We fixed the issue with debugging multiprocessing code on ARM-based macOS machines that was caused by a missing dylib file. [PY-48163]

For PowerShell 7, venv is now activated correctly in the Terminal. [PY-58019]

These are the most notable updates for this week. To see the full list of changes in this EAP build, please refer to the release notes.

If you encounter any bugs while working with this build, please submit a report using our issue tracker. If you have any questions or feedback, let us know in the comments below or get in touch with our team on Twitter.

The Drop Times: Panel to Explore Empowering the Drupal Community | DrupalCon NA

Planet Drupal - Thu, 2023-06-01 09:49
Join industry experts at DrupalCon Pittsburgh as they explore strategies to empower and accelerate the Drupal community's progress towards its strategic goals. Discover opportunities to broaden adoption, enable experts, drive contributions, and organize better. Be part of the conversation and help shape the future of Drupal.

Holger Levsen: 20230601-developers-reference-translations

Planet Debian - Thu, 2023-06-01 09:39
src:developers-reference translations wanted

I've just uploaded developers-reference 12.19, bringing the German translation status back to 100% complete, thanks to Carsten Schoenert. Some other translations however could use some updates:

$ make status
for l in de fr it ja ru; do \
    if [ -d source/locales/$l/LC_MESSAGES ] ; then \
        echo -n "Stats for $l: " ; \
        msgcat --use-first source/locales/$l/LC_MESSAGES/*.po | msgfmt --statistics - 2>&1 ; \
    fi ; \
done
Stats for de: 1374 translated messages.
Stats for fr: 1286 translated messages, 39 fuzzy translations, 49 untranslated messages.
Stats for it: 869 translated messages, 46 fuzzy translations, 459 untranslated messages.
Stats for ja: 891 translated messages, 26 fuzzy translations, 457 untranslated messages.
Stats for ru: 870 translated messages, 44 fuzzy translations, 460 untranslated messages.
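
The raw counts are easier to compare as completion percentages. The following Python sketch is only an illustration and not part of the developers-reference build: it parses "Stats for ..." lines like the ones above (assuming they keep the exact msgfmt --statistics wording shown) and reports the share of fully translated messages per language. The helper name completion is hypothetical.

```python
import re

# Output of `make status`, copied from the listing above.
STATUS = """\
Stats for de: 1374 translated messages.
Stats for fr: 1286 translated messages, 39 fuzzy translations, 49 untranslated messages.
Stats for it: 869 translated messages, 46 fuzzy translations, 459 untranslated messages.
Stats for ja: 891 translated messages, 26 fuzzy translations, 457 untranslated messages.
Stats for ru: 870 translated messages, 44 fuzzy translations, 460 untranslated messages.
"""


def completion(status: str) -> dict[str, float]:
    """Map each language code to the percentage of fully translated messages."""
    result = {}
    for line in status.splitlines():
        m = re.match(r"Stats for (\w+): (.*)", line)
        if not m:
            continue
        lang, stats = m.groups()
        counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
        # Each clause looks like "<number> <kind> messages" or "<number> <kind> translations".
        for num, kind in re.findall(r"(\d+) (translated|fuzzy|untranslated)", stats):
            counts[kind] = int(num)
        total = sum(counts.values())
        result[lang] = round(100 * counts["translated"] / total, 1)
    return result


for lang, pct in completion(STATUS).items():
    print(f"{lang}: {pct}%")
```

On the output above it reports 100.0% for de, 93.6% for fr, 63.2% for it, 64.8% for ja and 63.3% for ru. Every language adds up to the same 1374 messages, so the percentages are directly comparable.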

Russell Coker: Do Desktop Computers Make Sense?

Planet Debian - Thu, 2023-06-01 08:38
Laptop vs Desktop Price

Currently the smaller and cheaper USB-C docks start at about $25, and Dell has a new Vostro with 8G of RAM and 2*USB-C ports for $788. That gives a bit over $800 for a laptop and dock vs $795 for the cheapest Dell desktop, which also has 8G of RAM. For every way of buying laptops and desktops (e.g. buying from Officeworks, buying on eBay, etc.) the prices for laptops and desktops seem very similar. In all those comparisons the desktop will typically have a faster CPU and more options for PCIe cards, larger storage, etc. But if you don’t want to expand storage beyond the affordable 4TB NVMe/SSD devices, don’t need to add PCIe cards, and don’t need much CPU power, then a laptop will do well. For the vast majority of the computer work I do, my Thinkpad X1 Carbon Gen1 (from 2012) had plenty of CPU power.

If someone who’s not an expert in PC hardware were to buy a computer of a given age, then laptops probably aren’t more expensive than desktops, even disregarding the fact that a laptop works without the need to purchase a monitor, a keyboard, or a mouse. I can get regular desktop PCs for almost nothing and get parts to upgrade them very cheaply, but most people can’t do that. I can also get a decent second-hand laptop and USB-C dock for well under $400.

Servers and Gaming Systems

For people doing serious programming or other compute- or IO-intensive tasks, some variation on the server theme is the best option. That may be something more like the servers used by the r/homelab people than corporate servers, or it might be something in the cloud, but a server is a server. If you are going to have a home server that’s a tower PC, then it makes sense to put a monitor on it and use it as a workstation. If your server makes so much noise that you can’t spend much time in the same room, or if it’s hosted elsewhere, then using a laptop to access it makes sense.

Desktop computers make sense for PC gaming, as no-one seems to be making laptops with moderately powerful GPUs. The most powerful GPUs draw 150W, which is more than most laptop PSUs can supply, and even if a laptop PSU could supply that much there would be the issue of cooling. The Steam Deck [1] and the Nintendo Switch [2] can both work with USB-C docks. The PlayStation 5 [3] has a 350W PSU and doesn’t support video over USB-C. The Steam Deck can do 8K resolution at 60Hz or 4K at 120Hz, but presumably newer Steam games will need a desktop PC with a more powerful GPU to properly use such resolutions.

For people who want the best frame rates in graphics-intensive games, it could make sense to have a tower PC. Also, a laptop that’s run at high CPU/GPU use for a long time will tend to have its vents clogged by dust and possibly have its cooling fan wear out.

Monitor Resolution

Laptop support for a single 4K monitor became common in 2012 with the release of Intel’s Ivy Bridge mobile CPUs. My own experience of setting up 4K monitors for a Linux desktop in 2019 was that it was unreasonably painful, and the soon-to-be-released Debian/Bookworm will make things work nicely for 4K monitors with KDE on X11. So laptop hardware has handled the case of a single high-resolution monitor since before such monitors were cheap or common, and before software supported them well. Of course, at that time you had to use either a proprietary dock or a mini-DisplayPort to HDMI adaptor to get 4K working. But that was still easier than getting PCIe video cards supporting 4K resolution, which according to spec sheets wasn’t well supported by affordable cards in 2017.

Since USB-C became a standard feature in laptops in about 2017, support for more monitors than most people would want through a USB-C dock has become standard. My Thinkpad X1 Carbon Gen5, which was released in 2017, will support 2*FullHD monitors plus a 4K monitor via a USB-C dock; I suspect it would do at least 2*4K monitors but haven’t had a chance to test. Cheap USB-C docks supporting this sort of thing have only become common in the last year or so.

How Many Computers per Home

Among middle-class Australians it’s common to have multiple desktop PCs per household. One for each child over the age of about 13 and one for the parents seems to be reasonably common. Students in the later years of high school and university students are often compelled to have laptops, so having the number of laptops plus the number of desktops be larger than the number of people in the house probably isn’t uncommon, even among people who aren’t really into computers. As an aside, it’s probably common among people who read my blog to have 2 desktops, a laptop, and a cloud server for their own personal use. But even among people who don’t do that sort of thing, having computers outnumber people in a home is probably common.

A large portion of computer users can do everything they need on a laptop. For gamers, graphics-intensive games often run well on a console, and that’s probably the most effective way of getting to play the games. Of course, the fact that there is “RGB RAM” (RAM with Red, Green, and Blue LEDs to light it up) along with a lot of other wild products sold to gamers suggests that gaming PCs are not about what runs the game most effectively, and that an art/craft project with the PC is more important than actually playing games.

Instead of having one desktop PC per bedroom and laptops for school/university as well, it would make more sense to have a laptop per person, a USB-C dock and monitor in each bedroom, and a USB-C dock connected to a large-screen TV in the lounge. This gives plenty of flexibility for moving around to do work and for sharing what’s on your computer with other people. It also allows taking a work computer home and having it work with your monitor, having a friend bring their laptop to your home to work on something together, etc.

For most people desktop computers don’t make sense. While I think that convergence of phones with laptops and desktops is the way of the future [4], for most people having laptops take over all functions of desktops is the best option today.

Jamie McClelland: Enough about the AI Apocalypse Already

Planet Debian - Thu, 2023-06-01 08:27

After watching Democracy Now’s segment on artificial intelligence I started to wonder - am I out of step on this topic?

When people claim artificial intelligence will surpass human intelligence and thus threaten humanity with extinction, they seem to be referring specifically to advances made with large language models.

As I understand them, large language models are probability machines that have ingested massive amounts of text scraped from the Internet. They answer questions based on the probability of one series of words (their answer) following another series of words (the question).

It seems like a stretch to call this intelligence, but if we accept that definition, then it follows that this kind of intelligence is nothing remotely like human intelligence, which makes the claim that it will surpass human intelligence confusing. Didn’t this kind of machine learning surpass us decades ago?

Or when we say “surpass”, does that simply refer to fooling people into thinking an AI machine is a human via conversation? That is an important milestone, but I’m not ready to accept the Turing test as proof of equal intelligence.

Furthermore, large language models “hallucinate” and also reflect the biases of their training data. The word “hallucinate” seems like a euphemism, as if it could be corrected with the right medication when in fact it seems hard to avoid when your strategy is to correlate words based on probability. But even if you could solve the “here is a completely wrong answer presented with sociopathic confidence” problem, reflecting the biases of your data sources seems fairly intractable. In what world would a system with built-in bias be considered on the brink of surpassing human intelligence?

The danger from LLMs seems to be their ability to convince people that their answers are correct, including their patently wrong and/or biased answers.

Why do people think they are giving correct answers? Oh right… terrifying right-wing billionaires (with terrifying agendas) have been claiming AI will exceed human intelligence and threaten humanity, and every time they sign a hyperbolic statement they get front-page mainstream coverage. And even progressive news outlets are spreading this narrative with minimal space for contrary opinions (thank you Tawana Petty from the Algorithmic Justice League for providing the only glimpse of reason in the segment).

The belief that artificial intelligence is or will soon become omnipotent has real world harms today: specifically it creates the misperception that current LLMs are accurate, which paves the way for greater adoption among police forces, social service agencies, medical facilities and other places where racial and economic biases have life and death consequences.

When the CEO of OpenAI calls the technology dangerous and in need of regulation, he gets both free advertising promoting the power and supposed accuracy of his product and the possibility of freezing further developments in the field that might challenge OpenAI’s current dominance.

The real threat to humanity is not AI, it’s massive inequality and the use of tactics ranging from mundane bureaucracy to deadly force and incarceration to segregate the affluent from the growing number of people unable to make ends meet. We have spent decades training bureaucrats, judges and cops to robotically follow biased laws to maintain this order without compassion or empathy. Replacing them with AI would make things worse and should be stopped. But, let’s be clear, the narrative that AI is poised to surpass human intelligence and make humanity extinct is a dangerous distraction that runs counter to a much more important story about “the very real and very present exploitative practices of the [companies building AI], who are rapidly centralizing power and increasing social inequities.”

Maybe we should talk about that instead?

Stack Abuse: Simple NLP in Python with TextBlob: Lemmatization

Planet Python - Thu, 2023-06-01 08:24

TextBlob is a package built on top of two other packages: the Natural Language Toolkit, known mainly by its abbreviation NLTK, and Pattern. NLTK is a traditional package used for text processing or Natural Language Processing (NLP), and Pattern is built mainly for web mining.

TextBlob is designed to be easier to learn and use than NLTK, while still supporting important NLP tasks such as lemmatization, sentiment analysis, stemming, POS-tagging, noun phrase extraction, classification, translation, and more. You can see a complete list of tasks on TextBlob's PyPI page.

If you are looking for a practical overview of many NLP tasks that can be executed with TextBlob, take a look at our "Python for NLP: Introduction to the TextBlob Library" guide.

There are no special technical prerequisites for using TextBlob. For instance, the package works with both Python 2 and 3 (Python >= 2.7 or >= 3.5).

Also, in case you don't have any textual information at hand, TextBlob provides the necessary collections of language data (usually texts), called corpora, from the NLTK database.

Installing TextBlob

Let's start by installing TextBlob. If you are using a terminal, command-line, or command prompt, you can enter:

$ pip install textblob

Otherwise, if you are using a Jupyter Notebook, you can execute the command directly from the notebook by adding an exclamation mark ! at the beginning of the instruction:

!pip install textblob

Note: This process can take some time due to the large number of algorithms and corpora that this library contains.

After installing TextBlob, in order to have text examples, you can download the corpora by executing the python -m textblob.download_corpora command. Once again, you can execute it directly in the command line or in a notebook by preceding it with an exclamation mark.

When running the command, you should see the output below:

$ python -m textblob.download_corpora
[nltk_data] Downloading package brown to /Users/csamp/nltk_data...
[nltk_data] Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /Users/csamp/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/csamp/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /Users/csamp/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[nltk_data] Downloading package conll2000 to /Users/csamp/nltk_data...
[nltk_data] Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data] /Users/csamp/nltk_data...
[nltk_data] Unzipping corpora/movie_reviews.zip.
Finished.

We have already installed the TextBlob package and its corpora. Now, let's understand more about lemmatization.

For more TextBlob content, check out our Simple NLP in Python with TextBlob: Tokenization, Simple NLP in Python with TextBlob: N-Grams Detection, and Sentiment Analysis in Python with TextBlob guides.

What is Lemmatization?

Before going deeper into the field of NLP, you should be able to recognize some key terms:

Corpus (or corpora in plural) - is a specific collection of language data (e.g., texts). Corpora are typically used for training various models of text classification or sentiment analysis, for instance.

Lemma - is the word you would look for in a dictionary. For instance, if you want to look at the definition for the verb "runs", you would search for "run".

Stem - is the part of a word that stays the same across its inflected forms; for example, "wait" in "waits", "waited", and "waiting".

What is lemmatization itself?

Lemmatization is the process of obtaining the lemmas of words from a corpus.

An illustration of this could be the following sentence:

  • Input (corpus): Alice thinks she is lost, but then starts to find herself
  • Output (lemmas): | Alice | think | she | be | lost | but | then | start | to | find | herself |

Notice that each word in the input sentence is lemmatized according to its context in the original sentence. For instance, "Alice" is a proper noun, so it stays the same, and the verbs "thinks" and "starts" are reduced to their base forms, "think" and "start".

Lemmatization is one of the basic stages of language processing. It brings words to their root forms or lemmas, which we would find if we were looking for them in a dictionary.

In the case of TextBlob, lemmatization is based on a database called WordNet, which is developed and maintained by Princeton University. Behind the scenes, TextBlob uses WordNet's morphy processor to obtain the lemma for a word.

Note: For further reference on how lemmatization works in TextBlob, you can take a peek at the documentation.

You probably won't notice significant changes with lemmatization unless you're working with large amounts of text. In that case, lemmatization helps reduce the number of distinct word forms we might be searching for, while trying to preserve their meaning in context. It can also be applied when developing models for machine translation, search engine optimization, or various business inquiries.

Implementing Lemmatization in Code

First of all, we need to define a sample corpus and create a TextBlob object from it, so that it can be lemmatized later. In this initial step, you can either write your own string of text or use an example from the NLTK corpora we have downloaded. Let's go with the latter.

Choosing a Review from the NLTK Corpus

For example, let's try to obtain the lemmas for a movie review from the corpus. To do this, we import the TextBlob class from the textblob package and the movie_reviews corpus from the nltk.corpus package:

# importing necessary libraries
from textblob import TextBlob
from nltk.corpus import movie_reviews

After importing, we can take a look at the movie reviews files with the fileids() method. Since this code is running in a Jupyter Notebook, we can directly execute:

movie_reviews.fileids()

This will return a list of 2,000 text file names containing negative and positive reviews:

['neg/cv000_29416.txt', 'neg/cv001_19502.txt', 'neg/cv002_17424.txt', 'neg/cv003_12683.txt', 'neg/cv004_12641.txt', 'neg/cv005_29357.txt', 'neg/cv006_17022.txt', 'neg/cv007_4992.txt', 'neg/cv008_29326.txt', 'neg/cv009_29417.txt', ...]

Note: If you are running the code in another way, for instance, in a terminal or IDE, you can print the response by executing print(movie_reviews.fileids()).

By looking at the neg prefix in the file names, we can assume that the list starts with the negative reviews and ends with the positive ones. We can look at a positive review by indexing from the end of the list. Here, we are choosing the tenth review from the end:

movie_reviews.fileids()[-10]

This results in:

'pos/cv990_11591.txt'

To examine the review sentences, we can pass the name of the review to the .sents() method, which outputs a list of all review sentences:

movie_reviews.sents('pos/cv990_11591.txt')
[['the', 'relaxed', 'dude', 'rides', 'a', 'roller', 'coaster', 'the', 'big', 'lebowski', 'a', 'film', 'review', 'by', 'michael', 'redman', 'copyright', '1998', 'by', 'michael', 'redman', 'the', 'most', 'surreal', 'situations', 'are', 'ordinary', 'everyday', 'life', 'as', 'viewed', 'by', 'an', 'outsider', '.'],
 ['when', 'those', 'observers', 'are', 'joel', 'and', 'ethan', 'coen', ',', 'the', 'surreal', 'becomes', 'bizarre', '.'],
 ...]

Let's store this list in a variable called pos_review:

pos_review = movie_reviews.sents("pos/cv990_11591.txt")
len(pos_review)  # returns 63

Here, we can see that there are 63 sentences. Now, we can select one sentence to lemmatize, for instance, the 17th sentence (index 16, since Python lists are zero-indexed):

sentence = pos_review[16]
type(sentence)  # returns list

Creating a TextBlob Object

After selecting the sentence, we need to create a TextBlob object to be able to access the .lemmatize() method. TextBlob objects need to be created from strings. Since we have a list, we can convert it to a string with the str.join() method, using a single space as the separator:

sentence_string = ' '.join(sentence)

Now that we have our sentence string, we can pass it to the TextBlob constructor:

blob_object = TextBlob(sentence_string)

Once we have the TextBlob object, we can perform various operations, such as lemmatization.

Lemmatization of a Sentence

To lemmatize the sentence, we first retrieve the words attribute of the created blob_object, which splits it into tokens. This gives us a list (a WordList) containing Word objects that behave very similarly to string objects:

# Word tokenization of the sentence corpus
corpus_words = blob_object.words
# To see all tokens
print('sentence:', corpus_words)
# To count the number of tokens
number_of_tokens = len(corpus_words)
print('\nnumber of tokens:', number_of_tokens)

The output commands should give you the following:

sentence: ['the', 'carpet', 'is', 'important', 'to', 'him', 'because', 'it', 'pulls', 'the', 'room', 'together', 'not', 'surprisingly', 'since', 'it', 's', 'virtually', 'the', 'only', 'object', 'there']

number of tokens: 22

To lemmatize the words, we can just use the .lemmatize() method:

corpus_words.lemmatize()

This gives us a lemmatized WordList object:

WordList(['the', 'carpet', 'is', 'important', 'to', 'him', 'because', 'it', 'pull', 'the', 'room', 'together', 'not', 'surprisingly', 'since', 'it', 's', 'virtually', 'the', 'only', 'object', 'there'])

Since this might be a little difficult to read, we can do a loop and print each word before and after lemmatization:

for word in corpus_words:
    print(f'{word} | {word.lemmatize()}')

This results in:

the | the
carpet | carpet
is | is
important | important
to | to
him | him
because | because
it | it
pulls | pull
the | the
room | room
together | together
not | not
surprisingly | surprisingly
since | since
it | it
s | s
virtually | virtually
the | the
only | only
object | object
there | there

Notice how "pulls" changed to "pull", while the other words were already in their base forms and stayed the same. We can also see that "it's" has been split into "it" and "s" because of the apostrophe. This indicates we can further pre-process the sentence so that "it's" is considered one word instead of "it" and an "s".
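
One way to do that pre-processing is to merge the apostrophe pieces back together at the token level. The sketch below uses a hypothetical helper, rejoin_apostrophes; it assumes the raw sentence from movie_reviews.sents() keeps the apostrophe as its own token (e.g. 'it', "'", 's'), and it only recognizes a small, hand-picked set of English contraction suffixes. If the merged tokens are joined back into a string and passed to TextBlob, its own tokenizer may split the contraction again, so this kind of merge is most useful when you keep working with the token list directly.

```python
# Hand-picked contraction suffixes (an assumption, not an exhaustive list).
CONTRACTION_SUFFIXES = {"s", "t", "re", "ll", "ve", "d", "m"}


def rejoin_apostrophes(tokens: list[str]) -> list[str]:
    """Merge token triples like ['it', "'", 's'] back into "it's"."""
    merged: list[str] = []
    i = 0
    while i < len(tokens):
        # Pattern: word, standalone apostrophe, short contraction suffix.
        if (
            i + 2 < len(tokens)
            and tokens[i + 1] == "'"
            and tokens[i + 2] in CONTRACTION_SUFFIXES
        ):
            merged.append(tokens[i] + "'" + tokens[i + 2])
            i += 3  # skip the three tokens we just merged
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Tokens typed by hand for illustration, mirroring the end of our sentence.
tokens = ["not", "surprisingly", "since", "it", "'", "s", "virtually", "the", "only", "object", "there"]
print(rejoin_apostrophes(tokens))
# ['not', 'surprisingly', 'since', "it's", 'virtually', 'the', 'only', 'object', 'there']
```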

Difference Between Lemmatization and Stemming

Lemmatization is often confused with another technique called stemming. This confusion occurs because both techniques are usually employed to reduce words. While lemmatization uses dictionaries and focuses on the context of words in a sentence, attempting to preserve it, stemming uses rules to remove word affixes, focusing on obtaining the stem of a word.
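
To make the contrast concrete, here is a deliberately tiny Python sketch that is not TextBlob's implementation: toy_lemmatize looks words up in a hand-written dictionary, standing in for WordNet, while toy_stem strips the first matching suffix from a short, made-up rule list, standing in for a real stemmer such as Porter's. Both functions and their word lists are hypothetical.

```python
# Toy lemmatizer: dictionary lookup (a stand-in for WordNet).
LEMMAS = {
    "pulls": "pull",
    "thinks": "think",
    "starts": "start",
    "mice": "mouse",
    "geese": "goose",
}

# Toy stemmer rules: suffixes tried in order (a crude stand-in for Porter).
SUFFIXES = ("ingly", "ing", "ly", "ant", "ed", "s", "e")


def toy_lemmatize(word: str) -> str:
    # Unknown words come back unchanged, as with a dictionary-based lemmatizer.
    return LEMMAS.get(word, word)


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not wiped out.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["pulls", "important", "because", "mice"]:
    print(f"{w} | {toy_lemmatize(w)} | {toy_stem(w)}")
```

It prints pulls | pull | pull, important | important | import, because | because | becaus and mice | mouse | mic. The dictionary handles the irregular plural "mice" but knows nothing outside its table, while the rules apply to any word and readily produce non-words.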

Let's quickly modify our for loop to look at these differences:

print('word | lemma | stem\n')
for word in corpus_words:
    print(f'{word} | {word.lemmatize()} | {word.stem()}')

This outputs:

word | lemma | stem

the | the | the
carpet | carpet | carpet
is | is | is
important | important | import
to | to | to
him | him | him
because | because | becaus
it | it | it
pulls | pull | pull
the | the | the
room | room | room
together | together | togeth
not | not | not
surprisingly | surprisingly | surprisingli
since | since | sinc
it | it | it
s | s | s
virtually | virtually | virtual
the | the | the
only | only | onli
object | object | object
there | there | there

Looking at the output above, we can see how stemming can be problematic. It reduces "important" to "import", losing the meaning of the word, which could now even be read as a verb. It also reduces "because" to "becaus", which is not a word, and the same goes for "togeth", "surprisingli", "sinc", and "onli".

There are clear differences between lemmatization and stemming, and the key is understanding when to use each technique. Suppose you are optimizing a word search and the focus is on suggesting as many similar words as possible: which technique would you use? When word context doesn't matter, and retrieving "important" with "import" is acceptable, the clear choice is stemming. On the other hand, if you are working on document text comparison, in which the position of words in a sentence matters, and the meaning of "important" needs to be preserved and not confused with the verb "import", the best choice is lemmatization.

As a final scenario, suppose you are working on a word search followed by a comparison of the retrieved documents. What would you use? Both stemming and lemmatization.

We have understood the differences between stemming and lemmatization; now let's see how we can lemmatize the whole review instead of just a sentence.

Lemmatization of a Review

To lemmatize the entire review, we only need to modify the .join(). Instead of joining words in a sentence, we will join sentences in a review:

# Join each sentence with a newline between them, and a space between each word
pos_rev = '\n'.join(' '.join(sentence) for sentence in pos_review)

After transforming the corpus into a string, we can proceed in the same way as it was for the sentence to lemmatize it:

blob_object = TextBlob(pos_rev)
corpus_words = blob_object.words
corpus_words.lemmatize()

This generates a WordList object with the full review text lemmatized. Here, we are omitting some parts with an ellipsis (...) since the review is long, but you will see it in full when you run the code. We can spot our sentence in the middle of it:

WordList(['the', 'relaxed', 'dude', 'rides', 'a', 'roller', 'coaster', 'the', 'big', 'lebowski', 'a', 'film', 'review', 'by', 'michael', 'redman', 'copyright', '1998', 'by', 'michael', 'redman', 'the', 'most', 'surreal', 'situations', 'are', 'ordinary', 'everyday', 'life', 'as', 'viewed', 'by', 'an', 'outsider', 'when', 'those', 'observers', 'are', 'joel', (...) 'the', 'carpet', 'is', 'important', 'to', 'him', 'because', 'it', 'pulls', 'the', 'room', 'together', 'not', 'surprisingly', 'since', 'it', 's', 'virtually', 'the', 'only', 'object', 'there' (...) 'com', 'is', 'the', 'eaddress', 'for', 'estuff'])

Conclusion

After lemmatizing both the sentence and the review, we can see that both cases extract the corpus words first. This means lemmatization happens at the word level, so it can be applied to a single word, a sentence, or a full text: any collection of words.

This also suggests that it can be slower, since the text must first be broken into tokens before lemmatization is applied. And since lemmatization is context-specific, as we have seen, good pre-processing of the text is crucial: a correct breakdown into tokens and appropriate part-of-speech tagging will both improve the results.
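The role of part-of-speech tags can be illustrated with another toy sketch: the same surface word maps to different lemmas depending on its tag. The tag labels and the lookup table are hypothetical simplifications; real lemmatizers, such as the WordNet-based one TextBlob wraps, combine a lexicon with morphological rules.

```python
# Hypothetical (word, tag) -> lemma table; "n" = noun, "v" = verb, "a" = adjective.
TOY_POS_LEMMAS = {
    ("saw", "v"): "see",
    ("saw", "n"): "saw",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
    ("better", "a"): "good",
    ("expectations", "n"): "expectation",
}


def lemmatize_tagged(tagged_tokens):
    """Lemmatize (word, tag) pairs; pairs missing from the table keep the lowercased word."""
    return [TOY_POS_LEMMAS.get((word.lower(), tag), word.lower()) for word, tag in tagged_tokens]


# "saw" and "meeting" appear in both sentences, but their tags decide the lemma.
first = [("She", "pron"), ("saw", "v"), ("the", "det"), ("meeting", "n")]
second = [("The", "det"), ("saw", "n"), ("was", "v"), ("meeting", "v"), ("expectations", "n")]
print(lemmatize_tagged(first))
print(lemmatize_tagged(second))
```

A wrong tag from a poor tagger would pick the wrong entry, which is why tokenization and tagging quality carry straight through to the lemmas.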

If you are not familiar with Part of Speech tagging (POS-tagging), check our Python for NLP: Parts of Speech Tagging and Named Entity Recognition guide.

We have also seen how lemmatization differs from stemming, another technique for reducing words, which doesn't preserve their context. For this reason, stemming is usually faster.

There are many ways to perform lemmatization, and TextBlob is a great library for getting started with NLP. It offers a simple API that allows users to quickly begin working on NLP tasks. Leave a comment if you have used lemmatization in a project or plan to use it.

Happy coding!

Categories: FLOSS Project Planets

Tryton News: Newsletter June 2023

Planet Python - Thu, 2023-06-01 04:00

After the Tryton 6.8 release, the developers are sprinting toward the next long-term support (LTS) version, 7.0, which is planned for release in November 2023. Also, please don't miss our News from the Tryton Unconference 2023 in Berlin.

Changes for the User

In CSV exports with the option selected to use locale format, we now use the local time zone for date-time fields.

In the project_invoice module we now show the invoice line field on the time sheet line form. This is useful to get an understanding of the invoiced hours. The field is not shown if the user doesn’t have read access to invoice lines.

To have a clearer picture of the debts of a company, we’ve added receivables and payables to it.

Invoice lines now have pay, block and unblock payment buttons to be able to directly create payments from within the invoice form.

Changes for the System Administrator

Now, when saving a CSV export definition, the options to export listed records or selected records and to ignore search limit are also saved. Printing an export as a report will also make use of the new saved options.

We removed migrations prior to 5.0.

Changes for the Developer

We moved the display of the sum defined in the XML view from the bottom row of the Tryton client into the column header. The sum attribute is also now converted into a boolean type.

Authors: @udono @dave @pokoli



Categories: FLOSS Project Planets

KStars v3.6.5 is Released

Planet KDE - Thu, 2023-06-01 03:46


KStars v3.6.5 was released on 2023.06.01 for macOS, Linux, and Windows. It's a bi-monthly bugfix release with a couple of exciting features.
Sky Map Rotation
Akarsh Simha added a new feature that allows the user to rotate the sky map. It also offers some standard settings, such as an inverted view. Here are some of the highlights:
  1. Rotate the sky-map freely: Shift + mouse drag on the sky map
  2. Pick pre-defined orientations: Zenith Up / Zenith Down / North Up / North Down depending on the coordinate system being used
  3. A magic mode for Dobsonians: The Erect Observer Correction feature, when selected along with Horizontal Coordinates / Zenith Down settings, will reproduce the orientation seen in the eyepiece of a Dobsonian. It may need a one-time adjustment for your specific Dobsonian using the Shift + drag feature.

Optimal Sub-Exposure Calculator
Joseph McGee made his first contribution to KStars with the Optimal Sub-Exposure Calculator. This is the first iteration of the calculator, and only a handful of camera profiles are supported. There are different points of view within the astrophotography community on how optimal sub-exposure should be calculated, and on whether other factors should be considered, such as processing time given the volume of data produced. Your feedback on this first iteration of the calculator would be appreciated.

The calculator is based upon the work of, and a presentation by, Dr Robin Glover. It considers multiple inputs to determine a sub-exposure time that will provide minimal overall noise in the image:

  • A sky quality measurement (SQM) for light pollution
  • The optic focal length
  • A filter bandwidth
  • Camera read-noise (based upon gain/iso)
  • An optional adjustment to the allowable increase in noise from light pollution

As inputs are adjusted, the calculator refreshes its graph of potential exposure times across the range of gains and updates the calculated outputs. The output values are separated into two sections: one for the sub-exposure, and another for image stacks of various integration times.

The sub-exposure outputs are:

  • the optimal sub-exposure time
  • the count of electrons produced from light-pollution
  • the shot noise (noise from light pollution)
  • the total exposure noise (the combined noise from light pollution and camera read-noise)
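The sub-exposure outputs above are linked by simple shot-noise arithmetic. The sketch below assumes a Glover-style criterion (the sub-exposure is long enough that read noise raises the total noise by at most a chosen fraction over light-pollution shot noise) and takes the light-pollution rate directly in electrons per pixel per second. Deriving that rate from SQM, focal length and filter bandwidth is not shown, and the function names are hypothetical, not KStars code.

```python
import math


def optimal_sub_exposure(read_noise_e, sky_rate_e_per_s, allowed_noise_increase=0.05):
    """Shortest sub-exposure (s) for which read noise raises total noise by at most
    `allowed_noise_increase` (e.g. 0.05 = 5%) over pure light-pollution shot noise.

    Solves sqrt(P*t + R^2) = (1 + p) * sqrt(P*t) for t:
        t = R^2 / (P * ((1 + p)^2 - 1))
    """
    factor = (1.0 + allowed_noise_increase) ** 2 - 1.0
    return read_noise_e ** 2 / (sky_rate_e_per_s * factor)


def sub_exposure_outputs(read_noise_e, sky_rate_e_per_s, allowed_noise_increase=0.05):
    """Return the four per-exposure outputs listed above (hypothetical helper)."""
    t = optimal_sub_exposure(read_noise_e, sky_rate_e_per_s, allowed_noise_increase)
    sky_electrons = sky_rate_e_per_s * t                        # electrons from light pollution
    shot_noise = math.sqrt(sky_electrons)                       # Poisson noise of the sky signal
    total_noise = math.sqrt(sky_electrons + read_noise_e ** 2)  # read noise added in quadrature
    return {
        "exposure_s": t,
        "sky_electrons": sky_electrons,
        "shot_noise": shot_noise,
        "total_noise": total_noise,
    }


# Example: 2 e- read noise, 1 e-/pixel/s of sky background, 5% allowed noise increase.
print(sub_exposure_outputs(2.0, 1.0))
```

Because the result scales with R²/P, brighter skies (larger P) or lower read noise allow shorter sub-exposures under this criterion.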

The image stack information is presented in a table showing:

  • planned integration hours
  • the count of exposures to reach the planned integration hours
  • the actual stack (integration time) in seconds
  • the noise for the stack
  • a ratio of stack time to noise (as an indicator of quality)
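The stack table can be sketched the same way. This sketch assumes the stack is a plain sum of independent exposures, so per-exposure noise adds in quadrature (stack noise grows with the square root of the exposure count), and that the exposure count is rounded up to reach the planned hours. KStars may define these columns differently, and the names are hypothetical.

```python
import math


def stack_row(planned_hours, sub_exposure_s, exposure_noise_e):
    """One row of a hypothetical stack table:
    (exposure count, stack seconds, stack noise, stack-time-to-noise ratio)."""
    # Round up so the stack reaches at least the planned integration time.
    count = math.ceil(planned_hours * 3600 / sub_exposure_s)
    stack_seconds = count * sub_exposure_s
    # Independent noise adds in quadrature across `count` summed exposures.
    stack_noise = math.sqrt(count) * exposure_noise_e
    ratio = stack_seconds / stack_noise
    return count, stack_seconds, stack_noise, ratio


for hours in (1, 2, 4):
    print(hours, stack_row(hours, 60.0, 6.5))
```

Under this assumption the ratio grows with the square root of the integration time, so quadrupling the planned hours only doubles it.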

An instance of the sub-exposure calculator can be started from a new 'clock' icon on the Ekos Capture screen. Multiple instances of the calculator can run concurrently, so that side-by-side comparisons can be made for variations in inputs.

Camera read-noise data will be provided through individual XML files, which will be user-maintained and published in a repository. These camera data files are stored in a folder named "exposure_calculator" under user/local/share/kstars. The calculator can download camera files from the repository, and when it is started for the first time, at least one camera data file must be downloaded before the calculator can be instantiated.

The intent is that camera data file names will be used to allow the calculator to select an appropriate camera data file based upon the device id of the active camera. (But some of the initial camera files were named using educated guesses, and will likely need to be re-named).

Rotator Dialog Improvements
Toni Schriber merged improvements and fixes for the Rotator Dialog. As shown in the illustrations, the user interface is very simple, and there is only one parameter to set: the Camera Position Angle. It is a very consistent term and easy to understand. The same Position Angle (PA) is also used in Alignment, Scheduler, and the Sky Map.

In the gauge, this angle is presented in the same color as the FOV in the planetarium sky, and in the viewing direction. This way, one can relate to and understand this angle intuitively. The rotator angle is presented in gray, also in the viewing direction. It is calculated from the Camera PA and the camera's offset angle, which is calibrated each time a [Capture & Solve] or a [Load & Slew] is performed. For further clarity, the rotator angle and the camera offset are displayed again in an information window together with the current pier side.
The Rotator Settings can be accessed either in the Capture or Align modules.

Focus Linear 1 Pass Improvements
John Evans continued his phenomenal improvements to the Ekos Focus module with Linear 1 Pass (L1P) Phase 2 changes, as detailed in the Linear Focus Phase 2 document. Here are the highlights:
  1. Optimized curve fitting. It should be faster and more accurate, and it includes outlier rejection.
  2. Previously, HFR was the only fitting "measure" available. The following have been added: a) HFR Adj (HFR adjusted to compensate for star brightness vs. background variation); b) FWHM; c) Number of stars (a maximum at focus rather than a minimum); d) Fourier Power (an alternative focus method that does not require star detection).
  3. Focus units can be displayed in pixels or arc-seconds.
  4. Critical Focus Zone: a calculator with 3 different algorithms has been added.
  5. Focus Offset Utility to automatically build focus offsets.
  6. Taking flats at the same focus point as lights has been fixed.
  7. Focus Adviser: still a work in progress, but a tool to help with focus parameters (since there are now so many).
  8. SEP parameter suggestions for focus. Keen to get some feedback on this.
  9. Adaptive focus to adapt the focuser position between Autofocus runs, aiming to keep optimum focus for each sub-frame. Adaptations for Temperature and Altitude are supported.
  10. Adaptive focuser start: the starting position for an AF run can be adjusted for the filter and by Adaptive Focus.
  11. Focus walks added to control how the inward sweep of the AF process performs.
  12. AF Overscan, originally implemented in the Linear algorithm and then reused by Linear 1 Pass, is now extended to all focuser movements.
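For the Critical Focus Zone item, one commonly quoted ("classic") formula gives the CFZ width as 4.88 · λ · N², with λ the wavelength and N the focal ratio. The sketch below implements only that formula; the three algorithms in Ekos are not reproduced, and the focuser step size is a setup-specific assumption.

```python
def cfz_classic_um(wavelength_um, f_ratio):
    """Critical focus zone width in microns using the classic 4.88 * lambda * N^2 rule."""
    return 4.88 * wavelength_um * f_ratio ** 2


def cfz_in_steps(cfz_um, step_size_um):
    """Express the CFZ in focuser steps, given the focuser's travel per step (microns)."""
    return cfz_um / step_size_um


cfz = cfz_classic_um(0.55, 5.0)      # green light at f/5
print(cfz, cfz_in_steps(cfz, 2.0))   # hypothetical focuser moving 2 microns per step
```

Because the zone scales with N², an f/2.5 system has a CFZ a quarter as wide as an f/5 one under this formula, which is why fast optics need finer focus steps.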

In addition to HFR, you can now use different measures (FWHM, # of Stars, Fourier Power) that may work well with your setup and environment. Here are some focus runs with each of the new measurement types. You will notice that the solutions are very close to each other.
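All of these measures are used the same way: fit a curve to the measure over focuser positions and take its extremum, a minimum for HFR or FWHM and a maximum for star count. The sketch below uses a plain least-squares parabola on synthetic data as an illustration; the actual Ekos curve models, weighting and outlier rejection are not reproduced, and all names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares fit of y = a*u^2 + b*u + c with u = x - mean(x).

    Returns (a, b, c, x0); centring on x0 keeps the normal equations well conditioned.
    """
    x0 = sum(xs) / len(xs)
    us = [x - x0 for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]                  # sums of u^k
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]  # sums of y*u^k
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    v = [t[2], t[1], t[0]]
    d = _det3(m)
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with v
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        coeffs.append(_det3(mc) / d)
    a, b, c = coeffs
    return a, b, c, x0


def best_focus(positions, measures):
    """Vertex of the fitted parabola and whether it is a minimum or a maximum."""
    a, b, _c, x0 = fit_parabola(positions, measures)
    kind = "min" if a > 0 else "max"  # HFR/FWHM dip at focus; star count peaks
    return x0 - b / (2 * a), kind


positions = list(range(900, 1101, 25))
hfr = [0.0001 * (p - 1010) ** 2 + 2.0 for p in positions]   # synthetic HFR curve
stars = [-0.001 * (p - 995) ** 2 + 150 for p in positions]  # synthetic star counts
print(best_focus(positions, hfr))
print(best_focus(positions, stars))
```

The sign of the leading coefficient tells the fit whether to look for a dip or a peak, which is how a single procedure can serve both kinds of measure.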

# Of Stars

Fourier Power

Focus Aberration Inspector

Wolfgang Reissenberger introduced the mosaic view well known from PixInsight's AberrationInspector script that builds a mosaic from all image corners and center tiles such that they can be compared directly.
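A mosaic of this kind can be sketched as cropping fixed-size tiles from the corners, edge midpoints and centre of a frame and laying them out in a 3x3 grid. The 3x3 layout, the tile size and the plain nested-list image are assumptions for illustration, not the Ekos implementation.

```python
def tile_origins(width, height, tile):
    """Top-left (x, y) of each tile in a 3x3 grid: corners, edge midpoints, centre."""
    xs = [0, (width - tile) // 2, width - tile]
    ys = [0, (height - tile) // 2, height - tile]
    return [[(x, y) for x in xs] for y in ys]


def crop(image, x, y, tile):
    """Cut a tile x tile block out of a row-major image (a list of rows)."""
    return [row[x:x + tile] for row in image[y:y + tile]]


def aberration_mosaic(image, tile):
    """Assemble the nine tiles into one (3*tile) x (3*tile) image for direct comparison."""
    height, width = len(image), len(image[0])
    if tile > min(width, height):
        raise ValueError("tile is larger than the image")
    mosaic = []
    for origin_row in tile_origins(width, height, tile):
        tiles = [crop(image, x, y, tile) for x, y in origin_row]
        for r in range(tile):  # stitch the three tiles of this grid row side by side
            mosaic.append([px for t in tiles for px in t[r]])
    return mosaic


frame = [[10 * r + c for c in range(10)] for r in range(8)]  # synthetic 10x8 frame
for line in aberration_mosaic(frame, 2):
    print(line)
```

Placing the corner tiles next to the centre tile makes differences in star shape (tilt, coma, curvature) visible at a glance.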

Supernovae are back

The last few releases were missing supernovae data because the online source that provided it went offline. Thankfully, Philipp Auersperg-Castell communicated with the fine folks at the Transient Name Server (IAU Supernovae Working Group) to obtain daily supernovae updates and imported them into KStars. All recent supernovae should now be available in KStars.

Categories: FLOSS Project Planets

Electric Citizen: See you at DrupalCon 2023

Planet Drupal - Thu, 2023-06-01 03:06

Coming to DrupalCon Pittsburgh? Be sure to stop by our booth and say hello!

Once again, Electric Citizen is sponsoring the great Drupal get-together, and we're looking forward to catching up with old friends and making new ones while we're there.

Whether you are looking for an agency partner for your organization, or looking for your next job in Drupal, stop by our booth in the exhibit hall to chat.

Categories: FLOSS Project Planets

Junichi Uekawa: Already June.

Planet Debian - Wed, 2023-05-31 22:23
Already June.

Categories: FLOSS Project Planets

Promet Source: 6 Steps to Build a Solid Banner Component in Drupal 10

Planet Drupal - Wed, 2023-05-31 22:20
We’re well familiar with the saying that “You never get a second chance to make a first impression.” Nowhere is that more true than on a web page. The banner component is an essential element in effectively conveying the brand and the culture within a framework for design flexibility. There’s a lot riding on getting it right.   The question is: what makes a banner work, and how do we build it properly so that it is reusable, responsive, configurable, aesthetically appealing, and useful? 
Categories: FLOSS Project Planets

Paul Wise: FLOSS Activities May 2023

Planet Debian - Wed, 2023-05-31 20:09

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Administration
  • Debian IRC: set topic on new #debian-sa channel
  • Debian wiki: unblock IP addresses, approve accounts
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

The SIMDe, gensim, sptag work was sponsored. All other work was done on a volunteer basis.

Categories: FLOSS Project Planets

Matt Layman: New SaaS Signup - Building SaaS with Python and Django #161

Planet Python - Wed, 2023-05-31 20:00
In this episode, we dug into the first portion of the journey SaaS. I acquired the domain name journeyinbox.com for this service, which is not live yet. We started at the beginning by setting up users and sign-up features.
Categories: FLOSS Project Planets

Default parameters and virtual (C++20)

Planet KDE - Wed, 2023-05-31 18:00

I was messing around with some code that implements a widget hierarchy today, and bumped into some interesting edge cases of virtual functions. Parameters with a default value behave in an unexpected way.

Here is a vastly simplified code snippet:

#include <fmt/core.h>

struct Base {
    virtual void Func(int v = 42) = 0;
};

struct DerivedA : public Base {
    void Func(int v) override { fmt::print("A={}\n", v); }
};

struct DerivedB : public Base {
    void Func(int v = 8) override { fmt::print("B={}\n", v); }
};

There is an abstract base class (with a pure virtual function, declared = 0) and two derived classes that override the virtual function in the base. Note that one of the derived classes forgets the default value of the parameter, and the other gives it a different value.

Note the use of the contextual keyword override (see item 12 in Scott Meyers' Effective Modern C++). It makes the compiler complain if a function declaration decorated with it is not actually an override. Examples of not-an-override come from typos, different return or parameter types, different constness... there are a bunch of ways to get it wrong, which is why the advice is to use override liberally.

Let’s call those virtual functions via a pointer-to-base and a pointer-to-derived in all four possible variations, shall we?

int main()
{
    Base * ba = new DerivedA;
    Base * bb = new DerivedB;
    auto * da = new DerivedA;
    auto * db = new DerivedB;

    ba->Func();
    bb->Func();
    da->Func(3);
    db->Func();
}

You may ask: why does da->Func() need a parameter? Well, there is no default given in the declaration in the derived class. The default value provided in the base class is hidden.

If I leave the value 3 out, then clang suggests that I call Base::Func() instead. That compiles, and then fails to link because – and that’s the whole point – Base::Func() is pure virtual.

The output of the program is this:

A=42
B=42
A=3
B=8

When called through a pointer-to-base, the default value from the declaration in the base class is used. When called through a pointer-to-derived, the default value from the declaration in that derived class is used (or, if there is none, then you need to provide a value).


Now that I ran into this, I looked it up on cppreference, which says:

The overriders of virtual functions do not acquire the default arguments from the base class declarations, and when the virtual function call is made, the default arguments are decided based on the static type of the object.

In the context of the codebase I’m working on today, this translates to

Do not provide default arguments in the declaration of abstract virtual functions.

Categories: FLOSS Project Planets

Test and Code: 202: Using Towncrier to Keep a Changelog

Planet Python - Wed, 2023-05-31 17:15

Hynek joins the show to discuss towncrier.

At the top of the towncrier documentation, it says "towncrier is a utility to produce useful, summarized news files (also known as changelogs) for your project."

Towncrier is used by "Twisted, pytest, pip, BuildBot, and attrs, among others."

This is the last of 3 episodes focused on keeping a CHANGELOG.

Episode 200 kicked off the series with keepachangelog.com and Olivier Lacan.
In 201, we had Ned Batchelder discussing scriv.

Special Guest: Hynek Schlawack.


Links:

  • Towncrier docs (https://towncrier.readthedocs.io/en/stable/)
  • How to Keep a Changelog in Markdown - Towncrier docs (https://towncrier.readthedocs.io/en/stable/markdown.html)
  • Keep a Changelog (https://keepachangelog.com/en/1.0.0/)
  • structlog/CHANGELOG.md (https://github.com/hynek/structlog/blob/main/CHANGELOG.md): example of a manually edited changelog
  • hatch-fancy-pypi-readme (https://github.com/hynek/hatch-fancy-pypi-readme)
  • MyST Markdown (https://myst-tools.org/)
  • hatchling (https://pypi.org/project/hatchling/)
Categories: FLOSS Project Planets

GsoC 2023 First_post

Planet KDE - Wed, 2023-05-31 16:30

About Me -

I am Utkarsh Kumar, a third-year student pursuing a Bachelor of Technology (B.Tech) degree in Industries Management at the esteemed IIT Kharagpur. I am pleased to announce that I have been selected for Google Summer of Code this year to work on properties management in Digikam. During the program, my focus will be on enhancing Digikam's properties management.

Digikam Properties Management

Currently, users want to modify image properties, such as tags, author names, and the body text associated with the images, with a single click. I am actively working on a solution to this problem, and my workflow primarily revolves around Digikam's properties management.

What it’s all about

Most images contain Exif data. To handle these images, all of their metadata must be extracted and stored in CSV files; a Python script is used to store large amounts of image metadata efficiently. A command is then executed to batch-edit the metadata, so that the metadata of multiple images can be modified at once.
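The CSV round-trip described above can be sketched with Python's standard csv module. This assumes the Exif metadata has already been extracted into one dict per image (the extraction step and the field names are hypothetical), and it uses in-memory strings instead of files; with real files you would open them with newline=''. It is not the project's actual script.

```python
import csv
import io


def metadata_to_csv(records, fieldnames):
    """Serialize per-image metadata dicts to CSV text, one row per image."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def batch_edit(csv_text, field, new_value, filenames):
    """Set `field` to `new_value` for every row whose filename is in `filenames`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if row["filename"] in filenames:
            row[field] = new_value
    return rows


# Hypothetical metadata, as an Exif-extraction step might produce it.
records = [
    {"filename": "a.jpg", "Artist": "", "Tags": "beach"},
    {"filename": "b.jpg", "Artist": "", "Tags": "city"},
    {"filename": "c.jpg", "Artist": "Bob", "Tags": "forest"},
]
csv_text = metadata_to_csv(records, ["filename", "Artist", "Tags"])
edited = batch_edit(csv_text, "Artist", "Alice", {"a.jpg", "b.jpg"})
print(metadata_to_csv(edited, ["filename", "Artist", "Tags"]))
```

Keeping one row per image makes a batch edit a simple filter-and-assign over rows, after which the changed values can be written back to the images.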

Digikam is an exceptional application that offers a multitude of advanced features, particularly in the realm of image editing. At present, my primary focus lies in exploring avenues to further enhance its capabilities, aiming to elevate the application to new levels of excellence.


(Figure: Image editing preview)

Images have metadata fields such as:

  • _exif_ifd_pointer
  • Tags
  • Artist
  • color_space
  • body_serial_num

(Figure: Adding custom tags)

The editing and modification of image metadata are of paramount importance for meeting user requirements effectively. The primary aim is a user-friendly experience that lets users move through the process without running into obstacles.

Categories: FLOSS Project Planets

Drupal Association blog: Drupal Association hires Alex Moreno

Planet Drupal - Wed, 2023-05-31 15:32

The Drupal Association is pleased to announce that Alex Moreno joined the team as of May 2023! We are thrilled to bring Alex’s talent and experience in the Drupal Community to the team.

Bringing Alex on continues the Drupal Association’s effort to become more global in our hiring.

Already very well known for his work in Drupal, Alex’s background includes a degree in Software Engineering with a major in Artificial Intelligence. For the past decade, he lived in London, working for big enterprise companies as a consultant and Technical Architect representing Acquia, Pantheon, The BBC, and Capgemini.

Additionally, Alex has a passion for journalism, communication, and marketing, which made him jump into a developer relations role during the past 12 months. At the Drupal Association, Alex works on the engineering team working on various projects. Considering himself a digital nomad, Alex lives between Madrid and his hometown near the Mediterranean Sea in Santa Pola. He enjoys anything related to the sea when in the Mediterranean, including windsurfing, kayaking, and sailing small dinghies.

When in Madrid, he swaps his passion for the sea with CrossFit and weightlifting! Recently, Alex got more into bodybuilding and healthy food habits, which made him lose around 15 kilograms in just a few months! He plans to promote healthy habits and exercise in the Drupal community as the biggest hack to improve productivity.

“Take care of your body. It’s the only place you have to live.” – Jim Rohn

Welcome to the Drupal Association team, Alex!

Categories: FLOSS Project Planets

Python Software Foundation: Thinking about running for the Python Software Foundation Board of Directors? Let’s talk!

Planet Python - Wed, 2023-05-31 11:22

This year’s Board Election Nomination period is opening tomorrow. Current board members want to share what being on the board is like and are making themselves available to answer all your questions about responsibilities, activities and time commitments via online chat. Please come join us on Slack anytime in June to talk with us about being on the PSF board.

Board Election Timeline:

  • Nominations open: Thursday, June 1st, 2:00 pm UTC
  • Board Director Nomination cut-off: Thursday, June 15, 11:59 pm UTC
  • Voter application cut-off date: Thursday, June 15, 11:59 pm UTC
  • Announce candidates: Friday, June 16th
  • Voting start date: Tuesday, June 20, 12:01 am UTC
  • Voting end date: Friday, June 30, 11:59 pm UTC

Not sure what UTC is for you locally? Check here!

Nominations will be accepted here. (Note: you will need to sign into or create your python.org user account first.) Learn more about membership here, or if you have questions about membership or nominations, please email psf-elections@python.org. In addition to Slack, you are welcome to join the discussion about the PSF Board election on our forum.

Also, you can see your membership record and status on psfmember.org. If you are a voting-eligible member and do not already have a login there, please sign up and then email psf-donations@python.org so we can link your membership to your account.
Categories: FLOSS Project Planets