Feeds

KDE Gear 24.08.2

Planet KDE - Wed, 2024-10-09 20:00

Over 180 individual programs plus dozens of programmer libraries and feature plugins are released simultaneously as part of KDE Gear.

Today they all get new bugfix source releases with updated translations, including:

  • dolphin: Ignore trailing slashes when comparing place URLs (Commit)
  • kate: Fix session restore of tabs/views of untitled documents (Commit, fixes bug #464703, bug #462112 and bug #462523)
  • konsole: Fix a crash when sending OSC 4 (RGB) color outside the 256 range (Commit, fixes bug #494205)

Distro and app store packagers should update their application packages.

Categories: FLOSS Project Planets

Ben Hutchings: FOSS activity in September 2024

Planet Debian - Wed, 2024-10-09 18:57
Categories: FLOSS Project Planets

GNUnet News: GNUnet 0.22.1

GNU Planet! - Wed, 2024-10-09 18:00
GNUnet 0.22.1

This is a bugfix release for gnunet 0.22.0. It addresses some issues in HELLO URI handling and formatting as well as regressions in the DHT subsystem along with other bug fixes.

Links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional early after the release. For direct access try https://ftp.gnu.org/gnu/gnunet/

Categories: FLOSS Project Planets

How we passed the AI conundrums

Open Source Initiative - Wed, 2024-10-09 13:01

Some people believe that full unfettered access to all training data is paramount. This group argues that anything less than all the data would compromise the Open Source principles, forever removing full reproducibility of AI systems, transparency, security and other outcomes. We’ve heard them and we’ve provided a solution rooted in decades of Open Source practice.

To have the chance for powerful Open Source AI systems to exist in any domain, the OSI community has incorporated in the Definition this principle: 

An Open Source AI needs to make available three kinds of components: the software used to create the dataset and run the training, the model parameters and the code to run inference, and finally all the data that can be made available legally.

Recognizing that there are four kinds of “data”, each with its own legal frameworks allowing different freedoms of distribution, we bypass what Stephen O’Grady called the “AI conundrums” and give Open Source AI builders a chance to build freedom-respecting alternatives to pretty much any proprietary AI.

Limiting Open source AI only to systems trainable on freely distributable data would relegate Open Source AI to a niche. One of which is that the amount of freely and legally shareable data is a tiny fraction of what is necessary to train powerful systems. Additionally, it’d be excluding Open Source AI from areas where data cannot be shared, like medical or anything dealing with personal or private data. What remains for “Open Source AI” would be tiny. There are abundant motives to reject this limitation.

The fact is, mixing openly distributable and non-distributable data is very similar to a reality we are very familiar with: Open Source software built with proprietary compilers and system libraries.

Is GNU Emacs Open Source software?

I’m sure you’d answer yes (and some of you will say “well, actually it’s free software”) and we’ll all agree. Below is a rough diagram of Emacs built for the GNOME desktop on a modern Linux distribution. Emacs depends on a few system libraries that GNOME provides with OSI-Approved Licenses. The whole stack is Open Source these days and one can distribute Emacs on a disk with all its dependencies without too much legal trouble. Imagine scientists who want to freeze the whole environment of an experiment they made; they could package all the pieces of a system like this without trouble and distribute it all with their paper. No problem here.

Now let’s go back to an age when Linux systems weren’t ready. When Stallman started writing Emacs, there was no GNOME and no Linux, no gcc and no glibc. He thought very early on that in order to have more freedom, he had to create a wedge to allow Emacs to run on proprietary software.

Emacs on the latest Solaris versions would look something like this: some pieces like X11 and Gstreamer are Open Source. Others, like libc and others aren’t. The hypothetical scientists from before couldn’t really freeze their full scientific environment. All they could say in their paper was: “We used Emacs from this CVS version, built with gcc version X with these makefile; tar.gz attached” and make a list of the operating system’s version and libraries versions they used. That’s because they have the right only to distribute Emacs, X11, some libraries and not the rest of Solaris.

Is Emacs on Solaris Open Source? Of course it is, even though the source code for the system libraries are not available.

One more question, Emacs on Mac OS: it can only be built with a proprietary compiler on proprietary GUI and other proprietary libraries.

Is Emacs on Mac Open Source? Of course it is. Can you fully study Emacs on Mac OS? For Emacs, yes. For the MacOS components, no. There are many programs that run only on MacOS or Windows: for OSI, those are Open Source. Would someone argue that they’re not “really Open Source” because you can’t see “everything?” Some people might but we’ve learned to live with that, adding governance rules in addition to those of the Open Source Definition. Debian for example requires that programs are Open Source and support multiple hardware platforms; the ASF graduates only projects that are Open Source and have a diverse community of contributors. If you only want to use Open Source applications running on Open Source stacks, you can decide that! Just as you can decide that your company will only acquire Open Source software whose copyright is owned by multiple entities. 

These are all additional requirements built on top of the base floor set by the Open Source Definition.

For AI, you can do the same: You can say “I will only use Open Source AI built with open data, because I don’t want to trust anything less than that.” A large organization could say “I will buy only Open Source AI that allows me to audit their full dataset, including unshareable data.” You can do all that. Open Source AI is the floor that you can build on, like the OSD.

Bypassing the conundrums

We’ve looked for a solution for almost three years and this is it: Require all the data that is legally shareable, and for the other data provide all the details. It’s exactly what we’ve been doing for Open Source software: 

You developed a text editor for Mac OS but you can’t share the system libraries? Fine, we’ll fork it: give us all the code you can legally share with an OSI-Approved License and we’ll rip the dependencies and “liberate” it to run on GNU. The editor will be slightly different, like code that runs on some ARM+Linux systems behaves differently on Intel+Windows for the different capabilities of the underlying hardware and OS, but it’s still Open Source.

For Open Source AI it’s a similar dance: You can’t legally give us all the data? Fine, we’ll fork it. For example, you made an AI that recognizes bone cancer in humans but the data can’t be shared. We’ll fork it! Tell us exactly how you built the system, how you trained it, share the code you used, and an anonymized sample of the data you used so we can train on our X-ray images. The system will be slightly different but it’s still Open Source AI.

If we want to have broad availability of powerful alternatives to proprietary AI systems that respect the freedoms of users and deployers, we must recognize conditions that make sense for the domain of AI. These examples of proprietary compilers and system libraries used to build Open Source software prove that there is room for similar conditions when talking about Code, Data and Parameters within the definition of Open Source AI.

Categories: FLOSS Research

FSF News: Free Software Foundation to serve on "artificial intelligence" safety consortium

GNU Planet! - Wed, 2024-10-09 10:05
BOSTON (October 8, 2024) -- The Free Software Foundation (FSF) has announced that it is taking part in the US National Institute of Standards and Technology (NIST)'s consortium on the safety of (so-called) artificial intelligence, particularly with reference to "generative" AI systems. The FSF will ensure the free software perspective is adequately represented in these discussions.
Categories: FLOSS Project Planets

Real Python: Build a Contact Book App With Python, Textual, and SQLite

Planet Python - Wed, 2024-10-09 10:00

Building projects is a great way to learn programming and have fun at the same time. When you work on a project, you apply different coding skills simultaneously, which is good practice for what you’ll do in a real-life project. In this tutorial, you’ll create a contact book application with a text-based interface (TUI) based on Python and Textual. To store the contact data, your app will use an SQLite database.

In this tutorial, you’ll learn how to:

  • Create the contact book app’s TUI using Textual
  • Handle the database operations using SQLite
  • Connect the app’s TUI with the database code and make it functional

At the end of this project, you’ll have a functional contact book application that will allow you to store and manage your contact information.

To get the complete source code for the application and the code for every step in this tutorial, click the link below:

Get Your Code: Click here to download the free sample code you’ll use to build a contact book app with Python, Textual, and SQLite.

Demo: A Contact Book Built With Python and Textual

Contact or address books are a widely used type of application. They can be found on phones and computers, allowing users to store and manage contact information for family, friends, coworkers, and so on.

In this tutorial, you’ll code a contact book TUI app with Python, Textual, and SQLite. Here’s a demo of how your contact book will look once you’ve followed all the steps:

Your contact book will provide a basic set of features for this type of application, and you’ll be able to display, add, and remove the information in your contacts list.

Project Overview

To build your contact book app, you’ll organize the code in a few modules under a package. In this tutorial, you’ll use the following directory structure:

rpcontacts_project/ │ ├── rpcontacts/ │ ├── __init__.py │ ├── __main__.py │ ├── database.py │ ├── rpcontacts.tcss │ └── tui.py │ ├── README.md └── requirements.txt

The root directory of your project is rpcontacts_project/. Inside, there’s an rpcontacts/ subdirectory that holds the application’s main package.

You’ll cover the content of each file in this tutorial. The name of each file will give you an idea of its role in the application.

For example, __main__.py will host the application, and database.py will provide database-related code. Similarly, rpcontacts.tcss is a CSS file that will allow you to tweak the visual style of your Textual app. Finally, tui.py will contain the code to generate the app’s TUI, including the main screen and a couple of auxiliary screens or dialogs.

Prerequisites

To get the most out of this project, you should have some previous knowledge of how to lay out a Python project and work with SQLite databases. You should also know the basics of working with Python classes. Some knowledge about writing CSS code would also be a plus.

To satisfy these knowledge requirements, you can take a look at the following resources:

Don’t worry if you don’t have all of the prerequisite knowledge before starting this tutorial—that’s completely okay! You’ll learn through the process of getting your hands dirty as you build the project. If you get stuck, then take some time to review the resources linked above. Then, get back to the code.

The contact book application you’ll build in this tutorial has a single external dependency, which is Textual. This library provides a rapid application development framework that allows you to create apps you can run in your terminal and browser.

To follow best practices in your development process, you can start by creating a virtual environment and then install Textual using pip:

Read the full article at https://realpython.com/contact-book-python-textual/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Stefanie Molin: Mind Your Image Metadata

Planet Python - Wed, 2024-10-09 09:00
Most devices record a variety of metadata when generating images. While some of that information *may* be innocuous, you could end up exposing the GPS coordinates to your home if you aren't careful. In this article, I provide a brief introduction to image metadata, and then show you how to remove it with `exif-stripper`.
Categories: FLOSS Project Planets

CTI Digital: Drupal CMS: A New Era for Non-Technical Users

Planet Drupal - Wed, 2024-10-09 08:55

Drupal CMS (formerly known as Drupal Starshot) is set to revolutionise how non-technical users engage with Drupal, enabling them to get started with just a few clicks. From installing Drupal directly in the browser to swiftly selecting Recipes that tailor the site to their specific needs, along with utilising AI-driven site-building tools, Drupal CMS is transforming the introduction of non-developers to the platform.

Unsurprisingly, Drupal CMS was a hot topic at DrupalCon, where an entire track of talks was dedicated to the various initiatives that constitute one of the most ambitious changes in Drupal's history. Dries Buytaert, the co-founder of Drupal and an influential leader in the open-source community, provided invaluable insights in his keynote presentation, showcasing the latest enhancements in Drupal CMS.

Categories: FLOSS Project Planets

1xINTERNET blog: 1xINTERNET at DrupalCon Barcelona 2024

Planet Drupal - Wed, 2024-10-09 08:00

This year, DrupalCon took place in Barcelona, and we were proud to be Platinum Sponsors. DrupalCon is a vibrant celebration of the Drupal community, bringing together developers, designers, marketers, and business leaders to share ideas, collaborate and innovate. This year we thoroughly enjoyed connecting with other Drupal users and makers.

Categories: FLOSS Project Planets

Cursor Size Problems In Wayland, Explained

Planet KDE - Wed, 2024-10-09 05:36

I've been fixing cursor problems on and off in the last few months. Here's a recap of what I've done, explanation of some cursor size problems you might encounter, and how new developments like Wayland cursor shape protocol and SVG cursors might improve the situation.

(I'm by no means an expert on cursors, X11 or Wayland, so please correct me if I'm wrong.)

Why don't we have cursors in the same size anymore?

My involvement with cursors started back in the end of 2023, when KDE Plasma 6.0 was about to be released. A major change in 6.0 was enabling Wayland by default. And if you enabled global scaling in Wayland, especially with a fractional scale like 2.5x, cursor sizes would be a mess across various apps (the upper row: Breeze cursors in Plasma 6.0 Beta 1, Wayland, 2.5x global scale, the lower row: Same cursors in Plasma 6.0):

So I dug into the code of my favorite terminal emulator, Kitty, which at the time drew the cursor in a slightly smaller size than it should be (similar to vscode in the above image). I gained some understanding of the problem, and eventually fixed it. Let me explain.

How to draw cursors in the same size in different apps?

In X11, there used to be a standard set of cursors, but nowadays most apps use the XCursor lib to load a (user-specified) cursor theme and draw the cursor themselves. So in order to have cursors in the same theme and size across apps, we need to make sure that:

  1. Apps get the same cursor theme and size from the system.
  2. Apps draw the cursor in the same way.

The transition to Wayland created difficulties in both points:

1. Get the same cursor theme and size from the system

It used to be simple in X11: we have Xcursor.size and Xcursor.theme in xrdb, also XCURSOR_SIZE and XCURSOR_THEME in environment variables. Setting them to the same value would make sure that all apps get the same cursor theme and size.

But Wayland apps don't use xrdb, and they interpret XCURSOR_SIZE differently: in X11, the size is in physical pixels, but in Wayland it's in logical pixels. E.g., if you have a cursor size 24 and global scale 2x, then in X11, XCURSOR_SIZE should be 48, but in Wayland it should be 24.

The Wayland way is necessary. Imagine you have two monitors with different DPI, e.g. they are both 24" but monitor A is 1920x1080, while monitor B is 3840x2160. You set scale=1 for A and scale=2 for B, so UI elements would be the same size on both monitors. Then you would also want the cursor to be of the same size on both monitors, which requires it to have 2x more physical pixels on B than on A, but it would be the same logical pixels.

So Plasma 6.0 no longer sets the two environment variables, because XCURSOR_SIZE can't be simultaneously correct for both X11 and Wayland apps. But without them and xrdb, Wayland apps no longer have a standard way to get the cursor theme and size. Instead, different frameworks / toolkits have their own ways. In Plasma, KDE / Qt apps get them from the Qt platform integration plugin provided by Plasma, GTK4 apps from ~/.config/gtk-4.0/settings.ini (also set by Plasma), Flatpak GTK apps from the GTK-specific configs in XDG Settings Portal.

The last one is particularly weird, as you need to install xdg-desktop-portal-gtk in order fix Flatpak apps in Plasma, which surprised many. It might seem like a hack, but it's not. Plasma officially recommends installing xdg-desktop-portal-gtk, and this was suggested by GNOME developers.

But what for 3rd-party Wayland apps besides GTK and Qt? The best hope is to read settings in either the GTK or the Qt way, piggy-backing the two major toolkits, assuming that the DE would at least take care of the two.

(IMHO either Wayland or the XDG Settings Portal should provide a standard way for apps to get the cursor theme and size.)

That was part of the problem in Kitty. It used to read settings from the GTK portal, but only under a GNOME session. I changed it to always try to read from the portal, even if under Plasma. But that's not the end of the story...

2. Draw the cursor in the same way

It's practically a non-issue in X11, as the user usually sets a size that the cursor theme provides, and the app just draws the cursor images as-is. But if you do set a cursor size not available in the theme (you can't do that in the cursor theme settings UI, but you can manually set XCURSOR_SIZE), you'll open a can of worms: various toolkits / apps deal with it differently:

  1. Some just use the closest size available (Electron and Kitty at the time), so it can be a bit smaller.
  2. Some use the XCursor default size 24, so it's a lot smaller.
  3. Some scale the cursor to the desired size, and the scaling algorithm might be different, resulting in pixelated or blurry cursors; Also they might scale from either the default size or the closest size available, resulting in very blurry (GTK) or slightly blurry (Qt) cursors.

The situation becomes worse with Wayland, as the user now specifies the size in logical pixels, then apps need to multiply it by the global scale to get the size in physical pixels, and try to load a cursor in that size. (If the app load the cursor in the logical size, then either the app or the compositor needs to scale it, resulting in a blurry / pixelated cursor.) With fractional scaling, it's even more likely that the required physical size is not available in the theme (which typically has only 2~5 sizes), and you see the result in the picture above.

One way to fix it (and why I didn't do)

It can be fixed by moving the "when we can't load cursors in the size we need, load a different size and scale it" logic from apps / toolkits to the XCursor lib. When the app requests cursors in a size, instead of returning the closest size available, the lib could scale the image to the requested size. So apps would always get the cursor in the size they ask for, and their own different scaling algorithms won't get a chance to run.

Either the default behavior can be changed, or it can be hidden behind a new option. But I didn't do that, because I felt at the time that it would be difficult to either convince XCursor lib maintainers to make a (potentially breaking) change to the default behavior, or to go around convincing all apps / toolkits to use a new option.

My fix (or shall we say workaround)

Then it came to me that although I can't fix all these toolkits / apps, they seem to all work the same way if the required physical size is available in the theme - then they just draw the cursor as-is. So I added a lot of sizes to the Breeze theme. It only has size 24, 36 and 48 at the time, but I added physical sizes corresponding to a logical size 24 and all global scales that Plasma allows, from 0.5x to 3x, So it's 12, 18, 24 ... all the way to 72.

It was easy. The source code of the Breeze theme is SVG (so are most other themes). Then a build script renders it into images using Inkscape, and packages them to XCursor format. The script has a list of the sizes it renders in, so I added a lot more.

And it worked! If you choose Breeze and size 24, then (as in the bottom row in the picture above) various apps draw the cursor in the same size at any global scale available in Plasma.

But this method has its limitations:

  1. We can't do that to 3rd-party themes, as we don't have their source SVG.
  2. It only works if you choose the default size 24. If you choose a different size, e.g. 36, and a global scale 3x, then the physical size 36x3=108 is not available in the theme, and you see the mess again. But we can't add sizes infinitely, as explained in Vlad's blog, the XCursor format stores cursor images uncompressed, so the binary size grows very fast when adding larger sizes.

Both limitations can be lifted with SVG cursors. But before getting to that, let's talk about the "right" way to fix the cursor size problem:

The "right" fix: Wayland cursor shape protocol

The simple and reliable way to get consistent cursors across apps is to not let apps draw the cursor at all. Instead, they only specify the name of the cursor shape, and the compositor draws the cursor for them. This is how Wayland cursor shape protocol works. Apps no longer need to care about the cursor theme and size (well, they might still need the size, if they want to draw custom cursors in the same size as standard shapes), and since the compositor is the only program drawing the cursor, it's guaranteed to be consistent for all apps using the protocol.

(It's quite interesting that we seem to went a full circle back to the original server-defined cursor font way in X11.)

Support for this protocol leaves a lot to improve, though. Not all compositors support it. On the client side, both Qt and Electron have the support, but GTK doesn't.

There are merge requests for GTK and Mutter, but GNOME devs request some modifications in the Wayland protocol before merging them, and the request seems to be stuck for some months. I hope the recent Wayland "things" could move it out of this seemingly deadlock.

Anyway, with this protocol, only the compositor has to be modified to support a new way to draw cursors. This makes it much easier to change how cursors work. So we come to:

SVG cursors

Immediately after the fix in Breeze, I proposed this idea of shipping the source SVG files of the Breeze cursor theme to the end user, and re-generate the XCursor files whenever the user changes the cursor size or global scale. This way, the theme will always be able to provide the exact size requested by apps. (Similar to the "modify XCursor lib" idea, but in a different way.) It would remove the limitation 2 above (and also limitation 1 if 3rd-party themes ship their source SVGs too).

With SVG cursors support in KWin and Breeze, I plan to implement this idea. It would also allow the user to set arbitrary cursor size, instead of limited to a predefined list.

Problems you might still encounter today Huge cursors in GTK4 apps

It's a new problem in GTK 4.16. If you use the Breeze cursor theme and a large global scale like 2x or 3x, you get huge cursors:

It has not limited to Plasma. Using Breeze in GNOME would result in the same problem. To explain it, let me first introduce the concept of "nominal size" and "image size" in XCursor.

Here is GNOME's default cursor theme, Adwaita:

"Nominal size" is the "cursor size" we are talking about above. It makes the list of sizes you choose from in the cursor theme settings UI. It's also the size you set in XCURSOR_SIZE. "Image size" is the actual size of the cursor image. "Hot spot" is the point in the image where the cursor is pointing at.

Things are a bit different in the Plasma default cursor theme, Breeze:

Unlike Adwaita, the image size is larger than the nominal size. That, combined with a global scale, triggers the bug in GTK4. Explanation of the bug.

XCursor allows the image size to be different from the nominal size. I don't know why it was designed this way, but my guess is so you can crop the empty part of the image. This both reduces file size, and reduces flicking when the cursor changes (with software cursors under X11). But the image size can also be larger than the nominal size, and Breeze (and a lot of other themes) uses this feature.

You can see in the above images that the "arrow" of nominal size 24 in Breeze is actually similar in size to the same nominal size in Adwaita. But the "badge" in Breeze is further apart, so it can't fit into a 24x24 image. That's why Breeze is built this way. In a sense, "nominal size" is similar to how "font size" works, where it resembles the "main part" of a character in the font, but some characters can have "extra parts" that go through the ceiling or floor.

This problem is already fixed in the main branch of GTK 4, but it's not backported to 4.16 yet, probably because the fix uses a Wayland feature that Mutter doesn't support yet. So at the moment, your only option is to use a different cursor theme whose "nominal size" and "image size" are equal.

Smaller cursors in GTK3 apps (most notably, Firefox)

The cursor code in GTK3 is different from GTK4, with its own limitations. You might find the cursor to be smaller than in other apps, and if you run the app in a terminal, you might see warnings like:

cursor image size (64x64) not an integer multiple of scale (3)

GTK3 doesn't support fractional scales in cursors. So if you have cursor size 24 and global scale 2.5x or 3x, it will use a scale 3x and try to load a cursor with a nominal size 24x3=72. And it requires the image size to be an integer multiple of the scale. So if your theme doesn't have a size 72, or it does but the image size is not multiple of 3, GTK3 fallbacks to a smaller unscaled cursor.

End words

OK, this is a long post. Hope I can bring you more cursor goodies in Plasma 6.3 and beyond.

Categories: FLOSS Project Planets

PyCharm: Where To Get Data for Your Data Science Projects

Planet Python - Wed, 2024-10-09 05:00

Whether you’re starting a new project or expanding an existing one, as a data scientist, you’re always on the lookout for new material to explore. Knowing where to get data for data science projects can be challenging, and finding “good data” can be even more difficult. In this article, we’ll look at what makes “good data”, what format that data might be in, where to find it, and what the next steps are. 

What is “good data” for data science projects?

Firstly, we should consider how relevant the dataset is to our work. You can stumble upon lots of datasets that overlap with your work in some way, but it can be difficult to decide which is the best one for you to put your effort into. In this scenario, we’ll briefly explore some of the attributes of the data. 

To start with, how consistent is the dataset? Specifically, are there any missing values? Data might be missing for a variety of acceptable reasons, but it can also be a sign of selection bias or other factors that might skew your results. Often, we can choose to either accept missing data or delete the records that contain it before we do our analysis, but knowing about missing data early in the process can help you make an informed decision to use that dataset or not. 

Along with missing data, it’s worth checking to see if any of the data is duplicated. Duplicated data might be fine, but it might also signify a lack of consistency that could skew your results. Duplicated data might also reduce your confidence in the dataset as a whole, so it’s important to consider when choosing your dataset. 

Another aspect to consider for good data is timeliness. The time over which the data was gathered is usually pertinent to the questions you want to answer when you start analyzing it. Checking if the data was collected in the timespan that you’re interested in and considering the continuity of that timespan is helpful. 

When you’re starting your journey into data science and picking your first few datasets to play with, you don’t need to worry about picking the perfect dataset – focus on the process and exploring instead. When you’re ready to learn more about datasets and how to avoid common pitfalls, I recommend you watch this talk from Dr. Jodie Burchell – Garbage data in, garbage models out.

Do you want structured or unstructured data?

Structured data is what you’ll find in a table where each row is an observation, and each column is a variable or field. By contrast, unstructured data usually needs to be pre-processed before you can work with it in a data science project, or it can be used by specialist models that can process it internally. Examples of unstructured data include text, images, and sound. 

As you might have guessed, unstructured data is used more in advanced and specialized subfields in data science, like natural language processing and computer vision. Most data scientists start with, and continue working with, structured data for many of their projects. I recommend that this is where you start, too.

I recommend you keep the notion of structured and unstructured data in mind as we explore standard data formats.

What are standard data formats?

In addition to the quality of the data, we also have to choose between available data formats. You’ll come across two broad types of data formats as a data scientist: downloadable data (often CSV) and databases. 

Downloadable data is nearly always structured data and often takes the form of comma-separated value (CSV) files. These downloads are available from various online repositories. They are among some of the most prolific and most accessible sources of data. If you’re new to data exploration, this is the best place to get started, as they’re easy to find, human-readable, and easy to work with without any extra steps. 

If you’re ready to enter the world of databases, it’s worth understanding that they are further subdivided into relational (SQL) and non-relational (non-SQL) databases. As a broad rule, relational databases contain structured data and non-relational databases contain non-structured data, but determining whether data is structured is not an exact science. Instead, think of non-relational databases as being adaptable to the shape of the data they are storing. 

Databases are commonly used in the following cases: when you have large datasets, when multiple people need to access and modify the data simultaneously, when datasets need to be able to scale, and when data is unstructured (non-SQL only). In addition, if you’re commissioned to do data analysis for your company, you may find that you’re given a database to work with as it’s already in-house. 

PyCharm Professional has excellent support for SQL and non-SQL databases. If your work involves using various databases and writing SQL queries, you can check out our webinar on Visual SQL Development with PyCharm to get more information about the functionality. Alternatively, you can learn how to explore tables without writing a single line of SQL with PyCharm and import your dataset into PyCharm and explore it

Try PyCharm Professional for free

Where can I find datasets for my data science projects?

Once you’re ready to find out how to get data, there are plenty of resources you can download to use for your data science project. This is not an endless list, but it’s a good place to start and a natural progression for your data science journey.

UCI Machine Learning Repository

The UCI Machine Learning Repository has over 600 datasets covering a host of exciting topics for you to explore, such as biology, health, physics, and climate. UCI datasets also have a diverse set of data types, including images, sequential, and time series. I recommend looking at a few different datasets and types of data if you’re new to data science, as it will help you expand your understanding of what data often looks like. 

Kaggle

Another well-known website for datasets is Kaggle. Not only can you sign up to Kaggle to download datasets for data science projects, but it also has a large community of like-minded people who run company-sponsored competitions designed to help you develop your data science skills. If you’re looking for a famous dataset that you’ve seen used in numerous examples, you’ll almost certainly find it hosted on Kaggle.

Hugging Face

Hugging Face is another resource that is rich in datasets. You can filter the results by modalities, including audio, geospatial, and video, and provide a range for the size of your dataset, which can be particularly helpful when you want to start small. Hugging Face has many natural language and computer vision datasets, so you might want to head over there once you’re past the basics and interested in more specialized fields.

Many more

There are many more places that you can go on your data science journey to find fun datasets to explore. You can check out GitHub for curated open source datasets, FiveThirtyEight for datasets relating to American politics and sports, and lastly, one of my favorites, the UK government, to get datasets relating to public services and the economy in the UK. 

What are the next steps?

Congratulations! You’ve gained a better understanding of what “good data” is, and you know where to look to find datasets for data science projects. Once you’ve chosen a dataset, you’re ready to start preparing and analyzing your data

Remember, you can use Jupyter notebooks inside PyCharm to explore both file format and database datasets

You can read or watch a video showing just some of the ways you can use Jupyter notebooks inside PyCharm to boost your productivity on your data science journey with your chosen dataset. 

Try PyCharm Professional for free

Categories: FLOSS Project Planets

Talk Python to Me: #480: Ahoy, Narwhals are bridging the data science APIs

Planet Python - Wed, 2024-10-09 04:00
If you work in data science, you definitely know about data frame libraries. Pandas is certainly the most popular, but there are others such as cuDF, Modin, Polars, Dask, and more. They are all similar but definitely not the same APIs and Polars is quite different. But here's the problem. If you want to write a library that is for users of more than one of these data frame frameworks, how do you do that? Or if you want to leave open the possibility of changing yours after the app is built, same problem. That's the problem that Narwhals solves. We have Marco Gorelli on the show to tell us all about it.<br/> <br/> <strong>Episode sponsors</strong><br/> <br/> <a href='https://talkpython.fm/workos'>WorkOS</a><br> <a href='https://talkpython.fm/training'>Talk Python Courses</a><br/> <br/> <strong>Links from the show</strong><br/> <br/> <div><b>Marco Gorelli</b>: <a href="https://fosstodon.org/@marcogorelli" target="_blank" >@marcogorelli</a><br/> <b>Marco on LinkedIn</b>: <a href="https://www.linkedin.com/in/marcogorelli/?featured_on=talkpython" target="_blank" >linkedin.com</a><br/> <b>Narwhals</b>: <a href="https://narwhals-dev.github.io/narwhals/?featured_on=talkpython" target="_blank" >github.io</a><br/> <b>Narwhals on Github</b>: <a href="https://github.com/narwhals-dev/narwhals?featured_on=talkpython" target="_blank" >github.com</a><br/> <br/> <b>DuckDB</b>: <a href="https://duckdb.org?featured_on=talkpython" target="_blank" >duckdb.org</a><br/> <b>Ibis</b>: <a href="https://ibis-project.org?featured_on=talkpython" target="_blank" >ibis-project.org</a><br/> <b>modin</b>: <a href="https://modin.readthedocs.io/en/stable/?featured_on=talkpython" target="_blank" >readthedocs.io</a><br/> <b>Pandas and Beyond with Wes McKinney</b>: <a href="https://talkpython.fm/episodes/show/462/pandas-and-beyond-with-wes-mckinney" target="_blank" >talkpython.fm</a><br/> <b>Polars: A Lightning-fast DataFrame for Python</b>: <a href="https://talkpython.fm/episodes/show/402/polars-a-lightning-fast-dataframe-for-python-updated-audio" target="_blank" >talkpython.fm</a><br/> <b>Polars</b>: <a href="https://pola.rs?featured_on=talkpython" target="_blank" >pola.rs</a><br/> <b>Pandas</b>: <a href="https://pandas.pydata.org?featured_on=talkpython" target="_blank" >pandas.pydata.org</a><br/> <b>Watch this episode on YouTube</b>: <a href="https://www.youtube.com/watch?v=FSH7BZ0tuE0" target="_blank" >youtube.com</a><br/> <b>Episode transcripts</b>: <a href="https://talkpython.fm/episodes/transcript/480/ahoy-narwhals-are-bridging-the-data-science-apis" target="_blank" >talkpython.fm</a><br/> <br/> <b>--- Stay in touch with us ---</b><br/> <b>Subscribe to us on YouTube</b>: <a href="https://talkpython.fm/youtube" target="_blank" >youtube.com</a><br/> <b>Follow Talk Python on Mastodon</b>: <a href="https://fosstodon.org/web/@talkpython" target="_blank" ><i class="fa-brands fa-mastodon"></i>talkpython</a><br/> <b>Follow Michael on Mastodon</b>: <a href="https://fosstodon.org/web/@mkennedy" target="_blank" ><i class="fa-brands fa-mastodon"></i>mkennedy</a><br/></div>
Categories: FLOSS Project Planets

Droptica: Drupal 7 to the latest version migration – Droptica’s top recommended enhancements

Planet Drupal - Wed, 2024-10-09 02:09

Upgrading from Drupal 7 to the latest version opens up a range of benefits, allowing you to leverage a modern CMS. By enhancing areas like content structure, SEO, and security during migration, you can maximize the impact of your investment. But, without Drupal expertise, deciding what to change and improve can be overwhelming during the migration process. Our free checklist, built on Droptica’s experience with clients, helps you explore common migration improvements and decide which ones fit your needs.

Categories: FLOSS Project Planets

Capellic: Migrating critical keys from Lockr to Pantheon Secrets

Planet Drupal - Wed, 2024-10-09 00:00
Lockr is shutting down. Don't panic!
Categories: FLOSS Project Planets

Django Weblog: Why Django supports the Open Source Pledge

Planet Python - Tue, 2024-10-08 23:26

We at the Django Software Foundation are pleased to share that Sentry, alongside other partners, has launched the Open Source Pledge — an initiative designed to address sustainability challenges in open source.

The Open Source Pledge is a commitment for member companies to pay OSS maintainers meaningfully for their work. When maintainers are adequately supported, they can better sustain their projects, ensuring the growth, stability, and security of the broader ecosystem.

The sustainability challenge in the Django community

In our community and OSS at large, the challenge is real and significant. Django packages are often maintained by small teams or even individuals, often unpaid. As the demands on these projects grow, so too does the pressure on the maintainers. And without financial support, maintainers often move on without a clear succession plan. The potential failure of these projects not only impacts the developers involved but also the thousands of companies and millions of users who rely on these critical pieces of infrastructure.

Here are a few assorted examples from Django packages in the top 10 by download counts:

The case for joining the pledge

The Open Source Pledge is simple but impactful: member companies commit a minimum of $2,000 per year, per developer on staff, to support open source maintainers. Additionally, companies are encouraged to publish an annual report detailing their payments, creating transparency and accountability within the community.

We encourage companies of all sizes to join the Pledge and contribute to the sustainability of the software we all depend on. By making a financial commitment, you are not just supporting maintainers—you are investing in the stability, security, and growth of the entire tech ecosystem.

If you're interested in joining the Open Source Pledge or learning more about the sustainability issues facing OSS, please visit the initiative’s page. Together, we can build a stronger, more sustainable open source future. And if you believe in this cause, we encourage you to share this post to help broaden awareness and inspire further commitments from peers and partners.

Categories: FLOSS Project Planets

KPhotoAlbum 5.13.0 released

Planet KDE - Tue, 2024-10-08 20:00

After almost a year, we’re very pleased to announce a new release of KPhotoAlbum, the Linux/KDE photo management software!

There are two new features/changes:

  • The “time ago”/birthday/age calculation has been reworked. Timespans should now be displayed in a nicer (more natural) way. Also, the age of people born on February 29 is now calculated correctly.
  • The ‘--db’ command line argument now rejects any file name that is not either an existing directory or an index.xml file within an existing directory (cf. Bug #418647).

Apart from that, quite a number of bugs have been fixed (cf. the ChangeLog for more info): #477529, #477530, #477531, #477532, #478944, #479483, #481181, #483266, #444744 and #493849. And on top some bugs that weren’t reported as a bug in the first place :-)

One additional change that should be mostly interesting for the distributors is: The key used for signing the release has been updated. All PGP keys used to sign KDE software releases can be found in the sysadmin/release-keyring repo. My currently used key that I used to sign the tarball can also be found there, cf. tleupold@key2.asc.

… and what about Qt 6?!

Fear not! Of course, there will be a Qt6/KF6 release of KPhotoAlbum. We currently have a working Qt6/KF6 branch, so most of the porting is already done. Last thing that’s missing is a Qt6/KF6 release of Marble, which we use to display maps for geographic coordinates in photos (preferrably stored there using KGeoTag ;-). It seems like there will be such a release towards the end of the year. We will get KPhotoAlbum ready for Qt6/KF6 shortly afterwards. Stay tuned!

According to git log, the following individuals contributed commits since the last release:

  • Boudhayan Bhattacharya
  • Oliver Kellogg
  • Tobias Leupold
  • Randall Rude
  • Johannes Zarl-Zierl

Have a lot of fun with KPhotoAlbum 5.13.0 :-)

— Tobias

Categories: FLOSS Project Planets

Drupal Core News: Coding standards proposals for final discussion on 23 October 2024

Planet Drupal - Tue, 2024-10-08 18:55

The Technical Working Group (TWG) is announcing one coding standards change for final discussion. Feedback will be reviewed at the meeting scheduled for 23 October 2024 UTC.

Issues for discussion

The Coding Standards project page outlines the process for changing Drupal coding standards.

Join the team working on Coding Standards

Join #coding-standards in Drupal Slack to meet and work with others on improving the Drupal coding standards. We work on improving our standards as well as implementing them in the core software.

Categories: FLOSS Project Planets

Thorsten Alteholz: My Debian Activities in September 2024

Planet Debian - Tue, 2024-10-08 17:49
FTP master

This month I accepted 441 and rejected 29 packages. The overall number of packages that got accepted was 448.

I couldn’t believe my eyes, but this month I really accepted the same number of packages as last month.

Debian LTS

This was my hundred-twenty-third month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on:

  • [unstable] libcupsfilters security update to fix one CVE related to validation of IPP attributes obtained from remote printers
  • [unstable] cups-filters security update to fix two CVEs related to validation of IPP attributes obtained from remote printers
  • [unstable] cups security update to fix one CVE related to validation of IPP attributes obtained from remote printers
  • [DSA 5778-1] prepared package for cups-filters security update to fix two CVEs related to validation of IPP attributes obtained from remote printers
  • [DSA 5779-1] prepared package for cups security update to fix one CVE related to validation of IPP attributes obtained from remote printers
  • [DLA 3905-1] cups-filters security update to fix two CVEs related to validation of IPP attributes obtained from remote printers
  • [DLA 3904-1] cups security update to fix one CVE related to validation of IPP attributes obtained from remote printers
  • [DLA 3905-1] cups-filters security update to fix two CVEs related to validation of IPP attributes obtained from remote printers

Despite the announcement the package libppd in Debian is not affected by the CVEs related to CUPS. By pure chance there is an unrelated package with the same name in Debian. I also answered some question about the CUPS related uploads. Due to the CUPS issues, I postponed my work on other packages to October.

Last but not least I did a week of FD this month and attended the monthly LTS/ELTS meeting.

Debian ELTS

This month was the seventy-fourth ELTS month. During my allocated time I uploaded or worked on:

  • [ELA-1186-1]cups-filters security update for two CVEs in Stretch and Buster to fix the IPP attribute related CVEs.
  • [ELA-1187-1]cups-filters security update for one CVE in Jessie to fix the IPP attribute related CVEs (the version in Jessie was not affected by the other CVE).

I also started to work on updates for cups in Buster, Stretch and Jessie, but their uploads will happen only in October.

I also did a week of FD and attended the monthly LTS/ELTS meeting.

Debian Printing

This month I uploaded …

  • libcupsfilters to also fix a dependency and autopkgtest issue besides the security fix mentioned above.
  • splix for a new upstream version. This package is managed now by OpenPrinting.

Last but not least I tried to prepare an update for hplip. Unfortunately this is a nerve-stretching task and I need some more time.

This work is generously funded by Freexian!

Debian Matomo

This month I even found some time to upload packages that are dependencies of Matomo …

This work is generously funded by Freexian!

Debian Astro

This month I uploaded a new upstream or bugfix version of:

Most of the uploads were related to package migration to testing. As some of them are in non-free or contrib, one has to build all binary versions. From my point of view handling packages in non-free or contrib could be very much improved, but well, they are not part of Debian …

Anyway, starting in December there is an Outreachy project that takes care of automatic updates of these packages. So hopefully it will be much easier to keep those package up to date. I will keep you informed.

Debian IoT

This month I uploaded new upstream or bugfix versions of:

Debian Mobcom

This month I did source uploads of all the packages that were prepared last month by Nathan and started the transition. It went rather smooth except for a few packages where the new version did not propagate to the tracker and they got stuck in old failing autopkgtest. Anyway, in the end all packages migrated to testing.

I also uploaded new upstream releases or fixed bugs in:

misc

This month I uploaded new upstream or bugfix versions of:

Most of those uploads were needed to help packages to migrate to testing.

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #650 (Oct. 8, 2024)

Planet Python - Tue, 2024-10-08 15:30

#650 – OCTOBER 8, 2024
View in Browser »

Differences Between Python’s Mutable and Immutable Types

In this video course, you’ll learn how Python mutable and immutable data types work internally and how you can take advantage of mutability or immutability to power your code.
REAL PYTHON course

DuckDB in the Browser With Pyodide

Learn how to run DuckDB in an in-browser Python environment to enable simple querying on remote files, interactive documentation, and easy to use training materials.
ALEX MONAHAN

Speech-to-Text With Django

Looking to add new functionality to your Django app? Learn how to integrate Speech-to-Text and build a working app that transcribes audio files—with 100+ free hours to get started →
ASSEMBLY AI sponsor

Free Threaded Python With Asyncio

This post talks about combining the new experimental free threading feature of Python 3.13 with Asyncio.
CHANGS.CO.UK • Shared by Jamie Chang

Python 3.13.0 Released

PYTHON.ORG

Python 3.12.7 Released

CPYTHON DEV BLOG

Quiz: Python 3.13: Cool New Features for You to Try

REAL PYTHON

Quiz: Syntactic Sugar: Why Python Is Sweet and Pythonic

REAL PYTHON

Articles & Tutorials Switching From virtualenvwrapper to direnv

Earlier this week Trey considered whether to switch from virtualenvwrapper to using local .venv managed by direnv. He then also started experimenting with uv and Starship. This post explains why and his new configuration.
TREY HUNNER

Ensuring a Block Is Overridden in a Django Template

Some template blocks are meant to be overloaded and forgetting to do so results in rendering bugs. This post talks about creating a new tag that throws an exception which alerts your tests if you forget to overload.
TOM CARRICK

Build Your Own AI Assistant with Edge AI

Simplify workloads and elevate customer service. Build customized AI assistants that respond to voice prompts with powerful language and comprehension capabilities - all based on your unique needs with Intel’s OpenVINO toolkit.
INTEL CORPORATION sponsor

Arrange, Act, and Assert Pattern in Testing

Learn what the Arrange, Act, and Assert (AAA) pattern is, how it works, the benefits it offers, and its role in unit test automation. Note: sample code is not in Python, but the concepts apply to all unit testing.
ANTONELLO ZANINI

How I Prepare a Technical Talk

This article outlines the system that Rodrigo uses to prepare his Python talks. Steal his ideas and suggestions so that you, too, can start giving talks at your local meetups and at PyCons all over the world.
MATHSPP.COM • Shared by Rodrigo

Implementing a Python Singleton With Decorators

A singleton pattern is one where only one instance of an object type is allowed at a time. One way to implement this concept is through the use of a decorator. This post teaches you how.
PIETER CLAERHOUT

PEP 759: External Wheel Hosting

This Python Enhancement Proposal specifies a mechanism by which projects hosted on pypi.org can safely host wheel artifacts on external sites other than PyPI.
PYTHON.ORG

Let’s Go Easy on PyPI, OK?

The use of containers can mean a lot of calls to PyPI. This post talks about caching properly to reduce the load on our shared community servers.
MICHAEL KENNEDY

Rotating Turn Order With deque

If you need to cycle through values, one way to do that is with deque. This post shows you through an example service for a game engine.
JUHA-MATTI SANTALA

Python 3.13: JIT and GIL Went Up the Hill

All you need to know about the latest Python release’s changes to the Global Interpreter Lock and Just-in-Time compilation.
DREW SILCOCK

A Dinosaur Learns poetry

“Not a real dinosaur and not real poetry.” This post is about Paul changing how what tools he uses for his Python setup.
PAUL COCHRANE

Projects & Code pytest-xflaky: A Flaky-Test Hunter Plugin

GITHUB.COM/TESORIO • Shared by Caio Ariede

foc: A Collection of Python Functions for Somebody’s Sanity

GITHUB.COM/THYEEM

spiderweb: A Small Web Framework

GITHUB.COM/ITSTHEJOKER

pipefunc: DAGs for Scientific Workflows

GITHUB.COM/PIPEFUNC • Shared by Bas Nijholt

django-unique-user-email: Emial Logins With User Model

GITHUB.COM/CARLTONGIBSON

Events Weekly Real Python Office Hours Q&A (Virtual)

October 9, 2024
REALPYTHON.COM

PyCon Uganda 2024

October 9 to October 14, 2024
PYCON.ORG

PyCon NL 2024

October 10 to October 11, 2024
PYCON.ORG

Python Atlanta

October 10 to October 11, 2024
MEETUP.COM

DFW Pythoneers 2nd Saturday Teaching Meeting

October 12, 2024
MEETUP.COM

PyCon MEA & Data Science 2024

October 14 to October 16, 2024
GLOBALDEVSLAM.COM

Python Brasil 2024

October 16 to October 21, 2024
PYTHONBRASIL.ORG.BR

PyCon Panamá 2024

October 16 to October 19, 2024
PYCON.PA

Swiss Python Summit 2024

October 17 to October 19, 2024
PYTHON-SUMMIT.CH

Happy Pythoning!
This was PyCoder’s Weekly Issue #650.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

Smartbees: How to Add Google Maps in Drupal (Best Modules)

Planet Drupal - Tue, 2024-10-08 14:46

Maps on websites are a useful addition, allowing users to reach, for example, your company headquarters in a simple way. Depending on the requirements, you can implement them in many different ways. This article will show you how to embed Google Maps in a Drupal website.

Categories: FLOSS Project Planets

Pages