Feeds

Mario Hernandez: Migrating your Drupal theme from Patternlab to Storybook

Planet Drupal - Thu, 2024-09-12 00:48

Building a custom Drupal theme nowadays is a more complex process than it used to be. Most themes require some kind of build tool such as Gulp, Grunt, Webpack or others to automate many of the repetitive tasks we perform when working on the front-end: compiling and minifying code, compressing images, linting code, and many more. As Atomic Web Design became a thing, things got more complicated because now if you are building components you need a styleguide or Design System to showcase and maintain those components. One of those design systems for me has been Patternlab. I started using Patternlab in all my Drupal projects almost ten years ago with great success. In addition, Patternlab has been the design system of choice at my place of work, but one of my immediate tasks was to work on migrating to a different design system. We have a small team but were very excited about the challenge of finding and using a more modern and robust design system for our large multi-site Drupal environment.

Enter Storybook

After looking at various options for a design system, Storybook seemed to be the right choice for us for a couple of reasons: one, it has been around for about 10 years and during this time it has matured significantly, and two, it has become a very popular option in the Drupal ecosystem. In some ways, Storybook follows the same model as Drupal: it has a pretty active community and a very healthy ecosystem of plugins to extend its core functionality.

Storybook looks very promising as a design system for Drupal projects, and with the recent release of Single Directory Components (SDC) and the new Storybook module, we think things can only get better for Drupal front-end development. Unfortunately for us, technical limitations in combination with our specific requirements prevented us from using SDC or the Storybook module. Instead, we built our environment from scratch with a stand-alone integration of Storybook 8.

INFO: At the time of our implementation, TwigJS did not have the capability to resolve SDC's namespace. It appears this has been addressed and using SDC should now be possible with this custom setup. I haven't personally tried it and therefore I can't confirm.

Our process and requirements

In choosing Storybook, we went through a rigorous research and testing process to ensure it would not only solve the immediate problems with our current environment, but also be around as a long-term solution. As part of this process, we also tested several available options like Emulsify and Gesso, which would be great options for anyone looking for a ready-to-go system out of the box. Some of our requirements included:

1. No components refactoring

The first and non-negotiable requirement was to be able to migrate components from Patternlab to a new design system with as little refactoring as possible. We have a decent number of components which have been built within the last year, and the last thing we wanted was to have to rebuild them because we are switching design systems.

2. A new Front-end build workflow

I personally have been faithful to Gulp as a front-end build tool for as long as I can remember because it did everything I needed done in a very efficient manner. The Drupal project we maintain also used Gulp, but as part of this migration, we wanted to see what other options were out there that could improve our workflow. The obvious choice seemed to be Webpack, but as we looked closer into this we learned about ViteJS, "The Next Generation Frontend Tooling". Vite delivers on its promise of being "blazing fast", and its ecosystem is great and growing, so we went with it.

3. No more Sass in favor of PostCSS

CSS has drastically improved in recent years. It is now possible with plain CSS to do many of the things you used to be able to do only with Sass or a similar CSS preprocessor. Eliminating Sass from our workflow meant we would also be able to get rid of many other node dependencies related to Sass. The goal for this project was to use plain CSS in combination with PostCSS, and one bonus of using Vite is that Vite offers PostCSS processing out of the box without additional plugins or dependencies. Of course, if you want to do more advanced PostCSS processing you will probably need some external dependencies.

Building a new Drupal theme with Storybook

Let's go over the steps to building the base of your new Drupal theme with ViteJS and Storybook. This will be at a high level, to call out only the most important and Drupal-related parts. This process will create a brand new theme. If you already have a theme you would like to use, make the appropriate changes to the instructions.

1. Set up Storybook with ViteJS

ViteJS

In your Drupal project, navigate to the theme's directory (i.e. /web/themes/custom/)
Run the following command:

npm create vite@latest storybook

When prompted, select the framework of your choice; for us this is React.
When prompted, select the variant for your project; for us this is JavaScript.

After the setup finishes you will have a basic Vite project running.

Storybook

Be sure your system is running NodeJS version 18 or higher
Inside the newly created theme, run this command:

npx storybook@latest init --type react

After installation completes, you will have a new Storybook instance running
If Storybook didn't start on its own, start it by running:

npm run storybook

TwigJS

Twig templates are server-side templates which are normally rendered with TwigPHP to HTML by Drupal, but Storybook is a JS tool. TwigJS is the JS-equivalent of TwigPHP so that Storybook understands Twig. Let's install all dependencies needed for Storybook to work with Twig.

If Storybook is still running, press Ctrl + C to stop it
Then run the following command:

npm i -D vite-plugin-twig-drupal html-react-parser twig-drupal-filters @modyfi/vite-plugin-yaml

vite-plugin-twig-drupal: If you are using Vite like we are, this is a Vite plugin that handles transforming Twig files into a JavaScript function that can be used with Storybook. This plugin includes the following:
- Twig or TwigJS: This is the JavaScript implementation of the Twig PHP templating language. This allows Storybook to understand Twig.
  Note: TwigJS may not always be in sync with the version of Twig PHP in Drupal and you may run into issues when using certain Twig functions or filters. However, we are adding other extensions that may help with the incompatibility issues.
- drupal attribute: Adds the ability to work with Drupal attributes.
twig-drupal-filters: TwigJS implementation of Twig functions and filters.
html-react-parser: This extension is key for Storybook to parse HTML code into React elements.
@modyfi/vite-plugin-yaml: Transforms a YAML file into a JS object. This is useful for passing the component's data to React as args.

ViteJS configuration

Update your vite.config.js so it makes use of the new extensions we just installed and configures the namespaces for our components.

import { defineConfig } from "vite"
import yml from '@modyfi/vite-plugin-yaml';
import twig from 'vite-plugin-twig-drupal';
import { join } from "node:path"

export default defineConfig({
  plugins: [
    twig({
      namespaces: {
        components: join(__dirname, "./src/components"),
        // Other namespaces may be added.
      },
    }),
    // Allows Storybook to read data from YAML files.
    yml(),
  ],
})

Storybook configuration

Out of the box, Storybook comes with main.js and preview.js inside the .storybook directory. These two files are where a lot of Storybook's configuration is done. We are going to define the location of our components, the same location we used in vite.config.js above (we'll create this directory shortly). We are also going to do a quick config inside preview.js for handling Drupal filters.

Inside .storybook/main.js file, update the stories array as follows:

stories: [
  "../src/components/**/*.mdx",
  "../src/components/**/*.stories.@(js|jsx|mjs|ts|tsx)",
],
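
To get a feel for which files the second pattern picks up, here is a rough sketch that imitates it with a regular expression. The regex and the sample paths are assumptions made for illustration; Storybook does its own glob matching, and this is only an approximation of it.

```javascript
// Rough regex stand-in for "../src/components/**/*.stories.@(js|jsx|mjs|ts|tsx)".
// An approximation for illustration, not Storybook's own matcher.
const storyFile = /^src\/components\/(?:.+\/)?[^/]+\.stories\.(?:js|jsx|mjs|ts|tsx)$/;

// Hypothetical paths, relative to the theme root.
const candidates = [
  'src/components/title/title.stories.jsx',
  'src/components/title/title.twig',
  'src/components/card/card.stories.ts',
  'src/stories/Button.stories.jsx',
];

// Keep only the paths that look like story files inside src/components.
const matched = candidates.filter((path) => storyFile.test(path));
console.log(matched);
```

Only the two `.stories.*` files under src/components survive the filter. The Twig file and the story left in the old src/stories directory are ignored, which is why stories need to live under the components directory.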

Inside .storybook/preview.js, update it as follows:

/** @type { import('@storybook/react').Preview } */
import Twig from 'twig';
import drupalFilters from 'twig-drupal-filters';

function setupFilters(twig) {
  twig.cache();
  drupalFilters(twig);
  return twig;
}

setupFilters(Twig);

const preview = {
  parameters: {
    controls: {
      matchers: {
        color: /(background|color)$/i,
        date: /Date$/i,
      },
    },
  },
};

export default preview;

Creating the components directory

If Storybook is still running, press Ctrl + C to stop it
Inside the src directory, create the components directory. Alternatively, you could rename the existing stories directory to components.

Creating your first component

With the current system in place we can start building components. We'll start with a very simple component to try things out first.

Inside src/components, create a new directory called title
Inside the title directory, create the following files: title.yml and title.twig

Writing the code

Inside title.yml, add the following:

---
level: 2
modifier: 'title'
text: 'Welcome to your new Drupal theme with Storybook!'
url: 'https://mariohernandez.io'

Inside title.twig, add the following:

<h{{ level|default(2) }}{% if modifier %} class="{{ modifier }}"{% endif %}>
  {% if url %}
    <a href="{{ url }}">{{ text }}</a>
  {% else %}
    <span>{{ text }}</span>
  {% endif %}
</h{{ level|default(2) }}>

We have a simple title component that will print a title of anything you want. The level key allows us to change the heading level of the title (i.e. h1, h2, h3, etc.), the modifier key allows us to pass a modifier class to the component, and the url key is helpful when our title needs to be a link to another page or component.
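
To make the template's branching easier to follow, the sketch below restates the same logic as a plain JavaScript function. It is a hand-written stand-in for illustration only: the name renderTitle is made up, the real rendering is done by TwigJS, and this sketch does no escaping of its inputs.

```javascript
// Hand-written equivalent of title.twig, for illustration only.
// `renderTitle` is a hypothetical name; TwigJS does the real rendering.
function renderTitle({ level, modifier, text, url } = {}) {
  // Mirrors `level|default(2)`: fall back to 2 when level is missing or empty.
  const tag = `h${level || 2}`;
  // The class attribute is only printed when a modifier is passed.
  const classAttr = modifier ? ` class="${modifier}"` : '';
  // With a url the text becomes a link, otherwise it is wrapped in a span.
  const inner = url ? `<a href="${url}">${text}</a>` : `<span>${text}</span>`;
  return `<${tag}${classAttr}>${inner}</${tag}>`;
}

console.log(renderTitle({ level: 3, modifier: 'title', text: 'Hello', url: '/about' }));
console.log(renderTitle({ text: 'No link, default level' }));
```

The first call produces an h3 with a class and a link; the second falls back to an h2 with a plain span, which is the same pair of paths the Twig template takes.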

Currently the title component is not available in Storybook. Storybook uses a special file to display each component as a story; the file name is component-name.stories.jsx.

Inside title create a file called title.stories.jsx
Inside the stories file, add the following:

/**
 * First we import the `html-react-parser` extension to be able to
 * parse HTML into react.
 */
import parse from 'html-react-parser';

/**
 * Next we import the component's markup and logic (twig), data schema (yml),
 * as well as any styles or JS the component may use.
 */
import title from './title.twig';
import data from './title.yml';

/**
 * Next we define a default configuration for the component to use.
 * These settings will be inherited by all stories of the component,
 * shall the component have multiple variations.
 * `component` is an arbitrary name assigned to the default configuration.
 * `title` determines the location and name of the story in Storybook's sidebar.
 * `render` uses the parser extension to render the component's html to react.
 * `args` uses the variables defined in title.yml as react arguments.
 */
const component = {
  title: 'Components/Title',
  render: (args) => parse(title(args)),
  args: { ...data },
};

/**
 * Export the Title and render it in Storybook as a Story.
 * The `name` key allows you to assign a name to each story of the component.
 * For example: `Title`, `Title dark`, `Title light`, etc.
 */
export const TitleElement = {
  name: 'Title',
};

/**
 * Finally export the default object, `component`. Storybook/React requires this step.
 */
export default component;

If Storybook is running you should see the title story. Otherwise, start Storybook by running:

npm run storybook

With Storybook running, the title component should look like the image below:

The controls highlighted at the bottom of the title allow you to change the values of each of the fields for the title.

I wanted to start with the simplest of components, the title, to show how Storybook, with help from the extensions we installed, understands Twig. The good news is that the same approach we took with the title component works on even more complex components. Even the React code we wrote does not change much on large components.
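
As a hedged illustration of how variations build on the default configuration: Storybook layers a story's own args over the component-level args, so a variation only needs to state what differs. The sketch below imitates that layering with plain objects. TitleDark, the title--dark class, and the effectiveArgs helper are hypothetical names that do not appear in the code above.

```javascript
// Plain-object imitation of how story args layer over component args.
// Values mirror title.yml; `TitleDark` and `title--dark` are hypothetical.
const component = {
  title: 'Components/Title',
  args: {
    level: 2,
    modifier: 'title',
    text: 'Welcome to your new Drupal theme with Storybook!',
    url: 'https://mariohernandez.io',
  },
};

// A variation only lists the args it changes.
const TitleDark = {
  name: 'Title dark',
  args: { modifier: 'title title--dark' },
};

// Hypothetical helper showing the args a story ends up with.
function effectiveArgs(meta, story) {
  return { ...meta.args, ...story.args };
}

console.log(effectiveArgs(component, TitleDark));
```

The dark variation keeps the level, text, and url from the defaults and swaps only the modifier class, which is why adding more stories to a component stays cheap.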

In the next blog post, we will build more components that nest smaller components, and we will also add Drupal related parts and configuration to our theme so we can begin using the theme in a Drupal site. Finally, we will integrate the components we built in Storybook with Drupal so our content can be rendered using the component we're building. Stay tuned. For now, if you want to grab a copy of all the code in this post, you can do so below.

Download the code

In closing

Getting to this point was a team effort and I'd like to thank Chaz Chumley, a Senior Software Engineer, who did a lot of the configuration discussed in this post. In addition, I am thankful to the Emulsify and Gesso teams for letting us pick their brains during our research. Their help was critical in this process.

I hope this was helpful and if there is anything I can help you with in your journey of a Storybook-friendly Drupal theme, feel free to reach out.

Categories: FLOSS Project Planets

Mario Hernandez: Responsive images in Drupal - a series

Planet Drupal - Thu, 2024-09-12 00:48

Images are an essential part of a website. They enhance the appeal of the site and make the user experience a more pleasant one. The challenge is finding the balance between enhancing the look of your website through the use of images and not jeopardizing performance. In this guide, we'll dig deep into how to find that balance by going over knowledge, techniques and practices that will provide you with a solid understanding of the best way to serve images to your visitors using the latest technologies and taking advantage of the advances of web browsers in recent years.

Hi, I hope you are ready to dig into responsive images. This is a seven-part guide that will cover everything you need to know about responsive images and how to manage them in a Drupal site. Although the exercises in this guide are Drupal-specific, the core principles of responsive images apply to any platform you use to build your sites.

Where do we start?

Choosing Drupal as your CMS is a great place to start. Drupal has always been ahead of the game when it comes to managing images by providing features such as image compression, image styles, responsive image styles, and the media library, to mention a few. All these features, and more, come out of the box in Drupal. In fact, most of what we will cover in this guide will be solely out-of-the-box Drupal features. We may touch on third-party or contrib techniques or tools, but only to let you know what's available, not as a hard requirement for managing images in Drupal.

It is important to become well-versed with the tools available in Drupal for managing images. Only then will you be able to make the most of those tools. Don't worry though, this guide will provide you with a lot of knowledge about all the pieces that take part in building a solid system for managing and serving responsive images.

Let's start by breaking down the topics this guide will cover:

What are responsive images?

A responsive image is one whose dimensions adjust to changes in screen resolutions. The concept of responsive images is one that developers and designers have been struggling with ever since Ethan Marcotte published his famous blog post, Responsive Web Design, back in 2010, followed by his book of the same title. The concept itself is pretty straightforward: serve the right image to any device type based on various factors such as screen resolution, internet speed, device orientation, viewport size, and others. The technique for achieving this concept is not as easy. I can honestly say that over 10 years after responsive images were introduced, we are still trying to figure out the best way to render images that are responsive. Read more about responsive images.

So if the concept of responsive images is so simple, why don't we have one standard for effectively implementing it? Well, images are complicated. They bring with them all sorts of issues that can negatively impact a website if not properly handled. Some of these issues include resolution, file size or weight, file type, bandwidth demands, browser support, and more.

Some of these issues have been resolved by the fast internet speeds available nowadays, better browser support for file types such as webp, as well as excellent image compression technologies. However, there are still some issues that will probably never go away and that's what makes this topic so complicated. One issue in particular is using poorly compressed images that are extremely big in file size. Unfortunately, this is often at the hands of people who lack the knowledge to create images that are light in weight and properly compressed. So it's up to us, developers, to anticipate the problems and proactively address them.

Ways to improve image files for your website

If you are responsible for creating or working with images in an image editor such as Photoshop, Illustrator, GIMP, and others, you have great tools at your disposal to ensure your images are optimized and sized properly. You can play around with the image quality scale as you export your images and ensure they are not bigger than they need to be. There are many other tools that can help you with compression. One little tool I've been using for years is an app called ImageOptim, which lets you drop in your images and compresses them, reducing their file size.

Depending on your requirements and environment, you could also look at using different file types for your images. One highly recommended image type is webp. With the ability to do lossless and lossy compression, webp provides significant improvements in file sizes while still maintaining your images' high quality. The browser support for webp is excellent as it is supported by all major browsers, but do some research before you start using it as there are some hosting platforms that do not support webp.

To give you an example of how good webp is, the image in the header of this blog post was originally exported from Photoshop as a .JPG, which resulted in a 317KB file size. This is not bad at all, but then I ran the image through the ImageOptim app and the file size was reduced to 120KB. That's a 62% file size reduction. Then I exported the same image from Photoshop but this time in .webp format and the file size became 93KB. That's 71% in file size reduction compared to the original JPG version.
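
The percentages quoted above can be checked with a few lines of arithmetic. This is just a small sketch; the function name percentSaved is made up for the example.

```javascript
// Percentage saved relative to the original file size, rounded to a whole number.
function percentSaved(originalKB, optimizedKB) {
  return Math.round(((originalKB - optimizedKB) / originalKB) * 100);
}

console.log(percentSaved(317, 120)); // JPG after ImageOptim
console.log(percentSaved(317, 93)); // webp export from Photoshop
```

Both results are measured against the original 317KB JPG, which is how the post arrives at 62% and 71%.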

A must have CSS rule in your project

By now it should be clear that the goal for serving images on any website is doing it by using the responsive images approach. The way you implement responsive images on your site may vary depending on your platform, available tools, and skillset. Regardless, the following CSS rule should always be available within your project base CSS styles and should apply to all images on your site:

img { display: block; max-width: 100%; }

Easy right? That's it, we're done 😃

The CSS rule above will in fact make your images responsive (images will automatically adapt to the width of their containers/viewport). This rule should be added to your website's base styles so every image in your website becomes responsive by default. However, this should not be the extent of your responsive images solution. Although your images will be responsive with the CSS rule above, this does not address image compression or optimization, and this will result in performance issues if you are dealing with extremely large file sizes. Take a look at this example where the rule above is being used. Resize your browser to any width, including super small to simulate a mobile device. Notice how the image automatically adapts to the width of the browser. Here's the problem though: the image in this example measures 5760x3840 pixels and it weighs 6.7 MB. This means, even if your browser width is super narrow, and the image is resized to a very small visual size, you are still loading an image that is 6.7 MB in weight. No good 👎
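
To put a rough number on that waste, the sketch below estimates how many of the source pixels a narrow screen actually needs. The 375px container width and the pixel ratio of 2 are assumptions chosen to resemble a typical phone; they are not figures from the example.

```javascript
// Source image dimensions quoted in the post.
const source = { width: 5760, height: 3840 };

// Assumed display conditions (not from the post): a 375 CSS px wide
// container on a screen with a device pixel ratio of 2.
const containerWidth = 375;
const pixelRatio = 2;

// Pixels needed to fill that container sharply, keeping the aspect ratio.
const neededWidth = containerWidth * pixelRatio;
const neededHeight = Math.round(neededWidth * (source.height / source.width));

// Share of the source pixels that the screen can actually show.
const usedShare = (neededWidth * neededHeight) / (source.width * source.height);

console.log(`${neededWidth}x${neededHeight}`);
console.log(`${(usedShare * 100).toFixed(1)}% of the source pixels`);
```

Under those assumptions the phone needs an image of about 750x500 pixels, which is less than 2% of the pixels in the file it is forced to download.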

In the next post of this series, we will begin the process of implementing a solution for handling responsive images the right way.

Navigate posts within this series

Art Direction using the <picture> HTML element

Oliver Davies' daily list: When did you last deploy to production?

Planet Drupal - Wed, 2024-09-11 20:00

If you've experienced issues or are worried about deploying changes to production, on a Friday or another day, when did you last deploy something?

Can you make deployments smaller and more frequent?

Deploying regularly makes each deployment less risky and having a smaller changeset makes it easier to find and fix any issues that arise.

I'm much happier deploying to production if I've already done so that day, or at least that week.

If it has been longer than that, or if the changeset is large, issues become more likely and take longer to resolve.

Python⇒Speed: It's time to stop using Python 3.8

Planet Python - Wed, 2024-09-11 20:00

Upgrading to new software versions is work, and work that doesn’t benefit your software’s users. Users care about features and bug fixes, not how up-to-date you are.

So it’s perhaps not surprising how many people still use Python 3.8. As of September 2024, about 14% of packages downloaded from PyPI were for Python 3.8. This includes automated downloads as part of CI runs, so it doesn’t mean 3.8 is used in 14% of applications, but that’s still 250 million packages installed in a single day!

Still, there is only so much time you can delay upgrading, and for Python 3.8, the time to upgrade is as soon as possible. Python 3.8 is reaching its end of life at the end of October 2024.

No more bug fixes.

No more security fixes.

“He’s dead, Jim.”

Still not convinced? Let’s see why you want to upgrade.

Python⇒Speed: When should you upgrade to Python 3.13?

Planet Python - Wed, 2024-09-11 20:00

Python 3.13 will be out October 1, 2024—but should you switch to it immediately? And if you shouldn’t upgrade just yet, when should you?

Immediately after the release, you probably don't want to upgrade just yet. But from December 2024 onwards, upgrading is definitely worth trying, though it may not succeed. To understand why, we need to consider Python packaging and the software development process, and take a look at the history of past releases.

Plasma Wayland Protocols 1.14.0

Planet KDE - Wed, 2024-09-11 20:00

Plasma Wayland Protocols 1.14.0 is now available for packaging.

This adds features needed for the Plasma 6.2 beta.

URL: https://download.kde.org/stable/plasma-wayland-protocols/
SHA256: 1a4385ecfc79f7589f07381cab11c3ff51f6e2fa4b73b78600d6ad096394bf81
Signed by: E0A3EB202F8E57528E13E72FD7574483BB57B18D Jonathan Riddell jr@jriddell.org
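
For packagers who script their downloads, a minimal sketch of checking a tarball against the published SHA256 might look like the following. It is written as a CommonJS Node script, the helper names are made up, and the tarball file name in the usage comment is an assumption rather than something stated in the announcement. It checks the checksum only and does not verify the signature.

```javascript
// CommonJS sketch: compare a file's SHA-256 digest with a published value.
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');

// Hex-encoded SHA-256 of a Buffer or string.
function sha256Hex(data) {
  return createHash('sha256').update(data).digest('hex');
}

// True when the file at `path` hashes to `expectedHex` (case-insensitive).
function matchesChecksum(path, expectedHex) {
  return sha256Hex(readFileSync(path)) === expectedHex.toLowerCase();
}

// Usage (the tarball file name is an assumption, not from the announcement):
// matchesChecksum(
//   'plasma-wayland-protocols-1.14.0.tar.xz',
//   '1a4385ecfc79f7589f07381cab11c3ff51f6e2fa4b73b78600d6ad096394bf81'
// );
```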

Full changelog:

add a protocol for externally controlled display brightness
output device: add support for brightness in SDR mode
plasma-window: add client geometry + bump to v18
Add warnings discouraging third party clients using internal desktop environment protocols

Akademy 2024 - The Akademy of Many Changes

Planet KDE - Wed, 2024-09-11 20:00

Akademy 2024 group photo.

This year's Akademy in Würzburg, Germany was all about resetting priorities, refocusing goals, and combining individual projects into a final result greater than the sum of its parts.

A shift (or, more accurately, a broadening of interest) in the KDE community has been gradually emerging over the past few years, and reached a new peak at this year's event. The conference largely focused on delivering high quality, cutting-edge Free Software to end users. However, the keynote "Only Hackers will Survive" by tech activist and environmentalist Joanna Murzyn, and Joseph De Veaugh-Geiss' "Opt In? Opt Out? Opt Green!" talk, took attendees down a left turn by addressing growing concerns about the impact of IT on the environment and discussing how Free Software can help curb CO2 emissions.

Joanna Murzyn explains the impact irresponsible IT developments have on the environment.

KDE has always had a social conscience. The community's vision and mission of providing users with the tools to control their digital lives and protect their privacy is now combined with concern for the preservation of the planet we live on.

But KDE can do more than one thing at a time, and our software is also undergoing profound changes under the hood. Joshua Goins, for example, is working on ways to enable framework, Plasma and application development in Rust. According to Joshua, Rust has many advantages, such as its memory safety capabilities, while adding a Qt front-end to Rust projects is much easier than many people believe.

This is in line with one of the new goals adopted by the community during this Akademy: Nicolas Fella, Plasma contributor and key developer of KDE's all-important frameworks, will champion "Streamlined Application Development Experience", a goal aimed at making it easier for new developers to access KDE technologies.

And onboarding new developers is what the "KDE Needs You! 🫵" goal is about. While the growing popularity of KDE software is great, growth puts more stress on the teams that create the applications, environments, and underlying engines and frameworks. Add to that the fact that the veteran developers are getting gray around the temples, and you need a constant influx of new contributors to keep the community and its projects running. The "KDE Needs You! 🫵" goal aims to address this challenge and is being spearheaded by members of the Promo and Mentoring teams. The champions aim to formalize and strengthen KDE's processes for recruiting active contributors, and to make recruiting active contributors to projects a priority and an ongoing task for the community.

These two goals aim to benefit the community and projects in general. KDE's third goal, "We care about your input", will build on their work: proposed by Jakob Petsovits and Gernot Schiller, "input" in this case refers to "input from devices". The goal addresses the fact that there are still a lot of languages and hardware that are not optimally supported by Plasma and KDE applications, and with the move to Wayland, some devices have even temporarily lost a measure of support they enjoyed on X11. Jakob, Gernot, and their team of supporters plan to solve this problem methodically, working to make KDE's software work smoothly and effortlessly on drawing tablets, accessibility devices, and game controllers, as well as software virtual keyboards and input methods for users of the Chinese, Japanese, and Korean languages.

One way to achieve the desired level of integration is to control the entire software stack, right down to the operating system. This is also in the works for KDE, as Harald Sitter is working on a new technologically advanced operating system tentatively named "KDE Linux". This operating system aims to break free of the constraints currently limiting KDE's existing Neon OS and offer a superior experience for KDE's developers, enthusiast users, and everyday users.

KDE Linux's base system will be immutable, meaning that nothing will be able to change critical parts of the system, such as the /etc, /usr, and /bin directories. User applications will be installed via self-contained packages, such as Flatpaks and Snaps. Adventurous users and developers will be able to overlay anything they want on top of the base system in a non-destructive and reversible way, without ever having to touch the core and risk not-easily-fixable breakage.

This will help provide users with a solid, stable, and secure environment, without sacrificing the ability to run the latest and greatest KDE software.

As the proof of the pudding is in the eating, Harald surprised the audience when he revealed towards the end of his talk that his entire presentation had been delivered using KDE Linux!

The user-facing side of KDE is also changing with the work of Arjen Hiemstra and Andy Betts, and the Visual Design Group at large. Arjen is working on Union, a new theming engine that will eventually replace the different ways of styling in KDE. Up until now, developers and designers have had to deal with multiple ways of doing styling, some of which are quite difficult to use. Union, as the name implies, will end the fragmentation and provide something that is both easier for developers to maintain and more flexible for designers to interact with.

And then Andy Betts told us about Plasma Next, while introducing the audience to the concept of Design Systems: management systems that allow all aspects of large collections of icons, components, and other graphical assets to be controlled. Using design systems, the VDG is developing a new look for Plasma and KDE applications that may very well replace KDE's current Breeze theme, and be consistently applied across all KDE apps and Plasma, to boot!

A first look at Dolphin rocking Plasma Next icons.

This means that in the not too distant future, KDE users will be able to enjoy a rock-solid and elegant operating system with a brilliant look to match!

In other Akademy news...

Kevin Ottens added another KDE success story to the list, telling us how a wine producing company in Australia has been using hundreds of desktops running KDE Plasma for more than 10 years.
Natalie Clarius explained how she is working on adapting Plasma to work better on our vendor partners' hardware products.
In the "Openwashing" panel moderated by Markus Felner, Cornelius Schumacher of KDE, Holger Dyroff of OwnCloud, Richard Heigl of HalloWelt! and Leonhard Kugler of OpenCode took on companies that call themselves "open source" but are anything but.
David Schlanger shared the harsh truth about AI, his wishful positive vision for the technology, and then faced questions and comments from Lydia Pintscher and Eike Hein in the keynote session on day 2.
Ben Cooksley, Volker Kraus, Hannah von Reth, and Julius Künzel took a deep dive into the frameworks, services, and utilities that help KDE projects get their products to users quickly and on a wide variety of platforms.

Akademy Awards

The prestigious KDE Awards were given to:

Friedrich W.H. Kossebau for his work on the Okteta hex editor.
Albert Astals Cid for his work on the Qt patch collection, KDE Gear release maintenance, i18n, and many, many other things.
Nicolas Fella for his work on KDE Frameworks and Plasma.
As is traditional, the Akademy organizing team was awarded for putting on a fun, interesting and safe event for the entire KDE community.

If you would like to see the talks as they happened, the unedited videos are currently available on YouTube. Cut and edited versions with slides will be available soon on both YouTube and PeerTube.

KDE Gear 24.08.1

Planet KDE - Wed, 2024-09-11 20:00

Over 180 individual programs plus dozens of programmer libraries and feature plugins are released simultaneously as part of KDE Gear.

Today they all get new bugfix source releases with updated translations, including:

filelight: Update the sidebar when deleting something from the context menu (Commit, fixes bug #492155)
kclock: Fix timer circle alignment (Commit, fixes bug #481170)
kwalletmanager: Start properly when invoked from the menu on Wayland (Commit, fixes bug #492138)

Distro and app store packagers should update their application packages.

24.08 release notes for information on tarballs and known issues.
Package download wiki page
24.08.1 source info page
24.08.1 full changelog

Copyright law makes a case for requiring data information rather than open datasets for Open Source AI

Open Source Initiative - Wed, 2024-09-11 16:04

The Open Source Initiative (OSI) is running a blog series to introduce some of the people who have been actively involved in the Open Source AI Definition (OSAID) co-design process. The co-design methodology allows for the integration of diverging perspectives into one just, cohesive and feasible standard. Support and contribution from a significant and broad group of stakeholders is imperative to the Open Source process and is proven to bring diverse issues to light, deliver swift outputs and garner community buy-in.

This series features the voices of the volunteers who have helped shape and are shaping the Definition.

Meet Felix Reda

Photo Credit: CC-by 4.0 International Volker Conradus volkerconradus.com.

Felix Reda (he/they) has been an active contributor to the Open Source AI Definition (OSAID) co-design process, bringing his personal interest and expertise in copyright reform to the online forums. Working in digital policy for over ten years, including serving as a member of the European Parliament from 2014 to 2019 and working with the strategic litigation NGO Gesellschaft für Freiheitsrechte (GFF), Felix is currently the director of developer policy at GitHub. He is also an affiliate of the Berkman Klein Center for Internet and Society at Harvard and serves on the board of the Open Knowledge Foundation Germany. He holds an M.A. in political science and communications science from the University of Mainz, Germany.

Data information as a viable alternative

Note: The original text was contributed by Felix Reda to the discussions happening on the Open Source AI forum as a response to Stefano Maffulli’s post on how the draft Open Source AI Definition arrived at its current state, the design principles behind the data information concept and the constraints (legal and technical) it operates under.

When we look at applying Open Source principles to the subject of AI, copyright law comes into play, especially for the topic of training data access. Open datasets have been a continuous discussion point in the collaborative process of writing the Open Source AI Definition. I would like to explain why the concept of data information is a viable alternative for the purposes of the OSAID.

The definition of Open Source software has an access element and a legal element – the access element being the availability of the source code and the legal element being a license rooted in the copyright-protection given to software. The underlying assumption is that the entity making software available as Open Source is the rights holder in the software and is therefore entitled to make the source code available without infringing the copyright of a third party, and to license it for re-use. To the extent that third-party copyright-protected material is incorporated into the Open Source software, it must itself be released under a compatible Open Source license that also allows the redistribution.

When it comes to AI, the situation is fundamentally different: The assumption that an Open Source AI model will only be trained on copyright-protected material that the developer is entitled to redistribute does not hold. Different copyright regimes around the world, including the EU, Japan and Singapore, have statutory exceptions that explicitly allow text and data mining for the purposes of AI training. The EU text and data mining exceptions, which I know best, were introduced with the objective of facilitating the development of AI and other automated analytical techniques. However, they only allow the reproduction of copyright-protected works (aka copying), but not the making available of those works (aka posting them on the internet).

That means that an Open Source AI definition that would require the republication of the complete dataset in order for an AI model to qualify as Open Source would categorically exclude Open Source AI models from the ability to rely on the text and data mining exceptions in copyright – that is despite the fact that the legislator explicitly decided that under certain circumstances (for example allowing rights holders to declare a machine-readable opt-out from training outside of the context of scientific research) the use of copyright-protected material for the purposes of training AI models should be legal. This result would be particularly counterproductive because it would even render Open Source AI models illegal in situations where the reproducibility of the dataset would be complete by the standards discussed on the OSAID forum.

Examples

Imagine an AI model that was trained on publicly accessible text on the internet that was version-controlled, for which the rights holder had not declared an opt-out, but which the rights holder had also not put under a permissive license (all rights reserved). Using this text as training data for an AI model would be legal under copyright law, but re-publishing the training dataset would be illegal. Publishing information about the training dataset that included the version of the data that was used, when and how it was retrieved from which website, and how it was tokenized would meet the requirements of the OSAID v 0.0.8 if (and only if) it put a skilled person in the position to build their own dataset to recreate an equivalent system.

Neither the developer of the original Open Source AI model nor the skilled person recreating it would violate copyright law in the process, unlike the scenario that required publication of the dataset. Including a requirement in the OSAID to publish the data, in which the AI developer typically does not hold the copyright, would have little added benefit but would drastically reduce the material that could be used for training, despite the existence of explicit legal permissions to use that content for AI training. I don’t think that would be wise.

The international concern of public domain

While I support the creation of public domain datasets that can be republished without restrictions, I would like to caution against pointing to these efforts as a solution to the problem of copyright in training datasets. Public domain status is not harmonized internationally – what is in the public domain in one jurisdiction is routinely protected by copyright in other parts of the world. For example, in US discourse it is often assumed that works generated by US government employees are in the public domain. They are not, they are only in the public domain in the US, while they are copyright-protected in other jurisdictions.

The same goes for works in which copyright has expired: Although the Berne Convention allows signatory countries to limit the copyright term on works until protection in the work’s country of origin has expired, exceptions to this rule are permitted. For example, although the first incarnation of Mickey Mouse has recently entered the public domain in the US, it is still protected by copyright in Germany due to an obscure bilateral copyright treaty between the US and Germany from 1892. Copyright protection is not conditional on registration of a work, and no even remotely comprehensive, reliable rights information on the copyright status of works exists. Good luck to an Open Source AI developer who tried to stay on top of all of these legal pitfalls.

Bottom line

There are solid legal permissions for using copyright-protected works for AI training (reproductions). There are no equivalent legal permissions for incorporating copyright-protected works into publishable datasets (making available). What an Open Source AI developer thinks is in the public domain and therefore publishable in an open dataset regularly turns out to be copyright-protected after all, at least in some jurisdictions.

Unlike reproductions, which only need to follow the copyright law of the country in which the reproduction takes place, making content available online needs to be legal in all jurisdictions from which the content can be accessed. If the OSAID required the publication of the dataset, this would routinely lead to situations where Open Source AI models could not be made accessible across national borders, thus impeding their collaborative improvement, one of the great strengths of Open Source. I doubt that with such a restrictive definition, Open Source AI would gain any practical significance. Tragically, the text and data mining exceptions that were designed to facilitate research collaboration and innovation across borders, would only support proprietary AI models, while excluding Open Source AI. The concept of data information will help us avoid that pitfall while staying true to Open Source principles.

How to get involved

The OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:

Join the forum: share your comment on the drafts.
Leave a comment on the latest draft: provide precise feedback on the text of the latest draft.
Follow the weekly recaps: subscribe to our monthly newsletter and blog to be kept up-to-date.
Join the town hall meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions and share your thoughts.
Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.

Categories: FLOSS Research

Glyph Lefkowitz: Python macOS Framework Builds

Planet Python - Wed, 2024-09-11 15:43

When you build Python, you can pass various options to ./configure that change aspects of how it is built. There is documentation for all of these options, and they are things like --prefix to tell the build where to install itself, --without-pymalloc if you have some esoteric need for everything to go through a custom memory allocator, or --with-pydebug.

One of these options only matters on macOS, and its effects are generally poorly understood. The official documentation just says “Create a Python.framework rather than a traditional Unix install.” But… do you need a Python.framework? If you’re used to running Python on Linux, then a “traditional Unix install” might sound pretty good; more consistent with what you are used to.
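If you are not sure which kind of build a given interpreter is, you can ask it. The following is a minimal sketch, assuming CPython's build-time configuration variable PYTHONFRAMEWORK, which holds the framework name on Framework builds and is empty (or missing) otherwise. The helper name is_framework_build is made up for this illustration.

```python
import sysconfig


def is_framework_build() -> bool:
    """Return True if this interpreter was built as a macOS Framework."""
    # PYTHONFRAMEWORK is "Python" (or a custom name) on Framework builds,
    # and an empty string or None on traditional Unix-style installs.
    return bool(sysconfig.get_config_var("PYTHONFRAMEWORK"))


print("Framework build" if is_framework_build() else "non-Framework build")
```

On Linux this always reports a non-Framework build, which is expected, since the option only matters on macOS.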

If you use a non-Framework build, most stuff seems to work, so why should anyone care? I have mentioned it as a detail in my previous post about Python on macOS, but even I didn’t really explain why you’d want it, just that it was generally desirable.

The traditional answer to this question is that you need a Framework build “if you want to use a GUI”, but this is demonstrably not true. At first it might not seem so, since the go-to Python GUI test is “run IDLE”; many non-Framework builds also omit Tkinter because they don’t ship a Tk dependency, so IDLE won’t start. But other GUI libraries work fine. For example, uv tool install runsnakerun / runsnake will happily pop open a GUI window, Framework build or not. So it bears some explaining.

Wait, what is a “Framework” anyway?

Let’s back up and review an important detail of the mac platform.

On macOS, GUI applications are not just an executable file, they are organized into a bundle, which is a directory with a particular layout, that includes metadata, that launches an executable. A thing that, on Linux, might live in a combination of /bin/foo for its executable and /share/foo/ for its associated data files, is instead on macOS bundled together into Foo.app, and those components live in specified locations within that directory.

A framework is also a bundle, but one that contains a library. Since they are directories, Applications can contain their own Frameworks and Frameworks can contain helper Applications. If /Applications is roughly equivalent to the Unix /bin, then /Library/Frameworks is roughly equivalent to the Unix /lib.

App bundles are contained in a directory with a .app suffix, and frameworks are a directory with a .framework suffix.

So what do you need a Framework for in Python?

The truth about Framework builds is that there is not really one specific thing that you can point to that works or doesn’t work, where you “need” or “don’t need” a Framework build. I was not able to quickly construct an example that trivially fails in a non-framework context for this post, but I didn’t try that many different things, and there are a lot of different things that might fail.

The biggest issue is not actually the Python.framework itself. The metadata on the framework is not used for much outside of a build or linker context. However, Python’s Framework builds also ship with a stub application bundle, which places your Python process into a normal application(-ish) execution context all the time, which allows for various platform APIs like [NSBundle mainBundle] to behave in the normal, predictable ways that all of the numerous, various frameworks included on Apple platforms expect.

Various Apple platform features might want to ask a process questions like “what is your unique bundle identifier?” or “what entitlements are you authorized to access” and even beginning to answer those questions requires information stored in the application’s bundle.

Python does not ship with a wrapper around the core macOS “cocoa” API itself, but we can use pyobjc to interrogate this. After installing pyobjc-framework-cocoa, I can do this

>>> import AppKit
>>> AppKit.NSBundle.mainBundle()

On a non-Framework build, it might look like this:

NSBundle </Users/glyph/example/.venv/bin> (loaded)

But on a Framework build (even in a venv in a similar location), it might look like this:

NSBundle </Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app> (loaded)
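The difference between those two paths can be made explicit without any macOS APIs. This is only a sketch based on the naming rule described earlier (bundles are directories with a .app or .framework suffix); the function name enclosing_bundles is hypothetical, and it inspects path names only, not the metadata a real bundle carries.

```python
from pathlib import PurePosixPath

# Directory suffixes that mark a bundle, per the naming rule above.
BUNDLE_SUFFIXES = {".app", ".framework"}


def enclosing_bundles(path: str) -> list[str]:
    """List bundle directories containing `path`, innermost first."""
    p = PurePosixPath(path)
    # Check the path itself and then each of its ancestors in turn.
    return [str(c) for c in (p, *p.parents) if c.suffix in BUNDLE_SUFFIXES]


venv_bundle = "/Users/glyph/example/.venv/bin"
framework_bundle = (
    "/Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app"
)

print(enclosing_bundles(venv_bundle))       # no bundle at all
print(enclosing_bundles(framework_bundle))  # Python.app inside Python.framework
```

The first path is a plain directory, so there is nowhere for bundle metadata to live; the second is the stub application nested inside the framework.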

This is why, at various points in the past, GUI access required a framework build, since connections to the window server would just be rejected for Unix-style executables. But that was an annoying restriction, so it was removed at some point, or at least, the behavior was changed. As far as I can tell, this change was not documented. But other things like user notifications or geolocation might need to identify an application for preferences or permissions purposes, respectively. Even something as basic as “what is your app icon” for what to show in alert dialogs is information contained in the bundle. So if you use a library that wants to make use of any of these features, it might work, or it might behave oddly, or it might silently fail in an undocumented way.

This might seem like undocumented, unnecessary cruft, but it is that way because it’s just basic stuff the platform expects to be there for a lot of different features of the platform.

/etc/ builds

Still, this might seem like a strangely vague description of this feature, so it might be helpful to examine it by a metaphor to something you are more familiar with. If you’re familiar with more Unix style application development, consider a junior developer — let’s call him Jim — asking you if they should use an “/etc build” or not as a basis for their Docker containers.

What is an “/etc build”? Well, base images like ubuntu come with a bunch of files in /etc, and Jim just doesn’t see the point of any of them, so he likes to delete everything in /etc just to make things simpler. It seems to work so far. More experienced Unix engineers that he has asked react negatively and make a face when he tells them this, and seem to think that things will break. But their app seems to work fine, and none of these engineers can demonstrate some simple function breaking, so what’s the problem?

Off the top of your head, can you list every feature that depends on some file in /etc? Why not? Jim thinks it’s weird that all this stuff is undocumented, and it must just be unnecessary cruft.

If Jim were to come back to you later with a problem like “it seems like hostname resolution doesn’t work sometimes” or “ls says all my files are owned by 1001 rather than the user name I specified in my Dockerfile” you’d probably say “please, put /etc back, I don’t know exactly what file you need but lots of things just expect it to be there”.
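Jim's second symptom is easy to reproduce in miniature. This is a sketch for Unix-like systems only, using the standard pwd module; the helper name owner_name is invented here. On a typical system the user database is backed by /etc/passwd, so with that file gone the lookup has nothing to find.

```python
import os
import pwd


def owner_name(uid: int) -> str:
    """Return the user name for `uid`, or the bare number if unresolvable."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No entry in the user database: all that is left is the number,
        # which is why a listing shows "1001" instead of a name.
        return str(uid)


print(owner_name(os.getuid()))
```

Nothing in this code mentions /etc, which is rather the point: the dependency is implicit, and it only shows up once the files are missing.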

This is what a framework vs. a non-Framework build is like. A Framework build just includes all the pieces of the build that the macOS platform expects to be there. What pieces do what features need? It depends. It changes over time. And the stub that Python’s Framework builds include may not be sufficient for some more esoteric stuff anyway. For example, if you want to use a feature that needs a bundle that has been signed with custom entitlements to access something specific, like the virtualization API, you might need to build your own app bundle. To extend our analogy with Jim, the fact that /etc exists and has the default files in it won’t always be sufficient; sometimes you have to add more files to /etc, with quite specific contents, for some features to work properly. But “don’t get rid of /etc (or your application bundle)” is pretty good advice.

Do you ever want a non-Framework build?

macOS does have a Unix subsystem, and many Unix-y things work, for Unix-y tasks. If you are developing a web application that mostly runs on Linux anyway and never care about using any features that touch the macOS-specific parts of your mac, then you probably don’t have to care all that much about Framework builds. You’re not going to be surprised one day by non-framework builds suddenly being unable to use some basic Unix facility like sockets or files. As long as you are aware of these limitations, it’s fine to install non-Framework builds. I have a dozen or so Pythons on my computer at any given time, and many of them are not Framework builds.

Framework builds do have some small drawbacks. They tend to be larger, they can be a bit more annoying to relocate, they typically want to live in a location like /Library or ~/Library. You can move Python.framework into an application bundle according to certain rules, as any bundling tool for macOS will have to do, but it might not work in random filesystem locations. This may make managing a really large number of Python versions more annoying.

Most of all, the main reason to use a non-Framework build is if you are building a tool that manages a fleet of Python installations to perform some automation that needs to know about Python installs, and you want to write one simple tool that does stuff on Linux and on macOS. If you know you don’t need any platform-specific features, don’t want to spend the (not insignificant!) effort to cover those edge cases, and you get a lot of value from that level of consistency (for example, a teaching environment or interdisciplinary development team with a lot of platform diversity) then a non-framework build might be a better option.

Why do I care?

Personally, I think it’s important for Framework builds to be the default for most users, because I think that as much stuff should work out of the box as possible. Any user who sees a neat library that lets them get control of some chunk of data stored on their mac - map data, health data, game center high scores, whatever it is - should be empowered to call into those APIs and deal with that data for themselves.

Apple already makes it hard enough with their thicket of code-signing and notarization requirements for distributing software, aggressive privacy restrictions which prevent API access to some of this data in the first place, all these weird Unix-but-not-Unix filesystem layout idioms, sandboxing that restricts access to various features, and the use of esoteric abstractions like mach ports for communications behind the scenes. We don't need to make it even harder by making the way that you install your Python be a surprise gotcha variable that determines whether or not you can use an API like “show me a user notification when my data analysis is done” or “don’t do a power-hungry data analysis when I’m on battery power”, especially if it kinda-sorta works most of the time, but only fails on certain patch-releases of certain versions of the operating system, because an implementation detail of a proprietary framework changed in the meanwhile to require an application bundle where it didn’t before, or vice versa.

More generally, I think that we should care about empowering users with local computation and platform access on all platforms, Linux and Windows included. This just happens to be one particular quirk of how native platform integration works on macOS specifically.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. For this one, thanks especially to long-time patron Hynek who requested it specifically. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how can we set up our Mac developers’ laptops with Python”.

Categories: FLOSS Project Planets

ImageX: DrupalCon Barcelona 2024: Top Session Picks from Our Team

Planet Drupal - Wed, 2024-09-11 15:24

Authored by Nadiia Nykolaichuk.

DrupalCon 2024 is returning to one of the world’s most enchanting cities — Barcelona! As the event draws near, Drupal enthusiasts from around the globe are tapping out the rhythm of Spanish flamenco with their feet in anticipation. Now is the perfect time to explore the conference’s program and select the sessions that will inspire and invigorate you.

Categories: FLOSS Project Planets

FSF Events: Pick up some Sourceware infrastructure tips and tricks with Ian Kelling at GNU Cauldron in Prague on September 16

GNU Planet! - Wed, 2024-09-11 14:20

Categories: FLOSS Project Planets

A Tale of Wine Labels and Open Source Contributions

Planet KDE - Wed, 2024-09-11 11:08

At Akademy 2024, during my talk KDE to Make Wines I promised there would be a companion blog post focusing more on the technical details of what we did. This is the article in question.

This is a piece I also wrote for the enioka blog, so there is a French version available.

Where we set the stage

After years of working in the service industry, one thing which doesn’t cease to amaze me is the variety of needs our customers have and how we can still be surprised by them. They sometimes lead us in unexpected directions. This is obviously something I particularly like.

Today I’ll tell you the tale of a nice relationship enioka Haute Couture built with a customer down under.

It all started with a mail on a KDE mailing list a couple of years ago. People were looking for help. It was spotted by my colleague Benjamin Port who reached out. It turned out those people were from De Bortoli Wines, an Australian winemaking company. They are known for using Free Software quite a bit and contributing when they can. They even got interviewed on KDE’s Dot ten years ago!

Not much really happened after our first contact but we kept the communication open… Until last year when they reached out to us for some help with Okular, the universal document viewer made by KDE.

They wanted to get rid of Acrobat Reader for Linux in favor of Okular. They had one issue though, the overprint preview support was visibly broken in Okular. This is essential when interacting with a reprographics company which will send back PDFs based on how they layer colors (or overprinting). You need the overprint preview to simulate what the final rendering will be. We were of course motivated to help them get rid of Acrobat Reader for Linux since it is an old and stale piece of proprietary software.

Getting serious about PDF rendering

This looked at first like an easy task. Poppler (used for rendering PDFs) was exposing some API for the overprinting preview but Okular didn’t make use of it. We quickly made a patch to use the overprint preview API.

Alas, doing so uncovered another issue. As soon as we turned on the overprinting preview support the application would crash. We tracked it one level down. Somehow the crash was hiding in the Poppler-Qt bindings.

After further exploration, it was due to the binding wrongly determining the row size of the raster images to generate. There’s a color space conversion occurring between the initial memory representation and the target raster image. The code was getting this row size before the transform occurred… and then ended up stuffing the wrong amount of data in the target raster. This couldn’t go well. Another patch was thus produced to address this.
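To see why the ordering matters, here is a small arithmetic sketch. It is not the Poppler-Qt code: the pixel sizes (8 bytes per pixel before the conversion, 4 after) and the 4-byte row alignment are assumptions picked for illustration, and row_size is a hypothetical helper.

```python
def row_size(width_px: int, bytes_per_pixel: int, alignment: int = 4) -> int:
    """Bytes needed for one image row, padded up to a multiple of `alignment`."""
    raw = width_px * bytes_per_pixel
    return (raw + alignment - 1) // alignment * alignment


WIDTH = 101
SOURCE_BPP = 8  # assumed: multi-channel buffer used while previewing overprint
TARGET_BPP = 4  # assumed: 32-bit RGB raster handed to the application

row_before_conversion = row_size(WIDTH, SOURCE_BPP)
row_after_conversion = row_size(WIDTH, TARGET_BPP)

# Using the "before" value to fill the target raster writes twice as many
# bytes per row as the target actually has room for.
print(row_before_conversion, row_after_conversion)
```

With these assumed numbers each row would overrun its destination by a factor of two, the kind of mismatch that ends in a crash rather than a subtly wrong picture.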

The good news is that we managed to fix the issue in less time than the budget the customer allocated to us. So we gave them the choice between stopping here or using the remaining budget to address something else.

We used the Ghent Workgroup PDF Output suite to validate our work beyond the samples provided by our contact. While doing so we noticed Poppler was failing at properly rendering some other cases. So we proposed to investigate those as well.

After spending some time on those… we made tentative fixes but unfortunately they led to regressions. So in agreement with the customer, we wrapped up and created a detailed bug report instead, so as not to waste their budget. This helped the Poppler community figure out the problem and produce a fix. That’s when we realized we had come really close to the right fix at some point. Clearly an expert view on how the various PDF color spaces work was required so it was a good call to create the detailed report.

With some budget still left, our contact proposed that we also bring the overprint preview support to Okular printing. This was initially left out of scope but is necessary when you want to print your preview on a regular laser printer. This required adjusting Okular and adjusting Poppler-Qt once more. It was ultimately done within budget.

CIFS mount woes

Since the customer was satisfied with the work, they came back for more. We set up a budget line for them to come up with issues to fix throughout the year.

Around that time, their focus moved more to CIFS mounts which they use extensively for their remote office branches. As active users of that kernel feature, they encounter issues in user facing software that you would otherwise not suspect.

File copy failures

They were affected by a bug preventing Okular from saving on CIFS mounts. It is one of those bugs that had been lingering for a bit more than a year without a solution in sight. Some applications could modify and save a file opened on a CIFS mount but somehow not Okular.

It turned out to be due to some code in KIO itself (the KDE Frameworks API used for network transparent file operations) interacting in an unwanted way with CIFS mounts.

Indeed, the behavior of unlink() (file deletion) on CIFS mounts can be a bit “interesting”. If the file one tries to delete is opened by another process then the operation is claimed to have succeeded but the filename is still visible in the file hierarchy until the last handle is closed. This is unlike the usual UNIX behavior, outside of CIFS mounts the file wouldn’t be visible in the hierarchy anymore. We thus were seeing the issue because Okular does keep a file handle opened on the file.

Now, KIO rightfully attempts to write under a temporary name, delete the original file and rename to the final name during its file copy operation. This would then fail as the unlink() call would succeed, but the rename would unexpectedly fail due to the lingering file in the hierarchy.

So we proposed a patch for KIO which would do a direct copy for files on CIFS mounts. Files being directly overwritten succeed and so the bug experienced with Okular was solved.
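The two strategies can be sketched in a few lines. KIO is written in C++ and the actual patch is not reproduced here, so this Python version is only an illustration; the function names and the on_cifs flag are invented for the example.

```python
import os
import tempfile


def save_via_temp(path: str, data: bytes) -> None:
    """Write under a temporary name, delete the original, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    if os.path.exists(path):
        # On CIFS this reports success, but the name can linger in the
        # hierarchy while another handle is still open on the file...
        os.unlink(path)
    # ...and this is the step that then failed unexpectedly.
    os.rename(tmp_path, path)


def save_direct(path: str, data: bytes) -> None:
    """Overwrite the file in place: no unlink and no rename involved."""
    with open(path, "wb") as f:
        f.write(data)


def save(path: str, data: bytes, on_cifs: bool) -> None:
    """Pick the strategy; `on_cifs` stands in for a real mount-type check."""
    if on_cifs:
        save_direct(path, data)
    else:
        save_via_temp(path, data)
```

The direct copy gives up the protection of never exposing a half-written file, a trade-off that only makes sense where the temporary-name dance cannot work.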

Slow directory listing

This wasn’t the end of the issues with CIFS mounts (far from it). They were also experiencing performance problems when listing folders. Interestingly, they would experience it only in the details view of the KDE file dialog.

At its core the issue was due to requesting too much information. When listing a folder known to be remote by KIO (e.g. going through an smb:// URL) the view would limit the amount of information it’d request about the sub-folders. In particular it wouldn’t try to determine the number of files in the sub-folder. This operation is fast nowadays on modern disks, but incurs extra traffic and latency over the network.

This sounded like a simple fix… but in fact it was a bit more work than expected.

Unsurprisingly we quickly found that the code would decide to go for more or less details solely on the URL. Since CIFS mounts get file:/ URLs, they’d end up treated as local files… so the code went on to query a bit more aggressively. There is an isSlow() method on the items in the detail view tree which we extended to check for CIFS mounts.

This wasn’t enough though, we immediately realized that the new bottleneck was the calls to isSlow() itself. It would lead to several statfs calls which would be expensive as well. The way out was thus to cache the information in the items and for children to query the cache in their parent. Indeed, if the parent is considered slow, we decided to consider the children in the folder as slow as well. This heuristic allowed us to remove all the subsequent isSlow() calls after the one on the mountpoint folder itself.
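The caching heuristic looks roughly like this. It is a Python sketch rather than the KIO code: the Item class, its probe callback (standing in for the expensive statfs-based check) and the sample paths are all assumptions made for the example.

```python
from typing import Callable, Optional


class Item:
    """A node in the view tree that remembers whether it is on a slow mount."""

    def __init__(self, path: str, probe: Callable[[str], bool],
                 parent: Optional["Item"] = None) -> None:
        self.path = path
        self.parent = parent
        self._probe = probe                # expensive check, e.g. via statfs
        self._slow: Optional[bool] = None  # cached answer, None = not yet known

    def is_slow(self) -> bool:
        if self._slow is None:
            if self.parent is not None and self.parent.is_slow():
                # Heuristic: anything below a slow folder is slow as well,
                # so no probe is needed for this item.
                self._slow = True
            else:
                self._slow = self._probe(self.path)
        return self._slow


probed = []


def fake_probe(path: str) -> bool:
    """Pretend /mnt/share is a CIFS mount and record every probe."""
    probed.append(path)
    return path.startswith("/mnt/share")


mount = Item("/mnt/share", fake_probe)
children = [Item(f"/mnt/share/dir{i}", fake_probe, parent=mount) for i in range(3)]
print([child.is_slow() for child in children], probed)
```

Only the mountpoint itself is probed; every child answers from its parent's cached value.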

This was a very old piece of code we touched there, so some time was also used to clean things up a bit, refactor and rename things to align them better with other KIO parts.

LibreOffice backup files during save

They are such active users of CIFS mounts that they found yet another issue! This time with an old version of LibreOffice. In their version it would show up only if the KDE integration was used. I can tell you we were a bit surprised by this. The investigation wasn’t that easy but we managed to track it down.

The issue was showing up after opening a file sitting on a CIFS mount with LibreOffice. If you made a change to the file, then clicked “Save As…” and selected the same file to overwrite, you would get a “Could not create a backup copy” error and the file wouldn’t be saved if the KDE integration was active. All would be fine without this integration though.

What would be different with and without the KDE integration? Well, in one case there is an extra process! When the file dialog opens, if it is the KDE file dialog, the listing is delegated to a KIO Worker. In this particular case this would matter. Indeed, we figured that LibreOffice keeps an open file descriptor on the opened file. Not only this, it also holds a read lock on the file. The KIO Worker is being forked from LibreOffice, so it too has the open file descriptor with a lock.

This made us realize that there was a file descriptor leak, which is in itself not a good thing. So we changed KIO to clean up open file descriptors when spawning workers; it’s always a good idea to be tidy. Also, as soon as we removed the file descriptor leak the issue was gone. Nice, this was an easy fix.
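The effect of such a leak, and of tidying up before spawning, can be demonstrated with Python's subprocess module on a POSIX system. This is an illustration only, not the KIO change itself (KIO is C++); the helper child_inherits and the embedded child script are invented for the demonstration.

```python
import os
import subprocess
import sys
import tempfile

# The child checks whether the given descriptor number refers to the same
# file (device and inode) that the parent had open.
CHILD_SCRIPT = """
import os, sys
fd, dev, ino = (int(a) for a in sys.argv[1:4])
try:
    st = os.fstat(fd)
    print("inherited" if (st.st_dev, st.st_ino) == (dev, ino) else "closed")
except OSError:
    print("closed")
"""


def child_inherits(fd: int, close_fds: bool) -> bool:
    """Spawn a child and report whether it ended up holding `fd`."""
    st = os.fstat(fd)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, str(fd), str(st.st_dev), str(st.st_ino)],
        close_fds=close_fds, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "inherited"


with tempfile.NamedTemporaryFile() as document:
    # Python marks descriptors non-inheritable by default; undo that here
    # to mimic a descriptor that leaks into child processes.
    os.set_inheritable(document.fileno(), True)
    leaked = child_inherits(document.fileno(), close_fds=False)
    tidy = child_inherits(document.fileno(), close_fds=True)
    print(f"without cleanup: {leaked}, with cleanup: {tidy}")
```

With cleanup enabled the child never sees the parent's descriptor, so whatever is attached to it stays with the process that opened the file.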

Just to be thorough, we tried again but this time with the latest LibreOffice. And even with the patched KIO we would get the “Could not create a backup copy” error! This time it would show up also without the KDE integration. And so back to hunting… without getting too much into the internals of LibreOffice, it turned out the more recent version had extra code activated to produce said backup version, and it has two file descriptors open on the same file. So we were back to the problem of two file descriptors and read locks being involved. But this time it was not due to a leak towards another process, and the architecture of LibreOffice didn’t make it easy to reuse the original file descriptor created when opening the file.

The only thing we could do at this point was to simply not lock when the file is on a CIFS mount. It is not a very satisfying solution but it did the job while fitting the allocated time.

Somehow we couldn’t leave things like this though. If you know the much-criticized POSIX file locks well (they have extra challenges over CIFS), something still doesn’t feel quite right. We have two processes with a file descriptor on the file to save, and yet there is a single process holding a read lock. During the save, when the failure happens, LibreOffice is still making a “backup copy”; it is not writing to the file yet, only reading it… for sure this should be allowed!

We thus started to suspect a problem with the kernel itself… and our contact was willing to explore it further. After investigation, it was confirmed to be a kernel bug. For that the customer hooked us up with Andrew Bartlett from Catalyst, as they knew he could help us flesh out ideas in this space. This proved valuable indeed. Thanks to tests we had written previously and conversations with Andrew, we quickly figured out that, depending on the options passed at mount time, the CIFS driver would or would not handle the locks properly. We discussed a couple of fixes with the maintainer of the CIFS driver. They were merged last week (end of August 2024).

A customer who sparks joy

It’s really a joy to have a customer like this. As can be seen from their willingness to dig deeper into what others would consider obscure issues, they demonstrate that they are thinking long term. It also means they come with interesting and challenging issues… and they’re appreciative of what we achieved for them!

Indeed, we had the pleasure of receiving this by email during our conversations:

We very much appreciate the work you’ve done, and difficulty surrounding the challenges you’re working through. Your approach/results are spectacular, and we are very grateful to be able to be a part of it.

Of course, if you have projects involving Free Software communities or issues closer to the system, feel free to reach out, and we’ll see what we could do to help you. Maybe you too can be a forward thinking customer who sparks joy!

Categories: FLOSS Project Planets

Real Python: How to Use Conditional Expressions With NumPy where()

Planet Python - Wed, 2024-09-11 10:00

The NumPy where() function is a powerful tool for filtering array elements in lists, tuples, and NumPy arrays. It works by using a conditional predicate, similar to the logic used in the WHERE or HAVING clauses in SQL queries. It’s okay if you’re not familiar with SQL—you don’t need to know it to follow along with this tutorial.

You would typically use np.where() when you have an array and need to analyze its elements differently depending on their values. For example, you might need to replace negative numbers with zeros or replace missing values such as None or np.nan with something more meaningful. When you run where(), you’ll produce a new array containing the results of your analysis.
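
As a rough illustration of the two replacements just mentioned, the sketch below uses a small made-up array. The names readings, no_negatives, and no_missing, and the placeholder value -1.0, are assumptions for this example and do not come from the tutorial.

```python
import numpy as np

# Hypothetical sample data containing negatives and a missing value.
readings = np.array([[1.5, -2.0, np.nan], [-0.5, 4.0, 3.0]])

# Replace negative numbers with zero; every other element is kept as is.
no_negatives = np.where(readings < 0, 0.0, readings)

# Replace missing values (np.nan) with a placeholder of -1.0.
no_missing = np.where(np.isnan(readings), -1.0, readings)

print(no_negatives)
print(no_missing)
```

In both calls the original readings array is left unchanged, because where() builds a new array.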

You generally supply three parameters when using where(). First, you provide a condition against which each element of your original array is matched. Then, you provide two additional parameters: the first defines what you want to do if an element matches your condition, while the second defines what you want to do if it doesn’t.

If you think this all sounds similar to Python’s ternary operator, you’re correct. The logic is the same.
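
One way to see the parallel is to put the two side by side. In this sketch, the array values, the helper label_one(), and the label strings are all made up for illustration.

```python
import numpy as np

values = np.array([[4, -7], [-1, 9]])


def label_one(value):
    """Label a single number using Python's ternary expression."""
    return "positive" if value > 0 else "not positive"


# np.where() applies the same "x if condition else y" choice
# to every element of the array in one call.
labels = np.where(values > 0, "positive", "not positive")

print(labels)
```

The ternary expression picks a value for one item at a time, while where() makes the same choice element by element across the whole array.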

Note: In this tutorial, you’ll work with two-dimensional arrays. However, the same principles can be applied to arrays of any dimension.

Before you start, you should familiarize yourself with NumPy arrays and how to use them. It will also be helpful if you understand the subject of broadcasting, particularly for the latter part of this tutorial.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment.

The NumPy library is not part of core Python, so you’ll need to install it. If you’re using a Jupyter Notebook, create a new code cell and type !python -m pip install numpy into it. When you run the cell, the library will install. If you’re working at the command line, use the same command, only without the exclamation point (!).

With these preliminaries out of the way, you’re now good to go.


How to Write Conditional Expressions With NumPy where()

One of the most common scenarios for using where() is when you need to replace certain elements in a NumPy array with other values depending on some condition.

Consider the following array:

    >>> import numpy as np
    >>> test_array = np.array(
    ...     [
    ...         [3.1688358, 3.9091694, 1.66405549, -3.61976783],
    ...         [7.33400434, -3.25797286, -9.65148913, -0.76115911],
    ...         [2.71053173, -6.02410179, 7.46355805, 1.30949485],
    ...     ]
    ... )

To begin with, you need to import the NumPy library into your program. It’s standard practice to do so using the alias np, which allows you to refer to the library using this abbreviated form.

The resulting array has a shape of three rows and four columns, each containing a floating-point number.

Now suppose you wanted to replace all the negative numbers with their positive equivalents:

    >>> np.where(
    ...     test_array < 0,
    ...     test_array * -1,
    ...     test_array,
    ... )
    array([[3.1688358 , 3.9091694 , 1.66405549, 3.61976783],
           [7.33400434, 3.25797286, 9.65148913, 0.76115911],
           [2.71053173, 6.02410179, 7.46355805, 1.30949485]])

The result is a new NumPy array with the negative numbers replaced by positives. Compare the original test_array with the corresponding elements of the new array, and you’ll see that the result is exactly what you wanted.

Note: The above example gives you an idea of how the where() function works. If you were doing this in practice, you’d most likely use either the np.abs() or np.absolute() functions instead. Both do the same thing because the former is shorthand for the latter:

    >>> np.abs(test_array)
    array([[3.1688358 , 3.9091694 , 1.66405549, 3.61976783],
           [7.33400434, 3.25797286, 9.65148913, 0.76115911],
           [2.71053173, 6.02410179, 7.46355805, 1.30949485]])

Once more, all negative values have been removed.

Before moving on to other use cases of where(), you’ll take a closer look at how this all works. To achieve your aim in the previous example, you passed in test_array < 0 as the condition. In NumPy, this creates a Boolean array that where() uses:

    >>> test_array < 0
    array([[False, False, False,  True],
           [False,  True,  True,  True],
           [False,  True, False, False]])

The Boolean array, often called the mask, consists only of elements that are either True or False. If an element matches the condition, the corresponding element in the Boolean array will be True. Otherwise, it’ll be False.
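
Because the mask is an ordinary array, you can also store it in a variable and reuse it. The sketch below uses a small made-up array; the names data, mask, result, and negatives are assumptions for illustration.

```python
import numpy as np

data = np.array([[3.0, -1.5], [-4.0, 2.5]])

# The condition produces a Boolean array with the same shape as data.
mask = data < 0

# Passing the stored mask gives the same result as writing the condition inline.
result = np.where(mask, data * -1, data)

# True counts as 1 when summed, so this is the number of matching elements.
negatives = int(mask.sum())

print(mask)
print(result)
print(negatives)
```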

Read the full article at https://realpython.com/numpy-where-conditional-expressions/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Jamie McClelland: MariaDB mystery

Planet Debian - Wed, 2024-09-11 08:27

I keep getting an error in our backup logs:

    Sep 11 05:08:03 Warning: mysqldump: Error 2013: Lost connection to server during query when dumping table `1C4Uonkwhe_options` at row: 1402
    Sep 11 05:08:03 Warning: Failed to dump mysql databases ic_wp

It’s a WordPress database having trouble dumping the options table.

The error log has a corresponding message:

Sep 11 13:50:11 mysql007 mariadbd[580]: 2024-09-11 13:50:11 69577 [Warning] Aborted connection 69577 to db: 'ic_wp' user: 'root' host: 'localhost' (Got an error writing communication packets)

The Internet is full of suggestions, almost all of which focus on either the network connection between the client and the server or the FEDERATED plugin. We aren’t using the federated plugin and this error happens when connecting via the socket.

Check it out - what is better than a consistently reproducible problem!

It happens if I try to select all the values in the table:

    root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~#

It happens when I specify one specific offset:

    root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~#

It happens if I specify the field name explicitly:

    root@mysql007:~# mysql --protocol=socket -e 'select option_id,option_name,option_value,autoload from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~#

It doesn’t happen if I specify the key field:

    root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
    +-----------+
    | option_id |
    +-----------+
    |  16296351 |
    +-----------+
    root@mysql007:~#

It does happen if I specify the value field:

    root@mysql007:~# mysql --protocol=socket -e 'select option_value from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~#

It doesn’t happen if I query the specific row by key field:

Hm. Surely there is some funky non-printing character in that option_value right?

    root@mysql007:~# mysql --protocol=socket -e 'select CHAR_LENGTH(option_value) from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
    +---------------------------+
    | CHAR_LENGTH(option_value) |
    +---------------------------+
    |                         0 |
    +---------------------------+
    root@mysql007:~# mysql --protocol=socket -e 'select HEX(option_value) from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
    +-------------------+
    | HEX(option_value) |
    +-------------------+
    |                   |
    +-------------------+
    root@mysql007:~#

Resetting the value to an empty value doesn’t make a difference:

    root@mysql007:~# mysql --protocol=socket -e 'update 1C4Uonkwhe_options set option_value = "" where option_id = 16296351' ic_wp
    root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~#

Deleting the row in question causes the error to specify a new offset:

    root@mysql007:~# mysql --protocol=socket -e 'delete from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
    root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~# mysqldump ic_wp > /dev/null
    mysqldump: Error 2013: Lost connection to server during query when dumping table `1C4Uonkwhe_options` at row: 1401
    root@mysql007:~#

If I put the record I deleted back in, we return to the old offset:

    root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_options VALUES(16296351,"z_taxonomy_image8905","","yes");' ic_wp
    root@mysql007:~# mysqldump ic_wp > /dev/null
    mysqldump: Error 2013: Lost connection to server during query when dumping table `1C4Uonkwhe_options` at row: 1402
    root@mysql007:~#

I’m losing my little mind. Let’s get drastic and create a whole new table, copy over the data delicately working around the deadly offset:

    root@mysql007:~# mysql --protocol=socket -e 'create table 1C4Uonkwhe_new_options like 1C4Uonkwhe_options;' ic_wp
    root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_new_options select * from 1C4Uonkwhe_options limit 1402 offset 0;' ic_wp
    --- There are only 33 more records, not sure how to specify unlimited limit but 100 does the trick.
    root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_new_options select * from 1C4Uonkwhe_options limit 100 offset 1403;' ic_wp

Now let’s make sure all is working properly:

root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_new_options' ic_wp >/dev/null;

Now let’s examine which row we are missing:

    root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options where option_id not in (select option_id from 1C4Uonkwhe_new_options) ;' ic_wp
    +-----------+
    | option_id |
    +-----------+
    |  18405297 |
    +-----------+
    root@mysql007:~#

Wait, what? I was expecting option_id 16296351.

Oh, now we are getting somewhere. And I see my mistake: when using offsets, you need to use ORDER BY or you won’t get consistent results.
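
Here is a minimal sketch of that point, using Python's built-in sqlite3 module with an in-memory database rather than MariaDB. The table layout, the sample rows, and the helper row_at() are made up for illustration; the only thing carried over from my case is the ORDER BY ... LIMIT ... OFFSET pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE options (option_id INTEGER PRIMARY KEY, option_name TEXT)"
)
# Insert the rows out of key order on purpose.
conn.executemany(
    "INSERT INTO options VALUES (?, ?)",
    [(30, "c"), (10, "a"), (20, "b"), (40, "d")],
)


def row_at(offset):
    """Return the option_id found at the given offset, ordered by key."""
    cursor = conn.execute(
        "SELECT option_id FROM options ORDER BY option_id LIMIT 1 OFFSET ?",
        (offset,),
    )
    return cursor.fetchone()[0]


third = row_at(2)
print(third)
```

With the ORDER BY in place, a given offset always maps to the same row, so the row found by one query is the row the next query will see.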

    root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options order by option_id limit 1 offset 1402' ic_wp ;
    +-----------+
    | option_id |
    +-----------+
    |  18405297 |
    +-----------+
    root@mysql007:~#

Now that I have the correct row… what is in it:

    root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options where option_id = 18405297' ic_wp ;
    ERROR 2013 (HY000) at line 1: Lost connection to server during query
    root@mysql007:~#

Well, that makes a lot more sense. Let’s start over with examining the value:

    root@mysql007:~# mysql --protocol=socket -e 'select CHAR_LENGTH(option_value) from 1C4Uonkwhe_options where option_id = 18405297' ic_wp ;
    +---------------------------+
    | CHAR_LENGTH(option_value) |
    +---------------------------+
    |                  50814767 |
    +---------------------------+
    root@mysql007:~#

Wow, that’s a lot of characters. If it were a book, it would be 35,000 pages long (I just discovered this site). It’s a LONGTEXT field so it should be able to handle it. But now I have a better idea of what could be going wrong. The name of the option is “rewrite_rules” so it seems like something is going wrong with the generation of that option.

I imagine there is some tweak I can make to allow MariaDB to cough up the value (read_buffer_size? tmp_table_size?). But I’ll start with checking in with the database owner because I don’t think 35,000 pages of rewrite rules is appropriate for any site.
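
In hindsight, sorting the rows by value length would have pointed at the oversized row without any offsets. Below is a minimal sketch of that idea using Python's built-in sqlite3 module with an in-memory database, not MariaDB. The table, the sample rows, and the 50,000-character value are made up for illustration, and SQLite's LENGTH() stands in for the CHAR_LENGTH() used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE options ("
    "option_id INTEGER PRIMARY KEY, option_name TEXT, option_value TEXT)"
)
conn.executemany(
    "INSERT INTO options VALUES (?, ?, ?)",
    [
        (1, "siteurl", "https://example.org"),
        (2, "rewrite_rules", "x" * 50_000),  # stand-in for a runaway value
        (3, "blogname", "Example"),
    ],
)

# Select only the key, the name, and the length, never the value itself,
# and sort so that the longest value comes first.
largest = conn.execute(
    "SELECT option_id, option_name, LENGTH(option_value) AS size "
    "FROM options ORDER BY size DESC LIMIT 1"
).fetchone()

print(largest)
```

Since the query returns only the length, the huge value itself never has to be sent to the client.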

Categories: FLOSS Project Planets

Morpht: Nightly CI hygiene pays off

Planet Drupal - Wed, 2024-09-11 08:00

The Morpht CI pipeline caught a recent vulnerability in Drupal core which led to the problem promptly being fixed.

Categories: FLOSS Project Planets

Real Python: Quiz: Python Virtual Environments: A Primer

Planet Python - Wed, 2024-09-11 08:00

So you’ve been primed on Python virtual environments! Test your understanding of the tutorial here.


Categories: FLOSS Project Planets

Vector Graphics in Qt 6.8

Planet KDE - Wed, 2024-09-11 05:41

Two-dimensional vector graphics has been quite prevalent in recent Qt release notes, and it is something we have plans to continue exploring in the releases to come. This blog takes a look at some of the options you have, as a Qt developer.

Categories: FLOSS Project Planets

Dirk Eddelbuettel: RcppSpdlog 0.0.18 on CRAN: Updates

Planet Debian - Tue, 2024-09-10 20:21

Version 0.0.18 of RcppSpdlog arrived on CRAN today and has been uploaded to Debian. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the nice package documentation site.

This release updates the code to version 1.14.1 of spdlog, which was released as an incremental fix to 1.14.0, and adds the ability to set log levels via the environment variable SPDLOG_LEVEL.

The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.18 (2024-09-10)

Upgraded to upstream release spdlog 1.14.1
Minor packaging upgrades
Allow logging levels to be set via environment variable SPDLOG_LEVEL

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documentation site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Oliver Davies' daily list: Do you deploy on Fridays?

Planet Drupal - Tue, 2024-09-10 20:00

Do you deploy changes to production on Fridays?

Some people don't as they're worried about potential issues occurring over the weekend.

When's the last time you did a deployment which caused an issue 24 or 48 hours later?

In my experience, most issues are visible immediately or shortly after a deployment and not days later.

Deploying on a Friday may not be as risky as you think.

Categories: FLOSS Project Planets
