Feeds
Bits from Debian: Bits from the DPL
Dear Debian community,
this is Bits from DPL for October. In addition to a summary of my recent activities, I aim to include newsworthy developments within Debian that might be of interest to the broader community. I believe this provides valuable insights and fosters a sense of connection across our diverse projects. Also, I welcome your feedback on the format and focus of these Bits, as community input helps shape their value.
Ada Lovelace Day 2024
As outlined in my platform, I'm committed to increasing the diversity of Debian developers. I hope the recent article celebrating Ada Lovelace Day 2024–featuring interviews with women in Debian–will serve as an inspiring motivation for more women to join our community.
MiniDebConf Cambridge
This was my first time attending the MiniDebConf in Cambridge, hosted at the ARM building. I thoroughly enjoyed the welcoming atmosphere of both MiniDebCamp and MiniDebConf. It was wonderful to reconnect with people who hadn't made it to the last two DebConfs, and, as always, there was plenty of hacking, insightful discussions, and valuable learning.
If you missed the recent MiniDebConf, there's a great opportunity to attend the next one in Toulouse. It was recently decided to include a MiniDebCamp beforehand as well.
FTPmaster accepts MRs for DAK
At the recent MiniDebConf in Cambridge, I discussed potential enhancements for DAK to make life easier for both FTP Team members and developers. For those interested, the document "Hacking on DAK" provides guidance on setting up a local DAK instance and developing patches, which can be submitted as MRs.
As a perfectly random example of such an improvement, the older MR "Add commands to accept/reject updates from a policy queue" might give you some inspiration.
At MiniDebConf, we compiled an initial list of features that could benefit both the FTP Team and the developer community. While I had preliminary discussions with the FTP Team about these items, not all ideas had consensus. I aim to open a detailed, public discussion to gather broader feedback and reach a consensus on which features to prioritize.
- Accept+Bug report
Sometimes, packages are rejected not because of DFSG-incompatible licenses but due to other issues that could be resolved within an existing package (as discussed in my DebConf23 BoF, "Chatting with ftpmasters"[1]). During the "Meet the ftpteam" BoF (a log/transcription of the BoF can be found here, for the moment, until the MR gets accepted), a new option was proposed for FTP Team members reviewing packages in NEW:
Accept + Bug Report
This option would allow a package to enter Debian (in unstable or experimental) with an automatically filed RC bug report. The RC bug would prevent the package from migrating to testing until the issues are addressed. To ensure compatibility with the BTS, which only accepts bug reports for existing packages, a delayed job (24 hours post-acceptance) would file the bug.
- Binary name changes - for instance if done to experimental not via new
When binary package names change, currently the package must go through the NEW queue, which can delay the availability of updated libraries. Allowing such packages to bypass the queue could expedite this process. A configuration option to enable this bypass specifically for uploads to experimental may be useful, as it avoids requiring additional technical review for experimental uploads.
Previously, I believed the requirement for binary name changes to pass through NEW was due to a missing feature in DAK, possibly addressable via an MR. However, in discussions with the FTP Team, I learned this is a matter of team policy rather than technical limitation. I haven't found this policy documented, so it may be worth having a community discussion to clarify and reach consensus on how we want to handle binary name changes to get the MR sensibly designed.
- Remove dependency tree
When a developer requests the removal of a package – whether entirely or for specific architectures – RM bugs must be filed for the package itself as well as for each package depending on it. It would be beneficial if the dependency tree could be automatically resolved, allowing either:
a) the DAK removal tooling to remove the entire dependency tree after prompting the bug report author for confirmation, or
b) the system to auto-generate corresponding bug reports for all packages in the dependency tree.
The latter option might be better suited for implementation in an MR for reportbug. However, given the possibility of large-scale removals (for example, targeting specific architectures), having appropriate tooling for this would be very beneficial.
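To make the idea of resolving the dependency tree more concrete, here is a minimal sketch in Python. It is not DAK code and does not use any DAK or reportbug API: the `depends` mapping, the package names, and the function name are all hypothetical, and a real implementation would have to read the archive's actual dependency data (including per-architecture differences).

```python
from collections import deque


def reverse_dependency_closure(depends, target):
    """Return every package that directly or transitively depends on target.

    depends maps a package name to the list of packages it depends on.
    This mapping is a hypothetical stand-in for real archive metadata.
    """
    # Invert the mapping: for each package, who depends on it?
    reverse = {}
    for package, dependencies in depends.items():
        for dependency in dependencies:
            reverse.setdefault(dependency, set()).add(package)

    # Breadth-first walk over the reverse dependencies.
    affected = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependant in reverse.get(current, ()):
            if dependant != target and dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)
    return sorted(affected)


# Hypothetical example data, not taken from the Debian archive.
depends = {
    "libfoo1": [],
    "foo-tools": ["libfoo1"],
    "bar": ["foo-tools"],
    "baz": ["libc6"],
}
affected = reverse_dependency_closure(depends, "libfoo1")
print(affected)
```

The resulting list is what either option would need: the set of packages to confirm for removal in case a), or the set of packages to file RM bugs against in case b).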
In my opinion the proposed DAK enhancements aim to support both FTP Team members and uploading developers. I'd be very pleased if these ideas spark constructive discussion and inspire volunteers to start working on them--possibly even preparing to join the FTP Team.
On the topic of ftpmasters: an ongoing discussion with SPI lawyers is currently reviewing the non-US agreement established 22 years ago. Ideally, this review will lead to a streamlined workflow for ftpmasters, removing certain hurdles that were originally put in place due to legal requirements, which were updated in 2021.
Contacting teams
My outreach efforts to Debian teams have slowed somewhat recently. However, I want to emphasize that anyone from a packaging team is more than welcome to reach out to me directly. My outreach emails aren't following any specific order--just my own somewhat naïve view of Debian, which I'm eager to make more informed.
Recently, I received two very informative responses: one from the Qt/KDE Team, which thoughtfully compiled input from several team members into a shared document. The other was from the Rust Team, where I received three quick, helpful replies–one of which included an invitation to their upcoming team meeting.
Interesting readings on our mailing lists
I consider the following threads on our mailing lists interesting reading and would like to add some comments.
Sensible languages for younger contributors
Though the discussion on debian-devel about programming languages took place in September, I recently caught up with it. I strongly believe Debian must continue evolving to stay relevant for the future.
"Everything must change, so that everything can stay the same." -- Giuseppe Tomasi di Lampedusa, The Leopard
I encourage constructive discussions on integrating programming languages in our toolchain that support this evolution.
Concerns regarding the "Open Source AI Definition"
A recent thread on the debian-project list discussed the "Open Source AI Definition". This topic will impact Debian in the future, and we need to reach an informed decision. I'd be glad to see more perspectives in the discussions--particularly on finding a sensible consensus, understanding how FTP Team members view their delegated role, and considering whether their delegation might need adjustments for clarity on this issue.
Kind regards Andreas.
ImageX: AI in Drupal: Latest Demos of the Incredible Capabilities
Authored by Nadiia Nykolaichuk.
AI is shifting our perception of the impossible. It does things that past generations would have never imagined. Indeed, it has long since become a routine to ask AI assistants to play music, turn on the lights, or even order groceries. With the advance of generative AI, boosting content management through various AI-driven tasks has also become increasingly common.
ClearlyDefined at SOSS Fusion 2024: a collaborative solution to Open Source license compliance
This past month, the Open Source Security Foundation (OpenSSF) hosted SOSS Fusion in Atlanta, an event that brought together a diverse community of leaders and innovators from across the digital security spectrum. The conference, held on October 22-23, explored themes central to today’s technological landscape: AI security, diversity in technology, and public policy for Open Source software. Industry thought leaders like Bruce Schneier, Marten Mickos, and Cory Doctorow delivered keynotes, setting the tone for a conference that emphasized collaboration and community in creating a secure digital future.
Amidst these pressing topics, the Open Source Initiative in collaboration with GitHub and SAP presented ClearlyDefined—an innovative project aimed at simplifying software license compliance and metadata management. Presented by Nick Vidal of the Open Source Initiative, along with E. Lynette Rayle from GitHub and Qing Tomlinson from SAP, the session highlighted how ClearlyDefined is transforming the way organizations handle licensing compliance for Open Source components.
What is ClearlyDefined?
ClearlyDefined is a project with a powerful vision: to create a global crowdsourced database of license metadata for every software component ever published. This ambitious mission seeks to help organizations of all sizes easily manage compliance by providing accurate, up-to-date metadata for Open Source components. By offering a single, reliable source for license information, ClearlyDefined enables organizations to work together rather than in isolation, collectively contributing to the metadata that keeps Open Source software compliant and accessible.
The problem: redundant and inconsistent license management
In today’s Open Source ecosystem, managing software licenses has become a significant challenge. Many organizations face the repetitive task of identifying, correcting, and maintaining accurate licensing data. When one component has missing or incorrect metadata, dozens, or even hundreds, of organizations using that component may duplicate efforts to resolve the same issue. ClearlyDefined aims to eliminate redundancy by enabling a collaborative approach.
The solution: crowdsourcing compliance with ClearlyDefined
ClearlyDefined provides an API and user-friendly interface that make it easy to access and contribute license metadata. By aggregating and standardizing licensing data, ClearlyDefined offers a powerful solution for organizations to enhance SBOMs (Software Bill of Materials) and license information without the need for extensive re-scanning and data correction. At the conference, Nick demonstrated how developers can quickly retrieve license data for popular libraries using a simple API call, making license compliance seamless and scalable.
In addition, organizations that encounter incomplete or incorrect metadata can easily update it through ClearlyDefined’s platform, creating a feedback loop that benefits the entire Open Source community. This crowdsourcing approach means that once an organization fixes a licensing issue, that data becomes available to all, fostering efficiency and accuracy.
Key components of ClearlyDefined’s platform
1. API and User Interface: Users can access ClearlyDefined data through an API or the website, making it simple for developers to integrate license checks directly into their workflows.
2. Human curation and community collaboration: To ensure high data quality, ClearlyDefined employs a curation workflow. When metadata requires updates, community members can submit corrections that go through a human review process, ensuring accuracy and reliability.
3. Integration with popular package managers: ClearlyDefined supports various package managers, including npm and pypi, and has recently expanded to support Conda, a popular choice among data science and AI developers.
Real-world use cases: GitHub and SAP’s adoption of ClearlyDefined
During the presentation, representatives from GitHub and SAP shared how ClearlyDefined has impacted their organizations.
– GitHub: ClearlyDefined’s licensing data powers GitHub’s compliance solutions, allowing GitHub to manage millions of licenses with ease. Lynette shared how they initially onboarded over 17 million licenses through ClearlyDefined, a number that has since grown to over 40 million. This database enables GitHub to provide accurate compliance information to users, significantly reducing the resources required to maintain licensing accuracy. Lynette showcased the harvesting process and the curation process. More details about how GitHub is using ClearlyDefined are available here.
– SAP: Qing discussed how ClearlyDefined’s approach has streamlined SAP’s Open Source compliance efforts. By using ClearlyDefined’s data, SAP reduced the time spent on license reviews and improved the quality of metadata available for compliance checks. SAP’s internal harvesting service integrates with ClearlyDefined, ensuring that critical license metadata is consistently available and accurate. SAP has contributed to the ClearlyDefined project and, most notably, together with Microsoft, has optimized the database schema and reduced the database operational cost by more than 90%. More details about how SAP is using ClearlyDefined are available here.
Why ClearlyDefined matters
ClearlyDefined is a community-driven initiative with a vision to address one of Open Source’s biggest challenges: ensuring accurate and accessible licensing metadata. By centralizing and standardizing this data, ClearlyDefined not only reduces redundant work but also fosters a collaborative approach to license compliance.
The platform’s Open Source nature and integration with existing package managers and APIs make it accessible and scalable for organizations of all sizes. As more contributors join the effort, ClearlyDefined continues to grow, strengthening the Open Source community’s commitment to compliance, security, and transparency.
Join the ClearlyDefined community
ClearlyDefined is always open to new contributors. With weekly developer meetings, an open governance model, and continuous collaboration with OpenSSF and other Open Source organizations, ClearlyDefined provides numerous ways to get involved. For anyone interested in shaping the future of license compliance and data quality in Open Source, ClearlyDefined offers an exciting opportunity to make a tangible impact.
At SOSS Fusion, ClearlyDefined’s presentation showcased how an open, collaborative approach to license compliance can benefit the entire digital ecosystem, embodying the very spirit of the conference: working together toward a secure, inclusive, and sustainable digital future.
Download slides and see summarized presentation transcript below.
ClearlyDefined presentation transcript
Hello, folks, good morning! Let’s start by introducing ClearlyDefined, an exciting project. My name is Nick Vidal, and I work with the Open Source Initiative. With me today are Lynette Rayle from GitHub and Qing Tomlinson from SAP, and we’re all very excited to be here.
Introduction to ClearlyDefined’s mission
So, what’s the mission of ClearlyDefined? Our mission is ambitious—we aim to crowdsource a global database of license metadata for every software component ever published. This would benefit everyone in the Open Source ecosystem.
The problem ClearlyDefined addresses
There’s a critical problem in the Open Source space: compliance and managing SBOMs (Software Bill of Materials) at scale. Many organizations struggle with missing or incorrect licensing metadata for software components. When multiple organizations use a component with incomplete or wrong license metadata, they each have to solve it individually. ClearlyDefined offers a solution where, instead of every organization doing redundant work, we can collectively work on fixing these issues once and make the corrected data available to all.
ClearlyDefined’s solution
ClearlyDefined enables organizations to access license metadata through a simple API. This reduces the need for repeated license scanning and helps with SBOM generation at scale. When issues arise with a component’s license metadata, organizations can contribute fixes that benefit the entire community.
Getting started with ClearlyDefined
To use ClearlyDefined, you can access its API directly from your terminal. For example, let’s say you’re working with a JavaScript library like Lodash. By calling the API, you can get all license metadata for a specific version of Lodash at your fingertips.
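As an illustration of what such a call might look like, here is a minimal Python sketch. It assumes the public service root `https://api.clearlydefined.io`, a `definitions` endpoint addressed by `type/provider/namespace/name/revision` coordinates, and a `licensed.declared` field in the response; these details were not shown in the talk summary and should be checked against the ClearlyDefined API documentation. The Lodash version used is only an example.

```python
import json
import urllib.request

# Assumed service root; verify against the ClearlyDefined documentation.
API_ROOT = "https://api.clearlydefined.io"


def definition_url(type_, provider, namespace, name, revision):
    """Build a definitions URL from component coordinates.

    A missing namespace is written as "-" (an assumption about the
    coordinate format).
    """
    namespace = namespace or "-"
    return f"{API_ROOT}/definitions/{type_}/{provider}/{namespace}/{name}/{revision}"


def fetch_definition(url):
    """Download and decode one definition (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def declared_license(definition):
    """Return the declared license expression, or None if it is absent."""
    return definition.get("licensed", {}).get("declared")


url = definition_url("npm", "npmjs", None, "lodash", "4.17.21")
print(url)

# The network call is left commented out so the sketch runs offline:
# definition = fetch_definition(url)
sample = {"licensed": {"declared": "MIT"}}  # hand-written stand-in response
print(declared_license(sample))
```

Keeping the URL construction and the field lookup in separate functions makes it easy to swap in the real response once the endpoint layout has been confirmed.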
Once you incorporate this licensing metadata into your workflow, you may notice some metadata that needs updating. You can curate that data and contribute it back, so everyone benefits. ClearlyDefined also provides a user-friendly interface for this, making it easier to contribute.
Open Source and community contributions
ClearlyDefined is an Open Source initiative, hosted on GitHub, supporting various package managers (e.g., npm, pypi). We work to promote best practices and integrate with other tools. Recently, we’ve expanded our scope to support non-SPDX licenses and Conda, a package manager often used in data science projects.
Integration with other tools
ClearlyDefined integrates with GUAC, an OpenSSF project that consumes ClearlyDefined data. This integration broadens the reach and utility of ClearlyDefined’s licensing information.
Case studies and community impact
I’d like to hand it over to Lynette from GitHub, who will talk about how GitHub uses ClearlyDefined and why it’s critical for license compliance.
GitHub’s use of ClearlyDefined
Hello, I’m Lynette, a developer at GitHub working on license compliance solutions. ClearlyDefined has become a key part of our workflows. Knowing the licenses of our dependencies is crucial, as legal compliance requires correct attributions. By using ClearlyDefined, we’ve streamlined our process and now manage over 40 million licenses. We also run our own harvester to contribute back to ClearlyDefined and scale our operations.
SAP’s adoption of ClearlyDefined
Hi, my name is Qing. At SAP, we co-innovate and collaborate with Open Source, ensuring a clean, well-maintained software pool. ClearlyDefined has streamlined our license review process, reducing time spent on scanning and enhancing data quality. SAP’s journey with ClearlyDefined began in 2018, and since then, we’ve implemented large-scale automation for our Open Source compliance and continuously contribute curated data back to the community.
Community and governance
ClearlyDefined thrives on community involvement. We recently elected members to our Steering and Outreach Committees to support the platform and encourage new contributors. Our weekly developer meetings and active Discord channel provide opportunities to engage, share knowledge, and collaborate.
Q&A highlights
- PURLs as Package Identifiers: We’re exploring support for PURLs as an internal coordinate system.
- Data Quality Issues: Data quality is our top priority. We plan to implement routines to scan for common issues, ensuring accurate metadata across the platform.
Thank you all for joining us today. If you’re interested in contributing, please reach out and become part of this collaborative community.
Members Newsletter – November 2024
After more than two years of collaboration, information gathering, global workshopping, testing, and an in-depth co-design process, we have an Open Source AI Definition.
The purpose of version 1.0 is to establish a workable standard for developers, researchers, and educators to consider how they may design evaluations for AI systems’ openness. The meaningful ability to fork and control their AI will foster permissionless, global innovation. It was important to drive a stake in the ground so everyone has something to work with. It’s version 1.0, so going forward, the process allows for improvement, and that’s exactly what will happen.
Over 150 individuals were part of the OSAID forum, nearly 15K subscribers to the OSI newsletter were kept up-to-date with the latest news about the OSAID, and 2M unique visitors to the OSI website were exposed to the OSAID process. There were 50+ co-design working group volunteers representing 29 countries, including participants from Africa, Asia, Europe, and the Americas.
Future versions of OSAID will continue to be informed by the feedback we receive from various stakeholder communities. The fundamental principles and aim will not change, but, as our (collective) understanding of the technology improves and technology itself evolves, we might need to update to clarify or even change certain requirements. To enable this, the OSI Board voted to establish an AI sub-committee who will develop appropriate mechanisms for updating the OSAID in consultation with stakeholders. It will be fully formed in the months ahead.
Please continue to stay involved, as diverse voices and experiences are required to ensure Open Source AI works for the good of us all.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
Open and public co-design process culminates in a stable version of Open Source AI Definition, ensures freedoms to use, study, share and modify AI systems.
Other highlights:
- How we passed the AI conundrums
- ClearlyDefined at SOSS Fusion 2024
- ClearlyDefined’s Steering and Outreach Committees Defined
- The Open Source Initiative Supports the Open Source Pledge
Article from ZDNet
For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them.
Other highlights:
- The Gap Between Open and Closed AI Models Might Be Shrinking. Here’s Why That Matters (Time)
- Meta’s military push is as much about the battle for open-source AI as it is about actual battles (Fortune)
- OSI unveils Open Source AI Definition 1.0 (InfoWorld)
- We finally have an ‘official’ definition for open source AI (TechCrunch)
- Read all press mentions from this past month
News from OSI affiliates:
- OpenSSF: SOSS Fusion 2024: Uniting Security Minds for the Future of Open Source (Security Boulevard)
- Mozilla Foundation: How Mozilla’s President Defines Open-Source AI (Forbes)
News from OpenSource.net:
- OpenSource.Net turns one with a redesign
- How to make reviewing pull requests a better experience
- Closing the Gap: Accelerating environmental Open Source
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Bloomberg is seeking a Technical Architect to join their OSPO team.
Events
Upcoming events:
- Nerdearla Mexico (November 7-9, 2024 – Mexico City)
- SeaGL (November 8-9, 2024 – Seattle)
- SFSCON (November 8-9, 2024 – Bolzano)
- KubeCon + CloudNativeCon North America (November 12-15, 2024 – Salt Lake City)
- OpenForum Academy Symposium (November 13-14, 2024 – Boston)
- The Linux Foundation Legal Summit (November 18-19, 2024 – Napa)
- The Linux Foundation Member Summit (November 19-21, 2024 – Napa)
- Open Source Experience (December 4-5 – Paris)
- KubeCon + CloudNativeCon India (December 11-12, 2024 – Delhi)
- EU Open Source Policy Summit (January 31, 2025 – Brussels)
- FOSDEM (February 1-2, 2025 – Brussels)
CFPs:
- FOSDEM 2025 EU-Policy Devroom – event being organized by the OSI, OpenForum Europe, Eclipse Foundation, The European Open Source Software Business Association, the European Commission Open Source Programme Office, and the European Commission.
- PyCon US 2025: the Python Software Foundation kicks off Website, CfP, and Sponsorship!
- GitHub
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Get to vote for the OSI Board by becoming a member
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
mark.ie: LocalGov Drupal (LGD): A Digital Public Good Transforming Government Services
LocalGov Drupal is the epitome of the principles of a Digital Public Good.
Drupal In the News: Drupal CMS: Groundbreaking New Version of Drupal Detailed at DrupalCon Singapore 2024
MARINA BAY, Singapore, 6 November, 2024—Drupal CMS, the groundbreaking package built on Drupal core with the marketer in mind, will launch on 15 January 2025. Conference attendees at DrupalCon Singapore 2024 will have the exclusive opportunity to be the first to learn more about Drupal CMS directly from Drupal’s founder, Dries Buytaert.
Learn how Drupal CMS will enable site builders without any Drupal experience to easily create a new site using their browser, marking one of the most significant launches in Drupal history.
Alongside the Drupal Association leadership team, Dries will unveil key features of Drupal CMS, making DrupalCon Singapore 2024 a can’t-miss event for anyone in the Open Source community. Occurring one month before the release of Drupal CMS, DrupalCon Singapore 2024 is an exclusive opportunity for attendees to join in the conversation surrounding Drupal CMS directly with its creators.
“The product strategy is for Drupal CMS to be the gold standard for no-code website building,” said Dries. “Our goal is to empower non-technical users like digital marketers, content creators, and site-builders to create exceptional digital experiences without requiring developers.”
DrupalCon Singapore 2024, 9-11 December 2024, is a premier gathering of Drupal and Open Source professionals. Over three days, the conference will showcase the latest Drupal trends, facilitate networking opportunities, and offer a platform for thought leadership in the Open Source landscape.
Key features of DrupalCon Singapore 2024 include:
- Keynotes, sessions, and panels: The Driesnote and Drupal CMS Panel are two highlights amongst a packed schedule of insightful sessions.
- Contribution Day: Contribution Day is where attendees grow and learn by helping to make Drupal even better. Giving back to the project is crucial in an Open Source community, as the Drupal project is developed by a community of people who work together to innovate the software.
- Birds of a Feather (BoFs): BoFs provide the perfect setting for connecting with like-minded attendees who share your interests.
- Splash Awards: Celebrate the work and creativity of the global Drupal community with this awards ceremony, which recognises outstanding projects built with Drupal.
- Networking Opportunities: Network with experts from around the globe who create ambitious digital experiences.
Register for DrupalCon Singapore 2024 at https://events.drupal.org/singapore2024 and join the next chapter in Drupal’s evolution!
Real Python: How to Reset a pandas DataFrame Index
In this tutorial, you’ll learn how to reset a pandas DataFrame index, the reasons why you might want to do this, and the problems that could occur if you don’t.
Before you start your learning journey, you should familiarize yourself with how to create a pandas DataFrame. Knowing the difference between a DataFrame and a pandas Series will also prove useful to you.
In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.
As a starting point, you’ll need some data. To begin with, you’ll use the band_members.csv file included in the downloadable materials that you can access by clicking the link below:
Get Your Code: Click here to download the free sample code you’ll use to learn how to reset a pandas DataFrame index.
The table below describes the data from band_members.csv that you’ll begin with:
Column Name     PyArrow Data Type   Description
first_name      string              First name of member
last_name       string              Last name of member
instrument      string              Main instrument played
date_of_birth   string              Member’s date of birth

As you’ll see, the data has details of the members of the rock band The Beach Boys. Each row contains information about its various members both past and present.
Note: In case you’ve never heard of The Beach Boys, they’re an American rock band formed in the early 1960s.
Throughout this tutorial, you’ll be using the pandas library to allow you to work with DataFrames, as well as the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types that pandas uses by default.
If you’re working at the command line, you can install both pandas and pyarrow using the single command python -m pip install pandas pyarrow. If you’re working in a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. Regardless, you should do this within a virtual environment to avoid clashes with the libraries you use in your global environment.
Once you have the libraries in place, it’s time to read your data into a DataFrame:
    >>> import pandas as pd
    >>> beach_boys = pd.read_csv(
    ...     "band_members.csv"
    ... ).convert_dtypes(dtype_backend="pyarrow")

First, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the beach_boys variable, you used pandas’ read_csv() function, passing band_members.csv as the file to read. Finally, by passing dtype_backend="pyarrow" to .convert_dtypes() you convert all columns to pyarrow types.
If you want to verify that pyarrow data types are indeed being used, then beach_boys.dtypes will satisfy your curiosity:
    >>> beach_boys.dtypes
    first_name       string[pyarrow]
    last_name        string[pyarrow]
    instrument       string[pyarrow]
    date_of_birth    string[pyarrow]
    dtype: object

As you can see, each data type contains [pyarrow] in its name.
If you wanted to analyze the date information thoroughly, then you would parse the date_of_birth column to make sure dates are read as a suitable pyarrow date type. This would allow you to analyze by specific days, months or years, and so on, as commonly found in pivot tables.
The date_of_birth column is not analyzed in this tutorial, so the string data type it’s being read as will do. Later on, you’ll get the chance to hone your skills with some exercises. The solutions include the date parsing code if you want to see how it’s done.
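If you’re curious what date parsing could look like before you reach the exercises, here’s a minimal sketch. It isn’t the tutorial’s own solution: it builds a tiny two-row DataFrame by hand instead of reading band_members.csv, and it uses pandas’ default NumPy-backed datetime type rather than a pyarrow date type. The births variable name is made up for this illustration.

```python
import pandas as pd

# A hand-made stand-in for two rows of band_members.csv.
births = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike"],
        "date_of_birth": ["20-Jun-1942", "15-Mar-1941"],
    }
)

# Parse day-month-year strings such as "20-Jun-1942" into datetimes.
# Note: %b matches English month abbreviations in the default locale.
births["date_of_birth"] = pd.to_datetime(
    births["date_of_birth"], format="%d-%b-%Y"
)

# Once parsed, date components are available through the .dt accessor.
birth_years = births["date_of_birth"].dt.year.tolist()
print(birth_years)
```

With the column parsed, you could group or filter by year, month, or day instead of comparing raw strings.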
Now that the file has been loaded into a DataFrame, you’ll probably want to take a look at it:
    >>> beach_boys
      first_name last_name instrument date_of_birth
    0      Brian    Wilson       Bass   20-Jun-1942
    1       Mike      Love  Saxophone   15-Mar-1941
    2         Al   Jardine     Guitar   03-Sep-1942
    3      Bruce  Johnston       Bass   27-Jun-1942
    4       Carl    Wilson     Guitar   21-Dec-1946
    5     Dennis    Wilson      Drums   04-Dec-1944
    6      David     Marks     Guitar   22-Aug-1948
    7      Ricky    Fataar      Drums   05-Sep-1952
    8    Blondie   Chaplin     Guitar   07-Jul-1951

DataFrames are two-dimensional data structures similar to spreadsheets or database tables. A pandas DataFrame can be considered a set of columns, with each column being a pandas Series. Each column also has a heading, which is the name property of the Series, and each row has a label, which is referred to as an element of its associated index object.
The DataFrame’s index is shown to the left of the DataFrame. It’s not part of the original band_members.csv source file, but is added as part of the DataFrame creation process. It’s this index object you’re learning to reset.
The index of a DataFrame is an additional column of labels that helps you identify rows. When used in combination with column headings, it allows you to access specific data within your DataFrame. The default index labels are a sequence of integers, but you can use strings to make them more meaningful. You can actually use any hashable type for your index, but integers, strings, and timestamps are the most common.
Note: Although indexes are certainly useful in pandas, an alternative to pandas is the new high-performance Polars library, which eliminates them in favor of row numbers. This may come as a surprise, but aside from being used for selecting rows or columns, indexes aren’t often used when analyzing DataFrames. Also, row numbers always remain sequential when rows are added or removed in a Polars DataFrame. This isn’t the case with indexes in pandas.
Read the full article at https://realpython.com/pandas-reset-index/ »
Julien Tayon: The crudest CRUD of them all : the smallest CRUD possible in 150 lines of python
To begin with, I am not really motivated to start with a full-fledged MVC (Model View Controller) framework à la Django, because there is a lot of boilerplate and many steps before you get a result. Still, it has a lot of features I want, including authentication, authorization and security handling.
For prototypes we normally favour lightweight frameworks (à la Flask), and CRUD.
The CRUD approach factors the whole framework into a single dynamic form that adapts itself to the model: from the Python class declaration it generates the HTML forms to input data, the tabulated views, the REST endpoints and the searches, as well as the database model. One language to rule them all: PYTHON. With enough talent, you can even generate from Python the JavaScript that handles autocompletion on the generated view.
But before using a CRUD framework, we need a cruder one: ugly, disgusting, but useful for a human before building the REST APIs and writing the Python classes, the HTML forms, and the controllers.
I call this the crudest CRUD of them all.
Think hard about what you want when prototyping ...
- to write no CONTROLLERS; the Flask documentation has a very verbose approach to exposing and writing routes, and writing controllers for inserting into and searching databases is boring;
- to write as few HTML views as possible; one and only one would be great;
- to avoid having to fiddle with the many files that reflect separation of concerns: the fewer Python files and classes you touch, the better;
- to avoid having to write SQL or use an ORM (at least a verbose declarative one);
- show me your code and you can mesmerize and even fool me; show me your data structure, however, and I'll know everything I have to know about your application: the data structure should be under your nose, in a readable fashion, in the code;
- to have AT LEAST one endpoint for inserting and searching so that curl can be used to begin automation and testing, preferably in a factorisable fashion;
- only one point of failure is accepted.
Once we set these few conditions, we see that whatever we do, WE NEED a dynamic HTTP server at the core. Python being the topic here, we are gonna do it in Python.
What is the simplest dynamic web server in python ?
The reference implementation of WSGI, which is the crudest WSGI server of them all: wsgiref. And you don't need to download it, since it ships with the Python stdlib.
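To give an idea of how little wsgiref asks for, here is a minimal sketch of a WSGI application. The names hello_app and serve are made up for the illustration, and the app is exercised directly with wsgiref's testing helpers instead of opening a socket.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Smallest possible WSGI application: echo the requested path."""
    path = environ.get("PATH_INFO", "/")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"you asked for {path}\n".encode("utf-8")]


def serve(port=5000):
    """Blocking helper (hypothetical name): serve hello_app until interrupted."""
    with make_server("", port, hello_app) as httpd:
        httpd.serve_forever()


# Exercise the app without a network, using a fake environ from wsgiref.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/user"
captured = {}
body = hello_app(environ, lambda status, headers: captured.update(status=status))
```

Calling serve() starts a real server on port 5000, after which curl http://localhost:5000/user returns the same body.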
First things first, we are gonna add a default view so that we can serve a static HTML page with the minimal HTML we need to interact with data: sets of inputs and forms.
Here, we stop. And we see that these forms are describing the data model.
Wouldn't it be nice if we could parse the HTML form easily with a tool from the standard library, html.parser, and maybe deduce the database model from it? Beyond fields, we could even add relationships. And well, since we are dreaming: what about creating the tables on the fly from the form if they don't exist?
Encoding the relationships does require hijacking a convention: when the parser comes across a field named whatever_id in a form, it deduces that it is a foreign key to table « whatever », column « id ».
Once this is done, we can parse the HTML, do some magick to match HTML input types to database types (an adapter), and it's almost over. We can even dream of creating the database, if it does not exist, in a one-liner for SQLite.
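The idea can be sketched with the standard library alone. In this simplified illustration, the class FormModel, the TYPE_ADAPTER table and the type names written as plain strings are all assumptions made for the example; no database is touched.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical adapter: HTML input type -> name of a database column type.
TYPE_ADAPTER = {
    "text": "UnicodeText",
    "email": "UnicodeText",
    "url": "UnicodeText",
    "number": "Integer",
    "date": "Date",
    "time": "Time",
}


class FormModel(HTMLParser):
    """Collect {table: {column: type}} from the <form> and <input> tags."""

    def __init__(self):
        super().__init__()
        self.model = {}
        self.current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # The form action "/event" names the table "event".
            self.current = urlparse(attrs.get("action", "")).path.lstrip("/")
            self.model[self.current] = {}
        elif tag == "input" and self.current is not None and "name" in attrs:
            name = attrs["name"]
            if name == "id":
                kind = "Integer PRIMARY KEY"
            elif name.endswith("_id"):
                # Convention: whatever_id points at table "whatever", column "id".
                kind = f"Integer REFERENCES {name[:-3]}(id)"
            else:
                kind = TYPE_ADAPTER.get(attrs.get("type", "text"), "UnicodeText")
            self.model[self.current][name] = kind

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


parser = FormModel()
parser.feed(
    '<form action="/event">'
    '<input type="number" name="id">'
    '<input type="date" name="date">'
    '<input type="number" name="user_id">'
    "</form>"
)
model = parser.model
```

The resulting dictionary is the data model read straight from the form, which is what the real code later turns into actual tables.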
We just need to throw all frugality of dependencies out of the window and spoil our karma of « digital sobriety » by adding the almighty SQLAlchemy, the crudest (but still heavy) ORM when it comes to the introspective features that map a database object to a Python object in a clear, consistent way. With this, just one function is needed in the controller to switch between inserting (POST method) and searching (GET).
Well, that is if the DOM is passed in the request. So of course I see the criticisms coming:
- we can't pass the DOM in the request because the HTML form ignores the DOM
- aren't you scared of error 414 (URI too long) with the GET method if you pass the DOM?
Since we are human, we would also like the form to be readable when served because, well, humans don't read the source and can't see the name attributes of the inputs. A bit of polish on the raw HTML would be nice. It would also give consistency, and it would reduce the size of the form that has to be sent. Here, JavaScript is again the right answer. Fine, we serve the static page at the top of the controller. Let's use jQuery to keep it terse enough. Oh, and since we have JavaScript, wouldn't it be able to clone the part of the invented model tag inside every form, so that we can pass the relevant part of the DOM to the controller?
I think we have everything to write the crudest CRUD server of them all :D
Happy code reading:

import multipart
from wsgiref.simple_server import make_server
from json import dumps
from sqlalchemy import create_engine, MetaData, Table, Column
from sqlalchemy import Integer, String, Float, Date, DateTime, Time, UnicodeText, ForeignKey
from html.parser import HTMLParser
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import select
from sqlalchemy_utils import database_exists, create_database
from urllib.parse import parse_qsl, urlparse

engine = create_engine("postgresql://jul@192.168.1.32/pdca")
if not database_exists(engine.url):
    create_database(engine.url)

# table name -> sqlalchemy Table, filled while parsing the HTML forms
tables = dict()

class HTMLtoData(HTMLParser):
    def __init__(self):
        global engine, tables
        self.cols = []
        self.table = ""
        self.tables = []
        self.engine = engine
        self.meta = MetaData()
        super().__init__()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            # an input named "id" is the primary key
            if attrs.get("name") == "id":
                self.cols += [ Column('id', Integer, primary_key = True), ]
                return
            # an input named whatever_id is a foreign key to whatever.id
            try:
                if attrs.get("name").endswith("_id"):
                    table, _ = attrs.get("name").split("_")
                    self.cols += [ Column(attrs["name"], Integer, ForeignKey(table + ".id")) ]
                    return
            except Exception as e:
                print(e)
            # adapter: HTML input type -> database column type
            if attrs["type"] in ("email", "url", "phone", "text"):
                self.cols += [ Column(attrs["name"], UnicodeText ), ]
            if attrs["type"] == "number":
                # step is optional on number inputs, hence .get()
                if attrs.get("step") == "any":
                    self.cols += [ Column(attrs["name"], Float), ]
                else:
                    self.cols += [ Column(attrs["name"], Integer), ]
            if attrs["type"] == "date":
                self.cols += [ Column(attrs["name"], Date) ]
            if attrs["type"] == "datetime":
                self.cols += [ Column(attrs["name"], DateTime) ]
            if attrs["type"] == "time":
                self.cols += [ Column(attrs["name"], Time) ]
        if tag == "form":
            # the form action names the table
            self.table = urlparse(attrs["action"]).path[1:]

    def handle_endtag(self, tag):
        if tag == "form":
            self.tables += [ Table(self.table, self.meta, *self.cols), ]
            tables[self.table] = self.tables[-1]
            self.table = ""
            self.cols = []
            # create the table on the fly if it does not exist
            with engine.connect() as cnx:
                self.meta.create_all(engine)
                cnx.commit()

html = """
<!doctype html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.7.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
    $("form").each((i,el) => {
        $(el).wrap("<fieldset>"+ el.action + "</fieldset>" );
        $(el).append("<input type=submit value=insert formmethod=post ><input type=submit value=search formmethod=get />");
    });
    $("input:not([type=hidden],[type=submit])").each((i,el) => {
        $(el).before("<label>" + el.name+ "</label><br/>");
        $(el).after("<br>");
    });
});
</script>
</head>
<body>
<form action=/user >
    <input type=number name=id />
    <input type=text name=name />
    <input type=email name=email >
</form>
<form action=/event >
    <input type=number name=id />
    <input type=date name=date />
    <input type=text name=text />
    <input type=number name=user_id />
</form>
</body>
</html>
"""

# the empty route serves the static page
router = dict({"" : lambda fo: html,})

def simple_app(environ, start_response):
    fo, fi = multipart.parse_form_data(environ)
    # merge uploaded files into the form fields
    fo.update(**{
        k: dict(
            name=v.filename,
            content=v.file.read().decode('utf-8', 'backslashreplace'),
            content_type=v.content_type,
        ) for k, v in fi.items()})
    table = route = environ["PATH_INFO"][1:]
    fo.update(**dict(parse_qsl(environ["QUERY_STRING"])))
    start_response('200 OK', [('Content-type', 'text/html; charset=utf-8')])
    try:
        HTMLtoData().feed(html)
    except KeyError:
        pass
    metadata = MetaData()
    metadata.reflect(bind=engine)
    Base = automap_base(metadata=metadata)
    Base.prepare()
    if route in tables.keys():
        with Session(engine) as session:
            Item = getattr(Base.classes, table)
            # POST inserts a row
            if environ.get("REQUEST_METHOD", "GET") == "POST":
                new_item = Item(**{ k: v for k, v in fo.items() if v and not k.startswith("_")})
                session.add(new_item)
                ret = session.commit()
                fo["insert_result"] = new_item.id
            # GET searches on the non-empty fields
            if environ.get("REQUEST_METHOD") == "GET":
                result = []
                for elt in session.execute(
                    select(Item).filter_by(**{ k: v for k, v in fo.items() if v and not k.startswith("_")})).all():
                    result += [{ k.name: getattr(elt[0], k.name) for k in tables[table].columns}]
                fo["search_result"] = result
    return [ router.get(route, lambda fo: dumps(fo.dict, indent=4, default=str))(fo).encode() ]

print("Crudest CRUD of them all on port 5000...")
make_server('', 5000, simple_app).serve_forever()
1xINTERNET blog: Why choosing a reliable migration partner is crucial for a successful transition from Drupal 7
The end of life of Drupal 7 is just around the corner and selecting the right migration partner is crucial for a smooth, cost-effective, and future-proof transition. Find out how 1xINTERNET and Pantheon’s unique solution can support your organisation!
Daniel Lange: Weird times ... or how the New York DEC decided the US presidential elections
November 2024 will be known as the time when the killing of Peanut, a pet squirrel, by the New York State DEC swung the US presidential elections and shaped history forever.
The hundreds of millions of dollars spent on each side, the tireless campaigning by the candidates, the celebrity endorsements ... all made for an open race for months. Investments evened each other out.
But an OnlyFans producer showing people an overreaching, bureaucracy-driven State raiding his home to confiscate a pet squirrel and kill it ... swung enough voters to decide the elections.
That is what we need to understand in times of instant worldwide publication and a mostly attention-driven economy: Human fates, elections, economic cycles and wars can be decided by people killing squirrels.
RIP, Peanut.
P.S.: Trump Media & Technology Group Corp. (DJT) stock is up 30% pre-market.
Little Wayland Things
While I do have a Qt git build on my machine that I use for development, I usually only test individual applications and functionality but hardly ever run my full Plasma session on it. This means that for day-to-day use I typically only get to enjoy new Qt features once they have actually been released.
[Image: Proper modal dialogs under Wayland (note the darkened editor window) thanks to XDG Dialog and the new Qt 6.8]
One feature I talked about in the very last issue of “On the road to Plasma 6” is a nice API for XDG Foreign. To recap: it’s a Wayland protocol that lets an application export a window to another one so it can attach a window to it. For example, the XDG Desktop Portal wants to attach the “Open File” dialog as if it were coming from the application that requested it.
Of course we don’t want to write low-level Wayland code and instead have an easy to use API for it. The KWindowSystem::setMainWindow function does just that: hand in a window and the token you received from the other application (created through KWaylandExtras::exportWindow) and it takes care of everything else. Presumably, you want to set the parent window before showing your dialog to make absolutely sure it’s set up properly.
However, Qt did not have an API to tell us when the underlying XDG Toplevel (think: a regular desktop-y window with a title bar and what not) had been created. We were only told when the basic wl_surface was created, which was too early, or the window was exposed/shown, at which point it was already flashing up in the user’s task bar. Hence, I added a new QWaylandWindow::surfaceRoleCreated (and corresponding surfaceRoleDestroyed) signal. Utilizing that, the aforementioned KWindowSystem API now works perfectly.
Another major addition to Qt Wayland that I have been looking forward to very much is support for the XDG Dialog protocol. While a window could have always had a parent (e.g. a popup menu or settings dialog parented to the application’s main window), there was no concept of a “modal” dialog. Therefore, we did not support the “dim parent” effect under Wayland that darkens a window to indicate it cannot be interacted with. More importantly, KWin couldn’t take it into account for its focus handling either. It happily let you focus a blocked window but the application would then just ignore your input.
[Image: There’s only one Dolphin running here!]
This was most noticeable for me when Alt+Tab’ing back and forth, for example using the “Open File” dialog in one application and then trying to switch to the other to verify where the file was actually located. Instead of cycling between the file dialog and the other application, it would alternate between the file dialog and the blocked main window.
Sadly, even when I upgraded to Qt 6.8 the situation didn’t improve. I noticed that Alt+Tab actually showed the dialog twice. This looked like a bug and sure enough comparing it to the Plasma 5.27 LTS session on my other computer proved that it used to work at some point. At first I didn’t spot anything obvious until I noticed a small typo that must have slipped in during some major refactoring. Instead of not including the main window when it had a modal child, it included the modal child once again! Sure enough, adding an exclamation mark (the logical NOT operator in C++) did the trick.
If you want to support more good people such as myself, consider donating to the KDE End of Year Fundraiser!
drunomics: Low-code + Decoupled Drupal: The Power of Custom Elements 3.0
Matt Layman: Deploy Your Own Web App With Kamal 2
HDR and color management in KWin, part 5: HDR on SDR laptops
This one required a few other features to be implemented first, so let’s jump right in.
Matching reference luminances
A big part of what a desktop compositor needs to get right with HDR content is to show SDR and HDR content properly side by side. KWin 6.0 added an SDR brightness slider for that purpose, but that’s only half the equation - what about the brightness of HDR content?
When we say “HDR”, usually that refers to a colorspace with the rec.2020 primaries and the perceptual quantizer (PQ) transfer function. A transfer function describes how to calculate a real brightness value from the “electrical” signal encoded in the content - PQ specifically has encoded values from 0 to 1 and brightness values from 0 to 10000 nits. For reference, your typical office monitor does around 300 or 400 nits at maximum brightness setting, and many newer phones can go a bit above 1000 nits.
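To make the PQ transfer function a bit more concrete, here is a small sketch of it in Python, using the constants from SMPTE ST 2084. The function names are made up for the illustration, and this is not KWin's shader code.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PQ_MAX_NITS = 10000.0


def pq_to_nits(signal):
    """EOTF: encoded PQ value in [0, 1] -> luminance in nits."""
    p = signal ** (1 / M2)
    return PQ_MAX_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def nits_to_pq(nits):
    """Inverse EOTF: luminance in nits -> encoded PQ value in [0, 1]."""
    y = (nits / PQ_MAX_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


black = pq_to_nits(0.0)
peak = pq_to_nits(1.0)
office_monitor_code = nits_to_pq(400.0)
```

The curve is strongly non-linear: a typical 400 nits office monitor only reaches a bit above the middle of the encoded range, and everything above that is reserved for highlights.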
Now if we want to show HDR content on an HDR screen, the most straightforward thing to do would be to just calculate the brightness values, write them to the screen and be done with it, right? That’s what KWin did up to Plasma 6.1, but it’s far from ideal. Even if your display can show the full range of requested brightness values, you might want to adjust the brightness to match your environment - be it brighter or darker than the room the content was optimized for - and when there are SDR things in HDR content, like subtitles in a video, those should ideally match other SDR content on the screen as well.
Luckily, there is a preexisting relationship between HDR and SDR that we can use: The reference luminance. It defines how bright SDR white is - which is why another name for it is simply “SDR white”.
As we want to keep the brightness slider working, we won’t map SDR content to the reference luminance of any HDR transfer function though, but instead we map both SDR and HDR content to the SDR brightness setting. If we have an HDR video that uses the PQ transfer function, that reference luminance is 203 nits. If your SDR brightness setting is at 406 nits, KWin will just multiply the brightness of the HDR video with a factor of 2.
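The arithmetic from this example, as a tiny sketch (the function name is made up, and the real implementation works on pixel values in shaders rather than on single numbers):

```python
PQ_REFERENCE_NITS = 203.0  # reference luminance of PQ content, per the text


def brightness_factor(sdr_brightness_nits, reference_nits=PQ_REFERENCE_NITS):
    """Factor applied to content luminance so that its reference white
    lands on the user's SDR brightness setting."""
    return sdr_brightness_nits / reference_nits


factor = brightness_factor(406.0)
# A 1000 nits highlight in the video is scaled by the same factor.
highlight_nits = 1000.0 * factor
```

With the slider at 203 nits the factor is 1 and the video is shown as encoded; at 406 nits everything in it gets twice as bright.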
This doesn’t only mean that we can make SDR and HDR content fit together nicely on HDR screens, but it also means we now know what to do when we have HDR content on an SDR screen: We map the reference luminance from the video to SDR white on the screen. That’s of course not enough to make it look nice though…
Tone mapping
Especially with HDR presented on an SDR screen, but also on many HDR screens, it will happen that the content brightness exceeds the display capabilities. To handle this, starting with Plasma 6.2, whenever the HDR metadata of the content says it’s brighter than the display can go, KWin will apply tone mapping.
Doing this tone mapping in RGB can result in changing the content quite badly though. Let’s take a look by using the most simple “tone mapping” function there is, clipping. It just limits the red, green and blue values separately to the brightness that the screen can show.
If we have a pixel with the value [2.0, 0.0, 2.0] and a maximum brightness of 1.0, that gets mapped to [1.0, 0.0, 1.0] - which is the same purple, just darker. But if the pixel has the values [2.0, 0.0, 1.0], then that also gets mapped to [1.0, 0.0, 1.0], even though the source color was significantly more red!
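The same two pixels, run through a per-channel clip in a few lines of Python (clip_rgb is a made-up name for the illustration):

```python
def clip_rgb(pixel, max_value=1.0):
    """Limit red, green and blue separately to what the screen can show."""
    return [min(channel, max_value) for channel in pixel]


purple = clip_rgb([2.0, 0.0, 2.0])
reddish = clip_rgb([2.0, 0.0, 1.0])

# Red-to-blue ratio of the second pixel before and after clipping.
ratio_before = 2.0 / 1.0
ratio_after = reddish[0] / reddish[2]
```

Both inputs end up as the same output, and the red-to-blue ratio of the second pixel drops from 2 to 1: the hue has shifted, not just the brightness.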
To fix that, KWin’s tone mapping uses ICtCp. This is a color space developed by Dolby, in which the perceived brightness (aka Intensity) is separated from the chroma components (Ct = blue-yellow, Cp = red-green), which is perfect for tone mapping. KWin’s shaders thus transform the RGB content to ICtCp, apply a brightness mapping function to only the intensity component, and then convert back to RGB.
The result of that algorithm looks like this:
[Images: RGB clipping | KWin 6.2’s tone mapping | MPV’s tone mapping]
As you can see, there’s still some color changes going on in comparison to MPV’s algorithm; this is partially because the tone mapping curve still needs some more adjustments, and partially because we also still need to do similar mapping for colors that the screen can’t actually show. It’s already a large improvement though, and does better than the built-in tone mapping functionality in many HDR screens.
When tone mapping HDR content on SDR screens, we always end up reducing the brightness of the overall image, so that we have some brightness values to map the really bright highlights in the video to - otherwise everything just slightly over the reference luminance would look like an overexposed blob of color, as you can see in the “RGB clipping” image. There are ways around that though…
HDR on SDR laptop displays
To explain the reasoning behind this, it helps to first have a look at what even makes a display “HDR”. In many cases it’s just marketing nonsense, a label that’s put on displays to make them seem more fancy and desirable, but in others there’s an actual tangible benefit to it.
Let’s take OLED displays as an example, as it’s considered one of the display technologies where HDR really shines. When you drive an OLED at high brightness levels, it becomes quite inefficient, it draws a lot of power and generates a lot of heat. Both of these things can only be dealt with to a limited degree, so OLED displays can generally only be used with relatively low average brightness levels. They can go a lot brighter than the average in a small part of the screen though, and that’s why they benefit so much from HDR - you can show a scene that’s on average only 200 nits bright, with the sky in the image going up to 300 nits, the sun going up to 1000 nits and the ground only doing 150 nits.
Now let’s compare that to SDR laptop displays. In the case of most LCDs, you have a single backlight LED for the whole screen, and when you move the brightness slider, the power the backlight is driven at is changed. So there’s no way to make parts of the screen brighter than the rest on a hardware level… but that doesn’t mean there isn’t a way to do it in software!
When we want to show HDR content and the brightness slider is below 100%, KWin increases the backlight level to get a peak brightness that matches the relative peak brightness of that content (as far as that’s possible). At the same time it changes the colorspace description on the output to match that change: While the reference luminance stays the same, the maximum luminance of the transfer function gets increased in proportion to the increase in backlight brightness.
The result is that SDR white gets mapped to a reduced RGB value, which is at least supposed to exactly counteract the increase of brightness that we’re applying with the backlight, while HDR content that goes beyond the reference luminance gets to use the full brightness range.
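A simplified model of that trade-off can be sketched as follows. This is not KWin's code: the function name, the dictionary keys and the assumption that screen luminance scales linearly with the backlight level are all made up for the illustration.

```python
def plan_backlight(slider, reference_nits, content_peak_nits):
    """slider is the user's brightness setting as a fraction of the
    maximum backlight (0 < slider <= 1)."""
    # How far above the reference luminance the content wants to go.
    wanted = max(content_peak_nits / reference_nits, 1.0)
    # The backlight cannot be driven above 100%.
    boost = min(wanted, 1.0 / slider)
    return {
        "backlight": slider * boost,
        # SDR white is encoded darker to cancel out the brighter backlight.
        "sdr_white_rgb": 1.0 / boost,
        # The reference luminance stays put, only the maximum grows.
        "max_luminance_nits": reference_nits * boost,
    }


plan = plan_backlight(slider=0.5, reference_nits=200.0, content_peak_nits=1000.0)
# What SDR white looks like on screen, relative to the maximum backlight.
sdr_white_on_screen = plan["backlight"] * plan["sdr_white_rgb"]
```

In this example the backlight goes from 50% to 100%, SDR white is encoded at half the RGB value so it looks the same as before, and HDR highlights get twice the brightness range to work with.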
Increasing the backlight power of course doesn’t come without downsides; black levels and power usage both get increased, so this is only ever active if there’s HDR content on the screen with valid HDR metadata that signals brightness levels going beyond the reference luminance.
As always, capturing HDR content with a phone camera is quite difficult, but I think you can at least sort of see the effect:
[Images: without backlight adjustment | with backlight adjustment]
This feature has been merged into KWin’s git master branch and will be available on all laptop displays starting with Plasma 6.3. I really recommend trying it for yourself once it reaches your distribution!