Feeds
ClearlyDefined at SOSS Fusion 2024: a collaborative solution to Open Source license compliance
This past month, the Open Source Security Foundation (OpenSSF) hosted SOSS Fusion in Atlanta, an event that brought together a diverse community of leaders and innovators from across the digital security spectrum. The conference, held on October 22-23, explored themes central to today’s technological landscape: AI security, diversity in technology, and public policy for Open Source software. Industry thought leaders like Bruce Schneier, Marten Mickos, and Cory Doctorow delivered keynotes, setting the tone for a conference that emphasized collaboration and community in creating a secure digital future.
Amidst these pressing topics, the Open Source Initiative in collaboration with GitHub and SAP presented ClearlyDefined—an innovative project aimed at simplifying software license compliance and metadata management. Presented by Nick Vidal of the Open Source Initiative, along with E. Lynette Rayle from GitHub and Qing Tomlinson from SAP, the session highlighted how ClearlyDefined is transforming the way organizations handle licensing compliance for Open Source components.
What is ClearlyDefined?
ClearlyDefined is a project with a powerful vision: to create a global crowdsourced database of license metadata for every software component ever published. This ambitious mission seeks to help organizations of all sizes easily manage compliance by providing accurate, up-to-date metadata for Open Source components. By offering a single, reliable source for license information, ClearlyDefined enables organizations to work together rather than in isolation, collectively contributing to the metadata that keeps Open Source software compliant and accessible.
The problem: redundant and inconsistent license management
In today’s Open Source ecosystem, managing software licenses has become a significant challenge. Many organizations face the repetitive task of identifying, correcting, and maintaining accurate licensing data. When one component has missing or incorrect metadata, dozens, or even hundreds, of organizations using that component may duplicate efforts to resolve the same issue. ClearlyDefined aims to eliminate redundancy by enabling a collaborative approach.
The solution: crowdsourcing compliance with ClearlyDefined
ClearlyDefined provides an API and user-friendly interface that make it easy to access and contribute license metadata. By aggregating and standardizing licensing data, ClearlyDefined offers a powerful solution for organizations to enhance SBOMs (Software Bill of Materials) and license information without the need for extensive re-scanning and data correction. At the conference, Nick demonstrated how developers can quickly retrieve license data for popular libraries using a simple API call, making license compliance seamless and scalable.
In addition, organizations that encounter incomplete or incorrect metadata can easily update it through ClearlyDefined’s platform, creating a feedback loop that benefits the entire Open Source community. This crowdsourcing approach means that once an organization fixes a licensing issue, that data becomes available to all, fostering efficiency and accuracy.
Key components of ClearlyDefined’s platform
1. API and User Interface: Users can access ClearlyDefined data through an API or the website, making it simple for developers to integrate license checks directly into their workflows.
2. Human curation and community collaboration: To ensure high data quality, ClearlyDefined employs a curation workflow. When metadata requires updates, community members can submit corrections that go through a human review process, ensuring accuracy and reliability.
3. Integration with popular package managers: ClearlyDefined supports various package managers, including npm and pypi, and has recently expanded to support Conda, a popular choice among data science and AI developers.
Real-world use cases: GitHub and SAP’s adoption of ClearlyDefined
During the presentation, representatives from GitHub and SAP shared how ClearlyDefined has impacted their organizations.
– GitHub: ClearlyDefined’s licensing data powers GitHub’s compliance solutions, allowing GitHub to manage millions of licenses with ease. Lynette shared how they initially onboarded over 17 million licenses through ClearlyDefined, a number that has since grown to over 40 million. This database enables GitHub to provide accurate compliance information to users, significantly reducing the resources required to maintain licensing accuracy. Lynette showcased the harvesting process and the curation process. More details about how GitHub is using ClearlyDefined are available here.
– SAP: Qing discussed how ClearlyDefined’s approach has streamlined SAP’s Open Source compliance efforts. By using ClearlyDefined’s data, SAP reduced the time spent on license reviews and improved the quality of metadata available for compliance checks. SAP’s internal harvesting service integrates with ClearlyDefined, ensuring that critical license metadata is consistently available and accurate. SAP has contributed to the ClearlyDefined project and, most notably, together with Microsoft, has optimized the database schema and reduced the database operational cost by more than 90%. More details about how SAP is using ClearlyDefined are available here.
Why ClearlyDefined matters
ClearlyDefined is a community-driven initiative with a vision to address one of Open Source’s biggest challenges: ensuring accurate and accessible licensing metadata. By centralizing and standardizing this data, ClearlyDefined not only reduces redundant work but also fosters a collaborative approach to license compliance.
The platform’s Open Source nature and integration with existing package managers and APIs make it accessible and scalable for organizations of all sizes. As more contributors join the effort, ClearlyDefined continues to grow, strengthening the Open Source community’s commitment to compliance, security, and transparency.
Join the ClearlyDefined community
ClearlyDefined is always open to new contributors. With weekly developer meetings, an open governance model, and continuous collaboration with OpenSSF and other Open Source organizations, ClearlyDefined provides numerous ways to get involved. For anyone interested in shaping the future of license compliance and data quality in Open Source, ClearlyDefined offers an exciting opportunity to make a tangible impact.
At SOSS Fusion, ClearlyDefined’s presentation showcased how an open, collaborative approach to license compliance can benefit the entire digital ecosystem, embodying the very spirit of the conference: working together toward a secure, inclusive, and sustainable digital future.
Download slides and see summarized presentation transcript below.
ClearlyDefined presentation transcript
Hello, folks, good morning! Let’s start by introducing ClearlyDefined, an exciting project. My name is Nick Vidal, and I work with the Open Source Initiative. With me today are Lynette Rayle from GitHub and Qing Tomlinson from SAP, and we’re all very excited to be here.
Introduction to ClearlyDefined’s mission
So, what’s the mission of ClearlyDefined? Our mission is ambitious—we aim to crowdsource a global database of license metadata for every software component ever published. This would benefit everyone in the Open Source ecosystem.
The problem ClearlyDefined addresses
There’s a critical problem in the Open Source space: compliance and managing SBOMs (Software Bill of Materials) at scale. Many organizations struggle with missing or incorrect licensing metadata for software components. When multiple organizations use a component with incomplete or wrong license metadata, they each have to solve it individually. ClearlyDefined offers a solution where, instead of every organization doing redundant work, we can collectively work on fixing these issues once and make the corrected data available to all.
ClearlyDefined’s solution
ClearlyDefined enables organizations to access license metadata through a simple API. This reduces the need for repeated license scanning and helps with SBOM generation at scale. When issues arise with a component’s license metadata, organizations can contribute fixes that benefit the entire community.
Getting started with ClearlyDefined
To use ClearlyDefined, you can access its API directly from your terminal. For example, let’s say you’re working with a JavaScript library like Lodash. By calling the API, you can get all license metadata for a specific version of Lodash at your fingertips.
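As a rough illustration of such a call, the sketch below builds the URL for one component and reads the declared license out of the returned document. The endpoint root, the `type/provider/namespace/name/revision` coordinate layout, the `licensed.declared` field, and the Lodash version are assumptions made for this example rather than details stated in the talk, so check them against the ClearlyDefined API documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public endpoint; coordinates are type/provider/namespace/name/revision.
API_ROOT = "https://api.clearlydefined.io/definitions"


def definition_url(pkg_type, provider, name, revision, namespace="-"):
    """Build the definition URL for one component ("-" means no namespace)."""
    parts = [pkg_type, provider, namespace, name, revision]
    return API_ROOT + "/" + "/".join(quote(part, safe="") for part in parts)


def declared_license(definition):
    """Return the declared license of a definition document, or None if absent."""
    return definition.get("licensed", {}).get("declared")


def fetch_definition(url):
    """Download and decode one definition (this performs a network call)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Hypothetical version number, chosen only for illustration.
lodash_url = definition_url("npm", "npmjs", "lodash", "4.17.21")
print(lodash_url)
```

Calling `declared_license(fetch_definition(lodash_url))` would then return the license expression recorded for that version, provided the service is reachable.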
Once you incorporate this licensing metadata into your workflow, you may notice some metadata that needs updating. You can curate that data and contribute it back, so everyone benefits. ClearlyDefined also provides a user-friendly interface for this, making it easier to contribute.
Open Source and community contributions
ClearlyDefined is an Open Source initiative, hosted on GitHub, supporting various package managers (e.g., npm, pypi). We work to promote best practices and integrate with other tools. Recently, we’ve expanded our scope to support non-SPDX licenses and Conda, a package manager often used in data science projects.
Integration with other tools
ClearlyDefined integrates with GUAC, an OpenSSF project that consumes ClearlyDefined data. This integration broadens the reach and utility of ClearlyDefined’s licensing information.
Case studies and community impact
I’d like to hand it over to Lynette from GitHub, who will talk about how GitHub uses ClearlyDefined and why it’s critical for license compliance.
GitHub’s use of ClearlyDefined
Hello, I’m Lynette, a developer at GitHub working on license compliance solutions. ClearlyDefined has become a key part of our workflows. Knowing the licenses of our dependencies is crucial, as legal compliance requires correct attributions. By using ClearlyDefined, we’ve streamlined our process and now manage over 40 million licenses. We also run our own harvester to contribute back to ClearlyDefined and scale our operations.
SAP’s adoption of ClearlyDefined
Hi, my name is Qing. At SAP, we co-innovate and collaborate with Open Source, ensuring a clean, well-maintained software pool. ClearlyDefined has streamlined our license review process, reducing time spent on scanning and enhancing data quality. SAP’s journey with ClearlyDefined began in 2018, and since then, we’ve implemented large-scale automation for our Open Source compliance and continuously contribute curated data back to the community.
Community and governance
ClearlyDefined thrives on community involvement. We recently elected members to our Steering and Outreach Committees to support the platform and encourage new contributors. Our weekly developer meetings and active Discord channel provide opportunities to engage, share knowledge, and collaborate.
Q&A highlights
- PURLs as Package Identifiers: We’re exploring support for PURLs as an internal coordinate system.
- Data Quality Issues: Data quality is our top priority. We plan to implement routines to scan for common issues, ensuring accurate metadata across the platform.
Thank you all for joining us today. If you’re interested in contributing, please reach out and become part of this collaborative community.
Members Newsletter – November 2024
After more than two years of collaboration, information gathering, global workshopping, testing, and an in-depth co-design process, we have an Open Source AI Definition.
The purpose of version 1.0 is to establish a workable standard for developers, researchers, and educators to consider how they may design evaluations for AI systems’ openness. The meaningful ability to fork and control their AI will foster permissionless, global innovation. It was important to drive a stake in the ground so everyone has something to work with. It’s version 1.0, so going forward, the process allows for improvement, and that’s exactly what will happen.
Over 150 individuals were part of the OSAID forum; nearly 15K subscribers to the OSI newsletter were kept up-to-date with the latest news about the OSAID; and 2M unique visitors to the OSI website were exposed to the OSAID process. There were 50+ co-design working group volunteers representing 29 countries, including participants from Africa, Asia, Europe, and the Americas.
Future versions of OSAID will continue to be informed by the feedback we receive from various stakeholder communities. The fundamental principles and aim will not change, but, as our (collective) understanding of the technology improves and technology itself evolves, we might need to update to clarify or even change certain requirements. To enable this, the OSI Board voted to establish an AI sub-committee who will develop appropriate mechanisms for updating the OSAID in consultation with stakeholders. It will be fully formed in the months ahead.
Please continue to stay involved, as diverse voices and experiences are required to ensure Open Source AI works for the good of us all.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
Open and public co-design process culminates in a stable version of Open Source AI Definition, ensures freedoms to use, study, share and modify AI systems.
Other highlights:
- How we passed the AI conundrums
- ClearlyDefined at SOSS Fusion 2024
- ClearlyDefined’s Steering and Outreach Committees Defined
- The Open Source Initiative Supports the Open Source Pledge
Article from ZDNet
For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them.
Other highlights:
- The Gap Between Open and Closed AI Models Might Be Shrinking. Here’s Why That Matters (Time)
- Meta’s military push is as much about the battle for open-source AI as it is about actual battles (Fortune)
- OSI unveils Open Source AI Definition 1.0 (InfoWorld)
- We finally have an ‘official’ definition for open source AI (TechCrunch)
- Read all press mentions from this past month
News from OSI affiliates:
- OpenSSF: SOSS Fusion 2024: Uniting Security Minds for the Future of Open Source (Security Boulevard)
- Mozilla Foundation: How Mozilla’s President Defines Open-Source AI (Forbes)
News from OpenSource.net:
- OpenSource.Net turns one with a redesign
- How to make reviewing pull requests a better experience
- Closing the Gap: Accelerating environmental Open Source
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Bloomberg is seeking a Technical Architect to join their OSPO team.
Events
Upcoming events:
- Nerdearla Mexico (November 7-9, 2024 – Mexico City)
- SeaGL (November 8-9, 2024 – Seattle)
- SFSCON (November 8-9, 2024 – Bolzano)
- KubeCon + CloudNativeCon North America (November 12-15, 2024 – Salt Lake City)
- OpenForum Academy Symposium (November 13-14, 2024 – Boston)
- The Linux Foundation Legal Summit (November 18-19, 2024 – Napa)
- The Linux Foundation Member Summit (November 19-21, 2024 – Napa)
- Open Source Experience (December 4-5 – Paris)
- KubeCon + CloudNativeCon India (December 11-12, 2024 – Delhi)
- EU Open Source Policy Summit (January 31, 2025 – Brussels)
- FOSDEM (February 1-2, 2025 – Brussels)
CFPs:
- FOSDEM 2025 EU-Policy Devroom – event being organized by the OSI, OpenForum Europe, Eclipse Foundation, The European Open Source Software Business Association, the European Commission Open Source Programme Office, and the European Commission.
- PyCon US 2025: the Python Software Foundation kicks off Website, CfP, and Sponsorship!
- GitHub
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Get to vote for the OSI Board by becoming a member
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
mark.ie: LocalGov Drupal (LGD): A Digital Public Good Transforming Government Services
LocalGov Drupal is the epitome of the principles of a Digital Public Good.
Drupal In the News: Drupal CMS: Groundbreaking New Version of Drupal Detailed at DrupalCon Singapore 2024
MARINA BAY, Singapore, 6 November, 2024—Drupal CMS, the groundbreaking package built on Drupal core with the marketer in mind, will launch on 15 January 2025. Conference attendees at DrupalCon Singapore 2024 will have the exclusive opportunity to be the first to learn more about Drupal CMS directly from Drupal’s founder, Dries Buytaert.
Learn how Drupal CMS will enable site builders without any Drupal experience to easily create a new site using their browser, marking one of the most significant launches in Drupal history.
Alongside the Drupal Association leadership team, Dries will unveil key features of Drupal CMS, making DrupalCon Singapore 2024 a can’t-miss event for anyone in the Open Source community. Occurring one month before the release of Drupal CMS, DrupalCon Singapore 2024 is an exclusive opportunity for attendees to join in the conversation surrounding Drupal CMS directly with its creators.
“The product strategy is for Drupal CMS to be the gold standard for no-code website building,” said Dries. “Our goal is to empower non-technical users like digital marketers, content creators, and site-builders to create exceptional digital experiences without requiring developers.”
DrupalCon Singapore 2024, 9-11 December 2024, is a premier gathering of Drupal and Open Source professionals. Over three days, the conference will showcase the latest Drupal trends, facilitate networking opportunities, and offer a platform for thought leadership in the Open Source landscape.
Key features of DrupalCon Singapore 2024 include:
- Keynotes, sessions, and panels: The Driesnote and Drupal CMS Panel are two highlights amongst a packed schedule of insightful sessions.
- Contribution Day: Contribution Day is where attendees grow and learn by helping to make Drupal even better. Giving back to the project is crucial in an Open Source community, as the Drupal project is developed by a community of people who work together to innovate the software.
- Birds of a Feather (BoFs): BoFs provide the perfect setting for connecting with like-minded attendees who share your interests.
- Splash Awards: Celebrate the work and creativity of the global Drupal community with this awards ceremony, which recognises outstanding projects built with Drupal.
- Networking Opportunities: Network with experts from around the globe who create ambitious digital experiences.
Register for DrupalCon Singapore 2024 at https://events.drupal.org/singapore2024 and join the next chapter in Drupal’s evolution!
Real Python: How to Reset a pandas DataFrame Index
In this tutorial, you’ll learn how to reset a pandas DataFrame index, the reasons why you might want to do this, and the problems that could occur if you don’t.
Before you start your learning journey, you should familiarize yourself with how to create a pandas DataFrame. Knowing the difference between a DataFrame and a pandas Series will also prove useful to you.
In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.
As a starting point, you’ll need some data. To begin with, you’ll use the band_members.csv file included in the downloadable materials that you can access by clicking the link below:
Get Your Code: Click here to download the free sample code you’ll use to learn how to reset a pandas DataFrame index.
The table below describes the data from band_members.csv that you’ll begin with:
| Column Name | PyArrow Data Type | Description |
| --- | --- | --- |
| first_name | string | First name of member |
| last_name | string | Last name of member |
| instrument | string | Main instrument played |
| date_of_birth | string | Member’s date of birth |

As you’ll see, the data has details of the members of the rock band The Beach Boys. Each row contains information about its various members both past and present.
Note: In case you’ve never heard of The Beach Boys, they’re an American rock band formed in the early 1960s.
Throughout this tutorial, you’ll be using the pandas library to allow you to work with DataFrames, as well as the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types that pandas uses by default.
If you’re working at the command line, you can install both pandas and pyarrow using the single command python -m pip install pandas pyarrow. If you’re working in a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. Regardless, you should do this within a virtual environment to avoid clashes with the libraries you use in your global environment.
Once you have the libraries in place, it’s time to read your data into a DataFrame:
```python
>>> import pandas as pd

>>> beach_boys = pd.read_csv(
...     "band_members.csv"
... ).convert_dtypes(dtype_backend="pyarrow")
```

First, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the beach_boys variable, you used pandas’ read_csv() function, passing band_members.csv as the file to read. Finally, by passing dtype_backend="pyarrow" to .convert_dtypes() you convert all columns to pyarrow types.
If you want to verify that pyarrow data types are indeed being used, then beach_boys.dtypes will satisfy your curiosity:
```python
>>> beach_boys.dtypes
first_name       string[pyarrow]
last_name        string[pyarrow]
instrument       string[pyarrow]
date_of_birth    string[pyarrow]
dtype: object
```

As you can see, each data type contains [pyarrow] in its name.
If you wanted to analyze the date information thoroughly, then you would parse the date_of_birth column to make sure dates are read as a suitable pyarrow date type. This would allow you to analyze by specific days, months or years, and so on, as commonly found in pivot tables.
The date_of_birth column is not analyzed in this tutorial, so the string data type it’s being read as will do. Later on, you’ll get the chance to hone your skills with some exercises. The solutions include the date parsing code if you want to see how it’s done.
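For a rough idea of what such parsing could look like, here is a minimal sketch that is not the tutorial's own solution. It uses a few hand-typed rows in the same day-month-year format instead of the CSV file, and it relies on `pd.to_datetime()`, which produces pandas' regular `datetime64` type rather than a pyarrow date type.

```python
import pandas as pd

# A few hand-typed rows in the same format as band_members.csv (illustrative only).
sample = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike", "Al"],
        "date_of_birth": ["20-Jun-1942", "15-Mar-1941", "03-Sep-1942"],
    }
)

# %d-%b-%Y matches day, abbreviated English month name, and four-digit year.
sample["date_of_birth"] = pd.to_datetime(
    sample["date_of_birth"], format="%d-%b-%Y"
)

# Once parsed, date parts such as the year are available through the .dt accessor.
birth_years = sample["date_of_birth"].dt.year.tolist()
print(birth_years)
```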
Now that the file has been loaded into a DataFrame, you’ll probably want to take a look at it:
```python
>>> beach_boys
  first_name last_name instrument date_of_birth
0      Brian    Wilson       Bass   20-Jun-1942
1       Mike      Love  Saxophone   15-Mar-1941
2         Al   Jardine     Guitar   03-Sep-1942
3      Bruce  Johnston       Bass   27-Jun-1942
4       Carl    Wilson     Guitar   21-Dec-1946
5     Dennis    Wilson      Drums   04-Dec-1944
6      David     Marks     Guitar   22-Aug-1948
7      Ricky    Fataar      Drums   05-Sep-1952
8    Blondie   Chaplin     Guitar   07-Jul-1951
```

DataFrames are two-dimensional data structures similar to spreadsheets or database tables. A pandas DataFrame can be considered a set of columns, with each column being a pandas Series. Each column also has a heading, which is the name property of the Series, and each row has a label, which is referred to as an element of its associated index object.
The DataFrame’s index is shown to the left of the DataFrame. It’s not part of the original band_members.csv source file, but is added as part of the DataFrame creation process. It’s this index object you’re learning to reset.
The index of a DataFrame is an additional column of labels that helps you identify rows. When used in combination with column headings, it allows you to access specific data within your DataFrame. The default index labels are a sequence of integers, but you can use strings to make them more meaningful. You can actually use any hashable type for your index, but integers, strings, and timestamps are the most common.
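To make the combination of index labels and column headings concrete, here is a small sketch using a hand-typed subset of the band data. It looks up one cell with the default integer labels and then with string labels taken from a column.

```python
import pandas as pd

members = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike", "Al"],
        "last_name": ["Wilson", "Love", "Jardine"],
        "instrument": ["Bass", "Saxophone", "Guitar"],
    }
)

# Default index: the integers 0, 1, 2 act as row labels.
by_integer_label = members.loc[1, "instrument"]

# Use the first_name column as the index to get more meaningful string labels.
by_name = members.set_index("first_name")
by_string_label = by_name.loc["Al", "instrument"]

print(by_integer_label, by_string_label)
```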
Note: Although indexes are certainly useful in pandas, an alternative to pandas is the new high-performance Polars library, which eliminates them in favor of row numbers. This may come as a surprise, but aside from being used for selecting rows or columns, indexes aren’t often used when analyzing DataFrames. Also, row numbers always remain sequential when rows are added or removed in a Polars DataFrame. This isn’t the case with indexes in pandas.
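The point about pandas indexes not staying sequential can be seen in a few lines. This sketch, built on a hand-typed list of names, drops one row and then renumbers the remaining rows with `.reset_index()`.

```python
import pandas as pd

members = pd.DataFrame({"first_name": ["Brian", "Mike", "Al", "Bruce"]})

# Dropping the row labelled 1 leaves a gap: the labels are now 0, 2, 3.
trimmed = members.drop(index=1)
labels_after_drop = trimmed.index.tolist()

# drop=True discards the old labels instead of keeping them as a new column.
renumbered = trimmed.reset_index(drop=True)
labels_after_reset = renumbered.index.tolist()

print(labels_after_drop, labels_after_reset)
```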
Read the full article at https://realpython.com/pandas-reset-index/ »
Julien Tayon: The crudest CRUD of them all : the smallest CRUD possible in 150 lines of python
To begin with, I am not really motivated to start with a full-fledged MVC (Model View Controller) framework à la django, because there is a lot of boilerplate and many steps before you get a result. But it has a lot of features I want, including authentication, authorization and security handling.
For prototypes we normally favour a lightweight framework (à la flask) and CRUD.
The CRUD approach is a factorisation of the whole framework into a single dynamic form that adapts itself to the model: from the python class declaration, it generates the HTML forms to input data, the tabulated views, the REST endpoints and the search, as well as the database model. One language to rule them all: PYTHON. With enough talent, you can even generate from python the javascript that handles autocompletion on the generated view.
But before using a CRUD framework, we need a cruder one, ugly, disgusting but useful for a human before building the REST APIs, writing the classes in python, the HTML forms, and the controllers.
I call this the crudest CRUD of them all.
Think hard at what you want when prototyping ...
- to write no CONTROLLERS; the flask documentation has a very verbose approach to exposing routes and writing them, and writing controllers for inserting into and searching databases is boring;
- to write the fewest HTML views possible, one and only one would be great;
- to avoid having to fiddle with the many files reflecting separation of concerns: the fewer python files and classes you touch, the better;
- to avoid having to write SQL or use an ORM (at least a verbose declarative one);
- show me your code and you can mesmerize and even fool me, however show me your data structure and I'll know everything I have to know about your application: the data structure should be under your nose in a readable fashion in the code;
- to have AT LEAST one end point for inserting and searching so that curl can be used to begin automation and testing, preferably in a factorisable fashion;
- only one point of failure is accepted
Once we set these few conditions, we see that whatever we do, WE NEED a dynamic HTTP server at the core. Python being the topic here, we are going to do it in python.
What is the simplest dynamic web server in python ?
The reference implementation of wsgi, which is the crudest wsgi server of them all: wsgiref. And you don't need to download it since it's provided in the python stdlib.
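For readers who have never used it, here is a minimal sketch of a wsgiref application, unrelated to the final code of this post. The names `hello_app`, `call_app` and `serve` are made up for the example; `call_app` invokes the application directly so it can be tried without opening a socket.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Smallest useful WSGI application: one callable, one response."""
    body = f"You asked for {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Call a WSGI app directly with a fake request (no network involved)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the mandatory WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(port=5000):
    """Serve hello_app forever on the given port (blocks until interrupted)."""
    make_server("", port, hello_app).serve_forever()


print(call_app(hello_app, "/user"))
```

Calling `serve()` starts the real server; the whole dynamic part of the application is the single `hello_app` callable.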
First things first, we are going to add a default view so that we can serve a static HTML page with the minimal HTML we need to interact with data: sets of inputs and forms.
Here, we stop. And we see that these forms are describing the data model.
Wouldn't it be nice if we could parse the HTML form easily with a tool from the standard library, html.parser, and deduce the database model from it? And beyond the fields, we could add relationships and, well, since we are dreaming: what about creating the tables on the fly from the form if they don't exist?
Encoding the relationships does require hijacking a convention: when the parser comes across a field named whatever_id in a form, it deduces that it is a foreign key to table « whatever », column « id ».
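The idea can be sketched with the standard library alone. The class below is a simplified illustration, not the parser used later in this post: `FormModelParser` and the shape of its `models` dictionary are invented for the example, and it only records what it finds without touching a database.

```python
from html.parser import HTMLParser


class FormModelParser(HTMLParser):
    """Collect, for each <form>, a table name and the fields it declares."""

    def __init__(self):
        super().__init__()
        self.models = {}     # table name -> {field name: description}
        self.current = None  # table of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # The form action "/user" names the table "user".
            self.current = attrs.get("action", "").lstrip("/")
            self.models[self.current] = {}
        elif tag == "input" and self.current is not None and "name" in attrs:
            name = attrs["name"]
            field = {"type": attrs.get("type", "text")}
            if name.endswith("_id"):
                # Convention: whatever_id references table "whatever", column "id".
                field["references"] = name[: -len("_id")] + ".id"
            self.models[self.current][name] = field

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


FORMS = """
<form action="/user">
  <input type="number" name="id">
  <input type="text" name="name">
  <input type="email" name="email">
</form>
<form action="/event">
  <input type="number" name="id">
  <input type="date" name="date">
  <input type="number" name="user_id">
</form>
"""

parser = FormModelParser()
parser.feed(FORMS)
print(parser.models)
```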
Once this is done, we can parse the HTML, do some magic to match HTML input types to database types (an adapter) and it's almost over. We can even dream of creating the database if it does not exist, in a one-liner for sqlite.
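As a rough sketch of that adapter, here is a version that uses the standard library's sqlite3 module instead of the sqlalchemy and postgresql combination used in the final code. The `SQL_TYPES` mapping and the `create_statement` helper are hypothetical names chosen for this illustration.

```python
import sqlite3

# Hypothetical adapter from HTML input types to SQLite column types.
SQL_TYPES = {
    "text": "TEXT", "email": "TEXT", "url": "TEXT",
    "number": "INTEGER", "date": "DATE", "time": "TIME",
}


def create_statement(table, fields):
    """Build a CREATE TABLE statement from {field name: HTML input type}."""
    columns = []
    for name, html_type in fields.items():
        if name == "id":
            columns.append('"id" INTEGER PRIMARY KEY')
        elif name.endswith("_id"):
            # whatever_id is a foreign key to table "whatever", column "id".
            target = name[: -len("_id")]
            columns.append(f'"{name}" INTEGER REFERENCES "{target}"("id")')
        else:
            columns.append(f'"{name}" {SQL_TYPES.get(html_type, "TEXT")}')
    return f'CREATE TABLE IF NOT EXISTS "{table}" ({", ".join(columns)})'


# sqlite3.connect() creates the database when it does not exist yet;
# ":memory:" keeps this demonstration off the disk.
connection = sqlite3.connect(":memory:")
connection.execute(
    create_statement("user", {"id": "number", "name": "text", "email": "email"})
)
connection.execute(
    create_statement("event", {"id": "number", "date": "date", "user_id": "number"})
)

table_names = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```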
We just need to throw all frugality of dependencies out of the window and spoil our karma of « digital sobriety » by adding the almighty sqlalchemy, the crudest (but still heavy) ORM when it comes to the introspective features that map a database object to a python object in a clear, consistent way. With this, just one function is needed in the controller to switch between inserting (POST method) and searching (GET method).
Well, that works if the DOM is passed in the request. So of course I can see the criticism coming:
- we can't pass the DOM in the request because the HTML form ignores the DOM
- Aren't you scared of error 414 (URI too long) in the GET method if you pass the DOM?
Since we are human, we would also like the form to be readable when served because, well, humans don't read the source and can't see the name attributes of the inputs. A tad of improvement to the raw HTML would be nice. It would also give consistency, and it would reduce the size of the form to send. Here, javascript is again the right answer. Fine, we serve the static page at the top of the controller. Let's use jquery to make it terse enough. Oh, and since we have javascript, wouldn't it be able to clone the relevant part of the invented model tag inside every form, so that we can pass the relevant part of the DOM to the controller?
I think we have everything to write the crudest CRUD server of them all :D
Happy code reading :

import multipart
from wsgiref.simple_server import make_server
from json import dumps
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlparse

from sqlalchemy import create_engine, MetaData, Table, Column, ForeignKey, select
from sqlalchemy import Integer, Float, Date, DateTime, Time, UnicodeText
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy_utils import database_exists, create_database

engine = create_engine("postgresql://jul@192.168.1.32/pdca")
if not database_exists(engine.url):
    create_database(engine.url)

# table name -> sqlalchemy Table, filled in while the HTML is parsed
tables = dict()


class HTMLtoData(HTMLParser):
    """Turn every <form> into a table and every <input> into a column."""

    def __init__(self):
        global engine, tables
        self.cols = []
        self.table = ""
        self.tables = []
        self.engine = engine
        self.meta = MetaData()
        super().__init__()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            # an input named "id" is the primary key
            if attrs.get("name") == "id":
                self.cols += [Column('id', Integer, primary_key=True)]
                return
            # an input named "<table>_id" is a foreign key to <table>.id
            try:
                if attrs.get("name").endswith("_id"):
                    table, _ = attrs.get("name").split("_")
                    self.cols += [
                        Column(attrs["name"], Integer, ForeignKey(table + ".id"))
                    ]
                    return
            except Exception as e:
                print(e)
            # every other input: the HTML type decides the column type
            if attrs["type"] in ("email", "url", "phone", "text"):
                self.cols += [Column(attrs["name"], UnicodeText)]
            if attrs["type"] == "number":
                # step may be absent, hence .get()
                if attrs.get("step") == "any":
                    self.cols += [Column(attrs["name"], Float)]
                else:
                    self.cols += [Column(attrs["name"], Integer)]
            if attrs["type"] == "date":
                self.cols += [Column(attrs["name"], Date)]
            if attrs["type"] == "datetime":
                self.cols += [Column(attrs["name"], DateTime)]
            if attrs["type"] == "time":
                self.cols += [Column(attrs["name"], Time)]
        if tag == "form":
            # the form action (minus the leading "/") is the table name
            self.table = urlparse(attrs["action"]).path[1:]

    def handle_endtag(self, tag):
        if tag == "form":
            self.tables += [Table(self.table, self.meta, *self.cols)]
            tables[self.table] = self.tables[-1]
            self.table = ""
            self.cols = []
            with engine.connect() as cnx:
                self.meta.create_all(engine)
                cnx.commit()


html = """
<!doctype html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.7.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
    $("form").each((i,el) => {
        $(el).wrap("<fieldset>"+ el.action + "</fieldset>" );
        $(el).append("<input type=submit value=insert formmethod=post ><input type=submit value=search formmethod=get />");
    });
    $("input:not([type=hidden],[type=submit])").each((i,el) => {
        $(el).before("<label>" + el.name+ "</label><br/>");
        $(el).after("<br>");
    });
});
</script>
</head>
<body>
<form action=/user >
    <input type=number name=id />
    <input type=text name=name />
    <input type=email name=email >
</form>
<form action=/event >
    <input type=number name=id />
    <input type=date name=date />
    <input type=text name=text />
    <input type=number name=user_id />
</form>
</body>
</html>
"""

# the empty route serves the HTML page, everything else falls back to JSON
router = dict({"": lambda fo: html})


def simple_app(environ, start_response):
    # fo: form fields, fi: uploaded files
    fo, fi = multipart.parse_form_data(environ)
    fo.update(**{
        k: dict(
            name=v.filename,
            content=v.file.read().decode('utf-8', 'backslashreplace'),
            content_type=v.content_type,
        ) for k, v in fi.items()})
    table = route = environ["PATH_INFO"][1:]
    fo.update(**dict(parse_qsl(environ["QUERY_STRING"])))
    start_response('200 OK', [('Content-type', 'text/html; charset=utf-8')])
    # (re)build the tables from the HTML on every request
    try:
        HTMLtoData().feed(html)
    except KeyError:
        pass
    # map the existing tables to classes
    metadata = MetaData()
    metadata.reflect(bind=engine)
    Base = automap_base(metadata=metadata)
    Base.prepare()
    if route in tables.keys():
        with Session(engine) as session:
            Item = getattr(Base.classes, table)
            # non-empty fields whose name does not start with "_"
            fields = {k: v for k, v in fo.items() if v and not k.startswith("_")}
            if environ.get("REQUEST_METHOD", "GET") == "POST":
                new_item = Item(**fields)
                session.add(new_item)
                session.commit()
                fo["insert_result"] = new_item.id
            if environ.get("REQUEST_METHOD") == "GET":
                result = []
                for elt in session.execute(select(Item).filter_by(**fields)).all():
                    result += [{
                        k.name: getattr(elt[0], k.name)
                        for k in tables[table].columns}]
                fo["search_result"] = result
    return [
        router.get(route, lambda fo: dumps(fo.dict, indent=4, default=str))(fo).encode()
    ]


print("Crudest CRUD of them all on port 5000...")
make_server('', 5000, simple_app).serve_forever()
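The form-to-table mapping can be tried without a database. The following is a minimal sketch using only the standard library; `FormSchemaParser`, `column_type`, and the string type names are hypothetical stand-ins for the SQLAlchemy columns used above, and the sample forms use quoted attributes for clarity.

```python
from html.parser import HTMLParser

# Hypothetical lookup tables: HTML input type -> column type name (plain strings).
TEXT_TYPES = {"email", "url", "phone", "text"}
SIMPLE_TYPES = {"date": "Date", "datetime": "DateTime", "time": "Time"}


def column_type(name, attrs):
    """Describe the column an <input> would become, or None if it is ignored."""
    if name == "id":
        return "Integer primary key"
    if name.endswith("_id"):
        # "<table>_id" points at "<table>.id"
        return "Integer -> " + name[:-3] + ".id"
    kind = attrs.get("type")
    if kind == "number":
        return "Float" if attrs.get("step") == "any" else "Integer"
    if kind in TEXT_TYPES:
        return "UnicodeText"
    return SIMPLE_TYPES.get(kind)


class FormSchemaParser(HTMLParser):
    """Collect {table: {column: type}} from the forms in an HTML document."""

    def __init__(self):
        super().__init__()
        self.schema = {}
        self._table = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # the form action without its leading "/" names the table
            self._table = attrs["action"].lstrip("/")
            self.schema[self._table] = {}
        elif tag == "input" and self._table is not None:
            name = attrs.get("name")
            kind = column_type(name, attrs) if name else None
            if kind is not None:
                self.schema[self._table][name] = kind

    def handle_endtag(self, tag):
        if tag == "form":
            self._table = None


FORMS = """
<form action="/user">
  <input type="number" name="id">
  <input type="text" name="name">
  <input type="email" name="email">
</form>
<form action="/event">
  <input type="number" name="id">
  <input type="date" name="date">
  <input type="text" name="text">
  <input type="number" name="user_id">
</form>
"""

parser = FormSchemaParser()
parser.feed(FORMS)
schema = parser.schema
for table, columns in schema.items():
    print(table, columns)
```

Keeping the type decision in one small function makes it easy to see which inputs end up as keys and which are silently ignored.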
1xINTERNET blog: Why choosing a reliable migration partner is crucial for a successful transition from Drupal 7
The end of life of Drupal 7 is just around the corner and selecting the right migration partner is crucial for a smooth, cost-effective, and future-proof transition. Find out how 1xINTERNET and Pantheon’s unique solution can support your organisation!
Daniel Lange: Weird times ... or how the New York DEC decided the US presidential elections
November 2024 will be known as the time when the killing of peanut, a pet squirrel, by the New York State DEC swung the US presidential elections and shaped history forever.
The hundreds of millions of dollars spent on each side, the tireless campaigning by the candidates, the celebrity endorsements ... all made for an open race for months. Investments evened each other out.
But an OnlyFans producer showing people an overreaching, bureaucracy-driven State raiding his home to confiscate a pet squirrel and kill it ... swung enough voters to decide the elections.
That is what we need to understand in times of instant worldwide publication and a mostly attention-driven economy: Human fates, elections, economic cycles and wars can be decided by people killing squirrels.
RIP, peanut.
P.S.: Trump Media & Technology Group Corp. (DJT) stock is up 30% pre-market.
Little Wayland Things
While I do have a Qt git build on my machine that I use for development, I usually only test individual applications and functionality but hardly ever run my full Plasma session on it. This means that for day-to-day use I typically only get to enjoy new Qt features once they have actually been released.
[Image: Proper modal dialogs under Wayland (note the darkened editor window) thanks to XDG Dialog and the new Qt 6.8]
One feature I talked about in the very last issue of “On the road to Plasma 6” is a nice API for XDG Foreign. To recap: it’s a Wayland protocol that lets an application export a window to another one so it can attach a window to it. For example, the XDG Desktop Portal wants to attach the “Open File” dialog as if it were coming from the application that requested it.
Of course we don’t want to write low-level Wayland code and instead have an easy to use API for it. The KWindowSystem::setMainWindow function does just that: hand in a window and the token you received from the other application (created through KWaylandExtras::exportWindow) and it takes care of everything else. Presumably, you want to set the parent window before showing your dialog to make absolutely sure it’s set up properly.
However, Qt did not have an API to tell us when the underlying XDG Toplevel (think: a regular desktop-y window with a title bar and what not) had been created. We were only told when the basic wl_surface was created, which was too early, or the window was exposed/shown, at which point it was already flashing up in the user’s task bar. Hence, I added a new QWaylandWindow::surfaceRoleCreated (and corresponding surfaceRoleDestroyed) signal. Utilizing that, the aforementioned KWindowSystem API now works perfectly.
Another major addition to Qt Wayland that I have been looking forward to very much is support for the XDG Dialog protocol. While a window could have always had a parent (e.g. a popup menu or settings dialog parented to the application’s main window), there was no concept of a “modal” dialog. Therefore, we did not support the “dim parent” effect under Wayland that darkens a window to indicate it cannot be interacted with. More importantly, KWin couldn’t take it into account for its focus handling either. It happily let you focus a blocked window but the application would then just ignore your input.
[Image: There’s only one Dolphin running here!]
This was most noticeable for me when Alt+Tab’ing back and forth, for example using the “Open File” dialog in one application and then trying to switch to the other to verify where the file was actually located. Instead of cycling between the file dialog and the other application, it would alternate between the file dialog and the blocked main window.
Sadly, even when I upgraded to Qt 6.8 the situation didn’t improve. I noticed that Alt+Tab actually showed the dialog twice. This looked like a bug and sure enough comparing it to the Plasma 5.27 LTS session on my other computer proved that it used to work at some point. At first I didn’t spot anything obvious until I noticed a small typo that must have slipped in during some major refactoring. Instead of not including the main window when it had a modal child, it included the modal child once again! Sure enough, adding an exclamation mark (the logical NOT operator in C++) did the trick.
If you want to support more good people such as myself, consider donating to the KDE End of Year Fundraiser!
drunomics: Low-code + Decoupled Drupal: The Power of Custom Elements 3.0
Matt Layman: Deploy Your Own Web App With Kamal 2
HDR and color management in KWin, part 5: HDR on SDR laptops
This one required a few other features to be implemented first, so let’s jump right in.
Matching reference luminances
A big part of what a desktop compositor needs to get right with HDR content is to show SDR and HDR content properly side by side. KWin 6.0 added an SDR brightness slider for that purpose, but that’s only half the equation - what about the brightness of HDR content?
When we say “HDR”, usually that refers to a colorspace with the rec.2020 primaries and the perceptual quantizer (PQ) transfer function. A transfer function describes how to calculate a real brightness value from the “electrical” signal encoded in the content - PQ specifically has encoded values from 0 to 1 and brightness values from 0 to 10000 nits. For reference, your typical office monitor does around 300 or 400 nits at maximum brightness setting, and many newer phones can go a bit above 1000 nits.
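To make the transfer function concrete, here is a minimal sketch of the PQ curve in Python, using the constants published in SMPTE ST 2084. The function names are made up for this example, and this is not KWin’s code, which does the conversion in shaders.

```python
# Constants of the PQ transfer function as defined in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PQ_MAX_NITS = 10000.0


def pq_to_nits(signal: float) -> float:
    """Encoded PQ value in [0, 1] -> brightness in nits."""
    e = max(0.0, min(1.0, signal)) ** (1 / M2)
    return PQ_MAX_NITS * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1 / M1)


def nits_to_pq(nits: float) -> float:
    """Brightness in nits -> encoded PQ value in [0, 1]."""
    y = max(0.0, min(PQ_MAX_NITS, nits)) / PQ_MAX_NITS
    ym = y ** M1
    return ((C1 + C2 * ym) / (1 + C3 * ym)) ** M2


for nits in (100.0, 203.0, 400.0, 1000.0, 10000.0):
    print(f"{nits:8.1f} nits -> PQ signal {nits_to_pq(nits):.3f}")
```

The curve is strongly non-linear: a 400 nit office monitor peak already sits at well over half of the encoded range, which leaves the upper part of the signal for highlights.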
Now if we want to show HDR content on an HDR screen, the most straightforward thing to do would be to just calculate the brightness values, write them to the screen and be done with it, right? That’s what KWin did up to Plasma 6.1, but it’s far from ideal. Even if your display can show the full range of requested brightness values, you might want to adjust the brightness to match your environment - be it brighter or darker than the room the content was optimized for - and when there are SDR things in HDR content, like subtitles in a video, they should ideally match other SDR content on the screen as well.
Luckily, there is a preexisting relationship between HDR and SDR that we can use: The reference luminance. It defines how bright SDR white is - which is why another name for it is simply “SDR white”.
As we want to keep the brightness slider working, we won’t map SDR content to the reference luminance of any HDR transfer function though, but instead we map both SDR and HDR content to the SDR brightness setting. If we have an HDR video that uses the PQ transfer function, that reference luminance is 203 nits. If your SDR brightness setting is at 406 nits, KWin will just multiply the brightness of the HDR video with a factor of 2.
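The arithmetic from that example, as a small Python sketch. The function and constant names are hypothetical, and all values are linear brightness in nits.

```python
PQ_REFERENCE_NITS = 203.0  # reference luminance of PQ content, as stated above


def map_to_sdr_brightness(content_nits, content_reference_nits, sdr_brightness_nits):
    """Scale content so its reference luminance lands on the SDR brightness setting."""
    factor = sdr_brightness_nits / content_reference_nits
    return content_nits * factor


# SDR brightness setting at 406 nits: everything in the PQ video gets doubled.
white = map_to_sdr_brightness(203.0, PQ_REFERENCE_NITS, 406.0)
highlight = map_to_sdr_brightness(1000.0, PQ_REFERENCE_NITS, 406.0)
print(white, highlight)
```

Note that the highlight now asks for more than many displays can deliver, which is where tone mapping comes in.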
This doesn’t only mean that we can make SDR and HDR content fit together nicely on HDR screens, but it also means we now know what to do when we have HDR content on an SDR screen: We map the reference luminance from the video to SDR white on the screen. That’s of course not enough to make it look nice though…
Tone mapping
Especially with HDR presented on an SDR screen, but also on many HDR screens, it will happen that the content brightness exceeds the display capabilities. To handle this, starting with Plasma 6.2, whenever the HDR metadata of the content says it’s brighter than the display can go, KWin will apply tone mapping.
Doing this tone mapping in RGB can result in changing the content quite badly though. Let’s take a look by using the most simple “tone mapping” function there is, clipping. It just limits the red, green and blue values separately to the brightness that the screen can show.
If we have a pixel with the value [2.0, 0.0, 2.0] and a maximum brightness of 1.0, that gets mapped to [1.0, 0.0, 1.0] - which is the same purple, just darker. But if the pixel has the values [2.0, 0.0, 1.0], then that also gets mapped to [1.0, 0.0, 1.0], even though the source color was significantly more red!
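The two pixels from that example, run through per-channel clipping in Python. For contrast, the sketch also includes a naive uniform scaling that keeps the ratio between channels; that second function is only an illustration of preserving the hue and is not what KWin does.

```python
def clip_rgb(pixel, max_value=1.0):
    """Per-channel clipping: limit red, green and blue separately."""
    return [min(channel, max_value) for channel in pixel]


def scale_rgb(pixel, max_value=1.0):
    """Illustration only: scale all channels by the same factor so ratios survive."""
    peak = max(pixel)
    if peak <= max_value:
        return list(pixel)
    return [channel * max_value / peak for channel in pixel]


purple = [2.0, 0.0, 2.0]
reddish = [2.0, 0.0, 1.0]

print(clip_rgb(purple), clip_rgb(reddish))    # both pixels end up identical
print(scale_rgb(purple), scale_rgb(reddish))  # the reddish pixel stays reddish
```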
To fix that, KWin’s tone mapping uses ICtCp. This is a color space developed by Dolby, in which the perceived brightness (aka Intensity) is separated from the chroma components (Ct = blue-yellow, Cp = red-green), which is perfect for tone mapping. KWin’s shaders thus transform the RGB content to ICtCp, apply a brightness mapping function to only the intensity component, and then convert back to RGB.
The result of that algorithm looks like this:
[Images: RGB clipping; KWin 6.2’s tone mapping; MPV’s tone mapping]
As you can see, there’s still some color changes going on in comparison to MPV’s algorithm; this is partially because the tone mapping curve still needs some more adjustments, and partially because we also still need to do similar mapping for colors that the screen can’t actually show. It’s already a large improvement though, and does better than the built-in tone mapping functionality in many HDR screens.
When tone mapping HDR content on SDR screens, we always end up reducing the brightness of the overall image, so that we have some brightness values to map the really bright highlights in the video to - otherwise everything just slightly over the reference luminance would look like an overexposed blob of color, as you can see in the “RGB clipping” image. There are ways around that though…
HDR on SDR laptop displays
To explain the reasoning behind this, it helps to first have a look at what even makes a display “HDR”. In many cases it’s just marketing nonsense, a label that’s put on displays to make them seem more fancy and desirable, but in others there’s an actual tangible benefit to it.
Let’s take OLED displays as an example, as it’s considered one of the display technologies where HDR really shines. When you drive an OLED at high brightness levels, it becomes quite inefficient: it draws a lot of power and generates a lot of heat. Both of these things can only be dealt with to a limited degree, so OLED displays can generally only be used with relatively low average brightness levels. They can go a lot brighter than the average in a small part of the screen though, and that’s why they benefit so much from HDR - you can show a scene that’s on average only 200 nits bright, with the sky in the image going up to 300 nits, the sun going up to 1000 nits and the ground only doing 150 nits.
Now let’s compare that to SDR laptop displays. In the case of most LCDs, you have a single backlight LED for the whole screen, and when you move the brightness slider, the power the backlight is driven at is changed. So there’s no way to make parts of the screen brighter than the rest on a hardware level… but that doesn’t mean there isn’t a way to do it in software!
When we want to show HDR content and the brightness slider is below 100%, KWin increases the backlight level to get a peak brightness that matches the relative peak brightness of that content (as far as that’s possible). At the same time it changes the colorspace description on the output to match that change: While the reference luminance stays the same, the maximum luminance of the transfer function gets increased in proportion to the increase in backlight brightness.
The result is that SDR white gets mapped to a reduced RGB value, which is at least supposed to exactly counteract the increase of brightness that we’re applying with the backlight, while HDR content that goes beyond the reference luminance gets to use the full brightness range.
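A rough Python sketch of that compensation. It assumes the backlight responds linearly to its setting and that pixel values are in linear light; the function name and its parameters are made up for illustration and do not come from KWin.

```python
def plan_backlight(slider, content_peak_ratio):
    """
    slider: brightness slider as a fraction of full backlight, 0 < slider <= 1.
    content_peak_ratio: content peak brightness divided by its reference
        luminance (values below 1.0 are treated as 1.0).
    Returns (backlight, boost, sdr_white), where sdr_white is the linear
    pixel value SDR white is drawn with.
    """
    wanted = slider * max(1.0, content_peak_ratio)
    backlight = min(1.0, wanted)   # cannot go beyond 100% backlight
    boost = backlight / slider     # how much brighter the backlight got
    sdr_white = 1.0 / boost        # counteract the boost for SDR content
    return backlight, boost, sdr_white


# Slider at 50%, HDR video peaking at 4x its reference luminance:
# the backlight can only double, so SDR white is drawn at half value.
print(plan_backlight(0.5, 4.0))
# Slider already at 100%: there is no headroom left to use.
print(plan_backlight(1.0, 4.0))
```

In every case the backlight multiplied by the SDR white value equals the slider setting, so SDR content looks unchanged while highlights can use the extra headroom.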
Increasing the backlight power of course doesn’t come without downsides; black levels and power usage both get increased, so this is only ever active if there’s HDR content on the screen with valid HDR metadata that signals brightness levels going beyond the reference luminance.
As always, capturing HDR content with a phone camera is quite difficult, but I think you can at least sort of see the effect:
[Images: without backlight adjustment; with backlight adjustment]
This feature has been merged into KWin’s git master branch and will be available on all laptop displays starting with Plasma 6.3. I really recommend trying it for yourself once it reaches your distribution!
TestDriven.io: Avoid Counting in Django Pagination
FSF Anniversary Logo Contest
FSF Blogs: Forty years of commitment to software freedom
PreviousNext: PowerBI Dashboard: Addressing content currency
Post co-authored with NSW Resources. A critical issue with the management of content currency on our Drupal website, nsw.gov.au/nswresources, required an innovative solution to provide us with an automated content audit process.
by luhur.rizal / 6 November 2024
Due to the complexity and size of our Drupal web presence, ensuring each page was up-to-date and reviewed by the appropriate business unit became increasingly challenging. We needed a tool to track how long it had been since a page was reviewed, set specific periods for future reviews and easily identify the page owners for each section of the website. Furthermore, with the required frequency of daily updates to the site, the solution had to be ‘live’ to accurately reflect these changes.
Choosing the right solution
To tackle these challenges, we collaborated with our Drupal web development partner, PreviousNext, to create a live .csv file of all relevant web pages. This file included custom metadata detailing each page’s review frequency, page owner, date of last page update, date of last content review, publishing status and the next scheduled review date. By using this .csv file as a data source, we built a user-friendly content audit report dashboard in Microsoft Power BI.
The PowerBI dashboard provides executives with a high-level overview of which sections of the website are most in need of review. A complementary dashboard for ‘content champions’ offers a more granular view of the status of each individual page, enabling targeted content management.
Power BI implementation
Implementing this solution involved several steps:
Internal stakeholder consultation: We engaged with the various business units in the NSW Resources division to identify page owners and establish appropriate review periods for each section of the website.
Metadata assignment: Metadata bulk-uploaded to the pages included the custom metadata fields created for the project, such as review periods and page owners.
Data manipulation in PowerBI: Data from the .csv file was manipulated within PowerBI to ensure that columns were in the correct format. We created a 'Review status' column based on the next date of review to provide clear visibility. We also filtered out any unpublished or archived content to make it more streamlined.
PowerBI build: Using the dataset from the website's metadata and Google Analytics, we built a comprehensive dashboard in PowerBI Desktop and then uploaded it to PowerBI Service for broader access. Live links to the web pages were integrated into the dashboards for easy navigation.
Executive overview report: We developed a high-level summary report that shows how many pages each business unit is responsible for and includes Google Analytics page views from the past 30 days.
Content audit report: This report provides filtering by review status and sub-areas within each business unit, offering direct links to the listed web pages and conditional formatting utilising a traffic light system for review status indicators.
Overall tracker: Designed to be used exclusively by the NSW Resources website team, this site-wide and document overview provides an overall tracker, including documents, events, and articles that are not part of other reports.
Internal integrations: PowerBI reports were integrated into Microsoft Teams and the SharePoint intranet, facilitating use across the business.
Internal work request form: The form was updated to distinguish whether a web update is part of a comprehensive content audit review, therefore requiring a review period reset, or just a minor adjustment that means the review status remains unchanged.
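The 'Review status' step above can be sketched outside Power BI as well. The Python below is only an illustration of the logic: the column names (`url`, `owner`, `status`, `next_review`), the 30-day warning window, and the three status labels are assumptions, not the fields or thresholds used in the actual dashboard.

```python
import csv
import io
from datetime import date, timedelta


def review_status(next_review, today, warning_days=30):
    """Traffic-light style status derived from the next review date."""
    if next_review < today:
        return "Overdue"
    if next_review <= today + timedelta(days=warning_days):
        return "Due soon"
    return "Current"


def audit_rows(csv_text, today):
    """Keep published pages only and add a review_status column."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"] != "published":
            continue  # unpublished and archived content is filtered out
        next_review = date.fromisoformat(row["next_review"])
        row["review_status"] = review_status(next_review, today)
        rows.append(row)
    return rows


# Hypothetical export; the real .csv has more columns.
SAMPLE = """url,owner,status,next_review
/page-a,Unit A,published,2024-10-01
/page-b,Unit A,published,2024-11-20
/page-c,Unit B,published,2025-06-01
/page-d,Unit B,archived,2024-01-01
"""

report = audit_rows(SAMPLE, today=date(2024, 11, 6))
for row in report:
    print(row["url"], row["owner"], row["review_status"])
```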
A content audit that works for everyone
The project required extensive consultation to define the scope and needs of each business unit. Following this, identifying the correct page owners, along with setting appropriate review periods, posed significant challenges.
As business units sometimes want entire website sections to be marked as reviewed with a change in the review period, the Drupal-side dashboard allowed for bulk changes to both owners and review periods by uploading a revised version of the .csv file, saving substantial time.
Understanding the correct licensing requirements for PowerBI was another challenge. After consulting with our internal IT team, a group workspace was set up under an enterprise agreement, and an individual licence was obtained for the team member managing the dashboards.
Testing the PowerBI dashboard
To ensure effectiveness, the solution was initially tested in a development environment that mirrored the production site. This approach allowed us to test the limitations and user experience prior to going live. During this phase, we tested the bulk upload of the .csv file to update page metadata.
A soft launch of the content audit dashboard provided valuable insights, such as the realisation that a three-month review period was too short, given the number of pages each business unit manages.
As a result of this testing period, we made minor adjustments, such as requiring a defined review period for each page and allowing users to opt out of having an update counted as a ‘review’ for reporting purposes. This might include, for example, correcting a typo, which doesn’t constitute a page review in the context of the content audit dashboard.
Transforming content auditing
This solution significantly enhanced reporting capabilities across NSW Resources, reducing the need for manual intervention. Now, page updates are reflected in the report dashboard automatically.
The PowerBI dashboards offer real-time updates and clear visibility of page ownership and review status, making it easier for business units to manage their content.
Business units can independently track the currency of their pages without needing data from the digital team, streamlining the process and increasing efficiency.
Future plans for the dashboard
The solution will continue to evolve. We plan to use the work done on the PowerBI integration to inform future website improvements with the potential for further Google Analytics data integration.
A document audit is currently a separate and opt-in process for business units, but future plans may involve greater integration.
Conclusion
Overall, this innovative solution addresses a critical need for NSW Resources by providing a robust, automated and user-friendly content audit process that adapts to the dynamic nature of our Drupal website.
PyCoder’s Weekly: Issue #654 (Nov. 5, 2024)
#654 – NOVEMBER 5, 2024
What goes into building a spreadsheet application in Python that runs in the browser? How do you make it launch quickly, and where do you store the cells of data? This week on the show, we speak with Chris Laffra about his project, PySheets, and his book “Communication for Engineers.”
REAL PYTHON podcast
Python 3.13 included a new version of the REPL which has the ability to define keyboard shortcuts. This article shows you how to create one and warns you about potential hangups.
TREY HUNNER
Say goodbye to managing failures, network outages, flaky endpoints, and long-running processes. Temporal ensures your code never fails. Period. PLUS, you can get started today on Temporal Cloud with $1,000 free credits on us →
TEMPORAL TECHNOLOGIES sponsor
To better understand just where the performance cost of running tests comes from, Anders ran a million empty tests. This post talks about what he did and the final results.
ANDERS HOVMOLLER
Currently, CPython signs its artifacts with both PGP and Sigstore. Removing the PGP signature has been proposed, but that has implications: Sigstore is still new enough that many Linux distributions don’t support it yet.
JOE BROCKMEIER
In this video course, you’ll learn what magic methods are in Python, how they work, and how to use them in your custom classes to support powerful features in your object-oriented code.
REAL PYTHON course
As GenAI and LLMs rapidly evolve, the impact of data leaks and unsafe AI outputs makes it critical to secure your AI infrastructure. Learn how MLOps and ML Platform teams can use the newly launched Guardrails Pro to secure AI operations — enabling faster, safer adoption of LLMs at scale →
GUARDRAILS sponsor
In the real world, things decay over time. In the digital world things get kept forever, and sometimes that shouldn’t be so. Designing for deletion is hard.
ARMIN RONACHER
Bite code! does their monthly Python news wrap-up. Check out stories on 3.13, proposed template strings, dependency groups in pyproject.toml, and more.
BITE CODE!
This project uses a computer vision solution to automate taking inventory of products in retail, using YOLOv8 and image embeddings for precise detection.
ALBERT FERRÉ • Shared by Albert Ferré
Context managers enable you to create “template” code with initialization and clean up to make the code that uses them easier to read and understand.
JUHA-MATTI SANTALA
This post celebrating ten years of Django Girls talks about how it got started, what they’re hoping to do, and how you can get involved.
DJANGO GIRLS
This quick TIL post talks about five useful pytest options that let you control what tests to run with respect to failing tests.
RODRIGO GIRÃO SERRÃO
This post shows you how to return values from coroutines that have been concurrently executed using asyncio.gather().
JASON BROWNLEE
This list contains the recorded talks from the PyBay 2024 conference.
YOUTUBE video
CRISTIANOPIZZAMIGLIO.COM • Shared by Cristiano Pizzamiglio
jamesql: In-Memory NoSQL Database in Python
Events
Weekly Real Python Office Hours Q&A (Virtual)
November 6, 2024
REALPYTHON.COM
November 7, 2024
MEETUP.COM
November 7, 2024
SYPY.ORG
November 9, 2024
MEETUP.COM
November 12, 2024
PITERPY.COM
November 14 to November 16, 2024
PYCON.SE
November 16 to November 17, 2024
PYCON.HK
November 16 to November 17, 2024
PYCON.JP
November 16 to November 18, 2024
PYTHON.IE
Happy Pythoning!
This was PyCoder’s Weekly Issue #654.