Open Source Initiative
2024 end-of-year review: policy and standards
As 2024 draws to a close, the Open Source Initiative (OSI) reflects on an eventful year marked by significant achievements in advocating for Open Source principles in policy and standards development worldwide. Below, we highlight key milestones, initiatives and ongoing work from the year.
Global engagements
Columbia Convening on AI: OSI participated in this collaboration, led by the Mozilla Foundation and the Columbia Institute of Global Politics, to build a framework for openness in AI. The convening produced technical and policy memorandums advancing the discourse on open and equitable AI development.
United Nations “OSPOs for Good”: OSI participated in a panel at this New York City event, emphasizing Open Source’s role in defining Open Source AI. Collaborative discussions underscored the growing recognition of Open Source in global governance.
Open Source Congress: OSI participated in the second Open Source Congress, hosted in Beijing by the Open Atom Foundation. This event fostered collaboration among nonprofit leaders in the Open Source ecosystem.
Digital Public Goods Alliance (DPGA): OSI continued its active role as a member of the DPGA, contributing to the Annual Members Meeting held in Singapore. The event highlighted OSI’s involvement in promoting open standards, software and AI models as part of the global ecosystem of Digital Public Goods.
European policy work
Cyber Resilience Act (CRA): After helping the community understand the challenges of the first draft of the Cyber Resilience Act, OSI shifted its focus to implementation in the second half of the year, joining the Eclipse Open Regulatory Compliance Group, a group designed to help Open Source developers implement the CRA. We are also preparing to work with European standardization organizations to ensure the standards developed for the CRA are implementable for Open Source developers.
Standardization Advocacy: OSI provided feedback on the Standard Essential Patents Regulation and standardization frameworks, ensuring they align with Open Source principles. Efforts included ensuring software standards can be freely implemented by the community. In parallel to this, OSI is working to find ways to involve Open Source developers in the standardization process. OSI remains a member at ETSI and regularly engages with representatives of other standards bodies.
Collaboration with Eclipse Foundation: A Memorandum of Understanding with the Eclipse Foundation marked a strategic partnership to influence Open Source AI policy and the implementation of the Cyber Resilience Act.
AI Act Advocacy: After the introduction of an exemption for Open Source AI in the AI Act, lawmakers have faced an unprecedented wave of openwashing in the AI space. OSI has expanded its educational efforts, meeting with lawmakers and staffers, presenting the Open Source AI Definition, and highlighting cases of openwashing. In addition, OSI was selected as one of the stakeholders for the drafting of the Code of Practice on General-Purpose AI, a set of rules designed to help companies comply with the AI Act. OSI has used this opportunity to help address some of the challenges Open Source AI developers face, and to clearly differentiate between Open Source and open weight AI.
New Appointments: OSI welcomed Jordan Maris as EU Policy Analyst, strengthening its advocacy presence in Brussels.
Community Outreach: Blogs, workshops and participation in conferences like FOSDEM and CPDP-ai in Brussels ensured robust engagement with diverse stakeholders.
U.S. policy work
Open Policy Alliance: Designed to bring nonprofit organizations together, the OPA focuses on educating and informing U.S. public policy decisions related to Open Source software, content, research and education. OSI’s leadership in the OPA strengthened policy efforts across the U.S. This included participating by invitation in roundtables with federal agencies, responding to public calls for comments on security, sustainability and Open Source AI models, educating policymakers on Open Source’s societal benefits, and securing philanthropic support for Open Source policy initiatives.
AI Policy Development: Partnering with Carnegie Mellon University, OSI co-founded the Open Forum for AI, focusing on human-centered AI development. The initiative reflects OSI’s commitment to ensuring AI policies protect openness and transparency.
Federal Advocacy: OSI contributed responses to Requests for Comments (RFCs) on AI and cybersecurity, emphasizing the value of openness in safeguarding public interest, working together with other organizations like the Mozilla Foundation and the Center for Democracy & Technology.
Community Outreach: Blogs, workshops and participation in events like All Things Open in Raleigh, NC and Open Forum for AI in Washington D.C. fostered strong engagement with a diverse range of stakeholders.
Looking ahead: 2025
The year ahead promises continued focus on:
- Implementing AI Act and CRA provisions
- Expanding outreach and educational programs for policymakers
- Strengthening global partnerships
OSI remains committed to championing Open Source principles in the policy landscape, ensuring developers and communities worldwide thrive in an environment of transparency, collaboration and innovation.
We extend our gratitude to our partners, allies and community members for their support in advancing OSI’s mission. Together, we are shaping a more open and equitable future. Please consider donating to or sponsoring the OSI.
2024 End-of-Year Review: Open Source AI Definition v1.0
The release of version 1.0 of the Open Source AI Definition (OSAID) marks an important milestone on a journey to ensure that AI systems are innovative and aligned with the foundational principles of Open Source: the freedoms to use, study, modify and share.
Drafting a definition through collaboration
The OSAID is a testament to the power of global collaboration. Over the past two years, OSI convened a coalition of stakeholders: developers, data scientists, legal experts, policymakers and end users from all over the world. This diverse group coalesced through in-person workshops, online town halls and intensive co-design sessions to craft version 1.0 of the definition.
Key milestones of 2024
System Analysis: As part of the co-design process, working groups were formed to discuss which AI system components should be required to satisfy the four freedoms for AI. This included assessing how data, models, training methods and legal agreements adhere to Open Source principles. The analysis provided invaluable insights into the gaps that exist in current AI practices and outlined actionable steps to bridge these gaps. It also helped refine the OSAID’s criteria, ensuring they remain both practical and comprehensive.
System Evaluation: Several AI systems were assessed against the OSAID’s criteria. While models like OLMo (AI2), Pythia (Eleuther AI), CrystalCoder (LLM360) and T5 (Google) met the requirements, others like LLaMA2 (Meta), Phi-2 (Microsoft), Mixtral (Mistral) and Grok (X/Twitter) fell short, spotlighting the critical need for transparent frameworks in AI development. Other models such as BLOOM (BigScience), Starcoder2 (BigCode) and Falcon (TII) would pass if they changed their license. This evaluation process also revealed areas where certain models could improve to better align with Open Source standards, demonstrating the OSAID’s role as a constructive guide for future developments.
Stable Release of OSAID 1.0: After extensive global consultation, OSI released the first stable version of the definition at the All Things Open conference in Raleigh, NC. This marked the culmination of two years of dialogue, research and iteration. The stable release provides a comprehensive framework to evaluate AI systems against the core principles of openness.
Global Endorsements: The OSAID has garnered endorsements from over 20 organizations, including Mozilla Foundation, Eleuther AI, CommonCrawl Foundation and the Eclipse Foundation, alongside support from more than 100 individuals. These endorsements validate the OSAID’s importance and its potential to shape the future of AI development.
Events and conferences
Throughout 2024, OSI actively participated in 23 conferences around the world to engage with diverse communities. Highlights include FOSDEM (February – Brussels), Columbia Convening on openness and AI (February – New York), Open Source Summit NA (April – Seattle), PyCon (May – Pittsburgh), AI_Dev Europe (June – Paris), OSPOs for Good (July – New York), KubeCon + AI_dev Hong Kong (August – Hong Kong), Open Source Congress (August – Beijing), Deep Learning Indaba (September – Dakar), India FOSS (September – Bengaluru), Open Source Summit Europe (September – Vienna), Nerdearla (September – Buenos Aires), Training Data in OSAI (October – Paris) and All Things Open (October – Raleigh). A full timeline of in-person and online events is available here.
Publications and voices of the OSAID
The OSI published over 60 blog posts about Open Source AI in 2024. One highlight is the Voices of the OSAID series, which told the stories of 10 volunteers from the Open Source AI Definition co-design process who have helped shape, and continue to shape, the definition. These stories highlight the diversity and passion of the community, bringing a human element to the often technical discussions around Open Source and AI.
Press coverage
The work around the Open Source AI Definition was cited over 180 times in the press worldwide, educating and countering misinformation. Our work was featured in The New York Times, The Verge, TechCrunch, ZDNET, InfoWorld, Ars Technica, IEEE Spectrum and MIT Technology Review, among other top media outlets.
Looking ahead
The release of OSAID 1.0 is not the end but the beginning of a new chapter. As we transition into 2025, OSI remains committed to continuing regular updates to the definition and evaluating AI systems and licenses to ensure alignment with Open Source principles.
As the Open Source community moves forward, your involvement is more critical than ever. We encourage more organizations and individuals to endorse and implement the OSAID. By broadening its reach, OSI aims to establish the OSAID as the global benchmark for open AI systems. Together, we can ensure that AI remains a tool for permissionless innovation.
OSI extends its deepest gratitude to the sponsors, volunteers and participants who made 2024 a banner year for Open Source AI. Let’s continue to build a future where technology serves everyone, everywhere. As we celebrate this year’s accomplishments, we look forward to what we can achieve together in 2025 and beyond. Please consider donating to or sponsoring the OSI.
Top articles at OpenSource.net in 2024
OpenSource.net, a platform designed to foster knowledge sharing, was launched in September 2023. Led by Editor-in-Chief Nicole Martinelli, this platform has become a space for diverse perspectives and contributions. Here are some of the top articles published at OpenSource.net in 2024:
Business with Open Source
- Open Source projects vs products: A strategic approach (Thomas Di Giacomo)
- Open Source visibility hacks — No icky marketing needed (Olga Rusakova)
- So, You Have Your 20-Page Open Source Strategy Doc. Now What? (Amanda Katona)
- Pajamas to profit: Launch your Open Source empire (Gaël Duval)
- Demystifying Open Source as a Business (Julia Machado)
- Why single vendor is the new proprietary (Thierry Carrez)
- Open code for closed services: The Open Source paradox of the cloud (Vittorio Bertola)
- Beyond the binary: The nuances of Open Source innovation (Roberto Galoppini)
- From data to action: Using metrics to improve Open Source communities (Dawn Foster)
- Diversity, Equity and Inclusion (DEI) metrics: Breaking barriers in Open Source (Anita Ihuman)
- How to make reviewing pull requests a better experience (Alya Abbott)
- Steady in a shifting Open Source world: FreeBSD’s enduring stability (Jason Perlow)
- Celebrating 30 years of Open Source with FreeDOS (Jim Hall)
- Sustain Open Source, sustain the planet: A new conversation (Tobias Augspurger)
- Closing the Gap: Accelerating environmental Open Source (Tobias Augspurger)
- Preserving Open Values in artificial intelligence (Mia Lykou Lund)
A special thank you to the authors who have contributed articles and to Cisco for sponsoring OpenSource.net. If you are interested in contributing articles on Open Source software, hardware, open culture, and open knowledge, please submit a proposal.
ClearlyDefined: 2024 in review – milestones, growth and community impact
As 2024 draws to a close, it’s time to reflect on a transformative year for the ClearlyDefined project. From technical advancements to community growth, this year has been nothing short of extraordinary. Here’s a recap of our key milestones and how we’ve continued to bring clarity to the Open Source ecosystem.
ClearlyDefined 2.0: expanding license coverage
This year, we launched ClearlyDefined v2.0, a major milestone in improving license data quality. By integrating support for LicenseRefs, we expanded beyond the SPDX License List, enabling organizations to navigate complex licensing scenarios with ease. Thanks to contributions from the community and leadership from GitHub and SAP, this release brought over 2,000 new licenses into scope. Dive into the details here.
New harvester for Conda packages
In response to the growing needs of the data science and machine learning communities, we introduced a new harvester for Conda packages. This implementation ensures comprehensive metadata coverage for one of the most popular package managers. Kudos to Basit Ayantunde and our collaborators for making this a reality. Learn more about this update here.
Integration with GUAC for supply chain transparency
Our partnership with GUAC (Graph for Understanding Artifact Composition) from OpenSSF took supply chain observability to new heights. By integrating ClearlyDefined’s license metadata, GUAC users now have access to enriched data for compliance and security. This collaboration underscores the importance of a unified Open Source supply chain. Read about the integration here.
Community growth and governance
In 2024, we took significant steps toward a more open governance model by electing leaders to the Steering and Outreach Committees. These committees are pivotal in driving technical direction and expanding community engagement. Meet our new leaders here.
Showcasing ClearlyDefined globally
We showcased ClearlyDefined’s mission and impact across three continents:
- At FOSS Backstage and ORT Community Days in Berlin, we connected with industry leaders to discuss best practices for software compliance.
- At SOSS Fusion 2024 in Atlanta, we presented our collaborative approach to license compliance alongside GitHub and SAP.
- At Open Compliance Summit in Tokyo, we showcased how Bloomberg leverages ClearlyDefined in order to detect and manage Open Source licenses.
Each event reinforced the global importance of a transparent Open Source ecosystem. Explore our conference highlights here and here.
A Revamped Online Presence
To welcome new contributors and support existing ones, we unveiled a new website featuring comprehensive documentation and resources. Whether you’re exploring our guides, engaging in forums, or diving into the project roadmap, the platform is designed to foster collaboration. Take a tour here.
Looking ahead to 2025
As we celebrate these achievements, we’re already planning for an even more impactful 2025. From enhancing our tools to expanding our community, the future of ClearlyDefined looks brighter than ever.
Thank you to everyone who contributed to our success this year. A special thank you to Microsoft for hosting and curating ClearlyDefined, GitHub and SAP for their technical leadership, and Bloomberg and Cisco for their sponsorship. Your dedication ensures that Open Source continues to thrive with clarity and confidence.
Standards and the presumption of conformity
If you have been following the progress of the Cyber Resilience Act (CRA), you may have been intrigued to hear that the next step following publication of the Act as law in the Official Journal is the issue of a European Standards Request (ESR) to the three official European Standards Bodies (ESBs). What is that about? Well, a law like the CRA is extremely long and complex and conforming to it will involve a detailed analysis and a lot of legal advice.
Rather than forcing everyone individually to do that, the ESBs are instead sent a list of subjects that need proving and are asked to recommend a set of standards that, if observed, will demonstrate conformity with the law. This greatly simplifies things for everyone and leads to what the lawmakers call a “presumption of conformity.” You could comply with the law based on your own research, but realistically that’s impossible for almost everyone, so you will instead choose to observe the harmonized standards supplied by the ESBs.
This change of purpose for standards is very significant. They have evolved from merely being a vehicle to promote interoperability in a uniform market – an optional tool for private companies that improves their product for their consumers – to being a vehicle to prove legal compliance – a mandatory responsibility for all citizens and thus a public responsibility. This new role creates new challenges as the standards system was not originally designed with legal conformance in mind. Indeed, we are frequently reminded that standardization is a matter for the private sector.
So for example, the three ESBs (ETSI, CENELEC and CEN) all have “IPR rules” that permit the private parties who work within them to embed in the standards steps that are patented by those private companies. This arrangement is permitted by the European law that created the mechanism, Regulation 1025/2012 (in Annex II §4c). All three ESBs expressly tolerate this behaviour as long as the patents are then licensed to implementers of the standards on “Fair, Reasonable and Non-Discriminatory” (FRAND) terms. None of those words is particularly well defined, and the consequence is that to implement the standards that emerge from the ESBs you may well need to retain counsel to understand your patent obligations and enable you to enter into a relationship with Europe’s largest commercial entities to negotiate a license to those patents.
Setting aside the obvious problems this creates for Open Source software (where the need for such relationships broadly inhibits implementation), it is also a highly questionable challenge to our democracy. At the foundation of our fundamental rights is the absolute requirement that, first, every citizen may know the law that governs them and, second, every citizen is freely able to comply if they choose. The Public.Resource.Org case shows us this principle also extends to standards that are expressly or effectively necessary for compliance with a given law.
But when these standards are allowed to have patents intentionally embodied within them by private actors for their own profit, citizens find themselves unable to practically conform to the law without specialist support and a necessary private relationship with the patent holders. While some may have considered this to be a tolerable compromise when the goal of standards was merely interoperability, it is clearly an abridgment of fundamental rights to condition compliance with the law on identifying and negotiating a private licensing arrangement for patents, especially those embedded intentionally in standards.
Just as Regulation 1025/2012 will need updating to reflect the court ruling on availability of standards, so too should it be updated to require that harmonized standards will only be accepted from the ESBs if they are supplied on FRAND terms where all restrictions on use are waived by the contributors. Without this change, standards will serve only the benefit of dominant actors and not the public.
Driving Open Source forward: make your impact in 2025
Our work thrives because of a passionate community dedicated to Open Source values. From advancing initiatives like the Open Source AI Definition to addressing challenges in licensing and policy, collaboration has always been our driving force.
While 2024 brought significant progress, the challenges ahead demand even greater collective action. As we look to 2025, your support will be pivotal in scaling our efforts.
We’ll continue serving as the steward of the Open Source Definition, protecting the Open Source principles and maintaining a list of OSI Approved Licenses.
We’ll continue monitoring policy and standards setting organizations, supporting legislators and policy makers, educating them about the Open Source ecosystem, its role in innovation and its value for an open future.
We’ll continue leading the conversation on Open Source AI, as we’ll need to keep on driving the discussion around data and write position papers to educate legislators in the US, in the EU, and around the world.
And finally, we’ll continue supporting Open Source business practice, helping developers and entrepreneurs to unlock the permissionless innovation enabled by Open Source software.
Join us as a supporting member or, if your organization benefits from Open Source, consider becoming a sponsor. Together, we can protect and expand the freedoms that make Open Source possible—for everyone.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
Celebrating 5 years at the Open Source Initiative: a journey of growth, challenges, and community engagement
Nick Vidal, community manager at the OSI, shares his story about working at the organization. This isn’t just a story about his career; it’s about the evolution of Open Source, the incredible people he worked with, and the values they’ve championed.
Other highlights:
- Improving Open Source security with the new GitHub Secure Open Source Fund
- Highlights from the Digital Public Goods Alliance Annual Members Meeting 2024
- Open Data and Open Source AI: Charting a course to get more of both
- The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy
- ClearlyDefined v2.0 adds support for LicenseRefs
Article from Time Magazine
The distinction between ‘open’ and ‘closed’ AI models is not as simple as it might appear. While Meta describes its Llama models as open-source, it doesn’t meet the new definition published last month by the Open Source Initiative, which has historically set the industry standard for what constitutes open source.
Other highlights:
- Ai2 releases new language models competitive with Meta’s Llama (TechCrunch)
- AI2 closes the gap between closed-source and open-source post-training (VentureBeat)
- Women leading the charge in open source AI (Capacity Media)
- A battle is raging over the definition of open-source AI (The Economist)
- The best open-source AI models: All your free-to-use options explained (ZDNet)
- Read all press mentions from this past month
News from OSI affiliates:
- Eclipse Foundation: The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy
- DPGA, ITU: Advancing Open Source AI: Definitions, Standards, and Global Implementation for a Sustainable Future
- Apache Software Foundation: The Apache Software Foundation Welcomes a New President
- Linux Foundation: Jim Zemlin, ‘head janitor of open source,’ marks 20 years at Linux Foundation
- Mozilla Foundation: Firefox 1.0 Released 20 Years Ago
- DPGA: Open source key to exchanging best practices in digital government
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Events
Upcoming events:
- Open Source Experience (December 4-5 – Paris)
- KubeCon + CloudNativeCon India (December 11-12, 2024 – Delhi)
- Deep Dive into the Open Source AI Definition v1.0 (December 12, 2024 – online)
- EU Open Source Policy Summit (January 31, 2025 – Brussels)
- FOSDEM (February 1-2, 2025 – Brussels)
CFPs:
- PyCon US 2025: the Python Software Foundation kicks off Website, CfP, and Sponsorship!
Sponsors:
- Automattic
- Sentry
- Cisco
- GitHub
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Get to vote for the OSI Board by becoming a member
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
Improving Open Source security with the new GitHub Secure Open Source Fund
The Open Source community underpins much of today’s software innovation, but with this power comes responsibility. Security vulnerabilities, unclear licensing, and a lack of transparency in software components pose significant risks to software supply chains. Recognizing this challenge, GitHub recently announced the GitHub Secure Open Source Fund—a transformative initiative aimed at bolstering the security and sustainability of Open Source projects.
What is the Secure Open Source Fund?
Launched with a $1.25 million commitment from partners, the GitHub Secure Open Source Fund is designed to address a critical issue: the often-overlooked necessity of security for widely used Open Source projects. The fund not only provides financial support to project maintainers but also delivers a comprehensive suite of resources, including but not limited to:
- Hands-on security training: A three-week program offering mentorship, workshops, and expert guidance.
- Community engagement: Opportunities to connect with GitHub’s Security Lab, sponsors, and other maintainers.
- Funding milestones: $10,000 per project, tied to achieving key security objectives.
The program’s cohort-based approach fosters collaboration and equips maintainers with the skills, networking, and funding to enhance the security of their projects sustainably.
Why this matters
The success of Open Source hinges on its trustworthiness. For developers and organizations, the ability to confidently adopt and integrate Open Source projects is paramount. However, without sufficient security measures and transparency, these projects risk introducing vulnerabilities into the software supply chain. GitHub’s Secure Open Source Fund directly tackles this issue by empowering maintainers with the knowledge, community, and funding to make their projects secure and reliable.
Building trust through transparency
The GitHub Secure Open Source Fund aligns with the global push for greater transparency and resilience in software supply chains between creators and consumers of Open Source software. Its focus on security addresses growing concerns highlighted by the EU’s Cyber Resilience Act and the US Cybersecurity and Infrastructure Security Agency (CISA). By providing maintainers with vital funding to prioritize focused time and with resources to identify and address vulnerabilities, the program strengthens the foundation of Open Source ecosystems.
GitHub has taken an ecosystem-wide approach, where resources and security go hand in hand. The Open Source Initiative (OSI) was invited to become a launch ecosystem partner, and we hope to contribute valuable input, feedback, and ideas along with other community members. One of our projects, ClearlyDefined, helps organizations manage SBOMs at scale at each stage of the supply chain by providing easy access to accurate licensing metadata for Open Source components. Together, we hope to foster greater transparency and security for the entire supply chain.
GitHub Secure Open Source Fund Ecosystem Partners
A call to action for the Open Source community
As GitHub leads the charge with its Secure Open Source Fund, it’s crucial for the broader community to step up. Here’s how you can get involved:
- Learn more about security: Gain access to workshops, group sessions, and mentorship.
- Maximize transparency: Adopt tools like ClearlyDefined to ensure clear metadata for your components.
- Advocate for funding: Support initiatives that prioritize security, whether through sponsorship or advocacy.
Together, we can create a safer, more transparent, and more sustainable Open Source ecosystem.
To learn more about GitHub’s Secure Open Source Fund and apply, visit their official program page and announcement.
Let’s work collectively to secure the software supply chains that power innovation worldwide.
GitHub Secure Open Source Fund Sponsors
Celebrating 5 years at the Open Source Initiative: a journey of growth, challenges, and community engagement
Reaching the five-year mark at the Open Source Initiative (OSI) has been a huge privilege. It’s been a whirlwind of progress, personal growth, and community engagement—filled with highs, great challenges, and plenty of Open Source celebrations. As I reflect on this milestone, it’s impossible not to feel both gratitude and excitement for what lies ahead. This isn’t just a story about my career—it’s about the evolution of Open Source, the incredible people I’ve worked with, and the values we’ve championed.
Joining OSI: first steps into the journey
Back in 2017, under the leadership of OSI’s General Manager Patrick Masson, I stepped into the role of Director of Community and Development at the OSI. That year, we began planning a celebration of the 20th anniversary of Open Source for 2018, a massive undertaking. This wasn’t just about throwing a party. It was a celebration of the immense impact Open Source has had on the global tech community.
And yet, the timing was crucial. Open Source, which had steadily grown over the past two decades, faced challenges on multiple fronts: from sustainability and diversity to startups attempting to redefine the term “Open Source” and obscure its principles. The emergence of faux Open Source licenses, like the Commons Clause and the Server Side Public License, only added to the urgency of defending the very core of our mission.
The 20th Anniversary world tour
We embarked on the OSI’s 20th Anniversary World Tour in 2018, organizing over 100 activities across 40 events globally, rallying the support of affiliates, sponsors, and the Open Source community at large. The goal wasn’t just to celebrate but to affirm the values we held dear: transparency, collaboration, and freedom in software development. This culminated in the signing of the Affirmation of the Open Source Definition by over 40 foundations and corporations, a powerful statement that we would not back down in the face of challenges.
For the 20th Anniversary, the OSI contributed to 40 events worldwide
Stepping back amidst the pandemic
When the pandemic hit in 2020, everything came to a sudden halt. With schools closing and my young children (a one-year-old and a four-year-old) at home, the demands of balancing work and family life became overwhelming. I made the difficult decision to step down from my role at the OSI. It wasn’t easy to step away from something I loved, but at the time, it felt necessary.
A year later, I joined a security startup as Head of Community, where I led a cutting-edge Open Source project and held a leadership role at the Confidential Computing Consortium at the Linux Foundation. This new role allowed me to expand my expertise in community management and Open Source security—critical topics that would soon come to the forefront.
Return to the OSI: a new chapterIn early 2023, the startup I was working for had to close its doors unfortunately. But just as one chapter ended, another began. Stefano Maffulli, OSI’s new Executive Director, reached out to me with an opportunity to return to the OSI. This time, the focus was on addressing a growing concern: the clarity around licenses and security vulnerabilities in the Open Source supply chain.
I jumped at the chance to come back. In the beginning of this new chapter, I had the opportunity to work on two challenges: to manage the ClearlyDefined community, a project aimed at improving transparency in Open Source licensing and security, and organizing activities around the 25th Anniversary of Open Source.
The 25th Anniversary world tour
The 25th anniversary of Open Source marked an incredible milestone, not just for the OSI, but for the whole tech community. We organized activities across 36 conferences from around the world with a combined attendance of over 125,000 people. We contributed 12 keynotes, 24 talks, 6 workshops, and 18 webinars.
For the 25th Anniversary, the OSI contributed to 36 events worldwide
It was a wonderful experience connecting with organizers of these conferences and engaging with speakers, volunteers and attendees. I personally contributed to 4 keynotes, 6 talks, 15 events, and co-organized the OSI track at All Things Open and the Deep Dive: AI webinars.
Throughout the year, our focus shifted from reviewing the past of Free and Open Source software to exploring the future of Open Source in this new era of AI.
Open Source AI Definition: a new challenge
With the popularization of Generative AI, many players in industry started using “Open Source AI” to describe their projects. Legislators around the world also started drafting laws that would have a huge impact on Open Source and AI developments. Stefano was already exploring this new challenge with the Deep Dive: AI series, but we realized that establishing an Open Source AI Definition was going to be critical. Along with the Board of Directors, we started planning a new direction for the OSI.
We organized several activities around Open Source and AI in partnership with major Open Source conferences, from talks and panels to workshops. We also launched forums for online discussions, and hosted webinars and town halls to make our activities more inclusive. The work felt more important than ever.
One of the biggest undertakings was organizing a multistakeholder co-design process, led by Mer Joyce, where we brought together global experts to establish a shared set of principles that can recreate permissionless, pragmatic and simplified collaboration for AI builders.
One of the projects I’m particularly proud of is the “Voices of the Open Source AI Definition” series. Through this initiative, we’ve been able to share the stories of the volunteers from the co-design process involved in shaping the Open Source AI Definition (OSAID). These stories highlight the diversity and passion of the community, bringing a human element to the often technical discussions around Open Source and AI.
Voices of the Open Source AI Definition
The OSI’s work on the Open Source AI Definition was cited over 100 times in the press worldwide, educating the public and countering misinformation. Our work was featured in The New York Times, The Verge, TechCrunch, ZDNET, InfoWorld, Ars Technica, IEEE Spectrum, and MIT Technology Review, among other top media outlets.
ClearlyDefined: bringing clarity to licensing
As part of my community management role on the ClearlyDefined project, I had the opportunity to contribute under the exceptional leadership of E. Lynette Rayle (GitHub) and Qing Tomlinson (SAP), whose guidance was instrumental in driving the project forward. One of my major accomplishments was the creation of a brand-new website and comprehensive documentation.
We engaged with the ORT (Open Source Review Toolkit) and Scancode communities, building stronger connections. We made important technical updates, including upgrading to the latest version of Scancode and expanding our license support beyond just SPDX, making it easier for developers and organizations to navigate the increasingly complex landscape of licensing. This important work led to the release of version 2.0 of ClearlyDefined.
Another highlight was a new harvester implementation for conda, a popular package manager with a large collection of pre-built packages for various domains, including data science, machine learning, scientific computing and more.
Additionally, the adoption of ClearlyDefined by the GUAC community from OpenSSF was a testament to the growing importance of our work in bringing clarity to the Open Source supply chain not just in terms of licensing but also security.
ClearlyDefined’s new features: LicenseRef, conda support, and GUAC integration
We presented our work across three continents: in Europe at ORT Community Days, in North America at SOSS Fusion, and in Asia at Open Compliance Summit.
Finally, we made progress toward a more open governance model by electing leaders for the ClearlyDefined Steering and Outreach Committees.
Open Policy: from cybersecurity to AI
On the policy side, I had the privilege of working alongside Deb Bryant and Simon Phipps, who kept track of policies affecting Open Source software, in particular the Securing Open Source Software Act in the US and the Cyber Resilience Act in Europe. As for policies involving Open Source and AI, I followed the European AI Act and the US AI Bill of Rights, and compiled a list of compelling responses from nonprofit organizations and companies to NTIA’s AI Open Model Weights RFC.
OpenSource.net: fostering knowledge sharing
I also helped to launch OpenSource.net, a platform designed to foster knowledge sharing. Led by Editor-in-Chief Nicole Martinelli, this platform has become a space for diverse perspectives and contributions, furthering the reach and impact of our work.
As part of the Practical Open Source (POSI) program, which facilitates discussions on doing business with and for Open Source, I was able to bring contributions from outstanding entrepreneurs and intrapreneurs.
Looking ahead: excitement for the future
In the last 2 years at the OSI, I’ve had the privilege of publishing over 50 blog posts, as well as organizing, speaking at, and attending multiple events worldwide. The work we’ve done has been both challenging and rewarding, but it’s the community—the people who believe in the power of Open Source—that makes it all worthwhile.
As I celebrate five years at the OSI, I’m more energized than ever to continue this journey. The world of Open Source is evolving, and I’m excited to be part of shaping its future. There’s so much more to come, and I can’t wait to see where the next five years will take us.
This is more than just a personal milestone—it’s a celebration of the impact Open Source has had on the world and the endless possibilities it holds for the future.
You too can join the OSI: support our work
The work we do is only possible because of the passionate, engaged community that supports us. From advocating for Open Source principles to driving initiatives like the Open Source AI Definition and ClearlyDefined, every step forward has been powered by collaboration and shared commitment.
But there’s so much more to be done. The challenges facing Open Source today—from licensing to policy—are growing in scale and complexity. To meet them, we need your help. By joining or sponsoring the Open Source Initiative, you enable us to continue this vital work: educating, advocating, and building a stronger, more inclusive Open Source ecosystem.
Your support isn’t just a contribution; it’s an investment in the future of Open Source. Together, we can ensure that Open Source remains open for all. Join us in shaping the next chapter of this incredible journey.
Highlights from the Digital Public Goods Alliance Annual Members Meeting 2024
This month, I had the privilege of representing the Open Source Initiative at the Digital Public Goods Alliance (DPGA) Annual Members Meeting. Held in Singapore, this event marked our second year participating as members, following our first participation in Ethiopia. It was an inspiring gathering of innovators, developers and advocates working to create a thriving ecosystem for Digital Public Goods (DPGs) as part of UNICEF’s initiative to advance the United Nations’ sustainable development goals (SDGs).
Keynote insights
The conference began with an inspiring keynote by Liv Marte Nordhaug, CEO of the DPGA, who made a call for governments and organizations to incorporate:
- Open Source first principles
- Open data at scale
- Interoperable Digital Public Infrastructure (DPI)
- DPGs as catalysts for climate change action
These priorities underscored the critical role of DPGs in fostering transparency, accountability and innovation across sectors.
DPGs and Open Source AI
A standout feature of the first day of the event was the Open Source AI track led by Amreen Taneja, DPGA Standards Lead, which encompassed three dynamic sessions:
- Toward AI Democratization with Digital Public Goods: This session explored the role of DPGs in contributing to the democratization of AI technologies, including AI use, development and governance, to ensure that everyone benefits from AI technology equally.
- Fully Open Public Interest AI with Open Data: This session highlighted the need for open AI infrastructure supported by accessible, high-quality datasets, especially in the global majority countries. Discussions revolved around how open training datasets ought to be licensed to ensure open, public interest AI.
- Creating Public Value with Public AI: This session examined real-world applications of generative AI in public services. Governments and NGOs showcased how AI-enabled tools can effectively tackle social challenges, leveraging Open Source solutions within the AI stack.
The second day of the event was marked by the DPG Product Fair, which provided a science-fair-style platform for showcasing DPGs. Notable examples included:
- India’s eGov DIGIT Open Source platform, serving over 1 billion citizens with robust digital infrastructure.
- Singapore’s Open Government Products, which leverages Open Source and enables open collaboration, allowing this small nation to expand its impact together with other Southeast Asian nations.
One particularly engaging session was Sarah Espaldon’s “Improving Government Services with Legos.” This presentation from the Singapore government highlighted the benefits of modular DPGs in enhancing service delivery and building flexible DPI capabilities.
Privacy best practices for DPGs
A highlight from the third and final day of the event was the privacy-focused workshop co-hosted by the DPGA and the Open Knowledge Foundation. As privacy becomes a central concern for DPGs, the DPGA introduced its Standard Expert Group to refine privacy requirements and develop best practice guidelines. The interactive session provided invaluable feedback, driving forward the development of robust privacy standards for DPGs.
Looking ahead
The event reaffirmed the potential of Open Source technologies to transform global public goods. As we move forward, the Open Source Initiative is committed to advancing these conversations, including around the Open Source AI Definition and fostering a more inclusive digital ecosystem. A special thanks to the dedicated DPGA team—Liv Marte Nordhaug, Lucy Harris, Max Kintisch, Ricardo Miron, Luciana Amighini, Bolaji Ayodeji, Lea Gimpel, Pelin Hizal Smines, Jon Lloyd, Carol Matos, Amreen Taneja, and Jameson Voisin—whose efforts made this conference a success. We look forward to another year of impactful collaboration and to the Annual Members Meeting next year in Brazil!
Give Your Input on the State of Open Source Survey
As we announced back in September, the OSI has partnered again with OpenLogic by Perforce to produce a comprehensive report on global, industry-wide Open Source software adoption trends. The 2025 State of Open Source Report will be based on responses to a survey of those working with Open Source software in their organizations, from developers to CTOs and everyone in between.
“This is our fourth year being involved in the State of Open Source Report, and there is never any shortage of surprises in the data,” says Stefano Maffulli, Executive Director, Open Source Initiative. “Now, however, the aim of the survey is not to determine whether or not organizations are using Open Source — we know they are — but to find out how they are handling complexities related to AI, licensing, and of course, security.”
This year, the survey includes new sections on Big Data, the impact of CentOS EOL, and security/compliance. As always, there are questions about technology usage in various categories such as infrastructure, cloud-native, frameworks, CI/CD, automation, and programming languages. Finally, a few questions toward the end look at Open Source maturity and stewardship, including sponsoring or being involved with open source foundations and organizations like OSI.
Of course, any report like this is only as valuable as its data and the more robust and high-quality the dataset, the stronger the report will be. As stewards of the Open Source community, OSI members are encouraged to take the survey so that the 2025 State of Open Source Report accurately reflects the interests, concerns, and preferences of Open Source software users around the world.
You can access the State of Open Source Survey here: https://www.research.net/r/SLQWZGF
Open Data and Open Source AI: Charting a course to get more of both
While working to define Open Source AI, we realized that data governance is an unresolved issue. The Open Source Initiative organized a workshop to discuss data sharing and governance for AI training. The critical question posed to attendees was “How can we best govern and share data to power Open Source AI?” The main objective of this workshop was to establish specific approaches and strategies for both Open Source AI developers and other stakeholders.
The Workshop: Building bridges across “Open” streams
Held on October 10-11, 2024, and hosted by Linagora’s Villa Good Tech, the OSI workshop brought together 20 experts from diverse fields and regions. Funded by the Alfred P. Sloan Foundation, the event focused on actionable steps to align open data practices with the goals of Open Source AI.
Participants, listed below, comprised academics, civil society leaders, technologists, and representatives from organizations like Mozilla Foundation, Creative Commons, EleutherAI Institute and others.
- Ignatius Ezeani University of Lancaster / Nigeria
- Masayuki Hatta Debian, Open Source Group Japan / Japan
- Aviya Skowron EleutherAI Institute / Poland
- Stefano Zacchiroli Software Heritage / Italy
- Ricardo Torres Digital Public Goods Alliance / Mexico
- Kristina Podnar Data and Trust Alliance / Croatia + USA
- Joana Varon Coding Rights / Brazil
- Renata Avila Open Knowledge Foundation / Guatemala
- Alek Tarkowski Open Future / Poland
- Maximilian Gantz Mozilla Foundation / Germany
- Stefaan Verhulst GovLab / USA/Belgium
- Paul Keller Open Future / Germany
- Thom Vaughan Common Crawl / UK
- Julie Hunter Linagora / USA
- Deshni Govender GIZ FAIR Forward – AI for All / South Africa
- Ramya Chandrasekhar CNRS / India
- Anna Tumadóttir Creative Commons / Iceland
- Stefano Maffulli Open Source Initiative / Italy
Over two days, the group worked to frame a cohesive approach to data governance. Alek Tarkowski and Paul Keller of the Open Future Foundation are working with OSI to complete the white paper summarizing the group’s work. In the meantime, here is a quick “tease”—just a few of the many topics that the group discussed:
The streams of “open” merge, creating waves
AI is where Open Source software, open data, open knowledge, and open science meet in a new way. Since OpenAI released ChatGPT, what once were largely parallel tracks with occasional junctures are now a turbulent merger of streams, creating ripples in all of these disciplines and forcing us to reassess our principles: How do we merge these streams without eroding the principles of transparency and access that define openness?
We discovered in the process of defining Open Source AI that the basic freedoms we’ve put in the Open Source Definition and its foundation, the Free Software Definition, are still good and relevant. Open Source software has had decades to mature into a structured ecosystem with clear rules, tools, and legal frameworks. The same goes for Open Knowledge and Open Science: while rooted in age-old traditions, open knowledge and science have seen modern rejuvenation through platforms like Wikipedia and the Open Knowledge Foundation. Open data, however, feels less solid: often serving as a one-way pipeline from public institutions to private profiteers, it is now being dragged into a whole new territory.
How are these principles of “open” interacting with each other? How are we going to merge Open Data with Open Source, Open Science and Open Knowledge in Open Source AI?
The broken social contract of data
Data fuels AI. The sheer scale of data required to train models like ChatGPT reveals not just a technological challenge but also a societal dilemma. Much of this data comes from us—the blogs we write, the code we share, the information we give freely to platforms.
OpenAI, for example, “slurps” all the data it can find, and much of it is what we willingly give: the blogs we write; the code we share; the pictures, emails and address books we keep in “the cloud”; and all the other information we give freely to platforms.
We, the people, make the “data,” but what are we getting in exchange? OpenAI owns and controls the machine built with our data, and it grants us access via API, until it changes its mind. We are essentially being strip-mined for a proprietary system that grants access at a price—until the owner decides otherwise.
We need a different future, one where data empowers communities, not just corporations. That starts with revisiting the principles of openness that underpin the open source, open science, and open knowledge movements. The question is: How do we take back control?
Charting a path forward
We want the machine for ourselves. We want machines that the people can own and control. We need to find a way to swing the pendulum back to our meaning of Open. And it’s all about the “data.”
The OSI’s work on the Open Source AI Definition provides a starting point. An Open Source AI machine is one that the people can meaningfully fork without having to ask for permission. For AI to truly be open, developers need access to the same tools and data as the original creators. That means transparent training processes, open filtering code, and, critically, open datasets.
Group photo of the participants to the workshop on data governance, Paris, Oct 2024.
Next steps
The white paper, expected in December, will synthesize the workshop’s discussions and propose concrete strategies for data governance in Open Source AI. Its goal is to lay the groundwork for an ecosystem where innovation thrives without sacrificing openness or equity.
As the lines between “open” streams continue to blur, the choices we make now will define the future of AI. Will it be a tool controlled by a few, or a shared resource for all?
The answer lies in how we navigate the waves of data and openness. Let’s get it right.
The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy
BRUSSELS and WEST HOLLYWOOD, Calif. – 14 November 2024 – The Eclipse Foundation, one of the world’s largest open source foundations, and the Open Source Initiative (OSI), the global non-profit educating about and advocating for the benefits of open source and steward of the Open Source Definition, have signed a Memorandum of Understanding (MOU) to collaborate on promoting the interest of the open source community in the implementation of regulatory initiatives on Open Source Artificial Intelligence (OSAI). This agreement underscores the two organisations’ shared commitment to ensuring that emerging AI regulations align with widely recognised OSI open source definitions and open source values and principles.
“AI is arguably the most transformative technology of our generation,” said Stefano Maffulli, executive director, Open Source Initiative. “The challenge now is to craft policies that not only foster growth of AI but ensure that Open Source AI thrives within this evolving landscape. Partnering with the Eclipse Foundation and its expertise, with its experience in European open source development and regulatory compliance, is important to shape the future of Open Source AI.”
“For decades, OSI has been the ‘gold standard’ the open source community has turned to for building consensus around important issues,” said Mike Milinkovich, executive director of the Eclipse Foundation. “As AI reshapes industries and societies, there is no more pressing issue for the open source community than the regulatory recognition of open source AI systems. Our combined expertise – OSI’s global leadership in open standards and open source licences and our extensive work with open source regulatory compliance – makes this partnership a powerful advocate for the design and implementation of sound AI policies worldwide.”
Addressing the Global Challenges of AI Regulation
With AI regulation on the horizon in multiple regions, including the EU, both organisations recognise the urgency of helping policymakers understand the unique challenges and opportunities of OSAI technologies. The rapid evolution of AI technologies, together with new and complex regulatory landscapes, demands clear, consistent, and aligned guidance rooted in open source principles.
Through this partnership, the Eclipse Foundation and OSI will endeavour to bring clarity in language and terms that industry, community, civil society, and policymakers can rely upon as public policy is drafted and enforced. The organisations will collaborate by leveraging their respective public platforms and events to raise awareness and advocate on the topic. Additionally, they will work together on joint publications, presentations, and other promotional activities, while also assisting one another in educating government officials on policy considerations for OSAI and General Purpose AI (GPAI). Through this partnership, they aim to provide clear, consistent guidance that aligns with open source principles.
Key Areas of Collaboration
The MoU outlines several areas of cooperation, including:
- Information Exchange: OSI and the Eclipse Foundation will share relevant insights and information related to public policy-making and regulatory activities on artificial intelligence.
- Representation to Policymakers: OSI and the Eclipse Foundation will cooperate in representing the principles and values of open source licences to policymakers and civil society organisations.
- Promotion of Open Source Principles: Joint efforts will be made to raise awareness of the role of open source in AI, emphasising how it can foster innovation while mitigating risks.
A Partnership for the Future
As AI continues to revolutionise industries worldwide, the need for thoughtful, balanced regulation is critical. The OSI and Eclipse Foundation are committed to providing the open source community, industry leaders, and policymakers with the tools and knowledge they need to navigate this rapidly evolving field.
This MoU marks the very beginning of a long-term collaboration, with joint initiatives and activities to be announced throughout the remainder of 2024 and into 2025.
About the Eclipse Foundation
The Eclipse Foundation provides our global community of individuals and organisations with a business-friendly environment for open source software collaboration and innovation. We host the Eclipse IDE, Adoptium, Software Defined Vehicle, Jakarta EE, and over 420 open source projects, including runtimes, tools, specifications, and frameworks for cloud and edge applications, IoT, AI, automotive, systems engineering, open processor designs, and many others. Headquartered in Brussels, Belgium, the Eclipse Foundation is an international non-profit association supported by over 385 members. To learn more, follow us on social media @EclipseFdn, LinkedIn, or visit eclipse.org.
About the Open Source Initiative
Founded in 1998, the Open Source Initiative (OSI) is a non-profit corporation with global scope formed to educate about and advocate for the benefits of Open Source and to build bridges among different constituencies in the Open Source community. It is the steward of the Open Source Definition, setting the foundation for the global Open Source ecosystem. Join and support the OSI mission today at https://opensource.org/join.
Third-party trademarks mentioned are the property of their respective owners.
###
Media contacts:
Schwartz Public Relations (Germany)
Gloria Huppert/Marita Bäumer
Sendlinger Straße 42A
80331 Munich
EclipseFoundation@schwartzpr.de
+49 (89) 211 871 -70/ -62
514 Media Ltd (France, Italy, Spain)
Benoit Simoneau
benoit@514-media.com
M: +44 (0) 7891 920 370
Nichols Communications (Global Press Contact)
Jay Nichols
jay@nicholscomm.com
+1 408-772-1551
ClearlyDefined v2.0 adds support for LicenseRefs
One of the major focuses of the ClearlyDefined Technical Roadmap is the improvement in the quality of license data. As such, we are excited to announce the release of ClearlyDefined v2.0 which adds over 2,000 new well-known licenses it can identify. You can see the complete list of new non-SPDX licenses in ScanCode LicenseDB.
A little historical background: when ClearlyDefined was first created, it was initially decided to limit the reported licenses to only those on the SPDX License List. As teams worked with the ClearlyDefined data, it became clear that additional license discovery is important to give users a fuller picture of the projects they depend on. In previous releases of ClearlyDefined, licenses not on the SPDX License List were represented in the definition as NOASSERTION or OTHER. (See the breakdown of licenses in The most popular licenses for each language in 2023.)
The v2.0 release of ClearlyDefined includes an update of ScanCode to v32 and the support of LicenseRefs to identify non-SPDX licenses. The license in the definition will now be a LicenseRef with prefix LicenseRef-scancode- if ScanCode identifies a non-SPDX license. This improves the license coverage in the ClearlyDefined definitions and consumers’ ability to accurately construct license compliance policies.
ClearlyDefined identifies licenses in definitions using SPDX expressions. The SPDX specification has a way to include non-SPDX licenses in license expressions.
A license expression could be a single license identifier found on the SPDX License List; a user defined license reference denoted by the LicenseRef-[idString]; a license identifier combined with an SPDX exception; or some combination of license identifiers, license references and exceptions constructed using a small set of defined operators (e.g., AND, OR, WITH and +)
— excerpt from SPDX Annexes: SPDX license expressions
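As a rough illustration of the excerpt above, the following Python sketch splits a license expression into its identifiers and picks out the license references. It is not a full SPDX parser: it ignores operator precedence and only recognizes the `LicenseRef-[idString]` shape described in the excerpt. The function names and the `LicenseRef-scancode-example` identifier are hypothetical and used only for illustration.

```python
import re

# Per the SPDX annex, an idstring is made of letters, digits, "." and "-".
# The optional DocumentRef- prefix is allowed by the specification as well.
LICENSE_REF = re.compile(
    r"(?:DocumentRef-[A-Za-z0-9.\-]+:)?LicenseRef-[A-Za-z0-9.\-]+"
)
OPERATORS = {"AND", "OR", "WITH"}


def split_expression(expression: str) -> list:
    """Return the identifiers (licenses, references, exceptions) in an
    expression, dropping operators and parentheses."""
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return [token for token in tokens if token not in OPERATORS]


def license_refs(expression: str) -> list:
    """Return only the user-defined license references in an expression."""
    return [t for t in split_expression(expression) if LICENSE_REF.fullmatch(t)]


if __name__ == "__main__":
    # Hypothetical expression mixing SPDX identifiers and a LicenseRef.
    print(license_refs("MIT AND (LicenseRef-scancode-example OR Apache-2.0)"))
```

A consumer could use a helper like `license_refs` to find the parts of a definition's license that need a policy decision beyond the SPDX License List.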
Example change of a definition:
| Coordinates | License BEFORE | License AFTER |
| --- | --- | --- |
| npm/npmjs/@alexa-games/sfb-story-debugger/2.1.0 | NOASSERTION | LicenseRef-.amazon.com.-AmznSL-1.0 |

Note: ClearlyDefined v2.0 also includes an update to ScanCode v32.
What does this mean for definitions?
This section includes a simplified description of what happens when you request a definition from ClearlyDefined. These examples only refer to the ScanCode tool. Other tools are run as well and are handled in similar ways.
When the definition already exists
Any request for a definition through the /definitions API makes a couple of checks before returning the definition:
If the definition exists, it checks whether the definition was created using the latest version of the ClearlyDefined service.
- If yes, it returns the definition as is.
- If not, it recomputes the definition using the existing raw results from the tools run during the previous harvest for the existing definition. In this case, the tool version will be earlier than ScanCode v32.
NOTE: ClearlyDefined does not support LicenseRefs from ScanCode prior to v32. For earlier versions of ScanCode, ClearlyDefined stores any LicenseRefs as NOASSERTION. In some cases, you may see OTHER when the definition was curated.
When the definition does not exist
If the definition does not exist:
- It will send a harvest request which will run the latest version of all the tools and produce raw results.
- From these raw results, it will compute a definition which might include a LicenseRef.
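The two cases above can be summarized in a small toy model. This is only a sketch of the flow as described in this post, not the ClearlyDefined service code: `Definition`, `compute`, `get_definition`, the `SERVICE_VERSION` label and the shape of the raw results are all hypothetical, and harvesting is shown as a synchronous call for simplicity.

```python
from dataclasses import dataclass

SERVICE_VERSION = "2.0"  # hypothetical label for the latest service version


@dataclass
class Definition:
    coordinates: str
    license: str
    service_version: str


def compute(coordinates, raw_results):
    """Toy computation: LicenseRefs are only kept for ScanCode v32 or later."""
    major = int(raw_results["scancode_version"].split(".")[0])
    detected = raw_results["detected_license"]
    if detected.startswith("LicenseRef-") and major < 32:
        detected = "NOASSERTION"
    return Definition(coordinates, detected, SERVICE_VERSION)


def get_definition(coordinates, definitions, raw_store, harvest):
    """Return a definition, recomputing or harvesting when needed."""
    existing = definitions.get(coordinates)
    if existing is not None:
        if existing.service_version == SERVICE_VERSION:
            return existing  # created by the latest service: return as is
        # Stale definition: recompute from the raw results of the previous
        # harvest. No tool is re-run, so the ScanCode version stays the same.
        definition = compute(coordinates, raw_store[coordinates])
    else:
        # No definition yet: harvest with the latest tools, then compute.
        raw_store[coordinates] = harvest(coordinates)
        definition = compute(coordinates, raw_store[coordinates])
    definitions[coordinates] = definition
    return definition
```

The point of the model is the asymmetry: a recomputed definition reuses old raw results and so cannot gain a LicenseRef, while a freshly harvested one can.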
If you see NOASSERTION in the license expression, you can check the definition to determine the version of ScanCode in the “described”: “tools” section.
If ScanCode is a version earlier than v32, you can submit a harvest API request. This will run any tools for which ClearlyDefined now supports a later version. Once the tools complete, the definition will be recomputed based on the new results.
In some cases, even when the results are from ScanCode v32, you may still see NOASSERTION. Reharvesting when the ScanCode version is already v32 will not change the definition.
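The check described above can be sketched in Python. The code assumes that a definition is available as a parsed JSON dictionary, that entries under `described.tools` are strings of the form `name/version` (for example `scancode/30.3.0`), and that the declared license lives under `licensed.declared`. These shapes and both function names are assumptions made for illustration and should be verified against a real definition.

```python
def scancode_major_version(definition: dict):
    """Return ScanCode's major version from described.tools, or None."""
    tools = definition.get("described", {}).get("tools", [])
    for tool in tools:
        name, _, version = tool.partition("/")
        major = version.split(".")[0]
        if name == "scancode" and major.isdigit():
            return int(major)
    return None


def reharvest_may_help(definition: dict) -> bool:
    """True when the license contains NOASSERTION and ScanCode predates v32."""
    declared = definition.get("licensed", {}).get("declared") or ""
    if "NOASSERTION" not in declared:
        return False
    major = scancode_major_version(definition)
    return major is not None and major < 32


if __name__ == "__main__":
    example = {
        "described": {"tools": ["clearlydefined/1.5.0", "scancode/30.3.0"]},
        "licensed": {"declared": "NOASSERTION"},
    }
    print(reharvest_may_help(example))
```

When the function returns False for a definition already harvested with ScanCode v32, a new harvest request is not expected to change the result, in line with the paragraph above.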
What does this mean for tools?
When adding ScanCode licenses to allow/deny lists, note the ScanCode LicenseDB lists licenses without the LicenseRef prefix. All LicenseRefs coming from ScanCode will start with LicenseRef-scancode-.
Tools using an Allow List
A recomputed definition may change the license to include a LicenseRef that you want to allow. All new LicenseRefs that are acceptable will need to be added to your allow list. We are taking the approach of adding them as they appear in flagged package-version licenses. An alternative is to review the ScanCode LicenseDB to proactively add LicenseRefs to your allow list.
Tools using a Deny List
Deny lists need to be exhaustive to prevent a new license from being allowed by default. It is recommended that you review the ScanCode LicenseDB to determine if there are LicenseRefs you want to add to the deny list.
Note: The SPDX License List also changes over time. A periodic review to maintain the Deny list is always a good idea.
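To make the allow and deny list guidance more concrete, here is a minimal Python sketch. It assumes a policy tool that keeps plain sets of license identifiers. The helper names and the `example-a` and `example-b` LicenseDB keys are hypothetical. The sketch deliberately flags every license identifier in an expression rather than evaluating AND/OR semantics, and it skips the exception that follows WITH.

```python
SCANCODE_PREFIX = "LicenseRef-scancode-"


def to_license_ref(licensedb_key: str) -> str:
    """Turn a key as listed in ScanCode LicenseDB into its LicenseRef form."""
    return SCANCODE_PREFIX + licensedb_key


def license_ids(expression: str) -> list:
    """List the license identifiers in an expression, without exceptions."""
    tokens = expression.replace("(", " ").replace(")", " ").split()
    ids = []
    skip_next = False
    for token in tokens:
        if skip_next:
            skip_next = False  # this token is the exception after WITH
        elif token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR"):
            ids.append(token.rstrip("+"))
    return ids


def not_allowed(expression: str, allow_list: set) -> list:
    """Identifiers that an allow-list policy would flag for review."""
    return [i for i in license_ids(expression) if i not in allow_list]


def denied(expression: str, deny_list: set) -> list:
    """Identifiers that a deny-list policy would reject."""
    return [i for i in license_ids(expression) if i in deny_list]
```

For example, an allow list built with `{"MIT", to_license_ref("example-a")}` would accept `MIT AND LicenseRef-scancode-example-a`, while an expression containing `LicenseRef-scancode-example-b` would be flagged until that LicenseRef is reviewed and added.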
Providing Feedback
As with any major version change, there can be unexpected behavior. You can reach out with questions, feedback, or requests. Find how to get in touch with us in the Get Involved doc.
If you have comments or questions on the actual LicenseRefs, you should reach out to ScanCode License DB maintainers.
Acknowledgements
A huge thank you to the contributing developers and their organizations for supporting the work of ClearlyDefined.
In alphabetical order, contributors were…
- ajhenry (GitHub)
- brifl (Microsoft)
- elrayle (GitHub)
- jeff-luszcz (GitHub)
- ljones140 (GitHub)
- lumaxis (GitHub)
- mpcen (Microsoft)
- nickvidal (Open Source Initiative)
- qtomlinson (SAP)
- RomanIakovlev (GitHub)
- yashkohli88 (SAP)
See something you’d like ClearlyDefined to do or could do better? If you have resources to help out, we have work to be done to further improve data quality, performance, and sustainability. We’d love to hear from you.
ClearlyDefined at SOSS Fusion 2024: a collaborative solution to Open Source license compliance
This past month, the Open Source Security Foundation (OpenSSF) hosted SOSS Fusion in Atlanta, an event that brought together a diverse community of leaders and innovators from across the digital security spectrum. The conference, held on October 22-23, explored themes central to today’s technological landscape: AI security, diversity in technology, and public policy for Open Source software. Industry thought leaders like Bruce Schneier, Marten Mickos, and Cory Doctorow delivered keynotes, setting the tone for a conference that emphasized collaboration and community in creating a secure digital future.
Amidst these pressing topics, the Open Source Initiative in collaboration with GitHub and SAP presented ClearlyDefined—an innovative project aimed at simplifying software license compliance and metadata management. Presented by Nick Vidal of the Open Source Initiative, along with E. Lynette Rayle from GitHub and Qing Tomlinson from SAP, the session highlighted how ClearlyDefined is transforming the way organizations handle licensing compliance for Open Source components.
What is ClearlyDefined?
ClearlyDefined is a project with a powerful vision: to create a global crowdsourced database of license metadata for every software component ever published. This ambitious mission seeks to help organizations of all sizes easily manage compliance by providing accurate, up-to-date metadata for Open Source components. By offering a single, reliable source for license information, ClearlyDefined enables organizations to work together rather than in isolation, collectively contributing to the metadata that keeps Open Source software compliant and accessible.
The problem: redundant and inconsistent license management
In today’s Open Source ecosystem, managing software licenses has become a significant challenge. Many organizations face the repetitive task of identifying, correcting, and maintaining accurate licensing data. When one component has missing or incorrect metadata, dozens, or even hundreds, of organizations using that component may duplicate efforts to resolve the same issue. ClearlyDefined aims to eliminate this redundancy by enabling a collaborative approach.
The solution: crowdsourcing compliance with ClearlyDefined
ClearlyDefined provides an API and user-friendly interface that make it easy to access and contribute license metadata. By aggregating and standardizing licensing data, ClearlyDefined offers a powerful solution for organizations to enhance SBOMs (Software Bills of Materials) and license information without the need for extensive re-scanning and data correction. At the conference, Nick demonstrated how developers can quickly retrieve license data for popular libraries using a simple API call, making license compliance seamless and scalable.
In addition, organizations that encounter incomplete or incorrect metadata can easily update it through ClearlyDefined’s platform, creating a feedback loop that benefits the entire Open Source community. This crowdsourcing approach means that once an organization fixes a licensing issue, that data becomes available to all, fostering efficiency and accuracy.
Key components of ClearlyDefined’s platform
1. API and User Interface: Users can access ClearlyDefined data through an API or the website, making it simple for developers to integrate license checks directly into their workflows.
2. Human curation and community collaboration: To ensure high data quality, ClearlyDefined employs a curation workflow. When metadata requires updates, community members can submit corrections that go through a human review process, ensuring accuracy and reliability.
3. Integration with popular package managers: ClearlyDefined supports various package managers, including npm and pypi, and has recently expanded to support Conda, a popular choice among data science and AI developers.
Real-world use cases: GitHub and SAP’s adoption of ClearlyDefined
During the presentation, representatives from GitHub and SAP shared how ClearlyDefined has impacted their organizations.
- GitHub: ClearlyDefined’s licensing data powers GitHub’s compliance solutions, allowing GitHub to manage millions of licenses with ease. Lynette shared how they initially onboarded over 17 million licenses through ClearlyDefined, a number that has since grown to over 40 million. This database enables GitHub to provide accurate compliance information to users, significantly reducing the resources required to maintain licensing accuracy. Lynette showcased the harvesting process and the curation process. More details about how GitHub is using ClearlyDefined are available here.
- SAP: Qing discussed how ClearlyDefined’s approach has streamlined SAP’s Open Source compliance efforts. By using ClearlyDefined’s data, SAP reduced the time spent on license reviews and improved the quality of metadata available for compliance checks. SAP’s internal harvesting service integrates with ClearlyDefined, ensuring that critical license metadata is consistently available and accurate. SAP has contributed to the ClearlyDefined project and, most notably, together with Microsoft, has optimized the database schema and reduced the database operational cost by more than 90%. More details about how SAP is using ClearlyDefined are available here.
Why ClearlyDefined matters
ClearlyDefined is a community-driven initiative with a vision to address one of Open Source’s biggest challenges: ensuring accurate and accessible licensing metadata. By centralizing and standardizing this data, ClearlyDefined not only reduces redundant work but also fosters a collaborative approach to license compliance.
The platform’s Open Source nature and integration with existing package managers and APIs make it accessible and scalable for organizations of all sizes. As more contributors join the effort, ClearlyDefined continues to grow, strengthening the Open Source community’s commitment to compliance, security, and transparency.
Join the ClearlyDefined community
ClearlyDefined is always open to new contributors. With weekly developer meetings, an open governance model, and continuous collaboration with OpenSSF and other Open Source organizations, ClearlyDefined provides numerous ways to get involved. For anyone interested in shaping the future of license compliance and data quality in Open Source, ClearlyDefined offers an exciting opportunity to make a tangible impact.
At SOSS Fusion, ClearlyDefined’s presentation showcased how an open, collaborative approach to license compliance can benefit the entire digital ecosystem, embodying the very spirit of the conference: working together toward a secure, inclusive, and sustainable digital future.
Download the slides and see the summarized presentation transcript below.
ClearlyDefined presentation transcript
Hello, folks, good morning! Let’s start by introducing ClearlyDefined, an exciting project. My name is Nick Vidal, and I work with the Open Source Initiative. With me today are Lynette Rayle from GitHub and Qing Tomlinson from SAP, and we’re all very excited to be here.
Introduction to ClearlyDefined’s mission
So, what’s the mission of ClearlyDefined? Our mission is ambitious—we aim to crowdsource a global database of license metadata for every software component ever published. This would benefit everyone in the Open Source ecosystem.
The problem ClearlyDefined addresses
There’s a critical problem in the Open Source space: compliance and managing SBOMs (Software Bill of Materials) at scale. Many organizations struggle with missing or incorrect licensing metadata for software components. When multiple organizations use a component with incomplete or wrong license metadata, they each have to solve it individually. ClearlyDefined offers a solution where, instead of every organization doing redundant work, we can collectively work on fixing these issues once and make the corrected data available to all.
ClearlyDefined’s solution
ClearlyDefined enables organizations to access license metadata through a simple API. This reduces the need for repeated license scanning and helps with SBOM generation at scale. When issues arise with a component’s license metadata, organizations can contribute fixes that benefit the entire community.
Getting started with ClearlyDefined
To use ClearlyDefined, you can access its API directly from your terminal. For example, let’s say you’re working with a JavaScript library like Lodash. By calling the API, you can get all license metadata for a specific version of Lodash at your fingertips.
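As a rough illustration of the kind of call described here, the sketch below builds a definition URL for one Lodash release and reads the declared license out of a response. It is a minimal sketch under stated assumptions, not the exact commands shown in the talk: the `/definitions/{type}/{provider}/{namespace}/{name}/{revision}` path, the `npmjs` provider name, the `-` placeholder for an empty namespace, and the `licensed.declared` field reflect our understanding of the public ClearlyDefined API, and version `4.17.21` is chosen only as an example. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public API host for ClearlyDefined.
API_BASE = "https://api.clearlydefined.io"


def definition_url(type_, provider, name, revision, namespace="-"):
    """Build a definition URL from ClearlyDefined-style coordinates.

    Assumes the coordinate order type/provider/namespace/name/revision,
    with "-" standing in for an empty namespace.
    """
    parts = [type_, provider, namespace, name, revision]
    return API_BASE + "/definitions/" + "/".join(quote(p, safe="") for p in parts)


def fetch_definition(url):
    """Fetch and decode a definition (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def declared_license(definition):
    """Return the declared license expression, or None if it is missing.

    Assumes the license sits under definition["licensed"]["declared"].
    """
    return definition.get("licensed", {}).get("declared")


if __name__ == "__main__":
    url = definition_url("npm", "npmjs", "lodash", "4.17.21")
    print(url)
    # Uncomment to query the live service:
    # print(declared_license(fetch_definition(url)))
```

Keeping URL construction separate from the network call makes it easy to check the coordinates before sending any request, and to reuse `declared_license` on definitions obtained some other way.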
Once you incorporate this licensing metadata into your workflow, you may notice some metadata that needs updating. You can curate that data and contribute it back, so everyone benefits. ClearlyDefined also provides a user-friendly interface for this, making it easier to contribute.
Open Source and community contributions
ClearlyDefined is an Open Source initiative, hosted on GitHub, supporting various package managers (e.g., npm, pypi). We work to promote best practices and integrate with other tools. Recently, we’ve expanded our scope to support non-SPDX licenses and Conda, a package manager often used in data science projects.
Integration with other tools
ClearlyDefined integrates with GUAC, an OpenSSF project that consumes ClearlyDefined data. This integration broadens the reach and utility of ClearlyDefined’s licensing information.
Case studies and community impact
I’d like to hand it over to Lynette from GitHub, who will talk about how GitHub uses ClearlyDefined and why it’s critical for license compliance.
GitHub’s use of ClearlyDefined
Hello, I’m Lynette, a developer at GitHub working on license compliance solutions. ClearlyDefined has become a key part of our workflows. Knowing the licenses of our dependencies is crucial, as legal compliance requires correct attributions. By using ClearlyDefined, we’ve streamlined our process and now manage over 40 million licenses. We also run our own harvester to contribute back to ClearlyDefined and scale our operations.
SAP’s adoption of ClearlyDefined
Hi, my name is Qing. At SAP, we co-innovate and collaborate with Open Source, ensuring a clean, well-maintained software pool. ClearlyDefined has streamlined our license review process, reducing time spent on scanning and enhancing data quality. SAP’s journey with ClearlyDefined began in 2018, and since then, we’ve implemented large-scale automation for our Open Source compliance and continuously contribute curated data back to the community.
Community and governance
ClearlyDefined thrives on community involvement. We recently elected members to our Steering and Outreach Committees to support the platform and encourage new contributors. Our weekly developer meetings and active Discord channel provide opportunities to engage, share knowledge, and collaborate.
Q&A highlights
- PURLs as Package Identifiers: We’re exploring support for PURLs as an internal coordinate system.
- Data Quality Issues: Data quality is our top priority. We plan to implement routines to scan for common issues, ensuring accurate metadata across the platform.
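To make the PURL point concrete, here is a minimal sketch of how a Package URL could be mapped onto ClearlyDefined-style coordinates. It is not ClearlyDefined's implementation (the talk only says PURL support is being explored); the `purl_to_coordinates` helper and the `DEFAULT_PROVIDER` table are hypothetical, and the sketch assumes a versioned PURL whose namespace is percent-encoded, ignoring qualifiers and subpaths.

```python
from urllib.parse import unquote

# Hypothetical mapping from PURL type to an assumed default provider.
# Illustrative only and far from exhaustive.
DEFAULT_PROVIDER = {
    "npm": "npmjs",
    "pypi": "pypi",
    "maven": "mavencentral",
}


def purl_to_coordinates(purl):
    """Convert e.g. 'pkg:npm/lodash@4.17.21' to 'npm/npmjs/-/lodash/4.17.21'."""
    if not purl.startswith("pkg:"):
        raise ValueError("not a package URL: " + purl)
    # Drop the scheme, then any subpath (#...) and qualifiers (?...).
    rest = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    path, sep, version = rest.rpartition("@")
    if not sep or not version:
        raise ValueError("this sketch requires a version: " + purl)
    segments = path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError("expected at least a type and a name: " + purl)
    type_ = segments[0].lower()
    if type_ not in DEFAULT_PROVIDER:
        raise ValueError("no provider known for type: " + type_)
    name = unquote(segments[-1])
    # Everything between type and name is the namespace; "-" means none.
    namespace = "/".join(unquote(s) for s in segments[1:-1]) or "-"
    return "/".join([type_, DEFAULT_PROVIDER[type_], namespace, name, unquote(version)])


if __name__ == "__main__":
    print(purl_to_coordinates("pkg:npm/lodash@4.17.21"))
    print(purl_to_coordinates("pkg:npm/%40angular/core@12.0.0"))
```

The main design question such a mapping raises is the provider: a PURL names a package type but not always the registry it came from, so any real converter would need a rule (or a qualifier) to pick one.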
Thank you all for joining us today. If you’re interested in contributing, please reach out and become part of this collaborative community.
Members Newsletter – November 2024
After more than two years of collaboration, information gathering, global workshopping, testing, and an in-depth co-design process, we have an Open Source AI Definition.
The purpose of version 1.0 is to establish a workable standard for developers, researchers, and educators to consider how they may design evaluations for AI systems’ openness. The meaningful ability to fork and control their AI will foster permissionless, global innovation. It was important to drive a stake in the ground so everyone has something to work with. It’s version 1.0, so going forward, the process allows for improvement, and that’s exactly what will happen.
Over 150 individuals were part of the OSAID forum, nearly 15K subscribers to the OSI newsletter were kept up-to-date with the latest news about the OSAID, and 2M unique visitors to the OSI website were exposed to the OSAID process. There were 50+ co-design working group volunteers representing 29 countries, including participants from Africa, Asia, Europe, and the Americas.
Future versions of OSAID will continue to be informed by the feedback we receive from various stakeholder communities. The fundamental principles and aim will not change, but, as our (collective) understanding of the technology improves and technology itself evolves, we might need to update the definition to clarify or even change certain requirements. To enable this, the OSI Board voted to establish an AI sub-committee that will develop appropriate mechanisms for updating the OSAID in consultation with stakeholders. It will be fully formed in the months ahead.
Please continue to stay involved, as diverse voices and experiences are required to ensure Open Source AI works for the good of us all.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
Open and public co-design process culminates in a stable version of Open Source AI Definition, ensures freedoms to use, study, share and modify AI systems.
Other highlights:
- How we passed the AI conundrums
- ClearlyDefined at SOSS Fusion 2024
- ClearlyDefined’s Steering and Outreach Committees Defined
- The Open Source Initiative Supports the Open Source Pledge
Article from ZDNet
For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them.
Other highlights:
- The Gap Between Open and Closed AI Models Might Be Shrinking. Here’s Why That Matters (Time)
- Meta’s military push is as much about the battle for open-source AI as it is about actual battles (Fortune)
- OSI unveils Open Source AI Definition 1.0 (InfoWorld)
- We finally have an ‘official’ definition for open source AI (TechCrunch)
- Read all press mentions from this past month
News from OSI affiliates:
- OpenSSF: SOSS Fusion 2024: Uniting Security Minds for the Future of Open Source (Security Boulevard)
- Mozilla Foundation: How Mozilla’s President Defines Open-Source AI (Forbes)
News from OpenSource.net:
- OpenSource.Net turns one with a redesign
- How to make reviewing pull requests a better experience
- Closing the Gap: Accelerating environmental Open Source
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Bloomberg is seeking a Technical Architect to join their OSPO team.
Events
Upcoming events:
- Nerdearla Mexico (November 7-9, 2024 – Mexico City)
- SeaGL (November 8-9, 2024 – Seattle)
- SFSCON (November 8-9, 2024 – Bolzano)
- KubeCon + CloudNativeCon North America (November 12-15, 2024 – Salt Lake City)
- OpenForum Academy Symposium (November 13-14, 2024 – Boston)
- The Linux Foundation Legal Summit (November 18-19, 2024 – Napa)
- The Linux Foundation Member Summit (November 19-21, 2024 – Napa)
- Open Source Experience (December 4-5, 2024 – Paris)
- KubeCon + CloudNativeCon India (December 11-12, 2024 – Delhi)
- EU Open Source Policy Summit (January 31, 2025 – Brussels)
- FOSDEM (February 1-2, 2025 – Brussels)
CFPs:
- FOSDEM 2025 EU-Policy Devroom – event being organized by the OSI, OpenForum Europe, Eclipse Foundation, The European Open Source Software Business Association, the European Commission Open Source Programme Office, and the European Commission.
- PyCon US 2025: the Python Software Foundation kicks off Website, CfP, and Sponsorship!
- GitHub
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Get to vote for the OSI Board by becoming a member
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
RALEIGH, N.C., Oct. 28, 2024 – ALL THINGS OPEN 2024 – After a year-long, global, community design process, the Open Source AI Definition (OSAID) v.1.0 is available for public use.
The release of version 1.0 was announced today at All Things Open 2024, an industry conference focused on common issues of interest to the worldwide Open Source community. The OSAID offers a standard by which community-led, open and public evaluations will be conducted to validate whether or not an AI system can be deemed Open Source AI. This first stable version of the OSAID is the result of multiple years of research and collaboration, an international roadshow of workshops, and a year-long co-design process led by the Open Source Initiative (OSI), globally recognized by individuals, companies and public institutions as the authority that defines Open Source.
“The co-design process that led to version 1.0 of the Open Source AI Definition was well-developed, thorough, inclusive and fair,” said Carlo Piana, OSI board chair. “It adhered to the principles laid out by the board, and the OSI leadership and staff followed our directives faithfully. The board is confident that the process has resulted in a definition that meets the standards of Open Source as defined in the Open Source Definition and the Four Essential Freedoms, and we’re energized about how this definition positions OSI to facilitate meaningful and practical Open Source guidance for the entire industry.”
“The new definition requires Open Source models to provide enough information about their training data so that a ‘skilled person can recreate a substantially equivalent system using the same or similar data,’ which goes further than what many proprietary or ostensibly Open Source models do today,” said Ayah Bdeir, who leads AI strategy at Mozilla. “This is the starting point to addressing the complexities of how AI training data should be treated, acknowledging the challenges of sharing full datasets while working to make open datasets a more commonplace part of the AI ecosystem. This view of AI training data in Open Source AI may not be a perfect place to be, but insisting on an ideologically pristine kind of gold standard that will not actually be met by any model builder could end up backfiring.”
“We welcome OSI’s stewardship of the complex process of defining Open Source AI,” said Liv Marte Nordhaug, CEO of the Digital Public Goods Alliance (DPGA) secretariat. “The Digital Public Goods Alliance secretariat will build on this foundational work as we update the DPG Standard as it relates to AI as a category of DPGs.”
“Transparency is at the core of EleutherAI’s non-profit mission. The Open Source AI Definition is a necessary step towards promoting the benefits of Open Source principles in the field of AI,” said Stella Biderman, executive director at the EleutherAI Institute. “We believe that this definition supports the needs of independent machine learning researchers and promotes greater transparency among the largest AI developers.”
“Arriving at today’s OSAID version 1.0 was a difficult journey, filled with new challenges for the OSI community,” said OSI Executive Director, Stefano Maffulli. “Despite this delicate process, filled with differing opinions and uncharted technical frontiers—and the occasional heated exchange—the results are aligned with the expectations set out at the start of this two-year process. This is a starting point for a continued effort to engage with the communities to improve the definition over time as we develop with the broader Open Source community the knowledge to read and apply OSAID v.1.0.”
The text of the OSAID v.1.0 as well as a partial list of the many global stakeholders who endorse the definition can be found here: https://opensource.org/ai
About the Open Source Initiative
Founded in 1998, the Open Source Initiative (OSI) is a non-profit corporation with global scope formed to educate about and advocate for the benefits of Open Source and to build bridges among different constituencies in the Open Source community. It is the steward of the Open Source Definition and the Open Source AI Definition, setting the foundation for the global Open Source ecosystem. Join and support the OSI mission today at: https://opensource.org/join.
ClearlyDefined’s Steering and Outreach Committees Defined
We are excited to announce the newly elected leaders for the ClearlyDefined Steering and Outreach Committees!
What is ClearlyDefined?
ClearlyDefined is an Open Source project dedicated to improving the clarity and transparency of Open Source licensing and security data. By harvesting, curating, and sharing essential metadata, ClearlyDefined helps developers and organizations better understand their software components, ensuring responsible and compliant use of Open Source code.
Steering Committee Election Results:
Congratulations to E. Lynette Rayle (GitHub), Qing Tomlinson (SAP), and Jeff Mendoza (Kusari/GUAC) for being elected to the ClearlyDefined Steering Committee. These three community leaders will serve a one-year term starting on September 25, 2024. Following election recommendations, the Steering Committee is structured to have an odd number of members (three in this case) and a maximum of one member per company. Lynette Rayle was elected chair of the committee.
The Steering Committee is primarily responsible for setting the project’s technical direction. They oversee processes such as data harvesting, curation, and contribution, ensuring that the underlying architecture functions smoothly. Their focus is on empowering the community, supporting the contributors and maintainers, and fostering collaboration with related projects.
E. Lynette Rayle is a Senior Engineer at GitHub and has been working on ClearlyDefined as a maintainer for just over a year. GitHub is using ClearlyDefined data in several capacities and has a strong stake in ensuring successful outcomes in data quality, performance, and sustainability.
Qing Tomlinson is a Senior Developer at SAP and has been contributing to the ClearlyDefined project since November 2021. SAP has been actively engaged in the ClearlyDefined project since its inception, utilizing the data and actively contributing to its curation. The quality, performance, and long-term viability of the ClearlyDefined project are of utmost importance to SAP.
Jeff Mendoza is a Software Engineer at Kusari, a software supply chain security startup. He is a maintainer of the OpenSSF GUAC project, which consumes ClearlyDefined data. Formerly, Jeff was a full-time developer on ClearlyDefined. Jeff brings experience from both sides of the project: developer and consumer.
Outreach Committee Election Results:
We are also thrilled to announce the election of Jeff Luszcz (GitHub), Alyssa Wright (Bloomberg), Brian Duran (SAP), and Nick Vidal (Open Source Initiative) to lead the ClearlyDefined Outreach Committee. They began their one-year term on October 7, 2024. Unlike the Steering Committee, the Outreach Committee has four members, following a consensus reached at the Community meeting that an even number of members is acceptable since tie-breaking votes are less likely to be needed. The elected members will select their Chair soon and may also invite other community members to participate.
The Outreach Committee focuses on promoting the project and growing its community. Their responsibilities include organizing events, creating educational materials, and managing communications across various channels, including blogs, social media, and webinars. They help ensure that more users and contributors engage with ClearlyDefined and understand its mission.
Jeff Luszcz is Staff Product Manager at GitHub. Since 2004, he has helped hundreds of software companies understand how to best use open source while complying with their license obligations and keeping on top of security issues.
Alyssa Wright helps lead Bloomberg’s Open Source Program Office in the Office of the CTO, which is the center of excellence for Bloomberg’s engagements with and consumption of open source software.
Brian Duran leads the implementation strategy for adoption of ClearlyDefined within SAP’s open source compliance teams. He has a combined 12 years of experience in open-source software compliance and data quality management.
Nick Vidal is Community Manager at the Open Source Initiative and former Outreach Chair at the Confidential Computing Consortium from the Linux Foundation. Previously, he was the Director of Community and Business Development at the Open Source Initiative and Director of Americas at the Open Invention Network.
Get Involved!
We encourage everyone in the ClearlyDefined community to get involved! Whether you’re a developer, data curator, or simply passionate about Open Source software, your contributions are invaluable. Join the conversation, attend meetings, and share your ideas on how to improve and grow the project. Reach out to the newly elected committee members or participate in our upcoming community events.
Let’s work together to drive the ClearlyDefined mission forward! Stay tuned for more updates and opportunities to participate as the committees continue their important work.
Rahmat Akintola: Voices of the Open Source AI Definition
The Open Source Initiative (OSI) is running a blog series to introduce some of the people who have been actively involved in the Open Source AI Definition (OSAID) co-design process. The co-design methodology allows for the integration of diverging perspectives into one just, cohesive and feasible standard. Support and contribution from a significant and broad group of stakeholders is imperative to the Open Source process and is proven to bring diverse issues to light, deliver swift outputs and garner community buy-in.
This series features the voices of the volunteers who have helped shape and are shaping the Definition.
Meet Rahmat Akintola
What’s your background related to Open Source and AI?
Sure. I’ll start with Open Source. My journey began at PyCon Africa in 2019, where I participated in a hackathon on Cookiecutter. At the time, I had just transitioned into web development, and I was looking for ways to improve my skills beyond personal projects. So, I joined the Cookiecutter Academy at PyCon Africa in 2019. That’s how I got introduced to Open Source.
Since then, I’ve been contributing regularly, starting with one-off contributions to different projects. These days, I primarily focus on code and documentation contributions, mainly in web development.
As for AI, my journey started with data science. I had been working as a program manager and was part of the Women in Machine Learning and Data Science community in Accra, which was looking for volunteers. Coincidentally, I had lost my job at the time, so I applied for the program manager role and got it. That experience sparked my interest in AI. I started learning more about machine learning and AI, and I needed to build my domain knowledge to help with my role in the community.
I’ve worked on traditional models like linear and logistic regression through various courses. Recently, as part of our community, we organized a “Mathematics for Machine Learning” boot camp, where we worked on projects related to reinforcement learning and logistic regression. One dataset I worked with involved predicting BP (blood pressure) levels in the US. The task was to assess the risk of developing hypertension based on various factors.
What motivated you to join this co-design process to define Open Source AI?
The Open Source AI journey started when I was informed about a virtual co-design process that was reaching out to different communities, including mine. As the program lead, I saw it as an opportunity to merge my two passions—Open Source and AI.
I volunteered and worked on testing the OpenCV workbook, as I was using OpenCV at the time. I participated in the first phase, which focused on determining whether certain datasets needed to be open. Unfortunately, I couldn’t participate in the validation phase because I was involved in the mathematics boot camp, but I followed the discussions closely.
When the opportunity came up to participate in the co-design process, I saw it as a chance to bridge my work in Open Source web development and my growing interest in AI. It felt like the perfect moment. I was already using OpenCV, which happened to be part of the AI systems under review, so I jumped right in.
Through the process, I realized that defining Open Source AI goes beyond just using tools or making code contributions—it involves a deep understanding of data, legality, and the broader system.
How did you get invited to speak at the Deep Learning Indaba conference in Dakar? How was the conference experience? Did you make any meaningful connections?
As for speaking at Deep Learning Indaba, the opportunity came unexpectedly. One day, Mer Joyce (the OSAID co-design organizer) sent an email offering a chance to speak on Open Source AI at the conference. I had previously applied to attend but didn’t get in, so I jumped on this opportunity. We used a presentation similar to one Mer had given at Open Source Community Africa.
I made excellent connections. The conference itself was amazing—though the food and the Senegal experience also played a part! There were many AI and machine learning researchers, and I learned new concepts, like using JAX, which was introduced as an alternative to some common frameworks. The tutorials were well-targeted at beginners, which was perfect for me.
On a personal level, it was great to connect with academics. I’m considering applying for a master’s or Ph.D., and the conference provided an opportunity to ask questions and receive guidance.
Why do you think AI should be Open Source?
AI is becoming a significant part of our lives. I work with the Meltwater Entrepreneurial School of Technology (MEST) as a technical lead, and we use AI for various training purposes. Opening up parts of AI systems allows others to adapt and refine them to suit their needs, especially in localized contexts. For example, I saw someone on Twitter excited about building a GPT for dating, customizing it to ask specific questions.
This ability for people to tweak and refine AI models, even without building them from scratch, is important. Open-sourcing AI enables more innovation and helps tailor models for specific needs, which is why I believe it should be open to an extent.
Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?
One new perspective I gained was on the legal and data availability aspects of AI. Before this, I had never really considered the legal side of things, but during the co-design process, it became clear that these elements are crucial in defining Open Source AI systems. It’s more than just contributing code—it’s about ensuring compliance with legal frameworks and making sure data is available and usable.
What do you think the primary benefit will be once there is a clear definition of Open Source AI?
A clear definition would help people understand that Open Source AI involves more than just attaching an MIT or Apache license to a project on GitHub. There’s more complexity around sharing models, data and parameters.
For instance, I was once asked whether using an “Open Source” large language model like LLaMA meant the data had to be open too. A well-defined standard would provide guidance for questions like these, ensuring people understand the legal and technical aspects of making their AI systems Open Source.
What do you think are the next steps for the community involved in Open Source AI?
In Africa, I think the next step is spreading awareness about the Open Source AI Definition. Many people are still unaware of the complexities, and there’s still a tendency to assume that adding an Open Source license to a project automatically makes it open. Building collaborations with local communities to share this information is important.
For women, especially in Africa, visibility is key. When women see others doing similar work, they feel encouraged to join. Representation and community engagement play significant roles in driving diversity in Open Source AI.
How to get involved
The OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:
- Join the forum: share your comments on the drafts.
- Leave a comment on the latest draft: provide precise feedback on the text of the latest draft.
- Follow the weekly recaps: subscribe to our monthly newsletter and blog to be kept up-to-date.
- Join the town hall meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions and share your thoughts.
- Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.
How we passed the AI conundrums
Some people believe that full unfettered access to all training data is paramount. This group argues that anything less than all the data would compromise the Open Source principles, forever removing full reproducibility of AI systems, transparency, security and other outcomes. We’ve heard them and we’ve provided a solution rooted in decades of Open Source practice.
To have the chance for powerful Open Source AI systems to exist in any domain, the OSI community has incorporated in the Definition this principle:
An Open Source AI needs to make available three kinds of components: the software used to create the dataset and run the training, the model parameters and the code to run inference, and finally all the data that can be made available legally.
Recognizing that there are four kinds of “data”, each with its own legal frameworks allowing different freedoms of distribution, we bypass what Stephen O’Grady called the “AI conundrums” and give Open Source AI builders a chance to build freedom-respecting alternatives to pretty much any proprietary AI.
Limiting Open Source AI only to systems trainable on freely distributable data would relegate Open Source AI to a niche, and there are abundant reasons to reject this limitation. One is that the amount of freely and legally shareable data is a tiny fraction of what is necessary to train powerful systems. Additionally, it would exclude Open Source AI from areas where data cannot be shared, like medicine or anything dealing with personal or private data. What remains for “Open Source AI” would be tiny.
The fact is, mixing openly distributable and non-distributable data is very similar to a reality we are very familiar with: Open Source software built with proprietary compilers and system libraries.
Is GNU Emacs Open Source software?
I’m sure you’d answer yes (and some of you will say “well, actually it’s free software”) and we’ll all agree. Below is a rough diagram of Emacs built for the GNOME desktop on a modern Linux distribution. Emacs depends on a few system libraries that GNOME provides with OSI-Approved Licenses. The whole stack is Open Source these days and one can distribute Emacs on a disk with all its dependencies without too much legal trouble. Imagine scientists who want to freeze the whole environment of an experiment they made; they could package all the pieces of a system like this without trouble and distribute it all with their paper. No problem here.
Now let’s go back to an age when Linux systems weren’t ready. When Stallman started writing Emacs, there was no GNOME and no Linux, no gcc and no glibc. He thought very early on that in order to have more freedom, he had to create a wedge to allow Emacs to run on proprietary software.
Emacs on the latest Solaris versions would look something like this: some pieces, like X11 and Gstreamer, are Open Source. Others, like libc, aren’t. The hypothetical scientists from before couldn’t really freeze their full scientific environment. All they could say in their paper was: “We used Emacs from this CVS version, built with gcc version X with these makefiles; tar.gz attached” and make a list of the operating system and library versions they used. That’s because they have the right to distribute only Emacs, X11 and some libraries, not the rest of Solaris.
Is Emacs on Solaris Open Source? Of course it is, even though the source code for the system libraries is not available.
One more question, Emacs on Mac OS: it can only be built with a proprietary compiler on proprietary GUI and other proprietary libraries.
Is Emacs on Mac Open Source? Of course it is. Can you fully study Emacs on Mac OS? For Emacs, yes. For the Mac OS components, no. There are many programs that run only on Mac OS or Windows: for OSI, those are Open Source. Would someone argue that they’re not “really Open Source” because you can’t see “everything”? Some people might, but we’ve learned to live with that, adding governance rules in addition to those of the Open Source Definition. Debian, for example, requires that programs are Open Source and support multiple hardware platforms; the ASF graduates only projects that are Open Source and have a diverse community of contributors. If you only want to use Open Source applications running on Open Source stacks, you can decide that! Just as you can decide that your company will only acquire Open Source software whose copyright is owned by multiple entities.
These are all additional requirements built on top of the base floor set by the Open Source Definition.
For AI, you can do the same: You can say “I will only use Open Source AI built with open data, because I don’t want to trust anything less than that.” A large organization could say “I will buy only Open Source AI that allows me to audit their full dataset, including unshareable data.” You can do all that. Open Source AI is the floor that you can build on, like the OSD.
Bypassing the conundrums
We’ve looked for a solution for almost three years and this is it: Require all the data that is legally shareable, and for the other data provide all the details. It’s exactly what we’ve been doing for Open Source software:
You developed a text editor for Mac OS but you can’t share the system libraries? Fine, we’ll fork it: give us all the code you can legally share with an OSI-Approved License and we’ll rip out the dependencies and “liberate” it to run on GNU. The editor will be slightly different, just as code that runs on some ARM+Linux systems behaves differently on Intel+Windows because of the different capabilities of the underlying hardware and OS, but it’s still Open Source.
For Open Source AI it’s a similar dance: You can’t legally give us all the data? Fine, we’ll fork it. For example, you made an AI that recognizes bone cancer in humans but the data can’t be shared. We’ll fork it! Tell us exactly how you built the system, how you trained it, share the code you used, and an anonymized sample of the data you used so we can train on our X-ray images. The system will be slightly different but it’s still Open Source AI.
If we want to have broad availability of powerful alternatives to proprietary AI systems that respect the freedoms of users and deployers, we must recognize conditions that make sense for the domain of AI. These examples of proprietary compilers and system libraries used to build Open Source software prove that there is room for similar conditions when talking about Code, Data and Parameters within the definition of Open Source AI.
The Open Source Initiative Supports the Open Source Pledge
As businesses rely more heavily on Open Source software (OSS), the strain on maintainers to provide timely updates and security patches continues to grow – often without fair compensation for their crucial work. Recent high-profile security incidents like XZ and Log4Shell have put a spotlight on the security challenges developers face against a backdrop of burnout that has reached an all-time high.
To help address this imbalance, the Open Source Initiative (OSI) supports the Open Source Pledge, launched today by Sentry and partners to support maintainers and inspire a shift toward a healthier work-life balance, and more robust software security practices. The Pledge is a commitment from member companies to pay Open Source maintainers and organizations meaningfully in support of a more sustainable maintainer ecosystem and a reduction of flare-ups of high-profile security incidents.
This Pledge is an attempt to address a problem that has long existed within the Open Source ecosystem. Many companies have built their businesses on top of Open Source software, benefiting from the contributions of maintainers while taking them for granted. While they’ve reaped the rewards, the burden has been placed on unpaid or underpaid developers.
It is essential that companies recognize their role in sustaining the ecosystem that powers their innovations. By taking the Pledge, companies have one more instrument to commit to supporting an ecosystem of maintainers and organizations, ensuring the long-term health of the Open Source projects they rely on.
In order to qualify, the projects that companies pledge to should meet the Open Source Definition. You can join the Open Source Pledge by donating to the Open Source Initiative or contacting us to become a sponsor.