FLOSS Research
Driving Open Source forward: make your impact in 2025
Our work thrives because of a passionate community dedicated to Open Source values. From advancing initiatives like the Open Source AI Definition to addressing challenges in licensing and policy, collaboration has always been our driving force.
While 2024 brought significant progress, the challenges ahead demand even greater collective action. As we look to 2025, your support will be pivotal in scaling our efforts.
We’ll continue serving as the steward of the Open Source Definition, protecting the Open Source principles and maintaining a list of OSI Approved Licenses.
We’ll continue monitoring policy and standards setting organizations, supporting legislators and policy makers, educating them about the Open Source ecosystem, its role in innovation and its value for an open future.
We’ll continue leading the conversation on Open Source AI, as we’ll need to keep on driving the discussion around data and write position papers to educate legislators in the US, in the EU, and around the world.
And finally, we’ll continue supporting Open Source business practice, helping developers and entrepreneurs to unlock the permissionless innovation enabled by Open Source software.
Join us as a supporting member or, if your organization benefits from Open Source, consider becoming a sponsor. Together, we can protect and expand the freedoms that make Open Source possible—for everyone.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
Celebrating 5 years at the Open Source Initiative: a journey of growth, challenges, and community engagement
Nick Vidal, community manager at the OSI, shares his story about working at the organization. This isn’t just a story about his career—it’s about the evolution of Open Source, the incredible people he worked with, and the values they’ve championed.
Other highlights:
- Improving Open Source security with the new GitHub Secure Open Source Fund
- Highlights from the Digital Public Goods Alliance Annual Members Meeting 2024
- Open Data and Open Source AI: Charting a course to get more of both
- The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy
- ClearlyDefined v2.0 adds support for LicenseRefs
Article from Time Magazine
The distinction between ‘open’ and ‘closed’ AI models is not as simple as it might appear. While Meta describes its Llama models as open-source, they don’t meet the new definition published last month by the Open Source Initiative, which has historically set the industry standard for what constitutes open source.
Other highlights:
- Ai2 releases new language models competitive with Meta’s Llama (TechCrunch)
- AI2 closes the gap between closed-source and open-source post-training (VentureBeat)
- Women leading the charge in open source AI (Capacity Media)
- A battle is raging over the definition of open-source AI (The Economist)
- The best open-source AI models: All your free-to-use options explained (ZDNet)
- Read all press mentions from this past month
News from OSI affiliates:
- Eclipse Foundation: The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy
- DPGA, ITU: Advancing Open Source AI: Definitions, Standards, and Global Implementation for a Sustainable Future
- Apache Software Foundation: The Apache Software Foundation Welcomes a New President
- Linux Foundation: Jim Zemlin, ‘head janitor of open source,’ marks 20 years at Linux Foundation
- Mozilla Foundation: Firefox 1.0 Released 20 Years Ago
- DPGA: Open source key to exchanging best practices in digital government
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Events
Upcoming events:
- Open Source Experience (December 4-5 – Paris)
- KubeCon + CloudNativeCon India (December 11-12, 2024 – Delhi)
- Deep Dive into the Open Source AI Definition v1.0 (December 12, 2024 – online)
- EU Open Source Policy Summit (January 31, 2025 – Brussels)
- FOSDEM (February 1-2, 2025 – Brussels)
CFPs:
- PyCon US 2025: the Python Software Foundation kicks off Website, CfP, and Sponsorship!
- Automattic
- Sentry
- Cisco
- GitHub
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Get to vote for the OSI Board by becoming a member
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
Improving Open Source security with the new GitHub Secure Open Source Fund
The Open Source community underpins much of today’s software innovation, but with this power comes responsibility. Security vulnerabilities, unclear licensing, and a lack of transparency in software components pose significant risks to software supply chains. Recognizing this challenge, GitHub recently announced the GitHub Secure Open Source Fund—a transformative initiative aimed at bolstering the security and sustainability of Open Source projects.
What is the Secure Open Source Fund?
Launched with a $1.25 million commitment from partners, the GitHub Secure Open Source Fund is designed to address a critical issue: the often-overlooked necessity of security for widely used Open Source projects. The fund not only provides financial support to project maintainers but also delivers a comprehensive suite of resources, including but not limited to:
- Hands-on security training: A three-week program offering mentorship, workshops, and expert guidance.
- Community engagement: Opportunities to connect with GitHub’s Security Lab, sponsors, and other maintainers.
- Funding milestones: $10,000 per project, tied to achieving key security objectives.
The program’s cohort-based approach fosters collaboration and equips maintainers with the skills, networking, and funding to enhance the security of their projects sustainably.
Why this matters
The success of Open Source hinges on its trustworthiness. For developers and organizations, the ability to confidently adopt and integrate Open Source projects is paramount. However, without sufficient security measures and transparency, these projects risk introducing vulnerabilities into the software supply chain. GitHub’s Secure Open Source Fund directly tackles this issue by empowering maintainers with the knowledge, community, and funding to make their projects secure and reliable.
Building trust through transparency
The GitHub Secure Open Source Fund aligns with the global push for greater transparency and resilience in software supply chains between creators and consumers of Open Source software. Its focus on security addresses growing concerns highlighted by regulations such as the EU’s Cyber Resilience Act and by guidance from the US Cybersecurity and Infrastructure Security Agency (CISA). By providing maintainers with vital funding to prioritize focused time, and with resources to identify and address vulnerabilities, the program strengthens the foundation of Open Source ecosystems.
GitHub has taken an ecosystem-wide approach, where resources and security go hand in hand. The Open Source Initiative (OSI) was invited to become a launch ecosystem partner, and we hope to contribute valuable input, feedback, and ideas along with other community members. One of our projects, ClearlyDefined, helps organizations manage SBOMs at scale at each stage of the supply chain by providing easy access to accurate licensing metadata for Open Source components. Together, we hope to foster greater transparency and security for the entire supply chain.
GitHub Secure Open Source Fund Ecosystem Partners
A call to action for the Open Source community
As GitHub leads the charge with its Secure Open Source Fund, it’s crucial for the broader community to step up. Here’s how you can get involved:
- Learn more about security: Gain access to workshops, group sessions, and mentorship.
- Maximize transparency: Adopt tools like ClearlyDefined to ensure clear metadata for your components.
- Advocate for funding: Support initiatives that prioritize security, whether through sponsorship or advocacy.
Together, we can create a safer, more transparent, and more sustainable Open Source ecosystem.
To learn more about GitHub’s Secure Open Source Fund and apply, visit their official program page and announcement.
Let’s work collectively to secure the software supply chains that power innovation worldwide.
GitHub Secure Open Source Fund Sponsors
Celebrating 5 years at the Open Source Initiative: a journey of growth, challenges, and community engagement
Reaching the five-year mark at the Open Source Initiative (OSI) has been a huge privilege. It’s been a whirlwind of progress, personal growth, and community engagement—filled with highs, great challenges, and plenty of Open Source celebrations. As I reflect on this milestone, it’s impossible not to feel both gratitude and excitement for what lies ahead. This isn’t just a story about my career—it’s about the evolution of Open Source, the incredible people I’ve worked with, and the values we’ve championed.
Joining OSI: first steps into the journey
Back in 2017, under the leadership of OSI’s General Manager Patrick Masson, I stepped into the role of Director of Community and Development at the OSI. That year, we began planning a celebration of the 20th anniversary of Open Source for 2018, a massive undertaking. This wasn’t just about throwing a party. It was a celebration of the immense impact Open Source has had on the global tech community.
And yet, the timing was crucial. Open Source, which had steadily grown over the past two decades, faced challenges on multiple fronts: from sustainability and diversity to startups attempting to redefine the term “Open Source” and obscure its principles. The emergence of faux Open Source licenses, like the Commons Clause and the Server Side Public License, only added to the urgency of defending the very core of our mission.
The 20th Anniversary world tour
We embarked on the OSI’s 20th Anniversary World Tour in 2018, organizing over 100 activities across 40 events globally, rallying the support of affiliates, sponsors, and the Open Source community at-large. The goal wasn’t just to celebrate but to affirm the values we held dear—transparency, collaboration, and freedom in software development. This culminated in the signing of the Affirmation of the Open Source Definition by over 40 foundations and corporations, a powerful statement that we would not back down in the face of challenges.
For the 20th Anniversary, the OSI contributed to 40 events worldwide
Stepping back amidst the pandemic
When the pandemic hit in 2020, everything came to a sudden halt. With schools closing and my young children—a one-year-old and a four-year-old—at home, the demands of balancing work and family life became overwhelming. I made the difficult decision to step down from my role at the OSI. It wasn’t easy to step away from something I loved, but at the time, it felt necessary.
A year later, I joined a security startup as Head of Community, where I led a cutting-edge Open Source project and held a leadership role at the Confidential Computing Consortium at the Linux Foundation. This new role allowed me to expand my expertise in community management and Open Source security—critical topics that would soon come to the forefront.
Return to the OSI: a new chapter
In early 2023, the startup I was working for unfortunately had to close its doors. But just as one chapter ended, another began. Stefano Maffulli, OSI’s new Executive Director, reached out to me with an opportunity to return to the OSI. This time, the focus was on addressing a growing concern: the clarity around licenses and security vulnerabilities in the Open Source supply chain.
I jumped at the chance to come back. At the beginning of this new chapter, I had the opportunity to work on two challenges: managing the ClearlyDefined community, a project aimed at improving transparency in Open Source licensing and security, and organizing activities around the 25th Anniversary of Open Source.
The 25th Anniversary world tour
The 25th anniversary of Open Source marked an incredible milestone, not just for the OSI, but for the whole tech community. We organized activities across 36 conferences from around the world with a combined attendance of over 125,000 people. We contributed 12 keynotes, 24 talks, 6 workshops, and 18 webinars.
For the 25th Anniversary, the OSI contributed to 36 events worldwide
It was a wonderful experience connecting with organizers of these conferences and engaging with speakers, volunteers and attendees. I personally contributed to 4 keynotes, 6 talks, 15 events, and co-organized the OSI track at All Things Open and the Deep Dive: AI webinars.
Throughout the year, our focus shifted from reviewing the past of Free and Open Source software to exploring the future of Open Source in this new era of AI.
Open Source AI Definition: a new challenge
With the popularization of Generative AI, many players in industry started using “Open Source AI” to describe their projects. Legislators around the world also started drafting laws that would have a huge impact on Open Source and AI developments. Stefano was already exploring this new challenge with the Deep Dive: AI series, but we realized that establishing an Open Source AI Definition was going to be critical. Along with the Board of Directors, we started planning a new direction for the OSI.
We organized several activities around Open Source and AI in partnership with major Open Source conferences, from talks and panels to workshops. We also launched forums for online discussions, and hosted webinars and town halls to make our activities more inclusive. The work felt more important than ever.
One of the biggest undertakings was organizing a multistakeholder co-design process, led by Mer Joyce, where we brought together global experts to establish a shared set of principles that can recreate permissionless, pragmatic and simplified collaboration for AI builders.
One of the projects I’m particularly proud of is the “Voices of the Open Source AI Definition” series. Through this initiative, we’ve been able to share the stories of the volunteers from the co-design process involved in shaping the Open Source AI Definition (OSAID). These stories highlight the diversity and passion of the community, bringing a human element to the often technical discussions around Open Source and AI.
Voices of the Open Source AI Definition
The work around the Open Source AI Definition developed by the OSI was cited over 100 times in the press worldwide, educating and countering misinformation. Our work was featured in The New York Times, The Verge, TechCrunch, ZDNET, InfoWorld, Ars Technica, IEEE Spectrum, and MIT Technology Review, among other top media outlets.
ClearlyDefined: bringing clarity to licensing
As part of my community management role on the ClearlyDefined project, I had the opportunity to contribute under the exceptional leadership of E. Lynette Rayle (GitHub) and Qing Tomlinson (SAP), whose guidance was instrumental in driving the project forward. One of my major accomplishments was the creation of a brand-new website and comprehensive documentation.
We engaged with the ORT (OSS Review Toolkit) and ScanCode communities, building stronger connections. We made important technical updates, including upgrading to the latest version of ScanCode and expanding our license support beyond just SPDX, making it easier for developers and organizations to navigate the increasingly complex landscape of licensing. This important work led to the release of version 2.0 of ClearlyDefined.
Another highlight was a new harvester implementation for conda, a popular package manager with a large collection of pre-built packages for various domains, including data science, machine learning, scientific computing and more.
Additionally, the adoption of ClearlyDefined by the GUAC community from OpenSSF was a testament to the growing importance of our work in bringing clarity to the Open Source supply chain not just in terms of licensing but also security.
ClearlyDefined’s new features: LicenseRef, conda support, and GUAC integration
We presented our work across three continents: in Europe at ORT Community Days, in North America at SOSS Fusion, and in Asia at Open Compliance Summit.
Finally, we made progress toward a more open governance model by electing leaders for the ClearlyDefined Steering and Outreach Committees.
Open Policy: from cybersecurity to AI
On the policy side, I had the privilege of working alongside Deb Bryant and Simon Phipps, who kept track of policies affecting Open Source software, in particular the Securing Open Source Software Act in the US and the Cyber Resilience Act in Europe. As for policies involving Open Source and AI, I followed the European AI Act and the US AI Bill of Rights, and contributed a compilation of compelling responses from nonprofit organizations and companies to NTIA’s AI Open Model Weights RFC.
OpenSource.net: fostering knowledge sharing
I also helped to launch OpenSource.net, a platform designed to foster knowledge sharing. Led by Editor-in-Chief Nicole Martinelli, this platform has become a space for diverse perspectives and contributions, furthering the reach and impact of our work.
As part of the Practical Open Source (POSI) program, which facilitates discussions on doing business with and for Open Source, I was able to bring contributions from outstanding entrepreneurs and intrapreneurs.
Looking ahead: excitement for the future
In the last 2 years at the OSI, I’ve had the privilege of publishing over 50 blog posts, as well as organizing, speaking at, and attending multiple events worldwide. The work we’ve done has been both challenging and rewarding, but it’s the community—the people who believe in the power of Open Source—that makes it all worthwhile.
As I celebrate five years at the OSI, I’m more energized than ever to continue this journey. The world of Open Source is evolving, and I’m excited to be part of shaping its future. There’s so much more to come, and I can’t wait to see where the next five years will take us.
This is more than just a personal milestone—it’s a celebration of the impact Open Source has had on the world and the endless possibilities it holds for the future.
You too can join the OSI: support our work
The work we do is only possible because of the passionate, engaged community that supports us. From advocating for Open Source principles to driving initiatives like the Open Source AI Definition and ClearlyDefined, every step forward has been powered by collaboration and shared commitment.
But there’s so much more to be done. The challenges facing Open Source today—from licensing to policy—are growing in scale and complexity. To meet them, we need your help. By joining or sponsoring the Open Source Initiative, you enable us to continue this vital work: educating, advocating, and building a stronger, more inclusive Open Source ecosystem.
Your support isn’t just a contribution; it’s an investment in the future of Open Source. Together, we can ensure that Open Source remains open for all. Join us in shaping the next chapter of this incredible journey.
Highlights from the Digital Public Goods Alliance Annual Members Meeting 2024
This month, I had the privilege of representing the Open Source Initiative at the Digital Public Goods Alliance (DPGA) Annual Members Meeting. Held in Singapore, this event marked our second year participating as members, following our first participation in Ethiopia. It was an inspiring gathering of innovators, developers and advocates working to create a thriving ecosystem for Digital Public Goods (DPGs) as part of UNICEF’s initiative to advance the United Nations’ sustainable development goals (SDGs).
Keynote insights
The conference began with an inspiring keynote by Liv Marte Nordhaug, CEO of the DPGA, who made a call for governments and organizations to incorporate:
- Open Source first principles
- Open data at scale
- Interoperable Digital Public Infrastructure (DPI)
- DPGs as catalysts for climate change action
These priorities underscored the critical role of DPGs in fostering transparency, accountability and innovation across sectors.
DPGs and Open Source AI
A standout feature of the first day of the event was the Open Source AI track led by Amreen Taneja, DPGA Standards Lead, which encompassed three dynamic sessions:
- Toward AI Democratization with Digital Public Goods: This session explored the role of DPGs in contributing to the democratization of AI technologies, including AI use, development and governance, to ensure that everyone benefits from AI technology equally.
- Fully Open Public Interest AI with Open Data: This session highlighted the need for open AI infrastructure supported by accessible, high-quality datasets, especially in global majority countries. Discussions revolved around how open training datasets ought to be licensed to ensure open, public interest AI.
- Creating Public Value with Public AI: This session examined real-world applications of generative AI in public services. Governments and NGOs showcased how AI-enabled tools can effectively tackle social challenges, leveraging Open Source solutions within the AI stack.
The second day of the event was marked by the DPG Product Fair, which provided a science-fair-style platform for showcasing DPGs. Notable examples included:
- India’s eGov DIGIT Open Source platform, serving over 1 billion citizens with robust digital infrastructure.
- Singapore’s Open Government Products, which leverages Open Source and enables open collaboration, allowing this small nation to expand its impact together with other Southeast Asian nations.
One particularly engaging session was Sarah Espaldon’s “Improving Government Services with Legos.” This presentation from the Singapore government highlighted the benefits of modular DPGs in enhancing service delivery and building flexible DPI capabilities.
Privacy best practices for DPGs
A highlight from the third and final day of the event was the privacy-focused workshop co-hosted by the DPGA and the Open Knowledge Foundation. As privacy becomes a central concern for DPGs, the DPGA introduced its Standard Expert Group to refine privacy requirements and develop best practice guidelines. The interactive session provided invaluable feedback, driving forward the development of robust privacy standards for DPGs.
Looking ahead
The event reaffirmed the potential of Open Source technologies to transform global public goods. As we move forward, the Open Source Initiative is committed to advancing these conversations, including around the Open Source AI Definition and fostering a more inclusive digital ecosystem. A special thanks to the dedicated DPGA team—Liv Marte Nordhaug, Lucy Harris, Max Kintisch, Ricardo Miron, Luciana Amighini, Bolaji Ayodeji, Lea Gimpel, Pelin Hizal Smines, Jon Lloyd, Carol Matos, Amreen Taneja, and Jameson Voisin—whose efforts made this conference a success. We look forward to another year of impactful collaboration and to the Annual Members Meeting next year in Brazil!
Give Your Input on the State of Open Source Survey
As we announced back in September, the OSI has partnered again with OpenLogic by Perforce to produce a comprehensive report on global, industry-wide Open Source software adoption trends. The 2025 State of Open Source Report will be based on responses to a survey of those working with Open Source software in their organizations, from developers to CTOs and everyone in between.
“This is our fourth year being involved in the State of Open Source Report, and there is never any shortage of surprises in the data,” says Stefano Maffulli, Executive Director, Open Source Initiative. “Now, however, the aim of the survey is not to determine whether or not organizations are using Open Source — we know they are — but to find out how they are handling complexities related to AI, licensing, and of course, security.”
This year, the survey includes new sections on Big Data, the impact of CentOS EOL, and security/compliance. As always, there are questions about technology usage in various categories such as infrastructure, cloud-native, frameworks, CI/CD, automation, and programming languages. Finally, a few questions toward the end look at Open Source maturity and stewardship, including sponsoring or being involved with open source foundations and organizations like OSI.
Of course, any report like this is only as valuable as its data: the more robust and high-quality the dataset, the stronger the report will be. As stewards of the Open Source community, OSI members are encouraged to take the survey so that the 2025 State of Open Source Report accurately reflects the interests, concerns, and preferences of Open Source software users around the world.
You can access the State of Open Source Survey here: https://www.research.net/r/SLQWZGF
Open Data and Open Source AI: Charting a course to get more of both
While working to define Open Source AI, we realized that data governance is an unresolved issue. The Open Source Initiative organized a workshop to discuss data sharing and governance for AI training. The critical question posed to attendees was “How can we best govern and share data to power Open Source AI?” The main objective of this workshop was to establish specific approaches and strategies for both Open Source AI developers and other stakeholders.
The Workshop: Building bridges across “Open” streams
Held on October 10-11, 2024, and hosted by Linagora’s Villa Good Tech, the OSI workshop brought together 20 experts from diverse fields and regions. Funded by the Alfred P. Sloan Foundation, the event focused on actionable steps to align open data practices with the goals of Open Source AI.
Participants, listed below, comprised academics, civil society leaders, technologists, and representatives from organizations like Mozilla Foundation, Creative Commons, EleutherAI Institute and others.
- Ignatius Ezeani, University of Lancaster / Nigeria
- Masayuki Hatta, Debian, Open Source Group Japan / Japan
- Aviya Skowron, EleutherAI Institute / Poland
- Stefano Zacchiroli, Software Heritage / Italy
- Ricardo Torres, Digital Public Goods Alliance / Mexico
- Kristina Podnar, Data and Trust Alliance / Croatia + USA
- Joana Varon, Coding Rights / Brazil
- Renata Avila, Open Knowledge Foundation / Guatemala
- Alek Tarkowski, Open Future / Poland
- Maximilian Gantz, Mozilla Foundation / Germany
- Stefaan Verhulst, GovLab / USA/Belgium
- Paul Keller, Open Future / Germany
- Thom Vaughan, Common Crawl / UK
- Julie Hunter, Linagora / USA
- Deshni Govender, GIZ FAIR Forward – AI for All / South Africa
- Ramya Chandrasekhar, CNRS / India
- Anna Tumadóttir, Creative Commons / Iceland
- Stefano Maffulli, Open Source Initiative / Italy
Over two days, the group worked to frame a cohesive approach to data governance. Alek Tarkowski and Paul Keller of the Open Future Foundation are working with OSI to complete the white paper summarizing the group’s work. In the meantime, here is a quick “tease”—just a few of the many topics that the group discussed:
The streams of “open” merge, creating waves
AI is where Open Source software, open data, open knowledge, and open science meet in a new way. Since OpenAI released ChatGPT, what once were largely parallel tracks with occasional junctures are now a turbulent merger of streams, creating ripples in all of these disciplines and forcing us to reassess our principles: How do we merge these streams without eroding the principles of transparency and access that define openness?
We discovered in the process of defining Open Source AI that the basic freedoms we’ve put in the Open Source Definition and its foundation, the Free Software Definition, are still good and relevant. Open Source software has had decades to mature into a structured ecosystem with clear rules, tools, and legal frameworks. The same goes for Open Knowledge and Open Science: while rooted in age-old traditions, they have seen modern rejuvenation through platforms like Wikipedia and the Open Knowledge Foundation. Open data, however, feels less solid: often serving as a one-way pipeline from public institutions to private profiteers, it is now being dragged into whole new territory.
How are these principles of “open” interacting with each other, and how are we going to merge Open Data, Open Source, Open Science, and Open Knowledge in Open Source AI?
The broken social contract of data
Data fuels AI. The sheer scale of data required to train models like ChatGPT reveals not just a technological challenge but also a societal dilemma. Much of this data comes from us—the blogs we write, the code we share, the information we give freely to platforms.
OpenAI, for example, “slurps” all the data it can find, and much of it is what we willingly give: the blogs we write; the code we share; the pictures, emails and address books we keep in “the cloud”; and all the other information we give freely to platforms.
We, the people, make the “data,” but what are we getting in exchange? OpenAI owns and controls the machine built with our data, and it grants us access via API, until it changes its mind. We are essentially being strip-mined for a proprietary system that grants access at a price—until the owner decides otherwise.
We need a different future, one where data empowers communities, not just corporations. That starts with revisiting the principles of openness that underpin the open source, open science, and open knowledge movements. The question is: How do we take back control?
Charting a path forward
We want the machine for ourselves. We want machines that the people can own and control. We need to find a way to swing the pendulum back to our meaning of Open. And it’s all about the “data.”
The OSI’s work on the Open Source AI Definition provides a starting point. An Open Source AI machine is one that the people can meaningfully fork without having to ask for permission. For AI to truly be open, developers need access to the same tools and data as the original creators. That means transparent training processes, open filtering code, and, critically, open datasets.
Group photo of the participants of the workshop on data governance, Paris, Oct 2024.
Next steps
The white paper, expected in December, will synthesize the workshop’s discussions and propose concrete strategies for data governance in Open Source AI. Its goal is to lay the groundwork for an ecosystem where innovation thrives without sacrificing openness or equity.
As the lines between “open” streams continue to blur, the choices we make now will define the future of AI. Will it be a tool controlled by a few, or a shared resource for all?
The answer lies in how we navigate the waves of data and openness. Let’s get it right.
The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy
BRUSSELS and WEST HOLLYWOOD, Calif. – 14 November 2024 – The Eclipse Foundation, one of the world’s largest open source foundations, and the Open Source Initiative (OSI), the global non-profit educating about and advocating for the benefits of open source and steward of the Open Source Definition, have signed a Memorandum of Understanding (MOU) to collaborate on promoting the interest of the open source community in the implementation of regulatory initiatives on Open Source Artificial Intelligence (OSAI). This agreement underscores the two organisations’ shared commitment to ensuring that emerging AI regulations align with widely recognised OSI open source definitions and open source values and principles.
“AI is arguably the most transformative technology of our generation,” said Stefano Maffulli, executive director, Open Source Initiative. “The challenge now is to craft policies that not only foster growth of AI but ensure that Open Source AI thrives within this evolving landscape. Partnering with the Eclipse Foundation, with its expertise in European open source development and regulatory compliance, is important to shape the future of Open Source AI.”
“For decades, OSI has been the ‘gold standard’ the open source community has turned to for building consensus around important issues,” said Mike Milinkovich, executive director of the Eclipse Foundation. “As AI reshapes industries and societies, there is no more pressing issue for the open source community than the regulatory recognition of open source AI systems. Our combined expertise – OSI’s global leadership in open standards and open source licences and our extensive work with open source regulatory compliance – makes this partnership a powerful advocate for the design and implementation of sound AI policies worldwide.”
Addressing the Global Challenges of AI Regulation
With AI regulation on the horizon in multiple regions, including the EU, both organisations recognise the urgency of helping policymakers understand the unique challenges and opportunities of OSAI technologies. The rapid evolution of AI technologies, together with new and increasingly complex regulatory landscapes, demands clear, consistent, and aligned guidance rooted in open source principles.
Through this partnership, the Eclipse Foundation and OSI will endeavour to bring clarity in language and terms that industry, community, civil society, and policymakers can rely upon as public policy is drafted and enforced. The organisations will collaborate by leveraging their respective public platforms and events to raise awareness and advocate on the topic. Additionally, they will work together on joint publications, presentations, and other promotional activities, while also assisting one another in educating government officials on policy considerations for OSAI and General Purpose AI (GPAI). Through this partnership, they aim to provide clear, consistent guidance that aligns with open source principles.
Key Areas of Collaboration
The MoU outlines several areas of cooperation, including:
- Information Exchange: OSI and the Eclipse Foundation will share relevant insights and information related to public policy-making and regulatory activities on artificial intelligence.
- Representation to Policymakers: OSI and the Eclipse Foundation will cooperate in representing the principles and values of open source licences to policymakers and civil society organisations.
- Promotion of Open Source Principles: Joint efforts will be made to raise awareness of the role of open source in AI, emphasising how it can foster innovation while mitigating risks.
A Partnership for the Future
As AI continues to revolutionise industries worldwide, the need for thoughtful, balanced regulation is critical. The OSI and Eclipse Foundation are committed to providing the open source community, industry leaders, and policymakers with the tools and knowledge they need to navigate this rapidly evolving field.
This MoU marks the very beginning of a long-term collaboration, with joint initiatives and activities to be announced throughout the remainder of 2024 and into 2025.
About the Eclipse Foundation
The Eclipse Foundation provides our global community of individuals and organisations with a business-friendly environment for open source software collaboration and innovation. We host the Eclipse IDE, Adoptium, Software Defined Vehicle, Jakarta EE, and over 420 open source projects, including runtimes, tools, specifications, and frameworks for cloud and edge applications, IoT, AI, automotive, systems engineering, open processor designs, and many others. Headquartered in Brussels, Belgium, the Eclipse Foundation is an international non-profit association supported by over 385 members. To learn more, follow us on social media @EclipseFdn, LinkedIn, or visit eclipse.org.
About the Open Source Initiative
Founded in 1998, the Open Source Initiative (OSI) is a non-profit corporation with global scope formed to educate about and advocate for the benefits of Open Source and to build bridges among different constituencies in the Open Source community. It is the steward of the Open Source Definition, setting the foundation for the global Open Source ecosystem. Join and support the OSI mission today at https://opensource.org/join.
Third-party trademarks mentioned are the property of their respective owners.
###
Media contacts:
Schwartz Public Relations (Germany)
Gloria Huppert/Marita Bäumer
Sendlinger Straße 42A
80331 Munich
EclipseFoundation@schwartzpr.de
+49 (89) 211 871 -70/ -62
514 Media Ltd (France, Italy, Spain)
Benoit Simoneau
benoit@514-media.com
M: +44 (0) 7891 920 370
Nichols Communications (Global Press Contact)
Jay Nichols
jay@nicholscomm.com
+1 408-772-1551
ClearlyDefined v2.0 adds support for LicenseRefs
One of the major focuses of the ClearlyDefined Technical Roadmap is improving the quality of license data. As such, we are excited to announce the release of ClearlyDefined v2.0, which adds the ability to identify over 2,000 new well-known licenses. You can see the complete list of new non-SPDX licenses in the ScanCode LicenseDB.
A little historical background: when ClearlyDefined was first created, it was decided to limit the reported licenses to only those on the SPDX License List. As teams worked with the ClearlyDefined data, it became clear that additional license discovery is important to give users a fuller picture of the projects they depend on. In previous releases of ClearlyDefined, licenses not on the SPDX License List were represented in the definition as NOASSERTION or OTHER. (See the breakdown of licenses in The most popular licenses for each language in 2023.)
The v2.0 release of ClearlyDefined includes an update of ScanCode to v32 and support for LicenseRefs to identify non-SPDX licenses. If ScanCode identifies a non-SPDX license, the license in the definition will now be a LicenseRef with the prefix LicenseRef-scancode-. This improves the license coverage in ClearlyDefined definitions and consumers’ ability to accurately construct license compliance policies.
ClearlyDefined identifies licenses in definitions using SPDX expressions. The SPDX specification has a way to include non-SPDX licenses in license expressions.
A license expression could be a single license identifier found on the SPDX License List; a user defined license reference denoted by the LicenseRef-[idString]; a license identifier combined with an SPDX exception; or some combination of license identifiers, license references and exceptions constructed using a small set of defined operators (e.g., AND, OR, WITH and +)
— excerpt from SPDX Annexes: SPDX license expressions
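For illustration, here are a few expressions of the kinds described in that excerpt, written as plain strings in a short Python sketch; the specific LicenseRef identifiers below are illustrative examples rather than values taken from any particular definition.

```python
# Illustrative SPDX license expressions of the kinds described in the excerpt above.
# The LicenseRef identifiers are examples only, not authoritative data.
expressions = [
    "MIT",                                         # a single identifier from the SPDX License List
    "LicenseRef-scancode-other-permissive",        # a user-defined license reference (ScanCode key with prefix)
    "GPL-2.0-only WITH Classpath-exception-2.0",   # an identifier combined with an SPDX exception
    "Apache-2.0 AND LicenseRef-scancode-proprietary-license",  # a combination built with AND
]

for expression in expressions:
    print(expression)
```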
Example change of a definition:
Coordinates: npm/npmjs/@alexa-games/sfb-story-debugger/2.1.0
License BEFORE: NOASSERTION
License AFTER: LicenseRef-.amazon.com.-AmznSL-1.0
Note: ClearlyDefined v2.0 also includes an update to ScanCode v32.
What does this mean for definitions?
This section includes a simplified description of what happens when you request a definition from ClearlyDefined. These examples only refer to the ScanCode tool. Other tools are run as well and are handled in similar ways.
When the definition already exists
Any request for a definition through the /definitions API makes a couple of checks before returning the definition:
If the definition exists, it checks whether the definition was created using the latest version of the ClearlyDefined service.
- If yes, it returns the definition as is.
- If not, it recomputes the definition using the existing raw results from the tools run during the previous harvest for the existing definition. In this case, the tool version will be earlier than ScanCode v32.
NOTE: ClearlyDefined does not support LicenseRefs from ScanCode prior to v32. For earlier versions of ScanCode, ClearlyDefined stores any LicenseRefs as NOASSERTION. In some cases, you may see OTHER when the definition was curated.
When the definition does not exist
If the definition does not exist:
- It will send a harvest request which will run the latest version of all the tools and produce raw results.
- From these raw results, it will compute a definition which might include a LicenseRef.
If you see NOASSERTION in the license expression, you can check the definition to determine the version of ScanCode in the “described”: “tools” section.
If ScanCode is a version earlier than v32, you can submit a harvest API request. This will run any tools for which ClearlyDefined now supports a later version. Once the tools complete, the definition will be recomputed based on the new results.
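Here is a minimal sketch of that flow, assuming the public ClearlyDefined API at api.clearlydefined.io; the /definitions endpoint and the described/tools and licensed/declared fields follow the definition format described above, while the exact shape of the harvest request body is an assumption to verify against the API documentation.

```python
# Minimal sketch: check which ScanCode version produced a definition and,
# if it predates v32, request a re-harvest. The coordinates are an example.
import requests

API = "https://api.clearlydefined.io"
coordinates = "npm/npmjs/-/lodash/4.17.21"  # type/provider/namespace/name/revision

# Fetch the definition (the service computes one if it does not exist yet).
definition = requests.get(f"{API}/definitions/{coordinates}", timeout=30).json()

# Tool versions live under described.tools, e.g. ["scancode/32.1.0", "licensee/9.14.0"].
tools = definition.get("described", {}).get("tools", [])
scancode = next((t for t in tools if t.startswith("scancode/")), None)
print("Declared license:", definition.get("licensed", {}).get("declared"))
print("ScanCode version used:", scancode)

# If the raw results came from ScanCode older than v32, ask for a re-harvest
# (the body shape here is an assumption; check the harvest API docs).
if scancode and int(scancode.split("/")[1].split(".")[0]) < 32:
    requests.post(f"{API}/harvest",
                  json={"tool": "package", "coordinates": coordinates},
                  timeout=30)
```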
In some cases, even when the results are from ScanCode v32, you may still see NOASSERTION. Reharvesting when the ScanCode version is already v32 will not change the definition.
What does this mean for tools?
When adding ScanCode licenses to allow/deny lists, note that the ScanCode LicenseDB lists licenses without the LicenseRef prefix. All LicenseRefs coming from ScanCode will start with LicenseRef-scancode-.
Tools using an Allow List
A recomputed definition may change the license to include a LicenseRef that you want to allow. All new LicenseRefs that are acceptable will need to be added to your allow list. We are taking the approach of adding them as they appear in flagged package-version licenses. An alternative is to review the ScanCode LicenseDB to proactively add LicenseRefs to your allow list.
Tools using a Deny List
Deny lists need to be exhaustive to prevent a new license from being allowed by default. It is recommended that you review the ScanCode LicenseDB to determine if there are LicenseRefs you want to add to the deny list.
Note: The SPDX License List also changes over time. A periodic review to maintain the Deny list is always a good idea.
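As a concrete illustration of that prefix handling, here is a minimal sketch of an allow-list check; the license keys in the allow list are illustrative, and a production policy would parse full SPDX expressions rather than match whole strings.

```python
# Minimal sketch of an allow-list check for licenses reported by ClearlyDefined.
# ScanCode LicenseDB lists keys without the prefix (e.g. "other-permissive"),
# so keys must be prefixed with "LicenseRef-scancode-" before adding them here.
SCANCODE_PREFIX = "LicenseRef-scancode-"

def to_licenseref(scancode_key: str) -> str:
    """Turn a bare ScanCode LicenseDB key into the form ClearlyDefined reports."""
    return SCANCODE_PREFIX + scancode_key

# Illustrative allow list: SPDX identifiers plus explicitly reviewed LicenseRefs.
ALLOW_LIST = {
    "MIT",
    "Apache-2.0",
    to_licenseref("other-permissive"),  # illustrative ScanCode key
}

def is_allowed(declared_license: str) -> bool:
    # Real declared licenses may be expressions joined with AND/OR/WITH;
    # a full policy check would parse the expression instead.
    return declared_license in ALLOW_LIST

print(is_allowed("MIT"))                                   # True
print(is_allowed("LicenseRef-scancode-other-permissive"))  # True
print(is_allowed("NOASSERTION"))                           # False
```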
Providing Feedback
As with any major version change, there can be unexpected behavior. You can reach out with questions, feedback, or requests. Find how to get in touch with us in the Get Involved doc.
If you have comments or questions on the actual LicenseRefs, you should reach out to ScanCode License DB maintainers.
Acknowledgements
A huge thank you to the contributing developers and their organizations for supporting the work of ClearlyDefined.
In alphabetical order, contributors were…
- ajhenry (GitHub)
- brifl (Microsoft)
- elrayle (GitHub)
- jeff-luszcz (GitHub)
- ljones140 (GitHub)
- lumaxis (GitHub)
- mpcen (Microsoft)
- nickvidal (Open Source Initiative)
- qtomlinson (SAP)
- RomanIakovlev (GitHub)
- yashkohli88 (SAP)
See something you’d like ClearlyDefined to do or could do better? If you have resources to help out, we have work to be done to further improve data quality, performance, and sustainability. We’d love to hear from you.
References
ClearlyDefined at SOSS Fusion 2024: a collaborative solution to Open Source license compliance
This past month, the Open Source Security Foundation (OpenSSF) hosted SOSS Fusion in Atlanta, an event that brought together a diverse community of leaders and innovators from across the digital security spectrum. The conference, held on October 22-23, explored themes central to today’s technological landscape: AI security, diversity in technology, and public policy for Open Source software. Industry thought leaders like Bruce Schneier, Marten Mickos, and Cory Doctorow delivered keynotes, setting the tone for a conference that emphasized collaboration and community in creating a secure digital future.
Amidst these pressing topics, the Open Source Initiative in collaboration with GitHub and SAP presented ClearlyDefined—an innovative project aimed at simplifying software license compliance and metadata management. Presented by Nick Vidal of the Open Source Initiative, along with E. Lynette Rayle from GitHub and Qing Tomlinson from SAP, the session highlighted how ClearlyDefined is transforming the way organizations handle licensing compliance for Open Source components.
What is ClearlyDefined?
ClearlyDefined is a project with a powerful vision: to create a global crowdsourced database of license metadata for every software component ever published. This ambitious mission seeks to help organizations of all sizes easily manage compliance by providing accurate, up-to-date metadata for Open Source components. By offering a single, reliable source for license information, ClearlyDefined enables organizations to work together rather than in isolation, collectively contributing to the metadata that keeps Open Source software compliant and accessible.
The problem: redundant and inconsistent license management
In today’s Open Source ecosystem, managing software licenses has become a significant challenge. Many organizations face the repetitive task of identifying, correcting, and maintaining accurate licensing data. When one component has missing or incorrect metadata, dozens—or even hundreds—of organizations using that component may duplicate efforts to resolve the same issue. ClearlyDefined aims to eliminate redundancy by enabling a collaborative approach.
The solution: crowdsourcing compliance with ClearlyDefined
ClearlyDefined provides an API and user-friendly interface that make it easy to access and contribute license metadata. By aggregating and standardizing licensing data, ClearlyDefined offers a powerful solution for organizations to enhance SBOMs (Software Bill of Materials) and license information without the need for extensive re-scanning and data correction. At the conference, Nick demonstrated how developers can quickly retrieve license data for popular libraries using a simple API call, making license compliance seamless and scalable.
In addition, organizations that encounter incomplete or incorrect metadata can easily update it through ClearlyDefined’s platform, creating a feedback loop that benefits the entire Open Source community. This crowdsourcing approach means that once an organization fixes a licensing issue, that data becomes available to all, fostering efficiency and accuracy.
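To make that “simple API call” concrete, here is a minimal sketch that enriches a short list of components with declared license data from the public ClearlyDefined API; the coordinates are examples, and error handling and caching are omitted.

```python
# Minimal sketch: look up declared licenses for a few components via ClearlyDefined.
import requests

API = "https://api.clearlydefined.io/definitions"

components = [
    "npm/npmjs/-/lodash/4.17.21",
    "pypi/pypi/-/requests/2.31.0",
    "maven/mavencentral/org.apache.commons/commons-lang3/3.14.0",
]

for coordinates in components:
    definition = requests.get(f"{API}/{coordinates}", timeout=30).json()
    declared = definition.get("licensed", {}).get("declared", "NOASSERTION")
    print(f"{coordinates}: {declared}")
```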
Key components of ClearlyDefined’s platform
1. API and User Interface: Users can access ClearlyDefined data through an API or the website, making it simple for developers to integrate license checks directly into their workflows.
2. Human curation and community collaboration: To ensure high data quality, ClearlyDefined employs a curation workflow. When metadata requires updates, community members can submit corrections that go through a human review process, ensuring accuracy and reliability.
3. Integration with popular package managers: ClearlyDefined supports various package managers, including npm and pypi, and has recently expanded to support Conda, a popular choice among data science and AI developers.
Real-world use cases: GitHub and SAP’s adoption of ClearlyDefined
During the presentation, representatives from GitHub and SAP shared how ClearlyDefined has impacted their organizations.
– GitHub: ClearlyDefined’s licensing data powers GitHub’s compliance solutions, allowing GitHub to manage millions of licenses with ease. Lynette shared how they initially onboarded over 17 million licenses through ClearlyDefined, a number that has since grown to over 40 million. This database enables GitHub to provide accurate compliance information to users, significantly reducing the resources required to maintain licensing accuracy. Lynette showcased the harvesting process and the curation process. More details about how GitHub is using ClearlyDefined are available here.
– SAP: Qing discussed how ClearlyDefined’s approach has streamlined SAP’s Open Source compliance efforts. By using ClearlyDefined’s data, SAP reduced the time spent on license reviews and improved the quality of metadata available for compliance checks. SAP’s internal harvesting service integrates with ClearlyDefined, ensuring that critical license metadata is consistently available and accurate. SAP has contributed to the ClearlyDefined project and, most notably, together with Microsoft, has optimized the database schema and reduced the database operational cost by more than 90%. More details about how SAP is using ClearlyDefined are available here.
Why ClearlyDefined matters
ClearlyDefined is a community-driven initiative with a vision to address one of Open Source’s biggest challenges: ensuring accurate and accessible licensing metadata. By centralizing and standardizing this data, ClearlyDefined not only reduces redundant work but also fosters a collaborative approach to license compliance.
The platform’s Open Source nature and integration with existing package managers and APIs make it accessible and scalable for organizations of all sizes. As more contributors join the effort, ClearlyDefined continues to grow, strengthening the Open Source community’s commitment to compliance, security, and transparency.
Join the ClearlyDefined community
ClearlyDefined is always open to new contributors. With weekly developer meetings, an open governance model, and continuous collaboration with OpenSSF and other Open Source organizations, ClearlyDefined provides numerous ways to get involved. For anyone interested in shaping the future of license compliance and data quality in Open Source, ClearlyDefined offers an exciting opportunity to make a tangible impact.
At SOSS Fusion, ClearlyDefined’s presentation showcased how an open, collaborative approach to license compliance can benefit the entire digital ecosystem, embodying the very spirit of the conference: working together toward a secure, inclusive, and sustainable digital future.
Download slides and see summarized presentation transcript below.
ClearlyDefined presentation transcript
Hello, folks, good morning! Let’s start by introducing ClearlyDefined, an exciting project. My name is Nick Vidal, and I work with the Open Source Initiative. With me today are Lynette Rayle from GitHub and Qing Tomlinson from SAP, and we’re all very excited to be here.
Introduction to ClearlyDefined’s mission
So, what’s the mission of ClearlyDefined? Our mission is ambitious—we aim to crowdsource a global database of license metadata for every software component ever published. This would benefit everyone in the Open Source ecosystem.
The problem ClearlyDefined addresses
There’s a critical problem in the Open Source space: compliance and managing SBOMs (Software Bill of Materials) at scale. Many organizations struggle with missing or incorrect licensing metadata for software components. When multiple organizations use a component with incomplete or wrong license metadata, they each have to solve it individually. ClearlyDefined offers a solution where, instead of every organization doing redundant work, we can collectively work on fixing these issues once and make the corrected data available to all.
ClearlyDefined’s solution
ClearlyDefined enables organizations to access license metadata through a simple API. This reduces the need for repeated license scanning and helps with SBOM generation at scale. When issues arise with a component’s license metadata, organizations can contribute fixes that benefit the entire community.
Getting started with ClearlyDefined
To use ClearlyDefined, you can access its API directly from your terminal. For example, let’s say you’re working with a JavaScript library like Lodash. By calling the API, you can get all license metadata for a specific version of Lodash at your fingertips.
Once you incorporate this licensing metadata into your workflow, you may notice some metadata that needs updating. You can curate that data and contribute it back, so everyone benefits. ClearlyDefined also provides a user-friendly interface for this, making it easier to contribute.
Open Source and community contributions
ClearlyDefined is an Open Source initiative, hosted on GitHub, supporting various package managers (e.g., npm, pypi). We work to promote best practices and integrate with other tools. Recently, we’ve expanded our scope to support non-SPDX licenses and Conda, a package manager often used in data science projects.
Integration with other tools
ClearlyDefined integrates with GUAC, an OpenSSF project that consumes ClearlyDefined data. This integration broadens the reach and utility of ClearlyDefined’s licensing information.
Case studies and community impact
I’d like to hand it over to Lynette from GitHub, who will talk about how GitHub uses ClearlyDefined and why it’s critical for license compliance.
GitHub’s use of ClearlyDefined
Hello, I’m Lynette, a developer at GitHub working on license compliance solutions. ClearlyDefined has become a key part of our workflows. Knowing the licenses of our dependencies is crucial, as legal compliance requires correct attributions. By using ClearlyDefined, we’ve streamlined our process and now manage over 40 million licenses. We also run our own harvester to contribute back to ClearlyDefined and scale our operations.
SAP’s adoption of ClearlyDefined
Hi, my name is Qing. At SAP, we co-innovate and collaborate with Open Source, ensuring a clean, well-maintained software pool. ClearlyDefined has streamlined our license review process, reducing time spent on scanning and enhancing data quality. SAP’s journey with ClearlyDefined began in 2018, and since then, we’ve implemented large-scale automation for our Open Source compliance and continuously contribute curated data back to the community.
Community and governance
ClearlyDefined thrives on community involvement. We recently elected members to our Steering and Outreach Committees to support the platform and encourage new contributors. Our weekly developer meetings and active Discord channel provide opportunities to engage, share knowledge, and collaborate.
Q&A highlights
- PURLs as Package Identifiers: We’re exploring support for PURLs as an internal coordinate system.
- Data Quality Issues: Data quality is our top priority. We plan to implement routines to scan for common issues, ensuring accurate metadata across the platform.
Thank you all for joining us today. If you’re interested in contributing, please reach out and become part of this collaborative community.
Members Newsletter – November 2024
After more than two years of collaboration, information gathering, global workshopping, testing, and an in-depth co-design process, we have an Open Source AI Definition.
The purpose of version 1.0 is to establish a workable standard for developers, researchers, and educators to consider how they may design evaluations for AI systems’ openness. The meaningful ability to fork and control their AI will foster permissionless, global innovation. It was important to drive a stake in the ground so everyone has something to work with. It’s version 1.0, so going forward, the process allows for improvement, and that’s exactly what will happen.
Over 150 individuals were part of the OSAID forum, nearly 15K subscribers to the OSI newsletter were kept up to date with the latest news about the OSAID, and 2M unique visitors to the OSI website were exposed to the OSAID process. There were 50+ co-design working group volunteers representing 29 countries, including participants from Africa, Asia, Europe, and the Americas.
Future versions of the OSAID will continue to be informed by the feedback we receive from various stakeholder communities. The fundamental principles and aim will not change, but, as our (collective) understanding of the technology improves and the technology itself evolves, we might need updates to clarify or even change certain requirements. To enable this, the OSI Board voted to establish an AI sub-committee, which will develop appropriate mechanisms for updating the OSAID in consultation with stakeholders. It will be fully formed in the months ahead.
Please continue to stay involved, as diverse voices and experiences are required to ensure Open Source AI works for the good of us all.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
Open and public co-design process culminates in a stable version of the Open Source AI Definition, ensures freedoms to use, study, share and modify AI systems.
Other highlights:
- How we passed the AI conundrums
- ClearlyDefined at SOSS Fusion 2024
- ClearlyDefined’s Steering and Outreach Committees Defined
- The Open Source Initiative Supports the Open Source Pledge
Article from ZDNet
For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them.
Other highlights:
- The Gap Between Open and Closed AI Models Might Be Shrinking. Here’s Why That Matters (Time)
- Meta’s military push is as much about the battle for open-source AI as it is about actual battles (Fortune)
- OSI unveils Open Source AI Definition 1.0 (InfoWorld)
- We finally have an ‘official’ definition for open source AI (TechCrunch)
- Read all press mentions from this past month
News from OSI affiliates:
- OpenSSF: SOSS Fusion 2024: Uniting Security Minds for the Future of Open Source (Security Boulevard)
- Mozilla Foundation: How Mozilla’s President Defines Open-Source AI (Forbes)
News from OpenSource.net:
- OpenSource.Net turns one with a redesign
- How to make reviewing pull requests a better experience
- Closing the Gap: Accelerating environmental Open Source
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Bloomberg is seeking a Technical Architect to join their OSPO team.
Events
Upcoming events:
- Nerdearla Mexico (November 7-9, 2024 – Mexico City)
- SeaGL (November 8-9, 2024 – Seattle)
- SFSCON (November 8-9, 2024 – Bolzano)
- KubeCon + CloudNativeCon North America (November 12-15, 2024 – Salt Lake City)
- OpenForum Academy Symposium (November 13-14, 2024 – Boston)
- The Linux Foundation Legal Summit (November 18-19, 2024 – Napa)
- The Linux Foundation Member Summit (November 19-21, 2024 – Napa)
- Open Source Experience (December 4-5, 2024 – Paris)
- KubeCon + CloudNativeCon India (December 11-12, 2024 – Delhi)
- EU Open Source Policy Summit (January 31, 2025 – Brussels)
- FOSDEM (February 1-2, 2025 – Brussels)
CFPs:
- FOSDEM 2025 EU-Policy Devroom – an event organized by the OSI, OpenForum Europe, the Eclipse Foundation, the European Open Source Software Business Association, the European Commission Open Source Programme Office, and the European Commission.
- PyCon US 2025: the Python Software Foundation kicks off Website, CfP, and Sponsorship!
- GitHub
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Get to vote for the OSI Board by becoming a member
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
RALEIGH, N.C., Oct. 28, 2024 — ALL THINGS OPEN 2024 — After a year-long, global, community design process, the Open Source AI Definition (OSAID) v.1.0 is available for public use.
The release of version 1.0 was announced today at All Things Open 2024, an industry conference focused on common issues of interest to the worldwide Open Source community. The OSAID offers a standard by which community-led, open and public evaluations will be conducted to validate whether or not an AI system can be deemed Open Source AI. This first stable version of the OSAID is the result of multiple years of research and collaboration, an international roadshow of workshops, and a year-long co-design process led by the Open Source Initiative (OSI), globally recognized by individuals, companies and public institutions as the authority that defines Open Source.
“The co-design process that led to version 1.0 of the Open Source AI Definition was well-developed, thorough, inclusive and fair,” said Carlo Piana, OSI board chair. “It adhered to the principles laid out by the board, and the OSI leadership and staff followed our directives faithfully. The board is confident that the process has resulted in a definition that meets the standards of Open Source as defined in the Open Source Definition and the Four Essential Freedoms, and we’re energized about how this definition positions OSI to facilitate meaningful and practical Open Source guidance for the entire industry.”
“The new definition requires Open Source models to provide enough information about their training data so that a ‘skilled person can recreate a substantially equivalent system using the same or similar data,’ which goes further than what many proprietary or ostensibly Open Source models do today,” said Ayah Bdeir, who leads AI strategy at Mozilla. “This is the starting point to addressing the complexities of how AI training data should be treated, acknowledging the challenges of sharing full datasets while working to make open datasets a more commonplace part of the AI ecosystem. This view of AI training data in Open Source AI may not be a perfect place to be, but insisting on an ideologically pristine kind of gold standard that will not actually be met by any model builder could end up backfiring.”
“We welcome OSI’s stewardship of the complex process of defining Open Source AI,” said Liv Marte Nordhaug, CEO of the Digital Public Goods Alliance (DPGA) secretariat. “The Digital Public Goods Alliance secretariat will build on this foundational work as we update the DPG Standard as it relates to AI as a category of DPGs.”
“Transparency is at the core of EleutherAI’s non-profit mission. The Open Source AI Definition is a necessary step towards promoting the benefits of Open Source principles in the field of AI,” said Stella Biderman, executive director at the EleutherAI Institute. “We believe that this definition supports the needs of independent machine learning researchers and promotes greater transparency among the largest AI developers.”
“Arriving at today’s OSAID version 1.0 was a difficult journey, filled with new challenges for the OSI community,” said OSI Executive Director, Stefano Maffulli. “Despite this delicate process, filled with differing opinions and uncharted technical frontiers—and the occasional heated exchange—the results are aligned with the expectations set out at the start of this two-year process. This is a starting point for a continued effort to engage with the communities to improve the definition over time as we develop with the broader Open Source community the knowledge to read and apply OSAID v.1.0.”
The text of the OSAID v.1.0 as well as a partial list of the many global stakeholders who endorse the definition can be found here: https://opensource.org/ai
About the Open Source Initiative
Founded in 1998, the Open Source Initiative (OSI) is a non-profit corporation with global scope formed to educate about and advocate for the benefits of Open Source and to build bridges among different constituencies in the Open Source community. It is the steward of the Open Source Definition and the Open Source AI Definition, setting the foundation for the global Open Source ecosystem. Join and support the OSI mission today at: https://opensource.org/join.
ClearlyDefined’s Steering and Outreach Committees Defined
We are excited to announce the newly elected leaders for the ClearlyDefined Steering and Outreach Committees!
What is ClearlyDefined?
ClearlyDefined is an Open Source project dedicated to improving the clarity and transparency of Open Source licensing and security data. By harvesting, curating, and sharing essential metadata, ClearlyDefined helps developers and organizations better understand their software components, ensuring responsible and compliant use of Open Source code.
Steering Committee Election Results:
Congratulations to E. Lynette Rayle (GitHub), Qing Tomlinson (SAP), and Jeff Mendoza (Kusari/GUAC) for being elected to the ClearlyDefined Steering Committee. These three community leaders will serve a one-year term starting on September 25, 2024. Following election recommendations, the Steering Committee is structured to have an odd number of members (three in this case) and a maximum of one member per company. Lynette Rayle was elected chair of the committee.
The Steering Committee is primarily responsible for setting the project’s technical direction. They oversee processes such as data harvesting, curation, and contribution, ensuring that the underlying architecture functions smoothly. Their focus is on empowering the community, supporting the contributors and maintainers, and fostering collaboration with related projects.
E. Lynette Rayle is a Senior Engineer at GitHub and has been working on ClearlyDefined as a maintainer for just over a year. GitHub is using ClearlyDefined data in several capacities and has a strong stake in ensuring successful outcomes in data quality, performance, and sustainability.
Qing Tomlinson is a Senior Developer at SAP and has been contributing to the ClearlyDefined project since November 2021. SAP has been actively engaged in the ClearlyDefined project since its inception, utilizing the data and actively contributing to its curation. The quality, performance, and long-term viability of the ClearlyDefined project are of utmost importance to SAP.
Jeff Mendoza is a Software Engineer at Kusari, a software supply chain security startup. He is a maintainer of the OpenSSF GUAC project, which consumes ClearlyDefined data. Formerly, Jeff was a full-time developer on ClearlyDefined. He brings experience from both sides of the project: developer and consumer.
Outreach Committee Election Results:
We are also thrilled to announce the election of Jeff Luszcz (GitHub), Alyssa Wright (Bloomberg), Brian Duran (SAP), and Nick Vidal (Open Source Initiative) to lead the ClearlyDefined Outreach Committee. They began their one-year term on October 7, 2024. Unlike the Steering Committee, the Outreach Committee has four members, following a consensus reached at the Community meeting that an even number of members is acceptable since tie-breaking votes are less likely. The elected members will select their Chair soon and may also invite other community members to participate.
The Outreach Committee focuses on promoting the project and growing its community. Their responsibilities include organizing events, creating educational materials, and managing communications across various channels, including blogs, social media, and webinars. They help ensure that more users and contributors engage with ClearlyDefined and understand its mission.
Jeff Luszcz is Staff Product Manager at GitHub. Since 2004, he has helped hundreds of software companies understand how to best use open source while complying with their license obligations and keeping on top of security issues.
Alyssa Wright helps lead Bloomberg’s Open Source Program Office in the Office of the CTO, which is the center of excellence for Bloomberg’s engagements with and consumption of open source software.
Brian Duran leads the implementation strategy for the adoption of ClearlyDefined within SAP’s open source compliance teams. He has a combined 12 years of experience in open source software compliance and data quality management.
Nick Vidal is Community Manager at the Open Source Initiative and former Outreach Chair at the Confidential Computing Consortium from the Linux Foundation. Previously, he was the Director of Community and Business Development at the Open Source Initiative and Director of Americas at the Open Invention Network.
Get Involved!
We encourage everyone in the ClearlyDefined community to get involved! Whether you’re a developer, data curator, or simply passionate about Open Source software, your contributions are invaluable. Join the conversation, attend meetings, and share your ideas on how to improve and grow the project. Reach out to the newly elected committee members or participate in our upcoming community events.
Let’s work together to drive the ClearlyDefined mission forward! Stay tuned for more updates and opportunities to participate as the committees continue their important work.
Rahmat Akintola: Voices of the Open Source AI Definition
The Open Source Initiative (OSI) is running a blog series to introduce some of the people who have been actively involved in the Open Source AI Definition (OSAID) co-design process. The co-design methodology allows for the integration of diverging perspectives into one just, cohesive and feasible standard. Support and contribution from a significant and broad group of stakeholders is imperative to the Open Source process and is proven to bring diverse issues to light, deliver swift outputs and garner community buy-in.
This series features the voices of the volunteers who have helped shape and are shaping the Definition.
Meet Rahmat Akintola
What’s your background related to Open Source and AI?
Sure. I’ll start with Open Source. My journey began at PyCon Africa in 2019, where I participated in a hackathon on Cookiecutter. At the time, I had just transitioned into web development and was looking for ways to improve my skills beyond personal projects. So I joined the Cookiecutter Academy at PyCon Africa in 2019. That’s how I got introduced to Open Source.
Since then, I’ve been contributing regularly, starting with one-off contributions to different projects. These days, I primarily focus on code and documentation contributions, mainly in web development.
As for AI, my journey started with data science. I had been working as a program manager and was part of the Women in Machine Learning and Data Science community in Accra, which was looking for volunteers. Coincidentally, I had lost my job at the time, so I applied for the program manager role and got it. That experience sparked my interest in AI. I started learning more about machine learning and AI, and I needed to build my domain knowledge to help with my role in the community.
I’ve worked on traditional models like linear and logistic regression through various courses. Recently, as part of our community, we organized a “Mathematics for Machine Learning” boot camp, where we worked on projects related to reinforcement learning and logistic regression. One dataset I worked with involved predicting BP (blood pressure) levels in the US. The task was to assess the risk of developing hypertension based on various factors.
What motivated you to join this co-design process to define Open Source AI?
The Open Source AI journey started when I was informed about a virtual co-design process that was reaching out to different communities, including mine. As the program lead, I saw it as an opportunity to merge my two passions—Open Source and AI.
I volunteered and worked on testing the OpenCV workbook, as I was using OpenCV at the time. I participated in the first phase, which focused on determining whether certain datasets needed to be open. Unfortunately, I couldn’t participate in the validation phase because I was involved in the mathematics boot camp, but I followed the discussions closely.
When the opportunity came up to participate in the co-design process, I saw it as a chance to bridge my work in Open Source web development and my growing interest in AI. It felt like the perfect moment. I was already using OpenCV, which happened to be part of the AI systems under review, so I jumped right in.
Through the process, I realized that defining Open Source AI goes beyond just using tools or making code contributions—it involves a deep understanding of data, legality, and the broader system.
How did you get invited to speak at the Deep Learning Indaba conference in Dakar? How was the conference experience? Did you make any meaningful connections?
As for speaking at Deep Learning Indaba, the opportunity came unexpectedly. One day, Mer Joyce (the OSAID co-design organizer) sent an email offering a chance to speak on Open Source AI at the conference. I had previously applied to attend but didn’t get in, so I jumped on this opportunity. We used a presentation similar to one Mer had given at Open Source Community Africa.
I made excellent connections. The conference itself was amazing—though the food and the Senegal experience also played a part! There were many AI and machine learning researchers, and I learned new concepts, like using JAX, which was introduced as an alternative to some common frameworks. The tutorials were well-targeted at beginners, which was perfect for me.
On a personal level, it was great to connect with academics. I’m considering applying for a master’s or Ph.D., and the conference provided an opportunity to ask questions and receive guidance.
Why do you think AI should be Open Source?
AI is becoming a significant part of our lives. I work with the Meltwater Entrepreneurial School of Technology (MEST) as a technical lead, and we use AI for various training purposes. Opening up parts of AI systems allows others to adapt and refine them to suit their needs, especially in localized contexts. For example, I saw someone on Twitter excited about building a GPT for dating, customizing it to ask specific questions.
This ability for people to tweak and refine AI models, even without building them from scratch, is important. Open-sourcing AI enables more innovation and helps tailor models for specific needs, which is why I believe it should be open to an extent.
Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?
One new perspective I gained was on the legal and data availability aspects of AI. Before this, I had never really considered the legal side of things, but during the co-design process, it became clear that these elements are crucial in defining Open Source AI systems. It’s more than just contributing code—it’s about ensuring compliance with legal frameworks and making sure data is available and usable.
What do you think the primary benefit will be once there is a clear definition of Open Source AI?
A clear definition would help people understand that Open Source AI involves more than just attaching an MIT or Apache license to a project on GitHub. There’s more complexity around sharing models, data and parameters.
For instance, I was once asked whether using an “Open Source” large language model like LLaMA meant the data had to be open too. A well-defined standard would provide guidance for questions like these, ensuring people understand the legal and technical aspects of making their AI systems Open Source.
What do you think are the next steps for the community involved in Open Source AI?
In Africa, I think the next step is spreading awareness about the Open Source AI Definition. Many people are still unaware of the complexities, and there’s still a tendency to assume that adding an Open Source license to a project automatically makes it open. Building collaborations with local communities to share this information is important.
For women, especially in Africa, visibility is key. When women see others doing similar work, they feel encouraged to join. Representation and community engagement play significant roles in driving diversity in Open Source AI.
How to get involved
The OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:
- Join the forum: share your comment on the drafts.
- Leave a comment on the latest draft: provide precise feedback on the text of the latest draft.
- Follow the weekly recaps: subscribe to our monthly newsletter and blog to be kept up-to-date.
- Join the town hall meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions and share your thoughts.
- Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.
How we passed the AI conundrums
Some people believe that full unfettered access to all training data is paramount. This group argues that anything less than all the data would compromise the Open Source principles, forever removing full reproducibility of AI systems, transparency, security and other outcomes. We’ve heard them and we’ve provided a solution rooted in decades of Open Source practice.
To have the chance for powerful Open Source AI systems to exist in any domain, the OSI community has incorporated in the Definition this principle:
An Open Source AI needs to make available three kinds of components: the software used to create the dataset and run the training, the model parameters and the code to run inference, and finally all the data that can be made available legally.
Recognizing that there are four kinds of “data”, each with its own legal frameworks allowing different freedoms of distribution, we bypass what Stephen O’Grady called the “AI conundrums” and give Open Source AI builders a chance to build freedom-respecting alternatives to pretty much any proprietary AI.
Limiting Open Source AI only to systems trainable on freely distributable data would relegate Open Source AI to a niche, for several reasons. One is that the amount of freely and legally shareable data is a tiny fraction of what is necessary to train powerful systems. Another is that it would exclude Open Source AI from areas where data cannot be shared, like medicine or anything else dealing with personal or private data. What would remain for “Open Source AI” would be tiny. There are abundant reasons to reject this limitation.
The fact is, mixing openly distributable and non-distributable data is very similar to a reality we are very familiar with: Open Source software built with proprietary compilers and system libraries.
Is GNU Emacs Open Source software?
I’m sure you’d answer yes (and some of you will say “well, actually it’s free software”) and we’ll all agree. Consider Emacs built for the GNOME desktop on a modern Linux distribution: Emacs depends on a few system libraries that GNOME provides under OSI-Approved Licenses. The whole stack is Open Source these days, and one can distribute Emacs on a disk with all its dependencies without too much legal trouble. Imagine scientists who want to freeze the whole environment of an experiment they made; they could package all the pieces of a system like this without trouble and distribute it all with their paper. No problem here.
Now let’s go back to an age when Linux systems weren’t ready. When Stallman started writing Emacs, there was no GNOME and no Linux, no gcc and no glibc. He thought very early on that in order to have more freedom, he had to create a wedge to allow Emacs to run on proprietary software.
Emacs on the latest Solaris versions looks different: some pieces, like X11 and GStreamer, are Open Source; others, like libc, aren’t. The hypothetical scientists from before couldn’t really freeze their full scientific environment. All they could say in their paper was: “We used Emacs from this CVS version, built with gcc version X with these makefiles; tar.gz attached” and list the operating system and library versions they used. That’s because they only have the right to distribute Emacs, X11 and some libraries, not the rest of Solaris.
Is Emacs on Solaris Open Source? Of course it is, even though the source code for the system libraries is not available.
One more question: Emacs on macOS can only be built with a proprietary compiler, against a proprietary GUI and other proprietary libraries.
Is Emacs on macOS Open Source? Of course it is. Can you fully study Emacs on macOS? For Emacs, yes. For the macOS components, no. There are many programs that run only on macOS or Windows: for OSI, those are Open Source. Would someone argue that they’re not “really Open Source” because you can’t see “everything”? Some people might, but we’ve learned to live with that, adding governance rules in addition to those of the Open Source Definition. Debian, for example, requires that programs are Open Source and support multiple hardware platforms; the ASF graduates only projects that are Open Source and have a diverse community of contributors. If you only want to use Open Source applications running on Open Source stacks, you can decide that! Just as you can decide that your company will only acquire Open Source software whose copyright is owned by multiple entities.
These are all additional requirements built on top of the base floor set by the Open Source Definition.
For AI, you can do the same: You can say “I will only use Open Source AI built with open data, because I don’t want to trust anything less than that.” A large organization could say “I will buy only Open Source AI that allows me to audit their full dataset, including unshareable data.” You can do all that. Open Source AI is the floor that you can build on, like the OSD.
Bypassing the conundrums
We’ve looked for a solution for almost three years and this is it: Require all the data that is legally shareable, and for the other data provide all the details. It’s exactly what we’ve been doing for Open Source software:
You developed a text editor for macOS but you can’t share the system libraries? Fine, we’ll fork it: give us all the code you can legally share under an OSI-Approved License and we’ll rip out the dependencies and “liberate” it to run on GNU. The editor will be slightly different, just as code that runs on ARM+Linux systems behaves differently on Intel+Windows because of the different capabilities of the underlying hardware and OS, but it’s still Open Source.
For Open Source AI it’s a similar dance: You can’t legally give us all the data? Fine, we’ll fork it. For example, you made an AI that recognizes bone cancer in humans but the data can’t be shared. We’ll fork it! Tell us exactly how you built the system, how you trained it, share the code you used, and an anonymized sample of the data you used so we can train on our X-ray images. The system will be slightly different but it’s still Open Source AI.
If we want to have broad availability of powerful alternatives to proprietary AI systems that respect the freedoms of users and deployers, we must recognize conditions that make sense for the domain of AI. These examples of proprietary compilers and system libraries used to build Open Source software prove that there is room for similar conditions when talking about Code, Data and Parameters within the definition of Open Source AI.
The Open Source Initiative Supports the Open Source Pledge
As businesses rely more heavily on Open Source software (OSS), the strain on maintainers to provide timely updates and security patches continues to grow – often without fair compensation for their crucial work. Recent high-profile security incidents like XZ and Log4Shell have put a spotlight on the security challenges developers face against a backdrop of burnout that has reached an all-time high.
To help address this imbalance, the Open Source Initiative (OSI) supports the Open Source Pledge, launched today by Sentry and partners to support maintainers and inspire a shift toward a healthier work-life balance, and more robust software security practices. The Pledge is a commitment from member companies to pay Open Source maintainers and organizations meaningfully in support of a more sustainable maintainer ecosystem and a reduction of flare-ups of high-profile security incidents.
This Pledge is an attempt to address a problem that has long existed within the Open Source ecosystem. Many companies have built their businesses on top of Open Source software, benefiting from the contributions of maintainers while taking them for granted. While they’ve reaped the rewards, the burden has been placed on unpaid or underpaid developers.
It is essential that companies recognize their role in sustaining the ecosystem that powers their innovations. By taking the Pledge, companies have one more instrument to commit to supporting an ecosystem of maintainers and organizations, ensuring the long-term health of the Open Source projects they rely on.
In order to qualify, the projects that companies pledge to should meet the Open Source Definition. You can join the Open Source Pledge by donating to the Open Source Initiative or contacting us to become a sponsor.
Members Newsletter – October 2024
We’re pleased to announce that Release Candidate 1 of the Open Source AI Definition has been confirmed and published! If you’d like to add your name to the list of endorsers published online, please let us know.
We traveled to four continents, presenting to diverse audiences and soliciting feedback on the draft definition: Deep Learning Indaba in Dakar, Senegal; IndiaFOSS in Bangalore, India; Open Source Summit EU in Vienna, Austria; and Nerdearla in Buenos Aires, Argentina.
The work continues this month as we continue to seek input at the Data in OSAI in Paris, France; at OCX in Mainz, Germany; and during our weekly town hall meetings. And, finally, we’ll be in Raleigh, North Carolina, at the end of the month for All Things Open, where we plan to present the Open Source AI Definition version 1.0!
My thanks to everyone who is contributing to this community-led process. Please continue to let your voice be heard.
Stefano Maffulli
Executive Director, OSI
I hold weekly office hours on Fridays with OSI members: book time if you want to chat about OSI’s activities, if you want to volunteer or have suggestions.
News from the OSI
The Open Source AI Definition RC1 is available for comments
The first Release Candidate of the Open Source AI Definition has been published and collaboration continues.
Other highlights:
- Co-designing the OSAID: a highlight from Nerdearla
- A Journey toward defining Open Source AI: presentation at Open Source Summit Europe
- Copyright law makes a case for requiring data information rather than open datasets for Open Source AI
- Data Transparency in Open Source AI: Protecting Sensitive Datasets
- Is “Open Source” ever hyphenated?
- Jordan Maris joins OSI
Article from Mark Surman at The New Stack
Other highlights:
- Elastic founder on returning to open source four years after going proprietary (TechCrunch)
- Europe’s Tech Future Hinges on Open Source AI (The New Stack)
- How big new AI regulatory pushes could affect open source (Tech Brew)
- Does Open Source Software Still Matter? (Datanami)
- AI2’s new model aims to be open and powerful yet cost effective (VentureBeat)
- Is that LLM Actually “Open Source”? We need to talk Open-Washing in AI Governance (HackerNoon)
- AI Models From Google, Meta, Others May Not Be Truly ‘Open Source’ (PCMag)
- What’s Behind Elastic’s Unexpected Return to Open Source? (The New Stack)
The Open Policy Alliance reaches 100 members on LinkedIn.
Other news
News from OSI affiliates:
- The Eclipse Foundation Launches the Open Regulatory Compliance Working Group to Help Open Source Participants Navigate Global Regulations
- LPI and OSI Unite to Professionalize the Global Linux and Open Source Ecosystem
- Apache Software Foundation Initiatives to Fuel the Next 25 Years of Open Source Innovation
- Open source orgs strengthen alliance against patent trolls
- Open Source Foundations Considered Helpful
- Solving the Maker-Taker problem
News from OpenSource.net:
- OpenSource.Net turns one with a redesign
- Beyond the binary: The nuances of Open Source innovation
- Steady in a shifting Open Source world: FreeBSD’s enduring stability
Tidelift’s 2024 State of the Open Source Maintainer Report
More than 400 maintainers responded and shared details about their work.
The State of Open Source Survey
In collaboration with the Eclipse Foundation and Open Source Initiative (OSI).
Jobs
Lead OSI’s public policy agenda and education.
Bloomberg is seeking a Technical Architect to join their OSPO team.
Events
Upcoming events:
- Hacktoberfest (October – Online)
- SOSS Fusion (October 22-23, 2024 – Atlanta)
- Open Community Experience (October 22-24, 2024 – Mainz)
- All Things Open (October 27-29, 2024 – Raleigh)
- Nerdearla Mexico (November 7-9, 2024 – Mexico City)
- SeaGL (November 8-9, 2024 – Seattle)
- OpenForum Academy Symposium (November 13-14, 2024 – Boston)
CFPs:
- FOSDEM 2025 call for devrooms (February 1-2, 2025 – Brussels)
- Consul Conference (February 4-6, 2025 – Las Palmas de Gran Canaria)
- SCALE 22x (March 6-9, 2025 – Pasadena)
- Mercado Libre
- FerretDB
- Word Unscrambler
Interested in sponsoring, or partnering with, the OSI? Please see our Sponsorship Prospectus and our Annual Report. We also have a dedicated prospectus for the Deep Dive: Defining Open Source AI. Please contact the OSI to find out more about how your company can promote open source development, communities and software.
Support OSI by becoming a member!
Let’s build a world where knowledge is freely shared, ideas are nurtured, and innovation knows no bounds!
Co-designing the OSAID: a highlight from Nerdearla
At the 10th anniversary of Nerdearla, one of the largest Open Source conferences in Latin America, Mer Joyce, Co-Design Facilitator of the Open Source AI Definition (OSAID), delivered a key presentation titled “Defining Open Source AI”. Held in Buenos Aires from September 24-28, 2024, this major event brought together 12,000 in-person participants and over 30,000 virtual attendees, with more than 200 speakers from 20 countries. Organized as a free-to-attend event, Nerdearla 2024 exemplified the spirit of Open Source collaboration by providing a platform for developers, enthusiasts, and thought leaders to share knowledge and foster community engagement.
Why is a definition so important?
Mer Joyce took the stage at Nerdearla to present “Defining Open Source AI”. Mer’s presentation focused on the organization’s ongoing work to establish a global Open Source AI Definition (OSAID). She emphasized the importance of co-designing this definition through a collaborative, inclusive process that ensures input from stakeholders across industries and continents.
Her talk underscored the significance of defining Open Source AI in the context of increasing AI regulations from governments in the EU, the U.S., and beyond. In her view, defining OSAI is essential for combating “open-washing”—where companies falsely market their AI systems as Open Source while imposing restrictive licenses—and for promoting true openness, transparency, and innovation in the AI space.
A global and inclusive process
Mer Joyce highlighted the co-design process for the Open Source AI Definition, which has been truly global in scope. Workshops, talks, and activities were held on five continents (Africa, Europe, Asia, and North and South America), with participants from over 35 countries. These in-person and virtual sessions ensured that voices from a wide range of backgrounds—especially those from underrepresented regions—contributed to shaping the OSAID.
The four freedoms
The core of the OSAID rests on the “Four Freedoms” of Open Source AI:
- Use the system for any purpose and without having to ask for permission.
- Study how the system works and inspect its components.
- Modify the system for any purpose, including to change its output.
- Share the system for others to use with or without modifications, for any purpose.
Four working groups were formed with the intention of identifying what components must be open in order for an AI system to be used, studied, modified, and shared. The working groups focused on Bloom, OpenCV, Llama 2, and Pythia, four systems with different approaches to OSAI.
Each working group voted on the required components and evaluated legal frameworks and legal documents for each component. Subsequently, each working group proceeded to publish a recommendation report.
The end result is the OSAID with a comprehensive definition checklist encompassing a total of 17 components. As part of the validation process, more working groups are being formed to evaluate how well other AI systems align with the definition.
Nerdearla: a platform for open innovation
Mer Joyce’s presentation at Nerdearla exemplified the broader theme of the conference—creating a more open and collaborative future for technology. As one of the largest Open Source conferences in Latin America, Nerdearla serves as a vital hub for fostering innovation across the Open Source community. By bringing together experts like Mer Joyce to discuss pivotal issues such as AI transparency and openness, the event highlights the importance of defining shared standards for emerging technologies.
Moving forward: the future of the OSAID
The OSAID is currently in its final stages of development, with version 1.0 expected to be launched at the All Things Open conference in October 2024. The OSI invites individuals and organizations to endorse the OSAID ahead of its official release. This endorsement signifies support for a global definition that aims to ensure AI systems are open, transparent, and aligned with the values of the Open Source movement.
To get involved, participants are encouraged to attend weekly town halls, contribute feedback, and participate in the public review process. Consider endorsing the OSAID to become a part of the movement to define and promote truly Open Source AI systems.
The Open Source AI Definition RC1 is available for comments
A little over a month after v.0.0.9, we have a Release Candidate version of the Open Source AI Definition. This was reached with lots of community feedback: 5 town hall meetings, several comments on the forum and on the draft, and in-person conversations at events in Austria, China, India, Ghana, and Argentina.
There are three relevant changes to the part of the definition pertaining to the “preferred form to make modifications to a machine learning system.”
The feature that will draw the most attention is the new language around Data Information. It clarifies how training data needs to be shared and disclosed. The updated text comes from many conversations with several individuals who engaged passionately with the design process, on the forum, in person and on HackMD. These conversations helped describe four types of data: open, public, obtainable and unshareable data, all well described in the FAQ. The legal requirements are different for each, and each is required to be shared in the form that the law allows it to be shared.
Two new features are equally important. RC1 clarifies that Code must be complete: enough for downstream recipients to understand how the training was done. This reinforces the importance of training, for transparency, security and other practical reasons. Training is where innovation is happening at the moment, and that’s why you don’t see corporations releasing their training and data processing code. We believe, given the current state of knowledge and practice, that this is required to meaningfully fork (study and modify) AI systems.
Lastly, there is new text meant to explicitly acknowledge that it is admissible to require copyleft-like terms for any of the Code, Data Information and Parameters, individually or as bundled combinations. An illustrative scenario is a consortium that owns rights to training code and a dataset and decides to distribute the code+data bundle with legal terms that tie the two together, with copyleft-like provisions. This sort of legal document doesn’t exist yet, but the scenario is plausible enough that it deserves consideration. This is another area that OSI will monitor carefully as we start reviewing these legal terms with the community.
A note about science and reproducibility
The aim of Open Source is not and has never been to enable reproducible software. The same is true for Open Source AI: reproducibility of AI science is not the objective. Open Source’s role is merely not to be an impediment to reproducibility. In other words, one can always add more requirements on top of Open Source, just like the Reproducible Builds effort does.
Open Source means giving anyone the ability to meaningfully “fork” (study and modify) a system, without requiring additional permissions, to make it more useful for themselves and also for everyone. This is why OSD #2 requires that the “source code” must be provided in the preferred form for making modifications. This way everyone has the same rights and ability to improve the system as the original developers, starting a virtuous cycle of innovation. Forking in the machine learning context has the same meaning as with software: having the ability and the rights to build a system that behaves differently from the original. Things a fork may achieve include fixing security issues, improving behavior and removing bias. All of these are possible thanks to the requirements of the Open Source AI Definition.
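To make “forking” concrete, here is a minimal, purely illustrative sketch (not part of the Definition) that assumes PyTorch and a hypothetical weights file released under open terms: a downstream builder loads the published parameters and continues training on their own data so that the system behaves differently, for example to correct a failure mode or reduce a bias observed downstream.

```python
# Minimal sketch of "forking" a released model: load the published parameters
# and fine-tune them on new data so the system behaves differently.
# Assumes PyTorch and a hypothetical weights file "released_weights.pt";
# a real fork would also rely on the released training code and data information.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
model.load_state_dict(torch.load("released_weights.pt"))  # published parameters

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Continue training on our own (here: random placeholder) data.
for _ in range(100):
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

torch.save(model.state_dict(), "forked_weights.pt")  # the modified, "forked" system
```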
What’s coming next
With the release candidate cycle starting today, the drafting process will shift focus: no new features, only bug fixes. We’ll watch for newly raised issues, looking out for major flaws that may require significant rewrites to the text. The main focus will be on the accompanying documentation, the Checklist and the FAQ. We also realized that in our zeal to solve the problem of data that needs to be provided but cannot be supplied by the model owner for good reasons, we had failed to make clear the basic requirement that “if you can share the data you must.” We have already made adjustments in RC1 and will be seeking views on how to better express this in an RC2.
In the weeks remaining until the 1.0 release on October 28, we’ll focus on:
- Getting more endorsers to the Definition
- Continuing to collect feedback on HackMD and the forum, focusing on new, previously unseen concerns
- Preparing the artifacts necessary for the launch at All Things Open
- Iterating on the Checklist and FAQ, preparing them for deployment.
A Journey toward defining Open Source AI: presentation at Open Source Summit Europe
A few weeks ago I attended Open Source Summit Europe 2024, an event organized by the Linux Foundation that brought together brilliant developers, technologists and leaders from all over the world, reinforcing what Open Source is truly about—collaboration, innovation and community.
I had the honor of leading a session that tackled one of the most critical challenges in the Open Source movement today—defining what it means for AI to be “Open Source.” Together with OSI Board Director Justin Colannino, I presented v.0.0.9 of the Open Source AI Definition. This session marked an important milestone for both the Open Source Initiative (OSI) and the broader community, a moment that encapsulated years of collaboration, learning and exploration.
The story behind the Open Source AI Definition
Our session, titled “The Open Source AI Definition Is (Almost) Ready”, was more than just a talk—it was an interactive dialogue. As Justin kicked off the session, he captured the essence of the journey we’ve been on. OSI has been grappling with what it means to call AI systems, models and weights “Open Source.” This challenge comes at a time when companies and even regulations are using the term without a clear, agreed-upon definition.
From the outset, we knew we had to get it right. The Open Source values that have fueled so much software innovation—transparency, collaboration, freedom—needed to be the foundation for AI as well. But AI isn’t like traditional software, and that’s where our challenge began.
The origins: a podcast and a vision
When I first became Executive Director of OSI, I pitched the idea of exploring how Open Source principles apply to AI. We spent months strategizing, and the more we dove in, the more we realized how complex the task would be. We didn’t know much about AI at the time, but we were eager to learn. We turned to experts from various fields—a copyright lawyer, an ethicist, AI pioneers from EleutherAI and Debian ML, and even an AI security expert from DARPA. Those conversations culminated in a podcast we created called Deep Dive AI, which I highly recommend to anyone interested in this topic.
Through those early discussions, it became clear that AI and machine learning are not software in the traditional sense. Concepts like “source code,” which had been well-defined in software thanks to people like Richard Stallman and the GNU GPL, didn’t apply 1:1 to AI. We didn’t even know what the “program” was in AI, nor could we easily determine the “preferred form for making modifications”—a cornerstone of Open Source licensing.
This realization sparked the need to adapt the Open Source principles we all know so well to the unique world of AI.
Co-designing the future of Open Source AI
Once we understood the scope of the challenge, we knew that creating this definition couldn’t be a solo endeavor. It had to be co-designed with the global community. At the start of 2023, we had limited resources—just two full-time staff members and a small budget. But that didn’t stop us from moving forward. We began fundraising to support a multi-stakeholder, global conversation about what Open Source AI should look like.
We brought on Mer Joyce, a co-design expert who introduced us to creative methods that ensure decisions are made with the community, not for it. With her help, we started breaking the problem into smaller pieces and gathering insights from volunteers, AI experts and other stakeholders. Over time, we began piecing together what would eventually become v.0.0.9 of the Open Source AI Definition.
By early 2024, we had outlined the core principles of Open Source AI, drawing inspiration from the free software movement. We relied heavily on foundational texts like the GNU Manifesto and the Four Freedoms of software. From there, we built a structure that mirrored the values of freedom, collaboration and openness, but tailored specifically to the complexities of AI.
Addressing the unique challenges of AI
Of course, defining the freedoms was only part of the battle. AI and machine learning systems posed new challenges that we hadn’t encountered in traditional software. One of the key questions we faced was: What is the preferred form for making modifications in AI? In traditional software, this might be source code. But in AI, it’s not so straightforward. We realized that the “weights” of machine learning models—those parameters fine-tuned by data—are crucial. However, data itself doesn’t fit neatly into the Open Source framework.
This was a major point of discussion during the session. Code and weights need to be covered by an OSI-approved license because they represent the modifiable core of AI systems. However, data doesn’t meet the same criteria. We concluded that while data is essential for understanding and studying the system, it’s not the “preferred form” for making modifications. Rather, the data information and code requirements allow Open Source AI systems to be forked by third-party AI builders downstream using the same information as the original developers. These forks could include removing non-public or non-open data from the training dataset, in order to retrain a new Open Source AI system on fully public or open data. This insight was shaped by input from the community and experts who joined our study groups and voted on various approaches.
The road ahead: a collaborative future
As we wrap up this phase, the next step is gathering even more feedback from the community. The definition isn’t final yet, and it will continue to evolve as we incorporate insights from events like this summit. I’m incredibly grateful for the thoughtful comments we’ve already received from people all over the world who have helped guide us along this journey.
At the core of this project is the belief that Open Source AI should reflect the same values that have made Open Source a force for good in software development. We’re not there yet, but together, we’re building something that will have a lasting impact—not just on AI, but on the future of technology as a whole.
I want to thank everyone who has contributed to this project so far. Your dedication and passion are what make Open Source so special. Let’s continue to shape the future of AI, together.
Is “Open Source” ever hyphenated?
No! Open Source is never hyphenated when referring to software. If you’re familiar with English grammar you may have more than an eyebrow raised: read on, we have an explanation. Actually, we have two.
We asked Joseph P. De Veaugh-Geiss, a linguist and KDE’s project manager, to provide us with an explanation. If that’s not enough, we have one more argument at the end of this post.
Why Open Source is not hyphenated
In summary:
- “open source” (no hyphen) is a lexicalized compound noun which is no longer transparent with respect to its meaning (i.e., open source is not just about being source-viewable, but also about defining user freedoms) and which can then be further compounded (for example, into “open source license”);
- by contrast, “open-source” (with a hyphen) is a compound modifier modifying the head noun (e.g. “intelligence”) with open having a standard dictionary meaning (i.e., “transparent” or “open to or in view of all”).
“Open source” is a lexicalized compound noun. Although it originates with the phrase “open source software”, today “open source” is itself a unique lexeme. An example, in Red Hat’s article:
Open source has become a movement and a way of working that reaches beyond software production.
The word open in “open source” does not have the meaning “open” as one would find in the dictionary. Instead, “open source” also entails user freedoms, inasmuch as users of the software for any purpose do not have to negotiate with the rights owners to enjoy (use/improve/share/monetise) the software. That is, it is not only about transparency.
A natural example of this usage, in which the phrase open source license is clearly about more than just licensing transparency:
“Because Linux is released under an open source license, which prevents restrictions on the use of the software, anyone can run, study, modify, and redistribute the source code, or even sell copies of their modified code, as long as they do so under the same license.” (from the Red Hat website https://www.redhat.com/en/topics/open-source/what-is-open-source)
Note that “open source license” is itself a compound noun phrase made up of the lexicalized compound noun “open source” + the noun “license”; same for “open source movement”, etc.
What is lexicalization?
According to the Lexicon of linguistics (Utrecht University), ‘lexicalization’ is a “phenomenon by which a morphologically complex word starts to behave like an underived word in some respect, which means that at least one feature (semantic, syntactic, or phonological) becomes unpredictable”.
Underived word here means the phrase has a specific, unique meaning not (necessarily) transparent from its component parts. For instance, a “black market” is not a market which is black but rather a specific kind of market: an illegal one. A “blackboard” can be green. In other words, the entire complex phrase can be treated as a single unit of meaning stored in the mental lexicon. The meaning of the phrase is not derived using grammatical rules.
Today, the meaning of open source is unpredictable or semantically intransparent given its usage (at least by a subset of speakers) and meaning, i.e., open source is about user freedoms, not just transparency.
Other examples of lexicalized compound nouns include “yellow journalism”, “purple prose”, “dirty bomb”, “fat chance”, “green card”, “blackbird”, “greenhouse”, “high school”, etc. I tried to think of examples which are composed of adjectives + nouns but with a specific meaning not derivable by the combination of the two. I am sure you can come up with many more!
In some cases, lexicalization results in writing the compound noun phrase together as a single word (‘blackboard’), in other cases not (‘green card’). One can also build larger phrases by combining the lexicalized compound noun with another noun (e.g., black market dealer, green card holder).
Hyphenated open-source is a compound modifier
By contrast, open in “open-source intelligence” is the dictionary meaning of “open”, i.e., “open to or in view of all” or “transparent”. In this case, open-source is a compound modifier/compound adjective with a meaning comparable to “source-viewable”, “source-available”, “source-transparent”.
For compound modifiers, the hyphenation, though not obligatory, is common and can be used to disambiguate. The presence of a head noun like “intelligence” or “journalism” is obligatory for the compound-modifier use of open-source, unlike in lexicalized compounds.
Examples of other compound modifiers + a head noun: “long-term contract”, “single-word modifier”, “high-volume printer”, etc.
Examples
There are some examples of the compound-modifier use on Wikipedia where I think the difference between the lexicalized-compound-noun meaning and the compound-modifier meaning becomes clear:
“Open-source journalism, a close cousin to citizen journalism or participatory journalism, is a term coined in the title of a 1999 article by Andrew Leonard of Salon.com.” (from Wikipedia)
“Open-source intelligence” is intelligence “produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement” (from Wikipedia)
In these examples open-source clearly refers to transparent, viewable-to-all sources and not to something like ‘guaranteeing user freedoms’. Moreover, my intuition for these examples is that removing the hyphen would change the meaning, however subtly, and could make the original sentences incoherent (unless the reader implicitly corrects them while reading):
- “open source journalism” would refer to journalism about open source software (in the lexicalized-compound sense above), not transparent, participatory journalism;
- “open source intelligence” would refer to intelligence about open source software (in the lexicalized-compound sense above, whatever that would mean!), not intelligence from publicly available information.
If that explanation still doesn’t convince you, we invoke the rules of branding and “pull a Twitter”, which vandalized English with its Who To Follow: we say no hyphen!
Luckily others have already adopted the “no hyphen” camp, like the CNCF style guide. Debate closed.
If you like debates, let’s talk about capitalization: OSI in its guidelines chose to always capitalize Open Source because it is a proper noun with a specific definition. Which camp are you on?