Open Source Initiative

The steward of the Open Source Definition, setting the foundation for the Open Source Software ecosystem.

Unveiling ClearlyDefined: this free SBOM service gets cleared for takeoff

Thu, 2024-05-16 09:43

With all the buzz around SBOMs and Open Source supply chain compliance and security, a new revolution is igniting at ClearlyDefined. This amazing project has been flying under the radar since its inception six years ago, but now this free service and open source project from the Open Source Initiative (OSI) gets cleared for takeoff with the launch of a new website focused on stellar documentation, excellent engineering, and healthy community growth.

Generating SBOMs at scale for each stage on the supply chain, for every build or release, has proven to be a real challenge for organizations. And fixing the same missing or wrongly identified licensing metadata over and over again has been a redundant pain for everyone. This is where ClearlyDefined shines, as it makes it really easy for organizations to fetch a cached copy of licensing metadata for each component through a simple API, which is always up-to-date thanks to its crowdsourced database.
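As a rough illustration of what fetching licensing metadata through the API could look like, here is a minimal Python sketch. It assumes the public endpoint `https://api.clearlydefined.io`, a coordinate layout of type/provider/namespace/name/revision, and a `licensed.declared` field in the response; the helper names are hypothetical, and all of these details should be checked against the ClearlyDefined documentation before use.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the public ClearlyDefined service is reachable at this root.
API_ROOT = "https://api.clearlydefined.io"


def definition_url(type_, provider, namespace, name, revision):
    """Build the URL for one component's definition.

    Assumption: coordinates are ordered type/provider/namespace/name/revision,
    with "-" standing in for an empty namespace.
    """
    parts = [type_, provider, namespace or "-", name, revision]
    return API_ROOT + "/definitions/" + "/".join(quote(p, safe="") for p in parts)


def declared_license(definition):
    """Return the declared license from a definition dict, or None if absent.

    Assumption: the license sits under definition["licensed"]["declared"].
    """
    return definition.get("licensed", {}).get("declared")


def fetch_definition(url, timeout=10):
    """Download and decode one definition (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A caller would combine the helpers, for example `declared_license(fetch_definition(definition_url("npm", "npmjs", None, "lodash", "4.17.21")))`. URL construction and parsing are kept separate from the network call so that they can be checked offline.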

The all-new ClearlyDefined website was completely revamped to welcome community members and foster collaboration united by a shared vision of Open Source excellence. The website is divided into three sections: Docs, Resources, and Community.

Under Docs, both new and existing community members will find several comprehensive guides and tutorials. The main guide is “Getting involved,” where members will embark on a journey to learn how to use the data, curate the data, contribute data, contribute code, add a harvest and adopt practices. The “Roles” guide provides a detailed description of how different roles can master ClearlyDefined, from data consumer and data curator to data contributor and code contributor. Other guides that will expand in the coming months include the “Curation” and “Harvest” guides. Curation is the process of fixing or identifying missing licensing metadata and sharing that with the community, while harvest is the process of fetching licensing metadata directly from the source (package managers like npm and PyPI), processing the license definitions, and making them available through an API.

Under Resources, members will find a rich collection of content: Blog, FAQ, Glossary, Providers, Architecture and Roadmap. The roadmap was created in collaboration with members of the community, who provided input into what they would like to see in 2024 and how they would be able to contribute towards these goals.

Under Community, members will find links to various channels where they can engage with others online or in-person: GitHub, Forum, Events and Meetings. They’ll also find a list of other community members with whom they can forge connections, as well as the Code of Conduct and the project Charter.
We would like to extend a heartfelt thank you to our existing community members who have been instrumental in the launch of the new website and welcome new ones who are learning about the project. Besides expanding the “Curation” and “Harvest” guides, next steps include enhancing the user experience by implementing sitewide search and adding case studies filled with rich media. Come and join the ClearlyDefined community here and get ready to take off together with us. Let’s define the future of Open Source, one definition at a time!

Categories: FLOSS Research

The Open Source AI Definition gets closer to reality with a global workshop series

Wed, 2024-05-15 08:05

The OSI community is traveling to five continents seeking diverse input on how to guarantee the freedoms to use, study, share and modify Open Source AI systems.

SAN FRANCISCO – May 14, 2024 – The Open Source Initiative (OSI), globally recognized by individuals, companies and public institutions as the authority that defines Open Source, is driving a global multi-stakeholder process to define “Open Source AI.” This definition will provide a framework to help AI developers and users determine if an AI system is Open Source or not, meaning that it’s available under terms that allow unrestricted rights to use, study, modify and share. There are currently no accepted means by which openness can be validated for AI, yet many organizations are claiming their AI to be “Open Source.” Just as the Open Source Definition serves as the globally accepted standard for Open Source software, so will the Open Source AI Definition act as a standard for openness in AI systems and their components.

In 2022 the OSI started an in-depth global initiative to engage key players, including corporations, academia, the legal community and organizations and nonprofits representing wider civil society, in a collaborative effort to draft a definition of Open Source AI that ensures that society at large can retain agency and control over the technology. The project has increased in importance as legislators around the world started regulating AI, asking for feedback as guardrails are defined.

This open process has resulted in a massive body of work including podcasts, panel discussions, webinars, published reports, and a plethora of town halls, workshops and conference sessions around the world. A big emphasis was given to make the process as inclusive and representative as possible: 53% of the working groups were composed of people of color. Women and femmes, including transgender women, accounted for 28% of the total and 63% of those individuals are women of color. 

After months of weekly town hall meetings, draft releases and reviews, the OSI is nearing a stable version of the Open Source AI Definition. Now, the OSI is embarking on a roadshow of workshops to be held on five continents to solicit input from diverse stakeholders on the draft definition. The goal is to present a stable version of the definition in October at the All Things Open event in Raleigh, North Carolina. This “Open Source AI Definition Roadshow” is sponsored by the Alfred P. Sloan Foundation, and OSI’s sponsors and donors.

“AI is different from regular software and forces all stakeholders to review how the Open Source principles apply to this space,” said Stefano Maffulli, executive director of the OSI. “OSI believes that everybody deserves to maintain agency and control of the technology. We also recognize that markets flourish when clear definitions promote transparency, collaboration and permissionless innovation. After spending almost two years gathering voices from all over the world to identify the principles of Open Source suitable for AI systems, we’re embarking on a worldwide roadshow to refine and validate the release candidate version of the Open Source AI Definition.”

The schedule of workshops is as follows: 

  • North America
  • Europe
  • Africa
    • Nigeria, Lagos, August tentative
  • Asia Pacific
    • Hong Kong, AI_Dev (August 23)
    • Asia – details TBD, DPGA members meeting (November 12 – 14)
  • Latin America
    • Argentina, Buenos Aires, Nerdearla (September 24 – 28)

For weekly updates, town hall recordings and access to all the previously published material, visit

Supporters of the Open Source AI Definition Process

The Deep Dive: Defining Open Source AI co-design process is made possible thanks to grant 2024-22486 from Alfred P. Sloan Foundation, donations from Google Open Source, Cisco, Amazon and others, and donations by individual members. The media partner is
Others interested in offering support can contact OSI at

Categories: FLOSS Research

Open Source AI Definition – Weekly update May 13

Tue, 2024-05-14 11:08

Early thoughts on “Apple sample code license”?
  • Apple has released a license to distribute its new model, OpenELM. The license looks BSD/MIT-like with the exclusion of patents. According to you, does it seem OSD compliant?
    • Initial thoughts:
    • @pchestek added that the license appears to be similar to open source but raises concerns about potential limitations on rights, particularly regarding patents. It highlights Apple’s approach of granting only a copyright license, which might not be sufficient for ensuring all necessary freedoms, especially in the context of AI models
    • @shujisado agreed, saying that the terms related to trademarks and patents need to be scrutinized
Question regarding the 0.0.8 version 
  • @Aspie96 asks clarifying questions regarding the list of open components and points out that, unlike “traditional” software, which can be released as Open Source just as easily as proprietary software, this definition seems to require a lot more components to be open.
    • Stefano points out that “The “classic” Open Source Definition is applied to licenses, not to the software” and “if a program is shipped with a license approved by the OSI then the software is considered Open Source”
    • He further states that “Through the co-design process of the Open Source AI Definition we learned that to use, study, share and modify an ML system one needs a complex combo of multiple components each following diverse legal regimes (not just the usual copyright+patents.) Therefore we must describe in more details what is required to grant users the agency and control expected.”
The FAQ page is being developed
  • The frequently asked questions page is starting to take form
  • We add relevant questions that have arisen from the forums so far, though if you have any contributions in mind, please leave a comment!
Open Source Initiative at PyCon!

This week, Stefano, Mer and the OSI team are visiting Pittsburgh, PA, hosting the first workshop of our Open Source AI Definition Roadshow! We are starting to get more in-person feedback on our draft definition.

If you are at PyCon, come visit us on the 17th, from 11 am to 1 pm in the Open Space area!

Categories: FLOSS Research

Why datasets built on public domain might not be enough for AI

Tue, 2024-05-07 06:00

There is tension between copyright laws and large datasets suitable to train large language models. Common Corpus is a dataset that only uses text from copyright-expired sources to bypass the legal issues. It’s a useful achievement, paving the path to research without immediate risk of lawsuits. I also fear that this approach may lead to bad policies, reinforcing the power of copyright holders; not the small creators but large corporations. 

A dataset built on public domain sources

In March 2024 Common Corpus was released as an open access dataset for training large language models (LLMs). Announcing the release, the lead developer Pierre-Carl Langlais says “Common Corpus shows it is possible to train fully open LLMs on sources without copyright concerns.” The dataset contains 500 billion words in multiple European languages and different cultural heritages. It is a project coordinated by the French startup Pleias and supported by organizations committed to open science such as Occiglot, Eleuther AI and Nomic AI as well as being partly funded by the French government. The stated intention of Common Corpus is to democratize access to large quality datasets. It has many other positive characteristics, highlighted also by Open Future’s summary of a talk given by Langlais.

The commons needs more data

The debates sparked by the Deep Dive: AI process on the role of training data highlighted that AI practitioners encounter many obstacles assembling datasets. At the same time, we discovered that tech giants have an incredible advantage over researchers and startups. They’ve been slurping data for decades, have the financial means to go to court and can enter into bilateral agreements to license data. These strategies are inaccessible to small competitors and academics. Accepting that the only path to creating open large datasets suitable to train Open Source AI systems is to use sources in the public domain risks cementing the dominant positions of existing large corporations.

The open landscape already faces issues with big tech and their ability to influence legislation. The big corporations have lobbied to extend the duration of copyright, introduced the DMCA, are opposing the right to repair, and have the resources to continue lobbying and sue any new entrant who they deem to get too close. There are plenty of examples showing an unequal advantage in protecting what they think is theirs. The non-profit Fairly Trained certifies companies “willing to prove that they’ve trained their AI models on data that they own, have licensed, or that is in the public domain,” respecting copyright law: who’s going to benefit from this approach?

Unsuitable for public policies

Initiatives like Common Corpus and The Stack (used to train Starcoder2) are important achievements as they allow researchers to develop new AI systems while mitigating the risk of being sued. They also push the technical boundaries of what can be achieved with smaller datasets that don’t require a nuclear power plant to train new models. But I think they mask the underlying issue: AI needs data and limiting open datasets to only public domain sources will never give them a chance to match the size of the proprietary ones. The lobby for copyright maximalists is always looking for ways to expand scope and extend terms for copyright laws, and when they succeed it is a one-way ratchet. It would be a tragedy for society if legislators listened to their sophistry and made new laws doing this based on the apparent consensus that creators need protection from AI.
The role of data for training machine learning systems is a divisive topic and a complex one. Having datasets like Common Corpus is a very useful way for the science of AI to progress with better sources. For policies, we’d be better off pushing for something like the proposal advanced by Open Future and Creative Commons in their paper Towards a Books Data Commons for AI Training.

Categories: FLOSS Research

Open Source AI Definition – Weekly update May 6

Mon, 2024-05-06 12:02
Definition validation: Seeking volunteers

The process has entered a new phase: We are now seeking volunteers to validate the Open Source AI Definition, using it to review existing AI systems. The objective of the phase is to confirm that the Definition works as intended and understand where it fails.  

  • A spreadsheet is given where you locate and link to the license, research paper, or other document that grants rights or provides information for each required component. 
  • Systems include, but are not limited to:
    • Arctic
    • BLOOM
    • Falcon
    • Grok
    • Llama 2
    • Mistral
    • OLMo
    • OpenCV
    • Phi-2
    • Pythia
    • T5
  • To volunteer by May 20th, please contact Mer on the forum
Summary of comments received on the Definition draft
  • Grammatical and wording corrections 
    • Some minor grammatical suggestions were made. These change and order the layout slightly differently, though the overall message remains. 
    • One user suggested explaining what Open Source is under the “preamble” and “Why we need open source AI”. Instead of speaking about why Open Source is important, the section should rather be an introduction to what it is and why it matters for AI.
    • Under “Preferred form to make modifications to machine-learning systems” and “data information”, clarification is needed regarding “the training data set used”. It is not clear whether this means that all training data must be open source for the whole model to be.
      • Stefano Maffulli added here that the intention is to know what dataset was used, not to necessarily have it made available, and that it indeed seems to need clarification
  • Technical points
    • Under “Preferred form to make modifications to machine-learning systems” the release of checkpoints is mentioned as an example of required components, under “model parameters”. An objection was raised, arguing that this poses an unnecessary burden: It’d be like requiring that for software to be Open Source, it should include past versions of the program.
      • Maffulli reiterated that this was merely an example but that this might need to be a submission to the FAQ page
    • Under “Preferred form to make modifications to machine-learning systems” and “data information”, a “skilled person” is mentioned in the context of requiring sufficient information about the training data used to create a model. A question was raised about what skill has to do with acquiring data.
      • Clarification was given by Maffulli, pointing out that this is in the context of getting information about the data so that a “skilled person” can use, study, share and modify the AI system.
      • A user suggested that this confusion can be solved by changing the context of the wording “a skilled person can recreate”. From “using the same or similar data” to “if able to gain access to the same or similar data”.
      • A user points out that “skilled person”, as a legal term used in patent law, might not be appropriate as it has different legal connotations and precedents in different countries.
  • Discussion on why specifically we focus on machine learning (ML) as an AI system
    • A question was raised regarding why we explicitly mention ML systems under “preferred form to make modification to an ML system” and subsequently the “checklist”, pointing out that not all AI systems are ML.
      • Maffulli replied that the draft addresses ML systems because they need special and urgent attention, whereas rule-based AI systems can fit under the Open Source Definition. This needs to be addressed in the FAQ.
Town hall announcement 
  • The 9th town hall meeting was held on the 3rd of May. Access the recording here if you missed it!
Categories: FLOSS Research

CRA standards request draft published

Thu, 2024-05-02 08:19

The European Commission recently published a public draft of the standards request associated with the Cyber Resilience Act (CRA). Anyone who wants to comment on it has until May 16, after which comments will be considered and a final request to the European Standards Organizations (ESOs) will be issued. This process is all governed by Regulation (EU) No 1025/2012, which will be discussed in a future post.

The publication of this draft is important for every entity that will have duties under the CRA, namely “manufacturers” and “software stewards.” Conformance with the harmonized standards that emerge from this process will allow manufacturers to CE-mark their software on the presumption it complies with the requirements of the CRA, without taking further steps.

For those who depend on incorporating or creating Open Source software, there is an encouraging new development found here. For the first time in a European standards request, there is an express requirement to respect the needs of Open Source developers and users. Recital 10 tells each standards organization the following:

“where relevant, particular account should be given to the needs of the free and open source software community”

That is made concrete in Article 2 which specifies:

“The work programme shall also include the actions to be undertaken to ensure effective participation of relevant stakeholders, such as small and medium enterprises and civil society organizations, including specifically the open source community where relevant”

Article 3 requires proof that effective participation has been facilitated. The community is going to have to step up to help the ESOs satisfy these requirements—or corporations claiming to speak for the community will do it instead.

OSI applauds the Commission’s steps to include the Open Source community and will be pleased to work with the European standards organizations towards that initial goal of effective representation and consultation. Additionally, the OSI will:

  • Work with our Affiliates to identify additional suitable participants with relevant skills and experience, and make connections between them and the ESOs.
  • Assist the Commission in validating responses to Article 3.

Our goal is to ensure that the development and use of Open Source software is at best facilitated and at worst not obstructed by any aspect of the standards development process, the resulting harmonized standards, and the access and IPR terms of those standards.

This post may be discussed on our forum

Categories: FLOSS Research

Open Source AI Definition – Weekly update April 29

Mon, 2024-04-29 07:59
New draft of the Open Source AI Definition v.0.0.8 is live!
  • The draft is ready for feedback
  • The changelog: 
    • incorporated feedback from legal review in Gothenburg and 0.0.7
      • transformed Data transparency to Data information following feedback from the
      • separated the Out of scope section to a FAQ document
      • added mention of frictionless in the preamble
      • moved the definition of preferred form to make modifications to ML above the checklist
    • updated language to follow the latest version of the Model Openness Framework
    • added the legal requirements for optional components
    • the first incarnation of the FAQ added
  • The next steps now include:
    • Widen the call for reviewers in the next couple of weeks
    • Test the Definition with more AI systems (OLMo, Phi, Mistral, etc)
    • Run a review workshop at PyCon US
Initial reactions 
  • Question regarding why under “Preferred form to make modifications to machine-learning systems” and “model”, mention of model weights has been removed. 
Vote on how to describe the acceptable terms to receive documentation?
  • As part of the next steps, we are continuing to review legal documents from different AI systems to test our definition. When describing the terms listed on the 0.0.8 draft under “checklist to evaluate machine learning systems”, should we call them OSD Compliant or OSD Compatible?
    • This matters as it has different implications for documentation for the components in the class of Data transparency: There is no formal definition of “open documentation” and the OSI hasn’t reviewed licenses used for documentation.
  • A user has concerns with both, stating that:
    • OSD-compliant means that documentation needs to be under a license that fulfils all ten OSD criteria, and many of those are quite software-specific. This could be tricky: there is a reason why OSI hasn’t approved (m)any non-software licenses. OSD-compatible is problematic too: many proprietary licenses are compatible with many (non-copyleft) OSD-compliant licenses, so the term can lose its meaning.
  • Maffulli replies stating that:
    • The main difference he sees lies in their perceived legal strictness, where “Compatible suggests a lightweight review that anyone can do”
    • He further suggests that OSI could create a special category of licenses for documentation only. When stating that documentation of Open Source AI needs to be available with OSD-compliant terms, do we need to create a special category of OSI Approved Licenses for documentation?
    • He further adds that he reads “compliant” not in terms of existing licenses but rather in terms of the checklist
  • Regarding creating a “special category of license for documentation only”, a user adds:
    • “We need that the documentation is free from restrictions that would limits its circulation, including by requiring seeking additional permission or requiring royalties or requiring audited distribution or the likes.” and its scope therefore is quite limited.
FAQ document has been created 
  • An FAQ needs to be written to address concerns heard often during the drafting process. The document is a work in progress and is waiting for contributions.
See if OSI is coming near you to host a workshop
  • The Open Source AI Definition is going on tour to get a wide array of reviews. This is important to ensure thorough reviews and secure global significance. Check the dates of the roadshow.
Categories: FLOSS Research

Openly Shared

Fri, 2024-04-26 08:02

The definition of “open source” in the most recent version (article 2(48)) of the Cyber Resilience Act (CRA) goes beyond the Open Source Definition (OSD) managed by OSI. It says:

“Free and open-source software is understood as software the source code of which is openly shared and the license of which provides for all rights to make it freely accessible, usable, modifiable and redistributable.”

The addition of “openly shared” was a considered and intentional addition by the co-legislators – they even checked with community members that it did not cause unintended effects before adding it. While open source communities all “openly share” the source code of their projects, the same is not true of some companies, especially those with “open core” business models.

For historical reasons, it is not a requirement either of the OSD or of the FSF’s Free Software Definition (FSD) and the most popular open source licenses do not require it. Notably, the GPL does not insist that source code be made public – only that those receiving the binaries must be able to request the corresponding source code and enjoy it however they wish (including making it public).

For most open source projects and their uses, the CRA’s extra requirement will make no difference. But it complicates matters for companies that either restrict source availability to paying customers (such as Red Hat), make little distinction between available and non-available source (such as ForgeRock), or withhold source for certain premium elements.

A similar construct{1} is used in the AI Act (recital 102) and I anticipate this trend will continue through other future legislation. Personally I welcome this additional impetus to openness.

This post may be discussed on our forum

{1} The mention in the AI Act has a different character to that in the CRA. In the AI Act it is more narrative, restricted to a recital and is a subset of attributes of the license. In this form it actually refers to virtually no OSI-approved licenses. In the CRA the wording is part of the formal definition in an Article, so it is much more impactful, and it adds an additional requirement over the basic requirements of licensing.

Categories: FLOSS Research

Open Source AI Definition on the road: Looking back and forward

Tue, 2024-04-23 13:15

With version 0.0.7 of the Open Source AI Definition just published, we are getting very close to a release candidate version in June, as planned. We’ve covered a lot of ground since FOSDEM 2024, where we presented draft 0.0.5. This month we presented at Open Source Summit North America (OSS NA 24) and ran a co-design workshop at the Legal and Licensing Workshop (LLW) in Gothenburg. We’re very close to “feature complete”: below are the next steps and ideas on how you might get involved.

Opportunities to meet in person

We are taking the draft definition on the road and coming to a town near you! Or, kind of, that is if you live in any of the following cities or happen to be there on the given dates:

  1. North America 
    1. USA, Pittsburgh, PyCon US (May 17)
    2. USA, NYC, OSPOs for Good (July 9-11)
    3. USA, Raleigh, All Things Open (October 27-29)
  2. Europe
    1. France, Paris, OW2 (June)
    2. France, Paris, data governance event (September)
  3. Africa
    1. Nigeria, Lagos, Sustain Africa (June)
  4. Latin America
    1. Argentina, Buenos Aires, Nerdearla (September 24-28)
  5. Asia Pacific
    1. Hong Kong, AI_Dev (August 23)

It’s important for you to catch up.

Draft v.0.0.5 at FOSDEM 2024

The talk “Moving a Step Closer to Defining Open Source AI” (click here to watch the recorded live stream) by Stefano Maffulli presented draft v.0.0.5, released a few days before. The process at the time was focusing on finding the required components to “use, study, share and modify” an AI system. 

Maffulli quickly summarized why OSI started the Deep Dive: AI process, after Copilot not only demonstrated machines’ ability to write functioning code but also highlighted the new role of data as input to the machine learning system. Recognizing there is no simple answer to the question “what is the source code of Copilot?”, Maffulli focused OSI’s attention on finding the Open Source principles applied to AI together with stakeholders from academia, legal communities, tech companies, and civil rights groups.

Building the framework

OSI defined a process to co-design the Open Source AI Definition in public. This framework encompasses a clear definition of AI systems, a preamble outlining the rationale behind open source AI, a concise articulation of the freedoms users should enjoy, and a checklist for evaluating AI components and associated legal documents.

Maffulli highlighted the rapid progress and policy decisions that shaped the trajectory of software development, emphasizing the need to compress decades of evolution into a few months in the realm of AI. He also emphasized the importance of community feedback and collaboration in refining the definition of Open Source AI. With monthly draft releases, bi-weekly town halls, and an active forum, we gather diverse perspectives and insights to craft a robust definition.

OSS North America 2024 and next steps

Since FOSDEM, the Definition has reached version 0.0.7. First, working groups analyzed Pythia, OpenCV, Llama2 and Bloom  to find the preferred form of making modifications to the AI system, the fundamental unit for users to exercise their freedoms. Later, the groups shifted focus to reviewing the legal frameworks used by the components used by Pythia, OpenCV, Llama2 and Bloom. Together with the definition of AI system provided by the OECD, the preamble, out-of-scope issues and four freedoms, this draft looks very close to a full document. A new version is expected to be released very soon now. On the 16th of April, Ofer Hermoni of the Linux Foundation and Mer Joyce (OSI/DoBigGood) presented the work at the OSS NA 24 meeting in Seattle. A huge part of our job currently is getting this definition reviewed by as many stakeholders as possible. A far-reaching and diverse perspective is necessary as we aim for a global impact. 
To participate in shaping the definition of Open Source AI and stay updated on the latest developments, visit and engage with the ongoing discussions, participate and watch previous town hall meetings and draft releases. Go to to participate in our forum.

Categories: FLOSS Research

Open Source AI Definition – Weekly update April 22

Mon, 2024-04-22 10:42
Comments on the forum
  • A user added in the forum that there is an issue: traditional copyright protection might not apply to model weights because they are essentially mathematical calculations. “licensing them through any kind of copyright license will not be enforceable !! and this means that anybody can use them without any copyright restriction (assuming that they have been made public) and this means that you cannot enforce any kind of provisions such as attribution, no warranty or copyleft” They suggest using contractual terms instead of relying on copyright as a workaround, acknowledging that this will trigger a larger conversation.
Comments left on the definition text
  • Clarification needed under “What is Open Source AI”
  1. Discussion on whether “made available” should be changed to “released” or “distributed”
    1. One user pointed out that “made available” is the most appropriate, as the suggested wordings would be antagonistic and limiting
  2. Continuation of last week’s issue regarding defining who these four freedoms are for, deployers, users or someone else.
    1. Added that a user understands it as “We need essential freedoms to enable users…”
    2. But, then who are we defining as “Users”? Is it the person deploying the AI or the calling prompt?
    3. Another wording is suggested: “Open Source AI is an AI system that is made available under terms that grant, without conditions or restrictions, the rights to…”
  • Clarification is needed under “Preferred form to make modification to a machine learning system”
  1. Specifically to the claim: (The following components are not required,) but their inclusion in releases is appreciated.
    1. Clarification regarding whether this means best practice or is merely a suggestion.
    2. Suggestion to change the sentence to “The following components are not required to meet the Open Source AI definition and may be provided for convenience.” This will also “consider if those components are provided, can they be provided under different terms that don’t meet the Open Source AI definition, or do they fall under the same OSI compliant license automatically.”
  2. Question regarding the addition of “may” under data transparency in the 0.0.7 draft definition, which was not included in the 0.0.6 one, considering that the components are described as “required” in the checklist below
    1. (Context: “Sufficiently detailed information on how the system was trained. This may include the training methodologies and techniques, the training data sets used, information about the provenance of those data sets, their scope and characteristics; how the data was obtained and selected, the labelling procedures and data cleaning methodologies.”)
    2. Another user seconds this and further adds that it should be changed to “must”, or something else which is definitive.
Town Hall meeting was held on April 19th

In case you missed it, the town hall was held last Friday. Access the recordings and slides used here.

Categories: FLOSS Research

Open Source AI Definition – Weekly update April 15

Mon, 2024-04-15 12:40

Having just exited a very busy week, here are the two major milestones to know about.

Definition v.0.0.7 is out!
  • Access the definition here and the discussion of it here
  • The changelog:
    • Incorporating the comments to draft v.0.0.6 and results of the working group analysis
    • Removed reference to “the public” in the four freedoms, left the object (users) implied
    • Removed reference to ML systems following the text “Precondition to exercise these freedoms is to have access to the preferred form to make modifications to the system”
    • Separated the ‘checklist’ and made it specific to ML systems, based on the Model Openness Framework
    • Described in highly generic terms the conditions to access the model parameters
  • A concern was raised regarding the checklist making training data optional, potentially undermining the freedom to modify AI systems. This echoes previous debates we have had and likely will continue to have, regarding access to training data.
  • Discussion on the need to clarify licensing terms to ensure compliance with Open Source principles, suggesting a change to “Available under terms that satisfy the Open Source principles”.
    • Proposal to consider the Open Source Definition itself as a checklist, with a cautious approach suggested before dictating specific requirements for legal documents.
  • A comment left on the definition, rather than on the forum, noted the need to determine whether the freedoms outlined in the Open Source AI Definition should be granted to the deployer or the end user, considering differing access levels and implications for openness.
The results of the working groups are out!
  • Four different working groups connected with four different AI systems (Llama-2, Pythia, Bloom and OpenCV) have been reviewing legal documents and comparing them to the previous 0.0.6 checklist of the Open Source AI Definition
  • The goal was to see how well the documents align with the components as described in the checklist.
  • Go here to see the updated checklist
  • The changes can be described as follows:
    • Added legal framework for model parameters (including weights). The framework proposes that, if copyrightable, model parameters can be shared as code
    • Added the five (5) data transparency components from v.0.0.6 to the checklist under the category “Documentation,” along with legal frameworks

Categories: FLOSS Research

Submit your proposal for All Things Open – Doing Business with Open Source

Tue, 2024-04-09 09:28

The supply-side value of widely-used Open Source software is estimated to be worth $4.15 billion, and the demand-side value is much larger, at $8.8 trillion. And yet, maintaining a healthy business while producing Open Source software feels more like an art than a science.

The Open Source Initiative wants to facilitate discussions about doing business with and for Open Source.

If you run a business producing Open Source products or your company’s revenue depends on Open Source in any way, we want to hear from you! Share your insights on:

  • How you balance the needs of paying customers with those of partners and non-paying users
  • How you organize your sales, marketing, product and engineering teams to deal with your communities
  • What makes you decide where to draw the lines between pushing fixes upstream and maintaining a private fork
  • Where you see the value of copyleft in software-as-a-service
  • Why you chose a specific license for your product offering and how you deal with external contributions
  • What trends you see in the ecosystem and what effects these are having

We want to hear about these and other topics, from personal experiences and research. Our hope is to provide the ecosystem with accessible resources to better understand the problem space and find solutions.

How it works

We’re tired of panel discussions that start and end at a conference. We want to share knowledge to the widest possible base. We’re going to have a panel at All Things Open, with preparation work before the event.

  • You’ll send your proposal as a pitch: a title and abstract (300 words max) and a short bio.
  • Our staff will review the pitches and get back to you, selecting as many articles as deemed interesting for publication.
  • We’ll also pick the authors of five of the most interesting articles to be speakers at a panel discussion at ATO, on October 29 in Raleigh, NC. Full conference passes will be offered. 
  • Authors of accepted pitches will write a full article (1,200-1,500 words) to be published leading up to ATO.
  • We’ll also select other pitches that are worth developing into full-length articles but, for any reason, didn’t fit into the panel discussion.

Note: Please read and follow the guidelines carefully before submitting your proposal.

Submission Requirements
  • Applications should be submitted via web form
  • Add a title and a pitch, 300 words maximum
  • Include a brief bio, highlighting why you’re the right person to write about this topic
  • Submissions should be well-structured, clear and concise
Evaluation Criteria
  • Relevance to the topic
  • Originality and uniqueness of the submission
  • Clarity and coherence of argumentation
  • Quality of examples and case studies
  • Presenter’s expertise and track record in the field
  • Although the use of generative AI is permitted, pitches evidently written by AI won’t be considered
  • Submission deadline: May 17, 2024
  • Notification of acceptance: May 30, 2024
  • Accepted authors must submit their full article by June 30, 2024
  • Articles will be published between July 8 and October 10, 2024
  • The authors of the selected articles will be invited to join a panel by July 20, 2024
  • Event dates: Oct 28-29, 2024
What to Expect
  • Your submission will be reviewed by a panel of experts in the field
  • If accepted, you will be asked to produce a full article that will be published at

We look forward to receiving your submission!

Categories: FLOSS Research

Compelling responses to NTIA’s AI Open Model Weights RFC

Tue, 2024-04-09 08:03

The National Telecommunications and Information Administration (NTIA) posted a request for comments on Dual Use Foundation Artificial Intelligence Models with Widely Available Model Weights, and it has received 362 comments.

In addition to the Open Source Initiative’s (OSI) joint letter drafted by Mozilla and the Center for Democracy and Technology (CDT), the OSI has also sent a letter of its own, highlighting our multi-stakeholder process to create a unified, recognized definition of Open Source AI.

The following is a list of some comments from nonprofit organizations and companies.

Comments from additional nonprofit organizations
  • Researchers from Stanford University’s Human-centered AI (HAI) and Princeton University recommend that the federal government prioritize understanding of the marginal risk of open foundational models when compared to proprietary models, and create policies based on this marginal risk. Their response also highlighted several unique benefits from open foundational models, including higher innovation, transparency, diversification, and competitiveness.
  • Wikimedia Foundation recommends that regulatory approaches should support and encourage the development of beneficial uses of open technologies rather than depending on more closed systems to mitigate risks. Wikimedia believes open and widely available AI models, along with the necessary infrastructure to deploy them, could be an equalizing force for many jurisdictions around the world by mitigating historical disadvantages in the ability to access, learn from, and use knowledge.
  • EleutherAI Institute recommends Open Source AI and warns that restrictions on open-weight models are a costly intervention with comparatively little benefit. EleutherAI believes that open models enable people close to the deployment context to have greater control over the capabilities and usage restrictions of their models, study the internal behavior of models during deployment, and examine the training process and especially training data for signs that a model is unsafe to deploy in a specific use-case. They also lower barriers of entry by making models cheaper to run and enable users whose use-cases require strict guarding of privacy (e.g., medicine, government benefits, personal financial information) to use them.
  • MLCommons recommends the use of standardized benchmarks, which will be a critical component for mitigating the risk of models both with and without widely available open weights. MLCommons believes models with widely available open weights allow the entire AI safety community – including auditors, regulators, civil society, users of AI systems, and developers of AI systems – to engage with the benchmark development process. Together with open data and model code, open weights enable the community to clearly and completely understand what a given safety benchmark is measuring, eliminating any confounding opacity around how a model was trained or optimized.
  • The AI Alliance recommends regulation shaped by independent, evidence-based research on reliable methods of assessing the marginal risks posed by open foundation models; effective risk management frameworks for the responsible development of open foundation models; and balancing regulation with the benefits that open foundation models offer for expanding access to the technology and catalyzing economic growth.
  • The Alliance for Trust in AI recommends that regulation should protect the many benefits of increasing access to AI models and tools. The Alliance for Trust in AI believes that openness should not be artificially restricted based on a misplaced belief that this will decrease risk.
  • Access Now recommends that NTIA think broadly about how developments in AI are reshaping or consolidating corporate power, especially with regard to ‘Big Tech.’ Access Now believes in the development and use of AI systems in a sustainable, resource-friendly way that considers the impact of models on marginalized communities and how those communities intersect with the Global South.
  • Partnership on AI (PAI) recommends that NTIA’s work be informed by the following principles: all foundation models need risk mitigations; appropriate risk mitigations will vary depending on model characteristics; risk mitigation measures, for either open or closed models, should be proportionate to risk; and voluntary frameworks are part of the solution.
  • R Street recommends pragmatic steps towards AI safety, relying on multistakeholder processes to address problems in a more flexible, agile, and iterative fashion. The government should not impose arbitrary limitations on the power of Open Source AI systems, which could result in a net loss of competitive advantage.
  • The Computer and Communications Industry Association (CCIA) recommends assessment based on the risks, highlighting that open models provide the potential for better security, less bias, and lower costs to AI developers and users alike. The CCIA acknowledged that the vast majority of Americans already use systems based on Open Source software (knowingly or unknowingly) on a daily basis.
  • The Information Technology Industry Council (ITI) recommends adopting a risk-based approach with respect to open foundation models, since not all models pose an equivalent degree of risk, and notes that risk management is a shared responsibility across the AI value chain.
  • The Center for Data Innovation recommends that U.S. policymakers defend open AI models at the international level as part of its continued embrace of the global free flow of data. It also encourages them to learn lessons from past debates about dual-use technologies, such as encryption, and refrain from imposing restrictions on foundation models because such policies would not only be ultimately ineffective at addressing risk, but they would slow innovation, reduce competition, and decrease U.S. competitiveness.
  • The International Center for Law & Economics recommends that AI regulation must be grounded in empirical evidence and data-driven decision making. Demanding a solid evidentiary basis as a threshold for intervention would help policymakers to avoid the pitfalls of reacting to sensationalized or unfounded AI fears.
  • New America’s Open Technology Institute (OTI) recommends a coordinated interagency approach designed to ensure that the vast potential benefits of a flourishing open model ecosystem serve American interests, in order to counter or at least offset the trend toward dominant closed AI systems and continued concentrations of power in the hands of a few companies.
  • Electronic Privacy Information Center (EPIC) recommends that NTIA grapple with the nuanced advantages, disadvantages, and regulatory hurdles that emerge within AI models along the entire gradient of openness, highlighting that AI models with weights widely available may foster more independent evaluation of AI systems and greater competition compared to closed systems.
  • The Software & Information Industry Association (SIIA) recommends a risk-based approach to foundation models that considers the degree and type of openness. SIIA believes openness has already proved to be a catalyst for research and innovation by essentially democratizing access to models that are cost-prohibitive for many actors in the AI ecosystem to develop on their own.
  • The Future Society recommends that the government should establish risk categories (i.e., designations of “high-risk” or “unacceptable-risk”), thresholds, and risk-mitigation measures that correspond to evaluation outcomes. The Future Society is concerned that overly restrictive policies could lead to market concentration, hindering competition and innovation in both industry and academia. A lack of competition in the AI market can have far-reaching knock-on consequences, including potentially stifling efforts to improve transparency, safety, and accountability in the industry. This, in turn, can impair the ability to monitor and mitigate the risks associated with dual-use foundation models and to develop evidence-based policymaking.
  • The Software Alliance (BSA) recommends that NTIA avoid restricting the availability of open foundation models; ground policies that address risks of open foundation models on empirical evidence; and encourage the implementation of safeguards to enhance the safety of open foundation models. BSA recognizes the substantial benefits that open foundation models provide to both consumers and businesses.
  • The US Chamber of Commerce recommends that NTIA make decisions based on sound science and not unsubstantiated concerns that open models pose an increased risk to society. The US Chamber of Commerce believes that open-source technology allows developers to build, create, and innovate in various areas that will drive future economic growth.
Comments from companies
  • Meta recommends that NTIA establish common standards for risk assessments, benchmarks and evaluations informed by science, noting that the U.S. national interest is served by the broad availability of U.S.-developed open foundation models. Meta highlighted that open source democratizes access to the benefits of AI, and that these benefits are potentially profound for the U.S., and for societies around the world.
  • Google recommends a rigorous and holistic assessment of the technology to evaluate benefits and risks. Google believes that open models allow users across the world, including in emerging markets, to experiment and develop new applications, lowering barriers to entry and making it easier for organizations of all sizes to compete and innovate.
  • IBM recommends preserving and prioritizing the critical benefits of open innovation ecosystems for AI for increasing AI safety, advancing national competitiveness, and promoting democratization and transparency of this technology. 
  • Intel recommends accountability for responsible design and implementation to help mitigate potential individual and societal harm. This includes establishing robust security protocols and standards to identify, address, and report potential vulnerabilities. Intel believes openness not only allows for faster advancement of technology and innovation, but also faster, transparent discovery of potential harms and community remediation and address. Intel also believes that open AI development is essential to facilitate innovation and equitable access to AI, as open innovation, open platforms, and horizontal competition help offer choice and build trust.
  • Stability AI recommends that regulation must support a diverse AI ecosystem – from the large firms building closed products to the everyday developers using, refining, and sharing open technology. Stability AI recognizes that open models promote transparency, security, privacy, accessibility, competition, and grassroots innovation in AI.
  • Hugging Face recommends establishing standards for best practices building on existing work and prioritizing requirements of safety by design across both the AI development chain and its deployment environments. Hugging Face believes that open-weight models contribute to competition, innovation, and broad understanding of AI systems to support effective and reliable development.
  • GitHub recommends that regulatory risk assessment weigh empirical evidence of possible harm against the benefits of widely available model weights. GitHub believes open source and widely available AI models support research on AI development and safety, as well as the use of AI tools in research across disciplines. To date, researchers have credited these models with supporting work to advance the interpretability, safety, and security of AI models; to advance the efficiency of AI models, enabling them to use fewer resources and run on more accessible hardware; and to advance participatory, community-based ways of building and governing AI.
  • Microsoft recommends cultivating a healthy and responsible open source AI ecosystem and ensuring that policies foster innovation and research. This will be achieved through direct engagement with open source communities to understand the impact of policy interventions on them and, as needed, calibrations to address risks of concern while also minimizing negative impacts on innovation and research.
  • Y Combinator recommends that NTIA and all stakeholders realize the immense promise of open-weight AI models while ensuring this technology develops in alignment with our values. Y Combinator believes the degree of openness of AI models is a crucial factor shaping the trajectory of this transformative technology. Highly open models, with weights accessible to a broad range of developers, offer unparalleled opportunities to democratize AI capabilities and promote innovation across domains. Y Combinator has seen firsthand the incredible progress driven by open models, with a growing number of startups harnessing these powerful tools to pioneer groundbreaking applications.
  • AH Capital Management, L.L.C. (a16z) recommends that NTIA be wary of generalized claims about the risks of Open Models and calls to treat them differently from Closed Models, especially those made by AI companies seeking to insulate themselves from market competition. a16z believes Open Models promote innovation, reduce barriers to entry, protect against bias, and allow such models to leverage and benefit from the collective expertise of the broader artificial intelligence (“AI”) community.
  • Uber recommends promoting widely available model weights to spur innovation in the field of AI. Uber believes that, by democratizing access to foundational AI models, innovators from diverse backgrounds can build upon existing frameworks, accelerating the pace of technological advancement and increasing competition in the space. Uber also believes widely available model weights, source code, and data are necessary to foster accountability, facilitate collaboration in risk mitigation, and promote ethical and responsible AI development.
  • Databricks recommends that regulation of highly capable AI models focus on consumer-facing deployments and high-risk deployments, with the obligations focused on the deployer. Databricks believes that the benefits of open models substantially outweigh the marginal risks, so open weights should be allowed, even at the frontier level.
Categories: FLOSS Research

Open Source AI Definition – Weekly update April 8

Mon, 2024-04-08 13:15
Seeking document reviewers for OpenCV
  • This is your final opportunity to register for the review of licenses provided by OpenCV. Join us for our upcoming phase, where we meticulously compare various systems’ documentation against our latest definition to test compatibility.
    • For more information, check the forum
Action on the 0.0.6 draft 
  • Under “The following components are not required, but their inclusion in public releases is appreciated”, a user highlighted that model cards should be a required open component, as their purpose is to promote transparency and accountability
  • Under “What is Open Source AI?”, a user raised a concern regarding “made available to the public”, stating that software carries an Open Source license even if a copy was only made available to a single person.
    • This will be considered for the next draft
Open Source AI Definition Town Hall – April 5, 2024

Access the slides and the recording of the previous town hall meeting here.

Categories: FLOSS Research

OSI participates in Columbia Convening on openness and AI; first readouts available

Thu, 2024-04-04 09:47

I was invited to join Mozilla and the Columbia Institute of Global Politics in an effort that explores what “open” should mean in the AI era. A cohort of 40 leading scholars and practitioners from Open Source AI startups and companies, non-profit AI labs, and civil society organizations came together on February 29 at the Columbia Convening to collaborate on ways to strengthen and leverage openness for the good of all. We believe openness can and must play a key role in the future of AI. The Columbia Convening took an important step toward developing a framework for openness in AI with the hope that open approaches can have a significant impact on AI, just as Open Source software did in the early days of the internet and World Wide Web. 

This effort is aligned with, and contributes valuable knowledge to, the ongoing process to find the Open Source AI Definition.

As a result of this first meeting of the Columbia Convening, two readouts have been published: a technical memorandum for technical leaders and practitioners who are shaping the future of AI, and a policy memorandum for policymakers with a focus on openness in AI.

Technical readout

The Columbia Convening on Openness and AI Technical Readout was edited by Nik Marda with review contributions from myself, Deval Pandya, Irene Solaiman, and Victor Storchan.

The technical readout highlighted the challenges of understanding openness in AI. Approaches to openness fall into three categories: gradient/spectrum, criteria scoring, and binary. The OSI is championing a binary approach to openness, where AI systems are either “open” or “closed” based on whether they meet a certain set of criteria.

The technical readout also provided a diagram that shows how the AI stack may be described by the different dimensions (AI artifacts, documentation, and distribution) of its various components and subcomponents.

Policy readout

The Columbia Convening on Openness and AI Policy Readout was edited by Udbhav Tiwari with review contributions from Kevin Klyman, Madhulika Srikumar, and myself.

The policy readout highlighted the benefits of openness, including:

  • Enhancing reproducible research and promoting innovation
  • Creating an open ecosystem of developers and makers
  • Promoting inclusion through open development culture and models
  • Facilitating accountability and supporting bias research
  • Fostering security through widespread scrutiny
  • Reducing costs and avoiding vendor lock-in
  • Equipping supervisory authorities with necessary tools
  • Making training and inference more resource-efficient, reducing environmental harm
  • Ensuring competition and dynamism
  • Providing recourse in decision-making

The policy readout also showcased a table with the potential benefits and drawbacks of each component of the AI stack, including the code, datasets, model weights, documentation, distribution, and guardrails.

Finally, the policy readout provided some policy recommendations:

  • Include standardized definitions of openness as part of AI standards
  • Promote agency, transparency and accountability
  • Facilitate innovation and mitigate monopolistic practices
  • Expand access to computational resources
  • Mandate risk assessment and management for certain AI applications
  • Hold independent audits and red teaming
  • Update privacy legislation to specifically address AI challenges
  • Update the legal framework to distinguish the responsibilities of different actors
  • Nurture AI research and development grounded in openness
  • Invest in education and specialized training programs
  • Adapt IP laws to support open licensing models
  • Engage the general public and stakeholders

You can follow along with the work of the Columbia Convening and with the Open Source Initiative’s work on the definition of Open Source AI.

Categories: FLOSS Research

OSI’s Response to NTIA ‘Dual Use’ RFC 3.27.2024

Tue, 2024-04-02 14:00

March 27, 2024

Mr. Bertram Lee
National Telecommunications and Information Administration (NTIA)
U.S. Department of Commerce
1401 Constitution Avenue NW
Washington, DC 20230

RE: [Docket Number 240216-0052] Dual Use Foundation Artificial Intelligence Models with Widely Available Model Weights

Dear Mr. Lee:

The Open Source Initiative (“OSI”) appreciates the opportunity to provide our views on the above referenced matter. As steward of the Open Source Definition, the OSI sets the foundation for Open Source software, a global public good that plays a vital role in the economy and is foundational for most technology we use today. As the leading voice on the policies and principles of Open Source, the OSI helps build a world where the freedoms and opportunities of Open Source software can be enjoyed by all and supports institutions and individuals working together to create communities of practice in which the healthy Open Source ecosystem thrives. One of the most important activities of the OSI, a California public benefit 501(c)(3) organization founded in 1998, is to maintain the Open Source Definition for the good of the community.

The OSI is encouraged by the work of NTIA to bring stakeholders together to understand the lessons from the Open Source software experience in having a recognized, unified Open Source Definition that enables an ecosystem whose value is estimated to be worth $8.8 trillion. As provided below in more detail, it is essential that federal policymakers encourage Open Source AI models to the greatest extent possible, and work with organizations like the OSI, which is endeavoring to create a unified, recognized definition of Open Source AI.

The Power of Open Source

Open Source delivers autonomy and personal agency to software users which enables a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of Open Source is higher quality, better reliability, greater flexibility, lower cost, and an end to proprietary lock-in.

Open Source software is widely used across the federal government and in every critical infrastructure sector. “The Federal Government recognizes the immense benefits of Open Source software, which enables software development at an incredible pace and fosters significant innovation and collaboration.” For the last two decades, authoritative direction and educational resources have been given to agencies on the use, management and benefits of Open Source software.

Moreover, Open Source software has direct economic and societal benefits. Open Source software empowers companies to develop, test and deploy services, thereby substantiating market demand and economic viability. By leveraging Open Source, companies can accelerate their progress and focus on innovation. Many of the essential services and technologies of our society and economy are powered by Open Source software, including the Internet.

The Open Source Definition has demonstrated that massive social benefits accrue when the barriers to learning, using, sharing and improving software systems are removed. The core criteria of the Open Source Definition – free redistribution; source code; derived works; integrity of the author’s source code; no discrimination against persons or groups; no discrimination against fields of endeavor; distribution of license; license must not be specific to a product; license must not restrict other software; license must be technology-neutral – have given users agency, control and self-sovereignty of their technical choices and a dynamic ecosystem based on permissionless innovation.

A recent study published by the European Commission estimated that companies located in the European Union invested around €1 billion in Open Source Software in 2018, which brought about a positive impact on the European economy of between €65 and €95 billion.

This success and the potency of Open Source software has for the last three decades relied upon the recognized unified definition of Open Source software and the list of Approved Licenses that the Open Source Initiative maintains.

OSI believes this “open” analog is highly relevant to Open Source AI as an emerging technology domain with tremendous potential for public benefit.

Distinguishing the Open Source Definition

The OSI Approved License trademark and program create a nexus of trust around which developers, users, corporations and governments can organize cooperation on Open Source software. However, it is generally agreed that the Open Source Definition, drafted 26 years ago and maintained by the OSI, does not cover this new era of AI systems.

AI models are not just code; they are trained on massive datasets, deployed on intricate computing infrastructure, and accessed through diverse interfaces and modalities. With traditional software, there was a very clear separation between the code one wrote, the compiler one used, the binary it produced, and what license they had. However, for AI models, many components collectively influence the functioning of the system, including the algorithms, code, hardware, and datasets used for training and testing. The very notion of modifying the source code (which is important in the Open Source Definition) becomes fuzzy. For example, there is the key question of whether the training dataset, the model weights, or other key elements should be considered independently or collectively as the source code for the model/weights that have been trained.

AI (specifically the Models that it manifests) includes a variety of technologies, each of which is a vital element of all Models.

This challenge is not new. In its guidance on the use of Open Source software, the US Department of Defense distinguished open systems and open standards, noting that while “different from Open Source software, they are complementary and can work well together”:

Open standards make it easier for users to (later) adopt an Open Source software
program, because users of open standards aren’t locked into a particular
implementation. Instead, users who are careful to use open standards can easily
switch to a different implementation, including an OSS implementation. … Open
standards also make it easier for OSS developers to create their projects, because
the standard itself helps developers know what to do. Creating any interface is an
effort, and having a predefined standard helps reduce that effort greatly.

OSS implementations can help create and keep open standards open. An OSS
implementation can be read and modified by anyone; such implementations can
quickly become a working reference model (a “sample implementation” or an
“executable specification”) that demonstrates what the specification means
(clarifying the specification) and demonstrating how to actually implement it.
Perhaps more importantly, by forcing there to be an implementation that others can
examine in detail, resulting in better specifications that are more likely to be used.

OSS implementations can help rapidly increase adoption/use of the open standard.
OSS programs can typically be simply downloaded and tried out, making it much
easier for people to try it out and encouraging widespread use. This also pressures
proprietary implementations to limit their prices, and such lower prices for
proprietary software also encourages use of the standard.

With practically no exceptions, successful open standards for software have OSS

Towards a Unified Vision of what is ‘Open Source AI’

With these essential differentiating elements in mind, last summer, the OSI kicked off a multi-stakeholder process to define the characteristics of an AI system that can be confidently and generally understood to be considered as “Open Source”.

This collaboration uses the latest definition of AI system adopted by the Organization for Economic Cooperation and Development (OECD), which has been the foundation for NIST’s “AI Risk Management Framework” as well as the European Union’s AI Act:

An AI system is a machine-based system that, for explicit or implicit objectives,
infers, from the input it receives, how to generate outputs such as predictions,
content, recommendations, or decisions that can influence physical or virtual
environments. Different AI systems vary in their levels of autonomy and
adaptiveness after deployment.

Since its announcement last summer, the OSI has had an open call for papers and held open webinars in order to collect ideas from the community describing precise problem areas in AI and to collect suggestions for solutions. More than six community reviews – in Europe, Africa, and various locations in the US – took place in 2023, coinciding with a first draft of the Open Source AI Definition. This year, the OSI has coordinated working groups to analyze various foundation models, released three more drafts of the Definition, hosted bi-weekly public town halls to review them, and continues to gather feedback from a wide variety of stakeholders, including:

  • System Creators (makes AI system and/or component that will be studied, used, modified, or shared through an Open Source license);
  • License Creators (writes or edits the Open Source license to be applied to the AI system or component; includes compliance);
  • Regulators (writes or edits rules governing licenses and systems, e.g. government policy-maker);
  • Licensees (seeks to study, use, modify, or share an Open Source AI system, e.g. AI engineer, health researcher, education researcher);
  • End Users (consumes a system output, but does not seek to study, use, modify, or share the system, e.g. student using a chatbot to write a report, artist creating an image);
  • Subjects (affected upstream or downstream by a system output without interacting with it intentionally; includes advocates for this group, e.g. people with loan denied, or content creators).

What is Open Source AI?

An Open Source AI is an AI system made available to the public under terms that grant the freedoms to:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and inspect its components.
  • Modify the system for any purpose, including to change its output.
  • Share the system for others to use with or without modifications, for any purpose.

A precondition to exercising these freedoms is having access to the preferred form to make modifications to the system.

The OSI expects to wrap up and report the outcome of in-person and online meetings and anticipates having the draft endorsed by at least 5 reps for each of the stakeholder groups with a formal announcement of the results in late October.

To address the need to define rules for maintenance and review of this new Open Source AI Definition, the OSI Board of Directors approved the creation of a new committee to oversee the development of the Open Source AI Definition, approve version 1.0, and set rules for the maintenance of the Definition.

Some preliminary observations based on these efforts to date:

  • It is generally recognized, as indicated above, that the Open Source Definition as created for software does not completely cover this new era of Open Source AI. This is not a software-only issue and is not something that can be solved by applying the same exact terms in the new territory of defining Open Source AI. The Open Source AI definition will start from the core motivation of the need to ensure users of AI systems retain their autonomy and personal agency.
  • To the greatest degree practical, Open Source AI should not be limited in scope, allowing users the right to adopt the technology for any purpose. One of the key lessons and underlying successes of the Open Source Definition is that field-of-use restrictions deprive creators of software of the ability to use tools in ways that effect positive outcomes in society.
  • Reflecting on the past 20-to-30 years of learning about what has gone well and what hasn’t in terms of the open community and the progress it has made, it’s important to understand that openness does not automatically mean ethical, right or just. Other factors such as privacy concerns and safety when developing open systems come into play, and in each element of an AI model – and when put together as a system – there is an ongoing tension between something being open and being safe or potentially harmful.
  • Open Source AI systems lower the barrier for stakeholders outside of large tech companies to shape the future of AI, enabling more AI services to be built by and for diverse communities with different needs that big companies may not always address.
  • Similarly, Open Source AI systems make it easier for regulators and civil society to assess AI systems for compliance with laws protecting civil rights, privacy, consumers, and workers. They increase transparency, education, testing and trust around the use of AI, enabling researchers and journalists to audit and write about AI systems’ impacts on society.
  • Open Source AI systems advance safety and security by accelerating the understanding of their capabilities, risks and harms through independent research, collaboration, and knowledge sharing.
  • Open Source AI systems promote economic growth by lowering the barrier for innovators, startups, and small businesses from more diverse communities to build and use AI. Open models also help accelerate scientific research because they can be less expensive, easier to fine-tune, and supportive of reproducible research.

The OSI looks forward to working with NTIA as it considers the comments to this RFI, and stands ready to participate in any follow-on discussions on this or the general topic of ‘Dual Use Foundation Artificial Intelligence Models With Widely Available Model Weights’. As shared above, it is essential that federal policymakers encourage Open Source AI models to the greatest extent possible, and work with organizations like the OSI and others who are endeavoring to create a unified, recognized definition of Open Source AI.

Respectfully submitted,

For more information, contact:

  • Stefano Maffulli, Executive Director
  • Deb Bryant, US Policy Director


Categories: FLOSS Research

Open Source AI Definition – Weekly update April 2

Mon, 2024-04-01 17:10
Seeking document reviewers for Pythia and OpenCV
  • We are now in the process of reviewing legal documents to check the compatibility with the version 0.0.6 definition of open-source AI, specifically for Pythia and OpenCV.
    • Click here to see the past activities of the four working groups
  • To get involved, respond on the forum or message Mer here.
The data requirement: “Sufficiently detailed information” for what?
  • Central question: What criteria define “sufficiently detailed information”?
    • There is a wish to change the term “Sufficiently detailed information” to “Sufficiently detailed to allow someone to replicate the entire dataset” to avoid vagueness and solidify reproducibility as openness
  • Stefano points out that “reproducibility” in itself might not be a sustainable term due to its loaded connotations.
  • There’s a proposal to modify the Open Source AI Definition requirement to specify providing detailed information to replicate the entire dataset.
    • However, concerns arise about how this would apply to various machine learning methods where dataset replication might not be feasible.
Action on the 0.0.6 draft
  • Contribution concerned with the usage of the wording “deploy” under “Out of Scope Issues” in relation to code alone.
    • OSI has replied asking for clarification on the question, as “deploy” refers to the whole AI system, not just the code.
  • Contribution concerned with the wording of “learning, using, sharing and improving software systems” under “Why We Need Open Source Artificial Intelligence”. Specifically, when relating to AI as opposed to “traditional” software, there is a growing concern that these values might be too broad compared to the impact, in terms of safety and ethics, that AI can have.
    • OSI replied that while the ethics of AI will continue to be discussed, these discussions are out of the scope of this definition. This will be elaborated on in an upcoming FAQ.
Categories: FLOSS Research

Letter to U.S. Commerce Secretary Raimondo urging protection of openness and transparency in AI

Mon, 2024-03-25 14:18

The Open Source Initiative (OSI) contributed, along with other members of civil society and academia, to a letter drafted by Mozilla and the Center for Democracy & Technology (CDT) asking the White House and Congress to exercise great caution when considering whether and how to regulate the publication of open models.

The letter demonstrates how openness allows collaborative efforts to build, shape and test AI for the benefit of all, and speaks of the need for policy, technology and advocacy in creating a better future through trustworthiness and accountability in AI innovation. The letter highlighted three broad points of consensus about openness and transparency in AI:

  • Open models can provide significant benefits to society, and policy should sustain and expand these benefits.
  • Policy should be based on clear evidence of marginal risks that open models pose compared to closed models.
  • Policy should consider a wide range of solutions to address well-defined marginal risks in a tailored fashion.

The letter was sent today, March 25, 2024, in advance of the Department of Commerce’s comment deadline on AI models, which closes March 27. You can read the letter below and at CDT’s website.

Categories: FLOSS Research

Open Source AI Definition – Weekly Update Mar 25

Mon, 2024-03-25 12:56

The current draft is up for review and comment. Please spread the word with your peers as we are entering the last 2 months of drafting: this is the time to raise concerns and shape the final text of the Open Source AI Definition.

Where to find the description of the “components”
  • An article regarding the components of machine learning systems has been published by the Linux Foundation team. The paper establishes a ranked classification system that rates machine learning models based on their completeness, following principles of open science, open source, open data, and open access.
    • The list of components in this paper is what we used for the evaluation of Pythia, Llama2, BLOOM and OpenCV with the working groups.
    • The default required components went in the 0.0.6 draft definition.
  • The definitions of these terms are now public and will be cited going forward.
Open Source AI Definition Town Hall – March 22, 2024
  • If you missed the latest town hall, access the recording through the link above.
  • The next town hall meeting will be held on the 5th of April.
Categories: FLOSS Research

Results of 2024 elections of OSI board of directors

Tue, 2024-03-19 15:34

The polls just closed, the results are in. Congratulations to the returning directors Thierry Carrez and Josh Berkus, and the newly elected director Chris Aniszczyk.

Thierry Carrez has been confirmed and joins as a director elected by the Affiliate organizations. Chris Aniszczyk and Josh Berkus collected the votes of the Individual members.

The OSI thanks all of those who participated in the 2024 board elections by casting a ballot and asking questions of the candidates. We also want to extend our sincerest gratitude to all of those who stood for election. We were once again honored with an incredible slate of candidates who stepped forward from across the Open Source software community to support the OSI’s work and advance the OSI’s mission. The 2024 nominees were, again, remarkable: experts from a variety of fields and technologies with diverse skills and experience gained from working across the Open Source community. We hope the entire Open Source software community will join us in thanking them for their service and their leadership. We’re better off because of their contributions and commitment, and we thank them.

Next steps

The board of directors has formalized the election results in an ad-hoc meeting and invited the newly elected director to the onboarding meeting.

The complete election results

OSI Affiliate directors elections 2024

There were 6 candidates competing for 1 seat. The number of voters was 38 and there were 38 valid votes and 0 empty ballots.

Counting votes using Scottish STV.

Winner is Thierry Carrez.

Details from affiliates elections.

OSI Individual directors elections 2024

There were 11 candidates competing for 2 seats. The number of voters was 158 and there were 158 valid votes and 0 empty ballots.

Counting votes using Scottish STV.

Winners are Chris Aniszczyk and Josh Berkus.

Details from individuals elections.
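
For readers unfamiliar with STV counting, the election threshold can be sketched with a little arithmetic. The snippet below is a minimal illustration, assuming the Scottish STV rules use the Droop quota (valid votes divided by seats plus one, fractions discarded, plus one). The function name `droop_quota` is hypothetical, and the actual round-by-round transfers are only in the linked election details; they are not reproduced here.

```python
def droop_quota(valid_votes: int, seats: int) -> int:
    """Return the Droop quota: floor(valid_votes / (seats + 1)) + 1.

    Assumption: this is the quota used by the Scottish STV counting
    rules; the post itself only names the method.
    """
    if valid_votes < 0 or seats < 1:
        raise ValueError("valid_votes must be >= 0 and seats must be >= 1")
    return valid_votes // (seats + 1) + 1


# Vote and seat counts taken from the results reported above.
affiliate_quota = droop_quota(valid_votes=38, seats=1)
individual_quota = droop_quota(valid_votes=158, seats=2)

print(f"Affiliate election quota:  {affiliate_quota}")   # 20
print(f"Individual election quota: {individual_quota}")  # 53
```

Under that assumption, a candidate in the Affiliate election would need 20 of the 38 valid votes to be elected, and each winner in the Individual election would need 53 of the 158.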

Categories: FLOSS Research