Open Source Initiative

The steward of the Open Source Definition, setting the foundation for the Open Source Software ecosystem.

Open Source AI Definition – Weekly update April 22

Mon, 2024-04-22 10:42
Comments on the forum
  • A user noted in the forum that traditional copyright protection may not apply to model weights because they are essentially mathematical calculations: “licensing them through any kind of copyright license will not be enforceable!! and this means that anybody can use them without any copyright restriction (assuming that they have been made public) and this means that you cannot enforce any kind of provisions such as attribution, no warranty or copyleft” As a workaround, they suggest using contractual terms instead of relying on copyright, acknowledging that this will trigger a larger conversation.
Comments left on the definition text
  • Clarification needed under “What is Open Source AI”
  1. Discussion on whether “made available” should be changed to “released” or “distributed”
    1. One user pointed out that “made available” is the most appropriate, as the suggested wordings would be antagonistic and limiting
  2. Continuation of last week’s issue of defining who these four freedoms are for: deployers, users, or someone else.
    1. Added that a user understands it as “We need essential freedoms to enable users…”
    2. But then who are we defining as “users”? The person deploying the AI, or the one issuing the prompt?
    3. Another wording is suggested: “Open Source AI is an AI system that is made available under terms that grant, without conditions or restrictions, the rights to…”
  • Clarification is needed under “Preferred form to make modification to a machine learning system”
  1. Specifically to the claim: (The following components are not required,) but their inclusion in releases is appreciated.
    1. Clarification regarding whether this means best practice or is merely a suggestion.
    2. Suggestion to change the sentence to “The following components are not required to meet the Open Source AI definition and may be provided for convenience.” This will also “consider if those components are provided, can they be provided under different terms that don’t meet the Open Source AI definition, or do they fall under the same OSI compliant license automatically.”
  2. Question regarding the addition of “may” under data transparency in the 0.0.7 draft definition, which was not included in the 0.0.6 one, considering that the components are described as “required” in the checklist below
    1. (Context: “Sufficiently detailed information on how the system was trained. This may include the training methodologies and techniques, the training data sets used, information about the provenance of those data sets, their scope and characteristics; how the data was obtained and selected, the labelling procedures and data cleaning methodologies.”)
    2. Another user seconds this and adds that it should be changed to “must” or something more definitive.
Town Hall meeting was held on April 19th

In case you missed it, the town hall was held last Friday. Access the recordings and slides used here.

Categories: FLOSS Research

Open Source AI Definition – Weekly update April 15

Mon, 2024-04-15 12:40

Having just exited a very busy week, here are the two major milestones to know about.

Definition v.0.0.7 is out!
  • Access the definition here and the discussion of it here
  • The changelog:
    • Incorporated the comments to draft v.0.0.6 and the results of the working group analysis
    • Removed reference to “the public” in the four freedoms, left the object (users) implied
    • Removed reference to ML systems following the text “Precondition to exercise these freedoms is to have access to the preferred form to make modifications to the system”
    • Separated the ‘checklist’ and made it specific to ML systems, based on the Model Openness Framework
    • Described in highly generic terms the conditions to access the model parameters
  • A concern was raised regarding the checklist making training data optional, potentially undermining the freedom to modify AI systems. This echoes previous debates we have had and likely will continue to have, regarding access to training data.
  • Discussion on the need to clarify licensing terms to ensure compliance with Open Source principles, suggesting a change to “Available under terms that satisfy the Open Source principles”.
    • Proposal to consider the Open Source Definition itself as a checklist and cautious approach suggested before dictating specific requirements for legal documents.
  • A comment left on the definition text, rather than the forum, noted the need to determine whether the freedoms outlined in the Open Source AI Definition should be granted to the deployer or the end user, considering differing access levels and implications for openness
The results of the working groups are out!
  • Four different working groups, each connected with a different AI system (Llama-2, Pythia, Bloom and OpenCV), have been reviewing legal documents and comparing them to the previous 0.0.6 checklist of the Open Source AI Definition
  • The goal was to see how well the documents align with the components as described in the checklist.
  • Go here to see the updated checklist
  • The changes can be described as follows:
    • Added legal framework for model parameters (including weights). The framework proposes that, if copyrightable, model parameters can be shared as code
    • Added the five (5) data transparency components from v.0.0.6 to the checklist under the category “Documentation,” along with legal frameworks

Categories: FLOSS Research

Submit your proposal for All Things Open – Doing Business with Open Source

Tue, 2024-04-09 09:28

The supply-side value of widely-used Open Source software is estimated to be worth $4.15 billion, and the demand-side value is much larger, at $8.8 trillion. And yet, maintaining a healthy business while producing Open Source software feels more like an art than a science.

The Open Source Initiative wants to facilitate discussions about doing business with and for Open Source.

If you run a business producing Open Source products or your company’s revenue depends on Open Source in any way, we want to hear from you! Share your insights on:

  • How you balance the needs of paying customers with those of partners and non-paying users
  • How you organize your sales, marketing, product and engineering teams to deal with your communities
  • What makes you decide where to draw the lines between pushing fixes upstream and maintaining a private fork
  • Where you see the value of copyleft in software-as-a-service
  • Why you chose a specific license for your product offering and how you deal with external contributions
  • What trends you see in the ecosystem and what effects they are having

We want to hear about these and other topics, from personal experiences and research. Our hope is to provide the ecosystem with accessible resources to better understand the problem space and find solutions.

How it works

We’re tired of panel discussions that start and end at a conference. We want to share knowledge with the widest possible base. We’re going to have a panel at All Things Open, with preparation work before the event.

  • You’ll send your proposals as pitches to OpenSource.net, a title and abstract (300 words max) and a short bio.
  • Our staff will review the pitches and get back to you, selecting as many articles as deemed interesting for publication.
  • We’ll also pick the authors of five of the most interesting articles to be speakers at a panel discussion at ATO, on October 29 in Raleigh, NC. Full conference passes will be offered. 
  • Authors of accepted pitches will write a full article (1,200-1,500 words) to be published leading up to ATO.
  • We’ll also select other pitches worth developing into full-length articles that, for whatever reason, didn’t fit into the panel discussion.

Note: Please read and follow the guidelines carefully before submitting your proposal.

Submission Requirements
  • Applications should be submitted via web form
  • Add a title and a pitch, 300 words maximum
  • Include a brief bio, highlighting why you’re the right person to write about this topic
  • Submissions should be well-structured, clear and concise
Evaluation Criteria
  • Relevance to the topic
  • Originality and uniqueness of the submission
  • Clarity and coherence of argumentation
  • Quality of examples and case studies
  • Presenter’s expertise and track record in the field
  • Although the use of generative AI is permitted, pitches evidently written by AI won’t be considered
Timeline
  • Submission deadline: May 17, 2024
  • Notification of acceptance: May 30, 2024
  • Accepted authors must submit their full article by June 30, 2024
  • Articles will be published between July 8 and October 10, 2024
  • The authors of the selected articles will be invited to join a panel by July 20, 2024
  • Event dates: Oct 28, 29, 2024
What to Expect
  • Your submission will be reviewed by a panel of experts in the field
  • If accepted, you will be asked to produce a full article that will be published at opensource.net

We look forward to receiving your submission!


Categories: FLOSS Research

Compelling responses to NTIA’s AI Open Model Weights RFC

Tue, 2024-04-09 08:03

The National Telecommunications and Information Administration (NTIA) posted a request for comments on Dual Use Foundation Artificial Intelligence Models with Widely Available Model Weights, and it has received 362 comments.

In addition to the Open Source Initiative’s (OSI) joint letter drafted by Mozilla and the Center for Democracy and Technology (CDT), the OSI has also sent a letter of its own, highlighting our multi-stakeholder process to create a unified, recognized definition of Open Source AI.

The following is a list of some comments from nonprofit organizations and companies.

Comments from additional nonprofit organizations
  • Researchers from Stanford University’s Human-centered AI (HAI) and Princeton University recommend that the federal government prioritize understanding the marginal risk of open foundation models compared to proprietary ones, and create policies based on this marginal risk. Their response also highlighted several unique benefits of open foundation models, including greater innovation, transparency, diversification, and competitiveness.
  • Wikimedia Foundation recommends that regulatory approaches should support and encourage the development of beneficial uses of open technologies rather than depending on more closed systems to mitigate risks. Wikimedia believes open and widely available AI models, along with the necessary infrastructure to deploy them, could be an equalizing force for many jurisdictions around the world by mitigating historical disadvantages in the ability to access, learn from, and use knowledge.
  • EleutherAI Institute recommends Open Source AI and warns that restrictions on open-weight models are a costly intervention with comparatively little benefit. EleutherAI believes that open models enable people close to the deployment context to have greater control over the capabilities and usage restrictions of their models, study the internal behavior of models during deployment, and examine the training process and especially training data for signs that a model is unsafe to deploy in a specific use-case. They also lower barriers of entry by making models cheaper to run and enable users whose use-cases require strict guarding of privacy (e.g., medicine, government benefits, personal financial information) to use.
  • MLCommons recommends the use of standardized benchmarks, which will be a critical component for mitigating the risk of models both with and without widely available open weights. MLCommons believes models with widely available open weights allow the entire AI safety community – including auditors, regulators, civil society, users of AI systems, and developers of AI systems – to engage with the benchmark development process. Together with open data and model code, open weights enable the community to clearly and completely understand what a given safety benchmark is measuring, eliminating any confounding opacity around how a model was trained or optimized.
  • The AI Alliance recommends regulation shaped by independent, evidence-based research on reliable methods of assessing the marginal risks posed by open foundation models; effective risk management frameworks for the responsible development of open foundation models; and balancing regulation with the benefits that open foundation models offer for expanding access to the technology and catalyzing economic growth.
  • The Alliance for Trust in AI recommends that regulation should protect the many benefits of increasing access to AI models and tools. The Alliance for Trust in AI believes that openness should not be artificially restricted based on a misplaced belief that this will decrease risk.
  • Access Now recommends that NTIA think broadly about how developments in AI are reshaping or consolidating corporate power, especially with regard to ‘Big Tech.’ Access Now believes in the development and use of AI systems in a sustainable, resource-friendly way that considers the impact of models on marginalized communities and how those communities intersect with the Global South.
  • Partnership on AI (PAI) recommends NTIA’s work should be informed by the following principles: all foundation models need risk mitigations; appropriate risk mitigations will vary depending on model characteristics; risk mitigation measures, for either open or closed models, should be proportionate to risk; and voluntary frameworks are part of the solution.
  • R Street recommends pragmatic steps towards AI safety, relying on multistakeholder processes to address problems in a more flexible, agile, and iterative fashion. The government should not impose arbitrary limitations on the power of Open Source AI systems, which could result in a net loss of competitive advantage.
  • The Computer and Communications Industry Association (CCIA) recommends assessment based on the risks, highlighting that open models provide the potential for better security, less bias, and lower costs to AI developers and users alike. The CCIA acknowledged that the vast majority of Americans already use systems based on Open Source software (knowingly or unknowingly) on a daily basis.
  • The Information Technology Industry Council (ITI) recommends adopting a risk-based approach with respect to open foundation models, since not all models pose an equivalent degree of risk, and that risk management is a shared responsibility across the AI value chain.
  • The Center for Data Innovation recommends that U.S. policymakers defend open AI models at the international level as part of its continued embrace of the global free flow of data. It also encourages them to learn lessons from past debates about dual-use technologies, such as encryption, and refrain from imposing restrictions on foundation models because such policies would not only be ultimately ineffective at addressing risk, but they would slow innovation, reduce competition, and decrease U.S. competitiveness.
  • The International Center for Law & Economics recommends that AI regulation must be grounded in empirical evidence and data-driven decision making. Demanding a solid evidentiary basis as a threshold for intervention would help policymakers to avoid the pitfalls of reacting to sensationalized or unfounded AI fears.
  • New America’s Open Technology Institute (OTI) recommends a coordinated interagency approach designed to ensure that the vast potential benefits of a flourishing open model ecosystem serve American interests, in order to counter or at least offset the trend toward dominant closed AI systems and continued concentrations of power in the hands of a few companies.
  • Electronic Privacy Information Center (EPIC) recommends that NTIA grapple with the nuanced advantages, disadvantages, and regulatory hurdles that emerge within AI models along the entire gradient of openness, highlighting that AI models with weights widely available may foster more independent evaluation of AI systems and greater competition compared to closed systems.
  • The Software & Information Industry Association (SIIA) recommends a risk-based approach to foundation models that considers the degree and type of openness. SIIA believes openness has already proved to be a catalyst for research and innovation by essentially democratizing access to models that are cost-prohibitive for many actors in the AI ecosystem to develop on their own.
  • The Future Society recommends that the government should establish risk categories (i.e., designations of “high-risk” or “unacceptable-risk”), thresholds, and risk-mitigation measures that correspond to evaluation outcomes. The Future Society is concerned that overly restrictive policies could lead to market concentration, hindering competition and innovation in both industry and academia. A lack of competition in the AI market can have far-reaching knock-on consequences, including potentially stifling efforts to improve transparency, safety, and accountability in the industry. This, in turn, can impair the ability to monitor and mitigate the risks associated with dual-use foundation models and to develop evidence-based policymaking.
  • The Software Alliance (BSA) recommends that NTIA avoid restricting the availability of open foundation models; ground policies that address risks of open foundation models on empirical evidence; and encourage the implementation of safeguards to enhance the safety of open foundation models. BSA recognizes the substantial benefits that open foundation models provide to both consumers and businesses.
  • The US Chamber of Commerce recommends that NTIA make decisions based on sound science and not on unsubstantiated concerns that open models pose an increased risk to society. The US Chamber of Commerce believes that Open Source technology allows developers to build, create, and innovate in areas that will drive future economic growth.
Comments from companies
  • Meta recommends that NTIA establish common standards for risk assessments, benchmarks and evaluations informed by science, noting that the U.S. national interest is served by the broad availability of U.S.-developed open foundation models. Meta highlighted that Open Source democratizes access to the benefits of AI, and that these benefits are potentially profound for the U.S. and for societies around the world.
  • Google recommends a rigorous and holistic assessment of the technology to evaluate benefits and risks. Google believes that Open models allow users across the world, including in emerging markets, to experiment and develop new applications, lowering barriers to entry and making it easier for organizations of all sizes to compete and innovate.
  • IBM recommends preserving and prioritizing the critical benefits of open innovation ecosystems for AI for increasing AI safety, advancing national competitiveness, and promoting democratization and transparency of this technology. 
  • Intel recommends accountability for responsible design and implementation to help mitigate potential individual and societal harm. This includes establishing robust security protocols and standards to identify, address, and report potential vulnerabilities. Intel believes openness not only allows for faster advancement of technology and innovation, but also faster, more transparent discovery of potential harms, along with community remediation and redress. Intel also believes that open AI development is essential to facilitate innovation and equitable access to AI, as open innovation, open platforms, and horizontal competition help offer choice and build trust.
  • Stability AI recommends that regulation must support a diverse AI ecosystem – from the large firms building closed products to the everyday developers using, refining, and sharing open technology. Stability AI recognizes that Open models promote transparency, security, privacy, accessibility, competition, and grassroots innovation in AI.
  • Hugging Face recommends establishing standards for best practices building on existing work and prioritizing requirements of safety by design across both the AI development chain and its deployment environments. Hugging Face believes that open-weight models contribute to competition, innovation, and broad understanding of AI systems to support effective and reliable development.
  • GitHub recommends that regulatory risk assessment should weigh empirical evidence of possible harm against the benefits of widely available model weights. GitHub believes Open Source and widely available AI models support research on AI development and safety, as well as the use of AI tools in research across disciplines. To date, researchers have credited these models with supporting work to advance the interpretability, safety, and security of AI models; to advance the efficiency of AI models, enabling them to use fewer resources and run on more accessible hardware; and to advance participatory, community-based ways of building and governing AI.
  • Microsoft recommends cultivating a healthy and responsible open source AI ecosystem and ensuring that policies foster innovation and research. This will be achieved through direct engagement with open source communities to understand the impact of policy interventions on them and, as needed, calibrations to address risks of concern while also minimizing negative impacts on innovation and research.
  • Y Combinator recommends that NTIA and all stakeholders work to realize the immense promise of open-weight AI models while ensuring this technology develops in alignment with our values. Y Combinator believes the degree of openness of AI models is a crucial factor shaping the trajectory of this transformative technology. Highly open models, with weights accessible to a broad range of developers, offer unparalleled opportunities to democratize AI capabilities and promote innovation across domains. Y Combinator has seen firsthand the incredible progress driven by open models, with a growing number of startups harnessing these powerful tools to pioneer groundbreaking applications.
  • AH Capital Management, L.L.C. (a16z) recommends that NTIA be wary of generalized claims about the risks of open models and calls to treat them differently from closed models, especially those made by AI companies seeking to insulate themselves from market competition. a16z believes open models promote innovation, reduce barriers to entry, protect against bias, and allow such models to leverage and benefit from the collective expertise of the broader artificial intelligence (“AI”) community.
  • Uber recommends promoting widely available model weights to spur innovation in the field of AI. Uber believes that, by democratizing access to foundational AI models, innovators from diverse backgrounds can build upon existing frameworks, accelerating the pace of technological advancement and increasing competition in the space. Uber also believes widely available model weights, source code, and data are necessary to foster accountability, facilitate collaboration in risk mitigation, and promote ethical and responsible AI development.
  • Databricks recommends regulation of highly capable AI models should focus on consumer-facing deployments and high risk deployments, with the obligations focused on the deployer. Databricks believes that the benefits of open models substantially outweigh the marginal risks, so open weights should be allowed, even at the frontier level.
Categories: FLOSS Research

Open Source AI Definition – Weekly update April 8

Mon, 2024-04-08 13:15
Seeking document reviewers for OpenCV
  • This is your final opportunity to register for the review of licenses provided by OpenCV. Join us for our upcoming phase, where we meticulously compare various systems’ documentation against our latest definition to test compatibility.
    • For more information, check the forum
Action on the 0.0.6 draft 
  • Under “The following components are not required, but their inclusion in public releases is appreciated”, a user highlighted that model cards should be a required open component, as its purpose is to promote transparency and accountability
  • Under “What is Open Source AI?”, a user raised a concern regarding “made available to the public,” stating that software can carry an Open Source license even if a copy was only made available to a single person.
    • This will be considered for the next draft
Open Source AI Definition Town Hall – April 5, 2024

Access the slides and the recording of the previous town hall meeting here.

Categories: FLOSS Research

OSI participates in Columbia Convening on openness and AI; first readouts available

Thu, 2024-04-04 09:47

I was invited to join Mozilla and the Columbia Institute of Global Politics in an effort that explores what “open” should mean in the AI era. A cohort of 40 leading scholars and practitioners from Open Source AI startups and companies, non-profit AI labs, and civil society organizations came together on February 29 at the Columbia Convening to collaborate on ways to strengthen and leverage openness for the good of all. We believe openness can and must play a key role in the future of AI. The Columbia Convening took an important step toward developing a framework for openness in AI with the hope that open approaches can have a significant impact on AI, just as Open Source software did in the early days of the internet and World Wide Web. 

This effort aligns with, and contributes valuable knowledge to, the ongoing process of defining Open Source AI.

As a result of this first meeting of the Columbia Convening, two readouts have been published: a technical memorandum for technical leaders and practitioners who are shaping the future of AI, and a policy memorandum for policymakers with a focus on openness in AI.

Technical readout

The Columbia Convening on Openness and AI Technical Readout was edited by Nik Marda with review contributions from myself, Deval Pandya, Irene Solaiman, and Victor Storchan.

The technical readout highlighted the challenges of understanding openness in AI. Approaches to openness fall into three categories: gradient/spectrum, criteria scoring, and binary. The OSI is championing a binary approach to openness, where AI systems are either “open” or “closed” based on whether they meet a certain set of criteria.

The technical readout also provided a diagram that shows how the AI stack may be described by the different dimensions (AI artifacts, documentation, and distribution) of its various components and subcomponents.

Policy readout

The Columbia Convening on Openness and AI Policy Readout was edited by Udbhav Tiwari with review contributions from Kevin Klyman, Madhulika Srikumar, and myself.

The policy readout highlighted the benefits of openness, including:

  • Enhancing reproducible research and promoting innovation
  • Creating an open ecosystem of developers and makers
  • Promoting inclusion through open development culture and models
  • Facilitating accountability and supporting bias research
  • Fostering security through widespread scrutiny
  • Reducing costs and avoiding vendor lock-in
  • Equipping supervisory authorities with necessary tools
  • Making training and inference more resource-efficient, reducing environmental harm
  • Ensuring competition and dynamism
  • Providing recourse in decision-making

The policy readout also showcased a table with the potential benefits and drawbacks of each component of the AI stack, including the code, datasets, model weights, documentation, distribution, and guardrails.

Finally, the policy readout provided some policy recommendations:

  • Include standardized definitions of openness as part of AI standards
  • Promote agency, transparency and accountability
  • Facilitate innovation and mitigate monopolistic practices
  • Expand access to computational resources
  • Mandate risk assessment and management for certain AI applications
  • Hold independent audits and red teaming
  • Update privacy legislation to specifically address AI challenges
  • Update the legal framework to distinguish the responsibilities of different actors
  • Nurture AI research and development grounded in openness
  • Invest in education and specialized training programs
  • Adapt IP laws to support open licensing models
  • Engage the general public and stakeholders

You can follow along with the work of Columbia Convening at mozilla.org/research/cc and the work from the Open Source Initiative on the definition of Open Source AI at opensource.org/deepdive.

Categories: FLOSS Research

OSI’s Response to NTIA ‘Dual Use’ RFC 3.27.2024

Tue, 2024-04-02 14:00

March 27, 2024

Mr. Bertram Lee
National Telecommunications and Information Administration (NTIA)
U.S. Department of Commerce
1401 Constitution Avenue NW
Washington, DC 20230

RE: [Docket Number 240216-0052] Dual Use Foundation Artificial Intelligence Models with Widely Available Model Weights

Dear Mr. Lee:

The Open Source Initiative (“OSI”) appreciates the opportunity to provide our views on the above referenced matter. As steward of the Open Source Definition, the OSI sets the foundation for Open Source software, a global public good that plays a vital role in the economy and is foundational for most technology we use today. As the leading voice on the policies and principles of Open Source, the OSI helps build a world where the freedoms and opportunities of Open Source software can be enjoyed by all and supports institutions and individuals working together to create communities of practice in which the healthy Open Source ecosystem thrives. One of the most important activities of the OSI, a California public benefit 501(c)(3) organization founded in 1998, is to maintain the Open Source Definition for the good of the community.

The OSI is encouraged by the work of NTIA to bring stakeholders together to understand the lessons from the Open Source software experience in having a recognized, unified Open Source Definition that enables an ecosystem whose value is estimated at $8.8 trillion. As provided below in more detail, it is essential that federal policymakers encourage Open Source AI models to the greatest extent possible, and work with organizations like the OSI, which is endeavoring to create a unified, recognized definition of Open Source AI.

The Power of Open Source

Open Source delivers autonomy and personal agency to software users, enabling a development method that harnesses the power of distributed peer review and transparency of process. The promise of Open Source is higher quality, better reliability, greater flexibility, lower cost, and an end to proprietary lock-in.

Open Source software is widely used across the federal government and in every critical infrastructure sector. “The Federal Government recognizes the immense benefits of Open Source software, which enables software development at an incredible pace and fosters significant innovation and collaboration.” For the last two decades, authoritative direction and educational resources have been given to agencies on the use, management and benefits of Open Source software.

Moreover, Open Source software has direct economic and societal benefits. Open Source software empowers companies to develop, test and deploy services, thereby substantiating market demand and economic viability. By leveraging Open Source, companies can accelerate their progress and focus on innovation. Many of the essential services and technologies of our society and economy are powered by Open Source software, including, e.g., the Internet.

The Open Source Definition has demonstrated that massive social benefits accrue when the barriers to learning, using, sharing and improving software systems are removed. The core criteria of the Open Source Definition – free redistribution; source code; derived works; integrity of the author’s source code; no discrimination against persons or groups; no discrimination against fields of endeavor; distribution of license; license must not be specific to a product; license must not restrict other software; license must be technology-neutral – have given users agency, control and self-sovereignty of their technical choices and a dynamic ecosystem based on permissionless innovation.

A recent study published by the European Commission estimated that companies located in the European Union invested around €1 billion in Open Source Software in 2018, which brought about a positive impact on the European economy of between €65 and €95 billion.

This success and the potency of Open Source software has for the last three decades relied upon the recognized unified definition of Open Source software and the list of Approved Licenses that the Open Source Initiative maintains.

OSI believes this “open” analog is highly relevant to Open Source AI as an emerging technology domain with tremendous potential for public benefit.

Distinguishing the Open Source Definition

The OSI Approved License trademark and program create a nexus of trust around which developers, users, corporations and governments can organize cooperation on Open Source software. However, it is generally agreed that the Open Source Definition, drafted 26 years ago and maintained by the OSI, does not cover this new era of AI systems.

AI models are not just code; they are trained on massive datasets, deployed on intricate computing infrastructure, and accessed through diverse interfaces and modalities. With traditional software, there was a very clear separation between the code one wrote, the compiler one used, the binary it produced, and the license that applied. However, for AI models, many components collectively influence the functioning of the system, including the algorithms, code, hardware, and datasets used for training and testing. The very notion of modifying the source code (which is important in the Open Source Definition) becomes fuzzy. For example, there is the key question of whether the training dataset, the model weights, or other key elements should be considered independently or collectively as the source code for the model/weights that have been trained.

AI (specifically the models it manifests) includes a variety of technologies, each of which is a vital element of every model.

This challenge is not new. In its guidance on the use of Open Source software, the US Department of Defense distinguished Open Source software from open standards, noting that while they are “different from Open Source software, they are complementary and can work well together”:

Open standards make it easier for users to (later) adopt an Open Source software
program, because users of open standards aren’t locked into a particular
implementation. Instead, users who are careful to use open standards can easily
switch to a different implementation, including an OSS implementation. … Open
standards also make it easier for OSS developers to create their projects, because
the standard itself helps developers know what to do. Creating any interface is an
effort, and having a predefined standard helps reduce that effort greatly.

OSS implementations can help create and keep open standards open. An OSS
implementation can be read and modified by anyone; such implementations can
quickly become a working reference model (a “sample implementation” or an
“executable specification”) that demonstrates what the specification means
(clarifying the specification) and demonstrating how to actually implement it.
Perhaps more importantly, by forcing there to be an implementation that others can
examine in detail, resulting in better specifications that are more likely to be used.

OSS implementations can help rapidly increase adoption/use of the open standard.
OSS programs can typically be simply downloaded and tried out, making it much
easier for people to try it out and encouraging widespread use. This also pressures
proprietary implementations to limit their prices, and such lower prices for
proprietary software also encourages use of the standard.

With practically no exceptions, successful open standards for software have OSS
implementations.

Towards a Unified Vision of what is ‘Open Source AI’

With these essential differentiating elements in mind, last summer the OSI kicked off a multi-stakeholder process to define the characteristics of an AI system that can be confidently and generally understood to be “Open Source”.

This collaboration utilizes the latest definition of an AI system adopted by the Organization for Economic Cooperation and Development (OECD), which has been the foundation for NIST’s “AI Risk Management Framework” as well as the European Union’s AI Act:

An AI system is a machine-based system that, for explicit or implicit objectives,
infers, from the input it receives, how to generate outputs such as predictions,
content, recommendations, or decisions that can influence physical or virtual
environments. Different AI systems vary in their levels of autonomy and
adaptiveness after deployment.

Since its announcement last summer, the OSI has had an open call for papers and held open webinars to collect ideas from the community describing precise problem areas in AI and to gather suggestions for solutions. More than six community reviews – in Europe, Africa, and various locations in the US – took place in 2023, coinciding with a first draft of the Open Source AI Definition. This year, the OSI has coordinated working groups to analyze various foundation models, released three more drafts of the Definition, hosted bi-weekly public town halls for review, and continues to gather feedback from a wide variety of stakeholders, including:

  • System Creators (makes an AI system and/or component that will be studied, used, modified, or shared through an Open Source license);
  • License Creators (writes or edits the Open Source license to be applied to the AI system or component; includes compliance);
  • Regulators (writes or edits rules governing licenses and systems, e.g. a government policy-maker);
  • Licensees (seeks to study, use, modify, or share an Open Source AI system, e.g. an AI engineer, health researcher, education researcher);
  • End Users (consumes a system output but does not seek to study, use, modify, or share the system, e.g. a student using a chatbot to write a report, an artist creating an image);
  • Subjects (affected upstream or downstream by a system output without interacting with it intentionally; includes advocates for this group, e.g. people with a loan denied, or content creators).
What is Open Source AI?

An Open Source AI is an AI system made available to the public under terms that grant the freedoms to:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and inspect its components.
  • Modify the system for any purpose, including to change its output.
  • Share the system for others to use with or without modifications, for any purpose.

A precondition to exercising these freedoms is access to the preferred form in which to make modifications to the system.

The OSI expects to wrap up and report the outcome of in-person and online meetings, and anticipates having the draft endorsed by at least five representatives of each stakeholder group, with a formal announcement of the results in late October.

To address the need to define rules for maintenance and review of this new Open Source AI Definition, the OSI Board of Directors approved the creation of a new committee to oversee the development of the Open Source AI Definition, approve version 1.0, and set rules for the maintenance of the Definition.

Some preliminary observations based on these efforts to date:

  • It is generally recognized, as indicated above, that the Open Source Definition as created for software does not completely cover this new era of Open Source AI. This is not a software-only issue, and it cannot be solved by applying the exact same terms to the new territory of defining Open Source AI. The Open Source AI Definition will start from the core motivation of ensuring that users of AI systems retain their autonomy and personal agency.
  • To the greatest degree practical, Open Source AI should not be limited in scope, allowing users the right to adopt the technology for any purpose. One of the key lessons underlying the success of the Open Source Definition is that field-of-use restrictions deprive creators of the ability to use software tools in ways that effect positive outcomes in society.
  • Reflecting on the past 20 to 30 years of the open community’s progress, what has gone well and what has not, it is important to understand that openness does not automatically mean ethical, right or just. Other factors, such as privacy concerns and safety, come into play when developing open systems. In each element of an AI model, and in the system they form together, there is an ongoing tension between being open and being safe, or potentially harmful.
  • Open Source AI systems lower the barrier for stakeholders outside of large tech companies to shape the future of AI, enabling more AI services to be built by and for diverse communities with different needs that big companies may not always address.
  • Similarly, Open Source AI systems make it easier for regulators and civil society to assess AI systems for compliance with laws protecting civil rights, privacy, consumers, and workers. They increase transparency, education, testing and trust around the use of AI, enabling researchers and journalists to audit and write about AI systems’ impacts on society.
  • Open Source AI systems advance safety and security by accelerating the understanding of their capabilities, risks and harms through independent research, collaboration, and knowledge sharing.
  • Open Source AI systems promote economic growth by lowering the barrier for innovators, startups, and small businesses from more diverse communities to build and use AI. Open models also help accelerate scientific research because they can be less expensive, easier to fine-tune, and supportive of reproducible research.

The OSI looks forward to working with NTIA as it considers the comments to this RFI, and stands ready to participate in any follow-on discussions on this or the general topic of ‘Dual Use Foundation Artificial Intelligence Models With Widely Available Model Weights’. As shared above, it is essential that federal policymakers encourage Open Source AI models to the greatest extent possible, and work with organizations like the OSI and others who are endeavoring to create a unified, recognized definition of Open Source AI.

Respectfully submitted,
THE OPEN SOURCE INITIATIVE


For more information, contact:

  • Stefano Maffulli, Executive Director
  • Deb Bryant, US Policy Director

 

Categories: FLOSS Research

Open Source AI Definition – Weekly update April 2

Mon, 2024-04-01 17:10
Seeking document reviewers for Pythia and OpenCV
  • We are now in the process of reviewing legal documents to check their compatibility with version 0.0.6 of the Open Source AI Definition, specifically for Pythia and OpenCV.
    • Click here to see the past activities of the four working groups
  • To get involved, respond on the forum or message Mer here.
The data requirement: “Sufficiently detailed information” for what?
  • Central question: What criteria define “sufficiently detailed information”?
    • There is a wish to change the term “sufficiently detailed information” to “sufficiently detailed to allow someone to replicate the entire dataset,” to avoid vagueness and to tie openness to reproducibility
  • Stefano points out that “reproducibility” in itself might not be a sustainable term due to its loaded connotations.
  • There’s a proposal to modify the Open Source AI Definition requirement to specify providing detailed information to replicate the entire dataset.
    • However, concerns arise about how this would apply to various machine learning methods where dataset replication might not be feasible.
Action on the 0.0.6 draft
  • Contribution concerned with the usage of the wording “deploy” under “Out of Scope Issues” in relation to code alone.
    • OSI has replied asking for clarification on the question, as “deploy” refers to the whole AI system, not just the code.
  • Contribution concerned with the wording of “learning, using, sharing and improving software systems” under “Why We Need Open Source Artificial Intelligence”. Specifically, when relating to AI as opposed to “traditional” software, there is a growing concern that these values are too broad relative to the safety and ethical impact AI can have.
    • OSI replied that while the ethics of AI will continue to be discussed, these discussions are out of the scope of this definition. This will be elaborated on in an upcoming FAQ.
Categories: FLOSS Research

Letter to U.S. Commerce Secretary Raimondo urging protection of openness and transparency in AI

Mon, 2024-03-25 14:18

The Open Source Initiative (OSI) contributed, along with other members of civil society and academia, to a letter drafted by Mozilla and the Center for Democracy & Technology (CDT) asking the White House and Congress to exercise great caution when considering whether and how to regulate the publication of open models.

The letter demonstrates how openness allows collaborative efforts to build, shape and test AI for the benefit of all, and speaks of the need for policy, technology and advocacy in creating a better future through trustworthiness and accountability in AI innovation. The letter highlighted three broad points of consensus about openness and transparency in AI:

  • Open models can provide significant benefits to society, and policy should sustain and expand these benefits.
  • Policy should be based on clear evidence of marginal risks that open models pose compared to closed models.
  • Policy should consider a wide range of solutions to address well-defined marginal risks in a tailored fashion.

The letter was sent today, March 25, 2024, in advance of the Department of Commerce’s comment deadline on AI models which closes March 27. You can read the letter below and at CDT’s website.

Civil-Society-Letter-on-Openness-for-NTIA-Process-March-25-2024 (download)
Categories: FLOSS Research

Open Source AI Definition – Weekly Update Mar 25

Mon, 2024-03-25 12:56

The current draft is up for review and comment. Please spread the word among your peers as we are entering the last two months of drafting: this is the time to raise concerns and shape the final text of the Open Source AI Definition.

Where to find the description of the “components”
  • An article regarding the components of machine learning systems has been published by the Linux Foundation team. The paper establishes a ranked classification system that rates machine learning models based on their completeness, following principles of open science, open source, open data, and open access.
    • The list of components on this paper is what we used for the evaluation of Pythia, Llama2, BLOOM and OpenCV with the working groups.
    • The default required components went into the 0.0.6 draft definition.
  • The definitions of these terms are now public and will be cited going forward.
Open Source AI Definition Town Hall – March 22, 2024
  • If you missed the latest town hall, access the recording through the link above.
  • The next town hall meeting will be held on April 5.
Categories: FLOSS Research

Results of 2024 elections of OSI board of directors

Tue, 2024-03-19 15:34

The polls have just closed and the results are in. Congratulations to the returning directors Thierry Carrez and Josh Berkus, and to the newly elected director Chris Aniszczyk.

Thierry Carrez has been confirmed and joins as a director elected by the Affiliate organizations. Chris Aniszczyk and Josh Berkus collected the votes of the Individual members.

The OSI thanks all of those who participated in the 2024 board elections by casting a ballot and asking questions of the candidates. We also want to extend our sincerest gratitude to all of those who stood for election. We were once again honored with an incredible slate of candidates who stepped forward from across the open source software community to support the OSI’s work and advance the OSI’s mission. The 2024 nominees were remarkable: experts from a variety of fields and technologies with diverse skills and experience gained from working across the open source community. We hope the entire Open Source software community will join us in thanking them for their service and their leadership. We’re better off because of their contributions and commitment, and we thank them.

Next steps

The board of directors has formalized the election results in an ad-hoc meeting and invited the newly elected director to the onboarding meeting.

The complete election results

OSI Affiliate directors elections 2024

There were 6 candidates competing for 1 seat. The number of voters was 38 and there were 38 valid votes and 0 empty ballots.

Counting votes using Scottish STV.

Winner is Thierry Carrez.

Details from affiliates elections.

OSI Individual directors elections 2024

There were 11 candidates competing for 2 seats. The number of voters was 158 and there were 158 valid votes and 0 empty ballots.

Counting votes using Scottish STV.

Winners are Chris Aniszczyk and Josh Berkus.

Details from individuals elections.

Categories: FLOSS Research

Open Source AI Definition – Weekly update Mar 18

Mon, 2024-03-18 12:58
Comments on draft 0.0.6 from the forum
  • A participant pointed out that training data is listed both as optional and as a precondition. This may cause confusion, as it is unclear whether users should have the right to access the training data or only to know what training data was used for the model
  • To contribute, read the new draft here 
Moving on to next steps! Town hall this Friday

Comments on definitions under “What is open source AI?”

Still strong debate about access to training data
  • There is a fear that this will harm the ecosystem in the long run, as the original work of the model can never be “forked” to improve the model itself.
Categories: FLOSS Research

ClearlyDefined at the ORT Community Days

Wed, 2024-03-13 10:45

Once again, Bosch’s campus in Berlin hosted ORT Community Days, the annual event organized by the OSS Review Toolkit (ORT) community. ORT is an Open Source suite of tools to automate software compliance checks.

During this two day event, members from startups like Double Open and NexB, as well as large corporations like Mercedes-Benz, Volkswagen, CARIAD, Porsche, Here Technologies, EPAM, Deloitte, Sony, Zeiss, Fraunhofer, and Roche, came together to discuss best practices around software supply chain compliance.

The ClearlyDefined community had an important presence at the event, represented by E. Lynette Rayle and Lukas Spieß from GitHub and Qing Tomlinson from SAP. I had the pleasure to represent the Open Source Initiative as the community manager for ClearlyDefined. The mission of ClearlyDefined is to crowdsource a global database of licensing metadata for every software component ever published. We see the ORT community as an important partner towards achieving this mission.

Relevant talks

There were several interesting talks at ORT Community Days. These are the ones I found most relevant to ClearlyDefined:

Philippe Ombredanne presented ScanCode, a project of great importance to ClearlyDefined, as we use this tool to detect licenses, copyrights, and dependencies. Philippe gave an overview of the project and its challenges. For ClearlyDefined, we would like to see better accuracy and performance improvements. 

Sebastian Schuberth presented the Double Open Server (DOS) companion for ORT. DOS is a server application that scans the source code of open source components, stores the scan results for use in license compliance pipelines, and provides a graphical interface for manually curating the license findings. I believe there’s an opportunity to integrate DOS with ClearlyDefined by providing access to our APIs to fetch licensing metadata and allowing the sharing of curations.

Marcel Kurzmann and Martin Nonnenmacher presented Eclipse Apoapsis, another ORT server that makes use of its integration APIs for dependency analysis, license scanning, vulnerability databases, rule engine, and report generation. Again, I feel we could also integrate Eclipse Apoapsis with ClearlyDefined the same way as with DOS.

Till Jaeger gave an excellent talk about curation of ORT output from the perspective of FOSS license compliance. He highlighted the Cyber Resilience Act (CRA), which brings legal provisions for SBOMs and will likely increase the need for tools like ORT. Till shared the many challenges in the curation process, particularly the compatibility issues from dual licensing, and went on to showcase the OSADL compatibility matrix.

Presenting ClearlyDefined

I had the privilege of presenting ClearlyDefined together with E. Lynette Rayle from GitHub and we got some really good feedback and questions from the audience.

With the move towards SBOMs everywhere for compliance and security reasons, organizations will face great challenges generating these at scale at each stage of the supply chain, for every build or release. Additionally, multiple organizations will have to curate the same missing or wrongly identified licensing metadata over and over again.

ClearlyDefined is well suited to solve these problems by serving a cached copy of licensing metadata for each component through a simple API. Organizations will also be able to contribute back with any missing or wrongly identified licensing metadata, helping to create a database that is accurate for the benefit of all.

GitHub is well aware of these challenges and is interested in helping its users in this regard. They recently added 17.5 million package licenses sourced from ClearlyDefined to their database, expanding the license coverage for packages that appear in dependency graph, dependency insights, dependency review, and a repository’s software bill of materials (SBOM).

To make use of ClearlyDefined’s data, a user can simply make a call to its API service. For example, to fetch licensing metadata from the lodash library on NPM at version 4.17.21, one would call:

curl -X GET "https://api.clearlydefined.io/definitions/npm/npmjs/-/lodash/4.17.21" -H "accept: */*"

This API call would be processed by the service for ClearlyDefined, as illustrated in the diagram below. If there’s a match in the definition store, then that definition would be sent back to the user. Otherwise, this request would trigger the crawler for ClearlyDefined (part of the harvesting process), which would download the lodash library from NPM, scan the library, and write the results to the raw results store. The service for ClearlyDefined would then read the raw results, summarize them, and create a definition to be written to the definition store. Finally, the definition would be served to the user.
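The lookup sketched above can also be scripted. The helper below simply rebuilds the coordinates path used by the curl example (type/provider/namespace/name/revision, with “-” standing in for an empty namespace); the function name is illustrative, not part of ClearlyDefined’s tooling, and the response fields mentioned in the comments are an assumption about the response shape:

```python
import json
from urllib.request import urlopen

# Base URL of the ClearlyDefined definitions API (from the curl example above).
API = "https://api.clearlydefined.io/definitions"

def definition_url(type_, provider, namespace, name, revision):
    # ClearlyDefined coordinates use "-" when a component has no namespace,
    # as with lodash in the example above.
    return f"{API}/{type_}/{provider}/{namespace or '-'}/{name}/{revision}"

url = definition_url("npm", "npmjs", None, "lodash", "4.17.21")
print(url)  # same endpoint as the curl call above

# Live lookup (requires network access); accessing "licensed"/"declared"
# is an assumption about the JSON returned by the service:
# definition = json.load(urlopen(url))
# print(definition.get("licensed", {}).get("declared"))
```

Calling the resulting URL returns the cached definition when one exists, or triggers the harvesting flow described above.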

The curation process is done through another API call via PATCHes. For example, the “contributionInfo” block below accompanies a PATCH that updates a declared license to Apache-2.0:

"contributionInfo": {
    "summary": "[Test] Update declared license",
    "details": "The declared license should be Apache as per the LICENSE file.",
    "resolution": "Updated declared license to Apache-2.0.",
    "type": "incorrect",
    "removeDefinitions": false
}
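For illustration, a complete curation request would pair a contributionInfo block like the one above with the patch itself. The “patches” layout sketched below (coordinates, revisions, licensed/declared) is an assumption based on ClearlyDefined’s curated-data format and should be checked against the API documentation before use:

```python
import json

# Hypothetical complete curation body. The "patches" structure is an
# assumption about ClearlyDefined's curation format, not a confirmed schema.
curation = {
    "contributionInfo": {
        "summary": "[Test] Update declared license",
        "details": "The declared license should be Apache as per the LICENSE file.",
        "resolution": "Updated declared license to Apache-2.0.",
        "type": "incorrect",
        "removeDefinitions": False,
    },
    "patches": [
        {
            # Coordinates identify the component; the revision keys the change.
            "coordinates": {"type": "npm", "provider": "npmjs", "name": "lodash"},
            "revisions": {"4.17.21": {"licensed": {"declared": "Apache-2.0"}}},
        }
    ],
}

print(json.dumps(curation, indent=2))
```

Submitting such a body would then kick off the PR-based review flow described below.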

This curation is handled by the service for ClearlyDefined, as illustrated in the diagram below. The curation would trigger the creation of a PR in ClearlyDefined’s curated-data repository, which would be reviewed and signed off by two curators. The PR would then be merged and written to the curated-data store.

GitHub has deployed its own local Harvester for ClearlyDefined, as illustrated in the diagram below. GitHub’s OSPO Policy Service posts requests to GitHub’s Harvester for ClearlyDefined, which downloads any components and dependencies from various package managers, scans these components, and writes the results directly to ClearlyDefined’s raw results store. GitHub’s OSPO Policy Service fetches definitions from the service for ClearlyDefined as well as licenses and attributions from GitHub’s Package License Gateway. GitHub maintains a local cache store which is synced with any updates from ClearlyDefined’s changes-notifications blob storage.

ClearlyDefined’s development has seen an increased participation from various organizations this past year, including GitHub, SAP, Microsoft, Bloomberg, and CodeThink.

Currently, maintainers of ClearlyDefined are focused on ongoing maintenance. Key goals for ClearlyDefined in 2024 include:

  • Publishing periodic releases and switching to semantic versioning
  • Bringing dependencies up to date (in particular using the latest scancode)
  • Addressing the NOASSERTION/OTHER issue
  • Advancing usability and the curation process through the UI 
  • Enhancing the documentation and process for creating a local harvest

Our slides are available here.

Relevant breakout sessions

ORT Community Days provided several breakout sessions to allow participants to discuss pain points and solutions.

A special discussion around curations was led by Sebastian Schuberth and E. Lynette Rayle. The ORT Package Curation Data can be broken down into two categories: metadata interpretations and legal curations. The group discussed their thoughts about the curation process and its challenges, including handling false positives and the sharing of curations.

Nowadays, no conference would be complete without at least one talk or discussion about Artificial Intelligence. A group gathered to discuss the potential use of AI to improve user experience as well as OSS compliance. The majority of attendees believed ORT’s documentation could be improved through the use of AI, and that an assistant would be helpful to answer the most common questions. As for the use of AI for OSS compliance, there is a lot of potential here; one idea would be to use ClearlyDefined’s curation dataset to fine-tune an LLM.

Conclusion

The second edition of ORT Community Days represented a unique opportunity for the ClearlyDefined community to better engage with the ORT community. We were able to meet the maintainers and members of ORT and learn from them about the current and future challenges. We were also able to explore how our communities can further collaborate. 

On behalf of the ClearlyDefined community, I would like to thank the organizers of this wonderful event: Marcel Kurzmann, Nikola Babadzhanov, Surya Santhi, and Thomas Steenbergen. I would also like to thank E. Lynette Rayle, Lukas Spieß and Qing Tomlinson from the ClearlyDefined community who have accepted my invitation to participate in this conference.

If you are interested in Open Source supply chain compliance and security, I invite you to learn a bit more about the ClearlyDefined and the ORT communities. You might also be interested in my report from FOSS Backstage.

Categories: FLOSS Research

Three perspectives from FOSS Backstage

Wed, 2024-03-13 10:45

As a community manager, I find FOSS Backstage to be one of my favorite conferences content-wise and community-wise. This is a conference that happens every year in Berlin, usually in early March. It’s a great opportunity to meet community leaders from Europe and across the world with the goal of fostering discussions around three complementary perspectives: a) community health and growth, b) project governance and sustainability, and c) supply chain compliance and security.

Community health and growth

While there were several interesting talks, one of the highlights of the “Community health and growth” track was Tom “spot” Callaway’s talk, “Embracing your weird: community building through fun & play”. Tom shared some really interesting ideas to help members bond together: a badge program, a candy swap activity, a coin giveaway, a scavenger hunt, and a karaoke session.

FOSS Backstage this year was special because I got to finally meet 3 members from the ClearlyDefined community who have given a new life to this project: E. Lynette Rayle and Lukas Spieß from GitHub and Qing Tomlinson from SAP. While we did not go into a scavenger hunt or a karaoke session (that would have been fun), we spent most of our time during the week having lunch and dinner together, watching talk sessions together, networking with old and new acquaintances, and even going for some sightseeing in Berlin. This has allowed us to not only share ideas about the future of ClearlyDefined, but most importantly to have fun together and create a strong bond between us.


Project governance and sustainability

At last year’s FOSS Backstage, I had the opportunity to meet Thomas Steenbergen for the first time. He is the co-founder of the ClearlyDefined and OSS Review Toolkit (ORT) communities. Project governance and sustainability is something Thomas deeply cares about, and I was honored to be invited to give a talk together with him at this year’s conference.

Our talk was about aligning wishes of multiple organizations into an Open Source project. This is a challenge that many projects face: oftentimes they struggle to align wishes and get commitment from multiple organizations towards a shared roadmap. There’s also the challenge of the “free rider” problem, where the overuse of a common resource without giving back often leads to the tragedy of the commons. Thomas shared the idea of a collaboration marketplace and a contributor commitment agreement where organizations come together to identify, commit, and implement a common enhancement proposal. This is a strategy that we are applying to ORT and ClearlyDefined.

Our slides are available here.


Supply chain compliance and security

Under the “supply chain compliance and security” track, I was happy to watch a wonderful talk from my friend Ana Jimenez Santamaria entitled looking at Open Source security from a community angle. She has been leading the TODO Group at the Linux Foundation for quite a few years now, and it was interesting to learn how they are helping OSPOs (Open Source Program Offices) to create a trusted software supply chain. Ana highlighted three takeaways:

  • OSPOs integrate Open Source in an organization’s IT infrastructure.
  • Collaboration between employees, Open Source staff, and security teams with the Open Source ecosystem offers complete security coverage across the whole supply chain.
  • OSPOs have the important mission of achieving digitalization, innovation and security in a healthy and continuous way.


Bonus: Open Source AI

Nowadays, no conference would be complete without at least one talk about Artificial Intelligence, so Frank Karlitschek’s keynote, “What the AI revolution means for Open Source and our society”, was very welcome! Frank demonstrated that Open Source AI can indeed compete with proprietary solutions from the big players. He presented Nextcloud Assistant, which runs locally and can be studied and modified. This assistant offers several exciting features: face recognition in photos, text translation, text summarization, text generation, image generation, speech transcription, and document classification – all while preserving privacy.

It’s worth pointing out that the Open Source Initiative is driving a multi-stakeholder process to define an “Open Source AI” and everyone is welcome to be part of the conversation.

Conclusion

I had a wonderful time at FOSS Backstage and I invite everyone interested in community, governance, and supply chain to join this amazing event next year. I would like to thank the organizers who work “backstage” to put together this conference. Thank you Paul Berschick, Sven Spiller, Alexander Brateanu, Isabel Drost-Fromm, Anne Sophie Riege, and Stefan Rudnitzki. A special thanks also to the volunteers, speakers, sponsors, and last but not least to all attendees who made this event special.

If you are interested in Open Source supply chain compliance and security, I invite you to learn a bit more about the ClearlyDefined and the ORT communities. Be sure to check out my report from the ORT Community Days.

Categories: FLOSS Research

Open Source AI Definition – weekly update Mar 11

Mon, 2024-03-11 02:00

Big week, marked by the release of draft 0.0.6! The document is available for live comments and more general comments on the forum.

Changes in the section “What is open source AI”

  • Added: “Precondition to exercise these freedoms is to have access to the preferred form to make modifications to the system,” with an example of what this looks like for a machine learning system.
  • Checklist to evaluate legal documents: the component details are presented, reflecting the results of the working groups.
  • Changed the wording from “license” to “legal document.”

The preamble is left untouched, more discussion seems to be needed.

We held our fifth town hall meeting this past Friday, March 8.

Click here to access the recording

Why were these 4 systems picked and not others? Will more AI systems be analyzed?
  • (Participant question) Of the 4 working groups, 2 refer to models under proprietary licenses, and 3 refer to LLMs. The groups do not reflect what will (likely) be in the definition; therefore, it is a waste of time for OSI to consider them when crafting a definition.
  • (OSI response) It is important to have a diversified analysis, as at this stage, we are considering how the models operate rather than their license. The objective of the working groups was to identify the required components to exercise the 4 freedoms, and we found them to be quite similar. 

Next steps: Analyze a combination of each of the systems, which systems have these components, and find and review their accompanying legal documents. Follow this thread on the forum if you want to help.

How will the Open Source AI Definition and the “classic” OSD interact?
  • An interesting question was raised that should be answered once the Open Source AI Definition gets closer to being feature complete.
Categories: FLOSS Research

A candid conversation on The Changelog Podcast about defining Open Source AI, and what is really at stake

Tue, 2024-03-05 01:00

I was recently invited to join hosts Adam Stacoviak and Jerod Santo on The Changelog podcast. The Changelog features deep technical reviews and conversations about the most recent news in the world of software, and this was the first time anyone from the OSI has appeared on the show. 

After introducing the Open Source Initiative, we discussed the challenges of not only defending the Definition itself, but the idea that we need a Definition at all. And I was able to explain the complicated nature of being a global nonprofit organization defending the Open Source Definition for over 25 years.

I outlined the three programs that comprise the work of the OSI—legal and licensing, policy and standards, and advocacy and outreach—at which time we dove right into the project that falls under the latter program: the Open Source AI Definition.

Open Source AI is not the same as Open Source software. This reality led to the Deep Dive: AI project, now in year 3, in which OSI is collaborating with some of the largest corporations, researchers, creators, foundations and others. 

The Changelog hosts asked a lot of great questions and we had a candid and productive conversation. I hope you’ll follow the link to listen to the full episode: Changelog Interviews: What exactly is Open Source AI?

As I shared with Adam and Jerod, I’m hosting bi-weekly discussions on the status of the project and we’ve put together a forum for public input, so if you are interested in learning more about this or contributing, you are welcome to join us at discuss.opensource.org.

Categories: FLOSS Research

Open Source AI Definition – weekly update Mar 4

Mon, 2024-03-04 05:00

A weekly summary of interesting threads on the forum.

The results from the working groups are in

The groups that analyzed OpenCV and Bloom have completed their work and the results of the votes have been published.

We now have a full overview of the result of the four (Llama2, Pythia, Bloom, OpenCV) working groups and the recommendations that they have produced.

Access our spreadsheet to see the complete overview of the compiled votes. This is a major milestone of the co-design process.

Discussion on access to training data continues

This conversation continues with a new question: What does openness look like when original datasets are not accessible for privacy-preserving reasons?

Is the definition of “AI system” by the OECD too broad?

Central question: How can an “AI system” be precisely defined to avoid loopholes and ensure comprehensive coverage under open-source criteria?

A broad definition of “AI system” might create loopholes in open-source licensing, potentially allowing publishers to avoid certain criteria.

Still, defining “AI system” is useful to clarify what constitutes open-source AI: it is needed to outline the necessary components, like sharing training code and model parameters, while acknowledging that further work is needed on aspects such as model architecture.

In case you missed it, our fourth town hall meeting was held on February 23, 2024. Access the recording here and the slides here.

A new town hall meeting is scheduled for this week.

Categories: FLOSS Research

NTIA engages civil society on questions of open foundation models for AI, hears benefits of openness in the public interest

Wed, 2024-02-28 04:52

The recent US Executive Order on AI directs action for numerous federal agencies. This includes directing the National Telecommunications and Information Agency (NTIA*) to discuss benefits, risks and policy choices associated with dual-use foundation models, which are powerful models that can be fine-tuned and used for multiple purposes, with widely available model weights. 

The NTIA process is centered on a Request for Comment soliciting public feedback about how making model weights and other model components widely available creates benefits or risks to the broader economy, communities, individuals, and to national security.

NTIA also initiated a series of listening sessions last December. Owing to OSI’s critical effort in the Defining Open Source AI project, we are grateful to have been included in their most recent listening session, organized by the Center for Democracy & Technology (CDT) for civil society organizations. We joined other non-profits working in the public interest to share comments, concerns and encouragement in a generous two-hour session with NTIA staff.

The core of the discussions was centered around open versus closed models. Several organizations brought historical perspectives going back to battles over Open Source in the 90s. A short list of key takeaways from organizations weighing in during the session:

  • Open models represent marginal risk. More research is needed to understand where unacceptable risks lie beyond generating negative scenarios – for both open and closed models.
  • Encouragement to not regulate the emerging technology itself, rather focus on addressing bad actors and bad behavior.
  • Understand the benefits to research in open models, and in particular to provide transparency and accountability to privacy, security and bias concerns.
  • Keeping models open offers equitable access to economic benefits and is an established factor in innovation.
  • Completion of the OSI’s Defining Open Source AI and clarifying terms would greatly assist policy discussions.

NTIA staff expressed an interest in understanding what lessons we might draw from the Open Source software community’s experience with the federal government over the years. (OSI expects to speak to this in their formal response to NTIA’s Request for Comment).

OSI ED Stefano Maffulli provided OSI’s perspective in his comments at the meeting:

The Open Source Initiative is a 501(c)(3) nonprofit organization that is driving a global, multistakeholder discussion to find an unequivocal definition of Open Source AI. We’ve been maintaining the Definition of Open Source software for over 25 years, providing a stable north star for all participants in the Open Source ecosystem, including US federal agencies. 

The Department of Defense, Department of Commerce, Office of Management and Budget, Center for Medicaid/Medicare Services and others are examples of agencies which have relied on the standard Open Source Definition maintained by OSI in crafting their IT policies. 

The Open Source Definition has demonstrated that massive social benefits accrue when you remove the barriers to learning, using, sharing and improving software systems. There is ample evidence that giving users agency, control and self-sovereignty of their technical choices produces an ecosystem based on permissionless innovation. Recent research estimates that if Open Source software didn’t exist, firms would have to spend the equivalent of 8.8 trillion dollars to replace it. This is all based on the clear definition of Open Source software and the list of approved licenses that the Open Source Initiative maintains.

The same kind of unambiguous definition of terms is also needed and deserved in the domain of AI. We’re aware of various uses of the term ‘Open Source’ referring to AI systems and machine learning models whose terms of service have a wide range of obligations and restrictions. 

We found AI systems available publicly with full implementation details, code and data distributed without any obligations, as well as other systems available only with limited implementation details, no data, and a very limited description of the data used to train the model… all generally referred to as “Open Source.”

It’s worth noting that Open Source licenses are a way to flip the intellectual property system: the approved licenses grant rights to users instead of removing them. When thinking about the terms of distribution for model weights, which are basically facts, we should aim to remove the intellectual property regime to begin with.

We’re very concerned about the “economic upside capture” licensing terms we’ve seen in popular models like Llama2, for example. These terms of use are designed to create a network that favors only one economic actor (like the original distributor).

Uncertainties break the innovation cycles. This lack of clarity of terms doesn’t help consumers, scientists, developers or regulators. We’re on target to deliver a usable definition of Open Source AI by the end of October 2024. The definition work is focusing on identifying the preferred form to make modifications to an AI system: the equivalent of “source code” for software programs. This preferred form will be the basis to grant users the same level of self-sovereignty over the AI technologies.

* The NTIA, located within the US Department of Commerce, is the Executive Branch agency that is principally responsible by law for advising the President on telecommunications and information policy issues.

Coming up next: What might we draw from Open Source software’s experience with the federal government?

Categories: FLOSS Research

New risk assessment framework offers clarity for open AI models

Tue, 2024-02-27 12:45

There is a debate within the AI community around the risks of widely releasing foundation models with their weights and the societal impact of that decision. Some are arguing that the wide availability of Llama2 or Stable Diffusion XL are a net negative for society. A position paper released today shows that there is insufficient evidence to effectively characterize the marginal risk of these models relative to other technologies. 

The paper was authored by Sayash Kapoor of Princeton University, Rishi Bommasani of Stanford University, myself and others, and is directed at AI developers, researchers investigating the risks of AI, competition regulators, and policymakers who are challenged with how to govern open foundation models.

This paper introduces a risk assessment framework to be used with open models. This resource helps explain why the marginal risk is low in some cases where we already have evidence from past waves of digital technology. It reveals that past work has focused on different subsets of the framework with different assumptions, serving to clarify disagreements about misuse risks. By outlining the necessary components of a complete analysis of the misuse risk of open foundation models, it lays out a path to a more constructive debate moving forward.

I hope this work will support a constructive debate where risks of AI are grounded in science and today’s reality, rather than hypothetical, future scenarios. This paper offers a position that balances the case against open foundation models with substantiated analysis and a useful framework on which to build. Please read the paper and leave your comments on Mastodon or LinkedIn.

Categories: FLOSS Research

Modernized, streamlined, and fediverse-friendly: OpenSource.org is fully migrated and ready to connect!

Tue, 2024-02-27 03:00

Two years ago, we started migrating our website from Drupal to WordPress. We knew it wasn’t going to be a quick weekend project, but more of a journey. Today, we celebrate the final leg of this journey – merging our blog back into the main site, creating a unified online experience for our community.

Let’s rewind to 2022. Our Drupal site, while trusty, was starting to show its age. It lacked modern features, and self-hosting it was taking a huge toll on our team. We knew a change was necessary, but a complete overhaul would have taken too long. So, we decided to move in steps: blog first, main site later.

We first migrated our blog content to a brand new, WordPress-powered platform in early 2023. This gave us a taste of the agility and flexibility WordPress offered. We loved the intuitive interface, the vast plugin ecosystem, and the worry-free managed WordPress hosting provided by DreamHost.

Emboldened by this success, we set our sights on the bigger challenge: migrating the entire website. This wasn’t just about moving content; it was about restructuring, modernizing, and enhancing. We meticulously migrated web pages, ensuring that as few URLs as possible broke during the transition.
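Preserving legacy URLs during a migration like this usually comes down to mapping old paths onto new ones and emitting permanent redirects. The sketch below is only illustrative: the paths are invented for this example (a real migration would export the full URL inventory from the old site), and it emits Apache-style `Redirect` directives as one possible target format.

```python
# Hypothetical sketch: generating 301 redirect rules for legacy Drupal
# paths that moved during a migration. The paths below are made up for
# illustration, not taken from the actual opensource.org migration.
legacy_map = {
    "/node/123": "/blog/open-source-ai-definition/",
    "/licenses/alphabetical": "/licenses/",
}

def redirect_rules(mapping):
    """Emit one Apache-style Redirect directive per moved path."""
    return [f"Redirect 301 {old} {new}" for old, new in sorted(mapping.items())]

for rule in redirect_rules(legacy_map):
    print(rule)
```

Generating the rules from a single mapping keeps the redirects auditable: the same table can drive link checks before launch and the web server configuration after it.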

But migration wasn’t just about moving pixels and text. We took this opportunity to modernize our licenses pages. We added missing metadata and made the licenses easily accessible to our users with a dedicated search engine. We also created a Custom Post Type for directors and forms to improve how we handle nominations for the board elections.

Closing the loop with the blog

Now, here we are, at the final stage of our migration journey: merging the blog back into the main site. This completes the circle, creating a unified online experience where our blog seamlessly integrates with the rest of our content – licenses, events, elections, blog and more.

But the most exciting part? We’ve embraced the power of the fediverse! Comments on our blog posts can now be posted and shared across different platforms, fostering a lively and open discussion space. This integration with ActivityPub opens up our content to a wider audience and encourages a more vibrant online community.
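In ActivityPub, a fediverse reply to a post arrives as an ActivityStreams object that the blog can render as a comment. The sketch below is a simplified illustration: the field names (`attributedTo`, `content`, `published`) come from the ActivityStreams 2.0 vocabulary, but the JSON values are invented, and a real integration (such as the WordPress ActivityPub plugin we rely on) handles delivery, verification, and moderation on top of this.

```python
import json

# Illustrative only: mapping a fediverse reply onto a blog comment.
# The JSON is a hand-written ActivityStreams "Note" with invented values.
reply = json.loads("""
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "attributedTo": "https://example.social/users/alice",
  "content": "<p>Congrats on the migration!</p>",
  "published": "2024-02-27T10:00:00Z"
}
""")

def note_to_comment(note):
    """Map the ActivityStreams fields onto a simple comment record."""
    return {
        "author": note["attributedTo"],
        "body": note["content"],
        "date": note["published"],
    }

print(note_to_comment(reply))
```

Because replies are plain ActivityStreams objects, any fediverse platform that can produce a "Note" can participate in the comment thread, which is what makes the discussion space open across platforms.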

Looking back, our Drupal-to-WordPress migration was an odyssey filled with technical hurdles, strategic decisions, and moments of pure satisfaction. We learned, we created a single-sign-on mechanism for OSI members that works on other sites (OpenSource.net and the forum, to start) and ultimately, we emerged with a website that is modern, functional, and better serves our mission. 

Next steps for opensource.org

Our next project will be a content cleanup and expansion. We will soon start combing through years of content, removing outdated information and streamlining what remains. This decluttering will make space for new content, making the website more useful and letting visitors learn what Open Source is and how it can help them. We’ll also add more features for OSI members based on the new forum. Explore the new blog, engage with our content, and join the conversation on the fediverse! And if you’re considering a website migration yourself, take heart from our story. With careful planning, the right tools, and the wonderful help of Automattic and the Pressable team, even the most complex migration can be a successful and rewarding journey.

Categories: FLOSS Research
