FLOSS Research

Open Source AI Definition – weekly update Feb 23

Open Source Initiative - Fri, 2024-02-23 05:00

A weekly summary of interesting threads on the forum.

Is the definition of “AI system” by the OECD too broad?

Central question: Do we need to define what AI systems are?

Training data access

Central question: for a model to be open source, do we need “open” access to its training data?

Recognising Open Source “Components” of an AI System

Central question: Should the definition of Open Source AI take a gradient approach (as is the case with the RAIL licenses), judging the openness of a model's components rather than the model as a whole? How do we avoid making the definition too restrictive?

It is worth highlighting that it is OSI's intention to have a definition which is:

Also worth noting
  • Results from the Pythia and Llama2 working groups are out!
  • Watch the recordings of the fourth town hall meeting on Defining Open Source AI and the accompanying slides.
Categories: FLOSS Research

A comparative view of AI definitions as we move toward standardization

Open Source Initiative - Fri, 2024-02-09 05:54

Discussions of Artificial Intelligence (AI) regulation will heat up in 2024, a provisional agreement on the EU AI Act having been reached in December 2023. The EU AI Act is progressing toward a technology-neutral definition of AI that can be applied to future AI systems. In the coming months, multiple states will agree, for the very first time, on precise legal definitions that reflect moral considerations about the role AI will and will not be allowed to play in Europe. Formally defining AI remains an ongoing debate.

Precise definitions within a rapidly expanding field are perhaps not the first things that come to mind when asked about pressing issues concerning AI. However, as its influence grows, arriving at one seems essential to any attempt to regulate it. Agreeing on what AI is, and what it is not, at a transnational level is proving increasingly important. Online spaces rarely respect sovereignty, and the role of AI in public life is expected to grow rapidly.

Different countries and organizations have different definitions, though the AI Act is expected to provide some standardization, not only within the EU but also beyond it, given its influence. Beyond providing a framework for businesses to operate within, the Act signals expectations about what AI will do, how and where it will act, and what it will develop toward. Let’s consider how different organizations and states are currently defining AI systems.

OECD

So far, the AI Act’s definition of AI systems is expected to follow the OECD’s current definition. This currently seems to be the most influential definition, and it reads as follows:

An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

Notably, the OECD’s definition has undergone changes from its first draft to the current one above. The removal of “human-based inputs” and the addition of “decisions” when referring to outputs reflect a potential for vastly limiting human-centred decisions and actions. While acknowledging that different systems vary in their autonomy, this change opens up the potential for full autonomy. This is controversial, to say the least, and can be expected to feed into growing concerns about AI alignment. As we await the EU AI Act, if it indeed adopts the same or even a similar definition, it will be interesting to see how it defines personhood, considering the removal of “human-based” from the description of inputs.

ISO

The International Organization for Standardization has defined AI systems as follows:

AI

<engineered system> set of methods or automated entities that together build, optimize and apply a model (3.1.26) so that the system can, for a given set of predefined tasks (3.1.37), compute predictions (3.2.12), recommendations, or decisions

Note 1 to entry: AI systems are designed to operate with varying levels of automation (3.1.7).

Note 2 to entry: Predictions (3.2.12) can refer to various kinds of data analysis or production (including translating text, creating synthetic images or diagnosing a previous power failure). It does not imply anteriority.

<discipline> study of theories, mechanisms, developments and applications related to artificial intelligence <engineered system> (3.1.2)

AI System:

engineered system featuring AI <engineered system> (3.1.2)

Note 1 to entry: AI systems can be designed to generate outputs such as predictions (3.2.12), recommendations and classifications for a given set of human-defined objectives.

Note 2 to entry: AI systems can be designed to operate with varying levels of automation.

Here, there is consideration of what kind of system is covered, notably an engineered one. This is interesting, as previous definitions have been somewhat ambiguous about which technologies will, in fact, fall under such legislation. There is also a focus on the cooperation of different entities, not specified as human or otherwise. Notably, the definition does not mention the origin or kind of input being processed, though from “varying levels of automation” it can be inferred that it covers the balance between human and non-human inputs, thus allowing varying levels of autonomy.

South Korea

South Korea also adopted a definition of AI in its 2023 AI Act, which reads as follows:

Article 2 (Definitions) As used in this Act, the following terms have the following meanings.

  1. “Artificial intelligence” refers to the electronic implementation of human intellectual abilities such as learning, reasoning, perception, judgment, and language comprehension.

  2. “Artificial intelligence technology” means hardware technology required to implement artificial intelligence, software technology that systematically supports it, or technology for utilizing it.

While not mentioning AI systems, the definition attributes human capacities, like perception, to an electronic entity. And while it does not mention “decisions,” attributing human characteristics perhaps makes that point redundant, as the system can be interpreted as an actor operating at a similar level to humans. Further, the Act is expansive about what technology counts as AI: even a cable providing power could, under the current definition, be classified as a piece of AI technology.

US Executive Order

In late 2023, the Biden administration issued an executive order in which it defined an AI system:

“a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. Artificial intelligence systems use machine- and human-based inputs to perceive real and virtual environments; abstract such perceptions into models through analysis in an automated manner; and use model inference to formulate options for information or action.”

Here, the Biden administration merges human and machine-based inputs, highlighting the cooperation between the two actors. While not legally binding, the order shows intent. It shows more caution, and perhaps skepticism, about AI acting autonomously than any of the other major actors. Interestingly, the distinction between virtual and “real” environments (assuming this means physical, though the wording remains problematic) shows a similar caution about the scope and spheres the administration is willing for AI to occupy. This limits the controversial potential for autonomy present in previous definitions, though it also limits communication between systems independent of human inputs, which could prove problematic in practice.

Answers we are excited to see

As we enter into an important legislative year for AI, we are looking forward to getting answers to the following questions regarding the legal definitions of AI systems:

  • What definition of personhood will accompany the definition of AI systems in the AI Act, if it indeed follows the OECD’s definition? And what does this mean for the intellectual property protection of something made entirely by an AI, considering that the definition allows for large amounts of autonomy?
  • What kind of technology will be considered to be AI? Will it range from Excel spreadsheets to LLMs? Are we considering “machine-based systems,” an “engineered system” or something else?
  • Will legislation be strong enough, or perhaps broad enough, to encompass the massive changes AI is currently undergoing? And what predictions can we infer that the EU is making on behalf of the future advancements of AI?

The post “A comparative view of AI definitions as we move toward standardization” appeared first on Voices of Open Source.


Open Source AI Definition: Where it stands and what’s ahead

Open Source Initiative - Wed, 2024-02-07 03:00

2023 was a big year of progress toward our goal of establishing a Definition for Open Source AI, but we still have a long way to go. Strong momentum of global collaboration toward this end will continue in 2024, and we need your help.

I detailed what we have accomplished and what lies ahead in a talk at FOSDEM and in the online public town hall meetings. More town hall meetings are already scheduled, every two weeks at alternating times.

We began the drafting process by looking at the Free Software Definition during in-person and online sessions last year. The current draft v.0.0.5 of the Open Source AI Definition is as follows:

What is Open Source AI

To be Open Source, an AI system needs to be available under legal terms that grant the freedoms to:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and inspect its components.
  • Modify the system to change its recommendations, predictions or decisions to adapt to your needs.
  • Share the system with or without modifications, for any purpose.
From Open Source AI Definition draft 0.0.5

However, in order to get to the complete draft, we need to answer the following question: What is the preferred form to make modifications to an AI system?

The specifications to consider are outlined in this diagram:

I also presented the 2024 timeline:

TL;DR: we are working toward an Open Source AI Definition release candidate 1 (RC1) by early summer and a version 1.0 in October. We have established working groups to analyze all the components of popular AI systems (like Llama2, Pythia, BLOOM and OpenCV), and new drafts of the Definition will be released monthly, with town hall meetings every two weeks and constant public review.

RC1 must be supported by at least 2 representatives for each of the 6 stakeholder groups.

V. 1.0 must be endorsed by at least 5 representatives for each of the 6 stakeholder groups.

As this Definition is the first one maintained by OSI to have a version number, we will establish rules for maintenance and review as the technical and legal landscape of AI continues to change. The board has started working on this task.

Following are the six stakeholder groups we have identified. Invite them to join the forum, or tell them to email or contact me.

Next steps
  • Biweekly town halls to make the process more public.
  • Outreach to get more stakeholders involved.
  • Raise more funds to support this work in 2024.
  • OSI is updating the project landing page and engaging the board in preparation for review and approval of v. 1.0 later this year.

The public draft, along with comments, is available from the redesigned landing page at https://opensource.org/deepdive/. We invite you to join the conversation, and don’t forget to become an OSI member!

The post “Open Source AI Definition: Where it stands and what’s ahead” appeared first on Voices of Open Source.


The European regulators listened to the Open Source communities!

Open Source Initiative - Fri, 2024-02-02 04:10

During 2023, OSI and many others across the Open Source communities spent a great deal of time and energy engaging with the various co-legislators of the European Union (EU) concerning the Cyber Resilience Act (CRA). Together with a revision to Europe’s Product Liability Directive (PLD), the CRA will bring the responsibilities of product liability to software for the first time.

In the light of the EU’s own research showing the huge impact of Open Source on Europe’s economy, the authors of these legislative instruments sought to ensure that the lifecycle of Open Source software was impacted as little as possible. Indeed, at FOSDEM 2023 the authors of the CRA and PLD said as much in their first-of-a-kind main track appearance. But when we all looked at the details, community members found that was not as true as we hoped. As a range of organizations explained, the CRA was likely to be an existential threat to Open Source development, because instead of placing all the compliance requirements of the CRA on companies deploying Open Source software for profit, the obligations as written potentially fell on developers and Open Source foundations.

Reactions To The Final Text

Many OSI Affiliates engaged with the European Commission, European Parliament and European Council during 2023. With the welcome coordination of Open Forum Europe, a group met regularly to track progress and explain the issues. Many of us also committed time and travel to meet in person. As a result of all this effort from so many people, the final text of the CRA mitigated pretty much all the risks we had identified to individual developers and to Open Source foundations. As the Python Software Foundation said in their update:

…the final text demonstrates a crisper understanding of how open source software works and the value it provides to the overall ecosystem of software development.

And the Eclipse Foundation wrote:

The revised legislation has vastly improved its exclusion of open source projects, communities, foundations, and their development and package distribution platforms. It also creates a new form of economic actor, the “open source steward,” which acknowledges the role played by foundations and platforms in the open source ecosystem.

As the Apache Software Foundation said:

So, all in all, this is mostly good news for volunteers who run and innovate with open source software. Or, more accurately, much better than most of us could have imagined at the end of last summer.

This time last year OSI recommended that the CRA:

…exclude all activities prior to commercial deployment of the software and … clearly ensure that responsibility for CE marks does not rest with any actor who is not a direct commercial beneficiary of deployment.

That recommendation has been accepted and implemented, and the OSI is very grateful to the various experts who took the time to listen.

OSI Observations

While it’s all much better, and while the burden placed on individuals and charities is minimal, there are still challenges ahead. For example, the concerns that the Debian project articulated give cause for thought. With Open Source projects exempted from the requirement to place a CE certification mark on their software, downstream users will need to pay careful attention to their responsibilities under the CRA as well as to their liabilities to consumers under the PLD.

In particular, “digital artisans” using Open Source software at small scale – the main concern of Debian – will need guidance from the European Commission. While the experts we have met have all said that using an Open Source software distribution as part of a commercial activity is unlikely to require CE marking of the distribution itself, the interpretation of the key phrase “making available on the market” will need careful clarification. OSI encourages the Commission to seek expert advice from the Open Source communities as they did last year, and not to rely on outsourced consultants alone in preparing this advice.

FOSDEM 2024

There is also the question of how future engagement by legislators should proceed. The effort made by developers and Open Source foundations in 2023 is not sustainable, and the Commission needs to accommodate the Fourth Sector in future deliberations. To get this started, a group of us who have engaged during 2023 got together to organize a unique set of workshops at FOSDEM 2024 on Sunday February 4. If you want your voice heard, come along to one of the workshops!

The post “The European regulators listened to the Open Source communities!” appeared first on Voices of Open Source.


Announcing: The 2024 State of Open Source Report

Open Source Initiative - Thu, 2024-02-01 09:30

Brussels, February 1, 2024 – Today, the results of the annual Open Source survey conducted by OpenLogic by Perforce, in collaboration with the OSI and the Eclipse Foundation, were shared in the 2024 State of Open Source Report.

The 2024 State of Open Source Report sheds light on the factors driving Open Source Software (OSS) adoption, the most in-demand Open Source technologies, and the difficulties that teams using OSS most frequently encounter. Also covered in the report is support and planning for end-of-life (EOL) or soon-to-be EOL software.

More than 2,000 open source users working across numerous industries all over the world answered more than two dozen questions about the use and support of OSS by their organizations, from large enterprises to early-stage startups.

Open Source practitioners and IT leadership alike should find the report enlightening. Three things in particular caught my interest:

  • OpenTofu already has 30% as many users as Terraform
  • OpenSearch has 50% as many users as Elasticsearch
  • OSI is the third organization by donations, after the Linux Foundation and the Apache Software Foundation. We made a lot of progress, folks!

Also of interest is rapid growth in the AI/ML/DL space, both in the data itself and the concurrent investment in Open Source data technologies. OSI has been on a mission to establish a Definition of Open Source AI, so this trend is something we’re watching closely. 

Thank you to Perforce and the Eclipse Foundation for the production of this valuable resource. Please share the 2024 State of Open Source Report far and wide! 

I will be participating in a webinar along with Perforce Open Source Evangelist Javier Perez and Eclipse Foundation Director of Product Marketing Clark Roundy on February 22. Register here to continue supporting Open Source and be part of the conversation!

The post “Announcing: The 2024 State of Open Source Report” appeared first on Voices of Open Source.


The OSI board expands, adding two new seats; focus on AI and international policies

Open Source Initiative - Thu, 2024-02-01 04:34

At the August board meeting, the OSI board voted to add two new appointed seats, and at the December board meeting named Professor Sayeed Choudhury and Gaël Blondelle as new board members.

The board voted to expand in order to give greater operational stability and continuity to the organization. The rationale for this decision is explained in further detail in the August 2023 board meeting minutes. The new composition of the board is:

  • four directors elected among individual members (seated for two years), 
  • four directors elected among representatives of Affiliate organizations (seated for three years), and
  • four directors (previously two) appointed by the board (seated for three years).

All four board-appointed seats are carefully selected by the board based on the strategic priorities of the OSI. Professor Sayeed Choudhury and Gaël Blondelle were chosen to fill the two new board seats because of their expertise in areas that will be most relevant to the OSI in coming years: AI and international policies.

The skills and contacts Sayeed and Gaël bring to the board will serve the OSI’s mission and goals moving forward. The two new board members will also be instrumental in the fundraising efforts of the organization with their deep networks of corporate donors and grant givers.

Sayeed Choudhury

Sayeed Choudhury is the associate dean for digital infrastructure and director of the Open Source Programs Office (OSPO) at Carnegie Mellon University. He started the first OSPO based at a US university while at Johns Hopkins University. He is the director of an Alfred P. Sloan Foundation grant for coordination of University OSPOs and a co-investigator for the Black Beyond Data Project. He is the software task force leader and member of the steering committee for the Research Data Alliance (RDA) – US. Choudhury was a President Obama appointee to the National Museum and Library Services board. He has testified for the Research Subcommittee of the Congressional Committee on Science, Space and Technology.

“The Open Source Initiative plays an important role in the Open Source ecosystem from a community, legal and policy perspective,” said Choudhury. “Carnegie Mellon University has recently launched two initiatives that focus on impact from Open Source software — Ecosystem for Next Generation Infrastructure (ENGIN) and Open Forum for AI (OFAI). I look forward to partnering with the OSI board and working with the OSI membership on these initiatives and other programs being advanced by the OSI.”

Gaël Blondelle 

Gaël Blondelle joined the Eclipse Foundation in 2013 and now serves as chief membership officer. He has been involved in the Open Source arena for more than 18 years in a number of key roles. Blondelle co-founded an Open Source start-up and worked as its chief technology officer. He then worked in business development for an Open Source systems integration company and managed a strategic research project aiming to create an Open Source ecosystem with major industrial players. Blondelle joined the Eclipse Foundation to pursue his goal of helping more companies work in Open Source, and to grow open, innovative and collaborative ecosystems.

“I am honored to join the OSI board, and look forward to helping the OSI onboard more sponsors and affiliates globally,” said Blondelle. “The work being done on the Open Source AI Definition is fantastic and we need an organization like the OSI to stand for Open Source AI in an elaborated and well articulated way. At the same time, we also need to stand for the Open Source Definition (OSD) that is regularly under attack from different sides. The OSD has enabled the development of Open Source technologies over the last 25 years, and we need to make sure this continues.”

Please join us in welcoming these two new board members to the OSI!

The post “The OSI board expands, adding two new seats; focus on AI and international policies” appeared first on Voices of Open Source.


Fixing a gap in the SEP regulation

Open Source Initiative - Wed, 2024-01-31 04:09

In its feedback to the European Commission’s proposed Standard Essential Patent (SEP) Regulation (SEP-R), OSI recommended that the legislation add a waiting period for patent claims registered as standard-essential after a standard has been ratified. The recommendation was based on the social purpose that justifies tolerating the presence of royalty-due patents in standards at all.

SEPs in context

Royalty-due SEPs are an artifact of a requirements-led standardization process. Not all standards are affected by SEPs, and not all SEPs require licensing on royalty-due terms. While some standards are encumbered by patents registered by contributors to the standards process, patents are not an essential or inherent aspect of standardization.

Patents are mechanisms that exist for a societal reason — in order to create a benefit to society by encouraging inventors to openly share their techniques — not because there is any inherent “property” to recognize. So it is incumbent on government administrations to regulate their use so that a societal benefit is preserved.

As I explained for Open Forum Europe, some standards are developed in a sequence of activities that starts from a statement of requirements aiming to create a new market (“requirements-led”) while others are developed as a harmonization of existing industry implementations in an existing market (“implementation-led”).

  • The implementation-led approach (harmonizing existing markets) frequently arises in circumstances where recovery of R&D costs is already in hand and patent monetization is not a proportionate compromise. As a result, projects developed under an implementation-led approach (such as at OASIS and W3C) frequently opt for restriction-free (RF) terms that allow negotiation-free usage, since royalties are waived and need not be negotiated.
  • The requirements-led approach (specifying the interoperability for a future market) leads some standards development organizations (SDOs) to tolerate restricted licensing of included patented technologies due to the long lead-times in research and development investment by standards contributors. While royalty-due and negotiation-required licensing of SEPs is desirable for the commercial entities benefiting from the tradition, the bilateral negotiation with NDA-enforced privacy that results gives the incumbents market power that could be easily interpreted as anti-competitive.

Despite the practice of accommodating royalty-due patents in standards leading to barriers to entry in the resulting markets, tolerating SEP monetization is a compromise that its advocates assert can be a proportionate remedy for the delayed monetization opportunity for participants. As a result, SDOs put safeguards in place during the standardization process to avoid triggering anti-trust regulations: ensuring equal terms of participation for all, requiring participants to disclose patents that could prove standard-essential, and especially requiring negotiated terms to be “Fair, Reasonable And Nondiscriminatory” (FRAND), although this last requirement is not backed up in practice.

Bugs in the process

But these SDO safeguards only prevent the SDO itself from being regarded as anti-competitive, and do nothing to protect the markets that go on to be created by requirements-led standards.

  1. What needs licensing is unclear. While the patents of those involved in the standardization process will have been declared, the resulting standard may not embody their claims, and others outside the SDO may make claims. Published standards are thus not accompanied by a list of patents that need to be licensed for implementation. The task of identifying exactly which patents need to be licensed for exactly which parts of the standard is therefore significant. That burden falls only on smaller innovators and market entrants; the incumbents are likely to have cross-licensing agreements in place, making their market participation simpler and cheaper.
  2. Power is with the incumbents. While the term “FRAND” (Fair, Reasonable And Nondiscriminatory patent licensing terms) is much used, the reality is that the negotiations for patent licenses are 1:1 and conducted in commercial secrecy under NDA. There is no way any party can know if the terms they are offered are like those offered to others, and the power is imbalanced heavily in favor of the patent owner who will use early legal proceedings to force a conclusion. Since the patent owners are frequently the dominant market players, small companies and new market entrants are at a significant disadvantage.
  3. The cost of licensing is unknown. Since each patent is likely to need separate negotiation with large corporations, it’s hard to know what the cost of licensing a given standard will be, even after the list has been painstakingly built.
  4. Patent pools can demand unwarranted licensing. Patent pools are held up as a partial remedy for this. They sometimes list all the patents they are licensing but don’t explain why those patents are essential. As a result, the lists they produce can be inaccurate, especially when the pool is not connected with the standardization process.

Better markets with SEP-R

The proposed Standard Essential Patent Regulation addresses many of these issues as part of its proposals, and that’s the reason OSI broadly welcomed the proposal. Where royalty-due patents in standards are present, they should at least function to create a fair market for both patent owners and licensees.

OSI’s concern relates to a potential loophole in the new arrangements. Knowing that some patent owners prefer not to participate in standardization activity, and that some prefer to be as non-specific about essentiality as possible, OSI was concerned that the otherwise excellent public registration system might be ignored by some patent owners seeking to bias the market toward adoption without knowledge of the full costs, deeming themselves free to disregard the regulation’s collective pricing measures. OSI considers this a gap in the regulation.

The late registration gap

Because of the improvements in SEP-R, implementers will be able to know which entities will require negotiation and assess whether to use the standard based on the registrations made by patent owners as well as on the collectively-agreed total royalty. But there is a risk the improvements will be avoided intentionally by some patent owners.

  • Late-registered patents are likely to be those not arising from the standards process. They are unlikely to be owned by participants in the collectively-agreed total royalty.
  • Since implementers could not take these patents and their burden into account, late registration is likely to require at best revised costings, probably new engineering, and at worst market withdrawal by some implementers.
  • This represents market harm; it needs to be discouraged, and those in the market need protection.
  • But SEP-R does not discourage it, leaving predatory late disclosure as a viable control point for incumbents and non-practicing entities (NPEs, or “trolls”) whose advantage has been eroded by SEP-R.
  • The only major consequence of late registration is the loss of royalties before the registration is valid; however, for a widely adopted standard this is likely to be of small consequence to the SEP owner over the long term. The market will already have formed, and such late registration will significantly impact companies with products already in the market. Products with Open Source elements will be affected even more significantly, as they will likely need to remove affected capabilities.

Possible remedies

Recognizing that patents exist to enable a social good through an effective market, and that registering patents as essential to a standard after it has been promulgated harms those trusting the registry, it seems reasonable to apply a remedy that both ameliorates and discourages late registration. The best remedy would be simply to prevent any patent registered as essential after promulgation from being able to claim any royalties in association with implementation of the standard.

Realistically, this option would face huge opposition from SEP-dependent corporations and is better considered a long-term goal.

Instead, OSI proposed that registering a patent as essential after the market has adopted a standard affected by it should result in a waiting period before royalties could be claimed. This would allow time for the adjustment of the allocation of the total estimated cost of licensing to accommodate the new patent, as well as allow the market to adjust to the new reality.

Given the pace at which these changes will be made, it seems reasonable to have a waiting period of at least two years from registration before patent royalties can become due.

Notes, Tags & Mentions
  • OSI’s interest in this topic arises from the well-documented reluctance of Open Source developers to entertain patent-encumbered standards. Their presence can sometimes be accommodated but reduces the stochastic confidence level that leads to Open Source being an effective trigger for innovation.
  • To read a similar discussion but from an Open Source perspective, see the OSI blog and my earlier article exploring the topic.
  • OSI made an earlier submission to the consultation and also published a corresponding article.

The post <span class='p-name'>Fixing a gap in the SEP regulation</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

A public forum to discuss the Open Source AI Definition

Open Source Initiative - Fri, 2024-01-26 06:32

OSI announced a public forum to widen the conversations that will lead to version 1.0 of the Open Source AI Definition. The forums are part of our commitment to inclusiveness and transparency, matching the public town hall meetings that started two weeks ago.

The public forum’s goal is to welcome the broader community to engage in the conversations surrounding AI. There is only one category at the moment, but we plan to expand the forum’s scope over time.

Access to the forum is restricted to OSI members: if you’re not already a member, you can register now for free or you can use this as an opportunity to support OSI’s work and become a full member: Donating $50 or more will give you the option to vote in the board’s upcoming election and support our programs. OSI’s membership also allows you to submit a story to OpenSource.net, the community-based magazine.

The video recordings and slides from the first two town halls are on the forum:

By making the discussions accessible to the public, we believe that we can encourage greater participation, diversity of perspectives, and ultimately, a more comprehensive understanding of the challenges and opportunities in Open Source AI.

We have an aggressive timeline: we know we must get to a version 1.0 quickly, but we also want to get there with the right amount of support from a wide set of stakeholders. Join the forum today and help us speed up the process.

The post <span class='p-name'>A public forum to discuss the Open Source AI Definition</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

How OSI will renew its board of directors in 2024

Open Source Initiative - Tue, 2024-01-16 08:19

In the next few weeks, the OSI board of directors will renew three of its seats with an open election process among its full individual members and affiliates. There will be two elections in March, running in parallel:

  • The affiliate organizations will elect one director
  • Individual members will elect two directors

The results of elections for both Individual and Affiliate member board seats are advisory with the OSI Board making the formal appointments to open seats based on the community’s votes.

Sign up now to become a full individual member (Supporting or Professional) to qualify as a candidate when the application opens on Feb 5th.

2024 elections timeline

The role of the board of directors

The board of directors is the ultimate authority responsible for the Open Source Initiative, a California public benefit corporation with 501(c)3 tax-exempt status. The board’s responsibilities include oversight of the organization, approving the budget, and supporting the executive director and staff in fulfilling the organization’s mission. The OSI isn’t a volunteer-run organization anymore and the role of the directors has changed accordingly.

Each director is expected to be a counsel and a guide for staff rather than an active contributor. Directors should guide discussions, support the vision and mission of the organization, and advocate for the OSI. They’re also asked to support the fundraising efforts however they feel comfortable doing.

The board is governed by the bylaws. Each board member is expected to sign the board member agreement. Depending on expertise and availability, directors are expected to serve on the active committees: the license, fundraising, standards and financial committees.

Candidates will be asked to share their ideas on how they’ll contribute to the vision and mission, and the 2024 strategic objectives.

The rules for how OSI runs the elections are published on our website. We’ll communicate more details in the coming weeks: stay tuned for announcements on our social media channels (Fediverse, LinkedIn, Twitter.)

Affiliate organizations will receive instructions via email.

The post <span class='p-name'>How OSI will renew its board of directors in 2024</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

A historic view of the practice to delay releasing Open Source software: OSI’s report

Open Source Initiative - Wed, 2024-01-10 10:00

The Open Source Initiative published today a new report that looks at the history of the business practice of delaying the release of code under freedom-respecting licenses. Since the early days of the Open Source movement, companies have experimented with finding a balance between granting their users the basic freedoms guaranteed by Open Source licenses while also capitalizing on their investments in software development. One common approach, albeit with many different flavors, is what this report calls “Delayed Open Source Publication” (DOSP) — “the practice of distributing or publicly deploying software under a proprietary license at first, then subsequently and in a planned fashion publishing that software’s source code under an Open Source license.”

The new report titled “Delayed Open Source Publication: A Survey of Historical and Current Practices” was authored by the team of Open Tech Strategies (Seth Schoen, James Vasile and Karl Fogel) based on crowdsourced interviews. Their research was made possible through a donation by Sentry and the financial contributions of OSI individual members. 

Like the authors, I found that the historical survey revealed numerous surprises, and what I found even more intriguing are the new questions it raises (see Section 7), which call for more dedicated research.

I encourage you to give it a read and share it with others. We encourage feedback from the community: I hold office hours for OSI members and you can discuss this on Mastodon or LinkedIn.

Download the report.

The post <span class='p-name'>A historic view of the practice to delay releasing Open Source software: OSI’s report</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

ClearlyDefined: recapping a year of progress and sharing a vision for 2024

Open Source Initiative - Mon, 2024-01-08 12:41

At the beginning of 2023, I started as a community manager for ClearlyDefined, with the goals of creating an open governance model for the project and helping the OSI to establish a neutral infrastructure to foster collaboration among multiple stakeholders. Thanks to the amazing work from our community members, a lot of progress has been made in 2023, but there’s still a lot of work ahead of us. In this post, we would like to highlight some milestones achieved this past year and acknowledge some individuals who have contributed to the project. We would also like to share a vision for 2024 and invite all organizations who care about the Open Source supply chain to become involved.

ClearlyDefined is an Open Source project and service that aims to serve as a global database of licensing metadata for every software component ever published. It was originally developed and used by Microsoft and it’s now in use at companies like GitHub, SAP, and Bloomberg, as well as Open Source projects like the Linux Foundation’s GUAC and ORT (OSS Review Toolkit). At the beginning of 2023, Open Source Initiative took over as community steward of the project.
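As a sketch of what "a global database of licensing metadata" looks like in practice, the snippet below builds a ClearlyDefined component coordinate and extracts the declared license from a definition document. The `api.clearlydefined.io` definitions endpoint and the `type/provider/namespace/name/revision` coordinate scheme are drawn from the project's public documentation, but treat the exact URL and response fields here as assumptions, not authoritative API reference.

```python
import json
import urllib.request


def definition_url(ctype, provider, namespace, name, revision):
    """Build the definitions endpoint URL for a component coordinate.
    A missing namespace is written as '-' in ClearlyDefined coordinates."""
    ns = namespace or "-"
    return (f"https://api.clearlydefined.io/definitions/"
            f"{ctype}/{provider}/{ns}/{name}/{revision}")


def declared_license(definition):
    """Pull the declared license out of a definition document, if present."""
    return definition.get("licensed", {}).get("declared")


url = definition_url("npm", "npmjs", None, "lodash", "4.17.21")
# Live lookup (requires network access):
# definition = json.load(urllib.request.urlopen(url))

# Offline illustration of the shape a definition document takes:
sample = {"licensed": {"declared": "MIT"}}
print(declared_license(sample))  # MIT
```

Tools like ORT and GUAC consume this same metadata in bulk rather than one coordinate at a time, which is where the shared, curated database pays off.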

In the first quarter, outstanding work was carried out by Manny Martinez (Microsoft) in collaboration with Qing Tomlinson (SAP) to optimize ClearlyDefined’s back end, particularly the database. This work has resulted in a 10-fold reduction in database size and costs.

In the second quarter, GitHub added 17.5 million package licenses sourced from ClearlyDefined to their database, expanding the license coverage for packages that appear in dependency graph, dependency insights, dependency review, and a repository’s software bill of materials (SBOM).

In the third quarter, we saw greater collaboration between GitHub and SAP spearheaded by E. Lynette Rayle and Qing Tomlinson. They are making improvements to the documentation and process of running a local ClearlyDefined harvest and sharing the licensing metadata with other harvesters.

In the fourth quarter, we saw various members currently using ClearlyDefined and new members alike coming together to create a unified vision for the project. Thomas Steenbergen, co-founder of ClearlyDefined and ORT, has come forward to help lead this effort. Key goals for ClearlyDefined in 2024 include:

  • Publishing periodic releases and switching to semantic versioning
  • Bringing dependencies up to date (in particular using the latest scancode)
  • Improving the NOASSERTION/OTHER issue (please check this analysis by Aleksandrs Volodjkins to learn more)
  • Advancing usability and the curation process through the UI 
  • Enhancing the documentation and process for creating a local harvest

ClearlyDefined’s mission is to help organizations collaboratively achieve accurate licensing metadata (oftentimes part of SBOMs) at scale, at each stage of the supply chain, for every build or release. If your organization is interested in achieving better compliance and security of the Open Source supply chain, please consider joining ClearlyDefined. We are still working to consolidate a roadmap for 2024, and this is a great time to join the project and learn more about how ClearlyDefined can help your organization.

The post <span class='p-name'>ClearlyDefined: recapping a year of progress and sharing a vision for 2024</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

OSI work in developing an Open Source AI definition featured in the State of the Digital Public Goods Ecosystem 2023 report

Open Source Initiative - Thu, 2023-12-21 10:49

The OSI joined the Digital Public Goods Alliance (DPGA) at the beginning of 2023 to support its mission to accelerate the attainment of the Sustainable Development Goals (SDGs) in low- and middle-income countries by facilitating the discovery, development, use of and investment in digital public goods. Digital public goods include Open Source software, open data, open AI models, open standards and open content that adhere to privacy and other applicable laws and best practices, do no harm by design and help attain SDGs.

Members of the DPGA include governments and their agencies, multilateral organizations including UN entities, philanthropic foundations, funders, think tanks and technology companies. Deb Bryant serves as the OSI liaison, and Stefano Maffulli led an AI workshop at the annual meeting in Ethiopia in November.

In its recently released State of the Digital Public Goods Ecosystem 2023 report, the DPG-related work and contributions of DPGA member entities are highlighted. For the OSI, our featured work includes participation in and collaboration with the DPGA’s Community of Practice (CoP) on AI which advocates for responsible use of AI. The DPGA agrees that an Open Source AI definition that embodies principles of transparency and clarity will be crucial to supporting wider efforts towards responsible AI. We look forward to continued collaboration with DPGA and its members.

You can learn more about the DPGA on X/Twitter and LinkedIn.

The post <span class='p-name'>OSI work in developing an Open Source AI definition featured in the State of the Digital Public Goods Ecosystem 2023 report</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

2023 in review: many reasons to celebrate

Open Source Initiative - Wed, 2023-12-13 11:07

The year 2023 was a busy one for the Open Source Initiative (OSI), as we celebrated the 25th Anniversary of Open Source while looking toward present and future challenges and opportunities. Our work has revolved around three grand areas: licensing and legal, policy and standards, and advocacy and outreach. As the steward of the Open Source Definition, licensing and legal have been part of our core program since our founding. We serve as an anchor for open community consensus on what constitutes Open Source. We protect the Open Source principles, enforcing the marks “Certified Open Source” and “Open Source Approved License”. Under policy and standards, we have monitored policy and standards-setting organizations, supporting legislators and policy makers by educating them about the Open Source ecosystem, its role in innovation and its value for an open future. Lastly, under advocacy and outreach, we are leading global conversations with non-profits, developers and lawyers to improve the understanding of Open Source principles and practice. OSI investigates the impacts of ongoing debates around Open Source, from artificial intelligence to cybersecurity.

Highlights

  • Website: ~2M visitors/year (stable YoY)
  • Membership: ~2,800 members (+136% YoY)
  • Newsletter: ~10,100 subscribers (+25% YoY)

The OSI’s website received around 2 million unique visitors in 2023. We surpassed 2,500 members and 10,000 subscribers.

  • Events: 36
  • Workshops: 6
  • Keynotes: 12
  • Talks: 24
  • Webinars: 18

The OSI contributed to 36 events worldwide in 2023, holding 6 workshops, 12 keynotes, 24 talks, and 18 webinar sessions.

Licensing and Legal

The License Review working group has continued to examine and improve the license review process and has created a systematic and well-ordered database of all the licenses that have been submitted to OSI for approval since the time of the organization’s founding. The OSI has also worked towards establishing an open governance model for ClearlyDefined, an open source project with a mission to create a global database of licensing metadata for every software component ever published. This year, GitHub has added 17.5 million package licenses sourced from ClearlyDefined to their database, expanding the license coverage for packages that appear in dependency graph, dependency insights, dependency review, and a repository’s software bill of materials (SBOM).

OSI Approved Licenses

We provide a venue for the community to discuss Open Source licenses and we maintain the OSI Approved Licenses database.

ClearlyDefined

We aim to crowdsource a global database of licensing metadata for every software component ever published for the benefit of all.

List of articles:

Policy and Standards

The OSI’s senior policy directors Deb Bryant and Simon Phipps have been busy keeping track of policies affecting Open Source software, mostly across the US and Europe, and bringing different stakeholders together to voice their opinions. In particular, we are tracking the Securing Open Source Software Act in the US and the Cyber Resilience Act in Europe. In 2023, the OSI joined the Digital Public Goods Alliance, and launched the Open Policy Alliance with 20 initial members, including the Apache Software Foundation, Eclipse Foundation, and the Python Software Foundation.

Open Policy Alliance

We are bringing non-profit organizations together to participate in educating and informing public policy decisions related to Open Source software, content, research, and education.

List of articles:

Advocacy and Outreach

The OSI has celebrated the 25th anniversary of Open Source in partnership with 36 conferences from around the world with a combined attendance of over 125,000 people. Throughout the year, our focus has shifted from reviewing the past of Free and Open Source software to exploring the future of Open Source in this new era of AI. We have organized several online and in-person activities as part of the Deep Dive AI, an open multi-stakeholder process to define Open Source AI. We have also organized the License Clinic, a workshop tailored for the US federal government. Finally, we have launched Opensource.net as a new home for a community of writers and editors of the project formerly known as Opensource.com.

25 years of Open Source

We celebrated the 25th Anniversary by sharing the rich and interconnected history of the Free Software and Open Source movements and we explored the challenges and opportunities ahead, from AI to cybersecurity.

Deep Dive: Defining Open Source AI

We are bringing together global experts to establish a shared set of principles that can recreate the permissionless, pragmatic and simplified collaboration for AI practitioners.

Opensource.net

We gave a new home to the community of contributors from opensource.com. The new platform supports healthy dialog and informed education on Open Source software topics.

List of articles:

Press mentions

In 2023, the Open Source Initiative was cited 100 times in the press worldwide, educating and countering misinformation. Our work was featured at The Verge, TechCrunch, ZDNET, InfoWorld, Ars Technica, IEEE Spectrum, MIT Technology Review, among other top media outlets.

List of select articles:

Events

The Open Source Initiative contributed keynotes, panels, talks, and other activities at 36 events worldwide throughout 2023, including the top tech and open source conferences, with a combined attendance of over 125,000 people.

List of events:

EU Open Source Policy Summit

Brussels, Belgium

3 February, 2023

Talk

In Support of Sound Public Policy: How to Avoid Unintended Consequences to OSS in Lawmaking

Deborah Bryant, James Lovegrove, Maarten Aertse, Simon Phipps

Link to video recording

As Open Source has taken on monumental importance in the digital markets being the main model for software development, its exposure to regulatory risk has increased. Just in the last few years we have seen policymakers, often despite their best intentions, unintentionally targeting Open Source developers, repositories or the innovation model itself. To name some examples: the Copyright Directive, and more recently the AI Act and the Cyber Resilience Act all have created unintended regulatory risk for OSS. In this panel, we will discuss the status of ongoing files, but also take a few steps back and suggest approaches to how policymakers can avoid these unintended consequences. How to consider developers and communities in the legislative process? How does the very horizontal Open Source ecosystem fit into the EU system of vertical multistakeholderism? What is the responsibility of Open Source experts to engage earlier with policymakers?

FOSDEM

Brussels, Belgium

4-5 February, 2023

Keynote, Talks

Celebrating 25 years of Open Source: Past, Present, and Future

Nick Vidal

Link to keynote

February 2023 marks the 25th Anniversary of Open Source. This is a huge milestone for the whole community to celebrate! In this session, we’ll travel back in time to understand our rich journey so far, and look forward towards the future to reimagine a new world where openness and collaboration prevail. Come along and celebrate with us this very special moment!

The open source software label was coined at a strategy session held on February 3rd, 1998 in Palo Alto, California. That same month, the Open Source Initiative (OSI) was founded as a general educational and advocacy organization to raise awareness and adoption for the superiority of an open development process. One of the first tasks undertaken by OSI was to draft the Open Source Definition (OSD). To this day, the OSD is considered a gold standard of open-source licensing.

In this session, we’ll cover the rich and interconnected history of the Free Software and Open Source movements, and demonstrate how, against all odds, open source has come to “win” the world. But have we really won? Open source has always faced an extraordinary uphill battle: from misinformation and FUD (Fear Uncertainty and Doubt) constantly being spread by the most powerful corporations, to issues around sustainability and inclusion.

We’ll navigate this rich history of open source and dive right into its future, exploring the several challenges and opportunities ahead, including its key role on fostering collaboration and innovation in emerging areas such as ML/AI and cybersecurity. We’ll share an interactive timeline during the presentation and throughout the year, inviting the audience and the community at-large to share their open source stories and dreams with each other.

Open Source Initiative – Changes to License Review Process

Pamela Chestek

Link to talk

The Open Source Initiative is working on making improvements to its license review process and has a set of recommendations for changes it is considering making, available [link to be provided]. This session will review the proposed changes and also take feedback from the participants on what it got right, what it got wrong, and what it might have missed.

The License Review Working Group of the Open Source Initiative was created to improve the license review process. In the past, the process has been criticized as unpredictable, difficult to navigate, and applying undisclosed requirements. The Working Group developed a set of recommendations for revising the process for reviewing and approving or rejecting licenses submitted to the OSI. The recommendations include separate review standards for new and legacy licenses, a revised group of license categories, and some specific requirements for license submissions. The recommendations are available [link to be provided] and the OSI is in the feedback stage of its process, seeking input on the recommendations. The session will review the proposed changes and also take feedback from the participants on what it got right, what it got wrong, and what it might have missed.

The role of Open Infrastructure in digital sovereignty

Thierry Carrez

Link to talk

Pandemics and wars have woken up countries and companies to the strategic vulnerabilities in their infrastructure dependencies, with digital sovereignty now being a top concern, especially in Europe.

In this short talk, Thierry Carrez, General Manager of the Open Infrastructure Foundation, will explore the critical role that open source plays in enabling digital sovereignty. In particular, he will explore how Open Infrastructure (open source solutions for providing infrastructure), with its interoperability, transparency and independence properties, is essential to reaching data and computing sovereignty.

State of Open Con

London, UK

7-8 February, 2023

Partner

Summary

Link to blog post

SCALE 20x

Pasadena, USA

9-12 March, 2023

Talk

Defining an Open Source AI

Stefano Maffulli

Link to talk

The traditional view of open source code implementing AI algorithms may not be sufficient to guarantee inspectability, modifiability and replicability of AI systems. The Open Source Initiative is leading an exploration of the world of AI and Open Source, probing the boundaries of data and software to discover how concepts like copying, distribution and modification of source code apply in the context of AI.

AI systems are already deciding who stays in jail or which customers deserve credit to buy a house. More kinds of “autonomous” systems are appearing so fast that government regulators are rushing to define policies.

Artificial Intelligence/Machine learning, explained at a high level, is a type of complex system that combines code to create and train/tune models, and data used for training and validation to generate artifacts. The most common tools are implemented with open source software like TensorFlow or PyTorch. But from a practical perspective, these packages are not sufficient to enable a user to exercise their rights to run, study, modify and redistribute a “machine learning system.” What’s the definition of open source in the context of AI/ML? Where is the boundary between data and software? How do we apply copyleft to software that can identify your cats in your collection of pictures?

FOSS Backstage

Berlin, Germany

13-14 March, 2023

Talk

Summary

Link to blog post

Securing OSS across the whole supply chain and beyond

Nick Vidal

Link to talk

As we celebrate the triumph of open source software on its 25th anniversary, at the same time we have to acknowledge the great responsibility that its pervasiveness entails. Open source has become a vital component of a working society and there’s a pressing need to secure it across the whole supply chain and beyond. In this session, we’ll take the opportunity to look at three major advancements in open source security, from SBOMs and Sigstore to Confidential Computing.

Open source plays a vital role in modern society given its pervasiveness in the Cloud, mobile devices, IoT, and critical infrastructure. Securing it at every step in the supply chain and beyond is of ultimate importance.

As we prepare for the “next Log4Shell”, there are some technologies that are emerging on the horizon, among which SBOMs, Sigstore, and Confidential Computing. In this session, we’ll explore these technologies in detail.

While SBOMs (Software Bill of Materials) allow developers to track the dependencies of their software and ensure that they are using secure and reliable packages, Sigstore allows developers to verify the authenticity and integrity of open source packages, ensuring that the code has not been tampered with or compromised.

Confidential Computing, on the other hand, protects code and data in use by performing computation in a hardware-based, attested Trusted Execution Environment, ensuring that sensitive code and data cannot be accessed or tampered by unauthorized parties, even if an attacker were to gain access to the computing infrastructure.

SBOMs, Sigstore, and Confidential Computing provide a powerful combination to address security concerns and ensure the integrity and safety of open source software and data. They focus on “security first,” rather than perpetuating existing approaches which have typically attempted to bolt on security measures after development, or which rely on multiple semi-connected processes through the development process to provide marginal improvements to the overall security of an application and its deployment.

As we celebrate the 25th anniversary of open source, the emergence of these three technologies represents a step forward in securing OSS across the whole supply chain and beyond. We foresee them playing a key role in minimizing the risk of vulnerabilities and protecting software and data against potential attacks, providing greater assurances for society as a whole.
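To make the SBOM idea concrete, here is a minimal, hand-rolled sketch of a CycloneDX-style SBOM document for a single dependency. The field names follow the CycloneDX JSON format, but the component chosen and the document as a whole are an illustration assembled for this example, not a validated SBOM.

```python
import json

# A minimal CycloneDX-shaped document: one library component with its
# package URL (purl) and declared license.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "log4j-core",
            "version": "2.17.1",
            "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1",
            "licenses": [{"license": {"id": "Apache-2.0"}}],
        }
    ],
}

# Serialize the document the way a build tool would emit it.
print(json.dumps(sbom, indent=2))
```

A vulnerability scanner answering "am I shipping an affected log4j?" only has to match component `purl`s against an advisory list, which is exactly the kind of query that is slow and error-prone without an SBOM.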

Podcast: SustainOSS

Nick Vidal, Richard Littaeur

Link to podcast

Link to talk


ORT Community Days

Berlin, Germany

15-16 March, 2023

Talk

Summary

Link to blog post

Presenting ClearlyDefined

Nick Vidal

Nick Vidal, the new community manager for ClearlyDefined, will provide a brief background on the project and later focus on gathering feedback from the audience as to the next steps for how ClearlyDefined can best serve the community.

Open Source License Clinic

Washington D.C., USA

4 April, 2023

Talks

Organizer

Open Source Licenses 201

Pam Chestek

This essential Clinic session is an advanced primer on open source licenses: why one should care, which are most commonly used, and why. Also included are insights into the OSI license process, who is involved in considering and approving new licenses based on the Open Source Definition, and which licenses have been approved in the last five years. Topics include challenges, successes, best practices, operational policies and resources. The briefing is followed by an expert panel discussion.

SBOM This, and SBOM That

Aeva Black

Just a few years ago the notion of a Software Bill of Materials (SBOM) was centered around open source licenses. How has it changed, and why is it increasingly being called out as a key component of software transparency by governments around the world? The presenter will share a history of the SBOM, its evolution and role today in cybersecurity. The session will be followed by a Q&A session.

Are AI Models the New Open Source Projects?

Justin Colannino

Communities of machine learning developers are working together and creating thousands of powerful ML models under open source and other public licenses. But these licenses are for software, and ML is different. This briefing discusses how to square ML with open source software licenses and collaboration practices, followed by a panel discussion on the implications that ML and its growing communities have on the future of open source software development.

Alternative Licenses

Luis Villa

The past several years have seen an increase in the number of software licenses which appear to nod to open source software (OSS) licenses – those conforming with the Open Source Definition (OSD) – but are developed to meet different objectives, often withholding some benefits of OSS. What are the emerging patterns in the creation of new licensing strategies? The briefing offers a look at the current landscape and provides an opportunity to answer questions and discuss concerns.

FOSS North

Gothenburg, Sweden

24-25 April, 2023

Keynote

Speaking Up For FOSS Now Everyone Uses It

Simon Phipps

Link to keynote

At 40 years old, FOSS has become a full citizen in modern society. By popularising and catalysing the pre-existing concepts from the free software movement, open source has moved to the heart of the connected technology revolution over the last 25 years. In Europe, it now drives nearly 100 Billion Euros of GDP. Unsurprisingly, it is now the focus of much political attention from all directions – including regulators and detractors. Today everyone wants to be FOSS – including many who really don’t but want the cachet.
In 2022, the mounting wave broke and legislation affecting our movement cascaded into view in the USA and Europe. In Europe, the DSA, Data Act, AI Act, CRA, PLD, and several more major legislative works emerged from the Digital Agenda. Despite its apparent awareness of open source, this legislation appeared ill-suited for the reality of our communities. Why is that? Where do standards come into this? Where is this heading?
Simon Phipps is currently director of standards and EU policy for the Open Source Initiative, where he was previously a member of the board of directors and board President. He has also served as a director at The Document Foundation, the UK’s Open Rights Group and other charities and non-profits. Prior to that, he ran one of the first OSPOs at Sun Microsystems, was one of the founders of IBM’s Java business, worked on video conference software and standards at IBM and was involved with workstation and networking software at Unisys/Burroughs. A European rendered stateless by British politics, he lives in the UK.

Web Summit Rio

Rio de Janeiro, Brazil

1-4 May, 2023

Participant

Open Source Summit NA

Vancouver, Canada

10-13 May, 2023

Talk

OmniBOR: Bringing the Receipts for Supply Chain Security

Aeva Black, Ed Warnicke

Link to presentation

Supply chain requirements got you down? Getting an endless array of false positives from your ‘SBOM scanners’? Spending more of your time proving you don’t have a ‘false positive’ from your scanners than fixing real vulnerabilities in your code? There has to be a better way. There is. Come hear from Aeva and Ed about a new way to capture the full artifact dependency graph of your software, not as a ‘scan’ after the fact, but as an output of your build tools themselves. Find out when this feature is coming to a build tool near you.

OpenInfra Summit

Vancouver, Canada

13-14 June, 2023

Keynote

Rising to the Supply Chain Security Challenge in Global Open Source Communities

Aeva Black

Policy-makers around the world are debating how best to secure the open source components in software supply chains critical to national infrastructure. For individuals who are not steeped in open source communities’ culture, it can seem logical to apply paradigms designed to model the commercial supply chain of physical goods – but this could lead to catastrophic results for open source projects, where liability is expressly disclaimed in the license and contributors are often unpaid volunteers willing to share their time and ingenuity. What, then, should each of us do?

The OpenInfra community was an early leader in defining secure build practices for a large open source project. Comparable processes are now recommended for all open source projects, and are reflected in frameworks published by the OpenSSF… but even more might soon be necessary.

Data + AI Summit

San Francisco, USA

26-28 June, 2023

Participant

Summary

Link to blog post

FOSSY

Portland, USA

13-15 July, 2023

Talk, Workshop

Summary

Link to blog post

Keeping Open Source in the Public Interest

Stefano Maffulli

Link to talk

Following an explosion of growth in open collaboration to solve the world’s most urgent problems related to the 2020 global Covid-19 pandemic, open source software moved from mainstream to the world’s main stage. In 2022 the United Nations’ Digital Public Goods (DPG) Alliance began formally certifying open source software as DPGs; the European Union wrote open source into their road map; both the EU and the US began crafting cybersecurity legislation in support of secure software – not targeting OSS as a specific concern but rather protecting and investing in it as critical to its own and its citizens’ interest.

OSI has recognized these important sea changes in the environment, including unprecedented interest in open source in public arenas. Stefano Maffulli’s briefing will provide an overview of important trends in Open Source Software in public policy, philanthropy and research and talk about a new initiative at OSI designed to bring open voices to the discussion.

Workshop – Defining Open Source AI

Stefano Maffulli

Join this impromptu meeting to share your thoughts on what it means for Artificial Intelligence and Machine Learning systems to be “open”. The Open Source Initiative will host this lunch break to hear from the FOSSY participants what they think should be the shared set of principles that can recreate the permissionless, pragmatic and simplified collaboration for AI practitioners, similar to what the Open Source Definition has done.

Campus Party Brazil

Sao Paulo, Brazil

25-29 July, 2023

Keynote

Summary

Link to blog post

25 years of Open Source

Nick Vidal, Bruno Souza

Link to video recording (Portuguese)

This year we celebrate 25 years of Open Source. This is a major milestone for the whole community! In this session, we will travel back in time to understand our rich history and how, despite all the battles fought, we came to conquer the world, now present in every corner, from the Web to the Cloud. We will then dive straight into the future, exploring the various challenges and opportunities ahead, including open source’s key role in fostering collaboration and innovation in emerging areas such as Artificial Intelligence and Cybersecurity. We will share an interactive timeline during the presentation and invite the audience and the community at large to share their open source stories and dreams with each other.

The future of Artificial Intelligence: Sovereignty and Privacy with Open Source

Nick Vidal, Aline Deparis

Link to video recording (Portuguese)

The future of Artificial Intelligence is being defined at this very moment, in a remarkable battle waged between large companies and a community of entrepreneurs, developers and researchers around the world. There are two paths we can follow: one in which the code, models and data are proprietary and heavily regulated, or another in which the code, models and data are open. One path will lead to the monopolization of AI by a few large corporations, where end users will have their power and privacy limited, while the other will democratize AI, allowing any individual to study, adapt, contribute to, innovate on and build businesses on top of these foundations, with full control and respect for privacy.

Open Source Congress

Geneva, Switzerland

27-28 July, 2023

Talk, Workshop

Panel: Does AI Change Everything? What is Open? Liability, Ethics, Values?

Joanna Lee, The Linux Foundation; Satya Mallick, OpenCV; Mohamed Nanabhay, Mozilla Ventures; Stefano Maffulli, OSI

The rapid advancements in artificial intelligence (AI) have ushered in a new era of possibilities and challenges across various sectors of society. As AI permeates our lives, it is crucial to foster a comprehensive understanding of its implications. The panel will bring together experts from diverse backgrounds to engage in a thought-provoking dialogue on the current challenges for AI in open source. Panelists will address the critical challenges facing the ecosystem, including the need to align on defining open AI, how to foster collaboration between and among open source foundations, explore avenues for improvement, and identify current cross-foundational initiatives, all to improve the state of open source AI.

Defining “Open” AI/ML

Stefano Maffulli

Join this impromptu meeting to share your thoughts on what it means for Artificial Intelligence and Machine Learning systems to be “open”. The Open Source Initiative will host this session to hear from the Open Source Congress participants what they think should be the shared set of principles that can recreate the permissionless, pragmatic and simplified collaboration for AI practitioners, similar to what the Open Source Definition has done.

COSCUP

Taipei, Taiwan

29-30 July, 2023

Keynote

Summary

Link to blog post

The Yin and Yang of Open Source: Unveiling the Dynamics of Collaboration, Diversity, and Cultural Transformation

Paloma Oliveira

Link to the presentation

“The Yin and Yang of Open Source” is a captivating exploration of the intricate relationship between collaboration, diversity, and open source culture. Looking into its rich history, benefits, challenges, and current issues, with a particular focus on its influence in cultural transformation, the talk aims to inspire a deeper appreciation for the immense power of free and open source philosophy and practical application. It emphasizes the importance of responsible practices and the creation of inclusive communities, urging us to embrace this transformative force and actively contribute to a future that is more inclusive and collaborative.

North Bay Python

Petaluma, California

29-30 July, 2023

Talk

Celebrating 25 years of Open Source & our friend Betsy

Josh Simmons

Link to the video recording

Diana Initiative

Las Vegas, USA

5 August, 2023

Participant

Summary

Link to blog post

Black Hat US

Las Vegas, USA

5-9 August, 2023

Participant

Summary

Link to blog post

Ai4

Las Vegas, USA

6-8 August, 2023

Participant

Summary

Link to blog post

DEFCON

Las Vegas, USA

10-13 August, 2023

Participant

Summary

Link to blog post

NextCloud Conference

Berlin, Germany

16-17 September, 2023

Keynote

The Fourth Sector: an often overlooked and misunderstood sector in the European worldview

Simon Phipps

Link to video recording

Simon Phipps is known for his time at Sun Microsystems, where he took over leadership of Sun’s open source program and ran one of the first OSPOs. During the 2000s, most of Sun’s core software was released under open source licenses, including Solaris and Java; earlier, in the 1990s, he had helped co-establish IBM’s Java business. When Sun was broken up in 2010, he was freed to focus purely on open source and dedicated time to re-imagining the Open Source Initiative (OSI) – the non-profit organization that acts as a steward of the canonical list of open source licenses and the Open Source Definition. Today Simon leads OSI’s work educating European policymakers about the needs of the open source community.

Open Source Summit EU

Bilbao, Spain

19-21 September, 2023

Keynote, Talk, Interview

Sponsor

Summary

Link to blog post

Keynote: The Evolving OSPO

Nithya Ruff

Link to keynote

The OSPO or open source program office has become a well-established institution for driving open source strategy and operations inside companies and other institutions. And 2023 has been a year of strong change and growth for OSPOs everywhere. This keynote will take a look at new challenges and opportunities that face OSPOs today.

Panel Discussion: Why Open Source AI Matters: The EU Community & Policy Perspective

Justin Colannino; Astor Nummelin Carlberg; Ibrahim Haddad; Sachiko Muto; Stefano Maffulli

Link to video recording

Interview: The New Stack

Stefano Maffulli, Alex Williams

Link to video recording

Interview: TFiR

Stefano Maffulli, Swapnil Bhartiya

Link to video recording

Nerdearla

Buenos Aires, Argentina

26-30 September, 2023

Keynote

Partner

Summary

Link to blog post

Celebrating 25 years of Open Source

Nick Vidal

Link to video recording

February 2023 marks the 25th Anniversary of Open Source. This is a huge milestone for the whole community to celebrate! In this session, we’ll travel back in time to understand our rich journey so far, and look forward towards the future to reimagine a new world where openness and collaboration prevail. Come along and celebrate with us this very special moment!

The open source software label was coined at a strategy session held on February 3rd, 1998 in Palo Alto, California. That same month, the Open Source Initiative (OSI) was founded as a general educational and advocacy organization to raise awareness and adoption for the superiority of an open development process. One of the first tasks undertaken by OSI was to draft the Open Source Definition (OSD). To this day, the OSD is considered a gold standard of open-source licensing.

In this session, we’ll cover the rich and interconnected history of the Free Software and Open Source movements, and demonstrate how, against all odds, open source has come to “win” the world. But have we really won? Open source has always faced an extraordinary uphill battle: from misinformation and FUD (Fear, Uncertainty and Doubt) constantly being spread by the most powerful corporations, to issues around sustainability and inclusion. We’ll navigate this rich history of open source and dive right into its future, exploring the several challenges and opportunities ahead, including its key role in fostering collaboration and innovation in emerging areas such as ML/AI and cybersecurity. We’ll share an interactive timeline during the presentation and throughout the year, inviting the audience and the community at large to share their open source stories and dreams with each other.

Deep Dive: AI Webinar Series

Online

September, 2023

Talks

Organizer

Summary

Link to blog post

The Turing Way Fireside Chat: Who is building Open Source AI?

Jennifer Ding, Arielle Bennett, Anne Steele, Kirstie Whitaker, Marzieh Fadaee, Abinaya Mahendiran, David Gray Widder, Mophat Okinyi

Link to video recording

Facilitated by Jennifer Ding and Arielle Bennett of The Turing Way and the Alan Turing Institute, this panel will feature highlights from Abinaya Mahendiran (Nunnari Labs) Marzieh Fadaee (Cohere for AI), David Gray Widder (Cornell Tech), and Mophat Okinyi (African Content Moderators Union). As part of conversations about defining open source AI as hosted by the Open Source Initiative (OSI), The Turing Way is hosting a panel discussion centering key communities who are part of building AI today, whose contributions are often overlooked. Through a conversation with panellists from content moderation, data annotation, and data governance backgrounds, we aim to highlight different kinds of contributors whose work is critical to the Open Source AI ecosystem, but whose contributions are often left out of governance decisions or from benefitting from the AI value chain. We will focus on these different forms of work and how each are recognised and rewarded within the open source ecosystem, with an eye to what is happening now in the AI space. In the spirit of an AI openness that promotes expanding diverse participation, democratising governance, and inviting more people to shape and benefit from the future of AI, we will frame a conversation that highlights current best practices as well as legal, social, and cultural barriers. We hope this multi-domain, multi-disciplinary discussion can emphasise the importance of centering the communities who are integral to AI production in conversations, considerations, and definitions of “Open Source AI.”

Operationalising the SAFE-D principles for Open Source AI

Kirstie Whitaker

Link to video recording

The SAFE-D principles (Leslie, 2019) were developed at the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence. They have been operationalised within the Turing’s Research Ethics (TREx) institutional review process. In this panel we will advocate for the definition of Open Source AI to include reflections on each of these principles and present case studies of how AI projects are embedding these normative values in the delivery of their work.

The SAFE-D approach is anchored in the following five normative goals:

* **Safety and Sustainability** ensuring the responsible development, deployment, and use of a data-intensive system. From a technical perspective, this requires the system to be secure, robust, and reliable. And from a social sustainability perspective, this requires the data practices behind the system’s production and use to be informed by ongoing consideration of the risk of exposing affected rights-holders to harms, continuous reflection on project context and impacts, ongoing stakeholder engagement and involvement, and change monitoring of the system from its deployment through to its retirement or deprovisioning.
* Our recommendation: Open source AI must be safe and sustainable, and open ways of working ensure that “many eyes make all bugs shallow”. Having a broad and engaged community involved throughout the AI workflow keeps infrastructure more secure and keeps the purpose of the work aligned with the needs of the impacted stakeholders.
* **Accountability** can include specific forms of process transparency (e.g., as enacted through process logs or external auditing) that may be necessary for mechanisms of redress, or broader processes of responsible governance that seek to establish clear roles of responsibility where transparency may be inappropriate (e.g., confidential projects).
* Our recommendation: Open source AI should have clear accountability documentation and processes of raising concerns. These are already common practice in open source communities, including through codes of conduct and requests for comment for extensions or breaking changes.
* **Fairness and Non-Discrimination** are inseparably connected with sociolegal conceptions of equity and justice, which may emphasize a variety of features such as equitable outcomes or procedural fairness through bias mitigation, but also social and economic equality, diversity, and inclusiveness.
* Our recommendation: Open source AI should clearly communicate how the AI model and workflow are considering equity and justice. We hope that the open source AI community will embed existing tools for bias reporting into an interoperable open source AI ecosystem.
* **Explainability and Transparency** are key conditions for autonomous and informed decision-making in situations where data processing interacts with or influences human judgement and decision-making. Explainability goes beyond the ability to merely interpret the outcomes of a data-intensive system; it also depends on the ability to provide an accessible and relevant information base about the processes behind the outcome.
* Our recommendation: Open source AI should build on the strong history of transparency that is the foundation of the definition of open source: access to the source code, data, and documentation. We are confident that current open source ways of working will enhance transparency and explainability across the AI ecosystem.
* **Data quality, integrity, protection and privacy** must all be established to be confident that the data-intensive systems and models have been developed on secure grounds.
* Our recommendation: Even where data cannot be made openly available, there should be accountability and transparency around how the data is gathered and used.

The agenda for the session will be:

1. Prof David Leslie will give an overview of the SAFE-D principles.
2. Victoria Kwan will present how the SAFE-D principles have been operationalised for institutional review processes.
3. Dr Kirstie Whitaker will propose how the institutional process can be adapted for decentralised adoption through a shared definition of Open Source AI.

The final 20 minutes will be a panel responding to questions and comments from the audience.

Commons-based data governance

Alek Tarkowski, Zuzanna Warso

Link to video recording

Issues related to data governance (its openness, provenance, transparency) have traditionally been outside the scope of open source frameworks. Yet the development of machine learning models shows that concerns over data governance should be in the scope of any approach that aims to govern open-source AI in a holistic way. In this session, I would like to discuss issues such as:
– the need for openly licensed / commons-based data sources
– the feasibility of a requirement to openly share any data used in the training of open-source models
– transparency and provenance requirements that could be part of an open-source AI framework.

Preempting the Risks of Generative AI: Responsible Best Practices for Open-Source AI Initiatives

Monica Lopez, PhD

Link to video recording

As artificial intelligence (AI) has proliferated across many industries and use cases, changing the way we work, interact and live with one another, AI-enabled technology poses two intersecting challenges to address: the influencing of our beliefs and the engendering of new means for nefarious intent. Such challenges resulting from human psychological tendencies can inform the type of governance needed to ensure safe and reliable generative AI development, particularly in the domain of open-source content.

The formation of human beliefs from a subset of available data from the environment is critical for survival. While beliefs can change with the introduction of new data, the context in which such data emerges and the way in which such data are communicated all matter. Our live dynamic interactions with each other underpin our exchange of information and development of beliefs. Generative AI models are not live systems, and their internal architecture is incapable of understanding the environment to evaluate information. Considering this system reality with the use of AI as a tool for malicious actors to commit crimes, deception – strategies humans use to manipulate others, withhold the truth, and create false impressions for personal gain – becomes an action further amplified by impersonal, automated means.

With the entrance in November 2022 of large language models (LLMs) and other multimodal AI generative systems for public use and consumption, we have the mass availability of tools capable of blurring the line between reality and fiction and of outputting disturbing and dangerous content. Moreover, open-source AI efforts, while laudable in their goal to create a democratized technology, speed up collaboration, fight AI bias, encourage transparency, and generate community norms and AI standards by standards bodies all to encourage fairness, have highlighted the dangers of model traceability and the complex nature of data and algorithm provenance (e.g., PoisonGPT, WormGPT). Further yet, regulation over the development and use of these generative systems remains incomplete and in draft form, e.g., the European Union AI Act, or as voluntary commitments of responsible governance, e.g., Voluntary Commitments by Leading United States’ AI Companies to Manage AI Risks.

The above calls for a reexamination and subsequent integration of human psychology, AI ethics, and AI risk management for the development of AI policy within the open-source AI space. We propose a three-tiered solution founded on a human-centered approach that advocates human well-being and enhancement of the human condition: (1) A clarification of human beliefs and the transference of expectations on machines as a mechanism for supporting deception with AI systems; (2) The use of (1) to re-evaluate ethical considerations as transparency, fairness, and accountability and their individual requirements for open-source code LLMs; and (3) A resulting set of technical recommendations that improve risk management protocols (i.e., independent audits with holistic evaluation strategies) to overcome both the problems of evaluation methods with LLMs and the rigidity and mutability of human beliefs.

The goal of this three-tiered solution is to preserve human control and fill the gap of current draft legislation and voluntary commitments, balancing the vulnerabilities of human evaluative judgement with the strengths of human technical innovation.

Data privacy in AI

Michael Meehan

Link to video recording

Data privacy in AI is something everyone needs to plan for. As AI technology continues to advance, it is becoming increasingly important to protect the personal information that is used to train and power these systems, and to ensure that companies are using personal information properly. First, understand that AI systems can inadvertently leak the data used to train the AI as it is producing results. This talk will give an overview of how and why this happens. Second, ensure that you have proper rights to use data fed into your AI. This is not a simple task at times, and the stakes are high. This talk will go into detail about circumstances where the initial rights were not proper, and the sometimes-catastrophic results of that. Third, consider alternatives to using real personal information to train models. One particularly appealing approach is to use the personal data to create statistically-similar synthetic data, and use that synthetic data to train your AI systems. The considerations are important to help protect personal information, or other sensitive information, from being leaked by using AI. This will help to ensure that AI technology can be used safely and responsibly, and that the benefits of AI can be enjoyed with fewer risks.
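The synthetic-data alternative mentioned in this talk can be sketched in a few lines. The example below is a deliberately simplified illustration of our own (not the speaker's method): it fits a multivariate Gaussian to toy numeric records and samples statistically similar rows. Production systems would use far richer generative models and typically add differential-privacy protections on top.

```python
import numpy as np

def fit_and_sample(real_data: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Fit a multivariate Gaussian to numeric records and draw synthetic rows.

    A minimal stand-in for the statistical-similarity idea: the synthetic
    rows share the real data's mean and covariance, so a model can be
    trained on them instead of on the personal records themselves.
    """
    rng = np.random.default_rng(seed)
    mean = real_data.mean(axis=0)              # per-column mean
    cov = np.cov(real_data, rowvar=False)      # column covariance matrix
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Toy "personal" records: (age, income) pairs -- illustrative values only
real = np.array([[34.0, 52000.0], [29.0, 48000.0], [41.0, 61000.0], [38.0, 57000.0]])
synthetic = fit_and_sample(real, n_samples=100)
```

The trade-off, as the talk notes, is that the synthetic rows only preserve the statistics the generator captures; anything the model learns beyond those statistics is lost along with the privacy risk.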

Perspectives on Open Source Regulation in the upcoming EU AI Act

Katharina Koerner

Link to video recording

This presentation will delve into the legal perspectives surrounding the upcoming EU AI Act, with a specific focus on the role of open source, non-profit, and academic research and development in the AI ecosystem. The session will cover crucial topics such as defining open data and AI/ML systems, copyrightability of AI outputs, control over code and data, data privacy, and fostering fair competition while encouraging open innovation. Drawing from existing and upcoming AI regulations globally, we will present recommendations to facilitate the growth of an open ecosystem while safeguarding ethical and accountable AI practices. Join this session for an insightful exploration of the legal landscape shaping the future of open source.

What You Will Learn in the Presentation:

The key problems faced by open source projects under the draft EU AI Act.
The significance of clear definitions and exemptions for open source AI components.
The need for effective coordination and governance to support open source development.
The challenges in implementing the R&D exception for open source AI.
The importance of proportional requirements for “foundation models” to encourage open source innovation and competition.
Recommendations to address the concerns of open source platform providers and ensure an open and thriving AI ecosystem under the AI Act.

Data Cooperatives and Open Source AI

Tarunima Prabhakar, Siddharth Manoharrs

Link to video recording

Data Cooperatives have been proposed as a possible remediation to the current power disparity between citizens/internet users from whom data is generated and corporations that process data. But these cooperatives may also evolve to develop their own AI models based on the pooled data. The move to develop machine learning may be driven by a need to make the cooperative sustainable or to address a need of the people pooling the data. The cooperative may consider ‘opening’ its machine learning model even if the data is not open. In this talk we will use Uli, our ongoing project to respond to gendered abuse in Indian languages, as a case study to describe the interplay between community pooled data and open source AI. Uli relies on instances of abuse annotated by activists and researchers at the receiving end of gendered abuse. This crowdsourced data has been used to train a machine learning model to detect abuse in Indian languages. While the data and the machine learning model were made open source for the beta release, in subsequent iterations the team is considering limiting the data that is opened. This is, in part, a recognition that the project is compensating for the lack of adequate attention to non anglophone languages by trust and safety teams across platforms. This talk will explore the different models for licensing data and the machine learning models built on it, that the team is considering, and the tradeoffs between economic sustainability and public good creation in each.

Fairness & Responsibility in LLM-based Recommendation Systems: Ensuring Ethical Use of AI Technology

Rohan Singh Rajput

Link to video recording

The advent of Large Language Models (LLMs) has opened a new chapter in recommendation systems, enhancing their efficacy and personalization. However, as these AI systems grow in complexity and influence, issues of fairness and responsibility become paramount. This session addresses these crucial aspects, providing an in-depth exploration of ethical concerns in LLM-based recommendation systems, including algorithmic bias, transparency, privacy, and accountability. We’ll delve into strategies for mitigating bias, ensuring data privacy, and promoting responsible AI usage. Through case studies, we’ll examine real-world implications of unfair or irresponsible AI practices, along with successful instances of ethical AI implementations. Finally, we’ll discuss ongoing research and emerging trends in the field of ethical AI. Ideal for AI practitioners, data scientists, and ethicists, this session aims to equip attendees with the knowledge to implement fair and responsible practices in LLM-based recommendation systems.

Challenges welcoming AI in openly-developed open source projects

Thierry Carrez, Davanum Srinivas, Diane Mueller

Link to video recording

Openly-developed open source projects are projects that are developed in a decentralized manner, fully harnessing the power of communities by going beyond open source to also require open development, open design and open community (the 4 opens). This open approach to innovation has led to the creation of very popular open source infrastructure technologies like OpenStack or Kubernetes.

With the rise of generative solutions and LLMs, we are expecting more and more code to be produced, directly or indirectly, by AI. Expected efficiencies may save millions of dollars. But at what cost? How is that going to affect the 4 opens? What are the challenges in welcoming AI in our open communities?

This webinar will explore questions such as:
– Can AI-generated code be accepted in projects under an open source license?
– How can we expect open design processes to evolve in an AI world?
– Is it possible to avoid the burden simply shifting from code authoring to code reviewing?
– What does open community mean with AI-powered participants? Is there a risk of creating a second class of community members?

Opening up ChatGPT: a case study in operationalizing openness in AI

Andreas Liesenfeld, Mark Dingemanse

Link to video recording

Openness in AI is necessarily a multidimensional and therefore graded notion. We present work on tracking openness, transparency and accountability in current instruction-tuned large language models. Our aim is to provide evidence-based judgements of openness for over ten specific features, from source code to training data to model weights and from licensing to scientific documentation and API access. The features are grouped in three broad areas (availability, documentation, and access methods). The openness judgements can be used individually by potential users to make informed decisions for or against deployment of a particular architecture or model. They can also be used cumulatively to derive overall openness scores (tracked at https://opening-up-chatgpt.github.io). This approach allows us to efficiently point out questionable uses of the term “open source” (for instance, Meta’s Llama2 emerges as the least open of all ‘open’ models) and to incentivise developers to consider openness and transparency throughout the model development and deployment cycle (for instance, the BLOOMZ model stands out as a paragon of openness). While our focus is on LLM+RLHF architectures, the overall approach of decomposing openness into its most relevant constituent features is of general relevance to the question of how to define “open” in the context of AI and machine learning. As scientists working in the spirit of open research, the framework and source code underlying our openness judgements and live tracker is itself open source.
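As a rough illustration of how per-feature judgements can be aggregated into a cumulative openness score, here is a hypothetical sketch; the feature names, judgement levels and equal weighting are our assumptions for illustration, not the project's actual rubric (which is tracked at opening-up-chatgpt.github.io).

```python
# Hypothetical feature set spanning the three areas the talk names
# (availability, documentation, access methods) -- illustrative only.
FEATURES = [
    "source_code", "training_data", "model_weights",
    "licensing", "scientific_docs", "api_access",
]

# Graded, not binary: openness is "a multidimensional and therefore graded notion".
LEVELS = {"open": 1.0, "partial": 0.5, "closed": 0.0}

def openness_score(judgements: dict) -> float:
    """Cumulative openness: mean of per-feature judgements, in [0, 1].

    Features without an explicit judgement default to "closed".
    """
    return sum(LEVELS[judgements.get(f, "closed")] for f in FEATURES) / len(FEATURES)

# A model that releases code and weights openly, with a partially open license
model_a = {"source_code": "open", "model_weights": "open", "licensing": "partial"}
score = openness_score(model_a)  # (1 + 1 + 0.5) / 6
```

The per-feature judgements remain usable on their own, as the abstract emphasises; the cumulative score is just one convenient summary of them.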

Open source AI between enablement, transparency and reproducibility

Ivo Emanuilov, Jutta Suksi

Link to video recording

Open source AI is a misnomer. AI, notably in the form of machine learning (ML), is not programmed to perform a task but to learn a task on the basis of available data. The learned model is simply a new algorithm trained to perform a specific task, but it is not a computer program proper and does not fit squarely into the protectable subject matter scope of most open source software licences. Making available the training script or the model’s ‘source code’ (eg, neural weights), therefore, does not guarantee compliance with the OSI definition of open source as it stands because AI is a collection of data artefacts spread across the ML pipeline.
The ML pipeline is formed by processes and artefacts that focus on and reflect the extraction of patterns, trends and correlations from billions of data points. Unlike conventional software, where the emphasis is on the unfettered downstream availability of source code, in ML it is transparency about the mechanics of this pipeline that takes centre stage.
Transparency is instrumental for promoting use maximisation and mitigating the risk of closure as fundamental tenets of the OSS definition. Instead of focusing on single computational artefacts (eg, the training and testing data sets, or the machine learning model), a definition of open source AI should zoom in on the ‘recipe’, ie the process of making a reproducible model. Open source AI should be less interested in the specific implementations protected by the underlying copyright in source code and much more engaged with promoting public disclosure of details about the process of ‘AI-making’.
The definition of open source software has been difficult to apply to other subject matter, so it is not surprising that AI, as a fundamentally different form of software, may similarly require another definition. In our view, any definition of open source AI should therefore focus not solely on releasing neural network weights, training script source code, or training data, important as they may be, but on the functioning of the whole pipeline such that the process becomes reproducible. To this end, we propose a definition of open source AI which is inspired by the written description and enablement requirement in patent law. Under that definition, to qualify as open source AI, the public release should disclose details about the process of making AI that are sufficiently clear and complete for it to be carried out by a person skilled in machine learning.
This definition is obviously subject to further development and refinement in light of the features of the process that may have to be released (eg, model architecture, optimisation procedure, training data etc.). Some of these artefacts may be covered by exclusive IP rights (notably, copyright), others may not. This creates a fundamental challenge with licensing AI in a single package.
One way to deal with this conundrum is to apply the unitary approach known from the European case law on video games (eg, the ECJ Nintendo case) whereby if we can identify one expressive element that attracts copyright protection (originality), this element would allow us to extend protection to the work as a whole. Alternatively, we can adopt the more pragmatic and technically correct approach to AI as a process embedding a heterogeneous collection of artefacts. In this case, any release on open source terms that ensures enablement, reproducibility and downstream availability would have to take the form of a hybrid licence which grants cumulatively enabling rights over code, data, and documentation.
In this session, we discuss these different approaches and how the way we define open source AI and the objectives pursued with this definition may predetermine which licensing approach should apply.

Federated Learning: A Paradigm Shift for Secure and Private Data Analysis

Dimitris Stripelis

Link to video recording

Introduction
There are situations where data relevant to a machine learning problem are distributed across multiple locations that cannot share them for regulatory, competitive, security, or privacy reasons. Federated Learning (FL) is a promising approach to learning a joint machine learning model over all the available data across silos without transferring data to a centralized location. Federated Learning was originally introduced by Google in 2017 for next-word prediction on edge devices [1]. It has since seen broad adoption across multiple disciplines, especially healthcare, finance, and manufacturing.

Federated Learning Training
Typically, a federated environment consists of a centralized server and a set of participating devices. Instead of sending their raw data to the central server, devices send only their local model parameters, trained over their private data. This changes how machine learning and deep learning models are traditionally trained: whereas centralized machine learning requires data to be aggregated in one location, Federated Learning allows data to remain where they originate, improving data security and reducing the associated privacy risks. When Federated Learning is used to train models across multiple edge devices, e.g., mobile phones, sensors, and the like, it is known as cross-device FL; when applied across organizations it is known as cross-silo FL.
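The server/client round structure described above can be sketched in a few lines of Federated Averaging. Everything here (function names, the linear model, the learning rate, client sizes) is illustrative, not taken from any particular FL framework:

```python
# Minimal sketch of Federated Averaging (FedAvg) on a linear model.
# Clients train locally on private data; only weights travel.
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=5):
    """A client refines the global weights on its private data
    (plain gradient descent on squared error, for illustration)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """The server sends weights out, receives updated weights back,
    and averages them weighted by dataset size. Raw data never moves."""
    updates, sizes = [], []
    for data, labels in clients:
        updates.append(local_update(global_weights, data, labels))
        sizes.append(len(labels))
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Two "silos" holding private samples from the same underlying model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 150):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)
print(w)  # approaches [2.0, -1.0] without raw data leaving either client
```

The size-weighted average is the distinguishing step: larger silos contribute proportionally more to the joint model, while each silo's samples stay local.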

Secure and Private Federated Learning
Federated Learning addresses some data privacy concerns by ensuring that sensitive data never leave the user’s device. Individual data remain secure and private, significantly reducing the risk of data leakage, while users actively participate in the data analysis process and maintain complete control over their personal information. However, Federated Learning is not always secure and private out of the box. The federated model can still leak sensitive information if not adequately protected [3], and an eavesdropper or adversary can still observe the federated training procedure through the communication channels. To mitigate this, Federated Learning has to be combined with privacy-preserving and secure data analysis mechanisms, such as Differential Privacy [4] and Secure Aggregation [5] protocols. Differential Privacy can ensure that sensitive personal information is protected even under unauthorized access, while Secure Aggregation protocols enable model aggregation even under collusion attacks.
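One common way Differential Privacy is layered onto federated aggregation is to clip each client's update and add calibrated Gaussian noise at aggregation time. The sketch below illustrates that mechanism only; the function names and the noise calibration are simplified assumptions, not a production protocol or a formal privacy accounting:

```python
# Illustrative DP-style aggregation: clip per-client updates, average,
# then add Gaussian noise scaled to the clipping bound.
import numpy as np

def clip_update(update, max_norm=1.0):
    """Bound any single client's influence by clipping the update norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, max_norm / norm) if norm > 0 else update

def dp_aggregate(updates, max_norm=1.0, noise_multiplier=1.0, rng=None):
    """Average clipped updates and add noise proportional to the
    clipping bound, obscuring any individual client's contribution."""
    rng = rng or np.random.default_rng()
    clipped = [clip_update(u, max_norm) for u in updates]
    mean = np.mean(clipped, axis=0)
    sigma = noise_multiplier * max_norm / len(updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# The outlier update [3.0, 1.0] gets clipped before averaging.
updates = [np.array([0.5, -0.2]), np.array([3.0, 1.0]), np.array([0.4, -0.1])]
print(dp_aggregate(updates, rng=np.random.default_rng(42)))
```

Clipping is what makes the noise scale meaningful: without a bound on each update's norm, no fixed amount of noise could hide an arbitrarily large contribution.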

Conclusion
In a data-driven world, prioritizing data privacy and secure data analysis is not just a responsibility but a necessity. Federated Learning emerges as a game-changer in this domain, empowering organizations to gain insights from decentralized data sources while safeguarding data privacy. By embracing Federated Learning, we can build a future where data analysis and privacy coexist harmoniously, unlocking the full potential of data-driven innovations while respecting the fundamental rights of privacy.

Should OpenRAIL licenses be considered OS AI Licenses?

Daniel McDuff, Danish Contractor, Luis Villa, Jenny Lee

Link to video recording

Advances in AI have been enabled in part by open source (OS), which has permeated ML research in both academia and industry. However, there are growing concerns about the influence and scale of AI models (e.g., LLMs) on people and society. While openness is a core value for innovation in the field, openness alone does not address the risks of harm that arise when AI is used negligently or maliciously. A growing category of licenses, open responsible AI licenses (https://www.licenses.ai/ai-licenses), include behavioral-use clauses; high-profile projects such as Llama2 (https://ai.meta.com/llama/) and Bloom (https://bigscience.huggingface.co/blog/bloom) use them. In this proposed session the panelists would discuss whether OpenRAIL (https://huggingface.co/blog/open_rail) licenses should be considered OS AI licenses.

Topics will include: Whether the definition of OS is not adequate for AI systems; Whether OS of AI systems requires open-sourcing every aspect of the model (data, model, source) and whether that is feasible; How data use requirements could be included in such a definition; and therefore, whether inclusion of behavioral use restrictions is at odds with any future definition of OS AI. In responding to these questions the panelists will discuss how the components of AI systems (e.g., data, models, source code, applications) each have different properties and whether this is part of the motivation for a new form of licensing. The speakers have their own experience of building, distributing and deploying AI systems and will provide examples of these considerations in practice.

Copyright — Right Answer for Open Source Code, Wrong Answer for Open Source AI?

McCoy Smith

Link to video recording

Open source has always found its legal foundation primarily in copyright. Although many codes of behavior around open source have been adopted and promulgated by various open source communities, in the end it is the license attached to any piece of open source that dictates how it may be used and what obligations a user must abide by in order to remain legally compliant.
Artificial Intelligence is raising, and will continue to raise, profound questions about how copyright law applies — or does not apply — to the process of ingesting training content, processing that content to extract information used to generate output, what that information is, and the nature of the output produced.
Much debate, and quite a bit of litigation, has recently been generated around questions raised by the input phase of training Artificial Intelligence, and to what extent the creators of materials used in that input phase have any right — morally or legally — to object to that training. At the same time, whether or not the output of AI can be the subject matter of copyright, or patent, protection is also being tested in various jurisdictions — with clashing results. What occurs between input and output remains an unresolved issue — as does whether any legal regime exists that can guarantee that legal, normative rules control how those processes are used, in the way that copyright, and copyright licensing, do in open source at present.
The presentation will discuss these issues in depth with a lens toward testing whether copyright — or any other intellectual property regime — really can be useful in keeping AI “open.”

Should we use open source licenses for ML/AI models?

Mary Hardy

Link to video recording

Open source AI models are exponentially increasing in number and the variety of open source licenses chosen is substantial. Can all OSI-approved licenses be used uniformly to fit the various components of AI?

During the session, open source attorney Mary Hardy will explore questions present and future about open ML model licenses, including:

Why is AFL-3.0 so popular?

What about Apache-2.0? GPL-2.0/3.0?

What are the implications of licensing modifications under a different OS license than the checkpoint used as a basis?

Is a new license that explicitly considers ML model weights needed?

Covering your bases with IP Indemnity

Justin Dorfman, Tammy Zhu, Samantha Mandell

Link to video recording

When working with LLM providers that don’t make their models public (Anthropic, OpenAI, etc.), it’s nearly impossible to know whether any copyleft code was part of the training data. So how do you bring AI developer tools to market without risking legal jeopardy? I asked Sourcegraph’s head of Legal, Tammy Zhu, to teach me how we protect ourselves from failing to comply with attribution requirements.

The Ideology of FOSS and AI: What “Open” means relating to platforms and black box systems

Mike Nolan

Link to video recording

The initial conception of Free and Open Source Software was developed at a time when software was bundled into discrete packages to be run on machines owned and operated by a single individual. The early FOSS movement used licensing and copyright law to provide better autonomy and control over these software systems. Now, our software systems often operate as platforms, monopolizing access between networks and resources and profiting greatly through that monopoly.

In this talk, listeners will learn more about the ideological foundations of FOSS and the blindspots that have developed in our community as software has transitioned from individual discrete packages into deeply interconnected systems that gate access to critical resources for many. We will delve into what autonomy might mean in a world where the deployment of technology inherently affects so many. Finally, we will observe the flaws in conventional open source approaches to providing autonomy and what other tools we may have at our disposal to ensure better community governance of this increasingly pervasive technology.

Community Leadership Summit

Raleigh, USA

15 October, 2023

Participant

All Things Open

Raleigh, USA

16-17 October, 2023

Talks, Workshop

Sponsor, Organizer (OSI Track)

Summary

Link to blog post

Celebrating 25 years of Open Source

Nick Vidal

Link to slides

This year marks the 25th Anniversary of Open Source. This is a huge milestone for the whole community to celebrate! In this session, we’ll travel back in time to understand our rich journey so far, and look forward towards the future to reimagine a new world where openness and collaboration prevail. Come along and celebrate with us this very special moment! The open source software label was coined at a strategy session held on February 3rd, 1998 in Palo Alto, California. That same month, the Open Source Initiative (OSI) was founded as a general educational and advocacy organization to raise awareness and adoption for the superiority of an open development process. One of the first tasks undertaken by OSI was to draft the Open Source Definition (OSD). To this day, the OSD is considered a gold standard of open-source licensing. In this session, we’ll cover the rich and interconnected history of the Free Software and Open Source movements, and demonstrate how, against all odds, open source has come to “win” the world. But have we really won? Open source has always faced an extraordinary uphill battle: from misinformation and FUD (Fear Uncertainty and Doubt) constantly being spread by the most powerful corporations, to issues around sustainability and inclusion. We’ll navigate this rich history of open source and dive right into its future, exploring the several challenges and opportunities ahead, including its key role in fostering collaboration and innovation in emerging areas such as Artificial Intelligence. We’ll share an interactive timeline during the presentation and throughout the year, inviting the audience and the community at-large to share their open source stories and dreams with each other.

Open Source 201

Pamela Chestek

This essential session is an advanced primer on open source licenses: why one should care, and which licenses are most commonly used and why. Also included are insights into the OSI license-review process, who is involved in considering and approving new licenses against the Open Source Definition, and which licenses have been approved in the last five years. Topics include challenges, successes, best practices, operational policies, and resources.

Open Source and Public Policy

Deb Bryant, Stephen Jacobs, Patrick Masson, Ruth Suehle, Greg Wallace

Link to video recording

New regulations in the software industry and adjacent areas such as AI, open science, open data, and open education are on the rise around the world. Cyber Security, societal impact of AI, data and privacy are paramount issues for legislators globally. At the same time, the COVID-19 pandemic drove collaborative development to unprecedented levels and took Open Source software, open research, open content and data from mainstream to main stage, creating tension between public benefit and citizen safety and security as legislators struggle to find a balance between open collaboration and protecting citizens.

Historically, the open source software community and the foundations supporting its work have not engaged in policy discussions. Moving forward, thoughtfully developing these important public policies without harming our complex ecosystems requires an understanding of how our ecosystem operates. Ensuring a voice for stakeholders who have historically lacked representation in those discussions is paramount to that end.

Please join our open discussion with open policy stakeholders working constructively on current open policy topics. Our panelists will provide a view into how OSS foundations and other open-domain allies are now rising to this new challenge, as well as seizing the opportunity to influence positive changes to the public’s benefit.

Topics: Public Policy, Open Science, Open Education, current legislation in the US and EU, US interest in OSS sustainability, intro to the Open Policy Alliance

Panel: Open Source Compliance & Security

Aeva Black, Brian Dussault, Madison Oliver, Alexander Beaver

Link to blog post

The goal of this panel is to cover all things supply chain (from SBOMs in general to other technologies/approaches in particular), exploring four different perspectives: from CISA (Cybersecurity and Infrastructure Security Agency) and the latest efforts by the US government to secure open source; from GitHub, the largest open source developer platform; from Stacklok, one of the most exciting startups in this space, led by the founders of Kubernetes and Sigstore; and from Rochester Institute of Technology, one of the leading universities in the US.

Open Source AI definition (workshop)

Mer Joyce, Stefano Maffulli

The Open Source Initiative (OSI) continues the work of exploring complexities surrounding the development and use of artificial intelligence in this in-person session, part of Deep Dive: AI – Defining Open Source AI 2023 series. The goal is to collaboratively establish a clear and defensible definition of “Open Source AI.” This is going to be an interactive session where every participant will have an active role. OSI will share an early draft of the Open Source AI Definition and, with the help of a facilitator, we will collect feedback from the participants. Be in the room where it happens!

EclipseCon

Ludwigsburg, Germany

October 16-18, 2023

Keynote

Open Source Is 25 Years Young

Carlo Piana

This year, the Open Source Initiative (OSI) celebrates 25 years of activity, mainly defining what open source is. We are certainly proud of what we have done in the past and believe that open source has delivered many if not all of its promises.

But we are more interested in what lies ahead for the next 25 years. The paradigm shifts with increasing speed, from mainframe, to client/server, to Internet, to cloud, to AI, to what? We must make sure that openness and freedom remain as unhindered as possible. Simply using the same tools that made open source a resounding success will not be enough.

In this talk, Carlo Piana will share OSI’s views and plans to foster openness for the years to come.

Latinoware

Foz do Iguaçu, Brazil

18-20 October

Keynote, Talk

The future of Artificial Intelligence: Sovereignty and Privacy with Open Source

Nick Vidal, Aline Deparis

Link to video recording (Portuguese)

The future of Artificial Intelligence is being defined right now, in an incredible battle between large companies and a community of entrepreneurs, developers and researchers around the world. The world of AI is at an important crossroads. There are two paths forward: one where highly regulated proprietary code, models, and datasets are going to prevail, or one where Open Source dominates. One path will lead to a stronghold of AI by a few large corporations where end-users will have limited privacy and control, while the other will democratize AI, allowing anyone to study, adapt, contribute back, innovate, as well as build businesses on top of these foundations with full control and respect for privacy.

Celebrating 25 years of Open Source

Nick Vidal

This year marks the 25th Anniversary of Open Source. This is a huge milestone for the whole community to celebrate! In this session, we’ll travel back in time to understand our rich journey so far, and look forward towards the future to reimagine a new world where openness and collaboration prevail. Come along and celebrate with us this very special moment! The open source software label was coined at a strategy session held on February 3rd, 1998 in Palo Alto, California. That same month, the Open Source Initiative (OSI) was founded as a general educational and advocacy organization to raise awareness and adoption for the superiority of an open development process. One of the first tasks undertaken by OSI was to draft the Open Source Definition (OSD). To this day, the OSD is considered a gold standard of open-source licensing. In this session, we’ll cover the rich and interconnected history of the Free Software and Open Source movements, and demonstrate how, against all odds, open source has come to “win” the world. But have we really won? Open source has always faced an extraordinary uphill battle: from misinformation and FUD (Fear Uncertainty and Doubt) constantly being spread by the most powerful corporations, to issues around sustainability and inclusion. We’ll navigate this rich history of open source and dive right into its future, exploring the several challenges and opportunities ahead, including its key role in fostering collaboration and innovation in emerging areas such as Artificial Intelligence. We’ll share an interactive timeline during the presentation and throughout the year, inviting the audience and the community at-large to share their open source stories and dreams with each other.

Linux Foundation Member Summit

Monterey, USA

24-25 October, 2023

Talk, Workshop

Why Open Source AI Matters: The Community & Policy Perspective

Mary Hardy, Stefano Maffulli, Mike Linksvayer, Katharina Koerner

The number of publicly available AI models is growing exponentially, doubling every six months. With this explosion, communities and policymakers are asking questions about open source AI’s innovation benefits, safety risks, impact on sovereignty, and competitive economics against closed-source models. In this panel discussion, Mary and the panelists will discuss why a clear and consistent definition of open source AI matters for open source communities in the face of growing policy tending towards greater regulation of open communities.

Workshop: Define “Open AI”

Stefano Maffulli, Mer Joyce

As the legislators accelerate and the doomsayers chant, one thing is clear: It’s time to define what “open” means in the context of AI/ML before it’s defined for us.

Join this interactive session to share your thoughts on what it means for Artificial Intelligence and Machine Learning systems to be “open”. The Open Source Initiative wants to hear from attendees what they think should be the shared set of principles that can recreate the permissionless, pragmatic and simplified collaboration for AI practitioners, similar to what the Open Source Definition has done for software.

We’ll share a draft of a new definition of “open” AI/ML systems and ask attendees to review it in real time.

Community over Code

Halifax, Canada

26-28 October, 2023

Keynote

Why open source AI matters: towards a clear definition

Justin Colannino

The number of publicly available AI models is growing exponentially, doubling every six months. With this explosion, communities and policymakers are asking questions and proposing legislation to address open source AI’s innovation benefits, safety risks, impact on sovereignty, and competitive economics against closed models.

Against this backdrop, open source communities need a clear and consistent definition of open source AI to ensure that the “open source” marker signals safety and respect for AI developers, rights for end users, and frictionless improvement for the broader community. In this keynote, OSI board member Justin Colannino will talk about what OSI is doing to build this needed open source AI definition and how you can help.

SFScon

Bolzano, Italy

9-10 November, 2023

Keynote

Regulation, AI and the State of Software Freedom in Europe

Simon Phipps

Link to video recording

For many years, we have relied on a big, ALL CAPS waiver of liability in licenses and the ability of the recipient to examine and run the code to ensure software freedom for all. But the cloud, AI and now a wave of European regulation have eroded that dream. Where have we got to, and is software freedom still a viable objective?

Digital Public Goods Alliance Meeting

Addis Ababa, Ethiopia

15-16 November

Workshop

Partner

Summary

Link to blog post

Workshop: Define “Open AI”

Stefano Maffulli, Nicole Martinelli

As the legislators accelerate and the doomsayers chant, one thing is clear: It’s time to define what “open” means in the context of AI/ML before it’s defined for us.

Join this interactive session to share your thoughts on what it means for Artificial Intelligence and Machine Learning systems to be “open”. The Open Source Initiative wants to hear from attendees what they think should be the shared set of principles that can recreate the permissionless, pragmatic and simplified collaboration for AI practitioners, similar to what the Open Source Definition has done for software.

We’ll share a draft of a new definition of “open” AI/ML systems and ask attendees to review it in real time.

Open Source Experience

Paris, France

6-7 December, 2023

Keynote

Partner

25 years of Open Source

Simon Phipps, Thierry Carrez, Florent Zara

And LinuxFr.org will be there, as it has been for many years. You will find us at stand B10, right next to the activities area (which we will keep lively, count on us!). Part of the LinuxFr.org team will be in the association village to introduce you to the site, chat, answer any questions you may have, hand out site stickers, and give you the chance to win kilos of books, and more besides, since we will be celebrating our 25th anniversary jointly with the Open Source Initiative. We should even be able to offer you a special vintage!

The role of Foundations in today’s open source

Thierry Carrez

The rise of software development forges like GitHub has dramatically reduced friction to create and run open source projects. In this context, what is the role of Foundations today, in the wider open source ecosystem?
In this talk, Thierry Carrez, General Manager at the Open Infrastructure Foundation and vice-chair of the Open Source Initiative, will share his vision on this topic. After a quick history of open source Foundations, this talk will present a landscape of the types of open source Foundations active today, with their differences in scope and principles, then focus on the value-add of modern Foundations: enabling open collaboration across several organizations by providing a range of services to their supported projects.

AI.dev

Palo Alto, USA

12-13 December, 2023

Talk, Workshop

Partner

Panel Discussion: Why a Universal Definition of ‘Open Source AI’ is Essential for Humanity

Roman Shaposhnik, Apache Software Foundation; Tanya Dadasheva, Ainekko, Co.; Nithya Ruff, Amazon; Sal Kimmich, GadflyAI

When the Open Source Definition was created more than a quarter century ago, nobody could anticipate the enormous, multi-trillion-dollar market-formation effect it would have on the IT industry. AI is now entering an era when it isn’t just an application of computing but rather a radically different way of engineering computational systems. If we want these novel computational systems to be built in the same collaborative setting we are used to, we need to be extra smart about which parts of our open source legacy we take into the future and which parts we need to reinvent. In short, we need a level-setting, cross-industry definition of “Open Source AI”. This session will cover topics ranging from the impact of generative AI to the fact that the traditional view of open source code implementing AI algorithms may not be sufficient to guarantee inspectability, modifiability and replicability. We will touch upon ongoing government efforts to create policies regulating AI, and more specifically the proliferation of OSS AI. While the panel will mostly focus on the results and lessons learned from OSI’s Deep Dive: AI, we will also cover similar efforts by the Apache and Linux Foundations.

Workshop: Define “Open AI”

Mer Joyce, Ruth Suehle

As the legislators accelerate and the doomsayers chant, one thing is clear: It’s time to define what “open” means in the context of AI/ML before it’s defined for us.

Join this interactive session to share your thoughts on what it means for Artificial Intelligence and Machine Learning systems to be “open”. The Open Source Initiative wants to hear from attendees what they think should be the shared set of principles that can recreate the permissionless, pragmatic and simplified collaboration for AI practitioners, similar to what the Open Source Definition has done for software.

We’ll share a draft of a new definition of “open” AI/ML systems and ask attendees to review it in real time.

Towards 2024

Our mission is to educate about and advocate for the benefits of open source and to build bridges among different constituencies in the open source community. In 2023, thanks to the commitment and donations from several individuals and organizations, we made substantial progress towards our mission. We hope to continue to evolve our programs in 2024, embracing the new challenges and opportunities ahead, from Artificial Intelligence to cybersecurity. Please consider joining the OSI as an individual member and/or as a sponsor or affiliate.

In alphabetical order and at the risk of missing some names of individuals who have contributed to the OSI in 2023, we would like to thank: Aaron Campbell, Aaron Oldenburg, Aaron Williamson, Abby Kearns, Abby Mayes, Abinaya Mahendiran, Abram Connelly, Adam Bouhenguel, Aditya Mishra, Aeva Black, Agil Antony, Agustina Oubel, Aizhamal Nurmamat, Alberto Colon Viera, Alek Tarkowski, Aleksander Baranowski, Aleksandrs Volodjkins, Alessandra Lemos, Alex Williams, Alexander Beaver, Alexander Brateanu, Alin Opri, Aline Deparis, Allan Friedman, Alyssa Gravelle, Alyssa Wright, Amanda Brock, Amanda Casari, Amanda Nystrom, Amy Benson, Ana Jimenez Santamaria, Ana Paula Lauermann, Andreas Liesenfeld, Andreas Nettstrater, Andreas Schreiber, Andrew Flegg, Andrew Germann, Andrew Janke, Andrew Katz, Andy Piper, Angela Brown, Angie Barret, Anibal Prestamo, Annania Melaku, Anne Steele, Anne-Marie Scott, Anni Lai, Anthony Best, Ariel Jolo, Arielle Bennett, Ashley McDonald, Ashley Wolf, Astor Nummelin Carlberg, Aubert Emako Tientcheu, Aviya Skowron, Axel Rivas, Bart De Witte, Basit Ayantunde, Ben Abrams, Ben Brooks, Ben Cotton, Ben Ramsey, Ben Reser, Ben van ‘t Ende, Ben Werd, Benjamin Heap, Betsy Waliszewski, Biaowei Zhuang, Birthe Lindenthal, Bob van Luijt, Bolaji Ayodeji, Boris van Hoytema, Boris Veytsman, Brendan Miller, Brian Behlendorf, Brian Duran, Brian Dussault, Brian Shaughnessy, Brian Wisti, Brian Warner, Brianna Cluck, Brittney Q, Bruce Perens, Bruno Souza, Bryan Behrenshausen, Bryan Che, Cailean Osborne, Carl Hancock, Carl-Lucien Schwan, Carlos Ansotegui Pardo, Carlos Muñoz Ferrandis, Carlos Piana, Carol DeCoene, Carol Smith, Carol Willing, Caroline Henriksen, Casey Valk, Catharina Maracke, Celeste Horgan, Cesar Plasencia, Chad Ermacora, Chad Whitacre, Cheng Hai-Xu, Cheuk Ting Ho, Chris Aniszczyk, Chris Grams, Chris Hazard, Chris Hermansen, Chris Hill, Chris Rackauckas, Chris Short, Christian Grobmeier, Christian Hoge, Christian Savard, Christine Abernathy, Christophe Biocca, 
Christopher Cooper, Christoper Morrison, Ciarán O’Riordan, Clarke Wixon, Claudio Santoro, Clement Oudot, Colin Wright, Connor Leahy, Corinna Gunther, Courtenay Pope, Craig Northway, Cristian Iconomu, Cristin Zegers, Dan Cox, Dan Mahoney, Daniel Brotsky, Daniel Izquierdo, Daniel McDuff, Daniel Mierla, Daniel Naze, Daniel Park, Daniel Risacher, Daniel Scales, Daniel Silverstone, Danish Contractor, Danny Perez-Caballero, Davanum Srinivas, Dave Forgac, Dave Lester, Dave McAllister, David Ayers, David Banig Jr., David Both, David Craner, David Crick, David Gray Widder, David Marr, David Shears, David Woolley, Davide Gullo, Dawn Foster, Denise Allison, Denver Gingerich, Derek Long, Derek Slater, Diane Mueller, Dimitris Stripelis, Dirk Riehle, Donald Fischer, Donald Watkins, Doug Hellmann, Drew Adams, Duane O’Brien, Duda Nogueira, Duy Tran, E. Lynette Rayle, Edd Wilder-James, Ekkehard Gentz, Elior Fureraj, Ellie Evans, Ellyn Heald, Emanuele De Boni, Emily Omier, Emily Simonis, Eric Wright, Erik Solveson, Eron Hennessey, Evan Prodromou, Ezequiel Lanza, Fabrizio Trentin, Fatih Degirmenci, Fatima Khalid, Florence Blazy, Florent Zara, Florian Over, Francesco Giannoccaro, Frank Karlitschek, Frank Viernau, Fred Cox, Fred Fenimore, Frederick Mbuya, Frederik Dixon, Gabriel Engels, Gabriel Ramsey, Gabriele Columbro, Gaël Blondelle, Gene Agnew, Georg Link, Gerald Mayr, Gil Yehuda, Giulia Dellanoce, Gordon Haff, Gordon Lee, Grace Tuscano, Greg Lind, Greg Myers, Greg Wallace, Gregory Zingler, Guy Martin, Hannah Aubry, Heather Alphus, Heather Leson, Heather Meeker, Helen Hoy, Helio Castro, Henrik Ingo, Hidemoto Yamauchi, Hilary Richardson, Howard Thomson, Ian Kelling, Ian Sullivan, Ibrahim Haddad, Ildiko Vancsa, Imo Udom, Ingo Renner, Irina Mirkina, Isaac Sanz, Ivo Emanuilov, Jack Canty, Jackson Braider, Jacob Rogers, James (Jim) Wright, James Korein, James Lovegrove, James Tauber, James Vasile, Jamie Maggie, Jannis Leidel, Jason Baker, Jason Smith, Jathan McCollum, Jautau White, 
Javier Perez, Jean Devaux, Jean Parpaillon, Jeff Dralla, Jeff Johnson, Jeff Mendoza, Jeff Paul, Jeff Wilcox, Jeffrey Borek, Jeffrey Luszcz, Jen Wike Huger, Jenn McGinnis, Jennifer Ding, Jennifer E. Lee, Jennifer Fowler, Jennifer Pospishek, Jennifer Suber, Jenny Lee, Jeny De Figueiredo, Jeongkyu Shin, Jeremie Tarot, Jeremy Meiss, Jerrold Heyman, Jessica Iavarone, Jessica Smith, Jim Garrison, Jim Hall, Jim Jagielski, Jim Perrin, Jim Zemlin, Joachim Geffken, Joanna Głowacz, Joanna Lee, Joe Brockmeier, Joe Murray, Joey Amanchukwu, John Amaral, John Barbuto, John Eckman, John Sullivan, John Weir, John Yerhot, Jonathan Altman, Jonathan Shar, Jonathan Torres, Jono Bacon, Jordan Harband, Jose Ivan Hernandez, Jose Octavio de Castro Neves Jr, Joseph Beard, Joseph Donahue, Joseph Jacks, Joseph Lemor, Joseph Potvin, Joseph Presley, Josh Berkus, Joshua Drake, Joshua Simmons, Joventino Cinsers, Julia Ferraioli, Justin Colannino, Justin Dorfman, Jutta Suksi, Kara Deloss, Karen Sandler, Karl Fogel, Karsten Reincke, Karsten Wade, Kassandra Dhillon, Kat Walsh, Katharina Koerner, Katie McLaughlin, Keith Herrington, Kenneth Delaney, Kev Barnes, Kevin Fleming, Kevin Sonney, Kimberly Craven, Kirsten Petersen, Kirstie Whitaker, Knute Holian, Kriss Bajo, Kristin O’Connell, Kristina Podnar, Kyle Karsten, Lauren Maffeo, Lauren Pritchett, Laurence Moroney, Laurent Joubert, Laurent Marie, Lawrence Landis, Lawrence Rosen, Lea Gimpel, Leon Allen, Lila Bailey, Lindsay Colbern, Lisa Hoong, Lorna Mitchell, Luca Miotto, Lukas Atkinson, Lucas Carvalho, Lucas Gonze, Lucy Hyde, Luis Majano, Luis Villa, Lyn Muldrow, Maarten Aertse, Madison Oliver, Malcolm Herring, Manny Martinez, Manoj Hathi, Manrique Lopez, Marc Jones, Marcel Kurzmann, Marcos Siriaco, Marcus Cuda, Mariano Ortu, Mariatta Wijaya, Marissa Nino, Mark Atwood, Mark Cathcart, Mark Collier, Mark Dingemanse, Mark Radcliffe, Marsee Henon, Marshal Miller, Martial Michel, Martin Haynes, Marty Wolf, Mary Hardy, Mary Radomile, Marzieh Fadaee, 
Masayuki Igawa, Matt Mullenweg, Matt White, Matthew Broberg, Matthew Lien, Maxime Chambreuil, Maya A. Bernstein, Mayara Frade, McCoy Smith, Meagan Gill, Mer Joyce, Mia Lund, Micah Koch, Michael Brodeur, Michael Guo, Michael Hertig, Michael Meehan, Michael Rhöse, Michael Robinson, Michael Sheldon, Mick Smothers, Mike Bursell, Mike Linksvayer, Mike Milinkovich, Mike Nolan, Mo Zhou, Moez Draief, Mohamed Nanabhay, Monica Ayhens-Madon, Monica Lopez, Mophat Okinyi, Moustapha Abdoulaye Hima, Murat Guzel, Myles Borins, Naresh Adepu, Natali Vlatko, Nathan Urwin, Nicholas Weinstock, Nick Vidal, Nicolas Duminil, Nicole Martinelli, Nikola Desancic, Nithya Ruff, Nivedita M, Noah Boswell, Noel Hidalgo, Ole-Morten Duesund, Olga Creutzburg, Oliver Mensah, Olivier Dobberkau, Omar Santos, Otmar Humbel, Paige Miner, Pamela Chestek, Paris Buttfield-Addison, Patrick Lehmann, Patrick Masson, Patrick Ohnewein, Patrick Schleizer, Paul Berschick, Paul McGuire, Paul Mills, Paul Phillabaum, Paul Tyng, Paula Hunter, Pete Farkas, Pete Lilley, Peter Chu, Peter Dunkley, Peter Ellis, Peter Wang, Phil Robb, Philippe Krief, Philippe Laurens, Philippe Ombredanne, Phoebe Quincy, Phyllis Dobbs, Pierre Baudracco, Pieter van Noordennen, Qing Tomlinson, Rachel Foucard, Rachel Lawson, Ralph Loizzo, Ran Yu, Randal L. 
Schwartz, Reshama Shaikh, Ricardo Mirón Torres, Ricardo Sueiras, Richard Fontana, Richard Littaeur, Richard Schneeman, Richard Zak, Rick Clark, Rob Allen, Rob Landley, Rob Mackie, Robert Cathey, Robert Hansel, Rohan Singh Rajput, Roland Turner, Roman Iakovlev, Roman Shaposhnik, Rory MacDonald, Rounak Gupta, Rowan Wilson, Roy Hyunjin Han, Russell Nelson, Ruth Suehle, Ryan Coonan, Ryan Harvey, Sachiko Muto, Saira Jesani, Sal Kimmich, Sam Bishop, Sam Ramji, Samantha Mandell, Sarah Bower, Satya Mallick, Sean roberts, Sebahattin Özata, Sebastian Schuberth, Sebastien Michea, Seo-Young Isabelle Hwang, Serenella Saccon, Serkan Holat, Seth Hillbrand, Seth Kenlon, Seth Schoen, Shane Couglan, Shilla Saebi, Shivam Potdar, Shuji Sado, Siddharth Manoharrs, Silona Bonewald, Simeon Oriko, Simon Muskett, Simon Phipps, Somenath Dasgupta, Soohong Park, Stefano Canepa, Stefano Maffulli, Stefano Zacchiroli, Steffen Krause, Stella Biderman, Stephen Augustus, Stephen Jacobs, Stephen Mather, Steven Muegge, Steven Pritchard, Stuart Langley, Surya Santhi, Sven Spiller, Swapnil Bhartiya, Tammy Zhu, Tanya Dadasheva, Tarunima Prabhakar, Ted Liu, Tetsuya Kitahata, Thomas Blood, Thomas Koeppen, Thomas Peikert, Thomas Schwinge, Thomas Steenbergen, Thorsten Glaser, Timothy Gaudette, Tobie Langel, Todd Lewis, Tom Bedford, Tom Callaway, Tom Schoemaker, Tonico Novaes, Tony Scully, Tony Wasserman, Tracy Hinds, Tracy Miranda, Tyler Bevan, Vaishali Avhad, Van Lindberg, Veronica Abdala, Victor Storchan, Victoria Fierce, Vinay Vira, Vincenzo Disomma, Vineet Aguiar, Vipul Siddharth, Vivek Krishnan, Wiebe Cazemier, Will Norris, Wim de Vries, Yaroslav Russkih, Zaheda Bhora, and Zuzanna Warso.

The post <span class='p-name'>2023 in review: many reasons to celebrate</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

The most popular licenses for each language in 2023

Open Source Initiative - Thu, 2023-12-07 08:32

The 2023 report of the licenses in use by the biggest package managers highlights the need to educate developers on the importance of licensing information. While many developers know that Open Source software forms the backbone of modern development, the data shows that much of their software is shared (and most likely also used) without a license.

Aleksandrs Volodjkins explored data from OSI’s community project ClearlyDefined, using a snapshot of the dataset from September 21, 2023. ClearlyDefined is a collaborative project providing comprehensive and standardized metadata about software components’ origins and licenses, and its data sheds light on the prevailing trends that shape the Open Source ecosystem.

Overall, MIT and Apache 2.0 are by far the most popular licenses, although the popularity of licenses varies greatly depending on the package manager. The simplicity of these licenses, which allow users to modify and distribute code with minimal restrictions and without imposing additional requirements, has undoubtedly contributed to their widespread adoption.

The license terrain is not uniform across all package managers. Each programming language has its own set of license preferences within its ecosystem. For instance, the JavaScript community often leans towards the MIT license, while Python developers show a similar affinity for Apache 2.0. The ISC license, with its simplicity and permissiveness, finds its niche in the JavaScript community. BSD licenses, both 3-Clause and 2-Clause, maintain a steady but comparatively lower adoption rate. The GNU General Public License (GPL), embodying the ethos of free software, enjoys a presence but falls behind MIT and Apache 2.0.

The Challenge of Unlicensed Components

Despite the prevalence of well-established licenses, a concerning revelation emerges from the ClearlyDefined dataset: a substantial percentage of Open Source components lack a designated license or carry the SPDX identifier “NOASSERTION.” This ambiguity introduces uncertainty about the permissible use of such components, potentially hindering collaboration, creating legal complexities, and raising security concerns for developers.

The Need for Clarity and Standardization

Addressing the issue of unlicensed components is crucial for the continued health of the Open Source community. Developers, organizations, and the community at large benefit from clear and standardized licensing. It not only facilitates collaboration but also ensures legal compliance and protects the intellectual property of contributors. Additionally, it helps developers to keep track of components that might have vulnerabilities.
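As an illustration of that bookkeeping, here is a minimal sketch that flags components whose declared license is missing or marked NOASSERTION. The component records and field names are hypothetical, chosen for illustration rather than taken from any real metadata schema:

```python
# Hypothetical component records; the field names are illustrative,
# not a real package-manager or ClearlyDefined schema.
components = [
    {"name": "left-pad", "declared_license": "MIT"},
    {"name": "mystery-lib", "declared_license": "NOASSERTION"},
    {"name": "orphaned-pkg", "declared_license": None},
]

def needs_license_review(component):
    """True when a component carries no usable license information."""
    declared = component.get("declared_license")
    return declared is None or declared == "NOASSERTION"

flagged = [c["name"] for c in components if needs_license_review(c)]
print(flagged)  # ['mystery-lib', 'orphaned-pkg']
```

A check like this in a dependency audit surfaces exactly the components whose terms of use, and vulnerability-tracking status, are unknown.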

Towards a collaborative solution

The issue of unlicensed components is a community-wide challenge that needs a community-wide approach. The ClearlyDefined project aims to address this challenge by inviting developers across different organizations to crowdsource a global database of licensing metadata for every software component ever published. It allows developers to fetch a cached copy of licensing metadata for each component through a simple API and contribute back with any missing or wrongly identified licensing metadata, helping to create a database that is accurate for the benefit of all. Check it out!
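As a sketch of what a lookup against that cached metadata can look like: the coordinates format (type/provider/namespace/name/revision) and the `licensed.declared` field below follow ClearlyDefined’s public definitions API as I understand it, but should be verified against the live service before relying on them:

```python
# ClearlyDefined coordinates: type/provider/namespace/name/revision.
# npm packages without a scope use "-" as the namespace placeholder.
coordinates = "npm/npmjs/-/lodash/4.17.21"
url = f"https://api.clearlydefined.io/definitions/{coordinates}"

def declared_license(definition: dict) -> str:
    """Pull the declared SPDX expression out of a definition document.

    The `licensed.declared` path is assumed from ClearlyDefined's
    definition format; unresolved components fall back to "NOASSERTION".
    """
    return definition.get("licensed", {}).get("declared", "NOASSERTION")

# Example response fragment, trimmed to the field used above:
sample = {"licensed": {"declared": "MIT"}}
print(declared_license(sample))  # MIT

# A live lookup would fetch `url` (e.g. with urllib.request.urlopen)
# and pass the decoded JSON document to declared_license().
```

Contributing a correction back, when the declared license is wrong or missing, is what turns the cached database into a shared, self-correcting resource.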

Javascript (npm)

The npm package manager for JavaScript contains components that mostly use the MIT license (53%), followed by Apache 2.0 (14.76%) and ISC (10.48%). The ISC license was published by the Internet Systems Consortium and, while popular among JavaScript projects, is not used much in other programming languages. A small percentage of components either have no license (8%) or carry the NOASSERTION identifier (5.49%).

.NET (Nuget)

One of the most alarming findings for NuGet, the package manager for .NET, is that a large percentage of its components either don’t have a license (26.76%) or are marked NOASSERTION (31.95%). Components under the MIT and Apache 2.0 licenses are at 21.55% and 13.37% respectively.

Java (Maven)

The great majority of components in Maven, the package manager for Java, use the Apache 2.0 license (69.18%). Components with the second most popular license, MIT, represent only 7.4%. Components with NOASSERTION are at 14.75%.

Python (PyPI)

For PyPI, the package manager for Python, components under the MIT and Apache 2.0 licenses dominate, at 29.14% and 23.98% respectively. Components under BSD 2-Clause and GPL 3.0 are at 6.25% and 6.11%. A substantial percentage of components don’t have a license (23.69%).

Ruby (Gem)

The great majority of components at Gem, the package manager for Ruby, use the MIT license (63.11%). They are followed by the Apache 2.0 and BSD 3-Clause licenses at 8.22% and 6.66% respectively.

PHP (Composer)

The MIT license is a very popular choice among PHP components of the Composer package manager, at 64.37%. Projects under BSD 3-Clause and Apache 2.0 sit at 5.72% and 3.92% respectively. 

Go

The Apache 2.0 and MIT licenses dominate Go, at 32.49% and 20.1% respectively. A substantial percentage of Go components don’t have a license (29.67%).

Rust (Crate)

For crates.io, the Rust package registry, projects under MIT and/or Apache 2.0 dominate: combined, they represent 83.52%.
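The “and/or” reflects the common Rust convention of dual licensing under the SPDX expression `MIT OR Apache-2.0`. A rough sketch of how such expressions can be tallied, using naive string splitting over hypothetical declarations rather than a full SPDX expression parser:

```python
# Declared SPDX expressions for a handful of hypothetical crates.
declared = [
    "MIT OR Apache-2.0",   # the common Rust dual license
    "MIT",
    "Apache-2.0",
    "BSD-3-Clause",
]

def mentions(expression: str, license_id: str) -> bool:
    """Naive check: does the SPDX expression offer this license?

    Splitting on " OR " handles simple disjunctions only; real SPDX
    expressions can also nest parentheses and use AND / WITH.
    """
    return license_id in expression.split(" OR ")

mit_or_apache = sum(
    1 for e in declared if mentions(e, "MIT") or mentions(e, "Apache-2.0")
)
print(f"{mit_or_apache}/{len(declared)} crates allow MIT and/or Apache 2.0")
```

Counting dual-licensed components under both licenses is why the combined figure is the meaningful one for Rust.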

The post <span class='p-name'>The most popular licenses for each language in 2023</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

Open Source AI: Establishing a common ground

Open Source Initiative - Tue, 2023-11-28 08:00

The current draft v. 0.0.3 of the Open Source AI Definition borrows wording from the GNU Manifesto’s golden rule, which states:

If I like a program, I must be able to share it with others who like it.

The GNU Manifesto

The GNU Manifesto refers to a “program” (not an “AI system”) without needing to define it. When it was published in 1985, the definition of a program was pretty clear. Today’s artificial intelligence scene is not as clear, and there are multiple definitions of AI systems floating around.

The process of finding a shared definition of Open Source AI is only in its infancy. I’m fully aware that for many of us here this is trivial and this phase is almost boring. 

But the four workshops revealed that a significant number of people in the rooms did not know the four freedoms and had no idea that OSI has a formal Open Source Definition. And this happened even at two Open Source-focused events!

Which definition of AI system to adopt

I don’t think the Open Source community should write its own definition of an AI system as there are too many dangers with doing that. Most importantly, adopting a vocabulary foreign to the AI world increases the risks of not being understood or accepted. It’s a lot more effective and will be more palatable to use a widely adopted definition.

The OECD definition of AI system

The Organisation for Economic Co-operation and Development (OECD) published one in 2019 and updated it in November 2023. OECD’s definition has been adopted by the United Nations and NIST, and the AI Act may use it too.

An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment

Recommendation of the Council on Artificial Intelligence Adopted on:  22/05/2019; Amended on:  08/11/2023

I discovered a 2022 OECD document with a definition slightly amended from the 2019 one. The 2022 OECD Framework for the Classification of AI systems removes the words “or decisions” from the previous definition, saying in a note:

Experts Working Group decided [“or decisions”] should be excluded here to clarify that an AI system does not make an actual decision, which is the remit of human creators and outside the scope of the AI system 

2022 OECD Framework for the Classification of AI systems

The updated definition used by the Experts WG is:

An AI system is a machine-based system that is capable of influencing the environment by producing recommendations, predictions or other outcomes for a given set of objectives. It uses machine and/or human-based inputs/data to:

  1. perceive environments;
  2. abstract these perceptions into models; and
  3. use the models to formulate options for outcomes.

AI systems are designed to operate with varying levels of autonomy (OECD, 2019f[2]).”

2022 OECD Framework for the Classification of AI systems

Surprisingly, the version amended in November 2023 by the OECD still uses the words “or decisions”.

The definition of AI system for US National Institute of Standards (NIST)

The NIST AI Risk Management Framework uses a slightly modified version of the OECD definition, one that includes the word “outputs”:

The AI RMF refers to an AI system as an engineered or machine-based system that can, for a given set of objectives, generate outputs such as predictions, recommendations, or decisions influencing real or virtual environments. AI systems are designed to operate with varying levels of autonomy (Adapted from: OECD Recommendation on AI:2019; ISO/IEC 22989:2022)

AI Risk Management Framework

The definition of AI system in Europe

To complete the picture, I also looked at the EU. In a document from 2019, in the early days of the legislative process, the expert group on AI suggested (https://digital-strategy.ec.europa.eu/en/policies/european-approach-artificial-intelligence):

Artificial intelligence (AI) systems are software (and possibly also hardware) systems designed by humans that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal. AI systems can either use symbolic rules or learn a numeric model, and they can also adapt their behaviour by analysing how the environment is affected by their previous actions.

As a scientific discipline, AI includes several approaches and techniques, such as machine learning (of which deep learning and reinforcement learning are specific examples), machine reasoning (which includes planning, scheduling, knowledge representation and reasoning, search, and optimization), and robotics (which includes control, perception, sensors and actuators, as well as the integration of all other techniques into cyber-physical systems).

High-Level expert group on AI: Ethics guidelines for trustworthy AI

It’s worth noting that this definition is not used in the AI Act. The text of the EU Council suggests this one be used: 

‘artificial intelligence system’ (AI system) means a system that

  1. receives machine and/or human-based data and inputs,
  2. infers how to achieve a given set of human-defined objectives using learning, reasoning or modelling implemented with the techniques and approaches listed in Annex I, and
  3. generates outputs in the form of content (generative AI systems), predictions, recommendations or decisions, which influence the environments it interacts with;

which seems to be quite similar to the OECD text.

Why we need to adopt a definition of AI system

There is agreement that the Open Source AI Definition needs to cover all AI implementations and not be specific to machine learning, deep learning, computer vision or other branches. That requires using a generic term. For software, the word “program” covers everything, from assembly to interpreted and compiled languages. “AI system” is the equivalent in the context of artificial intelligence.

“Program” is to software as “AI system” is to artificial intelligence.

In the document What is Free Software, the GNU project describes four fundamental freedoms that the “program” must carry to its users. Draft v. 0.0.3 similarly describes four freedoms that the AI system needs to deliver to its users.

In draft v. 0.0.3 there was debate on the wording of the freedom to modify. For software, that’s the freedom to modify the program to better serve the user’s needs, fix bugs, etc. Draft v. 0.0.3 says:

Modify the system to change its recommendations, predictions or decisions to adapt to your needs.

Draft v.0.0.3

The intention behind specifying the object of the change is to establish the principle that anyone should have the right to modify the behavior of the AI system as a whole. The words “recommendations, predictions or decisions” come from the definition of AI system: what does the “system” do, and what would I want to modify?

That’s why it’s important to say what it is we expect to have the right to modify. Tying that to an agreed-upon definition of what an AI system does is a way to make sure that all readers are on the same page.

We can change the wording of that bullet point, but I think the verb “modify” should refer to the whole system, not individual components.

We’re trying to adopt a definition of an AI system that is widely understood and accepted, even though it’s not strictly correct scientifically. The Open Source AI Definition should align with other policy documents because many communities (legal, policy makers and even academia) will have to align too. 

The newest definition of AI system from the OECD is the best candidate, without the words “or decisions.”

Next steps

I met with the Digital Public Goods Alliance in Addis Ababa on November 14. I expected to encounter a different assortment of competences than the ones I’d met so far, and that was true. How far we are from consensus on basic principles is something I’m contemplating before releasing draft v. 0.0.4 and moving on to the next phase of public conversations. For 2024 we’re planning a regular cadence of meetings (online and in-person) and a release roadmap leading to a v. 1.0 before the end of the year. More to come.

The post <span class='p-name'>Open Source AI: Establishing a common ground</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

DPGA members engage in Open Source AI Definition workshop

Open Source Initiative - Wed, 2023-11-22 14:36

The meeting of the Digital Public Goods Alliance (DPGA) members in Addis Ababa was very informative. The OSI led a workshop to define Open Source AI and joined the subsequent presentation by the AI Community of Practice (CoP) of the DPGA. There were about 40 people in the room, split into seven groups of 5-6 people each. They were asked to individually review the four basic elements of the draft Open Source AI Definition and provide suggestions. Few people were familiar with developing AI systems, and there were almost no lawyers; the audience was mostly policy makers and DPG product owners (not developers).

Results of the Open Source AI workshop

There was a fair amount of agreement that the wording as illustrated was fairly good but required some tweaks. Most of the tables were eager to widen the scope of the Definition to include principles of ethics.

Some of the most notable comments:

  • One group said that in order to study the AI system, it must be possible to understand the design assumptions behind it; another group suggested adding a reference to the explainability of its outcomes.
  • One group highlighted that the purpose of studying the AI system is to gain confidence in it, understand the risks it poses and its limits, and provide a path to improve it. They recommended more extensive wording to clarify that being able to inspect its components (datasets, assumptions, code, etc.) is important. They also added that the data does not strictly need to be fully available, citing privacy as one reason.
  • On the “modify” question, one group suggested simplifying the wording, replacing … with “outputs”.
  • On “sharing”, one group recommended limiting shareability to responsible purposes, extending the scope of their recommendation also to Use.
  • There needs to be a fifth principle to “do no harm”.

A surprising outcome came from a group that felt that the verbs (study, use, modify, share) in the draft definition are not sufficient and new ones are necessary for AI. They brainstormed and came up with an initial list: train, tend (curate and store), evaluate (its capabilities) and evoke the model. This was a fascinating conversation that I promised to continue with its main proponent.

The comments received gave me a chance to close the workshop by explaining why the Open Source Definition doesn’t prescribe respecting the law, why it avoids discussing ethics, and why the OSI recommends moving these issues outside of licenses and into project governance and policies.

It was energizing to see DPGA members having such a good opinion of Open Source and its power to be positively transformative that they want to do more good with it. But injecting ethical principles into open definitions overloads them massively. The OSI will have to do more to explain that a definition should no more dictate “acceptable use” than Meta, Alphabet or anyone else should. Ethical considerations are highly contextual, and there are rarely clear answers that a universal standard like the Open Source Definition, or the future Open Source AI Definition, can reasonably cover. The DPG Standard, on the other hand, is a more suitable document to include ethical considerations because it’s more contextual to the deployment of technologies.

[Photos: working groups presenting their edits and working at their tables at the AI workshop; DPGA members in Addis Ababa]

Notes from the Community of Practice meeting

The second half of the afternoon saw the Community of Practice on AI systems as digital public goods, co-hosted by the DPGA and UNICEF. They showed their first approach to distinguishing the degrees of openness of an AI system’s components. The CoP has a very difficult task with two major obstacles. The first is that they have to come up with a proposal to update the DPG Standard to cover AI before a well-established definition of Open Source AI exists. The second is that they need to look at the intersection of responsible and open AI, balancing the values of “open” against a set of risks that are not yet fully understood either. All while technology evolves rapidly and the AI business ecosystem spreads FUD in all directions.

I’ve been highly skeptical about this gradient approach, which is not too different from what Irene Solaiman at Hugging Face proposed. As someone in the audience said: introducing a gradient approach for DPG AI risks creating an opening to also have a gradient for software, diluting the mandate for Open Source software in the DPG Standard. With the race to create “quasi-open-source” licenses, the threat is too real to dismiss. I believe that Open Source AI can be as binary as Open Source software, and the way to achieve that is to look not at the individual components of AI systems but at the whole. The next phase of OSI’s work on the Open Source AI Definition will explore exactly this aspect, diving deeper into practical examples. What do I need in order to study, use, share and modify something like LAION’s Open Assistant?

The post <span class='p-name'>DPGA members engage in Open Source AI Definition workshop</span> appeared first on Voices of Open Source.

Categories: FLOSS Research

Closing the 2023 rounds of Deep Dive AI with first draft piece of the Definition of Open Source AI

Open Source Initiative - Mon, 2023-11-06 10:00

We embarked on a process, promising at the beginning of the year that we’d make a first announcement at All Things Open, kickstarting a public conversation. We’ve delivered, thanks to the contributions of many experts and sponsors. But it’s only the starting point. There is a lot more to do.

After two community reviews in person and a first pass at online comments, we released a new draft version 0.0.3.

The base of the conversation is a preamble to explain “why Open Source AI,” followed by the beginning of a formal definition: the document will get longer. Open Source experts will recognize the heavy borrowing from the free software definition and the structure of the GNU Manifesto: it’s not a mistake. We believe that consensus on a Definition of Open Source AI will emerge after stakeholders have made a journey similar to the one that led to the Open Source Definition. The OSD is basically a checklist that appeared after decades of free software development, when developers, users, business leaders, lawyers and policymakers had had time to learn what freedom meant in the context of software. We don’t have decades to wait for AI, but we can accelerate by building on top of what many of us already know and by reaching out to diverse communities to join the conversation.

That’s what the OSI is doing with these Deep Dive: AI cycles: inviting multiple stakeholders to learn and share their knowledge as we all make progress together towards a common understanding of AI systems.

What’s in draft v.0.0.3

The four freedoms have received a bit of wordsmithing for consistency and clarity, making them shorter compared to previous drafts. I removed the words “without any limitation” from the Use and Share principles as recommended by Chestek, and because a question about copyleft also came up at the workshop in Monterey.

The current version reflects the consensus of the suggestions that emerged from the workshops in Raleigh and Monterey, and the online comments on v. 0.0.2.

In addition to those changes, I did some cleanup of the word soup, removing all instances of the most loaded concepts like trustworthy, reliable, fair, etc. from the preamble: they only appear in the “Out of scope” section.

Enjoy and comment on draft 0.0.3.  

Known issues and next steps

There is no consensus on which definition of AI system to use. Draft 0.0.3 still uses the definition introduced by the OECD in 2019, for lack of a better option. We’ll continue the conversation.

We have two more in-person workshops scheduled before the end of the year: Nov 15 at the DPGA annual summit in Addis Ababa, and Dec 12-13 at the Linux Foundation AI.Dev conference in San Jose. These were not planned at the beginning of the year when we announced the 2023 series, but they’re extremely important for reaching African tech leaders, policy makers and AI developers.

At this point we want to close DDAI 2023 by thanking the sponsors Google, Amazon, GitHub, OSS Capital, GitLab, Weaviate and Sourcegraph; the Linux Foundation for their travel grants; and individual donors, because we couldn’t have hosted the webinar series and run three in-person meetings without them.

We’re working on a plan for 2024 that includes expanding our reach to other communities, with an eye on reaching consensus on a 1.0 release of the Open Source AI Definition as quickly as possible.

The post <span class='p-name'>Closing the 2023 rounds of Deep Dive AI with first draft piece of the Definition of Open Source AI</span> appeared first on Voices of Open Source.

Categories: FLOSS Research