Text of the keynote at All Things Open

stefano · November 5, 2024, 11:17am

The organizers of the All Things Open conference offered OSI a 5-minute slot to announce the launch of the Open Source AI Definition. I’m sharing the prepared text below as inspiration for others to use in their promotional material.

The Open Source Initiative will announce today the first stable version of the Open Source AI Definition.

This milestone is the result of a journey: a broad, multidisciplinary, multistakeholder, global conversation spanning almost two years. It’s a major achievement for OSI.

We’ve been maintaining the definition of Open Source for 26 years. That definition is the foundation of a vast ecosystem valued at 8.8 trillion U.S. dollars.

Open Source software is everywhere, and the list of licenses complying with the Open Source Definition is the pivot point for developers, lawyers and regulators around the world.

That was all going fine until, a few years ago, turbulence shook many of our assumptions. New products based on Artificial Intelligence systems were rapidly landing in the hands of millions of people, and regulators all over the world were considering their implications for citizens and society.

These AI products call themselves “open” or “open source” even when they are not, and regulators used the term “Open Source AI” in new laws.

We saw this shift and needed to act: we were forced to provide a Definition of Open Source AI, lest someone else do it for us.

The OSI didn’t know how to treat applications like DALL-E or Copilot. They’re software, but they’re not programmed the way we’re used to. They rely on data, but OSI traditionally hasn’t dealt with data; we deal with software.

We had a lot of questions three years ago: if Copilot has bugs, how do you fix them? If we wanted DALL-E to be Open Source, what exactly should we be asking for? Is there an equivalent of the “source code” for Claude or ChatGPT?

These questions go to the heart of our mission: what do developers need to fork AI systems, to level the AI playing field, and to learn how to build AI machines?

We had to go on a quest to learn. Thanks to a grant from the Sloan Foundation, we’ve spoken, in person and online, to numerous machine learning experts, lawyers, open source developers, content creators and philosophers around the world. Each of them brought vital insight and shaped the Definition as it stands. The process was difficult and not without surprises. There have been controversies, too.

What we’ll announce later today is just a first milestone. There are so many things we’ve learned, and we know we all still have more learning to do.

We set out to give AI communities the rights they need to drive the Open Source innovation engine, worth 8.8 trillion dollars: the rights to fork, learn from, and improve an AI system.

Where we are on this journey now: an Open Source AI is one that gives you access to its model parameters, the code used to train it, the code used to build the dataset, and all of the data that can legally be shared.

This is the minimum requirement for people to study how a machine learning system has been built and to meaningfully fork it: take the pieces and build something on top of them, or build something new from scratch. Importantly, the groups that build AI collaboratively, such as LLM360 and EleutherAI, agree with where we’ve drawn the line to allow the frictionless innovation that has proven so successful in Open Source software.

We need you to take this first version of the Definition into the next phase. It took a decade for the Free Software Definition to add freedom 0, completing the freedoms to share, modify and study with the freedom to use the software for any purpose. The Open Source Definition was still receiving changes 5 years after its initial publication.

As Jono said yesterday at the Community Leadership Summit, we need progress more than perfection. As long as we all keep working together to achieve progress, the Open Source AI Definition will get stronger over time.

Come to OSI’s session at 12:45 in Ballroom C to learn how we got to version 1.0 and see what’s coming next.