On the current definition of Open Source AI and the state of the data commons

Shamar · September 15, 2024, 8:16am

Why?

It’s perfectly possible to build LLMs that match such an Open Source AI definition.

So much so that it has already been done in Italy at La Sapienza University, where a group of researchers trained a foundational model called Minerva on 5 billion tokens from open access texts only.

If an underfunded university has already done it, we know it’s not difficult at all.

And in fact I guess there are several other LLMs out there that would match a proper OSAI definition, but they are simply obscured by the hype that surrounds opaque and uninspectable commercial LLMs.
We shouldn’t strive to adopt an OSAI definition that lets such opaque and closed systems pass as “open source”; instead, we should strive to let existing and novel systems that truly follow the values and provide the freedoms of open source shine!

There is really no reason to adopt a misguided Open Washing AI definition instead of a coherent Open Source AI one.