[bug] OSAID RC1 still mentions the freedom to study

During the presentation at Open Source Summit Europe, @stefano (OSI President) argued that

we concluded that while data is essential for understanding and studying the system, it’s not the “preferred form” for making modifications

While most developers¹ do not agree with such a "conclusion"², let's assume OSI is right.

But why does the OSAID definition still pretend to grant the freedom to study if it allows unshareable data that, by definition, cannot be studied?

Can you fix the definition to preserve Open Source developers’ credibility?

At least then it would be a (somewhat more) logically coherent definition.
  1. the need for training data to fully modify an AI system has been reiterated here by several people, such as @lumin, @Mark, @madbob, @juliaferraioli, @samj, @thesteve0, @quaid, @zack, @gvlx, and so many others (almost everywhere) that I cannot mention them all…
  2. given how the exclusion of training and testing data was obtained by Meta employees from the Llama team, and given that nothing changed after Sam discovered the trick, calling that a "conclusion" sounds "strange"… it was a political decision taken by the OSI board long before the "co-design" process was designed to justify that decision.

That is my impression too.

As in the comment below, the fear of "being deemed ineffective" is real.

And, as I believe was reinforced again in the last town hall and probably written elsewhere, the OSI board is also concerned that too strict a definition might lend itself to an "empty set" of OSAID-compliant AI systems.

I would argue that:

  • firstly, we should not abdicate our principles, and,
  • secondly, we should invite all those stakeholders to see how they could align their systems with our definitions and carve out a compromise which does not taint our ideas.

In truth, this seems more like an excuse than anything else, given how community feedback showing otherwise has been ignored.

To paraphrase Tara Tarakiyee, the OSI has now chosen to undermine the credibility of open source (and its own authority):

there are good things about the definition that I like, if only it wasn't called "open source AI". If it was called anything else, it might still be useful, but the fact that it associates itself with open source is the issue.

It erodes the fundamental values of what makes open source what it is to users: the freedom to study, modify, run and distribute software as they see fit. AI might go silently into the night, but this harm to the definition of open source will stay forever.

I look forward to reading @juliaferraioli's follow-ups on what happened behind the scenes of the design process.


Speaking from a Japanese perspective, if OSAID simply results in creating an "empty set," it would be more convenient for the definition itself not to exist. In Japan there is no law that specifically mentions Open Source AI, unlike the EU AI Act, and if the goal is to prevent open-washing, the argument that open source AI does not exist could be acceptable. Though, this would be a disappointing outcome for organizations that are gathering datasets under Open Data licenses.

I'm convinced that will not happen: as has already been proven with Open Source licenses, once the definition is set, the projects that are actually interested in being open will adjust themselves to meet the requirements.

And even if some projects fall short of the definition (for instance, if their model was trained on unshareable data), the remaining artifacts could be enough to create a fork that fully complies with the OSAID.

We already see this with software, where projects are forked to work around closed binary blobs that were holding the project hostage.
