Draft v.0.0.9 of the Open Source AI Definition is available for comments

kjetilk · September 9, 2024, 12:16pm

I’ll start my comment with two general observations:

The Open Source Definition came after 15 years of experience with Free Software. Today, there is far less practical experience to inform the OSAID work.
The main tension, as I see it, is between the binary requirement and the pragmatic requirement. By the binary requirement I refer to @stefano's statement:

Open Source is a binary concept: you’re either granting end users and developers the right to have self-sovereignty over the technology or you’re not.

By the pragmatic requirement, I refer to the requirement that some current systems must fit the definition. I very much agree with @stefano's sentiment in the binary requirement: either the essential freedoms are granted or they are not. However, it is also entirely possible that no current system grants them, and that we simply haven’t yet figured out how to secure those freedoms in an AI world.

I also understand the urgency. Taken together, this suggests to me that the OSAID must expect iterations and refinements (like most things in the industry). If the OSI is prepared to take on this long-term task, then I suggest a slight shift of focus towards what must be done now, and what can change in the (near) future.

I find that there are many weighty arguments against requiring open training data, especially the (sad) state of copyright and the colonization argument. I don’t know how much we can expect copyright to change, but I have seen a substantial shift in sentiment since 20-25 years ago. As for the colonization argument, I think it deserves much more attention than it has received and that we should proceed with care, but I also think it can be addressed by using OSS to enable disadvantaged people to find value in their own data. I also think it is important to consider the federated training scenario. I could imagine (though I don’t know if it is practical) that heart data could be gathered from smart watches to give early warnings of heart disease, or, say, an impending heart attack. If such a model can’t be open source, then it would certainly end up as a proprietary Apple model…

Having been an open data advocate for 25 years, I find it very difficult to accept the current definition, even though I acknowledge these arguments. There are two main points I would like to bring up:

Given that the definition can be updated, it seems better to me to have a requirement that can be relaxed as more information becomes available. That is, it is easier to require open data now and relax that constraint later than to say “you were open source under the old definition, but not under the new one”.
The WGs studied four systems and found that open data was not necessary for the four freedoms. I acknowledge that. But what about the reverse: can we come up with concrete situations where the four freedoms would be restricted if training data were not available? By that I don’t mean hypothetical examples or analogies, but real-world examples. I admit that I have not gone deep enough into this area to provide such examples myself, but to those arguing against the current definition, this is the challenge I put to you, as I believe it is needed for the argument to be compelling.

Finally, it seems that the current definition needs some wordsmithing, especially around the terms “skilled person” and “equivalent system”. My experience is that such things are hard to get done online, so I would strongly encourage the OSI to bring them up at an upcoming F2F workshop.

I understand that the situation is complex, and overall I think you have done a good job. I am not quite comfortable with the lack of an open data requirement, and would hope to see that addressed further. I also note (again) that governance is not being addressed, that it needs to be in order to alleviate the openwashing problem, and that such work should be encouraged even if it is out of scope for the current effort.