Welcome diverse approaches to training data within a unified Open Source AI Definition

stefano · September 13, 2024, 5:09pm

Thanks @arandal for taking the time to recommend edits to the definition.

Absolutely! We have been, and still are, in constant touch with the leadership of CC and OKF throughout this process to make sure that none of the other “opens” is negatively affected.

If I understand correctly, your suggested edits in #1 can be summarized as:

Rename data as “source data”
Add language that explicitly allows the upstream developer of an Open Source AI system to require that downstream modifications be made only with “open data” (allowing requirements to persist, as in copyleft/share-alike licenses)
Add language stating that, if the system is trained on unshareable non-public training data, downstream developers are allowed to fine-tune that model with open data

In update #2, you seem to be arguing that training data is to trained model weights as software source code is to binary code.

Did I understand your comments correctly?