Draft v.0.0.9 of the Open Source AI Definition is available for comments

This definition does not grant the freedom to modify and is unacceptable as an Open Source Definition.

With AI models, the weights are the user interface. I can use them directly as a user. They are what is typically distributed to everyone.

The actual source of the model comes from the data AND the code. The weights are built using the code and the data. Together they make up the ability to reproduce and modify the original.

The weights are the program, and they cannot be built/compiled without access to both the code AND the data.

As an analogy, imagine if every running/compiled version of PostgreSQL disappeared from existence. The moment after that I could recreate that binary and run it because I have access to the source code.
Now imagine the same happened to the weights of a model that does not share its data. There is no way I could ever reproduce those same weights from just the code they ran.

Yes, I can modify the weights after the fact, but that is the same as adding a separate package rather than truly modifying the source. There is no possible way for me to run the code on a different data set and get the same weights. Modifying the weights after the fact with new data (aka fine-tuning) would not produce the same weights as adding that data to the original data.
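To make the fine-tuning point concrete, here is a toy sketch, not a real training pipeline. It assumes a one-weight model (y = w * x) fitted by plain gradient descent, and every name in it (`train`, `ORIGINAL_DATA`, `NEW_DATA`) is made up for illustration. Real training also involves random seeds and hardware nondeterminism, which this deliberately ignores.

```python
# Toy stand-in for "training": fit y = w * x by full-batch gradient descent.
# All names here are hypothetical; this is an illustration, not a real model.

def train(data, w=0.0, epochs=200, lr=0.05):
    """Return the single weight learned from a list of (x, y) pairs."""
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = (2.0 / n) * sum(x * (w * x - y) for x, y in data)
        w -= lr * grad
    return w

ORIGINAL_DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # best fit is w = 2
NEW_DATA = [(1.0, 3.0), (2.0, 6.0)]                   # best fit is w = 3

# With the code AND the data, the released weight can be rebuilt.
released = train(ORIGINAL_DATA)
rebuilt = train(ORIGINAL_DATA)

# Fine-tuning: start from the released weight and see only the new data.
fine_tuned = train(NEW_DATA, w=released, epochs=5)

# True modification of the source: retrain on original + new data.
retrained = train(ORIGINAL_DATA + NEW_DATA)

print("rebuilt matches released:", released == rebuilt)
print("fine-tuned:", fine_tuned, "retrained:", retrained)
```

In this sketch the rebuilt weight matches the released one, while the fine-tuned weight and the retrained weight land in different places. Without `ORIGINAL_DATA`, only the fine-tuning path is available.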

I understand some models are built on proprietary or sensitive data. In this case, all that can be shared is the weights. That is still amazing and great, but call it something other than open source: how about open weights?

If someone said there was a proprietary piece of code required for their FOSS software, you would say then the project is not Open Source. It can still be a great and valuable contribution but it is not Open Source. This is the same logic you used for the commercial shared source licenses.

The ones muddying the water are the people who use the term open source loosely; they are the ones creating the problem. As one example of how the term open source is being abused, look at Llama 3 (which is a great model I love, and I am glad they share the weights). On their Hugging Face page you cannot download even the weights without agreeing to give your email. Item 2 of their license agreement restricts the terms of use:

  2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

The term is being abused by those who want to get the benefit of the label but not follow the definition.
