Why and how to certify Open Source AI

Let’s put more thought into this thread, since there was some news last week that may inform the conversation.

The concept of “preferred form to make modifications” in the OSAID leaves some room for interpretation, but so do the OSD and the Free Software Definition: both have been the object of furious debates over how to interpret corner cases (if interested, read the debates that CAL generated, spanning many months).

In practice, and for the examples we’ve seen so far, it’s fairly easy to spot an Open Source AI by generalizing from the examples of Pythia, OLMo, LLM360, and BigScience’s models (if they change their licenses).

Looking at the results of the validation and @amcasari’s comment, the issue is not so much in the ambiguous terms of the Definition or in understanding what the preferred form to make modifications is. The issue seems to lie more in the Checklist below, and in how difficult it is for someone who is not the original developer to find the required components.

This problem is also felt by the Linux Foundation, on whose work the Checklist is based.

To address it, the LF released the MOF Tool last week: https://mot.isitopen.ai/. This tool allows the original developers of AI systems to add links to the components of their systems and their licenses. @lf_matt_white can explain better how that works internally. Given the size of the LF, this has the potential to become an industry standard.

We already adapted the Model Openness Framework to the Open Source AI Definition, so I can imagine OSAID compliance becoming an overlay of the MOF, displayed on the tool or somewhere else where it matters.

Does anybody want to play with the MOF tool and the OSAID? I’m happy to provide support.

But there could also be other frameworks (like Mozilla’s model), as @amcasari asked in Concerns and feedback on anchoring on the Model Openness Framework.

That’s to be expected, but we don’t know what the future of AI looks like… Today I can see one reason for a model to show compliance with the Open Source AI Definition:

This! One of the reasons for OSI to start this process is exactly to be able to offer a reference for policy makers: one Definition supported by a large variety of interest groups and maintained by a neutral body. Hopefully we’ll get one in time.

On licenses:

I tend to agree: the OSI License Committee hasn’t been asked whether they intend to start evaluating licenses for data and datasets, for documentation, and ultimately terms/covenants/agreements/contracts for model parameters. We should ask them to do so now… Does anybody want to open a separate thread?