Following up on @amcasari’s nudge in “Concerns and feedback on anchoring on the Model Openness Framework”, we’ve been thinking about how to integrate the Checklist with Mozilla’s Framework for Openness in Foundation Models (Framework).
This framework belongs to the same family as the Linux Foundation’s Model Openness Framework (MOF)[1]. The MOF focuses on deep learning artifacts, while Mozilla’s addresses “foundation models”. It makes sense to start from these, as they’re the most prominent technologies in regulators’ eyes.
The Mozilla Framework is divided into two main areas: the Tech Stack and the Attributes of foundation models. The OSAID covers the Model Components of the Tech Stack and, among the Attributes, Documentation and Licensing. The other pieces (infrastructure, product/UX, safeguards) are out of scope for the Open Source AI Definition.
The Model Components
The Model Components area uses the same structure as the OSAID to identify the three main artifacts of an AI system (code, datasets, and weights), except that it uses the term “Model Weights” while the OSAID uses “Parameters.”
From our reading of OSAID RC1, the components required for compliance seem to be:
- Code
  - Data (pre)-processing code
  - Inference code
  - Training code
  - Supporting libraries
  - Model architecture
- Datasets
  - Basically all of them
- Model weights
  - Basically all of them
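To make the list above easier to turn into a checklist, here is a minimal sketch that encodes it as a plain data structure and reports what a given release is missing. The names `REQUIRED_COMPONENTS` and `missing_components` are hypothetical, and “Basically all of them” is modeled, as an assumption, with a single placeholder entry per category; this is an illustration of the idea, not an official OSI tool.

```python
# Hypothetical encoding of the RC1 component list as we read it.
# Category and component names mirror the list above; the structure is illustrative.
REQUIRED_COMPONENTS: dict[str, set[str]] = {
    "Code": {
        "Data (pre)-processing code",
        "Inference code",
        "Training code",
        "Supporting libraries",
        "Model architecture",
    },
    # "Basically all of them" is modeled as one placeholder entry (assumption).
    "Datasets": {"All datasets"},
    "Model weights": {"All model weights"},
}


def missing_components(released: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per category, the required components absent from a release."""
    missing: dict[str, set[str]] = {}
    for category, required in REQUIRED_COMPONENTS.items():
        # Set difference: required items not listed in the release.
        gap = required - released.get(category, set())
        if gap:
            missing[category] = gap
    return missing


if __name__ == "__main__":
    release = {
        "Code": {"Inference code", "Model architecture"},
        "Model weights": {"All model weights"},
    }
    print(missing_components(release))
```

In this example the release would be flagged for the missing training, data-processing, and supporting-library code and for the datasets; weights are considered covered.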
The Framework doesn’t distinguish datasets by their legal status, though, so we must also look at the Attributes and Licensing boxes.
The Attributes and Licensing
If the datasets contain Obtainable or Unshareable non-public data, the elements of the Documentation attribute become essential for evaluating whether one has the preferred form for making modifications: the whole Documentation box needs to be filled in with detail.
How detailed? This will need to be clarified through practice over time. Hopefully standards will emerge, just as the meaning of “preferred form for making modifications” to software has evolved since the 1980s.
Finally, the OSAID imposes licensing requirements: all components need to be released under OSI-approved terms.
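The licensing requirement could be checked in a similar way. The sketch below assumes, for illustration only, that each component’s terms can be identified by an SPDX-style identifier; `OSI_APPROVED_SUBSET` is a small hand-picked sample of OSI-approved licenses, not the full OSI list, and `unlicensed_components` is a hypothetical helper.

```python
from typing import Optional

# Illustrative subset of SPDX identifiers for OSI-approved licenses.
# A real check would consult the full, current OSI list instead.
OSI_APPROVED_SUBSET = {"Apache-2.0", "MIT", "BSD-3-Clause", "GPL-3.0-only"}


def unlicensed_components(licenses: dict[str, Optional[str]]) -> list[str]:
    """Return, sorted, the components whose terms are missing or not in the subset."""
    return sorted(
        name
        for name, spdx in licenses.items()
        if spdx not in OSI_APPROVED_SUBSET  # None (no terms given) also fails
    )


if __name__ == "__main__":
    release = {
        "Training code": "Apache-2.0",
        "Model weights": "Custom-Weights-License",  # hypothetical non-OSI terms
        "Datasets": None,  # no terms stated
    }
    print(unlicensed_components(release))
```

Here the weights (released under hypothetical custom terms) and the datasets (no terms stated) would both be flagged, while the Apache-2.0 training code passes.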
What’s the relationship between the Framework for Openness and the Model Openness Framework?
Both seem suitable for interpreting the Open Source AI Definition. Maybe there is a way to adapt the Checklist to use terminology borrowed from both frameworks. Any volunteers?
For reference, the list of components according to the MOF:
[1] If you thought of Life of Brian, put a like here.