Moving the Checklist to v1.0, MOF and Mozilla’s Framework

stefano · October 15, 2024, 9:30am

Following up on @amcasari’s nudge in “Concerns and feedback on anchoring on the Model Openness Framework”, we’ve been thinking about how to integrate the Checklist with Mozilla’s Framework for Openness in Foundation Models (the Framework).

This framework is in the same family as the Linux Foundation’s Model Openness Framework (MOF)^[1]. The MOF focuses on deep learning artifacts, while Mozilla’s refers to “foundation models”. It makes sense to start from these, as they’re the most prominent technologies in regulators’ eyes.

The Mozilla Framework is separated into two main areas: the Tech Stack and the Attributes of foundation models. From the Tech Stack, the OSAID covers the Model Components; from the Attributes, it covers Documentation and Licensing. The other pieces (infrastructure, product/UX, safeguards) are out of scope for the Open Source AI Definition.

The Model Components

The Model Components area uses the same structure as the OSAID to identify the three main artifacts of an AI system, except that it uses the term “Model Weights” where the OSAID uses “Parameters.”

Reading the OSAID RC1, the components required to comply seem to be:

Code
- Data (pre)-processing code
- Inference code
- Training code
- Supporting libraries
- Model architecture
Datasets
- Basically all of them
Model weights
- Basically all of them
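
As a rough illustration of the list above, here is a minimal sketch of how the required components could be encoded and checked against what a project has released. The names `REQUIRED_COMPONENTS`, `missing_components`, and `example_release` are hypothetical, made up for this example; nothing here is taken from the OSAID, the MOF, or Mozilla’s Framework.

```python
# Hypothetical encoding of the component list above; the structure and
# names are illustrative assumptions, not part of any published checklist.
REQUIRED_COMPONENTS = {
    "code": [
        "data (pre)-processing code",
        "inference code",
        "training code",
        "supporting libraries",
        "model architecture",
    ],
    "datasets": ["all datasets"],
    "model weights": ["all model weights"],
}


def missing_components(released):
    """Return, per area, the required components absent from `released`.

    `released` maps an area name to the components a project has published.
    """
    missing = {}
    for area, required in REQUIRED_COMPONENTS.items():
        published = set(released.get(area, ()))
        gaps = [name for name in required if name not in published]
        if gaps:
            missing[area] = gaps
    return missing


# Made-up release that omits some code and all datasets.
example_release = {
    "code": ["inference code", "training code", "model architecture"],
    "model weights": ["all model weights"],
}
print(missing_components(example_release))
```

A plain dictionary keeps the sketch easy to remap onto MOF or Mozilla terminology; a real checklist would need finer-grained dataset entries than “all datasets”.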

The Framework doesn’t distinguish datasets based on their legal qualities, though, so we must look at the Attributes and Licensing boxes too.

The Attributes and Licensing

If the datasets contain Obtainable and Unshareable non-public data, then the role of the elements in the Attributes Documentation becomes important to evaluate if one has the preferred form to make modifications: the content of the whole Documentation box needs to be detailed.

How detailed? This is an area that will need to be clarified over time by practice. Hopefully standards will be developed, just like the meaning of “preferred form to make modifications” to software evolved since the 80s.

Finally, the OSAID imposes Licensing requirements. All components need to be released with OSI-approved terms.
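
To make the licensing requirement concrete, here is a minimal sketch that flags components whose terms are not in a given set of approved licenses. `SAMPLE_APPROVED` is only an illustrative subset chosen for this example (the authoritative list is maintained by the OSI), and the function and component names are hypothetical.

```python
# Illustrative subset only; the authoritative list of approved licenses
# is maintained by the OSI and is much longer than this.
SAMPLE_APPROVED = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def unapproved_components(licenses, approved=SAMPLE_APPROVED):
    """Return the names of components whose license is not in `approved`.

    `licenses` maps a component name to its license identifier.
    """
    return sorted(name for name, lic in licenses.items() if lic not in approved)


# Made-up release; "custom-research-only" stands in for any terms that
# are not on the approved list.
example = {
    "training code": "Apache-2.0",
    "inference code": "MIT",
    "model weights": "custom-research-only",
}
print(unapproved_components(example))
```

The approved set is a parameter rather than a hard-coded constant, so the same check could be rerun if more licenses (for example, data licenses) were approved later.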

What’s the relationship between the Framework for Openness and the Model Openness Framework?

Both seem suitable frameworks to interpret the Open Source AI Definition. Maybe there is a way to adapt the Checklist to use terminology borrowed from both frameworks. Any volunteers?

For reference, the list of components according to the MOF:

[1] If you thought of Life of Brian, put a like here.

shujisado · October 16, 2024, 3:29pm

I finally finished reading Mozilla’s paper. It doesn’t seem like it will significantly change my perspective, and if adopting their framework improves compatibility with the outcomes of other organizations, I see no problem with it.

The paper also mentioned this, but I believe it’s time for OSI to provide some guidance on how to position open data licenses such as the Linux Foundation’s Community Data License Agreement (CDLA), ODC-By, and CC 4.0. Should they be treated as OSD-compliant licenses? Or should we encourage them to submit applications to the Open Data license review process for approval as OSI-approved licenses? I think we should start addressing this issue soon.

I assume you’re looking for comments regarding the elements in the Attributes Documentation, but my thoughts on this are still not organized. Let’s wait for someone else’s comments for now.

[Amendment]

I just mentioned that I would like to see other people’s comments, but I have one suggestion.

As Stefano-san has already mentioned, Mozilla’s framework deals with “foundation models.”

If we are going to create a checklist based on this, it would be a “Checklist for Foundation Models.” So, wouldn’t it be better to rename the checklist to “Foundation Model Checklist”?

This change would allow us, for example, to create a new checklist when dealing with systems that are not foundation models, rather than just modifying the existing “Foundation Model Checklist.” Of course, it’s still possible to modify the existing checklist, but this change would allow for greater flexibility in future revisions.

stefano · October 17, 2024, 9:40am

This is a very good point. There was a short discussion on license-review, and there seems to be consensus that the OSI should and can review those licenses/legal terms. I guess it’s just a matter of starting with a trial run.

You touch on an important point: the original plan was to have a general-purpose Checklist to apply to any machine learning system. Something like a guidebook that deployers and users of AI could use to understand if a system provides the ‘preferred form to make modifications.’ I don’t think we can have such a general-purpose guidebook, not yet, at least.

@amcasari correctly noticed that the MOF is really tied to the needs of the LF’s GenAI & Data foundation; it’s not a general-purpose framework for all machine learning.

I wonder if it makes sense to consider the checklist more like a “how to map OSAID to MOF” and a “how to map OSAID to Mozilla Framework”. Such a document would anchor the principles stated in the OSAID to references that are used in practice, in a very practical manner. It also leaves the door open for these how-tos to grow into the more general guidebook that we envisioned at the beginning. What do you think?

gvlx · October 17, 2024, 4:43pm

Hi,

To make this as simple as possible, we should have more than a guide: a tool to orient the system’s proponents through their certification.

I proposed we could adapt existing tools to our purpose.

The MOT and FMTI could form a base for a process of certification, where the decision rules are given by the checklist (existing and future).

As for the process itself, I have to think about it and I’ll write down a proposal.

Answer me this, please: the actors of the validation process are the vendor/project submitting the AI system and filling in the required information, the validation system (described above), and the independent auditor(s) validating the inputs and outputs of the validation. Did I miss someone or something?