Where to find the description of the "components"

stefano · March 21, 2024, 8:04pm

The team at the Linux Foundation AI & Data Generative AI Commons group has published the article that we based the list of components of ML systems on. If you’re wondering what Training, Validation and Testing Code is, you should read this article.

Matt White, Ibrahim Haddad, Cailean Osborne, Xiao-Yang Liu (Yanglet), Ahmed Abdelmonsef, and Sachin Varghese have worked with other LF members to establish a ranked classification system that rates machine learning models based on their completeness and openness, following principles of open science, open source, open data, and open access. This work is useful to better understand the environment in which the Open Source AI Definition needs to operate.

The relevant piece for us is section 5, the Model Openness Framework (MOF) Components:

Datasets
Data Preprocessing Code
Model Architecture
Model Parameters
Model Metadata
Training, Validation and Testing Code
Inference Code
Evaluation Code
Evaluation Data
Evaluation Results
Supporting Libraries and Tools
Technical Report
Model Card
Data Card
Research Paper
Sample Model Outputs
Model Openness Framework Configuration File

and their definitions. You’ll recognize the terms from draft 0.0.6.

shujisado · March 23, 2024, 12:23am

I certainly recognize those terms.
Are we to share our definitions of these technical terms with LF AI&Data or are we to cite LF AI&Data’s definitions?

If we are going to cite them, I think it would be better to make it clear where we are citing them from, since these are important terms.
Yes, I understand that the paper was published a few days ago. It is in the future.

stefano · March 25, 2024, 3:45pm

We should cite the paper now that it’s public; we can do that. Going forward we’ll make sure of that.

justin · March 27, 2024, 9:17pm

Unfortunately, the paper itself is not open source, considering the non-commercial clause.
I had considered writing an article about it; however, as the site I operate is ad-supported, I would feel the coverage is very disconnected from the material. Even trying to structure guides and details around the information would be more difficult than it has to be.

stefano · March 28, 2024, 9:21am

A paper is not “open source”: there are better reference frameworks than the OSD which refers to software. You should check the Definition of Open.

I think you’re misunderstanding the role of the license of the paper: you can cite, quote and criticize not only that paper but anything basically, even content and works that are distributed with a non-commercial clause. That’s part of how copyright works.

This conversation is off-topic.

justin · March 28, 2024, 4:29pm

Of course you are correct. Thank you for bringing clarity to the concern (I won’t pursue the conversation further, though I might point out there is TeX source included and TeX is a Turing complete language.)