Report from Pythia working group

stefano · February 21, 2024, 1:50pm

The scope of this exercise is to answer the questions below, keeping in mind a simplified version of the OECD’s definition: “An AI system is a system that, given an input, produces an output.” With this in mind, think about what the preferred form for making modifications to it is.

What do you need to give an input and get an output from Pythia? (Use)
What do you need to understand how Pythia was built, how it can be fine-tuned, what biases it has, why it gives a particular output for a given input, and so on? (Study)
- Understand how it was built, its biases, limitations, potential pitfalls, etc.
What do you need to give an input and get a different output from Pythia? (Modify)
- Techniques to adapt or modify the model for use, including fine-tuning it and optimizing it for usage.
What do you need to let others give an input and get an output from Pythia? (Share)
- This part should cover how the model is shared, either as received or after being fine-tuned or otherwise modified.

Unresolved questions

These issues were raised but deserve wider discussion:

Inference code is necessary to use a system only if you want to run it on premises. What if you run an ML model off premises, as a service?
For certain use cases, access to the training data set or thorough documentation may be necessary to, e.g., verify compliance with privacy laws.

Two people also noted that the “Supporting tools” component row needs more detail: depending on what it includes, some of those tools may be required for all four freedoms. We are waiting for the LF AI & Data team to finalize their paper so we can refer to it as a stable source for clarification.

Participants in the WG

(In their personal capacity, not representing the views of the companies they work for)

Seo-Young Isabelle Hwang (Samsung)
Cailean Osborne (Researcher, Linux Foundation)
Stella Biderman (EleutherAI)
Justin Colannino (Microsoft)
Aviya Skowron (EleutherAI)

Results of the analysis

| Component | Required to Use? | Required to Study? | Required to Modify? | Required to Share? |
|---|---|---|---|---|
| Code: All code used to parse and process data, including: | | | | |
| Data preprocessing code | | 4 | 3 | |
| Training code | | 4 | 3 | |
| Test code | | 4 | | |
| Code used to perform inference for benchmark tests | | 3 | | |
| Validation code | | 3 | | |
| Inference code | 3 | 2 | | |
| Evaluation code | | 2 | | |
| Other libraries or code artifacts that are part of the system, such as tokenizers and hyperparameter search code, if used | 2 | 4 | 2 | |
| Data: All data sets, including: | | | | |
| Training data sets | | 4 | | |
| Testing data sets | | 4 | | |
| Validation data sets | | 3 | | |
| Benchmarking data sets | | 3 | | |
| Data card | | 1 | | |
| Evaluation data | | 3 | | |
| Evaluation metrics and results | | 3 | | |
| All other data documentation | | 3 | | |
| Model: All model elements, including: | | | | |
| Model architecture | 3 | 3 | 3 | 3 |
| Model parameters | 4 | 3 | 4 | 4 |
| Model card | | | | |
| Sample model outputs | | 1 | | |
| Other: Any other documentation or tools produced or used, including: | | | | |
| Research paper | | | | |
| Usage documentation | | | | |
| Technical report | | | | |
| Supporting tools | 1 | 1 | 1 | 1 |

Danish_Contractor · February 23, 2024, 5:00pm

Curious to understand from the WG: in what situations would “Required to Use” not be equal to “Required to Share”? Wouldn’t the former be a superset?
For example, inference code: to share a model that others can generate outputs from, wouldn’t you, at a minimum, need to provide inference code (or an equivalent)?

One could, of course, look at the model code (i.e., the model architecture) and work out what needs to be done to get outputs from the model, but I think that would be hard for a programmer with no machine learning background.
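To make the inference-code point concrete, here is a minimal, self-contained toy sketch of what "inference code" adds on top of an architecture and its parameters: a tokenizer, a forward pass, and a decoding loop. Everything in it is hypothetical and deliberately tiny (a word-level lookup table stands in for the trained parameters, and `tokenize`, `forward`, and `generate` are illustrative names, not part of any Pythia release); real Pythia inference would instead load the released checkpoints with a library such as Hugging Face `transformers`.

```python
from typing import Dict, List

# Hypothetical toy "model parameters": next-word scores keyed by the previous word.
# A real model such as Pythia has millions to billions of learned weights instead.
PARAMS: Dict[str, Dict[str, float]] = {
    "the": {"model": 0.7, "input": 0.3},
    "model": {"produces": 0.8, "is": 0.2},
    "produces": {"an": 1.0},
    "an": {"output": 0.9, "input": 0.1},
}


def tokenize(text: str) -> List[str]:
    # Toy whitespace tokenizer; real models ship a trained tokenizer.
    return text.split()


def detokenize(tokens: List[str]) -> str:
    return " ".join(tokens)


def forward(params: Dict[str, Dict[str, float]], tokens: List[str]) -> Dict[str, float]:
    # Toy "architecture": score candidate next tokens using only the last token.
    return params.get(tokens[-1], {})


def generate(params: Dict[str, Dict[str, float]], prompt: str, max_new_tokens: int = 10) -> str:
    # The decoding loop: the part a user needs in order to turn input into output.
    tokens = tokenize(prompt)
    for _ in range(max_new_tokens):
        scores = forward(params, tokens)
        if not scores:
            break  # no known continuation for the last token: stop generating
        # Greedy decoding: append the highest-scoring next token.
        tokens.append(max(scores, key=scores.get))
    return detokenize(tokens)


if __name__ == "__main__":
    print(generate(PARAMS, "the"))
```

Having only `PARAMS` and `forward` (roughly, parameters and architecture) does not by itself produce text; someone still has to write the tokenization and decoding steps, which is the gap the comment above describes for a programmer without a machine learning background.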