I’m not sure if you meant that in absolute terms, but for what it’s worth, this is not true.
For instance, in the domain of LLMs for Code, StarCoder2 is a state-of-the-art model whose training dataset is redistributable (and redistributed).
I’m less familiar with other application domains, but for ML models trained only on openly licensed sources like Wikipedia, Wikidata, Wikimedia Commons, etc., the training datasets could also easily be redistributed.