Training data access

zack · February 9, 2024, 8:11am

The main argument for including training data in the definition is that it is part of the preferred form for modification of an AI system. We can debate (ideally, together with the actors who are actually creating and modifying AI systems) whether training data really is part of that “preferred form”, but it is undeniable that if it is, then the definition of Open Source AI must include it.

(Otherwise, you would have the analogue of a binary executable in ELF format released under an open source license, with no source code available for it.)