Proposal to handle Data Openness in the Open Source AI definition [RFC]

Yes, being able to access the data is important. Recognizing this, the OSAID requires information about how the data was obtained, selected, and so on.

To use your PostgreSQL example: if the same source code can be obtained from a third-party repository rather than your own, then it is acceptable to direct users to that repository.

For reference, here is the current clause on data information in OSAID:

Data information: Sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data. Data information shall be made available with licenses that comply with the Open Source Definition.

  • For example, if used, this would include the training methodologies and techniques, the training data sets used, information about the provenance of those data sets, their scope and characteristics, how the data was obtained and selected, the labeling procedures, and data cleaning methodologies.