Hi @shujisado,
While I agree this might be a possible solution license-wise, it still misses a few considerations (which are also missing from almost all conversations elsewhere, including the EU and US legislation discussions): different types of data, versioning, and preservation.
Types of data:
- Train Data
- Test Data
- Weights/parameters - the values that were obtained from training the system
- Model Metadata - the ML design which is the core of the system [note: this should be considered a new type of code, not data]
- System Metadata - the precise description of the physical environment used to train the system, including processor (type and variant) and hardware
- User Data - the data supplied by users in their queries, including the queries themselves
Versioning and Preservation: all data must be kept in a versioned dataset repository so its reuse can be tracked precisely.
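To make the versioning point a bit more concrete, here is a rough Python sketch of what tracking these data types in a versioned repository could look like. Everything in it is my own illustration, not an existing tool or a proposed standard: the names `DataKind`, `VersionedRecord`, and `Repository` are hypothetical, the store is an in-memory list, and using a SHA-256 content hash to identify each version is an assumption on my part.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class DataKind(Enum):
    """The six data types listed above (hypothetical identifiers)."""
    TRAIN_DATA = "train_data"
    TEST_DATA = "test_data"
    WEIGHTS = "weights"
    MODEL_METADATA = "model_metadata"
    SYSTEM_METADATA = "system_metadata"
    USER_DATA = "user_data"


@dataclass(frozen=True)
class VersionedRecord:
    """One immutable version of one named artifact."""
    kind: DataKind
    name: str
    version: int
    sha256: str  # content hash, so reuse can be matched exactly


class Repository:
    """Append-only, in-memory stand-in for a versioned dataset repository."""

    def __init__(self):
        self._records = []

    def history(self, kind, name):
        """Return all recorded versions of an artifact, oldest first."""
        return [r for r in self._records if r.kind == kind and r.name == name]

    def commit(self, kind, name, content: bytes):
        """Record a new version unless the content matches the latest one."""
        digest = hashlib.sha256(content).hexdigest()
        past = self.history(kind, name)
        if past and past[-1].sha256 == digest:
            return past[-1]  # unchanged content: no new version
        record = VersionedRecord(kind, name, len(past) + 1, digest)
        self._records.append(record)
        return record
```

With this sketch, committing the same bytes twice under `DataKind.TRAIN_DATA` yields a single version, while committing changed bytes appends a second one, so the history shows exactly which content each version refers to. A real repository would also need persistent storage and preservation guarantees, which this deliberately leaves out.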
I’ll be defining these terms more fully in another post in a few days.