Training data access

lumin · February 26, 2024, 5:05am

Exactly. A pre-trained neural network whose dataset is not publicly available is, in practice, still controlled by its creator. That does not encourage people to improve on the original pre-trained network. Without the original dataset, when the community tries to make improvements using alternative datasets, there is no controlled experiment, with the original data as the baseline, from which we can tell whether there is really an improvement. This is critical to academia. Open source is important for academia.

People cannot exclude this requirement just because “training it again or making modification in its architectures etc” is “too high-end / power-user-specific / advanced” for a generic user. The open-source definition is designed for people who include machine learning engineers, researchers, and scientists. Do not exclude that minority from the potential audience of the OSAID.

If the OSAID does not require the original training dataset, I’d say it will become a historical mistake.