On the current definition of Open Source AI and the state of the data commons

shujisado · September 12, 2024, 3:01pm

This is a very good discussion that touches on issues related to personal data. Senficon's analysis from a copyright perspective also deserves a wide readership.

In Japan, where I live, using copyrighted works for AI training without permission is already legal, and in the future this will likely become explicitly legal in many other jurisdictions as well. However, even when data is used lawfully for AI training, distributing it for purposes other than AI training should not be permitted. This is the nature of the copyright system, and such data is obviously not open data. Still, if we can give other developers a way to access the data used for training, those developers should be able to build equivalent AI models using exactly the same methods. We must avoid a situation where something cannot be called Open Source simply because it was trained on data that could legally be used without anyone's permission.