Open Source AI needs to require data to be viable

shujisado · June 6, 2024, 2:57pm

Over the past few days, I have kept thinking about the validity of Julia-san’s proposal. I understand, on an emotional level, the need for high standards for datasets. However, Stefan-san’s counterargument is reasonable and aligns with OSI’s goals. We must draw a realistic line.

I have finally understood the incident involving the Pile dataset, and I realize that almost the same thing could happen in Japan. Japanese copyright law treats training on the original Pile dataset as legal, but distribution of the dataset could be regulated by laws other than copyright law.