Training data access

fontana · March 6, 2024, 1:21am

It occurs to me that it would be quite problematic if “open access” were determined entirely by the risk tolerance of a distributor of third-party copyrighted material. One entity might release a model along with a training dataset that includes a lot of photos from Flickr, say, perhaps deciding to ignore the potential claims of authors of the photos or the licensing terms applied by those authors. Another entity might use the same training dataset and believe its use of the data in training is fair use, but might refrain from publishing the dataset because of concerns that in that context the dataset would be infringing. Why should the first entity benefit from the perception of having enabled “open access”?

