The OECD just published a new paper titled “Intellectual property issues in artificial intelligence trained on scraped data”.
It’s worth reading: it contains policy recommendations that affect content creators and data aggregators, along with a very detailed overview of the many legal issues surrounding scraping and the use of data to train AI models.
Data scraping, according to this paper, is the automated extraction of AI training data from the web, online databases, and other sources using automated software tools or scripts.
This is different from the definition of Text and Data Mining included in the EU Copyright Directive ("[a]ny automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations").
Chapter 3, The legal landscape for data scraping and growing litigation, is a valuable compendium of all the discussions we’ve had on this forum around data.
Chapter 4, Preliminary considerations and potential policy approaches, makes recommendations that are not too different from the policy recommendations made in the data governance whitepaper.
The Annex also contains relevant notes, including the selected copyright exceptions in different jurisdictions.
After reading this paper, what are your perspectives on the challenges and opportunities presented by data scraping for AI training?