In the latest opinion article from the OSI, Mr. Choudhury argued:
You can’t legally give us all the data? Fine, we’ll fork it. For example, you made an AI that recognizes bone cancer in humans but the data can’t be shared. We’ll fork it! Tell us exactly how you built the system, how you trained it, share the code you used, and an anonymized sample of the data you used so we can train on our X-ray images.
His words mirror previous arguments from the OSI President.
So apparently the OSI believes that to get powerful “Open Source AI” systems we must renounce system transparency, builders’ accountability, and even personal data protection (as many AI systems notoriously output their training data anyway), and allow unshareable data to be used in the training of such systems.
But is this really the case?
Privacy and personal data protection are different from secrecy.
For example, Article 6 of the European GDPR defines several alternative conditions under which the processing of personal data is lawful, while Article 9, which prohibits the processing of some special categories of data (including data concerning health), lists in its second paragraph ten exceptions to that prohibition.
Of particular interest for our analysis is the first of these exceptions:
- (a) the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject;
Consent is indeed the principal basis for lawful personal data processing in Europe.
So any group that wants to use personal data to train a medical AI system will have to ask for people’s consent anyway (at least in Europe).
So the simple and easy solution to this false dichotomy is to request, at the same time, consent to the distribution of properly anonymized versions of the health data collected.
Obviously, such data should be anonymized before being fed to the training process, both to mitigate the damage caused by any leak and to comply with a serious OSAID that mandates sharing the real training data, not different data.
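To make this concrete, here is a minimal sketch of what such an anonymization step could look like, assuming the X-ray images are stored as DICOM files and using the pydicom library; the tag list and directory names are illustrative assumptions, and a real pipeline would follow a complete de-identification profile (such as DICOM PS3.15) and also check the pixel data for burned-in annotations:

```python
# Minimal, illustrative anonymization pass over DICOM X-ray files.
# Requires pydicom (pip install pydicom). The tag list below is NOT
# exhaustive: real de-identification should follow DICOM PS3.15.
from pathlib import Path

import pydicom

# A minimal, hypothetical selection of direct identifiers to blank out.
IDENTIFYING_TAGS = [
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
]


def anonymize(src: Path, dst: Path) -> None:
    """Blank direct identifiers in a DICOM file and save a copy."""
    ds = pydicom.dcmread(src)
    for keyword in IDENTIFYING_TAGS:
        if keyword in ds:
            ds.data_element(keyword).value = ""
    # Vendor-specific private tags often hide identifiers too.
    ds.remove_private_tags()
    ds.save_as(dst)


if __name__ == "__main__":
    # Hypothetical directory layout: raw files in, anonymized copies out.
    out_dir = Path("anonymized_xrays")
    out_dir.mkdir(exist_ok=True)
    for src in Path("raw_xrays").glob("*.dcm"):
        anonymize(src, out_dir / src.name)
```

Stripping metadata like this is only the easy part: whether the resulting dataset counts as truly anonymous under the GDPR also depends on the residual re-identification risk, which is exactly why an explicit consent to distribution remains the safer legal basis.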
But nothing prevents builders from creating powerful AI systems that are totally transparent, granting both the freedom to study and the freedom to modify the system, while respecting human rights.
We don’t have to choose between privacy and Open Source AI.
We can have both.
(but not with the current version of the OSAID)