Are you sure we can cherry-pick a single black box that might be persuaded to look (not become) a little less opaque, and generalize that to the whole AI market?
Because, you know, if the third subset you describe covers 99% of the whole training dataset, the other two are just a smokescreen.
And in an age of deadly supply-chain attacks, how many organizations would (ab)use such a loophole to evade the technical and legal scrutiny imposed by the AI Act?
Do you really think that allowing healthcare black boxes to escape any real scrutiny would push the AI market in a good direction?
And what about the freedom to study, @zack?
If the Open Source AI Definition does not require the full training data, most commercial open-washed systems will distribute only a tiny subset of that data to preserve the competitive advantage provided by their most valuable asset.
How could researchers do their job without access to the training dataset?
How could you write papers like "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification"?
Luckily, there is no reason to give up on a better OSAID just to please Meta & friends, because you were not really outvoted at all!
If the co-design process and all the work done by this global multi-stakeholder effort were not just a smokescreen to justify a large-scale open-washing operation, we can surely expect OSI to abide by the rules it has set and to propose a release candidate that really requires the training and testing data to be shared (as announced in this thread).