Overarching concerns with Draft v.0.0.8 and suggested modifications

zack · May 18, 2024, 7:55am

Hello Julia, Spot, I am in full agreement with the position that datasets (training included) are part of the “preferred form of modification” of AI systems and, as such, are needed to exercise the freedoms to study and modify them. I supported, and still support, making their release under an open data license a requirement in OSAID. (That too has been discussed in the past, but unfortunately was not retained up to v0.0.8.)

Your take on requiring the release of a “high quality equivalent synthetic dataset”, when the original dataset cannot be released, is novel and I quite like it. It would be great if that could be the compromise we reach to bring the dataset requirement back into OSAID.

Note, however, that it is plagued by the same problem that you (correctly) criticize elsewhere: it leaves undefined what “equivalent” means, opening the door to potential loopholes. I don’t think that problem is fixable for a definition aiming to cover “AI systems” in general, and your proposal doesn’t make things worse on this front for OSAID, so I don’t consider it a blocker for adopting the proposal.

Cheers