The Missing Third Leg: Training Data Excluded from Open Source AI Definition by [Co-]Design

anon18632855 · September 30, 2024, 8:45am

Yes. Per my reply to @Shamar, the precedent is for the licence that is limited for pragmatic reasons (e.g., LGPL) to adopt the alternative branding while still protecting the four freedoms. Alternatively, any compromise could be temporary under a single brand, but it may be difficult or impossible to revoke later.

“The proposal is that we acknowledge that taking a purist approach to data (e.g., demanding open data licenses) will drastically limit the number of candidates for certification”, to your points above, rather “requiring data (analogous to the proprietary code [under the LGPL precedent]) be accessible when building (training) the Open Source licensed redistributable software (model), in order to protect the four freedoms”.

Freeware (free as in beer, not as in freedom) is what we used to call software that was made available in binary-only form to use and share but not study or modify, as you know, and this is one of the main reasons people are reluctant (or prohibited by policies) to rely on it. The current draft is basically freeware for AI, and this impinges on the freedom to even use the software.

No, it’s not, which is why I consider this a train worth throwing ourselves in front of.