FSF announced basics of free ML Application definition

peterbrownq · October 24, 2024, 10:55am

https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications

The key point is that they do not agree with OSI that ML models with nonfree training data should be labeled as nonproprietary, so those ML models won’t be considered free as in freedom, libre, free software, or FOSS. I’m glad. It is a consistent position.

For many nonfree models, it seems the majority of the work done on them is collecting and curating a dataset that outsiders can’t access. Outsiders are not on the same playing field; they can’t collaborate or contribute to that work in any natural way.

I remember, many years ago, hearing some open source boosters say things like “open source is about collaboration (OSI says so!), which leads to innovation and making money. Free software is confusing and isn’t about that, so if you like collaboration and success, let other people know that by saying open source rather than free software.” It strikes me as a bit ironic that the free/libre definition will diverge from open source on something crucial for collaboration. But it also reminds me that the money/success part of that open source promotion was implied to be the really important part. I think that, to some of those people, if the logical equivalent of open source for machine learning does not help a lot of people make money, then of course you change it anyway. In fact, if you see open source as, first and foremost, a set of business practices with mutual benefits, then you should try to find that for ML first and sort out the details afterward. And, amazingly, the OSI’s FAQ about training data is entirely held up by the same kind of logic. It says: if you just start from the premise that any open source definition should aim to be as influential in the ML field as it is in non-ML software, this is all perfectly logical. Oh well, OSI, go chase your dreams. I’m glad we have the FSF to call a spade a spade.

Kappa · October 24, 2024, 1:00pm

Thank you for giving us the opportunity to reach different conclusions. I think your reading is too hasty and misses the focal point.

First, the FSF and OSI agree on a very fundamental point, often “misunderstood”:

training data is not the “source code” of model parameters in the usual sense.

This is crucial, and I’m glad that the two organizations agree at the root. The rest of the FSF’s statement, particularly if you dissect the language, is actually very similar to OSI’s position: it recognizes the importance of training data, but it also says:

nonfree ML [may] have valid moral reasons for not releasing training data, such as personal medical data. In that case, we would describe the application as a whole as nonfree. But using it could be ethically excusable

Since just before this they established, unsurprisingly, the equivalence “free = moral”, this is a remarkable statement: it admits that there is a grey area of “lack of freedom” that is nonetheless “ethically acceptable”, because data is not source code (aka “the preferred form of making modifications”). The FSF is not saying in unequivocal terms that the training data must be distributed with the four freedoms.

However, given that the final text is still being worked on, I wouldn’t draw any final conclusions on either side. The FSF is not done with their work, and we are sympathetic to their efforts. I hope that many will join them in helping to finalize it, as many have offered to contribute to ours…

I insist that no one has Truth (capital T) in their pocket. We are making a humble effort to find a sound, reasonable and meaningful definition in a largely uncharted territory. Most importantly, we are not expressing the personal opinions of OSI members, nor are we attempting to be the oracle of some revealed Absolute that is just out there for the taking.