Open Weights or Open Source AI?

It’s not our job to regulate the term “Open” (or “Freeware” for that matter, to the extent anyone still uses the term or knows what it entails today), but the OSI has successfully established itself as the arbiter of the term “Open Source” over the past quarter century, at least within our industry. As OSI co-founder Bruce Perens put it recently, “The common person doesn’t know about Open Source, they don’t know about the freedoms we promote which are increasingly in their interest.”

That we’ve done so even without the government-enforced monopoly of a trademark for the term (which is merely descriptive, as Software in the Public Interest apparently discovered around the time of the launch) is an impressive feat. It speaks to the quality of the Open Source Definition and its ability to walk the tightrope between Free software and the needs of businesses.

Our success is also in no small part due to the deference of behemoths in the ecosystem and their unwillingness to blatantly attempt to co-opt the term, at least until now; you don’t see Microsoft claiming Windows is Open Source. Indeed, Microsoft was initially actively hostile to Open Source, which gave us some breathing room (other behemoths did not exist yet). That there is evidence of corporate capture of the “co-design” process today should give us pause, whether there is an actual conflict of interest (as appears to be the case) or merely the appearance of one.

The objective of this exercise is to replicate that early success today for AI, and while the discussion has been difficult at times, the strongest steel is forged in the hottest fire. It’s been nearly 20 years since I moved to France to start beating the drum of cloud computing, and yet we still haven’t addressed the transition from products to services that undermines our licenses. It hasn’t even been two years since OpenAI captured the public’s imagination with the November 2022 launch of ChatGPT, so we can take a little more time rather than rush a definition out today, which for all we know is still the plan.

And yes, there are myriad challenges in distributing training data sets, but addressing them rather than brushing them under the rug is well worth the effort.