Thank you, @nick, for your efforts to foster and focus discussion, and thank you to @quaid in particular for documenting a detailed proposal to address the “data openness” issue. I don’t like to raise problems without solutions, so this is very helpful (even if I maintain that “D+” should be the default and ideally the only option).
I do note that all 7 of your points would be partly or fully addressed through the provision of training data. I also note that, in the component voting data relied upon to make recommendations (and regularly touted in the “interests of transparency and auditability”), 16 different invited experts voted 27 times that training datasets specifically are required to protect all four core freedoms. That result cannot and must not be dismissed using questionable statistics.
Aside from the elephant still in the room, I believe there is a more fundamental issue we need to address: how we find consensus, and the new “co-design” process we’re testing. There’s a dangerous tendency to rush forward with the definition while valid objections (most critically, the lack of training dataset requirements) are dismissed without adequate resolution. This threatens the very foundation of the four freedoms we set out to protect: to Use, Study, Modify, and Share. Without access to training data, one simply cannot meaningfully study or modify a model (as one can with source code), and savvy users will hesitate to use, let alone share, a model if they don’t know what went into it. This is no different from refusing to eat suspect food without seeing what went into it, or from a chef being handed an impossible recipe calling for mermaid tears (e.g., YouTube transcripts)!
While the OSI may lack the formal appeal process offered by the IETF, the IETF’s guiding principles on “rough consensus” still hold relevance. Specifically, the idea that:
“Simply having a large majority of people agreeing to dismiss an objection is not enough to claim there is rough consensus; the group must have honestly considered the objection and evaluated that other issues weighed sufficiently against it. Failure to do that reasoning and evaluating means that there is no true consensus.” (source)
Additionally, the principle that “lack of disagreement is more important than agreement” is especially relevant here. Sustained objections, critically the failure to include training datasets, are still unresolved. Yet the process marches on toward @Mer’s release candidate announcement at Nerdearla this Thursday, with endorsers lined up, and a board vote and public announcement next month. This resembles not a community-driven consensus but a train speeding toward a predetermined outcome.
It’s also concerning to hear calls for “compromise,” with some even claiming publicly, and prematurely, that “We finally have a definition for open-source AI” based on compromises apparently already accepted ahead of the board vote. Compromise is useful when it means balancing technical trade-offs like speed versus power consumption. But when it means compromising between people and their concerns, rather than addressing the core issues, it becomes harmful. As the IETF outlines, there’s a stark difference between compromise and capitulation:
“A minority of a group might object to a particular proposal, and even after discussion still think the proposal is deeply problematic, but decide that they don’t have the energy to argue against it and say, ‘Forget it, do what you want.’ That surely can be called a compromise, but […] really all that they’ve done is capitulated; they’ve simply given up by trying to appease the others. That’s not coming to consensus; there still exists an outstanding unaddressed objection.” (source)
More importantly, true consensus isn’t just a matter of people giving up objections due to fatigue. As stated in the IETF’s rough consensus draft:
“Coming to consensus is when everyone comes to the conclusion that either the objections are valid (and therefore making a change to address the objection) or that the objection was not really a matter of importance, but merely a matter of taste.” (source)
What we’re hearing is not that the objections are invalid, nor that they are a superficial “matter of taste”, but something that feels more like capitulation than genuine compromise, much like the pejoratively named Lesser GPL (LGPL). The objections regarding the protection of the core freedoms, especially with respect to training data, remain valid and unaddressed. Despite this, the process pushes forward in the name of expediency rather than a commitment to resolving the deep concerns that still exist. We’re at risk of mistaking the lack of disagreement for agreement, and rushing toward an announcement because we’ve held so many meetings, town halls, and discussions that it feels like we can’t afford to delay any longer.
But the reality is that pushing this through without addressing the fundamental flaws, particularly the exclusion of training data requirements, is a misstep. Rather than releasing it when it’s ready, we seem to be succumbing to the sunk cost fallacy: having invested so much time and energy, we’re afraid to pause for fear of appearing inefficient or indecisive.
I urge the OSI to pause any announcements and reconsider whether this is truly consensus, or merely a majority—assuming it’s not a minority given the quantity of opposing voices!—drowning out dissent for the sake of expediency. The stakes are too high to allow such a fundamental flaw in the definition to go unaddressed.
If I had more time I would have written a shorter letter, but in terms of concise concerns, I would ask that consensus and the validity of the “co-design” process be added to the list.