pchestek wrote:
I’d like to hear all the candidates’ views on the OSAID and what they hope to accomplish during their term with respect to the OSAID.
The first and (in my personal opinion) most important plank of the Shared Platform for OSI Reform is item 1, calling on the OSI to repeal the OSAID. I’ll repeat what is said in the Platform about this:
We commend the OSI leadership and staff for addressing the controversial topic of “Open Source AI”. Nevertheless, the OSI’s push to adopt its Open Source AI Definition (OSAID) has been a mistake. The OSI acted too quickly to impose an overly ambitious policy compromise on the community. OSAID undeniably created a rift in the FOSS community; that rift seriously damaged the OSI’s reputation, authority and influence. Meanwhile, OSAID shows no signs of having any positive policy influence on machine learning practitioners, the FOSS community, or regulators.
If elected, we endeavor to persuade the OSI to acknowledge this error and repeal OSAID. We are not dogmatic on this point; we are open to a compromise that recharacterizes the elements of OSAID as a nonbinding and preliminary set of recommendations.
OSI should also commit to a substantial period of time for careful study and ongoing community discussion — perhaps as long as 5 or 10 years — before adoption of a formal definition of “Open Source AI”.
I’ve given a great deal of thought to this topic (of what “open source AI” ought to mean, and the OSI’s own solution to it) and there’s much further I could say about it and might do so in subsequent posts here. It is personally the issue that motivated me to run in the OSI board election. I will however attempt to respond here to @ttx’s question:
ttx wrote:
What do you think repealing the OSAID would achieve, beyond leaving even more free ground for Meta to run with their “open source AI” ads? If you think it is currently missing the mark, isn’t evolving it a better strategy?
It’s clear the OSAID has thus far had zero impact on the behavior or rhetoric of corporations releasing AI models. This is somewhat understandable, because it’s clear that the OSAID enjoys no general support in the open source community. Those who paid attention to it largely condemned it, and this was not some marginal group.
While the open source community is (at best) divided on what open source AI ought to mean, I think there is broad agreement that we have suffered through a huge amount of openwashing around AI over the past few years. I think there is also broad agreement that a major form of that openwashing has taken the form of releasing models under licenses that indisputably fail the anti-discrimination provisions of OSD 5 and 6. The OSI should have focused on that ground of consensus, instead of embarking on an effort to settle the most difficult FOSS policy issue of our time – put simply, the question of what the analogue of the OSD’s source code requirement is for AI. That effort has, in my view, been a failure. The OSAID not only lacks broad support in the open source community and has been poorly received; it is also deeply flawed in numerous respects.
Hence my view that it would be best to repeal the OSAID and for the OSI to move on from there, continuing to deliberate over and gather opinion on the appropriate meaning of open source AI as the technology evolves and as a consensus of community views and practices emerges. Evolving the OSAID is not a viable option: it is structurally too flawed, partly because of its complexity, impracticality and attempted comprehensiveness, but also because it attempts to bridge irreconcilable views with a solution that satisfies pretty much no one.
The EU has used the terms “open source AI model” and “open source AI systems” in the EU AI Act. Meta is heavily lobbying the EU to have its definition of open source AI adopted by the EU. What would be your strategy for combating Meta’s efforts?
Assuming Meta’s definition of open source AI is whatever Meta is doing, in terms of licensing and artifact distribution, with the Meta Llama models: The best way to combat this with respect to regulators, in a world without the OSAID, is to fall back on the OSD and what the community has demonstrably understood open source to mean (with respect to licensing) for ~27 years, or more if we look beyond the “open source” label. Licenses with acceptable use policies, licenses that prohibit use by persons in the EU, licenses that require special licenses to be agreed for certain categories of users – these are not open source licenses, therefore obviously the stuff they cover can’t be called open source.
This would not require the OSI to take a position on what I’m contending is the extremely difficult and contentious issue of what non-license characteristics are specifically needed for some AI technology to legitimately be labeled as “open source”. In that sense, the Meta problem is relatively simple.
It’s clear the OSAID has thus far had zero impact on the behavior or rhetoric of corporations releasing AI models. This is somewhat understandable, because it’s clear that the OSAID enjoys no general support in the open source community.
I have given presentations on OSAID to Japanese AI/ML developers, and nearly all of them understood the general outline of OSAID. It is also a fact that entities such as the Japanese government and companies like Toyota are aware of OSAID and the LF MOF. Furthermore, my personal blog post explaining OSAID has already reached 10,000 page views, and my slides have been viewed about 6.9k times. All of this content is in Japanese and was produced independently, but it still serves as evidence that OSAID is having some degree of influence.
In addition, OSAID is commonly used as a standard for referring to Open Source AI. In Japan, many companies have released models under MIT and Apache 2.0 licenses, but since OSAID, they rarely refer to their models as Open Source. They simply name the license, saying “we are applying an Open Source license” or “we are applying Apache 2.0”.
The reason OSAID does not stand out more is simply that it is currently very difficult to release an AI system that complies with OSAID. Back in 1998, when the OSD was announced, Netscape immediately adopted it. However, it will take more time before very large, well-known companies publish an AI system that conforms to OSAID. I hope the OSI will continue to address these AI-related issues carefully and persistently.
Those who paid attention to it largely condemned it, and this was not some marginal group.
Perhaps this sounds harsh, but as far as I know, only a few individuals in certain communities have openly expressed opposition to OSAID. Until recently, I tried my best to respect and engage with those minority views, but I have slightly changed my stance of late. In my country, such opinions are held by an extremely small minority and can sometimes appear rather radical.
As I’ve noted in my intro post in this forum, I would want to revisit OSAID v1 and, using the LGPL lens, to have a Lesser OSAID and an OSAID v2 definition. The Lesser OSAID would essentially be the current OSAID v1, where full and open access to the training data is not mandated, while the new OSAID v2 would mandate full and open access to training data.
I hope, as an OSI board member, I can help guide the discussion and debate so that we can quickly arrive at the goal above.
I’m not sure I understand this framework. The OSD defines the minimum conditions for a software license to be considered open source, but does not set the upper bound of where such a license could go (as long as that upper bound doesn’t itself violate the OSD). Thus, both Apache & GPL (and by extension LGPL) are OSD compliant, although GPL & LGPL are arguably more “open” than Apache. They all fit within the framework.
If you want a more “open” AI, you can license it as such as long as it satisfies the OSAID, even if other licenses that also satisfy the OSAID are less “open.” So, you could have the “Apache AI” license and the “GPL AI” (or “LGPL AI”) licenses coexist within the OSAID framework. You don’t need to change the OSAID for that to happen.
I feel an in-depth retrospective is needed to evaluate the practical application of the OSAID to the most common use cases. One focus area is where the OSAID can be simplified: (1) identify and remove any barriers to accurately determining whether an AI system meets the OSAID, and (2) identify any barriers hindering the adoption of the OSAID as the standard, single-source-of-truth definition, and how those barriers can be addressed or removed if necessary. This retrospective should be a well-structured, constructive collaboration that considers all viewpoints.
Opinions are my own and not those of my employer.
The OSAID, as a first version, is a significant achievement, and I’m proud to have contributed to the team. I believe it will improve further with industry and user feedback, forming a strong foundation for the next version. My main concern lies in the use of training datasets and confidential data. Algorithmically and documentation-wise, everything may align with open-source principles, but without transparency on how AI models are trained—what datasets are used and what LLM architecture is applied—serious questions will remain unanswered. Fortunately, OSAID already addresses some of these concerns by requiring detailed documentation of training data, its provenance, and processing methodologies, ensuring transparency and reproducibility.
But at the end, this is a major step forward for AI, and I’m proud to be part of the team. Hopefully, it will serve as a strong example for other emerging technologies.
I do agree with you that the OSI approach has been fairly pragmatic… which, in my personal opinion, the organization has always been with regard to open source since its beginning.
Also, with DeepSeek recently opening up their work, there are efforts to replicate it without the training data, and this is proving possible because DeepSeek outlined their training recipe well in research papers and on GitHub: Open-R1: a fully open reproduction of DeepSeek-R1
There’s also nothing to say that the OSAID evolves over time with more community input too!
It’s also important to keep in mind the academic sector. The OSAID has been socialized within universities. After careful thought and discussion, some faculty, university OSPOs, and a university research library consortium have issued statements of support or endorsements. Repealing OSAID might signal to this community that their opinions, feedback, engagement, etc. are not worthwhile. I believe that would be highly counter-productive, as we need the academic sector to evolve the OSAID (particularly along engineering dimensions), and for associated recommendations regarding relevant policy.
Just to make sure I’m understanding this correctly: in a world where OSAID does not exist, OSI could have said to the EU that Llama models are not open source because the license on their weights is not OSD-compliant, correct? (I agree with this.)
But it also means that people would have been able to call “open source” models that have an OSD-compliant license on their weights, but at the same time have neither an open training dataset available (something that OSAID allows, to some extent) nor an open training pipeline, inference code, etc. (something that OSAID forbids). Still correct?
To be clear: it is a fair policy position, but it has both advantages and disadvantages for user freedoms in the AI space w.r.t. the status quo. (Assuming that policy regulators are going to take OSAID as “the” definition of open source AI, which is also still unsettled.)
Yes. “Whatever “open source AI” means, we know at least it can’t mean using one of these licenses that clearly violates OSD 5/6.”
Well, they are already able to call all sorts of models, under all sorts of licensing terms and with all sorts of distribution characteristics, “open source”, because they’ve been doing this without regard to the OSI or the existence of the OSAID. As far as I can tell, the OSI hasn’t been publicly criticizing the pattern you refer to (to oversimplify, because there’s actually quite a range of behavior here, the “open weights” use of “open source”).
I must take issue, however, with your suggestion that the OSAID forbids the lack of an “open training pipeline” etc. One of my major criticisms of the OSAID is that it is quite unclear or muddled on this point. It is also not clear that the handful of models suggested as possibly compliant with the OSAID actually meet its requirements under what I assume is the intended reading.
OK, fair. (About those possibly compliant models, my take is that we’ll see how their review will pan out, pretty much as it happened on license-discuss over many decades for OSD and possibly compliant licenses. But I’d like to stress a more important point below.)
Without OSAID (notwithstanding all its flaws), people would be able to call “open source”—with OSI approval—AI models that only distribute weights and inference code, provided that they are available under an OSD-compliant license. That means that it would be possible to distribute “open source” models without releasing training datasets, which was one of the main reasons to criticize OSAID by many (myself included).
Am I reading correctly that your position is that such a situation would have been preferable to the status quo? (My anticipated apologies if not.)
I suppose it’s true that, in theory, with OSAID the OSI is in a stronger position to criticize the labeling as “open source” of so-called open weights models (I’ll define that as putting weights under an open source license without meeting even the OSAID-compromise sort of analogue-to-source-code requirement). However, it doesn’t seem to be doing this publicly. I see the OSI publicly condemning Meta, which it was doing before the adoption of OSAID too.
In any case, prior to the moment of adoption of OSAID, it’s not like OSI was saying “open source licensing is all you need”.
But to answer what I think your question was: I think the ‘no OSAID’ 2025 universe is preferable to the OSAID 2025 universe, overall. For one thing, the OSI could have taken more time to develop OSAID, which is one of the points made in the Shared Platform, and ended up with something that actually reflected some sort of mature community practice as well as broad community agreement, which is clearly not what we have right now. You might say that my position implies we should tolerate some shorter-term use of “open source AI” rhetoric or branding that might come to be seen as a misuse in the future, when we figure this all out satisfactorily, and which, to many, is a misuse right now. But this overlooks the point that the OSAID is not having any actual effect in shaping community behavior or community opinion, nor does that seem likely to change (without some non-incremental change in the OSAID).
I would like to see the OSI put out a clear graphic that shows how the OSD is applied in the case of software.
And from that, to extrapolate the entire flow and components of what makes language models.
I find that without a map, a lot of the conversation goes around in circles, with different degrees of understanding of what the various components are. I recognize that these components may also be morphing as language models evolve, and a clear map will help define them for everyone.
I would want to work on that as my initial contribution to this.
Thanks for clarifying your position @fontana, really appreciated.
I’m glad to see people still care about the unaddressed data issue after more than a year. Thank you for continuing to work on this.
During the OSAID discussion, no one argued that OSAID 1.0 should be the end of the story. I don’t know whether the next version will be 1.1 or 2.0, but I think there will be continuous discussions about OSAID.
This year or next year, judgments will gradually be handed down in the US lawsuits, and currently, the use of synthetic data is becoming the mainstream in the field of LLM. In addition, the technology for analyzing and studying models will also evolve. It would be good if the next discussion could be started by taking all of these into consideration.
I also hope the format will continue to allow individual members like me to participate in the discussion.