Why and how to certify Open Source AI

This is important and deserves its own separate thread. The question of certification was also raised by @fontana in the thread Is the definition of "AI system" by the OECD too broad? - #11 by fontana. I think we now have more elements to continue that conversation.

There are many questions that need to be answered and I’d like to hear what people think.

  • who exactly needs a certification that an AI system is Open Source AI?
  • who is going to use such a certification? Are any of the groups deploying open foundation models today thinking that they could use one? For what purpose?
  • who is going to consume the information carried by the certification, why and how?

These are the first ones that come to mind.

Thanks for splitting the thread, it is indeed an important separate discussion.

I think the need for “certifying” an AI system as OSAID compliant or not will emerge primarily from the following situation. The definition is not, and will never be, completely unambiguous. In OSAID 0.0.8 we have expressions like “sufficiently” and “substantially equivalent” mentioned in the parent post. Even with the proposed changes, we have “high quality equivalent synthetic dataset”. And no matter how hard we try, there will always be margins for different interpretations.

As soon as two parties disagree on the OSAID compliance of a system, people will want a judge of sorts. For the OSD, OSI has been such a judge, via the license-review process. (Which was quantitatively easier to manage, because there were way fewer licenses than software products under such licenses. With OSAID we’re potentially looking at one judgment call per system…)

OSI will certainly be the first actor the community will turn to for such judgment calls.

Thanks also for starting this separate thread.

I suspect the majority of folks involved with AI systems will be able to use the OSAID to “self-certify”, e.g. “I meet the standard so I can use the open source label”. Lots of this will be straightforward and uncontroversial, and should involve minimal bottlenecks or external interference.

I agree with Zack that the most likely scenario where someone wants a form of objective “certification” will be where some kind of arbitration is required. Obvious misalignment with the OSAID will also be easy to identify. The real work will be in the nuanced edge cases.

Focussing on the arbitration element, rather than some all-purpose certification process, is in my view worth seriously considering.

The OSD and the development of OSS licenses have benefitted from over 25 years of community practice and discussion. We have a better and more informed understanding of how the OSD works in practice, with good precedents to point to and expanded guidance alongside the OSD that supports the development of new licenses (https://opensource.org/licenses/review-process).

The OSAID is inevitably going to go through a similar maturation process, with the same discussions, precedent setting, and emergent good practice. There’s merit in thinking about how best to support that, as it will ultimately strengthen the OSAID just as it did the OSD.

Having said all of that, I also wonder whether a simple self-certification tool/register would be of use to the community? Something quick which takes about 5 minutes to fill in and checks whether a system aligns with the definition or not (based on stating which licenses apply to which components), with potentially some info about versions and locations of components? Beyond making it really easy to check alignment with the OSAID, it creates a registry of what good practice looks like and promotes transparency.
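
To make this concrete, here is a minimal sketch in Python of what such a self-certification check could look like. Everything in it is hypothetical: the manifest format, the component names, and the (deliberately tiny) set of approved license identifiers are illustrative placeholders I made up, not an official OSI tool or schema.

```python
from dataclasses import dataclass

# Illustrative subset of OSI-approved SPDX identifiers (not exhaustive).
OSI_APPROVED = {"Apache-2.0", "MIT", "BSD-3-Clause", "GPL-3.0-or-later"}

@dataclass
class Component:
    name: str          # e.g. "training code", "model weights", "data card"
    license: str       # SPDX-style identifier stated by the submitter
    location: str      # where the component can be obtained
    version: str = ""  # optional version information

def check_alignment(manifest: list[Component]) -> list[str]:
    """Return a list of problems; an empty list means every stated
    license is on the approved list and every location is given."""
    problems = []
    for c in manifest:
        if c.license not in OSI_APPROVED:
            problems.append(f"{c.name}: license {c.license!r} is not on the approved list")
        if not c.location:
            problems.append(f"{c.name}: no location given")
    return problems

if __name__ == "__main__":
    # Hypothetical submission: the second component's stated license fails the check.
    manifest = [
        Component("training code", "Apache-2.0", "https://example.org/code", "1.0"),
        Component("model weights", "Llama-2-Community", "https://example.org/weights"),
    ]
    for problem in check_alignment(manifest):
        print("NOT ALIGNED:", problem)
```

A real tool would of course need the full list of OSI-approved licenses, component categories matched to the OSAID, and a way to publish submitted manifests as the public registry described above.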

In 1998, Perens and ESR launched a marketing campaign to promote free software to Wall Street, and they began using the term “Open Source”. We believed that was the right move and supported their idea. Currently, I do not feel the same excitement for Open Source AI as I did in 1998, and sometimes I still question its significance.

However, for companies involved in the AI business, like the one I belong to, having the Open Source AI label would be desirable if possible. Moreover, I believe we have a duty to leave a legacy of free and transparent AI for future generations. For these reasons, the act of certifying Open Source AI has its significance.

However, I am not sure how many entities will be able to obtain such certification.

I think the issue of certification may also come up in relation to the EU AI Act. After all, this act puts legal weight on the term “open source” and provides open source systems with certain exemptions — exemptions which are apparently attractive enough to induce companies like Meta and Mistral to go all-in on co-opting the term “open source”.

Now, the EU AI Act stipulates a “template” that specifies some forms of disclosure even for “open source” systems, and an “AI Office” that will draw up this template and presumably oversee its enforcement (though by which processes and with what powers is unclear at this point). As we point out in our FAccT paper:

If this exemption or one like it stays in place, it will have two important effects: (i) attaining open source status becomes highly attractive to any generative AI provider, as it provides a way to escape some of the most onerous requirements of technical documentation and the attendant scientific and legal scrutiny; (ii) an as-yet unspecified template (and the AI Office managing it) will become the focus of intense lobbying efforts from multiple stakeholders (e.g., [12]). Figuring out what constitutes a “sufficiently detailed summary” will literally become a million dollar question.

The ‘million dollar question’ is not even hyperbolic — Meta currently spends 8 million annually in Brussels to lobby on the DSA and EU AI Act. If organisations like OSI and other open source players can play a role in regulation and certification (and especially in ensuring and advocating maximum transparency), it seems this might strengthen the open source ecosystem.

I agree with @Mark that certification of open source AI will be a key issue in the context of the AI Act. I would like to point out that in this context the open source status of the licenses themselves will also be important.

The AI Act uses the term “under a free and open-source license”, which is itself quite confusing (I would expect “free and open”). One can assume that the definition is pretty clear and basically covers OSI-compliant licenses. But I think that one could just as well attempt to argue that responsible AI licensing fits the broad definition that’s included.

Looking at it differently, the issue of responsible AI licensing as a form of open-source licensing remains an open one today. And it needs to be resolved, so that there is clarity on the AI Act’s open source exemptions.

There is a possible simple answer: these licenses are not OSI-compliant, so they are not open source. But I am not sure that it will suffice. That’s because responsible AI licenses are getting significant traction with developers – when you look at HuggingFace data, for example. So if I were a legislator, I could see the sense of exempting them from some of the regulation as well.

But to complicate things even further, there are several licenses – like the Llama or Falcon license – that introduce restrictions but are dubbed “open source”. So the European AI Office, as it aims to clarify open-source licensing, might face pressure to accept a definition that includes these various responsible / restrictive licenses.

The conversation on this forum has focused on issues related to systems and their compliance with the OSAID. I think all this points to the need to reach consensus on the licenses themselves.
