Interesting research paper: “Rethinking open source generative AI: open-washing and the EU AI Act”

hook · June 28, 2024, 12:21pm

I was recently forwarded this LinkedIn post, which I found interesting:

It summarises the paper “Rethinking open source generative AI:
open-washing and the EU AI Act” (which is clearly even more interesting):
https://dl.acm.org/doi/pdf/10.1145/3630106.3659005

Abstract

The past year has seen a steep rise in generative AI systems that
claim to be open. But how open are they really? The question of
what counts as open source in generative AI is poised to take on
particular importance in light of the upcoming EU AI Act that regulates
open source systems differently, creating an urgent need
for practical openness assessment. Here we use an evidence-based
framework that distinguishes 14 dimensions of openness, from
training datasets to scientific and technical documentation and
from licensing to access methods. Surveying over 45 generative AI
systems (both text and text-to-image), we find that while the term
open source is widely used, many models are ‘open weight’ at best
and many providers seek to evade scientific, legal and regulatory
scrutiny by withholding information on training and fine-tuning
data. We argue that openness in generative AI is necessarily composite
(consisting of multiple elements) and gradient (coming in
degrees), and point out the risk of relying on single features like
access or licensing to declare models open or not. Evidence-based
openness assessment can help foster a generative AI landscape in
which models can be effectively regulated, model providers can be
held accountable, scientists can scrutinise generative AI, and end
users can make informed decisions.

It does mention OSI-AI as well:

1.2 The moving target of open source AI

Until recently, classifications of software as open source could simply
rely on the availability of source code under appropriate licensing:
if some software is released under a licence approved by
the Open Source Initiative (OSI), it means that it is fully open and
minimally restrictive [45]. For software that is relatively portable
and user-deployable, this was long sufficient, and it afforded users
the rights to make copies, to tinker, and to make improvements.
However, the rise of large language models and text-to-image generators
means that a different approach is needed [47].

There are various efforts underway to update and tailor the
definition of open source to current generative AI systems. One is
a public consultation dubbed the “Open Source AI Deep Dive” that
the OSI board may draw on in their efforts to update their definition
of open source in the age of generative AI. Another is the “Joint
Statement on AI Safety and Openness” by parties including Creative
Commons, Mozilla, LAION and Open Future. The challenge that
these efforts face is to adopt the notion of open source, which used
to be fairly unambiguous, to the increasingly complex world of
generative AI systems [34].

And the interesting pictures …

zack · July 1, 2024, 12:52pm

Hello @hook, we have already discussed this paper with the authors in a separate thread, starting at “Open Source AI needs to require data to be viable” - #13 by Mark

hook · July 1, 2024, 1:11pm

Whoops, missed that one, I apologise.

nick · July 1, 2024, 3:21pm

No worries, @hook. Thank you for sharing the images.