We heard you: let's focus on substantive discussion

Dear Members,

We value your contributions to our community and want to ensure that our discussions remain productive and respectful. We’ve noticed a recent increase in repetitive posts that might be limiting the depth and variety of our conversations.

To encourage more diverse and insightful discussions, we’re compiling a list of concerns related to the Open Source AI Definition. This list is intended to be a neutral space where everyone can share their thoughts without judgment.

Here are some concerns that have been raised so far:

  • Data transparency: The data used to train an AI system should be openly available, as it’s essential for understanding and improving the model.
  • Synthetic data: If releasing the original data is not feasible, providing synthetic data and a clear explanation can be helpful.
  • Pretraining dataset distribution: The dataset used for pre-training should also be accessible to ensure transparency and allow for further development.
  • Dataset documentation: The documentation for training datasets should be thorough and accurate to address potential issues.
  • Versioning: To maintain consistency and reproducibility, versioned data is crucial for training AI systems.
  • Open licensing: Data used to train Open Source AI systems should be licensed under an open license.
  • Reproducibility: an Open Source AI must be reproducible using the original training data, scripts, logs and everything else used by the original developer.
  • Process: The co-design process as conducted was not democratic and ultimately unfair: voting was the wrong method, the selection of the volunteers was biased, the results didn’t show any consensus, and there were many other issues.

Do you have any other concerns to add? Please share them in a concise reply below, and we’ll update the list. Remember to be respectful of others’ viewpoints and avoid repeating arguments that have already been discussed.

Make your statement and leave space for others to dissent with you, as per our community guidelines. Let’s work together to create a more inclusive and informative community.

Thank you for your participation.

Sincerely,
OSI Team

5 Likes

Thank you, @nick, for your efforts to foster and focus discussion, and to @quaid in particular for documenting a detailed proposal to address the “data openness” issue—I don’t like to propose problems without solutions, so this is very helpful (even if I maintain that “D+” should be the default and ideally only option).

I do note that all 7 of your points would be partly or fully addressed through the provision of training data. I also note from the component voting data relied upon to make recommendations—and regularly touted in the “interests of transparency and auditability”—that 16 different invited experts voted 27 times that training datasets specifically are required to protect all four core freedoms, which cannot and must not be dismissed using questionable statistics.

Aside from the elephant still in the room, I believe there is a more fundamental issue we need to address with finding consensus and the new “co-design” process we’re testing. There’s a dangerous tendency to rush forward with the definition despite valid objections—most critically, the lack of training dataset requirements—being dismissed without adequate resolution. This threatens the very foundation of the four freedoms we set out to protect: to Use, Study, Modify, and Share. Without access to training data, one simply cannot meaningfully study or modify a model (as they can given source code), and savvy users will hesitate to use, let alone share, a model if they don’t know what went into it. This is no different from refusing to eat suspect food without seeing what went into it, or from a chef being given an impossible recipe calling for mermaid tears (e.g., YouTube transcripts)!

While the OSI may lack the formal appeal process offered by the IETF, the IETF’s guiding principles on “rough consensus” still hold relevance. Specifically, the idea that:

“Simply having a large majority of people agreeing to dismiss an objection is not enough to claim there is rough consensus; the group must have honestly considered the objection and evaluated that other issues weighed sufficiently against it. Failure to do that reasoning and evaluating means that there is no true consensus.” (source)

Additionally, the principle that “lack of disagreement is more important than agreement” is especially relevant here. Sustained objections—critically the failure to include training datasets—are still unresolved. Yet the process marches on toward @Mer’s release candidate announcement at Nerdearla this Thursday, with endorsers lined up, and a board vote and public announcement next month. This doesn’t resemble a community-driven consensus but more of a train speeding toward a predetermined outcome.

It’s also concerning to hear calls for “compromise,” with some even prematurely publicly claiming “We finally have a definition for open-source AI” based on compromises apparently already accepted ahead of the board vote. Compromise, when applied to balancing technical trade-offs like speed versus power consumption, is useful. But when it becomes about compromising between people and their concerns, rather than addressing the core issues, it becomes harmful. As the IETF outlines, there’s a stark difference between compromise and capitulation of community leaders:

“A minority of a group might object to a particular proposal, and even after discussion still think the proposal is deeply problematic, but decide that they don’t have the energy to argue against it and say, ‘Forget it, do what you want.’ That surely can be called a compromise, but […] really all that they’ve done is conceded; they’ve simply given up by trying to appease the others. That’s not coming to consensus; there still exists an outstanding unaddressed objection.” (source)

More importantly, true consensus isn’t just a matter of people giving up objections due to fatigue. As stated in the IETF’s rough consensus draft:

“Coming to consensus is when everyone comes to the conclusion that either the objections are valid (and therefore making a change to address the objection) or that the objection was not really a matter of importance, but merely a matter of taste.” (source)

What we’re hearing is not that the objections are invalid, nor that they are a superficial “matter of taste”; rather, what we’re seeing feels more like capitulation than genuine compromise, like the pejoratively named Lesser GPL (LGPL). The objections—specifically regarding the protection of the core freedoms, especially with respect to training data—remain valid and unaddressed. Despite this, the process pushes forward in the name of expediency rather than a commitment to resolving the deep concerns that still exist. We’re at risk of mistaking the lack of disagreement for agreement, and of rushing toward an announcement because we’ve held so many meetings, town halls, and discussions that it feels like we can’t afford to delay any longer.

But the reality is that pushing this through without addressing the fundamental flaws, particularly the exclusion of training data requirements, is a misstep. Rather than releasing it when it’s ready, we seem to be succumbing to the sunk cost fallacy: having invested so much time and energy, we’re afraid to pause for fear of appearing inefficient or indecisive.

I urge the OSI to pause any announcements and reconsider whether this is truly consensus, or merely a majority—assuming it’s not a minority given the quantity of opposing voices!—drowning out dissent for the sake of expediency. The stakes are too high to allow such a fundamental flaw in the definition to go unaddressed.

If I had more time I would have written a shorter letter, but in terms of concise concerns, I would ask that consensus and the validity of the “co-design” process be added to the list.

3 Likes

Hi @samj,

I’m not sure how you arrived at these numbers:

16 different invited experts voted 27 times that training datasets specifically are required to protect all four core freedoms

For Training datasets specifically:

  • 13 voted as required to study
  • 1 voted as required to use
  • 6 voted as required to modify
  • 0 voted as required to share

The sum of votes is 20, which is classified as “Maybe” (μ to <1.5μ votes).

This is in contrast to Inference code, Model parameters, and Training, validation and testing code, for example, whose sums are roughly double.

cannot and must not be dismissed using questionable statistics.

@Mer’s co-design process and statistics are pretty solid:

[image]

I think it’s easy to dismiss or question her work because you disagree with the results. I ask you to please be respectful of her work and everyone who was involved in the co-design process.

I urge the OSI to pause any announcements and reconsider whether this is truly consensus, or merely a majority—assuming it’s not a minority given the quantity of opposing voices!

No doubt the opposing voices are louder, hence this topic asking for respect.

I hand-counted the votes in the spreadsheet immediately to the right of your screenshot, and double-checked by tallying the results from the working group reports for Llama 2, Pythia, OpenCV, and Bloom, confirming that the results relied upon to make the recommendation to exclude training data were erroneous:

Model     Use  Study  Modify  Share  Total
Llama 2     0      2       1      0      3
Pythia      0      4       0      0      4
OpenCV      4      4       2      0     10
Bloom       2      5       3      1     11
Total       6     15       6      1     28

Furthermore, if you consolidate training and testing data (as is often the case), you’d pick up another 4 votes (DT, JL, JT, and RA), taking you into hard “Yes = Required (≥2μ votes)” territory.
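
For anyone wanting to check the arithmetic, here is a minimal re-tally sketch using the hand-counted per-group figures from the table above (the actual scripts in the samj/osaid repo may differ in detail):

# Hand-counted "training data required" votes per working group and freedom,
# as tabulated above; a minimal re-tally sketch, not the original scripts.
votes = {
    "Llama 2": {"Use": 0, "Study": 2, "Modify": 1, "Share": 0},
    "Pythia":  {"Use": 0, "Study": 4, "Modify": 0, "Share": 0},
    "OpenCV":  {"Use": 4, "Study": 4, "Modify": 2, "Share": 0},
    "Bloom":   {"Use": 2, "Study": 5, "Modify": 3, "Share": 1},
}

totals = {f: sum(g[f] for g in votes.values()) for f in ("Use", "Study", "Modify", "Share")}
print(totals)                     # {'Use': 6, 'Study': 15, 'Modify': 6, 'Share': 1}
print(sum(totals.values()))       # 28
print(sum(totals.values()) + 4)   # 32 if training and testing data are consolidated (DT, JL, JT, RA)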

For good measure I then combined the vote totals from the final reports to examine the votes by component and freedom:

I also applied the “pretty solid” statistics to see what that methodology would actually recommend (whether or not I agree with its validity — you can’t vote butter out of a pound cake and have it still function as a pound cake!):

Finally, I went back to the original raw voting data, consolidated incomplete rows, flattened the votes, and analysed the result to create this heatmap:
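
The heatmap image doesn’t render in this thread; as a rough, hypothetical sketch of the consolidate-flatten-plot step described above (the file name and column names are assumptions, not the actual layout of the spreadsheet):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format export of the raw votes: one row per (voter, component, freedom),
# with "vote" set to 1 for a "required" vote. The real spreadsheet layout differs.
flat = pd.read_csv("votes_flat.csv")

# Pivot to a component-by-freedom matrix of vote counts to drive the heatmap.
matrix = flat.pivot_table(index="component", columns="freedom",
                          values="vote", aggfunc="sum", fill_value=0)

plt.imshow(matrix, cmap="RdYlGn", aspect="auto")
plt.xticks(range(len(matrix.columns)), matrix.columns, rotation=45, ha="right")
plt.yticks(range(len(matrix.index)), matrix.index)
plt.colorbar(label="votes")
plt.tight_layout()
plt.show()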

As for calls for civility, here, as at Wikipedia, I’m careful to comment on content, not the contributor, and I have plenty of respect for those doing the work led by @Mer. As a design-thinking advocate I’m intrigued by the co-design process too.

I disagree with the results in part because they throw me and my Open Source AI project under the bus of the AI behemoths (who got a vote while we didn’t, which is like the fox guarding the henhouse), but also because voting is not an appropriate way to reach consensus on technical topics, so the technical question of what is actually required to practically protect the four freedoms remains open.

That said, the vote tallies from the final reports do actually support the inclusion of training data, so I’ll respect those results provided you do too — given that training, validation, and testing code scores lower but is required by the checklist, both training data sets and testing data sets should be as well. I look forward to seeing them added to the 0.0.10 draft.

2 Likes

Hi @samj, thank you for diving deep into the data. I like your charts.

I believe @mer used the following methodology to count the votes: +1 (required), 0 (neutral), -1 (not necessary). Did you include the “not necessary” votes?
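
If that is indeed the scheme, the dispute may come down to whether the “not necessary” votes are subtracted. A tiny illustration with placeholder numbers (not the actual tallies):

# Two readings of the same ballots for one component/freedom pair (placeholder numbers only).
required      = 15   # +1 votes: "required"
neutral       = 4    #  0 votes: neutral
not_necessary = 9    # -1 votes: "not necessary"

print(required)                  # 15: a raw count of "required" votes
print(required - not_necessary)  #  6: the +1/0/-1 net score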

This does highlight one of the major drawbacks of sharing the original data: lack of privacy. One can guess who voted for what based on their initials.

The initials should have been randomized to ensure anonymization…

One concern, mostly ignored by all (including me) except the privacy-minded people: the license used on users’ data.

That includes not only the user profile voluntarily given, but also all metadata, the data used in the queries or prompts, and the queries themselves.

Is it removed after the session? Or kept at all times in the user’s account?
Is it used to retrain the model? Is it in any way available to third parties? And to internal teams?

These and other cases offer a very troubling view of the entire query lifecycle…

All the best,
Gerardo Lisboa

To be honest, as someone with decent training in statistics and operational research, I find the whole method a bit weird.

You shouldn’t need to ask experts about predicates that logically derive from the declared goal of granting the four freedoms.

I’d really like to read the reasoning of those who did “vote” for training data availability as “not required” to study and modify, because, for example, I really can’t imagine an effective way to study the behavior of any AI system (not only an ANN-based one) without the training data. Even just identifying over-fitting around certain clusters would be impossible without the actual data.

And this is another issue with the method: who selected the experts? According to which criteria? Who decided the criteria?

For example, I wonder why Llama experts were included, since Llama is not open source and Mark Zuckerberg publicly tries to open-wash it anyway.

Well, I’d argue that an AI system trained on people’s data (please, do not reduce them to “users”) that cannot be distributed and is not available to the public cannot match any Open Source AI definition.

Thank you… you’ll be pleased to know I have more, then.

The call for a vote and the setting of arbitrary thresholds aside, how and why was this methodology decided on? Where was it discussed and/or documented, whether before or after the fact? That key information doesn’t appear anywhere in the public posts or the final reports, and was only ever captured in the spreadsheet for 1 of the 4 working groups (Llama 2). As such, it must obviously be excluded, and no, it wasn’t counted by my scripts.

That one decision would effectively give only the Llama 2 working group the superpower not only of casting their own votes, but also of silently erasing the votes of others, even from other working groups! It would explain why the Data category is a sea of red! It’s like me sitting in the next room saying I absolutely need the data to do X, and you saying “yeah, nah, no you don’t”, or voting for Trump yourself while also quietly tearing up my vote for Harris — that’s not how voting works!

I wonder if the working group members even understood that’s what they were doing? Does @stefano realise he erased more votes than he cast? Does @zack realise he erased more votes than all the others put together (49 vs 43), and double what he cast himself (25), making him by far the strongest opponent to openness (-24)? It seems strange to me that the Open Source Initiative’s own members would knowingly cast votes against openness, unless it was so unclear even to them in the room that they didn’t know that’s what they were being asked to do?

No need to guess as we can see exactly who voted for (and against) what.

That’s how I also know that of the two Meta employees(!) in the Llama 2 working group with superpowers(!!), their lawyer voted against requiring data every time(!!!), erasing more votes than they cast too. This one’s less surprising given the discussion next door about Meta concealing its data sources (which is fine — just don’t call it Open Source!), but it raises even more questions about the validity of the vote.
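
To make the “erasing” effect concrete under a +1/0/-1 scoring, a worked example using the figures quoted above for a single participant (25 votes cast as required, 49 marked “not required”):

# Under +1/0/-1 scoring, each "not required" mark cancels out someone else's "required" vote
# in the pooled total. Figures for one participant, as quoted above.
cast_required   = 25   # +1 votes this participant cast
marked_negative = 49   # -1 "not required" marks, each offsetting another voter's +1

print(cast_required - marked_negative)   # -24: a net push against requiring those components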

Assuming enough has already been said publicly about the vote that we’re stuck with it—though I’m confident that vote-erasing superpowers are a bridge too far for everyone here, or else perhaps I’m in the wrong room—I used k-means clustering for n in {2, 3, 4, 5} to at least find somewhat more statistically justifiable/less arbitrary cut-offs. Given the binary nature of the decision and the large cliff, n=2 likely makes the most sense, and as a practitioner it aligns with common sense. I’ve included the raw data below, and the scripts/data/figures/etc. are in the samj/osaid repo.
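
For reference, a minimal sketch of the n=2 case using the combined component totals listed in the output below (the full scripts, data and figures are in the samj/osaid repo and may differ in detail):

import numpy as np
from sklearn.cluster import KMeans

# Combined vote totals per component, in the order listed in the n=2 results below.
totals = np.array([49, 43, 42, 37, 35, 30, 28, 27, 25, 23, 22, 22,
                   19, 17, 17, 15, 13, 10, 8, 8, 7, 7, 6, 6, 5, 4, 4], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(totals.reshape(-1, 1))
for label in range(2):
    members = totals[km.labels_ == label]
    print(f"cluster {label}: center={km.cluster_centers_[label, 0]:.2f}, size={len(members)}")
# Expected: centers of roughly 31.92 and 9.73, matching the clustering output below.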

That or we’ve designed by committee a car with no wheels. What next?

[cluster plots for n = 2, 3, 4, 5]

Clustering results for n=2:
Cluster centers: [31.91666667  9.73333333]
Ordered names: ['Yes', 'No']
Ordered colors: ['#a4ea78', '#d1a09b']

Components by cluster for n=2:
Cluster Yes (Mean: 31.92):
  - Other libraries or code artifacts (Mean: 49.00)
  - Inference code (Mean: 43.00)
  - Model parameters (Mean: 42.00)
  - Model architecture (Mean: 37.00)
  - Data preprocessing code (Mean: 35.00)
  - Supporting tools (Mean: 30.00)
  - Training data sets (Mean: 28.00)
  - Testing data sets (Mean: 27.00)
  - Training, validation and testing code (Mean: 25.00)
  - Usage documentation (Mean: 23.00)
  - Benchmarking data sets (Mean: 22.00)
  - Research paper (Mean: 22.00)

Cluster No (Mean: 9.73):
  - Validation data sets (Mean: 19.00)
  - All other data documentation (Mean: 17.00)
  - Evaluation code (Mean: 17.00)
  - Model card (Mean: 15.00)
  - Training code (Mean: 13.00)
  - Code used to perform inference for benchmark tests (Mean: 10.00)
  - Sample model outputs (Mean: 8.00)
  - Technical report (Mean: 8.00)
  - Data card (Mean: 7.00)
  - Evaluation data (Mean: 7.00)
  - Evaluation results (Mean: 6.00)
  - Model metadata (Mean: 6.00)
  - Evaluation metrics and results (Mean: 5.00)
  - Test code (Mean: 4.00)
  - Validation code (Mean: 4.00)


Clustering results for n=3:
Cluster centers: [23.    7.75 41.2 ]
Ordered names: ['Yes', 'Maybe', 'No']
Ordered colors: ['#a4ea78', '#f9f3d1', '#d1a09b']

Components by cluster for n=3:
Cluster Yes (Mean: 41.20):
  - Other libraries or code artifacts (Mean: 49.00)
  - Inference code (Mean: 43.00)
  - Model parameters (Mean: 42.00)
  - Model architecture (Mean: 37.00)
  - Data preprocessing code (Mean: 35.00)

Cluster Maybe (Mean: 23.00):
  - Supporting tools (Mean: 30.00)
  - Training data sets (Mean: 28.00)
  - Testing data sets (Mean: 27.00)
  - Training, validation and testing code (Mean: 25.00)
  - Usage documentation (Mean: 23.00)
  - Benchmarking data sets (Mean: 22.00)
  - Research paper (Mean: 22.00)
  - Validation data sets (Mean: 19.00)
  - All other data documentation (Mean: 17.00)
  - Evaluation code (Mean: 17.00)

Cluster No (Mean: 7.75):
  - Model card (Mean: 15.00)
  - Training code (Mean: 13.00)
  - Code used to perform inference for benchmark tests (Mean: 10.00)
  - Sample model outputs (Mean: 8.00)
  - Technical report (Mean: 8.00)
  - Data card (Mean: 7.00)
  - Evaluation data (Mean: 7.00)
  - Evaluation results (Mean: 6.00)
  - Model metadata (Mean: 6.00)
  - Evaluation metrics and results (Mean: 5.00)
  - Test code (Mean: 4.00)
  - Validation code (Mean: 4.00)


Clustering results for n=4:
Cluster centers: [20.          7.09090909 44.66666667 31.4       ]
Ordered names: ['Yes', 'Lean Yes', 'Lean No', 'No']
Ordered colors: ['#a4ea78', '#d6fbc4', '#e8cea3', '#d1a09b']

Components by cluster for n=4:
Cluster Yes (Mean: 44.67):
  - Other libraries or code artifacts (Mean: 49.00)
  - Inference code (Mean: 43.00)
  - Model parameters (Mean: 42.00)

Cluster Lean Yes (Mean: 31.40):
  - Model architecture (Mean: 37.00)
  - Data preprocessing code (Mean: 35.00)
  - Supporting tools (Mean: 30.00)
  - Training data sets (Mean: 28.00)
  - Testing data sets (Mean: 27.00)

Cluster Lean No (Mean: 20.00):
  - Training, validation and testing code (Mean: 25.00)
  - Usage documentation (Mean: 23.00)
  - Benchmarking data sets (Mean: 22.00)
  - Research paper (Mean: 22.00)
  - Validation data sets (Mean: 19.00)
  - All other data documentation (Mean: 17.00)
  - Evaluation code (Mean: 17.00)
  - Model card (Mean: 15.00)

Cluster No (Mean: 7.09):
  - Training code (Mean: 13.00)
  - Code used to perform inference for benchmark tests (Mean: 10.00)
  - Sample model outputs (Mean: 8.00)
  - Technical report (Mean: 8.00)
  - Data card (Mean: 7.00)
  - Evaluation data (Mean: 7.00)
  - Evaluation results (Mean: 6.00)
  - Model metadata (Mean: 6.00)
  - Evaluation metrics and results (Mean: 5.00)
  - Test code (Mean: 4.00)
  - Validation code (Mean: 4.00)


Clustering results for n=5:
Cluster centers: [22.2         6.5        44.66666667 31.4        15.5       ]
Ordered names: ['Yes', 'Lean Yes', 'Maybe', 'Lean No', 'No']
Ordered colors: ['#a4ea78', '#d6fbc4', '#f9f3d1', '#e8cea3', '#d1a09b']

Components by cluster for n=5:
Cluster Yes (Mean: 44.67):
  - Other libraries or code artifacts (Mean: 49.00)
  - Inference code (Mean: 43.00)
  - Model parameters (Mean: 42.00)

Cluster Lean Yes (Mean: 31.40):
  - Model architecture (Mean: 37.00)
  - Data preprocessing code (Mean: 35.00)
  - Supporting tools (Mean: 30.00)
  - Training data sets (Mean: 28.00)
  - Testing data sets (Mean: 27.00)

Cluster Maybe (Mean: 22.20):
  - Training, validation and testing code (Mean: 25.00)
  - Usage documentation (Mean: 23.00)
  - Benchmarking data sets (Mean: 22.00)
  - Research paper (Mean: 22.00)
  - Validation data sets (Mean: 19.00)

Cluster Lean No (Mean: 15.50):
  - All other data documentation (Mean: 17.00)
  - Evaluation code (Mean: 17.00)
  - Model card (Mean: 15.00)
  - Training code (Mean: 13.00)

Cluster No (Mean: 6.50):
  - Code used to perform inference for benchmark tests (Mean: 10.00)
  - Sample model outputs (Mean: 8.00)
  - Technical report (Mean: 8.00)
  - Data card (Mean: 7.00)
  - Evaluation data (Mean: 7.00)
  - Evaluation results (Mean: 6.00)
  - Model metadata (Mean: 6.00)
  - Evaluation metrics and results (Mean: 5.00)
  - Test code (Mean: 4.00)
  - Validation code (Mean: 4.00)
3 Likes

Over the past few months, we have been discussing on the basis of the conclusions of that working group. Personally, I remember being surprised by the result, particularly regarding the handling of datasets, but I thought we should respect the outcome since it was a conclusion reached by experts through discussion after the vote.

However, to be frank, this quoted part raises doubts about the credibility of OSAID.

I recall discussions about the difficulty of evaluating each model without involving related parties from the model development companies, but did that lead to having stakeholders involved? I’ve only participated in public forums and on HACKMD, so I don’t know for sure, but at the very least, shouldn’t we re-examine the voting results?

6 Likes

Wait, what? I have consistently voted stating that availability of training datasets was a requirement for exercising both the freedom of study and the freedom of modify. (But I’ve been outvoted.)

So I’m not sure where you obtained the above conclusion from. (I apologize, but I haven’t found time yet to digest all the details you have provided. I am participating as a volunteer in this process, with limited time availability.)

I’ve also raised earlier on this forum (I don’t have the link at hand, sorry) the concern that the cast votes could not be interpreted as consensus in favor of making training datasets optional, but rather that they denoted a 50/50 split on the data matter.

But, honestly, I think discussing the voting details is beside the point, and that’s why I have stopped arguing about them. Not only because voting is not a good way to decide on complex technical matters, but because I think OSI’s decision not to mandate access to training data is a political decision, one that they are entirely entitled to make. It’s even a pragmatic one, in the tradition of the organization, for better or for worse.

(In fact, I even think that OSAID 0.9 is potentially a good definition, that could improve the state of model freedom in the industry. The main problem it has is of naming/branding, as I consider that something as broad as “open source AI” should be more “radical”, and require training data availability. I am still hopeful that we can obtain a multi-tier definition, either via the D-/D+ classification, or via some even stricter split between “open weight” and “open source” labels. I planned to write more broadly about this later on, but it will not be here.)

1 Like

Thanks for crunching the numbers. Is the bullet point below a fair summary of the core of your concern?

  • Process: The co-design process as conducted was not democratic and ultimately unfair.

Hi @nick, here are a few concerns that should be added to the list:

  • Inherent user (in)security: without access to the whole training data, it’s possible to plant undetectable backdoors in machine learning models.
  • Implicit or Unspecified formal requirements: if ambiguities in the OSAID will be resolved for each candidate AI system through a formal certificate issued by the OSI, such a formal requirement should be explicitly stated in the OSAID.
  • OSI as a single point of failure: since each new version of each candidate Open Source AI system worldwide would have to undergo the certification process again, this would turn the OSI into a vulnerable bottleneck in AI development and a target of unprecedented lobbying from the industry.
  • Open Washing AI: any definition that a black box could pass would both damage the credibility of the whole open source ecosystem and open a huge loophole in European regulation (the AI Act).

To keep the concern definitions as concise as possible (as @nick requested), I add here the underlying arguments and references to other relevant threads.

Inherent user (in)security

Since Heartbleed we have known that open source software can become a vehicle for overlooked vulnerabilities, and XZ Utils reminded us that OSS-based supply chain attacks are common and mostly undetected.

However, the freedom to study the source code lets us identify them, learn how they were introduced, and effectively fix them by studying how the executable matches the declared source.

Cryptographers have already proved that you can plant undetectable backdoors in ML models, but it’s much easier to plant undetectable bias against certain marginalized groups.

It’s up to us to provide a definition that leads to a secure and safe environment for users of Open Source AIs.

Note that this security concern is related to some of the other concerns (reproducibility, versioning and data transparency) in that it admits the same simple solution (mandatory availability of training data), but it does not overlap in its consequences, such as large-scale automated discrimination, undetectable mass surveillance, large-scale espionage and so on.

Implicit or Unspecified formal requirements

Given the history and the license review process, I’m a bit surprised to read that the OSI is planning to become a sort of AI System Certification Authority through the Open Source AI Definition.

So much so that I’m afraid I may have misunderstood @stefano’s framing of the matter.

In another thread @shujisado argued about the ambiguities of the OSD and OSAID.

While I’d argue that we could leverage OSS history to remove all the ambiguities of the definition (or at least to confine them to new and unpredictable corner cases), I see the simple appeal of a centralized “benevolent dictator” resolving ambiguities on a case-by-case (and version-by-version) basis.

However, such a centralized authority would not be analogous to a justice system, which is inherently decentralized: several independent judges rely on the law and their own experience and culture to independently evaluate each case.

So if Open Source AI is whatever the OSI certifies as Open Source AI, such a formal requirement should be explicit in the Open Source AI Definition, e.g. in a new final section like this:

OSI Certification

OSI will be responsible for certifying the compliance of each candidate AI system with the definition above.

  • For example, when a new version of an AI system is released with different weights, a skilled person at OSI will recreate a substantially equivalent system using the same or similar data, to verify that the Data Information requirement still holds.

OSI as a single point of failure

As @jberkus pointed out, software certification is not a task that can be easily and effectively fulfilled by volunteers.

Even just the amount of documentation and bureaucracy needed to prove that the process was properly executed would be overwhelming for volunteers donating their free time.

Also, a legal evaluation of the licensing of the various components would not be enough to verify “that a skilled person can recreate a substantially equivalent system using the same or similar data”; you’d need at least one skilled person, equivalent datacenters, and the energy to effectively check that the Data Information requirement is satisfied by recreating the system from scratch and verifying that it is “substantially equivalent”.

@jberkus concludes

But I wonder if such a setup would turn the OSI into a huge bottleneck and single point of failure for the ecosystem. AI systems would compete for OSI resources, with larger models requiring more verification labor and larger datacenters, and smaller models waiting for their applications to be taken into account after the larger (and likely more influential) ones.

This would also turn the OSI into a center of pressure from the most powerful lobbying groups around the world, as @Mark pointed out, with all that follows.

Open Washing AI

As @Mark pointed out

Zuckerberg’s recent blog post confirms that he’s not going to wait for an OSI certificate to pretend that Llama 3.1 is “Open Source AI”, just “to escape some of the most onerous requirements of technical documentation and the attendant scientific and legal scrutiny”, as predicted by the FAccT paper.

Now, it would be very easy for Meta to tweak the license and provide a few synthetic datasets that do not show any bias or surveillance backdoor to get an OSI stamp under the current draft (0.0.9) of the Open Source AI Definition.

While some might call such an outcome an OSI success (Zuckerberg surely would :wink:), having OSI-approved black boxes that formally match the Open Source AI Definition but that nobody could really study or modify (fine-tuning is basically tweaking config) would damage every attempt to create a truly transparent AI system, by exposing it to unfair competition from an open-washed alternative.

Also, the loophole in the AI Act would be huge: the AI Act exempted free and open source systems from detailed technical documentation and “scientific and legal scrutiny” because they are expected to be fully transparent.

But the adoption of an Open Source AI Definition that can be applied to a black box distributed with a facade dataset would distort the application of the AI Act.

It would be as if the OSI injected a vulnerability into the AI Act and sold it to Meta, OpenAI, Google and their friends…

Primum non nocere

“First, do no harm.”

If we cannot provide an Open Source AI definition that excludes black boxes, it’s better to avoid any official definition at all, so that users (and judges in court) won’t be fooled by it.

5 Likes

Are they, given the legal weight that such a decision assumes erga omnes?

Also, if it’s a political decision of the Open Source Initiative’s board, where can we read their underlying arguments?

I mean, why do they think that AI systems that do not grant the freedom to study should be exempted from the legal and scientific scrutiny that, for example, the AI Act requires?

How do they think that such a legal and scientific exemption would “improve the state of model freedom in the industry”?

—though I’m confident that vote erasing superpowers are a bridge too far for everyone here, or else perhaps I’m in the wrong room—

Not in the wrong room as far as I’m concerned — that kind of superpower is definitely a bridge too far and clearly has unexpected side effects (as zack’s surprised response also seems to indicate). The counting of “not required” as a negative vote really has an outsized effect here that puts the whole edifice in danger. I am frankly shocked to see this analysis.

As an outside academic observer without a direct stake in the matter, I have to say the direction things have taken here doesn’t inspire confidence in the process or its outcomes.

1 Like

If @samj’s analysis is correct, I think a better description of the concern would be:

  • Process: The co-design process as conducted was unclear to participants and its vulnerabilities have been exploited by Meta.

It would be a masterpiece of corporate capture, given how a watered-down definition of Open Source AI would subvert the AI Act.

The current OSAID defines an Open Source AI system as one where the four freedoms (Use, Study, Modify, Share) are recognized, and a skilled person would be able to recreate a substantially equivalent system. If Meta only provides incomplete information, then at least many of the collaborators with OSI would likely determine that it does not comply with OSAID. Also, if OSAID is incomplete, we can simply make revisions. The OSD was released as OSD 1.0 in February 1998, but it took several years until version 1.9 was released, and only then did it reach a stable state.

Hmm… we seem to be having the same exchange again. The discussion on datasets, myself included, seems to be getting ahead of itself.

However, I believe that Samj-san’s point regarding the voting in the working group should be revisited separately. This is something that affects the credibility of OSAID.

1 Like

While this is true insofar as the “co-design” process—which purported to be democratic and has been parroted as such—has proven to be far from it, voting is neither appropriate nor adequate in this context. I’ve never advocated for a democratic process, but rather consensus decision-making, and ideally rough consensus. The strong selection bias of voters has already been raised, but “in some ways, we can’t vote” as the OSI’s door is open to all interested parties and “it’s nearly impossible to figure out who would get a vote for any given question”.

More problematic though is that the outcome (draft 0.0.9)—which does not reflect the intentions of participants in the working groups, let alone the wider community—is a product that cannot function for its intended purpose; it’s a two-legged stool… a car with no wheels. Even if it were functional, valid questions have been raised about whether it is practically enforceable in its current form.

As such, 0.0.9 must not be graduated to release candidate status in its current state, especially now that both process and product are deficient, and I hope we’ll get confirmation from you today that this won’t happen in our slot at Nerdearla tomorrow.

Not being one to propose problems without solutions: per @Mer’s own methodology, the cut-off was determined visually (“there’s a pretty big drop-off […] so this felt like a reasonable place to draw the line”). Ignoring the most egregious violation of democratic norms—selective superpowers of vote nullification—we see a similar step in a k-means cluster analysis for n=2 (among others).

As a practitioner, I see this as a workable RC/1.0 starting point which could be refined over time like the OSD, now at v1.9. You would get your on-time launch, the loudest dissenting voices would be silenced (to @nick’s point above), the OSI would avoid losing trust in the wider community, and those hoping for us to take a less ambitious approach could advocate for future revisions to be more permissive (per @spotaws’ recent public appeal to the board):

That’s certainly one conclusion you could draw, but I’m just shining light on the situation with statistics, and you’re welcome to check my work.

While Meta’s negating votes in the “Data” category (highlighted below) demonstrate an obvious pattern, it was actually @zack (“SZ”)—who has “consistently voted stating that availability of training datasets was a requirement for exercising both the freedom of study and the freedom of modify”—who was shocked to have inadvertently done the most damage to his cause. Fortunately, he didn’t realise he could have 4x’d his negative votes too!

[image: data-negation]

If my votes for making training data required for exercising the freedom of study and modify made things worse for those outcomes than me not voting or voting something else (I’m taking your word for it, as I didn’t have time to check how you came to that conclusion), then I agree the chosen voting system was nonsense.

I still think the voting discussion is mostly a distraction.

But I’d appreciate it if OSI would not claim that the resulting decision on this important point is based on working group consensus, because I don’t think there ever was one. I had already argued this point in the past. Your voting analysis, if correct, would just strengthen my preexisting belief on the matter.

4 Likes

I mentioned this in the town hall and I’d like to repeat it here: the co-design process was not supposed to be democratic. It’s designed to be inclusive and representative of a wide variety of interests (global and multistakeholder).

As @zack noticed already in March and repeated in his comments here, the results of the Analysis phase do not represent a consensus in either scenario.

I said this in the town hall and am adding it here for the record: the results of the Analysis phase, namely the split position on the hard requirement for the “Training dataset” component, simply gave us a working hypothesis for solving what Redmonk called the AI conundrums. These are the tensions among large dataset maintainers like CommonCrawl and LAION, AI developers, and the global legal/commercial systems.

The hypothesis was:

Which AI systems would be Open Source AI if we require all the details about the data and we require training code and data preprocessing code? Will we get bad results casting this net?

The purpose of the test was to discover a way to give Open Source builders the potential to build powerful AI systems in all possible domains, including medical. In other words, we’re aiming to level the playing field and put Open Source AI on the same grounds as the large corporations who already have all the data they want, and the means to get more, something the Open Source/Free Software movement has always tried to achieve.

The hypothesis was confirmed in the Validation phase (although the sample was small), and it revealed an interesting pattern: the builders who release training code and data processing code are also the ones who release their datasets.

So one can ask: why don’t you mandate the release of the datasets, then? That’s the intention of the draft definition! Except that the Definition needs to take into account the fact that there are 4 different kinds of data (more in an updated FAQ). The evolution of the draft from 0.0.8 to 0.0.9 and what’s coming next all go in this direction: clarifying that the datasets, when legally possible, should be made available.

So to summarize the concerns raised:

Process: The co-design process as conducted was not democratic and ultimately unfair: voting was the wrong method, the selection of the volunteers was biased, the results didn’t show any consensus, and there were many other issues.

Is that fair?