We heard you: let's focus on substantive discussion

juliaferraioli · September 27, 2024, 2:38pm

Chiming in on this specific point: I requested documentation of processes and roles late last year, when discussion was still taking place in a closed mailing list, and was told that transparency into the process was not a priority. When I pointed out that people were applying different definitions to the same terminology, the concern was dismissed as not being material to arriving at consensus.

shujisado · September 27, 2024, 3:39pm

Yes, the previous process was likely neither democratic nor transparent.

While you were having discussions on that closed mailing list, I was writing many comments on the early drafts that were publicly available on HackMD. I remember wondering why no one was commenting on something so important.
Now, I can see a lot more information than I could back then. Lately there have been posts every day, so it’s a bit overwhelming to keep up, but I’m satisfied with the current situation.

We share the common goal of properly defining Open Source AI, so let’s focus on that.

juliaferraioli · September 27, 2024, 4:18pm

It is part of the goal, because we need to understand how we got to this point in order to identify what aspects of the current definition stem from misunderstandings and misalignments. The biases embedded in the early process have carried through to today’s definition.

This is a very meta point, as this is the same problem that we encounter with machine learning.

shujisado · September 28, 2024, 2:18am

I see. I’m starting to want to understand that as well. What are the biases embedded in that early process? Is it the issue of “dataset completeness,” which is at the core of most recent discussions, or are we fundamentally influenced by hallucinations?

I am probably one of the earliest people in Japan to have been involved with Free Software/Open Source, but unfortunately, I do not speak English at all, which makes participating in discussions on this forum very challenging. Since I’m not proficient in English, I translate all the Drafts and other documents (including US and EU laws and precedents) fully into Japanese myself, and then participate in discussions based on that. So, I am curious about where the biases you mention might have been hidden.

To be honest, the early drafts certainly made me want to shout, “I have no idea what OSI is trying to do!”

nick · September 30, 2024, 6:58pm

Hi @anon18632855 and everyone,

After looking at your careful analysis of the Working Groups (WG) voting results, and after discussing this with @stefano and @Mer, we would like to acknowledge that you are in fact right. This is what Mer had to share:

What happened there was we actually removed negative votes from the process after the Llama 2 group, because we thought it was making the voting process more complicated. It’s really no more than that. So the other WGs (Bloom, Pythia, and OpenCV) did not have the option to cast a negative vote.

We agree that one cannot add up the voting numbers from each WG as if they were equivalent, because the groups used different methodologies. So from the Llama WG, all -1 votes should be converted to 0. Sam’s analysis is correct.

We would like to acknowledge and apologize for the mistake. Thank you @anon18632855 for your detailed analysis.

Kind regards,
Nick

anon18632855 · October 1, 2024, 6:28am

Thank you, @nick.

I know I didn’t make it easy for you to post this, and I really appreciate your candor and transparency—it’s reassuring to see we’re all working toward the same goal.

I owe you all an apology too for pushing the issue when I felt it was falling on deaf ears. It’s understandable that it took time to process everything, especially at the 11th hour.

You, @stefano, and I are personally invested in the OSI’s mission, and while I’m sure @Mer is as well after working on this for so long, I understand she’s also providing a service, so this is especially for her.

I’m genuinely intrigued by the co-design process, and with the additional context, the error makes more sense. I always assume good faith unless proven otherwise, and I had a feeling it might have been something like this.

There’s still work to be done. While I know I said I’d post less, I trust you’ll approach my recent replies with the same openness and willingness to reconsider, whether for RC1 or RC2. The MOF cited on many occasions calls for datasets under “any license or unlicensed,” and I hope we can bring you around to the position that this is a better compromise than an unenforceable and dangerous loophole that doesn’t protect the four freedoms.

That’s enough for one day. Thanks for your continued efforts.

Sam