Open Weights or Open Source AI?

@samj-san,
I thought this topic should be in a separate thread, so I’m glad you created one.

I believe the draft for RC1 might not be ready yet. The D+/D- discussions still need to be reflected in the version that follows 0.0.9. That alone seems like a challenging task, and considering the discussions from the past few days, releasing RC1 still seems premature. Additionally, it took three months to get from 0.0.8 to 0.0.9, which was already several months behind the original schedule. Given the delays, I personally don't see any particular reason to insist on All Things Open as the goal for the 1.0 release. (As a non-native English speaker, I face the challenge of sleep deprivation from keeping up with everyone's discussions.)

As for your point about the voting in the working group, I feel it is necessary to verify whether it was fair. If your concerns are valid, even if we manage to create the correct OSAID, more people will lose trust in OSI, which is not a good outcome.

However, I don’t think there is any need to halt the current development process. As Zack-san expressed in another topic, I also believe we can improve the current 0.0.9 to a point where most people will be satisfied.

Currently, Japan’s National Institute of Informatics (NII) is continuing development with the goal of making GPT-3-class large language models reproducible, even by ordinary companies, while ensuring that all necessary data, code, and developed models are published under Open Source-compliant licenses. Several models ranging from 1.3B to 172B parameters have already been released, and all the datasets used are publicly available on the following site. The licenses are CC BY and ODC-BY.
https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3

If CC BY and ODC-BY are considered OSD-compliant, then all components developed by this organization are open source. Thus, all AI systems developed by this organization meet the conditions required by OSAID 0.0.9. In other words, under the current OSAID, these systems would be considered Open Source AI.

However, they have not declared their development results as Open Source in any context. I believe this is because they are aware of the risk that their work could unintentionally cease to be Open Source. If OSAID ultimately requires the completeness of datasets, then if any part of a dataset is lost for any reason, their AI system could no longer be called Open Source AI. In that case, the safest solution is not to claim to be Open Source from the beginning. So there are cases that are the exact opposite of companies like Meta, which falsely claim to be Open Source.

I want them to be able to proudly declare their work as Open Source AI, so I believe we need to allow for at least that level of imperfection. This is not to say that the current 0.0.9 wording should remain unchanged.
