Definition Validation: Seeking Volunteers

Mer · May 1, 2024, 5:37pm

CONTEXT Last week Stefano announced v. 0.0.8, the first feature complete version of the Open Source AI Definition (OSAID).

ASK We are now seeking volunteers to validate that definition by using it to review additional AI systems that are self-described as open. If you are interested, please comment below or DM me.

TIME We would like to complete this validation task by Monday, May 20th.

TASK You will have a spreadsheet (example below) in which you locate and link to the license, research paper, or other document that grants rights or provides information for each required component. You will then indicate in each cell whether the document Allows or Restricts the ability to study, use, modify, or share that component.

Example: spreadsheet used to review Llama 2 according to OSAID v. 0.0.6

SYSTEMS We are interested in reviewing about ten self-described open AI systems as part of this definitional process. Four (marked *) have already been reviewed by the workgroups. Below are some more systems we would like to include in this validation task. If there is another system you would like to review, please say so.

Arctic: Jesús M. Gonzalez-Barahona
BLOOM *: Danish Contractor, Jaan Li
Falcon: Casey Valk, Jean-Pierre Lorre
Grok: Victor Lu, Karsten Wade
Llama 2 *: Davide Testuggine, Jonathan Torres, Stefano Zacchiroli, Victor Lu
Mistral: Mark Collier, Jean-Pierre Lorre, Cailean Osborne
OLMo: Amanda Casari, Abdoulaye Diack
OpenCV *: Rasim Sen
Phi-2: Seo-Young Isabelle Hwang
Pythia *: Seo-Young Isabelle Hwang, Stella Biderman, Hailey Schoelkopf, Aviya Skowron
T5: Jaan Li

TO VOLUNTEER Comment below or DM me if you would like to volunteer. Anyone who can complete the task may do so, either solo or as a group. If you are a creator or advisor on the system you are reviewing, we will also need to identify an unaffiliated individual to review the system, so please disclose that when you volunteer. Your name and organizational affiliation will also be made public as part of our transparency policy.

Women, trans, and non-binary folx, black, indigenous, latine/o/a, and other people of color, immigrants, people with disabilities, and people from poor and working class backgrounds are encouraged to respond.

Thanks

EDIT: I’ll update the list above with volunteer names so it’s clear where there is greatest need.
(last update: May 14th @ 11:03 am PDT)

amcasari · May 2, 2024, 1:30pm

I volunteer to help with a review!

I have a conflict of interest with #10, T5 (same employer), but could take the lead or assist on any others.

I would prefer to start w/ OLMo, if that isn’t taken yet.

Mer · May 2, 2024, 4:29pm

Thank you, @amcasari! You’ve got review on OLMo. I’m going to make up the new review spreadsheet today and will email you with further details.

stefano · May 3, 2024, 12:06pm

Here is another one claiming to be “truly open”: Can we get someone to review it, please?

Aspie96 · May 4, 2024, 2:33am

There are two kinds of systems I don’t see listed that I would like to see.

Non-foundational models.
Systems which are not based on deep learning (and, preferably, not even on machine learning).

That said, if I can only suggest specific systems, I have at least two:

Open Image Denoise, by Intel.
NLP models such as the Stanford Log-linear Part-Of-Speech Tagger.

stefano · May 6, 2024, 9:25am

Good thinking, thanks! I think you can get started by cloning this table structure and filling in the details.

quaid · May 13, 2024, 6:18pm

Heya Mer,

I love this exercise! I’ve been doing this lightly across systems recently but not to this same depth and level of certainty. In fact, Snowflake Arctic is one I looked more closely at, so I’m happy to help with this one if needed.

Otherwise, I have no preferences and will help wherever needed. Would you like to choose several for me and prioritize them if possible? I'm not aware of having any affiliation with these systems, and Open Community Architects (OCA) is a neutral consultancy in that regard (aside from a bias for Open).

Karsten (quaid)

Mer · May 13, 2024, 6:36pm

Thanks, @quaid. I think Arctic is well-covered, but let me see if anyone else needs help. If not, you can just pick a system that works for you. I’ll get back to you soon.

Mer · May 14, 2024, 6:02pm

@quaid I got no requests from the current volunteers, so I’m going to put you on Grok, which is a new system with only 1 volunteer. Please DM me your email address so I can add you to the reviewer chat and spreadsheet.

Aspie96 · May 16, 2024, 4:53am

Two models I hadn't thought to mention previously, but which are rather interesting, are Segment Anything, by Meta, and Whisper, by OpenAI.

What makes them interesting is that they attracted a lot of attention when they were published, and they are built by companies that usually keep models in-house (OpenAI) or often release models under proprietary licenses (Meta), yet in this case both were released under open source licenses (code and weights alike): Apache 2.0 and MIT, respectively.