My stance on this issue is the same as the one I stated in Is the definition of "AI system" by the OECD too broad? - #20 by Aspie96. I don't see why testing code should be required when the distributed program doesn't perform any testing. If one develops a new test for a model they previously released, must they publish it for the model to remain open source?
I'm not sure what "data processing code" would imply. Does it include the code of any software that was used to produce the dataset and had some effect on the data? Images are cropped, compressed and altered in all sorts of ways, including by software internal to cameras. One would probably not describe that as a preprocessing technique, but the line seems rather blurry to me.
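To make the ambiguity concrete, here is a hypothetical sketch of a typical dataset preparation step (file paths and target size are made up for illustration): it crops, rescales and re-encodes images, which is not substantially different from what the camera firmware already did before the files ever reached the dataset author.

```python
# Hypothetical example of a dataset preparation step.
# Paths and the 512x512 target size are made up for illustration.
from pathlib import Path
from PIL import Image

SRC = Path("raw_photos")
DST = Path("dataset")
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path)
    # Crop to a centered square, then downscale to a fixed resolution.
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512))
    # Re-encode as lossy JPEG, discarding information again, much like the camera did.
    img.save(DST / path.name, quality=90)
```

Is this "data processing code" that must be published, while the in-camera cropping and compression is not? The definition doesn't make that obvious.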
That said, outcome aside, I'd like to point out that out of 4 working groups, 2 refer to models under proprietary licenses, one of which OSI has already complained about being described as "open source".
As I pointed out in Report from BLOOM working group - #2 by Aspie96, Mistral has published models under the Apache 2.0 license. I think a Mistral working group, as well as groups around other open source AI projects, would be more aligned with OSS values than groups revolving around proprietary systems, whether open access (such as LLaMA and BLOOM) or not (such as ChatGPT).
As an additional note, this covers only a small portion of the landscape at hand. Of the 4 groups, 3 refer to LLMs and the remaining one to a popular library. This misses other kinds of models, including both foundation models and non-foundation ones (such as Open Image Denoise), audio models and so on.
It also seems to focus only on ML systems that produce numerical statistical models with a (mostly) predetermined structure and a (potentially) large number of parameters.