I’ve reread your paper and I don’t think we’re too far apart. Your premise is that we need a binary categorization so that one can tell whether a system is entitled to the benefits the EU AI Act gives to open source systems. You propose basing that judgment on a cumulative score. I assume you also intend a minimum threshold for every metric, so that no one could game the system by being completely closed on one measure and offsetting it by being very open on another. The OSI’s undertaking is to determine what the relevant metrics are and what the minimum threshold for each must be for a system to be considered “open source.” The difference is that the OSI describes that minimum threshold in words rather than implementing it in an algorithm.
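To make the anti-gaming point concrete, here is a minimal sketch of the difference between a pure cumulative score and a cumulative score with per-metric floors. It is purely illustrative: the metric names, the 0 to 2 scale (0 = closed, 1 = partial, 2 = open), the floors, and the total threshold are all hypothetical and are not taken from your paper.

```python
from typing import Mapping

# Hypothetical scores on a 0-2 scale (0 = closed, 1 = partial, 2 = open).
# Metric names, floors, and thresholds are illustrative assumptions only.


def cumulative_only(scores: Mapping[str, int], total_threshold: int) -> bool:
    """Pass if the summed score meets the total threshold (can be gamed)."""
    return sum(scores.values()) >= total_threshold


def cumulative_with_floors(
    scores: Mapping[str, int],
    floors: Mapping[str, int],
    total_threshold: int,
) -> bool:
    """Pass only if every metric meets its own floor AND the sum meets the total."""
    meets_floors = all(scores.get(metric, 0) >= floor for metric, floor in floors.items())
    return meets_floors and cumulative_only(scores, total_threshold)


if __name__ == "__main__":
    floors = {"code": 2, "llm_data": 1, "llm_weights": 2, "rl_data": 1, "rl_weights": 2}
    # Completely closed on LLM training data, fully open everywhere else.
    gamed = {"code": 2, "llm_data": 0, "llm_weights": 2, "rl_data": 2, "rl_weights": 2}
    print(cumulative_only(gamed, total_threshold=8))                  # True
    print(cumulative_with_floors(gamed, floors, total_threshold=8))   # False
```

The gamed system reaches the total (2 + 0 + 2 + 2 + 2 = 8) but fails the floor on one metric, which is the behavior I assume you intend.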
But I still have some areas of concern about your paper. First, you seem to be saying that RAIL licenses are open source licenses. They are not: they impose field-of-use restrictions, which the Open Source Definition prohibits.
This is closely related to my second disagreement. I don’t follow your chart for how you decide whether something is “open,” “partial,” or “closed.” When it comes to licenses, there is no such thing as “partial”: a license either complies with the Open Source Definition or it does not, and there is no in-between. Instead, you appear to be using “open” to mean “publicly available,” and those are two very different things. For example, under the “RL weights” column, the tooltips say things like “full model weights made available,” “finetuned model available for download,” and “instruct version of the model made available but no information on fine-tuning procedure provided.” None of this tells anyone whether they can use, reproduce, modify, or distribute the RL weights. Being “open” assures those rights, not merely access. If “open” only means “I can see it,” then every published book would be “open.”
You also don’t seem to have evaluated the rights attached to each component. You have a single “license” column, rather than separate columns for whether a component is publicly available and whether there are assurances (most typically licenses) that others can use it. Although your article criticizes companies that put only their models under an open source license, a single license column spanning five components (“open code,” “LLM data,” “LLM weights,” “RL data,” and “RL weights”) encourages exactly that. Which component does the license cover? As spot points out:
The OSI approach requires that every piece of the puzzle be available and be under a license, promise, or covenant ensuring that others can use, copy, distribute, and modify each of the necessary components. Your proposal does not appear to consider or require this for something to be considered “open.”
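As a rough illustration of the difference, the sketch below records availability and granted rights as separate fields for each component and gives a binary verdict only when every necessary component is both available and grants all four rights. The component names, the `Component` type, and the `system_is_open` helper are hypothetical; this is not the OSI’s process, only a sketch of the requirement as stated above.

```python
from dataclasses import dataclass, field

# The four rights named above; every necessary component must grant all of them.
REQUIRED_RIGHTS = frozenset({"use", "copy", "distribute", "modify"})


@dataclass
class Component:
    """One necessary component (hypothetical names such as 'code' or 'rl_weights')."""
    name: str
    publicly_available: bool
    # Rights granted by the component's license, promise, or covenant.
    granted_rights: frozenset = field(default_factory=frozenset)

    def is_open(self) -> bool:
        # Availability alone is not enough: all four rights must also be assured.
        return self.publicly_available and REQUIRED_RIGHTS <= self.granted_rights


def system_is_open(components, required_names):
    """Return (verdict, failing); a missing necessary component also fails."""
    by_name = {c.name: c for c in components}
    failing = [n for n in required_names if n not in by_name or not by_name[n].is_open()]
    return (not failing, failing)


if __name__ == "__main__":
    required = ["code", "llm_data", "llm_weights", "rl_data", "rl_weights"]
    components = [
        Component("code", True, REQUIRED_RIGHTS),
        Component("llm_weights", True, REQUIRED_RIGHTS),
        # Downloadable, but no license grants any rights: "available," not "open."
        Component("rl_weights", True, frozenset()),
    ]
    print(system_is_open(components, required))
    # (False, ['llm_data', 'rl_data', 'rl_weights'])
```

Keeping availability and rights as separate fields is the point: a single “license” flag cannot distinguish a downloadable but unlicensed component from an open one.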