The Open Source AI Definition v.1.0-RC1 is available for comments

In OSAID 0.0.9, the description of data information used the phrase “recreate a substantially equivalent system.” OSAID RC1 changes this to “build a substantially equivalent system,” avoiding the word “recreate.” I assume this is because reproducibility is not the purpose of Open Source. I think that reasoning is valid, but I also feel that the scope of what counts as a “substantially equivalent system” has slightly expanded. On a scale from 0 to 10, with 0 being a system that is merely similar and 10 being an identical system, what used to read as “8 or higher” now seems to permit something around 6-7. Perhaps this stands out to me only because I have been scrutinizing the text closely while translating it into Japanese.

However, unlike 0.0.9, RC1 adds detailed descriptions of data information, which makes this point less of a concern. At this point, I feel that the explanation of data information is well structured overall and now functions adequately as a definition.

One thing I am concerned about is the phrase “including for a fee” in point (3). I believe the provision is legitimate, but personally, I fear it could lead to unnecessary disputes when interpreted under Japanese law. The phrase likely refers to purchasing commercially available datasets. In Japan, Article 30-4 of the Copyright Act broadly permits the use of copyrighted works for AI training without permission. Its proviso, which excludes cases where “the action would unreasonably prejudice the interests of the copyright owner,” is generally interpreted as limited to situations where the dataset is sold commercially. In other words, someone who purchases a paid dataset can use it for AI training, but third parties are not granted the freedom provided under Article 30-4 and are effectively forced to purchase the dataset themselves. I don’t think this is a major issue, but I am concerned that the phrase “including for a fee” could spark sensitive discussions within Japan about the rights to use copyrighted works for AI training. However… after thinking about it for two days, I feel I may be overthinking this. Still, I believe it would be better without the phrase…

Article 30-4 It is permissible to exploit a work, in any way and to the extent considered necessary, in any of the following cases, or in any other case in which it is not a person’s purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in that work; provided, however, that this does not apply if the action would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation:
(ii) if it is done for use in data analysis (meaning the extraction, comparison, classification, or other statistical analysis of the constituent language, sounds, images, or other elemental data from a large number of works or a large volume of other such data; the same applies in Article 47-5, paragraph (1), item (ii));