Welcome diverse approaches to training data within a unified Open Source AI Definition

Alek_Tarkowski · September 4, 2024, 10:21am

@arandal thank you for making the point that open source software, and open source AI, need to be considered in the broader context of other open movements. I fully agree.
On this, and also replying to @mjbommar, great point about working with CC (disclosure: I’m on the board of CC). But I think the cooperation needs to go further, and not limit itself to licensing standards.
One key mechanism to pay attention to is the Open Definition. It has been stewarded by the Open Knowledge Foundation, but has had a community-driven governance model. The Open Definition is crucial, as it does for content what the OSI definition does for open source code. Unfortunately, it is currently dormant.
Going with this analogy, I think that considering the Open Definition is not enough, because the issue goes beyond defining licensing standards. Just as there was a need for OSI to work on a definition of open source AI, there’s a need to set a standard for how various resources (considered data from the perspective of AI development) are governed, made available, and used. This standardization work has not yet been undertaken, and ideally it would be a shared effort by the various orgs mentioned in this thread.