Open Source to require documentation for the first time for AI?

samj · October 8, 2024, 2:59am

Whoa, wait, what @nick? You mean to say the interview with @Lumin I’ve already cited was part of the OSI’s “deep dive”, and his input was ignored? Was he not clear enough? I can’t even…

[00:07:XX] SF: […] The neural network that has been trained to detect cats and dogs, now, if we wanted to distribute that piece of software inside Debian, or inside one of the few free software, mobile open-source systems to help retrieve our pictures, what do we need?

[00:08:20] MZ: Actually, we need lots of things, especially if we are doing distribution of free software. If we create an artificial intelligence application, we will need data. We’ll need the code for training neural network. We will need the inference code for actually running the neural network on your device. Without any of them, the application is not integral. None of them can be missing.

[00:08:52] SF: The definitions that we have right now for what is complete and corresponding source code, and how can it be applied to an AI system to an application like this that detects pictures of dogs?

[00:09:04] MZ: Well, actually, the neural network is a very simple structure, if we don’t care about its internals. You can just think of it as a matrix multiplication. Your input is an image and we just do lots of matrix multiplication, and it will give you an output vector. This is simply the things happened in the software. Both training code and the inference code are doing the similar thing.

Apart from the code, the data is something that can change. For example, we can use the same training and inference code for different data set. For example, I released a code for cat and dog classification problem, but you can decode and you say, “Oh, I’m more interested in classifying flowers.” Then you can collect new data sets about different kinds of flowers and use the same code to train the neural network and do the classification by yourself.

If you want to provide a neural network that performs consistently everywhere, you also have to release the pre-trained neural network. If you are releasing free software that also requires you to release the training data as well, because free software requires some freedom that allows you to study, to modify or to reproduce the work. Without any training data, it is not possible to reproduce the neural network that you have downloaded. That’s a very big issue.

Nowadays, in the research community, people are basically using neural networks that are trained on non-free data set. All of the existing models are somewhat problematic in terms of license.
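
To make the points in the transcript concrete, here is a minimal sketch in plain Python. It assumes a toy perceptron-style linear classifier rather than a real neural network, and the names (`matvec`, `predict`, `train`) and the tiny feature vectors are hypothetical, made up purely for illustration. It shows inference as a matrix multiplication, the same training code reused on a different data set, and the fact that rebuilding the released weights requires the training data.

```python
def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def predict(weights, x):
    """Inference code: one matrix multiplication, then pick the top score."""
    scores = matvec(weights, x)
    return scores.index(max(scores))


def train(data, n_classes, n_features, epochs=20, lr=0.1):
    """Training code: perceptron-style updates starting from zero weights.

    The procedure is deterministic, so the same data always yields the
    same weights.
    """
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in data:
            guess = predict(weights, x)
            if guess != label:
                # Pull the correct class toward x, push the wrong one away.
                for i, v in enumerate(x):
                    weights[label][i] += lr * v
                    weights[guess][i] -= lr * v
    return weights


# Toy stand-ins for image features (hypothetical values, last entry is a bias).
cats_vs_dogs = [
    ([1.0, 0.1, 1.0], 0),  # 0 = cat
    ([0.9, 0.2, 1.0], 0),
    ([0.1, 1.0, 1.0], 1),  # 1 = dog
    ([0.2, 0.8, 1.0], 1),
]
flowers = [
    ([1.0, 0.0, 1.0], 0),
    ([0.0, 1.0, 1.0], 1),
    ([-1.0, -1.0, 1.0], 2),
]

# Same training and inference code, different data sets.
cat_dog_model = train(cats_vs_dogs, n_classes=2, n_features=3)
flower_model = train(flowers, n_classes=3, n_features=3)

# With the data in hand, anyone can rebuild the released weights exactly.
rebuilt = train(cats_vs_dogs, n_classes=2, n_features=3)
print(rebuilt == cat_dog_model)
```

Someone who receives only `cat_dog_model` can still call `predict`, but without the `cats_vs_dogs` list they cannot rerun `train` to check or reproduce those weights, which is the gap MZ describes.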
