Open Source to require documentation for the first time for AI?

Whao, wait, what @nick? You mean to say the interview with @Lumin I’ve already cited was part of the OSI’s “deep dive”, and his input was ignored? Was he not clear enough? I can’t even…

[00:07:XX] SF: […] The neural network that has been trained to detect cats and dogs, now, if we wanted to distribute that piece of software inside Debian, or inside one of the few free software, mobile open-source systems to help retrieve our pictures, what do we need?

[00:08:20] MZ: Actually, we need lots of things, especially if we are doing distribution of free software. If we create a artificial intelligence application, we will need data. We’ll need the code for training neural network. We will need the inference code for actually running the neural network on your device. Without any of them, the application is not integral. None of them can be missing.

[00:08:52] SF: The definitions that we have right now for what is complete and corresponding source code, and how can it be applied to an AI system to an application like this that detects pictures of dogs?

[00:09:04] MZ: Well, actually, the neural network is a very simple structure, if we don’t care about its internal. You can just think of it as a matrix multiplication. Your input is an image and we just do lots of matrix multiplication, and it will give you a output vector. This is simply the things happened in the software. Both training code and the inference code are doing the similar thing.

Apart from the code, the data is something that can change. For example, we can use the same training and inference code for different data set. For example, I released a code for cat and dog classification problem, but you can decode and you say, “Oh, I’m more interested in classifying flowers.” Then you can collect new data sets about different kinds of flowers and use the same code to train the neural network and do the classification by yourself.

If you want to provide a neural network that performs consistently everywhere, you also have to release the pre-trained neural network. If you are releasing free software that also requires you to release the training data as well, because free software requires some freedom that allows you to study, to modify or to reproduce the work. Without any training data, it is not possible to reproduce the neural network that you have downloaded. That’s a very big issue.

Nowadays, in the research community, people are basically using neural networks that are trained on non-free data set. All of the existing models are somewhat problematic in terms of license.

1 Like