Training data access

I strongly agree that access to training data should be required, and not just for control and validation.

What makes open source software popular is access to the source code. The author is motivated to share because of external contributions; the user is motivated to adopt because of improved control and the ability to integrate and customize. A binary, black-box trained model benefits neither of them: the author bears the whole burden of training (collecting, annotating, computing…) and cannot receive contributions, since the training data are opaque to others, while the user can only use the model as it is, with no modifications.

A binary trained model with no training data should be considered “open source” no more and no less than any proprietary freeware application is.

I’ve detailed this position in this blog post. It is in Italian only, but anyone interested can probably translate it on the fly with an online translation service.