Open Source AI needs to require data to be viable

Dolma, like the Pile, contains data that is not being used in a manner consistent with its license. Its creators didn’t go through C4 and validate the licensing of everything in it.
