Hi everyone,
I would like to share a recent paper that came out of an event organized by Mozilla and EleutherAI, which convened 30 scholars and practitioners to develop normative principles and technical best practices for creating openly licensed LLM training datasets.
https://arxiv.org/pdf/2501.08365
Let us know your thoughts on this paper!