@jasonbrooks I moved your comment here because it’s more pertinent to this thread.
I’m not surprised you “feel” that way, but AI is radically different, and I don’t expect to see anybody running the test you suggest.
Nobody is rebuilding OLMo or Pythia from scratch just to replicate the build before shipping it to their users (like Debian does for its software packages). It makes no sense to do so: retraining a system is not going to generate an identical system anyway (GPU non-determinism, data shuffling, and differences in hardware and library versions all introduce variation), and it’s guaranteed to cost money and time without even earning academic credit.
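To make the comparison concrete: the Debian-style check would amount to retraining independently and then verifying that you got the same artifact bit for bit. Here’s a minimal sketch of what that verification step would look like, assuming two hypothetical retrained checkpoints saved as PyTorch state dicts (the file names are purely illustrative). In practice the comparison would almost always fail, which is exactly why nobody runs it.

```python
# Hypothetical "reproducible build" check for a model: hash two independently
# retrained checkpoints and compare. File names below are placeholders.
import hashlib
import torch

def state_dict_digest(path: str) -> str:
    """Hash every tensor in a checkpoint so two runs can be compared bit-for-bit."""
    state = torch.load(path, map_location="cpu")
    h = hashlib.sha256()
    for name in sorted(state):
        h.update(name.encode())
        h.update(state[name].cpu().numpy().tobytes())
    return h.hexdigest()

run_a = state_dict_digest("retrain_run_a.pt")
run_b = state_dict_digest("retrain_run_b.pt")
print("bit-identical" if run_a == run_b else "runs diverged")
# The second branch is what you get in practice: non-deterministic GPU kernels,
# data ordering, and library drift make bit-identical retraining unrealistic.
```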
What we’re actually seeing is people finetuning pretrained models to build new systems/applications (like this one or this one), or retraining from scratch for reasons very different from those of software packagers/distributions. Training is done to build new systems that improve on the performance of existing ones: that’s useful. I can also see a reason to retrain in order to fix bugs and security issues, when the cost of mitigation exceeds the cost of retraining. But I don’t expect to see anybody retraining the way Debian does for software packages.
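For a rough illustration of the finetuning path people actually take (model name, dataset file, and hyperparameters below are placeholders, not taken from any specific project), the workflow reuses released weights instead of rebuilding them:

```python
# Illustrative sketch: start from a released checkpoint and finetune it for a
# new application, rather than retraining from scratch.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "EleutherAI/pythia-1b"          # stand-in for any released base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token    # Pythia ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder domain corpus; any text dataset works here.
data = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})
tokenized = data["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pythia-finetuned",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # hours on one GPU, versus weeks of cluster time to retrain
```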