I’m cool with that, I didn’t choose ‘sameness’ as an idea per se, and we clearly need a more precise criteria (even if it’s from an imprecise measurement.)
How about ‘‘functionally similar’’? Or ‘‘practically similar’’?
Would it be useful to innumerate some of the value being sought with an Open Data dataset into the proof?
Is this measurement inclusive of the ‘‘model vs model’’ matrix values, looking to keep QA benchmark values within a certain value of the original?
What else matters in a recreated/retrained-from-source model?