While I’m not posting entire articles given recent discussions, the original post (Turning Open Source AI Insecurity up to 11 with OSI’s RC1) covers why Open Source AI models cannot even be safely deployed without their training datasets:
While large “open weight” models from the likes of Meta and Mistral can be deployed by businesses because of the reputation and regulation of their creators, and because widespread adoption would surface certain vulnerabilities, Open Source is more about the long tail of software that scratches an itch for its developer. There will be no way to be confident in such models without being able to inspect the training data, and this limitation in the release candidate definition can and will be extensively exploited, which means there will be no way even to use these models without accepting significant risk that is impossible to mitigate.
It is the equivalent of the freeware of old, where binaries without source code would be made available free of charge (i.e., free as in beer, but not as in freedom), and you would take your life into your own hands by executing them. Only this time they have access to all of your local & remote data, and the agency to take actions on your behalf in order to maximise utility. Furthermore, many of them are clever enough to think for themselves: six months ago we learned that LLM Agents can Autonomously Exploit One-day Vulnerabilities with an 87% success rate simply by reading CVE reports, and last week that with ShadowLogic “threat actors can implant codeless backdoors in ML models that will persist across fine-tuning and which can be used in highly targeted attacks”, to name but two recent advances.