Distillation and OSS Licensing

A point about the OSD and OSS licensing came up recently in discussion of a proposed OSS model license, and I'd like to explore it here at greater length. Specifically, it concerns attempts to extend a license's terms to cover model distillation.

Here is a sample clause from a draft license:

“Derivative Materials” means all improvements, modifications or derivative works to the
Licensed Material or any part thereof, which are created or developed by You (either by
Yourself or jointly with other third parties), including any derivative model developed by
transferring patterns of weights, parameters, activations and/or Output from the Model, such as
through distillation methods or synthetic data generation techniques, in order to replicate,
approximate, or otherwise achieve functional behavior that is similar to the Model.

This kind of clause is understandable given the practice of distillation, for example the alleged training of DeepSeek on ChatGPT outputs. It is a real concern for model licensing, since distillation is even offered as a service these days. The model owner may regard it as a type of copying, although legally it may not be copying.
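To make concrete what the clause means by distillation "through synthetic data generation techniques," here is a minimal, hypothetical sketch. All names are illustrative; the point is that the student model never touches the teacher's weights, only its outputs, which is why it is contested whether the result is a "copy" in any legal sense.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for a licensed model; its weights are hidden from the student.
    w_hidden = np.array([2.0, -1.0, 0.5])
    return x @ w_hidden

# 1. Generate synthetic training data by querying the teacher.
X = rng.normal(size=(200, 3))
y = teacher(X)  # only outputs are "transferred", never the weights

# 2. Fit a student model to replicate the teacher's functional behavior.
w_student, *_ = np.linalg.lstsq(X, y, rcond=None)

# The student now approximates the teacher without copying its parameters.
X_test = rng.normal(size=(10, 3))
max_err = np.abs(X_test @ w_student - teacher(X_test)).max()
```

In this toy linear case the student recovers the teacher's behavior essentially exactly; with real neural models the approximation is looser, which is part of why proving that "this smaller model is a distillation of that specific model" is hard.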

This clause raises two questions for the concept of a copyleft license for models:

  1. Is this type of clause a violation of the OSD, if the clause does not prohibit distillation, but makes it subject to the conditions of the otherwise-OSS license? Does it violate OSD9 or OSD6 somehow?

  2. In copyright law, does a clause like this have any possible effect? Or is it cancelled out by the same kinds of precedents that allow LLMs to be trained on copyrighted material while ignoring license conditions?

The OpenMDW license does not apply any conditions to model output. However, OpenMDW is also designed as a permissive license, applying minimal conditions to any use of the licensed materials.

I think this is a very thorny issue. To begin with, I do not think there is anything like a global consensus on it at this point.

From my perspective, defining models that are technically dependent on the original model’s weights, activations, or outputs as “a kind of derivative work,” and then imposing copyleft obligations only on those models, does not strike me as an Open Source Definition violation on its face. As long as we are talking purely about a copyleft requirement such as “if you distribute the derivative, you must use the same license,” the situations in which OSD #6 or #9 become directly problematic seem rather limited.

That said, the scope of “technical dependence” is exactly where interpretations are likely to diverge between jurisdictions, and this is what makes the issue so tricky. How far one goes in treating something as “reliant on the original model” is entangled with copyright doctrines on derivative works and substantial reliance.

Looking purely at the copyright-law angle, there is fair use in the United States and there are text-and-data-mining (TDM) exceptions in the EU and Japan, so the act of training itself can potentially be justified within those frameworks. On the other hand, for users who download and use the model, there is at least room to argue that, if the license is treated as a contract, they incur an obligation to use it in accordance with its conditions.

In other words, for users who actually receive the model artifact and the license text, I think there is some significance in imposing a duty such as “if you distribute a distilled model, you must release it under the same license.”

By contrast, the situation is different in cases where someone performs distillation only via API calls or by scraping publicly available outputs. Such users may well never have accessed the model artifact or the license at all, and in that case one has to conclude that there is no agreement to the license in the first place, so the contractual terms simply do not reach them. Even if we assume that the clause is valid in principle, the practical cost of proving that “this smaller model is in fact a distillation of that specific model” is very high, so I am quite skeptical about how strong a deterrent effect it can actually have.

So you’re saying that the license under which the model was used could apply to use of the model’s output, even though the output itself might not be copyrightable?

That seems reasonable in the context of models, but it brings us back to a possible OSD9 violation. Consider, for example, a piece of graphic design software that asserted licensing conditions over an image created with it. We would not accept that as OSS.

I think there is a slight misunderstanding of what I was trying to say.

When I wrote the passage quoted above, I did not mean that the license would automatically attach to any and all uses of individual outputs. I was trying to describe a much narrower situation:

  • There is a user who has actually downloaded the model artifact under a license,
  • that license defines certain “Derivative Materials” in contractual terms,
  • among those, it includes “a new model created by using this model’s weights or outputs for distillation,” and
  • it says “if you distribute that particular kind of model, you must do so under the same license.”

In other words, the obligation is not arising because each output is copyrightable. It is arising, if at all, because the user has agreed by contract that “if I use this model in this particular way and then distribute the resulting model artifact, I will follow these conditions.”

That feels different to me from the classic example of graphics software claiming licensing control over normal images created with it. In that graphics example, the tool is trying to control the user’s own creative works in general. In the distillation case, the license is trying to control a specific new model that is technically and statistically dependent on the licensed model, and only where the user has explicitly accepted that condition when downloading the original model.

Whether such a clause should be considered compatible with the OSD is exactly the hard question, and I agree that OSD#9 is the place where the tension shows up. My point was only that, in legal terms, there is at least a colorable argument that this is a contractual obligation about a particular class of derivative models, not a general attempt to impose licensing conditions on arbitrary uses of uncopyrightable outputs.

This is a tricky area where open source principles and ML practices don’t fully align yet. From an OSD perspective, such a clause likely doesn’t violate OSD6, but OSD9 is where things become less clear, especially if copyleft obligations are being applied to models created through behavior learning rather than direct copying.

Legally, enforceability is also uncertain. Copyright law doesn’t clearly treat distillation as creating a derivative work, which is why training on copyrighted material typically doesn’t carry over license terms. That uncertainty is probably why permissive licenses like OpenMDW avoid placing conditions on model outputs.
