standardizing opt-out of EU data mining rights?

Luis Villa

Hi, all-

[Starting here, though I realize SPDX cannot be a complete answer to this problem. Also not on spdx-ai because it isn't about AI models/data, but happy to move discussion or cc if that makes sense.]

As you all have probably seen, one area of interesting research in machine learning right now is training models on source code in order to generate more source code. Whether or not this is legal in the US is somewhat unclear, but in the EU there appears to be more clarity: data mining is legal, but a licensor can opt out.[1] 

The W3C has done some work on how to implement this opt-out in the digital space,[2] but as you would imagine it is optimized for the web environment, not source code. So there is, as of yet, no standardized way for source code authors to express their desire to opt out of data mining, as is their right under EU law.

So, some questions/thinking out loud about what role SPDX might play in such an opt-out scheme. 

Presume, for purposes of discussion, that someone else writes a standardized data mining opt-out clause, tailored for use with open source software, that a developer could attempt to apply to their project. 

(1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, e.g., via a short identifier + WITH?

(2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?

(3) Because this is a restriction for a specific use case, it might not be OSD-compliant, or might not be GPLv2-compliant. Without trying to answer here whether it is OSD-compliant, what requirements would SPDX want to see met? Would OSI review/approval be necessary? "Mere" deployment/usage in the wild? Other?
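
To make question (1) a bit more concrete, here is a rough sketch of what tooling might do with such an identifier. Everything specific in it is an assumption for discussion: `TDM-opt-out-exception` is a made-up identifier that is not on the SPDX exceptions list, the function names are hypothetical, and the regex only handles the simple "license WITH exception" shape rather than the full license expression grammar.

```python
import re

# Hypothetical identifier for illustration only; it is NOT on the SPDX exceptions list.
OPT_OUT_ID = "TDM-opt-out-exception"

# Matches "<license-id> WITH <exception-id>" pairs in a license expression.
# Simplification: uppercase WITH only, no LicenseRef/DocumentRef handling.
WITH_PAIR = re.compile(r"([A-Za-z0-9.+-]+)\s+WITH\s+([A-Za-z0-9.-]+)")


def exceptions_in(expression: str) -> list[tuple[str, str]]:
    """Return (license, exception) pairs found in the expression."""
    return WITH_PAIR.findall(expression)


def opts_out_of_mining(expression: str) -> bool:
    """True if any license in the expression carries the hypothetical opt-out."""
    return any(exc == OPT_OUT_ID for _, exc in exceptions_in(expression))


# Example: a per-file header as a developer might write it.
header = "SPDX-License-Identifier: GPL-2.0-only WITH TDM-opt-out-exception"
expression = header.split(":", 1)[1].strip()

if opts_out_of_mining(expression):
    print("skip: author has opted out of data mining")
```

A crawler building a training set could run a check like this per file and skip anything that matches. Whether WITH is the right operator for a clause that removes permissions is exactly what question (2) asks.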

This is not a purely hypothetical question, for what it is worth - people in the AI community (specifically, part of the BigCode project[3]) are actively trying to figure this out right now, and I'd like to be able to build a bridge there if this group thinks it would be appropriate.

