Re: standardizing opt-out of EU data mining rights?

J Lovejoy

Hi Luis,

While I'm barely getting my head around the many complications related to the reality of AI models and data, let alone the related licensing issues...

Let me try to answer some of your questions below as to the SPDX License List and process :)


On 11/10/22 12:58 PM, Luis Villa wrote:
Hi, all-

[Starting here, though I realize SPDX cannot be a complete answer to this problem. Also not on spdx-ai because it isn't about AI models/data, but happy to move discussion or cc if that makes sense.]

As you all have probably seen, one area of interesting research in machine learning right now is training models on source code in order to generate more source code. Whether or not this is legal in the US is somewhat unclear, but in the EU there appears to be more clarity: data mining is legal, but a licensor can opt out.[1] 

The W3C has done some work on how to implement this opt-out in the digital space[2] but, as you would imagine, it is optimized for the web environment, not source code. So there is, as of yet, no standardized way for source code authors to express their desire to opt out of data mining, as is their right under EU law.

So, some questions/thinking out loud about what role SPDX might play in such an opt-out scheme. 

Presume, for purposes of discussion, that someone else writes a standardized data mining opt-out clause, tailored for use with open source software, that a developer could attempt to apply to their project.
JL: so you are thinking of a clause that would not be a stand-alone license, but made to be used with existing open source licenses? (assuming that is correct...)

(1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
JL: it could, potentially, be treated as an "exception". As described on the SPDX License List exceptions page: "exceptions grant an exception to a license condition or additional permissions beyond those granted in a license; they are not stand-alone licenses." That would mean submitting it as usual for review by SPDX-legal; if accepted under the SPDX inclusion guidelines, it would be assigned an SPDX id and could be used in an SPDX license expression with the WITH operator.
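
To make the WITH mechanics concrete, here is a rough Python sketch of how a tool might split such an expression into its license and exception parts. It only handles the simple "license WITH exception" form, not compound expressions. The function name split_with and the identifier Example-TDM-opt-out are made up for illustration; no such exception exists on the SPDX License List.

```python
from typing import Optional, Tuple


def split_with(expression: str) -> Tuple[str, Optional[str]]:
    """Split a simple SPDX expression into (license_id, exception_id).

    Simplification: handles only a single license id with an optional
    'WITH <exception>' suffix. Compound expressions (AND, OR, parentheses)
    are out of scope for this sketch, and only uppercase WITH is matched.
    """
    tokens = expression.split()
    if len(tokens) == 1:
        return tokens[0], None
    if len(tokens) == 3 and tokens[1] == "WITH":
        return tokens[0], tokens[2]
    raise ValueError(f"unsupported expression: {expression!r}")


# Identifiers that exist on the SPDX License List today:
print(split_with("GPL-2.0-or-later WITH Classpath-exception-2.0"))

# Hypothetical exception id, used here only to illustrate the idea:
print(split_with("Apache-2.0 WITH Example-TDM-opt-out"))
```

Note that nothing in the expression itself says whether an exception adds or removes permissions; a tool would only learn that from the exception's text or from separate metadata.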

(2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
JL: it probably would be the first. That is because the long-standing license inclusion guidelines more or less followed the OSD, so we would not have accepted further restrictions. Since the license inclusion guidelines were updated and loosened a bit a couple of years ago, we have not explicitly discussed a revised policy as to exceptions.

(3) Because this is a restriction for a specific use case it might not be OSD-compliant, or might not be GPLv2-compliant. Without trying to answer here whether it is OSD-compliant, what requirements would SPDX want to see met? Would OSI review/approval be necessary? "Mere" deployment/usage in the wild? Other?
JL: well, see above, and the factors in the license inclusion guidelines would still apply

This is not a purely hypothetical question, for what it is worth - people in the AI community (specifically, part of the BigCode project[3]) are actively trying to figure this out right now, and I'd like to be able to build a bridge there if this group thinks it would be appropriate.

