standardizing opt-out of EU data mining rights?
Luis Villa
Hi, all-
[Starting here, though I realize SPDX cannot be a complete answer to this problem. Also not on spdx-ai because it isn't about AI models/data, but happy to move discussion or cc if that makes sense.]
As you all have probably seen, one area of interesting research in machine learning right now is training models on source code in order to generate more source code. Whether or not this is legal in the US is somewhat unclear, but in the EU there appears to be more clarity: data mining is legal, but a licensor can opt out.[1]
The W3C has done some work on how to implement this opt out in the digital space[2] but as you would imagine it is optimized for the web environment, not source code. So there is, as of yet, no standardized way for source code authors to express their desire to opt-out of data mining, as is their right under EU law.
So, some questions/thinking out loud about what role SPDX might play in such an opt-out scheme.
Presume, for purposes of discussion, that someone else writes a standardized data mining opt-out clause, tailored for use with open source software, that a developer could attempt to apply to their project.
(1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
(2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
(3) Because this is a restriction for a specific use case it might not be OSD-compliant, or might not be GPLv2-compliant. Without trying to answer here whether it is OSD-compliant, what requirements would SPDX want to see met? Would OSI review/approval be necessary? "Mere" deployment/usage in the wild? Other?
This is not a purely hypothetical question, for what it is worth - people in the AI community (specifically, part of the BigCode project[3]) are actively trying to figure this out right now, and I'd like to be able to build a bridge there if this group thinks it would be appropriate.
Thanks-
Luis
[1] some more details: https://felixreda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/
Richard Fontana
On Thu, Nov 10, 2022 at 3:01 PM Luis Villa <luis@...> wrote:
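[...]
> (1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
>
> (2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
Recent exchange that is possibly slightly related to those questions:
https://github.com/spdx/change-proposal/issues/4#issuecomment-1283004681
https://github.com/spdx/change-proposal/issues/4#issuecomment-1304842184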
Basically, I believe SPDX has locked itself into a model of what an
"exception" is that is based on normative FSF doctrine built up around
FSF-authorized GPL exceptions, but which does not fully reflect how
standardized license terms actually get supplemented by other terms in
the real world with the GPL and other FOSS licenses (in some cases by
removing permissions, and in some cases where it is not actually clear
whether permissions are being removed). I think this is inconsistent
with SPDX's professed mission of focusing on "just the facts". I think
Steve's view is that indeed generalizing the concept of what an
exception is would break some expectations around models and tooling.
- Richard
Luis Villa
Thanks for the links, Richard. I'll try to follow up there though of course welcome further discussion here as well.
On Thu, Nov 10, 2022 at 5:06 PM Richard Fontana <rfontana@...> wrote:
On Thu, Nov 10, 2022 at 3:01 PM Luis Villa <luis@...> wrote:
[...]
> (1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
>
> (2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
Recent exchange that is possibly slightly related to those questions:
https://github.com/spdx/change-proposal/issues/4#issuecomment-1283004681
https://github.com/spdx/change-proposal/issues/4#issuecomment-1304842184
Basically, I believe SPDX has locked itself into a model of what an
"exception" is that is based on normative FSF doctrine built up around
FSF-authorized GPL exceptions, but which does not fully reflect how
standardized license terms actually get supplemented by other terms in
the real world with the GPL and other FOSS licenses (in some cases by
removing permissions, and in some cases where it is not actually clear
whether permissions are being removed). I think this is inconsistent
with SPDX's professed mission of focusing on "just the facts". I think
Steve's view is that indeed generalizing the concept of what an
exception is would break some expectations around models and tooling.
- Richard
Luis Villa
On Thu, Nov 10, 2022 at 5:06 PM Richard Fontana <rfontana@...> wrote:
On Thu, Nov 10, 2022 at 3:01 PM Luis Villa <luis@...> wrote:
[...]
> (1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
>
> (2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
Recent exchange that is possibly slightly related to those questions:
https://github.com/spdx/change-proposal/issues/4#issuecomment-1283004681
https://github.com/spdx/change-proposal/issues/4#issuecomment-1304842184
Basically, I believe SPDX has locked itself into a model of what an
"exception" is that is based on normative FSF doctrine built up around
FSF-authorized GPL exceptions, but which does not fully reflect how
standardized license terms actually get supplemented by other terms in
the real world with the GPL and other FOSS licenses (in some cases by
removing permissions, and in some cases where it is not actually clear
whether permissions are being removed).
Note that this has gone beyond the hypothetical; for some reason they put the text behind a weird GitHub process-wall, so I have neither read the text nor seen how they make the information machine-readable:
J Lovejoy
Hi Luis,
While I'm barely getting my head around the many complications related to the reality of AI models and data, let alone the related licensing issues...
Let me try to answer some of your questions below as to the SPDX License List and process :)
Cheers,
Jilayne
On 11/10/22 12:58 PM, Luis Villa wrote:
Hi, all-
[Starting here, though I realize SPDX cannot be a complete answer to this problem. Also not on spdx-ai because it isn't about AI models/data, but happy to move discussion or cc if that makes sense.]
As you all have probably seen, one area of interesting research in machine learning right now is training models on source code in order to generate more source code. Whether or not this is legal in the US is somewhat unclear, but in the EU there appears to be more clarity: data mining is legal, but a licensor can opt out.[1]
The W3C has done some work on how to implement this opt out in the digital space[2] but as you would imagine it is optimized for the web environment, not source code. So there is, as of yet, no standardized way for source code authors to express their desire to opt-out of data mining, as is their right under EU law.
So, some questions/thinking out loud about what role SPDX might play in such an opt-out scheme.
Presume, for purposes of discussion, that someone else writes a standardized data mining opt-out clause, tailored for use with open source software, that a developer could attempt to apply to their project.
JL: so you are thinking of a clause that would not be a stand-alone license, but would be made to be used with existing open source licenses? (assuming that is correct...)
(1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
JL: it could, potentially, be treated as an "exception" (that is, as described on the SPDX License List exceptions page: "exceptions grant an exception to a license condition or additional permissions beyond those granted in a license; they are not stand-alone licenses."). That would mean: submit it as usual for review by SPDX-legal; if it is accepted under the SPDX inclusion guidelines, it would get assigned an SPDX identifier and could be used in an SPDX license expression with the WITH operator.
(2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
JL: it probably would be. This is because the long-standing license inclusion guidelines more-or-less followed the OSD, so we would not accept further restrictions. Since the license inclusion guidelines were updated and loosened a bit a couple of years ago, we have not explicitly discussed a revised policy as to exceptions.
(3) Because this is a restriction for a specific use case it might not be OSD-compliant, or might not be GPLv2-compliant. Without trying to answer here whether it is OSD-compliant, what requirements would SPDX want to see met? Would OSI review/approval be necessary? "Mere" deployment/usage in the wild? Other?
JL: well, see above; the factors in the license inclusion guidelines would still apply.
This is not a purely hypothetical question, for what it is worth - people in the AI community (specifically, part of the BigCode project[3]) are actively trying to figure this out right now, and I'd like to be able to build a bridge there if this group thinks it would be appropriate.
Thanks-
Luis
[1] some more details: https://felixreda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/
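Jilayne's answer to question (1) describes a listed exception being attached to a license with the WITH operator. Below is a minimal sketch of what that looks like from a tool's point of view, under stated assumptions: the identifier sets are a tiny hard-coded sample (`Apache-2.0`, `GPL-2.0-only`, `MIT`, `Classpath-exception-2.0` and `LLVM-exception` are real SPDX identifiers), while `DataMining-OptOut` is a hypothetical name invented here for illustration; no such exception exists on the SPDX list. The sketch handles only a single `LICENSE` or `LICENSE WITH EXCEPTION` pair, not the full expression grammar (AND, OR, parentheses, `+`).

```python
from dataclasses import dataclass
from typing import List, Optional

# Tiny sample of identifiers, hard-coded for illustration only. Real tooling
# would load the full SPDX License List and exceptions list instead.
KNOWN_LICENSES = {"Apache-2.0", "GPL-2.0-only", "MIT"}
KNOWN_EXCEPTIONS = {"Classpath-exception-2.0", "LLVM-exception"}

# HYPOTHETICAL identifier for a data-mining opt-out clause; nothing like this
# exists on the SPDX exceptions list.
HYPOTHETICAL_OPT_OUT = "DataMining-OptOut"


@dataclass
class SimpleExpression:
    license_id: str
    exception_id: Optional[str] = None


def parse_simple(expr: str) -> SimpleExpression:
    """Parse 'LICENSE' or 'LICENSE WITH EXCEPTION' (no AND/OR/parentheses).

    Only the uppercase WITH operator is accepted in this sketch.
    """
    tokens = expr.split()
    if len(tokens) == 1:
        return SimpleExpression(tokens[0])
    if len(tokens) == 3 and tokens[1] == "WITH":
        return SimpleExpression(tokens[0], tokens[2])
    raise ValueError(f"unsupported expression: {expr!r}")


def check(expr: str, exceptions: set = KNOWN_EXCEPTIONS) -> List[str]:
    """Return the problems found; an empty list means the pair looks valid."""
    parsed = parse_simple(expr)
    problems = []
    if parsed.license_id not in KNOWN_LICENSES:
        problems.append(f"unknown license: {parsed.license_id}")
    if parsed.exception_id is not None and parsed.exception_id not in exceptions:
        problems.append(f"exception not on list: {parsed.exception_id}")
    return problems


if __name__ == "__main__":
    print(check("Apache-2.0 WITH LLVM-exception"))
    print(check("MIT WITH DataMining-OptOut"))
    print(check("MIT WITH DataMining-OptOut", KNOWN_EXCEPTIONS | {HYPOTHETICAL_OPT_OUT}))
```

The last two calls illustrate the practical side of question (2): whether a checker like this accepts the opt-out identifier depends entirely on whether it appears on the exceptions list the tool consults, which is why the inclusion-guidelines question matters for tooling.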
J Lovejoy
On 11/15/22 12:34 PM, Luis Villa wrote:
Thanks for the links, Richard. I'll try to follow up there though of course welcome further discussion here as well.
On Thu, Nov 10, 2022 at 5:06 PM Richard Fontana <rfontana@...> wrote:
On Thu, Nov 10, 2022 at 3:01 PM Luis Villa <luis@...> wrote:
[...]
> (1) Would SPDX be an appropriate mechanism for representing that opt-out clause in a machine-readable way, eg via a short identifier + WITH?
>
> (2) This would be, to the best of my knowledge, the first proposed Exception that removes permissions[3] rather than granting new permissions. Would that be acceptable to SPDX? Would that break any implicit or explicit expectations of the specifications or tooling?
Recent exchange that is possibly slightly related to those questions:
https://github.com/spdx/change-proposal/issues/4#issuecomment-1283004681
https://github.com/spdx/change-proposal/issues/4#issuecomment-1304842184
JL: to be clear, this proposal is about an improved way to capture "exceptions" that are NOT on the SPDX License List. So it is relevant to the extent that, if such a hypothetical additional clause did not end up being eligible for inclusion on the SPDX License List, you could still represent it with an SPDX-conformant license expression.
Basically, I believe SPDX has locked itself into a model of what an
"exception" is that is based on normative FSF doctrine built up around
FSF-authorized GPL exceptions, but which does not fully reflect how
standardized license terms actually get supplemented by other terms in
the real world with the GPL and other FOSS licenses (in some cases by
removing permissions, and in some cases where it is not actually clear
whether permissions are being removed). I think this is inconsistent
with SPDX's professed mission of focusing on "just the facts".
JL: I don't think it's inconsistent with this, but it is consistent with the prior, long-standing license inclusion guidelines - see previous email.