Hi Luis,
I'm barely getting my head around the many complications
related to the reality of AI models and data, let alone the
related licensing issues, but let me try to answer some of your
questions below as to the SPDX License List and process :)
Cheers,
Jilayne
On 11/10/22 12:58 PM, Luis Villa wrote:
Hi, all-
[Starting here, though I realize SPDX cannot be a
complete answer to this problem. Also not on
spdx-ai because it isn't about AI models/data, but happy to
move discussion or cc if that makes sense.]
As you all have probably seen, one area of interesting
research in machine learning right now is training models on
source code in order to generate more source code. Whether
or not this is legal in the US is somewhat unclear, but in
the EU there appears to be more clarity: data mining is
legal, but a licensor can opt out.[1]
The W3C has done some work on how to implement this opt
out in the digital space[2] but as you would imagine it is
optimized for the web environment, not source code. So there
is, as of yet, no standardized way for source code authors
to express their desire to opt out of data mining, as is
their right under EU law.
So, some questions/thinking out loud about what role SPDX
might play in such an opt-out scheme.
Presume, for purposes of discussion, that someone else
writes a standardized data mining opt-out clause, tailored
for use with open source software, that a developer could
attempt to apply to their project.
JL: so you are thinking of a clause that would not be a stand-alone
license, but made to be used with existing open source licenses?
(assuming that is correct...)
(1) Would SPDX be an appropriate mechanism for
representing that opt-out clause in a machine-readable way,
e.g., via a short identifier + WITH?
JL: it could, potentially, be treated as an "exception". As
described on the SPDX License List exceptions page: "exceptions
grant an exception to a license condition or additional permissions
beyond those granted in a license; they are not stand-alone
licenses." That would mean the usual process: submit it, have it
reviewed by SPDX-legal, and, if it is accepted under the SPDX
inclusion guidelines, it would be assigned an SPDX id and could be
used in an SPDX license expression with the WITH operator.
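To make the WITH form concrete, here is a minimal Python sketch of how
a tool might split a simple "license WITH exception" expression into
its two parts. It is an illustration only, not SPDX tooling: the
function name split_with is made up, it handles only a single
uppercase WITH and no AND/OR or parentheses, and the identifier
Example-data-mining-opt-out is purely hypothetical (no such exception
is on the SPDX License List).

```python
from typing import Optional, Tuple


def split_with(expression: str) -> Tuple[str, Optional[str]]:
    """Split a simple 'LICENSE WITH EXCEPTION' expression.

    Hypothetical helper for illustration: it does not validate the
    identifiers and does not support AND, OR, or parentheses.
    """
    parts = expression.split()
    if len(parts) == 1:
        # A bare license identifier with no exception attached.
        return parts[0], None
    if len(parts) == 3 and parts[1] == "WITH":
        # License identifier on the left, exception identifier on the right.
        return parts[0], parts[2]
    raise ValueError(f"unsupported expression: {expression!r}")


# An existing license/exception pair from the SPDX License List.
print(split_with("GPL-2.0-only WITH Classpath-exception-2.0"))

# Hypothetical identifier, used only to show the shape a new
# opt-out clause would take if it were accepted as an exception.
print(split_with("MIT WITH Example-data-mining-opt-out"))
```

The point of the sketch is only that an accepted clause would sit on
the right-hand side of WITH, next to an existing license identifier,
rather than standing alone.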
(2) This would be, to the best of my knowledge, the first
proposed Exception that removes permissions[3]
rather than granting new permissions. Would that be
acceptable to SPDX? Would that break any implicit or
explicit expectations of the specifications or tooling?
JL: it probably would be the first. This is because the long-standing
license inclusion guidelines more-or-less followed the OSD, so we
would not accept further restrictions. Since the license inclusion
guidelines were updated and loosened a bit a couple of years ago, we
have not explicitly discussed a revised policy as to exceptions.
(3) Because this is a restriction for a specific use case
it might not be OSD-compliant, or might not be
GPLv2-compliant. Without trying to answer here
whether it is OSD-compliant, what requirements would SPDX
want to see met? Would OSI review/approval be necessary?
"Mere" deployment/usage in the wild? Other?
JL: well, see above; the factors in the license inclusion
guidelines would still apply.
This is not a purely hypothetical question, for what it
is worth - people in the AI community (specifically, part of
the BigCode project[3]) are actively trying to figure this
out right now, and I'd like to be able to build a bridge
there if this group thinks it would be appropriate.
Thanks-
Luis