update on only/or later etc.
J Lovejoy
Hi All,
Kate and I just had a call with Richard Stallman of the FSF to try and come to a resolution everyone can be happy with, taking into consideration the ask from the FSF and the many thorough discussions we’ve had on the mailing list and calls. This is similar to an approach we discussed on the last call, with one variation. As such, I’d like to propose the following path forward (again, using GPL-2.0 but for all GNU licenses):

Deprecate the “GPL-2.0” identifier and add the word “only” for GPL version 2 only, e.g., “GPL-2.0-only”
- this should not be problematic as it does not change the meaning of the identifier. GPL-2.0 has meant ‘version 2 only’ since the SPDX License List was born. We are simply adding explicit language for the identifier. No backwards compatibility issues in terms of the meaning.
- we can do a “warning” for people using the deprecated identifier for a period before “GPL-2.0” becomes invalid to give people a chance to update. This will also encourage people who have been sloppy to fix their sloppiness.

Add GPL version 2 or later back to the SPDX License List as its own entry with the short identifier of “GPL-2.0+” or “GPL-2.0-or-later”
- This would essentially put us in the same position we are now: with two options - “only” and “or later” - it just alters how one gets there, where one finds it
- this would also put both options back on the license list, thus highlighting more obviously that the GNU licenses provide these options and hopefully providing a more overt encouragement to use one or the other
- the identifier here could be “GPL-2.0+” (same as always) or “GPL-2.0-or-later” (differentiation from the + modifier might be better for tooling?) - we can discuss which is better, FSF is fine with either.
- if we go with “GPL-2.0-or-later”, can take same approach with warning re: “GPL-2.0+” then invalid?

Keep the + modifier in the license expression language
- this allows use of + with other licenses as always, no change, no backwards compatibility issues

Do NOT add an identifier or operator, etc. for the found-license-text-only scenario where you don’t know if the intent of the copyright holder was “only” or “or later” and are thus left to interpret clause 9
- on the last call, we came up with two proposals that both incorporate 3 options for each GNU license, see: https://wiki.spdx.org/view/Legal_Team/Minutes/2017-11-09
- the above proposal is the same as “Paul’s alternative” / hard-coded proposal but omits adding the “text alone” option
- we don’t need to solve this right now and we can always add this option later
- without adding a third option, we are in the same position we have been in since the birth of the SPDX License List.

Incremental changes have always been our go-to strategy; let’s take a first step to clarify the current identifiers in a way that the FSF can get behind. If, for a later release, we think we need this third option, then we can discuss that further once we have some time under our belts with this change.

I am really hoping we can all get behind this approach and spend the time on Tuesday’s call discussing the specifics of implementation, whatever else needs to be done for the next release (for this change and generally), and then get the next release out in time for a nice Christmas present to us all :)

Thanks,
Jilayne
SPDX Legal Team co-lead
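The "warning for a period before the old identifier becomes invalid" step could be sketched roughly as follows. This is only an illustration under stated assumptions: the mapping table, the function name `normalize_identifier`, and the use of "GPL-2.0-or-later" (rather than keeping "GPL-2.0+") are hypothetical choices, since the thread had not yet settled on the or-later spelling.

```python
import warnings

# Hypothetical table of deprecated identifiers and their explicit replacements.
# The GPL-2.0 -> GPL-2.0-only pair comes from the proposal; the "-or-later"
# spelling is an assumption, as the thread left that choice open.
DEPRECATED_IDS = {
    "GPL-2.0": "GPL-2.0-only",
    "GPL-2.0+": "GPL-2.0-or-later",
}


def normalize_identifier(license_id: str) -> str:
    """Return the explicit identifier, warning if a deprecated one was used."""
    replacement = DEPRECATED_IDS.get(license_id)
    if replacement is None:
        # Not a deprecated identifier: pass it through unchanged.
        return license_id
    warnings.warn(
        f"'{license_id}' is deprecated; use '{replacement}' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```

Because the meaning of "GPL-2.0" does not change, a tool along these lines can rewrite the identifier mechanically during the warning period and reject it only once it becomes invalid.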
|
W. Trevor King
On Thu, Nov 16, 2017 at 05:37:50PM -0700, J Lovejoy wrote:
> Deprecate the "GPL-2.0" identifier and add the word “only” for GPL

I think this “deprecation with an eventual removal” approach is part of all of the proposals, and is not unique to the “coin new per-version license identifiers” approach.

> Keep the + modifier in the license expression language

I am strongly against having both a ‘GPL-2.0+’ license ID and a ‘+’ operator. I think committing to a ‘GPL-2.0+’ license ID is an unfortunate but tenable position. And if we go that way, I'd rather remove the ‘+’ operator entirely.

I'd be ok with ‘GPL-2.0-or-later’ while preserving the ‘+’ operator for other licenses. But if a ‘+’ operator is deemed not good enough for the GPL, which licenses would it be good enough for? This feels like “we don't know when we'd recommend ‘+’, but didn't have the heart to kill it”. Personally, I think the ‘+’ operator *is* good enough for the GPL, but if that view was universal we wouldn't be adding an or-later license ID. If we cannot build a consensus around using ‘+’ for the GPL, I'd rather drop it entirely.

My concern with coining license identifiers for ‘GPL-2.0-or-later’ and similar is the combinatoric increase in license identifiers, and that's more of an aesthetic concern than a technical concern (although there are some technical impacts, e.g. the size of license-list-XML and license-list-data will grow).

Cheers,
Trevor
|
Paul Madick
This is really great news. This was a difficult issue that sparked a lot of folks to join our SPDX Legal/Technical calls to the point we needed more con call space. We are fortunate to have such a vibrant community concerned with the particulars of the SPDX License List.
The solution addresses the primary concerns raised by Richard Stallman and the FSF while preserving the effectiveness of the SPDX License List for multiple use cases. There are still additional opportunities to further refine the SPDX License List to address situations that were not previously handled. I look forward to revisiting those issues in the future.
Paul
|
|
Karen C.
There are so many things I admire about the people involved and the process that has been followed to get to this proposal for consensus. Many thanks for all Jilayne and Kate and so many others have done to bring SPDX to a point that exceeds all of our expectations.
|
Brad Edmondson
Wow! Hopefully this resolves this issue for the foreseeable future (as I think it should). I echo Karen's sentiments -- great work! As far as the next release, to my mind, the biggest open issue is adding XML for the recently added licenses, which I think should be 2.6+. I haven't done a careful check, but based on a quick scan of the Google Sheets document, that looks like it could be:
And perhaps also some/all of the licenses still under review:
Then we should add the accepted exceptions:
And perhaps the same for exceptions under review, although I'm not as familiar with these and they may be stale at this point. But as marked, these are "under review":
Best, Brad
|
Gary O'Neall
I think this is a good overall solution.
It solves the issue raised by the FSF and is reasonably compatible. On the last legal call, I raised a concern that it didn't handle the case where the version may be ambiguous. After the call, I realized that we have this issue today and we don't really need to solve this in this release of the license list. Probably better to solve one issue at a time, and I have no problem starting with the issue raised by Richard and the FSF. Thanks Jilayne for moving this forward.

Additional thoughts on the '+' operator below:

I agree with Trevor that we should not have both the + modifier and the GPL-2.0+ as a license ID as it makes the parsing ambiguous. My preference would be GPL-2.0-or-later and preserving the '+' operator. The '+' operator could be useful for licenses where they do not explicitly handle the 'or later' versions in the license text and it maintains better compatibility.

Cheers,
Gary
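One way to picture the parsing ambiguity is the sketch below. It is a hypothetical illustration, not a description of any existing SPDX parser: the function `readings` and the two identifier sets are assumptions, and the sets include the deprecated "GPL-2.0" because the proposal keeps it valid during the warning period.

```python
def readings(expression, license_ids):
    """List every way a single-term expression can be interpreted.

    Hypothetical helper: handles only one identifier with an optional
    trailing '+', which is enough to show the overlap.
    """
    found = []
    if expression in license_ids:
        # The whole string is itself a license ID on the list.
        found.append(("identifier", expression))
    if expression.endswith("+") and expression[:-1] in license_ids:
        # The string is a listed license ID followed by the '+' operator.
        found.append(("operator", expression[:-1]))
    return found


# License list where the or-later entry is spelled "GPL-2.0+".
WITH_PLUS_ID = {"GPL-2.0", "GPL-2.0-only", "GPL-2.0+"}
# License list where the or-later entry is spelled "GPL-2.0-or-later".
WITH_OR_LATER_ID = {"GPL-2.0", "GPL-2.0-only", "GPL-2.0-or-later"}

print(readings("GPL-2.0+", WITH_PLUS_ID))      # two readings
print(readings("GPL-2.0+", WITH_OR_LATER_ID))  # one reading
```

With "GPL-2.0+" on the list as an ID, the same string matches both as an identifier and as "GPL-2.0" plus the operator; with "GPL-2.0-or-later" only the operator reading remains.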
|
Philip Odence
Great. We will start calling you two Kings Solomon.
|
|
Jilayne Lovejoy <opensource@...>:
> Do NOT add an identifier or operator, etc. for the found-license-text-only scenario where you don’t know if the intent of the copyright holder was “only” or “or later” and are thus left to interpret clause 9

This "resolution" doesn't solve the problem.

Since tools are not yet sentient, tools often *cannot* determine if "or later" was intended. Yet "don't know" makes a tool useless, and it *did* see a copy of a license, so the tool *will* report something. Tools will probably report "GPL-2.0-only" when they only see the GPL-2.0. As a result, soon "GPL-2.0-only" will not IN PRACTICE mean "only GPL-2.0".

I'm fine with "GPL-2.0-only" and special-casing "GPL-2.0+", but we *STILL* need a way to indicate "GPL-2.0 at least and I don't know if later versions are okay". People depend on automated tools, and automated tools often CAN'T figure out the "or later" question.

There are a million ways to indicate "I don't know if a later version is okay", e.g., "AT LEAST" or "?" suffix, MAYBE operation, etc. But if SPDX can't represent this common case, then people will overload *other* expressions with this alternative meaning, meaning that the "only" soon won't have that meaning.

--- David A. Wheeler
|
J Lovejoy:
> Do NOT add an identifier or operator, etc. for the found-license-text-only scenario where you don’t know if the intent of the copyright holder was “only” or “or later” and are thus left to interpret clause

I disagree, sorry.

> - we don’t need to solve this right now and we can always add this option later

No, this is the *reason* that there's a problem.

The *reason* that "GPL-2.0" isn't working is, in part, because it overloads two notions. "GPL-2.0" is supposed to mean "Only 2.0" (per the spec). But tools only know "I saw a GPL-2.0 license", so how can they represent that information? The obvious way is "GPL-2.0", so that same identifier can mean "2.0 at least, and I don't know if there are other versions allowed". That's not good.

If we wait to "add this option later", "GPL-2.0-only" will probably have morphed in *practice* into "GPL-2.0 at least, and I don't know if it's the only version". So while everyone can congratulate themselves about the clarity of the spec, very soon it will predictably be *unclear* in practice.

If we want to be able to express "exactly this version", we also need to be able to represent "at least this version".

--- David A. Wheeler
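The overloading concern can be made concrete with a small sketch. Everything here is hypothetical: the `LaterVersions` enum, the `to_identifier` function, and the "-or-later" spelling are illustrative assumptions, and the fallback to "-only" is the behavior predicted above for tools, not something any specific scanner is known to do.

```python
from enum import Enum


class LaterVersions(Enum):
    """What a scanner actually knows about 'or later' (hypothetical model)."""
    ONLY = "only"
    OR_LATER = "or-later"
    UNKNOWN = "unknown"  # saw the license text, found no version statement


def to_identifier(base: str, later: LaterVersions) -> str:
    """Squeeze a three-state finding into the two proposed identifiers."""
    if later is LaterVersions.OR_LATER:
        return f"{base}-or-later"
    # There is no identifier for UNKNOWN, so this sketch falls back to
    # "-only". That fallback is the overloading being objected to.
    return f"{base}-only"


# Two different findings collapse into the same reported identifier.
print(to_identifier("GPL-2.0", LaterVersions.ONLY))
print(to_identifier("GPL-2.0", LaterVersions.UNKNOWN))
```

Both calls yield the same string, so a reader of the output cannot tell an affirmative "only" from "text found, intent unknown".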
|
Brad Edmondson
Hi David,

I think your points are good ones, but it seems to me they go to the separate issues of "file:detected license" and "package:concluded license." The clarity of the spec argument is aimed at making the "file:detected license" case more explicit, and if it leaves tools with NOASSERTION for "package:concluded license," then I think that's OK, no?

Best,
Brad
|
John Sullivan <johns@...>
J Lovejoy <opensource@...> writes:
> Hi All,

Thanks to everyone for working with us on this!

-john

--
John Sullivan | Executive Director, Free Software Foundation
|
Brad Edmondson [mailto:brad.edmondson@...]
> I think your points are good ones, but it seems to me they go to the separate issues of "file:detected license" and "package:concluded license."

No, it fails to work for multiple reasons:

1. "NOASSERTION" is basically useless, because it provides no information. In many cases, all I need to know is "there's a version of the GPL here", and I can make a decision. Being able to provide *some* information is often all that's needed, while providing *no* information creates a lot of unnecessary work for tool users.

2. Tools, lacking sentience, often cannot determine whether or not "or later versions" applies. So they're unable to be "more explicit" in package:concluded. The current structure requires that they conclude either "only 2.0" or "2.0 or later"... even though tools typically CANNOT make that determination. SPDX should make it possible to report the information *actually* available.

3. The majority of SPDX users do not use SPDX files. Instead, they *only* use SPDX license expressions (as available in package managers, file content declarations, etc.). So there's no "file:detected" vs. "package:concluded" entries to compare anyway.

--- David A. Wheeler
|
Gary O'Neall
I understand and agree with David's concerns - also coming from a tooling perspective.
However, I believe this is a different problem than the FSF issue and a problem we have today with the current license expression syntax and the current license list.

It seems we are talking about 2 different usage scenarios for SPDX license expressions:

1) Someone is using a license expression to document what they "know" or assert is the license for a file or package. For example, the copyright owner is adding an SPDX license ID in their file headers.

2) Someone or something is documenting findings on license information for files or packages. For example, a license scanning tool.

For #1, we don't want to allow someone to be ambiguous about whether a GPL license is "only" or "or later" when describing a license using SPDX license expressions. I believe this is the issue the FSF is concerned about.

For #2, we will find situations where it is not clear if a GPL license is to be used "only" with that version or with that version or later (BTW - it's not just tools that have this problem). We would like to be able to express this situation using SPDX since it is very useful information.

On the last legal call, it seemed clear to me that our attempts to solve #2 created a great deal of concern for those trying to solve #1.

In order to make progress, I still feel we should divide and conquer, solving the FSF issue first and then addressing the ambiguous license version issue in a future release of the spec. Perhaps we can come up with a more generalized solution for ambiguous license findings for #2 if we had more time to design and discuss the solution.

One additional thought: We could use a LicenseRef to document the exact text of the ambiguous license version and add a license comment to indicate it is GPL, just not clear which version. The LicenseRef approach would only work for SPDX documents and would provide more information than a NOASSERTION.

Gary
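The LicenseRef idea might look roughly like the sketch below, which builds the relevant fields of an SPDX document in tag-value form. The helper `license_ref_block`, the suffix "GPL-2.0-text-found", and the comment wording are all hypothetical; the field names (`LicenseID`, `ExtractedText`, `LicenseComment`) are given as understood from the SPDX 2.1 tag-value format and should be checked against the spec before use.

```python
def license_ref_block(ref_suffix: str, extracted_text: str, comment: str) -> str:
    """Build the 'other licensing information' fields for one LicenseRef.

    Hypothetical helper; field names assume the SPDX 2.1 tag-value format.
    """
    return "\n".join([
        f"LicenseID: LicenseRef-{ref_suffix}",
        # The exact license text that was found, kept verbatim.
        f"ExtractedText: <text>{extracted_text}</text>",
        # Free-form note recording that the version intent is unclear.
        f"LicenseComment: <text>{comment}</text>",
    ])


print(license_ref_block(
    "GPL-2.0-text-found",  # hypothetical suffix
    "(full text of the license file as found)",
    "GPL license text found; unclear whether 'only' or 'or later' was intended.",
))
```

As noted above, this only helps where a full SPDX document is exchanged; a bare license expression has nowhere to carry the comment.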
|
J Lovejoy
If this is a potential problem once GPL-2.0 is changed to GPL-2.0-only, then it is currently a problem.

And perhaps by altering the current identifier (GPL-2.0) to be more explicit (GPL-2.0-only) we will expose just how often GPL-2.0 has been used incorrectly. That may provide better examples to work off of to decide what ‘third option’ we need.

Just a reminder to all: when someone places a copy of the GPL, version 2 alongside source code files this does not make the licensing ambiguous; clearly there is a valid license. The question comes down to how you interpret clause 9:

- does the language, "If the Program specifies a version number of this License which applies to it and 'any later version,' you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation.” mean that placing a copy of the license is “specifying a version”, and thus a user can redistribute the code under GPL version 2 (GPL-2.0-only) or, as some people possibly read it, under GPL version 2 or any later version (GPL-2.0+)?
- or does placing a copy of a version of the license NOT constitute specifying a version, so that the sentence "If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.” applies, in which case one can redistribute the code under GPL-1.0+?

Some people have made this determination, e.g. Fedora, https://fedoraproject.org/wiki/Licensing:FAQ?rd=Licensing/FAQ#How_do_I_figure_out_what_version_of_the_GPL.2FLGPL_my_package_is_under.3F

Any scenario you could interpret, we have a way to express that currently and would still under the proposal.

While on this subject, an article that appeared on opensource.com came up on the last call. I just want to point out that that article, which explains the above interpretation issues (which we have been talking about for several months), does not reach a conclusion but simply encourages people to provide clarity of their intentions. We can certainly all agree on encouraging that! https://opensource.com/article/17/11/avoiding-gpl-confusion

(Although, I think we should consistently encourage people to use the standard license notices provided by the license and/or SPDX short identifiers) :)

Thanks,
Jilayne
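The point that each reading of clause 9 already has an expression can be laid out as a small lookup. The dictionary keys and the `express` function are hypothetical labels; the expressions are the ones named in the message above, with the "+" forms left as written because the or-later spelling was still undecided.

```python
# Hypothetical labels for the readings of clause 9 described above, each
# mapped to the expression the message says would represent it.
CLAUSE_9_READINGS = {
    "copy_specifies_version_2_only": "GPL-2.0-only",
    "copy_specifies_version_2_or_later": "GPL-2.0+",
    "copy_specifies_no_version": "GPL-1.0+",
}


def express(reading: str) -> str:
    """Return the expression for a chosen reading; KeyError if unknown."""
    return CLAUSE_9_READINGS[reading]


print(express("copy_specifies_no_version"))
```

The lookup only covers a reviewer who has picked a reading; it says nothing about a finding where no reading has been chosen, which is the case debated elsewhere in the thread.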
|
J Lovejoy [mailto:opensource@...]:
> If this is a potential problem once GPL-2.0 is changed to GPL-2.0-only, then it is currently a problem.

Yes indeed, that's my point :-).

> And perhaps by altering the current identifier (GPL-2.0) to be more explicit (GPL-2.0-only) we will expose just how often GPL-2.0 has been used incorrectly.

The tools are currently *required* to be incorrect, because they cannot report the information they have ("I have GPL-2.0, and I don't know if 'or later' applies"). Neither the proposed "GPL-2.0-only" nor "GPL-2.0+" correctly represents the information they have. Tools will have to output *something*, and whatever they produce will dilute in *practice* the strict meanings of those license identifiers.

--- David A. Wheeler
|
Philippe Ombredanne
On Tue, Nov 21, 2017 at 5:28 PM, Wheeler, David A <dwheeler@...> wrote:
> J Lovejoy [mailto:opensource@...]:
>> If this is a potential problem once GPL-2.0 is changed to GPL-2.0-only, then
> Yes indeed, that's my point :-).

David,

Speaking as the author of a fine license detection engine, I think this is a red herring. A license detection result can be: "I am 95% sure this is GPL-2.0-only but it could be GPL-2.0+: please review me to fill in your conclusion."

So detection does not have to be binary as in either 100% right or 100% wrong. If a tool can only report red or blue binary results, that's a possibly fine but weak tool.

For instance scancode-toolkit can cope with ambiguity alright and surface this for review when it cannot come up with a definitive detection answer.

Therefore I have no issue whatsoever to implement Jilayne's comprehensive proposal and I can always output something on my side.

So since this can be done by one tool alright this is NOT an issue for the SPDX spec to worry about and tools should adjust: that's for tools implementors to cope with ambiguity, not something to specify here. Please let's keep this spec simple!

--
Cordially
Philippe Ombredanne
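A non-binary detection result of the kind described could be modeled as in the sketch below. This is not scancode-toolkit's actual data model or API; the `Detection` class, its field names, and the review rule are hypothetical, chosen only to mirror the "95% sure, but could be GPL-2.0+" example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Detection:
    """A scanner finding that carries uncertainty (hypothetical model)."""
    best_guess: str                      # most likely license expression
    score: float                         # confidence between 0.0 and 1.0
    alternatives: List[str] = field(default_factory=list)

    def needs_review(self) -> bool:
        # Flag for human review when the tool is not fully sure or when
        # it knows of other plausible expressions.
        return self.score < 1.0 or bool(self.alternatives)


finding = Detection("GPL-2.0-only", 0.95, ["GPL-2.0+"])
print(finding.needs_review())  # True
```

In this model the ambiguity lives in the tool's own output alongside the expression, rather than in the expression syntax itself.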
|
Philippe Ombredanne:
I think there is no contention there at all.Respectfully: There *IS* contention. I'm contending. A summary (e.g. a license expression) cannot ever capture all the nuancesSure, but all summaries, and all models, omit something. Indeed, a SPDX license file *also* cannot capture all the nuances. The correct question is, "is this model adequate for its uses?" In most cases people want to know, "is this package legal to use?". To answer that question, "it's at least GPL-2.0, and might be more" s important information, and I think it's information that the SPDX license expression should include. Speaking as the author of a fine license detection engine, I think this is aThis inability to indicate the "in-between" state within a license expression greatly increases the number of cases where an unnecessary review must occur. Every unnecessary review is a significant increase in time and money. In many cases, it's *NOT* necessary to make a decision, but in some cases it is. If organizations can do the analysis *ONLY* when they need to, they'd save a lot of time and money... and that is greatly aided by having SPDX license expressions able to indicate this information. So detection does not have to be binary as in either 100% right or 100%But that's what I'm saying. Most tools CAN provide more than 2 answers. The problem is that the SPDX license expressions don't allow tools to report more than the 2 answers within a license expression. So the tool doesn't have to give a binary answer, but SPDX forces the tools to do so when they output SDPX license expressions. For instance scancode-toolkit can cope with ambiguity alright and surfaceBut it CANNOT surface this information via SPDX license expressions. For most people, that's the ONLY thing that matters. I suspect at most 0.1% of SPDX users use SPDX files, everyone else ONLY uses SDPX license expressions. The percentage of SPDX users who use SPDX files may not be that high :-). 
> Therefore I have no issue whatsoever to implement Jilayne's comprehensive

You can always output something nonstandard that cannot be shared, sure, and for many detailed analyses that's a good thing. But that's less helpful for sharing compared to a standard format.

> So since this can be done by one tool alright this is NOT an issue for the

Well, empty specs are the simplest possible :-). Specs need to be as simple as possible... but no simpler.

There's also the long-term damage this decision will cause. In practice, I expect failing to add this capability is going to make "GPL-2.0-only" mean the same thing as "I saw a GPL-2.0 and I don't know if 'or later' applies" - and as a result "GPL-2.0-only" will NOT mean "GPL-2.0-only" as intended. The case of "I see a license and no other information" is relatively common, and is *important* for determining what is legal to do.

--- David A. Wheeler
|
Philippe Ombredanne
David:
You bring up good points. Here are my counterpoints:

On Fri, Nov 24, 2017 at 5:15 PM, Wheeler, David A <dwheeler@...> wrote:
> Philippe Ombredanne:
>> I think there is no contention there at all.
> Respectfully: There *IS* contention. I'm contending.
>> A summary (e.g. a license expression) cannot ever capture all the nuances
> Sure, but all summaries, and all models, omit something. Indeed,

You are making an assumption about what the common use case might be. To me the common use case is more simply: what's the license? Whether this is "legal" or not is something you or your legal adviser can decide based on this. And practically, "legal" is more often than not a policy choice instead, whether you are a FLOSS project author or a consumer of FLOSS code.

> To answer that question, "it's at least GPL-2.0, and might be more"

Is it really important to know this fact in the general case? In my own experience the cases where I need hyper precision on GPL-2.0 vs GPL-2.0+ are rather limited:

1. I am combining GPL 2 and GPL 3 code
2. OR I want to use GPL 3 for GPL 2-licensed code

These cases are extremely rare for consumers of FLOSS code, based on my reasonably wide and many years of experience in this space... So rare in fact that they account for a handful across a thousand+ products and billions of LOC. So rare that I cannot recall any off the top of my head. In each case they require careful legal review before making a decision. Making this careful decision solely on the few characters of a license expression would be insanely foolish IMHO. I am not sure SPDX needs to worry about or cater to this.

In every other case, the GPL2 vs GPL2+ debate does not matter much, as it is still the same GPL terms that apply: same permissions and same obligations.

>> Speaking as the author of a fine license detection engine, I think this is a
> This inability to indicate the "in-between" state within a license expression

Again, the cases where you need precision vs. good enough accuracy in the GPL2/GPL2+ debate are rare.
99% of the time, you do not need this precision at all.

Now, I could not agree more with you: inaccurate and unclear licensing information means that a user will need to review it to ensure it is clear. But this is NOT a problem for SPDX to solve in the license expression spec. This is something that needs to be fixed by working with every project author so that there is clarity, such as the work Kate and I have done and are doing with Linux maintainers to make the kernel licensing hyper clear. Or the tickets I routinely file with projects that lack a clear license. That's solving the problem IMHO: i.e. let's not react to the symptoms, but attack the root cause instead. And there SPDX and license expressions are a great way to make things clear upstream once reviewed. They are not a substitute for a review.

FWIW, having an initiative to systematically help project authors clarify licensing is something that I have had in mind for quite a while. I may do something about it eventually.

>> So detection does not have to be binary as in either 100% right or 100%
> But that's what I'm saying. Most tools CAN provide more than 2 answers.

I can output more than one expression then, can't I?

>> For instance scancode-toolkit can cope with ambiguity alright and surface
> But it CANNOT surface this information via SPDX license expressions.

It surely could (NB: it does not yet). That's a minor change, e.g. something like a list of license expressions with a confidence:

- confidence: 100%, expression: GPL-2.0-only
- confidence: 60%, expression: ((GPL-2.0-only or GPL-2.0+) and MIT)

Each expression is valid, right?

> I suspect at most 0.1% of

Would you have data or pointers to support these assertions about SPDX usage? That would be mighty useful!
>> Therefore I have no issue whatsoever to implement Jilayne's comprehensive
> You can always output something nonstandard that cannot be shared, sure,

I think we had a similar discussion a while back about adding something like a scope or purpose in the license expression syntax. This is the same here: I can convey one or more license expressions with a confidence attached if needed. The confidence or score is not part of the expression but some external attribute that qualifies it. I am not talking about outputting anything "non-standard", whatever this may be: instead, external data about an expression is best handled externally. When in an SPDX doc, there are ways to deal with it; outside of it, you need to track other data attributes that would otherwise be supported by an SPDX doc.

To take a (likely bad) analogy: what you are suggesting is somewhat similar to storing the SHA1 of a file inside the file itself. This will change the file content... and then you need to recompute the SHA1 value because of this. And store it inside the file, and recompute, and so on... forever. External observations about something (here the confidence you may attach to a certain license expression) are best managed outside the observed thing, otherwise they modify the thing under observation. Therefore, I track a file's SHA1 outside of the file itself and not inside. And I see it best to track the confidence or score I can attach to a license expression outside of this expression. And if we want to have this in SPDX, this would mean adding an attribute to qualify a license expression ("confidence"), not adding this to the expression syntax IMHO.

>> So since this can be done by one tool alright this is NOT an issue for the
> Well, empty specs are the simplest possible :-).

Are you suggesting that the SPDX expression spec is empty? (*cough*) Or that the SPDX spec is empty?
(*cough, cough*) I tend to think of it as a tad too fat and in need of a good diet instead ;)

> There's also the long-term damage this decision will cause.

I do not grok what you mean there. Can you clarify? Which part of "only" is not clear to you? Why would "GPL-2.0-only" suddenly mean anything other than its definition in SPDX, as carefully crafted by experienced and FLOSS-savvy lawyers (hat tip) and as agreed and reviewed with the FSF, which is the GPL authority without any possible argument (other hat tip)?

> The case of "I see a license

Do you have data to support this? My personal experience is that this case is not so common. And again, even if it were pervasive and the norm, the cases where I need hyper precision to determine "what is legal to do" are rare, as I explained at first and repeat here for clarity:

1. I am combining GPL 2 and GPL 3 code
2. OR I want to use GPL 3 for GPL 2-licensed code

Outside of these two rare cases, a user of GPL-2.0-licensed code will not care much about this: "what is legal to do", i.e. which GPL 2.0 permissions and obligations apply, is clear and unambiguous: this is all that needs to be known. The eventual lack of precision here is not a problem for me or for the many users of GPL-licensed code I have helped, and still help, to comply.

And yet, Jilayne's proposal makes these rare cases **crystal clear** going forward: so this is all gravy to me!

--
Cordially
Philippe Ombredanne
+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com
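The list-of-expressions idea in the message above (each license expression paired with a confidence that lives outside the expression string) could look roughly like the following Python sketch. The `ScoredExpression` class and the `needs_review` helper are hypothetical names made up for illustration; this is not scancode-toolkit's actual output format or API, and the expression strings are carried as opaque text without any validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredExpression:
    """Hypothetical record: a license expression plus an external confidence."""
    expression: str    # the expression string itself is left untouched
    confidence: float  # 0.0 to 1.0, stored next to the expression, not inside it


def needs_review(results, threshold=1.0):
    """Return the results whose confidence is below the threshold."""
    return [r for r in results if r.confidence < threshold]


# The two example entries from the message, with percentages as fractions.
results = [
    ScoredExpression("GPL-2.0-only", 1.0),
    ScoredExpression("((GPL-2.0-only or GPL-2.0+) and MIT)", 0.6),
]

for r in needs_review(results):
    print(f"review: {r.expression} (confidence {r.confidence:.0%})")
```

Because the confidence is a separate field, the expression can be passed on unchanged to anything that expects a plain license expression, which is the point of the SHA1 analogy: the observation is stored beside the observed thing.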
|
David A. Wheeler:
Philippe Ombredanne [mailto:pombredanne@...]
>> To answer that question, "it's at least GPL-2.0, and might be more"
> Is this really important to know this fact in the general case?

Yes, there are a number of cases where it's important. The usual reason is because I'm trying to link Apache-2.0 licensed code with other code, a non-problem for GPL-2.0+ but widely considered a problem for GPL-2.0 only. The Apache-2.0 license is extremely common.

On the other hand, there are many other cases where it's not important. Which is why it's important to know in some cases, and important to *not* track it down when it's unimportant.

> Making this careful decision solely on the few characters of a license

Not at all. What matters in many circumstances is just being able to show some sort of due diligence. In many cases, the "usual" situation is to copy & paste code, regardless of license or legality. Any improvement over *that* is a big win.

> Now, I could not agree more with you: inaccurate and unclear licensing

I *heartily* endorse that work, thank you! But for every license you add, someone creates another project with unclear licensing. The *real* root causes are going to be difficult to fix:

* A large proportion of software developers are self-taught (& so don't know about the laws), and of the rest, schools typically fail to teach CS students about software-related laws. You can teach one, but the next developer will do the same thing.
* We have a VC/business culture that often values speed of development over legality.
* Many software developers are young & only know other young developers, so they don't have anyone more experienced to learn from (or discount the knowledge of those who *have* suffered the problems before).
* Many software developers, especially young/inexperienced developers, incorrectly think that laws don't apply to software; I blame in part the RIAA, who have successfully convinced the latest software developers that copyright is not a real law.
* Copyright law as-written is very complex, and is so obviously bought off by special interests, that it's difficult to defend, and that makes it difficult to get many developers to take it seriously.

You can fix a few egregious cases with tickets, and please do. But you're *not* going to fix these root causes with a few tickets. Education is *great*, but for the foreseeable future we're going to continue to have problems.

> It surely could (NB: it does not yet). that's a minor change.

That's not a standard SPDX license expression. SPDX license expression syntax could add a "confidence" value - but that's more complex, and I don't think you're seriously proposing it. Why not just a simple expression that indicates uncertainty of new versions?

I agree that'd be useful - I don't have anything great. Here's one try. A Google search of "filetype:spdx" returns 164 results. Clearly ".spdx" files are not lighting the world on fire. Contrasting this to SPDX license expressions, we have to look at their uses, which include package managers, in-file statements, and simple tools that just report SPDX license expressions (e.g., Ruby's LicenseFinder).

Many package managers use SPDX license expressions to indicate the package license. E.g., NPM does: https://docs.npmjs.com/files/package.json by using the "license:" field - which is *NOT* a SPDX license file. According to <http://modulecounts.com/>, *just* the NPM ecosystem has 550,951 modules as of Nov 24, with 535 new packages a day on average. I don't know what percentage of modules have a "license:" entry (is someone willing to find out?) - but for discussion, I'll guess it's at *least* 10%. That would mean that there are 55,095 NPM packages that use SPDX license expressions. This is a quick try; it'd be possible to get a more accurate estimate.
But if you add all the other package managers where SPDX license expressions get used, and the per-file entries, I think it's clear that SPDX use is *primarily* the use of SPDX license expressions.

> External observations about something (here the confidence you may

No. *All* observations are external, there are no exceptions. Even if a file is specifically labelled as a license, it might have been added by someone not authorized to do so. More philosophically, I cannot observe the world "directly"; I can only perceive the world through my senses, which in turn are mediated by my brain.

It is very valuable to be able to say, "the final result of my analysis" in a single computer-processable expression. Especially since that "final" analysis can in turn be used as an input for a larger analysis.

> Are you suggesting that the SPDX expression spec is empty? (*cough*) Or

No, I'm suggesting that simplicity as the *only* criterion is not enough; it needs to be balanced with other needs.

> (*cough, cough*) I tend to think it as a tad too

>> There's also the long-term damage this decision will cause.
> I do not grok what you mean there. Can you clarify?

Oh, I *understand* the proposal very well. The problem is that I think it's ignoring some key facts on the ground. I've said it several different ways, but I'll try again.

Many tools CANNOT determine whether "or any later version" applies in all cases. They *CAN* determine if a copy of the GPL-2.0 exists. These tools WILL NOT report "UNKNOWN", because that's useless. People are using these tools, and will continue to do so. So, the tools will report "GPL-2.0-only" when they see "GPL-2.0" and don't know if "or later" applies.

> Why would "GPL-2.0-only" suddenly be meaning anything else that its

The result: "GPL-2.0-only" WILL NOT mean "2.0 only" no matter how much text is written in the spec. It will mean "GPL-2.0, and we don't know if or later applies". It will mean that, because the spec fails to give tool writers any alternative to report.
Thanks!! Regards, --- David A. Wheeler |
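The back-of-the-envelope NPM estimate in the message above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the module count is the figure the message cites from modulecounts.com, and the 10% share is the message's own guess rather than measured data. The function name is made up for illustration.

```python
NPM_MODULES = 550_951       # module count cited in the message (as of Nov 24)
ASSUMED_LICENSE_PCT = 10    # the message's guess, NOT a measurement


def estimate_with_license_entry(total_modules, percent):
    """Estimate how many packages carry a "license" entry (rounded down)."""
    return total_modules * percent // 100


estimate = estimate_with_license_entry(NPM_MODULES, ASSUMED_LICENSE_PCT)
print(f"{estimate:,} packages")
```

Changing `ASSUMED_LICENSE_PCT` shows how sensitive the conclusion is to that one guess, which is why the message asks for someone to measure the real share.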
|
Philippe Ombredanne
David,
On Fri, Nov 24, 2017 at 10:33 PM, Wheeler, David A <dwheeler@...> wrote:
> David A. Wheeler:
> Philippe Ombredanne [mailto:pombredanne@...]
>>> To answer that question, "it's at least GPL-2.0, and might be more"

I understand your point, but __how many times__ did you ever encounter this case in the real world? On my side, I have analyzed 1000+ significant software products, 10,000+ packages and billions of lines of code over the last 10 years. An issue of Apache-2.0 compatibility with the GPL-2.0 has never shown up: zero cases, not one single time. I am not saying it does not exist in theory, but in practice this is a rare case that is exceptional enough to be best left aside.

> On the other hand, there are many other cases where it's not important.

My point is that it is so rare that it is NOT important to track in the license expression spec at all. This can be dealt with in comments, or anything else, but not within a license expression syntax. There are likely tens of other crooked use cases that cannot be expressed precisely with a license expression, yet they are too rare to consider.

>> Making this careful decision solely on the few characters of a license
> Not at all. What matters in many circumstances is just being able to show

Are you serious there? Where in the actual real world is anyone looking for "being able to show some sort of due diligence" and considering this enough? That does not sound reasonable. Who does this? I would have a field day looking at such a codebase.

> In many cases, the "usual" situation is to copy & paste code, regardless of license or legality.

Where do you get that the "usual" situation is to copy & paste code? Based on my long experience, copy/paste of snippets is a rare event and usually accounts for only a handful of items even in very large product codebases. And it is even rarer that the license or origin was not tracked.
This is not the norm I have experience with: I have only ever met a couple of confused software development teams doing serious copying of untracked snippets.

>> Now, I could not agree more with you: inaccurate and unclear licensing
> I *heartily* endorse that work, thank you! But for every license you add,

Really, do you have data to back this? Note also that we should not care if "someone creates another project with unclear licensing". We should care if someone creates another project with unclear licensing that someone actually uses in the real world. The hypothetical cases of goofy licensing of unused software are not relevant IMHO.

> The *real* root causes are going to be difficult to fix:

I cannot comment on these or I would come out as rude: I have no idea where these arguments come from and what data could support any of these. I guess they are at best opinions, but they cannot be used as supporting points for a serious argument.

> You can fix a few egregious cases with tickets, and please do.

What if this is not a few tickets but a million? This can be crowd-sourced and distributed with appropriate leverage. Case in point: the Linux kernel is a large and mature codebase at the bottom of a vast ecosystem of code that runs on top of Linux. With the work Kate and I did to help maintainers adopt SPDX ids, we now have:

1. about 15K files with a proper SPDX id
2. doc and guidance for incoming patches that has been created by some key maintainers

This is something that is being adopted by thousands of contributors and will spill over to the whole ecosystem. And it will require only marginal effort going forward, and these efforts are distributed across all committers and contributors. That's leverage to me.

>> It surely could (NB: it does not yet). that's a minor change.
> That's not a standard SPDX license expression.

Since when are "GPL-2.0-only" and "((GPL-2.0-only or GPL-2.0+) and MIT)" not valid expressions?

> SPDX license expression syntax could add a "confidence" value - but that's

I am not indeed.
> Why not just a simple expression that indicates uncertainty of new versions?

This is not common enough to warrant such an addition until someone can prove otherwise.

> Oh, I *understand* the proposal very well. The problem is that

If there is such a tool, then it should either be updated or not used at all.

> They *CAN* determine if a copy of the GPL-2.0 exists.

If I reformulate this: there are tools that do a poor job at providing proper results. Therefore, the spec should provide a way to support their lack of features? This does not make sense to me. They should instead either adapt or die if they are not fit for the job.

>> Why would "GPL-2.0-only" suddenly be meaning anything else that its
> The result: "GPL-2.0-only" WILL NOT mean "2.0 only" no matter how much

I cannot understand your reasoning here.

--
Cordially
Philippe Ombredanne
|