Re: update on only/or later etc.


Philippe Ombredanne
 

David:
You bring up good points. Here are my counterpoints:

On Fri, Nov 24, 2017 at 5:15 PM, Wheeler, David A <dwheeler@...> wrote:
Philippe Ombredanne:
I think there is no contention there at all.
Respectfully: There *IS* contention. I'm contending.

A summary (e.g. a license expression) cannot ever capture all the nuances
of the details.... This is a lossy "compression" by construction...
Sure, but all summaries, and all models, omit something. Indeed,
a SPDX license file *also* cannot capture all the nuances.

The correct question is, "is this model adequate for its uses?"
In most cases people want to know, "is this package legal to use?".
You are making an assumption about what the common use case might be.
To me the common use case is more simply: what's the license?

Whether this is "legal" or not is something you or your legal adviser
can decide based on this.
And practically, "legal" is more often than not a policy choice
instead, whether you are a FLOSS project author or a consumer of FLOSS
code.

To answer that question, "it's at least GPL-2.0, and might be more"
is important information, and I think it's information that the SPDX
license expression should include.
Is it really important to know this fact in the general case? In my
own experience the cases where I need hyper precision on GPL-2.0 vs
GPL-2.0+ are rather limited:
1. I am combining GPL 2 and GPL 3 code
2. OR I want to use a GPL 3 for GPL 2-licensed code

These cases are extremely rare for consumers of FLOSS code, based on my
reasonably wide and many years of experience in this space... So rare
in fact that they account for a handful across thousand+ products and
billions of LOC. So rare that I cannot recall any off the top of my head.

In each case they require careful legal review before making a
decision. Making this careful decision solely on the few characters of
a license expression would be insanely foolish IMHO. I am not sure
SPDX needs to worry about or cater to this.

In every other case, the GPL2 vs GPL2+ debate does not matter much as
this is still the same GPL terms that apply: same permissions and same
obligations.

Speaking as the author of a fine license detection engine, I think this is a
red herring.
A license detection result can be: "I am 95% sure this is GPL-2.0-only but it
could be GPL-2.0+: please review me to fill in your conclusion."
This inability to indicate the "in-between" state within a license expression
greatly increases the number of cases where an unnecessary review must occur.
Every unnecessary review is a significant increase in time and money.
In many cases, it's *NOT* necessary to make a decision, but in some cases it is.
If organizations can do the analysis *ONLY* when they need to,
they'd save a lot of time and money... and that is greatly aided by
having SPDX license expressions able to indicate this information.
Again, the cases where you need precision vs. good enough accuracy in
the GPL2/GPL2+ debate are rare. 99% of the time, you do not need this
precision at all.

Now, I could not agree more with you: inaccurate and unclear licensing
information means that a user will need to review it to ensure it
is clear. But this is NOT a problem for SPDX to solve in the license
expression spec.

This is something that needs to be fixed by working with every project
author so that there is clarity, such as the work Kate and I have done
and are doing with Linux maintainers to make the kernel licensing hyper
clear. Or the tickets I routinely file with projects that lack a clear
license. That's solving the problem IMHO: i.e. let's not react to the
symptoms, but attack the root cause instead. And there SPDX and
license expressions are a great way to make things clear upstream once
reviewed. They are not a substitute for a review.
FWIW, having an initiative to systematically help project authors
clarify licensing is something that I have had in mind for quite a
while. I may do something about it eventually.

So detection does not have to be binary as in either 100% right or 100%
wrong. If a tool can only report red or blue binary results, that's a possibly
fine but weak tool.
But that's what I'm saying. Most tools CAN provide more than 2 answers.
The problem is that the SPDX license expressions don't allow tools to report
more than the 2 answers within a license expression. So the tool doesn't have
to give a binary answer, but SPDX forces the tools to do so when they output
SPDX license expressions.
I can output more than one expression then, can I?

For instance scancode-toolkit can cope with ambiguity alright and surface
this for review when it cannot come up with a definitive detection answer.
But it CANNOT surface this information via SPDX license expressions.
For most people, that's the ONLY thing that matters.
It surely could (NB: it does not yet). That's a minor change.
e.g. something like a list of license expressions with a confidence:

- confidence: 100% , expression: GPL-2.0-only
- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)

Each expression is valid, right?
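
A minimal sketch of that idea, assuming a hypothetical `Detection` record
and `needs_review` helper (neither is part of scancode-toolkit or of SPDX):
each license expression string is kept verbatim, and the confidence is an
external attribute stored next to it rather than inside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical record: one candidate expression plus an external score."""
    expression: str    # license expression, kept verbatim and unmodified
    confidence: float  # 0.0 to 1.0; qualifies the expression from outside


def needs_review(detections, threshold=0.9):
    """Return the detections whose confidence is below the review threshold."""
    return [d for d in detections if d.confidence < threshold]


# The two candidates from the list above, as data.
detections = [
    Detection("GPL-2.0-only", 1.0),
    Detection("((GPL-2.0-only or GPL-2.0+) and MIT)", 0.6),
]

for d in needs_review(detections):
    print(f"review: {d.expression} (confidence {d.confidence:.0%})")
```

The threshold of 0.9 is an arbitrary assumption; the point is only that the
expression syntax does not need to change to carry the score.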

I suspect at most 0.1% of
SPDX users use SPDX files, everyone else ONLY uses SPDX license expressions.
The percentage of SPDX users who use SPDX files may not be that high :-).
Would you have data or pointers to support these assertions about SPDX
usage? That would be mighty useful!

Therefore I have no issue whatsoever to implement Jilayne's comprehensive
proposal and I can always output something on my side.
You can always output something nonstandard that cannot be shared, sure,
and for many detailed analyses that's a good thing.
But that's less helpful for sharing compared to a standard format.
I think we had a similar discussion a while back about adding
something like a scope or purpose in the license expression syntax.
This is the same here: I can convey one or more license expressions
with a confidence attached if needed. The confidence or score is not
part of the expression but some external attribute that qualifies it.

I am not talking about outputting anything "non-standard", whatever
this may be: instead, external data about an expression is best
handled externally.
When in an SPDX doc, there are ways to deal with it; outside of it,
you need to track other data attributes that would otherwise be
supported by an SPDX doc.

To take a (likely bad) analogy: What you are suggesting is somewhat
similar to storing the SHA1 of a file inside the file itself. This
will change the file content... and then you need to recompute the
SHA1 value because of this. And store it inside the file, and
recompute, and so on .... forever.
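
The analogy can be checked in a few lines, as a sketch using Python's
standard `hashlib` (the file content and the `sha1:` trailer format are
made up for illustration): writing the digest into the content changes
the content, so the stored digest no longer matches.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


# Illustrative content only; any bytes behave the same way.
content = b"some file content\n"
digest = sha1_hex(content)

# Store the digest inside the "file" itself.
embedded = content + b"sha1: " + digest.encode("ascii") + b"\n"

# The embedded digest describes the old content, not the new one.
new_digest = sha1_hex(embedded)
print(digest == new_digest)
```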

External observations about something (here the confidence you may
attach to a certain license expression) are best managed outside the
observed thing, otherwise they modify the thing under observation.

Therefore, I track a file SHA1 outside of a file itself and not
inside. And I see it best to track the confidence or score I can
attach to a license expression outside of this expression.
And if we want to have this in SPDX, this would mean adding an
attribute to qualify a license expression with a "confidence", not
adding this to the expression syntax IMHO.

So since this can be done by one tool alright, this is NOT an issue for
the SPDX spec to worry about and tools should adjust: it is for tool
implementors to cope with ambiguity, not something to specify here.

Please let's keep this spec simple!
Well, empty specs are the simplest possible :-).
Specs need to be as simple as possible... but no simpler.
Are you suggesting that the SPDX expression spec is empty? (*cough*)
Or that the SPDX spec is empty? (*cough, cough*) I tend to think of it as
a tad too fat and in need of a good diet instead ;)

There's also the long-term damage this decision will cause.
In practice, I expect failing to add this capability is going to make
"GPL-2.0-only" mean the same thing as "I saw a GPL-2.0 and I don't
know if 'or later' applies" - and as a result "GPL-2.0-only" will
NOT mean "GPL-2.0-only" as intended.
I do not grok what you mean there. Can you clarify?

Which part of "only" is not clear to you?

Why would "GPL-2.0-only" suddenly mean anything else than its
definition in SPDX as carefully crafted by experienced and FLOSS-savvy
lawyers (hat tip) and as agreed and reviewed with the GPL authority
that the FSF is without any possible argument (other hat tip) ?

The case of "I see a license
and no other information" is relatively common, and is *important*
for determining what is legal to do.
Do you have data to support this? My personal experience is that this
is a case that is not so common.
And again, even if it were pervasive and the norm, the cases where I
need hyper precision to determine "what is legal to do" are rare, as
I explained at first and am repeating here for clarity:

1. I am combining GPL 2 and GPL 3 code
2. OR I want to use a GPL 3 for GPL 2-licensed code

Outside of these two rare cases, a user of GPL-2.0-licensed code will
not care much about this: "what is legal to do", i.e. which GPL 2.0
permissions and obligations apply, is clear and unambiguous: this is
all that needs to be known. The eventual lack of precision here is not
a problem to me or to the many users of GPL-licensed code that I have
helped comply.

And yet, Jilayne's proposal makes these rare cases **crystal clear**
going forward: so this is all gravy to me!

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com
