Re: update on only/or later etc.

Philippe Ombredanne


On Fri, Nov 24, 2017 at 10:33 PM, Wheeler, David A <dwheeler@...> wrote:
David A. Wheeler:
To answer that question, "it's at least GPL-2.0, and might be more"
is important information, and I think it's information that the SPDX
license expression should include.
Philippe Ombredanne [mailto:pombredanne@...]
Is it really important to know this fact in the general case?
Yes, there are a number of cases where it's important.
The usual reason is because I'm trying to link Apache-2.0 licensed code with
other code, a non-problem for GPL-2.0+ but widely considered a problem for
GPL-2.0 only. The Apache-2.0 license is extremely common.
I understand your point, but __how many times__ did you ever encounter
this case in the real world?
On my side, I have analyzed 1000+ significant software products,
10,000+ packages and billions of lines of code over the last 10 years.
An issue of Apache-2.0 compatibility with the GPL-2.0 has never shown
up: zero cases, not one single time.
I am not saying it does not exist in theory, but in practice this is a
rare case, exceptional enough that it is best left aside.

On the other hand, there are many other cases where it's not important.

Which is why it's important to know in some cases, and important to *not* track it
down when it's unimportant.
My point is that it is so rare that it is NOT important to track in
the license expression spec at all.
This can be dealt with in comments, or by any other means, but not within a
license expression syntax. There are likely tens of other crooked use
cases that cannot be expressed precisely with a license expression,
yet they are too rare to consider.

Making this careful decision solely on the few characters of a license
expression would be insanely foolish IMHO.
Not at all. What matters in many circumstances is just being able to show
some sort of due diligence.
Are you serious there? Where in the actual real world is anyone
looking for "being able to show some sort of due diligence" and
considering this enough? That does not sound reasonable. Who does this? I
would have a field day looking at such a codebase.

In many cases, the "usual" situation is to copy & paste code, regardless of license or legality.
Any improvement over *that* is a big win.
Where do you get that the "usual" situation is to copy & paste code?
Based on my long experience, copy/paste of snippets is a rare event
and usually accounts for only a handful of items even in very large
product codebases.

And it is even rarer that the license or origin was not tracked then. This
is not the norm I have experience with: I have only ever met a couple of
confused software development teams doing serious copying of un-tracked

Now, I could not agree more with you: inaccurate and unclear licensing
information means that a user will need to review this to ensure this is
This is something that needs to be fixed by working with every project author...
[e.g.]... tickets I routinely file with projects that lack a clear license.
I *heartily* endorse that work, thank you!
But for every license you add,
someone creates another project with unclear licensing.
Really, do you have data to back this? Note also we should not care if
"someone creates another project with unclear licensing".
We should care if someone creates another project with unclear
licensing that someone actually uses in the real world.
The hypothetical cases of goofy licensing of unused software are not
relevant IMHO.

The *real* root causes are going to be difficult to fix:
* A large proportion of software developers are self-taught (& so don't know about
the laws), and of the rest, schools typically fail to teach CS students about software-related laws.
You can teach one, but the next developer will do the same thing.
* We have a VC/business culture that often values speed of development over legality.
* Many software developers are young & only know other young developers,
so they don't have anyone more experienced to learn from (or discount
the knowledge of those who *have* suffered the problems before).
* Many software developers, especially young/inexperienced developers,
incorrectly think that laws don't apply to software; I blame in part
the RIAA, who have successfully convinced the latest software developers
that copyright is not a real law.
* Copyright law as-written is very complex, and
is so obviously bought off by special interests, that it's difficult to defend,
and that makes it difficult to get many developers to take it seriously.
I cannot comment on these or I would come across as rude: I have no idea
where these arguments come from and what data could support any of
I guess they are at best opinions, but they cannot be used as supporting points
for a serious argument.

You can fix a few egregious cases with tickets, and please do.
But you're *not* going to fix these root causes with a few tickets.
Education is *great*, but for the foreseeable future we're going to continue to have problems.
What if this is not a few tickets but a million? This can be
crowd-sourced and distributed with appropriate leverage.

Case in point: the Linux kernel is a large and mature codebase at the
bottom of a vast ecosystem of code that runs on top of Linux.

With the work Kate and I did to help maintainers adopt SPDX ids, we now have:
1. about 15K files with a proper SPDX id
2. doc and guidance for incoming patches that has been created by some
key maintainers

This is something that is being adopted by thousands of contributors
and will spill over to the whole ecosystem. It will require only
marginal effort going forward, and that effort is distributed across all
committers and contributors. That's leverage to me.
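
For readers unfamiliar with what a "proper SPDX id" in a source file looks like: it is a one-line `SPDX-License-Identifier:` tag in a comment near the top of the file. The sketch below shows how a tool might read that tag. It is only an illustration: the function name `find_spdx_id`, the regular expression and the five-line search window are assumptions made for this example, not the kernel's or any existing tool's implementation.

```python
import re

# Matches the tag inside a "//", "#" or "/* ... */" comment line.
# The pattern is an assumption for this sketch, not an official grammar.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_spdx_id(source_text, max_lines=5):
    """Return the license expression from the first SPDX tag found in the
    first `max_lines` lines of a file, or None if there is no tag."""
    for line in source_text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            return match.group(1)
    return None


c_file = "// SPDX-License-Identifier: GPL-2.0\n#include <linux/init.h>\n"
header = "/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */\n"
untagged = "#include <stdio.h>\nint main(void) { return 0; }\n"

print(find_spdx_id(c_file))
print(find_spdx_id(header))
print(find_spdx_id(untagged))
```

Because the tag is a single machine-readable line, the cost of reading it is the same for every file and every contributor, which is where the leverage comes from.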

It surely could (NB: it does not yet). That's a minor change.
e.g. something like a list of license expressions with a confidence:

- confidence: 100% , expression: GPL-2.0-only
- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)
That's not a standard SPDX license expression.
Since when are "GPL-2.0-only" and "((GPL-2.0-only or GPL-2.0+) and MIT)"
not valid expressions?

SPDX license expression syntax could add a "confidence" value - but that's
more complex, and I don't think you're seriously proposing it.
I am not indeed.
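
To make the distinction concrete: the confidence can live in a tool's output, next to each expression, rather than inside the expression syntax. A minimal sketch, assuming a hypothetical `Detection` record and a hypothetical review threshold of 90%; neither is part of SPDX or of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One candidate result: a plain license expression string plus a
    tool-level confidence that stays outside the expression itself."""
    confidence: int  # percent, 0 to 100
    expression: str


# The two candidates from the list above.
detections = [
    Detection(100, "GPL-2.0-only"),
    Detection(60, "((GPL-2.0-only or GPL-2.0+) and MIT)"),
]


def best(candidates):
    """Return the candidate with the highest confidence."""
    return max(candidates, key=lambda d: d.confidence)


def needs_review(candidates, threshold=90):
    """Return the candidates a human should look at (threshold is arbitrary)."""
    return [d for d in candidates if d.confidence < threshold]


print(best(detections).expression)
for d in needs_review(detections):
    print(f"review: {d.confidence}% {d.expression}")
```

Each `expression` stays an ordinary license expression; only the surrounding record carries the uncertainty.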

Why not just a simple expression that indicates uncertainty of new versions?
This is not common enough to warrant such an addition until someone can
prove otherwise.

Oh, I *understand* the proposal very well. The problem is that
I think it's ignoring some key facts on the ground.

I've said it several different ways, but I'll try again.

Many tools CANNOT determine if "or any later version" applies in all cases.
If there is such a tool, then it should either be updated or not used at all.

They *CAN* determine if a copy of the GPL-2.0 exists.
These tools WILL NOT report "UNKNOWN", because that's useless.
People are using these tools, and will continue to do so.
So, the tools will report "GPL-2.0-only" when they see "GPL-2.0" and
don't know if "or later" applies.
If I reformulate this: there are tools that do a poor job at providing
proper results. Therefore, the spec should provide a way to support
their lack of features? This does not make sense to me. They should
instead either adapt or die if they are not fit for the job.

Why would "GPL-2.0-only" suddenly mean anything other than its
definition in SPDX....
The result: "GPL-2.0-only" WILL NOT mean "2.0 only" no matter how much
text is written in the spec. It will mean "GPL-2.0, and we don't know if
or later applies". It will mean that, because the spec fails to give
tool writers any alternative to report.
I cannot understand your reasoning here.

Philippe Ombredanne
