Re: update on only/or later etc.

David A. Wheeler

David A. Wheeler:
To answer that question, "it's at least GPL-2.0, and might be more"
s important information, and I think it's information that the SPDX
license expression should include.
Philippe Ombredanne [mailto:pombredanne@...]
Is this really important to know this fact in the general case?
Yes, there are a number of cases where it's important.
The usual reason is because I'm trying to link Apache-2.0 licensed code with
other code, a non-problem for GPL-2.0+ but widely considered a problem for
GPL-2.0 only. The Apache-2.0 license is extremely common.

On the other hand, there are many other cases where it's not important.

Which is why it's important to know in cases, and important to *not* track it
down when it's unimportant.

Making this careful decision solely on the few characters of a license
expression would be insanely foolish IMHO.
Not at all. What matters in many circumstances is just being able to show
some sort of due diligence.

In many cases, the "usual" situation is to copy & paste code, regardless of license or legality.
Any improvement over *that* is a big win.

Now, I could not agree more with you: inaccurate and clear licensing
information means that a user will need to review this to ensure this is
This is something that needs to fixed by working with every project author...
[e.g.]... tickets I routinely file with projects that lack a clear license.
I *heartily* endorse that work, thank you! But for every license you add,
someone creates another project with unclear licensing.

The *real* root causes are going to be difficult to fix:
* A large proportion of software developers are self-taught (& so don't know about
the laws), and of the rest, schools typically fail to teach CS students about software-related laws.
You can teach one, but the next developer will do the same thing.
* We have a VC/business culture that often values speed of development over legality.
* Many software developers are young & only know other young developers,
so they don't have anyone more experienced to learn from (or discount
the knowledge of those who *have* suffered the problems before).
* Many software developers, especially young/inexperienced developers,
incorrectly think that laws don't apply to software; I blame in part
the RIAA, who have successfully convinced the latest software developers
that copyright is not a real law.
* Copyright law as-written is very complex, and
is so obviously bought off by special interests, that it's difficult to defend,
and that makes it difficult to get many developers to take it seriously.

You can fix a few egregious cases with tickets, and please do.
But you're *not* to fix these root causes with a few tickets.

Education is *great*, but for the foreseeable future we're going to continue to have problems.

It surely could (NB: it does not yet). that's a minor change.
e.g. something like a list of license expressions with a confidence:

- confidence: 100% , expression: GPL-2.0-only
- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)
That's not a standard SPDX license expression.

SPDX license expression syntax could add a "confidence" value - but that's
more complex, and I don't think you're seriously proposing it.
Why not just a simple expression that indicates uncertainty of new versions?

Each expression is valid, right?

I suspect at most 0.1% of
SPDX users use SPDX files, everyone else ONLY uses SDPX license
The percentage of SPDX users who use SPDX files may not be that high :-).
Would you have data or pointers to support these assertions about SPDX
usage? That would be mighty useful!
I agree that'd be useful - I don't have anything great.
Here's one try.

A Google search of "filetype:spdx" returns 164 results.
Clearly ".spdx" files are not lighting the world on file.

Contrasting this to SPDX license expressions, we have to look at their
uses, which include package managers, in-file statements, and simple
tools that just report SDPX license expressions (e.g., Ruby's LicenseFinder).

Many package managers use SPDX license expressions
to indicate the package license. E.g., NPM does:
by using the "license:" field - which is *NOT* a SPDX license file.
According to <>, *just* the NPM ecosystem
has 550,951 modules as of Nov 24, with 535 new packages a day on average.
I don't know what percentage of modules have a "license:" entry
(is someone willing to find out?) - but for discussion, I'll guess it's at *least* 10%..
That would mean that there are 55,095 NPM packages that use
SDPX license expressions.

This is a quick try, it'd be possible to get a more accurate estimate. But if you
add all the other package managers where
SPDX license expressions get used, and the per-file entries, and I think
It's clearly that SPDX use is *primarily* the use of SPDX license expressions.

External observations about something (here the confidence you may
attach to a certain license expression) are best managed outside the
observed thing, otherwise they modify the thing under observation.
No. *All* observations are external, there are no exceptions.
Even if a file is specifically labelled as a license, it might have been added by
someone not authorized to do so. More philosophically, I cannot observe
the world "directly"; I can only perceive the world through my senses
which in turn are mediated by my brain.

It is very valuable to be able to say, "the final result of my analysis"
in a single computer-processable expression. Especially since that "final" analysis
can in turn be used as an input for a larger analysis.

Are you suggesting that the SPDX expression spec is empty? (*cough*) Or
that the SPDX spec is empty?
No, I'm suggesting that simplicity as the *only* criteria is not enough;
It needs to be balanced with other needs.

(*cough, cough*) I tend to think it as a tad too
fat and in need of a good diet instead ;)
There's also the long-term damage this decision will cause.
In practice, I expect failing to add this capability is going to make
"GPL-2.0-only" mean the same thing as "I saw a GPL-2.0 and I don't
know if 'other later' applies" - and as a result "GPL-2.0-only" will
NOT mean "GPL-2.0-only" as intended.
I do not grok what you mean there. Can you clarify?

Which part of "only" is not clear to you?
Oh, I *understand* the proposal very well. The problem is that
I think it's ignoring some key facts on the ground.

I've said it several different ways, but I'll try again.

Many tools CANNOT determine "or any later version applies in all cases.
They *CAN* determine if a copy of the GPL-2.0 exists.
These tools WILL NOT report "UNKNOWN", because that's useless.
People are using these tools, and will continue to do so.
So, the tools will report "GPL-2.0-only" when they see "GPL-2.0" and
don't know if "or later" applies.

Why would "GPL-2.0-only" suddenly be meaning anything else that its
definition in SPDX....
The result: "GPL-2.0-only" WILL NOT mean "2.0 only" no matter how much
text is written in the spec. It will mean "GPL-2.0, and we don't know if
or later applies". It will mean that, because the spec fails to give
tool writers any alternative to report.



--- David A. Wheeler

Join to automatically receive all group messages.