
Re: Keep partial conclusions out of license expressions (was: update on only/or later etc.)

Philippe Ombredanne
 

On Mon, Nov 27, 2017 at 3:39 AM, Wheeler, David A <dwheeler@...> wrote:
gary@... [mailto:gary@...]
David - I'm curious if the "OR-MAYBE" proposal solves the issue you are
raising as well.
Yes, it does.
If most everyone were to agree to add this, I am reluctantly OK.
Technically the implementation is easy-peasy so that's not the issue.

I still think these rare cases and exceptions are not exceptional
enough to bend, break or add new rules to the expression syntax and
turn it into a franken-syntax.

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com


Re: Keep partial conclusions out of license expressions (was: update on only/or later etc.)

David A. Wheeler
 

gary@... [mailto:gary@...]
David - I'm curious if the "OR-MAYBE" proposal solves the issue you are
raising as well.
Yes, it does.

--- David A. Wheeler


Re: Keep partial conclusions out of license expressions (was: update on only/or later etc.)

Gary O'Neall
 

I do have a use case where being able to represent partial conclusions would
be useful.

As a provider of audit services, I am often confronted with unclear
documentation on licensing where there is some information present, but not
enough documentation to confirm a specific license expression. In a
majority of the situations, I can determine the specifics through related
websites, contacting the author, or other methods. In some situations,
however, determining the precise license entails a very high effort where
determining the specifics will not change the actionable results of the
audit. For example, if someone has a policy to not include any GPL in their
shipping products and I find a GPL-Only or possibly a GPL+ license, I don't
really need to determine which applies to notify the reader of the audit
report there is a policy violation.

Today, I can represent the situation in SPDX with a LicenseRef containing
the exact text and a license comment describing the ambiguity.
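Gary's current workaround can be sketched as SPDX 2.x tag-value output. This is a minimal, hypothetical fragment rather than a complete document: the package name, the `LicenseRef-GPL-Ambiguous` identifier, the license name and the texts are placeholders, and only the license-related tags are shown.

```python
# Sketch only: builds a partial SPDX 2.x tag-value fragment for an ambiguous
# GPL finding. Package name, LicenseRef id, license name and texts are
# hypothetical placeholders; a real SPDX document needs many more tags.

def ambiguous_license_fragment(package: str, license_ref: str, license_name: str,
                               extracted_text: str, comment: str) -> str:
    lines = [
        f"PackageName: {package}",
        f"PackageLicenseConcluded: {license_ref}",
        f"PackageLicenseComments: <text>{comment}</text>",
        "",
        # Other licensing information: the exact text found, plus a
        # comment describing the ambiguity.
        f"LicenseID: {license_ref}",
        f"ExtractedText: <text>{extracted_text}</text>",
        f"LicenseName: {license_name}",
        f"LicenseComment: <text>{comment}</text>",
    ]
    return "\n".join(lines)

print(ambiguous_license_fragment(
    "example-package",
    "LicenseRef-GPL-Ambiguous",
    "GPL (version unclear)",
    "This program is licensed under the GNU GPL.",
    "GPL-2.0-only or GPL-2.0+; the files do not say which.",
))
```

The whole ambiguity lives in free-text comments, which is exactly what makes it hard for a tool to act on.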

It would, however, be much more convenient to the consumer of the SPDX
document if I could express at a high level what the range of possible
interpretations are. To use the GPL example, being able to describe
'GPL-Only OR-MAYBE GPL+' would make it clear to the reader, there is
definitely something GPL here and, depending on the policy, this may not be
allowed regardless of how the license version is concluded.

Tools which look for license conflicts could also use this information to
automatically flag issues (e.g.
https://github.com/librariesio/license-compatibility).
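A minimal sketch of how a conflict-checking tool might consume Gary's example. It assumes the proposed `OR-MAYBE` operator, which is only a proposal in this thread and not SPDX syntax, and assumes each side is a single license identifier. The function names and the "no GPL" policy test are hypothetical.

```python
# Sketch: "OR-MAYBE" is only a proposal discussed on this list, not SPDX
# syntax. Each side is assumed to be a single license identifier.

def interpretations(conclusion: str) -> list[str]:
    """Split a partial conclusion into its possible interpretations."""
    return [part.strip() for part in conclusion.split(" OR-MAYBE ")]

def is_gpl(license_id: str) -> bool:
    # Hypothetical "no GPL" policy test: any identifier starting with "GPL-".
    return license_id.startswith("GPL-")

def check_no_gpl_policy(conclusion: str) -> str:
    hits = [is_gpl(lic) for lic in interpretations(conclusion)]
    if all(hits):
        return "violation"  # every interpretation is GPL, so the version question is moot
    if any(hits):
        return "review"     # only some interpretations are GPL
    return "ok"

print(check_no_gpl_policy("GPL-2.0-only OR-MAYBE GPL-2.0+"))  # violation
print(check_no_gpl_policy("MIT OR-MAYBE GPL-2.0+"))           # review
```

In the first case the violation can be reported without resolving which GPL version applies, which is the audit scenario described above.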

Is this worth changing the spec to cover this case? Possibly. In my audit
scenario, all SPDX documents are accompanied by a written report which
clearly expresses any issues. If there are any downstream consumers (humans
or tools) of the audit results who just use the SPDX document without the
associated report, I would argue we should extend the spec to cover this
case. If everyone who receives the SPDX document also reads the associated
report, there is not as much value.

I am also quite sensitive to not making the spec any more complex. In
gathering feedback on SPDX documents, the number one issue I've heard is the
spec is just "too complex".

If we want to pursue a solution to the above use case, I am in favor of the
"OR-MAYBE" proposal (see
https://lists.spdx.org/pipermail/spdx-legal/2017-September/002233.html for
Trevor's original proposal). My only concern is that it could also be used
in a declared license scenario where "OR-MAYBE" really shouldn't be used.
IMHO, if you're the copyright owner declaring a license you should not be
able to use a partial conclusion in your declaration. Perhaps we could make
it illegal to use this operator for declared licenses.
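Gary's suggested restriction could be enforced by a validator along these lines. This is a hypothetical sketch: `OR-MAYBE` is only a proposal, and the `declared`/`concluded` field names are illustrative rather than SPDX property names.

```python
# Hypothetical sketch: "OR-MAYBE" is a proposed operator, and the
# "declared"/"concluded" field names are illustrative only.

class PartialConclusionError(ValueError):
    """Raised when a partial conclusion appears where it is not allowed."""

def check_license_field(field: str, expression: str) -> str:
    # Separate parentheses from identifiers so "(A OR-MAYBE B)" is caught too.
    tokens = expression.replace("(", " ").replace(")", " ").split()
    if field == "declared" and "OR-MAYBE" in tokens:
        raise PartialConclusionError(
            "a declared license comes from the copyright holder and "
            "should not be a partial conclusion"
        )
    return expression

check_license_field("concluded", "GPL-2.0-only OR-MAYBE GPL-2.0+")  # accepted
try:
    check_license_field("declared", "GPL-2.0-only OR-MAYBE GPL-2.0+")
except PartialConclusionError as err:
    print(f"rejected: {err}")
```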

David - I'm curious if the "OR-MAYBE" proposal solves the issue you are
raising as well.

Gary

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of W. Trevor King
Sent: Saturday, November 25, 2017 10:56 PM
To: Wheeler, David A <dwheeler@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Keep partial conclusions out of license expressions (was: update on only/or later etc.)

On Fri, Nov 24, 2017 at 09:33:23PM +0000, Wheeler, David A wrote:
Many package managers use SPDX license expressions to indicate the
package license. E.g., NPM does:
https://docs.npmjs.com/files/package.json
by using the "license:" field - which is *NOT* a SPDX license file.
According to <http://modulecounts.com/>, *just* the NPM ecosystem has
550,951 modules as of Nov 24, with 535 new packages a day on average.
I don't know what percentage of modules have a "license:"
entry (is someone willing to find out?) - but for discussion, I'll
guess it's at *least* 10%. That would mean that there are 55,095 NPM
packages that use SPDX license expressions.
But how many of those authors would use a partial-conclusion syntax if it
existed?

I expect most npm package authors are also core developers for the
packaged software and know the package license. They won't need to be
able to express a partial conclusion.

Distribution-specific package managers, on the other hand, seem more
likely to be third parties who are not directly related to the
development team. They are more likely to need to express partial
conclusions. Project developers who inherit an ambiguously-licensed
package from some previous authors would be in the same boat. In
those cases, ideally they'd track down the copyright holders and get
to the bottom of the licensing. In the absence of that, they'd want
some way to express their partial conclusions.

I'm fine with the SPDX deciding that structured partial conclusions
are out of scope, and leaving it to packaging systems, etc. to define
their own (e.g. an array of SPDX license expressions with confidence
scores). Folks who extract license claims from those packages would
have to write per-packaging-system tooling to convert the partial
conclusion into their own format, but that's ok. And if it turns out
to be a problem, anyone can try to talk folks into whatever
partial-conclusion model they prefer.

I'm also fine with the SPDX defining its own partial-conclusion model
and syntax, and trying to talk folks into using it where appropriate.
But I don't think the SPDX needs to take that up if it doesn't want
to.

Cheers,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


Keep partial conclusions out of license expressions (was: update on only/or later etc.)

W. Trevor King
 

On Fri, Nov 24, 2017 at 09:33:23PM +0000, Wheeler, David A wrote:
Many package managers use SPDX license expressions
to indicate the package license. E.g., NPM does:
https://docs.npmjs.com/files/package.json
by using the "license:" field - which is *NOT* a SPDX license file.
According to <http://modulecounts.com/>, *just* the NPM ecosystem
has 550,951 modules as of Nov 24, with 535 new packages a day on
average. I don't know what percentage of modules have a "license:"
entry (is someone willing to find out?) - but for discussion, I'll
guess it's at *least* 10%. That would mean that there are 55,095
NPM packages that use SPDX license expressions.
But how many of those authors would use a partial-conclusion syntax if
it existed?

I expect most npm package authors are also core developers for the
packaged software and know the package license. They won't need to be
able to express a partial conclusion.

Distribution-specific package managers, on the other hand, seem more
likely to be third parties who are not directly related to the
development team. They are more likely to need to express partial
conclusions. Project developers who inherit an ambiguously-licensed
package from some previous authors would be in the same boat. In
those cases, ideally they'd track down the copyright holders and get
to the bottom of the licensing. In the absence of that, they'd want
some way to express their partial conclusions.

I'm fine with the SPDX deciding that structured partial conclusions
are out of scope, and leaving it to packaging systems, etc. to define
their own (e.g. an array of SPDX license expressions with confidence
scores). Folks who extract license claims from those packages would
have to write per-packaging-system tooling to convert the partial
conclusion into their own format, but that's ok. And if it turns out
to be a problem, anyone can try to talk folks into whatever
partial-conclusion model they prefer.
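Trevor's per-packaging-system alternative might look like the following sketch. The `licenseCandidates` field and its `expression`/`confidence` keys are invented for illustration; no packaging system discussed here is claimed to use them. The converter stands in for the per-packaging-system tooling a consumer would have to write.

```python
import json

# Hypothetical package metadata: "licenseCandidates" and its keys are
# invented for this example, not taken from any real packaging system.
PACKAGE_JSON = """
{
  "name": "example",
  "licenseCandidates": [
    {"expression": "GPL-2.0+", "confidence": 0.3},
    {"expression": "GPL-2.0-only", "confidence": 0.7}
  ]
}
"""

def to_internal(metadata: dict) -> list[tuple[str, float]]:
    """Convert one packaging system's partial conclusion into a tool's own
    format: (expression, confidence) pairs, most likely first."""
    candidates = metadata.get("licenseCandidates", [])
    pairs = [(c["expression"], float(c["confidence"])) for c in candidates]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

print(to_internal(json.loads(PACKAGE_JSON)))
# [('GPL-2.0-only', 0.7), ('GPL-2.0+', 0.3)]
```

Each entry stays a plain SPDX license expression; only the surrounding array and scores are system-specific.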

I'm also fine with the SPDX defining its own partial-conclusion model
and syntax, and trying to talk folks into using it where appropriate.
But I don't think the SPDX needs to take that up if it doesn't want
to.

Cheers,
Trevor



Finished testing license XML list (for now)

Gary O'Neall
 

I just completed testing all of the license XML files against the license text from the current license list or text copied from the upstream “other web pages for this license”.

 

I found a few errors in the previous release's license texts in the process and created pull requests to fix the issues found that are fixable in the license-list-XML repo.

 

Below are links to the pull requests.  If there are no objections, I’ll go ahead and merge those into the master branch.

 

We also need the following PR merged: CC-BY-ND-2.0: Return to CC-BY-ND-2.0 (had CC-BY-ND-3.0 text)

 

If you’re interested in the test files, they are in a pull request to the test files repository: Add a directory of files used for testing the license list website generating code

 

Note that there were a lot of changes to the SPDX library and tools used to generate the website pages.  If you would like to use the library to match licenses using the license-list-XML generated data, be sure to pull the latest code.

 

Gary

 

-------------------------------------------------

Gary O'Neall

Principal Consultant

Source Auditor Inc.

Mobile: 408.805.0586

Email: gary@...

 


Re: update on only/or later etc.

David A. Wheeler
 

David A. Wheeler:
To answer that question, "it's at least GPL-2.0, and might be more"
is important information, and I think it's information that the SPDX
license expression should include.
Philippe Ombredanne [mailto:pombredanne@...]
Is it really important to know this fact in the general case?
Yes, there are a number of cases where it's important.
The usual reason is because I'm trying to link Apache-2.0 licensed code with
other code, a non-problem for GPL-2.0+ but widely considered a problem for
GPL-2.0 only. The Apache-2.0 license is extremely common.

On the other hand, there are many other cases where it's not important.

Which is why it's important to know in some cases, and important to *not* track it
down when it's unimportant.
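David's Apache-2.0 case can be sketched as a lookup. This is an illustration, not legal advice: it encodes only the view stated above (a non-problem for GPL-2.0+, where GPL-3.0 can be chosen, but widely considered a problem for GPL-2.0-only). The label "GPL-2.0 (or later unknown)" is hypothetical, since that finding has no standard SPDX license expression.

```python
# Sketch only, not legal advice. The third key is a hypothetical label for a
# finding that cannot be written as a standard SPDX license expression.

APACHE_LINKING = {
    "GPL-2.0+": "ok",
    "GPL-2.0-only": "problem",
    "GPL-2.0 (or later unknown)": "investigate",
}

def apache_linking_status(gpl_finding: str) -> str:
    """Return what combining the finding with Apache-2.0 code requires."""
    try:
        return APACHE_LINKING[gpl_finding]
    except KeyError:
        raise ValueError(f"unexpected finding: {gpl_finding!r}") from None

for finding in APACHE_LINKING:
    print(f"{finding}: {apache_linking_status(finding)}")
```

Only the in-between finding triggers the costly investigation; if it is forced into one of the other two rows, the investigation is either skipped wrongly or run needlessly.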

Making this careful decision solely on the few characters of a license
expression would be insanely foolish IMHO.
Not at all. What matters in many circumstances is just being able to show
some sort of due diligence.

In many cases, the "usual" situation is to copy & paste code, regardless of license or legality.
Any improvement over *that* is a big win.

Now, I could not agree more with you: inaccurate and unclear licensing
information means that a user will need to review this to ensure this is
clear....
This is something that needs to be fixed by working with every project author...
[e.g.]... tickets I routinely file with projects that lack a clear license.
I *heartily* endorse that work, thank you! But for every license you add,
someone creates another project with unclear licensing.

The *real* root causes are going to be difficult to fix:
* A large proportion of software developers are self-taught (& so don't know about
the laws), and of the rest, schools typically fail to teach CS students about software-related laws.
You can teach one, but the next developer will do the same thing.
* We have a VC/business culture that often values speed of development over legality.
* Many software developers are young & only know other young developers,
so they don't have anyone more experienced to learn from (or discount
the knowledge of those who *have* suffered the problems before).
* Many software developers, especially young/inexperienced developers,
incorrectly think that laws don't apply to software; I blame in part
the RIAA, who have successfully convinced the latest software developers
that copyright is not a real law.
* Copyright law as-written is very complex, and
is so obviously bought off by special interests, that it's difficult to defend,
and that makes it difficult to get many developers to take it seriously.

You can fix a few egregious cases with tickets, and please do.
But you're *not* going to fix these root causes with a few tickets.

Education is *great*, but for the foreseeable future we're going to continue to have problems.


It surely could (NB: it does not yet). That's a minor change.
e.g. something like a list of license expressions with a confidence:

- confidence: 100% , expression: GPL-2.0-only
- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)
That's not a standard SPDX license expression.

SPDX license expression syntax could add a "confidence" value - but that's
more complex, and I don't think you're seriously proposing it.
Why not just a simple expression that indicates uncertainty of new versions?



Each expression is valid, right?

I suspect at most 0.1% of
SPDX users use SPDX files, everyone else ONLY uses SPDX license
expressions.
The percentage of SPDX users who use SPDX files may not be that high :-).
Would you have data or pointers to support these assertions about SPDX
usage? That would be mighty useful!
I agree that'd be useful - I don't have anything great.
Here's one try.

A Google search of "filetype:spdx" returns 164 results.
Clearly ".spdx" files are not lighting the world on fire.

Contrasting this to SPDX license expressions, we have to look at their
uses, which include package managers, in-file statements, and simple
tools that just report SPDX license expressions (e.g., Ruby's LicenseFinder).

Many package managers use SPDX license expressions
to indicate the package license. E.g., NPM does:
https://docs.npmjs.com/files/package.json
by using the "license:" field - which is *NOT* a SPDX license file.
According to <http://modulecounts.com/>, *just* the NPM ecosystem
has 550,951 modules as of Nov 24, with 535 new packages a day on average.
I don't know what percentage of modules have a "license:" entry
(is someone willing to find out?) - but for discussion, I'll guess it's at *least* 10%.
That would mean that there are 55,095 NPM packages that use
SPDX license expressions.
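Spelled out, the estimate is the following, where the 10% share is David's stated guess and not a measurement:

```latex
N_{\text{npm, license field}} \approx 0.10 \times 550{,}951 = 55{,}095.1 \approx 55{,}095
```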

This is a quick try; it'd be possible to get a more accurate estimate. But if you
add all the other package managers where
SPDX license expressions get used, and the per-file entries, I think
it's clear that SPDX use is *primarily* the use of SPDX license expressions.

External observations about something (here the confidence you may
attach to a certain license expression) are best managed outside the
observed thing, otherwise they modify the thing under observation.
No. *All* observations are external, there are no exceptions.
Even if a file is specifically labelled as a license, it might have been added by
someone not authorized to do so. More philosophically, I cannot observe
the world "directly"; I can only perceive the world through my senses
which in turn are mediated by my brain.

It is very valuable to be able to say, "the final result of my analysis"
in a single computer-processable expression. Especially since that "final" analysis
can in turn be used as an input for a larger analysis.


Are you suggesting that the SPDX expression spec is empty? (*cough*) Or
that the SPDX spec is empty?
No, I'm suggesting that simplicity as the *only* criterion is not enough;
it needs to be balanced with other needs.

(*cough, cough*) I tend to think of it as a tad too
fat and in need of a good diet instead ;)
There's also the long-term damage this decision will cause.
In practice, I expect failing to add this capability is going to make
"GPL-2.0-only" mean the same thing as "I saw a GPL-2.0 and I don't
know if 'or later' applies" - and as a result "GPL-2.0-only" will
NOT mean "GPL-2.0-only" as intended.
I do not grok what you mean there. Can you clarify?

Which part of "only" is not clear to you?
Oh, I *understand* the proposal very well. The problem is that
I think it's ignoring some key facts on the ground.

I've said it several different ways, but I'll try again.

Many tools CANNOT determine whether "or any later version" applies in all cases.
They *CAN* determine if a copy of the GPL-2.0 exists.
These tools WILL NOT report "UNKNOWN", because that's useless.
People are using these tools, and will continue to do so.
So, the tools will report "GPL-2.0-only" when they see "GPL-2.0" and
don't know if "or later" applies.

Why would "GPL-2.0-only" suddenly mean anything other than its
definition in SPDX....
The result: "GPL-2.0-only" WILL NOT mean "2.0 only" no matter how much
text is written in the spec. It will mean "GPL-2.0, and we don't know if
or later applies". It will mean that, because the spec fails to give
tool writers any alternative to report.
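David's prediction can be made concrete with a hypothetical detector that internally distinguishes three findings but must emit a single standard SPDX license expression. The enum and function names are illustrative; no real tool is being described.

```python
from enum import Enum

class GplVersionFinding(Enum):
    ONLY = "only"          # the notice says version 2 only
    OR_LATER = "or-later"  # the notice says "or any later version"
    UNKNOWN = "unknown"    # a copy of GPL-2.0 was found, nothing else

def to_spdx_expression(finding: GplVersionFinding) -> str:
    """Map a detector finding onto the identifiers available today.

    There is no standard expression for UNKNOWN, and reporting "UNKNOWN"
    is useless to users, so this sketch does what the text predicts tools
    will do and reports GPL-2.0-only.
    """
    if finding is GplVersionFinding.OR_LATER:
        return "GPL-2.0+"
    return "GPL-2.0-only"

for finding in GplVersionFinding:
    print(f"{finding.value:>8} -> {to_spdx_expression(finding)}")
```

Once the output is written, ONLY and UNKNOWN can no longer be told apart, which is the erosion of "GPL-2.0-only" that the message describes.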

Thanks!!

Regards,


--- David A. Wheeler


Re: update on only/or later etc.

Philippe Ombredanne
 

David:
You are bringing good points. Here are my counter points:

On Fri, Nov 24, 2017 at 5:15 PM, Wheeler, David A <dwheeler@...> wrote:
Philippe Ombredanne:
I think there is no contention there at all.
Respectfully: There *IS* contention. I'm contending.

A summary (e.g. a license expression) cannot ever capture all the nuances
of the details.... This is a lossy "compression" by construction...
Sure, but all summaries, and all models, omit something. Indeed,
a SPDX license file *also* cannot capture all the nuances.

The correct question is, "is this model adequate for its uses?"
In most cases people want to know, "is this package legal to use?".
You are making assumptions about what the common use case might be. To
me the common use case is more simply: what's the license?

Whether this is "legal" or not is something you or your legal adviser
can decide based on this.
And practically, "legal" is more often than not a policy choice
instead, whether you are a FLOSS project author or a consumer of FLOSS
code.

To answer that question, "it's at least GPL-2.0, and might be more"
is important information, and I think it's information that the SPDX
license expression should include.
Is it really important to know this fact in the general case? In my
own experience the cases where I need hyper precision on GPL-2.0 vs
GPL-2.0+ are rather limited:
1. I am combining GPL 2 and GPL 3 code
2. OR I want to use a GPL 3 for GPL 2-licensed code

These cases are extremely rare for consumers of FLOSS code based on my
reasonably wide and many years of experience in this space... So rare, in
fact, that they account for a handful across a thousand+ products and
billions of LOC. So rare that I cannot recall any off the top of my head.

In each case they require careful legal review before making a
decision. Making this careful decision solely on the few characters of
a license expression would be insanely foolish IMHO. I am not sure
SPDX needs to worry or cater about this.

In every other case, the GPL2 vs GPL2+ debate does not matter much as
this is still the same GPL terms that apply: same permissions and same
obligations.

Speaking as the author of a fine license detection engine, I think this is a
red herring.
A license detection result can be: "I am 95% sure this is GPL-2.0-only but it
could be GPL-2.0+: please review me to fill in your conclusion."
This inability to indicate the "in-between" state within a license expression
greatly increases the number of cases where an unnecessary review must occur.
Every unnecessary review is a significant increase in time and money.
In many cases, it's *NOT* necessary to make a decision, but in some cases it is.
If organizations can do the analysis *ONLY* when they need to,
they'd save a lot of time and money... and that is greatly aided by
having SPDX license expressions able to indicate this information.
Again, the cases where you need precision vs. good enough accuracy in
the GPL2/GPL2+ debate are rare. 99% of the time, you do not need this
precision at all.

Now, I could not agree more with you: inaccurate and unclear licensing
information means that a user will need to review this to ensure this
is clear. But this is NOT a problem for SPDX to solve in the license
expression spec.

This is something that needs to be fixed by working with every project
author such that there is clarity, such as the work Kate and I have done and
are doing with Linux maintainers to make the kernel licensing hyper
clear. Or the tickets I routinely file with projects that lack a clear
license. That's solving the problem IMHO: e.g. let's not just react to the
symptoms, but attack the root cause instead. And there SPDX and
license expressions are a great way to make things clear upstream once
reviewed. They are not a substitute for a review.
FWIW, having an initiative to systematically help projects authors
clarify licensing is something that I have had in mind for quite a
while. I may do something about it eventually.

So detection does not have to be binary as in either 100% right or 100%
wrong. If a tool can only report red or blue binary results, that's a possibly
fine but weak tool.
But that's what I'm saying. Most tools CAN provide more than 2 answers.
The problem is that the SPDX license expressions don't allow tools to report
more than the 2 answers within a license expression. So the tool doesn't have
to give a binary answer, but SPDX forces the tools to do so when they output
SPDX license expressions.
I can output more than one expression then, can I?

For instance scancode-toolkit can cope with ambiguity alright and surface
this for review when it cannot come up with a definitive detection answer.
But it CANNOT surface this information via SPDX license expressions.
For most people, that's the ONLY thing that matters.
It surely could (NB: it does not yet). That's a minor change.
e.g. something like a list of license expressions with a confidence:

- confidence: 100% , expression: GPL-2.0-only
- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)

Each expression is valid, right?
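Philippe's point is that each entry in such a list is itself a well-formed expression, with the confidence held beside it rather than inside it. A minimal sketch of checking that, using a hand-written parser for a subset of the SPDX expression grammar (identifiers with optional "+", WITH, AND, OR, parentheses). It checks structure only, not whether identifiers are on the SPDX License List, and it matches operators case-insensitively because the example above writes "or" and "and" in lower case.

```python
import re

# Structure-only checker for a subset of the SPDX license expression grammar.
# Precedence: WITH binds tightest, then AND, then OR. Identifiers are not
# looked up in the SPDX License List.

TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.:\-]+\+?)")
OPERATORS = {"AND", "OR", "WITH"}

def tokenize(text: str) -> list[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class _Parser:
    def __init__(self, tokens: list[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return token

    def at_operator(self, name: str) -> bool:
        token = self.peek()
        return token is not None and token.upper() == name

    def expression(self):  # expression := and_expr ("OR" and_expr)*
        self.and_expr()
        while self.at_operator("OR"):
            self.take()
            self.and_expr()

    def and_expr(self):    # and_expr := term ("AND" term)*
        self.term()
        while self.at_operator("AND"):
            self.take()
            self.term()

    def term(self):        # term := "(" expression ")" | identifier ["WITH" identifier]
        if self.peek() == "(":
            self.take()
            self.expression()
            if self.take() != ")":
                raise ValueError("expected ')'")
        else:
            self.identifier()
            if self.at_operator("WITH"):
                self.take()
                self.identifier()

    def identifier(self):
        token = self.take()
        if token in ("(", ")") or token.upper() in OPERATORS:
            raise ValueError(f"expected a license identifier, got {token!r}")

def is_valid(expression: str) -> bool:
    try:
        parser = _Parser(tokenize(expression))
        parser.expression()
        return parser.peek() is None  # leftover tokens mean a malformed expression
    except ValueError:
        return False

# The confidence is kept next to each expression, not inside it.
candidates = [
    (1.00, "GPL-2.0-only"),
    (0.60, "((GPL-2.0-only or GPL-2.0+) and MIT)"),
]
for confidence, expression in candidates:
    print(f"{confidence:.0%}  {expression}  valid={is_valid(expression)}")
```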

I suspect at most 0.1% of
SPDX users use SPDX files, everyone else ONLY uses SPDX license expressions.
The percentage of SPDX users who use SPDX files may not be that high :-).
Would you have data or pointers to support these assertions about SPDX
usage? That would be mighty useful!

Therefore I have no issue whatsoever with implementing Jilayne's comprehensive
proposal and I can always output something on my side.
You can always output something nonstandard that cannot be shared, sure,
and for many detailed analyses that's a good thing.
But that's less helpful for sharing compared to a standard format.
I think we had a similar discussion a while back about adding
something like a scope or purpose in the license expression syntax.
This is the same here: I can convey one or more license expressions
with a confidence attached if needed. The confidence or score is not
part of the expression but some external attribute that qualifies it.

I am not talking about outputting anything "non-standard", whatever this may
be: instead, external data about an expression are best handled
externally.
When in an SPDX doc, there are ways to deal with it; outside of it,
you need to track other data attributes that would otherwise be
supported by an SPDX doc.

To take a (likely bad) analogy: What you are suggesting is somewhat
similar to storing the SHA1 of a file inside the file itself. This
will change the file content... and then you need to recompute the
SHA1 value because of this. And store it inside the file, and
recompute, and so on .... forever.

External observations about something (here the confidence you may
attach to a certain license expression) are best managed outside the
observed thing, otherwise they modify the thing under observation.

Therefore, I track a file SHA1 outside of a file itself and not
inside. And I see it best to track the confidence or score I can
attach to a license expression outside of this expression.
And if we want to have this in SPDX, this would mean adding a
"confidence" attribute to qualify a license expression, not adding this
to the expression syntax IMHO.
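Philippe's analogy can be checked directly with Python's standard `hashlib`. The file content, the `COPYING` path and the `confidence` key are hypothetical; the sketch only illustrates why an observation is kept beside the thing observed, and the keys are not SPDX property names.

```python
import hashlib
import json

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

content = b"Licensed under the GNU General Public License version 2.\n"
digest = sha1_hex(content)

# Embedding the digest changes the file, so the embedded value no longer
# describes it: recomputing would change it again, and so on forever.
embedded = content + f"sha1: {digest}\n".encode()
assert sha1_hex(embedded) != digest

# Keeping the observation outside the observed thing avoids this: the
# original bytes (or expression string) stay unchanged.
file_record = {"path": "COPYING", "sha1": digest}                    # hypothetical path
license_record = {"expression": "GPL-2.0-only", "confidence": 0.95}  # hypothetical keys
print(json.dumps([file_record, license_record], indent=2))
```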

So since one tool can already do this, this is NOT an issue for the
SPDX spec to worry about and tools should adjust: it's for tool
implementors to cope with ambiguity, not something to specify here.

Please let's keep this spec simple!
Well, empty specs are the simplest possible :-).
Specs need to be as simple as possible... but no simpler.
Are you suggesting that the SPDX expression spec is empty? (*cough*)
Or that the SPDX spec is empty? (*cough, cough*) I tend to think of it as
a tad too fat and in need of a good diet instead ;)

There's also the long-term damage this decision will cause.
In practice, I expect failing to add this capability is going to make
"GPL-2.0-only" mean the same thing as "I saw a GPL-2.0 and I don't
know if 'or later' applies" - and as a result "GPL-2.0-only" will
NOT mean "GPL-2.0-only" as intended.
I do not grok what you mean there. Can you clarify?

Which part of "only" is not clear to you?

Why would "GPL-2.0-only" suddenly mean anything other than its
definition in SPDX as carefully crafted by experienced and FLOSS-savvy
lawyers (hat tip) and as agreed and reviewed with the GPL authority
that the FSF is without any possible argument (other hat tip) ?

The case of "I see a license
and no other information" is relatively common, and is *important*
for determining what is legal to do.
Do you have data to support this? My personal experience is that this
is a case that is not so common.
And again even if it were pervasive and the norm, the number of cases
where I need hyper precision to determine "what is legal to do" are
rare as I explained at first and that I am repeating here for clarity:

1. I am combining GPL 2 and GPL 3 code
2. OR I want to use a GPL 3 for GPL 2-licensed code

Outside of these two rare cases, a user of GPL-2.0-licensed code will
not care much about this: "what is legal to do", e.g. which GPL 2.0
permissions and obligations apply, is clear and non-ambiguous: this is all
that needs to be known. The possible lack of precision here is not a
problem for me, nor for the many users of GPL-licensed code I have
helped comply.

And yet, Jilayne's proposal makes these rare cases **crystal clear**
going forward: so this is all gravy to me!

--
Cordially
Philippe Ombredanne



Re: update on only/or later etc.

David A. Wheeler
 

Philippe Ombredanne:
I think there is no contention there at all.
Respectfully: There *IS* contention. I'm contending.

A summary (e.g. a license expression) cannot ever capture all the nuances
of the details.... This is a lossy "compression" by construction...
Sure, but all summaries, and all models, omit something. Indeed,
a SPDX license file *also* cannot capture all the nuances.

The correct question is, "is this model adequate for its uses?"
In most cases people want to know, "is this package legal to use?".
To answer that question, "it's at least GPL-2.0, and might be more"
is important information, and I think it's information that the SPDX
license expression should include.

Speaking as the author of a fine license detection engine, I think this is a
red herring.
A license detection result can be: "I am 95% sure this is GPL-2.0-only but it
could be GPL-2.0+: please review me to fill in your conclusion."
This inability to indicate the "in-between" state within a license expression
greatly increases the number of cases where an unnecessary review must occur.
Every unnecessary review is a significant increase in time and money.
In many cases, it's *NOT* necessary to make a decision, but in some cases it is.
If organizations can do the analysis *ONLY* when they need to,
they'd save a lot of time and money... and that is greatly aided by
having SPDX license expressions able to indicate this information.

So detection does not have to be binary as in either 100% right or 100%
wrong. If a tool can only report red or blue binary results, that's a possibly
fine but weak tool.
But that's what I'm saying. Most tools CAN provide more than 2 answers.
The problem is that the SPDX license expressions don't allow tools to report
more than the 2 answers within a license expression. So the tool doesn't have
to give a binary answer, but SPDX forces the tools to do so when they output
SPDX license expressions.

For instance scancode-toolkit can cope with ambiguity alright and surface
this for review when it cannot come up with a definitive detection answer.
But it CANNOT surface this information via SPDX license expressions.
For most people, that's the ONLY thing that matters. I suspect at most 0.1% of
SPDX users use SPDX files, everyone else ONLY uses SPDX license expressions.
The percentage of SPDX users who use SPDX files may not be that high :-).

Therefore I have no issue whatsoever with implementing Jilayne's comprehensive
proposal and I can always output something on my side.
You can always output something nonstandard that cannot be shared, sure,
and for many detailed analyses that's a good thing.
But that's less helpful for sharing compared to a standard format.

So since one tool can already do this, this is NOT an issue for the
SPDX spec to worry about and tools should adjust: it's for tool
implementors to cope with ambiguity, not something to specify here.

Please let's keep this spec simple!
Well, empty specs are the simplest possible :-).
Specs need to be as simple as possible... but no simpler.

There's also the long-term damage this decision will cause.
In practice, I expect failing to add this capability is going to make
"GPL-2.0-only" mean the same thing as "I saw a GPL-2.0 and I don't
know if 'or later' applies" - and as a result "GPL-2.0-only" will
NOT mean "GPL-2.0-only" as intended. The case of "I see a license
and no other information" is relatively common, and is *important*
for determining what is legal to do.

--- David A. Wheeler


EDL - Eclipse Distribution License

Simon Bernard <contact@...>
 

Hi,

  I would like to know if it would make sense to add the "EDL - Eclipse Distribution License" to SPDX.
  I ask because it seems this is a BSD-3-Clause license: https://opensource.org/licenses/BSD-3-Clause.
  See: https://eclipse.org/org/documents/edl-v10.php
  But many Eclipse projects use it, and listing it could help identify it quickly with tools like SPDX.

Thx.

Simon


Keep partial conclusions out of license expressions (was: update on only/or later etc.)

W. Trevor King
 

On Wed, Nov 22, 2017 at 09:45:10AM +0100, Philippe Ombredanne wrote:
A license detection result can be: "I am 95% sure this is
GPL-2.0-only but it could be GPL-2.0+: please review me to fill in
your conclusion."

So detection does not have to be binary as in either 100% right or
100% wrong. If a tool can only report red or blue binary results,
that's a possibly fine but weak tool.
That makes sense to me, even if it doesn't work with GitHub's current
license-reporting API [1] or UI [2]. But confidence percentages are
part of Licensee's output [3], so the current limitation is one GitHub
has knowingly taken on.

Cheers,
Trevor

[1]: https://developer.github.com/v3/licenses/#get-the-contents-of-a-repositorys-license
[2]: https://help.github.com/articles/adding-a-license-to-a-repository/
[3]: https://github.com/benbalter/licensee/blob/v9.6.0/docs/usage.md#advanced-api-usage

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


SPDX at Leadership Summit in March

Philip Odence
 

As you may know, the Linux Foundation Leadership Summit is in Sonoma, March 6-8. Additionally, there will be group meetings on the Monday before and Friday after for SPDX and Open Chain respectively.

 

The call for papers was just published. Please consider submitting a paper. There’s an appetite for talks on SPDX tooling, automation or usage.

http://events.linuxfoundation.org/events/open-source-leadership-summit/program/callforproposals

 

Please take this 1-minute survey to give a sense of the likelihood of your attending:

https://www.surveymonkey.com/r/NLX7KXN

 

Best regards,

Phil

 

BLACKDUCK
L. Philip Odence
VP/General Manager Black Duck On-Demand
Black Duck Software, Inc.
800 District Avenue, Suite 201
Burlington, MA 01803-5061
E: podence@...
O: +1.781.425.4479
M: +1.781.258.9502
Skype: philip.odence
www.blackducksoftware.com  

 

 


Re: update on only/or later etc.

Philippe Ombredanne
 

On Tue, Nov 21, 2017 at 5:28 PM, Wheeler, David A <dwheeler@...> wrote:
J Lovejoy [mailto:opensource@...]:
If this is a potential problem once GPL-2.0 is changed to GPL-2.0-only, then
it is currently a problem.
Yes indeed, that's my point :-).

And perhaps by altering the current identifier (GPL-2.0) to be more explicit
(GPL-2.0-only) we will expose just how often GPL-2.0 has been used
incorrectly.
The tools are currently *required* to be incorrect, because they cannot report
the information they have ("I have GPL-2.0, and I don't know if 'or later'
applies"). Neither the proposed "GPL-2.0-only" nor "GPL-2.0+" correctly
represents the information they have. Tools will have to output *something*,
and whatever they produce will dilute in *practice* the strict meanings of
those license identifiers.
David,

Speaking as the author of a fine license detection engine, I think
this is a red herring.

A license detection result can be: "I am 95% sure this is GPL-2.0-only
but it could be GPL-2.0+: please review me to fill in your
conclusion."

So detection does not have to be binary as in either 100% right or
100% wrong. If a tool can only report red or blue binary results,
that's a possibly fine but weak tool.
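As a rough illustration of such a non-binary result, here is a minimal Python sketch. It assumes a hypothetical `Detection` record (not scancode-toolkit's actual data model) holding a best-guess expression, a confidence score, and known alternatives, and it flags the result for human review below an assumed threshold.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical detection record; field names are illustrative only and
# do not mirror any real tool's output format.
@dataclass
class Detection:
    expression: str                 # best-guess SPDX license expression
    confidence: float               # 0.0 .. 1.0
    alternatives: list[str] = field(default_factory=list)

    def needs_review(self, threshold: float = 0.99) -> bool:
        # Ask a human to review when the tool is not certain enough, or
        # when it knows of plausible alternative conclusions.
        return self.confidence < threshold or bool(self.alternatives)


result = Detection("GPL-2.0-only", 0.95, alternatives=["GPL-2.0+"])
if result.needs_review():
    print(f"{result.expression} ({result.confidence:.0%}), could be "
          f"{', '.join(result.alternatives)}: please review")
```

The threshold of 0.99 is an arbitrary assumption; a real tool would tune it or expose it as a setting.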

For instance, scancode-toolkit can cope with ambiguity and
surface it for review when it cannot come up with a definitive
detection answer. Therefore I have no issue whatsoever implementing
Jilayne's comprehensive proposal, and I can always output something on
my side.

So since this can be done by one tool alright this is NOT an issue for
the SPDX spec to worry about and tools should adjust: that's for tools
implementors to cope with ambiguity, not something to specify here.

Please let's keep this spec simple!

--
Cordially
Philippe Ombredanne


Re: "unclear version" and OR-MAYBE operators (was: update on only/or later etc.)

Philippe Ombredanne
 

On Wed, Nov 22, 2017 at 6:51 AM, W. Trevor King <wking@...> wrote:
On Tue, Nov 21, 2017 at 08:10:02AM -0700, J Lovejoy wrote:
Just a reminder to all: when someone places a copy of the GPL,
version 2 alongside source code files this does not make the
licensing ambiguous; clearly there is a valid license…
[...]
So I think
there is likely to be a substantial set of license-expression authors
who are unwilling to claim a complete conclusion. Is this point still
under contention?
I think there is no contention there at all.
A summary (e.g. a license expression) cannot ever capture all the
nuances of the details.... This is a lossy "compression" by construction...

If we accept a substantial set of partial-concluders, the SPDX needs
to decide what to suggest to them. Folks using SPDX documents can
already use comment sections, but those are unstructured [3]. And
folks using bare license expressions obviously don't have access to
the SPDX-document comment field.
... therefore your input is valuable and well thought out but none of
this extra complexity is needed.

An expression may in some cases not be fully conclusive: when this
happens, this information can be provided elsewhere in an SPDX doc,
such as in notes, as you rightly noted.

Folks using only license expressions typically use them in
another context, which is to document their own code or package
license: there is no ambiguity there and therefore no need to add
extra complexity to capture something that does not exist.

In some cases such as here, perfect is the enemy of the good.
Please, let's try to keep this spec simple!

--
Cordially
Philippe Ombredanne


Re: "unclear version" and OR-MAYBE operators

W. Trevor King
 

On Tue, Nov 21, 2017 at 09:51:27PM -0800, W. Trevor King wrote:
[2]: https://lists.spdx.org/pipermail/spdx-legal/2017-November/002317.html
Subject: Re: only/or later and the goals of SPDX
Date:
Message-ID: <20171109195414.GA11633@...>

[5]:
Subject: Re: only/or later and the goals of SPDX
Date: Thu, 12 Oct 2017 10:31:47 -0700
Message-ID: <20171012173147.GD11004@...>
Sorry, hit send too soon :/. [2] was sent on Thu, 9 Nov 2017 11:54:14
-0800. [5] is at
https://lists.spdx.org/pipermail/spdx-legal/2017-October/002265.html

Cheers,
Trevor



"unclear version" and OR-MAYBE operators (was: update on only/or later etc.)

W. Trevor King
 

On Tue, Nov 21, 2017 at 08:10:02AM -0700, J Lovejoy wrote:
Just a reminder to all: when someone places a copy of the GPL,
version 2 alongside source code files this does not make the
licensing ambiguous; clearly there is a valid license…

Any scenario you could interpret, we have a way to express that
currently and would still under the proposal.

https://opensource.com/article/17/11/avoiding-gpl-confusion
I think a copy of the GPL alongside source code (e.g. [1]) is
ambiguous. And the article you link mentions “confusion” in the URL,
“foggy” in the title, and “ambiguity” in the subtitle. I agree that
you can, like Fedora, decide that you are comfortable enough with one
interpretation. But I think Gary has volunteered himself for the “I'd
write partially-concluded license expressions, but there's no syntax
for it yet” camp [2]. The FSF itself is unwilling to commit to a
public position on this situation (as far as I'm aware). So I think
there is likely to be a substantial set of license-expression authors
who are unwilling to claim a complete conclusion. Is this point still
under contention?

If we accept a substantial set of partial-concluders, the SPDX needs
to decide what to suggest to them. Folks using SPDX documents can
already use comment sections, but those are unstructured [3]. And
folks using bare license expressions obviously don't have access to
the SPDX-document comment field. We can tell them:

a. That they cannot pass the partial conclusion along, and can only
bail out with NOASSERTION (I've filed [4] to add that to license
expressions).
b. That they can pass the partial conclusion along, using:

i. an AMBIGUOUS[-VERSION] operator, or
ii. an OR-MAYBE operator,

as discussed in [5].

I see no upside to (a), but I'm not an SPDX maintainer. I strongly
prefer b.ii to b.i, as discussed in [5].

The OR-MAYBE operator (b.ii) is completely independent of how the
or-later business shakes out.

The AMBIGUOUS[-VERSION] operator (b.i) overlaps slightly, because
you'd have to choose which GPL short identifier to use with
AMBIGUOUS-VERSION. If (b.i) has no surviving supporters, we don't
have to worry about that at all.
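To illustrate why a partial conclusion can still be actionable, here is a rough Python sketch of a consumer checking a "no GPL in shipping products" policy against an expression that uses the proposed OR-MAYBE operator. OR-MAYBE is only a proposal in this thread, not part of the SPDX specification; the flat, parenthesis-free grammar and the `possible_licenses` and `gpl_policy_check` helpers are hypothetical.

```python
from __future__ import annotations

# Operators in a simplified, parenthesis-free grammar. OR-MAYBE is the
# operator proposed in this thread; it is NOT part of the SPDX spec.
OPERATORS = {"AND", "OR", "OR-MAYBE"}


def possible_licenses(expression: str) -> set[str]:
    """Hypothetical helper: every identifier that might apply."""
    return {tok for tok in expression.split() if tok not in OPERATORS}


def gpl_policy_check(expression: str) -> str:
    """Sketch of a conservative 'no GPL in shipping products' check."""
    licenses = possible_licenses(expression)
    if licenses == {"NOASSERTION"}:
        # No information at all: a human has to investigate.
        return "unknown: manual review needed"
    # Conservative: any GPL identifier anywhere triggers the policy, even
    # under a plain OR where a non-GPL choice exists. Partial knowledge
    # ("GPL-2.0-only OR-MAYBE GPL-2.0+") is already enough to decide.
    # "LGPL-..." identifiers do not start with "GPL-", so they pass.
    if any(lic.startswith("GPL-") for lic in licenses):
        return "reject"
    return "accept"


for expr in ["GPL-2.0-only OR-MAYBE GPL-2.0+", "MIT OR Apache-2.0", "NOASSERTION"]:
    print(f"{expr!r}: {gpl_policy_check(expr)}")
```

The contrast between the first and last cases mirrors the argument above: the OR-MAYBE expression lets the policy decide immediately, while NOASSERTION pushes the work back onto a person.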

Cheers,
Trevor

[1]: https://github.com/javierwilson/tonto/tree/75be0678be565872cbe7b99d5af4a1946393ee77
[2]: https://lists.spdx.org/pipermail/spdx-legal/2017-November/002317.html
Subject: Re: only/or later and the goals of SPDX
Date:
Message-ID: <20171109195414.GA11633@...>
[3]: https://lists.spdx.org/pipermail/spdx-legal/2017-October/002259.html
Subject: Re: only/or later and the goals of SPDX
Date: Wed, 11 Oct 2017 23:12:39 -0700
Message-ID: <20171012061239.GA11004@...>
[4]: https://github.com/spdx/spdx-spec/issues/50
[5]:
Subject: Re: only/or later and the goals of SPDX
Date: Thu, 12 Oct 2017 10:31:47 -0700
Message-ID: <20171012173147.GD11004@...>



Re: update on only/or later etc.

David A. Wheeler
 

J Lovejoy [mailto:opensource@...]:
If this is a potential problem once GPL-2.0 is changed to GPL-2.0-only, then it is currently a problem.
Yes indeed, that's my point :-).

And perhaps by altering the current identifier (GPL-2.0) to be more explicit (GPL-2.0-only) we will expose just how often GPL-2.0 has been used incorrectly.
The tools are currently *required* to be incorrect, because they cannot report the information they have ("I have GPL-2.0, and I don't know if 'or later' applies"). Neither the proposed "GPL-2.0-only" nor "GPL-2.0+" correctly represents the information they have. Tools will have to output *something*, and whatever they produce will dilute in *practice* the strict meanings of those license identifiers.

--- David A. Wheeler


reminder of legal/tech call today

J Lovejoy
 

Hi All,

Just a reminder that due to the regularly scheduled legal call falling on a US holiday this Thursday, we will join the tech team today.

Dial-in info:
Web conference: http://uberconference.com/SPDXTeam
Optional dial in number: 415-881-1586
No PIN needed

Thanks,
Jilayne

SPDX Legal Team co-lead
opensource@...


Re: update on only/or later etc.

J Lovejoy
 




On Nov 17, 2017, at 8:35 AM, Wheeler, David A <dwheeler@...> wrote:

J Lovejoy:

Do NOT add an identifier or operator, etc. for the found-license-text-only scenario where you don’t know if the intent of the copyright holder was “only” or “or later” and are thus left to interpret clause

I disagree, sorry.

- we don’t need to solve this right now and we can always add this option later
- without adding a third option, we are in the same position we have been in since the birth of the SPDX License List. incremental changes have always been our go-to strategy; let’s take a first step to clarify the current identifiers in a way that the FSF can get behind. If, for a later release, we think we need this third option, then we can discuss that further once we have some time under our belts with this change. 

No, this is the *reason* that there's a problem.  The *reason* that "GPL-2.0" isn't working is, in part, because it overloads two notions.  "GPL-2.0" is supposed to mean "Only 2.0" (per the spec) .  But tools only know "I saw a GPL-2.0 license", so how can they represent that information?  The obvious way is "GPL-2.0", so that same identifier can mean "2.0 at least, and I don't know if there are other versions allowed".  That's not good.

Hi David,

If this is a potential problem once GPL-2.0 is changed to GPL-2.0-only, then it is currently a problem. And perhaps by altering the current identifier (GPL-2.0) to be more explicit (GPL-2.0-only) we will expose just how often GPL-2.0 has been used incorrectly. That may provide better examples to work off of to decide what ‘third option’ we need.  

Just a reminder to all: when someone places a copy of the GPL, version 2 alongside source code files this does not make the licensing ambiguous; clearly there is a valid license. The question comes down to how you interpret clause 9:
- is the language, “If the Program specifies a version number of this License which applies to it and 'any later version,' you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation.”, to be interpreted to mean that placing a copy of the license is “specifying a version”? If so, a user can redistribute the code under GPL version 2 (GPL-2.0-only) or, as some people may read it, under GPL version 2 or any later version (GPL-2.0+).
- or does placing a copy of a version of the license NOT constitute specifying a version, so that the sentence “If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.” applies? In that case, one can redistribute the code under GPL-1.0+.


Any scenario you could interpret, we have a way to express that currently and would still under the proposal. 

While on this subject, an article that appeared on opensource.com came up on the last call. I just want to point out that that article, which explains the above interpretation issues (which we have been talking about for several months), does not reach a conclusion but simply encourages people to provide clarity of their intentions.  We can certainly all agree on encouraging that!  https://opensource.com/article/17/11/avoiding-gpl-confusion (Although, I think we should consistently encourage people to use the standard license notices provided by the license and/or SPDX short identifiers) :)

Thanks,
Jilayne



Re: update on only/or later etc.

Gary O'Neall
 

I understand and agree with David's concerns - also coming from a tooling perspective.

However, I believe this is a different problem than the FSF issue and a problem we have today with the current license expression syntax and the current license list.

It seems we are talking about 2 different usage scenarios for SPDX license expressions:
1) Someone is using a license expression to document what they "know" or assert is the license for a file or package. For example, the copyright owner is adding an SPDX license ID in their file headers.
2) Someone or something is documenting findings on license information for files or packages. For example, a license scanning tool.

For #1, we don't want to allow someone to be ambiguous about whether a GPL license is "only" or "or later" when describing a license using SPDX license expressions. I believe this is the issue the FSF is concerned about.

For #2, we will find situations where it is not clear if a GPL license is to be used "only" with that version or with that version or later (BTW - it's not just tools that have this problem). We would like to be able to express this situation using SPDX since it is very useful information.

On the last legal call, it seemed clear to me that our attempts to solve #2 created a great deal of concern for those trying to solve #1.

In order to make progress, I still feel we should divide and conquer: solve the FSF issue first, then address the ambiguous license version issue in a future release of the spec. Perhaps we can come up with a more generalized solution for ambiguous license findings for #2 if we have more time to design and discuss the solution.

One additional thought: We could use a LicenseRef to document the exact text of the ambiguous license version and add a license comment to indicate it is GPL, just not clear which version. The LicenseRef approach would only work for SPDX documents and would provide more information than a NOASSERTION.
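A rough Python sketch of that LicenseRef approach, emitting an "other licensing information" entry in SPDX 2.x tag-value style. The identifier `LicenseRef-GPL-2.0-unclear-version`, the license name, the comment wording, and the truncated license text are made-up placeholders; only the tag names follow the SPDX 2.x tag-value conventions, and `license_ref_entry` is a hypothetical helper.

```python
def license_ref_entry(ref_id: str, name: str, text: str, comment: str) -> str:
    """Hypothetical helper: build a tag-value block for a custom LicenseRef."""
    if not ref_id.startswith("LicenseRef-"):
        raise ValueError("custom license identifiers must start with 'LicenseRef-'")
    return "\n".join([
        f"LicenseID: {ref_id}",
        f"ExtractedText: <text>{text}</text>",
        f"LicenseName: {name}",
        f"LicenseComment: <text>{comment}</text>",
    ])


print(license_ref_entry(
    "LicenseRef-GPL-2.0-unclear-version",  # hypothetical identifier
    "GNU GPL v2 text, version applicability unclear",
    "GNU GENERAL PUBLIC LICENSE Version 2, June 1991 ...",  # placeholder text
    "A copy of GPL-2.0 was found; it is unclear whether 'or later' applies.",
))
```

As noted above, this only helps inside SPDX documents; a bare license expression has nowhere to carry the comment.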

Gary

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of Wheeler, David A
Sent: Friday, November 17, 2017 3:20 PM
To: brad.edmondson@...
Cc: SPDX-legal
Subject: RE: update on only/or later etc.

Brad Edmondson [mailto:brad.edmondson@...]
I think your points are good ones, but it seems to me they go to the
separate issues of "file:detected license" and "package:concluded license."
The clarity of the spec argument is aimed at making the "file:detected
license" case more explicit, and if it leaves tools with NOASSERTION for
"package:concluded license," then I think that's OK, no?

No, it fails to work for multiple reasons:
1. "NOASSERTION" is basically useless, because it provides no information. In
many cases, all I need to know is "there's a version of the GPL here", and I
can make a decision. Being able to provide *some* information is often all
that's needed, while providing *no* information creates a lot of unnecessary
work for tool users.
2. Tools, lacking sentience, often cannot determine whether or not "or later
versions" applies. So they're unable to be "more explicit" in
package:concluded. The current structure requires that they conclude either "only
2.0" or "2.0 or later"... even though tools typically CANNOT make that
determination. SPDX should make it possible to report the information *actually*
available.
3. The majority of SPDX users do not use SPDX files. Instead, they *only* use
SPDX license expressions (as available in package managers, file content
declarations, etc.). So there's no "file:detected" vs. "package:concluded"
entries to compare anyway.

--- David A. Wheeler

_______________________________________________
Spdx-legal mailing list
Spdx-legal@...
https://lists.spdx.org/mailman/listinfo/spdx-legal


Re: update on only/or later etc.

David A. Wheeler
 

Brad Edmondson [mailto:brad.edmondson@...]
I think your points are good ones, but it seems to me they go to the separate issues of "file:detected license" and "package:concluded license." 
The clarity of the spec argument is aimed at making the "file:detected license" case more explicit, and if it leaves tools with NOASSERTION for "package:concluded license," then I think that's OK, no?
No, it fails to work for multiple reasons:
1. "NOASSERTION" is basically useless, because it provides no information. In many cases, all I need to know is "there's a version of the GPL here", and I can make a decision. Being able to provide *some* information is often all that's needed, while providing *no* information creates a lot of unnecessary work for tool users.
2. Tools, lacking sentience, often cannot determine whether or not "or later versions" applies. So they're unable to be "more explicit" in package:concluded. The current structure requires that they conclude either "only 2.0" or "2.0 or later"... even though tools typically CANNOT make that determination. SPDX should make it possible to report the information *actually* available.
3. The majority of SPDX users do not use SPDX files. Instead, they *only* use SPDX license expressions (as available in package managers, file content declarations, etc.). So there's no "file:detected" vs. "package:concluded" entries to compare anyway.

--- David A. Wheeler
