Re: update on only/or later etc.


David A. Wheeler
 

Sorry for the long email, but I was asked for evidence... so I went and got some.

David A. Wheeler:
The usual reason is because I'm trying to link Apache-2.0 licensed
code with other code, a non-problem for GPL-2.0+ but widely considered
a problem for GPL-2.0 only. The Apache-2.0 license is extremely common.
Philippe Ombredanne [mailto:pombredanne@...]
I understand your point, but __how many times__ did you ever encounter
this case in the real world? ...
An issue of Apache-2.0 compatibility with the GPL-2.0 has never showed
up: zero cases, not one single time.
The issue of having to *check* for Apache-2.0 & GPL-2.0 ONLY problems
has happened a number of times in just the BadgeApp project that I lead,
and I don't think that's unusual. In just that small project I think I've had to check ~10 times.
An actual *problem* is less common, because it's often the case that the
GPL code is really "GPL-2.0+" or that the GPL-2.0 ONLY code is not linked with Apache code.
I've seen both situations in the BadgeApp. (My biggest concern was with a GPL-2.0 ONLY
library that was a deep transitive dependency that provided text coloring of test results,
but it turned out to be linked into a *different* executable from my Apache-2.0 libraries,
so there was no legal problem.)
However, one of the reasons I use SPDX is *specifically* to make sure that there's no problem.

In some cases, the GPL'ed code is in a separate executable, so I don't need to
examine "is 'or later' true?". If they *are* linked together, then I need to investigate.
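
The check I just described can be partly mechanized. Here is a minimal sketch in Python, assuming a hypothetical inventory in which each component records its SPDX license identifier and the executable it ends up linked into. The `Component` type, the component names, and the executable names are all invented for illustration; they are not the BadgeApp's real dependency list. It only flags cases that need a human to investigate; it does not reach any legal conclusion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str        # hypothetical component name
    license_id: str  # SPDX identifier as reported by whatever tool was used
    executable: str  # executable this component is linked into


def needs_investigation(components):
    """Return (executable, gpl_component, apache_component) triples where
    GPL-2.0-only code and Apache-2.0 code are linked into the same executable."""
    by_executable = defaultdict(list)
    for component in components:
        by_executable[component.executable].append(component)

    flagged = []
    for executable, members in sorted(by_executable.items()):
        gpl_only = [c for c in members if c.license_id == "GPL-2.0-only"]
        apache = [c for c in members if c.license_id == "Apache-2.0"]
        for g in gpl_only:
            for a in apache:
                flagged.append((executable, g.name, a.name))
    return flagged


# Illustrative inventory: the GPL-2.0-only library lives in a different
# executable from the Apache-2.0 code, so nothing is flagged.
inventory = [
    Component("web-app", "Apache-2.0", "server"),
    Component("color-output", "GPL-2.0-only", "test-runner"),
    Component("parser", "GPL-2.0-or-later", "server"),
]
print(needs_investigation(inventory))
```

Note that this only works if the license identifier really distinguishes "only" from "or later", which is the whole point of this thread.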

Not at all. What matters in many circumstances is just being able to
show some sort of due diligence.
Are you serious there? Where in the actual real world anyone is looking
after "being able to show some sort of due diligence" and consider this
enough? That does not sound reasonable. Who does this? I would have a
field day looking as such a codebase.
Yes, I'm very serious. Enjoy your field day :-). Warning: it gets less fun over time :-(.

The people here on SPDX-legal generally try to
meet legal requirements seriously, and are often involved in corporate efforts
to ensure that the entire organization complies with various legal requirements.

I'm grateful to all of you; you're the good guys. Wear that white hat proudly.

But not all organizations are this way. In particular, small organizations often can't
afford in-house lawyers and often don’t understand the law very well anyway.
"Doing your best" (as they perceive it) without expertise is very common in *practice* -
though perhaps since they're smaller organizations you don't see them.
Small organizations make a lot of software.

Magma claims that
"75% of Mergers and Acquisitions Fail Due to Unsuccessful Software System Integration"
http://www.magmadigital.co.uk/2016/failed-mergers-and-acquisitions-software/
- that isn't just licensing, but licensing is one of the factors.

Maya Rotenberg (WhiteSource), in
"On the Verge of an M&A? Don’t Ignore Open Source Due Diligence" (11 August 2015)
https://www.whitesourcesoftware.com/whitesource-blog/verge-ma-dont-ignore-open-source-due-diligence/
says that "only 27% of companies have a formal open source policy
to approve new components that are added to their software" and that
20% of M&As fail (though it's not clear how often licensing specifically is the problem).

I believe WhiteSource & Magma have economic reasons to emphasize the problem.
Even so, I think that's enough evidence to suggest that there is a real problem.

In many cases, the "usual" situation is to copy & paste code, regardless of
license or legality.
Where do you get that the "usual" situation is to copy & paste code?
Based on my long experience, copy/paste of snippets is a rare event and
usually account for only a handful of items even in very large product
codebases.
My own observation, and that of others, is that copy/paste from others' work happens often.
It is often not obvious, because it's hard to detect unless you use a tool specifically
to detect it. Many organizations do not use such tools, so it's unsurprising
that many organizations are unaware of it.

First, a personal experience.
I talked with a NASA contractor years ago, who asserted that he just
copied code from the Internet, wherever it was useful, and pasted it in. He indicated
that this was common at his company. His rationale:
"if it's on the Internet anyone can use it" (!). This was at a conference where the
NASA lawyers & NASA software developers were trying to understand each other!
The NASA lawyers, of course, knew this was not okay & did everything they could to educate
developers to prevent *exactly* this sort of thing. I also tried to explain to this person that
doing so was *not* acceptable. But when people are told to *NOT* do something,
they do not always comply, especially since copying & pasting means they can go home
and spend more time with family. NASA was *specifically* trying to prevent this
sort of thing, but it's hard to stop... especially since developers often don't have
*any* training in the law or ethics in school (in contrast with many engineers).

It's not just me; here are some citations that suggest it's common:

https://softwareengineering.stackexchange.com/questions/36978/do-most-programmers-copy-and-paste-code
- "... I'm constantly hearing about people who cut & paste, and they talk about it like it's common practice. I also see comments by others which indicate it's common practice."
- "From one project to another: Most programmers cut and paste code in this capacity. They might find a previous project or something online and copy/paste it exactly or copy/paste and make changes to it. I think this practice is typically fine. This is especially good when it is proven code. (Examples: Some sort of utility object from a past project that worked well, or possibly from a blog with few changes needed)."
- "When I'm stuck and search for stuff to solve my problem and happen upon some helpful snippet of code that does what I want, I naturally copy it. Sometimes it's just the gist of it. I then change it to suite my needs."
- "I find that the most frequent situation where I "cut & paste" [other people's] code is when I have a particular problem and I run into a blog post that solves it. Most times I retype the solution into my project (after all, it's probably written in the blog author's style, if nothing else). It's not really my code, but I don't feel bad about using it in that scenario."

Notice that none of these quotes suggests that there are any legal issues in copy/paste.

https://www.quora.com/Do-developers-copy-and-paste-code-from-others-Do-they-change-the-specifics-of-what-they-need-for-their-projects-Do-they-always-understand-they-are-integrating-into-their-projects-Can-a-developer-build-something-up-just-mashing-up-other%E2%80%99s-code
- "Yes. It is common for developers to copy and paste code they've already written, code from their company's code base, open source code, and answers on stack overflow."
(Here, at least, there are some notes about licensing... but that doesn't mean everyone follows them.)


But for every license you add,
someone creates another project with unclear licensing.
Really, do you have data to back this? Note also we should not care if
"someone creates another project with unclear licensing".
We should care if someone creates another project with unclear licensing
that someone actually uses in the real world.
The hypothetical cases of goofy licensing of unused software are not
relevant IMHO.
Yes, but first, some context. Much of this is
part of the larger problem that many software developers think that licensing & the law
are not relevant to modern software development. Many software projects
don’t have a license at all, and yes, people *use* that software.

The article "[Update] Are GitHubbers taking open source seriously?"
found that 28% of the *MOST* popular GitHub software projects do *NOT* have any license:
https://makk.es/blog/are-githubbers-taking-open-source-seriously/

Here's the essay "Why all software needs a license" by Simon Phipps (InfoWorld),
trying to convince developers to add licenses:
https://www.infoworld.com/article/2839560/open-source-software/sticking-a-license-on-everything.html
This essay summarizes the counter-arguments of many, and the essay wouldn't be necessary
if developers universally agreed that licenses and the law apply to modern software.

Ben Balter's 2015 post "Open source license usage on GitHub.com":
https://github.com/blog/1964-open-source-license-usage-on-github-com
found that around 20% of GitHub repos had *any* license at all (80% didn't).
I don't know what the current figures are, but 2015 wasn't *that* long ago and
all figures since 2008 showed that many repos aren't licensed at all.
The 2015 figure was an increase, caused in large part by their choosealicense site.

I'm very grateful to the choosealicense site, but when people add a license at *all*,
they often just copy & paste the license text. If they don't make any choices (e.g., they just
copied GPL-2.0 into a LICENSE file), it's not obvious if they mean "2.0 or later" or not.
Heck, the *developer* probably doesn't know, because the developer often doesn't
understand copyright or licensing. (It's usually not covered in school.)

There are two related *movements* in the software development world,
specifically "license-free software" and "Post open source software" (POSS),
which argue that licenses (and laws) are irrelevant for software.
They advocate for a disregard for the current license regimes & copyright law in general:
https://en.wikipedia.org/wiki/License-free_software
https://en.wikipedia.org/wiki/Post_open_source
This is supported by well-known people in the software development sphere, including
Dan Bernstein (author of qmail, djbdns, daemontools, and ucspi-tcp, though those projects
have more recently received a license statement).
This post by Luis Villa may help explain the mindset:
http://lu.is/blog/2013/01/27/taking-post-open-source-seriously-as-a-statement-about-copyright-law/
James Governor, founder of analyst firm RedMonk, said:
"younger devs today are about POSS – Post open-source software.
f*** the license and governance, just commit to github."
https://twitter.com/monkchips/status/247584170967175169

Please understand that I do *NOT* agree that licensing or the law are irrelevant.
I suspect few on SPDX-legal would have that position :-).
If you're a lawyer, it may boggle your mind that many people think the law is irrelevant.
But it *is* a real thing.

The POSS folks have a point about the problems of a "permission based culture" - but
I think they have it backwards. If they don't like the legal requirement,
they have to change the law. Ignoring the law does not change the law; their actions
simply create a legal trap for others.

- confidence: 100% , expression: GPL-2.0-only
- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)
That's not a standard SPDX license expression.
Since when "GPL-2.0-only" and "((GPL-2.0-only or GPL-2.0+) and MIT)"
are not valid expressions?
The "confidence:" part is not part of a valid SPDX license expression; it is *outside* the expression.

I want something that's *part* of the SPDX license expression, because that's
all many people have or want to have.
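
To make the point concrete, here is a minimal sketch in Python, assuming a tool emits report lines in exactly the "confidence / expression" shape quoted above. The line format and the `split_report_line` helper are hypothetical, not any real tool's output format. It shows that a consumer who keeps only the expression loses the confidence entirely.

```python
import re

# Assumed line shape, modeled on the two example lines quoted above.
REPORT_LINE = re.compile(r"-\s*confidence:\s*(\d+)%\s*,\s*expression:\s*(.+)")


def split_report_line(line):
    """Split one report line into (confidence_percent, license_expression)."""
    match = REPORT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized report line: {line!r}")
    return int(match.group(1)), match.group(2).strip()


report = [
    "- confidence: 100% , expression: GPL-2.0-only",
    "- confidence: 60% , expression: ((GPL-2.0-only or GPL-2.0+) and MIT)",
]

# What a consumer who only stores the license expression ends up with:
expressions_only = [split_report_line(line)[1] for line in report]
print(expressions_only)
```

Both entries now look equally authoritative, because the uncertainty lived outside the expression.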

Many tools CANNOT determine if "or any later version" applies in all cases.
If there is such tool, then it should either be updated or not used at all.
No tool can guarantee that it always determines whether "or any later version" applies.
Certainly not licensee, which is the tool used automatically by GitHub.
Indeed, licensee generally only looks at the LICENSE file - it doesn't even *try*
to parse the README file (which it could only do imperfectly anyway).

Oh, and for many developers, the license output from licensee is the *only*
SPDX data they'll see, because GitHub does that analysis automatically for them
when they view a project (they don't have to run a tool). I'd love to see
licensee improved, but most developers have ZERO interest in all the details
of a SPDX file anyway; they just want the license expression, and that's it.
In many places, the *developers* choose the libraries that will be used;
there are no lawyers to double-check anything.

If I reformulate this: There are tools that do a poor job at providing proper
results.
That's not a reformulation, that's its antithesis. That statement
presumes that tools *can* always provide "proper results".

I argue that it is *impossible* for an automated tool to ALWAYS
provide "proper results" when it sees a GPL-2.0 license.
Sure, tools like "licensee" could be improved.
But if the *developer* included the GPL-2.0 license,
but didn't indicate if "or later" applies, what exactly is the tool supposed to conclude?
Neither "GPL-2.0 ONLY" nor "GPL-2.0+" is supported by the data available to the tool.
I think there needs to be *some* way to indicate that "I saw at least GPL-2.0,
and I can't determine if 'or later' applies" (in a reasonably simple & general way).
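
Here is a minimal sketch in Python of the three-way outcome I mean. It assumes the tool has already recognized the GPL-2.0 license text and is now looking only at notices the developer wrote (file headers, README statements). The function name, the regular expressions, and especially the "undetermined" label are hypothetical; that label is not an SPDX identifier. Real notices vary far more than these patterns allow.

```python
import re

# Only developer-written notices are scanned. The GPL-2.0 text itself is not
# useful evidence: its "How to Apply These Terms" appendix already contains
# the phrase "any later version" as sample wording.
OR_LATER = re.compile(r"any\s+later\s+version", re.IGNORECASE)
ONLY = re.compile(r"version\s+2(\.0)?\s+only|only\s+version\s+2", re.IGNORECASE)


def classify_gpl2(notices):
    """Classify a project that ships the GPL-2.0 text, given its notices."""
    saw_later = any(OR_LATER.search(notice) for notice in notices)
    saw_only = any(ONLY.search(notice) for notice in notices)
    if saw_later and not saw_only:
        return "GPL-2.0-or-later"
    if saw_only and not saw_later:
        return "GPL-2.0-only"
    # No statement either way, or contradictory statements.
    return "GPL-2.0, or-later undetermined"


print(classify_gpl2([]))
print(classify_gpl2(["either version 2 of the License, or (at your option) any later version."]))
print(classify_gpl2(["Licensed under the GNU GPL version 2 only."]))
```

The first case, a bare LICENSE file with no notices, is the common one, and neither of the two definite answers is justified for it.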

Respectfully,

--- David A. Wheeler
