How to start using only SPDX-License-Identifier tags


Warner Losh
 

Greetings,

I have a ton of questions about how a project can go about doing this.

I'm looking at doing whatever it would take to create a framework so that the FreeBSD project could use SPDX-License-Identifiers as the sole designator of license.

Today, the project has SPDX tags in about 25k of 90k files in our source tree. However, in all cases, the SPDX tag today is informative. The actual license is contained in the file as well so there's no ambiguity as to what the license is (we have an explicit statement that says presently these tags are informative).

However, as more and more software is created using only these tags, we've seen some pressure to accept this software. In addition, some members have expressed the desire to be rid of all this tiresome boilerplate at the start of every file.

So, I'm investigating how we can have something like

/*
 * Copyright 2021 M. Warner Losh
 * SPDX-License-Identifier: BSD-2-Clause
 */

become a valid license to use this file under whatever BSD-2-Clause.txt says, with the proper values filled in. At first blush, in a friendly environment, it is easy. Common sense lets me do it. However, in a more hostile environment where people may try to construe ambiguity to their interest and against mine, I worry that any lack of clarity could cause problems. I have some other concerns as well, but this is the basic one. One reason I worry about this, to cite a concrete example, is that I've seen a journal article try to construe the archaic 'All Rights Reserved' as meaning something other than enabling language for the Buenos Aires Convention, and to impart other, maybe nefarious, meanings to it.
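To make the question concrete, here is a minimal Python sketch of the purely mechanical part: pulling the copyright line and the SPDX tag out of a header like the one above and substituting them into a license template. The names `parse_header`, `expand`, and `STAND_IN_TEMPLATE` are hypothetical, the `<year>` and `<owner>` placeholders are an assumption about how the template marks its fill-in fields, and the stand-in template deliberately omits the real BSD-2-Clause body. It only illustrates what "with the proper values filled in" could mean mechanically; it says nothing about whether the result is legally sufficient.

```python
import re
from typing import Dict, NamedTuple


class Header(NamedTuple):
    year: str
    owner: str
    license_id: str


# Stand-in only: the real text of BSD-2-Clause.txt would go here.
STAND_IN_TEMPLATE = (
    "Copyright (c) <year> <owner>\n\n"
    "[body of BSD-2-Clause.txt would follow here]\n"
)


def _strip_comment(raw: str) -> str:
    """Remove C-style comment decoration ('/*', ' * ', '*/') from one line."""
    return raw.strip().lstrip("/*").rstrip("*/").strip()


def parse_header(text: str) -> Header:
    """Extract year, owner, and SPDX identifier from a file header."""
    year = owner = license_id = None
    for raw in text.splitlines():
        line = _strip_comment(raw)
        m = re.match(r"Copyright\s+(?:\([cC]\)\s+)?(\d{4}(?:-\d{4})?)\s+(.+)", line)
        if m:
            year, owner = m.group(1), m.group(2).strip()
        m = re.match(r"SPDX-License-Identifier:\s*(\S.*)", line)
        if m:
            license_id = m.group(1).strip()
    if year is None or owner is None or license_id is None:
        raise ValueError("header needs a copyright line and an SPDX tag")
    return Header(year, owner, license_id)


def expand(header: Header, templates: Dict[str, str]) -> str:
    """Fill the template for header.license_id with the header's values."""
    template = templates[header.license_id]  # KeyError if the id is unknown
    return template.replace("<year>", header.year).replace("<owner>", header.owner)


header = parse_header(
    "/*\n"
    " * Copyright 2021 M. Warner Losh\n"
    " * SPDX-License-Identifier: BSD-2-Clause\n"
    " */\n"
)
print(expand(header, {"BSD-2-Clause": STAND_IN_TEMPLATE}))
```

Even in this toy form the open questions show up: which template file is authoritative, and what happens when a header carries several copyright lines.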

I've seen the license matching guidelines. Those make sense for the project's prior SPDX activity, where we were adding informative tags to existing licenses. However, I am having trouble seeing how one could unambiguously apply them to take the above copyright notice and license tag and come up with something that I could show to our legal department, and that I'd be confident any litigation around copying of the file would start from (both as the copyright holder and also as a company using this code).

When we added the informative tags to the FreeBSD tree, there was an objection to that because of this and other issues. Since we said the tags were merely informative and the actual license granted use, we were able to dodge the issue at the time. As I prepare to start down the path of turning something like the above into something the project accepts, I have to return to questions like these and a zillion others to get there.

Is there something I've missed that talks about this specifically? I have other questions as well, but these questions seem like a good place to start....

Thanks for your time and attention to this matter...

Warner


Philippe Ombredanne
 

Hi Warner:
See some comments below.

Hi Thomas:
This is FYI as you may be able to provide some insights and
recommendations based on the Linux kernel journey towards
SPDX that could help Warner and FreeBSD?

On Wed, Mar 17, 2021 at 11:11 PM Warner Losh <imp@...> wrote:
> I'm looking at doing whatever it would take to create a framework so
> that the FreeBSD project could use SPDX-License-Identifiers as the
> sole designator of license.
>
> Today, the project has SPDX tags in about 25k of 90k files in our
> source tree. However, in all cases, the SPDX tag today is informative.
> The actual license is contained in the file as well so there's no
> ambiguity as to what the license is (we have an explicit statement
> that says presently these tags are informative).
>
> However, as more and more software is created using only these tags,
> we've seen some pressure to accept this software. In addition, some
> members have expressed the desire to be rid of all this tiresome
> boilerplate at the start of every file.
>
> So, I'm investigating how we can have something like
>
> /*
>  * Copyright 2021 M. Warner Losh
>  * SPDX-License-Identifier: BSD-2-Clause
>  */
>
> become a valid license to use this file under whatever
> BSD-2-Clause.txt says, with the proper values filled in.
> [...]
> I've seen the license matching guidelines. Those make sense for the
> project's prior SPDX activity where we were adding informative tags to
> existing licenses. However, I am having trouble seeing how one could
> unambiguously apply them to take the above copyright notice and
> license and come up with something that I could show to our legal
> department and that I'd have confidence any litigation around copying
> of the file would start at (both as the copyright holder and also as a
> company using this code).
>
> When we added the informative tags to the FreeBSD tree, there was an
> objection to that because of this and other issues. Since we said the
> tags were merely informative and the actual license granted use, we
> were able to dodge the issue at the time. As I prepare to start down
> the path of turning something like the above into something the
> project accepts, I have to return to questions like these and a
> zillion others to get there.

I happen to have helped quite a bit with a similar effort for the Linux
kernel, and as a FOSS license buff I can provide a few comments. I assume
that you are considering only the core FreeBSD source tree and not the
ports for now.

1. There were similar concerns brought up during this kernel effort.
Eventually folks came around to agreement after heated discussions and
adjustments. Having a few ultimate decision makers or tie-breakers
surely helped quite a bit, together with support from lawyers here.
Thomas Gleixner (in CC) led the effort and may be able to provide
some comments too.

2. The FSFE Reuse [1] is a closely related spec that co-evolved with the
kernel efforts. It provides a good set of documentation and practices
that are not project-specific and could be adapted to the unique FreeBSD
context.

3. Beyond Linux and the pioneers of U-Boot, many projects now use and
accept SPDX license expressions and SPDX-License-Identifier. I made a
fairly extensive survey of related package metadata documentation
practices in my pending proposal to adopt SPDX expressions for Python
[2], which shows how prevalent these practices have become.

4. Because of the above I would say that there are now established
community norms and practices under which SPDX-License-Identifier and
SPDX license expressions are acceptable ways to document licensing.
I do not know of any vocal objections from lawyers here and elsewhere.
Therefore, these norms are now the de facto way things are done in
the community at large.

5. The FreeBSD situation is different because there is a greater
number of licenses and origins at play because of all the different
packages. Nevertheless, the way things have been documented in the
kernel [3] may be a good start. You have similar documents for
FreeBSD [4], but not as detailed as Linux's and with one 404 [5].
Is your plan to start by proposing a new/updated "process" document
for FreeBSD? That would be my first step.

6. The FreeBSD license audit from ~ 2014 by Pedro Giffuni could be
refreshed to help. We could use scancode-toolkit [6] (which I maintain)
to help there. This would allow classifying the codebase based on the
different licenses and license documentation approaches, and would
drive the actual planning and efforts. It would also help with the file
replacement tooling suggested below...

7. On the mechanics of replacing license notices by SPDX identifiers
in source code, I have some (old) code that may help with this [7]
and would only need some minor love and care to be functional again.
This could mean automating large volumes of code changes.

8. BSD historical attribution notices are usually more than just a
copyright + a license notice, but often contain extra notices such as:

- This code is derived from software contributed to Berkeley by John
Doe.
- This code was contributed to The NetBSD Foundation by Jane Doe.
- or [8]: "Created by: Warner Losh <imp@...>"

You should document what you would want to do with these.

9. BSD historical licenses come with many small variants of BSD and MIT
licenses (even in some cases of your own making [9] ;))
You should document what to do with these cases and in particular:

- should ALL the original names be kept when the text meets SPDX
matching-style guidelines? IMO no

- define a process to resolve cases that are borderline and fall outside
the strict guidelines

- when should original authors be contacted, and what to do if they are
AWOL?

- when to submit a new license to SPDX? I suspect you will find a large
number of licenses that are unknown to SPDX. I would suggest first
using a LicenseRef namespace like I do in [10], either scancode's or
your own, and then funneling these to SPDX as needed. In my Linux
Kernel scans, I "discovered" several new and weird license variants
(several being franken-BSD and franken-MIT hybrids and mods). Many
were eventually added to the SPDX license list. It would be great to
have the same outcome for your FreeBSD effort!

10. When doing large commits to fix many files, Thomas Gleixner and Kate
Stewart enrolled several volunteers from this list -- several of them
legally trained -- to help review and sign-off on the changes. This was
helpful on so many levels. IMHO you should do the same for FreeBSD.

11. What would be your strategy? A trickle of a few files at a time,
possibly grouped by package or authors/licenses? Or a few larger
tree-wide changes? The latter approach was used for Linux and we started
with grouping things based on the licensing documentation clarity. It
was a lot to swallow but once we were over the hump I think it made
things easier afterwards.

12. What's your plan for files with no explicit license and copyright?

I hope this long list of comments may help ... I did not have the time
to make them shorter!

[1] https://reuse.software/
[2] https://www.python.org/dev/peps/pep-0639/#appendix-3-surveying-how-other-package-formats-document-licenses
[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/process/license-rules.rst
[4] https://docs.freebsd.org/en_US.ISO8859-1/articles/committers-guide/pref-license.html
[5] https://docs.freebsd.org/internal/software-license.html
[6] https://github.com/nexB/scancode-toolkit
[7] https://github.com/nexB/scancode-toolkit/blob/833-espedexify/src/scancode/plugin_espedexify.py
[8] https://github.com/freebsd/freebsd-ports-kde/blob/68a0222b674a77c456b45e3784ad24447e1eba52/devel/p5-Acme-Damn/Makefile#L1
[9] https://github.com/freebsd/freebsd-src/blob/ba7ede0b9b3d0c3a64e6e7d8cbfe26b6f882f39f/UPDATING#L2434
[10] https://scancode-licensedb.aboutcode.org/

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com


Warner Losh
 



On Thu, Mar 18, 2021 at 4:46 AM Philippe Ombredanne <pombredanne@...> wrote:
Hi Warner:
See some comments below.

Hi Thomas:
This is FYI as you may be able to provide some insights and
recommendations based on the Linux kernel journey towards
SPDX that could help Warner and FreeBSD?

On Wed, Mar 17, 2021 at 11:11 PM Warner Losh <imp@...> wrote:
> I'm looking at doing whatever it would take to create a framework so
> that the FreeBSD project could use SPDX-License-Identifiers as the
> sole designator of license.

> Today, the project has SPDX tags in about 25k of 90k files in our
> source tree. However, in all cases, the SPDX tag today is informative.
> The actual license is contained in the file as well so there's no
> ambiguity as to what the license is (we have an explicit statement
> that says presently these tags are informative).
>
> However, as more and more software is created using only these tags,
> we've seen some pressure to accept this software. In addition, some
> members have expressed the desire to be rid of all this tiresome
> boilerplate at the start of every file.
>
> So, I'm investigating how we can have something like
>
> /*
>  * Copyright 2021 M. Warner Losh
>  * SPDX-License-Identifier: BSD-2-Clause
>  */
>
> become a valid license to use this file under whatever
> BSD-2-Clause.txt says, with the proper values filled in.
> [...]
> I've seen the license matching guidelines. Those make sense for the
> project's prior SPDX activity where we were adding informative tags to
> existing licenses. However, I am having trouble seeing how one could
> unambiguously apply them to take the above copyright notice and
> license and come up with something that I could show to our legal
> department and that I'd have confidence any litigation around copying
> of the file would start at (both as the copyright holder and also as a
> company using this code).
>
> When we added the informative tags to the FreeBSD tree, there was an
> objection to that because of this and other issues. Since we said the
> tags were merely informative and the actual license granted use, we
> were able to dodge the issue at the time. As I prepare to start down
> the path of turning something like the above into something the
> project accepts, I have to return to questions like these and a
> zillion others to get there.


I happen to have helped quite a bit with a similar effort for the Linux
kernel, and as a FOSS license buff I can provide a few comments. I assume
that you are considering only the core FreeBSD source tree and not the
ports for now.

1. There were similar concerns brought up during this kernel effort.
Eventually folks came around to agreement after heated discussions and
adjustments. Having a few ultimate decision makers or tie-breakers
surely helped quite a bit, together with support from lawyers here.
Thomas Gleixner (in CC) led the effort and may be able to provide
some comments too.

This is informative, and we are mindful of this history.
 
2. The FSFE Reuse [1] is a closely related spec that co-evolved with the
kernel efforts. It provides a good set of documentation and practices
that are not project-specific and could be adapted to the unique FreeBSD
context.

Yea. I saw that. Things are close here.
 
3. Beyond Linux and the pioneers of U-Boot, many projects now use and
accept SPDX license expressions and SPDX-License-Identifier. I made a
fairly extensive survey of related package metadata documentation
practices in my pending proposal to adopt SPDX expressions for Python
[2], which shows how prevalent these practices have become.

U-Boot wasn't much help. They seem to have made the change without having some kind of document that lets me know how to construct the license. It is implicit, I'll grant, but not as satisfying as I was hoping for.
 
4. Because of the above I would say that there are now established
community norms and practices under which SPDX-License-Identifier and
SPDX license expressions are acceptable ways to document licensing.
I do not know of any vocal objections from lawyers here and elsewhere.
Therefore, these norms are now the de facto way things are done in
the community at large.

Community norms are nice, but the pushback is going to be that's not legal (rightly or wrongly). It's my understanding that the following chain has to happen for someone to enforce the license.

Copyright is at the base. There is no copying w/o permission allowed. The open source licenses grant this permission under contract law. Use of the software constitutes acceptance of the contract.

With a license that's in the file, it is clear what that contract is. With the indirection, it's not yet clear to me how that contract forms in the details, because part of a contract being made includes knowing what the contract is. How do we frame things so that people know what the contract is well enough to cope with an actor advocating for a contract that is different from what we intended, but that might be a literal reading of something, somewhere?

5. The FreeBSD situation is different because there is a greater
number of licenses and origins at play because of all the different
packages. Nevertheless, the way things have been documented in the
kernel [3] may be a good start. You have similar documents for
FreeBSD [4], but not as detailed as Linux's and with one 404 [5].
Is your plan to start by proposing a new/updated "process" document
for FreeBSD? That would be my first step.

The proposal is a fairly narrow one at the moment: The legacy stuff will remain unchanged, with the possible exception of expanding the SPDX informative tags we've added. New files are allowed to adopt either the legacy style or the new style. Current copyright holders are allowed to switch from old to new, but everybody with a copyright stake in a specific file must sign off on moving to the new style. This handles the problems of the diversity of licenses and their slow, viral change: various corporate lawyers tweaked things and contributed them back, others copied the tweaked version instead of the original, and the process iterated.

The new style would be Copyright notice plus SPDX tag. It's my job to document things so people know what it means. Not only to the casual project member who's contributing code (that's relatively easy: I add a section to our policies section in the project handbook), but also to the wider, more demanding audience who want to know what the license is, exactly, so they know what contract they are entering into.
 
6. The FreeBSD license audit from ~ 2014 by Pedro Giffuni could be
refreshed to help. We could use scancode-toolkit [6] (which I maintain)
to help there. This would allow classifying the codebase based on the
different licenses and license documentation approaches, and would
drive the actual planning and efforts. It would also help with the file
replacement tooling suggested below...

Yes. For the legacy stuff that's not a verbatim copy, we'll apply the SPDX matching rules for the tags we put in there, but we won't eliminate the licenses. But I view the classification of the old and allowing the new as two independent problems. Is that view incorrect somehow?
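As a rough illustration of the classification side, the following Python sketch buckets files by whether their first few kilobytes contain an SPDX tag, BSD-style grant text, both, or neither. It is a crude string heuristic under stated assumptions, not a substitute for a real scanner such as scancode-toolkit: `classify_text`, `classify_tree`, the bucket names, and the 4096-byte window are all hypothetical choices, and the grant-text marker only recognizes BSD-family wording that is not wrapped mid-phrase.

```python
from collections import Counter
from pathlib import Path

SPDX_MARK = "SPDX-License-Identifier:"
# Opening words of the BSD-style grant; other license families are not detected.
TEXT_MARK = "Redistribution and use in source and binary forms"


def classify_text(head: str) -> str:
    """Bucket one file header by which license markers it contains."""
    has_tag = SPDX_MARK in head
    has_text = TEXT_MARK in head
    if has_tag and has_text:
        return "tag+text"   # informative tag next to the full license
    if has_tag:
        return "tag-only"   # the proposed new style
    if has_text:
        return "text-only"  # legacy style without a tag
    return "unmarked"


def classify_tree(root: str, head_bytes: int = 4096) -> Counter:
    """Count files under root per bucket, reading only the start of each file."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as f:
                head = f.read(head_bytes).decode("utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than guess
        counts[classify_text(head)] += 1
    return counts
```

Calling `classify_tree` on a checkout returns a `Counter` keyed by bucket, which keeps the "classify the old" question separate from the "allow the new" question: only the `tag-only` bucket depends on the new policy.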
 
7. On the mechanics of replacing license notices by SPDX identifiers
in source code, I have some (old) code that may help with this [7]
and would only need some minor love and care to be functional again.
This could mean automating large volumes of code changes.

We'll likely not do this on a wholesale basis, though we will allow individual contributors to do so. There are so many copyright holders, some of whom are defunct or deceased, that doing this wholesale would be fraught and likely viewed as too much risk, too little reward.
 
8. BSD historical attribution notices are usually more than just a
copyright + a license notice, but often contain extra notices such as:

- This code is derived from software contributed to Berkeley by John
  Doe.
- This code was contributed to The NetBSD Foundation by Jane Doe.
- or [8]: "Created by: Warner Losh <imp@...>"

You should document what you would want to do with these.

I was thinking something similar, but we'll need extra metadata to do that, and the current BSD templates for it in the SPDX repo would need some tweaking I think.
 
9. BSD historical licenses come with many small variants of BSD and MIT
licenses (even in some cases of your own making [9] ;))
You should document what to do with these cases and in particular:

- should ALL the original names be kept when the text meets SPDX
  matching-style guidelines? IMO no

- define a process to resolve cases that are borderline and fall outside
  the strict guidelines

- when should original authors be contacted, and what to do if they are
  AWOL?

All the above are covered by "Transition to new style only with copyright holder's permission" and we live with the old-style in the tree forever. Though it would be nice if at least some of these could be moved forward.

One big problem in removing things is that some files have multiple licenses currently. And while the "B2 AND B3" is good for a scanning tool, it starts to break down if you are trying to generate the licenses. 
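A small Python sketch of where generation breaks down, assuming "B2 AND B3" is shorthand for `BSD-2-Clause AND BSD-3-Clause`. The tokenizer below is a deliberately simplified stand-in for a real SPDX expression parser, and `license_ids`, `candidate_pairings`, and the two copyright holders are hypothetical. A scanner only needs the list of identifiers; a generator also needs to know which copyright notice fills which license text, and the expression does not record that.

```python
import re
from typing import List, Tuple


def license_ids(expression: str) -> List[str]:
    """Return the license identifiers in a simple SPDX expression, in order.

    Simplified: operators and parentheses are dropped, and the exception
    name that follows WITH is skipped because it is not a license.
    """
    ids: List[str] = []
    skip_next = False
    for token in re.findall(r"[A-Za-z0-9.+:-]+", expression):
        if skip_next:
            skip_next = False
            continue
        upper = token.upper()
        if upper == "WITH":
            skip_next = True
            continue
        if upper in ("AND", "OR"):
            continue
        if token not in ids:
            ids.append(token)
    return ids


def candidate_pairings(holders: List[str], expression: str) -> List[Tuple[str, str]]:
    """Every (copyright notice, license) pair the header alone would allow."""
    return [(holder, lic) for holder in holders for lic in license_ids(expression)]


holders = ["Copyright 2019 A. Hacker", "Copyright 2021 B. Coder"]  # hypothetical
for holder, lic in candidate_pairings(holders, "BSD-2-Clause AND BSD-3-Clause"):
    print(f"{holder}  ->  {lic}")
```

With two holders and two licenses the header admits four pairings, and nothing in the tag says which ones were intended.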
 
- when to submit a new license to SPDX? I suspect you will find a large
  number of licenses that are unknown to SPDX. I would suggest first
  using a LicenseRef namespace like I do in [10], either scancode's or
  your own, and then funneling these to SPDX as needed. In my Linux
  Kernel scans, I "discovered" several new and weird license variants
  (several being franken-BSD and franken-MIT hybrids and mods). Many
  were eventually added to the SPDX license list. It would be great to
  have the same outcome for your FreeBSD effort!

I for one am not eager for a 'voiced in Bill Paul's head' license variant here :).

I'll have to investigate the LicenseRef stuff.
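For reference while investigating, here is a minimal Python sketch of what a project-local LicenseRef identifier looks like. It assumes the SPDX 2.x grammar as I understand it, in which the part after `LicenseRef-` may contain only letters, digits, `.` and `-`, with an optional `DocumentRef-...:` prefix. The helper names and the `FreeBSD` namespace with a `beerware variant` key are hypothetical examples, not existing identifiers.

```python
import re

# Assumed SPDX 2.x shape: [DocumentRef-<idstring>:]LicenseRef-<idstring>
LICENSE_REF_RE = re.compile(
    r"(?:DocumentRef-[A-Za-z0-9.-]+:)?LicenseRef-[A-Za-z0-9.-]+"
)


def is_license_ref(identifier: str) -> bool:
    """True if identifier has the LicenseRef shape described above."""
    return LICENSE_REF_RE.fullmatch(identifier) is not None


def make_license_ref(namespace: str, key: str) -> str:
    """Build a LicenseRef from a namespace and a free-form key.

    Runs of characters outside the allowed set collapse to a single '-'.
    """
    cleaned = re.sub(r"[^A-Za-z0-9.-]+", "-", f"{namespace}-{key}").strip("-")
    return f"LicenseRef-{cleaned}"


ref = make_license_ref("FreeBSD", "beerware variant")  # hypothetical names
print(ref, is_license_ref(ref))
```

A namespaced identifier like this can sit in the tree as a placeholder until a variant is either matched to an existing SPDX identifier or submitted to the SPDX license list.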
 
10. When doing large commits to fix many files, Thomas Gleixner and Kate
Stewart enrolled several volunteers from this list -- several of them
legally trained -- to help review and sign-off on the changes. This was
helpful on so many levels. IMHO you should do the same for FreeBSD.

I think that would be excellent.
 
11. What would be your strategy? A trickle of a few files at a time,
possibly grouped by package or authors/licenses? Or a few larger
tree-wide changes? The latter approach was used for Linux and we started
with grouping things based on the licensing documentation clarity. It
was a lot to swallow but once we were over the hump I think it made
things easier afterwards.

1. write a policy that will work (the phase I'm in now)
2. get it approved
3. Allow new files with only SPDX tags soon
4. Allow contributors to switch to SPDX only (I'll likely do this on my work in FreeBSD)
5. Study the tree to see if there's a way the policy from 1&2 can be used to unambiguously switch some files over for "absentee" folks.

12. What's your plan for files with no explicit license and copyright?

That's harder... The biggest source of these in the tree at the moment are Makefiles. And it's the old-time opinion of the project that Makefiles are simple recipes and are just 'facts' and have no material that copyright law would cover. However, since the 4.4BSD days when almost all the makefiles were a dry recitation of the facts, they have become more and more complex and I personally am not sure about this anymore.

There's a few others, but all the code has been marked as far as I know... Though now that I think about it there may be a few 'data' files that might not be able to be marked..

I hope this long list of comments may help ... I did not have the time
to make them shorter!

I love that line :) I've given my first form of reactions, hopefully that will be productive. I have a lot of reading to do now too :)

Thank you so much for taking the time. This has filled in some dots and given me a good way to express some of the concerns that have been brought forward...

Warner
 
[1] https://reuse.software/
[2] https://www.python.org/dev/peps/pep-0639/#appendix-3-surveying-how-other-package-formats-document-licenses
[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/process/license-rules.rst
[4] https://docs.freebsd.org/en_US.ISO8859-1/articles/committers-guide/pref-license.html
[5] https://docs.freebsd.org/internal/software-license.html
[6] https://github.com/nexB/scancode-toolkit
[7] https://github.com/nexB/scancode-toolkit/blob/833-espedexify/src/scancode/plugin_espedexify.py
[8] https://github.com/freebsd/freebsd-ports-kde/blob/68a0222b674a77c456b45e3784ad24447e1eba52/devel/p5-Acme-Damn/Makefile#L1
[9] https://github.com/freebsd/freebsd-src/blob/ba7ede0b9b3d0c3a64e6e7d8cbfe26b6f882f39f/UPDATING#L2434
[10] https://scancode-licensedb.aboutcode.org/
