Re: How to start using only SPDX-License-Identifier tags


Philippe Ombredanne
 

Hi Warner:
See some comments below.

Hi Thomas:
This is FYI, as you may be able to share some insights and
recommendations from the Linux kernel's journey towards SPDX that
could help Warner and FreeBSD.

On Wed, Mar 17, 2021 at 11:11 PM Warner Losh <imp@...> wrote:
I'm looking at doing whatever it would take to create a framework so
that the FreeBSD project could use SPDX-License-Identifiers as the
sole designator of license.
Today, the project has SPDX tags in about 25k of 90k files in our
source tree. However, in all cases, the SPDX tag today is informative.
The actual license is contained in the file as well so there's no
ambiguity as to what the license is (we have an explicit statement
that says presently these tags are informative).

However, as more and more software is created using only these tags,
we've seen some pressure to accept this software. In addition, some
members have expressed the desire to be rid of all this tiresome
boilerplate at the start of every file.

So, I'm investigating how we can have something like

/*
* Copyright 2021 M. Warner Losh
* SPDX-License-Identifier: BSD-2-Clause
*/

become a valid license to use this file under whatever
BSD-2-Clause.txt says, with the proper values filled in.
[...]
I've seen the license matching guidelines. Those make sense for the
project's prior SPDX activity where we were adding informative tags to
existing licenses. However, I am having trouble seeing how one could
unambiguously apply them to take the above copyright notice and
license and come up with something that I could show to our legal
department and that I'd have confidence any litigation around copying
of the file would start at (both as the copyright holder and also as a
company using this code).

When we added the informative tags to the FreeBSD tree, there was an
objection to that because of this and other issues. Since we said the
tags were merely informative and the actual license granted use, we
were able to dodge the issue at the time. As I prepare to start down
the path of turning something like the above into something the
project accepts, I have to return to questions like these and a
zillion others to get there.

I happen to have helped quite a bit with a similar effort for the Linux
kernel and, as a FOSS license buff, I can offer a few comments. I assume
that you are considering only the core FreeBSD source tree and not the
ports for now.

1. Similar concerns were raised during the kernel effort. Eventually
folks came to agree, after heated discussions and adjustments. Having a
few ultimate decision makers or tie-breakers surely helped quite a bit,
together with support from lawyers here. Thomas Gleixner (in CC) led the
effort and may be able to provide some comments too.

2. The FSFE REUSE specification [1] is closely related and co-evolved
with the kernel efforts. It provides a good set of documentation and
practices that are not project-specific and could be adapted to the
unique FreeBSD context.

3. Beyond Linux and the pioneering U-Boot project, many projects now use
and accept SPDX license expressions and SPDX-License-Identifier tags. I
made a fairly extensive survey of how other package formats document
licenses in my pending proposal to adopt SPDX expressions for Python
[2], which shows how prevalent these practices have become.

4. Because of the above, I would say there are now established
community norms and practices under which SPDX-License-Identifier tags
and SPDX license expressions are acceptable ways to document licensing.
I do not know of any vocal objections from lawyers, here or elsewhere.
These norms are therefore now the de facto way things are done in the
community at large.

5. The FreeBSD situation is different because there are more licenses
and origins at play, given all the different packages. Nevertheless,
the way things have been documented in the kernel [3] may be a good
start. You have similar documents for FreeBSD [4], but they are not as
detailed as Linux's and one link returns a 404 [5].
Is your plan to start by proposing a new or updated "process" document
for FreeBSD? That would be my first step.

6. The FreeBSD license audit by Pedro Giffuni from around 2014 could be
refreshed to help. We could use scancode-toolkit [6] (which I maintain)
for this. It would allow classifying the codebase by license and by
license documentation approach, to drive the actual planning and
effort. It would also help with the file replacement tooling suggested
below...

7. On the mechanics of replacing license notices with SPDX identifiers
in source code, I have some (old) code that may help [7]; it would
only need some minor love and care to be functional again. This could
automate large volumes of code changes.

8. BSD historical attribution notices are often more than just a
copyright plus a license notice: they frequently contain extra notices
such as:

- This code is derived from software contributed to Berkeley by John
Doe.
- This code was contributed to The NetBSD Foundation by Jane Doe.
- or [8]: "Created by: Warner Losh <imp@...>"

You should document what you would want to do with these.

9. BSD historical licenses come in many small variants of the BSD and
MIT licenses (in some cases even of your own making [9] ;))
You should document what to do with these cases, in particular:

- should ALL the original names be kept when the text meets the SPDX
matching guidelines? IMO no

- define a process to resolve cases that are borderline and fall outside
the strict guidelines

- when should original authors be contacted, and what to do if they are
AWOL?

- when should a new license be submitted to SPDX? I suspect you will
find a large number of licenses that are unknown to SPDX. I would
suggest first using a LicenseRef namespace, either ScanCode's as I do
in [10] or your own, and then funneling these to SPDX as needed. In my
Linux kernel scans, I "discovered" several new and weird license
variants (several being franken-BSD and franken-MIT hybrids and mods).
Many were eventually added to the SPDX license list. It would be great
to have the same outcome for your FreeBSD effort!

10. When doing large commits to fix many files, Thomas Gleixner and Kate
Stewart enlisted several volunteers from this list, several of them
legally trained, to help review and sign off on the changes. This was
helpful on so many levels. IMHO you should do the same for FreeBSD.

11. What would be your strategy? A trickle of a few files at a time,
possibly grouped by package or by authors/licenses? Or a few larger
tree-wide changes? The latter approach was used for Linux, and we
started by grouping things based on the clarity of their licensing
documentation. It was a lot to swallow, but once we were over the hump
I think it made things easier afterwards.

12. What's your plan for files with no explicit license and copyright?
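To make the SPDX license expressions from items 3 and 4 concrete, here
is a minimal Python sketch of how a tool might parse one, such as
"GPL-2.0-only OR BSD-2-Clause". It is a simplification of the SPDX
expression grammar, not a complete implementation: it handles only the
operator precedence (WITH binds tighter than AND, which binds tighter
than OR), parentheses and a trailing "+", accepts upper-case operators
only, and does not check identifiers against the SPDX license list. All
names (parse_expression, License, Compound) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# One token: "(", ")", or a run of identifier characters. ":" allows
# "DocumentRef-x:LicenseRef-y"; a trailing "+" is handled by the parser.
TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z0-9.:+\-]+)")
OPERATORS = {"AND", "OR", "WITH"}


@dataclass
class License:
    id: str
    or_later: bool = False           # trailing "+", e.g. "GPL-2.0+"
    exception: Optional[str] = None  # from "<id> WITH <exception-id>"


@dataclass
class Compound:
    op: str                          # "AND" or "OR"
    args: List["Expr"]


Expr = Union[License, Compound]


def tokenize(text: str) -> List[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent; binding from tightest: WITH, AND, OR."""

    def __init__(self, text: str) -> None:
        self.tokens, self.i = tokenize(text), 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.i += 1
        return tok

    def take_id(self) -> str:
        tok = self.take()
        if tok in OPERATORS or tok in ("(", ")"):
            raise ValueError(f"expected an identifier, got {tok!r}")
        return tok

    def parse_or(self) -> Expr:
        args = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            args.append(self.parse_and())
        return args[0] if len(args) == 1 else Compound("OR", args)

    def parse_and(self) -> Expr:
        args = [self.parse_simple()]
        while self.peek() == "AND":
            self.take()
            args.append(self.parse_simple())
        return args[0] if len(args) == 1 else Compound("AND", args)

    def parse_simple(self) -> Expr:
        if self.peek() == "(":
            self.take()
            expr = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return expr
        tok = self.take_id()
        lic = License(tok.rstrip("+"), or_later=tok.endswith("+"))
        if not lic.id:
            raise ValueError("'+' without a license id")
        if self.peek() == "WITH":
            self.take()
            lic.exception = self.take_id()
        return lic


def parse_expression(text: str) -> Expr:
    parser = Parser(text)
    expr = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return expr
```

For example, "Apache-2.0 WITH LLVM-exception AND MIT" parses as an AND
of two licenses, the first one carrying the exception, which is the
reading the precedence rules give. Lower-case operators and a WITH
applied to a parenthesized group are rejected with a ValueError.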
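For refreshing the license audit (item 6) and for finding the files
asked about in item 12, a rough first pass could bucket files by how
their license is documented. This is a crude stand-in, not
scancode-toolkit and not its API: it only looks, in the first 60 lines
of each file, for an "SPDX-License-Identifier:" line and for a few
opening phrases of common license texts. The phrase list, the category
names and the 60-line cutoff are assumptions for illustration.

```python
import re
from pathlib import Path
from typing import Dict

# "SPDX-License-Identifier: <expr>" on one line, optionally closed by "*/".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$", re.MULTILINE)

# Hypothetical markers: opening phrases of a few common license texts.
# A real audit (e.g. with scancode-toolkit) detects far more than this.
TEXT_MARKERS = (
    "Redistribution and use in source and binary forms",
    "Permission is hereby granted",
    "Permission to use, copy, modify",
    "This program is free software",
)
HEAD_LINES = 60  # assumed: license notices normally sit at the top of a file


def classify_text(head: str) -> str:
    has_tag = SPDX_RE.search(head) is not None
    # Collapse whitespace runs; a marker split across comment lines
    # (with " * " in between) is still missed by this simple check.
    flat = " ".join(head.split())
    has_text = any(marker in flat for marker in TEXT_MARKERS)
    if has_tag and has_text:
        return "tag+text"   # informative tag next to the full license
    if has_tag:
        return "tag-only"   # the tag is the sole license statement
    if has_text:
        return "text-only"  # full text, no tag: conversion candidate
    return "none"           # no explicit license found (item 12)


def read_head(path: Path, n: int = HEAD_LINES) -> str:
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return "".join(line for _, line in zip(range(n), f))


def classify_tree(root: str) -> Dict[str, str]:
    """Map each file path (relative to root) to its category."""
    base = Path(root)
    results: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        if path.is_file() and ".git" not in rel.parts:
            results[str(rel)] = classify_text(read_head(path))
    return results
```

Feeding the values of classify_tree(...) to collections.Counter gives
one count per category; the "none" bucket is a starting list for the
question in item 12, and "text-only" is roughly the pool a conversion
would touch.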
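For the mechanics in item 7, the core replacement step could look
roughly like the sketch below. It is an illustration, not the code
from [7]. It assumes a C file whose first block comment holds copyright
lines plus one license text, and it only rewrites the header when that
text equals a reference text (supplied by the caller, e.g. taken from
the SPDX license list) after ignoring case, whitespace and comment
decoration. That is much stricter than the SPDX matching guidelines,
so anything borderline is left for human review. spdxify and its
parameters are hypothetical.

```python
import re
from typing import Optional

# The first C block comment in the file, including leading whitespace.
COMMENT_RE = re.compile(r"\A(\s*/\*.*?\*/)", re.DOTALL)
# Copyright lines are kept verbatim. Case-sensitive on purpose, so that
# "...retain the above copyright notice" in a license body is not taken.
COPYRIGHT_RE = re.compile(r"^\s*\*?\s*((?:Copyright\b|All rights reserved).*?)\s*$")


def normalize(text: str) -> str:
    """Drop comment decoration, ignore case, collapse whitespace."""
    text = re.sub(r"/\*-?|\*/", " ", text)  # "/*", FreeBSD's "/*-", "*/"
    text = re.sub(r"^\s*\*", " ", text, flags=re.MULTILINE)  # leading " *"
    return " ".join(text.lower().split())


def spdxify(source: str, license_text: str, spdx_id: str) -> Optional[str]:
    """Replace the header's license text with an SPDX tag, or return None."""
    m = COMMENT_RE.match(source)
    if m is None:
        return None
    copyrights, body = [], []
    for line in m.group(1).splitlines():
        c = COPYRIGHT_RE.match(line)
        if c:
            copyrights.append(c.group(1))
        elif "SPDX-License-Identifier" not in line:
            body.append(line)
    if normalize("\n".join(body)) != normalize(license_text):
        return None  # any variation goes to a human reviewer instead
    header = ["/*"] + [f" * {c}" for c in copyrights]
    header += [f" * SPDX-License-Identifier: {spdx_id}", " */"]
    return "\n".join(header) + source[m.end():]
```

Returning None instead of guessing keeps an automated pass safe. Note
that extra attribution notices (item 8) stay in the compared body and
therefore also block the rewrite, which is arguably the right default
until the project documents what to do with them.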
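Whatever is decided for the extra attribution notices in item 8, a
cleanup tool first has to find them. A minimal sketch, assuming such
notices sit in their own paragraph of the header comment (separated by
blank " *" or "#" lines); the patterns are hypothetical and would need
tuning against the actual tree.

```python
import re
from typing import List

# Hypothetical patterns for the notices quoted in item 8; a real list
# should come out of auditing the tree rather than from this guess.
ATTRIBUTION_PATTERNS = [
    re.compile(r"\bcontributed to\b.*\bby\b", re.IGNORECASE),
    re.compile(r"^Created by:", re.IGNORECASE),
]
# Leading comment decoration: "/*", "/*-", "*/", "*" (C) or "#" (Makefiles).
DECORATION_RE = re.compile(r"^\s*(?:/\*-?|\*/|\*|#)?")


def comment_paragraphs(comment: str) -> List[str]:
    """Split a header comment into paragraphs at blank comment lines."""
    paragraphs: List[str] = []
    current: List[str] = []
    for line in comment.splitlines():
        text = DECORATION_RE.sub("", line, count=1)
        text = re.sub(r"\*/\s*$", "", text).strip()  # trailing "*/"
        if text:
            current.append(text)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def attribution_notices(comment: str) -> List[str]:
    """Paragraphs that look like attributions worth preserving."""
    return [p for p in comment_paragraphs(comment)
            if any(pat.search(p) for pat in ATTRIBUTION_PATTERNS)]
```

Each match could then be kept verbatim next to the copyright lines or
moved elsewhere, whichever the project ends up documenting.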
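For the LicenseRef approach in item 9, new identifiers have to follow
the SPDX rule that a user-defined id is "LicenseRef-" followed by
letters, digits, "-" or ".", optionally prefixed by
"DocumentRef-<id>:". A small sketch that checks this and derives a
stable id for a newly found variant from a hash of its normalized text;
the "freebsd" namespace and the hashing scheme are assumptions for
illustration, not an existing FreeBSD or ScanCode convention.

```python
import hashlib
import re

# SPDX user-defined ids: "LicenseRef-" plus letters, digits, "-" or ".",
# optionally prefixed by "DocumentRef-<id>:".
LICENSE_REF_RE = re.compile(
    r"(?:DocumentRef-[A-Za-z0-9.\-]+:)?LicenseRef-[A-Za-z0-9.\-]+")


def is_license_ref(identifier: str) -> bool:
    return LICENSE_REF_RE.fullmatch(identifier) is not None


def make_license_ref(namespace: str, key: str) -> str:
    """Build "LicenseRef-<namespace>-<key>", replacing disallowed characters."""
    cleaned = re.sub(r"[^A-Za-z0-9.\-]+", "-", f"{namespace}-{key}").strip("-")
    if not cleaned:
        raise ValueError("no usable characters in namespace or key")
    return f"LicenseRef-{cleaned}"


def variant_key(license_text: str) -> str:
    """Short stable key for a license variant, ignoring case and spacing."""
    normalized = " ".join(license_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
```

With make_license_ref("freebsd", variant_key(text)), every file carrying
the same variant gets the same id, which makes it easy to group those
files before proposing the variant for the SPDX license list.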

I hope this long list of comments may help ... I did not have the time
to make them shorter!

[1] https://reuse.software/
[2] https://www.python.org/dev/peps/pep-0639/#appendix-3-surveying-how-other-package-formats-document-licenses
[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/process/license-rules.rst
[4] https://docs.freebsd.org/en_US.ISO8859-1/articles/committers-guide/pref-license.html
[5] https://docs.freebsd.org/internal/software-license.html
[6] https://github.com/nexB/scancode-toolkit
[7] https://github.com/nexB/scancode-toolkit/blob/833-espedexify/src/scancode/plugin_espedexify.py
[8] https://github.com/freebsd/freebsd-ports-kde/blob/68a0222b674a77c456b45e3784ad24447e1eba52/devel/p5-Acme-Damn/Makefile#L1
[9] https://github.com/freebsd/freebsd-src/blob/ba7ede0b9b3d0c3a64e6e7d8cbfe26b6f882f39f/UPDATING#L2434
[10] https://scancode-licensedb.aboutcode.org/

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com
