FreeBSD Use Case for Short Identifiers


Warner Losh
 

Greetings,

The FreeBSD project is planning on expanding our use of the SPDX-License-Identifier. Currently, all files carry the full license text; some of them also carry an SPDX ID and some do not.

What we'd like to do is expand this so we can accept software that only has a copyright notice plus an SPDX-License-Identifier to indicate the license. To that end, we're trying to craft a policy / document that describes how contributors, users and redistributors of the software can know the license for any given file in the tree.

For the files that have only an explicit license text, it's that license.
For files with both, the explicit license text is the license.
For files with only the SPDX-License-Identifier, the license text can be found in our source tree as src/share/license/<SPDX-License-Identifier>.txt with any copyright notices in the file pre-pended.
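
The three rules above can be sketched as a small lookup. This is an illustration only, not FreeBSD tooling: the function name `license_source`, the caller-supplied `has_license_text` flag, and the regular expression are assumptions, and the sketch handles a single identifier rather than a full SPDX expression. Only the `src/share/license/<SPDX-License-Identifier>.txt` path comes from the message.

```python
import re
from pathlib import PurePosixPath

# Assumed tag syntax: "SPDX-License-Identifier: <single identifier>".
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def license_source(file_text: str, has_license_text: bool) -> str:
    """Say where the governing license for one file is found.

    has_license_text is supplied by the caller; detecting a full
    license text is out of scope for this sketch.
    """
    if has_license_text:
        # Rules 1 and 2: explicit text wins, with or without an SPDX tag.
        return "license text in the file"
    match = SPDX_TAG.search(file_text)
    if match:
        # Rule 3: tag only, so the text lives in the shared directory.
        name = match.group(1) + ".txt"
        return str(PurePosixPath("src/share/license") / name)
    return "undetermined"
```

For a file whose header is just a copyright line plus `SPDX-License-Identifier: BSD-2-Clause`, this points at `src/share/license/BSD-2-Clause.txt`.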

To accomplish this, I've started to put together an over-arching FreeBSD policy at https://people.freebsd.org/~imp/DRAFT-Licensing-Policy.html which (a) needs to be copyedited and (b) likely needs to have sections 3 & 4 before 1 & 2.

I thought I'd run this by people here for review and/or to evaluate it as a FAQ answer :).

Warner


Steve Winslow
 

Hi Warner,

Thanks for sharing this. It looks like a good, detailed way to document how the SPDX license identifiers will work for FreeBSD. I've seen several projects implement SPDX identifiers without this level of detail, but I understand the desire to do so here and I think this will be helpful to the community.

I took a quick look at the sections relating to SPDX identifiers. I had a couple of clarifications that might be helpful to consider:

* In section 2, there's a link to https://spdx.org/. There's a page on the SPDX website that is dedicated to explaining how SPDX identifiers work, at https://spdx.dev/ids/ -- this might be more useful as a direct link.

* In section 2.5, the draft states that for the GNU family of licenses, the "-only" and "-or-later" IDs are deprecated and that e.g. "GPL-2.0" or "GPL-2.0+" should be used instead. This is incorrect: it's actually the other way around. The plain identifier and the "+" form are deprecated, and "-only" / "-or-later" should typically be used instead.

To be clear, the "GPL-2.0" / "GPL-2.0+" identifiers remain valid and usable. They are listed as "deprecated" in the section at the bottom of https://spdx.org/licenses/, but they are still valid (and I expect will remain valid). Some projects, such as the Linux kernel, have opted to primarily use this form rather than switching to "-only" / "-or-later".

This is only for the GNU family of licenses (GPL, LGPL, AGPL) -- all others continue to use "+" as the primary way to express "or later version".
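
As a rough sketch of the distinction described above, the function below rewrites the deprecated plain and "+" forms of GPL, LGPL and AGPL identifiers into their "-only" / "-or-later" equivalents and leaves everything else alone. The function name and the regular expression are assumptions for illustration; it only handles plain `NAME-major.minor` identifiers, not exception variants or full license expressions.

```python
import re

# Matches e.g. "GPL-2.0", "LGPL-2.1+", "AGPL-3.0" and nothing longer.
_PLAIN_GNU = re.compile(r"^((?:A|L)?GPL-\d+\.\d+)(\+?)$")


def prefer_current_gnu_id(identifier: str) -> str:
    """Map a deprecated GNU-family short form to -only / -or-later."""
    match = _PLAIN_GNU.match(identifier)
    if match is None:
        # Not a plain GNU-family identifier: keep it unchanged,
        # including any "+" used by other licenses.
        return identifier
    base, plus = match.groups()
    return base + ("-or-later" if plus else "-only")
```

Since the deprecated forms remain valid, a project could equally decide not to rewrite them at all.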

I hope this is helpful! Best,
Steve


On Thu, Apr 1, 2021 at 6:53 PM Warner Losh <imp@...> wrote:
> Greetings,
>
> The FreeBSD project is planning on expanding our use of the SPDX-License-Identifier. Currently, all files have the full license text. We have a mix of files with SPDX ID and those without.
>
> What we'd like to do is expand this so we can accept software that only has a copyright notice plus an SPDX-License-Identifier to indicate the license. To that end, we're trying to craft a policy / document that describes how contributors, users and redistributors of the software can know the license for any given file in the tree.
>
> For the files that have only an explicit license text, it's that license.
> For files with both, the explicit license text is the license.
> For files with only the SPDX-License-Identifier, the license text can be found in our source tree as src/share/license/<SPDX-License-Identifier>.txt with any copyright notices in the file pre-pended.
>
> To accomplish this, I've started to put together an over-arching FreeBSD policy at https://people.freebsd.org/~imp/DRAFT-Licensing-Policy.html which (a) needs to be copyedited and (b) likely needs to have sections 3 & 4 before 1 & 2.
>
> I thought I'd run this by people here for review and/or to evaluate it as a FAQ answer :).
>
> Warner



--
Steve Winslow
VP, Compliance and Legal
The Linux Foundation


Sebastian Crane
 

Dear Warner,

I've read the policy draft you've written; clearly you have given this
lots of thought already. I'd like to offer my take on it:

The policy seems really similar to the REUSE standard [1] for licensing
notices, which combines the SPDX license list with a convention for
where and how to put these notices in the source tree. Given that your
draft policy has many of the same objectives as REUSE, you might want to
consider adopting the REUSE standard fully, as it would allow you to use
existing tools to check and add these notices. It also allows you to
generate SPDX documents automatically, if that is something you are
interested in.

In particular, section 2.4 would become unnecessary with REUSE. One of
the principles of the specification is to provide copyright and license
notices for *all* files, even if that license may be unenforceable. As
the BSD license has very few requirements, it shouldn't be hard for
distributors to comply with it anyway. Again, if all files (even the
trivial ones!) are supposed to have SPDX notices, it would be very easy
to spot the files missing them.
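
To illustrate how easy such a check becomes once every file is expected to carry a notice, here is a minimal sketch that lists files lacking an SPDX tag. It is not the REUSE tooling: the function name, the choice to look only at the first 4096 bytes, and the decision to scan every regular file (including anything under version-control directories) are assumptions made for brevity.

```python
from pathlib import Path
from typing import List

TAG = b"SPDX-License-Identifier:"


def files_missing_spdx(root: str, head_bytes: int = 4096) -> List[str]:
    """Return paths under root whose first head_bytes lack an SPDX tag."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Read as bytes so binary files do not raise decoding errors.
        with path.open("rb") as handle:
            head = handle.read(head_bytes)
        if TAG not in head:
            missing.append(path.relative_to(root).as_posix())
    return missing
```

Run against a source tree, the returned list is exactly the set of files still to be marked.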

Finally, I also have a few minor suggestions. Copyright licenses aren't
necessarily contracts as stated in section 2; I think that part one of
the short essay 'Free Software Matters: Enforcing the GPL' by Eben
Moglen [2] is particularly good at describing the essence of software
licenses.

Personally, I would remove sections 3 and 4 entirely and link to a
separate document describing those policies. I sometimes find it
confusing when projects have multiple documents explaining the same
thing, and it also increases the risk of contradictions when one is
updated but not others.

Finally, it occurs to me that full adoption of SPDX license identifiers,
if you decided to choose that path, would be the perfect topic for a
joint press release - maybe something to discuss in the future :)

Best wishes,

Sebastian

[1]: https://reuse.software/
[2]: http://emoglen.law.columbia.edu/publications/lu-12.pdf


Warner Losh
 



On Fri, Apr 2, 2021 at 9:27 AM Sebastian <seabass-labrax@...> wrote:
> Dear Warner,
>
> I've read the policy draft you've written; clearly you have given this
> lots of thought already. I'd like to offer my take on it:

Thanks for the feedback. I'll take a look at this.
 
> The policy seems really similar to the REUSE standard [1] for licensing
> notices, which combines the SPDX license list with a convention for
> where and how to put these notices in the source tree. Given that your
> draft policy has many of the same objectives as REUSE, you might want to
> consider adopting the REUSE standard fully, as it would allow you to use
> existing tools to check and add these notices. It also allows you to
> generate SPDX documents automatically, if that is something you are
> interested in.

I did have one question about REUSE.

At one point it says:

"To implement this method, each plain text file that can contain comments MUST contain comments at the top of the file (comment header) that declare that file’s Copyright and Licensing Information."

and a little later:

"The SPDX-License-Identifier tag MUST be followed by a valid SPDX License Expression describing the licensing of the file (example: SPDX-License-Identifier: GPL-3.0-or-later OR Apache-2.0). If separate sections of the file are licensed differently, a different SPDX-License-Identifier tag MUST be included for each section."

These seem to contradict a little since you need to associate the copyright with the license, I'd think. Not sure how big a deal it is, but it was confusing to me.
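
One way to picture the association question is a sketch that groups each SPDX-License-Identifier tag with the copyright lines seen since the previous tag. This pairing rule is purely an assumption for illustration and is not something the quoted REUSE text specifies; the function name and both regular expressions are likewise hypothetical, and the sketch is written for line comments such as `#` (a trailing `*/` from a C comment would be captured as part of the value).

```python
import re
from typing import List, Tuple

COPYRIGHT = re.compile(r"(?:SPDX-FileCopyrightText:|Copyright)\s*(.+)")
LICENSE = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def pair_notices(text: str) -> List[Tuple[List[str], str]]:
    """Pair each license tag with the copyright lines that precede it."""
    pairs = []
    pending: List[str] = []
    for line in text.splitlines():
        lic = LICENSE.search(line)
        if lic:
            # A license tag closes the current group of copyright lines.
            pairs.append((pending, lic.group(1).strip()))
            pending = []
            continue
        cop = COPYRIGHT.search(line)
        if cop:
            pending.append(cop.group(1).strip())
    return pairs
```

With one tag at the top of a file this yields a single pair; with a second tag further down, the copyright lines above that tag form a second pair.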
 
> In particular, section 2.4 would become unnecessary with REUSE. One of
> the principles of the specification is to provide copyright and license
> notices for *all* files, even if that license may be unenforceable. As
> the BSD license has very few requirements, it shouldn't be hard for
> distributors to comply with it anyway. Again, if all files (even the
> trivial ones!) are supposed to have SPDX notices, it would be very easy
> to spot the files missing them.

True. It's a substantial burden to get to that point before adopting the rest of the system. That said, it is a good long-term goal and one that we can enforce in the pre-commit checks.

At the moment, ~25k of the ~95k files in our tree have SPDX tags. It will be quite some time before we get everything marked. In the meantime, we'd hoped to use the short form to nip in the bud the number of variants that pop up as people cut and paste and then tweak things. Copyright notices, however, are much better represented. We have ~6k Makefiles w/o marking, and maybe a few thousand more that are mostly (but not entirely) tests or other files that don't go into the build or whose format cannot tolerate comments. Part of this effort, long term, is to clean all that up, but we can't have it gating the other stuff.

Our 'Beer-ware' licensed files may fall into the unenforceable category...
 
> Finally, I also have a few minor suggestions. Copyright licenses aren't
> necessarily contracts as stated in section 2; I think that part one of
> the short essay 'Free Software Matters: Enforcing the GPL' by Eben
> Moglen [2] is particularly good at describing the essence of software
> licenses.

As a non-legal person, I think avoiding 'contract' and using 'document' is likely better, since this policy is designed for clarity and anything that gets in the way of that clarity isn't so good. I've chatted with several corporate lawyers over the years who have a different analysis than Dr. Moglen. It's likely better to avoid the issue, since I'm not a legal professional and should stay out of any such fights.

> Personally, I would remove sections 3 and 4 entirely and link to a
> separate document describing those policies. I sometimes find it
> confusing when projects have multiple documents explaining the same
> thing, and it also increases the risk of contradictions when one is
> updated but not others.

Sections 3 & 4 are supposed to be 'what to do' while 1 & 2 are 'how to do it'. They are currently in 3 different docs, all of which are slightly different in ways that are more annoying than practically bad... I worry that relegating these things to another document would exacerbate the issue of things being out of sync.
 
> Finally, it occurs to me that full adoption of SPDX license identifiers,
> if you decided to choose that path, would be the perfect topic for a
> joint press release - maybe something to discuss in the future :)

Ah, that's above my pay grade :) I'll keep that in mind and chat with our PR folks about it, though.

Warner
 
> Best wishes,
>
> Sebastian
>
> [1]: https://reuse.software/
> [2]: http://emoglen.law.columbia.edu/publications/lu-12.pdf






Max Mehl
 

Hi Warner,

~ Warner Losh [2021-04-02 19:03 +0200]:
>> The policy seems really similar to the REUSE standard [1] for licensing
>> notices, which combines the SPDX license list with a convention for
>> where and how to put these notices in the source tree. Given that your
>> draft policy has many of the same objectives as REUSE, you might want to
>> consider adopting the REUSE standard fully, as it would allow you to use
>> existing tools to check and add these notices. It also allows you to
>> generate SPDX documents automatically, if that is something you are
>> interested in.

Thanks @Sebastian for bringing up REUSE! Indeed, I concur that it's a
worthy goal for FreeBSD. It reminds me a bit of KDE's story. The project
also adopted REUSE in their policies, and made larger parts of the
codebase REUSE compliant already. They also wrote a tool to convert
traditional copyright notices to SPDX license identifiers
(licensedigger). The interview with Andreas may provide a good overview:

https://fsfe.org/news/2020/news-20201215-01.html

https://community.kde.org/Policies/Licensing_Policy#License_Statements

> I did have one question about REUSE.
>
> At one point it says:
>
> "To implement this method, each plain text file that can contain comments
> MUST contain comments at the top of the file (comment header) that declare
> that file’s Copyright and Licensing Information."
>
> and a little later:
>
> "The SPDX-License-Identifier tag MUST be followed by a valid SPDX License
> Expression describing the licensing of the file (example:
> SPDX-License-Identifier: GPL-3.0-or-later OR Apache-2.0). If separate
> sections of the file are licensed differently, a different
> SPDX-License-Identifier tag MUST be included for each section."
>
> These seem to contradict a little since you need to associate the copyright
> with the license, I'd think. Not sure how big a deal it is, but it was
> confusing to me.

Do you refer to the different sections? Indeed, we are currently
working on improving and standardising the declaration of differently
licensed/copyrighted parts (snippets). For this, I've started a PR for
the SPDX spec to define the tags and syntax that REUSE can pick up:

https://github.com/spdx/spdx-spec/pull/464

> At the moment, we have ~25k of the ~95k files in our tree with SPDX tags.
> It will be quite some time before we get everything marked. In the
> meantime, we'd hoped to use the short form to nip in the bud the number of
> variants that pop up as people cut and paste and then tweak things.
> Copyright notices, however, are much better represented. We have ~6k
> Makefiles w/o marking, and maybe a few thousand more that are mostly (but
> not entirely) tests or other files that don't go into the build or whose
> format cannot tolerate comments. Part of this effort, long term, is to
> clean all that up, but we can't have it gating the other stuff.

Totally understandable. You may be interested in tools that help you
with the conversion, e.g. the aforementioned licensedigger. IIRC the
Linux project also came up with some conversion scripts to distinguish
their different notice headers (GPL version, only/or-later,
exceptions...).

Best,
Max

--
Max Mehl - Programme Manager - Free Software Foundation Europe
Contact and information: https://fsfe.org/about/mehl | @mxmehl
Become a supporter of software freedom: https://fsfe.org/join