Options for metadata license identifiers


Richard Purdie
 

Hi,

I wondered if I might seek advice/opinions on a dilemma Yocto Project
is facing with license identifiers.

First, some background. We build software components from source and 
combine them together to make up an operating system (most often Linux). 
As such we have metadata we refer to as "recipes" which are basically 
lists of instructions on where to get a piece of software, how to 
configure it and so on. Our core layer has around 800 recipes
(http://git.yoctoproject.org/cgit.cgi/poky/tree/).

In addition we also store patches alongside the recipes which are applied
to the source code as part of the build process.

For ordinary code in the build system, license identifiers are easy: we've
added them to our code and scripts, as we're aligned with SPDX and believe
in what it is doing. Our concern is the recipes.

The recipes already have our own license identifiers in them for the 
software being built, for example the busybox recipe has:

$ cat meta/recipes-core/busybox/busybox.inc | grep LIC
LICENSE = "GPLv2 & bzip2-1.0.4"
LIC_FILES_CHKSUM = "file://LICENSE;md5=de10de48642ab74318e893a61105afbb \
file://archival/libarchive/bz/LICENSE;md5=28e3301eae987e8cfe19988e98383dae"

What this means is that the busybox source/binaries are under the listed 
licenses and that the two files mentioned contain license information.
We have a checksum there so that when we upgrade to a new version of 
busybox, if the checksums change, we know we need to re-evaluate the 
license field. It's not a perfect check, but it does catch basic mistakes
and we can easily check and reject patches where it hasn't been re-evaluated.
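The checksum idea above can be sketched in a few lines. This is a minimal illustration, not how OE implements the check: it assumes only the simple `file://PATH;md5=HEX` form, skips entries without an `md5` parameter, and the function names `parse_lic_files_chksum` and `changed_license_files` are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_lic_files_chksum(value: str) -> list[tuple[str, str]]:
    """Split a LIC_FILES_CHKSUM value into (path, md5) pairs.

    Only the simple 'file://PATH;md5=HEX' form is handled; entries without
    an md5 parameter are skipped, and other parameters are ignored.
    """
    entries = []
    for token in value.split():
        if not token.startswith("file://"):
            continue  # e.g. a '\' line-continuation left in a raw value
        path, *params = token[len("file://"):].split(";")
        opts = dict(p.split("=", 1) for p in params if "=" in p)
        if "md5" in opts:
            entries.append((path, opts["md5"]))
    return entries


def changed_license_files(source_dir: Path, value: str) -> list[str]:
    """Return the listed license files whose md5 no longer matches."""
    changed = []
    for rel_path, expected in parse_lic_files_chksum(value):
        digest = hashlib.md5((source_dir / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            changed.append(rel_path)
    return changed
```

Run against an unpacked source tree after an upgrade, a non-empty result is the signal that the LICENSE field needs to be re-evaluated.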

Our license handling predates SPDX, but we are trying to align with SPDX identifiers.

My question is what to put in the recipe to identify the license?

We can easily put a "# SPDX-License-Identifier:" into the recipe but there
is a lot of concern about how people might interpret this. Our top-level
license says that, unless otherwise stated, recipe metadata is MIT licensed,
so the license is relatively clear. The worry is something like:

# SPDX-License-Identifier: MIT
LICENSE = "GPLv2 & bzip2-1.0.4"

makes for very confusing reading and can be badly interpreted.

I have some ideas about what we might have to do to make this really clear
but they have downsides. I wondered if there was any advice here on how
best to handle this? Once we know how to do it, marking up the recipes is 
relatively straightforward, we just need to establish what makes sense.

Also, there is a secondary problem: which license are the patches we carry
under, and what license identifier (if any) should we put in them? I'd
imagine they would need to match the upstream project source they patch,
but I don't know whether we want to mark up all the patches or not.

Cheers,

Richard


Philippe Ombredanne
 

Hi Richard:

On Thu, Mar 18, 2021, Richard Purdie <richard.purdie@...> wrote:
> [...]
> The worry is something like:
>
> # SPDX-License-Identifier: MIT
> LICENSE = "GPLv2 & bzip2-1.0.4"
>
> makes for very confusing reading and can be badly interpreted.
FWIW, I have been involved with quite a few license audits for Yocto-based
products and this is already a source of confusion: in many cases it is far
from obvious whether a license applies to the recipe or to the package being
built by the recipe.

My first reaction and suggestion would be to forgo using
SPDX-License-Identifier in recipes and instead use a new variable in the
recipe, such as:

RECIPE_LICENSE = "MIT"
LICENSE = "GPLv2 & bzip2-1.0.4"

And if you need to have a separate license variable for patches:
PATCHES_LICENSE = "MIT"

This would be explicit, clear and easy to integrate into your tooling
IMHO. Ideally, of course, you'd want the content of these to be valid SPDX
license expressions. Until then I will need a mapping and a special
detector for [1] to properly collect normalized SPDX licenses from recipes.
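To illustrate why a separate variable keeps the two licenses machine-distinguishable, here is a deliberately naive extractor. `RECIPE_LICENSE` is only the variable proposed above, not an existing BitBake variable, and the regex handles just plain `NAME = "value"` lines (no continuations, `?=`, overrides or includes), so it is a sketch rather than a recipe parser.

```python
from __future__ import annotations

import re

# Matches simple assignments such as:  LICENSE = "GPLv2 & bzip2-1.0.4"
# Deliberately naive: no line continuations, ?=, +=, overrides or includes.
_ASSIGN = re.compile(r'^\s*([A-Za-z0-9_]+)\s*=\s*"([^"]*)"\s*$')


def recipe_license_fields(text: str) -> dict[str, str | None]:
    """Return the metadata license and the built-software license of a recipe."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        match = _ASSIGN.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return {
        # Hypothetical RECIPE_LICENSE: the license of the recipe file itself.
        "metadata": values.get("RECIPE_LICENSE"),
        # Existing LICENSE: the license of the software the recipe builds.
        "package": values.get("LICENSE"),
    }
```

Because each value comes from a differently named variable, no interpretation of comments is needed to tell them apart.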

And FYI while I have your attention:

We are adding support for Yocto recipes in ScanCode-toolkit [1]
and [2] for license and origin detection. This involves parsing and
"resolving" recipes, which is not trivial without running bitbake. This
is possible thanks to Konrad Weihmann (in CC:), who kindly extracted his
excellent linting-focused recipe parser into a separate library [3].

[1] https://github.com/nexB/scancode-toolkit/issues/1243
[2] https://github.com/nexB/scancode-toolkit/pull/2443
[3] https://github.com/priv-kweihmann/oelint-parser

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com


Richard Purdie
 

On Thu, 2021-03-18 at 14:05 +0100, Philippe Ombredanne wrote:
> On Thu, Mar 18, 2021, Richard Purdie <richard.purdie@...> wrote:
>> [...]
>> The worry is something like:
>>
>> # SPDX-License-Identifier: MIT
>> LICENSE = "GPLv2 & bzip2-1.0.4"
>>
>> makes for very confusing reading and can be badly interpreted.
> FWIW, I have been involved with quite a few license audits for Yocto-based
> products and this is already a source of confusion as it is: in
> many cases knowing if a license applies to the recipe or to the package
> being built by the recipe is far from obvious.
We don't have anything indicating the license of the metadata other than
the top-level license files, so I'd have hoped the current situation was
at least clear: LICENSE (and LICENSE_<packagename>) apply to the binary
output, not the metadata. You're confirming the worry about potential
confusion, though.
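The rule stated above (LICENSE, optionally refined by LICENSE_<packagename>, describes the binary output) can be sketched as a simple lookup. This assumes a flat dict of already-resolved variables and that LICENSE is always set; the package name `example-extra` in the usage is hypothetical.

```python
from __future__ import annotations


def package_license(recipe_vars: dict[str, str], package: str) -> str:
    """Return the license that applies to the binary package `package`.

    A per-package LICENSE_<packagename> value takes precedence; otherwise
    the recipe-wide LICENSE applies. Neither describes the recipe metadata.
    """
    return recipe_vars.get(f"LICENSE_{package}", recipe_vars["LICENSE"])


recipe_vars = {
    "LICENSE": "GPLv2 & bzip2-1.0.4",
    "LICENSE_example-extra": "MIT",  # hypothetical per-package override
}
```

With these values, `package_license(recipe_vars, "busybox")` falls back to the recipe-wide expression, while `example-extra` gets its own override.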

> My first reaction and suggestion would be to forgo using
> SPDX-License-Identifier in recipes and instead to use a new variable in a
> recipe for this such as this:
>
> RECIPE_LICENSE = "MIT"
> LICENSE = "GPLv2 & bzip2-1.0.4"
Since we specify the license at the top level, the bitbake/OE/YP way
to handle that would be to set RECIPE_LICENSE in the bitbake.conf file
and then just inherit it through our normal variable handling model.

The downside to that is there would be no specific markup in the recipe
about its license. We also lack copyright information, which is another
source of worry/confusion, but one step at a time! As I understand it, we
probably do need to fix this by having something in the recipes themselves.

I have wondered about using something like:

# SPDX-Metadata-License-Identifier: MIT

which whilst not quite according to the SPDX spec, would at least
hopefully be clear about the meaning. I suspect there would be mixed
feelings on that approach here! :)

> And if you need to have a separate license variable for patches:
> PATCHES_LICENSE = "MIT"
There can be multiple patches specified in SRC_URI, so that is definitely
not going to work. I suspect the answer may be to add
SPDX-License-Identifier entries to the patch headers which would
only leave the complication of remote patches (thankfully rare). The
remote case could be handled in the SRC_URI itself.
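A sketch of the per-patch approach above: enumerate local patches from a SRC_URI value, then read an SPDX-License-Identifier from each patch header. The assumption that a local patch is a `file://` entry ending in `.patch` or `.diff` is a simplification for illustration, remote patches are not handled, and both function names are hypothetical.

```python
from __future__ import annotations

import re

_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*$", re.MULTILINE)


def local_patches(src_uri: str) -> list[str]:
    """Return local patch paths from a SRC_URI value.

    Assumption for this sketch: a local patch is a file:// entry whose path
    ends in .patch or .diff (any ;param=value suffixes are ignored).
    """
    patches = []
    for entry in src_uri.split():
        url = entry.split(";", 1)[0]
        if url.startswith("file://") and url.endswith((".patch", ".diff")):
            patches.append(url[len("file://"):])
    return patches


def patch_license(header: str) -> str | None:
    """Return the first SPDX-License-Identifier found in a patch header."""
    match = _SPDX_TAG.search(header)
    return match.group(1) if match else None
```

Keeping the identifier in each patch header means the license travels with the file, which sidesteps the problem of one variable covering several patches.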

> This would be explicit, clear and nicely integratable in your tooling
> IMHO. Ideally of course you'd want the content of these to be valid SPDX
> license expressions. Until then I will have to have a mapping and
> special detector for [1] to properly collect normalized SPDX licenses
> from recipes.
FWIW we do have a mapping for this:

http://git.yoctoproject.org/cgit.cgi/poky/tree/meta/conf/licenses.conf

We do have functions to normalise our license expressions to SPDX 
standard where we can so those can be used if you're generating 
manifests or similar.

We recently reworked this so that "-only" and "-or-later" licences are
distinguished, where previously we had mapped both to the same thing.
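The kind of normalisation described above can be sketched as a token-level translation: OE writes `&` and `|` where SPDX expressions use `AND` and `OR`, and each license name is looked up in a mapping. The `EXAMPLE_SPDX_MAP` table here is an illustrative assumption, not the contents of licenses.conf; unmapped names pass through unchanged.

```python
from __future__ import annotations

import re

# Illustrative subset only; the real table lives in meta/conf/licenses.conf
# and may differ. Note the separate -only and -or-later entries.
EXAMPLE_SPDX_MAP = {
    "GPLv2": "GPL-2.0-only",
    "GPLv2+": "GPL-2.0-or-later",
    "LGPLv2.1": "LGPL-2.1-only",
    "LGPLv2.1+": "LGPL-2.1-or-later",
}


def to_spdx_expression(oe_expr: str, mapping: dict[str, str] = EXAMPLE_SPDX_MAP) -> str:
    """Translate an OE LICENSE expression into SPDX expression syntax."""
    tokens = re.findall(r"[&|()]|[^\s&|()]+", oe_expr)
    out = []
    for tok in tokens:
        if tok == "&":
            out.append("AND")
        elif tok == "|":
            out.append("OR")
        elif tok in ("(", ")"):
            out.append(tok)
        else:
            out.append(mapping.get(tok, tok))  # unknown names pass through
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

For example, `to_spdx_expression("GPLv2 & bzip2-1.0.4")` yields `GPL-2.0-only AND bzip2-1.0.4`, leaving `bzip2-1.0.4` as-is because this example table has no entry for it.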

> And FYI while I have your attention:
>
> We are adding support to handle Yocto recipes in ScanCode-toolkit [1]
> and [2] for license and origin detection. This involves parsing and
> "resolving" recipes which is not trivial without running bitbake. This
> is done thanks to Konrad Weihmann (in CC:) who kindly extracted his
> excellent linting-focused recipe parser in a separate library [3].
>
> [1] https://github.com/nexB/scancode-toolkit/issues/1243
> [2] https://github.com/nexB/scancode-toolkit/pull/2443
> [3] https://github.com/priv-kweihmann/oelint-parser
Interesting. I have to worry a little about having multiple parsers
for the file format. Did you consider using the tinfoil API in bitbake
to parse the metadata directly? If it wasn't
possible to use that but the need is there, it would be good to understand
the issue and see if it is possible to improve tinfoil or provide a
suitable API.

I realise part of the challenge is that to have a complete datastore for
a recipe you do need all the inherit/includes. If you don't have that,
you are potentially not going to get accurate results from the output. A 
simple example would be packagegroup recipes where the license is
declared in the class:

meta/classes/packagegroup.bbclass:LICENSE ?= "MIT"

or for images/devicetree:

meta/classes/devicetree.bbclass:LICENSE ?= "GPLv2"
meta/classes/image.bbclass:LICENSE ?= "MIT"
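The point above, that a recipe file on its own does not carry the whole datastore, can be shown with a toy resolver. It models only `=` and `?=` applied in parse order (configuration, then the recipe, with classes pulled in by `inherit`); real BitBake has many more operators and override rules. The recipe name `packagegroup-example.bb` and the `RECIPE_LICENSE` default in bitbake.conf are hypothetical, the latter following the suggestion earlier in the thread.

```python
from __future__ import annotations


def resolve(steps: list[tuple[str, str, str, str]]) -> dict[str, tuple[str, str]]:
    """Apply (source_file, variable, operator, value) steps in parse order.

    Only '=' and '?=' are modelled. Returns {variable: (value, source_file)}.
    """
    values: dict[str, tuple[str, str]] = {}
    for source, var, op, value in steps:
        if op == "=":
            values[var] = (value, source)  # plain assignment always wins
        elif op == "?=":
            values.setdefault(var, (value, source))  # only if still unset
        else:
            raise ValueError(f"operator {op!r} is not modelled in this sketch")
    return values


# Parse order for a hypothetical packagegroup recipe that sets no LICENSE itself.
steps = [
    ("bitbake.conf", "RECIPE_LICENSE", "?=", "MIT"),  # hypothetical variable
    ("packagegroup-example.bb", "SUMMARY", "=", "Example package group"),
    ("packagegroup.bbclass", "LICENSE", "?=", "MIT"),
]
```

Resolving all steps gives LICENSE from `packagegroup.bbclass`; resolving only the `.bb` step leaves LICENSE unset, which is exactly the inaccuracy a recipe-only parser risks.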

I do want to see YP being able to generate SPDX manifests and better
integrate into other tools for audit purposes too; it's just proving
hard to get people interested in working with and contributing to the
core. Most prefer to hack enough together to solve their immediate
problem, which is understandable but frustrating!

Cheers,

Richard


Gary O'Neall
 

Hi Richard,

-----Original Message-----
From: Spdx-legal@... <Spdx-legal@...> On Behalf Of
Richard Purdie
Sent: Thursday, March 18, 2021 4:12 AM
To: SPDX-legal <Spdx-legal@...>
Subject: Options for metadata license identifiers
...
My question is what to put in the recipe to identify the license?

We can easily put a "# SPDX-License-Identifier:" into the recipe but there is a
lot of concern about how people might interpret this. Our top level license
says unless otherwise stated, recipe metadata is MIT licensed so the license is
relatively clear. The worry is something like:

# SPDX-License-Identifier: MIT
LICENSE = "GPLv2 & bzip2-1.0.4"

makes for very confusing reading and can be badly interpreted.

I have some ideas about what we might have to do to make this really clear
but they have downsides. I wondered if there was any advice here on how
best to handle this? Once we know how to do it, marking up the recipes is
relatively straightforward, we just need to establish what makes sense.

Also, there is a secondary problem of which license any patches we have are
under and what license identifier (if any) we should put in those.
Those would likely need to match the upstream project source they're
patching I'd imagine but I don't know if we want to mark up all the patches or
not.
[G.O.] How about using the SPDX tag/value terms defined for SPDX documents?

You would use "PackageLicenseDeclared: " for the package itself (see https://spdx.github.io/spdx-spec/3-package-information/#315-declared-license).
There are a couple of advantages to this approach: there is a specific definition for the term, and the consistent syntax makes the tooling a bit easier.
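As a sketch of the tag/value suggestion above, the following renders a minimal SPDX 2.x package section whose PackageLicenseDeclared carries the license of the built software. It is not a complete SPDX document (document creation info and fields such as download location, concluded license and copyright text are omitted), and the helper name and sample package are hypothetical.

```python
from __future__ import annotations

import re


def package_tag_value(name: str, version: str, declared: str) -> str:
    """Render a minimal (incomplete) SPDX 2.x tag/value package section."""
    # SPDX identifiers may only contain letters, digits, '.' and '-'.
    spdx_id = "SPDXRef-Package-" + re.sub(r"[^A-Za-z0-9.-]", "-", name)
    lines = [
        f"PackageName: {name}",
        f"SPDXID: {spdx_id}",
        f"PackageVersion: {version}",
        # Declared license of the software being built (the recipe's LICENSE
        # translated to an SPDX expression), not of the recipe itself.
        f"PackageLicenseDeclared: {declared}",
    ]
    return "\n".join(lines) + "\n"
```

Because the term has a defined meaning in the SPDX spec, a reader cannot mistake it for the license of the recipe file.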

As for patches, if these are specific files and you have a way to associate the field with each specific file, you could use the term "LicenseInfoInFile: "
(see https://spdx.github.io/spdx-spec/4-file-information/#46-license-information-in-file).
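For the patch case, a similarly minimal sketch emits one SPDX 2.x file section per patch, with one LicenseInfoInFile line per license found and NOASSERTION when none is found. Other required file fields (concluded license, copyright text) are omitted, and the helper name and sample paths are hypothetical.

```python
from __future__ import annotations

import hashlib
import re


def patch_file_tag_value(path: str, content: bytes, licenses_in_file: list[str]) -> str:
    """Render a minimal (incomplete) SPDX 2.x tag/value file section."""
    spdx_id = "SPDXRef-File-" + re.sub(r"[^A-Za-z0-9.-]", "-", path)
    lines = [
        f"FileName: ./{path}",
        f"SPDXID: {spdx_id}",
        f"FileChecksum: SHA1: {hashlib.sha1(content).hexdigest()}",
    ]
    # One LicenseInfoInFile line per license found; NOASSERTION if none.
    for lic in licenses_in_file or ["NOASSERTION"]:
        lines.append(f"LicenseInfoInFile: {lic}")
    return "\n".join(lines) + "\n"
```

The license list would typically come from scanning each patch header for an SPDX-License-Identifier.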

Gary


Alexios Zavras
 

As a single data point, we (Intel) use the form mentioned:

# SPDX-License-Identifier: MIT
LICENSE = "GPLv2 & bzip2-1.0.4"

under the understanding that the comment "SPDX-License-Identifier" applies only to the file the line is in.
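Under that interpretation, a tool can assign the two licenses to separate scopes: the comment tag to the recipe file, the LICENSE variable to the built software. A naive sketch, assuming simple single-line forms only; the function name is hypothetical.

```python
from __future__ import annotations

import re

# '# SPDX-License-Identifier: <expr>' comment line in the recipe file.
_TAG = re.compile(r"^\s*#\s*SPDX-License-Identifier:\s*(\S.*?)\s*$")
# Simple 'LICENSE = "<expr>"' assignment (no continuations or overrides).
_LICENSE = re.compile(r'^\s*LICENSE\s*=\s*"([^"]*)"\s*$')


def license_scopes(recipe_text: str) -> dict[str, str | None]:
    """Separate the recipe-file license from the built-software license."""
    scopes: dict[str, str | None] = {"recipe_file": None, "built_software": None}
    for line in recipe_text.splitlines():
        tag = _TAG.match(line)
        if tag and scopes["recipe_file"] is None:
            scopes["recipe_file"] = tag.group(1)  # applies to this file only
            continue
        var = _LICENSE.match(line)
        if var:
            scopes["built_software"] = var.group(1)
    return scopes
```

Applied to the two-line example above, this yields MIT for the recipe file and the GPLv2/bzip2 expression for the software it builds.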

-- zvr
