Re: Options for metadata license identifiers


Richard Purdie
 

On Thu, 2021-03-18 at 14:05 +0100, Philippe Ombredanne wrote:
> On Thu, Mar 18, 2021, Richard Purdie <richard.purdie@...> wrote:
> [...]
>> The worry is something like:
>>
>> # SPDX-License-Identifier: MIT
>> LICENSE = "GPLv2 & bzip2-1.0.4"
>>
>> makes for very confusing reading and can be badly interpreted.
>
> FWIW, I have been involved with quite a few license audits for Yocto-
> based products and this is already a source of confusion as it is: in
> many cases knowing if a license applies to the recipe or to the package
> being built by the recipe is far from obvious.
We don't have anything indicating the license of the metadata other than
the top level license files so I'd have hoped the current situation was
at least clear, LICENSE (and LICENSE_<packagename>) apply to the binary
output, not the metadata. You're confirming the worry about potential
confusion though.

> My first reaction and suggestion would be to forego using SPDX-License-
> Identifier in recipes and instead to use a new variable in a recipe for
> this such as this:
>
> RECIPE_LICENSE = "MIT"
> LICENSE = "GPLv2 & bzip2-1.0.4"
Since we specify the license at the top level, the bitbake/OE/YP way
to handle that would be to set RECIPE_LICENSE in the bitbake.conf file
and then just inherit it through our normal variable handling model.

The downside to that is there would be no specific markup in the recipe
about its license. We also lack copyright information which is another
source of worry/confusion but one step at a time! As I understand it, we
probably do need to fix this and have something in the recipes
themselves.

I have wondered about using something like:

# SPDX-Metadata-License-Identifier: MIT

which whilst not quite according to the SPDX spec, would at least
hopefully be clear about the meaning. I suspect there would be mixed
feelings on that approach here! :)

> And if you need to have a separate license variable for patches:
> PATCHES_LICENSE = "MIT"
There can be multiple patches specified in SRC_URI so that is definitely
not going to work. I suspect the answer may be to add 
SPDX-License-Identifier entries to the patch headers which would
only leave the complication of remote patches (thankfully rare). The
remote case could be handled in the SRC_URI itself.
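
As a rough sketch of how tooling might read such a tag from a patch
header: the function name and the choice of markers that end the header
are assumptions made for illustration, not an existing OE or bitbake API.

```python
import re

# Matches a tag such as "SPDX-License-Identifier: MIT" within a line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*$")


def patch_header_license(patch_text):
    """Return the SPDX expression declared in a patch header, or None."""
    for line in patch_text.splitlines():
        # Stop at the first diff body marker so that tags inside the
        # patched files are not mistaken for the license of the patch.
        if line.startswith(("diff --git ", "--- ", "Index: ")):
            break
        match = SPDX_TAG.search(line)
        if match:
            return match.group(1)
    return None
```

Only the text before the first diff marker is searched, so a tag that
appears in a file being patched is ignored.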

> This would be explicit, clear and nicely integratable in your tooling
> IMHO. Ideally of course you'd want the content of these to be valid SPDX
> license expressions. Until then I will have to have a mapping and
> special detector for [1] to properly collect normalized SPDX licenses
> from recipes.
FWIW we do have a mapping for this:

http://git.yoctoproject.org/cgit.cgi/poky/tree/meta/conf/licenses.conf

We do have functions to normalise our license expressions to SPDX 
standard where we can so those can be used if you're generating 
manifests or similar.

We recently reworked this to allow for the "-or-later" licences
to be different where we'd previously mapped "-only" and "-or-later"
as the same thing.
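
To make the idea concrete, here is a minimal sketch of that kind of
normalisation. The two table entries and the function name are
illustrative assumptions; the authoritative table is the licenses.conf
file linked above, and this is not the code OE itself uses.

```python
import re

# Illustrative subset only (assumed values); see licenses.conf for the
# real mapping. Note "-only" and "-or-later" are kept distinct.
EXAMPLE_MAP = {
    "GPLv2": "GPL-2.0-only",
    "GPLv2+": "GPL-2.0-or-later",
}


def normalise(expression, mapping=EXAMPLE_MAP):
    """Map each license name in a LICENSE expression, keeping operators."""
    # Tokens are either an operator/parenthesis or a run of other
    # non-space characters (a license name).
    tokens = re.findall(r"[&|()]|[^&|()\s]+", expression)
    # Names without a mapping pass through unchanged.
    return " ".join(mapping.get(token, token) for token in tokens)
```

With this table, normalise("GPLv2 & bzip2-1.0.4") gives
"GPL-2.0-only & bzip2-1.0.4": the unmapped name is left alone rather
than guessed at.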

> And FYI while I have your attention:
>
> We are adding support to handle Yocto recipes in ScanCode-toolkit [1]
> and [2] for license and origin detection. This involves parsing and
> "resolving" recipes which is not trivial without running bitbake. This
> is done thanks to Konrad Weihmann (in CC:) who kindly extracted his
> excellent linting-focused recipe parser in a separate library [3].
>
> [1] https://github.com/nexB/scancode-toolkit/issues/1243
> [2] https://github.com/nexB/scancode-toolkit/pull/2443
> [3] https://github.com/priv-kweihmann/oelint-parser
Interesting. I have to worry a little about having multiple parsers
for the file format. Did you consider using the tinfoil API in bitbake
to be able to use that to parse the metadata directly? If it wasn't 
possible to use that but the need is there, it would be good to understand
the issue and see if it is possible to improve tinfoil or provide a
suitable API.
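
For reference, a minimal sketch of what the tinfoil route could look
like. It assumes it is run from an initialised build environment with
bitbake's lib/ directory importable; the wrapper function name is
hypothetical.

```python
def recipe_license(recipe_name):
    """Return LICENSE for a recipe as bitbake itself resolves it."""
    # Imported here because bb is only importable when bitbake's lib/
    # directory is on sys.path (an assumption about the environment).
    import bb.tinfoil

    with bb.tinfoil.Tinfoil() as tinfoil:
        # config_only=False loads recipe information as well as the
        # configuration, which parse_recipe() needs.
        tinfoil.prepare(config_only=False)
        datastore = tinfoil.parse_recipe(recipe_name)
        return datastore.getVar("LICENSE")
```

Because the datastore comes from bitbake's own parser, values set
through inherit/include files are already resolved in the result.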

I realise part of the challenge is that to have a complete datastore for
a recipe you do need all the inherit/includes. If you don't have that,
you are potentially not going to get accurate results from the output. A 
simple example would be packagegroup recipes where the license is
declared in the class:

meta/classes/packagegroup.bbclass:LICENSE ?= "MIT"

or for images/devicetree:

meta/classes/devicetree.bbclass:LICENSE ?= "GPLv2"
meta/classes/image.bbclass:LICENSE ?= "MIT"
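
A toy illustration of why that matters, using only the three defaults
listed above. This is not how bitbake evaluates "?="; the function name
is hypothetical and the logic is simplified to "the recipe's own value
wins, otherwise the first inherited class default applies".

```python
# Defaults copied from the three class lines quoted above.
CLASS_LICENSE_DEFAULTS = {
    "packagegroup": "MIT",
    "devicetree": "GPLv2",
    "image": "MIT",
}


def effective_license(recipe_license, inherited_classes):
    """Mimic a '?=' default: it only applies if the recipe sets nothing."""
    if recipe_license is not None:
        return recipe_license
    for class_name in inherited_classes:
        default = CLASS_LICENSE_DEFAULTS.get(class_name)
        if default is not None:
            return default
    # A parser that never saw the inherited classes ends up here.
    return None
```

A packagegroup recipe that sets no LICENSE resolves to "MIT" when the
class is taken into account, and to nothing at all when only the recipe
file is read.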

I do want to see YP being able to generate SPDX manifests and better
integrate into other tools for audit purposes too. It's just proving
hard to get people interested in working with and contributing to the
core; most prefer to hack enough together to solve their immediate
problem, which is understandable but frustrating!

Cheers,

Richard
