Options for metadata license identifiers
Richard Purdie
Hi,
I wondered if I might seek advice/opinions on a dilemma Yocto Project is facing with license identifiers.

First, some background. We build software components from source and combine them to make up an operating system (most often Linux). As such we have metadata we refer to as "recipes", which are basically lists of instructions on where to get a piece of software, how to configure it and so on. Our core layer has around 800 recipes (http://git.yoctoproject.org/cgit.cgi/poky/tree/). In addition, we store patches alongside the recipes, which are applied to the source code as part of the build process.

Where we have normal code for the build process, license identifiers are easy and we've added them to our code/scripts, as we're aligned with SPDX and believe in what it is doing.

Where we have concern is the recipes. The recipes already have our own license identifiers in them for the software being built. For example, the busybox recipe has:

    $ cat meta/recipes-core/busybox/busybox.inc | grep LIC
    LICENSE = "GPLv2 & bzip2-1.0.4"
    LIC_FILES_CHKSUM = "file://LICENSE;md5=de10de48642ab74318e893a61105afbb \
                        file://archival/libarchive/bz/LICENSE;md5=28e3301eae987e8cfe19988e98383dae"

What this means is that the busybox source/binaries are under the listed licenses and that the two files mentioned contain license information. We have a checksum there so that when we upgrade to a new version of busybox, if the checksums change, we know we need to re-evaluate the license field. It's not a perfect check, but it does catch basic mistakes and we can easily check and reject patches where it hasn't been re-evaluated.

Our license handling predates SPDX; we are trying to align to SPDX identifiers.

My question is what to put in the recipe to identify the license?

We can easily put a "# SPDX-License-Identifier:" into the recipe, but there is a lot of concern about how people might interpret this. Our top level license says unless otherwise stated, recipe metadata is MIT licensed, so the license is relatively clear. The worry is that something like:

    # SPDX-License-Identifier: MIT
    LICENSE = "GPLv2 & bzip2-1.0.4"

makes for very confusing reading and can be badly interpreted.

I have some ideas about what we might have to do to make this really clear, but they have downsides. I wondered if there was any advice here on how best to handle this? Once we know how to do it, marking up the recipes is relatively straightforward; we just need to establish what makes sense.

Also, there is a secondary problem of which license any patches we have are under and what license identifier (if any) we should put in those. Those would likely need to match the upstream project source they're patching, I'd imagine, but I don't know if we want to mark up all the patches or not.

Cheers,

Richard
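
For illustration, the checksum check described in the message above can be sketched in Python. This is a minimal sketch and not BitBake's implementation: the function names are hypothetical, the parser handles only the `file://path;md5=...` form shown in the busybox example, and it ignores any other parameters an entry may carry.

```python
import hashlib
from pathlib import Path


def parse_lic_files_chksum(value):
    """Split a LIC_FILES_CHKSUM-style string into (relative_path, md5) pairs.

    Hypothetical helper: only 'file://path;md5=...' entries are understood.
    """
    entries = []
    for item in value.split():
        if not item.startswith("file://"):
            continue  # skips stray tokens such as a line-continuation backslash
        path, _, params = item[len("file://"):].partition(";")
        md5 = None
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key == "md5":
                md5 = val
        entries.append((path, md5))
    return entries


def changed_license_files(source_dir, value):
    """Return the relative paths whose current md5 differs from the recorded one."""
    changed = []
    for rel_path, recorded in parse_lic_files_chksum(value):
        data = (Path(source_dir) / rel_path).read_bytes()
        if hashlib.md5(data).hexdigest() != recorded:
            changed.append(rel_path)
    return changed
```

A non-empty result from `changed_license_files` after an upgrade would be the signal that the LICENSE field needs to be re-evaluated by a person; the check itself cannot say what the new license is.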
Philippe Ombredanne
Hi Richard:
On Thu, Mar 18, 2021, Richard Purdie <richard.purdie@...> wrote:
> [...] The worry is something like:
> [...]

FWIW, I have been involved with quite a few license audits for Yocto-based products and this is already a source of confusion as it is: in many cases knowing if a license applies to the recipe or to the package being built by the recipe is far from obvious.

My first reaction and suggestion would be to forego using SPDX-License-Identifier in recipes and instead to use a new variable in a recipe for this, such as this:

    RECIPE_LICENSE = "MIT"
    LICENSE = "GPLv2 & bzip2-1.0.4"

And if you need to have a separate license variable for patches:

    PATCHES_LICENSE = "MIT"

This would be explicit, clear and nicely integratable in your tooling IMHO.

Ideally of course you'd want the content of these to be valid SPDX license expressions. Until then I will have to have a mapping and special detector for [1] to properly collect normalized SPDX licenses from recipes.

And FYI while I have your attention: We are adding support to handle Yocto recipes in ScanCode-toolkit [1] and [2] for license and origin detection. This involves parsing and "resolving" recipes, which is not trivial without running bitbake. This is done thanks to Konrad Weihmann (in CC:), who kindly extracted his excellent linting-focused recipe parser in a separate library [3].

[1] https://github.com/nexB/scancode-toolkit/issues/1243
[2] https://github.com/nexB/scancode-toolkit/pull/2443
[3] https://github.com/priv-kweihmann/oelint-parser

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombredanne@...
DejaCode - What's in your code?! - http://www.dejacode.com
AboutCode - Open source for open source - https://www.aboutcode.org
nexB Inc. - http://www.nexb.com
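
As an illustration of why a dedicated variable is easy for tooling to consume, here is a minimal Python sketch that separates the license of the recipe from the license of the built software. Assumptions: RECIPE_LICENSE and PATCHES_LICENSE are only the names proposed in the message above, not existing BitBake variables; `read_license_fields` is a hypothetical helper; and the regex handles only plain `NAME = "value"` assignments on one line, with no includes, inherits, overrides or `?=` defaults.

```python
import re

# Only plain single-line assignments of the three variables of interest.
_ASSIGNMENT = re.compile(
    r'^\s*(RECIPE_LICENSE|PATCHES_LICENSE|LICENSE)\s*=\s*"([^"]*)"\s*$'
)


def read_license_fields(recipe_text):
    """Return a dict of the license variables found in the recipe text.

    Hypothetical helper: a later assignment overwrites an earlier one,
    and anything the regex does not match is ignored.
    """
    fields = {}
    for line in recipe_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


example = '''
RECIPE_LICENSE = "MIT"
LICENSE = "GPLv2 & bzip2-1.0.4"
'''
fields = read_license_fields(example)
# fields["RECIPE_LICENSE"] describes the recipe file itself;
# fields["LICENSE"] describes the software the recipe builds.
```

A text-level reader like this cannot see values that come from classes or configuration files, so it is only suitable for recipes that set the variables directly.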
Richard Purdie
On Thu, 2021-03-18 at 14:05 +0100, Philippe Ombredanne wrote:
> On Thu, Mar 18, 2021, Richard Purdie <richard.purdie@...> wrote:
> > [...]

We don't have anything indicating the license of the metadata other than the top level license files, so I'd have hoped the current situation was at least clear: LICENSE (and LICENSE_<packagename>) apply to the binary output, not the metadata. You're confirming the worry about potential confusion though.

> My first reaction and suggestion would be to forego using SPDX-License- [...]

Since we specify the license at the top level, the bitbake/OE/YP way to handle that would be to set RECIPE_LICENSE in the bitbake.conf file and then just inherit it through our normal variable handling model. The downside to that is there would be no specific markup in the recipe about its license.

We also lack copyright information, which is another source of worry/confusion, but one step at a time!

We probably do need to fix this by having something in the recipes themselves, as I understand it. I have wondered about using something like:

    # SPDX-Metadata-License-Identifier: MIT

which, whilst not quite according to the SPDX spec, would at least hopefully be clear about the meaning. I suspect there would be mixed feelings on that approach here! :)

> And if you need to have a separate license variable for patches: [...]

There can be multiple patches specified in SRC_URI, so that is definitely not going to work. I suspect the answer may be to add SPDX-License-Identifier entries to the patch headers, which would only leave the complication of remote patches (thankfully rare). The remote case could be handled in the SRC_URI itself.

> This would be explicit, clear and nicely integratable in your tooling [...]

FWIW we do have a mapping for this:

http://git.yoctoproject.org/cgit.cgi/poky/tree/meta/conf/licenses.conf

We do have functions to normalise our license expressions to SPDX standard where we can, so those can be used if you're generating manifests or similar. We recently reworked this to allow for the "-or-later" licences to be different, where we'd previously mapped "-only" and "-or-later" as the same thing.

> And FYI while I have your attention: [...]

Interesting. I have to worry a little about having multiple parsers for the file format. Did you consider using the tinfoil API in bitbake to parse the metadata directly? If it wasn't possible to use that but the need is there, it would be good to understand the issue and see if it is possible to improve tinfoil or provide a suitable API.

I realise part of the challenge is that to have a complete datastore for a recipe you do need all the inherit/includes. If you don't have that, you are potentially not going to get accurate results from the output. A simple example would be packagegroup recipes, where the license is declared in the class:

    meta/classes/packagegroup.bbclass:LICENSE ?= "MIT"

or for images/devicetree:

    meta/classes/devicetree.bbclass:LICENSE ?= "GPLv2"
    meta/classes/image.bbclass:LICENSE ?= "MIT"

I do want to see YP being able to generate SPDX manifests and better integrate into other tools for audit purposes too. It's just proving hard to get people interested in working with and contributing to the core; most prefer to hack enough together to solve their immediate problem, which is understandable but frustrating!

Cheers,

Richard
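
To make the normalisation step concrete, here is a minimal Python sketch of translating a recipe-style LICENSE string into an SPDX-style expression. It is not the project's code: `to_spdx_expression` is a hypothetical name, `EXAMPLE_MAP` is a small illustrative table standing in for the real mapping in meta/conf/licenses.conf, and names without a mapping (such as bzip2-1.0.4 here) are passed through unchanged, so the result is not guaranteed to be a valid SPDX expression.

```python
import re

# Illustrative subset only; an assumption, not a copy of licenses.conf.
# "-only" and "-or-later" variants are kept distinct.
EXAMPLE_MAP = {
    "GPLv2": "GPL-2.0-only",
    "GPLv2+": "GPL-2.0-or-later",
    "LGPLv2.1": "LGPL-2.1-only",
    "LGPLv2.1+": "LGPL-2.1-or-later",
}

# Recipe operators and their SPDX expression counterparts.
OPERATORS = {"&": "AND", "|": "OR"}


def to_spdx_expression(expr, name_map=EXAMPLE_MAP):
    """Translate operators and known license names; keep unknown names as-is."""
    tokens = re.findall(r"[()&|]|[^\s()&|]+", expr)
    out = []
    for tok in tokens:
        if tok in OPERATORS:
            out.append(OPERATORS[tok])
        elif tok in ("(", ")"):
            out.append(tok)
        else:
            out.append(name_map.get(tok, tok))
    # Join with spaces, then tighten the spacing inside parentheses.
    return " ".join(out).replace("( ", "(").replace(" )", ")")


print(to_spdx_expression("GPLv2 & bzip2-1.0.4"))
# prints: GPL-2.0-only AND bzip2-1.0.4
```

Passing unknown names through rather than dropping them keeps the information visible to an auditor, at the cost of output that still needs validating against the SPDX license list.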
Gary O'Neall
Hi Richard,
> -----Original Message-----
> [...]
> My question is what to put in the recipe to identify the license?

[G.O.] How about using the SPDX tag/value terms defined for SPDX documents? You would use "PackageLicenseDeclared: " for the package itself (see https://spdx.github.io/spdx-spec/3-package-information/#315-declared-license). There are a couple of advantages to this approach: there is a specific definition for the term, and the consistency in syntax makes the tooling a bit easier.

As far as patches, if these are specific files and you have a way to associate the field with that specific file, you could use the term "LicenseInfoInFile: " (see https://spdx.github.io/spdx-spec/4-file-information/#46-license-information-in-file).

Gary
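
As a small illustration of the tag/value suggestion, the following Python sketch renders the two fields named above. The function names are hypothetical, the inputs are assumed to be already-normalised SPDX identifiers or expressions, and how such lines would be embedded in a recipe or associated with a patch file is left open, as it is in the message.

```python
def package_license_line(declared_expression):
    """Render the declared license of the built package as one tag/value line."""
    return f"PackageLicenseDeclared: {declared_expression}"


def file_license_lines(license_ids):
    """Render one LicenseInfoInFile line per license identifier found in a file."""
    return [f"LicenseInfoInFile: {license_id}" for license_id in license_ids]


lines = [package_license_line("GPL-2.0-only AND bzip2-1.0.4")]
lines += file_license_lines(["MIT"])
print("\n".join(lines))
```

Keeping the package field and the per-file field under different tag names is what avoids the ambiguity raised earlier in the thread: each line states which object its license applies to.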
Alexios Zavras
As a single data point, we (Intel) use the mentioned:

    # SPDX-License-Identifier: MIT
    LICENSE = "GPLv2 & bzip2-1.0.4"

under the understanding that the comment "SPDX-License-Identifier" applies only to the file the line is in.

-- zvr

-----Original Message-----
From: Spdx-legal@... <Spdx-legal@...> On Behalf Of Richard Purdie
Sent: Thursday, 18 March, 2021 12:12
To: SPDX-legal <Spdx-legal@...>
Subject: Options for metadata license identifiers