[spdx/spdx-spec] Add new annex on license namespaces (PR


J Lovejoy
 

(responding via email so I can add spdx-legal mailing list)

On May 26, 2022, at 5:44 AM, Karsten Klein <notifications@...> wrote:


Hi all,

Tuesday’s session left me a little bewildered and puzzled. Interestingly, all participants in the discussion have good arguments, though (in my view) each from a particular rather than a holistic perspective. The related fears (such as identifier collisions, e.g., two parties using the same LicenseRef-scancode- identifier for different licenses, or a party introducing a LicenseRef for a license already on the SPDX License List) appear quite artificial to me, and they mix syntactic, semantic, and integrity validation aspects, which should rather be disentangled.

As I too can only contribute one perspective, and would leave the holistic assessment to the group, I would like to make some observations:

SPDX Format:

  • SPDX is primarily about the exchange of software package information: information on software conveying structural and relational facts, associated with metadata on different aspects, primarily (but not limited to) licensing information.
  • As such, the format needs to reference licenses by id and to provide the license texts associated with each id (the latter rather optional, in my view; see concerns below).
As a reminder, the original intent of the SPDX License List was to create a shorthand, reliable way to refer to licenses so that an SPDX Document would not get bloated by repeating the same licenses over and over (imagine, in contrast, if every license used the Other Licensing Info part of the spec). At the same time, recognizing that the SPDX License List would never represent every license found in s/w, there is the Other Licensing Info section. I think it’s important to keep in mind the context of the SPDX Document, because that is the starting point, and also to acknowledge other contexts.

License Ids:

  • When exchanging software licensing information, we need to make sure that we refer unambiguously to licenses. Using consolidated ids for licenses is key.
  • The SPDX License List, due to its scope (limited to open/public licenses) and the policies set forth in the matching guidelines, does not (and never will) cover all licenses that can be used for software.
  • When producing SPDX documents, we cannot limit the scope of licenses to open (or publicly available) licenses only.
  • I’d argue that the SPDX Legal Team is an authority managing the SPDX License List (the work being highly appreciated!!); however, it is only to be expected that there will be other authorities managing license lists (e.g., OSI, which is included in these considerations).
While this is true, different “Authorities” have different goals, processes, etc., so we need to be careful not to conflate them. The SPDX License identifiers were certainly intended to be cross-functional; e.g., OSI uses them and does not need its own set of “ids”. The hope/idea is that this would be true widely.
  • Scancode Toolkit, for me, is a community-managed license list authority. Scancode has not invented these licenses; Scancode organizes them according to the Scancode policies. The rules may not be the same as the SPDX matching guidelines, but Scancode is of value to the people using it and provides a most pragmatic entry into the domain; why should these people not be able to reference a Scancode license unambiguously from an SPDX document, leveraging SPDX as a format?
Agreed, and this has always been acknowledged from the beginning. (Side note: just keep in mind that the SPDX License List inclusion guidelines were broadened, so more licenses are potentially eligible for inclusion than was originally the case.)
  • My company does SCA and license identification in customer projects. In this respect we constantly run into licenses that are not on the SPDX License List, and sometimes not even in Scancode. We therefore developed an extended identification concept and published it as {metæffekt} Universe. We would like to reference licenses unambiguously using this “extended namespace” within SPDX documents, again leveraging SPDX as a standardized exchange format. Customers with access to the license database can use the ids to resolve the license texts.
Great - this sounds consistent with the original goal as I understand it. 
Based on a quick look, I see one license that does not have an SPDX license id in your list but is now on the SPDX License List. I also see “BSD-1-Clause” (on the SPDX License List), but then see you have “BSD-1-Clause-copyright”; according to the SPDX matching guidelines, this is the same as BSD-1-Clause. The only difference is “author” instead of “copyright holders” in the disclaimer paragraph, and the SPDX legal team has already determined this is not legally substantively different. By treating this as a different license, you are not leveraging a major benefit of the SPDX License List overall and are effectively creating your own criteria. How is this helpful?

The point of the Other Licensing Info (LicenseRef) was to augment the SPDX License List, with the SPDX License List as the starting point. This example seems to reverse or ignore that.
  • Please also be aware that license ids may refer to a unique license text OR to a license template. Instances of https://spdx.org/licenses/BSD-3-Clause.html may have different text if the copyright holder modified the variable parts of the template. Putting the default text of the license template in an SPDX document is not appropriate and not in the sense of the license.
This is where we diverge, and we have discussed this together recently. “Putting the default text of the license template in an SPDX document is not appropriate and not in the sense of the license.” is YOUR interpretation. The intent of the SPDX License List and of ids within an SPDX Document is to avoid repeating the same license text, with definitions of what counts as the same. You are effectively disagreeing with those matching guidelines (in this case based merely on a different name versus more template-like “authors” or “copyright holders” language), and then making both a legal interpretation as to whether the license intends such a strict reading and a risk assessment as to whether someone is going to bring a non-compliance action over replicating the BSD-3-Clause text using just “copyright holder” rather than “My Name” specifically. We need to 1) be clear about when we are making such interpretations; and 2) not incorporate such interpretations within SPDX. :)
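To make the matching-guidelines point concrete, here is a toy matcher. It is a minimal sketch, not SPDX tooling: the template fragment is invented for illustration (it is not the actual BSD-1-Clause or BSD-3-Clause text), and the `<<var:...|...>>` marker is a hypothetical simplified syntax rather than the real SPDX template markup. It ignores case and whitespace differences and treats “copyright holders” and “author” as interchangeable values of one variable part, so both wordings map to the same license id, while an unrelated wording does not.

```python
import re

# Invented, simplified fragment for illustration (NOT real license text).
# "<<var:a|b>>" is a hypothetical marker for a variable part, not SPDX markup.
TEMPLATE = (
    "THIS SOFTWARE IS PROVIDED BY THE <<var:copyright holders|author>> "
    '"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.'
)


def normalize(text: str) -> str:
    """Approximate the tolerance for case and whitespace differences."""
    return re.sub(r"\s+", " ", text).strip().lower()


def template_to_regex(template: str) -> "re.Pattern[str]":
    # With one capture group, re.split alternates: literal, variable, literal, ...
    parts = re.split(r"<<var:(.*?)>>", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Collapse whitespace but keep single spaces at the edges of literals.
            pattern += re.escape(re.sub(r"\s+", " ", part).lower())
        else:
            choices = (re.escape(choice.lower()) for choice in part.split("|"))
            pattern += "(?:" + "|".join(choices) + ")"
    return re.compile(pattern)


TEMPLATE_RE = template_to_regex(TEMPLATE)


def matches_template(text: str) -> bool:
    """True if the text matches the template after normalization."""
    return TEMPLATE_RE.fullmatch(normalize(text)) is not None
```

In this sketch, the “author” wording and a re-wrapped, mixed-case “copyright holders” wording both return `True`, so neither would justify a separate identifier under these simplified rules.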

License Texts:

  • I regard putting license texts in the SPDX document as problematic, especially in the way it is currently done.
You are referring to Other Licensing Info, yes? How would it change, then? 
  • A software package may reference a license (the id is sufficient to capture this fact) or may include a license text (which may not be 100% identical to the one stored in the SPDX License Data). License texts may be mixed with other information or refer to a list of third-party licenses. SPDX's attempt to segregate this into different parts adds an artificial layer of information that often does not align with the facts. Example: https://github.com/spring-projects/spring-framework/blob/main/src/docs/dist/license.txt
I’m a little lost as to what your point is here. Your example looks to me like a pretty nice summary of licensing for a specific package that incorporates other open source licensed components. I’d imagine the license information in an SPDX Document might look roughly like this (using SPDX fields, but of course not listing everything here):
- PackageLicenseDeclared: Apache-2.0 AND BSD-3-Clause
- PackageLicenseConcluded: Apache-2.0
- PackageLicenseInfoFromFiles: Apache-2.0, BSD-3-Clause
- PackageLicenseComments: This package is Apache-2.0 with some included components that are also under Apache-2.0 and BSD-3-Clause as noted here LINK

Then at the file level, you’d have the breakdown of what files are under which license.
Note - someone else might put Apache-2.0 AND BSD-3-Clause in the Concluded License field. 
(Note 2 - for simplicity of this example, I’m intentionally ignoring the weird NOTICE under Apache-2.0 Section 4(d) in the actual example, which seems to use the Apache NOTICE file in an unanticipated way and implies there may be other licenses in there.)

  • For us It is important that
    • an SPDX document consumer is able to resolve a license by id (as the approved general representation of the license);
it already can
    • this can be done by an internal or external link to a license database (e.g., the SPDX License List, OSI, Scancode, {metæffekt} Universe, or any other party's website that provides consolidated license ids and information).
    • the package specific license files (copyright, LICENSE, NOTICE, …) can be accessed by the consumer and are preserved in format and content
I’m not sure this is as important, or where you mean this would be accessed; it is always in the source code in any case. 
  • Please note that different license authorities may model licenses differently to match their specific policies and guidelines. This means the same license or license aggregate may be represented differently by different authorities. I don’t regard this as bad; I regard it as an opportunity. Such cases may trigger exchange and discussion.
  • Licenses / contracts may be confidential. While the id is not critical, the license/contract content is. Therefore:
    • Licenses can be public / shared
    • Licenses can be private and shared only within parties under contract or NDA
    • Licenses can contain conditions that prohibit distributing the license text
    • We must anticipate that we are never only talking open source (!)
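As a rough sketch of how an SPDX 2.x consumer can already resolve a license by id (the “it already can” above): a LicenseRef- id resolves internally, to the text recorded in the document's Other Licensing Info, and any other id is assumed to be on the SPDX License List and resolves externally. The `resolve_license` function and `Resolution` type are hypothetical names; the URL pattern follows the https://spdx.org/licenses/BSD-3-Clause.html link quoted earlier, and DocumentRef- prefixed references are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Dict

# URL pattern of SPDX License List pages, as in the BSD-3-Clause link above.
SPDX_LICENSE_URL = "https://spdx.org/licenses/{id}.html"


@dataclass
class Resolution:
    source: str    # "spdx-license-list" or "document"
    location: str  # URL on spdx.org, or the extracted license text itself


def resolve_license(license_id: str, extracted: Dict[str, str]) -> Resolution:
    """Hypothetical resolver; `extracted` maps LicenseRef ids to their text."""
    if license_id.startswith("LicenseRef-"):
        # Internal link: the text must be in the document's Other Licensing Info.
        if license_id not in extracted:
            raise KeyError(f"{license_id} has no Other Licensing Info entry")
        return Resolution("document", extracted[license_id])
    # External link: assumed (not checked) to be an SPDX License List id.
    return Resolution("spdx-license-list", SPDX_LICENSE_URL.format(id=license_id))
```

For example, `resolve_license("MIT", {})` yields a `Resolution` pointing at https://spdx.org/licenses/MIT.html, while a LicenseRef id missing from the document raises `KeyError`.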

License Namespaces:

  • I would argue from an SPDX Format perspective that License Authorities should be treated equivalently.
I don’t think so… they are not equivalent so they cannot be treated equivalently. 
  • Currently the SPDX License List is treated as unique and special (work highly appreciated), which makes it hard for others to contribute their work, with all the caveats listed above. In effect, the SPDX License List is itself a namespace definition.
  • Registering a namespace means just registering the namespace definition. I’d still argue (as this is content) that the SPDX Legal Team could approve a namespace registration, to make sure the namespace is unique and follows given naming guidelines. A namespace definition includes at least:
    • Short Id
    • Namespace domain (could also be used for owner verification)
    • URL where to find details on the namespace
    • Contact Address (legal entity or person owning the namespace)
    • Latest version
note: there is still time and effort involved in this check, and probably more than it seems at face value!
  • I do not see that SPDX needs to care about the licenses managed in the namespace. The management and rules of the licenses in the namespace are up to the namespace owner. If (s)he does not play by the rules, the namespace's reputation will suffer and people will not use it.
  • Validation of LicenseRefs in an SPDX document is limited, but here I see opportunities for tool providers to add further levels of integrity validation.
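The “further levels” of validation could start as small as this. The sketch separates a syntactic level (the SPDX 2.x `LicenseRef-[idstring]` grammar, where idstring is letters, digits, `-` and `.`) from a name-based integrity heuristic that flags cases like a LicenseRef-MIT. The function name and the three-id excerpt of the SPDX License List are hypothetical; a real integrity check would compare license texts under the matching guidelines rather than names.

```python
import re
from typing import Iterable, List

# SPDX 2.x grammar: LicenseRef-[idstring], idstring = 1*(ALPHA / DIGIT / "-" / ".")
LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")

# Tiny hand-picked excerpt of SPDX License List ids, for illustration only.
KNOWN_SPDX_IDS = ("MIT", "Apache-2.0", "BSD-3-Clause")


def check_license_ref(ref: str, known_ids: Iterable[str] = KNOWN_SPDX_IDS) -> List[str]:
    """Return a list of findings; an empty list means no problem was detected."""
    # Level 1: syntactic validation against the LicenseRef grammar.
    if not LICENSE_REF.fullmatch(ref):
        return ["syntax: not of the form LicenseRef-[idstring]"]
    findings = []
    suffix = ref[len("LicenseRef-"):]
    # Level 2: name-based integrity heuristic (catches e.g. LicenseRef-MIT).
    if suffix.lower() in {known.lower() for known in known_ids}:
        findings.append(f"integrity: suffix duplicates SPDX License List id {suffix!r}")
    return findings
```

Keeping the levels separate mirrors the point above: a syntactically valid LicenseRef can still raise an integrity question, and the two should be reported differently.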

As indicated earlier, I will not propose a solution. Some aspects may be rather revolutionary, and I cannot currently foresee whether these thoughts resonate with the group. Just a few examples to illustrate the idea:

[…]
LicenseConcluded: spdx:MIT

[…]
LicenseConcluded: scancode:bittorrent-eula

[…]
LicenseConcluded: ae:BSD-3-Clause-copyright-holder-variant

[…]
LicenseConcluded: spdx:MIT AND ae:BSD-3-Clause-copyright-holder-variant

I’d also argue that you could define a default license namespace for an SPDX document. In this case you could omit the default namespace short-name prefix. The default would be spdx.


Isn’t it already the default? I don’t think we should have to specify that in a new way. This feels like a dilution of the SPDX License List.

This doesn’t even break LicenseRef compatibility: LicenseRef is just a special case (no namespace available). This could address the compatibility concern raised by Philippe and enable a transition once namespaces are available.
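The prefix proposal in the examples above (`spdx:MIT`, `scancode:bittorrent-eula`, a document-level default of `spdx`, and LicenseRef as the no-namespace case) can be sketched as a small term extractor. This is only an illustration of the idea, not a proposed grammar: the function names are hypothetical, it does not build an expression tree, it treats ids after WITH like any other term, and it does not handle SPDX 2.x `DocumentRef-...:LicenseRef-...` references, which already use a colon.

```python
import re
from typing import List, Optional, Tuple

# Parentheses, or any run of characters that is neither whitespace nor a parenthesis.
TOKEN = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"AND", "OR", "WITH"}


def qualify(term: str, default_ns: str = "spdx") -> Tuple[Optional[str], str]:
    """Map one license term to (namespace, id) under the proposed prefix syntax."""
    if term.startswith("LicenseRef-"):
        return None, term  # legacy special case: no namespace available
    ns, sep, license_id = term.partition(":")
    if not sep:
        return default_ns, term  # no prefix: fall back to the document default
    if not ns or not license_id:
        raise ValueError(f"malformed namespaced id: {term!r}")
    return ns, license_id


def license_terms(expression: str, default_ns: str = "spdx") -> List[Tuple[Optional[str], str]]:
    """List the (namespace, id) pairs in an expression, skipping operators and parentheses."""
    return [
        qualify(token, default_ns)
        for token in TOKEN.findall(expression)
        if token not in ("(", ")") and token not in OPERATORS
    ]
```

Applied to the last example above, `license_terms("spdx:MIT AND ae:BSD-3-Clause-copyright-holder-variant")` returns `[("spdx", "MIT"), ("ae", "BSD-3-Clause-copyright-holder-variant")]`, and a bare `MIT` resolves to `("spdx", "MIT")` through the default.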

Regards,
Karsten


Message ID: <spdx/spdx-spec/pull/681/c1138444864@github.com>



J Lovejoy
 

(responding via email so I can add spdx-legal mailing list; not sure what mess this will make in Github, so apologies in advance)

On 5/26/22 12:00 AM, Alexios Zavras (zvr) wrote:

Quick couple of comments to @jlovejoy reply above:

Starting from the end, yes, the idea for these private lists is that they cover licenses not in SPDX License List. But I assume people might also want to use them for licenses not currently in the SPDX License List.

that is my understanding too

On the publishing point (3), you are correct in understanding the problem: given an identifier LicenseRef-.mynamespace.com.-LicenseABC, there has to be an SPDX Document that uses the "other license info detected" section to say "hey, for this LicenseRef-.mynamespace.com.-LicenseABC the corresponding text is this".
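The identifier shape quoted here, LicenseRef-.mynamespace.com.-LicenseABC, can be split mechanically so that a tool knows which registered namespace (and thus which registered document) to consult. This is a minimal sketch assuming, beyond what the thread states, that the namespace is always a DNS-style domain wrapped as `LicenseRef-.<namespace>.-<license>`. Because hostname labels cannot begin with a hyphen, the first `.-` after the prefix must end the namespace. The function name is hypothetical.

```python
from typing import Optional, Tuple

PREFIX = "LicenseRef-."


def split_namespaced_ref(ref: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, license) for 'LicenseRef-.<namespace>.-<license>', else None."""
    if not ref.startswith(PREFIX):
        return None  # plain LicenseRef without a namespace
    rest = ref[len(PREFIX):]
    # Hostname labels cannot begin with '-', so '.-' cannot occur inside the
    # domain; its first occurrence therefore ends the namespace.
    namespace, sep, license_name = rest.partition(".-")
    if not sep or not namespace or not license_name:
        raise ValueError(f"malformed namespaced LicenseRef: {ref!r}")
    return namespace, license_name
```

With this split, `("mynamespace.com", "LicenseABC")` is what a consumer would look up, whichever of the two storage alternatives below is chosen.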

The two alternatives we have are:
a) people submit this document and we store it in a repo; or
b) people submit the location of this document and we store (the location) in a repo.

There are obvious pros and cons to both approaches. But I think we just need to be realistic that (a) may end up requiring some "curation" of some kind if it's in an SPDX repo.

On that note, it appears that the current submission tool is really just (b)?

In both cases, the SPDX project will not be checking content like "someone using a license text that matches a license already on the SPDX License List" or anything like this. Yes, it would be "bad", but this can also happen today: someone defining their own LicenseRef-MIT.

While you are correct that this could happen today, I still think that by not checking the content, we almost give people greater license to do it.
Look at Philippe's submission, for example: it links to the entire database of licenses ScanCode has found, which presumably includes all those already on the SPDX License List. So we are already not differentiating well between the SPDX License List and the point of LicenseRef (licenses not on the SPDX License List).

The SPDX project registers namespaces, not what goes within them.

but what goes within them (which I take to mean the specific, full LicenseRef with namespace and license, plus the text of the license) has to go somewhere (see above)

Regarding checks during the registration process (point 4), I believe everyone so far has only talked about automated checks, with no human decision involved. Things like:

  • checking whether the namespace is not already registered;
  • checking whether the format of the namespace is correct;
  • checking whether the URL is valid;
  • checking whether the URL resolves to a document;
  • checking whether the document is a syntactically correct SPDX document;
  • etc.
Do we currently have the tooling to do all these things? If not, who is going to develop it to fill the gaps?
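The offline subset of the automated checks listed above could be prototyped roughly as follows. This is a sketch under stated assumptions: the `NamespaceRegistration` fields mirror the namespace definition Karsten proposed (short id, domain, URL, contact, latest version), while the lowercase-domain format rule and the https-only URL rule are hypothetical policies. Resolving the URL and validating the fetched SPDX document are left out, since they need network access and an SPDX parser, which is the tooling question raised here.

```python
import re
from dataclasses import dataclass
from typing import List, Set
from urllib.parse import urlparse


@dataclass
class NamespaceRegistration:
    # Fields follow the namespace definition proposed earlier; names are hypothetical.
    short_id: str
    domain: str
    details_url: str
    contact: str
    latest_version: str


# Hypothetical format rule: lowercase hostname labels (no leading/trailing '-'),
# at least two labels separated by dots.
DOMAIN = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)(\.(?!-)[a-z0-9-]{1,63}(?<!-))+")


def automated_checks(reg: NamespaceRegistration, registered: Set[str]) -> List[str]:
    """Run the offline checks; `registered` holds already-registered domains."""
    errors = []
    if reg.domain in registered:
        errors.append("namespace already registered")
    if not DOMAIN.fullmatch(reg.domain):
        errors.append("namespace format invalid")
    url = urlparse(reg.details_url)
    if url.scheme != "https" or not url.netloc:
        errors.append("URL is not a valid https URL")
    # Not covered: whether the URL resolves, and whether it serves a
    # syntactically valid SPDX document.
    return errors
```

Returning all findings at once, rather than stopping at the first, lets a submitter fix every problem in a single round.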


Message ID: <spdx/spdx-spec/pull/681/c1138184754@github.com>