Re: [spdx/spdx-spec] Add new annex on license namespaces (PR


J Lovejoy
 

(responding via email so I can add the spdx-legal mailing list)

On May 26, 2022, at 5:44 AM, Karsten Klein <notifications@...> wrote:


Hi all,

Tuesday’s session left me a little bewildered and puzzled. Interestingly, all participants in the discussion have good arguments, but (in my view) each from a particular rather than a holistic perspective. The fears raised, such as identifier collisions (e.g., two parties using the same LicenseRef-scancode- identifier to refer to different licenses) or a party introducing a LicenseRef for a license already on the SPDX License List, appear quite artificial to me, and they mix syntactic, semantic, and integrity validation aspects that should rather be disentangled.

Since I, too, can only contribute one perspective and would leave the holistic assessment to the group, I would like to make some observations:

SPDX Format:

  • SPDX is primarily about the exchange of software package information, that is, information on software conveying structural and relational facts together with metadata on different aspects, primarily (but not limited to) licensing information.
  • As such, the format requires referencing licenses by id and providing the license texts associated with the ids (the latter rather optional in my view; see concerns below).
As a reminder, the original intent of the SPDX License List was to create a shorthand, reliable way to refer to licenses so that an SPDX Document would not get bloated by repeating the same licenses over and over (imagine, in contrast, if every license used the Other Licensing Info part of the spec). At the same time, recognizing that the SPDX License List would never represent every license found in s/w, there is the Other Licensing Info. I think it’s important to keep in mind the context of the SPDX Document, because that is the starting point, and also to acknowledge other contexts.

License Ids:

  • When exchanging software licensing information, we need to make sure that we refer unambiguously to licenses. Using consolidated ids for licenses is key.
  • The SPDX License List, due to its scope (limited to open/public licenses) and the policies set forth in the matching guidelines, does not (and never will) cover all licenses that can be used for software.
  • When producing SPDX documents, we cannot limit the scope of licenses to open (or publicly available) licenses only.
  • I’d argue that the SPDX Legal Team is an authority managing the SPDX License List (the work being highly appreciated!!); however, it is only to be expected that other authorities manage license lists too (e.g., OSI is included in these considerations).
While this is true, different “Authorities” have different goals, processes, etc., so we need to be careful not to conflate them. The SPDX License identifiers were certainly intended to be cross-functional: e.g., OSI uses them and does not need its own set of “ids”. The hope/idea is that this would be true more widely.
  • Scancode Toolkit, for me, is a community-managed license list authority. Scancode has not invented these licenses; Scancode organizes them according to the Scancode policies. The rules may not be the same as the SPDX matching guidelines, but Scancode is of value to the people using it and provides a most pragmatic entry into the domain; why should these people not be able to reference a Scancode license unambiguously from an SPDX document, leveraging SPDX as a format?
Agreed, and this has always been acknowledged from the beginning. (Side note: just keep in mind that the SPDX License List inclusion guidelines were broadened, so more licenses are potentially eligible for inclusion than was originally the case.)
  • My company does SCA and license identification in customer projects. In doing so, we constantly run into licenses that are not on the SPDX License List and sometimes not even in Scancode. We therefore developed an extended identification concept and published it as {metæffekt} Universe. We would like to reference licenses unambiguously using this “extended namespace” within SPDX documents, again leveraging SPDX as a standardized exchange format. Customers with access to the license database can use the ids to resolve the license texts.
Great - this sounds consistent with the original goal as I understand it. 
Based on a quick look, I see one license that does not use an SPDX license id but is now on the SPDX License List. I also see “BSD-1-Clause” (on the SPDX License List), but then see you have “BSD-1-Clause-copyright”; according to the SPDX matching guidelines, this is the same as BSD-1-Clause. The only difference is “author” instead of “copyright holders” in the disclaimer paragraph, and the SPDX legal team has already determined that this is not legally substantively different. By treating it as a different license, you are not leveraging a major benefit of the SPDX License List and are effectively creating your own criteria. How is this helpful?

The point of the Other Licensing Info (LicenseRef) was to augment the SPDX License List, with the SPDX License List as the starting point. This example seems to reverse or ignore that.
  • Please also be aware that license ids may refer to a unique license text OR to a license template. The instances of https://spdx.org/licenses/BSD-3-Clause.html may have different text if the copyright holder modified the variable parts of the template. Putting the default text of the license template in an SPDX document is not appropriate and not in the sense of the license.
This is where we diverge, as we have discussed together recently. “Putting the default text of the license template in an SPDX document is not appropriate and not in the sense of the license.” is YOUR interpretation. The intent of the SPDX License List and ids within an SPDX Document is to avoid repeating the same license text, with the matching guidelines defining what counts as the same. You are effectively disagreeing with those matching guidelines (in this case based merely on a different name versus a more template-like “authors” or “copyright holders” language) and then making both a legal interpretation as to whether the license intends such a strict reading, and a risk assessment as to whether someone is going to bring a non-compliance action for replicating the BSD-3-Clause text using just “copyright holder” rather than “My Name” specifically. We need to be clear about 1) when we are making such interpretations, and 2) not incorporating such interpretations into SPDX. :)

License Texts:

  • I regard putting license texts in the SPDX document as problematic, especially the way it is currently done.
You are referring to Other Licensing Info, yes? How would it change, then?
  • A software package may reference a license (an id is sufficient to capture this fact) or may include a license text (which may not be 100% identical to the text stored in the SPDX License Data). The license texts may be mixed with other information or refer to a list of third-party licenses. SPDX’s attempt to segregate this into different parts adds an artificial layer of information that often does not align with the facts. Example: https://github.com/spring-projects/spring-framework/blob/main/src/docs/dist/license.txt
I’m a little lost as to what your point is here. Your example looks to me like a pretty nice summary of licensing for a specific package that incorporates other open-source-licensed components. I’d imagine the license information in an SPDX Document might look roughly like this (using SPDX fields, but of course not listing everything here):
- PackageLicenseDeclared: Apache-2.0 AND BSD-3-Clause
- PackageLicenseConcluded: Apache-2.0
- PackageLicenseInfoFromFiles: Apache-2.0, BSD-3-Clause
- PackageLicenseComments: This package is Apache-2.0 with some included components that are also under Apache-2.0 and BSD-3-Clause as noted here LINK

Then at the file level, you’d have the breakdown of what files are under which license.
Note - someone else might put Apache-2.0 AND BSD-3-Clause in the Concluded License field.
(Note2 - For simplicity of this example, I’m intentionally ignoring the weird NOTICE under Apache-2.0 4(d) in the actual example, which seems to use the Apache NOTICE file in an unanticipated way and implies there may be other licenses in there.)

  • For us, it is important that:
    • an SPDX document consumer is able to resolve a license by id (as the approved general representation of the license);
It already can.
    • this can be done via an internal or external link to a license database (e.g., the SPDX License List, OSI, Scancode, {metæffekt} Universe, or any other party’s website that provides consolidated license ids and information);
    • the package-specific license files (copyright, LICENSE, NOTICE, …) can be accessed by the consumer and are preserved in format and content.
I’m not sure this is as important, or where you mean this would be accessed; it is always in the source code in any case.
  • Please note that different authorities for licenses may model licenses differently to match their specific policies and guidelines. This means that the same license or license aggregate may be represented differently by the different authorities. I don’t regard this as bad; I regard it as an opportunity. Such cases may trigger exchange and discussion.
  • Licenses / contracts may be confidential. While the id is not critical, the license/contract content is. Therefore:
    • Licenses can be public / shared
    • Licenses can be private and shared only among parties under contract or NDA
    • Licenses can contain conditions that do not allow distributing the license text
    • We must anticipate that we are never only talking open source (!)
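A minimal Python sketch of the three cases above, under assumptions not in the original message: the names `TextVisibility`, `LicenseEntry` and `text_for_document` are hypothetical and not part of the SPDX specification or any tool. The id is always exchanged; whether the text travels with it depends on the license's visibility and on the recipient.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TextVisibility(Enum):
    """Hypothetical visibility levels mirroring the three cases listed above."""
    PUBLIC = "public"                        # text may be shared freely
    RESTRICTED = "restricted"                # shared only among parties under contract or NDA
    NON_DISTRIBUTABLE = "non-distributable"  # license conditions forbid distributing the text


@dataclass(frozen=True)
class LicenseEntry:
    license_id: str              # the id itself is not confidential
    visibility: TextVisibility
    text: str | None = None


def text_for_document(entry: LicenseEntry, recipient_under_nda: bool) -> str | None:
    """Return the license text to embed for this recipient, or None to emit only the id."""
    if entry.visibility is TextVisibility.PUBLIC:
        return entry.text
    if entry.visibility is TextVisibility.RESTRICTED and recipient_under_nda:
        return entry.text
    # Non-distributable text, or restricted text for a recipient outside the NDA.
    return None
```

With this split, a confidential contract still shows up in the exchanged data by its id, so a recipient can see that it applies without receiving its content.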

License Namespaces:

  • I would argue from an SPDX Format perspective that License Authorities should be treated equivalently.
I don’t think so… they are not equivalent so they cannot be treated equivalently. 
  • Currently the SPDX License List is treated as unique and special (work highly appreciated), which, with all the caveats listed above, ultimately makes it hard for others to contribute their work. This means that the SPDX License List is itself a namespace definition.
  • Registering a namespace means just registering the namespace definition. I’d still argue (as this is content) that the SPDX Legal Team could approve a namespace registration to make sure the namespace is unique and follows given naming guidelines. A namespace definition includes at least:
    • Short Id
    • Namespace domain (could also be used for owner verification)
    • URL where to find details on the namespace
    • Contact Address (legal entity or person owning the namespace)
    • Latest version
Note: there is still time and effort involved in this check, and probably more than it seems at face value!
  • I do not see that SPDX needs to care about the licenses managed in the namespace. The management and rules of the licenses in the namespace are up to the namespace owner. If (s)he does not play by the rules, the namespace’s reputation will suffer, and people will not use it.
  • Validation of LicenseRef in the SPDX document is limited, but here I see opportunities for tool providers to add further levels of integrity validation.
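The namespace definition and the registration check described above could be modelled roughly as follows. This is a Python sketch under stated assumptions: the class names, field types and the short-id naming rule (`SHORT_ID_PATTERN`) are hypothetical illustrations, not an agreed SPDX guideline. The registry only checks uniqueness and naming; it deliberately knows nothing about the licenses inside a namespace.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceDefinition:
    """The minimum fields listed above; the types are assumptions."""
    short_id: str        # e.g. "scancode"
    domain: str          # namespace domain, could be used for owner verification
    details_url: str     # where to find details on the namespace
    contact: str         # legal entity or person owning the namespace
    latest_version: str


# Hypothetical naming rule: lowercase letters, digits and hyphens, starting with a
# letter, so a short id can never contain the ":" used as a prefix separator.
SHORT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]*")


class NamespaceRegistry:
    """Hypothetical registry: checks uniqueness and naming only, not namespace content."""

    def __init__(self) -> None:
        self._by_short_id: dict[str, NamespaceDefinition] = {}

    def register(self, ns: NamespaceDefinition) -> None:
        if not SHORT_ID_PATTERN.fullmatch(ns.short_id):
            raise ValueError(f"short id {ns.short_id!r} violates the naming rule")
        if ns.short_id in self._by_short_id:
            raise ValueError(f"short id {ns.short_id!r} is already registered")
        self._by_short_id[ns.short_id] = ns

    def lookup(self, short_id: str) -> NamespaceDefinition | None:
        return self._by_short_id.get(short_id)
```

Keeping the registry this thin reflects the division of labour proposed above: the registration authority guarantees unique, well-formed prefixes, while the namespace owner remains responsible for what the prefix contains.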

As indicated earlier, I will not propose a solution. Some aspects may be rather revolutionary, and I cannot currently foresee whether these thoughts will resonate with the group. Just a few highlights showing the idea:

[…]
LicenseConcluded: spdx:MIT

[…]
LicenseConcluded: scancode:bittorrent-eula

[…]
LicenseConcluded: ae:BSD-3-Clause-copyright-holder-variant

[…]
LicenseConcluded: spdx:MIT AND ae:BSD-3-Clause-copyright-holder-variant
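To make the proposed syntax concrete, here is a minimal Python sketch that extracts the `namespace:id` pairs from expressions like the ones above. The function name `namespaced_ids` and the tokenization are assumptions for illustration; the sketch only separates identifiers from the `AND`/`OR`/`WITH` operators and parentheses and does not validate the structure of the expression.

```python
from __future__ import annotations

import re

OPERATORS = {"AND", "OR", "WITH"}
# A token is a single parenthesis or a run of characters without whitespace or parentheses.
TOKEN = re.compile(r"[()]|[^\s()]+")


def namespaced_ids(expression: str) -> list[tuple[str | None, str]]:
    """Return (namespace, id) pairs in order; namespace is None when no prefix is given."""
    pairs: list[tuple[str | None, str]] = []
    for token in TOKEN.findall(expression):
        if token in ("(", ")") or token in OPERATORS:
            continue
        namespace, sep, local_id = token.partition(":")
        pairs.append((namespace, local_id) if sep else (None, token))
    return pairs
```

For example, `namespaced_ids("spdx:MIT AND ae:BSD-3-Clause-copyright-holder-variant")` yields `[("spdx", "MIT"), ("ae", "BSD-3-Clause-copyright-holder-variant")]`. One caveat: the existing SPDX 2.x expression syntax already uses a colon in `DocumentRef-...:LicenseRef-...`, which this naive split would misread as a namespace prefix, so a real grammar would need to handle that form first.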

I’d also argue that you could define a default license namespace for an SPDX document. In this case you could omit the default namespace short-name prefix. The default would be spdx.


Isn’t it already the default? I don’t think we should have to specify that in a new way. This feels like a dilution of the SPDX License List.

This doesn’t even break LicenseRef compatibility: LicenseRef is just a special case (no namespace available). This can address the compatibility concern raised by Philippe and enable a transition once namespaces are available.
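The default-namespace idea and the treatment of LicenseRef as the no-namespace special case can be sketched in Python as below. The names `ResolvedId`, `resolve` and `DEFAULT_NAMESPACE` are hypothetical, and the default of `spdx` follows the proposal in this message rather than any adopted spec. The existing SPDX 2.x forms `LicenseRef-...` and `DocumentRef-...:LicenseRef-...` are checked first, since the latter already uses a colon.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_NAMESPACE = "spdx"  # assumption: the default proposed above, not an adopted rule


@dataclass(frozen=True)
class ResolvedId:
    namespace: str | None           # None means "no namespace", i.e. a classic LicenseRef
    local_id: str
    document_ref: str | None = None


def resolve(token: str, default_namespace: str = DEFAULT_NAMESPACE) -> ResolvedId:
    # Existing SPDX 2.x forms carry no license namespace and are kept as they are.
    if token.startswith("DocumentRef-"):
        doc_ref, _, license_ref = token.partition(":")
        return ResolvedId(None, license_ref, doc_ref)
    if token.startswith("LicenseRef-"):
        return ResolvedId(None, token)
    namespace, sep, local_id = token.partition(":")
    if sep:
        return ResolvedId(namespace, local_id)
    # A bare identifier falls into the document's default namespace.
    return ResolvedId(default_namespace, token)
```

Under this sketch, `resolve("MIT")` and `resolve("spdx:MIT")` give the same result, which is the backward-compatibility property argued for above; whether such a default should be made explicit at all is exactly the point raised in the reply.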

Regards,
Karsten


Message ID: <spdx/spdx-spec/pull/681/c1138444864@github.com>

