Re: License Identification


David Kemp
 

(On a related note, we also support registering a numeric identifier for each license identifier, just as ISO 3166 (https://www.iso.org/obp/ui/#search) assigns both a number and a text ID to each country.  This is for use in efficient non-human-readable data formats such as Protobuf and CBOR.)
[JL]  The SPDX License List already provides a machine-readable (text) unique id to each license. Why is that not enough?

For three reasons:
1) efficiency - a 16-bit integer is sufficient to identify 65,536 licenses.  CBOR uses variable-length encoding of integers (major type 0), and even in JSON a number (e.g. 942 - three bytes) is more efficient than a string ("AGPL-3.0-or-later" - nineteen bytes including the quotes).  When SBOM files climb into the megabyte range, efficiency matters.
2) registration reliability - double-entry bookkeeping was invented to detect errors by enabling independent checks.  Assigning both numeric and string IDs to a license text promotes robustness in the registration process and facilitates anomaly detection.
3) precedent - IANA manages ports ("http" = port 80) and status codes (404 = "Not Found") for the same reason, and databases use numeric primary keys instead of strings for the same reason.  Numbers are meaningless - nobody is going to claim copyright infringement on the number 279, even though there might be claims by Native Americans against strings like "Apache-2.0".  On the off chance that courts force a license ID to be invalidated, the referenceNumber remains a constant identifier that can be mapped to a new string ID.
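
To make the byte counts in point 1 concrete, here is a minimal Python sketch that measures the encoded size of a license string versus a number in both JSON and CBOR. The tiny CBOR encoder is my own illustration of the RFC 8949 head rules for major types 0 and 3, not a full implementation, and 942 is just the example number used above, not an assigned referenceNumber.

```python
import json


def _cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus argument for major type `major`."""
    if n < 0:
        raise ValueError("argument must be non-negative")
    if n < 24:
        # Small values fit directly in the low 5 bits of the initial byte.
        return bytes([(major << 5) | n])
    # Otherwise the low 5 bits (24..27) select a 1, 2, 4 or 8 byte argument.
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def cbor_uint(n: int) -> bytes:
    """Unsigned integer, CBOR major type 0."""
    return _cbor_head(0, n)


def cbor_text(s: str) -> bytes:
    """UTF-8 text string, CBOR major type 3."""
    data = s.encode("utf-8")
    return _cbor_head(3, len(data)) + data


LICENSE_ID = "AGPL-3.0-or-later"
REFERENCE_NUMBER = 942  # illustrative value only, not a registered number

sizes = {
    "json_string": len(json.dumps(LICENSE_ID)),        # includes the quotes
    "json_number": len(json.dumps(REFERENCE_NUMBER)),
    "cbor_text": len(cbor_text(LICENSE_ID)),
    "cbor_uint": len(cbor_uint(REFERENCE_NUMBER)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Under these assumptions the string costs 19 bytes in JSON and 18 in CBOR, while the number costs 3 bytes in either format.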

Regards,
David Kemp
