Re: [spdx-tech] An example of a super simple SPDX licenses registry, for discussion
I’m admittedly a bit late to this party despite having a few thoughts on the topic. This thread has quite a few aspects to it, starting with Jeff’s initial proposal, so I’ll try to address all of them, even though the whole thread is not quoted below.
First of all, I am noticing some energy around being able to add more licenses to the SPDX License List, and to do so more easily. Jeff captured this concept quite succinctly in the conclusion of his initial email:
"Finally, if the SPDX license list was a) less opinionated as to what can be identified and b) very fast at adding new entries, the bulk of the issue we see would go away."
He is not the only person to express a similar sentiment.
By way of review for those not intimately involved, the general highlights of the process to add a license include:
1) We only add open source licenses that are used in the “real world”, etc.
2) We review them for similarity to existing licenses on the list. This may result in not adding the new license if it is substantially similar to an existing license and we can accommodate any minor differences with additional matching markup, according to our matching guidelines.
3) We add them to the list, which includes creating the XML file and a .txt file, at which point they become “official” as of the next release.
We aim to do releases on a quarterly basis. I’d say that after the big push to add Fedora licenses a few years back, we probably average fewer than 10 new licenses/exceptions per release.
We have some tooling, thanks to various GSoC projects (and Gary!), that has helped make the process cleaner. But at the end of the day, the reality is that EVERYONE who works on this project is a volunteer, and only a very small number of people are actually doing this work.
At the same time, I have noticed a trend that the people asking for more licenses, a faster process, etc. are generally not engaged in the project in any significant way or helping to that end. Not to pick on Jeff in particular, but I have not seen a new license submission from anyone at Microsoft (that I know of), and I had to approve Jeff’s email before it could go to the legal team (not on the list :). As SPDX gets used more, and since we moved to using the GitHub repo, we have also started seeing “drive-by” license submissions in the repo. I’m sure other open source projects experience this kind of thing, but I don’t have much experience with how it is best dealt with. In other words, how do we get more people engaged and more hands on deck on a consistent basis? If we go back to the original question of being able to add more licenses more quickly, we must have more resources in order to do so.
Another issue, which I think Philippe touched upon, relates to Jeff’s proposal for a fingerprint algorithm and the challenge of matching licenses that have small differences. This is a key part of what the SPDX legal team does in #2 above: if we have two licenses that are similar but not exactly the same, we have a team of lawyers look at the differences and determine whether they are legally substantive. If they are not, we accommodate them with matching markup (if possible); if they make a legally substantive difference, we add the new license as a separate license. For obvious reasons, we are very conservative in this determination. In any case, that is not a step that can be automated.
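For anyone unfamiliar with matching markup, here is a minimal Python sketch of the mechanical side only. It assumes a simplified subset of the var/optional markup found in SPDX text templates (attributes always in the order name, original, match); the helper names and the example template are hypothetical. Deciding whether a difference is legally substantive, and therefore whether markup is appropriate at all, remains the human step described above.

```python
import re

# Simplified subset of SPDX template markup (assumption: attributes always
# appear in the order name, original, match):
#   <<var;name="...";original="...";match="REGEX">>  replaceable text
#   <<beginOptional>> ... <<endOptional>>           text that may be omitted
TOKEN = re.compile(r"(<<var;[^>]*>>|<<beginOptional>>|<<endOptional>>)")
VAR = re.compile(r'<<var;name="[^"]*";original="[^"]*";match="([^"]*)">>')


def literal_to_regex(piece: str) -> str:
    """Escape literal text; any run of whitespace matches any other run."""
    words = [re.escape(w) for w in piece.split()]
    return r"\s*" + r"\s+".join(words) + r"\s*" if words else r"\s*"


def template_to_regex(template: str) -> re.Pattern:
    """Turn a marked-up template into one case-insensitive regex."""
    parts = []
    for piece in TOKEN.split(template):  # the capturing group keeps the markup tokens
        if piece == "<<beginOptional>>":
            parts.append("(?:")
        elif piece == "<<endOptional>>":
            parts.append(")?")
        elif piece.startswith("<<var;"):
            m = VAR.fullmatch(piece)
            if m is None:
                raise ValueError(f"unsupported var markup: {piece}")
            parts.append(f"(?:{m.group(1)})")
        else:
            parts.append(literal_to_regex(piece))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def matches(template: str, text: str) -> bool:
    return template_to_regex(template).fullmatch(text) is not None


# Hypothetical template: a replaceable copyright line and an optional suffix.
TEMPLATE = ('Copyright <<var;name="copyright";original="2019 Example";match=".+">> '
            "All rights reserved.<<beginOptional>> Version 2<<endOptional>>")

print(matches(TEMPLATE, "Copyright 2021 Acme Corp.  All rights reserved."))      # True
print(matches(TEMPLATE, "copyright 2021 acme all rights reserved. version 2"))   # True
print(matches(TEMPLATE, "Copyright 2021 Acme. Some rights reserved."))           # False
```

The markup only widens what one template accepts; it says nothing about whether the wider set of texts is legally equivalent, which is why that call stays with the reviewers.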
Finally, my biggest concern about some kind of registry for licenses that are not on the SPDX License List, for the purpose of getting an SPDX id, is that people will just use it instead of submitting licenses to the SPDX License List, and that trying to track this other list and pull licenses into the SPDX License List where appropriate (as per Philippe’s revised proposal) will just create more work for an already too-small and over-stretched team. In other words, it feels less like a solution than a diversion from a bigger question (how do we make it easier to add more licenses?) and a bigger issue (the need for more resources).
I hope to discuss the bigger question on the upcoming legal call.
On Mar 13, 2019, at 12:24 PM, Philippe Ombredanne <pombredanne@...> wrote:
Richard:
On Mon, Mar 11, 2019 at 10:32 PM Richard Fontana <rfontana@...> wrote:
This sounds appealing to me (if I'm understanding it correctly). From
Red Hat's perspective one of the great impracticalities of SPDX has
been that, after many years of SPDX's existence, its adopted
identifiers still represent only a small portion of the licenses
encountered in much of the software we encounter in our product
and project development.
Let me recap my understanding:
I think everyone agrees that we want more licenses in SPDX.
Anyone against this, please voice your concerns now.
The review of new licenses for the list is an all-volunteer process with
a certain level of ceremony explained here
https://spdx.org/spdx-license-list/license-list-overview and therefore
it takes time. But it takes too much time.
Why do I want an id for stable/well-defined licenses? This would make
it easier to talk about and exchange licenses and it does not require
the reproduction of the license text at all times.
Why not use a LicenseRef for these? This would still require always
reproducing the license text in every SPDX document, which does
not help when there is no document (e.g. in a package manifest such as
npm or RPM metadata). NOASSERTION as used for now in ClearlyDefined is
also fraught with problems, as highlighted by Richard earlier.
There are two main use cases for more licenses: private and public
licenses. The main concern is to ensure that these license ids are
unique enough in both cases, and that there is minimal or no
duplication of license texts across ids.
- For private licenses, the only concern is to ensure that names are
unique enough. Mark suggested using a reverse domain name prefix for
this. I suggested a lightweight registration of a prefix that would
not require one to own/buy a DNS domain name. The two can likely work
together (e.g. you could use a domain name or anything and still do a
one-time registration). In any case, I become the master of the license
texts and ids in my namespace.
- For public licenses, one could use a prefix/namespace plus the
optional registration of actual license ids/names/texts. A
content-defined fingerprint id may not help in practice because, as I
explained in a previous email, it is too brittle.
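To make the namespacing idea concrete, here is a minimal Python sketch, assuming a hypothetical in-memory registry and a hypothetical "-" separator between prefix and id (neither is specified above). It relies on two things the SPDX spec does define: license ids use only letters, digits, "-" and ".", and they are matched case-insensitively. Everything else (class, method and owner names) is illustrative.

```python
import re
from dataclasses import dataclass, field

# SPDX license ids may only contain letters, digits, "-" and ".";
# the proposal lets a prefix use any character valid in a license id.
IDSTRING = re.compile(r"[A-Za-z0-9.-]+")
SEPARATOR = "-"  # hypothetical: the proposal does not fix a separator


@dataclass
class NamespaceRegistry:
    # Keys are lowercased because SPDX ids are matched case-insensitively.
    owners: dict[str, str] = field(default_factory=dict)    # prefix -> owner
    full_ids: dict[str, str] = field(default_factory=dict)  # full id -> prefix

    def register_prefix(self, prefix: str, owner: str) -> None:
        """One-time registration; a reverse-DNS name like com.example works too."""
        if not IDSTRING.fullmatch(prefix):
            raise ValueError(f"invalid characters in prefix {prefix!r}")
        if prefix.lower() in self.owners:
            raise ValueError(f"prefix {prefix!r} is already registered")
        self.owners[prefix.lower()] = owner

    def register_id(self, prefix: str, license_id: str, owner: str) -> str:
        """Only the owner of a prefix may add ids to its namespace."""
        if self.owners.get(prefix.lower()) != owner:
            raise PermissionError(f"{owner!r} does not own prefix {prefix!r}")
        if not IDSTRING.fullmatch(license_id):
            raise ValueError(f"invalid characters in id {license_id!r}")
        full_id = f"{prefix}{SEPARATOR}{license_id}"
        # Checking the full id also catches cross-namespace collisions,
        # e.g. prefix "a-b" + id "c" versus prefix "a" + id "b-c".
        if full_id.lower() in self.full_ids:
            raise ValueError(f"{full_id!r} is already taken")
        self.full_ids[full_id.lower()] = prefix
        return full_id


registry = NamespaceRegistry()
registry.register_prefix("com.example", "alice")
print(registry.register_id("com.example", "Foo-1.0", "alice"))  # com.example-Foo-1.0
```

Because a prefix may itself contain the separator, uniqueness is enforced on the composed id rather than on the prefix alone.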
In light of all this, here is my suggestion:
1. Establish a lightweight, easy and fast registration process for an
SPDX license id prefix (aka namespace). As simple as a quick PR in a
Git repo. This prefix can be made of any characters that are valid
in a license identifier. It is used to prefix ids (and not in
LicenseRef). This way, both DNS and non-DNS names work.
This can be used for both public and private namespaces.
2. Establish a lightweight, easy and fast registration process for
prefixed SPDX license ids in an existing namespace. As simple as a
quick PR in a Git repo.
Submissions are very lightly moderated (we want to register licenses
but not cooking recipes).
There is no need for any markup or other annotations at this stage:
only basic id, name, text and possibly URLs.
3. When submitted, an automated deduplication check is triggered (e.g. we
run a license scan on the license text), and if a submission is the
same or mostly the same as an existing SPDX license, the check
fails. (This is a CI script.) The submitter can then reuse the
pre-existing license id instead.
4. We add a status to license records indicating whether or not they
have been reviewed/approved by the SPDX legal team. Submissions of
namespaced ids would NOT have the approved status.
5. From that point on, the SPDX legal team can use not only direct
requests for license additions but also can funnel selected public
registrations as candidates for inclusion in the main SPDX
non-namespaced License List.
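To illustrate the automated deduplication check and the review status described above, here is a minimal Python sketch. It is only a crude stand-in: a word-level difflib similarity with an arbitrary 0.95 threshold takes the place of a real license scan, and the record fields, function names and threshold are all hypothetical.

```python
import difflib
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    APPROVED = "approved"      # reviewed/approved by the SPDX legal team
    UNREVIEWED = "unreviewed"  # namespaced self-registration


@dataclass
class LicenseRecord:
    license_id: str
    name: str
    text: str
    status: Status = Status.UNREVIEWED
    public: bool = True


def words(text: str) -> list[str]:
    """Crude normalization: lowercase, drop punctuation, split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def find_duplicate(text: str, existing: list[LicenseRecord],
                   threshold: float = 0.95) -> Optional[LicenseRecord]:
    """Return the first record whose text is the same or mostly the same."""
    candidate = words(text)
    for record in existing:
        matcher = difflib.SequenceMatcher(None, candidate, words(record.text),
                                          autojunk=False)
        if matcher.ratio() >= threshold:
            return record
    return None


def ci_check(submission: LicenseRecord, existing: list[LicenseRecord]) -> None:
    """Raise so that a (hypothetical) CI job calling this fails on duplicates."""
    dup = find_duplicate(submission.text, existing)
    if dup is not None:
        raise ValueError(f"{submission.license_id}: same text as "
                         f"{dup.license_id}; reuse that id instead")


def review_candidates(records: list[LicenseRecord]) -> list[LicenseRecord]:
    """Public, not-yet-approved registrations the legal team could consider."""
    return [r for r in records if r.public and r.status is Status.UNREVIEWED]


mit = LicenseRecord("MIT", "MIT License",
                    "Permission is hereby granted, free of charge, to any person "
                    "obtaining a copy of this software", Status.APPROVED)
sub = LicenseRecord("com.example-MyMIT", "My MIT",
                    "Permission is hereby granted free of charge to any person "
                    "obtaining a copy of this software.")

print(find_duplicate(sub.text, [mit]).license_id)              # MIT
print([r.license_id for r in review_candidates([mit, sub])])   # ['com.example-MyMIT']
```

`autojunk=False` keeps difflib from discarding frequent words in long license texts, which would otherwise distort the ratio.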