Re: SPDX should take a stronger stance against vanity/promotional licenses

Steve Winslow

Thanks all for your comments in this thread. I'm not going to try to reply here to every comment, but wanted to note a few pieces that might be informative to folks who are less deep in the SPDX license ID weeds.

Custom license IDs:

Anyone who wants to use an SPDX-format-compatible license ID for a license that isn't on the license list is able to do so, via the LicenseRef- syntax. [1] The characters for the ID are the same as those permitted for IDs on the SPDX License List: letters, digits, hyphen ("-") and period ("."). [2]
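For readers who want to see that character rule concretely, here is a minimal sketch of a check for a custom ID. The function name is made up for illustration, and the sketch deliberately ignores the optional document-reference prefix that the full syntax also allows.

```python
import re

# "LicenseRef-" followed by one or more letters, digits, hyphens or periods.
_LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def is_custom_license_id(candidate: str) -> bool:
    """Return True if candidate looks like a well-formed LicenseRef- ID.

    Hypothetical helper for illustration; not part of any SPDX tooling.
    """
    return _LICENSE_REF.fullmatch(candidate) is not None
```

Under this rule, `LicenseRef-My-License-1.0` is accepted, while `LicenseRef-my_license` is rejected because the underscore is not a permitted character.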

Making reusable custom license IDs:

If someone wanted to create a standalone, reusable LicenseRef- ID that implemented a UUID, or a hash of a license text, I believe they could do so just by prepending "LicenseRef-" to the UUID or hash. I suspect there are some automated tools out there that work in this manner. (Of course, it's not going to be a particularly meaningful ID on its own, but just noting it since UUIDs were mentioned in the thread.)
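As a rough sketch of the hash idea, assuming SHA-256 over the UTF-8 bytes of the license text (the choice of hash and the function name are illustrative assumptions, not anything SPDX prescribes):

```python
import hashlib


def license_ref_from_text(license_text: str) -> str:
    """Build a LicenseRef- ID from a hash of the license text.

    Assumption: SHA-256 of the exact UTF-8 bytes, with no whitespace
    normalization, so two copies that differ only in line endings
    will get different IDs.
    """
    digest = hashlib.sha256(license_text.encode("utf-8")).hexdigest()
    return f"LicenseRef-{digest}"
```

The hex digest contains only digits and the letters a through f, so the result stays within the permitted character set.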

The challenge with using LicenseRef-, of course, is in letting people know which license text corresponds to your custom license ID. There are various ways to do this without ever talking to anyone at SPDX, including by creating your own SPDX document that defines it in an "Other License Information" section, or by following practices such as REUSE. [3]
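To illustrate the first option, here is a minimal sketch that renders such an entry in the tag-value form used by SPDX 2.x documents. The helper name and the sample values are assumptions for illustration, and a complete SPDX document needs more than this one stanza.

```python
def render_other_license_info(license_id: str, name: str, text: str) -> str:
    """Render one "Other License Information" entry in tag-value form.

    Hypothetical helper: it emits only the license ID, the extracted
    text and the license name, and does no validation of its inputs.
    """
    return (
        f"LicenseID: {license_id}\n"
        f"ExtractedText: <text>{text}</text>\n"
        f"LicenseName: {name}\n"
    )


# Illustrative values only.
print(render_other_license_info(
    "LicenseRef-Example-1.0",
    "Example License 1.0",
    "You may use this software for any purpose.",
))
```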

For an approach that could enable anyone to create more meaningful custom IDs and share the corresponding license text, we've had discussions several times over the past 4+ years about creating a formalized "license namespace" format, built on top of the existing LicenseRef- syntax. This has repeatedly failed to reach consensus, in my view primarily due to disagreements about the nuances of what the syntax should look like, and I don't think there's any appetite to reopen that discussion yet again.

As a result, community members are welcome to establish informal practices for how they format LicenseRef- IDs within the permitted syntax and how they share the corresponding license text, such as via REUSE.

Standards for what goes on the SPDX License List:

I agree with Richard that the documentation should be clearer about "vanity" licenses generally being inappropriate for inclusion on the SPDX License List.

I think there is value in the license list not being just a hash of license IDs to arbitrary text. The work that the SPDX Legal community does to review and curate licenses, insert markup to group them together where appropriate, and omit licenses that are not likely to be encountered in FOSS(-ish) development, seems to be of value to downstream users of the list. If it isn't, and if downstream users do in fact want a list that is just a hash of unique IDs to arbitrary text, then anyone is of course free to implement such a list and to persuade the broader ecosystem to adopt it.

For newly-drafted licenses that are used in only one or a couple of projects (or sometimes zero projects), I agree with Richard that we often burn lots of cycles going back and forth with the license author without real benefit. I'd be in favor of bumping the "substantial use" factor higher on the License Inclusion Principles list [4], and perhaps of being more explicit in related documentation that vanity licenses with little usage, particularly non-FOSS licenses that fall in that category, are highly unlikely to be added to the list. For a change to the inclusion principles, as Jilayne mentioned earlier, I do think that a Change Proposal [5] is probably the right place to discuss the specifics of what that would look like.

Submitters of newly-drafted licenses with little-to-no usage do sometimes mention that they need their license to be added to the SPDX License List so that their software with their new license can be included in a package manager. For package managers that use license list IDs as a requirement, I'd encourage them to consider implementing and permitting LicenseRef- IDs as well. (Or, if they don't want to permit LicenseRef- IDs, then that suggests to me that they are in fact finding some value in the curation that we perform for the License List.)

And of course, to James's point: if a brand new license does see significant usage in the wild such that it is likely to be encountered in a broad set of community-developed software projects, then at that point it may be appropriate to add to the list. But I don't see value in having the SPDX License List be the first stop for a newly-drafted, non-FOSS license that is used in someone's personal project, or in having us burn cycles repeatedly explaining that.


On Wed, Jan 25, 2023 at 1:14 PM Kyle Mitchell <kyle@...> wrote:
If the idea is really to hunt down every license lurking in
every potentially popular public package, I can see how
distro adoption's a real big deal. Congrats! I worry about
more work for distro people, but suppose those chasing
completeness goals like this likely have financial support.

On the process front, three ideas:

First, separate processes for "I've got a license and
champion its identification" from "I spotted a license and
think SPDX may not have it already". Create a separate
intake track for the latter, I imagine often distro people.
This would unburden those submitting just to replace
exceptions with IDs someday. They may otherwise have nothing
to say about terms, beyond what the words are and where they
found them. Put their "sightings" in a separate queue and
let people who care take them up for full submission. Those
can be people more invested in process and criteria.

Second, seriously consider requiring only text for
submissions up front, with XML coding if and when the
license moves forward. Grokking the schema and overcoming
validation errors takes time, even for the XML-astute. I see
the benefits for the tech team in the end. I also see
temptation to use the burden as a general brake on
submissions, or as a backhand "do you really care?" test.
But I don't see XML mattering to the identification
question. It becomes worthwhile only once a license gets
voted in. At that point, well-versed SPDX people may be more
inclined to do in five what can take new people an hour.

Third, create a new "provisional" license status and
identify licenses awaiting significance there. Essentially
let folks call dibs on IDs. Supplement with a guideline to
prefer prefixes like `Apache` to collision-prone
initialisms like `APSL`. Publish the list JSON with a
provisional flag, so implementers can then decide whether to
validate provisionals or not, like they choose for
deprecated. Give provisionals a holding period, say a couple
years, then either promote or deprecate. Think Lanham Act
supplemental register for lawyers, merge-behind-feature-flag
for coders.
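
A minimal sketch of how an implementer might treat such a flag, assuming entries shaped like the published list JSON. `isProvisional` is a hypothetical field that does not exist in the list today, and `Example-1.0` is a made-up ID:

```python
# Sample data only. "isProvisional" is hypothetical; "licenseId" and
# "isDeprecatedLicenseId" mirror fields in the published list JSON.
SAMPLE_LIST = {
    "licenses": [
        {"licenseId": "MIT", "isDeprecatedLicenseId": False, "isProvisional": False},
        {"licenseId": "GPL-2.0", "isDeprecatedLicenseId": True, "isProvisional": False},
        {"licenseId": "Example-1.0", "isDeprecatedLicenseId": False, "isProvisional": True},
    ]
}


def accepted_ids(license_list, allow_deprecated=True, allow_provisional=False):
    """Return the set of IDs a validator would accept under the given policy."""
    accepted = set()
    for entry in license_list["licenses"]:
        if entry.get("isDeprecatedLicenseId", False) and not allow_deprecated:
            continue
        if entry.get("isProvisional", False) and not allow_provisional:
            continue
        accepted.add(entry["licenseId"])
    return accepted
```

With the defaults, a validator would accept `MIT` and `GPL-2.0` but not `Example-1.0`; passing `allow_provisional=True` admits all three.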

On a personal note, I hope I can be honest about my
motivation without coming over blunt. I'm not in
license-list-XML helping clear backlog, even though I
maintain several projects using IDs, because I'm not
interested in a process that I _do_ see as passing
judgments, "approving" more than merely identifying. The
very thrust of this e-mail chain is more effectively shooing
away drafters deemed vain and projects deemed insubstantial.
Those are value judgments.

Value judgments make assessments eat more time. They open
them to controversy. They ask more of reviewers, which
contributes to backlog. I wouldn't expect reordering factors
in the factor test to change that.

If SPDX doesn't want to identify new licenses it doesn't
like, or wants to use its adoption as leverage to discourage
new forms, it should come out and say that. Those of us
building with broader needs can fork or superset.

Kyle Mitchell, attorney // Oakland // (510) 712 - 0933
