Re: SPDX files as templates

Steve Winslow

Previously I've been generally against the idea of encouraging folks to use the test/simpleTestForGenerator/*.txt files for anything other than the automated tests for the XML files. Mostly for the reason you noted at the start of this thread: that in many cases (especially where a copyright notice is baked into the license text, such as MIT) people may grab it without realizing they should probably adjust the text.

I've been pretty well convinced that I was wrong there; if people are finding value in using the "test" text as license templates, then great.

A couple of random thoughts, getting into the weeds:

1) People should not assume that the text in test/simpleTestForGenerator/*.txt is necessarily the _official, canonical, byte-for-byte text_ from the license steward, if there is one. Here are a couple of examples: is different from (at least w/r/t whitespace, and different parts of "optional" text after the end of the license, etc).

There's a related question of whether the test text in the license-list-XML repo _should_ be the same as the canonical license steward text, where there is one. I'm just noting that at present, it isn't always the same. Also, since license stewards sometimes make changes to their own official license texts (GPL-2.0 is an example), the SPDX text is not necessarily going to be in sync if upstream makes a change.

2) I'd tend to agree that it's generally going to be preferable to point folks at the text/ directory in the license-list-data repo. That helps to keep the concerns separated as "go to license-list-data if you're a user of the License List; go to license-list-XML in order to contribute."

From a very quick skim, it looks like the text/ directory in license-list-data is _mostly_ the same as the test text files in license-list-XML. I see a handful with differences in whitespace, and it looks like the naming for deprecated licenses might be handled differently. But both of those are presumably things that could be addressed.


On Tue, Nov 16, 2021 at 3:46 PM J Lovejoy <opensource@...> wrote:
Hi Alexios,

You are correct re: the license-list-XML repo. Originally, when we were still getting the whole XML files sorted, I believe we had some kind of explicit warning not to use the data there, pointing explicitly to the license-list-data repo (if memory serves). The README has since been updated and now has more of a factual statement.

But this is part of what I was getting at - there are the .txt files in the folder of the license-list-XML repo, but then there are the .text files here too: (and a template file as well).

I'm curious as to 1) what are people actually using and how (that we know of)?
and 2) given Vicky's comment that people are going to do what they are going to do, should we point them in any particular direction?

I don't see the need for an additional format... but open to thoughts on that too.


On 11/16/21 1:20 PM, Alexios Zavras wrote:
Hi Jilayne,

The way we have operated for years is that the license-list-XML repo is for internal work of the SPDX Legal Team.
These files are automagically processed, and everything is generated into the license-list-data repo, where people can collect all the information in a variety of formats.

If we want to provide another format (e.g., with deleted author names), we can definitely add it in this generated data repo.

-- zvr

-----Original Message-----
From: Spdx-legal@... <Spdx-legal@...> On Behalf Of J Lovejoy
Sent: Tuesday, 16 November, 2021 19:18
To: SPDX-legal <spdx-legal@...>
Subject: SPDX files as templates

Hi all,

This is a topic that came up some time ago (I think by way of the Reuse folks) and I’ve been meaning to raise it in a separate thread.

SPDX has a lot of license data by way of the SPDX License List and associated tooling and files. Some people are using that data to grab license text for use as a kind of license template (as I understand it, but correct my terminology as needed!). There were some opinions expressed that this is a bad idea; I'm not sure why.

The licenses for the SPDX License List are stored in two formats in the main repo: the XML format, which applies some of the matching guidelines and other formatting, and a plain .txt file. I believe it is the latter that some people may be using for the above scenario. For example, if someone wants to use the MIT license, why wouldn't they simply pull it from ?

I am wondering whether I have the scenario correct, or are there other scenarios like this?
And am I right about where people are pulling the text from (or are other places being used? hopefully not the XML files or scraping the website!)?

Relatedly, we have had requests to remove specific names in copyright notices so as to avoid anyone using the wrong notice. From an SPDX matching guidelines perspective, what name exists in the copyright notice does not matter, as that is not "matchable" text for the purposes of matching a license. I would also point out that anyone using the .txt files as a copy of the license for their own code would *always* need to update the copyright notice, whether it has some other name or simply "author". In any case, I don't see this as a reason not to use the .txt files of the license text for other purposes outside of SPDX. And it's fine for us to change those copyright notices to a generic "author" or "name" if that helps.



