SPDX files as templates


J Lovejoy
 

Hi all,

This is a topic that came up some time ago (I think by way of the Reuse folks) and I’ve been meaning to raise it in a separate thread.

SPDX has a lot of license data by way of the SPDX License List and associated tooling and files. Some people are using that data to grab license text as a kind of license template (as I understand it, but correct my terminology as need be!). There were some opinions expressed that this is a bad idea - I’m not sure why.

The licenses for the SPDX License List are stored in two formats in the main repo: the XML format, which applies some of the matching guidelines and other formatting, and a plain .txt file. I believe it is the latter that some people may be using for the above scenario. For example, if someone wants to use the MIT license, why wouldn’t they simply pull it from https://github.com/spdx/license-list-XML/blob/master/test/simpleTestForGenerator/MIT.txt ?

I am wondering if I have the scenario correct, or are there other scenarios like this?
And am I pointing to where people are actually pulling the text from? (Or are other places being used? Hopefully not the XML files or scraping the website!)

Relatedly, we have had requests to remove specific names in copyright notices so as to avoid anyone using the wrong notice. From an SPDX matching guidelines perspective, what name exists in the copyright notice does not matter, as that is not “matchable” text for the purposes of matching a license. I would also point out that anyone using the .txt files as a copy of the license for their own code would *always* need to update the copyright notice - whether it has some other name or simply “author”. In any case, I don’t see this as a reason not to use the .txt files of the license text for other purposes outside of SPDX. And it’s fine for us to change those copyright notices to generic “author” or “name” if that helps.

Thoughts?

Thanks,
Jilayne


Warner Losh
 



On Tue, Nov 16, 2021 at 11:18 AM J Lovejoy <opensource@...> wrote:
Hi all,

This is a topic that came up some time ago (I think by way of the Reuse folks) and I’ve been meaning to raise it in a separate thread.

SPDX has a lot of license data by way of the SPDX License List and associated tooling and files. Some people are using that data to grab license text for as a kind of license template (as I understand it, but correct my terminology as need be!)  There were some opinions expressed that this is a bad idea - I’m not sure why.

The licenses for the SPDX License List are stored in two formats in the main repo: the XML format which applies some of the matching guidelines and other formatting and a plain .txt file. I believe it is the latter that some people may be using for the above scenario. For example, if someone wants to use the MIT license, for example, why wouldn’t they simply pull it from https://github.com/spdx/license-list-XML/blob/master/test/simpleTestForGenerator/MIT.txt ?

I am wondering if I have the scenario correct or are there other scenarios like this?
And if I’m pointing to where people are pulling the text from (or are other places being used? hopefully not the XML files or scraping the website!)

The plan for FreeBSD is to say that when there's an SPDX-License-Identifier: and no other grant of license, it should be construed (right word?) to include the text for that identifier (e.g. MIT) from /usr/share/licenses/MIT.txt, as appropriate for the included expression. We plan on having this explicitly spelled out in our policies since, imho (not legal), it is no different from the GPL license saying "this is licensed under the GPL 2.0 or later" or whichever of the 100-odd variations on those words is used. The policy makes the intent clear, both to contributors who are licensing their work and to users who wish to be in compliance. We also plan to spell out that copyright notices contained therein are by way of example only and that the operative copyright notice is the one in the file doing the licensing, with similar language spelling out that "author", "contributor", "copyright holder", etc. should be construed to be formed from the copyright notices in the file. But that's really awkwardly stated, I'm sure, and I've not yet run this past the legal folks who specialize in this area to know if that forms a "contract" or "agreement" or whatever the right term is between two parties that allows one party to copy the other party's work.
 
Relatedly, we have had requests to remove specific names in copyright notices so as to avoid anyone using the wrong notice. From an SPDX matching guidelines perspective, what name exists in the copyright notice does not matter, as that is not “matchable” text for the purposes of matching a license. I would also point out that anyone using the .txt files as a copy of the license for their own code, would *always* need to update the copyright notice - whether it has some other name or simply “author”. In any case, I don’t see this as a reason not to use the .txt files of the license text for other purpose outside of SPDX. And it’s fine for us to change those copyright notices to generic “author” or “name” if that helps.

Thoughts?

When people use the text, like we do, with an indirection, there's no chance to update the 'template' that's in the /usr/share/licenses directory, except via the templating operation I'm trying to document.  That's my use case.
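The indirection described above (an SPDX-License-Identifier tag in a source file standing in for a license text under /usr/share/licenses) could be sketched roughly as follows. This is only an illustration, not FreeBSD's actual tooling: the function names are hypothetical, the `<id>.txt` file naming is assumed from the MIT.txt example, and the expression handling is deliberately naive (it collects the license identifiers and drops the exception name after WITH rather than fully parsing the expression).

```python
import re
from pathlib import Path

# Assumed install location, taken from the /usr/share/licenses/MIT.txt example.
LICENSE_DIR = Path("/usr/share/licenses")

# Captures everything after the tag up to the end of that line.
TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")
OPERATORS = {"AND", "OR", "WITH"}


def license_ids(source_text):
    """Return the license identifiers named in the first SPDX tag found."""
    match = TAG.search(source_text)
    if match is None:
        return []
    # Tokens must start with a letter or digit, so parentheses and comment
    # closers such as "*/" are ignored.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9.+-]*", match.group(1))
    ids = []
    skip_next = False
    for token in tokens:
        if skip_next:
            # The token after WITH names an exception, not a license.
            skip_next = False
            continue
        if token in OPERATORS:
            skip_next = token == "WITH"
            continue
        ids.append(token)
    return ids


def license_paths(source_text, license_dir=LICENSE_DIR):
    """Map each identifier to the (assumed) path of its license text."""
    return [license_dir / f"{lid}.txt" for lid in license_ids(source_text)]
```

For a file tagged `BSD-2-Clause OR MIT`, `license_paths` would return two paths, one per identifier; which of them applies is still a matter for the written policy, not for the code.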

How other projects do this seems to vary widely. The Linux kernel documents this, to an extent. u-boot seems to document it by implication only. There seems to be no real standard here, with each project making it up as it goes along, but generally hewing either to the Linux kernel end of the spectrum (i.e. being explicit) or the u-boot end (people can figure it out), with little in between. Of course, my survey has been far from scientific or systematic.

Is that the feedback you are looking for? Or did I miss the boat?

Warner
 





Alexios Zavras
 

Hi Jilayne,

The way we have operated for years is that the license-list-XML repo is for internal work of the SPDX Legal Team.
These files are automagically processed and everything inside https://github.com/spdx/license-list-data is generated, where people can collect all the information in a variety of formats.

If we want to provide another format (e.g., with deleted author names), we can definitely add it in this generated data repo.

-- zvr


Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


VM (Vicky) Brasseur
 

I'd rather go inline with my reply, but as Outlook has Opinions™ about the format of replies…then top-posting it is.

I've frequently seen people copy the .txt of the licenses then drop that into a LICENSE file in their repos. I've seen them do similar things with the license texts on the OSI site. It's happening and we can't (and shouldn't) stop that, IMO. We could, perhaps, even find ways to make it easier.

Part of that may include removing names from copyright notices, but I feel we probably should do that anyway. _WE_ know that the copyright line needs to be changed, but most people don't. They know "license text > LICENSE > put in repo > done" if they know anything at all about this stuff. Not to fault them; it's just not something most people teach so folks make assumptions.

The Contributor Covenant Code of Conduct is an example of this sort of thing happening in real life. Project maintainers copy that text, drop it into a Code_of_Conduct.md file in a repo, then call it a day. They never really read far enough to reach the part where they need to enter an email address where people can report CoC violations. So now we have a lot of CoCs in project repositories…but no way to tell anyone about bad behaviour. That's one way to ensure low CoC report stats, I guess…

--V

--

VM (Vicky) Brasseur
Director, Senior Strategy Advisor
Open Source Program Office
Wipro Limited
Time Zone: Pacific/West Coast US



J Lovejoy
 

Hi Alexios,

You are correct re: the license-list-XML repo. Originally, when we were still getting the XML files sorted, I believe we had some kind of explicit warning not to use the data there, pointing to the license-list-data repo instead (if memory serves). The README has since been updated and now has more of a factual statement.

But this is part of what I was getting at - there are the .txt files in the https://github.com/spdx/license-list-XML/blob/master/test/simpleTestForGenerator/ folder of the license-list-XML repo, but then there are the .txt files here too: https://github.com/spdx/license-list-data/tree/master/text (and a template file as well).

I'm curious as to 1) what are people actually using and how (that we know of)?
and 2) given Vicky's comment that people are going to do what they are going to do, should we point them in any particular direction?

I don't see the need for an additional format... but open to thoughts on that too.

Thanks,
Jilayne



Steve Winslow
 

Previously I've been generally against the idea of encouraging folks to use the test/simpleTestForGenerator/*.txt files for anything other than the automated tests for the XML files. Mostly for the reason you noted at the start of this thread: that in many cases (especially where a copyright notice is baked into the license text, such as MIT) people may grab it without realizing they should probably adjust the text.

I've been pretty well convinced that I was wrong there; if people are finding value in using the "test" text as license templates, then great.

A couple of random thoughts, getting into the weeds:

1) People should not assume that the text in test/simpleTestForGenerator/*.txt is necessarily the _official, canonical, byte-for-byte text_ from the license steward, if there is one. Here are a couple of examples:


https://github.com/spdx/license-list-XML/blob/master/test/simpleTestForGenerator/GPL-2.0-or-later.txt is different from https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt (at least w/r/t whitespace, and different parts of "optional" text after the end of the license, etc).

There's a related question of whether the test text in the license-list-XML repo _should_ be the same as the canonical license steward text, where there is one. I'm just noting that at present, it isn't always the same. Also, since license stewards sometimes make changes to their own official license texts (GPL-2.0 is an example), the SPDX text is not necessarily going to be in sync if upstream makes a change.
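To check whether two copies of a license differ only in whitespace, as opposed to actual wording, a comparison along these lines might help. It is a minimal sketch with hypothetical helper names; it is not the SPDX matching guidelines, which cover much more than whitespace, and it will report a difference for any change in the optional text after the end of the license.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return " ".join(text.split())


def same_ignoring_whitespace(text_a, text_b):
    """True if the two texts are identical once whitespace is normalized."""
    return normalize_whitespace(text_a) == normalize_whitespace(text_b)
```

Running it on the SPDX copy and the license steward's copy would separate "only the line wrapping changed" from "the words changed", which are very different situations for anyone relying on the text.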

2) I'd tend to agree that it's generally going to be preferable to point folks at the text/ directory in the license-list-data repo. That helps to keep the concerns separated as "go to license-list-data if you're a user of the License List; go to license-list-XML in order to contribute."

From a very quick skim, it looks like the text/ directory in license-list-data is _mostly_ the same as the test text files in license-list-XML. I see a handful with differences in whitespace, and it looks like the naming for deprecated licenses might be handled differently. But those are both presumably something that could be addressed.
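For anyone scripting against the text/ directory rather than copying by hand, a rough sketch of pulling one license text is below. Several things here are assumptions rather than anything the SPDX project documents in this thread: the raw.githubusercontent.com URL form, the `master` branch name (taken from the links above), the `<id>.txt` file naming, and both function names.

```python
import re
from urllib.request import urlopen

# Assumed raw-file base for the text/ directory of license-list-data.
RAW_BASE = "https://raw.githubusercontent.com/spdx/license-list-data/master/text"

# Conservative check so arbitrary strings are not spliced into the URL.
VALID_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9.+-]*")


def license_text_url(license_id):
    """Build the (assumed) raw URL for one license text."""
    if VALID_ID.fullmatch(license_id) is None:
        raise ValueError(f"not a plausible license identifier: {license_id!r}")
    return f"{RAW_BASE}/{license_id}.txt"


def fetch_license_text(license_id, timeout=10):
    """Download the license text; needs network access."""
    with urlopen(license_text_url(license_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Pointing the base URL at a fixed release rather than a moving branch would make the result reproducible, at the cost of not picking up later corrections.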

Steve



Alan Tse
 

As a programmatic user of the list, I think we should expect the use per Vicky’s points. One extra data point: I’m not accessing any of the GitHub repos listed so far but relying on whatever the licenses.json leads me to. I do that because at one point that was pointed out as the endpoint for machine reading. If we wanted to encourage a specific type of use, we’d have to build some tooling to encourage it. That way there’s a benefit to doing it the “official way”. So for example, our template matching could be used to indicate which fields should be replaced (copyright holder). If there were a library to pull the right file and swap in the missing variable, that would encourage more official use.

 

On canonical licenses, I’d be supportive of swapping out to the canonical if it exists. The old one could be kept as another example. Seems like a simple version increment. If we wanted to normalize licenses to replace specific names with “COPYRIGHT HOLDERS”, I think that would be helpful and could be treated the same as a canonical switch.
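As a rough illustration of the "swap in the missing variable" idea, the sketch below fills in a generic copyright line. It assumes the placeholders `<year>` and `<copyright holders>`, modeled on the notice line of the MIT text; other licenses word their notices differently, and the function name is hypothetical. It fails loudly when the placeholders are absent, so a text with a baked-in name is not silently passed through.

```python
PLACEHOLDERS = ("<year>", "<copyright holders>")


def fill_copyright(license_text, year, holder):
    """Replace the generic notice fields with a concrete year and holder."""
    missing = [p for p in PLACEHOLDERS if p not in license_text]
    if missing:
        # Refuse to guess: the caller must look at the notice by hand.
        raise ValueError(f"placeholders not found: {missing}")
    return (
        license_text
        .replace("<year>", str(year))
        .replace("<copyright holders>", holder)
    )
```

A generic marker such as “COPYRIGHT HOLDERS” in every license text would let one function like this work across the whole list instead of needing per-license knowledge.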

 

Alan

 

 


VM (Vicky) Brasseur
 

Data point: REUSE specifically directs people to copy the text from https://github.com/spdx/license-list-data/tree/master/text.

 

Citation: https://reuse.software/tutorial/

 

--V

 

-- 

VM (Vicky) Brasseur

Director, Senior Strategy Advisor

Open Source Program Office

Wipro Limited

Time Zone: Pacific/West Coast US

 

 

From: <Spdx-legal@...> on behalf of "Alan Tse via lists.spdx.org" <alan.tse=wdc.com@...>
Reply-To: "alan.tse@..." <alan.tse@...>
Date: Tuesday, November 16, 2021 at 13:23
To: Steve Winslow <swinslow@...>, J Lovejoy <opensource@...>
Cc: Alexios Zavras <alexios.zavras@...>, SPDX-legal <Spdx-legal@...>
Subject: Re: SPDX files as templates

 

 

As a programmatic user of the list, I think we should expect the use per Vicky’s points. One extra data point, I’m not accessing any of the GitHub repos listed so far but relying on whatever the licenses.json leads me to. I do that because at one point that was pointed out as the endpoint for machine reading. If we wanted to encourage a specific type of use, we’d have to build some tooling to encourage it. That way there’s a benefit to doing it the “official way”. So for example, our template matching could be used to indicate which fields should be replaced (copyright holder). If there was a library to pull the right file and swap in the missing variable, that would encourage more official use.
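A rough sketch of the "swap in the missing variable" idea, for illustration only. This is not existing SPDX tooling: the function name `fill_copyright` is hypothetical, and it assumes the license text carries a single notice line beginning with "Copyright " (as the MIT text does). Texts without such a line would need handling by hand.

```python
import re
from datetime import date
from typing import Optional

# Assumption: the notice is a whole line starting with "Copyright ".
# Case-sensitive on purpose, so wrapped body lines that happen to start
# with lowercase "copyright" are left alone.
_NOTICE_LINE = re.compile(r"^Copyright .*$", re.MULTILINE)


def fill_copyright(license_text: str, holder: str, year: Optional[int] = None) -> str:
    """Replace the first copyright notice line with one naming `holder`."""
    if year is None:
        year = date.today().year
    notice = f"Copyright (c) {year} {holder}"
    # A callable replacement avoids backslash/group escapes in `holder`.
    new_text, count = _NOTICE_LINE.subn(lambda _match: notice, license_text, count=1)
    if count == 0:
        raise ValueError("no copyright notice line found; add one by hand")
    return new_text


if __name__ == "__main__":
    sample = (
        "MIT License\n\n"
        "Copyright (c) <year> <copyright holders>\n\n"
        "Permission is hereby granted, free of charge, ...\n"
    )
    print(fill_copyright(sample, "Example Project contributors", 2021))
```

Raising an error when no notice is found, rather than silently returning the text unchanged, is meant to catch exactly the case discussed in this thread: someone copying a license without realizing the notice needs adjusting.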

 

On canonical licenses, I’d be supportive of swapping out to the canonical if it exists. The old one could be kept as another example. Seems like a simple version increment. If we wanted to normalize licenses to replace specific names with “COPYRIGHT HOLDERS”, I think that would be helpful and could be treated the same as a canonical switch.

 

Alan

 

From: <Spdx-legal@...> on behalf of Steve Winslow <swinslow@...>
Date: Tuesday, November 16, 2021 at 12:56 PM
To: J Lovejoy <opensource@...>
Cc: Alexios Zavras <alexios.zavras@...>, SPDX-legal <Spdx-legal@...>
Subject: Re: SPDX files as templates

 


 

Previously I've been generally against the idea of encouraging folks to use the test/simpleTestForGenerator/*.txt files for anything other than the automated tests for the XML files. Mostly for the reason you noted at the start of this thread: that in many cases (especially where a copyright notice is baked into the license text, such as MIT) people may grab it without realizing they should probably adjust the text.

 

I've been pretty well convinced that I was wrong there; if people are finding value in using the "test" text as license templates, then great.

 

A couple of random thoughts, getting into the weeds:

 

1) People should not assume that the text in test/simpleTestForGenerator/*.txt is necessarily the _official, canonical, byte-for-byte text_ from the license steward, if there is one. Here is one example:

 

 

https://github.com/spdx/license-list-XML/blob/master/test/simpleTestForGenerator/GPL-2.0-or-later.txt is different from https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt (at least w/r/t whitespace, and different parts of "optional" text after the end of the license, etc).

 

There's a related question of whether the test text in the license-list-XML repo _should_ be the same as the canonical license steward text, where there is one. I'm just noting that at present, it isn't always the same. Also, since license stewards sometimes make changes to their own official license texts (GPL-2.0 is an example), the SPDX text is not necessarily going to be in sync if upstream makes a change.

 

2) I'd tend to agree that it's generally going to be preferable to point folks at the text/ directory in the license-list-data repo. That helps to keep the concerns separated as "go to license-list-data if you're a user of the License List; go to license-list-XML in order to contribute."

 

From a very quick skim, it looks like the text/ directory in license-list-data is _mostly_ the same as the test text files in license-list-XML. I see a handful with differences in whitespace, and it looks like the naming for deprecated licenses might be handled differently. But those are both presumably something that could be addressed.

 

Steve

 

On Tue, Nov 16, 2021 at 3:46 PM J Lovejoy <opensource@...> wrote:

Hi Alexios,

You are correct re: the license-list-XML repo. Originally, when we were still getting the XML files sorted, I believe we had some kind of explicit warning not to use the data there, pointing instead to the license-list-data repo (if memory serves). The README has since been updated and now has more of a factual statement.

But this is part of what I was getting at - there are the .txt files in the https://github.com/spdx/license-list-XML/blob/master/test/simpleTestForGenerator/ folder of the license-list-XML repo, but then there are the .txt files here too: https://github.com/spdx/license-list-data/tree/master/text (and a template file as well).

I'm curious as to 1) what are people actually using and how (that we know of)?
and 2) given Vicky's comment that people are going to do what they are going to do, should we point them in any particular direction?

I don't see the need for an additional format... but open to thoughts on that too.

Thanks,
Jilayne

On 11/16/21 1:20 PM, Alexios Zavras wrote:

Hi Jilayne,
 
The way we have operated for years is that the license-list-XML repo is for internal work of the SPDX Legal Team.
These files are automagically processed and everything inside https://github.com/spdx/license-list-data is generated, where people can collect all the information in a variety of formats.
 
If we want to provide another format (e.g., with deleted author names), we can definitely add it in this generated data repo.
 
-- zvr
 
 
 
 
 
 
 
 
 
 

 



Gary O'Neall
 

Adding a couple of facts to the discussion:

  • Over the last couple of releases, the license text field in the JSON and RDF representations as well as the license-list-data/text is a copy of the test license text as long as it passed a comparison test with the license XML.  Let me know of any differences you find and I’ll see if there is any issue with the tools.
  • The new license workflow includes a statement that the test data should be the canonical text – bullet point #2 under the add test .txt subheading:

“Locate the canonical text for the license. There should be a link to this in the issue, but if there isn't please ask for it from the license steward. Don't proceed until you have confirmed that you have the canonical text.”

  • As Alexios pointed out, we have in the past discouraged people from fetching license text directly from the License-List-XML repo.

 

A couple of opinions:

  • I continue to believe we should discourage anyone from pulling data directly from the License-List-XML repo.  We should only use that repo internally within the legal team, to give us the flexibility to change things without worrying about breaking tooling that uses the data.
  • The XML format for the licenses is now pretty stable, so I think it may be time to expose this format as something usable outside the SPDX legal workgroup.  Per the point above, I would do this by adding a subdirectory in the license-list-data repo and copying the XML files there rather than have people pull directly from our “working repo”.
  • Per Alan’s point, I still prefer the website accessible endpoints for the license list data (https://spdx.org/licenses/licenses.json).  However, if anyone finds it more convenient to pull it from the license-list-data repo, that should also be fine.  Both versions are generated by the same utility and should be identical.
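For anyone wanting to follow the website endpoint programmatically, a minimal sketch of what that might look like is below. The field names used here (`licenses`, `licenseId`, `detailsUrl`, `licenseText`) are assumptions about the JSON layout and should be checked against the published files. The helper names are hypothetical and not part of any SPDX library.

```python
import json
from urllib.request import urlopen

# Endpoint named in this thread.
INDEX_URL = "https://spdx.org/licenses/licenses.json"


def details_url(index: dict, license_id: str) -> str:
    """Find the per-license JSON URL for `license_id` in the parsed index.

    Assumes each entry under "licenses" has "licenseId" and "detailsUrl".
    """
    for entry in index["licenses"]:
        if entry["licenseId"] == license_id:
            return entry["detailsUrl"]
    raise KeyError(license_id)


def fetch_json(url: str) -> dict:
    """Download and parse a JSON document (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def license_text(license_id: str) -> str:
    """Follow the index to the details file and return its license text.

    Assumes the details file stores the text under "licenseText".
    """
    index = fetch_json(INDEX_URL)
    details = fetch_json(details_url(index, license_id))
    return details["licenseText"]


if __name__ == "__main__":
    print(license_text("MIT"))
```

Going through the index rather than hard-coding a file path means the caller depends only on the generated output, which leaves the working repo free to change.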

 

Best regards,

Gary

 
