Till Jaeger

Hello Steve,

On 22.02.23 at 20:54, Steve Winslow wrote:
Hi Till,

Regarding the Unicode Terms of Use: since there is a new and
substantively different set of terms at
<https://www.unicode.org/copyright.html>, I agree it would be
appropriate to add them to the SPDX License List as a new identifier.
I would probably propose `Unicode-TOU-2023` as the identifier. I've
added an issue at
<https://github.com/spdx/license-list-XML/issues/1851> for the
community to weigh in on adding it.
Sounds good to me! Thanks for taking care of it.

For the existing `Unicode-TOU` identifier, we don't change or remove
identifiers from the list, so I think that identifier would still
remain present no matter what. But there are two changes that could be
considered (I've added an issue at
<https://github.com/spdx/license-list-XML/issues/1852> for community

1. whether to change the "Full name" (which shows up in e.g. the
left-hand column at <https://spdx.org/licenses>) to "Unicode Terms of
Use - 2014"; and/or
2. whether to deprecate the `Unicode-TOU` identifier altogether, and
add a corresponding `Unicode-TOU-2014` with identical content.
Ok. Makes sense to me.

I'm in favor of step 1 either way. For step 2, I am on the fence and
interested in what others have to say. This would keep `Unicode-TOU`
as a valid identifier with the same text, but would mark it as
"deprecated" and move it to the bottom part of the list; and would
provide `Unicode-TOU-2014` as a versioned alternative for folks to use.

To be clear, in either case this would _not_ result in removing
`Unicode-TOU` as a valid identifier. Even if it's deprecated, it would
still be present on the License List, as we do not remove existing
identifiers. But deprecating it and adding a new `Unicode-TOU-2014`
alternative might help alleviate the issues you're seeing.

Also, I would _not_ be in favor of removing
<https://www.unicode.org/copyright.html> from the "other URLs" list
for either identifier. Even if it does not point to the corresponding
text today, if someone is using content from 2014 under the old
license text, then relying solely on that URL as embodied in the
content would also lead to a wrong conclusion if only the current
version had the URL associated with it.
What about using

Any sort of dependence on listed URLs to be accurate for license
detection is going to fail in particular cases, and as a policy matter
I believe we've taken the position that "other URLs" will not be
modified based on evolving content. (*Jilayne* or others who may have
historical knowledge here, please feel free to jump in if I'm off-base!)
I have seen some links from https://web.archive.org. But maybe they were
used from the beginning.
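
As a minimal sketch of how a page that changes over time could be pinned to a point in time, the snippet below builds a Wayback Machine address of the form `https://web.archive.org/web/<timestamp>/<url>`. The function name `wayback_url` and the chosen date are illustrative assumptions; whether a capture exists near that date is not checked here, and this is not an SPDX convention.

```python
from datetime import date

# Public URL prefix of the Internet Archive's Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, snapshot_date: date) -> str:
    """Return a Wayback Machine URL asking for a capture near snapshot_date.

    Hypothetical helper: it only formats the address and does not verify
    that a capture actually exists.
    """
    timestamp = snapshot_date.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Example date chosen for illustration only.
    print(wayback_url("https://www.unicode.org/copyright.html", date(2014, 7, 1)))
```

A dated address like this points at one fixed capture, while the plain URL keeps following whatever the site currently publishes.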

The URL <https://www.unicode.org/copyright.html> would of course still
be appropriate to add to the new Unicode-TOU-2023 entry as well.
I assume there will be future changes on that URL. Perhaps using
is a good idea?

Please note that the year in the copyright notice changes each year.
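
To illustrate why the changing year matters for text comparison, here is a rough sketch that treats two notices as equal when they differ only in years. It is not the SPDX matching procedure; the helper names and the sample notice strings are assumptions for illustration, and the pattern replaces every year-like number in the text, not just the one in the copyright line.

```python
import re

# A four-digit year (19xx or 20xx), optionally followed by a range end
# such as "-2023". This matches year-like numbers anywhere in the text.
_YEARS = re.compile(r"\b(?:19|20)\d{2}(?:\s*[-–]\s*(?:19|20)\d{2})?\b")
_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Replace years with a placeholder and collapse whitespace and case."""
    text = _YEARS.sub("<YEAR>", text)
    return _WHITESPACE.sub(" ", text).strip().lower()


def same_apart_from_year(a: str, b: str) -> bool:
    """True if the two texts differ only in years, spacing or letter case."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    # Illustrative notice strings, not copied from a specific file.
    older = "Copyright © 1991-2022 Unicode, Inc. All rights reserved."
    newer = "Copyright © 1991-2023 Unicode, Inc.  All rights reserved."
    print(same_apart_from_year(older, newer))
```

A comparison like this only answers whether the wording is otherwise identical; which year must appear in a redistributed notice remains a legal question.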

Finally, for the Unicode Data Files and Software license, from the
comparison you shared it looks like the only differences between
<https://www.unicode.org/license.txt> and Unicode-DFS-2016 might be in
omittable (blue) or replaceable (red) text, with the exception of
the inclusion of the URL after "See Terms of Use" at the very
beginning. I would be more inclined to add that URL as additional
"omittable" text for Unicode-DFS-2016, rather than create a separate
license identifier for it. But I've added an issue at
<https://github.com/spdx/license-list-XML/issues/1853> for folks to
weigh in on this one as well.
Interesting question about whether or not the change to the omitted text
justifies a new identifier. I see your point.

Thanks for your support!

Have a nice weekend.



On Tue, Feb 21, 2023 at 7:27 PM Till Jaeger <jaeger@...> wrote:

Hi Steve,

Thanks for looking into this issue. I have a few additional remarks

On 21.02.23 at 20:55, Steve Winslow wrote:
Whoops -- accidentally just sent this to Till, re-sending to the
full list:

= = = = =

Hi Till, please see my thoughts inline below:

On Tue, Feb 21, 2023 at 2:19 PM Till Jaeger via lists.spdx.org
<jaeger=jbb.de@...> wrote:

Dear all,

Sorry to bring this up again.

I suggest correcting the information on

The link provided under "Other web pages for this license" points to a
different text (<http://www.unicode.org/copyright.html>) than the one at

[*SDW*] From a quick search on the Internet Archive, that URL
appears to have been the correct URL for that version of the
website text at one point in time (at least as of July 2014:

The purpose of the "other URLs" section of each license is _not_
to be a now-current source for that license text, but rather to
include URLs which may have been a source for it in the past (as
they may be useful for scanning tools, human review, etc. when
finding URLs embedded in source code). We don't remove inactive or
no-longer-valid URLs because they may remain useful for
identification purposes -- see
C) for one place where this is mentioned.
Well, there are several cases in which there is an indication that
a URL does not work anymore (e.g.

But I think that a link to a webpage with a _different_ license text
is even worse than a dead link.

It should be stated that the link points to a newer version of
the TOU.

[*SDW*] This could perhaps be added to the "Notes" for the
Unicode-TOU license, but I'm a little hesitant to do so. For the
reasons mentioned above, any of the "other URLs" for any license
on the SPDX license list may be incorrect, and I don't think we go
through to regularly re-confirm that any of them match the present
I have a feeling that I did not do a good enough job of explaining
the problem.

The situation that we face when doing FOSS license compliance is the


License scanners detect files as the following:



The license information is "For terms of use, see


Most license scanners conclude "Unicode-TOU".


Many companies have license checklists or internal assessments based
on SPDX identifiers, and such internal analysis is based on the text
of <https://spdx.org/licenses/Unicode-TOU.html> instead of the current
text of <https://www.unicode.org/terms_of_use.html>. This increases the
risk of working with the wrong license text. Furthermore, I know many
companies that create license documentation using template license
texts from SPDX instead of the original license text of the source.


Files such as
<https://www.unicode.org/Public/emoji/15.0/emoji-sequences.txt> may
have been licensed under the Unicode TOU at some point. But newer
versions of the files at <https://www.unicode.org/Public/> will no
longer be licensed under the (deprecated) text at
<https://spdx.org/licenses/Unicode-TOU.html> in the future, and
incorrect license text may be used.


There is no SPDX identifier for the current version of the Unicode
TOU at <http://www.unicode.org/copyright.html>, even though the vast
majority of Unicode files reference it. This makes the job of
compliance officers working with SPDX much more difficult. This is
the reason why I think that a new identifier would be helpful
(and/or we should clarify that the text on
<https://spdx.org/licenses/Unicode-TOU.html> does not match the
current TOU on https://www.unicode.org/terms_of_use.html

Follow-up issue: Unicode files refer to
<http://www.unicode.org/copyright.html>, i.e. to the most recent
version of the text provided on that site (a kind of dynamic
reference). So people may be confused if they take the text from the
Unicode TOU instead of the most recent text. Any suggestions on how to
deal with this

[*SDW*] I think this is a recurring issue when license stewards
reuse old URLs to change the text of a license.
<https://www.gnu.org/licenses/gpl.html> used to point to GPL-2.0
until it later pointed to GPL-3.0 (see
That URL can show up in source code with the author's intent of it
having referred to either version. No matter how we handle URLs on
the SPDX License List, URLs at most _may_ be helpful for identifying
a license, but in plenty of cases they aren't going to be reliable
on their own.
I agree. Despite this, or perhaps because of it, obsolete URLs
should not be used. As you say yourself:
<https://spdx.org/licenses/GPL-2.0-only.html> no longer references
<https://www.gnu.org/licenses/gpl.html>, even though GPL-2.0 was
once available there.
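
The point that a URL alone can only narrow down candidates can be sketched roughly as follows. The mapping below is a hypothetical lookup table built from the examples in this thread, not data taken from the SPDX License List, and the function names are assumptions.

```python
from typing import Dict, List

# Hypothetical table: each URL has hosted more than one license text over
# time, so it maps to several candidate identifiers (illustration only).
CANDIDATES_BY_URL: Dict[str, List[str]] = {
    "https://www.gnu.org/licenses/gpl.html": ["GPL-2.0-only", "GPL-3.0-only"],
    "http://www.unicode.org/copyright.html": ["Unicode-TOU", "Unicode-DFS-2016"],
}


def candidates_for(url: str) -> List[str]:
    """Return the identifiers a URL may have referred to (empty if unknown)."""
    return CANDIDATES_BY_URL.get(url.strip(), [])


def needs_text_review(url: str) -> bool:
    """A URL settles the question only if it maps to exactly one candidate."""
    return len(candidates_for(url)) != 1


if __name__ == "__main__":
    url = "https://www.gnu.org/licenses/gpl.html"
    print(candidates_for(url), needs_text_review(url))
```

With a table like this, a scanner would report the candidates and leave the final decision to a comparison of the actual license text.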

I suggest correcting the information on

The link provided under "Other web pages for this license" points to the

[*SDW*] The "other URLs" link currently listed there --
<http://www.unicode.org/copyright.html> -- appears to have
previously been a source for finding the Unicode-DFS-2016 license
<http://www.unicode.org/copyright.html> as of August 2016
appears to have had Unicode-DFS-2016 as the license text in Exhibit 1
on that page.
This is correct. However, the current text at
<http://www.unicode.org/copyright.html> refers to
<https://www.unicode.org/license.txt>, which is not
Unicode-DFS-2016. The license text
<https://www.unicode.org/license.txt> has no SPDX identifier, even
though most Unicode files are licensed under this license.

It should be stated that a newer version of this agreement is
at https://www.unicode.org/license.txt

[*SDW*] From a quick look, that does appear to be a valid URL
containing the text for Unicode-DFS-2016 (though I haven't checked
carefully to confirm it's a match). Assuming it is, I agree that
<https://www.unicode.org/license.txt> could be added as an
additional "other URL" for it.
It does not fully match. The first paragraph is different:

Current version:

See Terms of Use <https://www.unicode.org/copyright.html>
for definitions of Unicode Inc.’s Data Files and Software.

See Terms of Use for definitions of Unicode Inc.'s Data Files and
Unicode Data Files include all data files under the directories
http://www.unicode.org/Public/,
http://www.unicode.org/reports/,
http://www.unicode.org/cldr/data/,
http://www.unicode.org/ivd/data/, and
Unicode Data Files do not include PDF online code charts under the
directory http://www.unicode.org/Public/.
Software includes any source code published in the Unicode
Standard or under the directories
http://www.unicode.org/Public/,
http://www.unicode.org/reports/,
http://www.unicode.org/cldr/data/,
<http://source.icu-project.org/repos/icu/>, and

Please find attached a comparison.

Do you have a solution in mind?


I see the problem with dynamic references on websites but SPDX
incorrect links. Of course, it would be nice to have SPDX
for the most recent versions of the TOU and Unicode-DFS.



On 31.10.22 at 12:20, Till Jaeger via lists.spdx.org wrote:
Dear all,

I'm wondering why <https://spdx.org/licenses/Unicode-TOU.html> is
(still) part of the license list. Could it be deprecated?

First of all, the current text of the "Unicode® Copyright and Terms of
Use" is quite different from the text which is referenced at
<https://spdx.org/licenses/Unicode-TOU.html> (SPDX License Diff is very
helpful to show the differences - thanks again to Alan Tse).

Sec. C.3 of the current version refers to the "Unicode Data Files and
Software License":

"Further specifications of rights and restrictions pertaining to the use
of the Unicode DATA FILES and SOFTWARE can be found in the Unicode Data
Files and Software License."

The "Unicode Data Files and Software License"
(<https://www.unicode.org/license.txt>) is similar but not identical to

To me it seems that the "Unicode® Copyright and Terms of Use" are more
or less ToU for a website and all redistributables are under

Unicode modifies the "year" within the copyright notice from year to
year. The "Unicode Data Files and Software License" provides as follows:

"this copyright and permission notice appear with all copies
of the Data Files or Software"

Would this require identifying in which year the data and/or
was copied from the Unicode website to use the license text with the
correct year? Would it be sufficient to use the most recent version of
the license text? Should this be reflected in the SPDX

Is there anybody with more background information who can give some

Best regards,