Re: ANTLR-PD

Steve Winslow
 

Thanks for flagging this, Till. I've added an issue in the license-list-XML repo to track this at https://github.com/spdx/license-list-XML/issues/1056.

I don't know the history of this one myself, but it looks like that language had been omitted prior to when the license list was first brought into source control (see https://github.com/spdx/license-list-XML/commits/master/src/ANTLR-PD.xml). I expect it should be added into the ANTLR-PD markup for the reasons you mentioned.

Best,
Steve


On Tue, Jun 23, 2020 at 5:33 AM Till Jaeger via lists.spdx.org <jaeger=jbb.de@...> wrote:
Hello list,

I just found out that there is a deviation from
https://spdx.org/licenses/ANTLR-PD.html#licenseText to the linked text from
http://www.antlr2.org/license.html which contains the following language:

"In countries where the Public Domain status of the work may not be valid,
the author grants a copyright licence to the general public to deal in the
work without restriction and permission to sublicence derivates under the
terms of any (OSI approved) Open Source licence."

From the perspective of EU law this is an extremely important part since
it makes clear that an unrestricted license is intended if PD does not work.
This avoids (always disputable) interpretation of the PD text.

Is there any reason for the omission? Could the text be added?

Best regards,

Till

--
Dr. Till Jaeger
Certified Copyright and Media Law Attorney


JBB Rechtsanwälte
Jaschinski Biere Brexl Partnerschaft mbB
Christinenstraße 18/19 | 10119 Berlin
Tel. +49.30.443 765 0  |  Fax +49.30.443 765 22
Sitz der Gesellschaft: Berlin | Registergericht AG Charlottenburg | PR 609 B
www.jbb.de





--
Steve Winslow
Director of Strategic Programs
The Linux Foundation


Re: Validate license cross references: New fields to be added

Smith Tanjong Agbor
 

Hi all,

1- I don't think http://localhost/ or https://127.0.0.1 should be valid urls. So I shall consider those exceptions in my code.
2- In addition to https, http and ftp, are there any other protocols you would like us to consider for license urls?

3- I think this:

Rather than true/false perhaps allow the name of the matched algorithm:
verbatim
noassertion – if no test result is available (for invalid links perhaps)
todo – no match attempted

“” – no match asserted

verbatim2 – matches with \r == \r\n == \n
verbatim3 – matches “ignoring whitespace differences” reflowed text

verbatim4 – matches ignoring decoration (comments, flower-boxes)
template – matches template verbatim (see ppalaga’s comment)
et cetera as they become available

shall provide more information than what I suggested previously. It will also enable us to add values without changing the structure of the data.

4- Concerning the date of the most recent HTTP-200 response, we can have two values: the date of the most recent response (HTTP-200 or not) and a true/false flag. I think this will allow us to have a date in any case, along with whether the link is dead or not.
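
A rough Python sketch of how the named match levels in point 3 could be computed. The function name `match_level` and the exact normalisation rules are assumptions made for illustration; `verbatim4` and `template` are left out because the thread does not define them precisely.

```python
import re


def match_level(reference: str, candidate: str) -> str:
    """Return the name of the strictest comparison under which the texts match.

    Level names follow the list in point 3; the normalisation rules are
    assumptions. Returns "" when no implemented comparison matches.
    """
    # verbatim: the two texts are identical character for character
    if reference == candidate:
        return "verbatim"

    # verbatim2: treat \r, \r\n and \n as the same line ending
    def unify_newlines(text: str) -> str:
        return text.replace("\r\n", "\n").replace("\r", "\n")

    ref, cand = unify_newlines(reference), unify_newlines(candidate)
    if ref == cand:
        return "verbatim2"

    # verbatim3: ignore whitespace differences such as reflowed paragraphs
    def collapse_whitespace(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()

    if collapse_whitespace(ref) == collapse_whitespace(cand):
        return "verbatim3"

    return ""  # no match asserted
```

Because the result is a string rather than true/false, further levels can be added later without changing the structure of the data.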

Concerning Brad's reply:

1- I would suggest storing the dates of events for all fields, except the url.
For instance:
isValid: {val: true/false, date: date_utc},
isDead: {val: true/false, date: date_utc}, etc

2- I would really like to have more input on this. I really do not know whether it is OK to bring DNS, CDNs, private networks, etc. into evaluating the validity of a URL. I am more inclined towards using a regex, and not requiring that a link be valid before establishing whether it is dead or not. I think that could help.
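
A minimal sketch of a purely syntactic validity check together with the dated field records suggested above. It swaps the regex mentioned above for the standard library's `urllib.parse` and `ipaddress` modules; the function names, the allowed scheme set and the decision to reject private-network addresses are all assumptions, not settled behaviour. No DNS lookup or network request is made.

```python
import ipaddress
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Assumption: only the three schemes discussed in this thread are accepted.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_valid_crossref_url(url: str) -> bool:
    """Syntactic check only: scheme, host present, host not local."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name rather than an IP literal
    # Assumption: loopback and private-network addresses are not valid.
    return not (address.is_loopback or address.is_private)


def dated(value: bool) -> dict:
    """Wrap a result as {val, date}, mirroring the isValid/isDead proposal."""
    return {"val": value, "date": datetime.now(timezone.utc).isoformat()}
```

For example, `dated(is_valid_crossref_url("http://localhost/"))` would yield a record whose `val` is `False`, with the UTC time of the check as its `date`.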

Any more comments/suggestions are welcome.

Thanks, 
Smith

On Wed, Jun 17, 2020 at 21:14, Kaelbling, Michael <michael.kaelbling@...> wrote:

In the spirit of “any suggestions and/or modifications will be very much appreciated”, I have inserted comments below.

 

From: Spdx-legal@... <Spdx-legal@...> On Behalf Of Smith Tanjong Agbor
Sent: Wednesday, June 17, 2020 12:32
To: Spdx-tech@...; Spdx-legal@...
Cc: Gary O'Neall <gary@...>; swinslow@...
Subject: Validate license cross references: New fields to be added

 

Hi all,

 

I am working on a Google Summer of Code project that emanates from this discussion/issue; concerning the validation of license cross references. Here is a link to my GSOC proposal.

 

The focus is on improving the LicenseListPublisher repository to have generated license data updated with fields on the validity of the crossref, among others. 

 

In order to do this, the structure of the crossref shall change (in some cases, e.g. JSON), and in others, there shall be additional tags. In general, the following are fields which shall be added to the crossrefs:

 

"isValid": true/false,

Indicates whether or not the crossref url is a valid url (ex: not some local file link)

Must a valid URL be based on one of only two/three schemes: http, https, and ftp? Is http://localhost/ or https://127.0.0.1 valid?


"isWayBackLink": true/false,

Indicates whether or not the url is a link from a previous version (Wayback Machine) of the site (where the license is located)


"extraText": true/false,

Indicates whether or not the license from the url has extra text in its description when compared to the license description in the current file.


"isMatch": true/false,

Indicates whether or not the license from the url link matches (perfectly) the license description in the current file.

Rather than true/false perhaps allow the name of the matched algorithm:
verbatim
noassertion – if no test result is available (for invalid links perhaps)
todo – no match attempted

“” – no match asserted

verbatim2 – matches with \r == \r\n == \n
verbatim3 – matches “ignoring whitespace differences” reflowed text

verbatim4 – matches ignoring decoration (comments, flower-boxes)
template – matches template verbatim (see ppalaga’s comment)
et cetera as they become available

This is the url of the license text/description


"isDead": true/false

Indicates whether or not the url is a dead link (a link that returns a response other than HTTP_200; it could be bad request HTTP_400, not found HTTP_404, forbidden HTTP_403, etc.)

Rather than true/false (since dead sites can be reanimated), how about a date for the most-recent HTTP-200 response? “dateMRHTTP200”: “UTC date”

 

Please consider this as a proposal and any suggestions and/or modifications will be very much appreciated.

 

Thanks,

Smith

 

 


ANTLR-PD

Till Jaeger
 

Hello list,

I just found out that there is a deviation from
https://spdx.org/licenses/ANTLR-PD.html#licenseText to the linked text from
http://www.antlr2.org/license.html which contains the following language:

"In countries where the Public Domain status of the work may not be valid,
the author grants a copyright licence to the general public to deal in the
work without restriction and permission to sublicence derivates under the
terms of any (OSI approved) Open Source licence."

From the perspective of EU law this is an extremely important part since
it makes clear that an unrestricted license is intended if PD does not work.
This avoids (always disputable) interpretation of the PD text.

Is there any reason for the omission? Could the text be added?

Best regards,

Till

--
Dr. Till Jaeger
Certified Copyright and Media Law Attorney


JBB Rechtsanwälte
Jaschinski Biere Brexl Partnerschaft mbB
Christinenstraße 18/19 | 10119 Berlin
Tel. +49.30.443 765 0 | Fax +49.30.443 765 22
Sitz der Gesellschaft: Berlin | Registergericht AG Charlottenburg | PR 609 B
www.jbb.de


Re: License of an open source license text

Matija Šuklje
 

On 19. 06. 20 at 03:00, J Lovejoy wrote:

Thanks Till for weighing in here!
FWIW, another lawyerly +1 on Till's analysis from me.


(a) A technical question: When generating SPDX data at the file level, how
does one identify the LICENSE.txt file? Various ideas have been raised
here.
[…]
This is what really matters. If I find a LICENSE.txt file and it’s an exact
match to MIT - why wouldn’t I simply identify it as MIT? I guess I don’t
understand why having a new license identifier is needed or how that helps.
I’d be really curious to hear what other lawyers think on this bit - as we
are the ones who are going to consume/review the license fields part of the
SPDX data.
This is why REUSE <https://reuse.software> requires the license texts to be
stored with SPDX ID names inside a LICENSES/ folder – e.g.:

LICENSES/GPL-2.0-or-later.txt
LICENSES/MIT.txt

and requires a modified text to use the LicenseRef-* prefix, e.g.:

LICENSES/LicenseRef-MIT-Matija.txt

That way you can see at a quick glance:
• all the licenses in the repo/package – just check LICENSES/*
• which license text applies to each file – match the SPDX ID in the file
  with the license text in LICENSES/*, thanks to the shared name between
  the tag and the file
• whether the license text is the same as the SPDX reference text – look
  out for LicenseRef-*

I think this is a very elegant solution that piggy-backs on top of SPDX.
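
A small Python sketch of the check this layout makes possible. The function names are hypothetical and this is not the REUSE tool itself; it also only handles a single identifier per tag, not full SPDX license expressions.

```python
import re
from pathlib import PurePosixPath

# Simplification: captures one identifier per tag, not "MIT OR Apache-2.0".
TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def missing_license_files(source_texts, license_files):
    """Return SPDX IDs used in sources that have no file in LICENSES/."""
    # LICENSES/MIT.txt provides the ID "MIT" (file name minus ".txt").
    available = {PurePosixPath(name).stem for name in license_files}
    used = set()
    for text in source_texts:
        used.update(TAG.findall(text))
    return sorted(used - available)


def custom_license_ids(license_files):
    """Return IDs whose text is not an SPDX reference text (LicenseRef-*)."""
    ids = (PurePosixPath(name).stem for name in license_files)
    return sorted(i for i in ids if i.startswith("LicenseRef-"))
```

Given the files `LICENSES/GPL-2.0-or-later.txt` and `LICENSES/LicenseRef-MIT-Matija.txt`, a source file tagged `SPDX-License-Identifier: MIT` would be reported as missing its license text.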

But in any case, let’s please start with the understanding that the license
of the LICENSE.txt file doesn’t matter.
Aye!

(unless you want to modify/fork the license text, in which case the license of
the license text is probably your least concern)

The (very few) open source licenses that do have a copyright notice or some
other such communication as to the license text itself, I would interpret
more as an artifact of trying to prevent license proliferation or at least
encourage people to name the license something else, so avoid confusion
(now we have scanners that can and SPDX identifiers to help too).
This is an issue that is felt also in REUSE, where because of this the current
suggestion for MIT and similar licenses is to use a
LicenseRef-MIT-{$vendor_or_project} SPDX ID for the license name for every
different copyright holder. Which, apart from making sure the copyright
holders are preserved in the license texts, only takes extra work and space
while giving no practical benefits.

See this issue for more details, some proposed solutions, and feel free to
chip in:
https://github.com/fsfe/reuse-docs/issues/16

The sooner we crack this problem, the faster it will get (even) easier to mark
code with licensing info :)


cheers,
Matija
--
gsm: tel:+386.41.849.552
www: https://matija.suklje.name
xmpp: matija.suklje@...
sip: matija_suklje@...


Re: License of an open source license text

J Lovejoy
 

Hi all,

Thanks Till for weighing in here!

I think there are two general issues that come up here:

(a) A technical question: When generating SPDX data at the file level, how does one identify the LICENSE.txt file?
Various ideas have been raised here. Some of you might be interested to know (if you weren’t here or don’t remember) that when we were discussing the change to GPL-x.y-or-later and GPL-x.y-only identifiers, one proposal that circulated was to keep the “plain” GPL-2.0 identifier for the purpose of identifying when one finds the text of the license alone which does not indicate whether it’s “only” or “or later” because this is indicated in the license header or with the SPDX identifier. (Unfortunately, this proposal did not win out for unrelated reasons). 

(b) A legal question: what is the license for the LICENSE.txt file itself?

I think there is a different question that is being missed:
(c) Does it matter? (or Do I need to know the license of the LICENSE.txt file itself? or Is there a license for the LICENSE.txt file itself?)

Because if the answer to (c) is - it doesn’t matter, I don’t need to know, and no, then that answers (b) and “solving” a) or how you answer a) doesn’t really matter much either.

Question (b) has come up several times over the years, always answered with various levels of detailed copyright analysis, as well as pragmatism (see above) by the excellent lawyers on this list. The people asking are not lawyers (but in some cases, have not been satisfied with answers by several lawyers…) 

I’d like to emphasize a few things Till said below:
I have no interest to know how the license text is licensed itself. 
YES!!!

and

I have an interest to know whether or not the license text is identical to
the original one 
YES, YES, YES!!!!

This is what really matters. If I find a LICENSE.txt file and it’s an exact match to MIT - why wouldn’t I simply identify it as MIT?  I guess I don’t understand why having a new license identifier is needed or how that helps.  I’d be really curious to hear what other lawyers think on this bit - as we are the ones who are going to consume/review the license fields part of the SPDX data.

But in any case, let’s please start with the understanding that the license of the LICENSE.txt file doesn’t matter. Mostly because it’s generally understood that text in legal agreements is not copyrightable or (for the pragmatic approach) shouldn’t be and/or no one cares for it to be. Legal agreements, in their best form, convey a “meeting of the minds” between the parties in a way that’s clear and remains clear over time. As lawyers, we always copy well-written legal agreements. It would be silly (and wildly inefficient) not to. 

The (very few) open source licenses that do have a copyright notice or some other such communication as to the license text itself, I would interpret more as an artifact of trying to prevent license proliferation or at least encourage people to name the license something else, so avoid confusion (now we have scanners that can and SPDX identifiers to help too).

Thanks,
Jilayne

On Jun 18, 2020, at 3:52 PM, Till Jaeger via lists.spdx.org <jaeger=jbb.de@...> wrote:

Hi all,

I have some remarks from a lawyer's perspective who is scanning source code
and/or has to deal with the results from scanning.

1.
It is helpful if the license text file is differently identified from
licensed source files. There are some reasons for that:
- This license text is not licensed under itself.
- The information can be misleading. The LGPL-2.1 would be LGPL-2.1-only
although all source files might be LGPL-2.1-or-later
- It is good to know whether or not the license text is included in a source
package (and not just referenced). Accordingly, you know if adding the
license text is needed.

2.
Identifiers like "LicenseRef-GPL-3.0-license-text" would be great since you
can see on first view what is in the license file.

3.
I have no interest to know how the license text is licensed itself. All
known FOSS licenses allow copying and distribution. More is not needed.

4.
I have an interest to know whether or not the license text is identical to
the original one (or modified/shortened).

Not sure if this is helpful for you but I hope so.

Best regards,

Till









Re: License of an open source license text

Till Jaeger
 

Hi all,

I have some remarks from a lawyer's perspective who is scanning source code
and/or has to deal with the results from scanning.

1.
It is helpful if the license text file is differently identified from
licensed source files. There are some reasons for that:
- This license text is not licensed under itself.
- The information can be misleading. The LGPL-2.1 would be LGPL-2.1-only
although all source files might be LGPL-2.1-or-later
- It is good to know whether or not the license text is included in a source
package (and not just referenced). Accordingly, you know if adding the
license text is needed.

2.
Identifiers like "LicenseRef-GPL-3.0-license-text" would be great since you
can see on first view what is in the license file.

3.
I have no interest to know how the license text is licensed itself. All
known FOSS licenses allow copying and distribution. More is not needed.

4.
I have an interest to know whether or not the license text is identical to
the original one (or modified/shortened).

Not sure if this is helpful for you but I hope so.

Best regards,

Till



On 18.06.20 at 16:32, Philippe Ombredanne wrote:

Hi Richard:

On Thu, Jun 18, 2020 at 2:57 PM Richard Purdie wrote:

Just to be really clear, the license ID of a given specific
package *is* correct and definitive. What is unclear is the license of
the license information.

The challenge is that one software project can be split into multiple
binary packages and those binary packages can have finer grained
licenses.

For example, gcc which contains libgcc. gcc is GPL-3.0 and libgcc is
under the runtime license exception. We specifically mark the
binary packages with the correct license.

This isn't enough for some legal departments and some licenses, we have
to have the full license text somewhere. We have options:

a) Include the full license text in every binary package
b) Have a licence package per test and require each binary package to
depend on that license package
c) As per b) but have the package management or tools figure out the
dependencies if requested
d) Have a license package per piece of software containing all the
licensing texts for that piece of software.

There are pros and cons for all of these, some of the issues are very
significant, particularly in a constrained embedded system. Rightly or
wrongly, we have d) implemented today and this is consistent with what
other distros like Debian do (although they merge docs and license
info, we split them).

Also, this assumes the licenses can be split into specific individual
chunks. I suspect in some cases this is not possible.

The question is what license is that package in d) under.
Then in this case you can take the same approach as Debian's
packaging: your package in d) can be under its own license unrelated
to the license of the things it contains.

You could state that the license of the packaging of these license
data is under a CC0-1.0. You are not making any assertion about the
license of the licenses which are under whatever license they may be;
and whatever these may be are self-contained in their own license
texts.

This is the approach I take in scancode.
I bundle thousand license texts and I am not reporting any specific
license for these license texts.
Instead I am only declaring that the license data set is under CC0-1.0

As an aside, this might make scancode's [1] processing a little more
complicated ... but this could be fixed if we know we are looking at
the license of Yocto packages somehow.


Re: License of an open source license text

Philippe Ombredanne
 

Hi Richard:

On Thu, Jun 18, 2020 at 2:57 PM Richard Purdie wrote:

Just to be really clear, the license ID of a given specific
package *is* correct and definitive. What is unclear is the license of
the license information.

The challenge is that one software project can be split into multiple
binary packages and those binary packages can have finer grained
licenses.

For example, gcc which contains libgcc. gcc is GPL-3.0 and libgcc is
under the runtime license exception. We specifically mark the
binary packages with the correct license.

This isn't enough for some legal departments and some licenses, we have
to have the full license text somewhere. We have options:

a) Include the full license text in every binary package
b) Have a licence package per test and require each binary package to
depend on that license package
c) As per b) but have the package management or tools figure out the
dependencies if requested
d) Have a license package per piece of software containing all the
licensing texts for that piece of software.

There are pros and cons for all of these, some of the issues are very
significant, particularly in a constrained embedded system. Rightly or
wrongly, we have d) implemented today and this is consistent with what
other distros like Debian do (although they merge docs and license
info, we split them).

Also, this assumes the licenses can be split into specific individual
chunks. I suspect in some cases this is not possible.

The question is what license is that package in d) under.
Then in this case you can take the same approach as Debian's
packaging: your package in d) can be under its own license unrelated
to the license of the things it contains.

You could state that the license of the packaging of these license
data is under a CC0-1.0. You are not making any assertion about the
license of the licenses which are under whatever license they may be;
and whatever these may be are self-contained in their own license
texts.

This is the approach I take in scancode.
I bundle thousand license texts and I am not reporting any specific
license for these license texts.
Instead I am only declaring that the license data set is under CC0-1.0

As an aside, this might make scancode's [1] processing a little more
complicated ... but this could be fixed if we know we are looking at
the license of Yocto packages somehow.
--
Cordially
Philippe Ombredanne

[1] https://github.com/openembedded/meta-openembedded/blob/612128b46d183934bda7d0c7e224a313fc54d227/meta-oe/classes/scancode.bbclass


Re: License of an open source license text

Richard Purdie
 

On Thu, 2020-06-18 at 11:35 +0000, Zavras, Alexios wrote:
You might want to consider using something more general, like
LicenseRef-FSF-license-text or even LicenseRef-license-text, to use
the same for all license files...
I think "LicenseRef-license-text" is inappropriate as the different
texts have differing licenses so we need something finer grained.

LicenseRef-FSF-license-text would work for the FSF licenses and if
there were a standard we'd probably work to that. I don't think we're
in a position to try and build that standard though so we may have to
go the generic route until any standard emerges where someone collates
that information.

I had kind of hoped SPDX may be able to do that but I can understand
why it may be out of scope.

Cheers,

Richard


Re: License of an open source license text

Richard Purdie
 

On Thu, 2020-06-18 at 14:31 +0200, Philippe Ombredanne wrote:
On Thu, Jun 18, 2020 at 12:37 AM Richard Purdie
<richard.purdie@...> wrote:

If we set the license of the licence text package to include GPL-3.0,
the legal department blocks the release since they said "no GPL-3.0".
If you tell them its only the license text, they tell you the license
is not GPL-3.0 and the license is incorrect. What should the license
be though?
I think there may be a different perspective to consider: Why
include the GPL text if it does not apply (or for that matter for any
license)?

A license id that is trying to convey more or less that "we included
this license text in this package but it really does not apply to
anything, so please ignore it" may not be the best approach.

Instead, what about correcting the Yocto packaging and include only
the licenses that apply to this package?
Just to be really clear, the license ID of a given specific
package *is* correct and definitive. What is unclear is the license of
the license information.

The challenge is that one software project can be split into multiple
binary packages and those binary packages can have finer grained
licenses.

For example, gcc which contains libgcc. gcc is GPL-3.0 and libgcc is
under the runtime license exception. We specifically mark the
binary packages with the correct license.

This isn't enough for some legal departments and some licenses, we have
to have the full license text somewhere. We have options:

a) Include the full license text in every binary package
b) Have a licence package per test and require each binary package to
depend on that license package
c) As per b) but have the package management or tools figure out the
dependencies if requested
d) Have a license package per piece of software containing all the
licensing texts for that piece of software.

There are pros and cons for all of these, some of the issues are very
significant, particularly in a constrained embedded system. Rightly or
wrongly, we have d) implemented today and this is consistent with what
other distros like Debian do (although they merge docs and license
info, we split them).

Also, this assumes the licenses can be split into specific individual
chunks. I suspect in some cases this is not possible.

The question is what license is that package in d) under.

If we went for one of the other approaches, we'd be able to remove
license texts that were not "active" but I suspect the implementations
are extremely complex, fragile and overkill. Its also not what we've
been asked to fix.

You also wrote:

We also put the license texts into its own package. Right now that
package is licensed as "LGPL-2.1 and GPL-3 and GPL-2", the same as
the overall license.
IMHO that's the root of the problem, you are including and mixing
licenses that may not apply and trying to convey with some id that
these included licenses may not apply.
No, we're not. We're trying to convey that the package contains license
texts under whatever license the license text is under, not binaries
under a specific license. We need an identifier that says "this is all
the license information about piece of software X". We could have a
single identifier however I think its clear that there are going to be
different licenses for different license texts (where they are even
known!).

Cheers,

Richard


Re: License of an open source license text

Philippe Ombredanne
 

Hi Richard:

On Thu, Jun 18, 2020 at 12:37 AM Richard Purdie
<richard.purdie@...> wrote:

If we set the license of the licence text package to include GPL-3.0,
the legal department blocks the release since they said "no GPL-3.0".
If you tell them its only the license text, they tell you the license
is not GPL-3.0 and the license is incorrect. What should the license
be though?
I think there may be a different perspective to consider: Why include
the GPL text if it does not apply (or for that matter for any
license)?

A license id that is trying to convey more or less that "we included
this license text in this package but it really does not apply to
anything, so please ignore it" may not be the best approach.

Instead, what about correcting the Yocto packaging and include only
the licenses that apply to this package?

You also wrote:

We also put the license texts into its own package. Right now that
package is licensed as "LGPL-2.1 and GPL-3 and GPL-2", the same as
the overall license.
IMHO that's the root of the problem, you are including and mixing
licenses that may not apply and trying to convey with some id that
these included licenses may not apply.

--
Cordially
Philippe Ombredanne


Re: License of an open source license text

Alexios Zavras
 

You might want to consider using something more general, like LicenseRef-FSF-license-text or even LicenseRef-license-text, to use the same for all license files...

-- zvr

-----Original Message-----
From: Spdx-legal@... <Spdx-legal@...> On Behalf Of Richard Purdie
Sent: Thursday, 18 June, 2020 13:20
To: Steve Winslow <swinslow@...>
Cc: SPDX-legal <Spdx-legal@...>
Subject: Re: License of an open source license text

On Wed, 2020-06-17 at 20:24 -0400, Steve Winslow wrote:
Hi Richard, thanks for the detailed explanation -- I think I
understand your use case better now.

What I'd suggest would probably be that if you do want to represent
this, one way might be to use a "LicenseRef-" identifier. This is
compatible with (and defined in) the SPDX spec, and REUSE also
includes it as the recommended way to represent licenses that aren't
on the License List. Here are a few links with more details:
https://spdx.github.io/spdx-spec/appendix-IV-SPDX-license-expressions/ (search for "LicenseRef")
https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/ (scroll to bottom)
https://reuse.software/spec/#license-files
So then you could represent it by something like
"LicenseRef-GPL-3.0-license-text", or whatever else you wanted that starts
with "LicenseRef-" and uses alphanumeric plus hyphens and periods. And then
just indicate somewhere in the documentation what that ID represents.
That way it would still be SPDX-compatible.
Thanks, so to summarise, the answer is that:

a) there are no SPDX identifiers for the license of a license text
b) there are no plans to add any
c) we can create our own namespace as mentioned above and remain
compatible

So we can use "LicenseRef-GPL-3.0-license-text" and similar and move forward from there.

If others ask, it would be good if we can at least try and use a common convention for it too. This is now in the mail archives which should help. We'll recommend this as a standard within the Yocto Project.

Thanks for the help/pointers!

Cheers,

Richard








Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


Re: License of an open source license text

Richard Purdie
 

On Wed, 2020-06-17 at 20:24 -0400, Steve Winslow wrote:
Hi Richard, thanks for the detailed explanation -- I think I
understand your use case better now.

What I'd suggest would probably be that if you do want to represent
this, one way might be to use a "LicenseRef-" identifier. This is
compatible with (and defined in) the SPDX spec, and REUSE also
includes it as the recommended way to represent licenses that aren't
on the License List. Here are a few links with more details:
https://spdx.github.io/spdx-spec/appendix-IV-SPDX-license-expressions/ (search for "LicenseRef")
https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/ (scroll to bottom)
https://reuse.software/spec/#license-files
So then you could represent it by something like
"LicenseRef-GPL-3.0-license-text", or whatever else you wanted that starts
with "LicenseRef-" and uses alphanumeric plus hyphens and periods. And
then just indicate somewhere in the documentation what that ID
represents. That way it would still be SPDX-compatible.
Thanks, so to summarise, the answer is that:

a) there are no SPDX identifiers for the license of a license text
b) there are no plans to add any
c) we can create our own namespace as mentioned above and remain
compatible

So we can use "LicenseRef-GPL-3.0-license-text" and similar and move
forward from there.

If others ask, it would be good if we can at least try and use a common
convention for it too. This is now in the mail archives which should
help. We'll recommend this as a standard within the Yocto Project.

Thanks for the help/pointers!

Cheers,

Richard


Re: License of an open source license text

Steve Winslow
 

Hi Richard, thanks for the detailed explanation -- I think I understand your use case better now.

What I'd suggest would probably be that if you do want to represent this, one way might be to use a "LicenseRef-" identifier. This is compatible with (and defined in) the SPDX spec, and REUSE also includes it as the recommended way to represent licenses that aren't on the License List. Here are a few links with more details:
https://spdx.github.io/spdx-spec/appendix-IV-SPDX-license-expressions/ (search for "LicenseRef")
https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/ (scroll to bottom)
https://reuse.software/spec/#license-files
So then you could represent it by something like "LicenseRef-GPL-3.0-license-text", or whatever else you wanted that starts with "LicenseRef-" and uses alphanumeric characters plus hyphens and periods. And then just indicate somewhere in the documentation what that ID represents. That way it would still be SPDX-compatible.
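
As a minimal sketch of the naming rule described above, the following Python check accepts identifiers that start with "LicenseRef-" followed by letters, digits, hyphens, or periods. The function name is_license_ref is hypothetical, and the pattern deliberately ignores the optional "DocumentRef-" prefix that the SPDX spec also allows:

```python
import re

# "LicenseRef-" followed by one or more letters, digits, hyphens, or periods.
# Assumption: the optional "DocumentRef-...:" prefix is out of scope here.
LICENSE_REF_RE = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def is_license_ref(identifier: str) -> bool:
    """Return True if the whole string is a well-formed LicenseRef- identifier."""
    return LICENSE_REF_RE.fullmatch(identifier) is not None
```

With this check, "LicenseRef-GPL-3.0-license-text" is accepted, while a bare "GPL-3.0-license-text" or an identifier containing spaces is rejected.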

Best,
Steve

On Wed, Jun 17, 2020 at 6:37 PM Richard Purdie <richard.purdie@...> wrote:
Hi Steve,

Thanks for the reply. It matches my first take on understanding this
situation and is what we do today. However, we're seeing some pushback
from our users and I do think they have a point to some extent. I had
hoped there was an existing solution/convention we could follow but it
appears not. I'm going to play devil's advocate and put a case for why
this matters to users and may be something SPDX needs to think about.

For better or worse there are legal departments who tell engineers that
products must contain "no GPL-3.0". I'm not getting into whether that
is a good thing to do or not; it's mandated in companies and engineers
are being asked to solve the problem. I'm picking on GPL-3.0 but it
could be any license.

In YP, a single piece of software can generate multiple binary
packages. We can put non-GPL-3.0 components in one package, the GPL-3.0
in another and the end user can select to install only the non-GPL-3.0
packages into their image. Binary packages have an associated license.
We also generate a "license text" package which contains the license
texts this piece of software is under. There is only one license text
package per piece of software; it's impractical to split those apart.

There are licenses which require the license texts to be included with
the binaries. For this reason, some legal departments require the
license text package be included in the image in all cases. They audit
their compliance by looking at the licenses of the packages installed
into a given image.

If we set the license of the licence text package to include GPL-3.0,
the legal department blocks the release since they said "no GPL-3.0".
If you tell them it's only the license text, they tell you the license
is not GPL-3.0 and the license is incorrect. What should the license
be though?

To handle this, I think YP is going to have to identify these cases as
something like "GPL-3.0-license-text". It will then be up to the end
users to decide whether they can include such a thing in their images
or not. YP has been tentatively moving toward using SPDX license
identifiers where we can. It would be nice if we could standardise
this; if not, we would just accept that YP uses SPDX as a guide but has
to handle cases like this which SPDX doesn't.

The thing I like about this approach is we have no need to make any
claim about what license "GPL-3.0-license-text" is actually under, just
that it is the license text alone.

Hopefully that gives a bit more insight into the challenges our users
are running into. I guess the question is therefore whether such a
convention is something SPDX could/should support? I'm equally open to
other ideas but I think we (YP) will have to do something one way or
another.

Cheers,

Richard



On Wed, 2020-06-17 at 17:11 -0400, Steve Winslow wrote:
> Hi Richard,
>
> Thanks for your email. A couple of thoughts, speaking just for
> myself:
>
> When it comes to the question of "what license applies to a license
> text," I think this is something that has typically been seen as
> outside the scope of the SPDX License List. The licenses on the list
> cover those used for software as well as other types of open
> collaboration (e.g. open hardware, data, etc). But I don't think the
> license list has gotten into (or has plans to get into) including
> identifiers for which licenses apply to licenses themselves.
>
> I'm not sure if I followed the specifics of the Yocto use case you
> described. I think that in most cases where I've seen folks
> associating SPDX license identifiers with files, they would generally
> just use the license that is reflected by the license text itself. So
> for instance, when seeing a file containing the text of MPL-2.0, in
> an SPDX document they would note the license for that file as MPL-2.0
> -- rather than whatever the license of the MPL-2.0 license text might
> hypothetically be. I don't know that I'm describing it well, but
> that's how I'd think of it, since that conveys the information that
> is really relevant to users of that code.
>
> Looking at REUSE (
> https://reuse.software/spec/#copyright-and-licensing-information), it
> looks to me like they take a different but similar approach, where
> license files themselves do not have meta-licensing information
> associated with them. I know there are some REUSE folks on this list
> so I hope they'll speak up if I'm mischaracterizing this.
>
> Not sure if I've answered your question... but basically I would just
> recommend associating the license's own identifier with the license
> text file, since that will be the most comprehensible to folks who
> are looking to understand the software package's license.
>
> Hope this helps,
> Steve
>
> On Tue, May 26, 2020 at 3:25 PM Richard Purdie <
> richard.purdie@...> wrote:
> > Hi,
> >
> > I work on the Yocto Project and we use SDPX identifiers when
> > working
> > with open source licenses. An issue has come up and it was
> > suggested I
> > ask about it here.
> >
> > The question is quite simple:
> >
> > Which licence are we using when we share just the license text?
> >
> > The background is more complex:
> >
> > YP has some software which is under "LGPL-2.1 and GPL-3 and GPL-2"
> > where one source file is v3, the rest are under other licenses.
> >
> > When we build that software, multiple binaries result, we group
> > them
> > into different packages and can be specific about which licences
> > each
> > binary is under. If no GPLv3 code is in there, it can be under the
> > other licenses.
> >
> > We also put the license texts into its own package. Right now that
> > package is licensed as "LGPL-2.1 and GPL-3 and GPL-2", the same as
> > the
> > overall license.
> >
> > The problem is if someone excludes GPL-3 from their images, they
> > can
> > exclude specific packages but they also exclude the license package
> > which isn't what they want.
> >
> > If the license text is under GPL-3 then this is unfortunate but we
> > could just have to tell people to live with that. If it isn't but
> > is
> > under a different license (or a subset of it), what license do we
> > put
> > down for that package? I don't believe there is no SPDX identifier
> > we
> > can use?
> >
> > To be clear, we don't want to modify the license itself but want to
> > list something in the license field of our binary package which
> > says
> > what its license is.
> >
> > Another way of putting is what is the license identifier for:
> >
> > "Everyone is permitted to copy and distribute verbatim copies of
> > this license document, but changing it is not allowed."
> >
> > (quoted from the GPL)
> >
> > Cheers,
> >
> > Richard
> >
> >
> >
> >
>
>



--
Steve Winslow
Director of Strategic Programs
The Linux Foundation


Re: License of an open source license text

Richard Purdie
 

Hi Steve,

Thanks for the reply. It matches my first take on understanding this
situation and is what we do today. However, we're seeing some pushback
from our users and I do think they have a point to some extent. I had
hoped there was an existing solution/convention we could follow but it
appears not. I'm going to play devil's advocate and put a case for why
this matters to users and may be something SPDX needs to think about.

For better or worse there are legal departments who tell engineers that
products must contain "no GPL-3.0". I'm not getting into whether that
is a good thing to do or not; it's mandated in companies and engineers
are being asked to solve the problem. I'm picking on GPL-3.0 but it
could be any license.

In YP, a single piece of software can generate multiple binary
packages. We can put non-GPL-3.0 components in one package, the GPL-3.0
in another and the end user can select to install only the non-GPL-3.0
packages into their image. Binary packages have an associated license.
We also generate a "license text" package which contains the license
texts this piece of software is under. There is only one license text
package per piece of software; it's impractical to split those apart.

There are licenses which require the license texts to be included with
the binaries. For this reason, some legal departments require the
license text package be included in the image in all cases. They audit
their compliance by looking at the licenses of the packages installed
into a given image.

If we set the license of the licence text package to include GPL-3.0,
the legal department blocks the release since they said "no GPL-3.0".
If you tell them it's only the license text, they tell you the license
is not GPL-3.0 and the license is incorrect. What should the license
be though?

To handle this, I think YP is going to have to identify these cases as
something like "GPL-3.0-license-text". It will then be up to the end
users to decide whether they can include such a thing in their images
or not. YP has been tentatively moving toward using SPDX license
identifiers where we can. It would be nice if we could standardise
this; if not, we would just accept that YP uses SPDX as a guide but has
to handle cases like this which SPDX doesn't.

The thing I like about this approach is we have no need to make any
claim about what license "GPL-3.0-license-text" is actually under, just
that it is the license text alone.
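
To illustrate why a distinct identifier helps with the "no GPL-3.0" audit described above, here is a rough Python sketch. The package names, the metadata layout, and the allowed_packages helper are all hypothetical and are not how Yocto actually stores or filters license data. The point is only that an exact-match filter on "GPL-3.0" leaves a package labelled "GPL-3.0-license-text" in place:

```python
from typing import Dict, List, Set

# Hypothetical package metadata: package name -> license identifiers.
PACKAGES: Dict[str, List[str]] = {
    "foo": ["LGPL-2.1", "GPL-2.0"],
    "foo-extras": ["GPL-3.0"],
    "foo-lic": [
        "LGPL-2.1-license-text",
        "GPL-2.0-license-text",
        "GPL-3.0-license-text",
    ],
}


def allowed_packages(packages: Dict[str, List[str]], blocked: Set[str]) -> List[str]:
    """Return the sorted names of packages that carry none of the blocked identifiers."""
    return sorted(
        name
        for name, licenses in packages.items()
        if not blocked.intersection(licenses)
    )
```

Calling allowed_packages(PACKAGES, {"GPL-3.0"}) keeps "foo" and "foo-lic" and drops only "foo-extras". Whether the license text package is acceptable remains a decision for the end user.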

Hopefully that gives a bit more insight into the challenges our users
are running into. I guess the question is therefore whether such a
convention is something SPDX could/should support? I'm equally open to
other ideas but I think we (YP) will have to do something one way or
another.

Cheers,

Richard

On Wed, 2020-06-17 at 17:11 -0400, Steve Winslow wrote:
Hi Richard,

Thanks for your email. A couple of thoughts, speaking just for
myself:

When it comes to the question of "what license applies to a license
text," I think this is something that has typically been seen as
outside the scope of the SPDX License List. The licenses on the list
cover those used for software as well as other types of open
collaboration (e.g. open hardware, data, etc). But I don't think the
license list has gotten into (or has plans to get into) including
identifiers for which licenses apply to licenses themselves.

I'm not sure if I followed the specifics of the Yocto use case you
described. I think that in most cases where I've seen folks
associating SPDX license identifiers with files, they would generally
just use the license that is reflected by the license text itself. So
for instance, when seeing a file containing the text of MPL-2.0, in
an SPDX document they would note the license for that file as MPL-2.0
-- rather than whatever the license of the MPL-2.0 license text might
hypothetically be. I don't know that I'm describing it well, but
that's how I'd think of it, since that conveys the information that
is really relevant to users of that code.

Looking at REUSE (
https://reuse.software/spec/#copyright-and-licensing-information), it
looks to me like they take a different but similar approach, where
license files themselves do not have meta-licensing information
associated with them. I know there are some REUSE folks on this list
so I hope they'll speak up if I'm mischaracterizing this.

Not sure if I've answered your question... but basically I would just
recommend associating the license's own identifier with the license
text file, since that will be the most comprehensible to folks who
are looking to understand the software package's license.

Hope this helps,
Steve

On Tue, May 26, 2020 at 3:25 PM Richard Purdie <
richard.purdie@...> wrote:
Hi,

I work on the Yocto Project and we use SPDX identifiers when working
with open source licenses. An issue has come up and it was suggested I
ask about it here.

The question is quite simple:

Which licence are we using when we share just the license text?

The background is more complex:

YP has some software which is under "LGPL-2.1 and GPL-3 and GPL-2"
where one source file is v3 and the rest are under other licenses.

When we build that software, multiple binaries result. We group them
into different packages and can be specific about which licences each
binary is under. If no GPLv3 code is in there, it can be under the
other licenses.

We also put the license texts into their own package. Right now that
package is licensed as "LGPL-2.1 and GPL-3 and GPL-2", the same as the
overall license.

The problem is if someone excludes GPL-3 from their images, they can
exclude specific packages but they also exclude the license package,
which isn't what they want.

If the license text is under GPL-3 then this is unfortunate but we
would just have to tell people to live with that. If it isn't but is
under a different license (or a subset of it), what license do we put
down for that package? I don't believe there is an SPDX identifier we
can use?

To be clear, we don't want to modify the license itself but want to
list something in the license field of our binary package which says
what its license is.

Another way of putting it is: what is the license identifier for:

"Everyone is permitted to copy and distribute verbatim copies of
this license document, but changing it is not allowed."

(quoted from the GPL)

Cheers,

Richard




Re: License of an open source license text

Russ Allbery
 

"Steve Winslow" <swinslow@...> writes:

But I don't think the license list has gotten into (or has plans to get
into) including identifiers for which licenses apply to licenses
themselves.
It might be worth noting that one reason for this is that some license
texts are not themselves released under an open source license, and thus
the license of the license text itself would not be eligible for an SPDX
identifier. This is the case for the GPLv3, for example, whose license,
as Richard noted, is:

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

This is not an open source license (fairly obviously, since it doesn't
allow any modifications), and thus is outside the scope of the SPDX
project.

If you're thinking that it feels a little odd to include a document that
is clearly not open source inside an open source project, you're not
alone, but it's been this way since before the term "open source" became
popular (the GPLv1 is under the same license), and the open source and
free software communities have essentially decided to ignore this problem
for lack of a good alternative. License texts themselves are therefore
generally considered outside the scope of any promises about the license
agreements of open source or free software.

--
Russ Allbery (eagle@...) <https://www.eyrie.org/~eagle/>


Re: License of an open source license text

Steve Winslow
 

Hi Richard,

Thanks for your email. A couple of thoughts, speaking just for myself:

When it comes to the question of "what license applies to a license text," I think this is something that has typically been seen as outside the scope of the SPDX License List. The licenses on the list cover those used for software as well as other types of open collaboration (e.g. open hardware, data, etc). But I don't think the license list has gotten into (or has plans to get into) including identifiers for which licenses apply to licenses themselves.

I'm not sure if I followed the specifics of the Yocto use case you described. I think that in most cases where I've seen folks associating SPDX license identifiers with files, they would generally just use the license that is reflected by the license text itself. So for instance, when seeing a file containing the text of MPL-2.0, in an SPDX document they would note the license for that file as MPL-2.0 -- rather than whatever the license of the MPL-2.0 license text might hypothetically be. I don't know that I'm describing it well, but that's how I'd think of it, since that conveys the information that is really relevant to users of that code.

Looking at REUSE (https://reuse.software/spec/#copyright-and-licensing-information), it looks to me like they take a different but similar approach, where license files themselves do not have meta-licensing information associated with them. I know there are some REUSE folks on this list so I hope they'll speak up if I'm mischaracterizing this.

Not sure if I've answered your question... but basically I would just recommend associating the license's own identifier with the license text file, since that will be the most comprehensible to folks who are looking to understand the software package's license.

Hope this helps,
Steve


On Tue, May 26, 2020 at 3:25 PM Richard Purdie <richard.purdie@...> wrote:
Hi,

I work on the Yocto Project and we use SPDX identifiers when working
with open source licenses. An issue has come up and it was suggested I
ask about it here.

The question is quite simple:

Which licence are we using when we share just the license text?

The background is more complex:

YP has some software which is under "LGPL-2.1 and GPL-3 and GPL-2"
where one source file is v3 and the rest are under other licenses.

When we build that software, multiple binaries result. We group them
into different packages and can be specific about which licences each
binary is under. If no GPLv3 code is in there, it can be under the
other licenses.

We also put the license texts into their own package. Right now that
package is licensed as "LGPL-2.1 and GPL-3 and GPL-2", the same as the
overall license.

The problem is if someone excludes GPL-3 from their images, they can
exclude specific packages but they also exclude the license package,
which isn't what they want.

If the license text is under GPL-3 then this is unfortunate but we
would just have to tell people to live with that. If it isn't but is
under a different license (or a subset of it), what license do we put
down for that package? I don't believe there is an SPDX identifier we
can use?

To be clear, we don't want to modify the license itself but want to
list something in the license field of our binary package which says
what its license is.

Another way of putting it is: what is the license identifier for:

"Everyone is permitted to copy and distribute verbatim copies of
this license document, but changing it is not allowed."

(quoted from the GPL)

Cheers,

Richard






--
Steve Winslow
Director of Strategic Programs
The Linux Foundation


Re: Validate license cross references: New fields to be added

Kaelbling, Michael <michael.kaelbling@...>
 

In the spirit of “any suggestions and/or modifications will be very much appreciated”, I have inserted comments below.

 

From: Spdx-legal@... <Spdx-legal@...> On Behalf Of Smith Tanjong Agbor
Sent: Wednesday, June 17, 2020 12:32
To: Spdx-tech@...; Spdx-legal@...
Cc: Gary O'Neall <gary@...>; swinslow@...
Subject: Validate license cross references: New fields to be added

 

Hi all,

 

I am working on a Google Summer of Code project that emanates from this discussion/issue concerning the validation of license cross references. Here is a link to my GSoC proposal.

The focus is on improving the LicenseListPublisher repository so that the generated license data is updated with fields on the validity of the crossref, among others.

In order to do this, the structure of the crossref shall change in some cases (e.g. JSON), and in others there shall be additional tags. In general, the following fields shall be added to the crossrefs:

 

"isValid": true/false,

Indicates whether or not the crossref URL is a valid URL (e.g. not a local file link).

Must a valid URL be based on one of only two or three schemes: http, https, and ftp? Is http://localhost/ or https://127.0.0.1 valid?
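
One possible reading of isValid can be sketched in Python as follows. This is only an assumption about the intended semantics: it allows the http, https, and ftp schemes and rejects localhost, loopback, and private addresses. The function name is_valid_crossref_url and the ALLOWED_SCHEMES set are hypothetical, and the check is purely syntactic, with no DNS lookup or network request:

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: only these schemes count as valid for a crossref URL.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_valid_crossref_url(url: str) -> bool:
    """Syntactic check only: allowed scheme, a host, and not an obviously local host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # host is a DNS name rather than an IP literal
    return not (address.is_loopback or address.is_private)
```

Under these assumptions, http://localhost/, https://127.0.0.1, and file links are all treated as invalid, while an ordinary public https URL passes.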


"isWayBackLink": true/false,

Indicates whether or not the URL is a link to a previous version (Wayback Machine) of the site where the license is located.


"extraText": true/false,

Indicates whether or not the license at the URL has extra text in its description when compared to the license description in the current file.


"isMatch": true/false,

Indicates whether or not the license at the URL matches (perfectly) the license description in the current file.

Rather than true/false, perhaps allow the name of the matched algorithm:
  • verbatim
  • noassertion – if no test result is available (for invalid links perhaps)
  • todo – no match attempted
  • “” – no match asserted
  • verbatim2 – matches with \r == \r\n == \n
  • verbatim3 – matches “ignoring whitespace differences” reflowed text
  • verbatim4 – matches ignoring decoration (comments, flower-boxes)
  • template – matches template verbatim (see ppalaga’s comment)
  • et cetera as they become available
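
A rough Python sketch of how the first few of these levels might be computed is shown below. The function name match_level is hypothetical, only verbatim, verbatim2, and verbatim3 are covered, and the exact normalization rules are assumptions rather than an agreed definition:

```python
def match_level(reference: str, candidate: str) -> str:
    """Return the name of the strictest comparison under which the two texts match."""
    if reference == candidate:
        return "verbatim"

    def unify_newlines(text: str) -> str:
        # Treat \r\n, \r, and \n as the same line ending.
        return text.replace("\r\n", "\n").replace("\r", "\n")

    if unify_newlines(reference) == unify_newlines(candidate):
        return "verbatim2"

    def collapse_whitespace(text: str) -> str:
        # Ignore reflowing: any run of whitespace counts as a single space.
        return " ".join(text.split())

    if collapse_whitespace(reference) == collapse_whitespace(candidate):
        return "verbatim3"

    return ""  # no match asserted
```

Returning the level name instead of a boolean lets a consumer decide how strict a match it requires.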

This is the url of the license text/description


"isDead": true/false

Indicates whether or not the URL is a dead link (a link that returns a response other than HTTP_200, e.g. bad request HTTP_400, forbidden HTTP_403, not found HTTP_404).

Rather than true/false (since dead sites can be reanimated), how about a date for the most recent HTTP 200 response? "dateMRHTTP200": "UTC date"
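
The idea of keeping a date instead of a boolean could be sketched in Python as follows. The helper name update_last_ok is hypothetical, and the ISO 8601 timestamp format is an assumption, since the suggestion only says "UTC date":

```python
from datetime import datetime, timezone
from typing import Optional


def update_last_ok(previous: Optional[str], status: int, checked_at: datetime) -> Optional[str]:
    """Return the new "dateMRHTTP200" value after one check of the URL.

    checked_at is expected to be a timezone-aware datetime.
    """
    if status == 200:
        # Record the time of this successful response, normalized to UTC.
        return checked_at.astimezone(timezone.utc).isoformat()
    # Any other status leaves the last known good date untouched.
    return previous
```

A link that later comes back to life simply gets a newer date, with no flag to reset.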

 

Please consider this as a proposal and any suggestions and/or modifications will be very much appreciated.

 

Thanks,

Smith

 

 


Meeting tomorrow, June 18

Steve Winslow
 

Hello all,

The next regularly-scheduled SPDX legal team meeting will be tomorrow, Thursday, June 18, at 9AM PDT / noon EDT.

I'm hoping that we can take a few minutes at the beginning of the meeting to discuss:
1) follow-up from the joint tech/legal team call last week; and
2) the GSoC work and proposal from Smith Tanjong Agbor that was shared with the mailing list earlier today, see https://lists.spdx.org/g/Spdx-legal/message/2821

After that, we will review status updates on the various open issues for 3.10.

Best,
Steve

= = = = =

Join Zoom Meeting
https://zoom.us/j/611416785

Meeting ID: 611 416 785

One tap mobile
+16465588656,,611416785# US (New York)
+16699006833,,611416785# US (San Jose)

Dial by your location
        +1 646 558 8656 US (New York)
        +1 669 900 6833 US (San Jose)
        877 369 0926 US Toll-free
        855 880 1246 US Toll-free
        +1 647 558 0588 Canada
        855 703 8985 Canada Toll-free
Meeting ID: 611 416 785
Find your local number: https://zoom.us/u/aceZFvRyln

--
Steve Winslow
Director of Strategic Programs
The Linux Foundation


Re: Validate license cross references: New fields to be added

Brad Edmondson
 

Hi Smith,

Thanks for your well-laid-out email and your GSoC proposal. Thinking about this from the perspective of the LicenseListPublisher repository, I would imagine the validity and other status of links could change over time. Links can rot, HTTP 302 redirects can differ from one day to the next, and the license text presented in HTML at a specific URL could be, and sometimes is, altered -- either with or without explicitly versioning the license. I think this necessitates some way of recording or representing validity information as of a point in time, at minimum with a lastChecked value (e.g., UTC). There may be use cases for representing validity over periods of time, for example:
  • Time-series (daily checks tagged with UTC): valid-valid-valid-invalid-invalid-valid
  • Last-known-modified: perhaps lastChecked and lastChanged, so that one could say "this was checked every week since X date and hasn't changed"
  • Other: other time-related information that tooling providers might want
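
As a sketch of how such point-in-time information might be recorded, the following Python class keeps a time series of checks and derives lastChecked and lastChanged from it. The class name CrossRefHistory and its layout are hypothetical. It also assumes that lastChanged means the start of the current unbroken run of identical results, which is only one possible interpretation:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CrossRefHistory:
    url: str
    # Each entry is (UTC timestamp, result of the validity check).
    checks: List[Tuple[str, bool]] = field(default_factory=list)

    def record(self, timestamp: str, valid: bool) -> None:
        """Append one check result to the time series."""
        self.checks.append((timestamp, valid))

    @property
    def last_checked(self) -> Optional[str]:
        """Timestamp of the most recent check, or None if never checked."""
        return self.checks[-1][0] if self.checks else None

    @property
    def last_changed(self) -> Optional[str]:
        """Timestamp at which the current run of identical results began."""
        if not self.checks:
            return None
        changed, current = self.checks[-1]
        for timestamp, valid in reversed(self.checks):
            if valid != current:
                break
            changed = timestamp
        return changed
```

For the series valid-valid-valid-invalid-invalid-valid, last_changed points at the final check, because that is where the result flipped back to valid.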

Then, I wasn't sure if isValid represented a validly formed, regex-matchable URL (which presumably could be local or, more likely, on a corporate intranet), or a URL that is both validly formed according to regex and accessible from [some place on] the global internet. In theory the latter might depend on DNS, firewall configurations, or both, which are subject to change or manipulation to e.g. mitigate DDoS, find the physically closest webserver for a CDN, or block specific IPs sending malicious traffic. When it comes down to the "bits on the wire," the server has the option whether and how to respond to a request, and the server can (and occasionally does) make its decision based on these types of connection metadata describing the "from" side of the connection.

So in theory it may make sense to include things like the source IP address of the system performing the validation attempt. That raises privacy issues, although if it came from a Linux Foundation system (or something similar), then hiding the validating system's IP address wouldn't necessarily be a requirement. So it may make sense to evaluate these kinds of contextual data points, along with clarifying in the isValid name or definition which validity check you mean for it to represent. At minimum, it's worth thinking through these things and how we would deal with the edge cases introduced by relying on DNS and HTTP to perform what is ultimately a connection-based point-in-time check.

Best,
Brad Edmondson

PS: Personally I am not in favor of SPDX tracking the validity of license-text links, but then again I am coming at this as a contributor on the SPDX-legal side of things, and not on the SPDX tech team nor a frequent user of tooling. If the tech team is happy with this idea generally, and with fully owning the process and collected data on the LicenseListPublisher side, then I would have no objection from the legal side. (Also, of course, I only represent my own view and not the official or finalized position of the legal team.)

--
Brad Edmondson, Esq.
brad.edmondson@...


On Wed, Jun 17, 2020 at 6:31 AM Smith Tanjong Agbor <stanjongagbor@...> wrote:
Hi all,

I am working on a Google Summer of Code project that emanates from this discussion/issue concerning the validation of license cross references. Here is a link to my GSoC proposal.

The focus is on improving the LicenseListPublisher repository so that the generated license data is updated with fields on the validity of the crossref, among others.

In order to do this, the structure of the crossref shall change in some cases (e.g. JSON), and in others there shall be additional tags. In general, the following fields shall be added to the crossrefs:

"isValid": true/false,
Indicates whether or not the crossref URL is a valid URL (e.g. not a local file link).

"isWayBackLink": true/false,
Indicates whether or not the URL is a link to a previous version (Wayback Machine) of the site where the license is located.

"extraText": true/false,
Indicates whether or not the license at the URL has extra text in its description when compared to the license description in the current file.

"isMatch": true/false,
Indicates whether or not the license at the URL matches (perfectly) the license description in the current file.
This is the url of the license text/description

"isDead": true/false
Indicates whether or not the URL is a dead link (a link that returns a response other than HTTP_200, e.g. bad request HTTP_400, forbidden HTTP_403, not found HTTP_404).

Please consider this as a proposal and any suggestions and/or modifications will be very much appreciated.

Thanks,
Smith



Validate license cross references: New fields to be added

Smith Tanjong Agbor
 

Hi all,

I am working on a Google Summer of Code project that emanates from this discussion/issue concerning the validation of license cross references. Here is a link to my GSoC proposal.

The focus is on improving the LicenseListPublisher repository so that the generated license data is updated with fields on the validity of the crossref, among others.

In order to do this, the structure of the crossref shall change in some cases (e.g. JSON), and in others there shall be additional tags. In general, the following fields shall be added to the crossrefs:

"isValid": true/false,
Indicates whether or not the crossref URL is a valid URL (e.g. not a local file link).

"isWayBackLink": true/false,
Indicates whether or not the URL is a link to a previous version (Wayback Machine) of the site where the license is located.

"extraText": true/false,
Indicates whether or not the license at the URL has extra text in its description when compared to the license description in the current file.

"isMatch": true/false,
Indicates whether or not the license at the URL matches (perfectly) the license description in the current file.
This is the url of the license text/description

"isDead": true/false
Indicates whether or not the URL is a dead link (a link that returns a response other than HTTP_200, e.g. bad request HTTP_400, forbidden HTTP_403, not found HTTP_404).
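
For illustration, a crossref entry carrying the proposed fields might be built and serialized as in this Python sketch. The make_crossref helper, the "url" key, and the example.org URL with its values are placeholders, not output from the LicenseListPublisher:

```python
import json
from typing import Dict, Union


def make_crossref(
    url: str,
    is_valid: bool,
    is_wayback_link: bool,
    extra_text: bool,
    is_match: bool,
    is_dead: bool,
) -> Dict[str, Union[str, bool]]:
    """Build one crossref entry using the field names from the proposal."""
    return {
        "url": url,  # assumed key name for the crossref link itself
        "isValid": is_valid,
        "isWayBackLink": is_wayback_link,
        "extraText": extra_text,
        "isMatch": is_match,
        "isDead": is_dead,
    }


# Placeholder values only; these are not real check results.
example = make_crossref("https://example.org/license.html", True, False, False, True, False)
print(json.dumps(example, indent=2))
```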

Please consider this as a proposal and any suggestions and/or modifications will be very much appreciated.

Thanks,
Smith

