explanation for ensuring no duplicate identifiers


J Lovejoy
 

Hi all,

As discussed on the call today (related to Issue https://github.com/spdx/license-list-XML/pull/651 ), we will add an explicit statement regarding not duplicating identifiers in the explanation of fields on the Overview page for the license list: https://spdx.org/spdx-license-list/license-list-overview

I had the task to suggest some additional language, including discussion about character type.  The relevant text is as follows, with proposed changes in red:

B) License or Exception Identifier (aka "SPDX Short Identifier")

• Short identifier to be used to identify a license or exception match to licenses or exceptions contained on the SPDX License List in the context of an SPDX file, in source file, or elsewhere
• Short identifiers have no spaces in them and only use ASCII characters
• Short identifiers consist of an abbreviation based on a common short name or acronym for the license or exception
• Where applicable, the abbreviation will be followed by a dash and then the version number, in X.Y format
• Where applicable, and if possible, the short identifier should be harmonized with other well-known open source naming sources (e.g., OSI, Fedora, etc.)
• Short identifiers should be as short in length as possible while staying consistent with all other naming criteria
• Short identifiers must not be duplicative: newly added short identifiers will be checked to ensure they are different from all pre-existing short identifiers, regardless of upper/lower case


Let me know your thoughts,

Jilayne

SPDX Legal Team co-lead
opensource@...



Alexios Zavras
 

My only comment would be to change “ASCII characters” to “ASCII printable characters”.

 

Looking at the Overview page, it needs a little care:

  • It documents “Is OSI approved?” but not “Is FSF Free/Libre?”
  • It references the “spreadsheet” in a couple of places

 

 

-- zvr –

 


 

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


Philippe Ombredanne
 

Alexios:
good catch, though even printable may be too generous. A colon is
printable but is not supported in a Windows file name, for instance.

Jilayne:
We could/should more simply list the allowed characters and be very specific.
Here is my suggestion:

Allowed characters are ASCII:
- Lower and upper case letters from A to Z.
- Numbers from 0 to 9
- Dash '-', underscore '_', period '.' and plus '+'

- An ID first character must be a letter or number.
- Ignoring case, an ID is guaranteed to be unique forever within any
published SPDX license lists.
--
Philippe




--
Cordially
Philippe Ombredanne


Kate Stewart
 



On Fri, Jun 15, 2018 at 12:25 PM, Philippe Ombredanne <pombredanne@...> wrote:

Allowed characters are ASCII:
- Lower and upper case letters from A to Z.
- Numbers from 0 to 9
- Dash '-', underscore '_',  period '.' and plus '+'

need to be a little careful here Philippe...

"+" is reserved for license expressions. 

Best to stick with what's in Appendix IV of the spec today

idstring              = 1*(ALPHA / DIGIT / "-" / "." )

where ALPHA and DIGIT are per definition in https://tools.ietf.org/html/rfc5234

 ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
 DIGIT          =  %x30-39             ; 0-9


If you want to see "_" added, then you should probably open an issue
against the spec for 2.2 and get it consistent throughout.

Thanks, Kate
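
For illustration, the idstring grammar quoted above can be sketched as a regular expression. This is only a minimal sketch in Python; the names IDSTRING_RE and is_valid_idstring are hypothetical and are not part of the spec or of any SPDX tooling.

```python
import re

# idstring = 1*(ALPHA / DIGIT / "-" / "." )
# ALPHA and DIGIT are the ASCII ranges A-Z / a-z and 0-9.
# IDSTRING_RE is a hypothetical name used only for this sketch.
IDSTRING_RE = re.compile(r"[A-Za-z0-9.-]+")


def is_valid_idstring(candidate: str) -> bool:
    """Return True if candidate is non-empty and uses only the allowed characters."""
    return IDSTRING_RE.fullmatch(candidate) is not None


print(is_valid_idstring("Apache-2.0"))   # True
print(is_valid_idstring("My_License"))   # False: "_" is not in the grammar
print(is_valid_idstring("GPL-2.0+"))     # False: "+" is reserved for license expressions
```

Under this pattern, an identifier containing "_" or "+" is rejected, which matches the point that those characters would need a change to the spec before they could be allowed.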
 


Alexios Zavras
 

OK, so the wording for the Overview page could be something like:

 

               • Short identifiers have no spaces in them consist of ASCII letters (A-Za-z), digits (0-9), full stops (.) and hyphen or minus signs (-)

 

 

 

-- zvr –

 



Philippe Ombredanne
 

On Fri, Jun 15, 2018 at 7:51 PM, Kate Stewart
<kstewart@...> wrote:



need to be a little careful here Philippe...

"+" is reserved for license expressions.

I listed this because SPDX has issued ids that contained a + in the past.
But that's minor alright!

Best to stick with what's in Appendix IV of the spec today

idstring = 1*(ALPHA / DIGIT / "-" / "." )

where ALPHA and DIGIT are per definition in
https://tools.ietf.org/html/rfc5234

ALPHA = %x41-5A / %x61-7A ; A-Z / a-z

DIGIT = %x30-39 ; 0-9



If you want to see "_" added, then probably should open an issue
against the spec for 2.2 and get it consistent throughout.

I do not care much for the underscore. Good catch!

--
Cordially
Philippe Ombredanne


W. Trevor King
 

On Thu, Jun 14, 2018 at 01:28:11PM -0600, J Lovejoy wrote:
• Short identifiers must not be duplicative: newly added short
identifiers will be checked to ensure they are different from all
pre-existing short identifiers, regardless of upper/lower case

Wherever we put this commitment, I think we also want something like
[1]:

List consumers are encouraged to use the canonical identifier casing,
but this uniqueness commitment ensures that case-insensitive
comparison with listed identifiers will be unambiguous.

In a spec, I'd make that a SHOULD recommendation [2]. Encouraging the
use of canonical casing reduces the need for a case-canonicalizer [3],
by giving tools that choose not to implement a canonicalizer something
to point at if/when users complain about unrecognized, non-canonical
identifiers. It also makes it less likely that tools decide to change
the case without thinking about downstream compatibility (e.g. [4]).

I also prefer focusing on the list state (across versions) instead of
using "newly added" to focus on changes to the list state. For
example, my earlier wording [5]:

This project commits to never, in any past or future version,
contain identifiers which differ only in case but have different
semantics.

makes it clear that the current list is already free of case-insensitive
ambiguity. With the "newly added" wording, we could already have
case-insensitive ambiguous IDs and just be committing to not adding
more.

Cheers,
Trevor

[1]: https://github.com/spdx/license-list-XML/pull/651/files#diff-04c6e90faac2675aa89e2176d2eec7d8R16
[2]: https://tools.ietf.org/html/rfc2119#section-3
[3]: https://github.com/spdx/spdx-spec/issues/63#issuecomment-366370691
[4]: https://github.com/benbalter/licensee/issues/282
[5]: https://github.com/spdx/license-list-XML/pull/651/files#diff-04c6e90faac2675aa89e2176d2eec7d8R14
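
As a rough illustration of the list-state commitment described above, a check over a whole list (rather than over newly added entries only) could look like the following. This is a sketch under assumptions: the function name case_insensitive_collisions is hypothetical, and the lower-case "mit" entry in the sample is made up to trigger a collision.

```python
from collections import defaultdict


def case_insensitive_collisions(identifiers):
    """Return groups of identifiers that differ only in case.

    The result maps the lower-cased form to the sorted distinct spellings,
    and only includes forms that have more than one spelling.
    """
    groups = defaultdict(set)
    for identifier in identifiers:
        # Identifiers are ASCII-only, so str.lower() is enough for folding.
        groups[identifier.lower()].add(identifier)
    return {
        folded: sorted(spellings)
        for folded, spellings in groups.items()
        if len(spellings) > 1
    }


# The lower-case "mit" entry is a made-up example, not a real list entry.
print(case_insensitive_collisions(["MIT", "Apache-2.0", "mit"]))
# {'mit': ['MIT', 'mit']}
```

An empty result means case-insensitive comparison against the list is unambiguous.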

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


J Lovejoy
 

I’ve now updated the page further to remove references to the spreadsheet and a few other minor outdated items.

I also updated the field names to be more accurate and consistent with what one sees on the website.

I’ve added a placeholder for Is FSF Free/Libre - but we need to come up with a description for that (working on that otherwise)


Back to the Short Identifier additions re: characters: given the various feedback on this thread, here is an updated suggestion. I tried to separate out the non-duplicative aspect from the case-insensitive nature (but with a preference for case-sensitive use). I'm not sure this is the best wording, but I want to keep it concise!

B) Short Identifier

• Short identifier to be used to identify a license or exception match to licenses or exceptions contained on the SPDX License List in the context of an SPDX file, in source file, or elsewhere
• Short identifiers have no spaces in them consist of ASCII letters (A-Za-z), digits (0-9), full stops (.) and hyphen or minus signs (-)
• Short identifiers consist of an abbreviation based on a common short name or acronym for the license or exception
• Where applicable, the abbreviation will be followed by a dash and then the version number, in X.Y format
• Where applicable, and if possible, the short identifier should be harmonized with other well-known open source naming sources (e.g., OSI, Fedora, etc.)
• Short identifiers should be as short in length as possible while staying consistent with all other naming criteria
• Short identifiers must not be duplicative and must be different from all pre-existing short identifiers.
• While short identifiers can be treated as case insensitive, it is encouraged to use the canonical short identifier casing.
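
To illustrate the last two bullets: because no two listed identifiers differ only in case, a tool can accept any casing and map it back to the canonical one. The following is only a sketch; build_canonicalizer and canonicalize are hypothetical names, and the three identifiers stand in for the full list.

```python
def build_canonicalizer(listed_identifiers):
    """Build a lookup from any casing of a listed identifier to its canonical casing."""
    # Safe only because the list guarantees case-insensitive uniqueness;
    # otherwise later entries would silently overwrite earlier ones.
    lookup = {identifier.lower(): identifier for identifier in listed_identifiers}

    def canonicalize(candidate):
        """Return the canonical identifier, or None if the candidate is not listed."""
        return lookup.get(candidate.lower())

    return canonicalize


# A small sample standing in for the full list.
canonicalize = build_canonicalizer(["MIT", "Apache-2.0", "W3C-19980720"])
print(canonicalize("apache-2.0"))    # Apache-2.0
print(canonicalize("mit"))           # MIT
print(canonicalize("unknown-1.0"))   # None
```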



Jilayne




W. Trevor King
 

On Thu, Jul 05, 2018 at 06:16:56PM -0600, J Lovejoy wrote:
Back to the Short Identifier additions re: characters, given the
various feedback on this thread, here is an updated suggestion.

I'm fine with this proposal going out as you have it, but I've put a
few suggestions inline in case you want to pick them up.

• Short identifier to be used to identify a license or exception
match to licenses or exceptions contained on the SPDX License List
in the context of an SPDX file, in source file, or elsewhere

This matches what's currently live [1], but it could probably be
tightened up to something like:

Short identifiers identify licenses and exceptions from the SPDX
License List in the context of an SPDX file, a source file, or
elsewhere

• Short identifiers have no spaces in them consist of ASCII letters
(A-Za-z), digits (0-9), full stops (.) and hyphen or minus signs
(-)

I think it is sufficient to list the allowed characters:

Short identifiers consist of ASCII letters (A-Za-z), digits (0-9),
full stops (.), and hyphen/minus signs (-).

And then, if you want to draw attention to spaces in particular, add a
second sentence to that list item:

They do not contain spaces or other characters except those
mentioned in the previous sentence.

• Where applicable, the abbreviation will be followed by a dash and
then the version number, in X.Y format

This line is currently live [1], but do we need to keep it? Not all
of our versions are X.Y. For example, W3C-19980720 [2] is in YYYYMMDD
format. Perhaps that falls under "where applicable", but why call out
one specific versioning approach?

• Short identifiers must not be duplicative and must be different
from all pre-existing short identifiers.
• While short identifiers can be treated as case insensitive, it is
encouraged to use the canonical short identifier casing.

These cover the two points I think need to get covered. So while I
prefer my previously-suggested wording [3], I'm fine with this
wording.

Cheers,
Trevor

[1]: https://spdx.org/spdx-license-list/license-list-overview
[2]: https://github.com/spdx/license-list-XML/blob/v3.1/src/W3C-19980720.xml
[3]: Subject: Re: explanation for ensuring no duplicate identifiers
Date: Mon, 18 Jun 2018 11:18:34 -0700
Message-ID: <20180618181834.GD25466@valgrind>
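
The point about version suffixes can be made concrete with a small sketch that classifies the trailing part of an identifier. The patterns and the name version_style are assumptions made for this illustration; they are not a rule from the Overview page.

```python
import re

# Hypothetical patterns for this sketch: a trailing "-X.Y" and a trailing "-YYYYMMDD".
X_Y_SUFFIX = re.compile(r"-(\d+\.\d+)$")
DATE_SUFFIX = re.compile(r"-(\d{8})$")


def version_style(identifier: str) -> str:
    """Classify the version suffix of an identifier as 'X.Y', 'YYYYMMDD', or 'other'."""
    if X_Y_SUFFIX.search(identifier):
        return "X.Y"
    if DATE_SUFFIX.search(identifier):
        return "YYYYMMDD"
    return "other"


print(version_style("Apache-2.0"))     # X.Y
print(version_style("W3C-19980720"))   # YYYYMMDD
print(version_style("MIT"))            # other
```

A bullet that names only the X.Y format describes the first case and leaves the other two to "where applicable".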



J Lovejoy
 

Hi all,

I’ve updated this, including one of Trevor’s additional edits, for the first bullet (the other suggestion you had was the same as mine, but my strikethrough seems to have gotten lost in your email!)

I also added the field for the FSF Free/Libre with a description of the intent there. Considering that that field is not entirely complete, I almost put a note as such (“under construction” or the like) but then figured I’d have to remember to remove it later, so I did not add that.


Thanks,
Jilayne

SPDX Legal Team co-lead
opensource@...

