explanation for ensuring no duplicate identifiers
J Lovejoy
Hi all,
As discussed on the call today (related to Issue https://github.com/spdx/license-list-XML/pull/651 ), we will add an explicit statement regarding not duplicating identifiers in the explanation of fields on the Overview page for the license list: https://spdx.org/spdx-license-list/license-list-overview

I had the task to suggest some additional language, including discussion about character type. The relevant text is as follows, with proposed changes in red:

B) License or Exception Identifier (aka "SPDX Short Identifier")
• Short identifier to be used to identify a license or exception match to licenses or exceptions contained on the SPDX License List in the context of an SPDX file, in source file, or elsewhere
• Short identifiers have no spaces in them and only use ASCII characters
• Short identifiers consist of an abbreviation based on a common short name or acronym for the license or exception
• Where applicable, the abbreviation will be followed by a dash and then the version number, in X.Y format
• Where applicable, and if possible, the short identifier should be harmonized with other well-known open source naming sources (i.e., OSI, Fedora, etc.)
• Short identifiers should be as short in length as possible while staying consistent with all other naming criteria
• Short identifiers must not be duplicative: newly added short identifiers will be checked to ensure they are different from all pre-existing short identifiers, regardless of upper/lower case

Let me know your thoughts,
Jilayne
|
Alexios Zavras
My only comment would be to change “ASCII characters” to “ASCII printable characters”.
Looking at the Overview page, it needs a little care:
-- zvr –
|
Philippe Ombredanne
Alexios:
good catch, though even printable may be too generous. A colon is printable and is not supported in a Windows file name, for instance.

Jilayne:
We could/should more simply list the allowed characters and be very specific. Here is my suggestion:

Allowed characters are ASCII:
- Lower and upper case letters from A to Z.
- Numbers from 0 to 9
- Dash '-', underscore '_', period '.' and plus '+'
- An ID first character must be a letter or number.
- Ignoring case, an ID is guaranteed to be unique forever within any published SPDX license lists.

-- Philippe

On Fri, Jun 15, 2018 at 9:12 AM, Alexios Zavras <alexios.zavras@...> wrote:
> My only comment would be to change “ASCII characters” to “ASCII printable

--
Cordially
Philippe Ombredanne
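The character rules suggested in this message can be written down as a single pattern. The following is only a sketch of that proposal as worded here, not of any published SPDX rule; the names `PROPOSED_ID` and `matches_proposal` are made up for illustration.

```python
import re

# Hypothetical pattern for the rules suggested in this message:
# the first character is an ASCII letter or digit, and the rest may be
# ASCII letters, digits, dash '-', underscore '_', period '.' or plus '+'.
PROPOSED_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._+-]*")


def matches_proposal(identifier: str) -> bool:
    """Return True if the whole string satisfies the suggested rules."""
    return PROPOSED_ID.fullmatch(identifier) is not None
```

Under this sketch a colon, a space, or a leading dash is rejected, while an identifier containing '+' or '_' is accepted.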
|
Kate Stewart
On Fri, Jun 15, 2018 at 12:25 PM, Philippe Ombredanne <pombredanne@...> wrote:
need to be a little careful here Philippe...
"+" is reserved for license expressions.
Best to stick with what's in Appendix IV of the spec today
idstring = 1*(ALPHA / DIGIT / "-" / "." ) where ALPHA and DIGIT are per definition in
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
DIGIT = %x30-39 ; 0-9
If you want to see "_" added, then probably should open an issue against the spec for 2.2 and get it consistent throughout.
Thanks, Kate
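The `idstring` grammar quoted above translates directly into a regular expression. This is a minimal sketch, assuming the ABNF exactly as quoted in this message; `IDSTRING` and `is_idstring` are illustrative names, not part of any SPDX tool.

```python
import re

# idstring = 1*(ALPHA / DIGIT / "-" / "." )
# ALPHA = A-Z / a-z and DIGIT = 0-9; "1*" means "one or more".
# The trailing '-' inside the character class is a literal dash.
IDSTRING = re.compile(r"[A-Za-z0-9.-]+")


def is_idstring(text: str) -> bool:
    """Return True if the whole string matches the quoted idstring rule."""
    return IDSTRING.fullmatch(text) is not None
```

With this pattern, '+' and '_' are both rejected, which matches the two points raised in the message.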
|
Alexios Zavras
OK, so the wording for the Overview page could be something like:
• Short identifiers
-- zvr –
|
Philippe Ombredanne
On Fri, Jun 15, 2018 at 7:51 PM, Kate Stewart <kstewart@...> wrote:

I listed this because SPDX has issued ids that contained a + in the past. But that's minor alright!

> Best to stick with what's in Appendix IV of the spec today

I do not care much for the underscore. Good catch!

--
Cordially
Philippe Ombredanne
|
W. Trevor King
On Thu, Jun 14, 2018 at 01:28:11PM -0600, J Lovejoy wrote:
> • Short identifiers must not be duplicative: newly added short

Wherever we put this commitment, I think we also want something like [1]:

  List consumers are encouraged to use the canonical identifier casing, but this uniqueness commitment ensures that case-insensitive comparison with listed identifiers will be unambiguous.

In a spec, I'd make that a SHOULD recommendation [2]. Encouraging the use of canonical casing reduces the need for a case-canonicalizer [3], by giving tools that choose not to implement a canonicalizer something to point at if/when users complain about unrecognized, non-canonical identifiers. It also makes it less likely that tools decide to change the case without thinking about downstream compatibility (e.g. [4]).

I also prefer focusing on the list state (across versions) instead of using "newly added" to focus on changes to the list state. For example, my earlier wording [5]:

  This project commits to never, in any past or future version, contain identifiers which differ only in case but have different semantics.

makes it clear that the current list is already free of case-insensitive ambiguity. With the "newly added" wording, we could already have case-insensitive ambiguous IDs and just be committing to not adding more.

Cheers,
Trevor

[1]: https://github.com/spdx/license-list-XML/pull/651/files#diff-04c6e90faac2675aa89e2176d2eec7d8R16
[2]: https://tools.ietf.org/html/rfc2119#section-3
[3]: https://github.com/spdx/spdx-spec/issues/63#issuecomment-366370691
[4]: https://github.com/benbalter/licensee/issues/282
[5]: https://github.com/spdx/license-list-XML/pull/651/files#diff-04c6e90faac2675aa89e2176d2eec7d8R14

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
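The "list state" framing in this message can be checked mechanically: pool the identifiers from every list version and look for any that differ only in case. This is a rough sketch, assuming identifiers are ASCII-only (so `str.lower()` is enough for folding); `find_case_collisions` is a hypothetical helper, not an existing SPDX tool.

```python
from collections import defaultdict


def find_case_collisions(identifiers):
    """Group identifiers that differ only in letter case.

    Returns a dict mapping each lowercased key to the sorted list of
    distinct spellings seen for it, keeping only keys with more than
    one spelling. An empty dict means the input is free of
    case-insensitive ambiguity.
    """
    spellings = defaultdict(set)
    for identifier in identifiers:
        spellings[identifier.lower()].add(identifier)
    return {
        key: sorted(names)
        for key, names in spellings.items()
        if len(names) > 1
    }
```

Passing the union of identifiers from all published versions, rather than only the newly added ones, tests the whole list state in one call.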
|
J Lovejoy
I’ve now updated the page further to remove references to the spreadsheet and a few other minor outdated items.
I also updated the field names to be more accurate and consistent with what one sees on the website. I’ve added a placeholder for Is FSF Free/Libre - but we need to come up with a description for that (working on that otherwise)

Back to the Short Identifier additions regarding characters: given the various feedback on this thread, here is an updated suggestion. Tried to separate out the non-duplicative aspect and case insensitive nature (but preference for use of case sensitive) - not sure this is the best wording however, but want to keep it concise!

B) Short Identifier
• Short identifier to be used to identify a license or exception match to licenses or exceptions contained on the SPDX License List in the context of an SPDX file, in source file, or elsewhere
• Short identifiers
• Short identifiers consist of an abbreviation based on a common short name or acronym for the license or exception
• Where applicable, the abbreviation will be followed by a dash and then the version number, in X.Y format
• Where applicable, and if possible, the short identifier should be harmonized with other well-known open source naming sources (i.e., OSI, Fedora, etc.)
• Short identifiers should be as short in length as possible while staying consistent with all other naming criteria
• Short identifiers must not be duplicative and must be different from all pre-existing short identifiers.
• While short identifiers can be treated as case insensitive, it is encouraged to use the canonical short identifier casing.

Jilayne
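The last two bullets together mean a tool may accept any casing on input while still emitting the canonical casing. A minimal sketch of such a lookup, assuming the caller supplies the canonical identifiers; `build_canonicalizer` is an illustrative name, not an existing SPDX API.

```python
def build_canonicalizer(canonical_ids):
    """Return a function mapping any casing of a known ID to its canonical form.

    Raises ValueError if two different canonical IDs differ only in case,
    since the mapping would then be ambiguous.
    """
    lookup = {}
    for canonical in canonical_ids:
        key = canonical.lower()  # identifiers are assumed to be ASCII-only
        if key in lookup and lookup[key] != canonical:
            raise ValueError(
                f"{canonical!r} and {lookup[key]!r} differ only in case"
            )
        lookup[key] = canonical

    def canonicalize(identifier):
        # Unknown identifiers yield None rather than a guess.
        return lookup.get(identifier.lower())

    return canonicalize
```

For example, a canonicalizer built from `["MIT", "Apache-2.0"]` maps `"apache-2.0"` to `"Apache-2.0"` and returns `None` for an identifier that is not on the list.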
|
|
W. Trevor King
On Thu, Jul 05, 2018 at 06:16:56PM -0600, J Lovejoy wrote:
> Back to the Short Identifier additions are: characters, given the

I'm fine with this proposal going out as you have it, but I've put a few suggestions inline in case you want to pick them up.

> • Short identifier to be used to identify a license or exception

This matches what's currently live [1], but it could probably be tightened up to something like:

  Short identifiers identify licenses and exceptions from the SPDX License List in the context of an SPDX file, a source file, or elsewhere

> • Short identifiers have no spaces in them consist of ASCII letters

I think it is sufficient to list the allowed characters:

  Short identifiers consist of ASCII letters (A-Za-z), digits (0-9), full stops (.), and hyphen/minus signs (-).

And then, if you want to draw attention to spaces in particular, add a second sentence to that list item:

  They do not contain spaces or other characters except those mentioned in the previous sentence.

> • Where applicable, the abbreviation will be followed by a dash and

This line is currently live [1], but do we need to keep it? Not all of our versions are X.Y. For example, W3C-19980720 [2] is in YYYYMMDD format. Perhaps that falls under "where applicable", but why call out one specific versioning approach?

> • Short identifiers must not be duplicative and must be different

These cover the two points I think need to get covered. So while I prefer my previously-suggested wording [3], I'm fine with this wording.

Cheers,
Trevor

[1]: https://spdx.org/spdx-license-list/license-list-overview
[2]: https://github.com/spdx/license-list-XML/blob/v3.1/src/W3C-19980720.xml
[3]: Subject: Re: explanation for ensuring no duplicate identifiers
     Date: Mon, 18 Jun 2018 11:18:34 -0700
     Message-ID: <20180618181834.GD25466@valgrind>
|
J Lovejoy
Hi all,
I’ve updated this, including one of Trevor’s additional edits below for the first bullet (the other suggestion you had was the same as I had, but my strikethrough seemed to have gotten lost in your email!)

I also added the field for the FSF Free/Libre with a description of the intent there. Considering that that field is not entirely complete, I almost put a note as such (“under construction” or the like) but then figured I’d have to remember to remove it later, so I did not add that.

See page for updates: https://spdx.org/spdx-license-list/license-list-overview

Thanks,
Jilayne
|
|