SPDX: license equivalence rules


Peterson, Scott K (HP Legal) <scott.k.peterson@...>
 

This comment is NOT about what the normalization should be or what equivalences should be permitted. Rather, I suggest a different approach to how we represent the result of the agreed-upon normalization/equivalence rules.

 

I suggest reconceiving "templatization" as defining "match rules".

 

Templatization seems to me to be a process of partially applying the match rules to take a step toward comparison. It is not apparent to me what the value is of giving that particular intermediate document special status.

 

I see a danger in two versions. Which contains the authoritative information? Whenever there are two, there is danger of them becoming misaligned.

 

There should be a single, canonical text. Applying the match rules against that canonical text and a candidate text would yield the authoritative answer to the question of whether the candidate text corresponds to the license represented by the canonical text.
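
A minimal sketch of that comparison in Python. The two rules shown (ignore capitalization, collapse whitespace) are assumptions chosen only for illustration, since the actual rule set was still under discussion, and the names apply_match_rules and matches_license are hypothetical:

```python
import re


def apply_match_rules(text: str) -> str:
    """Apply illustrative match rules; the real rule set is an assumption here."""
    text = text.lower()               # hypothetical rule: ignore capitalization
    text = re.sub(r"\s+", " ", text)  # hypothetical rule: collapse whitespace runs
    return text.strip()


def matches_license(canonical: str, candidate: str) -> bool:
    """Apply the same rules to both texts, then compare the results."""
    return apply_match_rules(canonical) == apply_match_rules(candidate)


canonical = "Permission is hereby granted,\nfree of charge, to any person"
candidate = "permission is hereby  granted, free of charge, to any person"
print(matches_license(canonical, candidate))
```

In this sketch no intermediate document is stored: the only inputs are the canonical text, the candidate text, and the rules.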

 

Given the rules, anyone would be free to pre-process the canonical texts into whatever sorts of intermediate versions they thought would facilitate performance of their comparison tool. Choosing a particular intermediate version seems to add unnecessary complexity.

 

-- Scott

 


Kate Stewart <kate.stewart@...>
 

Hi Scott,

On Wed, 2011-03-23 at 16:33 +0000, Peterson, Scott K (HP Legal) wrote:
> This comment is NOT about what the normalization should be or what
> equivalences should be permitted. Rather, I suggest a different
> approach to how we represent the result of the agreed upon
> normalization/equivalence rules.
>
> I suggest reconceiving "templatization" as defining "match rules".

The problem is that today the various tools have nothing to check
against to make sure that they are applying the rules correctly.

The templatized version is intended as a tool checker, that the right
substitutions can be recognized rather than as an alternate human
readable reference. The golden reference should always be what's on the
official public web site of a license - which is human readable. The
official authoritative version will be copied onto the SPDX web site
verbatim. The proposal is to have the processed version there as well,
and marked as such, so that when there are disagreements between various
tools doing license recognition and asserting the short form, they have
a common comparison point.

It's intended more like an answer sheet for a teacher administering a
test to students to know what answers are ok, and which aren't.

For instance, I believe that Daniel German (Ninka tool) and Bob
Gobeille (FOSSology tool) get together from time to time (or intended to
last I talked to them about it, last year) to talk about why their tools
don't recognize the same licenses. Having a templatized license text would
aid future tool creators (open source as well as commercial vendors) to
check that they are able to recognize a license accurately before
asserting the short form.

It is meant to illustrate what should happen when the match rules are
applied.
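
As a rough sketch of the "answer sheet" idea in Python: a tool's output for the authoritative text is compared against the published templatized text. The function check_tool, the two example tools, and the rules they apply are all hypothetical and only for illustration:

```python
import re
from typing import Callable


def check_tool(normalize: Callable[[str], str],
               authoritative: str,
               answer_sheet: str) -> bool:
    """Return True if the tool's processed text equals the published one."""
    return normalize(authoritative) == answer_sheet


def tool_a(text: str) -> str:
    # Hypothetical tool that lowercases and collapses whitespace.
    return re.sub(r"\s+", " ", text.lower()).strip()


def tool_b(text: str) -> str:
    # Hypothetical tool that forgets the whitespace rule.
    return text.lower().strip()


authoritative = "Redistribution and use  in SOURCE and binary forms"
answer_sheet = "redistribution and use in source and binary forms"

print(check_tool(tool_a, authoritative, answer_sheet))
print(check_tool(tool_b, authoritative, answer_sheet))
```

A tool that disagrees with the answer sheet, like tool_b here, has applied the rules differently from the reference.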


> Templatization seems to me to be a process of partially applying the
> match rules to take a step toward comparison. It is not apparent to me
> what the value is of giving that particular intermediate document
> special status.

As a check for the tools, and to build confidence that the match rules
the spdx-legal team is comfortable with are applied consistently.


> I see a danger in two versions. Which contains the authoritative
> information? Whenever there are two, there is danger of them becoming
> misaligned.

The authoritative version is the version on the project's public web
site. In some cases the OSI site has a copy and is used as the
authoritative version though.

We copy that version onto the SPDX web page for convenience, along with
the link to the authoritative public site we get it from.

On the SPDX web page, we'll also be adding the "templatized" version as
a convenience, after the rules have been applied to the original
authoritative version, so folks can see what applying the match rules
yields ("the answer sheet" for the test, to continue with my earlier
analogy).


> There should be a single, canonical text. Applying the match rules
> against that canonical text and a candidate text would yield the
> authoritative answer to the question of whether the candidate text
> corresponds to the license represented by the canonical text.

The single canonical text will be copied verbatim (spaces,
capitalization, etc.) intact from the authoritative web site for that
license.

The templatized version is just the result of applying the match rules
to the authoritative version.

We should definitely take care to make this VERY clear on the web site.


> Given the rules, anyone would be free to pre-process the canonical
> texts into whatever sorts of intermediate versions they thought would
> facilitate performance of their comparison tool. Choosing a particular
> intermediate version seems to add unnecessary complexity.

See comments above.

Kate


Peter Williams <peter.williams@...>
 

I think having some examples of text with the normalization rules applied is a good idea. However, those examples should be in the spec. Having to go to the registry to see examples will make it harder to implement the normalization algorithm.

If the only use of the normalized text for standard licenses is for example purposes, I don't think we really need to do all the licenses. Not having the normalized text in the registry would make its design easier. (The versioning issues are particularly non-trivial.)

Peter
Openlogic.com
