Purpose of license templatization


Peter Williams <peter.williams@...>
 

During the discussion this morning regarding license templatization a
question came up regarding the exact purpose of templatization. This
question was not answered satisfactorily, so hopefully the full legal
group can answer it.

The use cases we have so far fall into two categories: ignoring
inconsequential variations (e.g., white space differences, alternate
spellings, minor grammatical differences) and ignoring very common,
well-understood material variations (e.g., changes in the name of the
copyright holder).

Support for specifying acceptable material changes seems necessary.
Without it, several of the standardized licenses will be effectively
useless because they have organization names, etc. embedded in them.
The BSD license is a prime example.
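
As a rough illustration of the material-variation case, the sketch below matches text against a license template in which marked fields may vary and everything else must match exactly. The `{{field}}` placeholder syntax, the function names, and the shortened BSD-style clause are all assumptions made for this example; nothing here is taken from the spec.

```python
import re

# Hypothetical placeholder syntax: {{field}} marks text that is allowed to vary.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z_]\w*)\}\}")


def template_to_regex(template):
    """Build a regex in which everything outside placeholders must match exactly."""
    parts, pos, seen = [], 0, set()
    for m in PLACEHOLDER.finditer(template):
        # Literal text between placeholders is escaped so it matches verbatim.
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            # A repeated field must carry the same value as its first occurrence.
            parts.append(f"(?P={name})")
        else:
            parts.append(f"(?P<{name}>.+?)")
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def match_template(template, text):
    """Return the captured field values, or None if the fixed text differs."""
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None


# Shortened, illustrative clause modeled on the BSD license (not the full text).
BSD_CLAUSE = ("Neither the name of {{organization}} nor the names of its "
              "contributors may be used to endorse or promote products.")

print(match_template(
    BSD_CLAUSE,
    "Neither the name of Example Corp nor the names of its "
    "contributors may be used to endorse or promote products."))
```

Any change outside the placeholder, even a dropped word, makes `match_template` return `None`, so this handles only the agreed material variations and says nothing about inconsequential ones.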

Standardizing approaches for ignoring inconsequential variations has
much lower value. It will be extremely difficult to do well and tools
can handle this problem without a standard. In fact, most tools
already have sophisticated techniques for recognizing licenses while
ignoring trivial variations. Those techniques are probably superior
to the rather basic normalization mechanisms we are going to be able
to specify. Tools are unlikely to adopt any approach suggested in the
spec because that would reduce the quality of their results.

Designing, testing, and documenting even a relatively simple-minded
English-language normalization algorithm is non-trivial. (If we need
to support any other languages, that will, of course, add to the level
of effort.) Much of the effort required to design and implement such
a normalization scheme will fall on people who are already critical
resources for the beta release of the spec.

We should seriously consider whether a license normalization algorithm is
worth the cost (particularly with an eye to the opportunity costs).
I don't think specifying how tools/people should deal with
inconsequential variations in license text is worth the effort. Tools
will quickly evolve, or more likely have already evolved,
techniques equivalent or superior to anything we will specify.

If it does turn out that a standardized normalization mechanism is
required, it would be just as easy to implement post-beta or in
version 2 as it is to implement it now.

Peter
openlogic.com


Tom Incorvia
 

Hi Peter,

 

Bumping this up a bit conceptually.  I do agree that the timing and sophistication of tooling to support SPDX deserves additional discussion.  The comments below speak a bit more to the intent, so hopefully we will get additional discussion on this topic.   

 

One of the intents of normalization is to move the open source community to more standard licenses with less non-material reorganization of words, single-word changes, etc. 

 

In prior discussions regarding template licenses like BSD, there has always been an intent to parse out items such as organization names.  Other small but potentially material items, like the BSD tendency to swap copyright “Holder” and “Owner”, would likely be handled by agreeing with our legal representatives and the broader SPDX community on what the “standard” license is (or should be).  Other variants would be bumped to “non-standard” in a package, and the full text would have to be listed.

 

The expectation is that standardizing the licenses will move people in the direction of using the standard rather than the small variants.  Over time, the requirements for tool sophistication with regard to license identification will be reduced; there would essentially be a requirement for an exact match other than agreed-upon template items.  Companies that participate in SPDX or want to take advantage of SPDX will benefit by moving towards the standard.

 

Our “bail out” has consistently been that if the match is insufficient, the entire license text is listed.  We can get a lot of mileage out of a straightforward approach that starts with an exact match minus white space, capitalization, etc., then progresses to removal of agreed-upon template items, and then optionally progresses to small, agreed-upon word changes such as singular vs. plural, British use of S rather than Z, etc.  But we only do this with the full agreement of the legal team and possibly some extension into the SPDX group and open source community; the approach is conservative, with anomalies falling back to the full text being required.
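
The staged approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the result labels, and the tiny `AGREED_EQUIVALENTS` table are hypothetical, and the template-item stage is left out to keep the example short.

```python
import re

# Hypothetical table of agreed word-level equivalents; real entries would
# need sign-off from the legal team.
AGREED_EQUIVALENTS = {"licence": "license", "organisation": "organization"}

WHITESPACE_CASE = "match: white space and capitalization only"
AGREED_WORDS = "match: agreed word variants"
FULL_TEXT = "no match: list the full license text"


def normalize_basic(text):
    """Lowercase the text and collapse every run of white space to one space."""
    return " ".join(text.lower().split())


def normalize_words(text):
    """Apply basic normalization, then map agreed variants to one spelling."""
    basic = normalize_basic(text)
    return re.sub(r"[a-z]+",
                  lambda m: AGREED_EQUIVALENTS.get(m.group(0), m.group(0)),
                  basic)


def classify(candidate, standard):
    """Try the most conservative stage first; fall back to the full text."""
    if normalize_basic(candidate) == normalize_basic(standard):
        return WHITESPACE_CASE
    if normalize_words(candidate) == normalize_words(standard):
        return AGREED_WORDS
    return FULL_TEXT


print(classify("Permission  is hereby\nGRANTED", "Permission is hereby granted"))
print(classify("the organisation grants a licence",
               "The organization grants a license"))
print(classify("grants an license", "grants a license"))
```

In this sketch any difference not covered by a stage, such as a changed article, ends in the full-text fallback, which is the conservative behavior described above.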

 

The net is that the reasonable alternative to tool sophistication is the progressive standardization of the licenses, which should occur over time if we get the adoption that we are aiming for.

 

That aside, the declared license field (the field where the preparer states their interpretation of the license given the relevant facts) is an opportunity to state the license based on a broader interpretation, one that may draw on a more sophisticated tool, access to a particular individual such as the author or the open source project's attorneys, etc.

 

My thought is that we start with a basic approach (exact match minus white space and caps) and grow from there.  If we do this for the beta, that will give us an opportunity to pilot the approach with “friends and family”, have ongoing discussions in our group regarding the viability of the approach, and make necessary adjustments.

 

Tom

 

Tom Incorvia

tom.incorvia@...

Direct:  (512) 340-1336

Mobile: (408) 499 6850

 

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of Peter Williams
Sent: Wednesday, March 02, 2011 10:20 PM
To: spdx-legal@...
Subject: Purpose of license templatization

 



Peter Williams <peter.williams@...>
 

On Thu, Mar 3, 2011 at 5:31 AM, Tom Incorvia
<tom.incorvia@...> wrote:

> One of the intents of normalization is to move the open source community to
> more standard licenses with less non-material reorganization of words,
> single-word changes, etc.

I like the goal, but I don't see how standardizing a license
normalization algorithm designed to remove non-material variations
advances this goal.

> In prior discussions regarding template licenses like BSD, there has always
> been an intent to parse out items such as organization names.

I think everyone agrees we need to handle this case.

> The expectation is that standardizing the licenses will move people
> in the direction of using the standard rather than the small variants.  Over
> time, the requirements for tool sophistication with regard to license
> identification will be reduced; there would essentially be a requirement
> for an exact match other than agreed-upon template items.  Companies that
> participate in SPDX or want to take advantage of SPDX will benefit by moving
> towards the standard.

Having a registry of licenses with standardized text (particularly
good, well-formatted, cut-and-pastable text) may increase the use of
those licenses. However, I don't think we can assume that license
variations will ever be reduced to the point that users will be able to do
without sophisticated matching. Even if that does happen someday, it
is not the world we live in today.

> The net is that the reasonable alternative to tool sophistication is the
> progressive standardization of the licenses, which should occur over time if
> we get the adoption that we are aiming for.

Maybe. Personally, I doubt we will ever see the day when there are
fewer license variations than there are now.

> My thought is that we start with a basic approach (exact match minus white
> space and caps) and grow from there.  If we do this for the beta, that
> will give us an opportunity to pilot the approach with “friends and family”,
> have ongoing discussions in our group regarding the viability of the
> approach, and make necessary adjustments.

If we did this, what would a non-match due to a non-material variation
mean? (Say some "a"s were switched to "an"s.)

Could I, as a human, look at it and decide that it is clearly one of
the standard licenses and so produce an SPDX file in which the license
in file and the concluded license both reference the standard license?
Or would I be forced to use a non-standard license (even though it is
clearly one of the standard licenses)?


Peter
openlogic.com