Re: Purpose of license templatization

Tom Incorvia

Hi Peter,


Bumping this up a bit conceptually.  I do agree that the timing and sophistication of tooling to support SPDX deserve additional discussion.  The comments below speak a bit more to the intent, so hopefully we will get additional discussion on this topic.


One of the intents of normalization is to move the open source community to more standard licenses with less non-material reorganization of words, single-word changes, etc. 


In prior discussions regarding template licenses like BSD, there has always been an intent to parse out items such as organization names.  Other items, like the BSD tendency to swap copyright “Holder” vs. “Owner” and other small but potentially material items would likely be dealt with by an agreement with our legal representatives and the broader SPDX community regarding what the “standard” license is (or should be), and having other variants be bumped to “non-standard” in a package and being required to list the full text.
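To make the "parse out items such as organization names" idea concrete, here is a minimal sketch. The `<<organization>>` placeholder syntax, the `TEMPLATE` text and the function names are assumptions made up for illustration; nothing here is an agreed SPDX notation. Literal text must match exactly, and only the placeholder may vary.

```python
import re

# Hypothetical template: <<name>> marks a field that may vary between
# otherwise identical licenses. This is NOT an agreed SPDX syntax.
TEMPLATE = (
    "Neither the name of <<organization>> nor the names of its "
    "contributors may be used to endorse or promote products derived "
    "from this software without specific prior written permission."
)

def template_to_regex(template):
    """Turn template text into a regex; each placeholder matches any text."""
    literal_parts = re.split(r"<<\w+>>", template)
    # Escape the literal parts so punctuation is matched exactly.
    return re.compile(r"(.+?)".join(re.escape(p) for p in literal_parts))

def matches_template(text, template=TEMPLATE):
    """True only if text equals the template apart from placeholder values."""
    return template_to_regex(template).fullmatch(text) is not None
```

With this sketch, a clause naming "the Example Project" would match, while a clause that swaps "contributors" for "authors" would not, and would be treated as non-standard.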


The expectation is that standardizing the licenses will move people in the direction of using the standard rather than the small variants.  Over time, the requirements for tool sophistication with regard to license identification will be reduced – there would essentially be a requirement for an exact match other than agreed-upon template items.  Companies that participate in SPDX or want to take advantage of SPDX will benefit by moving towards the standard.


Our “bail out” has consistently been that if the match is insufficient, the entire license text is listed.  We can get a lot of mileage out of a straightforward approach that starts with exact match minus white space, capitalization etc., then progresses to removal of agreed-upon template items, and then optionally progresses to small, agreed-upon word changes such as singular vs. plural, British use of S rather than Z, etc.  But we only do this with the full agreement of the legal team and possibly some extension into the SPDX group and open source community – the approach is conservative, with anomalies kicked out into the full text being required.
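As a rough sketch of that progression, and only under stated assumptions: the tier names, the `SPELLING` table and the choice of copyright lines as the "template item" below are all hypothetical stand-ins, since the real lists would have to be agreed with the legal team.

```python
import re

# Hypothetical spelling table; any real list would need legal sign-off.
SPELLING = {"licence": "license", "organisation": "organization"}

def collapse(text):
    """Tier 1: ignore capitalization and white space differences."""
    return " ".join(text.lower().split())

def drop_template_items(text):
    """Tier 2: remove an agreed template item (here, copyright lines)."""
    kept = [line for line in text.splitlines()
            if not line.strip().lower().startswith("copyright")]
    return "\n".join(kept)

def unify_spelling(text):
    """Tier 3: map agreed word variants to a single form."""
    return re.sub(r"[a-z]+",
                  lambda m: SPELLING.get(m.group(0), m.group(0)), text)

def match_tier(candidate, standard):
    """Return the first tier at which the texts agree, else the bail-out."""
    a, b = collapse(candidate), collapse(standard)
    if a == b:
        return "exact"
    a = collapse(drop_template_items(candidate))
    b = collapse(drop_template_items(standard))
    if a == b:
        return "template"
    if unify_spelling(a) == unify_spelling(b):
        return "word-variant"
    # Conservative default: anything else requires the full license text.
    return "full-text-required"
```

The point of the ordering is that each tier is strictly more permissive than the one before it, and anything that falls through every tier lands on the bail-out rather than on a guess.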


The net is that the reasonable alternative to tool sophistication is the progressive standardization of the licenses which should occur over time if we get the adoption that we are aiming for.


That aside, the declared license field (the field where the preparer states their interpretation of the license given the relevant facts) is an opportunity to state the license based on broader interpretations that draw on a more sophisticated tool, access to a particular individual such as the author or the open source project's attorneys, etc.


My thought is that we start with a basic approach (exact match minus white space and caps) and grow from there.  If we do this for the beta, that will give us an opportunity to pilot the approach with “friends and family”, have ongoing discussions in our group regarding the viability of the approach, and make necessary adjustments.




Tom Incorvia


Direct:  (512) 340-1336

Mobile: (408) 499 6850


-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of Peter Williams
Sent: Wednesday, March 02, 2011 10:20 PM
To: spdx-legal@...
Subject: Purpose of license templatization


During the discussion this morning regarding license templatization a
question came up regarding the exact purpose of templatization.  This
question was not answered satisfactorily so hopefully the full legal
group can answer it.


The use cases we have so far can be categorized as either ignoring
inconsequential variations (e.g., white space differences, alternate
spellings, minor grammatical differences) or ignoring very common, and
well understood, material variations (e.g., changes in the name of the
copyright holder).


Support for specifying acceptable material changes seems necessary.
Without it several of the standardized licenses will be effectively
useless because they have organization names, etc. embedded in them.
The BSD license is a prime example.


Standardizing approaches for ignoring inconsequential variations has
much lower value.  It will be extremely difficult to do well and tools
can handle this problem without a standard.  In fact, most tools
already have sophisticated techniques for recognizing licenses while
ignoring trivial variations.  Those techniques are probably superior
to the rather basic normalization mechanisms we are going to be able
to specify.  Tools are unlikely to adopt any approach suggested in the
spec because that would reduce the quality of their results.


Designing, testing and documenting even a relatively simple-minded
English-language normalization algorithm is non-trivial.  (If we need
to support any other languages that will, of course, add to the level
of effort.)  Much of the effort required to design and implement such
a normalization scheme will fall on people who are already critical
resources for the beta release of the spec.


We should seriously consider whether a license normalization algorithm is
worth the cost.  (Particularly with an eye to the opportunity costs.)
I don't think specifying how tools/people should deal with
inconsequential variations in license text is worth the effort.  Tools
will quickly evolve, or more likely have already evolved,
techniques equivalent or superior to anything we will specify.


If it does turn out that a standardized normalization mechanism is
required, it would be just as easy to implement post beta or in
version 2 as it is to implement it now.



