Re: Purpose of license templatization
Tom Incorvia
Hi Peter,
Bumping this up a bit conceptually. I do agree that the timing and sophistication of tooling to support SPDX deserve additional discussion. The comments below speak a bit more to the intent, so hopefully we will get additional discussion on this topic.
One of the intents of normalization is to move the open source community to more standard licenses with less non-material reorganization of words, single-word changes, etc.
In prior discussions regarding template licenses like BSD, there has always been an intent to parse out items such as organization names. Other items, like the BSD tendency to swap copyright “Holder” vs. “Owner” and other small but potentially material items, would likely be dealt with by an agreement with our legal representatives and the broader SPDX community regarding what the “standard” license is (or should be). Other variants would be bumped to “non-standard” in a package and would be required to list the full text.
The expectation is that standardizing the licenses will move people in the direction of using the standard rather than the small variants. Over time, the requirements for tool sophistication with regard to license identification will be reduced: there would essentially be a requirement for an exact match other than agreed-upon template items. Companies that participate in SPDX or want to take advantage of SPDX will benefit by moving towards the standard.
Our “bail out” has consistently been that if the match is insufficient, the entire license text is listed. We can get a lot of mileage out of a straightforward approach that starts with exact match minus white space, capitalization, etc., then progresses to removal of agreed-upon template items, and then optionally progresses to small, agreed-upon word changes such as singular vs. plural, British use of S rather than Z, etc. But we only do this with the full agreement of the legal team and possibly some extension into the SPDX group and open source community. The approach is conservative, with anomalies kicked out into the full text being required.
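To make the progression above concrete, here is a minimal sketch in Python. It is only an illustration, not an agreed SPDX algorithm: the function names, the "<item>" placeholder, and the word-equivalence table are all hypothetical, and the real list of acceptable variations would have to come from the legal team.

```python
import re

# Hypothetical equivalences; the real list would need legal team agreement.
EQUIVALENT_WORDS = {
    "licence": "license",
    "organisation": "organization",
    "holders": "holder",
}


def normalize_basic(text):
    """Level 1: ignore capitalization and collapse all white space."""
    return " ".join(text.lower().split())


def strip_template_items(text, items):
    """Level 2: replace agreed-upon template items with a fixed placeholder."""
    for item in items:
        value = normalize_basic(item)
        if value:  # skip empty items, which would match everywhere
            text = text.replace(value, "<item>")
    return text


def apply_word_equivalences(text):
    """Level 3 (optional): map agreed-upon word variants to one form."""
    def swap(match):
        word = match.group(0)
        return EQUIVALENT_WORDS.get(word, word)
    return re.sub(r"[a-z]+", swap, text)


def match_level(candidate, standard, candidate_items=(), standard_items=()):
    """Return the first level at which the two texts match."""
    a, b = normalize_basic(candidate), normalize_basic(standard)
    if a == b:
        return "exact"
    a = strip_template_items(a, candidate_items)
    b = strip_template_items(b, standard_items)
    if a == b:
        return "template"
    if apply_word_equivalences(a) == apply_word_equivalences(b):
        return "word-variant"
    # The conservative bail out: list the entire license text.
    return "full-text-required"
```

Anything that does not match at one of the three levels falls through to the bail out, which keeps the approach conservative: the tool never has to guess whether a difference is material.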
The net is that the reasonable alternative to tool sophistication is the progressive standardization of the licenses which should occur over time if we get the adoption that we are aiming for.
That aside, the declared license field (the field where the preparer states their interpretation of the license given the relevant facts) is an opportunity to state the license based on broader interpretations that draw on a more sophisticated tool, access to a particular individual such as the author or the open source project's attorneys, etc.
My thought is that we start with a basic approach (exact match minus white space and caps) and grow from there. If we do this for the beta, that will give us an opportunity to pilot the approach with “friends and family”, have ongoing discussions in our group regarding the viability of the approach, and make necessary adjustments.
Tom
Tom Incorvia Direct: (512) 340-1336 Mobile: (408) 499 6850
-----Original Message-----
During the discussion this morning regarding license templatization a question came up regarding the exact purpose of templatization. This question was not answered satisfactorily, so hopefully the full legal group can answer it.
The use cases we have so far can be categorized as either ignoring inconsequential variations (e.g., white space differences, alternate spellings, minor grammatical differences) or ignoring very common, and well understood, material variations (e.g., changes in the name of the copyright holder).
Support for specifying acceptable material changes seems necessary. Without it several of the standardized licenses will be effectively useless because they have organization names, etc. embedded in them. The BSD license is a prime example.
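As a rough illustration of what "specifying acceptable material changes" could look like, the sketch below turns a template with named placeholders into a regular expression and reports what a candidate text filled in. The "<ORGANIZATION>" notation and the function names are assumptions made for this example, not syntax defined by the spec, and the clause is an abbreviated BSD-style sentence.

```python
import re

# Hypothetical placeholder notation: an upper-case name in angle brackets.
PLACEHOLDER = re.compile(r"<([A-Z ]+)>")


def literal_pattern(text):
    """Escape fixed template text, allowing any white space between words."""
    words = text.split()
    if not words:
        return r"\s+" if text else ""
    pattern = r"\s+".join(re.escape(word) for word in words)
    if text[0].isspace():
        pattern = r"\s+" + pattern
    if text[-1].isspace():
        pattern = pattern + r"\s+"
    return pattern


def template_to_regex(template):
    """Build a regex in which each placeholder accepts any non-empty text."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        parts.append(literal_pattern(template[pos:match.start()]))
        parts.append("(.+?)")
        pos = match.end()
    parts.append(literal_pattern(template[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def match_template(template, candidate):
    """Return the filled-in values, or None if the fixed text differs."""
    match = template_to_regex(template).fullmatch(candidate.strip())
    if match is None:
        return None
    return [value.strip() for value in match.groups()]


BSD_CLAUSE = ("Neither the name of <ORGANIZATION> nor the names of its "
              "contributors may be used.")
```

With this sketch, `match_template(BSD_CLAUSE, text)` returns the organization name when only that name differs from the template, and `None` when any of the fixed wording has changed.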
Standardizing approaches for ignoring inconsequential variations has much lower value. It will be extremely difficult to do well and tools can handle this problem without a standard. In fact, most tools already have sophisticated techniques for recognizing licenses while ignoring trivial variations. Those techniques are probably superior to the rather basic normalization mechanisms we are going to be able to specify. Tools are unlikely to adopt any approach suggested in the spec because that would reduce the quality of their results.
Designing, testing and documenting even a relatively simple-minded English-language normalization algorithm is non-trivial. (If we need to support any other languages that will, of course, add to the level of effort.) Much of the effort required to design and implement such a normalization scheme will fall on people who are already critical resources for the beta release of the spec.
We should seriously consider whether a license normalization algorithm is worth the cost. (Particularly with an eye to the opportunity costs.) I don't think specifying how tools/people should deal with inconsequential variations in license text is worth the effort. Tools will quickly evolve, or more likely have already evolved, techniques equivalent or superior to anything we will specify.
If it does turn out that a standardized normalization mechanism is required, it would be just as easy to implement post-beta or in version 2 as it is to implement it now.
Peter openlogic.com