Re: Mismatches between OSI and SPDX
J Lovejoy
Hi again Max, and thanks Richard for filling in on some of this.

Max - While you are not the first person who has asked about some of these, you might be the first person to have done such a thorough review!

When SPDX decided that every license ever approved by the OSI should be included on the SPDX License List, and the OSI decided to adopt use of the SPDX license ids in their URLs and on the license pages, it kicked off a bunch of work by both sides. I led that on the SPDX side and collaborated with a handful of OSI board members over a few years. We didn't get everything perfectly tidied up, so much of what you are noticing are the things that we sorted out, but maybe didn't get to an ideal end result.

This has made me think that it would probably be good to document this history/background and some of these known issues because 1) digging through email archives is not exactly time efficient or intuitive; 2) if a few people have asked a similar thing, it'd probably be good to document; and 3) relying on one person's memory is not a sustainable model!

To that end, I've taken your list and started to create a page in the SPDX Legal List DOCS area to document/explain this all. This is a PR in progress at this point. It will take a bit of time to get it in proper shape and merged, but at least it's a start!

https://github.com/spdx/license-list-XML/pull/1738

I may have a few questions for you as to what exactly you observed in your research and will follow up accordingly. Stay tuned.

Thanks,
Jilayne

On 12/9/22 8:26 AM, Richard Fontana wrote:

On Fri, Dec 9, 2022 at 4:22 AM Max Mehl <max.mehl@...> wrote:

In my organisation, we define all licenses approved by OSI as valid Open Source licenses. However, we also increasingly rely on SPDX and therefore also its license list.

Recently, we found several mismatches between OSI’s list of approved licenses [1] and the licenses marked as OSI-approved in SPDX’s list [2].

Certainly, some of these issues are on OSI’s side (e.g., misleading links or wrong SPDX identifiers). But most mismatches are from licenses on SPDX’s list that cannot be found on the OSI website.

I documented my findings for all issues in this gist: https://gist.github.com/mxmehl/1e7a3aed4ff14a8ddfd4aff8ab4de552

Now, I am sure I’m not the first who notices this. Is this a known problem? Is the OSI website incomplete and/or SPDX list incorrect? What can we do to better align both sources?

[1]: https://opensource.org/licenses/alphabetical
[2]: https://github.com/spdx/license-list-data/blob/v3.19/json/licenses.json

Not speaking for SPDX or OSI:

To some degree this is a known problem, and possibly viewable as not a problem in some cases. Some issues I see embodied in your list:

1. In some cases licenses published on the OSI website are incorrect in the sense that they do not match widely used versions of the license text that the OSI probably intended to be the approved license text. I think these cases have all been noticed through activity related to creation or adoption of SPDX identifiers -- for example, Fedora recently adopted use of SPDX identifiers for package license metadata and early on it was noticed that the license SPDX calls Python-2.0, which I assume is a faithful copy of the corresponding license text from the OSI website, does not actually match the license (or "license stack") text used in known releases of CPython, so SPDX added Python-2.0.1 to capture the latter text. There is a similar situation involving the Artistic License 1.0.

I think it is not reasonable to expect the OSI to have historically applied the degree of rigor SPDX applies in associating an identifier with a matchable license text (where "matching" is a concept that SPDX has itself defined). This simply didn't exist in FOSS before SPDX; it was foreign to the culture. I'm not excusing outright mistakes in published licenses though (see e.g. https://github.com/spdx/license-list-XML/issues/1653). For submitted licenses, I can tell you from my time on the board that OSI assumes the license submitter has the correct text. In some cases the "incorrect" text gets adopted by projects, at which point it is questionable whether it is really incorrect.

2. You list some cases of 'WITH' expressions. The OSI has been reluctant to approve license exceptions, except in a few special cases where the exception (or exception coupled with a standard license) is itself thought of as a single license (e.g. LGPL version 3; eCos-2.0 is also like this). From my recollection of my time on the OSI board, the main concern was the potential numerosity of license submissions if the OSI encouraged submission of exceptions. There's been a tendency to assume that typical types of GPL exceptions are legit (for a GPL-world notion of legit) because they conform to the model of a grant of additional permission -- I need to comment on this issue on another recent thread.

3. The OSI website IIRC does not list (though still publishes?) certain licenses or license versions considered by the license steward to be deprecated. Not sure if that accounts for anything on your list.

4. Use of SPDX identifiers: Probably the main issue here today is that an SPDX identifier gets adopted after the approval of the license by the OSI (for most OSI-approved licenses in recent memory).

One basic issue here, which is not really acknowledged by anyone, is that the kinds of things the OSI has been historically approving are not the same kinds of things that SPDX assigns identifiers to. For example, the OSI has approved a license text it calls the MIT license. It has not approved the (infinite?) set of license texts that meet the SPDX definition of licenses that match "MIT" (i.e., that match the ever-evolving XML file that defines what "MIT" means). Indeed it cannot, because the OSI cannot possibly know how SPDX might alter that XML file in the future. Or maybe it has signalled approval of some range of licenses that are only trivially different from the canonical MIT license, but that undefined set may not be equivalent to the present/future set of license texts that match to SPDX "MIT".

Some confusion has resulted from a sort of collectively forced merger of these different concepts of "license". This looseness around what a license is is also evident (and maybe more obvious) when considering the "FSF-free" category.

Richard
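
For anyone wanting to reproduce the kind of comparison Max describes, here is a minimal sketch, not an official SPDX or OSI tool. It assumes the `licenses.json` layout from spdx/license-list-data (a top-level `licenses` array whose entries carry `licenseId`, `isOsiApproved` and `isDeprecatedLicenseId`), and it assumes you have collected the identifiers shown on the OSI website yourself. The function names and all sample data, including the approval flags, are made up for illustration.

```python
import json


def load_spdx(path):
    """Load a local copy of SPDX's licenses.json (path is up to the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def osi_approved_ids(spdx_data, skip_deprecated=True):
    """Return the SPDX license ids that SPDX flags as OSI-approved."""
    ids = set()
    for entry in spdx_data["licenses"]:
        if not entry.get("isOsiApproved"):
            continue
        if skip_deprecated and entry.get("isDeprecatedLicenseId"):
            continue
        ids.add(entry["licenseId"])
    return ids


def compare(spdx_data, osi_ids, skip_deprecated=True):
    """Report ids that appear on only one of the two lists."""
    spdx_ids = osi_approved_ids(spdx_data, skip_deprecated)
    osi_ids = set(osi_ids)
    return {
        "only_in_spdx": sorted(spdx_ids - osi_ids),
        "only_on_osi": sorted(osi_ids - spdx_ids),
    }


# Illustrative data only: the "Example-*" ids and all flag values are invented.
sample_spdx = {
    "licenses": [
        {"licenseId": "MIT", "isOsiApproved": True, "isDeprecatedLicenseId": False},
        {"licenseId": "Example-A-1.0", "isOsiApproved": True, "isDeprecatedLicenseId": False},
        {"licenseId": "Example-B-1.0", "isOsiApproved": True, "isDeprecatedLicenseId": True},
        {"licenseId": "Example-C-1.0", "isOsiApproved": False, "isDeprecatedLicenseId": False},
    ]
}
sample_osi = ["MIT", "Example-D-1.0"]

result = compare(sample_spdx, sample_osi)
print(result)
```

To run this against the real list, pass the output of `load_spdx("licenses.json")` instead of `sample_spdx`. Note that an id-level comparison like this only surfaces list mismatches; it says nothing about whether the license texts behind a shared id actually match, which is the deeper issue Richard raises.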