Re: Mismatches between OSI and SPDX

J Lovejoy

Hi again Max, and thanks Richard for filling in on some of this.

Max - While you are not the first person who has asked about some of these, you might be the first person to have done such a thorough review!

When SPDX decided that every license ever approved by the OSI should be included on the SPDX License List, and the OSI decided to adopt the SPDX license ids in their URLs and on the license pages, it kicked off a bunch of work on both sides. I led that on the SPDX side and collaborated with a handful of OSI board members over a few years. We didn't get everything perfectly tidied up, so much of what you are noticing are things we worked through but perhaps didn't bring to an ideal end result.

This has made me think that it would be good to document this history/background, including some of these known issues, because 1) digging through email archives is not exactly time-efficient or intuitive; 2) if a few people have asked a similar thing, it'd probably be good to document; and 3) relying on one person's memory is not a sustainable model!

To that end, I've taken your list and started to create a page in the SPDX Legal List DOCS area to document/explain this all. This is an in-progress PR at this point. It will take a bit of time to get it in proper shape and merged, but at least it's a start!

I may have a few questions for you as to what exactly you observed in your research and will follow-up accordingly. Stay tuned.


On 12/9/22 8:26 AM, Richard Fontana wrote:

On Fri, Dec 9, 2022 at 4:22 AM Max Mehl <max.mehl@...> wrote:

In my organisation, we define all licenses approved by OSI as valid Open Source licenses. However, we also increasingly rely on SPDX and therefore also its license list.

Recently, we found several mismatches between OSI’s list of approved licenses [1] and the licenses marked as OSI-approved in SPDX’s list [2].

Certainly, some of these issues are on OSI’s side (e.g., misleading links or wrong SPDX identifiers). But most mismatches are from licenses on SPDX’s list that cannot be found on the OSI website.

I documented my findings for all issues in this gist:

Now, I am sure I’m not the first to notice this. Is this a known problem?

Is the OSI website incomplete and/or SPDX list incorrect? What can we do to better align both sources?


Not speaking for SPDX or OSI: To some degree this is a known problem,
and possibly viewable as not a problem in some cases. Some issues I
see embodied in your list:

1. In some cases licenses published on the OSI website are incorrect
in the sense that they do not match widely used versions of the
license text that the OSI probably intended to be the approved license
text. I think these cases have all been noticed through activity
related to creation or adoption of SPDX identifiers -- for example,
Fedora recently adopted use of SPDX identifiers for package license
metadata and early on it was noticed that the license SPDX calls
Python-2.0, which I assume is a faithful copy of the corresponding
license text from the OSI website, does not actually match the license
(or "license stack") text used in known releases of CPython, so SPDX
added Python-2.0.1 to capture the latter text. There is a similar
situation involving the Artistic License 1.0.

I think it is not reasonable to expect the OSI to have historically
applied the degree of rigor SPDX applies in associating an identifier
with a matchable license text (where "matching" is a concept that SPDX
has itself defined). This simply didn't exist in FOSS before SPDX; it
was foreign to the culture. I'm not excusing outright mistakes in
published licenses, though (see e.g. ...). For submitted
licenses, I can tell you from my time on the board that the OSI assumes
the license submitter has the correct text. In some cases the
"incorrect" text gets adopted by projects, at which point it is
questionable whether it is really incorrect.

2. You list some cases of 'WITH' expressions. The OSI has been
reluctant to approve license exceptions, except in a few special cases
where the exception (or exception coupled with a standard license) is
itself thought of as a single license (e.g. LGPL version 3; eCos-2.0
is also like this). From my recollection of my time on the OSI board,
the main concern was the potential numerosity of license submissions
if the OSI encouraged submission of exceptions. There's been a
tendency to assume that typical types of GPL exceptions are legit (for
a GPL-world notion of legit) because they conform to the model of a
grant of additional permission -- I need to comment on this issue on
another recent thread.

3. The OSI website IIRC does not list (though still publishes?)
certain licenses or license versions considered by the license steward
to be deprecated. Not sure if that accounts for anything on your list.

4. Use of SPDX identifiers: Probably the main issue here today is that
an SPDX identifier gets adopted after the approval of the license by
the OSI (for most OSI-approved licenses in recent memory).

One basic issue here, which is not really acknowledged by anyone, is
that the kinds of things the OSI has been historically approving are
not the same kinds of things that SPDX assigns identifiers to. For
example, the OSI has approved a license text it calls the MIT license.
It has not approved the (infinite?) set of license texts that meet the
SPDX definition of licenses that match "MIT" (i.e., that match the
ever-evolving XML file that defines what "MIT" means). Indeed it
cannot, because the OSI cannot possibly know how SPDX might alter that
XML file in the future. Or maybe it has signalled approval of some
range of licenses that are only trivially different from the canonical
MIT license, but that undefined set may not be equivalent to the
present/future set of license texts that match to SPDX "MIT". Some
confusion has resulted from a sort of collectively forced merger of
these different concepts of "license". This looseness around what a
license is is also evident (and maybe more obvious) when considering
the "FSF-free" category.


