Re: Mismatches between OSI and SPDX

J Lovejoy

Awesome, Gary! Thanks for the update. Looks like some of the items listed in the issue you link to are on Max's list, so that's good.


On 12/9/22 12:12 PM, Gary O'Neall wrote:

One more input on this discussion - I've been collaborating with OSI on a more automated way of keeping the SPDX and OSI license lists in sync.

This issue tracks the remaining items that need correction to the OSI data:

Note that some of these also impact what is on the OSI website.

Once all of the issues have been addressed, and assuming no new issues have been introduced, I plan to enhance the tools that generate our license list data to keep the OSI-related data in sync when we publish a new version of the license list.


-----Original Message-----
From: Spdx-legal@... <Spdx-legal@...> On Behalf Of
Richard Fontana
Sent: Friday, December 9, 2022 7:27 AM
To: Max Mehl <max.mehl@...>
Cc: spdx-legal@...
Subject: Re: Mismatches between OSI and SPDX

On Fri, Dec 9, 2022 at 4:22 AM Max Mehl <max.mehl@...> wrote:

In my organisation, we define all licenses approved by OSI as valid Open
Source licenses. However, we also increasingly rely on SPDX and therefore
also its license list.
Recently, we found several mismatches between OSI’s list of approved
licenses [1] and the licenses marked as OSI-approved in SPDX’s list [2].
Certainly, some of these issues are on OSI’s side (e.g., misleading links or
wrong SPDX identifiers). But most mismatches are from licenses on SPDX’s list
that cannot be found on the OSI website.
I documented my findings for all issues in this gist:

Now, I am sure I’m not the first to notice this. Is this a known problem?

Is the OSI website incomplete and/or SPDX list incorrect? What can we do to
better align both sources?
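
The comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the method used to produce the gist: it assumes the SPDX license list JSON, whose entries carry `licenseId` and `isOsiApproved` fields, and a separately collected set of identifiers taken from the OSI website. The function name, the sample input, and the `Example-*` identifiers are hypothetical.

```python
import json


def osi_flag_mismatches(spdx_list_json, osi_approved_ids):
    """Compare SPDX's isOsiApproved flags with a set of IDs collected from OSI.

    spdx_list_json: text of an SPDX license list JSON document.
    osi_approved_ids: iterable of SPDX-style identifiers gathered from OSI
                      (how they are gathered is outside this sketch).
    """
    data = json.loads(spdx_list_json)
    spdx_flagged = {
        lic["licenseId"]
        for lic in data["licenses"]
        if lic.get("isOsiApproved", False)
    }
    osi = set(osi_approved_ids)
    return {
        # Marked OSI-approved by SPDX but missing from the OSI-derived set.
        "only_in_spdx": sorted(spdx_flagged - osi),
        # Present in the OSI-derived set but not flagged by SPDX.
        "only_in_osi": sorted(osi - spdx_flagged),
    }


# Illustrative input only; the Example-* entries are made up.
sample = json.dumps({"licenses": [
    {"licenseId": "MIT", "isOsiApproved": True},
    {"licenseId": "Example-1.0", "isOsiApproved": True},
    {"licenseId": "Example-2.0", "isOsiApproved": False},
]})

result = osi_flag_mismatches(sample, ["MIT", "Example-2.0"])
print(result)
```

A plain set difference like this only surfaces identifier-level mismatches; it cannot tell whether the cause is a wrong identifier on one side, a missing page, or a difference in the license text itself.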

Not speaking for SPDX or OSI: To some degree this is a known problem, and
possibly viewable as not a problem in some cases. Some issues I see
embodied in your list:

1. In some cases licenses published on the OSI website are incorrect in the
sense that they do not match widely used versions of the license text that the
OSI probably intended to be the approved license text. I think these cases have
all been noticed through activity related to creation or adoption of SPDX
identifiers -- for example, Fedora recently adopted use of SPDX identifiers for
package license metadata and early on it was noticed that the license SPDX
calls Python-2.0, which I assume is a faithful copy of the corresponding license
text from the OSI website, does not actually match the license (or "license
stack") text used in known releases of CPython, so SPDX added Python-2.0.1
to capture the latter text. There is a similar situation involving the Artistic
License 1.0.

I think it is not reasonable to expect the OSI to have historically applied the
degree of rigor SPDX applies in associating an identifier with a matchable
license text (where "matching" is a concept that SPDX has itself defined). This
simply didn't exist in FOSS before SPDX; it was foreign to the culture. I'm not
excusing outright mistakes in published licenses though (see e.g. For submitted
licenses, I can tell you from my time on the board that OSI assumes the
license submitter has the correct text. In some cases the "incorrect" text gets
adopted by projects, at which point it is questionable whether it is really

2. You list some cases of 'WITH' expressions. The OSI has been reluctant to
approve license exceptions, except in a few special cases where the exception
(or exception coupled with a standard license) is itself thought of as a single
license (e.g. LGPL version 3; eCos-2.0 is also like this). From my recollection of
my time on the OSI board, the main concern was the potential numerosity of
license submissions if the OSI encouraged submission of exceptions. There's
been a tendency to assume that typical types of GPL exceptions are legit (for a
GPL-world notion of legit) because they conform to the model of a grant of
additional permission -- I need to comment on this issue on another recent

3. The OSI website IIRC does not list (though still publishes?) certain licenses
or license versions considered by the license steward to be deprecated. Not
sure if that accounts for anything on your list.

4. Use of SPDX identifiers: Probably the main issue here today is that an SPDX
identifier gets adopted after the approval of the license by the OSI (for most
OSI-approved licenses in recent memory).

One basic issue here, which is not really acknowledged by anyone, is that the
kinds of things the OSI has been historically approving are not the same kinds
of things that SPDX assigns identifiers to. For example, the OSI has approved a
license text it calls the MIT license.
It has not approved the (infinite?) set of license texts that meet the SPDX
definition of licenses that match "MIT" (i.e., that match the ever-evolving XML
file that defines what "MIT" means). Indeed it cannot, because the OSI cannot
possibly know how SPDX might alter that XML file in the future. Or maybe it
has signalled approval of some range of licenses that are only trivially
different from the canonical MIT license, but that undefined set may not be
equivalent to the present/future set of license texts that match to SPDX "MIT".
Some confusion has resulted from a sort of collectively forced merger of
these different concepts of "license". This looseness around what a license is
is also evident (and maybe more obvious) when considering the "FSF-free"

