Mismatches between OSI and SPDX


Max Mehl
 

Dear all,

 

In my organisation, we define all licenses approved by OSI as valid Open Source licenses. However, we also increasingly rely on SPDX and therefore also its license list.

 

Recently, we found several mismatches between OSI’s list of approved licenses [1] and the licenses marked as OSI-approved in SPDX’s list [2].

Certainly, some of these issues are on OSI’s side (e.g., misleading links or wrong SPDX identifiers). But most mismatches stem from licenses that SPDX’s list marks as OSI-approved but that cannot be found on the OSI website.
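
A comparison of this kind can be sketched in a few lines of Python. This is only a minimal sketch, not the script behind the gist. It assumes the layout of the SPDX licenses.json file (a top-level "licenses" array whose entries carry "licenseId" and "isOsiApproved"), and it assumes the OSI identifiers have already been collected into a set by some other means. The function names and the sample identifiers are made up for illustration.

```python
import json
from typing import Dict, List, Set


def spdx_osi_approved_ids(spdx_json_text: str) -> Set[str]:
    """Return the license IDs that the SPDX data marks as OSI-approved."""
    data = json.loads(spdx_json_text)
    return {
        entry["licenseId"]
        for entry in data["licenses"]
        if entry.get("isOsiApproved", False)
    }


def find_mismatches(spdx_ids: Set[str], osi_ids: Set[str]) -> Dict[str, List[str]]:
    """Report the IDs that appear on one side only."""
    return {
        "only_in_spdx": sorted(spdx_ids - osi_ids),
        "only_on_osi": sorted(osi_ids - spdx_ids),
    }


# Made-up sample data; real input would be the downloaded licenses.json.
SAMPLE_SPDX_JSON = """
{"licenses": [
  {"licenseId": "Example-1.0", "isOsiApproved": true},
  {"licenseId": "Example-2.0", "isOsiApproved": true},
  {"licenseId": "Example-3.0", "isOsiApproved": false}
]}
"""
# Hypothetical set of IDs collected by hand from the OSI website.
SAMPLE_OSI_IDS = {"Example-1.0", "Example-4.0"}

if __name__ == "__main__":
    spdx_ids = spdx_osi_approved_ids(SAMPLE_SPDX_JSON)
    for side, ids in find_mismatches(spdx_ids, SAMPLE_OSI_IDS).items():
        print(side, ids)
```

A plain set difference only finds identifiers missing on one side. It says nothing about wrong links or differing license texts, which need a manual look.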

 

I documented my findings for all issues in this gist:

https://gist.github.com/mxmehl/1e7a3aed4ff14a8ddfd4aff8ab4de552

 

Now, I am sure I’m not the first to notice this. Is this a known problem?

Is the OSI website incomplete and/or SPDX list incorrect? What can we do to better align both sources?

 

Thanks for any insights.

 

Best,

Max

 

 

[1]: https://opensource.org/licenses/alphabetical

[2]: https://github.com/spdx/license-list-data/blob/v3.19/json/licenses.json

 

--

Max Mehl

Open Source Strategy & Governance

Enterprise-Team Chief Technology Office (CTO), T.IP E-T-378

 

DB Systel GmbH

Jürgen-Ponto-Platz 1, 60329 Frankfurt/M

 




Show mandatory disclosures

Further information on data processing within the DB Group can be found here: https://www.deutschebahn.com/de/konzern/datenschutz


J Lovejoy
 

Max,

All of what you have done here was already done years ago (~2011, mostly by me, working with various OSI members at that time) in terms of "matching" up the OSI list, and it is documented in the SPDX-legal mailing list archives. I wish you had asked first before expending this effort!

I will respond at length and in detail to your list and larger questions later or on Monday :)

Thanks,
Jilayne
SPDX-legal co-lead


Richard Fontana
 

Not speaking for SPDX or OSI: To some degree this is a known problem,
and possibly viewable as not a problem in some cases. Some issues I
see embodied in your list:

1. In some cases licenses published on the OSI website are incorrect
in the sense that they do not match widely used versions of the
license text that the OSI probably intended to be the approved license
text. I think these cases have all been noticed through activity
related to creation or adoption of SPDX identifiers -- for example,
Fedora recently adopted use of SPDX identifiers for package license
metadata and early on it was noticed that the license SPDX calls
Python-2.0, which I assume is a faithful copy of the corresponding
license text from the OSI website, does not actually match the license
(or "license stack") text used in known releases of CPython, so SPDX
added Python-2.0.1 to capture the latter text. There is a similar
situation involving the Artistic License 1.0.

I think it is not reasonable to expect the OSI to have historically
applied the degree of rigor SPDX applies in associating an identifier
with a matchable license text (where "matching" is a concept that SPDX
has itself defined). This simply didn't exist in FOSS before SPDX; it
was foreign to the culture. I'm not excusing outright mistakes in
published licenses though (see e.g.
https://github.com/spdx/license-list-XML/issues/1653). For submitted
licenses, I can tell you from my time on the board that OSI assumes
the license submitter has the correct text. In some cases the
"incorrect" text gets adopted by projects, at which point it is
questionable whether it is really incorrect.

2. You list some cases of 'WITH' expressions. The OSI has been
reluctant to approve license exceptions, except in a few special cases
where the exception (or exception coupled with a standard license) is
itself thought of as a single license (e.g. LGPL version 3; eCos-2.0
is also like this). From my recollection of my time on the OSI board,
the main concern was the potential numerosity of license submissions
if the OSI encouraged submission of exceptions. There's been a
tendency to assume that typical types of GPL exceptions are legit (for
a GPL-world notion of legit) because they conform to the model of a
grant of additional permission -- I need to comment on this issue on
another recent thread.

3. The OSI website IIRC does not list (though still publishes?)
certain licenses or license versions considered by the license steward
to be deprecated. Not sure if that accounts for anything on your list.

4. Use of SPDX identifiers: Probably the main issue here today is that
an SPDX identifier gets adopted after the approval of the license by
the OSI (for most OSI-approved licenses in recent memory).
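
To make point 2 concrete: a 'WITH' expression pairs a license identifier with an exception identifier, so a lookup of the whole string against a list of single license IDs will not find it. The following is a minimal sketch, assuming only simple expressions of the form "LICENSE WITH EXCEPTION" with an uppercase operator. It does not handle AND, OR or parentheses, and the function name is made up for illustration.

```python
from typing import Optional, Tuple


def split_with_expression(expression: str) -> Tuple[str, Optional[str]]:
    """Split 'LICENSE WITH EXCEPTION' into its two identifiers.

    Returns (license_id, None) when no ' WITH ' operator is present.
    Assumption: a simple expression without AND, OR or parentheses.
    """
    license_id, operator, exception_id = expression.partition(" WITH ")
    if not operator:
        return license_id.strip(), None
    return license_id.strip(), exception_id.strip()


if __name__ == "__main__":
    print(split_with_expression("GPL-2.0-only WITH Classpath-exception-2.0"))
    print(split_with_expression("MIT"))
```

With the two parts separated, the license ID can be checked against an approval list on its own, while the exception is treated as a separate question.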

One basic issue here, which is not really acknowledged by anyone, is
that the kinds of things the OSI has been historically approving are
not the same kinds of things that SPDX assigns identifiers to. For
example, the OSI has approved a license text it calls the MIT license.
It has not approved the (infinite?) set of license texts that meet the
SPDX definition of licenses that match "MIT" (i.e., that match the
ever-evolving XML file that defines what "MIT" means). Indeed it
cannot, because the OSI cannot possibly know how SPDX might alter that
XML file in the future. Or maybe it has signalled approval of some
range of licenses that are only trivially different from the canonical
MIT license, but that undefined set may not be equivalent to the
present/future set of license texts that match to SPDX "MIT". Some
confusion has resulted from a sort of collectively forced merger of
these different concepts of "license". This looseness around what a
license is is also evident (and maybe more obvious) when considering
the "FSF-free" category.
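
The difference between "one approved text" and "the set of texts that match an identifier" can be illustrated with a toy sketch. The template below is hypothetical and far simpler than the real SPDX XML file for "MIT". It only shows how a template with an optional, variable part accepts many distinct texts, and how editing the template changes that set.

```python
import re

# Hypothetical, heavily simplified template; NOT the real SPDX "MIT" definition.
TOY_TEMPLATE = re.compile(
    r"(copyright \(c\) .+? )?"  # optional copyright notice with variable content
    r"permission is hereby granted, free of charge, to any person"
)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def matches_toy_template(text: str) -> bool:
    """True if the whole normalized text fits the toy template."""
    return TOY_TEMPLATE.fullmatch(normalize(text)) is not None


if __name__ == "__main__":
    variants = [
        "Permission is hereby granted, free of charge, to any person",
        "Copyright (c) 2022 Jane Doe\n\nPermission is hereby granted,\nfree of charge, to any person",
        "Permission is hereby granted to any person",
    ]
    for variant in variants:
        print(matches_toy_template(variant), repr(variant[:40]))
```

The first two variants differ from each other but both fit the template, while the third does not. Any change to the template changes which texts are covered.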

Richard


Gary O'Neall
 

One more input on this discussion - I've been collaborating with OSI on a more automated way of keeping the SPDX and OSI license lists in sync.

This issue tracks the remaining items that need correction to the OSI data: https://github.com/OpenSourceOrg/licenses/issues/62

Note that some of these also impact what is on the OSI website.

Once all of the issues have been addressed, and assuming no new issues have been introduced, I plan to enhance the tools that generate our license list data to keep the OSI-related data in sync when we publish a new version of the license list.

Gary


J Lovejoy
 

Hi again Max, and thanks Richard for filling in on some of this.

Max - While you are not the first person who has asked about some of these, you might be the first person to have done such a thorough review!

When SPDX decided that every license ever approved by the OSI should be included on the SPDX License List and the OSI decided to adopt use of the SPDX license ids in their URLs and on the license pages, it kicked off a bunch of work by both sides. I led that on the SPDX side and collaborated with a handful of OSI board members over a few years. We didn't get everything perfectly tidied up, so much of what you are noticing are the things that we sorted out, but maybe didn't get to an ideal end result.

This has made me think that it would probably be good to document this history/background and some of these known issues, because 1) digging through email archives is not exactly time-efficient or intuitive; 2) if a few people have asked a similar thing, it'd probably be good to document; and 3) relying on one person's memory is not a sustainable model!

To that end, I've taken your list and started to create a page in the SPDX Legal List DOCS area to document/explain this all. This is a PR in-progress at this point. It will take a bit of time to get it in proper shape and merged, but at least it's a start! https://github.com/spdx/license-list-XML/pull/1738

I may have a few questions for you as to what exactly you observed in your research and will follow up accordingly. Stay tuned.

Thanks,
Jilayne


J Lovejoy
 

Awesome, Gary! Thanks for the update. Looks like some of the items listed in the issue you link to are on Max's list, so that's good.

Jilayne


Max Mehl
 

Hi all,

 

Thanks for the swift replies and sharing some background information.

While I knew that some mismatches had been brought up in the past, I wanted to understand them myself and therefore dedicated some time and lines of code.

 

I appreciate the approach of documenting this in the GitHub repository and have already commented on the PR where I saw questions directed at me.

It feels like a good checklist of issues both SPDX and OSI can work on to sync the last pieces.

 

Looking forward to contributing to solving the remaining mismatches.

Please let me know if you need a helping hand during the process.

 

Best,

Max

 
