A proposal for Jilayne's foreign language challenge


Karsten Reincke
 

Dear SPDX community;

I am currently sitting once again in an exciting session of the FSFE Legal and Licensing Workshop in Barcelona.

Yesterday we had the pleasure of listening to an exciting SPDX lecture, titled 'FOSS licenses and different languages' and given by Jilayne Lovejoy. She asked for feedback and ideas on some challenges. During a coffee break I was able to offer her an idea for a solution, and she asked me to publish the rough sketch on this mailing list. So, here it is:

The problem is that we have to deal with translations of original FOSS licenses. An example of such a set of related licenses is the EUPL. In this specific case the translations have an 'official' status. Other licenses are sometimes translated by 'not so official translators'. It is said of the EUPL that the official translations preserve the legal power with respect to the different European countries. Unfortunately, as one member of that LLW session mentioned, it turned out that this statement is not true.

So, the problem is how to reliably group licenses that are linked to other licenses in some sense. During our LLW session there was a clear wish to create a specific SPDX file for each translation of each FOSS license. That means NOT grouping the licenses, because they are not 'identical'.

As a consequence, we will have a large set of SPDX files, and that might reduce the usability of SPDX.

So, my proposal is to classify each element of a license cluster like the EUPL by a number that indicates its distance to the original. The idea is to encode the reliability of a license in a number. Using that technique would allow us to specify a license cluster by one SPDX file (and a distance number [which could be incorporated into the SPDX file]). The advantage of this proposal is that we would end up with approximately no more licenses than the OSI license list has.

To be able to use an SPDX License Cluster Distance Value, we would have to define some dimensions whose values determine the distance to an original. Then we would have to prioritize these dimensions and values so that we get an ordered row of distance factors, ordered by priority. Creating a distance number on that basis is simple. The main idea would be:

The smaller the number, the smaller the distance to the original.

How could that look concretely?

Let us assign zero to an English original. Here are some dimensions (which have been mentioned in the LLW session):

(0) Is the license an English written original? (YES=0 | NO=1)
(1) Is the license a translation / derivation (YES=2 | NO=0)
(3) Is the license an official translation (YES=0 | NO=4)
(4) Does the translated license preserve the legal power (YES=0 | UNKNOWN=8 | NO=16)

Finally, build the sum.

With respect to the EUPL, this algorithm delivers the following distance values:

a) English version = 0 + 0 + 0 + 0 = 0

b) Greek version =
1 (because it's not the English original) +
2 (because it is a translation) +
0 (because it is an official translation) +
0 (because it preserves the legal power)
= 3

c) Spanish version =
1 (because it's not the English original) +
2 (because it is a translation) +
0 (because it is an official translation) +
8 (because it is unknown whether it preserves the legal power)
= 11

d) Freman [of the hypothetical prospective country French+German] version =
1 (because it's not the English original) +
2 (because it is a translation) +
4 (because it is an unofficial translation) +
16 (because it does not preserve the legal power)
= 23
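
The summation above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the names `LicenseVariant` and `distance_value` are hypothetical, dimension (0) is read as 'is the text written in English' (which is what the remarks on non-English originals further down suggest), and dimensions (3) and (4) are only counted for translations, so that an original never collects points for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseVariant:
    """One member of a license cluster (hypothetical structure)."""
    in_english: bool                              # dimension (0)
    is_translation: bool                          # dimension (1)
    is_official: bool = True                      # dimension (3), translations only
    preserves_legal_power: Optional[bool] = True  # dimension (4); None = unknown


def distance_value(v: LicenseVariant) -> int:
    """Sum the weights 1, 2, 4 and 8/16 from the list of dimensions."""
    total = 0 if v.in_english else 1
    if v.is_translation:
        total += 2
        total += 0 if v.is_official else 4
        if v.preserves_legal_power is None:
            total += 8       # unknown
        elif not v.preserves_legal_power:
            total += 16      # known not to preserve the legal power
    return total


eupl_english = LicenseVariant(in_english=True, is_translation=False)
eupl_greek = LicenseVariant(in_english=False, is_translation=True)
eupl_spanish = LicenseVariant(in_english=False, is_translation=True,
                              preserves_legal_power=None)
eupl_freman = LicenseVariant(in_english=False, is_translation=True,
                             is_official=False, preserves_legal_power=False)

for name, variant in [("English", eupl_english), ("Greek", eupl_greek),
                      ("Spanish", eupl_spanish), ("Freman", eupl_freman)]:
    print(name, distance_value(variant))
```

Under these assumptions the sketch reproduces the values 0, 3, 11 and 23 from the EUPL examples.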

What does such a technique mean for one of the problems Jilayne mentioned?

A.1) If we do not have an English-language original, the SPDX License Cluster Distance Value would be 1 instead of zero; nevertheless, this number indicates a very small distance from the ideal. It also indicates that the English-speaking community might have (minor) problems using software licensed that way.

A.2) On the other hand, if we have an English translation of a non-English original, that license gets the value (0 + 2 + x + y), which clearly indicates that its distance to the original / ideal is greater than the distance between a foreign original and the ideal.

A predictable question:

This proposal might also suggest clustering variants like 'BSD-4-Clause', 'BSD-3-Clause', 'BSD-2-Clause' and the newest version 'BSD-3-Clause with patent'. This would mean also encoding such differences in content into the SPDX License Cluster Distance Value.

I don't like that idea. I think that licenses in the same language whose texts literally differ should always have separate SPDX files, because they are intentionally different licenses.

A last remark:

In the LLW session someone voted for having only English originals. He argued that in case of foreign-language licenses, SPDX does not reliably know whether it really is a FOSS license. I can't follow that position:

Even as a native English speaker, you do not know whether a license written in English really is a Free or Open Source license. This can only be evaluated by an established official process, such as the one the OSI offers. Hence:

1) If SPDX strictly stuck to the OSI list of open source licenses, that problem would not exist. All OSI licenses are in English.

2) If SPDX wants to cover other licenses that are not blessed by any such process, the problem of reliable FOSS status is the same for English and foreign-language licenses. Foreign-language licenses have the advantage that they indicate the existence of the problem more clearly.

So, please feel free to use this idea, to throw it away, to find other dimensions, or to refine the algorithm. The work you do is very valuable for the FOSS community, as we could see, and not only in the lecture Jilayne gave.

With best regards
Karsten


---
Deutsche Telekom Technik GmbH  / Infrastructure Cloud
Karsten Reincke, Senior Expert Key Projects - Telekom Open Source Committee
[display complete signatur: http://opensource.telekom.net/kreincke/kr-dtag-sign-en.txt ]


Alan Tse
 

Karsten,
Thanks for the thoughtful suggestion. I like it and think it could work. One issue I see is the issue we run into about trying to avoid making a legal judgment when classifying the licenses. That would imply we wouldn't use dimension 4 about "preserving legal power."

Also for dimension 3 regarding "official" licenses, perhaps we need some more gradation for something where it's not "official" but it's at least acknowledged or referenced. For example, the GPL translations aren't official: https://www.gnu.org/licenses/translations.en.html I think if we're factually relying on statements made by the license steward, it's less a concern about making a legal judgment.

Alan D. Tse



Karsten Reincke
 

Dear Alan

> Karsten,
> Thanks for the thoughtful suggestion. I like it and think it could
> work.

I am happy to have been able to help. I need a running SPDX system for my further work. So, it is not totally unselfish ;-)

> One issue I see is the issue we run into about trying to avoid
> making a legal judgment when classifying the licenses. That would
> imply we wouldn't use dimension 4 about "preserving legal power."

It is important that you define the list of necessary dimensions: you are the SPDX experts. I personally agree with your attitude: Inserting such a value could make SPDX a bit pejorative (and will surely evoke unnecessary discussions). However, I inserted that dimension only because it was mentioned/requested at the LLW.

> Also for dimension 3 regarding "official" licenses, perhaps we need
> some more gradation for something where it's not "official" but it's at
> least acknowledged or referenced. For example, the GPL translations
> aren't official: https://www.gnu.org/licenses/translations.en.html I
> think if we're factually relying on statements made by the license
> steward, it's less a concern about making a legal judgment.

Such a differentiation would be helpful. Together with the simplification of not using the 'legal power' dimension, you can use a better and simpler representation:

licenses
- original
  - English 00
  - foreign 01
- translation
  - approved 10
  - audited 20
  - ...
  - unclear f0

Feel free to expand and redesign this little domain
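
One way to read the codes above is as two hexadecimal digits, the first describing the kind of translation and the second the kind of original. That reading is an assumption, as are the names `LicenseOrigin` and `is_translation` in this small Python sketch; the entries elided by '...' are left out.

```python
from enum import IntEnum


class LicenseOrigin(IntEnum):
    """Codes from the tree above, read as hexadecimal (an assumption)."""
    ORIGINAL_ENGLISH = 0x00
    ORIGINAL_FOREIGN = 0x01
    TRANSLATION_APPROVED = 0x10
    TRANSLATION_AUDITED = 0x20
    TRANSLATION_UNCLEAR = 0xF0


def is_translation(code: LicenseOrigin) -> bool:
    """A non-zero first hex digit marks a translation in this reading."""
    return (code & 0xF0) != 0


# Sorting by value orders the variants from closest to farthest
# from the English original.
for code in sorted(LicenseOrigin):
    print(f"{code.value:02x}", code.name, is_translation(code))
```

Because the codes are plain integers, a smaller value still means a smaller distance to the original, as in the first proposal.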

With best regards
Karsten



Brad Edmondson
 

Thanks Karsten for sharing your idea. 

It's a very interesting one, and compresses a lot of information into a small representation, sort of like a bitfield. I wonder, though, if that's really necessary given the verbosity we're otherwise already accepting with XML/RDF/JSON/etc. representations of the licenses on the license list. Could we represent the same information in a format both human- and machine-readable?


First, let me say that I emphatically believe SPDX should cover non-English licenses. The world of FOSS software contribution is multilingual, and I think SPDX should be as well. This may require some extra work when adding a new license (finding a native-speaking attorney, English review of an auto-translation, or something in between), but I think it will prove worthwhile in the end as we expand coverage to all widely-used FOSS contribution languages. In addition, we have the license list source-controlled so that we can make changes and fix issues over time, so I wouldn't be too worried about our ability to make corrections if we felt an addition was ultimately a mistake.

Second, my current opinion is that each license/language text should be tracked, treated, and marked up individually by SPDX, i.e. one license for GPL-en, another for GPL-de, another for GPL-fr, etc. (presumably 24 for the EUPL?). To my mind, these are collections of related license texts, not multiple ways to get to the "same" license, since even if the "same" license is in fact what the author intended (e.g. in the case of an "official" translation), it would still be up to a court to decide whether the legal terms as represented in one language are identical to a similar, purportedly "identical" representation in another language (even in the same jurisdiction). So I would say, let's track them all, and get at the problem of relating one to another with more metadata.

Third, assuming all license translations are individually tracked, I think the best way to go about relating them to each other is to use something as close to native XML as possible. We already have the unique identifiers, XML tags, and attributes for each license, so why not add an XML tag that can reference another license by unique identifier? For EU Public License in German, that might look something like this:
...
<relatedLicenses>
   <relatedLicense relationshipType="official-translation" targetLicenseIdentifier="EUPL-1.1">EUPL-1.1</relatedLicense>
</relatedLicenses>
...
Other relationshipTypes might be "unofficial-translation," "official-translation-ported," "official-translation-unported," and "derived-from" (there may be others, or maybe we don't need all of those). That just represents the facts as we've perceived them, without getting into too much judgment as to how close the relationship might be. This would allow us to say, essentially, "this is what we think the relationships are; have your open-source counsel review what that means for you."
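
To illustrate how a tool might consume such a tag, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The wrapping `<license>` element, its `licenseId` attribute and the identifier `EUPL-1.1-de` are made up for the example; only the `relatedLicenses` fragment comes from the proposal above, and none of this is an existing SPDX schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical license entry; only <relatedLicenses> is taken from the proposal.
SNIPPET = """
<license licenseId="EUPL-1.1-de">
  <relatedLicenses>
    <relatedLicense relationshipType="official-translation" targetLicenseIdentifier="EUPL-1.1">EUPL-1.1</relatedLicense>
  </relatedLicenses>
</license>
"""


def related_licenses(xml_text: str) -> list[tuple[str, str]]:
    """Return (relationshipType, targetLicenseIdentifier) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [
        (el.get("relationshipType"), el.get("targetLicenseIdentifier"))
        for el in root.iter("relatedLicense")
    ]


print(related_licenses(SNIPPET))
```

The function only reports the relationships as recorded; it makes no statement about how close the related texts are.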


Another way of throwing data at the problem might be to individually track all of the licenses, without built-in cross-references to other license IDs, but at the same time also publish a separate document specifying which of those licenses are related to each other and what kind of bundles those are. This is the same data as proposed in the previous paragraph, but laid out explicitly (again with reference to the unique license ID) rather than emergent from the XML. I think I prefer the emergent solution, but that's just me, and what I think today. I'm no XML expert, just a young attorney with a bit of programming experience doing my best to help.
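
The separate-document variant could be as simple as a flat list of (license, relationship, target) entries that a tool groups into bundles. A rough Python sketch, in which the identifiers `EUPL-1.1-de` and `EUPL-1.1-fr` and the function name `bundles` are purely hypothetical:

```python
from collections import defaultdict

# Hypothetical contents of the separate relationship document.
RELATIONSHIPS = [
    ("EUPL-1.1-de", "official-translation", "EUPL-1.1"),
    ("EUPL-1.1-fr", "official-translation", "EUPL-1.1"),
]


def bundles(relationships):
    """Group license IDs by the license they are related to."""
    grouped = defaultdict(list)
    for source, _relationship, target in relationships:
        grouped[target].append(source)
    return {target: sorted(sources) for target, sources in grouped.items()}


print(bundles(RELATIONSHIPS))
```

Either layout carries the same data; the difference is only whether the relationships live inside each license file or in one shared document.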


What do others think of this? Should we have Kate add handling multi-language licenses to the tech team's spec discussion?

Best,
Brad

PS - Preemptive apologies to Jilayne -- I'm guessing your preferred solution would not be "just make the license list longer!" -- but I do actually think that's the best way to handle these clusters of related licenses (plus a little more metadata about relationships).     :-)

--
Brad Edmondson, Esq.
512-673-8782 | brad.edmondson@...



J Lovejoy
 

Thanks so much Karsten for posting this, as my memory would have never sparked such discussion and now we have the benefit of your input, as well as the others in the SPDX team!

I am copying this over to the SPDX-legal mailing list, as that is where the discussion will continue, although as Brad already noted, I think this is a cross-team issue.  As such, it is helpful to have started the discussion on the general list for exposure, and I will also mention it today at the general call.  If anyone is on the general list, but not on the legal mailing list, please do join so you can be part of the continued discussions: https://lists.spdx.org/mailman/listinfo/spdx-legal

And to Brad’s PS - if we need to make the license list longer to accommodate the best, international solution to this, then that is what we need to do!  I’m trusting that with the switch to XML format and using Github, maintaining the license list will fall on more shoulders going forward, as we have already seen in the transition!!

Thanks,
Jilayne

SPDX Legal Team co-lead
opensource@...

