Thanks Karsten for sharing your idea.
It's a very interesting one, and compresses a lot of information into a small representation, sort of like a bitfield. I wonder, though, if that's really necessary given the verbosity we're otherwise already accepting with XML/RDF/JSON/etc. representations of the licenses on the license list. Could we represent the same information in a format both human- and machine-readable?
First, let me say that I emphatically believe SPDX should cover non-English licenses. The world of FOSS software contribution is multilingual, and I think SPDX should be as well. This may require some extra work when adding a new license (finding a native-speaking attorney, English review of an auto-translation, or something in between), but I think it will prove worthwhile in the end as we expand coverage to all widely-used FOSS contribution languages. In addition, we have the license list source-controlled so that we can make changes and fix issues over time, so I wouldn't be too worried about our ability to make corrections if we felt an addition was ultimately a mistake.
Second, my current opinion is that each license/language text should be tracked, treated, and marked up individually by SPDX, i.e. one license for GPL-en, another for GPL-de, another for GPL-fr, etc. (presumably 24 for the EUPL?). To my mind, these are collections of related license texts, not multiple ways to get to the "same" license, since even if the "same" license is in fact what the author intended (e.g. in the case of an "official" translation), it would still be up to a court to decide whether the legal terms as represented in one language are identical to a similar, purportedly "identical" representation in another language (even in the same jurisdiction). So I would say, let's track them all, and get at the problem of relating one to another with more metadata.
Third, assuming all license translations are individually tracked, I think the best way to go about relating them to each other is to use something as close to native XML as possible. We already have the unique identifiers, XML tags, and attributes for each license, so why not add an XML tag that can reference another license by unique identifier? For EU Public License in German, that might look something like this:
<relatedLicense relationshipType="official-translation" targetLicenseIdentifier="EUPL-1.1">EUPL-1.1</relatedLicense>
Other relationshipTypes might be "unofficial-translation," "official-translation-ported," "official-translation-unported," and "derived-from" (there may be others, or maybe we don't need all of those). That just represents the facts as we've perceived them, without getting into too much judgment as to how close the relationship might be. This would allow us to say, essentially, "this is what we think the relationships are; have your open-source counsel review what that means for you."
Another way of throwing data at the problem might be to individually track all of the licenses, without built-in cross-references to other license IDs, but at the same time also publish a separate document specifying which of those licenses are related to each other and what kind of bundles those are. This is the same data as proposed in the previous paragraph, but laid out explicitly (again with reference to the unique license ID) rather than emergent from the XML. I think I prefer the emergent solution, but that's just me, and what I think today. I'm no XML expert, just a young attorney with a bit of programming experience doing my best to help.
What do others think of this? Should we have Kate add handling multi-language licenses to the tech team's spec discussion?
PS - Preemptive apologies to Jilayne -- I'm guessing your preferred solution would not be "just make the license list longer!" -- but I do actually think that's the best way to handle these clusters of related licenses (plus a little more metadata about relationships). :-)