Question about license matching with SPDX templates


pmonks+spdx@...
 

G'day SPDX gurus,

I'm working on a little license matching project [1] leveraging the (great!) SPDX text templates [2], but have noticed that some canonical license texts (e.g. CC-BY-4.0 [3]) don't match the template since they contain a number of horizontal rules (a sequence of '=' characters, in this case).  I don't see these in the CC-BY-4.0 text template [4], which means my code doesn't find a match.

Is there a matching guideline [5] or other text processing I'm missing for this case?

Thanks in advance,
Peter

[1] https://github.com/pmonks/lice-comb
[2] https://github.com/spdx/license-list-data/tree/master/template
[3] https://creativecommons.org/licenses/by/4.0/legalcode.txt
[4] https://github.com/spdx/license-list-data/blob/master/template/CC-BY-4.0.template.txt
[5] https://spdx.dev/license-list/matching-guidelines/


Steve Winslow
 

Hi Peter,

Thanks for raising this (and glad the license templates have been useful for you!)

Your email is timely, as we actually discussed a related point (though not specific to CC-BY-4.0) during the Legal Team call this past Thursday.

Currently, I don't think there is a matching guideline that covers the scenario you're describing, but the general sense from the call was that there should be one. Something like, "if a line in a license consists solely of one or more dash, hyphen, underscore or equal-sign characters, ignore it for matching purposes." That isn't precise wording, but that's the concept that seemed to make sense. We would definitely need input from the SPDX tech team and tool developers before this became official, of course.
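
To make the idea concrete, here is a rough Python sketch of how a tool might apply such a rule before comparing text against a template. It is only an illustration of the concept, not an official guideline or anyone's actual implementation. The function name strip_separator_lines is hypothetical, and the exact character set (including en and em dashes) and the tolerance for surrounding whitespace are assumptions on my part.

```python
import re

# Assumption: the separator characters are hyphen, underscore, equal sign,
# plus the en dash (U+2013) and em dash (U+2014) as a guess at what "dash"
# should cover. Leading/trailing whitespace on the line is tolerated, which
# is also an assumption rather than part of the proposed wording.
SEPARATOR_LINE = re.compile(r"\s*[-_=\u2013\u2014]+\s*")


def strip_separator_lines(text: str) -> str:
    """Return text with lines made up solely of separator characters removed."""
    kept = [
        line
        for line in text.splitlines()
        if not SEPARATOR_LINE.fullmatch(line)
    ]
    return "\n".join(kept)
```

A matcher would presumably run something like this over both the candidate license text and the template text before applying the existing matching guidelines, so that horizontal rules in either one stop affecting the comparison. Blank lines and lines that merely contain a hyphen among other text are left alone.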

You can see some more discussion at [0] and [1] for where this arose and is being addressed. Feel free to weigh in there as well!

Finally, for CC-BY-4.0 specifically (and some of the other CC licenses for which CC has published plain-text versions), it's a good point that the templates don't have the equal-sign lines. These lines should likely be included as optional text in the XML template [2] and appear in the "test text" file [3] for that and other comparable CC licenses.

Best,
Steve


On Sat, Oct 29, 2022 at 7:11 PM <pmonks+spdx@...> wrote:


pmonks+spdx@...
 

Thank you very much Steve. I’ll take a read through those resources and see how my code needs updating.

Thanks again!
Peter