Re: SPDX Legal call this Thursday
Philippe Ombredanne
On Wed, Sep 16, 2015 at 2:33 AM, J Lovejoy <opensource@...> wrote:
> 3) License matching templates/markup:
> a) have you used the existing markup for matching purposes?

Yes and no: ScanCode uses an SPDX-inspired/derived markup, but instead of reusing the markup directly from the main license texts, the markup is transformed into a simpler {{mustache-like}} syntax added to copies of these texts that are used only for detection purposes.

> i) if no, why not?

Because:
- adding more markup to a reference license text eventually makes it unusable as a reference text and harder for humans to read
- the many variations found in the wild make it hard to put them all in a single template
- the markup syntax eventually implies an implementation based on regular expressions. ScanCode does not use regex, but inverted indexes and string alignments.

> ii) if yes, has it been helpful/effective? Could it be improved, and if so,

I think a simple markup is a very effective way to detect licenses with minor text variations and still call this an exact match. It is also a very effective way to indicate variations for humans. Personally, I find it hard to mix the human readability and technical detection concerns in the same file without compromises.

As food for thought, here are some examples of markup as used in ScanCode:
https://github.com/nexB/scancode-toolkit/blob/b37be4de78152fbd3ed54761627c960010ce26a3/src/licensedcode/data/rules/apache-1.1_38.RULE#L17
https://github.com/nexB/scancode-toolkit/blob/b37be4de78152fbd3ed54761627c960010ce26a3/src/licensedcode/data/rules/bzip2-libbzip-1.0.5_1.RULE#L1

The syntax uses double curly braces to enclose variable parts. There is no regex involved. Optionally, a number can be given after the opening braces to indicate the number of variable words, defaulting to 5 words.

For instance {{ Copyright (c) 2015 Myco }} would match up to 5 words and {{ 10 Copyright (c) 2015 Myco inc.}} would match up to 10 words.

I hope this helps even though this is a slightly different take.

--
Cordially
Philippe Ombredanne
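
To make the double-curly-brace syntax described in the message above concrete, here is a minimal sketch of a regex-free matcher for such templates. It is not ScanCode's implementation (which, per the message, relies on inverted indexes and string alignments); it is a naive word-by-word comparison with backtracking. The function names `parse_template` and `matches` are hypothetical, and several behaviors are assumptions: a gap accepts anywhere from zero up to its maximum number of words, the words inside the braces are treated as a sample only, and text is lowercased and split on whitespace.

```python
from functools import lru_cache

DEFAULT_GAP = 5  # default number of variable words, as stated in the message


def parse_template(template):
    """Return a list of parts: literal words (str) or maximum gap sizes (int)."""
    parts = []
    pos = 0
    while True:
        start = template.find("{{", pos)
        if start == -1:
            # No more variable parts: the rest is literal text.
            parts.extend(template[pos:].lower().split())
            return parts
        end = template.find("}}", start)
        if end == -1:
            raise ValueError("unclosed '{{' in template")
        parts.extend(template[pos:start].lower().split())
        inner = template[start + 2:end].split()
        # Assumption: a leading integer sets the gap size; any remaining
        # words inside the braces are only a sample and are ignored.
        if inner and inner[0].isdecimal():
            parts.append(int(inner[0]))
        else:
            parts.append(DEFAULT_GAP)
        pos = end + 2


def matches(template, text):
    """Return True if the whole text matches the template, word for word."""
    parts = parse_template(template)
    words = text.lower().split()

    @lru_cache(maxsize=None)
    def match_from(part_index, word_index):
        if part_index == len(parts):
            return word_index == len(words)
        part = parts[part_index]
        if isinstance(part, int):
            # Variable part: try skipping 0..part words (assumed semantics).
            return any(
                match_from(part_index + 1, word_index + skipped)
                for skipped in range(part + 1)
                if word_index + skipped <= len(words)
            )
        # Literal word: it must match exactly at this position.
        return (
            word_index < len(words)
            and words[word_index] == part
            and match_from(part_index + 1, word_index + 1)
        )

    return match_from(0, 0)
```

With this sketch, a template such as `{{ Copyright (c) 2015 Myco }} Redistribution and use in source and binary forms` accepts a text whose copyright line is up to 5 words long, while a 7-word copyright line needs the `{{ 10 ... }}` form. Note one ambiguity the sketch does not resolve: a variable part whose sample text itself starts with a number would be read as a gap size.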