[spdx-tech] Proposed topic for this week's tech call: Extend license expressions to include OR-MAYBE
W. Trevor King
On Mon, Nov 27, 2017 at 08:49:08PM +0000, Wheeler, David A wrote:
> gary@...:
> > - Do we agree the "OR-MAYBE" should be added?
>
> I agree…

Philippe's recent points about weighted confidence (e.g. [1]) suggest that, even if we decide to support incomplete conclusions, an unweighted list of alternatives may not be sufficient. In that case, we may want something like:

  binary-confidence-expression-operator = "AND"
  confidence-expression = license-expression space "CONFIDENCE" space "0." 1*DIGIT
  confidence-list = confidence-expression *(space confidence-expression) [space license-expression]
    / confidence-list space binary-confidence-expression-operator space confidence-list
    / license-expression

where license-expression, space, and DIGIT are discussed in [2]. The confidence weights would have to sum to something ≤ 1. ‘AND’ would have the same conjunctive semantics as the current license-expression operator, but we don't want to support disjunctive OR for confidence lists. The ‘[space license-expression]’ (optional trailing license-expression) has an implicit ‘CONFIDENCE {1 - sum_of_previous_confidences}’, for folks who don't trust their math or want to save a few characters. The ‘/ license-expression’ case has an implicit ‘CONFIDENCE 1’ for backwards compatibility with existing license-expression consumers who choose to upgrade to confidence-list. Then folks consuming confidence-list could use:

  GPL-2.0-only CONFIDENCE 0.95 GPL-2.0-or-later

for “I am 95% sure this is GPL-2.0-only but it could be GPL-2.0-or-later”, with the implicit 5% confidence for GPL-2.0-or-later.

> > - Should we disallow "OR-MAYBE" in declared license fields (it…
>
> No. Projects sometimes get inherited from others where the license…

Keeping a separate ABNF rule for license-expression allows consumers to choose between license-expression and confidence-list as they see fit. But yeah, the “inherited project” case is a good reason for allowing confidence-list (or whatever we use for partial conclusions) in declared-license fields.
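The implicit-weight rules for confidence-list described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal: `parse_confidence_list` is a hypothetical helper, it treats everything before a CONFIDENCE keyword as one opaque license-expression, and it does not handle AND between confidence-lists. It uses `Decimal` so that the implicit remainder comes out exact.

```python
import re
from decimal import Decimal

# Weight syntax from the proposed ABNF: "0." 1*DIGIT
WEIGHT = re.compile(r"0\.[0-9]+")


def parse_confidence_list(text):
    """Return [(license_expression, weight), ...] for a flat confidence-list.

    Hypothetical helper (an assumption, not from the spec): license
    expressions are kept as opaque strings, and the AND operator between
    confidence-lists is not handled.
    """
    tokens = text.split()
    pairs = []    # (expression, Decimal weight) for explicit entries
    current = []  # tokens of the expression currently being collected
    i = 0
    while i < len(tokens):
        if tokens[i] == "CONFIDENCE":
            if not current or i + 1 >= len(tokens):
                raise ValueError("CONFIDENCE needs an expression and a weight")
            if not WEIGHT.fullmatch(tokens[i + 1]):
                raise ValueError(f"bad weight: {tokens[i + 1]!r}")
            pairs.append((" ".join(current), Decimal(tokens[i + 1])))
            current = []
            i += 2
        else:
            current.append(tokens[i])
            i += 1

    total = sum((weight for _, weight in pairs), Decimal(0))
    if total > 1:
        raise ValueError("confidence weights sum to more than 1")
    if current:
        # Trailing license-expression: implicit CONFIDENCE 1 - sum_of_previous.
        # With no explicit entries this is the bare license-expression case,
        # which gets an implicit CONFIDENCE 1.
        pairs.append((" ".join(current), Decimal(1) - total))
    return pairs


print(parse_confidence_list("GPL-2.0-only CONFIDENCE 0.95 GPL-2.0-or-later"))
```

A bare expression such as `MIT` comes back with weight 1, which mirrors the backwards-compatibility case in the grammar.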
> > - What is the exact definition of the "OR-MAYBE" we would include…
>
> For "OR MAYBE", in the definition of compound-expression, change:…

The CONFIDENCE approach allows you to handle that case with:

  GPL-2.0-only CONFIDENCE 0.90

for “I'm 90% sure this is GPL-2.0-only, and am not expressing an opinion on the 10% alternatives”. Using an OR-MAYBE like:

  binary-alternatives-operator = "AND"
  alternatives = license-expression *(OR-MAYBE license-expression)
    / alternatives space binary-alternatives-operator space alternatives

would not support weighting. But with [3], you could represent that case with:

  GPL-2.0-only OR-MAYBE NOASSERTION

So I don't see an upside to a separate MAYBE. It might work with clear precedence rules, but without them:

  APACHE-2.0 OR GPL-2.0-only OR MAYBE GPL-2.0-or-later

could mean ‘APACHE-2.0 OR GPL-2.0-only OR (MAYBE GPL-2.0-or-later)’:

  “A disjunctive choice between ‘APACHE-2.0’, ‘GPL-2.0-only’, and something that I haven't been able to figure out yet but which might be ‘GPL-2.0-or-later’.”

or it could mean ‘(APACHE-2.0 OR GPL-2.0-only) OR MAYBE GPL-2.0-or-later’:

  “This might be ‘APACHE-2.0 OR GPL-2.0-only’, but I'm not sure. It might also be ‘GPL-2.0-or-later’. I haven't been able to figure out which yet.”

depending on whether MAYBE had a higher precedence than OR or not. With the former interpretation, you're safe if you want to use the code under APACHE-2.0 or if you want to use it under GPL-2.0-only. With the latter interpretation, you're only safe if you want to use the code under GPL-2.0-only (since that's also a subset of GPL-2.0-or-later). Even with OR-MAYBE, precedence for AND is going to be complicated (and will decide whether a given AND is acting as a license expression operator or an alternative operator). But using a hyphenated OR-MAYBE at least avoids that confusion for OR.

Comparing OR-MAYBE with CONFIDENCE, the only actionable use I can think of for weighting is a vendor with a report of confidence lists for various components of their software.
They might decide to prioritize digging into the component with the least-confident assertion. But they might also want to prioritize based on lines of code under the unclear license, or on the importance of the particular lines. For example, say you have a product with:

  10k lines of core code under ‘GPL-3.0-only’
  1k lines of core code under ‘GPL-2.0-or-later CONFIDENCE 0.9 GPL-2.0-only’
  100 lines of build script under ‘MIT CONFIDENCE 0.5 NONE’
  10 lines of build script under ‘MIT CONFIDENCE 0.1 NONE’

where NONE is [4]. What would the project be?

  GPL-3.0-only AND (GPL-2.0-or-later CONFIDENCE 0.9 GPL-2.0-only) AND (MIT CONFIDENCE 0.5 NONE) AND (MIT CONFIDENCE 0.1 NONE)

Or would it be:

  GPL-3.0-only AND (GPL-2.0-or-later CONFIDENCE 0.9 GPL-2.0-only) AND (MIT CONFIDENCE 0.4636 NONE)

using line-count weights (or similar) to combine the two ‘MIT OR-MAYBE NONE’ cases? Either way, that's probably going to focus people on the build script (“reasonable chance that this is not open code at all!”), but they may instead want to focus on the core code (“we think copy/pasting 110 lines could be fair use, but we don't want to waste time on those 1k lines of possibly GPL-2.0-only code if we can't link them with the 10k GPL-3.0-only code”). And we don't weight AND, so it's not clear to me how actionable CONFIDENCE values would be for product-level composites.

Still, scancode-toolkit [1,5] and licensee [6] both decided to set it, so I don't want to drop it without understanding how it's used. My impression based on [7,8] is that both of these are tunables for the tool-user, and that the tool-authors don't expect them to be passed up the chain to folks reading compound confidence lists, but it's worth getting more feedback from the tool authors on that. And I'm also fine with leaving a partial-conclusion syntax out of the spec, and punting it to higher levels and third parties.

[1]: https://lists.spdx.org/pipermail/spdx-legal/2017-November/002351.html
     Subject: Re: update on only/or later etc.
     Date: 2017-11-22
     Message-ID: <CAOFm3uFFfitvk-wK_TO3ZqWpGR6VD+R-26HrucnQ8MNbzx2Bag@...>
[2]: https://github.com/wking/spdx-spec/blob/922031a89e7f7dca19f20d17005d0f3feeb95af5/chapters/appendix-IV-SPDX-license-expressions.md#IV.2
     https://github.com/spdx/spdx-spec/pull/37
[3]: https://github.com/spdx/spdx-spec/issues/50
     Subject: Add “NOASSERTION” to the license expression syntax
[4]: https://github.com/spdx/spdx-spec/issues/49
     Subject: Add “NONE” to the license expression syntax
[5]: https://github.com/nexB/scancode-toolkit/blame/v2.2.1/src/licensedcode/README.rst#L140-L141
[6]: https://github.com/benbalter/licensee/blob/v9.6.0/docs/usage.md#command-line-usage
[7]: https://github.com/nexB/scancode-toolkit/issues/342
     Subject: Bare CPOL license detection rule detection issue
[8]: https://github.com/benbalter/licensee/pull/212
     Subject: Fix for FCPL false positive

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
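The 0.4636 figure in the message above is consistent with a line-count-weighted mean of the two build-script confidences. The sketch below assumes that reading of “line-count weights (or similar)”. The function name and data layout are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# (lines of code, confidence that the component is MIT rather than NONE),
# taken from the two build-script entries in the example product.
components = [
    (100, Fraction("0.5")),
    (10, Fraction("0.1")),
]


def line_weighted_confidence(components):
    """Mean confidence weighted by line count (an assumed combination rule)."""
    total_lines = sum(lines for lines, _ in components)
    return sum(lines * conf for lines, conf in components) / total_lines


combined = line_weighted_confidence(components)
print(round(float(combined), 4))  # 0.4636
```

The exact value is 51/110, so the 100-line script dominates the combined figure, even though the 10-line script is the one with the least-confident assertion.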
Gary O'Neall
[G.O.] My preference is for the "OR-MAYBE" approach just due to the simplicity. In the audit use case, it is difficult to assign a confidence that has any precision. The weighting would work for a tool where there is some algorithm that results in a weighting or confidence measure.
W. Trevor King
On Mon, Nov 27, 2017 at 10:17:22PM -0800, Gary O'Neall wrote:
> > binary-confidence-expression-operator = "AND"…
>
> [G.O.] My preference is for the "OR-MAYBE" approach just due to the…

I agree that getting consistent confidence numbers is going to be hard, and that without that (and maybe even with that), confidence weights may not be very useful. But with two license tools returning confidence-weighted alternatives, I want to make sure we understand their intended use cases before we commit to backwards-compat for a binary OR-MAYBE.

Cheers,
Trevor