Re: [spdx-tech] An example of a super simple SPDX licenses registry, for discussion
Philippe Ombredanne
Richard, Jeff:
On Mon, Mar 11, 2019 at 10:32 PM Richard Fontana <rfontana@...> wrote:
> Use of "LicenseRef" (not to mention something like

Agreed. What I am trying to achieve here is to make these become "standard" and known at SPDX. I think this is possible.

On Sun, Mar 10, 2019 at 12:44 PM Jeff McAffer <Jeff.McAffer@...> wrote:
> IMO the "ideal" here is that there is some automated way of

This ideal works in theory but, for several reasons I outline below, would be too brittle in practice: you would get different fingerprints too often for this to work. Instead, running a full license detection is a better way to dedupe things. This requires some form of centralization but could still be fully automated.

The other thing is that IMO giving a name/id does matter a lot: a license named 43bdf298 is not really human friendly.

Now even if license-text-fingerprint-as-id were to work out, the difficult part is not so much the algorithm for computing these, but the content you feed for fingerprinting. And that part is not easy to automate:

- For instance, is a copyright part of the license or not (I think not, but YMMV)?
- Or what about statements around a license? For instance these two SPDX licenses may not really deserve a different id yet they have one:
  https://spdx.org/licenses/bzip2-1.0.6.html and
  https://spdx.org/licenses/bzip2-1.0.5.html
  The LICENSE file in the original code archives does not have the patent disclaimer statement footer seen in bzip2-1.0.5's SPDX license text. That footer is present on the archive.org website only. I would not treat this as part of the license, but it was treated as part of it here. This is a judgment call.
- Or for instance, there are 6+ versions of the text of the GPL-2.0 which are really the same but would fingerprint differently.

Therefore a fingerprint algorithm would be hard to generalize: either there would be many exceptions, or a simple one would be too brittle in too many cases.
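To make the brittleness point concrete, here is a minimal Python sketch. It is only an illustration and not how scancode or any SPDX tool works: the `fingerprint` and `similarity` helpers, the normalization rules and the sample texts are all hypothetical, and the standard library's `difflib` stands in for a real license detection diff. Two texts that differ only by a trailing copyright line get different fingerprints, while a word-level comparison still shows them as nearly identical.

```python
import hashlib
import re
from difflib import SequenceMatcher


def fingerprint(text: str) -> str:
    """Short hex id: SHA-1 of the lowercased, whitespace-normalized text.

    Hypothetical scheme, for illustration only.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio in [0, 1], computed with difflib."""
    words_a = re.findall(r"\w+", a.lower())
    words_b = re.findall(r"\w+", b.lower())
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


# Made-up license text; the second copy only appends a copyright line.
text_a = "Permission is granted to use, copy and modify this software for any purpose."
text_b = text_a + " Copyright 2019 Example Author."

same_id = fingerprint(text_a) == fingerprint(text_b)
score = similarity(text_a, text_b)
print("same fingerprint:", same_id)
print("similarity:", round(score, 2))
```

The fingerprints differ because a single extra word changes the hash, whereas the similarity score stays high. Deciding what to strip before hashing (copyright lines, footers, surrounding statements) is exactly the judgment call that is hard to automate.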
Deduping is best achieved by license detection with a full diff (which is what scancode does FWIW). Let me follow up with my suggestion.

--
Cordially
Philippe Ombredanne