Re: Standalone license tools for scanning debian/ubuntu apps?

Dan Kegel

On Tue, Feb 5, 2019 at 1:30 PM Jeremiah C. Foster <@jeremiah> wrote:
If I'm not mistaken, copyright has to be a string because it has to be legible by humans. This means you can likely grep through source code as scancode does with a fair degree of confidence and use 'strings' on binaries.

Using DEP-5 and Debian Copyright files where you can should also be sufficient for due diligence in most jurisdictions, but I can't point to any legal precedent as evidence.

SPDX helps by creating a framework for human and machine readable documentation of your work, but you'll still need to scan code for copyright.

Binaries likely require a bit of reverse engineering.
Yes, absolutely.

SPDX's set of standard licenses and ids (and scancode's somewhat
expanded similar set) are great for stating license info succinctly.

scancode is great at collecting the info that should go into the
debian copyright file.

My goal for this iteration at our licensing process was to automate
collection of license info for the shared libraries our binary uses.

Here's the pipeline I set up to do that:

1) reads
a DEP-5 (aka Debian copyright) file and filters out any clauses that
(most likely) do not propagate to shared library artifacts
2) reads a
Debian copyright file, filters it through ob-filter-licenses, and
outputs spdx ids. (For non-DEP-5 copyright files, it uses scancode to
guess licenses.)
3) uses ldd
to look up shared libraries used by a binary, uses dpkg-query to look
up the containing packages, and runs ob-parse-licenses on them.

For instance, running "ob-list-licences /bin/login" outputs:





This of course only solves a small part of the license / copyright
problem, and only approximately, but it found interesting things for
- Dan

Join to automatically receive all group messages.