Jeremiah C. Foster
If I'm not mistaken, a copyright notice has to be a string because it has to be legible to humans. This means you can likely grep through source code, as scancode does, with a fair degree of confidence, and run 'strings' on binaries.
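To make the grep/'strings' idea concrete, here is a minimal Python sketch. It is not how scancode works internally; the function name and the notice pattern are my own assumptions, and the pattern is deliberately loose, so expect both false positives and misses.

```python
import re

# Runs of 4 or more printable ASCII bytes, roughly what `strings` prints by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Assumed, deliberately loose pattern: the word "copyright" or "(c)",
# followed somewhere on the same run by a four-digit year.
NOTICE = re.compile(r"(?:\bcopyright\b|\(c\)).*?\b(?:19|20)\d{2}\b", re.IGNORECASE)


def find_copyright_strings(data: bytes) -> list[str]:
    """Return printable runs in `data` that look like copyright notices."""
    hits = []
    for match in PRINTABLE_RUN.finditer(data):
        text = match.group().decode("ascii").strip()
        if NOTICE.search(text):
            hits.append(text)
    return hits
```

The same function works on source files and on binaries if you read them in binary mode, e.g. `find_copyright_strings(open(path, "rb").read())`. Notices stored as UTF-16 or compressed inside the binary will not show up this way.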
Using DEP-5 and debian/copyright files where you can should also be sufficient for due diligence in most jurisdictions, but I can't point to any legal precedent as evidence.
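For packages that ship a DEP-5 (machine-readable) debian/copyright file, pulling out the Files/Copyright/License stanzas is straightforward. The sketch below is a simplified reader under my own assumptions: `parse_dep5` is a hypothetical helper, it treats the file as blank-line-separated stanzas of "Field: value" lines with indented continuations, and it does not give the " ." blank-line marker or comments any special handling.

```python
def parse_dep5(text: str) -> list[dict[str, str]]:
    """Split DEP-5 style text into stanzas, each a dict of field name to value."""
    stanzas = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # An indented line continues the previous field's value.
            joined = current[last_field] + "\n" + line.strip()
            current[last_field] = joined.strip()
        elif ":" in line:
            # Split on the first colon only, so URLs in values stay intact.
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas
```

Each returned dict corresponds to one stanza, so the per-file copyright holders are in the `Copyright` field of every stanza that has a `Files` field.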
SPDX helps by creating a framework for human- and machine-readable documentation of your work, but you'll still need to scan the code for copyright notices.
Binaries likely require a bit of reverse engineering.
From: Dan Kegel <dank@...>
Sent: Monday, February 4, 2019 23:49
Subject: Re: [spdx] Standalone license tools for scanning debian/ubuntu apps?
I did look a bit at those, but they seemed more about unpacking
binaries than about wrangling copyrights.