Re: Import and export function of SPDX


Kevin P. Fleming <kpfleming@...>
 

On 06/13/2012 06:50 AM, RUFFIN, MICHEL (MICHEL) wrote:
We are currently in the process of revisiting all our entries in the
database to increase consistency and remove confidential information in
order to be able to provide the content of our database to some external
partner companies such as Blackduck (and there are others). The goal is
to increase the quality and completeness of their database (we have 200
FOSS experts feeding our DB for 5000 entries, Blackduck has a DB of
around 500 000 entries, Antelink has a DB of 1 000 000 entries, I do not
know for Palamida, Protecode and NextB but it should be similar in range…).

I'm curious how you see this working. As I posted on the other SPDX list yesterday, I find the package-level metadata to be mandatory in order to have a high degree of trust in the rest of the information present in the SPDX file.

As an example of why, let's assume that a company has received a binary distribution of some software from a vendor, and the software is nominally licensed under the GPLv2. The vendor supplies the company with a source archive that purportedly is the original source used to produce that binary code, and also provides an SPDX file that claims to be a valid and accurate representation of the license information present in the files in the source archive. The company wants to use this information to ensure that they are in compliance with the stated license obligations when they further distribute this binary.

However, the SPDX file does not contain the source archive name/version/etc. nor its checksum. All it contains is file names and their checksums. How can the receiving company be sure that the SPDX file correctly represents the source archive they have received? If the source archive contains additional source files not represented in the SPDX file, there is no way for the receiving company to know this other than to audit the source archive contents again, thus partially defeating the purpose of having received an SPDX file in the first place. In my own case, if I did this audit and found that the source archive contained source files that were *not* represented in the SPDX file, then I'd probably just throw away the SPDX file and act as if I had not received it in the first place.
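
To make that re-audit concrete, here is a rough sketch in Python of what the receiving company would have to do. It assumes the source archive is a tar file and that the file section of the SPDX file has already been parsed into a plain mapping of file name to SHA-1 digest; the function names and that mapping are illustrative assumptions of mine, not anything defined by SPDX or by an existing tool.

```python
import hashlib
import tarfile


def audit_archive(archive_path, spdx_files):
    """Compare the regular files in a tar archive with an SPDX file list.

    spdx_files is a hypothetical, already-parsed mapping of
    file name -> expected SHA-1 hex digest.
    Returns (unlisted, missing, mismatched) as sorted lists of file names.
    """
    seen = {}
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, and so on
            data = tar.extractfile(member).read()
            seen[member.name] = hashlib.sha1(data).hexdigest()

    # Files in the archive that the SPDX file never mentions.
    unlisted = sorted(set(seen) - set(spdx_files))
    # Files the SPDX file describes that are absent from the archive.
    missing = sorted(set(spdx_files) - set(seen))
    # Files present in both whose contents differ.
    mismatched = sorted(
        name
        for name in set(seen) & set(spdx_files)
        if seen[name] != spdx_files[name].lower()
    )
    return unlisted, missing, mismatched


def archive_sha1(archive_path):
    """SHA-1 of the archive itself, the package-level check discussed above."""
    digest = hashlib.sha1()
    with open(archive_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With a package-level checksum in the SPDX file, a single comparison against archive_sha1() would settle the question. Without it, the receiver has to walk the whole archive as audit_archive() does, and any entry in the "unlisted" result is exactly the case I describe above.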

I understand that there are probably many situations where the source archive is not available, or will not be distributed, and in such situations making these fields mandatory in SPDX seems burdensome. However, in any case where the source archive is available and will be distributed, I think these fields are as important as any other top-level metadata in the SPDX file.

Could we construct the SPDX 2.0 XML in such a way that there is an attribute indicating that the source archive is 'unavailable or unknown', and if this attribute is set, then (and only then) these other fields become optional? Doing this would mean that if the producer/distributor of an SPDX file makes the claim that the source archive is unavailable/unknown (and thus does not provide these additional pieces of information), but in fact the source archive is available, the receiver of the SPDX file could then choose whether to audit the source archive or trust the SPDX file.
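
As a sketch of the rule I have in mind, expressed in Python rather than as a schema: the field and attribute names below are placeholders I made up for illustration, not proposed SPDX element names, and the package is assumed to be an already-parsed dictionary.

```python
# Hypothetical names for the package-level fields discussed in this thread.
REQUIRED_WHEN_AVAILABLE = ("archive_name", "archive_version", "archive_checksum")


def missing_package_fields(package):
    """Return the package-level fields that are required but absent.

    If the producer has set the (hypothetical) 'source_archive_unavailable'
    attribute, the fields are optional and nothing is reported as missing.
    """
    if package.get("source_archive_unavailable", False):
        return []
    return [name for name in REQUIRED_WHEN_AVAILABLE if not package.get(name)]
```

A validator would reject any SPDX file for which this returns a non-empty list. A file that sets the attribute passes, but the receiver can see that the claim was made and decide how much to trust the rest of the file.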

--
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
Jabber: kfleming@... | SIP: kpfleming@... | Skype: kpfleming
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at www.digium.com & www.asterisk.org
