Re: Import and export function of SPDX


Well, today we more or less solve this issue by requesting the URL where the FOSS can be downloaded, so URL + name + version number determine the FOSS used. It is not perfect, but I have never found a good way to uniquely identify an open source component.

Even the URL is not enough: when our FOSS evaluators receive a URL to study a FOSS component, they first have to check that it is the right one. For instance, people provide a URL on SourceForge or on a mirror site, even though that is not the software's home page. So our internal recommendation is to use the copyright owner's home page whenever possible.

Note that the URL and version number are the only mandatory fields in our database.


Michel.Ruffin@..., PhD
Software Coordination Manager, Bell Labs, Corporate CTO Dpt
Distinguished Member of Technical Staff
Tel +33 (0) 6 75 25 21 94
Alcatel-Lucent International, Centre de Villarceaux
Route De Villejust, 91620 Nozay, France

-----Original Message-----
From: spdx-bounces@... [mailto:spdx-bounces@...] On Behalf Of Kevin P. Fleming
Sent: Wednesday, 13 June 2012 16:57
To: spdx@...
Subject: Re: Import and export function of SPDX

On 06/13/2012 06:50 AM, RUFFIN, MICHEL (MICHEL) wrote:
> We are currently in the process of revisiting all the entries in our
> database to increase consistency and remove confidential information,
> in order to be able to provide the content of our database to some
> external partner companies such as Blackduck (and there are others). The
> goal is to increase the quality and completeness of their databases (we
> have 200 FOSS experts feeding our DB of 5,000 entries; Blackduck has a DB
> of around 500,000 entries, Antelink has a DB of 1,000,000 entries; I do
> not know the figures for Palamida, Protecode and NextB, but they should
> be in a similar range).

I'm curious how you see this working. As I posted on the other SPDX list
yesterday, I find the package-level metadata to be mandatory in order to
have a high degree of trust in the rest of the information present in
the SPDX file.

As an example of why, let's assume that a company has received a binary
distribution of some software from a vendor, and the software is
nominally licensed under the GPLv2. The vendor supplies the company with
a source archive that purportedly is the original source used to produce
that binary code, and also provides an SPDX file that claims to be a
valid and accurate representation of the license information present in
the files in the source archive. The company wants to use this
information to ensure that they are in compliance with the stated
license obligations when they further distribute this binary.

However, the SPDX file contains neither the source archive's
name/version/etc. nor its checksum. All it contains is file names and
their checksums. How can the receiving company be sure that the SPDX
file correctly represents the source archive they have received? If the
source archive contains additional source files not represented in the
SPDX file, there is no way for the receiving company to know this other
than to audit the source archive contents again, thus partially
defeating the purpose of having received an SPDX file in the first
place. In my own case, if I did this audit and found that the source
archive contained source files that were *not* represented in the SPDX
file, then I'd probably just throw away the SPDX file and act as if I
had not received it in the first place.
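To make the audit above concrete, here is a minimal sketch of what a receiver can and cannot check. It assumes the SPDX document has already been parsed (by some hypothetical parser) into a mapping of file name to SHA-1 hex digest, that file names in the SPDX document match the member paths in the tar archive exactly, and that any package-level checksum is also a SHA-1. The function names (`audit_archive`, `matches_package_checksum`, `make_tar`) are illustrative only. Without a package-level checksum, the per-file comparison is the only way to detect files that the SPDX document does not cover.

```python
import hashlib
import io
import tarfile


def sha1_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-1 digest of some bytes."""
    return hashlib.sha1(data).hexdigest()


def audit_archive(archive_bytes: bytes, spdx_files: dict[str, str]) -> dict[str, list[str]]:
    """Compare the regular files in a tar archive with an SPDX per-file list.

    spdx_files maps file name -> SHA-1 hex digest (assumed format).
    Returns files in the archive but not in the SPDX list ("unlisted"),
    files whose checksum differs ("mismatched"), and files listed in the
    SPDX document but absent from the archive ("missing").
    """
    unlisted, mismatched, seen = [], [], set()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            data = tar.extractfile(member).read()
            seen.add(member.name)
            expected = spdx_files.get(member.name)
            if expected is None:
                unlisted.append(member.name)  # the case that defeats the SPDX file
            elif expected.lower() != sha1_hex(data):
                mismatched.append(member.name)
    missing = sorted(set(spdx_files) - seen)
    return {"unlisted": sorted(unlisted), "mismatched": sorted(mismatched), "missing": missing}


def matches_package_checksum(archive_bytes: bytes, declared_sha1: str) -> bool:
    """With a package-level checksum, one comparison covers the whole archive."""
    return sha1_hex(archive_bytes) == declared_sha1.lower()


def make_tar(files: dict[str, bytes]) -> bytes:
    """Hypothetical demo helper: build an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    archive = make_tar({"src/a.c": b"int a;", "src/b.c": b"int b;"})
    spdx = {"src/a.c": sha1_hex(b"int a;")}  # src/b.c is not described
    print(audit_archive(archive, spdx))
```

Running the demo reports `src/b.c` as unlisted. That is exactly the situation described above, where the receiver would otherwise have to re-audit the whole archive.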

I understand that there are probably many situations where the source
archive is not available, or will not be distributed, and in such
situations making these fields mandatory in SPDX seems burdensome.
However, in any case where the source archive is available and will be
distributed, I think these fields are as important as any other
top-level metadata in the SPDX file.

Could we construct the SPDX 2.0 XML in such a way that there was an
attribute indicating that the source archive is 'unavailable or
unknown', and if this attribute is set, then (and only then) these other
fields become optional? Doing this would mean that if the
producer/distributor of an SPDX file makes the claim that the source
archive is unavailable/unknown (and thus does not provide these
additional pieces of information), but in fact the source archive is
available, the receiver of the SPDX file could then choose whether to
audit the source archive or trust the SPDX file.
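The proposed rule, where the archive fields are mandatory unless the producer explicitly declares the archive unavailable or unknown, can be expressed as a small validation check. This is only a sketch: the attribute name `sourceArchive`, its values, and the element names `archiveName`, `archiveVersion` and `archiveChecksum` are hypothetical placeholders and are not taken from any published SPDX schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical names; not part of any published SPDX schema.
ARCHIVE_STATUS_ATTR = "sourceArchive"
OPT_OUT_VALUES = {"unavailable", "unknown"}
ARCHIVE_FIELDS = ("archiveName", "archiveVersion", "archiveChecksum")


def missing_archive_fields(package_xml: str) -> list[str]:
    """Return the archive fields that are required but absent or empty.

    If the package element declares the source archive unavailable or
    unknown, the archive fields become optional and nothing is reported.
    """
    pkg = ET.fromstring(package_xml)
    if pkg.get(ARCHIVE_STATUS_ATTR) in OPT_OUT_VALUES:
        return []  # producer has explicitly opted out
    # findtext() returns None for a missing element, "" for an empty one
    return [f for f in ARCHIVE_FIELDS if not (pkg.findtext(f) or "").strip()]


if __name__ == "__main__":
    print(missing_archive_fields('<package sourceArchive="unknown"/>'))
    print(missing_archive_fields("<package><archiveName>foo.tar.gz</archiveName></package>"))
```

A receiver who finds the opt-out attribute set, but who actually holds the source archive, is then left with the choice described above: audit the archive or trust the SPDX file.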

Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
Jabber: kfleming@... | SIP: kpfleming@... | Skype: kpfleming
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA