Payload externalReference considered harmful


David Kemp
 

Last week we discussed ExternalReference, drew some combinations of elements on the model, and made the "elements" field of ExternalReference plural so that all of the elements in a document are included in a single ExternalReference.

Later, Sebastian and I discussed canonicalization, particularly whether the element store is complete without document retrieval, and the role of detached signatures.  Those discussions led to the conclusion that whatever other types of data an Element's externalReference property may refer to, it should NOT refer to a payload.

Consider the element drawing:

There are 5 elements of any type that have some kind of connection (not "relationship"):
  • Element 1 could be an Annotation whose subject is 2 and which was created by identities 3, 4, and 5.
  • Element 1 could be a Relationship created by 2, from 3, to 4 and 5.
  • Element 1 could be an SBOM with Files 2, 3, and 4, created by 5.
Those elements can be put in Payloads (serialized SPDX documents) in any combination, for example:
  • A single payload with elements 1,2,3,4,5
  • Two or more payloads where one payload may reference elements in other payloads
  • Five payloads, each containing a single element and zero or more references to other payloads
Remember that there are two reasons for serializing more than one element into a payload: to allow information within the payload to be shared (reducing its size), and to allow a single payload integrity value to provide integrity for each of its elements.  The picture shows "H" on each reference to a payload, indicating a hash or signature over the element(s) in the payload.

But the value (and hash/signature) of an element cannot depend on which of many payloads that element may be serialized in.  Therefore the externalReference property of an element cannot refer to a payload: the externalReferenceType (currently TBD) cannot be "PAYLOAD" or "SPDX_DOCUMENT" or whatever type would have been used for payloads.
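
As a rough illustration of that point (a sketch only, not anything from the model): the Python below hashes a hypothetical element 1 twice, once with a payload reference pointing at payload [2,4,5] and once pointing at a payload holding only element 2. The JSON encoding, the field names, and the "urn:example:" identifiers are all assumptions made for the example.

```python
import hashlib
import json


def element_hash(element: dict) -> str:
    """Hash one element by itself (assumed encoding: JSON with sorted keys)."""
    canonical = json.dumps(element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical element 1, an Annotation whose subject is element 2.
element_1 = {
    "id": "urn:example:element-1",
    "type": "Annotation",
    "subject": "urn:example:element-2",
}

# The same element, but carrying a payload reference in its own value.
# Element 2 might be serialized in payload [2,4,5] or in a payload of its own.
with_ref_a = dict(
    element_1,
    externalReference={"type": "PAYLOAD", "locator": "urn:example:payload-2-4-5"},
)
with_ref_b = dict(
    element_1,
    externalReference={"type": "PAYLOAD", "locator": "urn:example:payload-2"},
)

# With a payload reference inside the element, its hash follows the payload layout.
hash_depends_on_payload = element_hash(with_ref_a) != element_hash(with_ref_b)

# Without one, the element has a single value however it is packaged.
hash_is_stable = element_hash(element_1) == element_hash(dict(element_1))
```

Both flags come out true under these assumptions: the element alone has one hash, while the variants that embed a payload locator hash differently from each other.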

The drawing shows three ways of serializing five elements.  The first has no payload references; the second and third each have a single reference.

The SpdxDocument Element is the single element type that describes a payload.  An SpdxDocument MAY be created to describe any payload, but it is REQUIRED only to support references from one payload to another.  In the diagram, only two SpdxDocument elements MUST exist (for payloads [2,4,5] and [1,5]).

A decision to not create an SpdxDocument element for a payload does not "make a commitment to future use cases".  If the creator of payload [2,4,5] did not create SpdxDocument [2,4,5], then the creator of payload [1,3,(6)] can create SpdxDocument(6) that describes the referenced payload.  The consequence of not having the creator's SpdxDocument included in payload [2,4,5] is that there is no creator's signature to allow original source verification.  The proxy signature by the creator of payload [1,3,(6)] can still be verified by anyone who references or uses that payload.

In summary, the payload external reference information (element list, payload download/query information, and integrity information) belongs exclusively in the SpdxDocument element and must not be included in any other element type.

Regards,
David


Gary O'Neall
 

If we exclusively use canonicalization for verification of external elements, I would agree with the conclusion.

 

I would like to retain the ability to verify by checksumming the payload, which is supported in the current SPDX spec. Would we be able to support this approach if we remove the reference to the payload in the externalReference?

 

This issue of ambiguous originating payloads for external elements is a real concern that I hadn't considered until now.

 

I can think of a couple possible solutions for verification:
A. Allow the ambiguity and let the consumer of the payload containing the external reference determine which of the many possible external payloads they would like to verify

B. Remove the ambiguity by having the serialized format of the payload containing the external reference specify which external payload is associated with each element (e.g. in the external map)

 

Regards,

Gary

 

From: Spdx-tech@... <Spdx-tech@...> On Behalf Of David Kemp
Sent: Saturday, August 27, 2022 6:56 AM
To: SPDX-list <Spdx-tech@...>
Subject: [spdx-tech] Payload externalReference considered harmful

 



David Kemp
 

There are two components of canonicalization:
1) being able to serialize a document as individual elements
2) defining a canonical serialized value for an element regardless of data format

To enable canonicalization, a checksum must be computed over individual elements so that it doesn't depend on payload-specific information like prefixes and document creation information.  All SPDXIDs must be full and all element creation fields must have values.  And the procedure to hash a group of individual elements must be specified for a particular data format, for example:
a) concatenate the serialized bytes of each individual element and hash the concatenated bytes
b) create a structure (array) of individual elements and hash the serialized structure
c) hash the serialized bytes of each individual element, then hash the concatenated hashes or array of hashes (Merkle)
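
The three procedures could be sketched roughly as follows in Python. This is illustrative only: JSON with sorted keys stands in for whatever per-element serialization a data format would define, elements are ordered by "id" so the result doesn't depend on payload order, and the function names and sample elements are made up for the example.

```python
import hashlib
import json


def serialize(value) -> bytes:
    """Assumed serialization: compact JSON with sorted keys."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def ordered(elements):
    """Assumed ordering rule: sort elements by their full SPDXID."""
    return sorted(elements, key=lambda e: e["id"])


def hash_concatenated(elements) -> str:
    """(a) Concatenate the serialized bytes of each element, then hash."""
    h = hashlib.sha256()
    for element in ordered(elements):
        h.update(serialize(element))
    return h.hexdigest()


def hash_structure(elements) -> str:
    """(b) Serialize an array of the elements and hash that."""
    return hashlib.sha256(serialize(ordered(elements))).hexdigest()


def hash_of_hashes(elements) -> str:
    """(c) Hash each element, then hash the concatenated digests."""
    digests = [hashlib.sha256(serialize(e)).digest() for e in ordered(elements)]
    return hashlib.sha256(b"".join(digests)).hexdigest()


# Hypothetical elements with full SPDXIDs.
sample = [
    {"id": "urn:example:element-2", "type": "File", "name": "a.c"},
    {"id": "urn:example:element-4", "type": "File", "name": "b.c"},
    {"id": "urn:example:element-5", "type": "Person", "name": "Example Creator"},
]

digest_a = hash_concatenated(sample)
digest_b = hash_structure(sample)
digest_c = hash_of_hashes(sample)
```

The three digests differ from one another, which is why the procedure has to be pinned down; variant (c) additionally lets a consumer check a single element against its own digest without having the others.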

If canonicalization is a goal, producers and consumers can't compute a hash directly over the compressed payload. The hashing process must disentangle the individual elements before hashing. But you can do #1 without going to #2. The individual elements are in the payload's data format and no other data formats need to be supported.

I can think of a couple possible solutions for verification:
A. Allow the ambiguity and let the consumer of the payload containing the external reference determine which of the many possible external payloads they would like to verify
B. Remove the ambiguity by having the serialized format of the payload containing the external reference specify which external payload is associated with each element (e.g. in the external map)

I was suggesting B, and observing that in the payload the external map must be separate from the elements (as with prefixes), not included in the value of any element as the logical model currently shows.  I also wasn't clear about what is mandatory: the payload must have the external document reference to enable B.  It's not strictly necessary to include references in SpdxDocument Elements, but if there is a use case for making payloads visible in the element store by defining an SpdxDocument type, then references between payloads should be included, which in turn enables SpdxDocument to replace a special External Reference data type in the payload.

You can do B without computing hashes over independent elements, but that precludes the path to canonicalization.
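
A minimal sketch of B with a whole-payload checksum, assuming a JSON-like payload in which the external map sits beside the elements, as prefixes do. The key names ("prefixes", "externalMap", "elements", "locator", "sha256"), the URL, and the fetch callback are placeholders for illustration, not anything defined by the model.

```python
import hashlib

# Stand-in for the bytes of the referenced payload [2,4,5] as published.
REMOTE_PAYLOAD = b'{"elements": ["...elements 2, 4 and 5..."]}'

payload = {
    "prefixes": {"ex": "urn:example:"},
    # Payload-level information: which external payload holds each element.
    "externalMap": {
        "urn:example:element-2": {
            "locator": "https://example.org/payload-2-4-5.json",
            "sha256": hashlib.sha256(REMOTE_PAYLOAD).hexdigest(),
        },
    },
    # Element values carry no payload reference.
    "elements": [
        {
            "id": "urn:example:element-1",
            "type": "Annotation",
            "subject": "urn:example:element-2",
        },
    ],
}


def verify_external_element(payload: dict, element_id: str, fetch) -> bool:
    """Fetch the one payload named in the external map and check its checksum."""
    entry = payload["externalMap"][element_id]
    data = fetch(entry["locator"])
    return hashlib.sha256(data).hexdigest() == entry["sha256"]


def fake_fetch(locator: str) -> bytes:
    """Stand-in for a download; always returns the sample bytes."""
    return REMOTE_PAYLOAD


verified = verify_external_element(payload, "urn:example:element-2", fake_fetch)
tampered = verify_external_element(
    payload, "urn:example:element-2", lambda locator: REMOTE_PAYLOAD + b" "
)
```

Here the checksum covers the payload bytes rather than individual elements, so it shows B working without canonicalization, with the limitation noted above.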

I'm not sure about A, but I think it means "try all the payloads that match the locator information and see if any of them match the expected hash".  Only one of them can match (the one the producer used when computing the hash), so that seems like a lot of work if the locators include more than one payload containing an element, not just copies of the same payload.
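
If that reading of A is right, the consumer's side might look something like this sketch, where the candidate list and its locators are invented for the example and each candidate has already been retrieved:

```python
import hashlib
from typing import Optional


def find_matching_payload(candidates: dict, expected_sha256: str) -> Optional[str]:
    """Try every candidate payload and return the locator of the first match."""
    for locator, payload_bytes in candidates.items():
        if hashlib.sha256(payload_bytes).hexdigest() == expected_sha256:
            return locator
    return None


# Hypothetical payloads that all contain element 2, packaged differently.
candidates = {
    "https://example.org/payload-2.json": b"payload holding element 2 only",
    "https://example.org/payload-2-4-5.json": b"payload holding elements 2, 4, 5",
    "https://example.org/payload-all.json": b"payload holding elements 1 to 5",
}

# The producer computed the expected hash over one specific payload.
expected = hashlib.sha256(candidates["https://example.org/payload-2-4-5.json"]).hexdigest()

match = find_matching_payload(candidates, expected)
no_match = find_matching_payload(candidates, hashlib.sha256(b"other").hexdigest())
```

Every candidate has to be retrieved and hashed until one matches, which is the extra work described above.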

Regards,
David


On Sat, Aug 27, 2022 at 1:53 PM Gary O'Neall <gary@...> wrote:
