Canonicalization: Commutative and Symmetric Relationships


David Kemp
 

The canonicalization team discussed several approaches to handling relationships that we thought should be brought to a larger group for thought.

Background: SPDX 2.3 defines a large number of relationship types that will be brought forward to version 3, including:

DESCRIBES
DESCRIBED_BY
CONTAINS
CONTAINED_BY
DEPENDS_ON
DEPENDENCY_OF
DEPENDENCY_MANIFEST_OF
BUILD_DEPENDENCY_OF
 ...
EXAMPLE_OF
GENERATES
GENERATED_FROM
 ...
PATCH_FOR
PATCH_APPLIED
COPY_OF
FILE_ADDED
FILE_DELETED
FILE_MODIFIED
...

  • The version 3 Relationship element is asymmetric - from 1 Element to 1..* Elements.  
  • The description relationship is symmetric - A DESCRIBES B is semantically identical to B DESCRIBED_BY A.
  • The copy relationship is commutative - A COPY_OF B is semantically identical to B COPY_OF A.

Problem: Canonicalization should have a single way of representing any combination of relationships between elements.  Although it would be ideal to discard one relationship type from each symmetric pair, the fact that the relationship element is asymmetric makes that impossible: A DESCRIBES [B,C,D] cannot be replaced by [B,C,D] DESCRIBED_BY A.
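
To make the constraint concrete, here is a minimal Python sketch. The `Relationship` class, the `INVERSE` table, and the `invert` helper are illustrative names assumed for this example, not part of any SPDX library. With exactly one "from" element and a list of "to" elements, rewriting a one-to-many relationship with the inverse type cannot produce a single element; it needs one relationship per target:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the version 3 Relationship element:
# exactly one "from" element and one or more "to" elements.
@dataclass(frozen=True)
class Relationship:
    from_: str
    type: str
    to: tuple[str, ...]

# Illustrative subset of inverse pairs taken from the list above.
INVERSE = {"DESCRIBES": "DESCRIBED_BY", "CONTAINS": "CONTAINED_BY"}

def invert(rel: Relationship) -> list[Relationship]:
    """Rewrite rel using the inverse type: one relationship per target."""
    inverse_type = INVERSE[rel.type]
    return [Relationship(target, inverse_type, (rel.from_,)) for target in rel.to]

# A DESCRIBES [B, C, D] becomes three separate DESCRIBED_BY relationships.
inverted = invert(Relationship("A", "DESCRIBES", ("B", "C", "D")))
for rel in inverted:
    print(rel.from_, rel.type, list(rel.to))
```

The same splitting is what option 2 below would impose on every relationship, which is where the growth in element count comes from.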

Options:
  1. Status quo: ignore the problem, keep all the existing types, don't worry about semantics
  2. Make all relationships 1 to 1: This allows one of each symmetric pair to be discarded, but causes the number of relationship elements to explode when the "n" in 1..n is large.
  3. Make the relationship element symmetric by allowing both 1..n and n..1: DESCRIBED_BY can be discarded while supporting both A DESCRIBES [B,C,D] and [B,C,D] DESCRIBES A use cases.

#3 is the desired outcome, but there are several model options to achieve it:

3a. Define the Relationship element to have properties from [1..*] and to [1..*], but include explanatory text prohibiting many-to-many relationships: either from or to must have exactly one element.

3b. Define a "direction" flag in the Relationship element with values "inbound" and "outbound", and redefine the element properties to be direction agnostic: x is 1..1, y is 1..*, outbound is from x to y and inbound is from y to x.  This works, but is a bit convoluted to explain.

3c. Define two Relationship elements: Relationship_out and Relationship_in.  Relationship_out would have the current from 1 to 1..* properties, and Relationship_in would have properties from 1..* to 1.
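
As a rough illustration of how 3b and 3c differ, the sketch below models both in Python. All class, field, and method names here are assumptions made for the example; the thread does not define them. In both shapes the types themselves rule out many-to-many relationships, which is the property 3a would leave to explanatory text.

```python
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    OUTBOUND = "outbound"  # from x to each y
    INBOUND = "inbound"    # from each y to x

# Option 3b: one element type with direction-agnostic properties and a flag.
@dataclass(frozen=True)
class RelationshipWithDirection:
    x: str                 # cardinality 1..1
    y: tuple[str, ...]     # cardinality 1..*
    type: str
    direction: Direction

    def edges(self) -> set[tuple[str, str]]:
        if self.direction is Direction.OUTBOUND:
            return {(self.x, target) for target in self.y}
        return {(source, self.x) for source in self.y}

# Option 3c: two element types, each with fixed cardinalities.
@dataclass(frozen=True)
class RelationshipOut:
    from_: str             # 1
    to: tuple[str, ...]    # 1..*
    type: str

    def edges(self) -> set[tuple[str, str]]:
        return {(self.from_, target) for target in self.to}

@dataclass(frozen=True)
class RelationshipIn:
    from_: tuple[str, ...]  # 1..*
    to: str                 # 1
    type: str

    def edges(self) -> set[tuple[str, str]]:
        return {(source, self.to) for source in self.from_}

# [B, C] DESCRIBES A expressed both ways yields the same directed edges.
via_3b = RelationshipWithDirection("A", ("B", "C"), "DESCRIBES", Direction.INBOUND)
via_3c = RelationshipIn(("B", "C"), "A", "DESCRIBES")
print(via_3b.edges() == via_3c.edges())
```

The extra branch inside `RelationshipWithDirection.edges` is a small example of the logic that 3b pushes into a single box and 3c moves into the type structure.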

We did not identify additional mechanisms to allow symmetric relationship type pairs to be pruned to a single type, but others may be possible.  We are unanimous that pruning pairs is the goal.

I am opposed to option 3a - the model itself should define that many-to-many relationships are not allowed.  Allowing them in the model but prohibiting them in separate text is lazy, error-prone, and puts enforcement of the text on the backs of implementers rather than tools.

I prefer option 3c - it does mean the model needs one more box, but it avoids some convoluted logic within 3b's single Relationship box.

Thoughts and suggestions?

Regards,
David


William Bartholomew (CELA)
 

I want to separate my thoughts on this from decisions that were already made for 3.0. I'm not saying we can't reopen them, but the bar for that should be high. The decisions that have already been made are:
  1. We will identify each relationship type as either directional or non-directional (DESCRIBES is directional, RELATED_TO is not).
  2. For directional relationships, from: X to: Y and from: Y to: X are not equal; for non-directional relationships, from: X to: Y and from: Y to: X are equal.
  3. We will remove the inverse relationship types for directional relationships, since the same can be achieved by inverting the to: and from:.
  4. To decide which of the inverse relationships to keep, we'll use two criteria:
    1. The one where the to: direction makes more sense to be plural (since, as you point out, the from: is singular).
    2. The most common direction (if these are in conflict, we'll need to evaluate).

I think this helps with canonicalization because it clarifies the rules around directionality and equality. The other thing to remember is that relationships are elements and, as a result, have unique identities. So from a canonicalization perspective two relationships, even if they express the same relations, will canonicalize differently because of the SPDXID of the relationship element itself.
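
A small Python sketch of these rules, under stated assumptions: the `Relationship` class, the `NON_DIRECTIONAL` set, and the helper names are invented for illustration, and only the two classifications given above (DESCRIBES directional, RELATED_TO non-directional) are used. It separates "express the same relations" from element equality, which also depends on the relationship's own SPDXID.

```python
from dataclasses import dataclass

# Assumed classification, limited to the example given in decision 1.
NON_DIRECTIONAL = {"RELATED_TO"}

@dataclass(frozen=True)
class Relationship:
    spdx_id: str            # identity of the relationship element itself
    from_: str
    type: str
    to: frozenset[str]

def relation_edges(rel: Relationship) -> set:
    """Edges asserted by rel: unordered pairs for non-directional types."""
    if rel.type in NON_DIRECTIONAL:
        return {frozenset((rel.from_, target)) for target in rel.to}
    return {(rel.from_, target) for target in rel.to}

def same_relation(a: Relationship, b: Relationship) -> bool:
    """True if a and b assert the same relations, ignoring their SPDXIDs."""
    return a.type == b.type and relation_edges(a) == relation_edges(b)

r1 = Relationship("SPDXRef-rel-1", "X", "RELATED_TO", frozenset({"Y"}))
r2 = Relationship("SPDXRef-rel-2", "Y", "RELATED_TO", frozenset({"X"}))
d1 = Relationship("SPDXRef-rel-3", "X", "DESCRIBES", frozenset({"Y"}))
d2 = Relationship("SPDXRef-rel-4", "Y", "DESCRIBES", frozenset({"X"}))

print(same_relation(r1, r2))  # non-directional: swapping from/to is equal
print(same_relation(d1, d2))  # directional: swapping from/to is not equal
print(r1 == r2)               # distinct elements, because the SPDXIDs differ
```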

William


From: Spdx-tech@... <Spdx-tech@...> on behalf of David Kemp via lists.spdx.org <dk190a=gmail.com@...>
Sent: Saturday, August 20, 2022 9:57 AM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] [spdx-tech] Canonicalization: Commutative and Symmetric Relationships
 


Dick Brooks
 

Thanks for the insights.

 

Is the default still “CONTAINS” when no relationship node is present?

 

Thanks,

 

Dick Brooks

 

Active Member of the CISA Critical Manufacturing Sector,

Sector Coordinating Council – A Public-Private Partnership

 

Never trust software, always verify and report!

http://www.reliableenergyanalytics.com

Email: dick@...

Tel: +1 978-696-1788

 

From: Spdx-tech@... <Spdx-tech@...> On Behalf Of William Bartholomew (CELA) via lists.spdx.org
Sent: Saturday, August 20, 2022 1:52 PM
To: SPDX-list <Spdx-tech@...>; dk190a@...
Subject: Re: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 



William Bartholomew (CELA)
 

The only way to relate artifacts is via relationship nodes. The "containment" that is possible with the types that inherit from Collection is for grouping the metadata; it doesn't imply an artifact relationship. This was one of the things we did to remove confusion between packages and collections. Semantically it is similar to contains, but for the SPDX elements themselves, not what the SPDX elements describe.



From: Dick Brooks <dick@...>
Sent: Saturday, August 20, 2022 11:12 AM
To: William Bartholomew (CELA) <willbar@...>; 'SPDX-list' <Spdx-tech@...>; dk190a@... <dk190a@...>
Subject: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships
 



Dick Brooks
 

Thanks, William.

 

This is a big change from SPDX 2.x and will require a much more complicated finite state machine implementation for parsing, based on my limited knowledge of SPDX V 2.x.

 

Thanks,

 

Dick Brooks

 


 

From: William Bartholomew (CELA) <willbar@...>
Sent: Saturday, August 20, 2022 2:46 PM
To: 'SPDX-list' <Spdx-tech@...>; dk190a@...; dick@...
Subject: Re: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 



William Bartholomew (CELA)
 

I should clarify: this is for the logical model; specific serializations can provide alternate representations. For example, associating files with packages is a common use case, so the JSON serialization may allow you to specify files within a package, and the JSON serializer would generate the appropriate relationship automatically.
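
A sketch of what that serialization-layer convenience might look like, in Python. The JSON shape, the key names (`spdxId`, `files`, `relationshipType`), and the generated relationship ID are all hypothetical and chosen for illustration; they are not taken from any SPDX JSON schema.

```python
import json

def expand_package(doc_text: str) -> list[dict]:
    """Expand a package with nested files into logical-model elements."""
    package = json.loads(doc_text)
    files = package.pop("files", [])  # nested shorthand, if present
    elements = [package] + files
    if files:
        # Generate the single CONTAINS relationship the shorthand implies.
        elements.append({
            "type": "Relationship",
            "spdxId": f"{package['spdxId']}-contains",  # hypothetical ID scheme
            "relationshipType": "CONTAINS",
            "from": package["spdxId"],
            "to": [f["spdxId"] for f in files],
        })
    return elements

example = json.dumps({
    "type": "Package",
    "spdxId": "SPDXRef-pkg",
    "files": [
        {"type": "File", "spdxId": "SPDXRef-file-a"},
        {"type": "File", "spdxId": "SPDXRef-file-b"},
    ],
})

elements = expand_package(example)
for element in elements:
    print(element["type"], element["spdxId"])
```

After expansion the logical model holds only the package, the files, and one relationship, so containment is still expressed in a single way.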

 

Even without that, I’m not sure the finite state machine will be significantly more complex: once you have the package’s SPDXID, you find the relationship(s) that have a “from” of that SPDXID and a type of “CONTAINS” (and discard/replace any that have an inbound “AMENDS” relationship).
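
The lookup described above can be sketched in a few lines of Python. Relationships are represented here as plain dictionaries with assumed keys (`id`, `from`, `type`, `to`), and this sketch only discards amended relationships; the "replace" half would depend on how AMENDS is defined, which this thread does not specify.

```python
def contained_elements(package_id: str, relationships: list[dict]) -> list[str]:
    """Return IDs of elements the package CONTAINS, skipping amended relationships."""
    # IDs of relationships that are the target of an inbound AMENDS relationship.
    amended = {
        target
        for rel in relationships
        if rel["type"] == "AMENDS"
        for target in rel["to"]
    }
    contained = []
    for rel in relationships:
        if (
            rel["type"] == "CONTAINS"
            and rel["from"] == package_id
            and rel["id"] not in amended
        ):
            contained.extend(rel["to"])
    return contained

relationships = [
    {"id": "rel-1", "from": "pkg-1", "type": "CONTAINS", "to": ["file-a", "file-b"]},
    {"id": "rel-2", "from": "pkg-1", "type": "CONTAINS", "to": ["file-c"]},
    {"id": "rel-3", "from": "rel-4", "type": "AMENDS", "to": ["rel-2"]},
    {"id": "rel-5", "from": "pkg-2", "type": "CONTAINS", "to": ["file-d"]},
]

print(contained_elements("pkg-1", relationships))
```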

 

We did have a model where the Package element had a files property, and the decision at the time was that this could lead to too much ambiguity because there were now multiple ways to describe containment. Moving this to the serialization layer allows the logical model to have a single way to express containment but still allows simpler expression in serialization formats when that flexibility isn’t required.

 

From: Dick Brooks <dick@...>
Sent: Saturday, August 20, 2022 12:12 PM
To: William Bartholomew (CELA) <willbar@...>; 'SPDX-list' <Spdx-tech@...>; dk190a@...
Subject: RE: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

Thanks, William.

 

This is a big change from SPDX 2.x and will require a much more complicated finite state machine implementation for parsing, based on my limited knowledge of SPDX V 2.x.

 

Thanks,

 

Dick Brooks

 

Active Member of the CISA Critical Manufacturing Sector,

Sector Coordinating Council – A Public-Private Partnership

 

Never trust software, always verify and report!

http://www.reliableenergyanalytics.com

Email: dick@...

Tel: +1 978-696-1788

 

From: William Bartholomew (CELA) <willbar@...>
Sent: Saturday, August 20, 2022 2:46 PM
To: 'SPDX-list' <Spdx-tech@...>; dk190a@...; dick@...
Subject: Re: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

The only way to relate artifacts is via relationship nodes, the "containment" that is possible with the types that inherit from Collection is for grouping the metadata, it doesn't imply an artifact relationship. This was one of the things we did to remove confusion between packages and collections. Semantically it is similar to contains but for the SPDX elements themselves, not what the SPDX element describes.

 

Sent from Outlook


From: Dick Brooks <dick@...>
Sent: Saturday, August 20, 2022 11:12 AM
To: William Bartholomew (CELA) <willbar@...>; 'SPDX-list' <Spdx-tech@...>; dk190a@... <dk190a@...>
Subject: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

Thanks for the insights.

 

Is the default still “CONTAINS” when no relationship node is present?

 

Thanks,

 

Dick Brooks

 

Active Member of the CISA Critical Manufacturing Sector,

Sector Coordinating Council – A Public-Private Partnership

 

Never trust software, always verify and report!

http://www.reliableenergyanalytics.com

Email: dick@...

Tel: +1 978-696-1788

 

From: Spdx-tech@... <Spdx-tech@...> On Behalf Of William Bartholomew (CELA) via lists.spdx.org
Sent: Saturday, August 20, 2022 1:52 PM
To: SPDX-list <Spdx-tech@...>; dk190a@...
Subject: Re: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

I want to separate my thoughts on this from decisions that were already made for 3.0, not saying we can't reopen them but the bar for that should be high. The decisions that have already been made are:

  1. We will identify each relationship type as either directional or non-directional (DESCRIBES is directional, RELATED_TO is not).
  2. For directional relationships from: X to: Y and from: Y to: X are not equal, for non-directional relationships from: X to: Y and from: Y to: X are equal.
  3. We will remove the inverse relationship types for directional relationships since the same can be achieved by inverting the to: and from:.
  4. To decide which of the inverse relationships to keep we'll use two criteria:
    1. The one where the to: direction makes more sense to be plural (since as you point out the from: is singular).
    2. The most common direction (if these are in conflict, we'll need to evaluate).

I think this helps with canonicalization because it clarifies the rules around directionality and equality. The other thing to remember is that relationships are elements, as a result they have unique identities, so from a canonicalization perspective two relationships, even if they express the same relations, will canonicalize differently because of the SPDXID of the relationship element itself.

 

William

 


From: Spdx-tech@... <Spdx-tech@...> on behalf of David Kemp via lists.spdx.org <dk190a=gmail.com@...>
Sent: Saturday, August 20, 2022 9:57 AM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

The canonicalization team discussed several approaches to handling relationships that we thought should be brought to a larger group for thought.

Background: SPDX 2.3 defines a large number of relationship types that will be brought forward to version 3, including:

DESCRIBES
DESCRIBED_BY
CONTAINS
CONTAINED_BY
DEPENDS_ON
DEPENDENCY_OF
DEPENDENCY_MANIFEST_OF
BUILD_DEPENDENCY_OF
 ...
EXAMPLE_OF
GENERATES
GENERATED_FROM
 ...
PATCH_FOR
PATCH_APPLIED
COPY_OF
FILE_ADDED
FILE_DELETED
FILE_MODIFIED

...

 

  • The version 3 Relationship element is asymmetric - from 1 Element to 1..* Elements.  
  • The description relationship is symmetric - A DESCRIBES B is semantically identical to B DESCRIBED_BY A.
  • The copy relationship is commutative - A COPY_OF B is semantically identical to B COPY_OF A.

Problem: Canonicalization should have a single way of representing any combination of relationships between elements.  Although it would be ideal to discard one relationship type from each symmetric pair, the fact that the relationship element is asymmetric makes that impossible: A DESCRIBES [B,C,D] cannot be replaced by {B,C,D] DESCRIBED_BY A.

Options:

  1. Status quo: ignore the problem, keep all the existing types, don't worry about semantics
  2. Make all relationships 1 to 1: This allows one of each symmetric pair to be discarded, but causes the number of relationship elements to explode when the "n" in 1..n is large.
  3. Make the relationship element symmetric by allowing both 1..n and n..1: DESCRIBED_BY can be discarded while supporting both A DESCRIBES [B,C,D] and [B,C,D] DESCRIBES A use cases.

#3 is the desired outcome, but there are several model options to achieve it:

3a. Define the Relationship element to have properties from [1..*] and to [1..*], but include explanatory text prohibiting many-to-many relationships: either from or to must have exactly one element.

 

3b. Define a "direction" flag in the Relationship element with values "inbound" and "outbound", and redefine the element properties to be direction agnostic: x is 1..1, y is 1..*, outbound is from x to y and inbound is from y to x.  This works, but is a bit convoluted to explain.

 

3c. Define two Relationship elements: Relationship_out and Relationship_in.  Relationship_out would have the current from 1 to 1..* properties, and Relationship_in would have properties from 1..* to 1.
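
To make the three options concrete, here is a rough Python sketch of how each might look as a data structure. All class and field names are hypothetical, chosen only for illustration; the thread does not define them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relationship3a:
    """Option 3a: both ends are 1..*; many-to-many is rejected by a runtime check."""
    rel_type: str
    from_: Tuple[str, ...]
    to: Tuple[str, ...]

    def __post_init__(self):
        if not self.from_ or not self.to:
            raise ValueError("from and to must each have at least one element")
        if len(self.from_) > 1 and len(self.to) > 1:
            raise ValueError("many-to-many relationships are prohibited")


@dataclass(frozen=True)
class Relationship3b:
    """Option 3b: x is 1..1, y is 1..*, and a flag gives the direction."""
    rel_type: str
    x: str
    y: Tuple[str, ...]
    direction: str  # "outbound" (x to y) or "inbound" (y to x)

    def __post_init__(self):
        if self.direction not in ("outbound", "inbound"):
            raise ValueError("direction must be 'outbound' or 'inbound'")

    def edges(self) -> List[Tuple[str, str]]:
        if self.direction == "outbound":
            return [(self.x, target) for target in self.y]
        return [(source, self.x) for source in self.y]


@dataclass(frozen=True)
class RelationshipOut:
    """Option 3c, first box: from 1 to 1..*."""
    rel_type: str
    from_: str
    to: Tuple[str, ...]


@dataclass(frozen=True)
class RelationshipIn:
    """Option 3c, second box: from 1..* to 1."""
    rel_type: str
    from_: Tuple[str, ...]
    to: str
```

In this sketch 3a needs a check that lives outside the shape of the data, 3b needs the edges() logic to interpret the flag, and 3c encodes the cardinality in the types themselves at the cost of a second class.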

 

We did not identify additional mechanisms to allow symmetric relationship type pairs to be pruned to a single type, but others may be possible.  We are unanimous that pruning pairs is the goal.


I am opposed to option 3a - the model itself should define that many-to-many relationships are not allowed.  Allowing them in the model but prohibiting them in separate text is lazy, error-prone, and puts enforcement of the text on the backs of implementers rather than tools.

I prefer option 3c - it does mean the model needs one more box, but it avoids some convoluted logic within 3b's single Relationship box.

Thoughts and suggestions?

Regards,
David


David Kemp
 


William,

Thanks.  I was confusing graph equivalence (if Alice and Bob each create an SBOM for Package X, are the SBOMs equivalent?) with data equivalence (canonicalization).  As you say, each element has a unique SPDXID.  But even if Alice and Bob screw up and assign colliding IDs that aren't discovered until the two SBOMs are entered into the same element store, the elements still have different creation information.

So this isn't a canonicalization problem at all.   Or as Emily Litella would say, "never mind".
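
The distinction can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the dictionary field names, the use of sorted-key JSON as a stand-in for whatever canonical form is eventually chosen, and the simplification that Alice and Bob refer to the same package and file identifiers.

```python
import hashlib
import json


def data_digest(element: dict) -> str:
    """Hash a sorted-key JSON rendering of one element (stand-in canonical form)."""
    canonical = json.dumps(element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def graph_edges(element: dict) -> set:
    """Reduce a relationship element to the (from, type, to) edges it asserts."""
    return {(element["from"], element["type"], target) for target in element["to"]}


alice = {
    "spdxid": "urn:alice:rel-1",
    "created_by": "Alice",
    "type": "CONTAINS",
    "from": "pkg-x",
    "to": ["file-a", "file-b"],
}
bob = {
    "spdxid": "urn:bob:rel-1",
    "created_by": "Bob",
    "type": "CONTAINS",
    "from": "pkg-x",
    "to": ["file-a", "file-b"],
}
```

The two elements assert the same edges, so graph_edges(alice) == graph_edges(bob), but their digests differ because the SPDXID and creation information are part of the data being hashed.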

Regards,
David


On Sat, Aug 20, 2022 at 1:52 PM William Bartholomew (CELA) <willbar@...> wrote:
I want to separate my thoughts on this from decisions that were already made for 3.0. I'm not saying we can't reopen them, but the bar for that should be high. The decisions that have already been made are:
  1. We will identify each relationship type as either directional or non-directional (DESCRIBES is directional, RELATED_TO is not).
  2. For directional relationships, from: X to: Y and from: Y to: X are not equal; for non-directional relationships, from: X to: Y and from: Y to: X are equal.
  3. We will remove the inverse relationship types for directional relationships since the same can be achieved by inverting the to: and from:.
  4. To decide which of the inverse relationships to keep we'll use two criteria:
    1. The one where the to: direction makes more sense to be plural (since as you point out the from: is singular).
    2. The most common direction (if these are in conflict, we'll need to evaluate).
I think this helps with canonicalization because it clarifies the rules around directionality and equality. The other thing to remember is that relationships are elements and therefore have unique identities. From a canonicalization perspective, two relationships will canonicalize differently even if they express the same relations, because the SPDXID of the relationship element itself differs.

William


David Kemp
 

I believe that simplification (Package element with a files property) is possible only if you define an "SPDX-Lite" logical model that disallows AMENDS.  The reason for switching the Package type from Collection to Artifact was to allow changing the files that a single Package contains rather than just defining a new amended Package.  If SPDX conformance requires supporting that ability, then serializations have to support it.

If AMENDS isn't required, it would be ultra confusing to have different Lite and Full serializations (with and without Package files) for each syntax for a single logical model.

David


On Sun, Aug 21, 2022 at 12:42 AM William Bartholomew (CELA) <willbar@...> wrote:

I should clarify: this is for the logical model; specific serializations can provide alternate representations. For example, associating files with packages is a common use case, so the JSON serializer may allow you to specify files within a package and would then generate the appropriate relationship automatically.
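
As a rough illustration of that idea, a serializer-side expansion might look like the Python sketch below. The "files" shorthand key, the dictionary layout, and the way the generated relationship's SPDXID is derived are all hypothetical; no actual SPDX JSON serialization is being described.

```python
from typing import List


def expand_package_files(package: dict) -> List[dict]:
    """Turn a package written with a 'files' shorthand into logical-model elements."""
    # The logical-model package carries no files property.
    logical_package = {k: v for k, v in package.items() if k != "files"}
    files = package.get("files", [])
    if not files:
        return [logical_package]
    # Hypothetical id scheme for the generated relationship element.
    relationship = {
        "spdxid": logical_package["spdxid"] + "#contains",
        "type": "CONTAINS",
        "from": logical_package["spdxid"],
        "to": list(files),
    }
    return [logical_package, relationship]


elements = expand_package_files(
    {"spdxid": "pkg-x", "name": "Package X", "files": ["file-a", "file-b"]}
)
```

The result is a package element without a files property plus one CONTAINS relationship, so the logical model still has a single way to express containment.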

 

Even without that, I’m not sure the finite state machine will be significantly more complex. Once you have the package’s SPDXID, you find the relationship(s) that have a “from” of that SPDXID and a type of “CONTAINS” (and discard/replace any that have an inbound “AMENDS” relationship).
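
The lookup described above can be sketched in a few lines of Python. The dictionary layout is hypothetical, and the handling of AMENDS is an assumption made for this example: an AMENDS relationship is taken to point from a newer CONTAINS relationship to the older one it supersedes.

```python
from typing import List


def contained_files(package_id: str, relationships: List[dict]) -> List[str]:
    """Return the files a package contains, skipping amended CONTAINS relationships."""
    # Any relationship that is the target of an AMENDS is treated as superseded.
    amended = {
        target
        for rel in relationships
        if rel["type"] == "AMENDS"
        for target in rel["to"]
    }
    files = set()
    for rel in relationships:
        if (
            rel["type"] == "CONTAINS"
            and rel["from"] == package_id
            and rel["spdxid"] not in amended
        ):
            files.update(rel["to"])
    return sorted(files)


rels = [
    {"spdxid": "rel-1", "type": "CONTAINS", "from": "pkg-x", "to": ["file-a", "file-b"]},
    {"spdxid": "rel-2", "type": "CONTAINS", "from": "pkg-x", "to": ["file-a", "file-c"]},
    {"spdxid": "rel-3", "type": "AMENDS", "from": "rel-2", "to": ["rel-1"]},
    {"spdxid": "rel-4", "type": "CONTAINS", "from": "pkg-y", "to": ["file-z"]},
]
```

With this data, contained_files("pkg-x", rels) uses only rel-2, since rel-1 has an inbound AMENDS relationship.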

 

We did have a model where the Package element had a files property, and the decision at the time was that could lead to too much ambiguity because there were now multiple ways to describe containment. Moving this to the serialization layer allows the logical model to have a single way to express containment but still allows simpler expression in serialization formats when that flexibility isn’t required.

 

From: Dick Brooks <dick@...>
Sent: Saturday, August 20, 2022 12:12 PM
To: William Bartholomew (CELA) <willbar@...>; 'SPDX-list' <Spdx-tech@...>; dk190a@...
Subject: RE: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

Thanks, William.

 

This is a big change from SPDX 2.x and will require a much more complicated finite state machine implementation for parsing, based on my limited knowledge of SPDX V 2.x.

 

Thanks,

 

Dick Brooks

 

Active Member of the CISA Critical Manufacturing Sector,

Sector Coordinating Council – A Public-Private Partnership

 

Never trust software, always verify and report!

http://www.reliableenergyanalytics.com

Email: dick@...

Tel: +1 978-696-1788

 

From: William Bartholomew (CELA) <willbar@...>
Sent: Saturday, August 20, 2022 2:46 PM
To: 'SPDX-list' <Spdx-tech@...>; dk190a@...; dick@...
Subject: Re: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

The only way to relate artifacts is via relationship nodes. The "containment" that is possible with the types that inherit from Collection is for grouping the metadata; it doesn't imply an artifact relationship. This was one of the things we did to remove confusion between packages and collections. Semantically it is similar to contains, but for the SPDX elements themselves, not what the SPDX element describes.

 



From: Dick Brooks <dick@...>
Sent: Saturday, August 20, 2022 11:12 AM
To: William Bartholomew (CELA) <willbar@...>; 'SPDX-list' <Spdx-tech@...>; dk190a@... <dk190a@...>
Subject: [EXTERNAL] RE: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

 

Thanks for the insights.

 

Is the default still “CONTAINS” when no relationship node is present?

 

Thanks,

 

Dick Brooks

 


 



Sean Barnum
 

+1

I think what William outlines below covers us.

 

And I would strongly object to 3C. I believe that is a significant level of unnecessary complexity.

 

sean

 

Sean Barnum

C – 703-473-8262

sbarnum@...

We are here to change the world!


 

 

From: Spdx-tech@... <Spdx-tech@...> on behalf of William Bartholomew (CELA) via lists.spdx.org <willbar=microsoft.com@...>
Date: Saturday, August 20, 2022 at 1:52 PM
To: SPDX-list <Spdx-tech@...>, dk190a@... <dk190a@...>
Subject: [EXT] Re: [spdx-tech] Canonicalization: Commutative and Symmetric Relationships

I want to separate my thoughts on this from decisions that were already made for 3.0. I'm not saying we can't reopen them, but the bar for that should be high. The decisions that have already been made are:

  1. We will identify each relationship type as either directional or non-directional (DESCRIBES is directional, RELATED_TO is not).
  2. For directional relationships, from: X to: Y and from: Y to: X are not equal; for non-directional relationships, from: X to: Y and from: Y to: X are equal.
  3. We will remove the inverse relationship types for directional relationships since the same can be achieved by inverting the to: and from:.
  4. To decide which of the inverse relationships to keep we'll use two criteria:
    1. The one where the to: direction makes more sense to be plural (since as you point out the from: is singular).
    2. The most common direction (if these are in conflict, we'll need to evaluate).

I think this helps with canonicalization because it clarifies the rules around directionality and equality. The other thing to remember is that relationships are elements and therefore have unique identities. From a canonicalization perspective, two relationships will canonicalize differently even if they express the same relations, because the SPDXID of the relationship element itself differs.

 

William

 

