V3 serialization


David Kemp
 

Last week I took an action item to describe what serialized data for the v3 logical model could look like, in order to clarify discussion of the types shown in the model.

The thing to remember about v3 is that it is knowledge graph centric, not document centric. Element instances from the knowledge graph can be serialized into data instances, but the data definition is controlled by the logical model, not vice versa.  Data examples in various formats can illustrate the logical model for readers of the v3 spec, but they do not define it as they do in SPDX v2.

A collection of independent element values is shown in "logical-elements".  JSON data is used to visualize the element values, but it is important to remember that the logical value itself is the ability to answer questions:
* what is the id of this element?
* what is the type of this element?
* who created this element?
etc.  The element is a class with getters that allow each property of an instance to be retrieved, and those property values are independent of serialization format.
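
A minimal sketch of that idea, assuming a hypothetical Element class (the class, attribute, and function names below are illustrative and not taken from the logical model). The getters answer the three questions; the JSON function is only one way to visualize the same values.

```python
import json


class Element:
    """Illustrative logical element; all names here are assumptions."""

    def __init__(self, element_id, element_type, created_by):
        self._id = element_id
        self._type = element_type
        self._created_by = tuple(created_by)

    @property
    def id(self):
        # What is the id of this element?
        return self._id

    @property
    def type(self):
        # What is the type of this element?
        return self._type

    @property
    def created_by(self):
        # Who created this element?
        return self._created_by


def to_json(element):
    """One possible visualization; the logical value does not depend on it."""
    return json.dumps(
        {
            "id": element.id,
            "type": element.type,
            "createdBy": list(element.created_by),
        },
        sort_keys=True,
    )


du = Element(
    "urn:acme.dev:artifacts:gnu-coreutils/v9.1/src/du.c",
    "file",
    ["urn:acme.dev:identities:fred"],
)
```

The same getters would back an XML or ASN.1 writer without any change to the element itself.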

That collection of elements can be serialized into a transfer unit file as shown in "transfer units".

A Document element describes the contents of a transfer unit, but does not need to be present in the transfer unit.  The example transfer unit containing six elements (an SBOM, a Package, two Files, a Relationship, and an Actor that created them) is:

{
  "namespace": "urn:acme.dev:",
  "defaults": {
    "createdBy": ["identities:fred"],
    "created": "2022-04-05T22:00:00Z",
    "specVersion": "3.0",
    "profiles": ["Core", "Software"],
    "dataLicense": "CC0-1.0"
  },
  "elementValues": {
    "artifacts:gnu-coreutils/v9.1/src/du.c": {
      "type": {
        "file": {
          "filePurpose": ["APPLICATION", "SOURCE"]
        }
      }
    },
    "artifacts:gnu-coreutils/v9.1/src/echo.c": {
      "type": {
        "file": {
          "filePurpose": ["APPLICATION", "SOURCE"]
        }
      }
    },
    "artifacts:gnu-coreutils/v9.1": {
      "type": {
        "package": {
          "packagePurpose": ["APPLICATION", "SOURCE"],
          "downloadLocation": "http://mirror.rit.edu/gnu/coreutils/coreutils-9.1.tar.gz",
          "homePage": "https://www.gnu.org/software/coreutils/"
        }
      },
      "name": "GNU Coreutils"
    },
    "relationships:gnu-coreutils/v9.1": {
      "type": {
        "relationship": {
          "relationshipType": "CONTAINS",
          "from": "urn:acme.dev:artifacts:gnu-coreutils/v9.1",
          "to": [
            "artifacts:gnu-coreutils/v9.1/src/du.c",
            "artifacts:gnu-coreutils/v9.1/src/echo.c"
          ]
        }
      }
    },
    "identities:fred": {
      "type": {
        "actor": {}
      },
      "identifiedBy": [{"email": "fred@..."}]
    },
    "sboms:gnu-coreutils/v9.1": {
      "type": {
        "sbom": {
          "elements": [
            "artifacts:gnu-coreutils/v9.1/src/du.c",
            "artifacts:gnu-coreutils/v9.1/src/echo.c",
            "artifacts:gnu-coreutils/v9.1",
            "relationships:gnu-coreutils/v9.1",
            "identities:fred"
          ]
        }
      }
    }
  }
}
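
The example mixes namespace-relative ids (most keys and references) with one absolute id (the relationship's "from"). A reader might resolve both forms as sketched below. The rule used here, prefixing anything that does not already start with the namespace, is an assumption for illustration and is not stated in the message; the function names are hypothetical.

```python
def resolve(namespace, ref):
    """Return an absolute id. Assumed rule: ids that already start with
    the namespace are absolute, everything else is namespace-relative."""
    return ref if ref.startswith(namespace) else namespace + ref


def expand_ids(transfer_unit):
    """Return {absolute_id: element_value} for every entry in elementValues."""
    namespace = transfer_unit["namespace"]
    return {
        resolve(namespace, key): value
        for key, value in transfer_unit["elementValues"].items()
    }


# Trimmed-down version of the transfer unit shown above.
unit = {
    "namespace": "urn:acme.dev:",
    "elementValues": {
        "artifacts:gnu-coreutils/v9.1": {
            "type": {"package": {}},
            "name": "GNU Coreutils",
        },
        "identities:fred": {"type": {"actor": {}}},
    },
}

elements = expand_ids(unit)
```

Expanding the keys does not add or remove entries, so the number of elements read equals the number of entries in "elementValues".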

The element examples, transfer unit examples, and the SPDX v3 schema derived from the logical model are available in https://github.com/davaya/spdx-3-elements.

The intent is for these to assist in refining the logical model and its serializations together.

Regards,
Dave


William Bartholomew (CELA)
 

There are some “proposed” examples at the bottom of the model diagram (note that I intended these to be representative until we define the exact serialization for each data format):

https://github.com/spdx/spdx-3-model/blob/main/model.png

 

Some of the key differences (with no implied support for either choice, I have included my reasoning for reference only):

  • Defaults being represented as the original properties on a collection element vs being in their own “defaults” property.
    • I was thinking about this as a traditional inheritance/overrides structure. If a property doesn’t have a value you can walk the tree up looking for the same property.
  • Array of elements vs map of elements.
    • In the past I have found schema languages don’t have good support for one of the properties of an object being outside of the object (i.e. a key on the collection outside). Having a completely contained object makes canonicalization etc. easier at the risk of the array having multiple instances of the same element (which can be solved in other ways).
  • Type being a string property vs an object property containing the type.
    • I mainly followed the JSON-LD style and it has one less level of nesting.
  • Document root being an element vs a custom class.
    • Tried to minimize custom classes by having everything as either an element or a value type.

 

 

Regards,

 

William Bartholomew (he/him)

Principal Security Strategist

Global Cybersecurity Policy – Microsoft

 



David Kemp
 

One principle is that the goal of serialization is to put Elements into physical format, NOT to create new elements that didn't exist prior to serialization.  If you have 6 elements going into serialization, you should have 6 elements coming out, not 7.

The second principle is that logical elements should be independent: the value of one element does not depend on the value of any other element.

I believe that those two principles are worth adopting as design requirements.
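
As a rough illustration of the first principle, a round-trip check could look like the following. The serialize and deserialize functions are hypothetical stand-ins that use a JSON object keyed by id; they are not a proposed SPDX serialization.

```python
import json


def serialize(elements):
    """Write elements as a JSON object keyed by id (illustrative layout)."""
    return json.dumps(
        {e["id"]: {k: v for k, v in e.items() if k != "id"} for e in elements},
        sort_keys=True,
    )


def deserialize(text):
    """Rebuild one element per key; nothing is created beyond what was written."""
    return [{"id": key, **value} for key, value in json.loads(text).items()]


elements = [
    {"id": "urn:acme.dev:identities:fred", "type": "actor"},
    {
        "id": "urn:acme.dev:artifacts:gnu-coreutils/v9.1",
        "type": "package",
        "name": "GNU Coreutils",
    },
]

restored = deserialize(serialize(elements))
```

The check is that the number of elements, and the elements themselves, are the same before and after.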

It is ugly to put something into serialization and get something else back out, and it's really ugly to stuff one element's value inside another, not least because you can wind up with infinite recursion with documents inside documents inside documents inside documents.  Even two levels of element nesting makes things quite difficult to disentangle.

The fundamental principle is that a file containing data is not an element.  A Transfer Unit is defined by a data schema, just like the content of any XML file or JSON file or ASN.1 file.  If the logical model has a Document element that describes an X.509 certificate, that element has interesting facts about the certificate but does not define its content.  It is essential to remember the difference between the bytes in a file and the properties of a File or Document element - the difference between a thing and metadata about that thing.


* defaults:
I created a separate defaults property to hold the five defaultable properties in order to distinguish them from non-defaultable properties.  Gary and I like the idea, but I'm not wedded to it.  The transfer unit schema could have "defaultCreatedBy", "defaultCreated", etc. properties at the top level, to highlight that they are defaults, unlike name, description, comments, etc.  Whatever the mechanism, there must be a way to ensure that "name" doesn't take an inappropriate default value if it isn't populated, while the default for "profiles" is appropriate.
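
One way to express that requirement is an explicit list of defaultable properties, as in the sketch below. The five names come from the "defaults" object in the example transfer unit; the function name and the merge rule (an element's own value always wins) are assumptions.

```python
# The five properties that appear under "defaults" in the example transfer unit.
DEFAULTABLE = ("createdBy", "created", "specVersion", "profiles", "dataLicense")


def apply_defaults(element_value, defaults):
    """Fill in missing defaultable properties only; never touch anything else."""
    merged = dict(element_value)
    for name in DEFAULTABLE:
        if name not in merged and name in defaults:
            merged[name] = defaults[name]
    return merged


defaults = {
    "profiles": ["Core", "Software"],
    "name": "must never be applied",  # deliberately bogus: "name" is not defaultable
}

filled = apply_defaults({"type": {"file": {}}}, defaults)
kept = apply_defaults({"profiles": ["Core"]}, defaults)
```

Here "filled" gains "profiles" but no "name", and "kept" retains its own "profiles" value.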

* array vs map
I used map as a conversation starter because it fits the "unique" semantics of element ids and because mapping types are ubiquitous now: XML Schema had it in 2005 (https://www.w3.org/2005/07/xml-schema-patterns.html#Maps), and it's a built-in part of JSON.  JSON-LD even treats the id differently from other properties by giving it a reserved @id keyword, and SQL databases have primary keys with the special characteristic that they uniquely identify the record rather than being just another column.  Autogenerated ids are often hidden because they are ubiquitous.  And finally, you introduced Map to the logical model for Extensions.  If it's OK for extensions, it's OK for Elements :-).  Seriously though, I'm not wedded to Map.  Treating Id as any other property but having some prose saying that it can be used as a primary key / unique identifier is OK; it's just kind of loose given that references from foreign to primary keys are a universal concept.
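
The uniqueness point can be sketched as follows: when elements arrive as an array with id as an ordinary property, a reader has to enforce uniqueness itself while building its index, whereas a map gives that by construction. The function name is hypothetical.

```python
def index_by_id(elements):
    """Build {id: element} from an array, rejecting duplicate ids."""
    index = {}
    for element in elements:
        element_id = element["id"]
        if element_id in index:
            raise ValueError(f"duplicate element id: {element_id}")
        index[element_id] = element
    return index


index = index_by_id(
    [
        {"id": "artifacts:gnu-coreutils/v9.1/src/du.c", "type": "file"},
        {"id": "artifacts:gnu-coreutils/v9.1/src/echo.c", "type": "file"},
    ]
)
```

Passing two objects with the same id raises ValueError instead of silently keeping one of them.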

* type property
Since JSON does not have types, it's good practice to ensure that "type: identity" cannot collide with a property named "identity".  At the core profile all type and property names are defined and don't collide, but if "type" goes away we'll need to ensure that properties defined in any profile cannot collide with types defined in any profile.  Again, JSON-LD treats @type as a reserved keyword: https://w3c.github.io/json-ld-syntax/#typed-values.
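
A cross-profile collision check of the kind described could be as simple as the sketch below. The profile layout, the "Example" profile, and the names in it are hypothetical and chosen only to show a collision.

```python
def name_collisions(profiles):
    """Return names used both as a type and as a property in any profile.

    profiles: {profile_name: {"types": [...], "properties": [...]}}
    (a hypothetical layout for illustration).
    """
    type_names = set()
    property_names = set()
    for profile in profiles.values():
        type_names.update(profile["types"])
        property_names.update(profile["properties"])
    return type_names & property_names


profiles = {
    "Core": {
        "types": ["actor", "relationship"],
        "properties": ["name", "createdBy"],
    },
    # Hypothetical profile that reuses a type name as a property name.
    "Example": {"types": ["identity"], "properties": ["identity"]},
}

collisions = name_collisions(profiles)
```

With a dedicated "type" wrapper (or a reserved keyword such as @type in JSON-LD) this check is unnecessary, because type names and property names live in different positions.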

* document root
A transfer unit file is not an Element and not a logical type or a class. The bytes in SPDX documents are not defined by the logical model, they just have to be able to be de-serialized into element instances.  Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they define only data types.

Regards,
Dave




William Bartholomew (CELA)
 

Comments inline (CIL).


From: Spdx-tech@... <Spdx-tech@...> on behalf of David Kemp via lists.spdx.org <dk190a=gmail.com@...>
Sent: Monday, July 18, 2022 6:18 PM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization
 
One principle is that the goal of serialization is to put Elements into physical format, NOT to create new elements that didn't exist prior to serialization.  If you have 6 elements going into serialization, you should have 6 elements coming out, not 7.

[William] Agreed; does my example violate that? It would be difficult for a serialization to "generate" elements because of the id and other required properties, so I had not considered this a possibility.

The second principle is that logical elements should be independent: the value of one element does not depend on the value of any other element.

[William] I think it depends on your definition of "depends on" (pun intended). Elements may have properties that are references to other elements and serializers may choose to use that information for more compact serialization but since this would get unwound on deserialization that's immaterial.

I believe that those two principles are worth adopting as design requirements.

It is ugly to put something into serialization and get something else back out, 

[William] Agreed, though a lot of serializers/deserializers end up making minor changes as a result of normalization and other processes. Not ideal but that's an implementation detail within each serializer/deserializer.

and it's really ugly to stuff one element's value inside another

[William] I don't agree with this, at least for "collection" elements. Also, the serialization model for collection elements could support either element references or the element itself so if you think it's ugly then you would have the option of not doing nesting.

not least because you can wind up with infinite recursion with documents inside documents inside documents inside documents

[William] This is avoidable, and using references instead of nesting doesn't prevent this problem. In fact, if you only use nesting then it's impossible to have infinite recursion; it's only when you use references that it becomes possible.

 Even two levels of element nesting makes things quite difficult to disentangle.

[William] I don't agree, for collections the nesting makes it obvious which collection an element is part of without having to follow the id references. Since the serialization model could support either approach I don't see this being a blocker.
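
On the recursion point: nested JSON text is finite and therefore cannot contain itself, while id references form a graph that can contain cycles. A reader that follows references could guard against that with a standard depth-first cycle check, sketched here with hypothetical names and ids.

```python
def has_reference_cycle(references):
    """references: {element_id: [referenced element ids]}.

    Depth-first search with three states: unvisited, in progress, done.
    Meeting an "in progress" node again means the references loop back.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for child in references.get(node, ()):
            child_state = state.get(child, UNVISITED)
            if child_state == IN_PROGRESS:
                return True
            if child_state == UNVISITED and visit(child):
                return True
        state[node] = DONE
        return False

    return any(
        state.get(node, UNVISITED) == UNVISITED and visit(node)
        for node in references
    )


looping = {"docs:a": ["docs:b"], "docs:b": ["docs:a"]}
chain = {"docs:a": ["docs:b"], "docs:b": []}
```

The first graph is reported as cyclic and the second is not; the recursive visit is adequate for small graphs, and an explicit stack would avoid Python's recursion limit on deep ones.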

The fundamental principle is that a file containing data is not an element.  A Transfer Unit is defined by a data schema, just like the content of any XML file or JSON file or ASN.1 file.  If the logical model has a Document element that describes an X.509 certificate, that element has interesting facts about the certificate but does not define its content.  It is essential to remember the difference between the bytes in a file and the properties of a File or Document element - the difference between a thing and metadata about that thing.

[William] We've had this discussion a number of times: the Collection element (and its subclasses) aren't metadata about collections, documents, SBOMs, etc.; they are the collection, document, SBOM, etc. There is no "physical" thing outside of the SPDX document that is the collection, document, SBOM, etc.; they only exist in the SPDX graph. You could take that SBOM, serialize it to disk, and then have a File element that talks about the physical serialization of the SBOM, but that's different to the SBOM SPDX element.

* defaults:
I created a separate defaults property to hold the five defaultable properties in order to distinguish them from non-defaultable properties.  Gary and I like the idea, but I'm not wedded to it.  The transfer unit schema could have "defaultCreatedBy", "defaultCreated", etc properties at the top level, to highlight that they are defaults, unlike name, description, comments, etc.  Whatever the mechanism, there must be a way to ensure that "name" doesn't take an inappropriate default value if it isn't populated, while the default for "profiles" is appropriate.

[William] I'm struggling with multiple properties that have the same definition having different names and different locations on the objects, it feels like a lot to explain. We could flag certain properties as inheritable in the schema, and this only applies to collection elements so I think the scope is quite narrow.

* array vs map
I used map as a conversation starter, because it fits the "unique" semantics of element ids, and because mapping types are ubiquitous now,  XML schema had it in 2005 https://www.w3.org/2005/07/xml-schema-patterns.html#Maps, and it's a built-in part of JSON.  

[William] While it is built in to XML and JSON, my experience is that it has not been supported well by schema languages and serializers/deserializers. I know I've had situations where I had to duplicate the id property in the class to ensure that other things work correctly (and to maintain the independence of the class). Also, in most object-oriented languages there is no way to get the key from the object, so you end up having to track the key independently of the object, which is a pain.

JSON-LD even treats the id differently from other properties by giving it a reserved @id keyword, and SQL databases have primary keys with the special characteristic that they uniquely identify the record rather than being just another column.  Autogenerated ids are often hidden because they are ubiquitous.

[William] In both JSON-LD and SQL the properties are still normal properties: in JSON-LD it's still a property on the object, it just has a special name; in SQL it's still a column in the table, it just has special metadata attached to it. Even autogenerated ids are typically normal columns; they're just system generated and you can't change their definition.

  And finally, you introduced Map to the logical model for Extensions.  If it's OK for extensions, it's OK for Elements :-).

[William] Not the same 🙂. The map for extensions is a map of "extension type" to value, not of "id" to value. It is a consequence of us deciding that each type can only be assigned once that it can also be used as an id, but it is primarily a type, not an id. If we changed that design decision it would no longer function as an id.

Seriously though, I'm not wedded to Map.  Treating Id as any other property but having some prose saying that it can be used as a primary key / unique identifier is OK, it's just kind of loose given that references from foreign to primary keys is a universal concept.

[William] In SQL Server (others are similar) a foreign key takes the form FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1, ParentCol2, ...), they're still just columns, nothing magical about them, not even their names.

* type property
Since JSON does not have types it's good practice to ensure that "type: identity" cannot collide with a property named "identity".  At the core profile all type and property names are defined and don't collide, but if "type" goes away we'll need to ensure that properties defined in any profile cannot collide with types defined in any profile.  Again JSON-LD treats @type as a reserved property: https://w3c.github.io/json-ld-syntax/#typed-values.

[William] Agreed, and type isn't in the logical model: a JSON-LD serializer would use @type, an XML one would use XML namespaces and element names, a ProtoBuf one would use message types. Since my examples were "plain" JSON, which does not have a built-in way of declaring types, I used a "plain" property to capture the type. I agree that the name of this property should avoid potential conflict (e.g. by prefixing with an _).

* document root
A transfer unit file is not an Element and not a logical type or a class. The bytes in SPDX documents are not defined by the logical model, they just have to be able to be de-serialized into element instances.

[William] Same disagreement as above.

Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they define only data types.

[William] I'm not sure what definition of "class" you're using here, but the boxes on the diagram could be represented in an OO language as classes or interfaces; for our purposes I don't think the distinction between class and data type is meaningful.

Regards,
Dave




David Kemp
 

Compare serialized examples for three use cases:

1) I want to transfer three Actors.
2) SBOM C uses SBOM B, which uses SBOM A.
3) I want the canonical hash of a Relationship.
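
For the third use case, a hash is only "canonical" if every serializer produces the same bytes for the same element. The sketch below uses JSON with sorted keys and compact separators as a stand-in canonical form; that choice, the SHA-256 algorithm, and the function name are assumptions for illustration, not anything defined for SPDX v3.

```python
import hashlib
import json


def canonical_hash(element):
    """Hash a self-contained element dict via a deterministic JSON encoding.

    Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    """
    canonical = json.dumps(
        element, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


relationship = {
    "id": "urn:acme.dev:relationships:gnu-coreutils/v9.1",
    "type": "relationship",
    "relationshipType": "CONTAINS",
    "from": "urn:acme.dev:artifacts:gnu-coreutils/v9.1",
    "to": [
        "urn:acme.dev:artifacts:gnu-coreutils/v9.1/src/du.c",
        "urn:acme.dev:artifacts:gnu-coreutils/v9.1/src/echo.c",
    ],
}

# Same element with its keys written in a different order.
reordered = dict(reversed(list(relationship.items())))
```

Both dicts hash to the same value because key order is normalized. The sketch hashes an object that carries its own id, which is the simplest case; with a map layout the id would first have to be folded back into the value being hashed.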

Regards,
Dave


On Tue, Jul 19, 2022 at 12:47 AM William Bartholomew (CELA) via lists.spdx.org <willbar=microsoft.com@...> wrote:
CIL


From: Spdx-tech@... <Spdx-tech@...> on behalf of David Kemp via lists.spdx.org <dk190a=gmail.com@...>
Sent: Monday, July 18, 2022 6:18 PM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization
 
One principle is that the goal of serialization is to put Elements into physical format, NOT to create new elements that didn't exist prior to serialization.  If you have 6 elements going into serialization, you should have 6 elements coming out, not 7.

[William] Agreed, does my example violate that? It would be difficult for a serialization to "generate" elements because of the id and other required properties so I had not considered this a possibility.

The second principle is that logical elements should be independent: the value of one element does not depend on the value of any other element.

[William] I think it depends on your definition of "depends on" (pun intended). Elements may have properties that are references to other elements and serializers may choose to use that information for more compact serialization but since this would get unwound on deserialization that's immaterial.

I believe that those two principles are worth adopting as design requirements.

It is ugly to put something into serialization and get something else back out, 

[William] Agreed, though a lot of serializers/deserializers end up making minor changes as a result of normalization and other processes. Not ideal but that's an implementation detail within each serializer/deserializer.

and it's really ugly to stuff one element's value inside another

[William] I don't agree with this, at least for "collection" elements. Also, the serialization model for collection elements could support either element references or the element itself so if you think it's ugly then you would have the option of not doing nesting.

not least because you can wind up with infinite recursion with documents inside documents inside documents inside documents

[William] This is avoidable and using references instead of nesting doesn't prevent this problem. In fact, if you only use nesting then it's impossible to have infinite recursion, it's only when you use references that it becomes possible.
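
As an illustration of the point about references, the Python sketch below checks a set of collection elements for reference cycles with a depth-first walk. The data layout (a dict from collection id to the list of ids it references) and the name find_cycle are assumptions made for this example only.

```python
def find_cycle(collections, start):
    """Return the ids forming a reference cycle reachable from start, or None.

    collections maps a collection id to the list of element ids it references.
    """
    path = []        # ids on the current walk, in order
    on_path = set()  # same ids, for fast membership tests
    done = set()     # ids already fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for member in collections.get(node, []):
            cycle = visit(member)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    return visit(start)


# SBOM C uses SBOM B, which uses SBOM A: a chain, no cycle.
CHAIN = {"sboms:C": ["sboms:B"], "sboms:B": ["sboms:A"], "sboms:A": []}
# Two documents that reference each other: only expressible with references.
LOOP = {"docs:1": ["docs:2"], "docs:2": ["docs:1"]}

print(find_cycle(CHAIN, "sboms:C"))
print(find_cycle(LOOP, "docs:1"))
```

A purely nested serialization cannot express the second case at all, which is why a check like this only matters once id references are allowed.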

 Even two levels of element nesting makes things quite difficult to disentangle.

[William] I don't agree, for collections the nesting makes it obvious which collection an element is part of without having to follow the id references. Since the serialization model could support either approach I don't see this being a blocker.

The fundamental principle is that a file containing data is not an element.  A Transfer Unit is defined by a data schema, just like the content of any XML file or JSON file or ASN.1 file.  If the logical model has a Document element that describes an X.509 certificate, that element has interesting facts about the certificate but does not define its content.  It is essential to remember the difference between the bytes in a file and the properties of a File or Document element - the difference between a thing and metadata about that thing.

[William] We've had this discussion a number of times, the Collection element (and its subclasses) aren't metadata about collections, document, SBOM, etc. they are the collection, document, SBOM, etc. There is no "physical" thing outside of the SPDX document that is the collection, document, SBOM, etc., they only exist in the SPDX graph. You could take that SBOM, serialize it to disk, and then have a File element that talks about the physical serialization of the SBOM, but that's different to the SBOM SPDX element.

* defaults:
I created a separate defaults property to hold the five defaultable properties in order to distinguish them from non-defaultable properties.  Gary and I like the idea, but I'm not wedded to it.  The transfer unit schema could have "defaultCreatedBy", "defaultCreated", etc properties at the top level, to highlight that they are defaults, unlike name, description, comments, etc.  Whatever the mechanism, there must be a way to ensure that "name" doesn't take an inappropriate default value if it isn't populated, while the default for "profiles" is appropriate.

[William] I'm struggling with multiple properties that have the same definition having different names and different locations on the objects, it feels like a lot to explain. We could flag certain properties as inheritable in the schema, and this only applies to collection elements so I think the scope is quite narrow.
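
A small Python sketch of the "flag certain properties as inheritable" idea follows. It assumes the five defaultable properties named above are the only inheritable ones, and that an element's enclosing collections are supplied nearest-first. The names INHERITABLE and resolve are hypothetical.

```python
# Assumption: exactly the five defaultable properties may be inherited.
INHERITABLE = {"createdBy", "created", "specVersion", "profiles", "dataLicense"}


def resolve(prop, element, parents):
    """Return element[prop], or the nearest enclosing collection's value
    when the property is absent on the element and flagged as inheritable."""
    if prop in element:
        return element[prop]
    if prop not in INHERITABLE:
        return None  # e.g. "name" never picks up a collection's value
    for parent in parents:  # nearest enclosing collection first
        if prop in parent:
            return parent[prop]
    return None


SBOM = {
    "name": "Coreutils SBOM",
    "created": "2022-04-05T22:00:00Z",
    "profiles": ["Core", "Software"],
}
FILE = {"id": "urn:acme.dev:artifacts:gnu-coreutils/v9.1/src/du.c"}

print(resolve("created", FILE, [SBOM]))
print(resolve("name", FILE, [SBOM]))
```

With this shape the collection keeps the ordinary property names, and the schema-level flag decides which of them flow down to members.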

* array vs map
I used map as a conversation starter, because it fits the "unique" semantics of element ids, and because mapping types are ubiquitous now,  XML schema had it in 2005 https://www.w3.org/2005/07/xml-schema-patterns.html#Maps, and it's a built-in part of JSON.  

[William] While it is built-in to XML and JSON my experience is that it's not been supported well by schema languages and serializers/deserializers. I know I've had situations where I had to duplicate the id property in the class to ensure that other things work correctly (and to maintain the independence of the class). Also, in most object oriented languages there is not a way to get the key from the object so you end up having to track the key independently of the object which is a pain.
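
The two layouts carry the same information, which the following Python sketch tries to show by converting between them. It assumes the id lives in a property called "id" in the array form; the function names are made up for this example.

```python
def map_to_array(element_map):
    """Map form {id: value} -> array form [{"id": id, ...value}]."""
    return [{"id": key, **value} for key, value in element_map.items()]


def array_to_map(element_array):
    """Array form -> map form, rejecting the duplicates an array allows."""
    result = {}
    for element in element_array:
        key = element["id"]
        if key in result:
            raise ValueError(f"duplicate element id: {key}")
        result[key] = {k: v for k, v in element.items() if k != "id"}
    return result


ELEMENT_MAP = {
    "identities:fred": {"type": "actor"},
    "artifacts:gnu-coreutils/v9.1": {"type": "package", "name": "GNU Coreutils"},
}

ELEMENT_ARRAY = map_to_array(ELEMENT_MAP)
print(array_to_map(ELEMENT_ARRAY) == ELEMENT_MAP)
```

The map form enforces uniqueness structurally, while the array form keeps each object self-contained and needs an explicit duplicate check like the one above.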

JSON-LD even treats ID differently from other properties by giving it a reserved @id keyword, and SQL databases have primary keys with the special characteristic that they uniquely identify the record rather than being just another column.  Autogenerated ids are often hidden because they are ubiquitous.

[William] In both JSON-LD and SQL the properties are still normal properties, in JSON-LD it's still a property on the object it just has a special name, in SQL it's still a column in the table it just has special metadata attached to it. Even autogenerated ids are typically normal columns they're just system generated and you can't change their definition.

  And finally, you introduced Map to the logical model for Extensions.  If it's OK for extensions, it's OK for Elements :-).

[William] Not the same 🙂. The map for extensions is a map of "extension type" to value, not of "id" to value. It is a consequence of us deciding that each type can only be assigned once that it can also be used as an id, but it is primarily a type, not an id. If we changed that design decision it would no longer function as an id.

Seriously though, I'm not wedded to Map.  Treating Id as any other property but having some prose saying that it can be used as a primary key / unique identifier is OK, it's just kind of loose given that references from foreign to primary keys is a universal concept.

[William] In SQL Server (others are similar) a foreign key takes the form FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1, ParentCol2, ...), they're still just columns, nothing magical about them, not even their names.

* type property
Since JSON does not have types it's good practice to ensure that "type: identity" cannot collide with a property named "identity".  At the core profile all type and property names are defined and don't collide, but if "type" goes away we'll need to ensure that properties defined in any profile cannot collide with types defined in any profile.  Again JSON-LD treats @type as a reserved property: https://w3c.github.io/json-ld-syntax/#typed-values.

[William] Agreed, and type isn't in the logical model, a JSON-LD serializer would use @type, an XML one would use XML namespaces and element names, a ProtoBuf one would use message types. Since my examples were "plain" JSON which does not have a built-in way of declaring types I used a "plain" property to capture the type, I agree that the name of this property should avoid potential conflict (e.g. by prefixing with an _).
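
To make the two "type" styles concrete, here is a Python sketch that converts the nested form used in the transfer unit example ("type" holding an object keyed by the type name) into a flat form with a single string property. The property name "_type" follows the underscore-prefix suggestion above and is only an assumption, as is the function name flatten_type.

```python
def flatten_type(element, type_key="_type"):
    """Convert {"type": {"file": {...}}, ...} into {"_type": "file", ...}.

    Raises ValueError if a type-specific property and a common property
    share a name, since the flat form cannot hold both.
    """
    [(type_name, type_props)] = element["type"].items()  # exactly one type
    flat = {type_key: type_name}
    flat.update(type_props)
    for key, value in element.items():
        if key == "type":
            continue
        if key in flat:
            raise ValueError(f"property name collision: {key}")
        flat[key] = value
    return flat


NESTED = {
    "type": {"package": {"homePage": "https://www.gnu.org/software/coreutils/"}},
    "name": "GNU Coreutils",
}

FLAT = flatten_type(NESTED)
print(FLAT)
```

The flat form has one less level of nesting, at the cost of needing a reserved property name and a collision check across profiles.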

* document root
A transfer unit file is not an Element and not a logical type or a class. The bytes in SPDX documents are not defined by the logical model, they just have to be able to be de-serialized into element instances.

[William] Same disagreement as above.

Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they define only data types.

[William] I'm not sure what definition of "class" you're using here, but the boxes on the diagram could be represented in an OO language as classes or interfaces. For our purposes I don't think the distinction between class and data type is meaningful.

Regards,
Dave


On Mon, Jul 18, 2022 at 7:08 PM William Bartholomew (CELA) via lists.spdx.org <willbar=microsoft.com@...> wrote:

There are some “proposed” examples at the bottom of the model diagram (note that I intended these to be representative until we define the exact serialization for each data format):

https://github.com/spdx/spdx-3-model/blob/main/model.png

 

Some of the key differences (with no implied support for either choice, I have included my reasoning for reference only):

  • Defaults being represented as the original properties on a collection element vs being in their own “defaults” property.
    • I was thinking about this as a traditional inheritance/overrides structure. If a property doesn’t have a value you can walk the tree up looking for the same property.
  • Array of elements vs map of elements.
    • In the past I have found schema languages don’t have good support for one of the properties of an object being outside of the object (i.e. a key on the collection outside). Having a completely contained object makes canonicalization etc. easier at the risk of the array having multiple instances of the same element (which can be solved in other ways).
  • Type being a string property vs an object property containing the type.
    • I mainly followed the JSON-LD style and it has one less level of nesting.
  • Document root being an element vs a custom class.
    • Tried to minimize custom classes by having everything as either an element or a value type.

 

 

Regards,

 

William Bartholomew (he/him) – Let’s chat

Principal Security Strategist

Global Cybersecurity Policy – Microsoft

 

My working day may not be your working day. Please don’t feel obliged to reply to this e-mail outside of your normal working hours.

 



Sean Barnum
 

I very much agree with all 4 of William’s bullets and would go further to explicitly state that I strongly support all of them as the preferred approach.

 

  • Defaults being represented as the original properties on a collection element vs being in their own “defaults” property.
    • I was thinking about this as a traditional inheritance/overrides structure. If a property doesn’t have a value you can walk the tree up looking for the same property.

 

[sdb] This is the approach that I have seen be successful in other information standardization efforts. It is simpler and cleaner.

 

 

  • Array of elements vs map of elements.
    • In the past I have found schema languages don’t have good support for one of the properties of an object being outside of the object (i.e. a key on the collection outside). Having a completely contained object makes canonicalization etc. easier at the risk of the array having multiple instances of the same element (which can be solved in other ways).

 

[sdb] Agree.

 

  • Type being a string property vs an object property containing the type.
    • I mainly followed the JSON-LD style and it has one less level of nesting.

 

[sdb] Agree

 

  • Document root being an element vs a custom class.
    • Tried to minimize custom classes by having everything as either an element or a value type.

 

[sdb] Agree

 

 

sean

 

 

Sean Barnum

C – 703-473-8262

sbarnum@...

We are here to change the world!


 

 



Sean Barnum
 

CIL (recurse 😊 )

 

Sean Barnum

C – 703-473-8262

sbarnum@...

We are here to change the world!


 

 

From: Spdx-tech@... <Spdx-tech@...> on behalf of William Bartholomew (CELA) via lists.spdx.org <willbar=microsoft.com@...>
Date: Tuesday, July 19, 2022 at 12:47 AM
To: SPDX-list <Spdx-tech@...>, dk190a@... <dk190a@...>
Subject: [EXT] Re: [spdx-tech] V3 serialization

CIL

 


From: Spdx-tech@... <Spdx-tech@...> on behalf of David Kemp via lists.spdx.org <dk190a=gmail.com@...>
Sent: Monday, July 18, 2022 6:18 PM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization

 

One principle is that the goal of serialization is to put Elements into physical format, NOT to create new elements that didn't exist prior to serialization.  If you have 6 elements going into serialization, you should have 6 elements coming out, not 7.

 

[William] Agreed, does my example violate that? It would be difficult for a serialization to "generate" elements because of the id and other required properties so I had not considered this a possibility.

 

[Sean] I think that the key is you could simply add one more Document/Bundle Element to the serialization package that ties the other six Elements as related (in this case for the intended serialized exchange) and the “defaultable” properties can simply be asserted on the Document/Bundle.

 

The second principle is that logical elements should be independent: the value of one element does not depend on the value of any other element.

 

[William] I think it depends on your definition of "depends on" (pun intended). Elements may have properties that are references to other elements and serializers may choose to use that information for more compact serialization but since this would get unwound on deserialization that's immaterial.

 

[Sean] Agree with William here


I believe that those two principles are worth adopting as design requirements.

It is ugly to put something into serialization and get something else back out, 

 

[William] Agreed, though a lot of serializers/deserializers end up making minor changes as a result of normalization and other processes. Not ideal but that's an implementation detail within each serializer/deserializer.

 

[Sean] Agree with William here

 

and it's really ugly to stuff one element's value inside another

 

[William] I don't agree with this, at least for "collection" elements. Also, the serialization model for collection elements could support either element references or the element itself so if you think it's ugly then you would have the option of not doing nesting.

 

[Sean] Agree with William here

 

not least because you can wind up with infinite recursion with documents inside documents inside documents inside documents

 

[William] This is avoidable and using references instead of nesting doesn't prevent this problem. In fact, if you only use nesting then it's impossible to have infinite recursion, it's only when you use references that it becomes possible.

 

[Sean] Agree with William here

 

 Even two levels of element nesting makes things quite difficult to disentangle.

 

[William] I don't agree, for collections the nesting makes it obvious which collection an element is part of without having to follow the id references. Since the serialization model could support either approach I don't see this being a blocker.

 

[Sean] Agree with William here

 

The fundamental principle is that a file containing data is not an element.  A Transfer Unit is defined by a data schema, just like the content of any XML file or JSON file or ASN.1 file.  If the logical model has a Document element that describes an X.509 certificate, that element has interesting facts about the certificate but does not define its content.  It is essential to remember the difference between the bytes in a file and the properties of a File or Document element - the difference between a thing and metadata about that thing.

 

[William] We've had this discussion a number of times, the Collection element (and its subclasses) aren't metadata about collections, document, SBOM, etc. they are the collection, document, SBOM, etc. There is no "physical" thing outside of the SPDX document that is the collection, document, SBOM, etc., they only exist in the SPDX graph. You could take that SBOM, serialize it to disk, and then have a File element that talks about the physical serialization of the SBOM, but that's different to the SBOM SPDX element.

 

[Sean] I 100% agree with William here. I was going to make a similar comment on David’s initial email in this thread but figured William said it well here and I would simply add my concurrence here.

It appears there may be continuing confusion about the meaning/intent of the Document element. It is NOT intended to have any sort of tie to the concept of or realization of a serialized file (transfer unit). As William states here it is a logical model concept referencing other logical model concepts. It can be serialized just like any other Element, but its serialized form would be described using a File element, not a Document element.


* defaults:

I created a separate defaults property to hold the five defaultable properties in order to distinguish them from non-defaultable properties.  Gary and I like the idea, but I'm not wedded to it.  The transfer unit schema could have "defaultCreatedBy", "defaultCreated", etc properties at the top level, to highlight that they are defaults, unlike name, description, comments, etc.  Whatever the mechanism, there must be a way to ensure that "name" doesn't take an inappropriate default value if it isn't populated, while the default for "profiles" is appropriate.

 

[William] I'm struggling with multiple properties that have the same definition having different names and different locations on the objects, it feels like a lot to explain. We could flag certain properties as inheritable in the schema, and this only applies to collection elements so I think the scope is quite narrow.

 

[Sean] Agree with William here


* array vs map

I used map as a conversation starter, because it fits the "unique" semantics of element ids, and because mapping types are ubiquitous now,  XML schema had it in 2005 https://www.w3.org/2005/07/xml-schema-patterns.html#Maps, and it's a built-in part of JSON.  

 

[William] While it is built-in to XML and JSON my experience is that it's not been supported well by schema languages and serializers/deserializers. I know I've had situations where I had to duplicate the id property in the class to ensure that other things work correctly (and to maintain the independence of the class). Also, in most object oriented languages there is not a way to get the key from the object so you end up having to track the key independently of the object which is a pain.

 

[Sean] Agree with William here

 

JSON-LD even treats ID differently from other properties by giving it a reserved @id keyword, and SQL databases have primary keys with the special characteristic that they uniquely identify the record rather than being just another column.  Autogenerated ids are often hidden because they are ubiquitous.

 

[William] In both JSON-LD and SQL the properties are still normal properties, in JSON-LD it's still a property on the object it just has a special name, in SQL it's still a column in the table it just has special metadata attached to it. Even autogenerated ids are typically normal columns they're just system generated and you can't change their definition.

 

[Sean] Agree with William here

 

  And finally, you introduced Map to the logical model for Extensions.  If it's OK for extensions, it's OK for Elements :-).

 

[William] Not the same 🙂. The map for extensions is a map of "extension type" to value, not of "id" to value. It is a consequence of us deciding that each type can only be assigned once that it can also be used as an id, but it is primarily a type, not an id. If we changed that design decision it would no longer function as an id.

 

[Sean] Agree with William here

 

Seriously though, I'm not wedded to Map.  Treating Id as any other property but having some prose saying that it can be used as a primary key / unique identifier is OK, it's just kind of loose given that references from foreign to primary keys is a universal concept.

 

[William] In SQL Server (others are similar) a foreign key takes the form FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1, ParentCol2, ...), they're still just columns, nothing magical about them, not even their names.


* type property

Since JSON does not have types it's good practice to ensure that "type: identity" cannot collide with a property named "identity".  At the core profile all type and property names are defined and don't collide, but if "type" goes away we'll need to ensure that properties defined in any profile cannot collide with types defined in any profile.  Again JSON-LD treats @type as a reserved property: https://w3c.github.io/json-ld-syntax/#typed-values.

 

[William] Agreed, and type isn't in the logical model, a JSON-LD serializer would use @type, an XML one would use XML namespaces and element names, a ProtoBuf one would use message types. Since my examples were "plain" JSON which does not have a built-in way of declaring types I used a "plain" property to capture the type, I agree that the name of this property should avoid potential conflict (e.g. by prefixing with an _).


* document root
A transfer unit file is not an Element and not a logical type or a class. The bytes in SPDX documents are not defined by the logical model, they just have to be able to be de-serialized into element instances.

 

[William] Same disagreement as above.

 

[Sean] Agree with William here

 

Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they define only data types.

 

[William] I'm not sure what definition of "class" you're using here, but the boxes on the diagram could be represented in an OO language as classes or interfaces. For our purposes I don't think the distinction between class and data type is meaningful.


Regards,

Dave

 

On Mon, Jul 18, 2022 at 7:08 PM William Bartholomew (CELA) via lists.spdx.org <willbar=microsoft.com@...> wrote:

There are some “proposed” examples at the bottom of the model diagram (note that I intended these to be representative until we define the exact serialization for each data format):

https://github.com/spdx/spdx-3-model/blob/main/model.png

 

Some of the key differences (with no implied support for either choice, I have included my reasoning for reference only):

  • Defaults being represented as the original properties on a collection element vs being in their own “defaults” property.
    • I was thinking about this as a traditional inheritance/overrides structure. If a property doesn’t have a value you can walk the tree up looking for the same property.
  • Array of elements vs map of elements.
    • In the past I have found schema languages don’t have good support for one of the properties of an object being outside of the object (i.e. a key on the collection outside). Having a completely contained object makes canonicalization etc. easier at the risk of the array having multiple instances of the same element (which can be solved in other ways).
  • Type being a string property vs an object property containing the type.
    • I mainly followed the JSON-LD style and it has one less level of nesting.
  • Document root being an element vs a custom class.
    • Tried to minimize custom classes by having everything as either an element or a value type.

 

 

Regards,

 

William Bartholomew (he/him) – Let’s chat

Principal Security Strategist

Global Cybersecurity Policy – Microsoft

 


 

From: Spdx-tech@... <Spdx-tech@...> On Behalf Of David Kemp via lists.spdx.org
Sent: Monday, July 18, 2022 1:56 PM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] [spdx-tech] V3 serialization

 

Last week I took an action item to describe what serialized data for the v3 logical model could look like, in order to clarify discussion of the types shown in the model.

The thing to remember about v3 is that it is knowledge graph centric, not document centric. Element instances from the knowledge graph can be serialized into data instances, but the data definition is controlled by the logical model, not vice versa.  Data examples in various formats can illustrate the logical model for readers of the v3 spec, but they do not define it as they do in SPDX v2.

A collection of independent element values is shown in "logical-elements".  JSON data is used to visualize the element values, but it is important to remember that the logical value itself is the ability to answer questions:
* what is the id of this element?
* what is the type of this element?
* who created this element?
etc.  The element is a class with getters that allow each property of an instance to be retrieved, and those property values are independent of serialization format.

That collection of elements can be serialized into a transfer unit file as shown in "transfer units".

A Document element describes the contents of a transfer unit, but does not need to be present in the transfer unit.  The example transfer unit containing six elements (an SBOM, a Package, two Files, a Relationship, and an Actor that created them) is:

{
  "namespace": "urn:acme.dev:",
  "defaults": {
    "createdBy": ["identities:fred"],
    "created": "2022-04-05T22:00:00Z",
    "specVersion": "3.0",
    "profiles": ["Core", "Software"],
    "dataLicense": "CC0-1.0"
  },
  "elementValues": {
    "artifacts:gnu-coreutils/v9.1/src/du.c": {
      "type": {
        "file": {
          "filePurpose": ["APPLICATION", "SOURCE"]
        }
      }
    },
    "artifacts:gnu-coreutils/v9.1/src/echo.c": {
      "type": {
        "file": {
          "filePurpose": ["APPLICATION", "SOURCE"]
        }
      }
    },
    "artifacts:gnu-coreutils/v9.1": {
      "type": {
        "package": {
          "packagePurpose": ["APPLICATION", "SOURCE"],
          "downloadLocation": "http://mirror.rit.edu/gnu/coreutils/coreutils-9.1.tar.gz",
          "homePage": "https://www.gnu.org/software/coreutils/"
        }
      },
      "name": "GNU Coreutils"
    },
    "relationships:gnu-coreutils/v9.1": {
      "type": {
        "relationship": {
          "relationshipType": "CONTAINS",
          "from": "urn:acme.dev:artifacts:gnu-coreutils/v9.1",
          "to": [
            "artifacts:gnu-coreutils/v9.1/src/du.c",
            "artifacts:gnu-coreutils/v9.1/src/echo.c"
          ]
        }
      }
    },
    "identities:fred": {
      "type": {
        "actor": {}
      },
      "identifiedBy": [{"email": "fred@..."}]
    },
    "sboms:gnu-coreutils/v9.1": {
      "type": {
        "sbom": {
          "elements": [
            "artifacts:gnu-coreutils/v9.1/src/du.c",
            "artifacts:gnu-coreutils/v9.1/src/echo.c",
            "artifacts:gnu-coreutils/v9.1",
            "relationships:gnu-coreutils/v9.1",
            "identities:fred"
          ]
        }
      }
    }
  }
}
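One way to read the transfer unit above is as a compact encoding of six independent elements. The Python sketch below illustrates that reading under two assumptions that the message does not spell out: that a full element id is the namespace concatenated with the local key, and that each entry in "defaults" applies only when an element does not set that property itself. The function name expand_transfer_unit and the shortened sample data are hypothetical.

```python
def expand_transfer_unit(unit):
    """Expand a transfer unit into independent elements keyed by full id.

    Assumptions (not confirmed by the thread): full id = namespace + local key,
    and unit-level defaults only fill properties the element leaves unset.
    """
    namespace = unit.get("namespace", "")
    defaults = unit.get("defaults", {})
    elements = {}
    for local_id, value in unit["elementValues"].items():
        element = dict(defaults)   # start from the unit-level defaults
        element.update(value)      # the element's own properties take precedence
        element["id"] = namespace + local_id
        elements[element["id"]] = element
    return elements


# A shortened version of the example transfer unit (two elements instead of six).
unit = {
    "namespace": "urn:acme.dev:",
    "defaults": {"created": "2022-04-05T22:00:00Z", "specVersion": "3.0"},
    "elementValues": {
        "artifacts:gnu-coreutils/v9.1": {
            "type": {"package": {}},
            "name": "GNU Coreutils",
        },
        "identities:fred": {"type": {"actor": {}}},
    },
}

elements = expand_transfer_unit(unit)
for element_id, element in elements.items():
    print(element_id, element["created"])
```

The number of elements coming out equals the number of keys in "elementValues", which matches the expectation that serialization neither adds nor drops elements.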

The element examples, transfer unit examples, and the SPDX v3 schema derived from the logical model are available in https://github.com/davaya/spdx-3-elements.

The intent is for these to assist in refining the logical model and its serializations together.

Regards,
Dave


David Kemp
 

I'll try again with an example:
  • An SBOM for Windows 10 is a Collection that could have millions of elements, yes?  The serialized file containing the values of those elements could be megabytes.
  • An SBOM for My App is a Collection with a few elements. My App runs on / depends on Windows.
When I serialize the SBOM for My App, how big is the file?  Megabytes or kilobytes?  That is my definition of "depends on".

If Microsoft serializes and signs the file containing the Windows SBOM and its millions of elements, the chain of integrity is broken if the MyApp SBOM file has a copy of Windows element values instead of references.

The difference between a logical model and a data model is that the logical model doesn't care how relationships are implemented, they just exist.  A data model *defines* how relationships are implemented - as either nested values or a map/array of independent values.  My definition of "depends on" is: the value (and hash) of every element is independent of the value (and hash) of every other element.  Elements cannot be nested, an SBOM (Collection) element must have an array of IRIs, not a map/array of values. That requirement exists at the data level because a pure logical model doesn't care.  But to the extent that data shapes are hybridized into the SPDX model, it must also require that independence.

Note that the full SpdxFile is not an array, it is an object with namespace, namespaceMap, defaults, elementValues, and references to other SpdxFiles.  But the elementValues property is an array because the values aren't nested.
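The independence requirement described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that an element is hashed as SHA-256 over its JSON form with sorted keys; the thread does not specify a canonicalization or hash algorithm, and the element ids below are made up.

```python
import hashlib
import json


def element_hash(element):
    """Hash one element over a deterministic JSON encoding (an assumed canonical form)."""
    canonical = json.dumps(element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The SBOM lists its members by IRI only, so it carries no copy of their values.
package = {"id": "urn:example:pkg", "type": "package", "name": "My App"}
sbom = {"id": "urn:example:sbom", "type": "sbom", "elements": ["urn:example:pkg"]}

sbom_hash_before = element_hash(sbom)
package_hash_before = element_hash(package)

# Changing the package value changes the package hash but leaves the SBOM hash alone.
package["name"] = "My App (renamed)"
sbom_hash_after = element_hash(sbom)
package_hash_after = element_hash(package)

print(sbom_hash_before == sbom_hash_after)
print(package_hash_before == package_hash_after)
```

If the SBOM instead embedded a copy of the package value, the first comparison would fail, which is the dependency the message argues against.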

Regards,
David


On Tue, Jul 19, 2022 at 12:47 AM William Bartholomew (CELA) via lists.spdx.org <willbar=microsoft.com@...> wrote:
CIL


From: Spdx-tech@... <Spdx-tech@...> on behalf of David Kemp via lists.spdx.org <dk190a=gmail.com@...>
Sent: Monday, July 18, 2022 6:18 PM
To: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization
 
One principle is that the goal of serialization is to put Elements into physical format, NOT to create new elements that didn't exist prior to serialization.  If you have 6 elements going into serialization, you should have 6 elements coming out, not 7.

[William] Agreed, does my example violate that? It would be difficult for a serialization to "generate" elements because of the id and other required properties so I had not considered this a possibility.

The second principle is that logical elements should be independent: the value of one element does not depend on the value of any other element.

[William] I think it depends on your definition of "depends on" (pun intended). Elements may have properties that are references to other elements and serializers may choose to use that information for more compact serialization but since this would get unwound on deserialization that's immaterial.

I believe that those two principles are worth adopting as design requirements.

It is ugly to put something into serialization and get something else back out, 

[William] Agreed, though a lot of serializers/deserializers end up making minor changes as a result of normalization and other processes. Not ideal but that's an implementation detail within each serializer/deserializer.

and it's really ugly to stuff one element's value inside another

[William] I don't agree with this, at least for "collection" elements. Also, the serialization model for collection elements could support either element references or the element itself so if you think it's ugly then you would have the option of not doing nesting.

not least because you can wind up with infinite recursion with documents inside documents inside documents inside documents

[William] This is avoidable and using references instead of nesting doesn't prevent this problem. In fact, if you only use nesting then it's impossible to have infinite recursion, it's only when you use references that becomes possible.
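Whether a set of references contains a cycle can be checked mechanically. A minimal Python sketch, assuming the references are available as a mapping from an element id to the list of ids it refers to (a hypothetical in-memory shape, not one defined in the thread):

```python
def find_cycle(references):
    """Return a list of ids forming a reference cycle, or None if there is none.

    `references` maps an element id to the ids it refers to (assumed shape).
    Uses a depth-first search; very deep graphs would need an iterative version.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {}

    def visit(node, path):
        state[node] = in_progress
        path.append(node)
        for target in references.get(node, []):
            target_state = state.get(target, unvisited)
            if target_state == in_progress:
                # The target is already on the current path: a cycle.
                return path[path.index(target):] + [target]
            if target_state == unvisited:
                cycle = visit(target, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in list(references):
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


acyclic = {"doc:a": ["doc:b"], "doc:b": []}
cyclic = {"doc:a": ["doc:b"], "doc:b": ["doc:a"]}
print(find_cycle(acyclic))
print(find_cycle(cyclic))
```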

 Even two levels of element nesting makes things quite difficult to disentangle.

[William] I don't agree, for collections the nesting makes it obvious which collection an element is part of without having to follow the id references. Since the serialization model could support either approach I don't see this being a blocker.

The fundamental principle is that a file containing data is not an element.  A Transfer Unit is defined by a data schema, just like the content of any XML file or JSON file or ASN.1 file.  If the logical model has a Document element that describes an X.509 certificate, that element has interesting facts about the certificate but does not define its content.  It is essential to remember the difference between the bytes in a file and the properties of a File or Document element - the difference between a thing and metadata about that thing.

[William] We've had this discussion a number of times, the Collection element (and its subclasses) aren't metadata about collections, document, SBOM, etc. they are the collection, document, SBOM, etc. There is no "physical" thing outside of the SPDX document that is the collection, document, SBOM, etc., they only exist in the SPDX graph. You could take that SBOM, serialize it to disk, and then have a File element that talks about the physical serialization of the SBOM, but that's different to the SBOM SPDX element.

* defaults:
I created a separate defaults property to hold the five defaultable properties in order to distinguish them from non-defaultable properties.  Gary and I like the idea, but I'm not wedded to it.  The transfer unit schema could have "defaultCreatedBy", "defaultCreated", etc properties at the top level, to highlight that they are defaults, unlike name, description, comments, etc.  Whatever the mechanism, there must be a way to ensure that "name" doesn't take an inappropriate default value if it isn't populated, while the default for "profiles" is appropriate.

[William] I'm struggling with multiple properties that have the same definition having different names and different locations on the objects, it feels like a lot to explain. We could flag certain properties as inheritable in the schema, and this only applies to collection elements so I think the scope is quite narrow.
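The "flag certain properties as inheritable" idea could look roughly like the Python sketch below. The set of inheritable names is taken from the five properties in the example "defaults" object; the resolve function and the sample data are hypothetical, and a real schema would carry the flag per property rather than in a hard-coded set.

```python
# The five defaultable properties from the example transfer unit.
INHERITABLE = {"createdBy", "created", "specVersion", "profiles", "dataLicense"}


def resolve(prop, element, collection):
    """Return the element's own value, else the collection's value if inheritable."""
    if prop in element:
        return element[prop]
    if prop in INHERITABLE and prop in collection:
        return collection[prop]
    return None  # non-inheritable properties never fall back to the collection


collection = {
    "name": "Example SBOM",
    "profiles": ["Core", "Software"],
    "created": "2022-04-05T22:00:00Z",
}
element = {"id": "urn:example:file-1", "type": "file"}

print(resolve("profiles", element, collection))
print(resolve("name", element, collection))
```

Here "profiles" is inherited from the collection, while "name" is not, so a file without a name does not silently take the SBOM's name.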

* array vs map
I used map as a conversation starter, because it fits the "unique" semantics of element ids and because mapping types are ubiquitous now: XML Schema had them in 2005 (https://www.w3.org/2005/07/xml-schema-patterns.html#Maps), and they are a built-in part of JSON.

[William] While it is built-in to XML and JSON my experience is that it's not been supported well by schema languages and serializers/deserializers. I know I've had situations where I had to duplicate the id property in the class to ensure that other things work correctly (and to maintain the independence of the class). Also, in most object oriented languages there is not a way to get the key from the object so you end up having to track the key independently of the object which is a pain.
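The two shapes under discussion carry the same information, which a short Python sketch can make concrete. The function names are hypothetical; the sketch only assumes that every element has a unique "id".

```python
def map_to_array(element_map):
    """Turn {id: value} into a list of self-contained elements that carry their own id."""
    return [{**value, "id": element_id} for element_id, value in element_map.items()]


def array_to_map(element_array):
    """Turn a list of elements into {id: value}, rejecting duplicate ids."""
    element_map = {}
    for element in element_array:
        element_id = element["id"]
        if element_id in element_map:
            raise ValueError(f"duplicate element id: {element_id}")
        element_map[element_id] = {k: v for k, v in element.items() if k != "id"}
    return element_map


as_map = {
    "urn:example:a": {"type": "file"},
    "urn:example:b": {"type": "package", "name": "Example"},
}
as_array = map_to_array(as_map)
print(as_array)
print(array_to_map(as_array) == as_map)
```

The map form gets uniqueness for free from the container, while the array form keeps each object complete on its own and has to check for duplicates explicitly; that is the trade-off both sides describe.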

JSON-LD even treats ID differently from other properties by giving it the reserved @id keyword, and SQL databases have primary keys with the special characteristic that they uniquely identify the record rather than being just another column.  Autogenerated ids are often hidden because they are ubiquitous.

[William] In both JSON-LD and SQL the properties are still normal properties, in JSON-LD it's still a property on the object it just has a special name, in SQL it's still a column in the table it just has special metadata attached to it. Even autogenerated ids are typically normal columns they're just system generated and you can't change their definition.

  And finally, you introduced Map to the logical model for Extensions.  If it's OK for extensions, it's OK for Elements :-).

[William] Not the same 🙂. The map for extensions is a map of "extension type" to value, not of "id" to value. It is a consequence of us deciding that each type can only be assigned once that it can also be used as an id, but it is primarily a type, not an id. If we changed that design decision it would no longer function as an id.

Seriously though, I'm not wedded to Map.  Treating Id as any other property but having some prose saying that it can be used as a primary key / unique identifier is OK, it's just kind of loose given that references from foreign to primary keys is a universal concept.

[William] In SQL Server (others are similar) a foreign key takes the form FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1, ParentCol2, ...), they're still just columns, nothing magical about them, not even their names.

* type property
Since JSON does not have types it's good practice to ensure that "type: identity" cannot collide with a property named "identity".  At the core profile all type and property names are defined and don't collide, but if "type" goes away we'll need to ensure that properties defined in any profile cannot collide with types defined in any profile.  Again JSON-LD treats @type as a reserved property: https://w3c.github.io/json-ld-syntax/#typed-values.

[William] Agreed, and type isn't in the logical model, a JSON-LD serializer would use @type, an XML one would use XML namespaces and element names, a ProtoBuf one would use message types. Since my examples were "plain" JSON which does not have a built-in way of declaring types I used a "plain" property to capture the type, I agree that the name of this property should avoid potential conflict (e.g. by prefixing with an _).
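A collision check of the kind implied here is straightforward to automate. The Python sketch below is an illustration only: the type and property names are invented samples, not the actual contents of any profile.

```python
def name_collisions(type_names, property_names):
    """Return the names that are used both as a type and as a property."""
    return sorted(set(type_names) & set(property_names))


# Hypothetical names gathered across all profiles.
type_names = ["file", "package", "identity", "relationship"]
property_names = ["name", "created", "identity", "createdBy"]

print(name_collisions(type_names, property_names))
```

A reserved key for the type, such as one prefixed with an underscore, sidesteps the problem for the type marker itself, but the check is still useful if type names ever appear as keys alongside property names.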

* document root
A transfer unit file is not an Element and not a logical type or a class. The bytes in SPDX documents are not defined by the logical model, they just have to be able to be de-serialized into element instances.

[William] Same disagreement as above.

Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they define only data types.

[William] I'm not sure what definition of "class" you're using here, but the boxes on the diagram could be represented in an OO language as classes or interfaces; for our purposes I don't think the distinction between class and data type is meaningful.

Regards,
Dave


William Bartholomew (CELA)
 

We should avoid the word “relationship” for this discussion; relationships are something else entirely. Here we’re really referring to properties in the model that are references to elements (a relationship has two properties of that kind, but the relationship itself is not one of them).

 

You’re assuming a couple of things about nesting that I wasn’t intending. As I intend nesting to work, the following forms are logically equivalent, would hash to the same value (because canonicalization would be over the referenced IRIs), and could be swapped for one another without affecting the hash or validation:

 

Nested:

{ "id": "urn:microsoft:windows:10:sbom", "type": "sbom", "elements": [ { "id": "urn:microsoft:calculator:10.0.1.0", "type": "package", …  } ] }

Peers:

"elements": [
  { "id": "urn:microsoft:windows:10:sbom", "type": "sbom", "elements": [ "urn:microsoft:calculator:10.0.1.0" ] },
  { "id": "urn:microsoft:calculator:10.0.1.0", "type": "package", … }
]

Externalized:

"elements": [
  { "id": "urn:microsoft:windows:10:sbom", "type": "sbom", "elements": [ "calc:10.0.1.0" ] }
],
"externalMap": [
  { "externalId": "calc:10.0.1.0", "locationHint": "https://sbom.microsoft.com/calculator/10.0.1.0", "verifiedUsing": [ … ] }
]

 

This still has the properties (pun intended) that you desire: there is element independence, you’re not forced into a nested structure, you’re not prevented from using a nested structure, you can use subsets without breaking the signing of the elements.
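The claim that the nested and peer forms hash to the same value can be sketched in Python as follows. This is an illustration under stated assumptions: canonicalization is taken to mean replacing each nested member with its id and then hashing sorted-key JSON with SHA-256, neither of which is fixed by the thread. The function names are hypothetical, and the externalized form is not covered because it would also need the "externalMap" prefix expanded.

```python
import hashlib
import json


def to_references(collection):
    """Return a copy of a collection whose "elements" holds ids only."""
    flat = dict(collection)
    flat["elements"] = [
        member["id"] if isinstance(member, dict) else member
        for member in collection.get("elements", [])
    ]
    return flat


def collection_hash(collection):
    """Hash a collection over its referenced ids, ignoring nested member values."""
    canonical = json.dumps(to_references(collection), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


nested = {
    "id": "urn:microsoft:windows:10:sbom",
    "type": "sbom",
    "elements": [{"id": "urn:microsoft:calculator:10.0.1.0", "type": "package"}],
}
peer = {
    "id": "urn:microsoft:windows:10:sbom",
    "type": "sbom",
    "elements": ["urn:microsoft:calculator:10.0.1.0"],
}

print(collection_hash(nested) == collection_hash(peer))
```

Under this reading, nesting is a presentation choice of the serializer: the collection's hash depends only on which elements it references, not on where their values are written.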

 

P.S. The Windows SBOM is over a hundred megabytes (though it should come down when we move from SPDX 2.2 to 2.3).

 

Regards,

 

William Bartholomew (he/him) – Let’s chat

Principal Security Strategist

Global Cybersecurity Policy – Microsoft

 


 

From: David Kemp <dk190a@...>
Sent: Tuesday, July 26, 2022 7:59 AM
To: William Bartholomew (CELA) <willbar@...>
Cc: SPDX-list <Spdx-tech@...>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization

 

I'll try again with an example:

  • An SBOM for Windows 10 is a Collection that could have millions of elements, yes?  The serialized file containing the values of those elements could be megabytes.
  • An SBOM for My App is a Collection with a few elements. My App runs on / depends on Windows.

When I serialize the SBOM for My App, how big is the file?  Megabytes or kilobytes?  That is my definition of "depends on".

If Microsoft serializes and signs the file containing the Windows SBOM and its millions of elements, the chain of integrity is broken if the MyApp SBOM file has a copy of Windows element values instead of references.

The difference between a logical model and a data model is that the logical model doesn't care how relationships are implemented, they just exist.  A data model *defines* how relationships are implemented - as either nested values or a map/array of independent values.  My definition of "depends on" is: the value (and hash) of every element is independent of the value (and hash) of every other element.  Elements cannot be nested, an SBOM (Collection) element must have an array of IRIs, not a map/array of values. That requirement exists at the data level because a pure logical model doesn't care.  But to the extent that data shapes are hybridized into the SPDX model, it must also require that independence.

Note that the full SpdxFile is not an array, it is an object with namespace, namespaceMap, defaults, elementValues, and references to other SpdxFiles.  But the elementValues property is an array because the values aren't nested.

Regards,
David

 
