Spec structure


Peter Williams <peter.williams@...>
 

Some recent feedback on the composite-licensing proposal [1] was that a way is needed for particular composite licensing info to be referenced by multiple files in the tag-value format. While attempting to integrate this into the proposal I have run into an issue. These "named" composite licensing info resources need to be defined in a section of their own. However, the current spec does not describe how the various "sections" of an SPDX tag-value file are separated.

There seems to be quite a bit of high-level information regarding the overall structure and principles of the tag-value format that is missing from the spec entirely. Further, there does not seem to be a good place in the current spec layout for this discussion.

Should we rearrange the spec so that we have three major sections: abstract semantics, the tag-value format (including how this format maps to/from an RDF graph), and the RDF/XML format? Such a layout would give us an obvious place to talk about the overall structure of the tag-value and RDF/XML files. It would also simplify the property and class blocks because they could be limited to discussions of the semantics of the individual elements.

Obviously, we would also need a new subsection in the introduction to describe the layout and be quite explicit at the beginning of the abstract semantics section about its role.

Any thoughts or alternative ideas?

Peter

[1]: <http://www.spdx.org/wiki/proposal-2010-10-21-4-composite-licensing>


Peter Williams <peter.williams@...>
 

Is this thing on?



Gary O'Neall
 

Hi Peter - I'm receiving the emails.

I don't really have a better alternative than what you suggest. Your
proposal sounds like a pretty clean approach. It could also be easily
extended if other syntaxes were desired (although I'm not sure what they
would be).

The only other approach I can think of is to create a section in the SPDX
for "License Info" and include the semantics and both syntaxes (RDF/XML and
keyword/keyvalue pairs) for that section. This would follow the same
pattern as the current doc.

Gary



Peter Williams <peter.williams@...>
 

On 11/19/10 11:02 AM, Gary O'Neall wrote:
Hi Peter - I'm receiving the emails.
Thanks for responding. Just wanted to make sure since this list is new.

I don't really have a better alternative than what you suggest. Your
proposal sounds like a pretty clean approach. It could also be easily
extended if other syntaxes were desired (although I'm not sure what they
would be).

The only other approach I can think of is to create a section in the SPDX
for "License Info" and include the semantics and both syntaxes (RDF/XML and
keyword/keyvalue pairs) for that section. This would follow the same
pattern as the current doc.
We could do that. I think it would lead to some duplication, though. We would need very similar discussions of how sections are separated in the License, File, DisjunctiveLicenseSet, ConjunctiveLicenseSet, and, should we adopt it, LicenseOrLaterVersion sections of the spec.

Does anyone know what the intended mechanism was to separate the various sections? I.e., how is a consumer of a tag-value SPDX file supposed to know when one file ends and the next one begins?

Peter


Kate Stewart <kate.stewart@...>
 

On Fri, 2010-11-19 at 11:20 -0700, Peter Williams wrote:
On 11/19/10 11:02 AM, Gary O'Neall wrote:
Hi Peter - I'm receiving the emails.
Thanks for responding. Just wanted to make sure since this list is new.
I'd gotten it too.

Just had parked it in the "study a bit before responding", because am
not really understanding what changing to this approach is really buying
us. We sort of talked about this before - and I'm trying to figure out
how to see this from your viewpoint - cause it doesn't seem like a
problem to me - so I'm probably missing something. :(


I don't really have a better alternative than what you suggest. Your
proposal sounds like a pretty clean approach. It could also be easily
extended if other syntaxes were desired (although I'm not sure what they
would be).

The only other approach I can think of is to create a section in the SPDX
for "License Info" and include the semantics and both syntaxes (RDF/XML and
keyword/keyvalue pairs) for that section. This would follow the same
pattern as the current doc.
We could do that. I think it would lead to some duplication, though.
We would need very similar discussions of how sections are separated in the
License, File, DisjunctiveLicenseSet, ConjunctiveLicenseSet, and, should
we adopt it, LicenseOrLaterVersion sections of the spec.

Does anyone know what the intended mechanism was to separate the various
sections? I.e., how is a consumer of a tag-value SPDX file supposed to
know when one file ends and the next one begins?
There is only supposed to be one spdx file per package, so they should
not be concatenated together.

Sections are separated by recognizing keywords in implied sequential
order, as laid out in the sections in the spec. The only area we
discussed relaxing this was for the signed-off-by property to be
appended to the end, without perturbing the contents of the rest of the
file. Possibly we could just make a separate audit section at the end and
move it there? That way we can just use the implied order and
keywords to delimit sections when translating back and forth.

One of my concerns is that the RDF syntax as illustrated in the zlib
example is overly wordy and expands the size of the file unnecessarily,
and makes it awkward to enter by hand, which will be a barrier to
adoption in some cases. Can we also focus on tightening it up to
correspond as closely as possible to the tags, and do some optimization on
the expression?

Kate


Peter Williams <peter.williams@...>
 

On 11/19/10 3:00 PM, Kate Stewart wrote:
On Fri, 2010-11-19 at 11:20 -0700, Peter Williams wrote:
On 11/19/10 11:02 AM, Gary O'Neall wrote:
Hi Peter - I'm receiving the emails.
Thanks for responding. Just wanted to make sure since this list is new.
I'd gotten it too.
Thanks for responding

Just had parked it in the "study a bit before responding", because am
not really understanding what changing to this approach is really buying
us. We sort of talked about this before - and I'm trying to figure out
how to see this from your viewpoint - cause it doesn't seem like a
problem to me - so I'm probably missing something. :(
My point is that in the current structure there is no high-level discussion of the tag-value format, specifically of issues such as overall layout, ordering, common idioms, must-ignore/must-understand semantics, mapping into RDF, etc.

There is no obvious location to place such a discussion. Probably the best we could do in the current structure is to add such a section for each format before getting into the nitty-gritty details of the various properties and classes. I prefer to learn by moving from abstract to concrete, but I understand that is not everyone's preference.

I think it best that the spec be very clear that there is an abstract model that all SPDX formats map into, and then, in separate sections, describe the two supported serialization methods completely. Each of the sections could be structured in a way appropriate to that particular format. Having a section per serialization format would also allow us to provide a complete treatment of that format. Such single-focus sections would also reduce the amount of information that is superfluous to the reader's intent. That is, a reader trying to figure out how to write an SPDX tag-value file would be able to read the appropriate part of the spec and ignore the RDF/XML section.

On a sort-of related topic, we also have an issue with the same property appearing in multiple sections of the spec. Currently these properties are defined twice. Not a huge technical deal, but it could lead to confusion. From an RDF perspective they are actually the exact same property. A reader with an RDF background would probably be left wondering why the same property is defined twice in the same spec. We could probably handle this issue without the sort of rearrangement I proposed, but it might be easier with such an arrangement.



I don't really have a better alternative than what you suggest. Your
proposal sounds like a pretty clean approach. It could also be easily
extended if other syntaxes were desired (although I'm not sure what they
would be).

The only other approach I can think of is to create a section in the SPDX
for "License Info" and include the semantics and both syntaxes (RDF/XML and
keyword/keyvalue pairs) for that section. This would follow the same
pattern as the current doc.
We could do that. I think it would lead to some duplication, though.
We would need very similar discussions of how sections are separated in the
License, File, DisjunctiveLicenseSet, ConjunctiveLicenseSet, and, should
we adopt it, LicenseOrLaterVersion sections of the spec.

Does anyone know what the intended mechanism was to separate the various
sections? I.e., how is a consumer of a tag-value SPDX file supposed to
know when one file ends and the next one begins?
There is only supposed to be one spdx file per package, so they should
not be concatenated together.
Agreed. However, a single SPDX file contains data about many files from the package in question. My question was: as a consumer of a single SPDX tag-value file, how do I delineate the information about each individual file in the package?

Sections are separated by recognizing keywords in implied sequential
order, as laid out in the sections in the spec. The only area we
discussed relaxing this was for the signed-off-by property to be
appended to the end, without perturbing the contents of the rest of the
file. Possibly we could just make a separate audit section at the end and
move it there? That way we can just use the implied order and
keywords to delimit sections when translating back and forth.

One of my concerns is that the RDF syntax as illustrated in the zlib
example is overly wordy and expands the size of the file unnecessarily,
and makes it awkward to enter by hand, which will be a barrier to
adoption in some cases. Can we also focus on tightening it up to
correspond as closely as possible to the tags, and do some optimization on
the expression?
I agree. I think some of the verbosity exists because of some missing properties. Primarily, we do not have a way to actually associate a File resource with a Package resource. The RDF model really needs a property to make this association explicit; otherwise that relationship is going to be lost once the file is read into an abstract graph.

The lack of this membership property forces the use of `<Description>` tags in places where an `<includes>` (or `<contains>`, or `<memberFile>`, or something) tag would be more appropriate.
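
To illustrate why the explicit association matters, here is a minimal Python sketch that models the abstract graph as a plain set of (subject, predicate, object) triples. The property name `memberFile` is just one of the candidate names floated above, and the `#package` identifier and the `member_files` helper are hypothetical; none of this is taken from the spec.

```python
# Hypothetical membership property; the name is only one of the candidates
# mentioned in the discussion, not something defined by the spec.
MEMBER_FILE = "memberFile"

# The abstract graph as a set of (subject, predicate, object) triples.
# "#package" is a placeholder identifier for the Package resource.
triples = {
    ("#package", "rdf:type", "Package"),
    ("#adler32.c", "rdf:type", "File"),
    ("#contrib/iostream2/zstream.h", "rdf:type", "File"),
    ("#package", MEMBER_FILE, "#adler32.c"),
    ("#package", MEMBER_FILE, "#contrib/iostream2/zstream.h"),
}


def member_files(graph, package):
    """Return the files explicitly associated with a package, sorted by id."""
    return sorted(o for s, p, o in graph if s == package and p == MEMBER_FILE)


# With the membership triples present, the relationship survives in the graph.
files = member_files(triples, "#package")

# Without them, the graph still knows about the File resources but can no
# longer say which package they belong to.
without_membership = {t for t in triples if t[1] != MEMBER_FILE}
orphaned = member_files(without_membership, "#package")
```

Once the document is reduced to triples, nesting in the serialized file no longer carries any meaning, so only an explicit property can answer "which files belong to this package".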


Peter


Peter Williams <peter.williams@...>
 

On 11/19/10 3:00 PM, Kate Stewart wrote:
Sections are separated by recognizing keywords in implied sequential
order, as laid out in the sections in the spec.
So when parsing a tag-value SPDX file, I can tell that I have hit the end of the data for one file in the package when I see a line that starts with "Name: "? Required ordering is hard for humans. The lack of visual boundaries between sections is also hard for humans reading the file.
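
As a rough illustration of what that keyword-boundary rule asks of a consumer, here is a minimal Python sketch. It assumes that "Name" is always the first tag of a file's data, which is my reading of the implied ordering rather than something the spec states; the `split_on_first_tag` helper and the sample values are hypothetical.

```python
def split_on_first_tag(lines, first_tag="Name"):
    """Group tag-value lines into records, opening a new record at each first_tag.

    Assumption: first_tag always appears first in a record. Lines seen
    before the first boundary tag are ignored in this sketch.
    """
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a tag-value line: {line!r}")
        tag, value = tag.strip(), value.strip()
        if tag == first_tag:
            # The boundary tag closes the previous record and opens a new one.
            current = []
            records.append(current)
        elif current is None:
            continue
        current.append((tag, value))
    return records


sample = """\
Name: adler32.c
Type: source
License: Zlib
Name: contrib/iostream2/zstream.h
Type: source
License: FullLicense-2
""".splitlines()

records = split_on_first_tag(sample)
```

Note that the record boundary depends entirely on "Name" coming first; if a writer emits the tags in any other order, the grouping silently goes wrong.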

A section boundary would provide visual breaks and allow us to remove the ordering requirements. Maybe something like this.

[Package <https://olex.openlogic.com/package_versions/download/9423?path=openlogic/zlib/1.2.5/openlogic-zlib-1.2.5-all-src-1.zip&package_version_id=3690>]
DeclaredName: zlib 1.2.5 Source
...

[File <#contrib/iostream2/zstream.h>]
Type: source
License: FullLicense-2
...

[File <#adler32.c>]
Type: source
License: Zlib
...

This would make it easier to produce by hand because it has no ordering requirement. It would be easier to parse because the sections are more clearly delineated. It would be easier to map into an RDF graph because the resources described would have a uniform way to declare their URIs.
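
To back up the "easier to parse" claim, here is a minimal Python sketch of a reader for the bracketed section headers proposed above. The header grammar (`[Kind <uri>]` alone on a line) is inferred from the example, and the `parse_sections` helper and its output shape are hypothetical, not part of any spec.

```python
import re

# Matches a header such as "[File <#adler32.c>]": a resource kind followed
# by a URI in angle brackets. The grammar is inferred from the example only.
SECTION_HEADER = re.compile(r"^\[(?P<kind>\w+)\s+<(?P<uri>[^>]*)>\]$")


def parse_sections(text):
    """Parse bracket-delimited sections into a list of dicts."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_HEADER.match(line)
        if header:
            current = {"kind": header["kind"], "uri": header["uri"], "properties": {}}
            sections.append(current)
            continue
        if current is None:
            raise ValueError(f"property outside any section: {line!r}")
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a tag-value line: {line!r}")
        # A tag may repeat within a section, so collect values in a list.
        current["properties"].setdefault(tag.strip(), []).append(value.strip())
    return sections


sample = """\
[File <#adler32.c>]
License: Zlib
Type: source

[File <#contrib/iostream2/zstream.h>]
Type: source
License: FullLicense-2
"""

sections = parse_sections(sample)
```

The two sample sections list their properties in different orders and parse the same way, since the header alone marks where each resource starts and what its URI is.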

Peter