Tag-value RDF mapping proposal
Peter Williams <peter.williams@...>
I wrote up the beginnings of a proposal for doing the tag-value to rdf mapping, <http://www.spdx.org/wiki/proposal-2010-12-07-1-tag-value-rdf-mapping>. The proposal does not include any changes to the spec proper yet. Once a consensus on the technicalities of the format and mapping develops, we can translate that into the spec. I urge everyone to read the proposal and make comments/suggestions. This is a very important part of the spec, and the more eyes we have on it the better it will be.

Peter
www.openlogic.com
Peter Williams <peter.williams@...>
I realized that there was a way to simplify the tag-value files a bit, so I updated the proposal (<http://www.spdx.org/wiki/proposal-2010-12-07-1-tag-value-rdf-mapping>). These changes make producing the files a little simpler, but they make writing a parser a bit more complicated. They also limit non-standard extension of the format. However, the increased simplicity is probably worth these trade-offs if we think anyone is ever going to produce one of these by hand.

Which raises a question: do we actually anticipate humans producing these files? That seems like an awful lot of tedious typing for any human to actually do. If we don't expect humans to write these files, perhaps we should revisit the supported format discussion. If we do expect humans to write these files, I would be interested in what situations.

Peter

On Tue, Dec 7, 2010 at 11:58 AM, Peter Williams <peter.williams@...> wrote:
> I wrote up the beginnings of a proposal for doing to tag-value to rdf
Peter Williams <peter.williams@...>
Hi all,

Kate posted some comments on the proposal. It is worth responding on the list so everyone can see where the debate stands.

> Also there are many special cases
> <KES>: it is one tag per data value all on the same line, unless it is

I was unaware of the `<text>...</text>` syntax. This approach definitely solves my multi-line value issue. I think I joined the group after that discussion occurred, and it has not yet made its way into the official spec. Using that approach, the first example would look like:

    SPDXVersion: SPDX-1.0
    CreatedBy: Tool: spdx-gen 1.0

    [Package http://oss.net/foo-1.0.tar.gz]
    DeclaredLicense: FullLicense-1
    DeclaredLicense: license:GPL2
    Description: <text>This is a long multiline value</text>

    [License FullLicense-1]
    LicenseText: <text> Some terms and conditions </text>

> A uniform way declare new resources (entities) and link to them
> <KES> Positional order will give it, I'm not sure this is adding value

This generalized pattern will make it much easier to create backward compatible improvements to the spdx format as time goes on. New types of sections can be added, and any spdx processor that does not understand a type of section could just ignore it. In the current positional approach you run the risk of having properties attached to the wrong top-level item in that situation.

This uniformity would also make implementing improvements easier. The parser component of spdx processors would not have to be changed at all to support new versions of the spec; new sections and tags would be parsed just like the existing structures. Only the parts of a tool that interpret the information would need to be updated.

It will also make it easier to write lint-like tools for spdx in the face of future versions of spdx. By encoding the structure explicitly in the format, a lint-ish tool will be able to use that information even if it does not understand the sections/tags themselves.

This approach would completely remove the positional nature of the format.
This, in and of itself, is a huge win in my book. Remembering the appropriate order of the large number of tags in spdx will be difficult and tedious. People do much better with formats that have an explicit structure than with ones based on an implicit order. Explicitly structured documents are easier to read, to produce, and to parse/interpret reliably. This is borne out by the fact that most, if not all, popular interchange specifications use order-independent formats such as xml. It also makes the mapping to rdf easier to describe, understand and implement.

> Arbitrary rdf properties will be
> <KES> Whoa, the only place we have buy in from the lawyers is in the comment field. Introducing this notation, needs to be thought out, - also this conflicts with

I don't think it does actually conflict. The `<text>` and `</text>` sequences can only occur in the value of a tag, while the proposal describes a syntax that can only occur in the tag part of the line. I don't think it would be hard to write a parser that distinguishes between the two. On the other hand, I am open to other ways to allow arbitrary rdf values to be added.

I feel strongly that extensibility is an important feature. Spdx needs a mechanism to allow experimentation and innovation, and it is preferable to have that happen in a way that will not cause problems with future versions of the spec. Forcing experiments into globally unique namespaces (using URIs) will help prevent collisions between experiments and future improvements in spdx.

I think the lawyers will be comfortable with this approach. This ability does not constitute a legal judgment. Additionally, by forcing such non-standard tags into their own namespaces, we are also making it clear that they are not part of the spdx standard.

Peter
www.openlogic.com
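The parsing claims above can be illustrated with a minimal Python sketch. It is not part of the proposal and rests on several assumptions: the `[Type identifier]` header form is inferred from the example earlier in this message, the `<URI>: value` form for extension tags is a hypothetical stand-in for the tag syntax the proposal describes, and `SPDX_NS`, `KNOWN_SECTION_TYPES`, `parse`, and `to_triples` are illustrative names only. Each line is split on the first `:` (so `Tool: spdx-gen 1.0` stays one value), a leading `<` marks an extension tag (it cannot collide with `<text>`, which only appears in values), `<text>...</text>` values may span lines, unknown section types are skipped whole, and each section becomes the subject of rdf-style triples.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumptions for this sketch; none of these names come from the spec.
SPDX_NS = "http://spdx.org/rdf/terms#"               # namespace for standard tags
KNOWN_SECTION_TYPES = {"Package", "License"}          # section types this tool understands
SECTION_HEADER = re.compile(r"^\[(\w+)\s+(\S+)\]$")   # "[Type identifier]"


@dataclass
class Section:
    kind: str
    ident: str
    pairs: list[tuple[str, str]] = field(default_factory=list)


def split_tag(line: str) -> tuple[str, str]:
    """Return (predicate URI, raw value) for one tag-value line.

    A leading '<' can only start a (hypothetical) extension tag "<URI>: value",
    because "<text>" is only allowed after the first ':' separator.
    """
    if line.startswith("<"):
        end = line.find(">:")
        if end == -1:
            raise ValueError(f"malformed extension tag: {line!r}")
        return line[1:end], line[end + 2:].strip()
    tag, sep, value = line.partition(":")  # first ':' only
    if not sep:
        raise ValueError(f"missing ':' separator: {line!r}")
    return SPDX_NS + tag.strip(), value.strip()


def parse(text: str) -> list[Section]:
    """Group pairs into explicit sections; skip sections of unknown type."""
    sections = [Section("Document", "_:document")]
    skipping = False
    lines = iter(text.splitlines())
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = SECTION_HEADER.match(line)
        if header:
            kind, ident = header.groups()
            skipping = kind not in KNOWN_SECTION_TYPES
            if not skipping:
                sections.append(Section(kind, ident))
            continue
        pred, value = split_tag(line)
        if value.startswith("<text>"):
            buf = value[len("<text>"):]
            while "</text>" not in buf:  # the value continues on later lines
                nxt = next(lines, None)
                if nxt is None:
                    raise ValueError("unterminated <text> value")
                buf += "\n" + nxt
            value = buf[: buf.index("</text>")].strip()
        if not skipping:
            sections[-1].pairs.append((pred, value))
    return sections


def to_triples(sections: list[Section]) -> list[tuple[str, str, str]]:
    """Each section identifier becomes the subject of that section's pairs."""
    return [(s.ident, p, v) for s in sections for p, v in s.pairs]


SAMPLE = """SPDXVersion: SPDX-1.0
CreatedBy: Tool: spdx-gen 1.0
[Package http://oss.net/foo-1.0.tar.gz]
DeclaredLicense: FullLicense-1
<http://example.org/ns#note>: reviewed informally
Description: <text>This is a long
multiline value</text>
[Review r1]
Reviewer: someone
[License FullLicense-1]
LicenseText: <text> Some terms and conditions </text>
"""

if __name__ == "__main__":
    for triple in to_triples(parse(SAMPLE)):  # the [Review r1] section is skipped
        print(triple)
```

Note that the parser never needs to know what `DeclaredLicense` or the example.org tag mean; only `KNOWN_SECTION_TYPES` would change between spec versions in this sketch.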
Kate Stewart <kate.stewart@...>
On Tue, 2010-12-14 at 12:15 -0700, Peter Williams wrote:
> A uniform way declare new resources (entities) and link to them

The phone discussion highlighted that there was a terminology difference in how sections are used in the rdf context vs. in the specification context. Peter is using sections in the rdf context, as Projects are not sections in the specification. The approach I'm advocating is to recognize keywords that start a grouping of related tags, following the section approach advocated by the spec. By recognizing keywords associated with Package, File, and License related fields, the same effect can be created without adding additional character syntax.

The proposal from the meeting was to create a flex/bison (or lex/yacc - depending on which syntax I can remember best) version of the tag specification to clarify the keywords' meaning and the fields, and possibly rename some tag keywords to be more meaningful. I took the action to get a draft out for discussion.

The other point worth noting is that we'll be updating the specification to split out the reviewedby into a separate section (in the spec sense), and to unify the values associated with that tag with those used in createdby.

I'm not sure I agree with the conclusion, and it would make remembering the syntax harder for those doing hand coding. Keeping this as intuitive to use as DEP-5 seems a reasonable goal for the tag version.

> This approach would completely remove the positional nature of the

Positional nature is being overinterpreted here, I think. Within a section, you should be able to associate related tags without forcing a specific order. I think that the lex/yacc version will make it clearer.

I'll split the other issue referenced into its own thread.

Kate
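Kate's keyword-based alternative can also be sketched in a few lines of Python. This is an illustration, not the lex/yacc grammar mentioned above: the section-starting keywords in `SECTION_START` (`PackageName`, `FileName`, `LicenseID`) and the function name are hypothetical placeholders. A new group starts whenever one of those keywords appears; after that, related tags may come in any order, and a repeated tag collects all of its values.

```python
from __future__ import annotations

# Hypothetical keywords that open a new group; a real list would come from the spec.
SECTION_START = {"PackageName": "Package", "FileName": "File", "LicenseID": "License"}


def group_by_keyword(lines: list[str]) -> list[tuple[str, dict[str, list[str]]]]:
    """Split a flat tag-value list into groups opened by SECTION_START keywords.

    Inside a group the tags are stored by name, so their relative order does not
    matter; a tag that repeats (e.g. several licenses) collects all its values.
    """
    groups: list[tuple[str, dict[str, list[str]]]] = [("Document", {})]
    for line in lines:
        tag, sep, value = line.partition(":")  # split on the first ':' only
        if not sep:
            continue  # blank or malformed line: ignored in this sketch
        tag, value = tag.strip(), value.strip()
        if tag in SECTION_START:
            groups.append((SECTION_START[tag], {}))
        groups[-1][1].setdefault(tag, []).append(value)
    return groups


if __name__ == "__main__":
    doc = [
        "SPDXVersion: SPDX-1.0",
        "PackageName: foo",
        "DeclaredLicense: license:GPL2",
        "PackageVersion: 1.0",
        "LicenseID: FullLicense-1",
        "LicenseText: Some terms and conditions",
    ]
    for kind, tags in group_by_keyword(doc):
        print(kind, tags)
```

The trade-off is that the opening keyword must still come first in each group, and only keywords the tool already knows can open a group.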
Peter Williams <peter.williams@...>
On Wed, Dec 15, 2010 at 3:55 PM, Kate Stewart <kate.stewart@...> wrote:
> I'm not sure I agree with the conclusion, and it would make remembering

I find DEP-5 quite unintuitive with regard to repeating structures such as files. I prefer distinct syntactic structures to be visually distinct; this allows humans to quickly detect the new context. Blessing particular tags as the start of a new "section" turns those tags into a different syntactic beast from all the other tags, yet the uninitiated human reader is left without any hint of that fact.

That being said, I think the biggest issue we need to deal with is forward compatibility. We need a way for future versions of the spec to introduce meaningful improvements that are backward compatible with this version. Having implicit section boundaries greatly limits the freedom of future versions: they will not be able to introduce new sections without breaking backward compatibility. I think that limitation will basically make this format an evolutionary dead end, because many potential changes will be impossible, or inordinately difficult, to implement without the use of sections. Future versions of the spec will likely have to choose between obsoleting a large number of tools and not implementing needed improvements.

Peter

PS: In this email I am using "section" in the sense Kate usually does, i.e. a group of related tag-value pairs in an spdx file.
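The forward-compatibility concern can be shown with a small Python sketch that compares the two approaches on a document from a hypothetical later spec version adding a Review section. All tag names, the `Review` section, and both functions are invented for illustration. A tool that knows only the version-1 start keywords attaches the new review tags to the preceding package, while a tool reading explicit `[Kind id]` headers can skip the unknown section.

```python
from __future__ import annotations

# Hypothetical version-1 knowledge: which keywords (or header kinds) start a section.
V1_SECTION_START = {"PackageName": "Package", "FileName": "File"}
V1_KINDS = {"Package", "File"}

# The same hypothetical version-2 data in both styles; "Review" is new in v2.
IMPLICIT_V2 = [
    "PackageName: foo",
    "PackageVersion: 1.0",
    "ReviewerName: Jane",  # starts a Review section in v2, unknown to v1
    "ReviewDate: 2010-12-15",
    "FileName: foo.c",
]
EXPLICIT_V2 = [
    "[Package foo]",
    "PackageVersion: 1.0",
    "[Review r1]",
    "ReviewerName: Jane",
    "ReviewDate: 2010-12-15",
    "[File foo.c]",
    "FileType: source",
]


def group_implicit(lines: list[str], starts: dict[str, str]) -> list[list]:
    """Start a group only at known keywords; anything else joins the current group."""
    groups: list[list] = [["Document", []]]
    for line in lines:
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag in starts:
            groups.append([starts[tag], []])
        groups[-1][1].append((tag, value))
    return groups


def group_explicit(lines: list[str], kinds: set[str]) -> list[list]:
    """Groups are delimited by "[Kind id]" headers; unknown kinds are skipped."""
    groups: list[list] = [["Document", []]]
    skipping = False
    for line in lines:
        if line.startswith("[") and line.endswith("]"):
            kind = line[1:-1].split()[0]
            skipping = kind not in kinds
            if not skipping:
                groups.append([kind, []])
        elif not skipping:
            tag, _, value = line.partition(":")
            groups[-1][1].append((tag.strip(), value.strip()))
    return groups


if __name__ == "__main__":
    print(group_implicit(IMPLICIT_V2, V1_SECTION_START))  # review tags land in Package
    print(group_explicit(EXPLICIT_V2, V1_KINDS))  # Review section is skipped
```

In the implicit version the version-1 tool has no way to tell that `ReviewerName` opened something new, which is the misattribution risk described above.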