Re: meeting minutes

Two things:

1) The easiest and least complex way to retain the original text is to retain the original text explicitly.
2) The existence of a single tool written in a single language is much less useful than the existence of a well-known format with tools in pretty much every language. This is the biggest advantage of moving to XML (or some other standard format): it doesn't force developers to insert Java or command line tool wrappers into whatever they are doing, so they can use native tools in their native language and are not tempted to reimplement a one-off, case-specific spec that is unique(?) to SPDX. I believe XML can serve the purpose you're currently using the SPDX markup for, in the same way you're using it, but I'm not convinced that reconstruction is the way to go, particularly if and as the markup gets more complex. They are, however, two separate discussions (changing to XML versus changing the approach used to model the data).
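To make both points concrete, here is a minimal sketch in Python using only the standard library XML parser. The element names (`license`, `original`, `template`, `optional`, `alt`) and the license content are hypothetical, invented for illustration; they are not an existing SPDX format. The original text is stored explicitly, next to the marked-up template, so a consumer never has to reconstruct it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for illustration only; not an actual SPDX format.
LICENSE_XML = """\
<license id="Example-1.0">
  <original>Example License
Copyright (c) 2015 Example Author
All rights reserved.</original>
  <template><optional>Example License
</optional>Copyright (c) <alt name="copyright" match=".+">2015 Example Author</alt>
All rights reserved.</template>
</license>
"""

root = ET.fromstring(LICENSE_XML)

# Point 1: the original text is retained explicitly and read directly.
original = root.findtext("original")

# The template is a separate element; flattening it to plain text is only
# needed as a consistency check, not as the way to obtain the original.
template = root.find("template")
flattened = "".join(template.itertext())

print(original)
print("template consistent with original:", flattened == original)
```

Any language with an XML parser could do the same in a few lines, which is the advantage being argued for in point 2.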

As an aside, I don't think JSON is at all a useful data format for what you're trying to do here, even though in general I much prefer JSON over XML.


-----Original Message-----
From: Gary O'Neall [mailto:gary@...]
Sent: Friday, August 07, 2015 18:45
To: <>; 'J Lovejoy' <opensource@...>; 'Philippe Ombredanne' <pombredanne@...>
Cc: 'SPDX-legal' <spdx-legal@...>
Subject: RE: meeting minutes

Hi Kris, Philippe and all,

The markup language for the templates was crafted in a way that retains the original text. The optional tags surround the original text and the replaceable text has an original property which retains the original text.
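As a rough sketch of how the original text can be pulled back out of a template, the Python below assumes a tag syntax along the lines of `<<beginOptional>>` / `<<endOptional>>` around optional text and `<<var;name=...;original=...;match=...>>` for replaceable text. The exact syntax, the function name `original_text`, and the sample license are assumptions for illustration, not taken from the SPDX tools.

```python
import re

# Assumed tag syntax, for illustration only:
#   <<beginOptional>> ... <<endOptional>>    wraps text that may be omitted
#   <<var;name=...;original=...;match=...>>  marks replaceable text
OPTIONAL_TAG = re.compile(r"<<(?:beginOptional|endOptional)[^>]*>>")
VAR_TAG = re.compile(r"<<var;(.*?)>>")


def original_text(template: str) -> str:
    """Recover the original license text from a marked-up template."""

    def keep_original(match: re.Match) -> str:
        # Naive field parsing: breaks if a field value itself contains ';'.
        fields = dict(
            part.split("=", 1)
            for part in match.group(1).split(";")
            if "=" in part
        )
        return fields.get("original", "")

    # Replace each var tag with its "original" property, then drop the
    # optional markers while keeping the text they surround.
    text = VAR_TAG.sub(keep_original, template)
    return OPTIONAL_TAG.sub("", text)


# Made-up sample template, not a real license.
template = (
    "<<beginOptional>>Example License\n<<endOptional>>"
    "Copyright (c) <<var;name=copyright;original=2015 Example Author;match=.+>>\n"
    "All rights reserved."
)
print(original_text(template))
```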

There is actually an open source tool (source located at ...) which generates the web pages precisely in the manner Kris proposes below (barring any bugs, which are always possible ;).

Note that if you go to the website, there are RDFa tags which will allow you to pull either the original text or the template directly from the website, for those who would rather parse RDFa than download the raw templates (either approach is OK, of course). These tags are described in the PDF document "Accessing SPDX Licenses" (...).

There is also quite a bit of interest in making the same facility available using JSON - but I should probably take that discussion over to the tech mailing list...


-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of
Sent: Friday, August 7, 2015 11:42 AM
To: J Lovejoy; Philippe Ombredanne
Cc: SPDX-legal
Subject: RE: meeting minutes

There are two purposes at odds here and, I suspect, responsible for
the markup vs no markup debate. One is: a repository of data that can
be used to identify license names/ids from content. The other is: a
repository of data that can be used to produce content given a license
name/id.

Luckily, the people utilizing either use-case are, I should think,
doing it with different interfaces.

If you want to use SPDX data to look up the BSD 2 Clause license text,
you are not likely to go about it by cloning the repository, finding
the file, and reading the raw content.

Similarly, if you want to use the SPDX data to identify a license from
text, you are not likely to go about it by scraping the website,
processing the HTML, and doing a bunch of extra work.

The simple solution that presents itself is this:

Mark up all the data as much as needed. Generate the website from the
marked up data. Ensure that the original reference text can be
produced from the marked up text, and you're good to go (for whatever
that's worth, since the markup is, of course, there to cover
*variance* in the reference text. Deciding which version will be the
"official" version is beyond this discussion...)
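For the identification use-case, the same marked-up data can drive a matcher. The sketch below turns a template into a regular expression; it assumes a tag syntax of `<<beginOptional>>` / `<<endOptional>>` and `<<var;...;match=...>>`, and the names `template_to_regex` and `literal` plus the sample license are hypothetical. It treats any run of whitespace as equivalent, which is a simplification rather than a full implementation of the matching guidelines.

```python
import re

# Assumed tag syntax, for illustration only.
TAG = re.compile(r"<<(beginOptional|endOptional|var)(;.*?)?>>")


def literal(text: str) -> str:
    """Escape literal text, letting any whitespace run match any other."""
    pieces = re.split(r"(\s+)", text)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in pieces)


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex that matches texts covered by the marked-up template."""
    parts = []
    pos = 0
    for tag in TAG.finditer(template):
        parts.append(literal(template[pos:tag.start()]))
        kind, args = tag.group(1), tag.group(2) or ""
        if kind == "beginOptional":
            parts.append("(?:")          # open an optional group
        elif kind == "endOptional":
            parts.append(")?")           # close it and make it optional
        else:
            # Naive field parsing: breaks if a value itself contains ';'.
            fields = dict(p.split("=", 1) for p in args.split(";") if "=" in p)
            parts.append("(?:" + fields.get("match", ".+") + ")")
        pos = tag.end()
    parts.append(literal(template[pos:]))
    return re.compile("".join(parts))


# Made-up sample template, not a real license.
template = (
    "<<beginOptional>>Example License\n<<endOptional>>"
    "Copyright (c) <<var;name=copyright;original=2015 Example Author;match=.+>>\n"
    "All rights reserved."
)
matcher = template_to_regex(template)

candidate = "Copyright (c) 2020 Someone Else\nAll rights   reserved."
print(bool(matcher.fullmatch(candidate)))
```

The website text and the matcher are then both generated from one source, so the two interfaces cannot drift apart.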

This solves everyone's needs.


-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of J Lovejoy
Sent: Friday, August 07, 2015 11:29
To: Philippe Ombredanne <pombredanne@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Re: meeting minutes

Hi Philippe,

Comments below:

For the last two calls:
3) Mark-up bug raised on tech team call- bug filed requesting that
the mark-up be done to facilitate automation vs. human readable.
goal that tech team will look to see if it can be prioritized for
next year. Gary will also talk with Jilayne about the possibility
of making mark-up changes something that others can do and then
submit as a patch [...]
2) Kris had raised request via tech list regarding markup on
and matching rules and joined to discuss some issues matching
guidelines that are programmatically difficult to implement, wanted
to be able to make suggestions global review or make small
examples: ISC License - now default license for NPM, has reference
ISC in text (needs markup); one link broken on SPDX list and one
to link with slightly different text (generic v. specific to ISC)
Adding matching markup inside the reference license texts will
eventually lead to un-resolvable conflicts:
There is already markup in about 15% of the licenses, as per our in
depth discussions a couple years ago. The conversation on the call was
about improving some of the existing markup, adding markup that should
have been there but isn’t (in at least one case), and adding some
markup for other matching guidelines.
I think everything that was recommended by Kris and discussed on the
call was probably something discussed at the original meetings on how
to implement the markup for the matching guidelines at Collab Summit a
couple years ago (Daniel German’s ears are ringing!) - As I explained
as part of the background: we admittedly took a conservative approach
as to what we could mark up in the initial release of this, as it is
always easier to add later than to remove. My memory was that the people
directly involved with various tools (license scanners) wanted more
markup, as that eliminates having to make a determination.

- markup will make a license text no longer a reference
- it will make it less readable or unusable as such
- it may damage or transform a reference text in unwanted ways
I’m not sure what you mean here by the license text no longer being a
reference or damaging it? Can you explain or provide an example? If
we’ve missed a use-case that more markup could frustrate, we
definitely want to discuss that.

It would be simpler to separate the two cleanly:

1. Reference license texts, not modified for matching.
They may contain lightweight markup for the purpose of clarity, not
for matching.
We already have licenses with markup - do you consider what we have
now as what you describe above?
I do understand that having a “clean” license text with no markup
could be desirable (cut and pasting…) but we do have that on the HTML
pages, as the actual markup does not show up there (only in colored
text for the visual).

2. Arbitrarily marked-up texts for matching modified as needed.
They may contain heavy markup to the detriment of clarity.
I don’t think we need arbitrarily marked-up texts, nor would we have
that. We need accurately marked-up texts, that have been vetted, just
as we did for the markup that we currently have - this won’t change.

Each can then have their own contribution and review paths: texts
with the legal team, markup with the tech team.
That is not tenable - if someone suggests additional markup, the legal
team will need to review it to ensure that, for example, text that is
proposed to be marked as replaceable does not change the meaning of
the license - a cross-team approach (tech and legal) is needed.

A given was that there would be a review process for any
recommendations - perhaps I did not capture that in the meeting
minutes (or other assumptions that didn’t warrant much discussion, as
we all agreed). Will try to bear that in mind for future minutes!

Spdx-legal mailing list