Two things:
1) The easiest and least complex way to retain the original text is to retain the original text explicitly.
2) The existence of a single tool written in a single language is much less useful than the existence of a well-known format with tools in pretty much every language. This is the biggest advantage of moving to XML (or some other standard format): it doesn't force developers to insert Java or command line tool wrappers into whatever they are doing. They can use native tools in their native language, and are not tempted to reimplement a one-off, case-specific spec that is unique(?) to SPDX.
I believe XML can serve the purpose you're currently using the SPDX markup for, in the same way you're using it, but I'm not convinced that reconstruction is the way to go, particularly if and as the markup gets more complex. They are, however, two separate discussions (changing to XML versus changing the approach used to model the data).
As an aside, I don't think JSON is at all a useful data format for what you're trying to do here, even though in general I much prefer JSON over XML.
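To illustrate the point about native tooling, here is a minimal sketch of reading a license from an XML file with nothing but Python's standard library. The XML vocabulary is an assumption made up for this example (the element names license, optional and alt, and the attributes id, name and match, are hypothetical and not an SPDX format). The only thing it is meant to show is that a stock parser is enough when the original text is kept explicitly as element content.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a license template. The element and
# attribute names are illustrative assumptions, not an SPDX schema.
SAMPLE = (
    '<license id="Example-1.0">Copyright (c) '
    '<alt name="holder" match=".+">YEAR OWNER</alt>\n'
    "<optional>All rights reserved.\n</optional>"
    "Permission is granted to use this software.</license>"
)


def original_text(elem):
    """Return the reference text: all text nodes in document order, markup dropped."""
    return "".join(elem.itertext())


root = ET.fromstring(SAMPLE)
license_id = root.get("id")
text = original_text(root)
print(license_id)
print(text)
```

Any language with an XML parser could do the same in a few lines, without wrapping a Java or command line tool.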
Kris
-----Original Message-----
From: Gary O'Neall [mailto:gary@...]
Sent: Friday, August 07, 2015 18:45
To: Kris.re <Kris.re@...>; 'J Lovejoy' <opensource@...>; 'Philippe Ombredanne' <pombredanne@...>
Cc: 'SPDX-legal' <spdx-legal@...>
Subject: RE: meeting minutes
Hi Kris, Philippe and all,
The markup language for the templates was crafted in a way that retains the original text. The optional tags surround the original text, and the replaceable text has an original property which retains the original text.
There is actually an open source tool (source located at https://github.com/spdx/tools/blob/master/src/org/spdx/tools/LicenseRDFAGenerator.java) which generates the web pages for spdx.org/licenses precisely in the manner Kris proposes below (barring any bugs, which is always possible ;).
Note that if you go to the website, there are RDFa tags which will allow you to pull either the original text or the template from the website directly, for those who would rather parse RDFa than download the raw templates (either approach is OK, of course). These tags are described in the PDF document Accessing SPDX Licenses (https://spdx.org/sites/spdx/files/publications/SPDX-TR-2014-2%20v1%200.pdf).
There is also quite a bit of interest in making the same facility available using JSON - but I should probably take that discussion over to the tech mailing list...
Gary
-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of Kris.re
Sent: Friday, August 7, 2015 11:42 AM
To: J Lovejoy; Philippe Ombredanne
Cc: SPDX-legal
Subject: RE: meeting minutes
There are two purposes at odds here and, I suspect, responsible for the markup vs no markup debate. One is: a repository of data that can be used to identify license names/ids from content. The other is: a repository of data that can be used to produce content given a license name/id.
Luckily, the people utilizing either use-case are, I should think, doing it with different interfaces.
If you want to use SPDX data to look up the BSD 2 Clause license text, you are not likely to go about it by cloning the repository, finding the file, and reading the raw content.
Similarly, if you want to use the SPDX data to identify a license from text, you are not likely going to go about it by scraping the website, processing the html, and doing a bunch of extra work.
The simple solution that presents itself is this:
Mark up all the data as much as needed. Generate the website from the marked up data. Ensure that the original reference text can be produced from the marked up text, and you're good to go (for whatever that's worth, since the markup is, of course, there to cover *variance* in the reference text. Deciding which version will be the "official" version is beyond this discussion...)
This solves everyone's needs.
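The requirement that the original reference text can be produced from the marked-up text can be sketched roughly as follows. The tag shapes used here (a var tag carrying name, original and match fields, and beginOptional/endOptional tags wrapping optional text) are a simplified form assumed for illustration. The sketch does not handle field values that contain ";" or ">>", and it is not the implementation used by the SPDX tools.

```python
import re

# Simplified template markup, assumed for this sketch:
#   <<var;name=...;original=...;match=...>>          replaceable text
#   <<beginOptional;name=...>> ... <<endOptional>>   optional text
VAR = re.compile(r"<<var;(.*?)>>", re.DOTALL)
OPTIONAL_TAG = re.compile(r"<<beginOptional(?:;[^>]*)?>>|<<endOptional>>")


def parse_fields(body):
    """Split 'name=a;original=b;match=c' into a dict (no escaping support)."""
    fields = {}
    for part in body.split(";"):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def to_original(template):
    """Rebuild the reference text from a marked-up template."""
    # Each replaceable span carries its own original text, so substitute it back.
    text = VAR.sub(lambda m: parse_fields(m.group(1)).get("original", ""), template)
    # Optional tags only wrap original text, so dropping the tags is enough.
    return OPTIONAL_TAG.sub("", text)


TEMPLATE = (
    "<<var;name=copyright;original=Copyright (c) YEAR OWNER;match=.+>>\n"
    "<<beginOptional;name=rights>>All rights reserved.\n<<endOptional>>"
    "Permission is granted to use this software."
)

reference = to_original(TEMPLATE)
print(reference)
```

A website generator could call something like to_original for the human-readable page, while a scanner would read the same template and use the match fields instead.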
Kris
-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of J Lovejoy
Sent: Friday, August 07, 2015 11:29
To: Philippe Ombredanne <pombredanne@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Re: meeting minutes
Hi Philippe,
Comments below:
For the last two calls:
http://wiki.spdx.org/view/Legal_Team/Minutes/2015-07-23
[...]
3) Mark-up bug raised on tech team call- bug filed requesting that the mark-up be done to facilitate automation vs. human readable. Good goal that tech team will look to see if it can be prioritized for next year. Gary will also talk with Jilayne about the possibility of making mark-up changes something that others can do and then submit as a patch
[...]
http://wiki.spdx.org/view/Legal_Team/Minutes/2015-08-06
[...]
2) Kris had raised request via tech list regarding markup on licenses and matching rules and joined to discuss some issues matching guidelines that are programmatically difficult to implement, wanted to be able to make suggestions global review or make small improvements examples: ISC License - now default license for NPM, has reference to ISC in text (needs markup); one link broken on SPDX list and one goes to link with slightly different text (generic v. specific to ISC)
[...]
Adding matching markup inside the reference license texts will eventually lead to un-resolvable conflicts:
There is already markup in about 15% of the licenses, as per our in-depth discussions a couple years ago. The conversation on the call was about improving some of the existing markup, adding markup that should have been there but isn’t (in at least one case), and adding some markup for other matching guidelines. I think everything that was recommended by Kris and discussed on the call was probably something discussed at the original meetings on how to implement the markup for the matching guidelines at Collab Summit a couple years ago (Daniel German’s ears are ringing!) - As I explained as part of the background: we admittedly took a conservative approach as to what we could mark up on the initial release of this, since it is always easier to add later than to remove. My memory was that the people directly involved with various tools (license scanners) wanted more markup, as that eliminates having to make a determination.
- markup will make a license text no longer a reference
- it will make it less readable or unusable as such
- it may damage or transform a reference text in unwanted ways
I’m not sure what you mean here by the license text no longer being a reference or damaging it? Can you explain or provide an example? If we’ve missed a use-case that more markup could frustrate, we definitely want to discuss that.
It would be simpler to separate the two cleanly:
1. Reference license texts, not modified for matching. They may contain lightweight markup for the purpose of clarity, not for matching.
We already have licenses with markup - do you consider what we have now as what you describe above? I do understand that having a “clean” license text with no markup could be desirable (cut and pasting…) but we do have that on the HTML pages, as the actual markup does not show up there (only in colored text for the visual).
2. Arbitrarily marked-up texts for matching modified as needed. They may contain heavy markup to the detriment of clarity.
I don’t think we need arbitrarily marked-up texts, nor would we have that. We need accurately marked-up texts that have been vetted, just as we did for the markup that we currently have - this won’t change.
Each can then have their own contribution and review paths: texts with the legal team, markup with the tech team.
That is not tenable - if someone suggests additional markup, the legal team will need to review it to ensure that, for example, text that is proposed to be marked as replaceable does not change the meaning of the license - a cross-team approach (tech and legal) is needed.
A given was that there would be a review process for any recommendations - perhaps I did not capture that in the meeting minutes (or other assumptions that didn’t warrant much discussion, as we all agreed). Will try to bear that in mind for future minutes!
Jilayne
_______________________________________________
Spdx-legal mailing list
Spdx-legal@...
https://lists.spdx.org/mailman/listinfo/spdx-legal