Hi all,
Just going through some old threads.
Philippe - do the subsequent comments clear up your concerns? If not, can you explain further? It’d be great to get some follow-up thoughts.
Thanks,
Jilayne
On Aug 7, 2015, at 4:44 PM, Gary O'Neall <gary@...> wrote:
Hi Kris, Philippe and all,
The markup language for the templates was crafted in a way that retains the original text. The optional tags surround the original text and the replaceable text has an original property which retains the original text.
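To make the point about retaining the original text concrete, here is a minimal Python sketch that rebuilds the reference text from a marked-up template. It assumes a simplified, hypothetical version of the template syntax (`<<var;name="..."; original="..."; match="...">>` for replaceable text, `<<beginOptional>>`/`<<endOptional>>` around optional text); the real SPDX template grammar may differ in details such as field order and escaping, and `original_text` is an illustrative name, not part of the SPDX tools.

```python
import re

# Hypothetical, simplified template syntax (an assumption, not the exact SPDX grammar):
#   <<var;name="..."; original="..."; match="...">>   replaceable text with its original
#   <<beginOptional>> ... <<endOptional>>             optional text, kept verbatim inside
VAR_RE = re.compile(r'<<var;name="([^"]*)";\s*original="([^"]*)";\s*match="([^"]*)">>')
OPTIONAL_TAG_RE = re.compile(r"<<(?:beginOptional|endOptional)>>")


def original_text(template: str) -> str:
    """Rebuild the reference text from a marked-up template (hypothetical helper)."""
    # Each replaceable field carries its original wording, so substitute it back in.
    # A function replacement avoids backslash escapes being interpreted in the original.
    text = VAR_RE.sub(lambda m: m.group(2), template)
    # Optional tags only wrap original text, so dropping the tags keeps that text.
    return OPTIONAL_TAG_RE.sub("", text)


if __name__ == "__main__":
    template = (
        '<<var;name="copyright"; original="Copyright (c) <year> <owner>"; match=".+">>\n\n'
        "<<beginOptional>>All rights reserved.<<endOptional>>"
    )
    print(original_text(template))
```

Because optional tags only surround text and every replaceable field keeps its original, stripping the markup loses nothing, which is the property described above.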
There is actually an open source tool (source located at https://github.com/spdx/tools/blob/master/src/org/spdx/tools/LicenseRDFAGenerator.java) that generates the web pages for spdx.org/licenses in precisely the manner Kris proposes below (barring any bugs, which are always possible ;).
Note that the web pages include RDFa tags that let you pull either the original text or the template directly from the website, for those who would rather parse RDFa than download the raw templates (either approach is OK, of course). These tags are described in the PDF document Accessing SPDX Licenses (https://spdx.org/sites/spdx/files/publications/SPDX-TR-2014-2%20v1%200.pdf).
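As an illustration of the RDFa route, the sketch below uses only Python's standard-library `html.parser` to collect the text of elements that carry a given RDFa `property` attribute. The property names in the example (`spdx:licenseText`, `spdx:standardLicenseTemplate`) are assumptions; the names actually used on spdx.org/licenses are the ones defined in the Accessing SPDX Licenses document. The parser also assumes every non-void element is explicitly closed, and `extract_property_text` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PropertyTextExtractor(HTMLParser):
    """Collect the text of every element whose RDFa `property` equals `prop`."""

    def __init__(self, prop: str):
        super().__init__()  # convert_charrefs=True: &lt; etc. arrive decoded
        self.prop = prop
        self.depth = 0      # > 0 while inside a matching element
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth and tag == "br":
                self.current.append("\n")  # keep line breaks from the page
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("property") == self.prop:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_property_text(html: str, prop: str) -> list[str]:
    parser = PropertyTextExtractor(prop)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # Hypothetical page fragment; real pages would be fetched from spdx.org/licenses.
    page = ('<div property="spdx:licenseText"><p>Copyright (c) &lt;year&gt;</p></div>'
            '<div property="spdx:standardLicenseTemplate">template here</div>')
    print(extract_property_text(page, "spdx:licenseText"))
```

A real RDFa processor would also resolve prefixes and `about` subjects; plain attribute matching is enough only when the property strings are known in advance.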
There is also quite a bit of interest in making the same facility available in JSON, but I should probably take that discussion over to the tech mailing list...
Gary
-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of Kris.re
Sent: Friday, August 7, 2015 11:42 AM
To: J Lovejoy; Philippe Ombredanne
Cc: SPDX-legal
Subject: RE: meeting minutes
There are two purposes at odds here and, I suspect, responsible for the markup vs no markup debate. One is: a repository of data that can be used to identify license names/ids from content. The other is: a repository of data that can be used to produce content given a license name/id.
Luckily, the people pursuing each use case are, I should think, doing it through different interfaces.
If you want to use SPDX data to look up the BSD 2-Clause license text, you are not likely to go about it by cloning the repository, finding the file, and reading the raw content.
Similarly, if you want to use SPDX data to identify a license from text, you are not likely to go about it by scraping the website, processing the HTML, and doing a bunch of extra work.
The simple solution that presents itself is this:
Mark up all the data as much as needed. Generate the website from the marked up data. Ensure that the original reference text can be produced from the marked up text, and you're good to go (for whatever that's worth, since the markup is, of course, there to cover *variance* in the reference text. Deciding which version will be the "official" version is beyond this discussion...)
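Kris's proposal relies on one marked-up template serving both purposes. The identification side can be sketched in Python by turning a template into a regular expression: literal text is escaped, replaceable fields use their match pattern, optional sections become optional groups, and any run of whitespace matches any other. This assumes a simplified, hypothetical template syntax (`<<var;name="..."; original="..."; match="...">>` and `<<beginOptional>>`/`<<endOptional>>`); ignoring case is also an assumption made here, and `template_to_regex` and `matches` are illustrative names, not SPDX tool APIs.

```python
import re

# Hypothetical, simplified template tokens (an assumption, not the exact SPDX grammar).
TOKEN_RE = re.compile(
    r'<<var;name="[^"]*";\s*original="[^"]*";\s*match="([^"]*)">>'
    r"|<<beginOptional>>"
    r"|<<endOptional>>"
)


def literal(chunk: str) -> str:
    """Escape literal words and let any whitespace run match any other."""
    words = chunk.split()
    if not words:
        return r"\s*"
    return r"\s*" + r"\s+".join(re.escape(w) for w in words) + r"\s*"


def template_to_regex(template: str) -> re.Pattern:
    parts = []
    pos = 0
    for m in TOKEN_RE.finditer(template):
        parts.append(literal(template[pos:m.start()]))
        token = m.group(0)
        if token == "<<beginOptional>>":
            parts.append("(?:")                      # optional section opens...
        elif token == "<<endOptional>>":
            parts.append(")?")                       # ...and may be absent
        else:
            parts.append("(?:" + m.group(1) + ")")   # replaceable text: its match pattern
        pos = m.end()
    parts.append(literal(template[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(template: str, text: str) -> bool:
    return template_to_regex(template).fullmatch(text.strip()) is not None


if __name__ == "__main__":
    template = ('<<var;name="copyright"; original="Copyright (c) <year> <owner>"; match=".+">>\n\n'
                "Permission to use, copy, modify<<beginOptional>>, and/or distribute"
                "<<endOptional>> this software.")
    print(matches(template, "Copyright 2015 Jane Doe\n\nPermission to use, copy, modify this software."))
```

One cheap consistency check in the spirit of the proposal: the reference text recovered from a template should itself match that template's regex, so markup that breaks either use shows up immediately.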
This solves everyone's needs.
Kris
-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of J Lovejoy
Sent: Friday, August 07, 2015 11:29
To: Philippe Ombredanne <pombredanne@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Re: meeting minutes
Hi Philippe,
Comments below:
> For the last two calls:
> http://wiki.spdx.org/view/Legal_Team/Minutes/2015-07-23
> [...] 3) Mark-up bug raised on tech team call - bug filed requesting that the mark-up be done to facilitate automation vs. human readable. Good goal that tech team will look to see if it can be prioritized for next year. Gary will also talk with Jilayne about the possibility of making mark-up changes something that others can do and then submit as a patch [...]
> http://wiki.spdx.org/view/Legal_Team/Minutes/2015-08-06
> [...] 2) Kris had raised request via tech list regarding markup on licenses and matching rules and joined to discuss some issues with matching guidelines that are programmatically difficult to implement; wanted to be able to make suggestions for global review or make small improvements. Examples: ISC License - now default license for NPM, has reference to ISC in text (needs markup); one link broken on SPDX list and one goes to link with slightly different text (generic v. specific to ISC) [...]
> Adding matching markup inside the reference license texts will eventually lead to unresolvable conflicts:

There is already markup in about 15% of the licenses, as per our in-depth discussions a couple of years ago. The conversation on the call was about improving some of the existing markup, adding markup that should have been there but isn’t (in at least one case), and adding some markup for other matching guidelines. I think everything that Kris recommended and that was discussed on the call was probably already discussed at the original meetings at the Collab Summit a couple of years ago on how to implement the markup for the matching guidelines (Daniel German’s ears are ringing!). As I explained as part of the background, we admittedly took a conservative approach to what we marked up in the initial release, since it is always easier to add markup later than to remove it. My memory was that the people directly involved with various tools (license scanners) wanted more markup, as that eliminates having to make a determination.
> - markup will make a license text no longer a reference
> - it will make it less readable or unusable as such
> - it may damage or transform a reference text in unwanted ways

I’m not sure what you mean here by the license text no longer being a reference, or by damaging it. Can you explain or provide an example? If we’ve missed a use case that more markup could frustrate, we definitely want to discuss that.
> It would be simpler to separate the two cleanly:
> 1. Reference license texts, not modified for matching. They may contain lightweight markup for the purpose of clarity, not for matching.

We already have licenses with markup; do you consider what we have now to be what you describe above? I do understand that having a “clean” license text with no markup could be desirable (for cutting and pasting…), but we do have that on the HTML pages, as the actual markup does not show up there (only as colored text for the visual).

> 2. Arbitrarily marked-up texts for matching, modified as needed. They may contain heavy markup to the detriment of clarity.

I don’t think we need arbitrarily marked-up texts, nor would we have them. We need accurately marked-up texts that have been vetted, just as we did for the markup that we currently have; this won’t change.
> Each can then have their own contribution and review paths: texts with the legal team, markup with the tech team.

That is not tenable: if someone suggests additional markup, the legal team will need to review it to ensure that, for example, text that is proposed to be marked as replaceable does not change the meaning of the license. A cross-team approach (tech and legal) is needed.

A given was that there would be a review process for any recommendations; perhaps I did not capture that in the meeting minutes (or other assumptions that didn’t warrant much discussion, as we all agreed). I will try to bear that in mind for future minutes!
Jilayne
_______________________________________________
Spdx-legal mailing list
Spdx-legal@...
https://lists.spdx.org/mailman/listinfo/spdx-legal
On Fri, Aug 28, 2015 at 10:52 AM, J Lovejoy <opensource@...> wrote:
> Philippe - do the subsequent comments clear up your concerns? If not, can you explain further?

Jilayne:
The subsequent comments are fine indeed. I still think that markup for detection and reference texts would be best handled separately in the long run, but the volume is still rather low, and Gary and you seem to be ready to handle the coordination all right.

--
Cordially
Philippe Ombredanne