license list markup (was: "meeting minutes")


J Lovejoy
 

Hi All,

Just going through some old threads.

Philippe - do the subsequent comments clear up your concerns? If not, can you explain further? It’d be great to get some follow-up thoughts.

Thanks,
Jilayne

On Aug 7, 2015, at 4:44 PM, Gary O'Neall <gary@...> wrote:

Hi Kris, Philippe and all,

The markup language for the templates was crafted in a way that retains the original text. The optional tags surround the original text and the replaceable text has an original property which retains the original text.
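
As an illustration of the point above, here is a minimal Python sketch of recovering the original text from a marked-up template. The tag shapes it assumes (<<var;name=...;original=...;match=...>> for replaceable text, <<beginOptional>> / <<endOptional>> around optional text) and all function names are assumptions made for illustration; they are not taken from this thread or from LicenseRDFAGenerator.java, and should be checked against the actual template syntax.

```python
import re

# Assumed tag shapes (illustrative only):
#   <<var;name=...;original=...;match=...>>
#   <<beginOptional;name=...>> ... <<endOptional>>
VAR_TAG = re.compile(r"<<var;(.*?)>>", re.DOTALL)
OPTIONAL_TAG = re.compile(r"<<beginOptional(?:;.*?)?>>|<<endOptional>>", re.DOTALL)


def parse_fields(body):
    """Split 'name=x;original=y;match=z' into a dict (hypothetical helper)."""
    fields = {}
    for part in body.split(";"):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def original_text(template):
    """Recover the reference text from a marked-up template."""
    # Replaceable text: substitute the value of the 'original' property.
    text = VAR_TAG.sub(
        lambda m: parse_fields(m.group(1)).get("original", ""), template
    )
    # Optional text: drop the tags, keep the text they surround.
    return OPTIONAL_TAG.sub("", text)


template = (
    "Copyright (c) <<var;name=owner;original=The Regents;match=.+>>. "
    "<<beginOptional;name=rights>>All rights reserved.<<endOptional>>"
)
print(original_text(template))
```

This sketch does not handle an original value that itself contains ";" or ">>"; a real parser would need an escaping rule for those cases.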

There is actually an open source tool (source located at https://github.com/spdx/tools/blob/master/src/org/spdx/tools/LicenseRDFAGenerator.java) which generates the web pages for spdx.org/licenses precisely in the manner Kris proposes below (barring any bugs, which are always possible ;).

Note that if you go to the website, there are RDFa tags which will allow you to pull either the original text or the template from the website directly for those who would rather parse RDFa than download the raw templates (either approach is OK, of course). These tags are described in the pdf document Accessing SPDX Licenses (https://spdx.org/sites/spdx/files/publications/SPDX-TR-2014-2%20v1%200.pdf).

There is also quite a bit of interest in making the same facility available using JSON - but I should probably take that discussion over to the tech mailing list...

Gary

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of Kris.re
Sent: Friday, August 7, 2015 11:42 AM
To: J Lovejoy; Philippe Ombredanne
Cc: SPDX-legal
Subject: RE: meeting minutes

There are two purposes at odds here and, I suspect, responsible for the
markup vs no markup debate. One is: a repository of data that can be
used to identify license names/ids from content. The other is: a
repository of data that can be used to produce content given a license
name/id.

Luckily, the people utilizing either use-case are, I should think,
doing it with different interfaces.

If you want to use SPDX data to look up the BSD 2 Clause license text,
you are not likely to go about it by cloning the repository, finding
the file, and reading the raw content.

Similarly, if you want to use the SPDX data to identify a license from
text, you are not likely to go about it by scraping the website,
processing the HTML, and doing a bunch of extra work.

The simple solution that presents itself is this:

Mark up all the data as much as needed. Generate the website from the
marked up data. Ensure that the original reference text can be produced
from the marked up text, and you're good to go (for whatever that's
worth, since the markup is, of course, there to cover *variance* in the
reference text. Deciding which version will be the "official" version
is beyond this discussion...)

This solves everyone's needs.
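
For the identification use case mentioned above, one possible approach is to compile a marked-up template into a regular expression and test candidate text against it. The Python sketch below assumes tag shapes of the form <<var;...;match=REGEX>> and <<beginOptional>> / <<endOptional>>; those shapes, the function names, and the whitespace-tolerant matching rule are assumptions for illustration, not the matching guidelines themselves.

```python
import re

# Assumed tag shapes (illustrative only).
TOKEN = re.compile(
    r"<<var;(?P<var>.*?)>>"
    r"|(?P<begin><<beginOptional(?:;.*?)?>>)"
    r"|(?P<end><<endOptional>>)",
    re.DOTALL,
)


def field(body, key):
    """Return the value of one 'key=value' field, or None (hypothetical helper)."""
    for part in body.split(";"):
        k, _, v = part.partition("=")
        if k.strip() == key:
            return v.strip().strip('"')
    return None


def literal(text):
    """Escape literal text, letting any whitespace run match any other."""
    parts = re.split(r"(\s+)", text)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in parts)


def template_to_regex(template):
    """Compile a marked-up template into a regex for matching candidates."""
    out, pos = [], 0
    for m in TOKEN.finditer(template):
        out.append(literal(template[pos:m.start()]))
        if m.group("var") is not None:
            # Replaceable text: use its 'match' property, else accept anything.
            out.append("(?:" + (field(m.group("var"), "match") or ".+") + ")")
        elif m.group("begin"):
            out.append("(?:")   # optional text opens a group...
        else:
            out.append(")?")    # ...that may be absent from the candidate
        pos = m.end()
    out.append(literal(template[pos:]))
    return re.compile("".join(out), re.DOTALL)


template = (
    "Copyright (c) <<var;name=owner;original=The Regents;match=.+>>."
    "<<beginOptional;name=rights>> All rights reserved.<<endOptional>>"
)
pattern = template_to_regex(template)
print(bool(pattern.fullmatch("Copyright (c) Example Corp.\nAll rights reserved.")))
print(bool(pattern.fullmatch("Copyright (c) Example Corp.")))
```

Because the website and the matcher would both be derived from the same marked-up source, the two use cases stay consistent with each other.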

Kris

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of J Lovejoy
Sent: Friday, August 07, 2015 11:29
To: Philippe Ombredanne <pombredanne@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Re: meeting minutes

Hi Philippe,

Comments below:


For the last two calls:

http://wiki.spdx.org/view/Legal_Team/Minutes/2015-07-23
[...]
3) Mark-up bug raised on tech team call- bug filed requesting that
the mark-up be done to facilitate automation vs. human readable. Good
goal that tech team will look to see if it can be prioritized for
next year. Gary will also talk with Jilayne about the possibility of
making mark-up changes something that others can do and then submit
as a patch [...]
http://wiki.spdx.org/view/Legal_Team/Minutes/2015-08-06
[...]
2) Kris had raised request via tech list regarding markup on licenses
and matching rules and joined to discuss some issues matching
guidelines that are programmatically difficult to implement, wanted
to be able to make suggestions global review or make small
improvements
examples: ISC License - now default license for NPM, has reference to
ISC in text (needs markup); one link broken on SPDX list and one goes
to link with slightly different text (generic v. specific to ISC)
[...]
Adding matching markup inside the reference license texts will
eventually lead to un-resolvable conflicts:

There is already markup in about 15% of the licenses, as per our
in-depth discussions a couple of years ago. The conversation on the call
was about improving some of the existing markup, adding markup that
should have been there but isn’t (in at least one case), and adding some
markup for other matching guidelines.
I think everything that was recommended by Kris and discussed on the
call was probably something discussed at the original meetings on how
to implement the markup for the matching guidelines at Collab Summit a
couple of years ago (Daniel German’s ears are ringing!) - As I explained
as part of the background: we admittedly took a conservative approach
as to what we could mark up in the initial release of this; it is always
easier to add later than to remove. My memory was that the people
directly involved with various tools (license scanners) wanted more
markup, as that eliminates having to make a determination.

- markup will make a license text no longer a reference
- it will make it less readable or unusable as such
- it may damage or transform a reference text in unwanted ways

I’m not sure what you mean here by the license text no longer being a
reference or damaging it? Can you explain or provide an example? If
we’ve missed a use-case that more markup could frustrate, we definitely
want to discuss that.

It would be simpler to separate the two cleanly:

1. Reference license texts, not modified for matching.
They may contain lightweight markup for the purpose of clarity, not
for matching.

We already have licenses with markup - do you consider what we have now
as what you describe above?
I do understand that having a “clean” license text with no markup could
be desirable (cut and pasting…) but we do have that on the HTML pages,
as the actual markup does not show up there (only as colored text for
the visual).

2. Arbitrarily marked-up texts for matching modified as needed.
They may contain heavy markup to the detriment of clarity.

I don’t think we need arbitrarily marked-up texts, nor would we have
that. We need accurately marked-up texts, that have been vetted, just
as we did for the markup that we currently have - this won’t change.

Each can then have their own contribution and review paths: texts with
the legal team, markup with the tech team.

That is not tenable - if someone suggests additional markup, the legal
team will need to review it to ensure that, for example, text that is
proposed to be marked as replaceable does not change the meaning of the
license - a cross-team approach (tech and legal) is needed.

A given was that there would be a review process for any
recommendations - perhaps I did not capture that in the meeting minutes
(or other assumptions that didn’t warrant much discussion, as we all
agreed). Will try to bear that in mind for future minutes!

Jilayne
_______________________________________________
Spdx-legal mailing list
Spdx-legal@...
https://lists.spdx.org/mailman/listinfo/spdx-legal


Philippe Ombredanne
 

On Fri, Aug 28, 2015 at 10:52 AM, J Lovejoy <opensource@...> wrote:
Hi All,
Just going through some old threads.
Philippe - do the subsequent comments clear up your concerns? If not, can you explain further? It’d be great to get some follow-up thoughts.
Thanks,
Jilayne
Jilayne:
The subsequent comments are fine indeed.
I still think that markup for detection and reference texts would be
best handled separately in the long run, yet the volume is still
rather low and Gary and you seem to be ready to handle the
coordination alright.


--
Cordially
Philippe Ombredanne
