meeting minutes


J Lovejoy
 


Philippe Ombredanne
 

On Fri, Aug 7, 2015 at 6:39 AM, J Lovejoy <opensource@...> wrote:
For the last two calls:

http://wiki.spdx.org/view/Legal_Team/Minutes/2015-07-23
[...]
3) Mark-up bug raised on tech team call- bug filed requesting that the mark-up be
done to facilitate automation vs. human readable. Good goal that tech team will look
to see if it can be prioritized for next year. Gary will also talk with Jilayne about the
possibility of making mark-up changes something that others can do and then
submit as a patch
[...]
http://wiki.spdx.org/view/Legal_Team/Minutes/2015-08-06
[...]
2) Kris had raised a request via the tech list regarding markup on licenses and matching rules, and joined to discuss some issues:
- matching guidelines that are programmatically difficult to implement; wanted to be able to make suggestions
- global review or make small improvements
- examples: ISC License - now default license for NPM, has reference to ISC in text (needs markup); one link broken on SPDX list and one goes to link with slightly different text (generic v. specific to ISC)
[...]
Adding matching markup inside the reference license texts will
eventually lead to un-resolvable conflicts:
- markup will make a license text no longer a reference
- it will make it less readable or unusable as such
- it may damage or transform a reference text in unwanted ways

It would be simpler to separate the two cleanly:

1. Reference license texts, not modified for matching.
They may contain lightweight markup for the purpose of clarity, not
for matching.

2. Arbitrarily marked-up texts for matching modified as needed.
They may contain heavy markup to the detriment of clarity.

Each can then have their own contribution and review paths: texts with
the legal team, markup with the tech team.

--
Cordially
Philippe Ombredanne


J Lovejoy
 

Hi Philippe,

Comments below:


[...]
Adding matching markup inside the reference license texts will
eventually lead to un-resolvable conflicts:
[Jilayne] There is already markup in about 15% of the licenses, as per our in-depth discussions a couple of years ago. The conversation on the call was about improving some of the existing markup, adding markup that should have been there but isn’t (in at least one case), and adding some markup for other matching guidelines.
I think everything that was recommended by Kris and discussed on the call was probably something discussed at the original meetings on how to implement the markup for the matching guidelines at Collab Summit a couple of years ago (Daniel German’s ears are ringing!). As I explained as part of the background: we admittedly took a conservative approach as to what we could mark up in the initial release of this, since it is always easier to add later than to remove. My memory is that the people directly involved with various tools (license scanners) wanted more markup, as that eliminates having to make a determination.

- markup will make a license text no longer a reference
- it will make it less readable or unusable as such
- it may damage or transform a reference text in unwanted ways
[Jilayne] I’m not sure what you mean here by the license text no longer being a reference, or by damaging it. Can you explain or provide an example? If we’ve missed a use case that more markup could frustrate, we definitely want to discuss that.

It would be simpler to separate the two cleanly:

1. Reference license texts, not modified for matching.
They may contain lightweight markup for the purpose of clarity, not
for matching.
[Jilayne] We already have licenses with markup. Do you consider what we have now to be what you describe above?
I do understand that having a “clean” license text with no markup could be desirable (for cutting and pasting), but we do have that on the HTML pages, as the actual markup does not show up there (only as colored text for the visual).

2. Arbitrarily marked-up texts for matching modified as needed.
They may contain heavy markup to the detriment of clarity.
[Jilayne] I don’t think we need arbitrarily marked-up texts, nor would we have that. We need accurately marked-up texts that have been vetted, just as we did for the markup that we currently have; this won’t change.

Each can then have their own contribution and review paths: texts with
the legal team, markup with the tech team.
[Jilayne] That is not tenable. If someone suggests additional markup, the legal team will need to review it to ensure that, for example, text that is proposed to be marked as replaceable does not change the meaning of the license. A cross-team approach (tech and legal) is needed.

A given was that there would be a review process for any recommendations. Perhaps I did not capture that in the meeting minutes (or other assumptions that didn’t warrant much discussion, as we all agreed). I will try to bear that in mind for future minutes!

Jilayne


Kris.re <Kris.re@...>
 

There are two purposes at odds here and, I suspect, responsible for the markup vs no markup debate. One is: a repository of data that can be used to identify license names/ids from content. The other is: a repository of data that can be used to produce content given a license name/id.

Luckily, the people utilizing either use-case are, I should think, doing it with different interfaces.

If you want to use SPDX data to look up the BSD 2 Clause license text, you are not likely to go about it by cloning the repository, finding the file, and reading the raw content.

Similarly, if you want to use the SPDX data to identify a license from text, you are not likely to go about it by scraping the website, processing the HTML, and doing a bunch of extra work.

The simple solution that presents itself is this:

Mark up all the data as much as needed. Generate the website from the marked up data. Ensure that the original reference text can be produced from the marked up text, and you're good to go (for whatever that's worth, since the markup is, of course, there to cover *variance* in the reference text. Deciding which version will be the "official" version is beyond this discussion...)
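
As a rough illustration of the identification use case, the sketch below compiles a marked-up template into a regular expression and matches candidate text against it. The tag syntax (<<var;...>>, <<beginOptional>>, <<endOptional>>), the helper names and the Example-1.0 license are assumptions made for this sketch, not something quoted from the thread or from the SPDX tools.

```python
import re

# Assumed tag syntax for this sketch (hypothetical, not quoted from the thread):
#   <<var;name=NAME;original=TEXT;match=REGEX>>  replaceable text
#   <<beginOptional>> ... <<endOptional>>          text that may be absent
# Limitation: field values must not contain ';' or '>>'.
TAG = re.compile(r"<<(.*?)>>", re.S)


def _literal(text):
    """Escape literal template text, treating any whitespace run as equivalent."""
    if not text:
        return ""
    body = r"\s+".join(re.escape(word) for word in text.split())
    if not body:  # the segment was whitespace only
        return r"\s*"
    lead = r"\s*" if text[0].isspace() else ""
    trail = r"\s*" if text[-1].isspace() else ""
    return lead + body + trail


def template_to_regex(template):
    """Compile a marked-up template into a regex accepting its allowed variants."""
    parts, pos = [], 0
    for tag in TAG.finditer(template):
        parts.append(_literal(template[pos:tag.start()]))
        kind, *fields = tag.group(1).split(";")
        if kind == "var":
            attrs = dict(field.split("=", 1) for field in fields)
            parts.append("(?:" + attrs["match"] + ")")
        elif kind == "beginOptional":
            parts.append("(?:")
        elif kind == "endOptional":
            parts.append(")?")
        pos = tag.end()
    parts.append(_literal(template[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def identify(text, templates):
    """Return the id of the first template matching the whole text, else None."""
    for license_id, template in templates.items():
        if template_to_regex(template).fullmatch(text.strip()):
            return license_id
    return None


# Hypothetical license used only to exercise the sketch.
TEMPLATES = {
    "Example-1.0": (
        "Copyright (c) <<var;name=copyright;original=2015 Example Author;match=.+>>\n"
        "<<beginOptional>>All rights reserved.\n<<endOptional>>"
        "Permission to use this software is granted."
    ),
}
```

Calling identify(candidate_text, TEMPLATES) returns the license id when the text differs from the reference only in the replaceable or optional parts. A real matcher would also need the other matching guidelines (punctuation, capitalization of specific terms, and so on), which this sketch does not attempt.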

This solves everyone's needs.

Kris

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of J Lovejoy
Sent: Friday, August 07, 2015 11:29
To: Philippe Ombredanne <pombredanne@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Re: meeting minutes

[...]
_______________________________________________
Spdx-legal mailing list
Spdx-legal@...
https://lists.spdx.org/mailman/listinfo/spdx-legal


Gary O'Neall
 

Hi Kris, Philippe and all,

The markup language for the templates was crafted in a way that retains the original text. The optional tags surround the original text and the replaceable text has an original property which retains the original text.
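
To make the point about retaining the original text concrete, here is a minimal sketch of recovering the reference text from a template. It assumes a tag syntax of the form <<var;name=...;original=...;match=...>> and <<beginOptional>> / <<endOptional>>; that syntax, the function name and the sample template are assumptions for illustration and are not taken from the tool mentioned in this message.

```python
import re

# Assumed tag syntax (hypothetical): values must not contain ';' or '>>'.
VAR_TAG = re.compile(r"<<var;(.*?)>>", re.S)
OPTIONAL_TAG = re.compile(r"<<(?:beginOptional[^>]*|endOptional)>>")


def original_text(template):
    """Recover the reference text from a marked-up template.

    Optional sections are kept (only their tags are dropped) and every
    replaceable section is substituted with its 'original' property.
    """
    def use_original(tag):
        attrs = dict(field.split("=", 1) for field in tag.group(1).split(";"))
        return attrs["original"]

    return OPTIONAL_TAG.sub("", VAR_TAG.sub(use_original, template))


# Hypothetical template used only to exercise the sketch.
TEMPLATE = (
    "<<beginOptional>>The Example License\n\n<<endOptional>>"
    "Copyright (c) <<var;name=copyright;original=2015 Example Author;match=.+>>\n"
    "Permission is granted."
)

print(original_text(TEMPLATE))
```

Because the tags only wrap or annotate the original wording, stripping them is enough to get the reference text back; no second copy of the license needs to be stored.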

There is actually an open source tool (source located at https://github.com/spdx/tools/blob/master/src/org/spdx/tools/LicenseRDFAGenerator.java) which generates the web pages for spdx.org/licenses precisely in the manner Kris proposes below (barring any bugs, which are always possible ;).

Note that if you go to the website, there are RDFa tags which allow you to pull either the original text or the template directly from the website, for those who would rather parse RDFa than download the raw templates (either approach is OK, of course). These tags are described in the PDF document Accessing SPDX Licenses (https://spdx.org/sites/spdx/files/publications/SPDX-TR-2014-2%20v1%200.pdf).
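
A sketch of what pulling a value out of RDFa-tagged HTML could look like, using only the Python standard library. The property names (spdx:licenseId, spdx:licenseText, spdx:standardLicenseTemplate) and the sample page are assumptions for this example; the document referenced above is the authority on which tags the site actually publishes.

```python
from html.parser import HTMLParser


class PropertyText(HTMLParser):
    """Collect the text of the first element carrying a given RDFa property."""

    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self, prop):
        super().__init__(convert_charrefs=True)
        self.prop = prop
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            if self.depth and tag == "br":
                self.chunks.append("\n")
            return
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("property") == self.prop:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_property(html, prop):
    """Return the stripped text content of the first element with that property."""
    parser = PropertyText(prop)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical page fragment; the real pages may be structured differently.
PAGE = """
<div typeof="spdx:License">
  <code property="spdx:licenseId">Example-1.0</code>
  <div property="spdx:licenseText"><p>Copyright (c) 2015 Example Author</p>
<p>Permission is granted.</p></div>
  <div property="spdx:standardLicenseTemplate">Copyright (c) &lt;&lt;var;name=copyright;original=2015 Example Author;match=.+&gt;&gt;</div>
</div>
"""

print(extract_property(PAGE, "spdx:licenseText"))
```

A dedicated RDFa library would be more robust than this attribute lookup; the sketch only shows that the original text and the template can be read from the same page.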

There is also quite a bit of interest in making the same facility available using JSON - but I should probably take that discussion over to the tech mailing list...

Gary

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-
bounces@...] On Behalf Of Kris.re
Sent: Friday, August 7, 2015 11:42 AM
To: J Lovejoy; Philippe Ombredanne
Cc: SPDX-legal
Subject: RE: meeting minutes

[...]


Kris.re <Kris.re@...>
 

Two things:

1) The easiest and least complex way to retain the original text is to retain the original text explicitly.
2) The existence of a single tool written in a single language is much less useful than the existence of a well-known format with tools in pretty much every language. This is the biggest advantage of moving to XML (or some other standard format): it doesn't force developers to insert Java or command-line tool wrappers into whatever they are doing, so they can use native tools in their native language and are not tempted to reimplement a one-off, case-specific spec that is unique(?) to SPDX. I believe XML can serve the purpose you're currently using the SPDX markup for, in the same way you're using it, but I'm not convinced that reconstruction is the way to go, particularly as, and if, the markup gets more complex. They are, however, two separate discussions (changing to XML versus changing the approach used to model the data).
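
To illustrate both points, here is a sketch of how an XML representation might be consumed with a stock parser. The element and attribute names (license, optional, alt, name, match) are hypothetical and were chosen for this example; nothing here is an agreed SPDX schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML form of a license: the reference wording is the text
# content, and matching information lives in elements and attributes.
LICENSE_XML = """<license id="Example-1.0"><optional>The Example License

</optional>Copyright (c) <alt name="copyright" match=".+">2015 Example Author</alt>
Permission is granted.</license>"""

root = ET.fromstring(LICENSE_XML)

# The reference text is simply the document's text content with tags ignored,
# so it is retained explicitly rather than reconstructed by a custom tool.
reference_text = "".join(root.itertext())

# Matching metadata can be read without any SPDX-specific parser.
replaceable = {alt.get("name"): alt.get("match") for alt in root.iter("alt")}
optional_parts = [opt.text for opt in root.iter("optional")]

print(reference_text)
print(replaceable)
```

The same few lines have direct equivalents in most languages' standard XML libraries, which is the portability argument being made here.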

As an aside, I don't think JSON is at all a useful data format for what you're trying to do here, even though in general I much prefer JSON over XML.

Kris

-----Original Message-----
From: Gary O'Neall [mailto:gary@...]
Sent: Friday, August 07, 2015 18:45
To: Kris.re <Kris.re@...>; 'J Lovejoy' <opensource@...>; 'Philippe Ombredanne' <pombredanne@...>
Cc: 'SPDX-legal' <spdx-legal@...>
Subject: RE: meeting minutes

[...]


Gary O'Neall
 

Hi Kris,

Both good points. Responses inline below.

-----Original Message-----
From: Kris.re [mailto:Kris.re@...]
Sent: Monday, August 31, 2015 7:17 AM
To: Gary O'Neall; 'J Lovejoy'; 'Philippe Ombredanne'
Cc: 'SPDX-legal'
Subject: RE: meeting minutes

Two things:

1) The easiest and least complex way to retain the original text is to
retain the original text explicitly.
[Gary] We did discuss and consider this, but decided to go with the current approach for two reasons:
a) It is easier to maintain a single text file which generates the various formats.
b) For some of the licenses, you may find different variants of what the original text actually is on different reference websites. I don't remember the details on this, but I remember the point being brought up in the original discussion.
2) The existence of a single tool written in a single language is much
less useful than the existence of a well-known format with tools in
pretty much every language. This is the biggest advantage of moving to
XML (or some other standard format) - it doesn't force developers to
insert Java or command line tool wrappers into whatever they are doing,
allowing them to use native tools in their native language and not be
tempted to either reimplement a one-off case-specific spec that is
unique(?) to SPDX.
[Gary] I'm not suggesting that there be a single tool; in fact, supporting multiple formats on the website is a way to support multiple tools. The tool is just used to generate the website, so it could be considered an "internal tool" for website creation. Since the RDFa, JSON, and text template formats are all well documented, anyone can create tools in any language to consume them. In my document describing RDFa, I point to different tools in different languages to help in this regard.
I believe XML can serve the purpose you're currently using
the SPDX markup for in the same way you're using it, but I'm not
convinced that reconstruction is the way to go, particularly as and if
the markup gets more complex. They are, however, two separate
discussions (changing to XML versus changing the approach used to model
the data).
[Gary] Changing the input format to XML is much more involved than adding an additional output format, since we would need to change the tools that produce the website and change our internal processes, and anyone maintaining the license list would need to edit the XML, which is more involved than editing a text file.
As an aside, I don't think JSON is at all a useful data format for what
you're trying to do here, and in general I much prefer JSON over XML.
[Gary] What do you think about adding an additional XML format to the output and posting it on the website (something like http://spdx.org/licenses/licenselist.xml)? The tool that builds the website could output this as an additional format. It doesn't solve issue #1 above, but it would allow more tools to adopt it in an easier and more precise fashion. One could also argue that the RDFa should be sufficient, but I do realize that not too many tools are using RDFa and the libraries in different languages are more limited. I also agree with the limitations of JSON. That being said, there is enough interest in JSON that I don't see any reason not to add it to the list of output formats supported on the website (especially if others are contributing the code to make this happen :)
Kris
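
The idea of emitting additional output formats from a single source could look roughly like the sketch below, which serializes one license record to both JSON and XML using only the standard library. The record fields and element names are hypothetical; they are not the layout of any file published on the website.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record shape; the field names are assumptions for this sketch.
RECORD = {
    "licenseId": "Example-1.0",
    "name": "Example License 1.0",
    "text": "Copyright (c) 2015 Example Author\nPermission is granted.",
    "template": (
        "Copyright (c) <<var;name=copyright;original=2015 Example Author;match=.+>>\n"
        "Permission is granted."
    ),
}


def to_json(record):
    """Serialize the record as JSON with stable key order."""
    return json.dumps(record, indent=2, sort_keys=True)


def to_xml(record):
    """Serialize the same record as a small XML document."""
    root = ET.Element("license", {"id": record["licenseId"]})
    for field in ("name", "text", "template"):
        ET.SubElement(root, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


print(to_json(RECORD))
print(to_xml(RECORD))
```

Both serializers read from the same record, so adding a format is a matter of adding one function to the generator rather than changing how the license list itself is maintained.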

-----Original Message-----
From: Gary O'Neall [mailto:gary@...]
Sent: Friday, August 07, 2015 18:45
To: Kris.re <Kris.re@...>; 'J Lovejoy'
<opensource@...>; 'Philippe Ombredanne' <pombredanne@...>
Cc: 'SPDX-legal' <spdx-legal@...>
Subject: RE: meeting minutes

Hi Kris, Philippe and all,

The markup language for the templates was crafted in a way that retains
the original text. The optional tags surround the original text and
the replaceable text has an original property which retains the
original text.

There is actually an opensource tool (source located at
https://github.com/spdx/tools/blob/master/src/org/spdx/tools/LicenseRDF
AGenerator.java) which generates the web pages for spdx.org/licenses
precisely in the manner Kris proposes below (barring an bugs which is
always possible ;).

Note that if you go to the website, there are RDFa tags which will
allow you to pull either the original text or the template from the
website directly for those who would rather parse RDFa than download
the raw templates (either approach is OK, of course). These tags are
described in the pdf document Accessing SPDX Licenses
(https://spdx.org/sites/spdx/files/publications/SPDX-TR-2014-
2%20v1%200.pdf).

There is also quite a bit of interest to make the same facility
available using JSON - but I should probably take that discussion over
to the tech mailing list...

Gary

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-
bounces@...] On Behalf Of Kris.re
Sent: Friday, August 7, 2015 11:42 AM
To: J Lovejoy; Philippe Ombredanne
Cc: SPDX-legal
Subject: RE: meeting minutes

There are two purposes at odds here and, I suspect, responsible for
the markup vs no markup debate. One is: a repository of data that can
be used to identify license names/ids from content. The other is: a
repository of data that can be used to produce content given a
license name/id.

Luckily, the people utilizing either use-case are, I should think,
doing it with different interfaces.

If you want to use SPDX data to look up the BSD 2 Clause license
text, you are not likely to go about it by cloning the repository,
finding the file, and reading the raw content.

Similarly, if you want to use the SPDX data to identify a license
from text, you are not likely to go about it by scraping the website,
processing the HTML, and doing a bunch of extra work.

The simple solution that presents itself is this:

Mark up all the data as much as needed. Generate the website from the
marked up data. Ensure that the original reference text can be
produced from the marked up text, and you're good to go (for whatever
that's worth, since the markup is, of course, there to cover
*variance* in the reference text. Deciding which version will be the
"official" version is beyond this discussion...)
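
As a rough illustration of markup covering variance, the sketch below compiles a marked-up template into a regular expression and checks several candidate texts against it. The tag syntax is an assumption modeled on the SPDX license template format, the helper names and toy template are hypothetical, and ignoring case and whitespace differences is a simplification in the spirit of the matching guidelines rather than a full implementation of them.

```python
import re

# Assumed tags: <<var;...;match=REGEX>>, <<beginOptional;...>>, <<endOptional>>
TOKEN = re.compile(
    r"<<(var;.*?|beginOptional(?:;.*?)?|endOptional)>>", re.DOTALL
)


def literal(text):
    """Escape literal text; any whitespace run matches any whitespace run."""
    pieces = re.split(r"(\s+)", text)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in pieces)


def var_fields(tag):
    """Parse 'var;name=x;match=y' into a dict (no escaping support)."""
    fields = {}
    for part in tag.split(";")[1:]:
        key, _, value = part.partition("=")
        fields[key.strip()] = value
    return fields


def template_to_regex(template):
    """Compile a well-formed template into a case-insensitive pattern."""
    parts, pos = [], 0
    for m in TOKEN.finditer(template):
        parts.append(literal(template[pos:m.start()]))
        tag = m.group(1)
        if tag.startswith("var;"):
            # Replaceable text: accept whatever the 'match' regex accepts.
            parts.append("(?:" + var_fields(tag).get("match", ".+") + ")")
        elif tag.startswith("beginOptional"):
            parts.append("(?:")   # optional text may be present...
        else:
            parts.append(")?")    # ...or absent
        pos = m.end()
    parts.append(literal(template[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(template, candidate):
    return template_to_regex(template).fullmatch(candidate.strip()) is not None


# Toy template, invented for illustration; not a real license text.
TEMPLATE = (
    "<<beginOptional;name=title>>Toy License\n\n<<endOptional>>"
    "Copyright (c) <<var;name=holder;original=2004 Example Authors;match=.+>>\n"
    "Permission to use this toy text is granted."
)

if __name__ == "__main__":
    variant = "Copyright (c) 2015 Somebody Else\nPermission to use this toy text\nis granted."
    print(matches(TEMPLATE, variant))
```

One template accepts the text with or without the optional title and with any copyright holder, while a change to the non-marked-up wording is rejected. That is the sense in which more markup removes judgment calls from a scanner.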

This solves everyone's needs.

Kris

-----Original Message-----
From: spdx-legal-bounces@... [mailto:spdx-legal-bounces@...] On Behalf Of J Lovejoy
Sent: Friday, August 07, 2015 11:29
To: Philippe Ombredanne <pombredanne@...>
Cc: SPDX-legal <spdx-legal@...>
Subject: Re: meeting minutes

Hi Philippe,

Comments below:


For the last two calls:

http://wiki.spdx.org/view/Legal_Team/Minutes/2015-07-23
[...]
3) Mark-up bug raised on tech team call- bug filed requesting that
the mark-up be done to facilitate automation vs. human readable. Good
goal that tech team will look to see if it can be prioritized for
next year. Gary will also talk with Jilayne about the possibility
of making mark-up changes something that others can do and then
submit as a patch
[...]
http://wiki.spdx.org/view/Legal_Team/Minutes/2015-08-06
[...]
2) Kris had raised request via tech list regarding markup on licenses
and matching rules and joined to discuss some issues
matching guidelines that are programmatically difficult to implement,
wanted to be able to make suggestions
global review or make small improvements
examples: ISC License - now default license for NPM, has reference to
ISC in text (needs markup); one link broken on SPDX list and one goes
to link with slightly different text (generic v. specific to ISC)
[...]
Adding matching markup inside the reference license texts will
eventually lead to un-resolvable conflicts:
There is already markup in about 15% of the licenses, as per our
in-depth discussions a couple years ago. The conversation on the call
was about improving some of the existing markup, adding markup that
should have been there but isn’t (in at least one case), and adding
some markup for other matching guidelines.
I think everything that was recommended by Kris and discussed on the
call was probably something discussed at the original meetings on how
to implement the markup for the matching guidelines at Collab Summit a
couple years ago (Daniel German’s ears are ringing!). As I explained
as part of the background: we admittedly took a conservative approach
as to what we could mark up on the initial release of this, since it
is always easier to add later than to remove. My memory was that the
people directly involved with various tools (license scanners) wanted
more markup, as that eliminates having to make a determination.

- markup will make a license text no longer a reference
- it will make it less readable or unusable as such
- it may damage or transform a reference text in unwanted ways
I’m not sure what you mean here by the license text no longer being a
reference or damaging it? Can you explain or provide an example? If
we’ve missed a use-case that more markup could frustrate, we
definitely want to discuss that.

It would be simpler to separate the two cleanly:

1. Reference license texts, not modified for matching.
They may contain lightweight markup for the purpose of clarity, not
for matching.
We already have licenses with markup - do you consider what we have
now as what you describe above?
I do understand that having a “clean” license text with no markup
could be desirable (cut and pasting…) but we do have that on the HTML
pages, as the actual markup does not show up there (only as colored
text for the visual).

2. Arbitrarily marked-up texts for matching modified as needed.
They may contain heavy markup to the detriment of clarity.
I don’t think we need arbitrarily marked-up texts, nor would we have
that. We need accurately marked-up texts that have been vetted, just
as we did for the markup that we currently have - this won’t change.

Each can then have their own contribution and review paths: texts with
the legal team, markup with the tech team.
That is not tenable - if someone suggests additional markup, the legal
team will need to review it to ensure that, for example, text that is
proposed to be marked as replaceable does not change the meaning of
the license - a cross-team approach (tech and legal) is needed.

A given was that there would be a review process for any
recommendations - perhaps I did not capture that in the meeting
minutes (or other assumptions that didn’t warrant much discussion, as
we all agreed). Will try to bear that in mind for future minutes!

Jilayne
_______________________________________________
Spdx-legal mailing list
Spdx-legal@...
https://lists.spdx.org/mailman/listinfo/spdx-legal