Licensing data for object files


Peter Williams <peter.williams@...>
 

Consider the situation where a supplier wants to provide a library as
an object file plus headers, and a purchaser would like to get package
and license information for that library in SPDX format. Or the
situation where a developer downloads a binary distribution of an open
source project and would like to get package and license information
for that library in SPDX format for the corporate/project governance
processes.

Clearly there are not going to be copyright notices/license
declarations in most object files. However, object files have
significant licensing implications for their users. (For example, if
the source from which it was built was GPL -- either intentionally,
or because it included some other GPL code.)

Should the SPDX file for a "binary" package show the license
information of an object file as the union of the license information
for all the source files used to create the object file? And do we
expect that the package license information for a "binary" package
would be the same as the license information for the "source" package
from which the binary is built?

Peter
openlogic.com
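
To make the "union" reading of the question above concrete, here is a
minimal Python sketch. It assumes the license set of every source file
is already known; the function name, the file paths, and the license
labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set


def object_file_licenses(sources: Iterable[str],
                         license_of: Dict[str, Set[str]]) -> Set[str]:
    """Union of the license sets of every source file in the build."""
    result: Set[str] = set()
    for path in sources:
        # A source file with no recorded license is flagged, not ignored.
        result |= license_of.get(path, {"UNKNOWN"})
    return result


# Hypothetical build inputs for a single object file.
license_of = {
    "src/parser.c": {"BSD"},
    "src/vendor/hash.c": {"GPL"},
    "include/parser.h": {"BSD"},
}

print(sorted(object_file_licenses(license_of.keys(), license_of)))
```

The union only says which licenses are present somewhere in the inputs;
it does not capture a deliberate choice among alternatives, which is the
point raised in the reply that follows.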


Kate Stewart <kate.stewart@...>
 

On Thu, 2011-01-06 at 09:25 -0700, Peter Williams wrote:
Consider the situation where a supplier wants to provide a library as
an object file plus headers, and a purchaser would like to get package
and license information for that library in SPDX format. Or the
situation where a developer downloads a binary distribution of an open
source project and would like to get package and license information
for that library in SPDX format for the corporate/project governance
processes.

Clearly there are not going to be copyright notices/license
declarations in most object files. However, object files have
significant licensing implications for their users. (For example, if
the source from which it was built was GPL -- either intentionally,
or because it included some other GPL code.)
True. There are some cases, though, where a scan of the binary reveals
the copyright/license in the comments (made so explicitly by the build
system); "od filename | grep" is your friend here.

Should the SPDX file for a "binary" package show the license
information of an object file as the union of the license information
for all the source files used to create the object file?
Not necessarily - since sometimes the binary is built with the selection
of a specific license in mind.

Since a binary is one file, the use of the inferred license field we've
been talking about might be appropriate here as well.

And do we
expect that the package license information for a "binary" package
would be the same as the license information for the "source" package
from which the binary is built?
Likely to be the common case I would guess, but since the source may be
"or" license based, and someone selects one license explicitly - if we
have a way to discover it, we should reflect it. The example I'm
thinking of is a library dual-licensed under GPL and BSD. If statically
bound, it makes quite a difference whether the library has been built
with the intention of being BSD instead of GPL.


Peter
openlogic.com
_______________________________________________
Spdx-tech mailing list
Spdx-tech@...
https://fossbazaar.org/mailman/listinfo/spdx-tech
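
Kate's "od filename | grep" suggestion above amounts to pulling
printable text out of the binary and searching it for notices. A minimal
Python sketch of that idea follows. The keyword list, the minimum run
length, and the byte string standing in for a real object file are all
assumptions made for illustration; a hit is only a hint, not a license
determination.

```python
import re
from typing import List

# Hypothetical keyword list; a real scan would need a much richer one.
KEYWORDS = re.compile(r"copyright|licen[sc]e|GPL|BSD", re.IGNORECASE)


def printable_runs(data: bytes, min_len: int = 8) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def license_hints(data: bytes) -> List[str]:
    """Keep only the printable runs that mention a license keyword."""
    return [text for text in printable_runs(data) if KEYWORDS.search(text)]


# Synthetic stand-in for the contents of an object file.
blob = (
    b"\x7fELF\x00\x01\x02"
    b"Copyright (C) 2010 Example Corp\x00"
    b"\x00\xff\x10"
    b"Licensed under the BSD license\x00"
    b"main\x00"
)

for hint in license_hints(blob):
    print(hint)
```

For a real file, the bytes would come from open(path, "rb").read()
instead of the synthetic blob.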


Peter Williams <peter.williams@...>
 

I am starting to think that the interaction of licenses is too
intricate for us to support using multiple fields. It seems like you
could easily end up with conjunctions and disjunctions between the
licenses in the detected, declared and inferred licensing information.

Perhaps we should collapse those three fields into a single field.
Doing so would allow the complete licensing information for a
file/package to be represented in a clear, precise and explicit way.

Is distinguishing the mechanism by which a license was discovered
required for SPDX version 1? If so, we could provide a way to describe
how licenses were discovered that is separate from the complete
license information. If not, we could save ourselves the work and delay
that feature until version 2.

Peter
openlogic.com



Gary O'Neall
 

I think this is a rather common case and a good one to discuss.

Here are my thoughts:

Scenario A: If the library is provided independently, then I would expect
the SPDX file to contain only one file (the file for the library) with the
declared license for that library.

The declared license would be the license stated by the originator of the
library (whoever built the library). If there is no stated license, then
the inferred license would be appropriate.

One issue with this approach is how do we communicate where the information
for the declared license came from? If it is embedded in the binary file,
it would be straightforward. If it came from a website URL, this may be
harder to independently verify.

Scenario B: If the library is included in a larger package described by an
SPDX file, then this library would be an embedded package. I would think
the treatment of this file would be the same as an embedded archive file of
the source. There would be one file and the license would be the stated or
inferred license of that open source package.

For both scenarios, it would be nice if a reference could be made to an SPDX
file for the origin open source package for the library or archive file.

In terms of the linkage implications, for Scenario A I think the SPDX
description should only state the license of the file itself since the
actual linkage and usage of the library would not be known. For Scenario B
any implications of the linkage should be considered for the overall package
license if that package links in any copyleft code.

I would consider the description of any analysis of the binary file to be
outside the scope of the SPDX standard. If there was an analysis of the
source code which produced the binary, that source code analysis could be
represented as its own SPDX file which would include references to the
source files used to build the binaries.

Just a note - I've been working on some binary analysis tools (decompilers
and such). If we do want to extend SPDX to include analysis results of
binaries, we would want to add additional details such as discovered class
and package names for Java. A good topic for after 1.0.

Gary



Gary O'Neall
 

I completely agree that trying to interpret and represent the interaction
of licenses should be outside the scope of SPDX 1.0. There are two
reasons for this:
1. It can be rather complex to represent the complete analysis since the
interaction may depend on the usage of a specific component and we have
already decided (early on) not to include usage information in SPDX 1.0.
2. The license interaction analysis may require legal interpretation of
some of the license clauses.

It still may make sense, however, to have a way of representing when there
is an explicit license declaration in a file. We could go back to only
capturing explicit licenses and not including any inferred licenses, but
we would lose some analysis information which would be very nice to
convey.

BTW - when I think about the detected and inferred licenses, I am
thinking less about license interactions and more about capturing a clear
intent by the author to license a file under a particular license. For
example, an author (or copyright owner) may have a file in a directory
stating that all files in this directory are under a particular license,
and the files within that directory have the same copyright and no
conflicting information.


Gary



Peter Williams <peter.williams@...>
 

On Thu, Jan 6, 2011 at 3:20 PM, <gary@...> wrote:
I completely agree that trying to interpret and represent the interaction
of licenses should be outside the scope of SPDX 1.0.  There are two
reasons for this:
1. It can be rather complex to represent the complete analysis since the
interaction may depend on the usage of a specific component and we have
already decided (early on) not to include usage information in SPDX 1.0
2. The license interaction analysis may require legal interpretation of
some of the license clauses
Perhaps we are thinking of different kinds of "interaction"? I meant
"interaction" along the lines of "the package is declared as licensed
under a or b but contains files licensed under c so the actual package
licensing is (a and c) or (b and c)". Interaction is probably not a good
word for this, but I don't have a better one. I think this should be
handled by SPDX.

Interaction in the sense of "license a is incompatible with license b"
is definitely outside the scope of SPDX for exactly the reasons you
point out.

It still may make sense, however, to have a way of representing when there
is an explicit license declaration in a file. We could go back to only
capturing explicit licenses and not including any inferred licenses, but
we would lose some analysis information which would be very nice to
convey.
I think excluding non-explicitly declared licenses from SPDX would
remove almost all of the value of this effort.

BTW - when I think about the detected and inferred licenses, I am
thinking less about license interactions and more about capturing a clear
intent by the author to license a file under a particular license. For
example, an author (or copyright owner) may have a file in a directory
stating that all files in this directory are under a particular license,
and the files within that directory have the same copyright and no
conflicting information.
I am trying to work out the use case for segregating the declared,
detected and inferred licenses. A consumer of SPDX seems most likely
to desire a complete understanding of the choice -- or lack thereof --
of licenses for a package. If that is the case, perhaps we can simplify
the model a bit by having a single 'licensing' property.

Peter
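
The "declared under a or b, but contains files under c" case Peter
describes above is a distribution of "and" over "or". A minimal Python
sketch, assuming a license expression is held as a set of alternatives,
each alternative being a set of licenses that all apply. The helper
names (lic, any_of, all_of, render) are hypothetical and not part of any
SPDX draft.

```python
from itertools import product
from typing import FrozenSet, Set

# "Or of ands" form: the outer set lists the alternatives, each inner
# frozenset lists the licenses that all apply within that alternative.
Expr = Set[FrozenSet[str]]


def lic(name: str) -> Expr:
    """A single license as a one-alternative expression."""
    return {frozenset([name])}


def any_of(*exprs: Expr) -> Expr:
    """Disjunction: collect the alternatives of every operand."""
    out: Expr = set()
    for expr in exprs:
        out |= expr
    return out


def all_of(*exprs: Expr) -> Expr:
    """Conjunction: pair every alternative with every other operand's."""
    out: Expr = {frozenset()}
    for expr in exprs:
        out = {left | right for left, right in product(out, expr)}
    return out


def render(expr: Expr) -> str:
    parts = sorted("(" + " and ".join(sorted(alt)) + ")" for alt in expr)
    return " or ".join(parts)


declared = any_of(lic("a"), lic("b"))   # package says "a or b"
effective = all_of(declared, lic("c"))  # but some files are under c
print(render(effective))  # (a and c) or (b and c)
```

This only rewrites the expression; whether a, b, and c are compatible
with each other is the separate question that both Peter and Gary place
outside the scope of SPDX.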


Gary O'Neall
 

We are probably thinking a little differently on interaction. The use cases
I have in mind have to do with representing technical analysis on the
interaction between components that affect the license terms.

For the interaction you describe, I believe describing the licensing at the
package level based on the licenses present in the files will require a more
complex interaction analysis. The obligations of the license at the package
level will be affected by the usage of that file within the package. For
example, if a file contains an LGPL license it will cause the package to be
licensed under LGPL if the LGPL file is not a distinct library within the
package.

One requirement I think we should have on SPDX is that any logic or
judgment used by the auditor creating the SPDX document should be called
out as a judgment (distinct from a fact). This would allow for independent
verification of such judgments. If we were to apply this requirement to the
licensing at package level (acknowledging that we have not discussed or
agreed to such a requirement), I think we would need to either include the
"inferred" information or not include the interaction-based licensing
information, since the interaction analysis requires judgment.

Let me know if this makes sense and if you agree with the requirement above.

Gary



Peter Williams <peter.williams@...>
 

On Fri, Jan 7, 2011 at 11:45 AM, Gary O'Neall <gary@...> wrote:
We are probably thinking a little differently on interaction.  The use cases
I have in mind have to do with representing technical analysis on the
interaction between components that affect the license terms.
I usually think of SPDX as a way to express the conclusions of a
technical analysis. Actually representing the analysis itself sounds
hard. :)

For the interaction you describe, I believe describing the licensing at the
package level based on the licenses present in the files will require a more
complex interaction analysis. The obligations of the license at the package
level will be affected by the usage of that file within the package.  For
example, if a file contains an LGPL license it will cause the package to be
licensed under LGPL if the LGPL file is not a distinct library within the
package.
The analysis will be complex. The conclusion will be much simpler.
Either the LGPL code is a distinct library used by the package or not.

One requirement I think we should have on SPDX is any logic or judgment used
by the auditor creating the SPDX document should be called out as a judgment
(distinct from a fact).  This would allow for independent verification of
such judgments.  If we were to apply this requirement to the licensing at
package level (acknowledging that we have not discussed or agreed to such a
requirement), I think we would need to either include the "inferred"
information or not include the interaction based licensing information since
the interaction analysis requires judgment.
I don't think non-facts have a place in SPDX files. However, facts
can be established through the use of logic. If a file contains 100
contiguous lines that are identical to 100 contiguous lines in the
Linux kernel and those lines were committed to the kernel before they
were committed to the project of the package being analyzed, then it is
a *fact* that the file must be licensed under GPLv2. This is true
regardless of the licensing statements of the file, directory, or
project. It is important that SPDX be able to represent this sort of
fact. It is also important that SPDX be able to represent all the
licenses provided by the package regardless of how the fact of those
licenses was determined.

It would be nice to be able to represent the evidence for each license
in an SPDX file. A model to record such evidence seems like it would
be non-trivial to develop. It also feels like an evidentiary model
could be implemented as an extension to a conclusions-only SPDX.
(Either as a separate standard or in a future version of SPDX.) I'd
rather an evidentiary model be out of scope for v1 because SPDX can
provide a lot of value without it and I think it will be
straightforward to add it later.

Peter
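
The "100 contiguous identical lines" test Peter describes above can be
approximated with Python's standard difflib module. This is a sketch
only: it measures the longest run of lines shared by two files and does
not check which project committed those lines first, which Peter's
argument also requires. The threshold value comes from his example; the
function name and the synthetic inputs are assumptions.

```python
from difflib import SequenceMatcher
from typing import List

THRESHOLD = 100  # contiguous lines, as in the example above


def longest_shared_run(candidate: List[str], reference: List[str]) -> int:
    """Length of the longest block of consecutive lines found in both."""
    matcher = SequenceMatcher(None, candidate, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(candidate), 0, len(reference))
    return match.size


# Synthetic stand-ins for a reference file and a file under analysis.
reference = [f"ref line {i}" for i in range(300)]
candidate = ["own header"] + reference[50:170] + ["own footer"]

run = longest_shared_run(candidate, reference)
print(run, run >= THRESHOLD)
```

Passing autojunk=False keeps difflib from discarding frequently repeated
lines on large inputs, which would otherwise shorten the detected run.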


Kate Stewart <kate.stewart@...>
 

On Fri, 2011-01-07 at 13:35 -0700, Peter Williams wrote:
On Fri, Jan 7, 2011 at 11:45 AM, Gary O'Neall <gary@...> wrote:
We are probably thinking a little differently on interaction. The use cases
I have in mind have to do with representing technical analysis on the
interaction between components that affect the license terms.
I usually think of spdx as a way to express the conclusions of a
technical analysis. Actually representing the analysis itself sounds
hard. :)
esp. for version 1. :)



One requirement I think we should have on SPDX is any logic or judgment used
by the auditor creating the SPDX document should be called out as a judgment
(distinct from a fact). This would allow for independent verification of
such judgments. If we were to apply this requirement to the licensing at
package level (acknowledging that we have not discussed or agreed to such a
requirement), I think we would need to either include the "inferred"
information or not include the interaction based licensing information since
the interaction analysis requires judgment.
I don't think non-facts have a place in SPDX files. However, facts
can be established through the use of logic. If a file contains 100
contiguous lines that are identical to 100 contiguous lines in the
Linux kernel and those lines were committed to the kernel before they
were committed to the project of the package being analyzed, then it is
a *fact* that the file must be licensed under GPLv2. This is true
regardless of the licensing statements of the file, directory, or
project. It is important that SPDX be able to represent this sort of
fact. It is also important that SPDX be able to represent all the
licenses provided by the package regardless of how the fact of those
licenses was determined.
Inferred field proposal at file level should cover this, I think.
The question in my mind is how many "reasons" for the inference do we
want to accept in version 1? It takes a sophisticated tool and knowledge
base ;) to generally detect that code in another place matches code in a
specific package. Not everyone has this capability. However, if the
capability is there, marking it as an "inferred" license, based on code
matching, seems a good way of capturing the fact that there is evidence
that another license is in play, other than what may be declared.


It would be nice to be able to represent the evidence for each license
in an SPDX file. A model to record such evidence seems like it would
be non-trivial to develop. It also feels like an evidentiary model
could be implemented as an extension to a conclusions-only SPDX.
(Either as a separate standard or in a future version of SPDX.) I'd
rather an evidentiary model be out of scope for v1 because SPDX can
provide a lot of value without it and I think it will be
straightforward to add it later.
Agree. I think for version 1, a set of known reasons on the inferred
license field is probably the most we should aim for, for right now.

Kate
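
One possible shape for "a set of known reasons on the inferred license
field", sketched in Python. The reason names, the field names, and the
license labels are hypothetical illustrations and are not taken from any
SPDX draft.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InferenceReason(Enum):
    """Hypothetical closed list of accepted reasons for an inference."""
    DIRECTORY_NOTICE = "notice file covering the containing directory"
    PACKAGE_DECLARATION = "license declared for the enclosing package"
    CODE_MATCH = "code matches code found in another package"


@dataclass(frozen=True)
class FileLicenseInfo:
    path: str
    declared: Optional[str] = None   # stated in the file itself
    inferred: Optional[str] = None   # concluded by the auditor
    inferred_reason: Optional[InferenceReason] = None

    def __post_init__(self) -> None:
        # An inference without a reason (or the reverse) is rejected,
        # so every judgment is flagged as a judgment.
        if (self.inferred is None) != (self.inferred_reason is None):
            raise ValueError("inferred license and reason go together")


entry = FileLicenseInfo(
    path="lib/util.c",
    declared="BSD",
    inferred="GPL-2.0",
    inferred_reason=InferenceReason.CODE_MATCH,
)
print(entry.inferred, "-", entry.inferred_reason.value)
```

Keeping the reason as an enumeration rather than free text marks where
judgment was applied without attempting to record the full evidence,
which the thread defers to a later version.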


Gary O'Neall
 

Agree with both of your comments in general - we don't want to try to build
a model to include all of the logic. I do, however, think it is important
to capture where assumptions or judgment are being applied (even if we don't
capture the logic behind the judgment). Having the inferred license type
would be sufficient for this.

Gary



Peter Williams <peter.williams@...>
 

Can you give some examples of the "assumptions or judgements" you have in mind?

On Fri, Jan 7, 2011 at 2:25 PM, Gary O'Neall <gary@...> wrote:
Agree with both of your comments in general - we don't want to try to build
a model to include all of the logic.  I do, however, think it is important
to capture where assumptions or judgment is being applied (even if we don't
capture the logic behind the judgment).  Having the inferred license type
would be sufficient for this.

Gary


Peter Williams <peter.williams@...>
 

On Fri, Jan 7, 2011 at 2:18 PM, Kate Stewart <kate.stewart@...> wrote:
One requirement I think we should have on SPDX is any logic or judgment used
by the auditor creating the SPDX document should be called out as a judgment
(distinct from a fact).  This would allow for independent verification of
such judgments.  If we were to apply this requirement to the licensing at
package level (acknowledging that we have not discussed or agreed to such a
requirement), I think we would need to either include the "inferred"
information or not include the interaction based licensing information since
the interaction analysis requires judgment.

I don't think non-facts have a place in spdx files.  However, facts
can be established through the use of logic.  If a file contains 100
contiguous lines that are identical to 100 contiguous lines in the
linux kernel and those lines were committed to the kernel before they
were committed to the project of the package being analyzed then it is a
*fact* that the file must be licensed under gpl2.  This is true
regardless of the licensing statements of the file, directory, or
project.  It is important that spdx be able to represent this sort of
fact.  It is also important that spdx be able to represent all the
licenses provided by the package regardless of how the fact of those
licenses was determined.

Inferred field proposal at file level should cover this I think.
Question in my mind is how many "reasons" for the inference do we want
to accept in version 1?   It takes a sophisticated tool and knowledge
base ;)  to generally detect that code in another place matches code in
a specific package.   Not everyone has this capability.  However if the
capability is there - marking it as an "inferred" license, based on code
matching, seems a good way of capturing the fact that there is evidence
that another license is in play, other than what may be declared.

I vote for exactly zero reasons. A partial model is probably worse
than no model at all. Any model that is based on a static list will be
obsolete from the start and will have to be replaced later, probably
by a largely incompatible model.

To really support this we would need a highly extensible model to
describe the techniques used. Openlogic and Blackduck have
developed some interchange formats that might be somewhat applicable
to this problem space. However, they are not directly applicable and
it turns out to be sort of complicated. To make matters worse, code
forensics techniques tend to be treated as proprietary information, so
getting vendors to actually populate such fields would be a struggle
in practice.

Peter
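
The code-matching argument quoted in this message turns on finding a long run of identical contiguous lines in two files. A minimal sketch of just that step, using Python's standard `difflib`, might look like the following. The function name `longest_shared_run` and the sample lines are made up for illustration, and real code-forensics tools are assumed to do much more (normalization, a large knowledge base, commit-date comparison).

```python
from difflib import SequenceMatcher


def longest_shared_run(lines_a, lines_b):
    """Return (start_in_a, start_in_b, length) of the longest run of
    identical contiguous lines shared by the two files."""
    # autojunk=False stops difflib from discarding very common lines.
    matcher = SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    match = matcher.find_longest_match(0, len(lines_a), 0, len(lines_b))
    return match.a, match.b, match.size


# Toy stand-ins for a kernel file and a file in the package being analyzed.
kernel_file = ["int a;", "int b;", "a = b + 1;", "return a;"]
package_file = ["/* no license header */", "int b;", "a = b + 1;", "return a;", "}"]

start_a, start_b, length = longest_shared_run(kernel_file, package_file)
print(start_a, start_b, length)
```

A match length alone establishes only that the lines are shared; which project had them first still has to come from commit history, as the quoted example notes.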


Peter Williams <peter.williams@...>
 

On Fri, Jan 7, 2011 at 6:18 PM, <gary@...> wrote:
Can you give some examples of the "assumptions or judgements" you have in
mind?
In the context of the license, I can think of 2 examples:

File containing copyleft code snippet:
  - Fact1: File A contains several lines of code
  - Fact2: A subset of the code in file A matches code in file B
  - Fact3: File A does not contain any license statement
  - Fact4: File B contains an explicit LGPL license statement
  - Fact5: File B does not exist in the package described by the SPDX
    document.
In this case, I would not call it a fact that file A is under the LGPL
license, but rather a judgement call by the person who identified those
lines as being copied.  Note - this example assumes that the author of
File A is not the person making the judgement.

Perhaps we have primarily a terminology problem. The above is
certainly a judgment in the sense that it is reaching a conclusion
based on some evidence. However, it is not an assumption, guess, or
legal opinion.

I think we have to reach a consensus on what the goal is. If spdx is
primarily a way to report the license declarations in a package (and
its files) then we can get away from judgments. Judgments are going to
be a way of life if the goal is to communicate the licenses under
which a package may be used, to the best knowledge of the spdx file
producer. I think the latter approach is what spdx should be
targeting.

Consider the following:
- fact1: file A has a header that indicates it is licensed under mit
- fact2: there is no evidence that any part of this file is licensed
  under any other license

Representing the actual licenses of a package containing just file A
to the best of our knowledge requires some judgment. We will look at
the preceding facts and reach the conclusion that to the best of our
knowledge the file and package are licensed under the mit license and
no other license. A pretty obvious conclusion to reach, but still a
judgment. It is a judgment because it could be wrong: perhaps file A
is also covered by another license. Absence of evidence is not, in
itself, evidence of absence. Parts of file A could have been copied
from a GPL project and we may just not have noticed.

I think the conclusion in Gary's first example is a pretty obvious one
too. It requires more evidence to see the obviousness, but once you
have all the facts there is little doubt about the correct conclusion.

I think we should develop an evidentiary model for version 2. It
should provide direct, out of the box, support for basic types of
evidence and be extensible so that it can support all types of
evidence. Until we have created such a model I think we should
restrict ourselves to representing just the conclusions of judgments
made by the spdx producer. Representing the conclusions provides an
immense amount of value even without including the evidence.

Peter
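
To make the two positions in this message concrete, here is a rough sketch of a conclusions-only record next to one that also carries extensible evidence. It is not a proposed spdx schema: `Evidence`, `LicenseConclusion`, and the `kind` strings are hypothetical names, and the facts encoded are the ones from Gary's first example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    kind: str  # free-form so that new evidence types can be added later
    detail: Dict[str, str] = field(default_factory=dict)


@dataclass
class LicenseConclusion:
    file: str
    license: str
    evidence: List[Evidence] = field(default_factory=list)  # empty = conclusion only


# Version 1 style: just the conclusion.
v1 = LicenseConclusion(file="A", license="LGPL")

# A later, evidentiary style: same conclusion plus the facts behind it.
v2 = LicenseConclusion(
    file="A",
    license="LGPL",
    evidence=[
        Evidence("code-match", {"matches": "B"}),
        Evidence("no-license-statement", {"file": "A"}),
        Evidence("explicit-license-statement", {"file": "B", "license": "LGPL"}),
        Evidence("not-in-package", {"file": "B"}),
    ],
)

# A consumer that only understands conclusions reads both records the same way.
print(v1.license == v2.license, len(v1.evidence), len(v2.evidence))
```

Because the evidence list is optional and the `kind` field is open-ended, a conclusions-only producer stays valid if evidence is added in a later version, which is the compatibility property argued for above.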


Gary O'Neall
 

Perhaps this would be a good topic for the call tomorrow morning. I think
we can break this into 2 different decisions:
- Do we need a way to differentiate a judgment from a fact?
- Do we need a model to represent the evidence leading to a judgment?


Gary


Peter Williams <peter.williams@...>
 

On Mon, Jan 10, 2011 at 11:00 AM, Gary O'Neall <gary@...> wrote:
Perhaps this would be a good topic for the call tomorrow morning.  I think
we can break this into 2 different decisions:
- Do we need a way to differentiate a judgment from a fact?
- Do we need a model to represent the evidence leading to a judgment?
Good idea.


Kate Stewart <kate.stewart@...>
 

On Mon, 2011-01-10 at 11:23 -0700, Peter Williams wrote:
On Mon, Jan 10, 2011 at 11:00 AM, Gary O'Neall <gary@...> wrote:
Perhaps this would be a good topic for the call tomorrow morning. I think
we can break this into 2 different decisions:
- Do we need a way to differentiate a judgment from a fact?
- Do we need a model to represent the evidence leading to a judgment?
Good idea.
+1