Licensing data for object files
Peter Williams <peter.williams@...>
Consider the situation where a supplier wants to provide a library as an object file plus headers, and a purchaser would like to get package and license information for that library in spdx format. Or the situation where a developer downloads a binary distribution of an open source project and would like to get package and license information for that library in spdx format for the corporate/project governance processes.

Clearly there are not going to be copyright notices/license declarations in most object files. However, object files have significant licensing implications for their users. (For example, if the source from which it was built was gpl -- either intentionally, or because it included some other gpl code.)

Should the spdx file for a "binary" package show the license information of an object file as the union of the license information for all the source files used to create the object file?

And do we expect that the package license information for a "binary" package would be the same as the license information for the "source" package from which the binary is built?

Peter
openlogic.com
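To make the "union" question concrete, here is a minimal sketch, assuming a hypothetical build manifest that records which source files were compiled into each object file. The names `SOURCE_LICENSES`, `BUILD_MANIFEST`, and `object_file_licenses`, the file paths, and the license identifiers are all illustrative assumptions; none of them come from the SPDX draft.

```python
# Hypothetical data: the licenses found in each source file.
SOURCE_LICENSES = {
    "src/parser.c": {"MIT"},
    "src/crc.c": {"GPL-2.0"},
    "src/util.c": {"MIT"},
}

# Hypothetical build manifest: object file -> source files compiled into it.
BUILD_MANIFEST = {
    "libfoo.o": ["src/parser.c", "src/crc.c", "src/util.c"],
}


def object_file_licenses(obj, manifest=BUILD_MANIFEST, licenses=SOURCE_LICENSES):
    """Return the union of the licenses of every source built into obj."""
    result = set()
    for src in manifest[obj]:
        result |= licenses[src]
    return result


print(sorted(object_file_licenses("libfoo.o")))
```

A plain union like this only lists which licenses are present. It does not say whether they combine as "and" or "or", which is part of what the thread goes on to discuss.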
|
Kate Stewart <kate.stewart@...>
On Thu, 2011-01-06 at 09:25 -0700, Peter Williams wrote:
> Consider the situation where a supplier that wants to provide an ...

True. There are some cases, though, where a scan of the binary reveals the copyright/license in the comments (made so explicitly by the build system). "od filename | grep" is your friend here.

Not necessarily, since sometimes the binary is built with the selection of a specific license in mind. Since a binary is one file, the use of the inferred license field we've been talking about might be appropriate here as well.

> And do we ...

Likely to be the common case I would guess, but since the source may be "or" license based, and someone selects one license explicitly - if we have a way to discover it, we should reflect it. The example I'm thinking of is a library dual licensed under GPL and BSD. If statically bound, it makes quite a difference whether the library has been built with the intention to be BSD instead of GPL.
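The binary scan mentioned above can be sketched in a few lines. This is a rough stand-in for piping a dump of the file through grep, not a reproduction of any particular tool: it pulls runs of printable ASCII out of the bytes and keeps those that mention a keyword. The function name `find_license_strings`, the keyword list, and the sample bytes are assumptions made for illustration.

```python
import re


def find_license_strings(data, keywords=(b"license", b"copyright"), min_len=4):
    """Return printable ASCII runs in data that mention any keyword."""
    # Runs of at least min_len printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    hits = []
    for match in pattern.finditer(data):
        text = match.group()
        if any(keyword in text.lower() for keyword in keywords):
            hits.append(text.decode("ascii"))
    return hits


# Made-up bytes standing in for the contents of an object file.
sample = (
    b"\x00\x01ELF\x00Copyright (C) 2010 Example Corp\x00"
    b"\x02\x03Licensed under the BSD license\x00abc"
)
for line in find_license_strings(sample):
    print(line)
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. A hit only shows that a notice was embedded at build time; an empty result says nothing about the licensing of the sources.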
|
|
Peter Williams <peter.williams@...>
I am starting to think that the interaction of licenses is too intricate for us to support using multiple fields. It seems like you could easily end up with conjunctions and disjunctions between the licenses in the detected, declared and inferred licensing information.

Perhaps we should collapse those three fields into a single field. Doing so would allow the complete licensing information for a file/package to be represented in a clear, precise and explicit way.

Is distinguishing the mechanism by which a license was discovered required for spdx version 1? If so, we could provide a way to describe how licenses were discovered that is separate from the complete license information. If not, we could save ourselves the work and delay that feature until version 2.

Peter
openlogic.com
|
Gary O'Neall
I think this is a rather common case and a good one to discuss.
Here are my thoughts:

Scenario A: If the library is provided independently, then I would expect the SPDX file to contain only one file (the file for the library) with the declared license for that library. The declared license would be the license stated by the originator of the library (whoever built the library). If there is no stated license, then the inferred license would be appropriate. One issue with this approach is how we communicate where the information for the declared license came from. If it is embedded in the binary file, it would be straightforward. If it came from a website URL, this may be harder to independently verify.

Scenario B: If the library is included in a larger package described by an SPDX file, then this library would be an embedded package. I would think the treatment of this file would be the same as an embedded archive file of the source. There would be one file and the license would be the stated or inferred license of that open source package.

For both scenarios, it would be nice if a reference could be made to an SPDX file for the origin open source package for the library or archive file.

In terms of the linkage implications, for Scenario A I think the SPDX description should only state the license of the file itself, since the actual linkage and usage of the library would not be known. For Scenario B, any implications of the linkage should be considered for the overall package license if that package links in any copyleft code.

I would consider the description of any analysis of the binary file to be outside the scope of the SPDX standard. If there was an analysis of the source code which produced the binary, that source code analysis could be represented as its own SPDX file, which would include references to the source files used to build the binaries.

Just a note - I've been working on some binary analysis tools (decompilers and such). If we do want to extend SPDX to include analysis results of binaries, we would want to add additional details such as discovered class and package names for java. A good topic for after 1.0.

Gary
_______________________________________________
Spdx-tech mailing list
Spdx-tech@...
https://fossbazaar.org/mailman/listinfo/spdx-tech
|
Gary O'Neall
I completely agree that trying to interpret and represent the interaction of licenses should be outside the scope of SPDX 1.0. There are two reasons for this:

1. It can be rather complex to represent the complete analysis, since the interaction may depend on the usage of a specific component, and we have already decided (early on) not to include usage information in SPDX 1.0.
2. The license interaction analysis may require legal interpretation of some of the license clauses.

It still may make sense, however, to have a way of representing when there is an explicit license declaration in a file. We could go back to only capturing explicit licenses and not including any inferred licenses, but we would lose some analysis information which would be very nice to convey.

BTW - when I think about the detected and inferred licenses, I am not thinking about license interactions as much as capturing a clear intent by the author to license a file under a particular license. For example, if an author (or copyright owner) has a file in a directory stating all files in this directory are under a particular license and the files within that directory have the same copyright and no conflicting information.

Gary
|
Peter Williams <peter.williams@...>
On Thu, Jan 6, 2011 at 3:20 PM, <gary@...> wrote:
> I completely agree that trying to interpret and represent the interaction ...

Perhaps we are thinking of different kinds of "interaction"? I meant "interaction" along the lines of "the package is declared as licensed under a or b but contains files licensed under c, so the actual package licensing is (a and c) or (b and c)". Interaction is probably not a good word for this, but I don't have a better one. I think this should be handled by spdx.

Interaction in the sense of "license a is incompatible with license b" is definitely outside the scope of spdx for exactly the reasons you point out.

> It still may make sense, however, to have a way of representing when there ...

I think excluding non-explicitly declared licenses from spdx would remove almost all of the value of this effort.

> BTW - when I think about the detected and inferred licenses, I am not ...

I am trying to work out the use case for segregating the declared, detected and inferred licenses. A consumer of spdx seems most likely to desire a complete understanding of the choice -- or lack thereof -- of licenses for a package. If that is the case, perhaps we can simplify the model a bit by having a single 'licensing' property.

Peter
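The "a or b, plus c" example above can be worked through mechanically. The sketch below assumes one possible representation that the thread does not prescribe: a license expression in disjunctive normal form, stored as a set of alternatives where each alternative is a frozenset of licenses that all apply together. The helper names `conjoin` and `format_dnf` are hypothetical.

```python
from itertools import product


def conjoin(*exprs):
    """AND together license expressions held in disjunctive normal form.

    Each expression is a set of alternatives; each alternative is a
    frozenset of licenses that all apply at once.
    """
    result = set()
    for combo in product(*exprs):
        # Pick one alternative from every expression and merge them.
        result.add(frozenset().union(*combo))
    return result


def format_dnf(expr):
    """Render a DNF expression in a stable, readable order."""
    terms = sorted(" and ".join(sorted(term)) for term in expr)
    return " or ".join(f"({term})" for term in terms)


declared = {frozenset({"a"}), frozenset({"b"})}  # package declared as: a or b
from_files = {frozenset({"c"})}                  # a contained file is under: c
actual = conjoin(declared, from_files)
print(format_dnf(actual))  # (a and c) or (b and c)
```

Whether a contained file's license really does conjoin with the package's declared choice is the judgment call the rest of the thread debates; the code only shows the bookkeeping once that call has been made.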
|
Gary O'Neall
We are probably thinking a little differently on interaction. The use cases I have in mind have to do with representing technical analysis on the interaction between components that affect the license terms.

For the interaction you describe, I believe describing the licensing at the package level based on the licenses present in the files will require a more complex interaction analysis. The obligations of the license at the package level will be affected by the usage of that file within the package. For example, if a file contains an LGPL license it will cause the package to be licensed under LGPL if the LGPL file is not a distinct library within the package.

One requirement I think we should have on SPDX is that any logic or judgment used by the auditor creating the SPDX document should be called out as a judgment (distinct from a fact). This would allow for independent verification of such judgments. If we were to apply this requirement to the licensing at package level (acknowledging that we have not discussed or agreed to such a requirement), I think we would need to either include the "inferred" information or not include the interaction based licensing information, since the interaction analysis requires judgment.

Let me know if this makes sense and if you agree with the requirement above.

Gary
|
Peter Williams <peter.williams@...>
On Fri, Jan 7, 2011 at 11:45 AM, Gary O'Neall <gary@...> wrote:
> We are probably thinking a little differently on interaction. The use cases ...

I usually think of spdx as a way to express the conclusions of a technical analysis. Actually representing the analysis itself sounds hard. :)

> For the interaction you describe, I believe describing the licensing at the ...

The analysis will be complex. The conclusion will be much simpler. Either the lgpl code is a distinct library used by the package or not.

> One requirement I think we should have on SPDX is any logic or judgment used ...

I don't think non-facts have a place in spdx files. However, facts can be established through the use of logic. If a file contains 100 contiguous lines that are identical to 100 contiguous lines in the linux kernel, and those lines were committed to the kernel before they were committed to the project of the package being analyzed, then it is a *fact* that the file must be licensed under gpl2. This is true regardless of the licensing statements of the file, directory, or project. It is important that spdx be able to represent this sort of fact. It is also important that spdx be able to represent all the licenses provided by the package regardless of how the fact of those licenses was determined.

It would be nice to be able to represent the evidence for each license in an spdx file. A model to record such evidence seems like it would be non-trivial to develop. It also feels like an evidentiary model could be implemented as an extension to a conclusions-only spdx. (Either as a separate standard or in a future version of spdx.)

I'd rather an evidentiary model be out of scope for v1 because spdx can provide a lot of value without it, and I think it will be straightforward to add it later.

Peter
|
Kate Stewart <kate.stewart@...>
On Fri, 2011-01-07 at 13:35 -0700, Peter Williams wrote:
> On Fri, Jan 7, 2011 at 11:45 AM, Gary O'Neall <gary@...> wrote:
>> We are probably thinking a little differently on interaction. The use cases ...
> I usually think of spdx as a way to express the conclusions of a ...

esp. for version 1. :)

>> One requirement I think we should have on SPDX is any logic or judgment used ...
> I don't think non-facts have a place in spdx files. However, facts ...

Inferred field proposal at file level should cover this, I think.

The question in my mind is how many "reasons" for the inference we want to accept in version 1. It takes a sophisticated tool and knowledge base ;) to generally detect that code in another place matches code in a specific package. Not everyone has this capability. However, if the capability is there, marking it as an "inferred" license based on code matching seems a good way of capturing the fact that there is evidence that another license is in play, other than what may be declared.

Agree. I think for version 1, a set of known reasons on the inferred license field is probably the most we should aim for right now.

Kate
|
Gary O'Neall
Agree with both of your comments in general - we don't want to try to build a model to include all of the logic. I do, however, think it is important to capture where assumptions or judgment are being applied (even if we don't capture the logic behind the judgment). Having the inferred license type would be sufficient for this.

Gary
|
Peter Williams <peter.williams@...>
Can you give some examples of the "assumptions or judgements" you have in mind?
On Fri, Jan 7, 2011 at 2:25 PM, Gary O'Neall <gary@...> wrote:
> Agree with both of your comments in general - we don't want to try to build ...
|
Peter Williams <peter.williams@...>
On Fri, Jan 7, 2011 at 2:18 PM, Kate Stewart <kate.stewart@...> wrote:
>>> One requirement I think we should have on SPDX is any logic or judgment used ...
>> I don't think non-facts have a place in spdx files. However, facts ...
> Inferred field proposal at file level should cover this I think.

I vote for exactly zero reasons. A partial model is probably worse than no model at all. Any model that is based on a static list will be obsolete from the start and will have to be replaced later, probably by a largely incompatible model.

To really support this we would need a highly extensible model to describe the techniques used. Openlogic and Blackduck have developed some interchange formats that might be somewhat applicable to this problem space. However, they are not directly applicable, and it turns out to be sort of complicated.

To make matters worse, code forensics techniques tend to be treated as proprietary information, so getting vendors to actually populate such fields would be a struggle in practice.

Peter
|
Peter Williams <peter.williams@...>
On Fri, Jan 7, 2011 at 6:18 PM, <gary@...> wrote:
>> Can you give some examples of the "assumptions or judgements" you have in ...
> In the context of the license, I can think of 2 examples: ...

Perhaps we have primarily a terminology problem. The above is certainly a judgment in the sense that it is reaching a conclusion based on some evidence. However, it is not an assumption, guess or legal opinion.

I think we have to reach a consensus on what the goal is. If spdx is primarily a way to report the license declarations in a package (and its files), then we can get away from judgments. Judgments are going to be a way of life if the goal is to communicate the licenses under which a package may be used, to the best knowledge of the spdx file producer. I think the latter approach is what spdx should be targeting.

Consider the following:

- fact1: file A has a header that indicates it is licensed under mit
- fact2: there is no evidence that any part of this file is licensed under any other license

Representing the actual licenses of a package containing just file A to the best of our knowledge requires some judgment. We will look at the preceding facts and reach the conclusion that, to the best of our knowledge, the file and package are licensed under the mit license and no other license. A pretty obvious conclusion to reach, but still a judgment. It is a judgment because it could be wrong; perhaps file A is also covered by another license. Absence of evidence is not, in itself, evidence of absence. Parts of file A could have been copied from a GPL project and we may just not have noticed.

I think the conclusion in Gary's first example is a pretty obvious one too. It requires more evidence to see the obviousness, but once you have all the facts there is little doubt about the correct conclusion.

I think we should develop an evidentiary model for version 2. It should provide direct, out of the box, support for basic types of evidence and be extensible so that it can support all types of evidence.

Until we have created such a model, I think we should restrict ourselves to representing just the conclusions of judgments made by the spdx producer. Representing the conclusions provides an immense amount of value even without including the evidence.

Peter
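As a rough illustration of the "conclusions now, evidence later" idea, the sketch below models a conclusion record whose evidence list is optional. It is an assumption about what such a model might look like, not a proposal from the thread or a part of any SPDX schema; the class names `Fact` and `LicenseConclusion` and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Fact:
    """One piece of evidence, kept as free text in this sketch."""
    description: str


@dataclass
class LicenseConclusion:
    """A producer's conclusion about a file or package."""
    subject: str
    licenses: FrozenSet[str]
    # Optional: a conclusions-only document simply leaves this empty.
    evidence: List[Fact] = field(default_factory=list)

    def is_supported(self) -> bool:
        """True when the conclusion carries at least one recorded fact."""
        return len(self.evidence) > 0


fact1 = Fact("file A has a header that indicates it is licensed under mit")
fact2 = Fact("no evidence that any part of file A is under another license")

conclusions_only = LicenseConclusion("file A", frozenset({"MIT"}))
with_evidence = LicenseConclusion("file A", frozenset({"MIT"}), [fact1, fact2])
print(conclusions_only.licenses == with_evidence.licenses)
```

Both records state the same conclusion. Only the second lets a reader see why it was reached, which is the extension point being deferred to a later version.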
|
Gary O'Neall
Perhaps this would be a good topic for the call tomorrow morning. I think we can break this into 2 different decisions:

- Do we need a way to differentiate a judgment from a fact?
- Do we need a model to represent the evidence leading to a judgment?

Gary
|
Peter Williams <peter.williams@...>
On Mon, Jan 10, 2011 at 11:00 AM, Gary O'Neall <gary@...> wrote:
> Perhaps this would be a good topic for the call tomorrow morning. I think ...

Good idea.
|
Kate Stewart <kate.stewart@...>
On Mon, 2011-01-10 at 11:23 -0700, Peter Williams wrote:
> On Mon, Jan 10, 2011 at 11:00 AM, Gary O'Neall <gary@...> wrote:
>> Perhaps this would be a good topic for the call tomorrow morning. I think ...
> Good idea.

+1
|