Re: public domain dedications proliferation

Michael Dolan

In addition to Steve's thoughts... I will respond quickly as that was the request ... and likely miss issues. My only additional thought is: could we add a generic public domain license reference to the license list and then keep a list of discovered uses in the metadata? It would break from the traditional "one identifier = one specific license text" model, but technically we do allow for variations in the headers already.

A) Full Name: Public Domain Generic Dedication
B) Identifier: PDGD
C) Other web pages for this license: [insert example URLs where these generic dedications show up]
D) Notes: Public domain dedications occur in varying texts and contexts. The PDGD license identifier encompasses all texts dedicating unlicensed works to the public domain.
E/F) Yes if on either list... unlikely... but possible
G) Full text would be "Public domain" - which seems to encompass all the examples here. There may be other public domain dedications/declarations we could include. 
I do realize it's odd to suggest adding a (non-license) public domain dedication to the License List - but... hey, you asked for fast :-)
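
A rough sketch of what such an entry could look like as data, purely for illustration. The field names, the is_generic flag, and the discovered_texts list are assumptions made up for this example; they are not part of the actual License List schema.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseListEntry:
    """Hypothetical, simplified stand-in for a License List entry."""
    full_name: str
    identifier: str
    notes: str
    full_text: str
    other_urls: list[str] = field(default_factory=list)
    # Assumption: a flag marking that this ID covers many texts, not one.
    is_generic: bool = False
    # Assumption: metadata list of dedication texts found in the wild.
    discovered_texts: list[str] = field(default_factory=list)


pdgd = LicenseListEntry(
    full_name="Public Domain Generic Dedication",
    identifier="PDGD",
    notes="Public domain dedications occur in varying texts and contexts.",
    full_text="Public domain",
    is_generic=True,
)

# Each newly discovered wording is recorded as metadata, not as a new ID.
pdgd.discovered_texts.append("This is released into the public domain.")
```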

-- Mike

On Mon, Aug 15, 2022 at 8:28 PM Steve Winslow <swinslow@...> wrote:
Hi Jilayne, since you asked for input ASAP, here are a few immediate gut reactions  :)

I think collecting data on the many different ways that people have said "this code is released into the public domain" is _slightly_ useful, but not very useful. My guess is that there are a ton of variations that are substantively saying the same thing, but doing so in a way that would be extremely difficult to meaningfully capture in a few categories with regexes / pattern matching.

If the goal is really to find one or a few different regular-expression-matchable phrases that would go on the license list in its current form and format, then maybe that would be helpful data. But I guess I'm skeptical that we would find those patterns in a way that fits the current approach to license IDs on the license list, without ending up with a hundred variations of basically the same thing.
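
To illustrate the concern, here is a minimal sketch of one plausible pattern and how quickly it misses variants. The pattern and the sample sentences are made up for this example; they are not SPDX matching rules or collected data.

```python
import re

# Hypothetical pattern: a release/dedication verb followed, within the same
# sentence, by the phrase "public domain".
PD_PATTERN = re.compile(
    r"\b(released|placed|dedicated|donated)\b[^.]*\bpublic\s+domain\b",
    re.IGNORECASE,
)

# Invented sample statements that all mean roughly the same thing.
samples = [
    "This code is released into the public domain.",
    "Placed in the Public Domain by the author.",
    "I hereby dedicate this work to the public domain.",  # "dedicate", not "dedicated"
    "Public domain.",                                      # no verb at all
    "No copyright is claimed on this file.",               # never says "public domain"
]


def matches(text: str) -> bool:
    """Return True if the hypothetical pattern finds a dedication in text."""
    return PD_PATTERN.search(text) is not None


results = [matches(s) for s in samples]
```

Only the first two samples match. Every fix for a missed variant (more verb forms, optional verbs, "no copyright" wording) widens the pattern and moves it further from a single identifiable text.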

Maybe I'm jumping ahead to "what are the options?" before getting the data, but it seems to me like there are basically 4 options for whether and how to capture public domain statements:

1. No change: Don't add "this is in the public domain" statements to the license list. People can use LicenseRef's if they want.
Pro: Maintains the current approach that the License List is for licenses with specific text.
Con: Doesn't solve the problem people are having, with wanting to represent public domain statements generally with a common identifier.

2. Add a category ID to the spec: Alongside NONE and NOASSERTION as values defined in the SPDX spec, add PUBLIC-DOMAIN as another option defined in the spec rather than on the license list. Unlike NONE and NOASSERTION, PUBLIC-DOMAIN would presumably be useable in complex expressions (e.g. MIT AND PUBLIC-DOMAIN).
Pro: Provides a general identifier for public domain statements. Also maintains the current approach that the License List is for licenses with specific text.
Con: We're frankly too late to get this in as a substantive change for the SPDX 2.3 spec.

3. Add a category ID to the license list: Rather than changing the spec, add a category ID for "Public-Domain" (or similar) to the License List. Modify the license list schema somehow to indicate that this ID is meant to represent the collection of texts stating that a work is in the public domain, rather than one specific text.
Pro: Wouldn't be tied to a change to the spec. Would probably represent the way that most human users tend to think about public domain statements.
Con: Breaks expectations about all other License List entries, that they are tied to a particular text. Might also have implications for the SPDX spec that aren't coming to mind at the moment.

4. Add each statement individually: Add Public-Domain-1, Public-Domain-2, ... to the License List as separate entries, to capture every non-matching representation that we run into to say "this is in the public domain".
Pro: Maintains the current approach that the License List is for licenses with specific text.
Con: Get ready for 700 new Public Domain entries on the license list  :)  Probably becomes unwieldy for humans to meaningfully make use of this.

Definitely open to other options, but these are the ones that come to mind offhand. (And the above is intentionally ignoring public domain dedications that really do have a set standard text, such as CC-PDDC.)
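
As a sketch of how a tool might treat option 2, the snippet below classifies the identifiers in a flat expression. PUBLIC-DOMAIN is the hypothetical value from option 2 and is not in any released SPDX spec; the tiny license set is a stand-in for the real License List, and parentheses and WITH are deliberately not handled.

```python
# Values the SPDX spec defines outside the License List.
SPEC_VALUES = {"NONE", "NOASSERTION"}
# Hypothetical category ID from option 2 (an assumption, not in the spec).
CATEGORY_IDS = {"PUBLIC-DOMAIN"}
# Tiny stand-in for the License List.
LICENSE_LIST = {"MIT", "Apache-2.0", "CC-PDDC"}
OPERATORS = {"AND", "OR"}


def classify(expression: str) -> dict[str, str]:
    """Label each identifier in a flat AND/OR license expression."""
    tokens = expression.split()
    # NONE and NOASSERTION stand alone; they cannot be combined with others.
    if len(tokens) > 1 and any(t in SPEC_VALUES for t in tokens):
        raise ValueError("NONE/NOASSERTION cannot appear in a complex expression")
    labels: dict[str, str] = {}
    for token in tokens:
        if token in OPERATORS:
            continue
        if token in SPEC_VALUES:
            labels[token] = "spec-value"
        elif token in CATEGORY_IDS:
            labels[token] = "category"
        elif token in LICENSE_LIST:
            labels[token] = "license"
        elif token.startswith("LicenseRef-"):
            labels[token] = "licenseref"
        else:
            labels[token] = "unknown"
    return labels


mixed = classify("MIT AND PUBLIC-DOMAIN")
```

The point for tooling is the extra "category" branch: a consumer that assumes every non-LicenseRef identifier maps to exactly one text would need to learn about it.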

Personally, if we weren't about to have SPDX 2.3 released imminently, I'd probably lean towards option 2. Given that it is about to be released, I could be persuaded to consider option 3, though I suspect we would need significant input from the tooling community as to whether this breaks too many current expectations on their side.


On Mon, Aug 15, 2022 at 7:15 PM J Lovejoy <opensource@...> wrote:
Hi SPDX-legal,

I have raised this a couple times in the past few months or so, but now that it is more of a "ripe" topic, I wanted to get some input on preliminary ideas:

Fedora has now officially adopted the use of SPDX ids in package metadata (specifically, the license field of the package spec file). Due to Fedora historically using "category" short names for groups of similar licenses, we suspect there may be a number of additions to the SPDX License List needed.

Public domain category:
Specifically, Fedora has used "public domain" for any public domain dedication, without capturing the exact text. For Fedora package maintainers who are keen to update the license info for their packages, we have given this interim advice:
and (see section on "public domain")

SPDX approach:
The SPDX License List has always operated from the principle that an SPDX license id represents a specific, identifiable license/set of text. This is critical as part of our project goal of being both human and machine-readable.

However, if it turns out that there are a large number of slight variations of text that mean the same thing (e.g., a simple one-line statement of public domain dedication), then perhaps SPDX might consider a slightly different approach?

But, in order to even consider this, we'd need data.

Idea for going forward:
As Fedora package maintainers find these texts that had been marked generically as "public domain" - what if we asked them to copy the actual text of what they found and maybe a link to where they found it (or some other such pointer) in a simply formatted file in the Fedora license-list-data repo?

This would be at least a beginning of collecting this data that SPDX-legal could then review in a more bulk fashion in order to consider the above potential approach.

If so, what other info besides the text itself should be collected and how can it be most easily formatted to enable easy consumption later?

For example, it could be something like this:

package = Foo
text = This is released into the public domain.
source = <url to file or other such pointer>

I wouldn't want it to be more than a few pieces of information. There is also always the ability to search on the short name "public domain" or the interim SPDX id of "LicenseRef-Fedora-Public-Domain", but we might as well start collecting info if package maintainers are looking as well.
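
A minimal sketch of how a file in that shape could be read back for bulk review. It assumes one "key = value" pair per line and a blank line between records; the blank-line separator, the second record, and the example.com URLs are placeholders invented for this example.

```python
def parse_records(text: str) -> list[dict[str, str]]:
    """Parse blank-line-separated blocks of 'key = value' lines."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {line!r}")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


sample = """\
package = Foo
text = This is released into the public domain.
source = https://example.com/foo/README?rev=1

package = Bar
text = Public domain.
source = https://example.com/bar/COPYING
"""

records = parse_records(sample)
```

Keeping the format this flat means the collected texts can later be grouped or compared without a dedicated parser library.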

Thoughts? (need input ASAP :)

