variants of licenses


Hi everybody,

My colleague Yuki Manabe, from the University of Osaka, has been working hard
to extract the licenses of every program in every package of several
Linux distributions (using Ninka).

This has allowed us to build a corpus of "licensing sentences". I think
this data could be useful for matching license variants. Is anybody
working on this?

Here are two files containing the last two sentences of the BSD licenses:

Some of the variants differ only in the name of the copyright owner, but
there are other interesting cases too. The number at the start of each
line is that sentence's frequency (counted per file).
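For anyone who wants to experiment with files like these, here is a minimal
sketch of one crude way to merge variants. It assumes each line is a count
followed by a sentence, and that owner-name variants differ only in the text
between "PROVIDED BY" and "AS IS" or between "IN NO EVENT SHALL" and "BE
LIABLE". The names LINE_RE, HOLDER_SLOTS, normalize and group_variants are
hypothetical illustrations, not part of Ninka or of our tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line format: "<count> <sentence>", possibly with leading spaces.
LINE_RE = re.compile(r"\s*(\d+)\s+(.*\S)")

# Hypothetical copyright-owner slots in the BSD warranty sentences.
# They are applied after normalization, so quotes like ``AS IS'' are already gone.
HOLDER_SLOTS = [
    (re.compile(r"provided by .*? as is"), "provided by <holder> as is"),
    (re.compile(r"in no event shall .*? be liable"), "in no event shall <holder> be liable"),
]


def normalize(sentence: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace, mask owners."""
    s = re.sub(r"[^\w\s]", " ", sentence.lower())
    s = " ".join(s.split())
    for pattern, replacement in HOLDER_SLOTS:
        s = pattern.sub(replacement, s)
    return s


def group_variants(lines: Iterable[str]) -> Counter:
    """Sum the per-file counts of all sentences that normalize to the same key."""
    totals: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        count, sentence = int(m.group(1)), m.group(2)
        totals[normalize(sentence)] += count
    return totals
```

This only collapses differences in case, punctuation and owner names;
reworded clauses would still need fuzzier matching.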

Daniel M. German
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .