Use of escaped characters in SPDX XML files.


Sam Ellis
 

Hi,

 

I would like to resolve some queries about the SPDX XML files, in particular the strange-looking text such as &#x2019; (which is simply an escaped right single quotation mark, i.e. a curly apostrophe). The characters <, > and & are used in the structure of XML files, for example in the tag "<p>", so they cannot be used directly in text; otherwise a parser could not tell what is text and what is a tag. XML deals with this by allowing these characters, and any others, to be escaped when they occur in text. So, for example, if the original text really contains a <, we must replace it in the XML with either &lt; or &#60;, which refer to the character by name and by number respectively. Strictly, the only characters that must always be escaped in XML text are < and &. The > character only has to be escaped in the rare case where it follows ]], but it is common to escape it everywhere for consistency with <. We see these in the SPDX XML files as:

 

Character     Escaped form
<             &lt;
>             &gt;
&             &amp;
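
As a rough illustration of the three escapes listed above, here is a minimal Python sketch using the standard library module xml.sax.saxutils. The sample string is made up for the example and is not taken from any SPDX license.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical sample text containing all three special characters.
original = "if a < b && b > c"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(original)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# unescape() reverses those three replacements.
roundtrip = unescape(escaped)
print(roundtrip == original)  # True
```

Note that unescape() from this module only handles these three named escapes by default; numeric forms such as &#60; are normally decoded by an XML parser instead.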

 

Escaping other characters is optional. Although escaped characters make the file harder for people to read, a computer program reading the XML should handle them without difficulty. If we come across an escaped character and want to check what it means, we can paste the full name, such as &lt;, or the full number, such as &#60;, into Google, and the top hit will usually show which character it represents. So my take is that, as long as the escaped character still matches the original license text once it is converted back to the literal character, the XML is correct and acceptable.
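
As an alternative to looking a character up by hand, the check described above could be sketched in Python roughly as follows. This is only a sketch: the helper names decode_xml_text and matches_original are hypothetical and not part of any SPDX tooling, and it assumes the text under review can be wrapped in a single <p> element and parsed on its own.

```python
import unicodedata
import xml.etree.ElementTree as ET


def decode_xml_text(fragment: str) -> str:
    """Let an XML parser decode the escapes in a piece of XML text."""
    # Wrapping in <p> is an assumption so the fragment parses standalone.
    return ET.fromstring(f"<p>{fragment}</p>").text or ""


def matches_original(xml_text: str, license_text: str) -> bool:
    """True if the decoded XML text equals the original license text."""
    return decode_xml_text(xml_text) == license_text


# Identify what a single escaped character actually is.
for ref in ("&lt;", "&#60;", "&#x2019;"):
    ch = decode_xml_text(ref)
    print(ref, repr(ch), unicodedata.name(ch))

# Compare escaped XML text against the literal text it should represent.
print(matches_original("Licensor&#x2019;s", "Licensor\u2019s"))  # True
```

The comparison is an exact string match, so a straight apostrophe in the original license would not match an escaped curly one; whether that difference matters is a separate question for the license matching guidelines.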

