[lxml-dev] Help with an error message

Stefan Behnel stefan_ml at behnel.de
Thu Jan 3 17:57:19 CET 2008


Hi,

Rob Sanderson wrote:
> The null character makes the XML non-well-formed anyway.
> 
> The legal character ranges for XML (as per the spec, section 2.2):
> 
> Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
> 
> Definitely no \x00!

That's true. On the XML /generator/ side you could get away with adding an
Entity (and lxml 2.0 will let you do that), but this only lets you write out
broken XML that the recipient will not be able to parse:

  >>> from lxml import etree as et
  >>> el = et.Element("test")
  >>> el.text = "mind the "
  >>> el.append(et.Entity("#0"))
  >>> xml = et.tostring(el)
  >>> xml
  '<test>mind the &#0;</test>'

  >>> et.fromstring(xml)
  Traceback (most recent call last):
  lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value 0, line 1, column 20

Maybe we should fix the Entity() factory here to prevent such misuse...
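
For illustration, here is a rough sketch of what such a check might look like
in plain Python. It is not lxml's actual code: the helper names is_xml_char()
and check_entity_name() are made up for this example, and the ranges are taken
directly from the Char production quoted above. Names that are not character
references are simply passed through unchecked.

```python
import re

# Char production from the XML 1.0 spec, section 2.2 (as quoted above).
_XML_CHAR_RANGES = (
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
)

# Body of a character reference without '&' and ';': "#123" or "#x7B".
_CHAR_REF = re.compile(r"#(?:x([0-9a-fA-F]+)|([0-9]+))\Z")


def is_xml_char(codepoint):
    """Return True if the code point is a legal XML 1.0 character."""
    return any(lo <= codepoint <= hi for lo, hi in _XML_CHAR_RANGES)


def check_entity_name(name):
    """Hypothetical helper: reject character references to illegal characters.

    Raises ValueError for malformed or out-of-range character references.
    Names that do not start with '#' are returned unchecked.
    """
    if name.startswith("#"):
        match = _CHAR_REF.match(name)
        if match is None:
            raise ValueError("malformed character reference: %r" % name)
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not is_xml_char(codepoint):
            raise ValueError("invalid XML character reference: %r" % name)
    return name
```

With something like this in front of the factory, check_entity_name("#0")
raises a ValueError, while "#x41" and a named entity such as "nbsp" pass
through unchanged.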

Stefan


