[lxml-dev] Help with an error message
Stefan Behnel
stefan_ml at behnel.de
Thu Jan 3 17:57:19 CET 2008
Hi,
Rob Sanderson wrote:
> The null character makes the XML non-well-formed anyway.
>
> The legal character ranges for XML (as per the spec, section 2.2):
>
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
>
> Definitely no \x00!
That's true. On the XML /generator/ side you could get away with adding an
Entity (and lxml 2.0 will let you do that), but this just lets you write out
broken XML that the recipient will not be able to parse:
>>> from lxml import etree as et
>>> el = et.Element("test")
>>> el.text = "mind the "
>>> el.append(et.Entity("#0"))
>>> xml = et.tostring(el)
>>> xml
'<test>mind the &#0;</test>'
>>> et.fromstring(xml)
Traceback (most recent call last):
lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value 0, line 1, column 20
Maybe we should fix the Entity() factory here to prevent such misuse...
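
Roughly, such a check could look like the sketch below. This is plain Python
and not lxml's actual code; is_xml_char() and check_char_ref() are names
invented here, and the ranges are the Char production quoted above:

```python
import re

# "#123" (decimal) or "#x7B" (hexadecimal), as in an XML character reference
_CHAR_REF = re.compile(r"#(?:x([0-9A-Fa-f]+)|([0-9]+))")


def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 Char production (sec. 2.2)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_char_ref(name):
    """Hypothetical pre-check for an Entity()-style name such as "#0".

    Names that do not start with "#" are returned unchanged; validating
    named entities is out of scope for this sketch.
    """
    if not name.startswith("#"):
        return name
    match = _CHAR_REF.fullmatch(name)
    if match is None:
        raise ValueError("Invalid character reference: %r" % name)
    hex_digits, dec_digits = match.groups()
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if not is_xml_char(cp):
        raise ValueError("Character reference %r is not a legal XML char" % name)
    return name
```

With something like this in front of the factory, check_char_ref("#0") would
raise a ValueError, while "#x41" or "#9" would pass through unchanged.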
Stefan