[lxml-dev] Low ASCII values as text
F Wolff
friedel at translate.org.za
Wed Apr 8 10:27:23 CEST 2009
Hello list
I ran into a small issue through a user's error report. The following
example code reproduces it:
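from lxml import etree
l = etree.Element('cow')
# UTF-8 bytes for U+0414, mixed with ESC (0x1b) control characters
l.text = unicode('\xd0\x94\x1bi\x1b\x1b\x1b?', "utf-8")
# Round trip: serialize the element, then parse the result again
etree.fromstring(etree.tostring(l))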
With lxml 2.1 I get:
XMLSyntaxError: PCDATA invalid Char value 27, line 1, column 13
It seems that etree.tostring() can generate XML that etree.fromstring()
can't handle.
But with a newer version (I think a beta of 2.2), I get:
"All strings must be XML compatible : Unicode or ASCII, no NULL bytes"
on the assignment statement (l.text = ...).
So in either case, my question is whether lxml's handling of these low
ASCII values is correct, since it doesn't seem possible to represent
them at all. I guess I am missing something important. As far as I know,
the XML 1.0 specification requires these to be written as numeric
character references.
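A note on the specification side: the XML 1.0 Char production is
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF],
and the "Legal Character" well-formedness constraint applies the same
production to character references. So a character like U+001B appears
to be unrepresentable in XML 1.0, whether literally or as &#27;. Below
is a minimal sketch of how such characters could be found or stripped
before assigning to .text. It assumes Python 3 and uses only the
standard library; the helper names find_invalid_xml_chars and
strip_invalid_xml_chars are hypothetical and not part of lxml.

```python
import re

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    '[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)


def find_invalid_xml_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    # Hypothetical helper for illustration, not an lxml API.
    return [(m.start(), ord(m.group())) for m in _INVALID_XML10.finditer(text)]


def strip_invalid_xml_chars(text, replacement=''):
    """Return text with every forbidden character replaced."""
    # Hypothetical helper for illustration, not an lxml API.
    return _INVALID_XML10.sub(replacement, text)


# The same bytes as in the example above, decoded from UTF-8.
sample = b'\xd0\x94\x1bi\x1b\x1b\x1b?'.decode('utf-8')
print(find_invalid_xml_chars(sample))
print(repr(strip_invalid_xml_chars(sample)))
```

Whether dropping the characters silently is acceptable depends on the
application; passing a visible replacement such as '\ufffd' is another
option.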
Keep well
Friedel
--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/monolingual-translation-formats-considered-harmful