[lxml-dev] Low ASCII values as text

F Wolff friedel at translate.org.za
Wed Apr 8 10:27:23 CEST 2009


Hallo list

I ran into a small issue through a user's error report; it can be
reproduced with this example code:

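from lxml import etree
l = etree.Element('cow')
# Python 2: decode UTF-8 bytes, giving u'\u0414' (Cyrillic De) followed by
# ESC characters (0x1B, i.e. value 27) mixed with 'i' and '?'
l.text = unicode('\xd0\x94\x1bi\x1b\x1b\x1b?', "utf-8")
# Serialise the element, then parse the output again
etree.fromstring(etree.tostring(l))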

With lxml 2.1 I get:

XMLSyntaxError: PCDATA invalid Char value 27, line 1, column 13

So etree.tostring() can apparently generate XML that etree.fromstring()
then refuses to parse; the ESC character (value 27) seems to be written
out unescaped.
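For reference, XML 1.0 lists the characters a document may contain in its `Char` production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. ESC (value 27, #x1B) falls outside it. The sketch below is a hypothetical helper (`is_xml10_char` is my own name, not an lxml function) that applies that production to the text from the example, to show which characters the parser objects to.

```python
def is_xml10_char(ch):
    """Return True if the single character ch matches the XML 1.0 Char production.

    Hypothetical helper for illustration; not part of lxml.
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Same text as in the example: U+0414 plus several ESC characters
text = u'\u0414\x1bi\x1b\x1b\x1b?'
bad = [ord(c) for c in text if not is_xml10_char(c)]
print(bad)  # [27, 27, 27, 27]
```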


With a newer version (I think a beta of 2.2), however, the assignment
itself (l.text = ...) already fails with:

"All strings must be XML compatible : Unicode or ASCII, no NULL bytes"
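If losing these characters is acceptable, one possible workaround (an assumption on my part, not something lxml offers) is to drop or replace everything outside the XML 1.0 `Char` production before assigning to `.text`. A minimal sketch, written for Python 3; the function name `clean_xml_text` and its `replacement` parameter are hypothetical:

```python
import re

# Matches any character outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    u'[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]'
)


def clean_xml_text(text, replacement=u''):
    """Replace every character XML 1.0 cannot carry with `replacement`.

    Hypothetical helper: the offending characters are lost (or marked),
    not preserved.
    """
    return _NOT_XML10_CHAR.sub(replacement, text)


raw = u'\u0414\x1bi\x1b\x1b\x1b?'
cleaned = clean_xml_text(raw)             # ESC characters dropped
marked = clean_xml_text(raw, u'\ufffd')   # each ESC becomes U+FFFD
```

Either result contains only characters XML 1.0 allows, so assigning it to `l.text` should pass the new check, at the cost of the original ESC characters.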


So in either case my question is whether lxml's handling of these low
ASCII values is correct. It doesn't seem possible to represent them at
all, but I suspect I am missing something important: as far as I know,
the XML 1.0 specification requires them to be written as numeric
character references.
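Checking this against the specifications: XML 1.0 section 4.1 requires that characters referred to by character references also match `Char`, so `&#27;` is not well-formed XML 1.0 either. It is XML 1.1 that admits these characters, via its `RestrictedChar` production (#x1-#x8, #xB-#xC, #xE-#x1F, #x7F-#x84, #x86-#x9F), and only in the form of character references such as `&#x1B;`. The sketch below escapes text for an XML 1.1 document under that rule; `escape_xml11_text` is a hypothetical name, and it assumes the consumer really implements XML 1.1, which I would not take for granted for libxml2-based parsers such as lxml.

```python
import re

# XML 1.1 RestrictedChar: may only appear as character references.
_RESTRICTED_11 = re.compile(u'[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]')


def escape_xml11_text(text):
    """Escape text for character data in an XML 1.1 document.

    Hypothetical helper for illustration. NUL (#x0) is not allowed in
    XML 1.1 at all, so it is rejected rather than escaped.
    """
    if u'\x00' in text:
        raise ValueError('NUL cannot be represented in XML 1.1')
    # Escape markup characters first, so the '&' of the references
    # added below is not escaped a second time.
    text = text.replace(u'&', u'&amp;').replace(u'<', u'&lt;').replace(u'>', u'&gt;')
    return _RESTRICTED_11.sub(lambda m: u'&#x%X;' % ord(m.group()), text)


body = escape_xml11_text(u'\u0414\x1bi\x1b\x1b\x1b?')
doc = u'<?xml version="1.1"?>\n<cow>%s</cow>' % body
print(doc)
```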

Keep well
Friedel


--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/monolingual-translation-formats-considered-harmful


