[lxml-dev] automatic attribute unicode decode?
Hervé Cauwelier
herve.cauwelier at free.fr
Fri Jul 31 17:39:45 CEST 2009
Hi,
I'm quite puzzled by the following excerpt:
>>> from lxml import etree
>>> r = etree.fromstring('<root toto="français" titi="ascii" tata="1"/>')
>>> r.attrib
{'titi': 'ascii', 'toto': u'fran\xe7ais', 'tata': '1'}
In a bare document with no encoding declaration, lxml has decoded, on its
own, the one string that is not pure ASCII (what heuristic did it use?).
Now I have three attribute values of two different types. I wonder why
the integer was not converted as well. ;-)
I actually found this in a real-world document with an encoding
declaration and namespaces (an ODF XML part).
Is this a bug I should report, and how can I work around it?
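To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind. It uses a plain dict standing in for r.attrib (so it runs without lxml), and normalize_attrib is a hypothetical helper of my own, not part of lxml. It assumes that the plain-string values are pure ASCII, as they are in the excerpt above.

```python
# Stand-in for r.attrib from the excerpt: one text value and two
# byte strings (in Python 2, b'...' is the plain str type).
attrib = {'titi': b'ascii', 'toto': u'fran\xe7ais', 'tata': b'1'}


def normalize_attrib(mapping, encoding='ascii'):
    """Return a copy in which every byte-string value is decoded to text.

    Hypothetical helper, not part of lxml. The 'ascii' default is an
    assumption: it only holds if the byte strings are pure ASCII.
    """
    result = {}
    for key, value in mapping.items():
        if isinstance(value, bytes):
            value = value.decode(encoding)
        result[key] = value
    return result


normalized = normalize_attrib(attrib)

# After normalization all values share a single (text) type.
assert len(set(type(v) for v in normalized.values())) == 1
```

This only papers over the mixed types on the caller's side; it does not explain why the values come back with different types in the first place.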
Thanks,
Hervé