[lxml-dev] lxml parser encodings? What's supported?
John Krukoff
jkrukoff at ltgc.com
Wed Sep 17 21:08:43 CEST 2008
I've been dealing with some places where I need to force the parser's
encoding, and I've been surprised by how few encodings it seems to
support. Specifically, 'ascii' isn't accepted:
Python 2.5.2 (r252:60911, Jul 31 2008, 15:38:58)
[GCC 4.1.2 (Gentoo 4.1.2 p1.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> p = etree.XMLParser( encoding = 'ascii' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parser.pxi", line 1240, in lxml.etree.XMLParser.__init__ (src/lxml/lxml.etree.c:58722)
  File "parser.pxi", line 711, in lxml.etree._BaseParser.__init__ (src/lxml/lxml.etree.c:55050)
LookupError: unknown encoding: 'ascii'
>>> etree.__version__
u'2.1.1'
I checked the libxml2 documentation, which claims that on Linux it
supports all the encodings that iconv does, which is quite a lot. Almost
none of the encoding names that iconv lists are actually accepted,
though. Am I doing something wrong by specifying the encoding this way?
Is there something weird about my build?
If everything is working as intended, is there anyplace I can find a
list of the encodings lxml does support?
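
Lacking such a list, one way to find out what a given build accepts is to
probe it. The following is only a rough sketch: it assumes Python 3 and a
current lxml, relies on the LookupError shown in the traceback above being
how the parser rejects a name, and uses a hypothetical helper name
(supported_encodings) and an arbitrary candidate list.

```python
from lxml import etree


def supported_encodings(candidates):
    """Return the candidate names this lxml build accepts as a parser encoding."""
    supported = []
    for name in candidates:
        try:
            # Constructing the parser is enough to trigger the encoding lookup.
            etree.XMLParser(encoding=name)
        except LookupError:
            # Same error as in the session above: the name is not recognised.
            continue
        supported.append(name)
    return supported


# Arbitrary sample names; the last one is deliberately bogus.
candidates = ["utf-8", "iso-8859-1", "ascii", "no-such-encoding"]
result = supported_encodings(candidates)
print(result)
```

The result depends on the lxml and libxml2 versions and on how libxml2 was
built, so it is best read as a description of one installation rather than
of lxml in general.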
My current workaround is to decode to Unicode first and then hand the
Unicode string to lxml, but that seems less efficient than letting the
parser handle the decoding.
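
For reference, the decode-first workaround looks roughly like this. It is a
minimal sketch assuming Python 3 (where the decoded value is a str) and a
current lxml; the helper name parse_with_forced_encoding and the sample
document are made up for illustration.

```python
from lxml import etree


def parse_with_forced_encoding(raw, encoding):
    """Decode the bytes in Python, then parse the resulting text with lxml."""
    # Python's own codec machinery does the decoding, so any codec Python
    # knows about can be used, whatever the parser itself supports.
    text = raw.decode(encoding)
    # Caveat: lxml rejects text input that still carries an XML declaration
    # with an encoding attribute, so this sketch assumes there is none.
    return etree.fromstring(text)


# Hypothetical sample: a Latin-1 encoded fragment without an XML declaration.
root = parse_with_forced_encoding(b"<doc><title>caf\xe9</title></doc>", "latin-1")
print(root.findtext("title"))
```

The cost is an extra decoding pass and an intermediate string in memory,
which is the inefficiency mentioned above.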
--
John Krukoff <jkrukoff at ltgc.com>
Land Title Guarantee Company