[lxml-dev] lxml parser encodings? What's supported?
Stefan Behnel
stefan_ml at behnel.de
Wed Sep 17 21:31:58 CEST 2008
Hi,
John Krukoff wrote:
> So, I've been trying to deal with some places where I need to force the
> parser's encoding, and I've been surprised by how little it seems to
> support. Specifically, 'ascii' isn't a supported encoding:
>
> Python 2.5.2 (r252:60911, Jul 31 2008, 15:38:58)
> [GCC 4.1.2 (Gentoo 4.1.2 p1.1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from lxml import etree
> >>> p = etree.XMLParser( encoding = 'ascii' )
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "parser.pxi", line 1240, in lxml.etree.XMLParser.__init__ (src/lxml/lxml.etree.c:58722)
>   File "parser.pxi", line 711, in lxml.etree._BaseParser.__init__ (src/lxml/lxml.etree.c:55050)
> LookupError: unknown encoding: 'ascii'
> >>> etree.__version__
> u'2.1.1'
>
>
> I checked the libxml2 documentation, and that claims that on linux it
> supports all the encodings that iconv does, which is quite a lot. Almost
> none of those returned by iconv actually work, though. Am I doing
> something wrong here by trying to specify the encoding in this way?
No, you've found a bug. The check that the parser constructor applies to the
override input encoding is simply wrong: it finds neither the "standard"
encodings (UTF-8 or ASCII) nor the iconv encodings.
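For illustration, here is a minimal sketch of how the encoding override is meant to behave once the check works. It is not the attached patch. It assumes an lxml release that includes the fix, and the helper names `parser_accepts` and `parse_with_override`, as well as the sample input, are made up for this example.

```python
from lxml import etree


def parser_accepts(encoding):
    """Return True if XMLParser accepts `encoding` as an override."""
    try:
        etree.XMLParser(encoding=encoding)
    except LookupError:
        # lxml raises LookupError for encoding names it cannot resolve,
        # as seen in the traceback quoted above.
        return False
    return True


def parse_with_override(data, encoding):
    """Parse bytes, forcing the input encoding instead of auto-detecting it."""
    parser = etree.XMLParser(encoding=encoding)
    return etree.fromstring(data, parser)


# Latin-1 bytes without an XML declaration (hypothetical sample input).
root = parse_with_override(b"<root>caf\xe9</root>", "iso-8859-1")
```

With a fixed lxml, `parser_accepts('ascii')` and `parser_accepts('utf-8')` should both succeed, while a name that neither libxml2 nor iconv knows should still be rejected.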
Here's a fix.
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parser-encoding.patch
Type: text/x-patch
Size: 1581 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080917/ba9bff40/attachment-0001.bin