[lxml-dev] lxml parser encodings? What's supported?

John Krukoff jkrukoff at ltgc.com
Fri Sep 26 01:32:38 CEST 2008


On Wed, 2008-09-17 at 21:31 +0200, Stefan Behnel wrote:
> Hi,
> No, you've found a bug. The way the override input encoding is checked by the
> parser instantiation is simply wrong, it doesn't find any "standard" encoding
> (utf-8 or ASCII), neither does it find iconv encodings.
> 
> Here's a fix.
> 
> Stefan

After some abortive fumbling, until I figured out that I needed Cython
installed to use the patch, I gave it a try. It looks like it works fine
here for my use case:

>>> from lxml import html
>>> html.fromstring( '<html></html>', parser = html.HTMLParser( ) )
<Element html at 81b471c>
>>> html.fromstring( '<html></html>', parser = html.HTMLParser( encoding = 'us-ascii' ) )
<Element html at 81b444c>
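Stefan's diagnosis is that the override-encoding check rejected names it should have accepted, such as plain UTF-8 or ASCII (here spelled 'us-ascii'). As a rough illustration of what an alias-aware check looks like, here is a minimal sketch using Python's standard `codecs` registry. `normalize_encoding` is a hypothetical helper, not an lxml function, and this is not how lxml works internally: lxml hands the name to libxml2, whose lookup (per the reply above) should also reach iconv encodings.

```python
import codecs


def normalize_encoding(name):
    """Return the canonical codec name for `name`, or raise LookupError.

    Hypothetical helper for illustration only: it resolves aliases such as
    'us-ascii' or 'UTF8' through Python's codec registry. lxml does NOT do
    this; it asks libxml2 for an encoding handler instead.
    """
    if name is None:
        # No override requested: the parser would detect the encoding itself.
        return None
    try:
        # codecs.lookup() accepts aliases and returns a CodecInfo whose
        # .name attribute is the canonical spelling.
        return codecs.lookup(name).name
    except LookupError:
        raise LookupError("unknown encoding: %r" % (name,)) from None


if __name__ == "__main__":
    for candidate in ("utf-8", "UTF8", "us-ascii"):
        print(candidate, "->", normalize_encoding(candidate))
```

With this helper, `normalize_encoding('us-ascii')` returns `'ascii'`, so a check that compares canonical names would accept the spelling used in the session above rather than rejecting it as unknown.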

-- 
John Krukoff <jkrukoff at ltgc.com>
Land Title Guarantee Company


