[lxml-dev] Any way to pass encoding to html.html_parser?

Stefan Behnel stefan_ml at behnel.de
Thu Sep 27 08:32:38 CEST 2007


js wrote:
> A simple question about lxml 2.0alpha3's new feature.
> 
>>    * Parsers accept an 'encoding' keyword argument that overrides the
>>     encoding of the parsed documents.
> 
> How can I pass encoding argument to the parser when using html.parse instead of
> etree.parse?

Hmm, true, you currently can't do that: lxml.html.html_parser is a ready-made
parser instance, not a class, so there is nowhere to pass keyword arguments.

It's easy to build an equivalent parser yourself, though. The next release will
add a copy of the parser class to lxml.html; until then, you can do this:

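import lxml.etree
import lxml.html

class HTMLParser(lxml.etree.HTMLParser):
    def __init__(self, **kwargs):
        super(HTMLParser, self).__init__(**kwargs)
        # make the parser build lxml.html element classes (older lxml: setElementClassLookup)
        self.set_element_class_lookup(lxml.html.HtmlElementClassLookup())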
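A short usage sketch, assuming a current lxml (which spells the lookup setter `set_element_class_lookup`): the subclass takes `encoding` like any other `lxml.etree.HTMLParser` keyword, and the instance is handed to `lxml.html.parse` through its `parser` argument. The Latin-1 sample bytes are made up for illustration.

```python
import io

import lxml.etree
import lxml.html


class HTMLParser(lxml.etree.HTMLParser):
    """HTML parser that accepts keyword args and builds lxml.html elements."""

    def __init__(self, **kwargs):
        super(HTMLParser, self).__init__(**kwargs)
        # Same element lookup that lxml.html.html_parser uses.
        self.set_element_class_lookup(lxml.html.HtmlElementClassLookup())


# Illustrative Latin-1 bytes with no <meta charset>; the explicit
# encoding keyword tells the parser how to decode them.
data = u"<html><body><p>caf\u00e9</p></body></html>".encode("iso-8859-1")

parser = HTMLParser(encoding="iso-8859-1")
doc = lxml.html.parse(io.BytesIO(data), parser=parser)
root = doc.getroot()

print(root.findtext(".//p"))   # the decoded paragraph text
print(type(root).__name__)     # an lxml.html element class
```

Later lxml releases ship an equivalent `lxml.html.HTMLParser`, which accepts the same keyword arguments, so the subclass is only needed on versions that lack it.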

Stefan


