[lxml-dev] Error (?) with UTF-8 document and Python unicode repr.

Artur Siekielski artur.siekielski at gmail.com
Thu Nov 29 21:05:21 CET 2007


Frederik Elwert wrote:
> On Thursday, 29.11.2007, at 19:41 +0100, Artur Siekielski wrote:
>>> No, I think the better way would be to parse it, look for the encoding
>>> (either by looking at <tree>.docinfo.encoding or looking for the
>>> meta tag with find()), and then reparse the unaltered document, now
>>> using the "encoding" keyword. This is what Stefan suggests:
>>> http://article.gmane.org/gmane.comp.python.lxml.devel/3001/
>> Hi,
>> thanks for the suggestion. But how can I pass the "encoding" keyword?
>> Neither etree.parse nor etree.HTMLParser supports it.
> 
> Oh, I'm sorry. This is only supported by the alpha of lxml 2.0. I simply
> overlooked that. So for the time being, serialisation and reparsing
> might be the best option, but I haven't tried that.
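
For reference, here is a rough sketch of how I understand the two-pass
approach quoted above. It assumes lxml 2.0 or later, where etree.HTMLParser
accepts the "encoding" keyword. The helper names declared_charset and
parse_html and the "utf-8" fallback are my own placeholders, not part of
lxml, and the first pass only looks at the meta tag rather than at
docinfo.encoding:

```python
from io import BytesIO

from lxml import etree


def declared_charset(tree):
    """Return the charset named in a <meta ... content="...; charset=X"> tag,
    lowercased, or None if no such tag is found (hypothetical helper)."""
    for meta in tree.findall(".//meta"):
        content = meta.get("content", "").lower()
        if "charset=" in content:
            # Keep only what follows "charset=" and trim stray punctuation.
            return content.split("charset=")[-1].strip(" ;\"'")
    return None


def parse_html(data, fallback="utf-8"):
    """Parse raw HTML bytes twice: once to discover the declared encoding,
    then again from the unaltered bytes with that encoding made explicit."""
    first_pass = etree.parse(BytesIO(data), etree.HTMLParser())
    encoding = declared_charset(first_pass) or fallback  # fallback is an assumption
    parser = etree.HTMLParser(encoding=encoding)
    return etree.parse(BytesIO(data), parser)
```

The second parse starts again from the original bytes, so nothing decoded
during the first pass is carried over.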

How stable is the 2.0 alpha? I'm using lxml to parse HTML and traverse the
parsed tree with the etree API and XPath.

