[lxml-dev] Error (?) with UTF-8 document and Python unicode repr.

Frederik Elwert felwert at uni-bremen.de
Thu Nov 29 20:56:22 CET 2007


On Thursday, 29.11.2007, at 19:41 +0100, Artur Siekielski wrote:
> > No, I think the better way would be to parse it, look for the encoding
> > (either by looking at <tree>.docinfo.encoding or looking for the
> > meta tag with find()), and then reparse the unaltered document, now
> > using the "encoding" keyword. This is what Stefan suggests:
> > http://article.gmane.org/gmane.comp.python.lxml.devel/3001/
> 
> Hi,
> thanks for the suggestion. But how can I pass the "encoding" keyword?
> Neither etree.parse nor etree.HTMLParser supports it.

Oh, I'm sorry. This is only supported by the alpha of lxml 2.0; I simply
overlooked that. So for the time being, serialisation and reparsing
might be the best option, but I haven't tried that.
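For lxml 2.0 and later, where etree.HTMLParser accepts the "encoding" keyword, the two-pass approach quoted above might look roughly like this. It is a minimal sketch, not code from this thread: the helper names (find_declared_encoding, parse_html) are hypothetical, and it uses the meta-tag lookup rather than docinfo.encoding, assuming the charset is declared either as <meta charset="..."> or inside a Content-Type meta tag.

```python
import re

from lxml import etree

# Hypothetical helper regex: pulls "utf-8" out of "text/html; charset=utf-8".
_CHARSET_RE = re.compile(r"charset\s*=\s*([\w.:-]+)", re.IGNORECASE)


def find_declared_encoding(root):
    """Return the charset declared in a <meta> tag, or None (hypothetical helper)."""
    for meta in root.iter("meta"):
        # HTML5 style: <meta charset="utf-8">
        charset = meta.get("charset")
        if charset:
            return charset.strip()
        # HTML4 style: <meta http-equiv="Content-Type" content="...; charset=...">
        match = _CHARSET_RE.search(meta.get("content", ""))
        if match:
            return match.group(1)
    return None


def parse_html(data):
    """Parse raw HTML bytes, forcing the encoding the document declares."""
    # First pass: parse with the default parser only to discover the encoding.
    first = etree.fromstring(data, etree.HTMLParser())
    encoding = find_declared_encoding(first) if first is not None else None
    if encoding is None:
        # Nothing declared: keep whatever the parser guessed.
        return first
    # Second pass: reparse the unaltered bytes with the encoding set explicitly.
    return etree.fromstring(data, etree.HTMLParser(encoding=encoding))
```

Reparsing the original bytes, rather than patching the first tree, keeps the parser from decoding anything with a wrong guess. The sketch does not guard against a misspelled or unknown charset name in the document.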

Cheers,
Frederik


