[lxml-dev] Error (?) with UTF-8 document and Python unicode repr.
Frederik Elwert
felwert at uni-bremen.de
Thu Nov 29 20:56:22 CET 2007
On Thursday, 29.11.2007, at 19:41 +0100, Artur Siekielski wrote:
> > No, I think the better way would be to parse it, look for the encoding
> > (either by looking at <tree>.docinfo.encoding or by looking for the
> > meta tag with find()), and then reparse the unaltered document, now
> > using the "encoding" keyword. This is what Stefan suggests:
> > http://article.gmane.org/gmane.comp.python.lxml.devel/3001/
>
> Hi,
> thanks for the suggestion. But how can I pass the "encoding" keyword?
> Neither etree.parse nor etree.HTMLParser supports it.
Oh, I'm sorry. The "encoding" keyword is only supported by the alpha of
lxml 2.0; I simply overlooked that. So for the time being, serialising
and reparsing might be the best option, but I haven't tried that.
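
For reference, once lxml 2.0 or later is available, the two-pass approach
described in the quoted text could look roughly like the sketch below. It
is untested against the original poster's document and assumes that
etree.HTMLParser accepts the "encoding" keyword (true from lxml 2.0 on).
The helper name, the fallback value and the sample document are made up
for illustration.

```python
import io

from lxml import etree


def reparse_with_declared_encoding(data, fallback="utf-8"):
    """Hypothetical helper: parse twice, forcing the encoding the second time."""
    # First pass: let the parser guess; we only want the encoding it reports.
    first = etree.parse(io.BytesIO(data), etree.HTMLParser())
    encoding = first.docinfo.encoding or fallback  # fallback is an assumption

    # Second pass: reparse the unaltered bytes with the encoding given explicitly.
    parser = etree.HTMLParser(encoding=encoding)
    return etree.parse(io.BytesIO(data), parser), encoding


# Illustrative UTF-8 document that declares its encoding in a meta tag.
html = (
    b"<html><head>"
    b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    b"</head><body><p>Gr\xc3\xbc\xc3\x9fe</p></body></html>"
)

tree, encoding = reparse_with_declared_encoding(html)
text = tree.findtext(".//p")
print(repr(text))
```

If docinfo.encoding turns out not to be reliable for a given document,
the same helper could instead look up the meta tag with find() and take
the charset from its content attribute.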
Cheers,
Frederik