[lxml-dev] clean_html

Francesco cattafra at hotmail.com
Fri Jun 26 11:48:57 CEST 2009


Thank you for your answer...

I will try the ".docinfo.encoding" property.

How can I get UTF-8 output in general? I have tried both
output.write(unicode(result)) and output.write(result.encode('utf-8')).

With the first I get:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in
position 17: ordinal not in range(128)"
With the second there is no error, but an extra character "Â" appears
before "»" in the output.

result is u'La Repubblica.it \xbb Homepage'
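
A minimal sketch of what may be happening with the second attempt, assuming
the extra "Â" comes from the UTF-8 bytes being displayed as Latin-1 (that is
a guess about the viewer, not something confirmed here). The temporary file
path and the variable names other than result are made up for illustration:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

result = u'La Repubblica.it \xbb Homepage'

# Encoding to UTF-8 turns U+00BB into the two bytes 0xC2 0xBB.
encoded = result.encode('utf-8')

# Assumption: the program showing the output decodes it as Latin-1.
# The byte 0xC2 then shows up as an extra character before the guillemet.
misread = encoded.decode('latin-1')

# Decoding the same bytes as UTF-8 gives back the original text.
roundtrip = encoded.decode('utf-8')

# One way to write UTF-8 without encoding by hand: open the file with an
# explicit encoding and write the unicode string directly.
# (hypothetical path, used only for this example)
path = os.path.join(tempfile.gettempdir(), 'clean_html_output_example.txt')
with io.open(path, 'w', encoding='utf-8') as output:
    output.write(result)

# Reading the raw bytes back shows what actually landed in the file.
with io.open(path, 'rb') as raw:
    written = raw.read()
```

If that assumption holds, result.encode('utf-8') is already producing correct
bytes, and the thing to change is the encoding used to view the output.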

Thanks,

Francesco 
