[lxml-dev] clean_html
Francesco
cattafra at hotmail.com
Fri Jun 26 11:48:57 CEST 2009
Thank you for your answer...
I will try the ".docinfo.encoding" property.
How can I write UTF-8 output in general? I have tried
output.write(unicode(result)) and output.write(result.encode('utf-8')).
With the first I got "UnicodeEncodeError: 'ascii' codec can't encode
character u'\xbb' in position 17: ordinal not in range(128)",
while with the second I got an extra character "Â" before "»".
The value of result is u'La Repubblica.it \xbb Homepage'.
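For reference, here is a minimal sketch of what the second attempt probably produces. It assumes the output is correct UTF-8 and that the program displaying it decodes the bytes as Latin-1, which is a guess and not something confirmed in this thread. The names encoded, misread, buffer and writer are only illustrative, and the in-memory buffer stands in for the real output file.

```python
import io

result = u'La Repubblica.it \xbb Homepage'

# Encoding to UTF-8 turns the single character U+00BB into two bytes, C2 BB.
encoded = result.encode('utf-8')

# Assumption: the viewer decodes those bytes as Latin-1 instead of UTF-8.
# Byte C2 then shows up as an extra character in front of the guillemet.
misread = encoded.decode('latin-1')

# Alternative: let a text wrapper do the encoding, so unicode strings
# can be written directly without calling .encode() each time.
buffer = io.BytesIO()  # hypothetical stand-in for the real output file
writer = io.TextIOWrapper(buffer, encoding='utf-8')
writer.write(result)
writer.flush()
written = buffer.getvalue()
```

If this assumption holds, the bytes on disk are fine and only the tool used to view them needs to be told that the file is UTF-8.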
Thanks,
Francesco