[lxml-dev] clean_html
Kev Dwyer
kevin.p.dwyer at gmail.com
Wed Jun 24 14:10:49 CEST 2009
Hello Francesco,
For me, the problem goes away if I define html as a unicode string:
>>> html = u"»"
>>> print clean_html(html)
<p>»</p>
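I suspect this is only a problem if the encoding of the html string passed
to clean_html is undefined or incorrectly defined. In UTF-8, "»" is the two
bytes 0xC2 0xBB; if the parser reads those bytes as Latin-1, they come out as
the two characters "Â»", which would explain the extra Â.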
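A minimal sketch of that suspicion, using only the Python 3 standard library
(in Python 3 a plain str is already unicode, like u"»" in Python 2). It assumes
the byte string was typed in a UTF-8 terminal and that the parser falls back to
Latin-1 when no encoding is declared; neither is confirmed in this thread. The
variable names are illustrative only.

```python
# "»" (U+00BB) becomes two bytes when encoded as UTF-8.
raw = "»".encode("utf-8")  # b'\xc2\xbb'

# Assumed fallback: reading those bytes as Latin-1 maps each byte to its own
# character, 0xC2 -> "Â" and 0xBB -> "»", giving the stray "Â".
misread = raw.decode("latin-1")

# Decoding with the right encoding yields the single intended character,
# which is what passing a unicode string to clean_html achieves.
correct = raw.decode("utf-8")

# ascii() keeps the output printable on terminals that are not UTF-8.
print(ascii(misread), ascii(correct))
```

With lxml, the same idea means handing clean_html unicode rather than raw
bytes, e.g. decoding them yourself first (clean_html(raw.decode("utf-8"))),
just as the u"»" literal does in the session above.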
Kevin
2009/6/24 Francesco <cattafra at hotmail.com>
> I have written the following code:
>
> >>> from lxml.html.clean import clean_html
> >>> html = "»"
> >>> print clean_html(html)
> <p>Â»</p>
>
> I am wondering why I have an extra character (Â) in my output.
> What should I do to avoid that?
>
> Thanks,
>
> Francesco
More information about the lxml-dev mailing list