I have written the following code:

>>> from lxml.html.clean import clean_html
>>> html = "»"
>>> print clean_html(html)
<p>Â»</p>

Why is there an extra character (Â) in my output, and what should I do to avoid it?

Thanks,
Francesco
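
A possible explanation for the extra Â, sketched with the standard library only: in UTF-8 the character » is stored as the two bytes 0xC2 0xBB, and if those bytes are read one at a time as Latin-1, the first becomes Â and the second becomes ». The sketch below assumes the string in the session above is a UTF-8 byte string (as a plain "..." literal is in Python 2 on a UTF-8 terminal); the variable names are illustrative and do not come from lxml.

```python
# Sketch of the suspected mechanism (standard library only, no lxml needed).
original = u"\u00bb"                      # the character »
utf8_bytes = original.encode("utf-8")     # two bytes: 0xC2 0xBB
misread = utf8_bytes.decode("latin-1")    # each byte read as its own character

print(repr(utf8_bytes))
print(repr(misread))                      # u"\u00c2\u00bb", displayed as Â»

# Decoding with the matching codec gives back the single character.
repaired = utf8_bytes.decode("utf-8")
print(repaired == original)
```

If this is indeed the cause, passing a unicode string to clean_html (for example u"»", or html.decode("utf-8") in Python 2) may avoid the stray character. That is an assumption about how the input is being interpreted, not something confirmed against this lxml version.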