[lxml-dev] Unicode oddness

Adam dood at zworg.com
Wed Apr 8 12:27:13 CEST 2009


Stefan Behnel <stefan_ml <at> behnel.de> writes:

> Your HTML snippet lacks a <meta> tag, so the HTMLParser has no way of
> knowing what encoding your HTML snippet uses. It therefore falls back to
> assuming Latin-1. If your snippet was encoded in Latin-1, you'd be quite
> happy about this default.
> 
> If you know the encoding in advance, you can create your own parser
> instance and pass it the "encoding" keyword option. 

Of course! Thank you, I had a feeling I was overlooking something simple.
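
For anyone finding this in the archive, here is a minimal sketch of that suggestion. It assumes the snippet is UTF-8 encoded bytes with no <meta> declaration; the snippet content and variable names are made up for illustration.

```python
from lxml import etree

# Hypothetical input: UTF-8 bytes with no <meta> charset declaration.
snippet = u"<p>caf\u00e9</p>".encode("utf-8")

# Pass the known encoding to the parser instead of letting it fall back
# to its default guess.
parser = etree.HTMLParser(encoding="utf-8")
root = etree.fromstring(snippet, parser)

# The HTML parser wraps the fragment in <html><body>, so search for <p>.
text = root.findtext(".//p")
```

With the encoding stated explicitly, the non-ASCII character should come back as a single Unicode character rather than being decoded byte by byte under the fallback described in the quoted reply.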


