[lxml-dev] lxml and html encodings
Luke Tucker
ltucker at openplans.org
Thu Oct 19 16:48:35 CEST 2006
Hah, jeez.
Erg. Sorry to waste your time, and thanks for your patience. I wasn't
intending to suggest that it should handle malformed input; the broken
HTML was just a mistake on my part. I can definitely understand what
you're saying all the same.
- Luke
> Sorry, but your HTML is very broken, too. It has two <html> tags and two
> contradictory <meta> tags (saying both "us-ascii" and "shift_jis"), so don't
> expect libxml2's HTML parser to magically know what you really meant when you
> wrote it. That's like saying: Ok, I know this function only works for values
> from 1-5, so I'll put in a 99 and complain if it breaks.
>
> If you parse broken HTML and the parser doesn't handle it correctly, the
> reason is your broken HTML, really.
>
> If you think libxml2 should be able to parse this kind of non-HTML, please
> file a bug on the libxml2 parser. There is nothing lxml can do about it.
>
> Stefan
>
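For anyone hitting the same problem: the two defects Stefan points out (more than one <html> tag, and <meta> tags that disagree about the charset) can be checked for before the document is handed to a parser. The following is only a rough sketch using Python's standard-library html.parser, not anything lxml or libxml2 does internally. The names DeclarationAudit and audit, and the sample markup, are made up for illustration and are not the original test file.

```python
from html.parser import HTMLParser


class DeclarationAudit(HTMLParser):
    """Count <html> start tags and collect charsets declared in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.html_tags = 0
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.html_tags += 1
        elif tag == "meta":
            # Short form: <meta charset="...">
            if attrs.get("charset"):
                self.charsets.append(attrs["charset"].lower())
            # Long form: <meta http-equiv="Content-Type"
            #                  content="text/html; charset=...">
            content = attrs.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.lower() == "charset" and value:
                    self.charsets.append(value.strip().lower())


def audit(markup):
    """Return a list of human-readable problems found in the markup."""
    parser = DeclarationAudit()
    parser.feed(markup)
    parser.close()
    problems = []
    if parser.html_tags > 1:
        problems.append("%d <html> tags" % parser.html_tags)
    declared = sorted(set(parser.charsets))
    if len(declared) > 1:
        problems.append("conflicting charsets: " + ", ".join(declared))
    return problems


# Hypothetical sample that mimics the two defects described above.
sample = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">'
    '</head><body><html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=shift_jis">'
    '</head><body>text</body></html></body></html>'
)
print(audit(sample))
```

An empty list means neither defect was found. It does not mean the document is valid HTML, only that these two particular mistakes are absent.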