[lxml-dev] lxml and html encodings

Luke Tucker ltucker at openplans.org
Thu Oct 19 16:48:35 CEST 2006


hah jeez,

erg. Sorry to waste your time, and thanks for your patience. I wasn't
intending to suggest it should handle malformed stuff; that was just a
mistake. But I can definitely understand what you're saying all the
same.

- Luke 

> Sorry, but your HTML is very broken, too. It has two <html> tags and two
> contradictory <meta> tags (saying both "us-ascii" and "shift_jis"), so don't
> expect libxml2's HTML parser to magically know what you really meant when you
> wrote it. That's like saying: Ok, I know this function only works for values
> from 1-5, so I'll put in a 99 and complain if it breaks.
> 
> If you parse broken HTML and the parser doesn't handle it correctly, the
> reason is your broken HTML, really.
> 
> If you think libxml2 should be able to parse this kind of non-HTML, please
> file a bug on the libxml2 parser. There is nothing lxml can do about it.
> 
> Stefan
> 


