[lxml-dev] Proposal: Better html5lib Support

Armin Ronacher armin.ronacher at active-4.com
Sun Jul 13 22:32:30 CEST 2008


Hi,

Stefan Behnel <stefan_ml <at> behnel.de> writes:

> I do not use html5lib myself, but I'm happily taking patches if you can fix it
> up in a more convenient way.
I created a patch now: http://paste.pocoo.org/show/79376/

It has two disadvantages, however.  First, it extends the lxml etree builder
in a pretty ugly way, though that could probably be improved.  Second, it
creates etree.Comment objects rather than lxml.html.HtmlComment objects.  The
soupparser has the same problem, mainly because there is no way to create an
HtmlComment object directly without causing a segfault.  (The only way is to
call html.fromstring on markup that contains the comment, but that's an ugly
hack.)
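
To illustrate the difference between the two comment types, here is a minimal
sketch.  It assumes a recent lxml release (the 2008 version may behave
differently), and the wrapping <div> and the variable names are my own choices
for illustration, not part of the patch.  The first comment comes from the
plain etree factory; the second is created by the HTML parser and then picked
out of the parsed tree, which is the workaround described above.

```python
from lxml import etree
import lxml.html

# A comment built through the plain etree factory is a generic comment node.
plain = etree.Comment(" note ")

# Workaround: let the HTML parser create the comment, then pick it out.
# Wrapping the comment in a <div> is an assumption made for this sketch.
wrapper = lxml.html.fromstring("<div><!-- note --></div>")
parsed = wrapper[0]

# Compare both objects against the HTML-specific comment class.
print(isinstance(plain, lxml.html.HtmlComment))
print(isinstance(parsed, lxml.html.HtmlComment))
```

Only the parsed comment should turn out to be an HtmlComment instance, which
is why a tree builder that calls etree.Comment ends up with the generic type.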


Regards,
Armin


