[lxml-dev] Proposal: Better html5lib Support
Stefan Behnel
stefan_ml at behnel.de
Mon Jul 14 21:30:05 CEST 2008
Hi,
Armin Ronacher wrote:
> Stefan Behnel <stefan_ml <at> behnel.de> writes:
>
>> I do not use html5lib myself, but I'm happily taking patches if you can fix it
>> up in a more convenient way.
> I created a patch now: http://paste.pocoo.org/show/79376/
Thanks!
> That however has two disadvantages. For one it extends the lxml etree builder
> in a pretty ugly way but that could probably be improved,
I'll take a look at it as soon as I find the time.
> and it also creates
> etree.Comment objects and not etree.html.HtmlComments. The same problem exists
> with the soupparser, mainly because there is no way to generate HtmlComment
> objects without creating a segfault.
Yes. Strictly speaking this isn't a bug (you should use the Comment factory to
create a comment, not the _Comment or HtmlComment classes), but it seems to be a
common misconception, especially among new users. This behaviour will change in
lxml 2.2, where calling an Element class will itself create a new Element.
> (The only way is to use html.fromstring
> with the comment there, but that's an ugly hack).
Using the etree.Comment() factory is just fine and will do the right thing.
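
A minimal sketch of what I mean, assuming a plain etree tree; the element
name and the comment text are only placeholders for illustration:

```python
from lxml import etree

# Build a small tree; "div" is just an illustrative element name.
root = etree.Element("div")

# Create the comment through the factory function rather than by
# instantiating _Comment or HtmlComment directly.
comment = etree.Comment("generated by a tree builder")
root.append(comment)

# Comment nodes report the factory itself as their tag.
print(comment.tag is etree.Comment)
print(etree.tostring(root))
```

A tree builder can call the factory in the same way wherever it currently
tries to instantiate a comment class.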
Stefan