[lxml-dev] Possible bug

Laurence Rowe l at lrowe.co.uk
Fri Mar 20 21:38:25 CET 2009


2009/3/19 Stefan Behnel <stefan_ml at behnel.de>:
> Bob Kline wrote:
>> Before I dig into the work of producing a repro case, would the lxml
>> developers be interested in a bug report if I confirm that the XSL/T
>> parser which comes with the lxml package chokes on the serialized
>> version of an XML tree assembled by the lxml's HTML parser when the
>> original HTML document contains a comment which the XML spec doesn't
>> like (because "--" appears inside the comment)?
>
> So what you do is:
>
> 1) parse an HTML document that contains "--" in a comment
> 2) serialise it to XML, which produces broken XML because of the comment
> value

The serialisation step is not necessary: libxslt is perfectly happy to
work directly on trees parsed by the HTMLParser, e.g.

>>> from lxml import etree
>>> doc = etree.parse(html_file, parser=etree.HTMLParser())
>>> transform = etree.XSLT(etree.parse(transform_file))
>>> result = transform(doc)
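
For anyone who wants to try this without files on disk, here is a minimal
self-contained sketch. It assumes lxml is installed; the HTML document, the
stylesheet and all variable names are made up for illustration and are not
taken from the original report. The HTML contains a comment with "--" in it,
and the tree is handed to the transform without ever being serialised.

```python
import io

from lxml import etree

# Hypothetical input: an HTML page whose comment contains "--",
# which would not be allowed inside a comment in well-formed XML.
HTML_SOURCE = """<html>
  <head><title>Example</title></head>
  <body>
    <!-- a comment with -- inside -->
    <p>Hello</p>
  </body>
</html>"""

# Hypothetical stylesheet: emit the page title as plain text.
XSLT_SOURCE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="//title"/>
  </xsl:template>
</xsl:stylesheet>"""

# Parse the HTML into a tree; no XML serialisation happens here.
doc = etree.parse(io.StringIO(HTML_SOURCE), etree.HTMLParser())

# The comment is kept as a node in the parsed tree.
comments = doc.xpath("//comment()")

# Build the transform and apply it directly to the parsed tree.
transform = etree.XSLT(etree.parse(io.StringIO(XSLT_SOURCE)))
result = transform(doc)

result_text = str(result)
print(result_text)
```

Since the tree goes straight from the HTML parser to libxslt, the XML parser
never gets to see the comment, so its well-formedness check on "--" does not
come into play.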

Laurence

