[lxml-dev] Possible bug

Stefan Behnel stefan_ml at behnel.de
Fri Mar 20 15:55:47 CET 2009


Bob Kline wrote:
> Stefan Behnel wrote:
>> It's not really "no". It's just that this is a rare case and there is
>> little one can do about it.
>
> Well, you could do what I'm doing: collapse sequences of two or more
> hyphens to single hyphens, and drop leading and trailing hyphens (or
> prefix leading hyphens with a space and append a space to a trailing
> hyphen) in the comment text.

Hmmm, yes, I could imagine using a SAX function that wraps the comment
callback in the HTML parser. But that would require a separate parser
option, as it could break existing code. There are many HTML templating
languages that use comments for all sorts of stuff, so if lxml starts
preprocessing them, I imagine that there will be some rather unfriendly
user comments.

BTW, if performance is not a sine qua non here, you can write your own
parser target that does the same thing in Python space.
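A minimal sketch of such a target, combined with the hyphen cleanup Bob
describes above (collapse runs of hyphens, then strip or pad leading and
trailing ones). It assumes the wrapped builder implements
start/end/data/comment/close, as lxml's etree.TreeBuilder does. The names
sanitize_comment and CommentCleaningTarget are hypothetical, and callbacks
such as pi() and doctype() are deliberately not defined, so the parser will
not deliver those events.

```python
import re

# One or more consecutive "--" sequences, i.e. any run of 2+ hyphens.
_HYPHEN_RUN = re.compile(r"-{2,}")


def sanitize_comment(text, pad=False):
    """Return comment text that is legal inside an XML comment.

    Runs of two or more hyphens are collapsed to a single hyphen. Leading
    and trailing hyphens are then either stripped (default) or, with
    pad=True, separated from the comment delimiters by a space.
    """
    text = _HYPHEN_RUN.sub("-", text or "")
    if pad:
        if text.startswith("-"):
            text = " " + text
        if text.endswith("-"):
            text = text + " "
        return text
    return text.strip("-")


class CommentCleaningTarget:
    """Parser target that forwards every event to a wrapped builder,
    cleaning comment text on the way through.

    Assumption: the builder has start/end/data/comment/close methods,
    like lxml's etree.TreeBuilder.
    """

    def __init__(self, builder):
        self._builder = builder

    def start(self, tag, attrib):
        return self._builder.start(tag, attrib)

    def end(self, tag):
        return self._builder.end(tag)

    def data(self, data):
        return self._builder.data(data)

    def comment(self, text):
        # The only real change: comment text is cleaned before building.
        return self._builder.comment(sanitize_comment(text))

    def close(self):
        return self._builder.close()
```

With lxml, the wrapper would be passed as
etree.HTMLParser(target=CommentCleaningTarget(etree.TreeBuilder())) and that
parser handed to etree.fromstring(); whatever close() returns, here the root
element, becomes the parse result. Every parse event goes through Python in
this setup, which is where the performance cost comes from.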


>> But there's always space for better
>> documentation - contributions very welcome.
>
> Excellent.  How about (following the sentence quoted earlier in this
> thread, beginning "It lets libxml2 try its best to return something
> usable ...."):
>
>     The result, when serialized with etree.tostring(), will often (but
>     not always) be a well-formed XML document.

I'll update it.
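Since the proposed wording says the serialized result will only often be
well-formed, callers that need real XML may want to check the output of
etree.tostring() themselves. A minimal sketch, assuming the standard
library's expat-based ElementTree parser is strict enough for the purpose;
is_well_formed_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data):
    """Return True if `data` (str or bytes) parses as well-formed XML.

    The expat-based parser rejects, among other things, comments that
    contain "--" or end with "-".
    """
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```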

Stefan


