[lxml-dev] Possible bug
Stefan Behnel
stefan_ml at behnel.de
Fri Mar 20 15:55:47 CET 2009
Bob Kline wrote:
> Stefan Behnel wrote:
>> It's not really "no". It's just that this is a rare case and there is
>> little one can do about it.
>
> Well, you could do what I'm doing: collapse sequences of two or more
> hyphens to single hyphens, and drop leading and trailing hyphens (or
> prefix leading hyphens with a space and append a space to a trailing
> hyphen) in the comment text.
Hmmm, yes, I could imagine using a SAX function that wraps the comment
callback in the HTML parser. But that would require a separate parser
option, since it could break existing code. Many HTML templating languages
use comments for all sorts of things, so if lxml started preprocessing
them, I imagine there would be some rather unfriendly user comments.
BTW, if performance is not your sine-qua-non priority here, you can write
your own parser target that does the same thing in Python space.
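
A minimal sketch of such a parser target, assuming the clean-up rule
described above (collapse runs of hyphens, then drop leading and trailing
hyphens). The names sanitize_comment_text, CommentSanitizingTarget and
parse_html_with_clean_comments are made up for illustration and are not
part of lxml. The target only forwards the start, end, data, comment and
close events to a builder object:

```python
import re


def sanitize_comment_text(text):
    """Make comment text safe for XML serialization (hypothetical helper)."""
    # Collapse every run of two or more hyphens into a single hyphen.
    text = re.sub(r"-{2,}", "-", text)
    # Drop any hyphen left at the very start or end of the comment.
    return text.strip("-")


class CommentSanitizingTarget:
    """Parser target that cleans comments, then forwards all events.

    `builder` is any object with the usual target methods, for example
    an lxml.etree.TreeBuilder instance.
    """

    def __init__(self, builder):
        self._builder = builder

    def start(self, tag, attrib):
        return self._builder.start(tag, attrib)

    def end(self, tag):
        return self._builder.end(tag)

    def data(self, data):
        return self._builder.data(data)

    def comment(self, text):
        # The only event that is modified before being passed on.
        return self._builder.comment(sanitize_comment_text(text))

    def close(self):
        return self._builder.close()


def parse_html_with_clean_comments(html):
    """Parse HTML through the sanitizing target and return the root element."""
    # Imported lazily so the helpers above need only the standard library.
    from lxml import etree

    target = CommentSanitizingTarget(etree.TreeBuilder())
    parser = etree.HTMLParser(target=target)
    return etree.fromstring(html, parser)
```

Since this runs in Python space for every parser event, expect it to be
noticeably slower than the plain HTML parser. Templating code that relies
on the exact comment text would of course see the modified text.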
>> But there's always space for better
>> documentation - contributions very welcome.
>
> Excellent. How about (following the sentence quoted earlier in this
> thread, beginning "It lets libxml2 try its best to return something
> usable ...."):
>
> The result, when serialized with etree.tostring(), will often (but
> not always) be a well-formed XML document.
I'll update it.
Stefan