[lxml-dev] Annoying interaction between comments and text

Itamar Shtull-Trauring itamar at itamarst.org
Tue Jun 12 20:28:30 CEST 2007


Let's say I have an element with some text in it. No subelements, just
text. It may have a comment in it, but I really don't want to have to
think about it. In elementtree I can just do:

>>> elementtree.ElementTree.fromstring("<x>hello <!-- hello --> world</x>").text
'hello  world'

But in lxml, if I do the same thing, I get:
>>> lxml.etree.fromstring("<x>hello <!-- hello --> world</x>").text
'hello '
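The difference seems to come from parsing rather than from the `text` attribute itself. ElementTree's default parser discards comments, so the two text fragments end up merged into a single `.text` value. lxml keeps the comment as a child node, so ' world' is stored as that comment's `.tail`. A minimal sketch using the standard library's `xml.etree.ElementTree` (the successor of the standalone `elementtree` package used above), assuming Python 3.8+ for the `insert_comments` option of `TreeBuilder`:

```python
import xml.etree.ElementTree as ET

doc = "<x>hello <!-- hello --> world</x>"

# Default parser: the comment is dropped while parsing, so the text
# before and after it is merged into one .text value.
print(repr(ET.fromstring(doc).text))  # 'hello  world'

# Python 3.8+: ask the TreeBuilder to keep comments in the tree.
# The text is then split the way lxml splits it: the part after the
# comment becomes the comment's .tail, not part of root.text.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(doc, parser=parser)
print(repr(root.text))     # 'hello '
print(repr(root[0].tail))  # ' world'
```

With comments kept in the tree, the standard library returns the same 'hello ' that lxml does, so the split follows from the tree model (comments as nodes, trailing text as `.tail`) rather than being peculiar to lxml.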

The remaining " world" is stored as the comment's tail, so one needs to use
XPath to extract all of the text. This is problematic because it means you
can basically *never use the text attribute of elements*, since someone may
have added a comment. Since comments carry no semantic meaning, this is
something of a problem. I bet there is a lot of lxml code that would break
if someone added a comment inside an element's text.
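A few ways to read the full text of such an element with lxml are sketched below. `text_without_comments` is a hypothetical helper written for illustration, not part of lxml. XPath's `string()` returns the element's string-value, which concatenates all descendant text nodes. Comment contents are not text nodes, so they are skipped. Note that `string()` also picks up text inside subelements, while the other two options only collect the element's direct text, which matches the "no subelements" case described above.

```python
from lxml import etree


def text_without_comments(el):
    """Hypothetical helper: join el.text with the .tail of each child.

    lxml stores text that follows a comment (or any other child node)
    in that child's .tail, so gathering the tails recovers the direct
    text of `el` while skipping the comment contents themselves.
    """
    parts = [el.text or ""]
    for child in el:  # iteration in lxml includes comment nodes
        parts.append(child.tail or "")
    return "".join(parts)


root = etree.fromstring("<x>hello <!-- hello --> world</x>")

# Option 1: string-value of the element (all descendant text nodes).
print(repr(root.xpath("string()")))         # 'hello  world'

# Option 2: only the direct text() node children, joined together.
print(repr("".join(root.xpath("text()"))))  # 'hello  world'

# Option 3: walk .text and the children's .tail by hand.
print(repr(text_without_comments(root)))    # 'hello  world'
```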


