[lxml-dev] Annoying interaction between comments and text
Itamar Shtull-Trauring
itamar at itamarst.org
Tue Jun 12 20:28:30 CEST 2007
Let's say I have an element with some text in it. No subelements, just
text. It may have a comment in it, but I really don't want to have to
think about it. In elementtree I can just do:
>>> elementtree.ElementTree.fromstring("<x>hello <!-- hello --> world</x>").text
'hello world'
But, in lxml if I do that I get:
>>> lxml.etree.fromstring("<x>hello <!-- hello --> world</x>").text
'hello '
One needs to use XPath to extract all the text. This is problematic
because it means you can basically *never use the text attribute of
elements*, since someone may have added a comment. Comments have no
semantic meaning, so this is something of a problem. I bet there's lots
and lots of lxml code that would break if someone added a comment inside
an element's text.
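
For reference, here is a minimal sketch of the XPath workaround mentioned above. The helper name text_ignoring_comments is made up for illustration and is not part of lxml. It relies on the XPath string() function, which concatenates the descendant text nodes of an element; comments are not text nodes, so they are skipped. Note that it would also pull in the text of any child elements, which is fine for the "no subelements" case described here.

```python
from lxml import etree


def text_ignoring_comments(element):
    """Return all text inside `element`, skipping comment nodes.

    Hypothetical helper, not an lxml API. XPath string() joins every
    descendant text node, and comments are not text nodes.
    """
    return element.xpath("string()")


root = etree.fromstring("<x>hello <!-- hello --> world</x>")

# .text stops at the first child node, which here is the comment.
print(repr(root.text))                      # 'hello '

# lxml keeps the rest of the text as the comment's tail.
print(repr(root[0].tail))                   # ' world'

# Both pieces joined; the two spaces around the comment are preserved.
print(repr(text_ignoring_comments(root)))   # 'hello  world'
```

The cost is that every place that used to read .text has to call something like this instead, which is exactly the inconvenience being complained about.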