[lxml-dev] problem\bug in xpath compare() with text in tail
Stefan Behnel
stefan_ml at behnel.de
Sat May 24 13:48:11 CEST 2008
Hi,
Matan Ninio wrote:
> This may be just my (limited) understanding of XPath and XML, but I'm getting
> a strange problem when I try to use XPath to search for specific strings in a
> file. Specifically, when I use "//*[contains(text(),'needle')]" to look for
> elements with "needle" in their text, it only works when the string appears in
> the "text" part, but not when it's in the "tail" part. So:
>
> <prompt> e=etree.HTML("<html><body>inbody<h5>text</h5>tail</body></html>")
>
> <prompt> e.xpath("//text()")
> ['inbody', 'text', 'tail']
>
> <prompt> e.xpath("//*[contains(text(),'text')]//text()")
> ['text']
>
> ---- works fine, but
>
> <prompt> e.xpath("//*[contains(text(),'tail')]//text()")
> []
>
> ---- does not.
>
> Is it just that I need to use a different function/attribute for the tail
> (instead of text())?
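The tail text is not inside the element, so it's non-trivial to search for it
in XPath. In XPath terms it is a following-sibling text node of the element,
not a child, so text() evaluated on the element never selects it. You can
either iterate over all nodes and check .tail yourself, or do this (untested)
to reduce the overhead on the Python side: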

    for el in e.xpath("//*[contains(following-sibling::text(),'tail')]"):
        # The XPath only narrows the candidates; el.tail may still be None
        if el.tail and 'tail' in el.tail:
            ...

Do some testing to find out which is faster for your data.
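
As a rough sketch of the two options above, not benchmarked: the sample
document is the one from the question, while the `needle` variable and the
stricter `following-sibling::node()[1][self::text()]` predicate are
assumptions added for illustration.

```python
from lxml import etree

doc = etree.HTML("<html><body>inbody<h5>text</h5>tail</body></html>")
needle = "tail"

# Option 1: walk every node in Python and inspect .tail directly.
# .tail is None when no text follows the element, hence the guard.
by_iteration = [el for el in doc.iter() if el.tail and needle in el.tail]

# Option 2: let XPath narrow down the candidates.
# following-sibling::node()[1] is the node right after the element;
# [self::text()] keeps it only if that node is a text node.
expr = "//*[following-sibling::node()[1][self::text()][contains(., $needle)]]"
by_xpath = doc.xpath(expr, needle=needle)

print([el.tag for el in by_iteration])
print([el.tag for el in by_xpath])
```

For this input both lists contain only the h5 element. The stricter predicate
looks only at the node directly after the element, which is where .tail comes
from, so the extra check in Python should not be needed here.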
Stefan