[lxml-dev] xpath on text nodes
Stefan Behnel
stefan_ml at behnel.de
Thu Apr 30 09:42:00 CEST 2009
Jamie Norrish wrote:
>> There is no such concept as a text node in lxml.etree.
>
> Okay, but the string results of an XPath selecting text nodes in the XML
> have additional attributes - it just seems a pity that an xpath method
> isn't one of them.
It would be rarely used, I'd say. What sort of interesting XPath queries
could you possibly do on a node that has no children, no attributes, and
no tag name or namespace? Also, XPath queries can return Elements and
(special) strings, but also plain numbers and boolean values. So you'd
still not have a common interface for all possible result types.
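
To illustrate the different result types, here is a minimal sketch. The
sample document is made up for the example; getparent(), is_text and
is_tail are the extra attributes that lxml puts on string results.

```python
from lxml import etree

# Made-up sample document, for illustration only.
root = etree.fromstring("<p>Hello <b>big</b> world</p>")

# Text results are "smart" strings: plain strings plus a few extras.
texts = root.xpath("//text()")
for s in texts:
    parent = s.getparent()                    # Element that holds this string
    kind = "text" if s.is_text else "tail"    # stored as .text or as .tail?
    print(repr(str(s)), "is the", kind, "of", parent.tag)

# Other XPath expressions return entirely different Python types.
elements = root.xpath("//b")           # list of Elements
count = root.xpath("count(//b)")       # float
has_b = root.xpath("boolean(//b)")     # bool
```

Note that for tail text, getparent() returns the Element whose .tail holds
the string, not the enclosing parent Element.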
>> That sounds a lot like you should do that in Python by using iterwalk()
>> and collecting .text and .tail attributes of Elements, not by using
>> XPath.
>
> Well, I like XPath. :) In fact I already have an implementation of the
> use case that, while slightly suboptimal, is sufficient - it just seemed
> like one obvious way of doing it better was to use XPath. I shall
> investigate using iterwalk instead.
This should basically be a no-brainer with iterwalk(). You iterate over
start and end events and just collect the .text values on start and the
.tail values on end. Put them in a list, count the total character length
on the way, break when it's long enough and ''.join() the list.
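
A rough sketch of what I mean, untested against your data. The function
name leading_text and the min_length parameter are invented for this
example; only iterwalk() and the .text/.tail attributes come from lxml.

```python
from lxml import etree

def leading_text(root, min_length):
    """Collect text in document order until at least min_length characters.

    Hypothetical helper, not part of lxml.
    """
    parts = []
    total = 0
    for event, element in etree.iterwalk(root, events=("start", "end")):
        # .text comes right after the start tag, .tail right after the end tag.
        chunk = element.text if event == "start" else element.tail
        if chunk:
            parts.append(chunk)
            total += len(chunk)
            if total >= min_length:
                break
    return "".join(parts)

# Made-up sample document, for illustration only.
root = etree.fromstring("<p>Hello <b>big</b> world</p>")
print(leading_text(root, 8))
```

The result can be longer than min_length because whole chunks are
collected, so slice it if you need an exact length. Also, if you pass in an
Element from the middle of a tree, its own .tail is picked up on the final
end event.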
Stefan