[lxml-dev] xpath on text nodes

Jamie Norrish jamie at artefact.org.nz
Sun May 31 08:40:33 CEST 2009


Stefan Behnel wrote:

> This sounds like your algorithm is already more complex than a simple "any
> text node preceding the one that matches". That convinces me that an API
> based solution will be a lot more flexible than anything you could scratch
> out of XPath. It would allow you to special case certain tag types, for
> example, or to notice when you cross parent boundaries.

Well, the original plan didn't really call for much special casing of
particular elements, but now that things are working I'll likely add
such cases as I think of them. I've changed the approach completely, to
use XSLT to transform the entire document into something that has the
context handled appropriately (and using XPath on text nodes :). It
takes two transformations (the second one to handle ordering issues with
the preceding context, and to do a little cleanup of whitespace), but it
is more than an order of magnitude faster than what I had before.
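
For readers who want to see the shape of such a two-pass pipeline, here
is a minimal sketch. It is not the stylesheets described above: the
output element names (segments, segment, before, match, after), the
function name context_segments and the sample document are all invented
for illustration. The first pass records each non-blank text node with
its nearest preceding and following text node, and the second pass only
normalizes whitespace.

```python
from lxml import etree

XSL_NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'

# Pass 1 (hypothetical layout): one <segment> per non-blank text node,
# carrying the nearest preceding and following non-blank text nodes.
PASS_ONE = etree.XSLT(etree.XML(f"""\
<xsl:stylesheet version="1.0" {XSL_NS}>
  <xsl:template match="/">
    <segments>
      <xsl:apply-templates select="//text()[normalize-space()]"/>
    </segments>
  </xsl:template>
  <xsl:template match="text()">
    <segment>
      <before><xsl:value-of
          select="preceding::text()[normalize-space()][1]"/></before>
      <match><xsl:value-of select="."/></match>
      <after><xsl:value-of
          select="following::text()[normalize-space()][1]"/></after>
    </segment>
  </xsl:template>
</xsl:stylesheet>
"""))

# Pass 2: identity transform, except that text is whitespace-normalized.
PASS_TWO = etree.XSLT(etree.XML(f"""\
<xsl:stylesheet version="1.0" {XSL_NS}>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>
"""))


def context_segments(doc):
    """Run both passes and return (before, match, after) tuples."""
    result = PASS_TWO(PASS_ONE(doc))
    return [
        (seg.findtext("before"), seg.findtext("match"), seg.findtext("after"))
        for seg in result.getroot()
    ]


# Made-up sample input, not taken from the original thread.
sample = etree.XML("<p>One <b>two</b>  three</p>")
segments = context_segments(sample)
```

The output of one etree.XSLT call can be passed straight to the next
one, so chaining the passes needs no serialization step in between.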

I'm not sure why I didn't go down that route in the first place, but now
that I have I'm very happy. And of course it's great that XSLT is so
easy to use with lxml.

Oh, I also tried using .getparent() and some logic to get the equivalent
of preceding::text()[1] and following::text()[1], but it turned out (not
surprisingly, given the complexities of that approach) to be marginally
slower than what I had.
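
As a point of comparison, the XPath side of this can be sketched as
follows. This is an illustration under assumptions, not the code that
was benchmarked: the function name text_neighbours, the $needle
variable and the sample document are made up. In lxml, text() results
come back as "smart strings" that offer getparent(), is_text and
is_tail, but they cannot serve as the context node of a further XPath
call, so the neighbour lookup is written as one expression starting
from the tree.

```python
from lxml import etree

# First text node (in document order) that contains $needle.
FIND = etree.XPath("(//text()[contains(., $needle)])[1]")
# Its nearest neighbours, in the same spirit as preceding::text()[1]
# and following::text()[1].
BEFORE = etree.XPath(
    "(//text()[contains(., $needle)])[1]/preceding::text()[1]")
AFTER = etree.XPath(
    "(//text()[contains(., $needle)])[1]/following::text()[1]")


def text_neighbours(root, needle):
    """Return (before, match, after); missing pieces are None."""
    def first(nodes):
        return nodes[0] if nodes else None

    return (
        first(BEFORE(root, needle=needle)),
        first(FIND(root, needle=needle)),
        first(AFTER(root, needle=needle)),
    )


# Made-up sample input, not taken from the original thread.
root = etree.XML("<p>One <b>two</b> three</p>")
before, match, after = text_neighbours(root, "two")

# Smart strings remember where they came from in the tree.
match_parent = match.getparent().tag   # element holding the text
after_is_tail = after.is_tail          # True when the text is a .tail
```

Note that for tail text, getparent() returns the element the text
follows (here the b element), not the enclosing p, which is one of the
things any hand-written .getparent() logic has to account for.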

Jamie