[lxml-dev] Some XPath questions...

Ian Bicking ianb at colorstudy.com
Tue Jul 3 01:26:06 CEST 2007


Stefan Behnel wrote:
>> So when I use // it works.  Huh.  I prefer descendant-or-self, because I 
>> find it peculiar to do a search from the root when you've called the 
>> method on some particular element (that may not be at the root).
> 
> There's also ".//*".

That seems to be equivalent to //*, i.e., // goes directly to the root 
regardless of context.
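
This is worth double-checking, though. By the XPath 1.0 definition a 
leading // is an absolute path, while .// is relative to the context 
node, so the two would be expected to differ when called on a non-root 
element. A minimal sketch of such a check (the sample document is made 
up for illustration):

```python
from lxml import etree

# Made-up sample document; the context element <b> is not the root.
root = etree.fromstring("<a><b><c/><d/></b><e/></a>")
b = root[0]

# Relative path: only the descendants of the context element <b>.
relative = [el.tag for el in b.xpath(".//*")]

# Spelled-out axis: the context element itself plus its descendants.
with_self = [el.tag for el in b.xpath("descendant-or-self::*")]

# Absolute path: starts at the document root, whatever the context is.
absolute = [el.tag for el in b.xpath("//*")]

print(relative)   # ['c', 'd']
print(with_self)  # ['b', 'c', 'd']
print(absolute)   # ['a', 'b', 'c', 'd', 'e']
```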

>>>>>> div:empty (no children, including text, maybe not including whitespace).
>>>>> Ouch. let me think about that one.
>>>> Yeah, I couldn't figure that one out.  I thought this might work:
>>>>      >>> xpath('E:empty')
>>>>      e[count(./children::*) = 0 and string(.) = '']
>>>> But maybe I don't understand how count() works; this isn't a valid XPath 
>>>> expression.
>>> You want "child" not "children". Using normalize-space(.) instead of
>>> string(.) will exclude whitespace. This does assume you are ignoring
>>> comments and PIs; I believe that's the behavior you want.
>> Cool, that seems to work right.
> 
> What about "e[not(*) and not(normalize-space())]" ?

Yes, that works too.
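
For reference, a small sketch comparing the variants from this thread. 
The sample document and the ids() helper are made up for illustration; 
the expressions are the ones quoted above, with "child" in place of 
"children":

```python
from lxml import etree

# Made-up sample: five <e> elements covering the cases discussed above.
root = etree.fromstring(
    '<root>'
    '<e id="bare"/>'
    '<e id="space">   </e>'
    '<e id="text">hello</e>'
    '<e id="child"><x/></e>'
    '<e id="comment"><!-- note --></e>'
    '</root>'
)

def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in root.xpath(expr)]

# count() form, with normalize-space() so whitespace-only text is ignored.
with_count = ids("e[count(./child::*) = 0 and normalize-space(.) = '']")

# Shorter form using not().
with_not = ids("e[not(*) and not(normalize-space())]")

# string() form: whitespace-only text counts as content here.
strict = ids("e[count(./child::*) = 0 and string(.) = '']")

print(with_count)  # ['bare', 'space', 'comment']
print(with_not)    # ['bare', 'space', 'comment']
print(strict)      # ['bare', 'comment']
```

All three ignore comments, since a comment is neither an element child 
nor part of the element's string value.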

>> One query I'm realizing might be really hard (maybe too hard in XPath) 
>> is *:first-of-type, *:last-of-type, and *:only-of-type, since they match 
>> in a funny sort of way.  You can't really do:
>>
>>    *[count(../*[name() = name()]) = 1]
> 
> You need two expressions here, one to find the node and one to compare it to
> others (note that name() can also take an argument) - but those are really
> tricky, you're right. They may already touch the borders of what XPath can express.

I could probably do it by adding a new function, I suppose; 
css:last-of-type() for instance.  It's not that hard to do in Python, 
after all.
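
A rough sketch of what that could look like, using lxml's extension 
function mechanism (etree.FunctionNamespace). The namespace URI, the 
function body, and the sample document are all assumptions made up for 
illustration, and the context node is passed explicitly as "." to keep 
the sketch simple:

```python
from lxml import etree

# Hypothetical namespace URI for the CSS helper functions.
CSS_NS = "http://example.org/css-functions"

def last_of_type(context, nodes):
    """True if no later sibling has the same tag as the given element."""
    el = nodes[0]  # a node-set argument arrives as a list of elements
    sibling = el.getnext()
    while sibling is not None:
        if sibling.tag == el.tag:
            return False
        sibling = sibling.getnext()
    return True

# Register the function under the (hyphenated) name last-of-type.
# Note that this registry is global to the process.
functions = etree.FunctionNamespace(CSS_NS)
functions["last-of-type"] = last_of_type

# Made-up sample document.
root = etree.fromstring(
    '<r><a id="a1"/><b id="b1"/><a id="a2"/><c id="c1"/><b id="b2"/></r>'
)

matches = root.xpath("*[css:last-of-type(.)]", namespaces={"css": CSS_NS})
result = [el.get("id") for el in matches]
print(result)  # ['a2', 'c1', 'b2']
```

first-of-type would be the same loop with getprevious(), and 
only-of-type the combination of the two.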

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
             | Write code, do good | http://topp.openplans.org/careers

