[lxml-dev] Some XPath questions...

Ian Bicking ianb at colorstudy.com
Tue Jul 3 01:45:37 CEST 2007


Mike Meyer wrote:
> In <4689898E.9080509 at colorstudy.com>, Ian Bicking <ianb at colorstudy.com> typed:
>> Stefan Behnel wrote:
>>>> So when I use // it works.  Huh.  I prefer descendant-or-self, because I 
>>>> find it peculiar to do a search from the root when you've called the 
>>>> method on some particular element (that may not be at the root).
>>> There's also ".//*".
>> That seems to be equivalent to //*, i.e., // goes directly to the root 
>> regardless of context.
> 
> Not quite. '//*' always goes to the root. './/*' starts at the current
> node and matches from there down. If you always test at the root of
> the document, they'll look the same.

Replacing 'descendant-or-self::' with './/' does change the results for 
me, though.  I want to include the current node if it matches; at least 
to me, that seems most logical.  It was also necessary when I was doing 
microformat parsing, as a single element can have multiple roles.  It 
seems like .// excludes the current node and only looks at its 
descendants.
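
To make the difference concrete, here is a minimal sketch, assuming 
lxml.etree and a made-up three-div document (the element names, the 
class values and the classes() helper are just for illustration).  It 
runs the three forms from a context element that is not the root:

```python
from lxml import etree

# Hypothetical document: one outer div with a nested div, plus a sibling div.
root = etree.fromstring(
    "<root><div class='a'><div class='b'/></div><div class='c'/></root>"
)
ctx = root[0]  # <div class='a'>, deliberately not the document root


def classes(expr):
    """Evaluate expr with ctx as the context node; return class attributes."""
    return [el.get("class") for el in ctx.xpath(expr)]


from_root = classes("//div")                    # absolute: searches the whole document
descendants_only = classes(".//div")            # relative: below ctx, never ctx itself
with_self = classes("descendant-or-self::div")  # relative: ctx itself plus descendants

print(from_root, descendants_only, with_self)
```

Only the descendant-or-self:: form returns the context element itself, 
which is the behaviour I want when the element the method was called on 
can also be a match.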

>>>>>>>> div:empty (no children, including text, maybe not including whitespace).
>>>>>>> Ouch. let me think about that one.
>>>>>> Yeah, I couldn't figure that one out.  I thought this might work:
>>>>>>      >>> xpath('E:empty')
>>>>>>      e[count(./children::*) = 0 and string(.) = '']
>>>>>> But maybe I don't understand how count() works; this isn't a valid XPath 
>>>>>> expression.
>>>>> You want "child" not "children". Using normalize-space(.) instead of
>>>>> string(.) will exclude whitespace. This does assume you are ignoring
>>>>> comments and PIs; I believe that's the behavior you want.
>>>> Cool, that seems to work right.
>>> What about "e[not(*) and not(normalize-space())]" ?
>> Yes, that works too.
> 
> That's the 'implicit conversion' I was talking about. You're relying
> on 0 and the empty string being false. It's a standard idiom, and
> pythonic, but I'm not sure you want to use it in automatically
> generated code, since it means you can't generalize the code from "has
> 0 children" to "has n children".

In this case it's a fixed expression used only for e:empty, so it seems 
fine.  It possibly also makes the resulting expression a bit easier to 
recognize from its CSS roots.
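
For reference, a small sketch of how the two spellings discussed above 
behave, assuming lxml.etree and an invented test document (the ids and 
variable names are only for illustration):

```python
from lxml import etree

# Hypothetical cases: truly empty, whitespace only, text, child element, comment only.
doc = etree.fromstring(
    "<r>"
    "<e id='1'/>"
    "<e id='2'>  </e>"
    "<e id='3'>text</e>"
    "<e id='4'><b/></e>"
    "<e id='5'><!-- note --></e>"
    "</r>"
)

# Short form, relying on XPath's implicit conversion to boolean.
IMPLICIT = "//e[not(*) and not(normalize-space())]"
# Explicit form, with "child" rather than "children".
EXPLICIT = "//e[count(child::*) = 0 and normalize-space(.) = '']"

implicit_ids = [el.get("id") for el in doc.xpath(IMPLICIT)]
explicit_ids = [el.get("id") for el in doc.xpath(EXPLICIT)]

print(implicit_ids, explicit_ids)
```

Both forms treat whitespace-only and comment-only elements as empty, 
which matches the "ignoring comments and PIs" assumption quoted above.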


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
             | Write code, do good | http://topp.openplans.org/careers
