[lxml-dev] Some XPath questions...
Ian Bicking
ianb at colorstudy.com
Tue Jul 3 01:45:37 CEST 2007
Mike Meyer wrote:
> In <4689898E.9080509 at colorstudy.com>, Ian Bicking <ianb at colorstudy.com> typed:
>> Stefan Behnel wrote:
>>>> So when I use // it works. Huh. I prefer descendant-or-self, because I
>>>> find it peculiar to do a search from the root when you've called the
>>>> method on some particular element (that may not be at the root).
>>> There's also ".//*".
>> That seems to be equivalent to //*, i.e., // goes directly to the root
>> regardless of context.
>
> Not quite. '//*' always goes to the root. './/*' starts at the current
> node and matches from there down. If you always test at the root of
> the document, they'll look the same.
Replacing 'descendant-or-self::' with './/' does change the results for
me. I want to include the current node if it matches; at least to me,
that seems most logical. It was also necessary when I was doing
microformat parsing, as a single element can have multiple roles. It
seems like './/' excludes the current node and only looks at its
descendants.
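
To make the difference concrete, here is a small sketch against a made-up
document (the markup, the `inner` variable and the `classes` helper are
invented for illustration; it assumes lxml is installed). It runs the
three forms from the same non-root context element:

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
root = etree.fromstring(
    '<root>'
    '<div class="a"><div class="b"/><p/></div>'
    '<div class="c"/>'
    '</root>'
)
inner = root[0]  # the <div class="a"> element, which is not the root


def classes(expr):
    """Evaluate expr with `inner` as context; return matched class names."""
    return [el.get("class") for el in inner.xpath(expr)]


# Absolute path: starts at the document root regardless of context.
from_root = classes("//div")

# Relative abbreviated path: descendants of the context node only.
below = classes(".//div")

# Explicit axis: the context node itself plus its descendants.
with_self = classes("descendant-or-self::div")

print(from_root)  # ['a', 'b', 'c']
print(below)      # ['b']
print(with_self)  # ['a', 'b']
```

So with this input, only the explicit 'descendant-or-self::' form picks
up the element the method was called on.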
>>>>>>>> div:empty (no children, including text, maybe not including whitespace).
>>>>>>> Ouch. let me think about that one.
>>>>>> Yeah, I couldn't figure that one out. I thought this might work:
>>>>>> >>> xpath('E:empty')
>>>>>> e[count(./children::*) = 0 and string(.) = '']
>>>>>> But maybe I don't understand how count() works; this isn't a valid XPath
>>>>>> expression.
>>>>> You want "child" not "children". Using normalize-space(.) instead of
>>>>> string(.) will exclude whitespace. This does assume you are ignoring
>>>>> comments and PIs; I believe that's the behavior you want.
>>>> Cool, that seems to work right.
>>> What about "e[not(*) and not(normalize-space())]" ?
>> Yes, that works too.
>
> That's the 'implicit conversion' I was talking about. You're relying
> on 0 and the empty string being false. It's a standard idiom, and
> pythonic, but I'm not sure you want to use it in automatically
> generated code, since it means you can't generalize the code from "has
> 0 children" to "has n children".
In this case it's a fixed expression used for e:empty and nothing else,
so it seems fine. It possibly also makes the resulting expression a bit
easier to trace back to its CSS roots.
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
| Write code, do good | http://topp.openplans.org/careers