[lxml-dev] Some XPath questions...

Stefan Behnel stefan_ml at behnel.de
Sat Jun 30 20:34:24 CEST 2007


Hi Ian,

If this is supposed to go into lxml.html (or maybe something like lxml.css),
please don't call your function "xpath()". That's the name of the XPath
evaluation method in etree. Consider calling it "build_xpath()",
"css_to_xpath()" or something similar, depending on the context you provide
it in.

Ian Bicking wrote:
> Mike Meyer wrote:
>> In <468579E3.7010802 at colorstudy.com>, Ian Bicking <ianb at colorstudy.com> typed:
>>> Thanks, very helpful.  I'm guessing it was an oversight that you didn't 
>>> copy the list...
>> I wasn't sure which way to go.
> 
> Without CC'ing people won't know you've already answered my questions.

And without CC'ing the list, the mail won't get archived, people won't be able
to find the discussion later and will keep asking the same questions over and
over again. :)

Oh, and: people won't even be able to comment on what you (Mike) propose as a
solution and you won't be able to learn anything either, in case there's a
better solution.


> So when I use // it works.  Huh.  I prefer descendant-or-self, because I 
> find it peculiar to do a search from the root when you've called the 
> method on some particular element (that may not be at the root).

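There's also ".//*", which is relative to the element you call it on. It
selects that element's descendants, but not the element itself.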
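
To make the difference concrete, here is a small sketch using lxml's
"xpath()" method on a non-root element. The sample document and the variable
names are made up for illustration.

```python
from lxml import etree

# Made-up sample tree: <a> is not the root element.
root = etree.fromstring("<root><a><b/><c/></a><d/></root>")
a = root[0]

# "//*" is an absolute path: it starts at the document root,
# even though xpath() is called on <a>.
from_root = [el.tag for el in a.xpath("//*")]

# ".//*" is relative: only the descendants of <a>, without <a> itself.
relative = [el.tag for el in a.xpath(".//*")]

# "descendant-or-self::*" is relative too, but includes <a> itself.
with_self = [el.tag for el in a.xpath("descendant-or-self::*")]

print(from_root)   # ['root', 'a', 'b', 'c', 'd']
print(relative)    # ['b', 'c']
print(with_self)   # ['a', 'b', 'c']
```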


>>>>> div:empty (no children, including text, maybe not including whitespace).
>>>> Ouch. let me think about that one.
>>> Yeah, I couldn't figure that one out.  I thought this might work:
>>>      >>> xpath('E:empty')
>>>      e[count(./children::*) = 0 and string(.) = '']
>>> But maybe I don't understand how count() works; this isn't a valid XPath 
>>> expression.
>> You want "child" not "children". Using normalize-space(.) instead of
>> string(.) will exclude whitespace. This does assume you are ignoring
>> comments and PIs; I believe that's the behavior you want.
> 
> Cool, that seems to work right.

What about "e[not(*) and not(normalize-space())]" ?
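
A quick way to check this is to run both variants against a few cases. The
sketch below uses a made-up document, and it assumes that whitespace-only and
comment-only elements should count as empty, as discussed above.

```python
from lxml import etree

# Made-up test document covering the interesting cases.
doc = etree.fromstring(
    "<root>"
    "<e/>"                # no content at all
    "<e>  </e>"           # whitespace only
    "<e>text</e>"         # real text
    "<e><child/></e>"     # child element
    "<e><!-- c --></e>"   # comment only
    "</root>"
)
children = list(doc)

short_form = "e[not(*) and not(normalize-space())]"
long_form = "e[count(./child::*) = 0 and normalize-space(.) = '']"

# Record the positions of the matching <e> elements for each expression.
short_hits = [children.index(el) for el in doc.xpath(short_form)]
long_hits = [children.index(el) for el in doc.xpath(long_form)]

print(short_hits)  # [0, 1, 4]
print(long_hits)   # [0, 1, 4]
```

Both forms ignore comments and PIs, because "*" only matches elements and the
string value of an element is built from its text nodes only.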


> One query I'm realizing might be really hard (maybe too hard in XPath) 
> is *:first-of-type, *:last-of-type, and *:only-of-type, since they match 
> in a funny sort of way.  You can't really do:
> 
>    *[count(../*[name() = name()) = 1]

You need two expressions here, one to find the node and one to compare it to
others (note that name() can also take an argument) - but those are really
tricky, you're right. They may already be at the limits of what XPath can
express.


> But it's kind of what *:only-of-type means.  Or:
> 
>    *[count(following-sibling::name()) = 0 and
>      count(previous-sibling::name()) = 0]
> 
> You just can't use name() that way.  Hmm... well, it's not that 
> important of a query to me, I guess, so maybe I'll just catch it and 
> give an error.

But you can call "name()" with an argument - although not usefully with a
node-set of several nodes (it will just work on the first entry in document
order and ignore the rest in that case).
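
As a rough illustration: when the element name is known up front (so not
"*"), only-of-type can be written with the sibling axes. Note that the axis
is called "preceding-sibling" in XPath. The helper name
"only_of_type_xpath()" and the sample document are made up for this sketch,
and it does not solve the "*:only-of-type" case.

```python
from lxml import etree

# Made-up sample: one <p>, two <span>, one <em>.
doc = etree.fromstring(
    "<root><p>one</p><span>a</span><span>b</span><em>x</em></root>"
)

def only_of_type_xpath(tag):
    # Hypothetical helper: works for a concrete tag name only, not for "*".
    return (
        "{0}[not(preceding-sibling::{0}) and not(following-sibling::{0})]"
        .format(tag)
    )

only_p = [el.tag for el in doc.xpath(only_of_type_xpath("p"))]
only_span = [el.tag for el in doc.xpath(only_of_type_xpath("span"))]

# name() with an argument: the name of the parent of the first <span>.
parent_name = doc.xpath("name(span[1]/..)")

# With several nodes in the argument, name() only looks at the first one
# in document order.
first_name = doc.xpath("name(*)")

print(only_p)       # ['p']
print(only_span)    # []
print(parent_name)  # root
print(first_name)   # p
```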

Stefan

