[lxml-dev] Ignore namespace when parsing

Aaron Maxwell amax at redsymbol.net
Sat May 2 00:30:06 CEST 2009


On Friday 01 May 2009 11:00:29 am John Lovell wrote:
> Aaron:
>
> It sounds to me like you could use an xpath query.
> rootElement.xpath('//*[local-name() = "Child1"]')
> http://codespeak.net/lxml/xpathxslt.html

Thanks, that does work fine.

My actual problem is somewhat more complex than the simplistic example I gave, 
however.  The structure of the XML document is more like this (lots of the 
actual document is excised):
{{{
<ItemLookupResponse 
xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
  <OperationRequest>
  <Items>
    <Item>
      <ASIN>0521545668</ASIN>
      <OfferSummary>
         (snip)
      </OfferSummary>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>7517</Amount>
            </Price>
(snip)
}}}

This is from Amazon's Associate Web Service API, incidentally.  What's needed 
is to extract the prices for the offers.  So I first obtain an offer 
element - the easiest way is to use exactly the xpath expression you 
mentioned:

{{{
offers = tree.xpath('//*[local-name()="Offer"]')
}}}

Then for each offer in offers, I want to get the price information, i.e. the 
content of that Amount tag.  This works:
{{{
def price(offer):
    # Walk Offer -> OfferListing -> Price -> Amount, matching on local names only.
    return offer.xpath('*[local-name()="OfferListing"]/*[local-name()="Price"]/*[local-name()="Amount"]')[0].text
}}}

But, in a word, "yikes".  There has got to be a less verbose way!  I can't 
skip any of those intermediate elements: there are multiple leaf elements 
named Amount, and only the specific one above is the actual sale price.  So 
something like 
{{{'*[local-name()="OfferListing"]//*[local-name()="Amount"]'}}} fails, 
because it returns the other Amount elements along with the correct one.

(This will probably improve once I learn xpath a little better - still in the 
process of mastering it.)
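
One possibly less verbose route, sketched here on the assumption that binding 
a short prefix to the namespace URI is acceptable: pass a prefix map to the 
path lookup instead of repeating local-name().  The prefix "a", the names 
AWS_NS, NSMAP and SAMPLE, and the trimmed sample document (including its 
extra Amount element) are made up for illustration; findall() and findtext() 
with a namespaces argument exist in both lxml.etree and the standard 
library's ElementTree.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls used below

# Namespace URI copied from the sample response above.
AWS_NS = "http://webservices.amazon.com/AWSECommerceService/2008-04-07"
NSMAP = {"a": AWS_NS}  # "a" is an arbitrary prefix chosen for this sketch

# Trimmed, hypothetical stand-in for the real response; the Amount under
# OfferSummary is there only to show that it is not picked up.
SAMPLE = """\
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
  <Items>
    <Item>
      <OfferSummary>
        <LowestNewPrice><Amount>7000</Amount></LowestNewPrice>
      </OfferSummary>
      <Offers>
        <Offer>
          <OfferListing>
            <Price><Amount>7517</Amount></Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemLookupResponse>
"""


def price(offer):
    # Each step is qualified with the prefix, so no local-name() is needed.
    return offer.findtext("a:OfferListing/a:Price/a:Amount", namespaces=NSMAP)


tree = etree.fromstring(SAMPLE)
offers = tree.findall(".//a:Offer", namespaces=NSMAP)
prices = [price(offer) for offer in offers]
```

With lxml, the same prefix map should also work with xpath(), e.g. 
offer.xpath('a:OfferListing/a:Price/a:Amount', namespaces=NSMAP).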

Anyway, thanks for the xpath suggestion, John - it's probably better than the 
ns()/no_ns() functions in my first post.  It would still be useful if there 
were a way to instruct lxml.etree to strip out the namespace prefix more 
automatically, if anyone can suggest one.
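
For what it's worth, a minimal sketch of one way such stripping could be done 
by hand: after parsing, rewrite every element's tag from "{uri}local" to just 
"local".  The helper name strip_namespaces and the sample document are 
hypothetical, not part of lxml, and the sketch leaves namespaced attributes 
untouched.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls used below

# Trimmed, hypothetical stand-in for the real response.
SAMPLE = """\
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
  <Items>
    <Item>
      <Offers>
        <Offer>
          <OfferListing>
            <Price><Amount>7517</Amount></Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemLookupResponse>
"""


def strip_namespaces(root):
    """Rewrite '{uri}local' tags to 'local' in place and return root."""
    for el in root.iter():
        # In lxml, comments and processing instructions have a non-string
        # tag, so only plain element tags are rewritten.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(etree.fromstring(SAMPLE))
# Plain, unqualified paths now match.
amount = root.findtext("Items/Item/Offers/Offer/OfferListing/Price/Amount")
```

The trade-off is that the namespace information is thrown away, so elements 
with the same local name from different namespaces can no longer be told 
apart.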

Cheers,
Aaron

--
Aaron Maxwell
http://redsymbol.net/

