[lxml-dev] Ignore namespace when parsing

Sergio Monteiro Basto sergio at sergiomb.no-ip.org
Wed May 6 00:39:33 CEST 2009


On Wed, 2009-05-06 at 00:04 +0200, Laurence Rowe wrote:
> 2009/5/2 Aaron Maxwell <amax at redsymbol.net>:
> > On Friday 01 May 2009 11:00:29 am John Lovell wrote:
> >> Aaron:
> >>
> >> It sounds to me like you could use an xpath query.
> >> rootElement.xpath('//*[local-name() = "Child1"]')
> >> http://codespeak.net/lxml/xpathxslt.html
> >
> > Thanks, that does work fine.
> >
> > My actual problem is somewhat more complex than the simplistic example I gave,
> > however.  The structure of the XML document is more like this (lots of the
> > actual document is excised):
> > {{{
> > <ItemLookupResponse
> > xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
> >  <OperationRequest>
> >  <Items>
> >    <Item>
> >      <ASIN>0521545668</ASIN>
> >      <OfferSummary>
> >         (snip)
> >      </OfferSummary>
> >      <Offers>
> >        <Offer>
> >          <OfferListing>
> >            <Price>
> >              <Amount>7517</Amount>
> >            </Price>
> > (snip)
> > }}}
> >
> > This is from Amazon's Associate Web Service API, incidentally.  What's needed
> > is to extract the prices for the offers.  So I first obtain an offer
> > element - the easiest way is to use exactly the xpath expression you
> > mentioned:
> >
> > {{{
> > offers = tree.xpath('//*[local-name()="Offer"]')
> > }}}
> >
> > Then for each offer in offers, I want to get the price information, i.e. the
> > content of that Amount tag.  This works:
> > {{{
> > def price(offer):
> >    return offer.xpath(
> >        '*[local-name()="OfferListing"]/*[local-name()="Price"]/*[local-name()="Amount"]'
> >    )[0].text
> > }}}
> >
> > But, in a word, "yikes".  There has got to be a less verbose way!  I can't
> > skip any of those intermediate elements (there are multiple leaf elements
> > named Amount, for example; only the specific one above is the actual sale
> > price.)  So something like
> > {{{'*[local-name()="OfferListing"]//*[local-name()="Amount"]'}}} fails by
> > mixing in garbage with the correct result.
> >
> > (This will probably improve once I learn xpath a little better - still in the
> > process of mastering it.)
> >
> > Anyway, thanks for the xpath suggestion, John - it's probably better than the
> > ns()/no_ns() functions in my first post.  Would still be useful if there is a
> > way to instruct lxml.etree to somehow strip out the namespace prefix more
> > automatically, if anyone can suggest that.
> 
> 
> You can supply a namespaces argument to the xpath method:
> {{{
> offers = tree.xpath('//aws:Offer',
> namespaces=dict(aws="http://webservices.amazon.com/AWSECommerceService/2008-04-07"))
> }}}
> 
> See http://codespeak.net/lxml/xpathxslt.html for the details.
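
Applied to the price() helper from earlier in the thread, the namespaces argument might look roughly like the sketch below. The sample document is a trimmed-down stand-in, not real Amazon output: the AmountSaved element and its value are made up here only to show that a second Amount under the same offer is not picked up. The names NSMAP and SAMPLE are just illustrative.

```python
from lxml import etree

# Prefix-to-URI mapping; the prefix "aws" is an arbitrary local choice.
NSMAP = {"aws": "http://webservices.amazon.com/AWSECommerceService/2008-04-07"}

# Hypothetical, trimmed-down sample response (not real Amazon output).
SAMPLE = b"""\
<ItemLookupResponse
    xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
  <Items>
    <Item>
      <ASIN>0521545668</ASIN>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>7517</Amount>
            </Price>
            <AmountSaved>
              <Amount>1000</Amount>
            </AmountSaved>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemLookupResponse>
"""


def price(offer):
    # Relative path from the Offer element; every step is namespace-qualified,
    # so only OfferListing/Price/Amount matches, not AmountSaved/Amount.
    amounts = offer.xpath("aws:OfferListing/aws:Price/aws:Amount",
                          namespaces=NSMAP)
    return amounts[0].text


tree = etree.fromstring(SAMPLE)
offers = tree.xpath("//aws:Offer", namespaces=NSMAP)
prices = [price(offer) for offer in offers]
```

The path stays explicit about the intermediate elements, but each step shrinks from a local-name() predicate to a prefixed name.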

Or define the prefix globally (I think that is what you want).
Using the last example:

myns1 = etree.FunctionNamespace('http://webservices.amazon.com/AWSECommerceService/2008-04-07')
myns1.prefix = "aws"
offers = tree.xpath('//aws:Offer')
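
For anyone who wants to try this end to end, here is a small self-contained sketch. It assumes the prefix set on the FunctionNamespace is picked up by every later xpath() call in the process, which is what the snippet above relies on. The sample document and the second offer's amount are invented for illustration, and AWS_NS and SAMPLE are just names chosen here.

```python
from lxml import etree

AWS_NS = "http://webservices.amazon.com/AWSECommerceService/2008-04-07"

# Register the prefix once, globally. Assumption: later xpath() calls see
# the "aws" prefix without needing a namespaces argument.
aws = etree.FunctionNamespace(AWS_NS)
aws.prefix = "aws"

# Hypothetical sample data with two offers (the second amount is made up).
SAMPLE = b"""\
<ItemLookupResponse
    xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
  <Items>
    <Item>
      <ASIN>0521545668</ASIN>
      <Offers>
        <Offer>
          <OfferListing><Price><Amount>7517</Amount></Price></OfferListing>
        </Offer>
        <Offer>
          <OfferListing><Price><Amount>6900</Amount></Price></OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemLookupResponse>
"""


def price(offer):
    # No namespaces argument here; the globally registered prefix is used.
    return offer.xpath("aws:OfferListing/aws:Price/aws:Amount")[0].text


tree = etree.fromstring(SAMPLE)
prices = [price(offer) for offer in tree.xpath("//aws:Offer")]
```

Keep in mind that the registration is process-wide, so a prefix chosen this way should not clash with one that other code in the same program expects to mean something else.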

-- 
Sérgio M. B.