[lxml-dev] Ignore namespace when parsing

Sergio Monteiro Basto sergio at sergiomb.no-ip.org
Tue May 5 23:36:16 CEST 2009


From http://codespeak.net/lxml/xpathxslt.html,
a simplified example:
    f = StringIO('''<a:foo xmlns:a="http://codespeak.net/ns/test1"
        xmlns:b="http://codespeak.net/ns/test2">
        <b:bar>Text</b:bar>
    </a:foo> ''')

    # hparser is a parser object created elsewhere (not shown here)
    doc = etree.parse(f, parser=hparser)
    r = doc.xpath('//b:bar',
         namespaces={'b': 'http://codespeak.net/ns/test2'})
    print len(r)
    print r[0].tag
    print r[0].text

and extensions 
http://codespeak.net/lxml/extensions.html
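
For anyone who wants to run the example above as is, here is a minimal
self-contained sketch for Python 3. It assumes lxml is installed and uses
the default XML parser rather than the parser object from my snippet. The
prefix 'x' in the query is picked on purpose to show that only the
namespace URI has to match, not the prefix used in the document.

```python
from io import BytesIO

from lxml import etree

# Same document as in the example above.
xml = b'''<a:foo xmlns:a="http://codespeak.net/ns/test1"
    xmlns:b="http://codespeak.net/ns/test2">
    <b:bar>Text</b:bar>
</a:foo>'''

# Assumption: the default XML parser is good enough for this document.
doc = etree.parse(BytesIO(xml))

# The prefix in the query is mapped to the URI through the namespaces dict;
# it does not need to be the prefix that appears in the document.
r = doc.xpath('//x:bar', namespaces={'x': 'http://codespeak.net/ns/test2'})

print(len(r))
print(r[0].tag)   # tag in Clark notation: {namespace-uri}localname
print(r[0].text)
```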

I'm also trying to work with namespaces, but the documentation makes my
head spin.

In your example, I don't see any <t:price> etc.,
so it is difficult to guess.

On Fri, 2009-05-01 at 15:30 -0700, Aaron Maxwell wrote:
> On Friday 01 May 2009 11:00:29 am John Lovell wrote:
> > Aaron:
> >
> > It sounds to me like you could use an xpath query.
> > rootElement.xpath('//*[local-name() = "Child1"]')
> > http://codespeak.net/lxml/xpathxslt.html
> 
> Thanks, that does work fine.
> 
> My actual problem is somewhat more complex than the simplistic example I gave, 
> however.  The structure of the XML document is more like this (lots of the 
> actual document is excised):
> {{{
> <ItemLookupResponse 
> xmlns="http://webservices.amazon.com/AWSECommerceService/2008-04-07">
>   <OperationRequest>
>   <Items>
>     <Item>
>       <ASIN>0521545668</ASIN>
>       <OfferSummary>
>          (snip)
>       </OfferSummary>
>       <Offers>
>         <Offer>
>           <OfferListing>
>             <Price>
>               <Amount>7517</Amount>
>             </Price>
> (snip)
> }}}
> 
> This is from Amazon's Associate Web Service API, incidentally.  What's needed 
> is to extract the prices for the offers.  So I first obtain an offer 
> element - the easiest way is to use exactly the xpath expression you 
> mentioned:
> 
> {{{
> offers = tree.xpath('//*[local-name()="Offer"]')
> }}}
> 
> Then for each offer in offers, I want to get the price information, i.e. the 
> content of that Amount tag.  This works:
> {{{
> def price(offer):
>     return offer.xpath(
>         '*[local-name()="OfferListing"]/*[local-name()="Price"]'
>         '/*[local-name()="Amount"]')[0].text
> }}}
> 
> But, in a word, "yikes".  There has got to be a less verbose way!  I can't 
> skip any of those intermediate elements (there are multiple leaf elements 
> named Amount, for example; only the specific one above is the actual sale 
> price.)  So something like 
> {{{'*[local-name()="OfferListing"]//*[local-name()="Amount"]'}}} fails by 
> mixing in garbage with the correct result.
> 
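One possibly less verbose option is to bind a prefix to the document's
default namespace and use it in the path. The sketch below assumes lxml on
Python 3 and a cut-down, well-formed stand-in for the response quoted
above. The prefix "aws" and the names NS and ns are arbitrary choices for
this illustration, not anything from the Amazon API.

```python
from lxml import etree

NS = "http://webservices.amazon.com/AWSECommerceService/2008-04-07"

# Cut-down stand-in for the quoted response (assumption: the real
# document has this nesting under each Offer element).
xml = f'''<ItemLookupResponse xmlns="{NS}">
  <Items>
    <Item>
      <ASIN>0521545668</ASIN>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>7517</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemLookupResponse>'''

tree = etree.fromstring(xml)

# Map an arbitrary prefix to the default namespace of the document.
ns = {"aws": NS}

def price(offer):
    # Relative path from an Offer element down to the sale price.
    return offer.xpath("aws:OfferListing/aws:Price/aws:Amount",
                       namespaces=ns)[0].text

offers = tree.xpath("//aws:Offer", namespaces=ns)
prices = [price(offer) for offer in offers]
print(prices)
```

The intermediate steps stay explicit, so other Amount elements elsewhere
in the tree are not picked up.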
> (This will probably improve once I learn xpath a little better - still in the 
> process of mastering it.)
> 
> Anyway, thanks for the xpath suggestion, John - it's probably better than the 
> ns()/no_ns() functions in my first post.  Would still be useful if there is a 
> way to instruct lxml.etree to somehow strip out the namespace prefix more 
> automatically, if anyone can suggest that.
> 
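As far as I know there is no parser switch for this, but the tags can be
rewritten after parsing. A minimal sketch, assuming lxml on Python 3;
strip_namespaces is a hypothetical helper name, and it only touches
element tags (namespaced attributes are left alone).

```python
from lxml import etree

NS = "http://webservices.amazon.com/AWSECommerceService/2008-04-07"

# Tiny stand-in document in the same default namespace as the quoted one.
xml = (f'<ItemLookupResponse xmlns="{NS}"><Offer><OfferListing><Price>'
       '<Amount>7517</Amount></Price></OfferListing></Offer>'
       '</ItemLookupResponse>')

def strip_namespaces(root):
    """Rewrite every element tag in place to its local name."""
    for el in root.iter():
        # Comments and processing instructions have a non-string tag.
        if isinstance(el.tag, str):
            # "{uri}local" -> "local"; tags without a namespace are unchanged.
            el.tag = el.tag.rpartition("}")[2]
    # Drop namespace declarations that are no longer used.
    etree.cleanup_namespaces(root)
    return root

root = strip_namespaces(etree.fromstring(xml))

# Plain paths now work without prefixes or local-name().
amounts = root.xpath("//Offer/OfferListing/Price/Amount/text()")
print(root.tag, amounts)
```

Note that this throws the namespace information away, so it is only safe
when local names do not clash across namespaces.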
> Cheers,
> Aaron
> 
> --
> Aaron Maxwell
> http://redsymbol.net/
-- 
Sérgio M. B.