[lxml-dev] Ignore namespace when parsing

John Lovell jlovell at nwesd.org
Fri May 1 20:00:29 CEST 2009


Aaron:

It sounds to me like you could use an XPath query that matches on local-name(), which compares only the part of the tag after the namespace:

rootElement.xpath("//*[local-name() = 'Child1']")

http://codespeak.net/lxml/xpathxslt.html
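
For anyone who wants to try this, here is a minimal sketch that runs the query against the sample document from the original message. The XML constant and variable names are only for illustration, and it assumes a reasonably recent lxml.

```python
from lxml import etree

# Sample document from the original question (inlined for illustration)
XML = b"""<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
  <Child1></Child1>
  <Child2></Child2>
</Root>
"""

root = etree.fromstring(XML)

# local-name() ignores the namespace part, so the default namespace
# declared on <Root> does not need to be spelled out in the query.
matches = root.xpath("//*[local-name() = 'Child1']")

# The matched elements still carry their fully qualified tags.
tags = [el.tag for el in matches]
print(tags)
```

Note that the query only ignores the namespace while matching; the elements it returns keep their qualified tag names.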

Good luck,

John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
(360) 299-4086
jlovell at nwesd.org
 
www.nwesd.org
Together We Can ...


-----Original Message-----
From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Aaron Maxwell
Sent: Friday, May 01, 2009 10:41 AM
To: lxml-dev at codespeak.net
Subject: [lxml-dev] Ignore namespace when parsing

Hi all,

When using python lxml to parse an XML document whose root element defines a namespace, is there some way the library can allow me to not explicitly invoke that namespace in queries?

Consider an XML document with this content:
{{{
<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
  <Child1></Child1>
  <Child2></Child2>
</Root>
}}}

If I parse it like this:
{{{
from lxml import etree

def ignore_ns(path_to_file):
    # Parse the file and print each child's fully qualified tag
    x = etree.parse(path_to_file)
    for kid in x.getroot():
        print(kid.tag)
}}}

... where the path_to_file contains the above xml document, then this output is produced:

{{{
{http://redsymbol.net/SomeNamespace}Child1
{http://redsymbol.net/SomeNamespace}Child2
}}}

Alternatively, I can define a namespace-string stripping function dynamically, and apply it as needed:

{{{
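def strip_out_ns(path_to_file):
    x = etree.parse(path_to_file)
    # Default (unprefixed) namespace declared on the root element
    ns = x.getroot().nsmap[None]
    def no_ns(s):
        # Drop the leading '{namespace}' part of a qualified tag
        return s.split('{'+ns+'}')[-1]
    for kid in x.getroot():
        print(no_ns(kid.tag))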
}}}

The output of this is simpler:
{{{
Child1
Child2
}}}
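
As a point of comparison, the same stripping can be sketched with lxml's etree.QName helper instead of manual string splitting. This is only an illustration under the assumption that every child is a regular element (comments and processing instructions do not have string tags); the XML constant and the local_names variable are made up for the example.

```python
from lxml import etree

# Sample document from this message (inlined for illustration)
XML = b"""<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
  <Child1></Child1>
  <Child2></Child2>
</Root>
"""

root = etree.fromstring(XML)

# QName splits a '{namespace}name' tag into its parts;
# .localname is the name without the namespace.
local_names = [etree.QName(kid.tag).localname for kid in root]
print(local_names)
```

This works without knowing the namespace up front, so it does not depend on the root element declaring a default namespace.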

More commonly, I will want to search for a child element of some root, using a query like 

{{{
rootElement.find('Child1')
}}}

(where rootElement is an Element object).  In the namespaced xml document above, this call to .find() will return None, but

{{{
# ns found from rootElement.nsmap as above
rootElement.find('{' + ns + '}' + 'Child1')
}}}

will correctly find the child element.
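
To make the difference concrete, here is a small sketch of the three lookups side by side: the bare name, the fully qualified name, and a prefixed name resolved through the namespaces argument of find(). The prefix 'd' is an arbitrary choice for the example, not something the document defines.

```python
from lxml import etree

# Sample document from this message (inlined for illustration)
XML = b"""<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
  <Child1></Child1>
  <Child2></Child2>
</Root>
"""

rootElement = etree.fromstring(XML)
ns = rootElement.nsmap[None]  # default namespace declared on <Root>

# Bare name: no match, because the element lives in a namespace.
unqualified = rootElement.find('Child1')

# Fully qualified name in '{namespace}name' form.
qualified = rootElement.find('{' + ns + '}' + 'Child1')

# Same lookup with a prefix mapping ('d' is an arbitrary prefix).
prefixed = rootElement.find('d:Child1', namespaces={'d': ns})

print(unqualified is None, qualified.tag == prefixed.tag)
```

The prefix mapping keeps the query strings short, but the namespace still has to be supplied somewhere.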

In this kind of situation, where I just want to parse the document and really don't care about the namespace, is there some way to construct a parser that will ignore it in a more automated way?  Is there a simpler, better approach, or some insight I'm missing?

Thanks everyone in advance.

Cheers,
Aaron