[lxml-dev] Ignore namespace when parsing
John Lovell
jlovell at nwesd.org
Fri May 1 20:00:29 CEST 2009
Aaron:
It sounds to me like you could use an XPath query that matches on the local name only:
rootElement.xpath("//*[local-name() = 'Child1']")
http://codespeak.net/lxml/xpathxslt.html
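As a fuller sketch of that query, assuming the sample document from the original message is held in a bytes literal (the names XML, root, matches and tags are only for illustration):

```python
from lxml import etree

# Sample document from the original question (default namespace on the root).
XML = b"""<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
<Child1></Child1>
<Child2></Child2>
</Root>"""

root = etree.fromstring(XML)

# local-name() compares only the part of the name after the namespace,
# so neither a prefix nor the namespace URI has to appear in the query.
matches = root.xpath("//*[local-name() = 'Child1']")

# The matched elements still carry their fully qualified tags.
tags = [el.tag for el in matches]
print(tags)
```

Note that this matches Child1 in any namespace, which is fine when the namespace really does not matter but may be too loose for documents that mix vocabularies.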
Good luck,
John W. Lovell
-----Original Message-----
From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Aaron Maxwell
Sent: Friday, May 01, 2009 10:41 AM
To: lxml-dev at codespeak.net
Subject: [lxml-dev] Ignore namespace when parsing
Hi all,
When using python lxml to parse an XML document whose root element defines a namespace, is there some way the library can allow me to not explicitly invoke that namespace in queries?
Consider an XML document with this content:
{{{
<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
<Child1></Child1>
<Child2></Child2>
</Root>
}}}
If I parse it like this:
{{{
from lxml import etree

def ignore_ns(path_to_file):
    # Parse the file and print each child's tag exactly as lxml reports it.
    x = etree.parse(path_to_file)
    for kid in x.getroot():
        print(kid.tag)
}}}
... where path_to_file names a file containing the above XML document, then this output is produced:
{{{
{http://redsymbol.net/SomeNamespace}Child1
{http://redsymbol.net/SomeNamespace}Child2
}}}
Alternatively, I can define a namespace-string stripping function dynamically, and apply it as needed:
{{{
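def strip_out_ns(path_to_file):
    x = etree.parse(path_to_file)
    # URI of the default (unprefixed) namespace declared on the root element.
    ns = x.getroot().nsmap[None]
    def no_ns(s):
        # Drop the leading '{namespace}' part of a qualified tag name.
        return s.split('{' + ns + '}')[-1]
    for kid in x.getroot():
        print(no_ns(kid.tag))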
}}}
The output of this is simpler:
{{{
Child1
Child2
}}}
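The same stripping can probably be done without string splitting. This is a minimal sketch using lxml's etree.QName, assuming the sample document above is held in a bytes literal (XML and local_names are illustrative names, not part of lxml):

```python
from lxml import etree

# Sample document from above (default namespace on the root).
XML = b"""<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
<Child1></Child1>
<Child2></Child2>
</Root>"""

def local_names(root):
    """Return the namespace-free tag name of each direct child of root."""
    # etree.QName splits '{uri}name' into .namespace and .localname.
    return [etree.QName(kid.tag).localname for kid in root]

root = etree.fromstring(XML)
names = local_names(root)
print(names)
```

Unlike the split on a single known namespace, this handles children in any namespace, though it assumes the children are ordinary elements rather than comments or processing instructions.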
More commonly, I will want to search for a child element of some root, using a query like
{{{
rootElement.find('Child1')
}}}
(where rootElement is an Element object). In the namespaced xml document above, this call to .find() will return None, but
{{{
# ns found from rootElement.nsmap as above
rootElement.find('{' + ns + '}' + 'Child1')
}}}
will correctly find the child element.
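A variant of the same lookup, sketched here under the assumption that the sample document is available as a bytes literal, passes a prefix map to find() instead of building the '{uri}name' string by hand. The prefix 'd' is an arbitrary choice made for this example:

```python
from lxml import etree

# Sample document from above (default namespace on the root).
XML = b"""<?xml version="1.0" ?>
<Root xmlns="http://redsymbol.net/SomeNamespace">
<Child1></Child1>
<Child2></Child2>
</Root>"""

root = etree.fromstring(XML)

# An unqualified name does not match the namespaced child.
missing = root.find('Child1')

# nsmap stores the default namespace under the key None; find() needs a
# named prefix, so bind the same URI to an arbitrary prefix ('d' here).
ns = {'d': root.nsmap[None]}
child = root.find('d:Child1', namespaces=ns)

print(missing)
print(child.tag)
```

The namespace still has to be named once, but the queries themselves stay short and the URI is read from the document rather than hard-coded.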
In this kind of situation, where I just want to parse the document and really don't care about the namespace, is there some way to construct a parser that will ignore it in a more automated way? Is there a simpler, better approach, or some insight I'm missing?
Thanks everyone in advance.
Cheers,
Aaron