[lxml-dev] Serialization with namespaces

Anders Bruun Olsen anders at bruun-olsen.net
Tue Sep 11 22:14:11 CEST 2007


Hi,

I need to chop up some XML based on XPath expressions and serialize the
resulting chunks individually. I thought lxml would be perfect for this
task, but I have run into a problem.

Here is the sample I use, test.xtm:
<?xml version="1.0" encoding="UTF-8"?>
<topicMap
        xmlns="http://www.topicmaps.org/xtm/1.0/"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        id="personnavnereg1">
        <topic id="abeleHenriksdatter">
                <instanceOf>
                        <topicRef xlink:href="person-template.xtmp#kvinde"/>
                </instanceOf>
                <baseName>
                        <baseNameString>Abele Henriksdatter i Radsted, Gotfred Bangs hustru</baseNameString>
                </baseName>
                <occurrence>
                        <instanceOf>
                                <topicRef xlink:href="DDref.xtmp#dato1407.01.06"/>
                        </instanceOf>
                        <resourceRef xlink:href="http://xxx/diplomer/07-002.html"/>
                </occurrence>
        </topic>
</topicMap>

First I parse the file and grab the root:

   >>> from lxml import etree
   >>> tree = etree.parse("test.xtm")
   >>> root = tree.getroot()
   >>> root.nsmap
   {None: 'http://www.topicmaps.org/xtm/1.0/', 'xlink': 'http://www.w3.org/1999/xlink'}

Then I do a little XPath magic:

   >>> find_topics = etree.ETXPath("//{%s}topic" % root.nsmap[None])
   >>> elem = find_topics(root)[0]
   >>> elem
   <Element {http://www.topicmaps.org/xtm/1.0/}topic at 2b8501aa37e0>
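
As an aside, the same lookup can probably also be written with the
xpath() method and an explicit prefix mapping. The sketch below is only
an illustration: the prefix "xtm" is picked arbitrarily (XPath has no
notion of a default namespace, so the None key from nsmap cannot be
used directly), and the document is a trimmed-down stand-in for
test.xtm.

```python
from lxml import etree

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Trimmed-down stand-in for test.xtm, inlined to keep the sketch self-contained.
root = etree.fromstring(
    b'<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">'
    b'<topic id="abeleHenriksdatter"/>'
    b'</topicMap>'
)

# "xtm" is an arbitrary prefix chosen for this query only;
# it does not have to match any prefix used in the document.
topics = root.xpath("//xtm:topic", namespaces={"xtm": XTM_NS})
elem = topics[0]
print(elem.tag)
```

Either way, the element that comes back is the same kind of object, so
the choice between ETXPath and xpath() should not matter for what
follows.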

Now the problem occurs when I try to serialize. When I serialize the
root, everything looks fine:

   >>> etree.tostring(root, pretty_print=True)
   '<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink" id="personnavnereg1">
   ...

The XML namespace is declared as it should be. However, for the topic
element that I found using XPath, no XML namespace is output:

   >>> etree.tostring(elem, pretty_print=True)
   '<topic id="abeleHenriksdatter">\n\t\t<instanceOf>\n\t\t\t<topicRef
  ...

Even though the nsmap attribute is set correctly:

   >>> elem.nsmap
   {None: 'http://www.topicmaps.org/xtm/1.0/', 'xlink': 'http://www.w3.org/1999/xlink'}

I realize this might be because the element is not the root of the
current document. How can I make lxml output the xmlns declarations in
this case?
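
One thing that might be worth trying is serializing a deep copy of the
element instead of the element itself. This is a minimal sketch, not a
confirmed fix: it assumes that copy.deepcopy() gives a detached element
that is the root of its own document and therefore carries its own
namespace declarations, which may depend on the lxml version. The
inlined document is a shortened stand-in for test.xtm.

```python
import copy

from lxml import etree

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Shortened stand-in for test.xtm, inlined to keep the sketch self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          id="personnavnereg1">
  <topic id="abeleHenriksdatter">
    <instanceOf>
      <topicRef xlink:href="person-template.xtmp#kvinde"/>
    </instanceOf>
  </topic>
</topicMap>
"""

root = etree.fromstring(SAMPLE)
find_topics = etree.ETXPath("//{%s}topic" % XTM_NS)
elem = find_topics(root)[0]

# Assumption: the deep copy is detached from the original tree,
# so it is serialized as a root element in its own right.
standalone = copy.deepcopy(elem)
chunk = etree.tostring(standalone, pretty_print=True)
print(chunk.decode("utf-8"))

# Re-parse the chunk on its own to check that it is still namespaced.
reparsed = etree.fromstring(chunk)
print(reparsed.tag)
```

If the chunk re-parses with the {http://www.topicmaps.org/xtm/1.0/}
prefix on its tag, the namespace survived the round trip. The copy
costs extra memory for each chunk, so this would only be a stopgap for
large documents.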

-- 
Anders

