[lxml-dev] lxml 2.2 validation question

jholg at gmx.de
Tue May 19 11:06:43 CEST 2009


> I can get the second ElementTree object (etree2) to validate if I put the
> long explicit namespace in front of the tag value (Foo) when I create etree2
> in the script.  So, if I change line 25 in the script to:
> rootelem = etree.Element('{http://example.com}Foo', {}, nsmap)
> it will validate.

And this is the right way to create an element that lives in namespace http://example.com.
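To make the difference concrete, here is a small Python 3 sketch. The schema is a hypothetical stand-in for the poster's schemaObj (which isn't shown in the thread): it assumes a target namespace of http://example.com and a single global element Foo of type xs:string.

```python
from lxml import etree

# Hypothetical stand-in for the poster's schema: one global element Foo
# declared in the target namespace http://example.com.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
  <xs:element name="Foo" type="xs:string"/>
</xs:schema>
"""

schema = etree.XMLSchema(etree.XML(XSD))

# Plain 'Foo': the nsmap declares a default namespace, but the element
# itself stays in no namespace, so no global declaration matches it.
plain = etree.Element('Foo', nsmap={None: 'http://example.com'})
plain.text = '\nContents\n'

# '{http://example.com}Foo' (Clark notation) really lives in the namespace.
qualified = etree.Element('{http://example.com}Foo',
                          nsmap={None: 'http://example.com'})
qualified.text = '\nContents\n'

print(schema.validate(plain))      # False
print(schema.validate(qualified))  # True
```

Only the tag name decides the element's namespace; the nsmap merely controls how it is written out.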

Some comments:

> nsmap={None: 'http://example.com', 'foo': 'http://example.com'}
> rootElem = etree.Element('Foo', {}, nsmap)

Note that this does not put Foo into the http://example.com NS. It creates an element Foo with no namespace. The nsmap only declares which namespace prefixes are known in the context of an element; it does not decide which namespace the element itself belongs to.
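One way to see this is to look at the element's .tag, which lxml reports in Clark notation ({namespace}localname). A minimal Python 3 sketch reusing the nsmap from the original script:

```python
from lxml import etree

nsmap = {None: 'http://example.com', 'foo': 'http://example.com'}

# The nsmap is attached to the element, but the tag carries no namespace.
plain = etree.Element('Foo', {}, nsmap)
print(plain.tag)        # Foo

# Only the '{uri}local' tag name puts the element into the namespace.
qualified = etree.Element('{http://example.com}Foo', {}, nsmap)
print(qualified.tag)    # {http://example.com}Foo

# QName splits a Clark-notation tag into its namespace and local name.
qname = etree.QName(qualified.tag)
print(qname.namespace)  # http://example.com
print(qname.localname)  # Foo
```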

So you could do this:

>>> rootElem = etree.Element('{http://example.com}Foo', {}, {None: 'http://example.com'})
>>> rootElem.text = '\nContents\n'
>>> print etree.tostring(rootElem, pretty_print=True)
<Foo xmlns="http://example.com">
Contents
</Foo>
 
>>>
>>> print schemaObj.validate(rootElem)
True
>>>

which puts Foo into the intended NS and uses this NS unprefixed in the output.

But if you do this:

>>> rootElem = etree.Element('{http://example.com}Foo', {},
...                          {None: 'http://example.com', 'foo': 'http://example.com'})
>>> rootElem.text = '\nContents\n'
>>> print schemaObj.validate(rootElem)
True
>>> print etree.tostring(rootElem, pretty_print=True)
<foo:Foo xmlns:foo="http://example.com" xmlns="http://example.com">
Contents
</foo:Foo>
 
>>>

you end up with the foo prefix. The reason is probably that both None and 'foo' map to http://example.com, and lxml uses whichever prefix for that namespace it happens to find first in the given nsmap (dictionaries are unordered).
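Whichever prefix gets picked, it only affects serialization: both elements have the same qualified name, which is what the schema checks. A Python 3 sketch comparing the two variants (the exact prefix chosen for the second one is not asserted here, since it depends on the lxml version and dict ordering):

```python
from lxml import etree

URI = 'http://example.com'

# Default namespace only: serialized without a prefix.
a = etree.Element('{%s}Foo' % URI, nsmap={None: URI})
# Two prefixes for the same URI: lxml picks one of them for output.
b = etree.Element('{%s}Foo' % URI, nsmap={None: URI, 'foo': URI})

print(a.tag == b.tag)     # True: same namespace URI and local name
print(a.prefix)           # None
print(etree.tostring(a))  # b'<Foo xmlns="http://example.com"/>'
```

If you want a predictable unprefixed output, map each namespace URI to exactly one prefix in the nsmap.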

> However, the 2 resulting xml outputs are no longer equal b/c the output from
> etree2 is output with explicit namespaces.

While textual equality is often dubious in XML :), you might clean up the superfluous namespace declarations:

>>> rootElem = etree.Element('{http://example.com}Foo', nsmap={None: 'http://example.com'})
>>> rootElem.text = '\nContents\n'
>>>
>>> print etree.tostring(rootElem, pretty_print=True)
<Foo xmlns="http://example.com">
Contents
</Foo>
 
>>> etree1 = etree.fromstring("""\
... <Foo xmlns:foo="http://example.com"
...      xmlns="http://example.com">
... Contents
... </Foo>
... """
... )
>>> etree.tostring(etree1, pretty_print=True) == etree.tostring(rootElem, pretty_print=True)
False
>>> etree.cleanup_namespaces(etree1)
>>>
>>> etree.tostring(etree1, pretty_print=True) == etree.tostring(rootElem, pretty_print=True)
True
>>>
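If you'd rather not depend on serialization at all, you can compare the trees structurally. The helper below is a hypothetical sketch (elements_equal is not an lxml API): it compares tag, attributes, text, tail and children recursively. Prefixes and namespace declarations are ignored automatically because .tag is already in {uri}local form; whitespace in text is still compared exactly.

```python
from lxml import etree

def elements_equal(e1, e2):
    """Hypothetical helper: compare two elements by qualified name,
    attributes, text, tail and children, ignoring namespace prefixes."""
    if e1.tag != e2.tag or dict(e1.attrib) != dict(e2.attrib):
        return False
    if e1.text != e2.text or e1.tail != e2.tail:
        return False
    if len(e1) != len(e2):
        return False
    # Recurse pairwise into the children.
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

URI = 'http://example.com'
built = etree.Element('{%s}Foo' % URI, nsmap={None: URI})
built.text = '\nContents\n'

# Same element, but written with an explicit 'foo' prefix.
parsed = etree.fromstring(
    '<foo:Foo xmlns:foo="http://example.com">\nContents\n</foo:Foo>')

print(elements_equal(built, parsed))  # True
```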

Or even consider canonicalization (C14N); see http://codespeak.net/lxml/api.html#write-c14n-on-elementtree
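Exclusive C14N only emits namespace declarations that are visibly used, so the unused xmlns:foo drops out without a separate cleanup step. A Python 3 sketch, assuming a reasonably recent lxml in which etree.tostring() accepts method='c14n' together with exclusive=True:

```python
from lxml import etree

URI = 'http://example.com'

built = etree.Element('{%s}Foo' % URI, nsmap={None: URI})
built.text = '\nContents\n'

# Same document as etree1 above, including the unused xmlns:foo.
parsed = etree.fromstring(
    '<Foo xmlns:foo="http://example.com"\n'
    '     xmlns="http://example.com">\nContents\n</Foo>')

# Exclusive C14N keeps only the namespace declarations an element uses.
c1 = etree.tostring(built, method='c14n', exclusive=True)
c2 = etree.tostring(parsed, method='c14n', exclusive=True)

print(c1 == c2)  # True
```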

Holger


