[lxml-dev] Problem with using the same URI twice in a namespace

Andreas Degert ad at papyrus-gmbh.de
Wed Apr 23 10:20:07 CEST 2008


On Tue, 22 Apr 2008 22:42:12 +0200
Stefan Behnel <stefan_ml at behnel.de> wrote:

> Hi,
> 
> Andreas Degert wrote:
> > I assume it is legal to have the following namespace
> > declaration/usage:
> > 
> > <top xmlns="a" xmlns:a="a" xmlns:b="b">
> >   <foo bar=""/>
> >   <b:foobar a:bar=""/>
> > </top>
> 
> Sure, the spec calls this well-formed XML - not talking aesthetics,
> though.
> 
> 
> > It works when I read such a definition with lxml.etree.parse, but I
> > can't construct it with lxml.etree.Element because then the nsmap
> > dict will be normalized in such a way that each URI occurs only
> > once.
> 
> Finally someone complaining that there are too *few* namespace
> declarations instead of too many. ;o)
> 
> lxml does a lot of work behind the scenes to keep namespaces
> consistent and simple throughout whatever operation you perform at the
> API level. In the case you describe, lxml checks on each new
> namespace prefix declaration if that namespace is already defined in
> the tree context of the Element and reuses the old prefix if that is
> the case. The function that does that is _initNodeNamespaces() in
> apihelpers.pxi, in case you're interested.
> 
> 
> > Is this a bug in lxml or shouldn't it be used in this way?
> 
> I don't see the use case. What could you do with redundant namespace
> prefix declarations that you can't do with a single one?

I think the behaviour leads to a bug:

from lxml.etree import Element, SubElement, tostring
t = Element("top", nsmap={None: "a", "b": "b"})
SubElement(t, "{b}foobar", {"{a}bar": ""})
print tostring(t, pretty_print=True)
-----
<top xmlns="a" xmlns:b="b">
  <b:foobar bar=""/>
</top>
-----

In the output, the attribute bar should be in namespace a, but it has no
namespace (the default namespace does not apply to attributes, as
specified in http://www.w3.org/TR/REC-xml-names/#scoping-defaulting,
section 6.2).

Hmm... an even simpler example:

Element("top", {"bar":"", "{a}bar":""}, nsmap={None:"a","b":"b"})

yields <top xmlns="a" xmlns:b="b" bar="" bar=""/>
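
That this output is not even well-formed can be checked with any
conforming parser. A minimal sketch, assuming the standard library's
xml.etree.ElementTree (expat) rather than lxml; the variable names are
only illustrative:

```python
import xml.etree.ElementTree as ET

# The serialization reported above, with the attribute "bar" appearing twice.
broken = '<top xmlns="a" xmlns:b="b" bar="" bar=""/>'

try:
    ET.fromstring(broken)
    well_formed = True
except ET.ParseError as exc:
    # The parser rejects the document because the attribute name is repeated.
    well_formed = False
    print("not well-formed:", exc)
```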

> Imagine you have two prefixes defined for a namespace and you add a
> subelement with that namespace. Which prefix should be used? What
> purpose does that ambiguity serve?

The default namespace is a special case because it doesn't apply to
attributes (this means that attributes in a namespace must be
serialized with a prefix). When serializing elements, the default
namespace should take priority, i.e. elements in that namespace can be
written without a prefix.
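
To illustrate, here is a small sketch that parses the document from the
start of this thread, again assuming the standard library's
xml.etree.ElementTree instead of lxml. The unprefixed attribute ends up
in no namespace, while the prefixed one is in namespace a:

```python
import xml.etree.ElementTree as ET

doc = """<top xmlns="a" xmlns:a="a" xmlns:b="b">
  <foo bar=""/>
  <b:foobar a:bar=""/>
</top>"""

root = ET.fromstring(doc)
foo, foobar = root[0], root[1]

# Elements without a prefix pick up the default namespace "a".
print(foo.tag)               # {a}foo
# Attributes without a prefix do not: "bar" has no namespace.
print(list(foo.attrib))      # ['bar']

print(foobar.tag)            # {b}foobar
# Only the explicit prefix puts the attribute into namespace "a".
print(list(foobar.attrib))   # ['{a}bar']
```

So the second prefix for the same URI is the only way to write an
attribute in namespace a when a is also the default namespace.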

> Stefan
> 

