[lxml-dev] xmlns / xmlns:xmlns inconsistency

Aaron Brady castironpi at gmail.com
Fri Sep 12 14:53:39 CEST 2008


On Fri, Sep 12, 2008 at 3:42 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> jholg at gmx.de wrote:
>>> Aaron Brady wrote:
>>> > <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
>>>
>>> Use
>>>     root = etree.Element(
>>>         '{urn:schemas-microsoft-com:office:spreadsheet}Workbook' )
>>
>> While I can't see the use case for it, lxml doesn't allow using two
>> different ns-prefixes for the same namespace through the API, but it
>> does when parsing:
>>
>>  >>> root = etree.fromstring('<root xmlns:foo="/foo/bar/namespace" '
>>  ...                         'xmlns="/foo/bar/namespace"/>')
>>  >>> print etree.tostring(root)
>>  <root xmlns:foo="/foo/bar/namespace" xmlns="/foo/bar/namespace"/>
>>  >>> root.nsmap
>>  {'foo': '/foo/bar/namespace', None: '/foo/bar/namespace'}
>>  >>> root2 = etree.Element("root", nsmap=root.nsmap)
>>  >>> print etree.tostring(root2)
>>  <root xmlns:foo="/foo/bar/namespace"/>
>
> Yes, now that you mention it...
>
> lxml (starting with 2.1 IIRC, or maybe also in 2.0.x) prefers the prefixed
> namespace over the default namespace if both are defined in one nsmap and
> have the same URI. The code that handles this is in apihelpers.pxi,
> function _initNodeNamespaces().
>
> The reason is that the prefixed namespace can also be used for attributes
> and within text values, while the default namespace only applies to
> elements. This is not a 100% solution, rather a "works in most cases" one.
> There are corner cases where the default namespace still wins, e.g. when a
> parsed document defines it before the equivalent prefixed namespace, so
> that libxml2 finds it first when it looks for a declaration.
>
> I consider it best to avoid the default namespace when you're dealing with
> multiple (say, more than two) namespaces in one document, regardless of
> the tool you are using. You never need the default namespace, it's always
> pure convenience.
>
> Stefan

Whoops, sorry Stefan.  Reply to all:

I was getting round-trip errors, plus I was targeting exact MS XML
output.  If the semantics are the same, then it's not a problem, or
shouldn't be.
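
To illustrate what "the semantics are the same" means here, a minimal sketch (assuming lxml is installed; the two one-element documents and the variable names are made up for the example): the default-namespace form and the prefixed form parse to the same qualified tag, and only the prefix differs.

```python
from lxml import etree

NS = "urn:schemas-microsoft-com:office:spreadsheet"

# Same element, once via the default namespace and once via a prefix.
default_form = etree.fromstring('<Workbook xmlns="%s"/>' % NS)
prefixed_form = etree.fromstring('<ss:Workbook xmlns:ss="%s"/>' % NS)

# The qualified tag ({namespace}localname) is identical in both cases.
print(default_form.tag)
print(prefixed_form.tag)

# Only the serialization detail (the prefix) differs.
print(default_form.prefix, prefixed_form.prefix)
```

A namespace-aware consumer compares the qualified tags, so it should treat both documents alike. A consumer that matches on the literal text of the file may not.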

Here's the MS XML:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
 <Author> </Author>
...

Which, as you inferred, declares 'xmlns' on the same node as the other
namespace attributes, including 'xmlns:ss', which maps to the same URI.
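
For reference, a rough sketch of how that header looks once lxml has parsed it. The closing tags are added only so that the fragment is well-formed; the elided part of the file is not reconstructed. The names MS_XML, SS and props are just for this example.

```python
from lxml import etree

# Header copied from the Excel output above. The two closing tags are an
# addition for this sketch so that the fragment parses.
MS_XML = b"""<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
 <Author> </Author>
 </DocumentProperties>
</Workbook>"""

SS = "urn:schemas-microsoft-com:office:spreadsheet"
root = etree.fromstring(MS_XML)

# The default namespace (key None) and the 'ss' prefix share one URI.
print(root.nsmap[None] == root.nsmap["ss"] == SS)

# The child re-declares the default namespace, so its tag is in the
# 'office' namespace although it carries no prefix.
props = root[0]
print(props.tag)

# Serialize and parse again, then compare the root's namespace map.
reparsed = etree.fromstring(etree.tostring(root))
print(reparsed.nsmap == root.nsmap)
```

This only covers the parse and serialize path. An element built through the API with the same nsmap may come out differently, as discussed in the quoted text.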

