[lxml-dev] xmlns / xmlns:xmlns inconsistency
Aaron Brady
castironpi at gmail.com
Fri Sep 12 14:53:39 CEST 2008
On Fri, Sep 12, 2008 at 3:42 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> jholg at gmx.de wrote:
>>> Aaron Brady wrote:
>>> > <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
>>>
>>> Use
>>> root = etree.Element(
>>> '{urn:schemas-microsoft-com:office:spreadsheet}Workbook' )
>>
>> While I can't see the use case for it, lxml doesn't allow using two
>> different ns-prefixes for the same namespace through the API, but it
>> does when parsing:
>>
>> >>> root = etree.fromstring('<root xmlns:foo="/foo/bar/namespace" xmlns="/foo/bar/namespace"/>')
>> >>> print etree.tostring(root)
>> <root xmlns:foo="/foo/bar/namespace" xmlns="/foo/bar/namespace"/>
>> >>> root.nsmap
>> {'foo': '/foo/bar/namespace', None: '/foo/bar/namespace'}
>> >>> root2 = etree.Element("root", nsmap=root.nsmap)
>> >>> print etree.tostring(root2)
>> <root xmlns:foo="/foo/bar/namespace"/>
>
> Yes, now that you mention it...
>
> lxml (starting with 2.1 IIRC, or maybe also in 2.0.x) prefers the prefixed
> namespace over the default namespace if both are defined in one nsmap and
> have the same URI. The code that handles this is in apihelpers.pxi,
> function _initNodeNamespaces().
>
> The reason is that the prefixed namespace can also be used for attributes
> and within text values, while the default namespace only applies to
> elements. This is not a 100% solution, rather a "works in most cases" one.
> There are corner cases where the default namespace still wins, e.g. when a
> parsed document defines it before the equivalent prefixed namespace, so
> that libxml2 finds it first when it looks for a declaration.
>
> I consider it best to avoid the default namespace when you're dealing with
> multiple (say, more than two) namespaces in one document, regardless of
> the tool you are using. You never need the default namespace; it's always
> pure convenience.
>
> Stefan
Whoops, sorry Stefan. Reply to all:
I was getting round-trip errors, plus I was targeting exact MS XML
output. If the semantics are the same, then it's not a problem, or
shouldn't be.
Here's the MS XML:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author> </Author>
...
Which, as you inferred, defines 'xmlns' in the same node as other
attributes that use it.
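
For what it's worth, here is a rough way to check the "semantics are the
same" part. It is only a sketch: it uses the standard library's
xml.etree.ElementTree so that it runs anywhere (lxml.etree offers the same
fromstring() / .tag / .attrib interface), the two sample documents are cut
down from the MS XML above, the ss:Name attribute on Worksheet is my own
illustrative addition, and qualified_view() is a made-up helper name. It
compares namespace-qualified names, so it says nothing about byte-exact
output, which is what I was originally after.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"

# Form 1: default namespace plus an equivalent "ss" prefix, as in the MS output.
default_form = (
    '<Workbook xmlns="{0}" xmlns:ss="{0}">'
    '<Worksheet ss:Name="Sheet1"/>'
    '</Workbook>'
).format(SS)

# Form 2: the same content written with the prefix only (no default namespace).
prefixed_form = (
    '<ss:Workbook xmlns:ss="{0}">'
    '<ss:Worksheet ss:Name="Sheet1"/>'
    '</ss:Workbook>'
).format(SS)


def qualified_view(xml_text):
    """Return (tag, sorted attributes) per element, in document order.

    Tags and attribute names come back in '{uri}local' form, so the
    prefix chosen in the serialized text does not matter here.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, sorted(el.attrib.items())) for el in root.iter()]


same = qualified_view(default_form) == qualified_view(prefixed_form)
print(same)
```

If the two views compare equal, a namespace-aware consumer should treat both
spellings alike; whether a given consumer also insists on the exact prefixes
is a separate question.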