[lxml-dev] lxml 2.2 validation question
Stefan Behnel
stefan_ml at behnel.de
Tue May 19 19:44:46 CEST 2009
James Slagle wrote:
> I'm having some trouble getting lxml (v. 2.2) to validate an ElementTree
> object that I'm building and was hoping someone on the list could help and
> maybe tell me what I'm doing wrong.
>
> If I create an ElementTree object directly from xml and an associated
> schema, it will validate fine. If I then construct a similar ElementTree
> object by just instantiating ElementTree, it will not validate. The odd
> thing is that the resulting xml from etree.tostring for both objects is
> identical.
>
> I've attached a python script that shows the problem I'm having. The
> validation error is:
> *** DocumentInvalid: Element 'Foo': No matching global declaration available
> for the validation root.
>
> I can get the second ElementTree object (etree2) to validate if I put the
> long explicit namespace in front of the tag value (Foo) when I create
> etree2 in the script. So, if I change line 25 in the script to:
> rootelem = etree.Element('{http://example.com}Foo', {}, nsmap)
> , it will validate.
>
> However, the two resulting xml outputs are no longer equal because the
> output from etree2 is written with explicit namespaces.

By "explicit", do you mean that it uses namespace prefixes instead of the
default namespace?

lxml.etree does some namespace cleanup on the fly internally and (re-)maps
the namespaces of qualified tag names ("{abc}tag") to namespace prefixes
depending on where you insert an Element into a tree. In doing so, it uses
only one declaration for each namespace, even if you declare the same
namespace under more than one prefix. A side effect is that a namespace
declaration may end up unused if lxml finds a different declaration first.
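
To make the mapping described above concrete, here is a minimal sketch, not
the attached script. The namespace URI is taken from the quoted Element()
call; the "ex" prefix and the "Bar" child are assumptions made up for the
example.

```python
from lxml import etree

# Assumption: URI from the quoted Element() call; the "ex" prefix and the
# "Bar" child element are hypothetical and only serve as illustration.
NS = "http://example.com"

# An unqualified tag name leaves the element outside the namespace, even
# though the nsmap declares that URI as the default namespace.
plain = etree.Element("Foo", nsmap={None: NS})

# A qualified tag name ("{uri}tag") puts the element into the namespace.
qualified = etree.Element("{%s}Foo" % NS, nsmap={"ex": NS})

# A qualified child is mapped to the declaration already present on its
# parent instead of getting a declaration of its own.
child = etree.SubElement(qualified, "{%s}Bar" % NS)

print(plain.tag)
print(qualified.tag)
print(child.prefix)
print(etree.tostring(qualified).decode())
```

If the schema declares Foo in the http://example.com namespace, the first
element would not match it in memory, even when its serialised form looks
the same as that of a parsed document.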

Anyway, a few things to note here:

1) namespace prefixes are highly overrated
2) the default namespace is highly overused, especially when mixed with
   other (prefixed) namespaces
3) it is rarely (not 'never', but 'rarely') useful to declare the same
   namespace more than once
4) comparing textual representations of XML documents is futile most of the
   time, except for their C14N serialisation
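
As a small illustration of point 4, the following sketch compares two
made-up documents that differ only in attribute order, quoting style and
empty-tag form. The attribute names "a" and "b" are assumptions for the
example; the C14N output is requested with etree.tostring(..., method="c14n").

```python
from lxml import etree

# Two hypothetical documents with the same content but different surface
# syntax: attribute order, quote characters and empty-element notation.
doc_a = etree.fromstring('<Foo xmlns="http://example.com" b="2" a="1"/>')
doc_b = etree.fromstring("<Foo a='1' b='2' xmlns='http://example.com'></Foo>")

# The plain serialisation keeps the attribute order of each source document.
plain_equal = etree.tostring(doc_a) == etree.tostring(doc_b)

# C14N sorts attributes and normalises quoting and empty elements.
c14n_equal = (etree.tostring(doc_a, method="c14n")
              == etree.tostring(doc_b, method="c14n"))

print(plain_equal, c14n_equal)
```

Note that C14N keeps namespace prefixes as they are, so two documents that
differ only in their choice of prefix will still serialise differently.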
Stefan