[lxml-dev] XMLSchemaParseError if XML schema namespace uri is not "http://www.w3.org/2001/XMLSchema"
Stefan Behnel
stefan_ml at behnel.de
Fri Apr 3 21:31:07 CEST 2009
Hi,
Kev Dwyer wrote:
> I have encountered a problem with schema object creation with lxml; the
> problem relates to namespace used for the root element of the schema.
>
> <snip>
>>>> import lxml.etree
>>>> et = lxml.etree.ElementTree(file=open('c:\\temp\\MySchema', 'r'))
>>>> et
> <lxml.etree._ElementTree object at 0x011B8AF8>
>>>> xsd = lxml.etree.XMLSchema(et)
>
> Traceback (most recent call last):
> File "<pyshell#4>", line 1, in <module>
> xsd = lxml.etree.XMLSchema(et)
> File "xmlschema.pxi", line 50, in lxml.etree.XMLSchema.__init__
> (src/lxml/lxml.etree.c:120919)
> XMLSchemaParseError: Document is not XML Schema
> </snip>
>
> Looking in subversion
> (http://codespeak.net/svn/lxml/trunk/src/lxml/xmlschema.pxi), in the
> XMLSchema class I see:
>
> <snip>
>
> # work around for libxml2 bug if document is not XML schema at all
> #if _LIBXML_VERSION_INT < 20624:
> c_node = root_node._c_node
> c_href = _getNs(c_node)
> if c_href is NULL or \
>        cstd.strcmp(c_href, 'http://www.w3.org/2001/XMLSchema') != 0:
>     raise XMLSchemaParseError, u"Document is not XML Schema"
Thanks for pointing me to this. It is a left-over work-around for a bug
that no longer exists in more recent libxml2 versions. I'll try to figure
out when it was fixed and disable the check from that version on. Note that
this will not solve your problem, though.
> The schemas that I am using use this root element:
> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
I actually had to look this up, and found a lot of documents containing
this namespace, but little information on why it was changed at the time. It
appears to be part of an older specification version that happens to still
work for your schemas.
Note that libxml2 does not support this namespace at all, just like most
other validators I could find information about.
> The schemas are not built by my application, so changing them might be
> an issue.
You can always do a string replace before passing the XML data to the
schema parser. Or, you can parse the XML tree using iterparse and fix the
namespaces while doing so, simply by overwriting the tag names. You can
pass tag="{http://www.w3.org/2000/10/XMLSchema}*" to iterparse() to make
sure it only yields the interesting elements. It will still build
the complete tree for you, which you can retrieve using "it.root" at the end.
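A minimal sketch of the iterparse approach, in case it helps. The function name parse_with_fixed_namespace and the two constants are made up for illustration, and the code only rewrites element tags; it leaves the original prefix declarations in the document untouched.

```python
import io
from lxml import etree

# Namespace URIs taken from the discussion above.
OLD_NS = "http://www.w3.org/2000/10/XMLSchema"
NEW_NS = "http://www.w3.org/2001/XMLSchema"

def parse_with_fixed_namespace(source):
    """Parse 'source' and move old-namespace elements into the new namespace.

    Hypothetical helper, not part of lxml.
    """
    # Only elements in the old namespace are yielded; the full tree is
    # still built in the background.
    it = etree.iterparse(source, events=("end",), tag="{%s}*" % OLD_NS)
    for _event, elem in it:
        local_name = etree.QName(elem).localname
        elem.tag = "{%s}%s" % (NEW_NS, local_name)
    # The root element is available once parsing has finished.
    return it.root
```

Something like parse_with_fixed_namespace(open("MySchema", "rb")) would then return the root element with the rewritten tags. Prefixes used inside attribute values (such as type="xsd:string") still map to the old namespace after this.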
Note that a string replace might still be the safer way to do it, as it
also keeps any prefix mappings intact that XMLSchema may use in text
content (i.e. qualified names). To be sure that you can safely replace the
string, you can parse the XML, serialise it to UTF-8, do the replacement,
and then parse it again. Both parsing and serialising are fast, so you may
not even notice the difference.
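The parse, serialise, replace, parse-again sequence could look roughly like this. The helper name load_schema_with_fixed_namespace is invented for the example, and it assumes that the schema uses no constructs from the older draft that libxml2 would reject once the namespace is changed.

```python
import io
from lxml import etree

# Namespace URIs as bytes, since the serialised document is a byte string.
OLD_NS = b"http://www.w3.org/2000/10/XMLSchema"
NEW_NS = b"http://www.w3.org/2001/XMLSchema"

def load_schema_with_fixed_namespace(source):
    """Return an XMLSchema built from 'source' with the namespace URI swapped.

    Hypothetical helper, not part of lxml.
    """
    # Parsing first normalises the input, whatever its original encoding.
    tree = etree.parse(source)
    data = etree.tostring(tree, encoding="UTF-8")
    # Plain byte replacement; prefix mappings stay intact, so qualified
    # names in attribute values now resolve to the new namespace as well.
    data = data.replace(OLD_NS, NEW_NS)
    return etree.XMLSchema(etree.fromstring(data))
```

Note that the replacement touches every occurrence of the URI in the document, including any that appear in text or documentation.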
Does that help?
Stefan