[lxml-dev] XMLSchemaParseError if XML schema namespace uri is not "http://www.w3.org/2001/XMLSchema"
Kev Dwyer
kevin.p.dwyer at gmail.com
Mon Apr 6 11:32:16 CEST 2009
Hello Stefan,
Thanks for the speedy response, and for the workaround suggestions.
All the best,
Kevin
2009/4/3 Stefan Behnel <stefan_ml at behnel.de>
> Hi,
>
> Kev Dwyer wrote:
> > I have encountered a problem with schema object creation with lxml; the
> > problem relates to namespace used for the root element of the schema.
> >
> > <snip>
> > >>> import lxml.etree
> > >>> et = lxml.etree.ElementTree(file=open('c:\\temp\\MySchema', 'r'))
> > >>> et
> > <lxml.etree._ElementTree object at 0x011B8AF8>
> > >>> xsd = lxml.etree.XMLSchema(et)
> >
> > Traceback (most recent call last):
> >   File "<pyshell#4>", line 1, in <module>
> >     xsd = lxml.etree.XMLSchema(et)
> >   File "xmlschema.pxi", line 50, in lxml.etree.XMLSchema.__init__ (src/lxml/lxml.etree.c:120919)
> > XMLSchemaParseError: Document is not XML Schema
> > </snip>
> >
> > Looking in subversion
> > (http://codespeak.net/svn/lxml/trunk/src/lxml/xmlschema.pxi), in the
> > XMLSchema class I see:
> >
> > <snip>
> >
> >     # work around for libxml2 bug if document is not XML schema at all
> >     #if _LIBXML_VERSION_INT < 20624:
> >     c_node = root_node._c_node
> >     c_href = _getNs(c_node)
> >     if c_href is NULL or \
> >            cstd.strcmp(c_href, 'http://www.w3.org/2001/XMLSchema') != 0:
> >         raise XMLSchemaParseError, u"Document is not XML Schema"
>
> Thanks for pointing me to this, this is a left-over work-around for a bug
> that no longer exists in more recent libxml2 versions. I'll try to figure
> out when it was fixed and disable this from that point on. Note that this
> will not solve your problem, though.
>
>
> > The schemas that I am using use this root element:
> > <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
>
> I actually had to look this up, and found a lot of documents containing
> this namespace, but little information on why it was changed at the time.
> It appears to be part of an older specification version that happens to
> still work for your schemas.
>
> Note that libxml2 does not support this namespace at all, just like most
> other validators I could find a link about.
>
>
> > The schemas are not built by my application, so changing them might be
> > an issue.
>
> You can always do a string replace before passing the XML data to the
> schema parser. Or, you can parse the XML tree using iterparse and fix the
> namespaces while doing so, simply by overwriting the tag names. You can
> pass "tag={http://www.w3.org/2000/10/XMLSchema}*" to iterparse() to make
> sure it only intercepts on the interesting elements. It will still build
> the complete tree for you, which you can retrieve using "it.root" at the
> end.
>
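
For the archive, here is a minimal sketch of the iterparse approach
described above. The function name parse_with_fixed_namespace and the
sample schema are only illustrative, not part of lxml. As noted in the
reply, prefixes used inside attribute values (for example
type="xsd:string") would still point at the old namespace after this
rewrite, so it is probably only suitable for schemas without such
references.

```python
import io

from lxml import etree

OLD_NS = "http://www.w3.org/2000/10/XMLSchema"
NEW_NS = "http://www.w3.org/2001/XMLSchema"


def parse_with_fixed_namespace(source):
    """Parse `source` and move every element from OLD_NS into NEW_NS.

    `source` may be a filename or a file-like object opened for reading bytes.
    """
    # The wildcard tag restricts the events to elements in the old namespace.
    it = etree.iterparse(source, events=("end",), tag="{%s}*" % OLD_NS)
    for _event, elem in it:
        # Overwrite the tag name, keeping the local part unchanged.
        local_name = etree.QName(elem).localname
        elem.tag = "{%s}%s" % (NEW_NS, local_name)
    # iterparse still builds the complete tree; its root is available here.
    return it.root


# Hypothetical sample document, small enough to check by eye.
legacy = io.BytesIO(
    b'<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">'
    b'<xsd:element name="note"/>'
    b"</xsd:schema>"
)
root = parse_with_fixed_namespace(legacy)
print(root.tag)
```

The returned root element can then be handed to lxml.etree.XMLSchema in
place of the original tree.
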
> Note that a string replace might still be the safer way to do it, as it
> also keeps any prefix mappings intact that XMLSchema may use in text
> content (i.e. qualified names). To be sure that you can safely replace the
> string, you can parse the XML, serialise it to UTF-8, do the replacement,
> and then parse it again. Both parsing and serialising are fast, so you may
> not even notice the difference.
>
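
A rough sketch of the parse, serialise, replace, re-parse variant follows.
The helper name schema_from_legacy_bytes is made up for illustration, and
the sketch assumes the schema only uses constructs that are also valid
under the 2001 namespace; anything that changed between the two
specification versions would still fail in the schema parser.

```python
from lxml import etree

OLD_NS = b"http://www.w3.org/2000/10/XMLSchema"
NEW_NS = b"http://www.w3.org/2001/XMLSchema"


def schema_from_legacy_bytes(raw):
    """Build an XMLSchema from schema bytes that use the old namespace URI."""
    # Parse and re-serialise first, so the namespace URI appears in one
    # normalised UTF-8 form regardless of the input encoding.
    root = etree.fromstring(raw)
    normalised = etree.tostring(root, encoding="UTF-8")
    # Plain byte replacement keeps the prefix mappings intact, so qualified
    # names in attribute values such as type="xsd:string" keep resolving.
    fixed = normalised.replace(OLD_NS, NEW_NS)
    return etree.XMLSchema(etree.fromstring(fixed))
```

The returned object can be used like any other lxml schema, for example
schema.validate(doc) on a parsed instance document.
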
> Does that help?
>
> Stefan
>