Hello Stefan,<br><br>Thanks for the speedy response, and for the workaround suggestions.<br><br>All the best,<br><br>Kevin<br><br><div class="gmail_quote">2009/4/3 Stefan Behnel <span dir="ltr"><<a href="mailto:stefan_ml@behnel.de">stefan_ml@behnel.de</a>></span><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi,<br>
<div><div></div><div class="h5"><br>
Kev Dwyer wrote:<br>
> I have encountered a problem with schema object creation with lxml; the<br>
> problem relates to namespace used for the root element of the schema.<br>
><br>
> <snip><br>
>>>> import lxml.etree<br>
>>>> et = lxml.etree.ElementTree(file=open('c:\\temp\\MySchema', 'r'))<br>
>>>> et<br>
> <lxml.etree._ElementTree object at 0x011B8AF8><br>
>>>> xsd = lxml.etree.XMLSchema(et)<br>
><br>
> Traceback (most recent call last):<br>
> File "<pyshell#4>", line 1, in <module><br>
> xsd = lxml.etree.XMLSchema(et)<br>
> File "xmlschema.pxi", line 50, in lxml.etree.XMLSchema.__init__<br>
> (src/lxml/lxml.etree.c:120919)<br>
> XMLSchemaParseError: Document is not XML Schema<br>
> </snip><br>
><br>
> Looking in subversion<br>
> (<a href="http://codespeak.net/svn/lxml/trunk/src/lxml/xmlschema.pxi" target="_blank">http://codespeak.net/svn/lxml/trunk/src/lxml/xmlschema.pxi</a>), in the<br>
> XMLSchema class I see:<br>
><br>
> <snip><br>
><br>
> # work around for libxml2 bug if document is not XML schema at<br>
> all<br>
> #if _LIBXML_VERSION_INT < 20624:<br>
> c_node = root_node._c_node<br>
> c_href = _getNs(c_node)<br>
> if c_href is NULL or \<br>
> cstd.strcmp(c_href, '<a href="http://www.w3.org/2001/XMLSchema" target="_blank">http://www.w3.org/2001/XMLSchema</a>')<br>
> != 0:<br>
> raise XMLSchemaParseError, u"Document is not XML Schema"<br>
<br>
</div></div>Thanks for pointing me to this. It is a leftover workaround for a bug<br>
that no longer exists in more recent libxml2 versions. I'll try to figure<br>
out when it was fixed and disable the check from that point on. Note that this<br>
will not solve your problem, though.<br>
<div class="im"><br>
<br>
> The schemas that I am using use this root element:<br>
> <xsd:schema xmlns:xsd="<a href="http://www.w3.org/2000/10/XMLSchema" target="_blank">http://www.w3.org/2000/10/XMLSchema</a>"><br>
<br>
</div>I actually had to look this up, and found a lot of documents containing<br>
this namespace, but little information on why it was changed at the time. It<br>
appears to be part of an older specification version that happens to still<br>
work for your schemas.<br>
<br>
Note that libxml2 does not support this namespace at all, like most<br>
other validators I could find information about.<br>
<div class="im"><br>
<br>
> The schemas are not built by my application, so changing them might be<br>
> an issue.<br>
<br>
</div>You can always do a string replace before passing the XML data to the<br>
schema parser. Alternatively, you can parse the XML tree using iterparse() and fix the<br>
namespaces while doing so, simply by overwriting the tag names. You can<br>
pass tag="{<a href="http://www.w3.org/2000/10/XMLSchema%7D*" target="_blank">http://www.w3.org/2000/10/XMLSchema}*</a>" to iterparse() to make<br>
sure it only reports the elements in that namespace. It will still build<br>
the complete tree for you, which you can retrieve using "it.root" at the end.<br>
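<br>
A minimal sketch of the iterparse() approach, assuming the target is the<br>
current http://www.w3.org/2001/XMLSchema namespace. The helper name and the<br>
sample schema are made up for illustration:<br>

```python
import io
from lxml import etree

OLD_NS = "http://www.w3.org/2000/10/XMLSchema"
NEW_NS = "http://www.w3.org/2001/XMLSchema"  # assumed target namespace


def parse_with_fixed_namespace(source):
    """Hypothetical helper: parse `source`, moving OLD_NS elements to NEW_NS."""
    # The tag filter restricts events to elements in the old namespace;
    # the complete tree is still built while iterating.
    it = etree.iterparse(source, events=("end",), tag="{%s}*" % OLD_NS)
    for _event, element in it:
        # Every reported tag has the form "{namespace}localname".
        local_name = element.tag.split("}", 1)[1]
        element.tag = "{%s}%s" % (NEW_NS, local_name)
    return it.root


# Made-up sample input using the old namespace.
sample = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:element name="note"/>
</xsd:schema>"""

root = parse_with_fixed_namespace(io.BytesIO(sample))
```

This only rewrites the element tags. Qualified names inside attribute values,<br>
such as type="xsd:string", still resolve through the original prefix mapping.<br>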
<br>
Note that a string replace might still be the safer way to do it, as it<br>
also keeps intact any prefix mappings that XML Schema may rely on in text<br>
content (i.e. qualified names). To be sure that you can safely replace the<br>
string, you can parse the XML, serialise it to UTF-8, do the replacement,<br>
and then parse it again. Both parsing and serialising are fast, so you may<br>
not even notice the difference.<br>
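<br>
The parse, serialise, replace, parse round trip could look roughly like this.<br>
It is only a sketch: the helper name and the sample schema are invented, and<br>
a schema that relies on constructs from the older draft may still fail to<br>
compile after the replacement:<br>

```python
import io
from lxml import etree

OLD_NS = b"http://www.w3.org/2000/10/XMLSchema"
NEW_NS = b"http://www.w3.org/2001/XMLSchema"  # assumed target namespace


def load_schema_with_fixed_namespace(source):
    """Hypothetical helper: build an XMLSchema after swapping the namespace URI."""
    # Parsing and re-serialising yields a predictable UTF-8 byte string,
    # whatever encoding the original file used.
    normalised = etree.tostring(etree.parse(source), encoding="UTF-8")
    # Plain byte replacement; this changes every occurrence of the URI.
    fixed = normalised.replace(OLD_NS, NEW_NS)
    return etree.XMLSchema(etree.fromstring(fixed))


# Made-up sample input; note the qualified name in the type attribute.
sample = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:element name="note" type="xsd:string"/>
</xsd:schema>"""

schema = load_schema_with_fixed_namespace(io.BytesIO(sample))
is_valid = schema.validate(etree.fromstring(b"<note>hello</note>"))
```

Because the xmlns:xsd declaration itself is rewritten, the "xsd:string"<br>
reference keeps pointing at the right namespace.<br>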
<br>
Does that help?<br>
<font color="#888888"><br>
Stefan<br>
</font></blockquote></div><br>