[lxml-dev] schema validation and resolvers
Michael Ballbach
ballbach at rten.net
Tue Jun 24 03:45:25 CEST 2008
I've been trying to use etree.XMLSchema() to load a schema that has
external imports - but I'd like to utilize an etree.Resolver object. In
my application schemas aren't always on disk.
This does not work, because the libxml2 xmlSchemaAddSchemaDoc()
function, when called to prepare the import, creates a new parser
context and the _local_resolver() function is unable to map this unknown
context to a Resolvers object.
I've made a simple patch, but I'll preface this by saying that I don't
know libxml2 or lxml internals very well, so I may well have
misunderstood something, or there may already be a way to do this
(other than the base_url trick, which works, but as I said my schemas
aren't always in files, and when they are they aren't always in the same
directory).
The naive way to fix this is:
Index: parser.pxi
===================================================================
--- parser.pxi (revision 56012)
+++ parser.pxi (working copy)
@@ -393,13 +393,22 @@
     context._storage.add(data)
     return c_input

+cdef xmlparser.xmlParserCtxt* _findDefaultParserContext() with gil:
+    return __GLOBAL_PARSER_CONTEXT.getDefaultParser()._getParserContext()._c_ctxt
+
 cdef xmlparser.xmlParserInput* _local_resolver(char* c_url, char* c_pubid,
                                                xmlparser.xmlParserCtxt* c_context) nogil:
     # no Python objects here, may be called without thread context !
     # when we declare a Python object, Pyrex will INCREF(None) !
     cdef xmlparser.xmlParserInput* c_input
     cdef int error
+
+    # check the default parser to support contexts generated within libxml2
     if c_context._private is NULL:
+        if _findDefaultParserContext() is not NULL:
+            c_context = _findDefaultParserContext()
+
+    if c_context._private is NULL:
         if __DEFAULT_ENTITY_LOADER is NULL:
             return NULL
         return __DEFAULT_ENTITY_LOADER(c_url, c_pubid, c_context)
With this patch, the resolver falls back to the thread's default parser's
_ParserDictionaryContext object when none was found inside the passed
context. This 'makes sense' to me in that, when I was first
debugging this problem, one of the first things I tried was using the
default parser, which I figured might come into play for any parsing
that happens behind the scenes. This works fine and I'm able to use a
Resolver now - as long as I add it to the default parser's resolvers
list.
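
To make the lookup order concrete, here is a rough model of it in plain
Python. It is only a sketch of the control flow: Context, resolve() and
the loader callables are hypothetical stand-ins invented for
illustration, not lxml or libxml2 APIs, and the real code goes on to
consult a list of resolvers rather than a single callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: a resolver maps a URL to document text, or None if unknown.
Resolver = Callable[[str], Optional[str]]


@dataclass
class Context:
    """Stand-in for xmlParserCtxt; `private` mimics the _private pointer."""
    private: Optional[Resolver] = None


def resolve(url: str,
            context: Context,
            default_context: Optional[Context],
            default_loader: Optional[Resolver] = None) -> Optional[str]:
    # Contexts created inside libxml2 carry no lxml state, so fall back
    # to the default parser's context when one is available.
    if context.private is None and default_context is not None:
        context = default_context
    # Still no state: hand the request to the default entity loader, if any.
    if context.private is None:
        return default_loader(url) if default_loader is not None else None
    return context.private(url)


# An in-memory "schema store" registered on the default parser's context.
schemas = {"urn:example:types": "<schema/>"}
default_ctx = Context(private=schemas.get)

# A context created behind the scenes has private=None and now falls back.
found = resolve("urn:example:types", Context(), default_ctx)
```

Note that the fallback ignores which parser loaded the schema in the
first place.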
However, it's not 'correct' in the sense that in reality it should
probably use the resolvers associated with the schema's original parser.
(I think it could be argued that the patch is correct in the abstract in
that it catches resolution requests that would otherwise be missed, but
that it is incorrect in the sense that the more specific resolver
associated with the document's original parser is what should be found
in this particular case.)
From my perspective this could be fixed in one of two basic ways:
1) Modify libxml2 to somehow get lxml's _private stuff in there. This
could be done by passing a user-specified context for use in new
parses, or something like that. I'd imagine this is much less likely
to work out than option 2.
2) Use the _ParserDictionaryContext system there to store state about
when the schema code is entered so that a proper XML context can be
inferred and the original document's parser's resolvers can be
called. One must be careful here to make sure that any use of lxml
within the resolver callbacks does not mess up this state.
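
As a rough illustration of option 2, the state could be kept on a
per-thread stack that is pushed when schema compilation starts and
popped when it ends. This is a plain-Python sketch under my own
assumptions: implied_resolvers() and current_resolvers() are
hypothetical names, not existing lxml functions.

```python
import threading
from contextlib import contextmanager

_state = threading.local()


def _stack():
    # One stack per thread, created lazily on first use.
    if not hasattr(_state, "stack"):
        _state.stack = []
    return _state.stack


@contextmanager
def implied_resolvers(resolvers):
    """Record whose resolvers apply while schema compilation runs."""
    stack = _stack()
    stack.append(resolvers)
    try:
        yield
    finally:
        # Pop even on error, so nested use from inside a resolver
        # callback leaves the outer state untouched.
        stack.pop()


def current_resolvers():
    """Return the innermost recorded resolvers, or None outside any scope."""
    stack = _stack()
    return stack[-1] if stack else None
```

Using a stack instead of a single slot is what keeps the state intact
when a resolver callback itself triggers another parse or schema load.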
I'd love feedback about this issue, and I'd be happy to implement one of
these changes or something else, whatever makes sense to folks, as my
long term interest is in having this work 'out-of-the-box'.
In closing, thanks to all you fellows working on lxml, it's really
great!
--
Michael Ballbach, N0ZTQ
ballbach at rten.net -- PGP KeyID: 0xA05D5555
http://www.rten.net/