[lxml-dev] parsing DTDs - listing of valid elements
Stefan Behnel
stefan_ml at behnel.de
Sat Jan 10 17:50:07 CET 2009
Richard Rosenberg wrote:
> Hello:
>
> I am interested in using lxml for parsing DTDs (or better still RelaxNG
> schemas) and extracting info about the DTD as opposed to validating XML.
>
> The idea is to use it in a Python-powered XML editor. Has anyone done anything
> similar? Or even thought about anything similar?
>
> I found an old post on the XML SIG that talks about xmlproc:
>
> http://mail.python.org/pipermail/xml-sig/2001-February/004582.html
>
> . . .And it looks like it may be possible using:
>
> xml.parsers.xmlproc.xmldtd.CompletedDTD.get_elements()
>
> As in the linked example.
>
> Any ideas about how to use lxml or an alternative, and/or any notions as to
> other approaches are most welcome. I'm already using (and loving) lxml for
> some relatively simple parsing tasks, so that's why I am starting here.
>
> Thanks,
>
> Richard

The content of a parsed DTD is not exposed by lxml.etree. Exposing it would
require a complete Python-level object representation of a DTD.

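Until lxml exposes this, one crude pure-Python stopgap for simple DTDs is to scan the DTD text for literal `<!ELEMENT ...>` declarations. This is only an illustration, not lxml's API and not a real DTD parser: `list_dtd_elements` is a hypothetical helper, and it assumes the declarations are written out literally, so parameter entities, conditional sections and external subsets are not resolved. For anything beyond simple DTDs, a real parser such as the xmlproc `CompletedDTD.get_elements()` route mentioned in the question is the safer choice.

```python
import re

# Naive pattern for a literal <!ELEMENT name contentspec> declaration.
_ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+(.*?)\s*>", re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def list_dtd_elements(dtd_text):
    """Return (name, content_model) pairs in declaration order.

    Hypothetical helper, not an lxml API: parameter entities are not
    expanded, and conditional sections and external subsets are ignored.
    """
    # Drop comments first so commented-out declarations are not reported.
    text = _COMMENT.sub("", dtd_text)
    return [(m.group(1), m.group(2)) for m in _ELEMENT_DECL.finditer(text)]


dtd = """
<!ELEMENT note (to, from, body)>
<!ELEMENT to (#PCDATA)>
<!-- <!ELEMENT old EMPTY> -->
<!ELEMENT br EMPTY>
"""

for name, model in list_dtd_elements(dtd):
    print(name, model)
# note (to, from, body)
# to (#PCDATA)
# br EMPTY
```

The content model is returned as raw text; an editor would still have to parse it to offer only the children that are valid at a given position.
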
You could extract this information at the C level by implementing a
separate Cython module, but there is currently no way to get at it from
Python code. The DTD parsing code lives here:

http://codespeak.net/svn/lxml/trunk/src/lxml/dtd.pxi

lxml's C-API page shows a short example of an external module:

http://codespeak.net/lxml/capi.html

All you would really need, though, is the internal _c_dtd field of the DTD
class, which you could cimport as described here:

http://docs.cython.org/docs/sharing_declarations.html#sharing-declarations
http://docs.cython.org/docs/sharing_declarations.html#sharing-extension-types

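For the RelaxNG case raised in the question, a schema written in RelaxNG's XML syntax (not the compact .rnc syntax) is itself an ordinary XML document, so lxml can simply parse it and walk its `<element name="...">` declarations in the RelaxNG namespace. This is a minimal sketch under stated assumptions: `list_rng_elements` is a hypothetical helper, it only reads literal `name` attributes (elements named through a name class such as `<anyName/>` or a child `<name>` element are skipped), and it does not follow `<include>` or `<externalRef>`. It falls back to the standard library's ElementTree when lxml is not installed, since only the `fromstring()`/`iter()`/`get()` calls shared by both are used.

```python
try:
    from lxml import etree  # lxml.etree, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls used below

RNG_NS = "http://relaxng.org/ns/structure/1.0"


def list_rng_elements(schema_text):
    """Return element names declared with <element name="..."> in a RelaxNG
    schema (XML syntax), in document order, without duplicates.

    Hypothetical helper: elements named via a name class (<name>,
    <anyName/>, <nsName/>, <choice>) are skipped, and <include> /
    <externalRef> are not followed.
    """
    root = etree.fromstring(schema_text)
    names = []
    # iter() walks the whole tree, including the root, in document order.
    for decl in root.iter("{%s}element" % RNG_NS):
        name = decl.get("name")
        if name is not None and name not in names:
            names.append(name)
    return names


schema = """\
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name"><text/></element>
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
"""

print(list_rng_elements(schema))  # ['addressBook', 'card', 'name', 'email']
```

Unlike the DTD case, no C-level access is needed here, because the schema's structure is visible as a normal element tree.
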
Stefan