[lxml-dev] parsing DTDs - listing of valid elements

Stefan Behnel stefan_ml at behnel.de
Sat Jan 10 17:50:07 CET 2009


Richard Rosenberg wrote:
> Hello:
> 
> I am interested in using lxml for parsing DTDs (or better still RelaxNG 
> schemas) and extracting info about the DTD as opposed to validating XML.
> 
> The idea is to use it in a python powered XML editor. Has anyone done anything 
> similar? Or even thought about anything similar?
> 
> I found an old post on the XML SIG that talks about xmlproc:
> 
> http://mail.python.org/pipermail/xml-sig/2001-February/004582.html
> 
> ...and it looks like it may be possible using:
> 
> xml.parsers.xmlproc.xmldtd.CompletedDTD.get_elements() 
>
> As in the linked example.
>
> Any ideas about how to use lxml or an alternative, and/or any notions as to 
> other approaches are most welcome. I'm already using (and loving) lxml for 
> some relatively simple parsing tasks, so that's why I am starting here.
> 
> Thanks,
> 
> Richard

The content of a parsed DTD is not exposed by lxml.etree. Implementing that
would require a complete Python-level object representation of a DTD.

You could extract this information at the C level (by implementing a
separate Cython module), but not currently at the Python level. DTDs are
parsed here:

http://codespeak.net/svn/lxml/trunk/src/lxml/dtd.pxi

Here's a short example of an external module that uses lxml's C-API:

http://codespeak.net/lxml/capi.html

That said, all you'd really need is the internal _c_dtd field of the DTD
class, which you could cimport as described here:

http://docs.cython.org/docs/sharing_declarations.html#sharing-declarations
http://docs.cython.org/docs/sharing_declarations.html#sharing-extension-types
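Since lxml offers no Python-level view of the declarations, here is a rough illustration of the information an editor would want from a DTD: the declared element names and, for each one, which child elements its content model mentions. This is a hypothetical, stdlib-only sketch. It does not use lxml and is not a real DTD parser. The names list_elements and allowed_children are made up. It also assumes a simple DTD with no parameter entities (%name;), no conditional sections, and no '>' inside a declaration.

```python
import re
from typing import Dict, List

# Naive pattern for <!ELEMENT name contentspec>. Assumes no '>' inside the
# declaration and no parameter-entity references that would need expanding.
_ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*?)\s*>")
# An XML-ish name inside a content model, e.g. "title" in "(title, para+)".
_NAME = re.compile(r"[A-Za-z_:][\w.\-:]*")


def list_elements(dtd_text: str) -> Dict[str, str]:
    """Map each declared element name to its raw content-model string."""
    # Strip comments first so commented-out declarations are not reported.
    text = re.sub(r"<!--.*?-->", "", dtd_text, flags=re.DOTALL)
    return {name: model for name, model in _ELEMENT_DECL.findall(text)}


def allowed_children(content_model: str) -> List[str]:
    """Return the child element names a content model mentions, in order."""
    if content_model in ("EMPTY", "ANY"):
        return []
    # Remove the #PCDATA keyword so it is not mistaken for an element name.
    model = content_model.replace("#PCDATA", "")
    names: List[str] = []
    for name in _NAME.findall(model):
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = "<!ELEMENT doc (title, para+)>\n<!ELEMENT br EMPTY>"
    for name, model in list_elements(sample).items():
        print(name, allowed_children(model))
    # doc ['title', 'para']
    # br []
```

Note that this only reports which names a content model mentions. It ignores sequence, choice, and occurrence constraints, which a validating editor would still have to evaluate. A real implementation would need the parsed DTD, either through the C-level access described above or a dedicated DTD parser.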

Stefan

