[lxml-dev] parsing DTDs - listing of valid elements
Elliott Slaughter
elliottslaughter at gmail.com
Wed Jul 1 00:04:26 CEST 2009
Hi,
I'm trying to get a list of the elements declared in a DTD. Since these
internals are not exported in the Python interface of lxml.etree, I am trying
to write a Cython extension to do so, as previously suggested on this mailing
list (see link below).
http://codespeak.net/pipermail/lxml-dev/2009-January/004298.html
To quote the message, "all you'd really need is the internal _c_dtd field of
the DTD class, which you could cimport". I'm wondering exactly how I am
supposed to do that (my attempts so far are described below). It would also
be nice to know if the last attempt to do so was successful or not.
Thanks. Any help would be appreciated.
Here is what I've tried so far (on Python 2.5.4, Cython 0.11.2, Windows):
The DTD class is not declared in etreepublic.pxd, so I can't just "cimport
etreepublic". The actual DTD class definition is in dtd.pxi, as stated in
the message. But I can't just "include 'dtd.pxi'", because the DTD class
inherits from the _Validator class, which is defined in lxml.etree.pyx. And
I can't "cimport lxml.etree" because there is no file lxml.etree.pxd.
I tried writing an lxml.etree.pxd file to get around these barriers (which
was thoroughly confusing, because _Validator contains an _ErrorLog, which
made me search through several other files...). But even once I got the
entire thing to compile, the resulting module failed to load in Python:
>>> import mydtd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pxd", line 3, in mydtd (mydtd.c:513)
    cdef class _LogEntry:
ValueError: lxml.etree._LogEntry does not appear to be the correct type object
I have attached my lxml.etree.pxd, in case I made a mistake in it and this
approach can be made to work after all.
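
To make the goal concrete: the desired result is simply the list of element
names that a DTD declares. The following is only a rough stopgap sketch in
plain Python, not the Cython/_c_dtd approach from the quoted message. It scans
the DTD text for <!ELEMENT ...> declarations with a regular expression. The
helper name list_dtd_elements and the sample DTD are made up for illustration,
and the sketch assumes a simple DTD: it does not expand parameter entities or
handle conditional sections.

```python
import re

# Matches declarations such as <!ELEMENT note (to, body)> and captures the name.
# Assumption: element names are written literally, not via parameter entities.
_ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)")
# Comments are stripped first so commented-out declarations are not counted.
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def list_dtd_elements(dtd_text):
    """Return the declared element names in order of first appearance."""
    without_comments = _COMMENT.sub("", dtd_text)
    names = []
    for name in _ELEMENT_DECL.findall(without_comments):
        if name not in names:
            names.append(name)
    return names


# Hypothetical sample DTD, used only to exercise the helper.
SAMPLE_DTD = """
<!-- <!ELEMENT ignored EMPTY> -->
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id CDATA #IMPLIED>
"""

print(list_dtd_elements(SAMPLE_DTD))  # ['note', 'to', 'body']
```

This gives only the names; content models and attribute declarations would
still need the parsed DTD structure that libxml2 holds internally.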
--
Elliott Slaughter
"Don't worry about what anybody else is going to do. The best way to predict
the future is to invent it." - Alan Kay
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxml.etree.pxd
Type: application/octet-stream
Size: 602 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20090630/0a7b98d7/attachment.obj