[lxml-dev] Simple doctypes not in docinfo.doctype

Stefan Behnel stefan_ml at behnel.de
Wed Oct 15 19:46:04 CEST 2008


Hi,

F Wolff wrote:
> I've tried this with an old (1.3.2) and newer (2.0.6) lxml version.
> 
> (this example is roughly based on the code at
> http://codespeak.net/lxml/tutorial.html)
> 
> from lxml import etree
> from StringIO import StringIO
> tree = etree.parse(StringIO("""<!DOCTYPE TS><TS></TS>"""))
> tree.docinfo.doctype
> ''
> 
> From my understanding this DOCTYPE declaration is valid (and occurring
> in the wild in Qt .ts files). My real issue is round-trip problems in a
> reading-writing cycle where the DOCTYPE is lost, but I guess not being
> able to use .docinfo.doctype is already a problem.

I agree that better handling is desirable here. Could you file a bug report so
that this doesn't get lost (and so that you get notified of any further
development)?

https://bugs.launchpad.net/lxml

If you want to give it a try yourself, the DOCTYPE writing code is in
src/lxml/serializer.pxi, function _writeDtdToBuffer(), and the docinfo code is
in lxml.etree.pyx, class DocInfo. Patches and test cases
(src/lxml/tests/test_etree.py) are welcome.
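
As a possible starting point for such a test case, here is a minimal sketch
that reproduces the report in a reusable form. It assumes Python 3 (hence
io.BytesIO instead of StringIO) and an installed lxml; the helper name
doctype_report() is made up for illustration and is not part of lxml or its
test suite. It deliberately asserts nothing about the result, since what
docinfo.doctype returns here depends on the lxml version.

```python
from io import BytesIO

from lxml import etree


def doctype_report(xml_bytes):
    """Parse xml_bytes and report how the DOCTYPE fares (hypothetical helper).

    Returns a tuple (docinfo_doctype, serialized):
    - docinfo_doctype is whatever tree.docinfo.doctype gives after parsing
    - serialized is the document written back out with etree.tostring()
    """
    tree = etree.parse(BytesIO(xml_bytes))
    return tree.docinfo.doctype, etree.tostring(tree)


if __name__ == "__main__":
    # The simple DOCTYPE from the report (no public ID, no system URL).
    doctype, serialized = doctype_report(b"<!DOCTYPE TS><TS></TS>")
    print("docinfo.doctype:", repr(doctype))
    print("round trip keeps DOCTYPE:", b"<!DOCTYPE" in serialized)
```

An empty docinfo.doctype or a False on the second line would show the two
symptoms described above; turning the prints into assertions gives a
candidate test for test_etree.py.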

Thanks,
Stefan


