[lxml-dev] lxml.html adds a default doctype to HTML documents
James Graham
jg307 at cam.ac.uk
Sun Sep 7 22:15:59 CEST 2008
In [2]: from lxml import html
In [3]: t = html.fromstring("<html><p>Hello World")
In [4]: docinfo = t.getroottree().docinfo
In [5]: docinfo.public_id
Out[5]: '-//W3C//DTD HTML 4.0 Transitional//EN'
The input has no doctype, yet the parsed tree reports one. Is it possible to
prevent this? I couldn't see anything in the API documentation, but I might
have missed something obvious. Silently gaining incorrect data is annoying :)
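
For anyone reading this in the archive: a minimal sketch of one possible workaround, not something the post above confirms. It assumes an lxml release whose HTMLParser accepts the default_doctype keyword (later releases document it; it was not available when this was written) and a libxml2 that honours it. The helper name parse_without_default_doctype is hypothetical.

```python
from lxml import html


def parse_without_default_doctype(markup):
    """Parse HTML while asking the parser not to invent a doctype.

    Assumption: the installed lxml supports HTMLParser(default_doctype=...).
    On older releases this keyword raises a TypeError.
    """
    parser = html.HTMLParser(default_doctype=False)
    return html.fromstring(markup, parser=parser)


t = parse_without_default_doctype("<html><p>Hello World")
docinfo = t.getroottree().docinfo

# With the option honoured, the HTML 4.0 Transitional public ID from the
# session above should no longer be reported for doctype-less input.
print(repr(docinfo.public_id))
```

If the keyword is not available, the remaining option is probably to check the source text for a doctype before trusting docinfo.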
--
"Eternity's a terrible thought. I mean, where's it all going to end?"
-- Tom Stoppard, Rosencrantz and Guildenstern are Dead