[lxml-dev] lxml.html adds a default doctype to HTML documents

James Graham jg307 at cam.ac.uk
Sun Sep 7 22:15:59 CEST 2008


In [2]: from lxml import html

In [3]: t = html.fromstring("<html><p>Hello World")

In [4]: docinfo = t.getroottree().docinfo

In [5]: docinfo.public_id
Out[5]: '-//W3C//DTD HTML 4.0 Transitional//EN'

Is it possible to prevent this from occurring? The input has no doctype
declaration, yet the parsed tree reports an HTML 4.0 Transitional public ID.
I couldn't see anything in the API documentation, but I might have missed
something obvious. Silently gaining incorrect data is annoying :)
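
For reference, a minimal sketch of one possible way to avoid the default
doctype. It assumes an lxml release whose HTMLParser accepts a
default_doctype keyword argument; that option is an assumption relative to
the version used in the session above, which may not provide it.

```python
from lxml import html

# Assumption: this lxml release supports HTMLParser(default_doctype=...).
# Passing False asks the parser not to add a doctype that is absent
# from the input.
parser = html.HTMLParser(default_doctype=False)

# Same input as in the session above, parsed with the custom parser.
t = html.fromstring("<html><p>Hello World", parser=parser)

docinfo = t.getroottree().docinfo
public_id = docinfo.public_id
print(repr(public_id))
```

If the option is available, the printed public ID should no longer be the
HTML 4.0 Transitional identifier shown above.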

-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead

