[lxml-dev] Proposal: Better html5lib Support
Stefan Behnel
stefan_ml at behnel.de
Sun Jul 13 06:57:05 CEST 2008
Hi,
Armin Ronacher wrote:
> Stefan Behnel <stefan_ml <at> behnel.de> writes:
>>> There is another small problem with html5lib and lxml interoperability:
>>> the HTML5 doctype ("<!DOCTYPE HTML>"), which lxml naturally cannot handle.
>> Does the "cannot handle" result in any visible problems?
> This document::
>
> <!doctype html>
> <title>foo</title>
> <p>blah
>
> Comes out as (lxml.etree.tostring)::
>
> <!DOCTYPE html PUBLIC "" "">
> ...
We are actually serialising the DOCTYPE ourselves. Try this patch.
I'm not sure whether <!DOCTYPE html> is actually allowed in SGML; I haven't
found anything on that so far. If it isn't, I'll have to see whether I can
restrict the impact of the patch to this specific case.
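For illustration only, here is a minimal pure-Python sketch of the kind of
serialisation rule being discussed: leave out the PUBLIC/SYSTEM parts when the
identifiers are empty, so that an HTML5 doctype round-trips as
<!DOCTYPE html>. This is not the attached patch and not lxml's actual code;
the function name serialise_doctype and its parameters are hypothetical, and
quoting of identifiers that themselves contain double quotes is not handled.

```python
def serialise_doctype(root_name, public_id=None, system_id=None):
    """Build a DOCTYPE declaration string, skipping empty identifiers.

    Hypothetical helper for illustration; not part of lxml.
    """
    if not root_name:
        # No root name means there is no DOCTYPE to write at all.
        return ""
    if public_id:
        if system_id:
            return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (
                root_name, public_id, system_id)
        # A public identifier without a system identifier, as seen in HTML 4.
        return '<!DOCTYPE %s PUBLIC "%s">' % (root_name, public_id)
    if system_id:
        return '<!DOCTYPE %s SYSTEM "%s">' % (root_name, system_id)
    # Both identifiers empty or missing: the HTML5 case.
    return '<!DOCTYPE %s>' % root_name


if __name__ == "__main__":
    print(serialise_doctype("html", "", ""))
    print(serialise_doctype(
        "html",
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

With this rule, empty strings and missing identifiers are treated the same
way, which avoids the <!DOCTYPE html PUBLIC "" ""> output quoted above.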
Note that you will need Cython 0.9.8 installed to build a patched lxml.
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: html5-doctype.patch
Type: text/x-patch
Size: 1781 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080713/1f011c61/attachment.bin