[lxml-dev] Xhtml and entities
Stefan Behnel
stefan_ml at behnel.de
Sat May 26 12:54:56 CEST 2007
Eric Garin wrote:
> I got a bit further because I have now declared a DOCTYPE in the XML instance.
>
> My document is:
>
> <!DOCTYPE html SYSTEM "dtd/xhtml1-strict.dtd">
>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
> <title>Untitled Document</title>
> </head>
>
> <body>
> <p>This entity exists : &rsquo;</p>
> <p>This not exists : &rsquo1111;</p>
> </body>
> </html>
>
>
> The thing is that now it is ALWAYS valid, even if there is an undeclared XHTML entity in the XML.
> I use the parser to load the instance like this:
>
> parser = etree.XMLParser(load_dtd=True)
> docXhtml = etree.parse(fileXhtml, parser)

As I said, please read

http://codespeak.net/lxml/dev/parsing.html#parser-options

where it says:

  load_dtd - load and parse the DTD while parsing (no validation is performed)
  dtd_validation - validate while parsing (if a DTD was referenced)

lxml.etree does not load DTDs by default, and telling it to *load* the DTD
does not make it validate. If you want DTD validation in the parser, you
have to request it explicitly with the "dtd_validation" option.
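
To make the difference between the two options concrete, here is a minimal sketch. It assumes a small made-up document with an internal DTD subset (the names root, a and b are hypothetical and not taken from the XHTML file above), so that it runs without any external DTD file. The same input is parsed once with load_dtd and once with dtd_validation:

```python
from lxml import etree

# Hypothetical sample: the internal DTD subset declares <root> and <a> only,
# so the <b> child makes the document well-formed but invalid.
DOC = b"""<!DOCTYPE root [
<!ELEMENT root (a)>
<!ELEMENT a (#PCDATA)>
]>
<root><b>text</b></root>"""

# load_dtd=True reads the DTD but performs no validation, so parsing succeeds.
loading_parser = etree.XMLParser(load_dtd=True)
loaded_root = etree.fromstring(DOC, loading_parser)

# dtd_validation=True validates while parsing; invalid input raises an error.
validating_parser = etree.XMLParser(dtd_validation=True)
try:
    etree.fromstring(DOC, validating_parser)
    validation_failed = False
except etree.XMLSyntaxError as exc:
    validation_failed = True
    print("rejected by DTD validation:", exc)
```

For a file on disk, the same validating parser can be passed to etree.parse() in place of the load_dtd one.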
Stefan