[lxml-dev] Xhtml and entities

Stefan Behnel stefan_ml at behnel.de
Sat May 26 12:54:56 CEST 2007



Eric Garin wrote:
> I got something a bit better because I've just declared a DOCTYPE in the XML instance.
> 
> My document is:
> 
> <!DOCTYPE html SYSTEM "dtd/xhtml1-strict.dtd">
> 
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
> <title>Untitled Document</title>
> </head>
> 
> <body>
> <p>This entity exists : &rsquo;</p>
> <p>This not exists : &rsquo1111;</p>
> </body>
> </html>
> 
> 
> The thing is that now it's ALWAYS valid, even if there is a non-declared XHTML entity in the XML.
> I use the parser to load the instance like this:
> 
>             parser = etree.XMLParser(load_dtd=True)
>             docXhtml = etree.parse(fileXhtml, parser)

As I said, please read

http://codespeak.net/lxml/dev/parsing.html#parser-options

where it says:

  load_dtd - load and parse the DTD while parsing (no validation is performed)
  dtd_validation - validate while parsing (if a DTD was referenced)
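
To illustrate the first option, here is a minimal Python sketch (not from the original thread). It writes a small stand-in DTD, the hypothetical file mini.dtd, next to a document that references it, then parses with load_dtd=True. The entity declared in the DTD is expanded, but the <em> element, which the DTD does not declare, is accepted without complaint because nothing is validated. The file names and DTD contents are assumptions made for the example.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for dtd/xhtml1-strict.dtd: it declares one entity
# and two elements, but NOT the <em> element used in the document below.
MINI_DTD = """<!ENTITY rsquo "&#8217;">
<!ELEMENT html (p*)>
<!ELEMENT p (#PCDATA)>
"""

DOCUMENT = """<!DOCTYPE html SYSTEM "mini.dtd">
<html><p>Right quote: &rsquo;</p><em>not declared in the DTD</em></html>
"""


def parse_loading_dtd(directory):
    """Write the DTD and the document into `directory`, parse with load_dtd=True."""
    dtd_path = os.path.join(directory, "mini.dtd")
    doc_path = os.path.join(directory, "doc.xml")
    with open(dtd_path, "w", encoding="utf-8") as f:
        f.write(MINI_DTD)
    with open(doc_path, "w", encoding="utf-8") as f:
        f.write(DOCUMENT)
    # Parsing from a file name lets the relative SYSTEM id be resolved
    # against the document's location.
    parser = etree.XMLParser(load_dtd=True)
    return etree.parse(doc_path, parser)


with tempfile.TemporaryDirectory() as tmp:
    tree = parse_loading_dtd(tmp)
    root = tree.getroot()
    print(repr(root[0].text))  # &rsquo; was expanded using the loaded DTD
    print(root[1].tag)         # <em> is accepted: loading is not validating
```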

lxml.etree does not load DTDs by default, and telling it to *load* the DTD
does not make it validate anything. If you want DTD validation in the
parser, you have to enable "dtd_validation".
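
A minimal sketch of the difference, assuming a small internal DTD subset in place of the real XHTML DTD so that no external file is needed. The helper parse_error is hypothetical; it only wraps etree.XMLParser and etree.fromstring and reports the XMLSyntaxError message, if any.

```python
from lxml import etree

# Internal DTD subset so the example is self-contained; the <q> element
# is not declared, so the document is well-formed but not valid.
INVALID_DOC = """<!DOCTYPE doc [
  <!ELEMENT doc (p*)>
  <!ELEMENT p (#PCDATA)>
]>
<doc><p>declared</p><q>not declared</q></doc>"""


def parse_error(xml_text, **parser_options):
    """Parse with the given XMLParser options; return the error text or None."""
    parser = etree.XMLParser(**parser_options)
    try:
        etree.fromstring(xml_text, parser)
    except etree.XMLSyntaxError as exc:
        return str(exc)
    return None


# load_dtd only loads the DTD, so the invalid document parses fine.
print(parse_error(INVALID_DOC, load_dtd=True))  # None
# dtd_validation checks the document against the DTD while parsing.
print(parse_error(INVALID_DOC, dtd_validation=True))
```

For the document in the original message, that would mean creating the parser as etree.XMLParser(dtd_validation=True).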

Stefan
