[lxml-dev] XMLSchema validate and entities

Stefan Behnel stefan_ml at behnel.de
Thu Jul 31 20:26:56 CEST 2008


Hi,

Eric Garin wrote:
> First of all, congratulations: I've been using lxml for more than a year and enjoy the huge progress (and work) you have done.

:) Happy to hear that.


> I'm using lxml to validate XML document instances with etree.XMLSchema(schema_doc).validate(xml_doc).
> I used to work with DTDs, where it's possible to include standard sets of HTML entity declarations (for example &nbsp; &eacute; etc ...).
> 
> Now, working with XML schemas, sometimes some of those common HTML entities appear in the content (coming from an editor like FCK).
> And at the validation time, of course, I have an error like this :
> 
>   File "lxml.etree.pyx", line 2520, in lxml.etree.parse
>   File "parser.pxi", line 1309, in lxml.etree._parseDocument
>   File "parser.pxi", line 1338, in lxml.etree._parseDocumentFromURL
>   File "parser.pxi", line 1248, in lxml.etree._parseDocFromFile
>   File "parser.pxi", line 828, in lxml.etree._BaseParser._parseDocFromFile
>   File "parser.pxi", line 452, in lxml.etree._ParserContext._handleParseResultDoc
>   File "parser.pxi", line 536, in lxml.etree._handleParseResult
>   File "parser.pxi", line 478, in lxml.etree._raiseParseError
> lxml.etree.XMLSyntaxError: Entity 'nbsp' not defined, line 21, column 16
> 
> 1. Is there a way to escape those entities at validation time ?

The stack trace above shows up at parse time. If you have entity references in
your XML document, you have to use a DTD at parse time that defines them, or
you can pass the "resolve_entities=False" option to the parser to keep them in
the tree (which might make tree handling a little harder, though).
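
Here is a minimal sketch of both options followed by schema validation. The
sample document, its internal DTD subset and the names XML_WITH_DTD and SCHEMA
are made up for illustration; your real documents would more likely reference
an external DTD that declares the HTML entity sets.

```python
from lxml import etree

# Hypothetical sample: the internal DTD subset declares the 'nbsp' entity.
XML_WITH_DTD = b"""<!DOCTYPE doc [
  <!ENTITY nbsp "&#160;">
]>
<doc>a&nbsp;b</doc>"""

# Without any declaration, parsing itself fails (this is the error above).
try:
    etree.fromstring(b"<doc>a&nbsp;b</doc>")
    undeclared_failed = False
except etree.XMLSyntaxError:
    undeclared_failed = True

# Option 1: the entity is declared, so the default parser resolves it.
root = etree.fromstring(XML_WITH_DTD)
resolved_text = root.text

# Option 2: keep entity references as nodes in the tree.
parser = etree.XMLParser(resolve_entities=False)
root_unresolved = etree.fromstring(XML_WITH_DTD, parser)
entity_names = [
    child.name for child in root_unresolved if child.tag is etree.Entity
]

# Schema validation runs on the already parsed tree (illustrative schema).
SCHEMA = etree.XMLSchema(etree.fromstring(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="doc" type="xs:string"/>
</xs:schema>"""))
is_valid = SCHEMA.validate(root)
```

With option 2 the reference shows up as a separate child node of <doc>, so
code that reads .text has to deal with the split content itself.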


> 2. Or do I need to declare entities in the schema? (I understand that this question is off-topic for lxml, but I didn't find a way to do that.)

XML Schema deliberately does not support entity declarations (or references,
for that matter). They are a pure DTD thing.

Stefan

