[lxml-dev] XMLSchema validate and entities
Stefan Behnel
stefan_ml at behnel.de
Thu Jul 31 20:26:56 CEST 2008
Hi,
Eric Garin wrote:
> First of all, congratulations: I've been using lxml for more than a year and appreciate the huge progress (and work) you have made.
:) Happy to hear that.
> I'm using lxml to validate XML document instances with etree.XMLSchema(schema_doc).validate(xml_doc).
> I used to work with DTDs, where it's possible to include standard sets of HTML entity declarations (for example for &nbsp;, &eacute;, etc.).
>
> Now, working with XML schemas, some of those common HTML entities sometimes appear in the content (coming from an editor like FCK).
> And at validation time, of course, I get an error like this:
>
> File "lxml.etree.pyx", line 2520, in lxml.etree.parse
> File "parser.pxi", line 1309, in lxml.etree._parseDocument
> File "parser.pxi", line 1338, in lxml.etree._parseDocumentFromURL
> File "parser.pxi", line 1248, in lxml.etree._parseDocFromFile
> File "parser.pxi", line 828, in lxml.etree._BaseParser._parseDocFromFile
> File "parser.pxi", line 452, in lxml.etree._ParserContext._handleParseResultDoc
> File "parser.pxi", line 536, in lxml.etree._handleParseResult
> File "parser.pxi", line 478, in lxml.etree._raiseParseError
> lxml.etree.XMLSyntaxError: Entity 'nbsp' not defined, line 21, column 16
>
> 1. Is there a way to escape those entities at validation time ?
The stack trace above shows up at parse time. If you have entity references in
your XML document, you have to use a DTD at parse time that defines them, or
you can pass the "resolve_entities=False" option to the parser to keep them in
the tree (which might make tree handling a little harder, though).
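A minimal sketch of the two options, assuming the entity is declared in an internal DTD subset (the sample document and its `nbsp` declaration are made up for illustration, they are not from the original report). The sketch declares the entity in both cases, so it does not show how "resolve_entities=False" treats a reference that has no declaration at all.

```python
from lxml import etree

# Hypothetical sample document: the internal DTD subset declares 'nbsp'.
DOC = b"""<!DOCTYPE root [
  <!ENTITY nbsp "&#160;">
]>
<root>a&nbsp;b</root>"""

# Without any declaration, the failure happens while parsing,
# before schema validation could even start.
try:
    etree.fromstring(b"<root>a&nbsp;b</root>")
    parse_failed = False
except etree.XMLSyntaxError:
    parse_failed = True

# Option 1: default parser. The reference is replaced by its declared value.
resolved = etree.fromstring(DOC)

# Option 2: keep the reference in the tree as an entity node.
parser = etree.XMLParser(resolve_entities=False)
unresolved = etree.fromstring(DOC, parser)
entity = unresolved[0]  # the entity node sits between the text "a" and the tail "b"
```

With option 2 the text is split around the entity node (`unresolved.text` holds the part before it, `entity.tail` the part after it), which is the extra tree handling mentioned above.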
> 2. Or do I need to declare entities in the schema? (I understand that this question is off topic for lxml, but I couldn't find a way to do that.)
XML Schema deliberately does not support entity declarations (or references,
for that matter). They are a pure DTD thing.
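Since the schema cannot carry the declarations, one way to combine both is to let a DTD handle the entities at parse time and then validate the parsed tree against the schema. The following is only a sketch under that assumption; the tiny schema and the document are hypothetical.

```python
from lxml import etree

# Hypothetical schema: a single 'root' element with string content.
schema = etree.XMLSchema(etree.fromstring(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="xs:string"/>
</xs:schema>"""))

# The internal DTD subset only declares the entity; it is unrelated to the schema.
doc = etree.fromstring(b"""<!DOCTYPE root [
  <!ENTITY nbsp "&#160;">
]>
<root>a&nbsp;b</root>""")

# By the time the schema sees the tree, the reference is already plain text.
is_valid = schema.validate(doc)
```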
Stefan