[lxml-dev] lxml lets undeclared entities pass through silently
Stefan Behnel
stefan_ml at behnel.de
Sun May 27 10:34:00 CEST 2007
Stefan Behnel wrote:
> Hi Eric,
>
> please reply also to the mailing list (reply all), not just to me. That way,
> you may also get comments by other people and the mails will be archived and
> others can search and read them.
>
> Eric Garin wrote:
>> Sorry Stefan but I've actually read this documentation (and even several times)
>
> Sorry if I sounded somewhat harsh, I do believe you.
>
>
>> So I did a test with something really simple :
> [parse XML containing an undeclared entity]
>> Result : parser says nothing even if &oneXXX; is not declared
>
> Not quite, it does say something, it just doesn't raise an exception.
>
> >>> from lxml import etree
> >>> parser = etree.XMLParser()
> >>> xml = etree.parse("entity.xml", parser)
>
> Ok, no exception here, so what happened?
>
> >>> print parser.error_log
> entity.xml:5:ERROR:PARSER:WAR_UNDECLARED_ENTITY: Entity 'oneXXX' not defined
>
> So, libxml2 did find the missing entity and reported the error to lxml. I
> looked into it and it seems that the parser continued parsing and returned a
> document containing the entity reference, saying that it was well formed.
> Therefore, lxml did not raise an exception.
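Here's a trivial patch that makes lxml raise an exception in this case, i.e.
when libxml2 reports the document as well-formed but the last error it
recorded is at level ERROR or above. I'm still not sure this is the right
solution, though.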
Stefan
Index: src/lxml/parser.pxi
===================================================================
--- src/lxml/parser.pxi (Revision 43690)
+++ src/lxml/parser.pxi (Arbeitskopie)
@@ -622,7 +622,8 @@
         ctxt.myDoc = NULL
     if result is not NULL:
-        if ctxt.wellFormed or recover:
+        if recover or (ctxt.wellFormed and \
+                ctxt.lastError.level < xmlerror.XML_ERR_ERROR):
             __GLOBAL_PARSER_CONTEXT.initDocDict(result)
         else:
             # free broken document
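
Without patching lxml, the same check can be approximated on the caller's side
by inspecting parser.error_log after parsing. The following is only a rough
sketch of that idea, not part of the patch: the names LoggedParseError,
StubEntry and raise_on_logged_errors are made up for illustration, and it
assumes each log entry exposes integer `level` and string `message`
attributes, with levels numbered as in libxml2's xmlErrorLevel enum (NONE=0,
WARNING=1, ERROR=2, FATAL=3).

```python
from collections import namedtuple

# Assumption: same numbering as libxml2's xmlErrorLevel enum.
XML_ERR_ERROR = 2


class LoggedParseError(Exception):
    """Hypothetical exception for errors that were logged but not raised."""


# Stand-in for a log entry, so the sketch runs without lxml installed.
StubEntry = namedtuple("StubEntry", ["level", "message"])


def raise_on_logged_errors(error_log, threshold=XML_ERR_ERROR):
    """Raise if any entry in error_log is at or above the given level.

    error_log may be any iterable of objects with `level` and `message`
    attributes, such as the stub entries above.
    """
    for entry in error_log:
        if entry.level >= threshold:
            raise LoggedParseError(entry.message)


if __name__ == "__main__":
    log = [StubEntry(XML_ERR_ERROR, "Entity 'oneXXX' not defined")]
    try:
        raise_on_logged_errors(log)
    except LoggedParseError as exc:
        print("rejected:", exc)
```

With lxml itself, the intended use would be to call
raise_on_logged_errors(parser.error_log) right after etree.parse("entity.xml",
parser). Unlike the patch, which only looks at ctxt.lastError, this scans the
whole log, so it is somewhat stricter.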