[lxml-dev] lxml lets undeclared entities pass through silently

Stefan Behnel stefan_ml at behnel.de
Sun May 27 10:34:00 CEST 2007



Stefan Behnel wrote:
> Hi Eric,
> 
> please reply to the mailing list as well (reply all), not just to me. That
> way, you may also get comments from other people, and the mails will be
> archived so others can search and read them.
> 
> Eric Garin wrote:
>> Sorry Stefan, but I've actually read this documentation (several times, even)
> 
> Sorry if I sounded somewhat harsh; I do believe you.
> 
> 
>> So I did a test with something really simple:
> [parse XML containing an undeclared entity]
>> Result: the parser says nothing even though &oneXXX; is not declared
> 
> Not quite. It does say something; it just doesn't raise an exception.
> 
>   >>> from lxml import etree
>   >>> parser = etree.XMLParser()
>   >>> xml = etree.parse("entity.xml", parser)
> 
> Ok, no exception here, so what happened?
> 
>   >>> print(parser.error_log)
>   entity.xml:5:ERROR:PARSER:WAR_UNDECLARED_ENTITY: Entity 'oneXXX' not defined
> 
> So, libxml2 did find the missing entity and reported the error to lxml. I
> looked into it, and it seems that the parser continued parsing and returned a
> document containing the entity reference while still flagging it as
> well-formed. Therefore, lxml did not raise an exception.

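Here's a trivial patch that raises an exception in this case: unless the
parser runs in recover mode, it also rejects a document whose last reported
error has level ERROR or higher, even if libxml2 considers it well-formed.
I'm still not sure this is the right solution, though.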

Stefan


Index: src/lxml/parser.pxi
===================================================================
--- src/lxml/parser.pxi (revision 43690)
+++ src/lxml/parser.pxi (working copy)
@@ -622,7 +622,8 @@
         ctxt.myDoc = NULL

     if result is not NULL:
-        if ctxt.wellFormed or recover:
+        if recover or (ctxt.wellFormed and \
+                       ctxt.lastError.level < xmlerror.XML_ERR_ERROR):
             __GLOBAL_PARSER_CONTEXT.initDocDict(result)
         else:
             # free broken document
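
To make the effect of the changed condition concrete, here is a small
pure-Python model of it. This is only a sketch under stated assumptions: the
function names are hypothetical (the real check is the code in
src/lxml/parser.pxi above), and the level constants are assumed to mirror
libxml2's xmlErrorLevel enum (XML_ERR_NONE = 0 up to XML_ERR_FATAL = 3).

```python
# Hypothetical model of the "keep or free the parsed document" check in
# src/lxml/parser.pxi, before and after the patch above.
# Assumption: these constants mirror libxml2's xmlErrorLevel enum.
XML_ERR_NONE = 0
XML_ERR_WARNING = 1
XML_ERR_ERROR = 2
XML_ERR_FATAL = 3


def accepts_before_patch(well_formed: bool, recover: bool,
                         last_error_level: int) -> bool:
    # Original condition: keep the document if libxml2 flagged it as
    # well-formed or recover mode is on. last_error_level is ignored and
    # only kept so both functions share one signature.
    return well_formed or recover


def accepts_after_patch(well_formed: bool, recover: bool,
                        last_error_level: int) -> bool:
    # Patched condition: outside recover mode, a well-formed document is
    # still rejected if the last recorded error is ERROR level or worse.
    return recover or (well_formed and last_error_level < XML_ERR_ERROR)


if __name__ == "__main__":
    # The case from this thread: libxml2 logs WAR_UNDECLARED_ENTITY at
    # ERROR level but leaves the document marked as well-formed.
    case = dict(well_formed=True, recover=False, last_error_level=XML_ERR_ERROR)
    print("before:", accepts_before_patch(**case))  # True: document kept, no exception
    print("after: ", accepts_after_patch(**case))   # False: document freed
```

Note that the patched condition inspects only ctxt.lastError, i.e. the most
recently recorded error, rather than the whole error log.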