[lxml-dev] parser target exception recovery bug?

D.Hendriks (Dennis) D.Hendriks at tue.nl
Wed Jun 17 09:47:14 CEST 2009


Hello,

Stefan Behnel wrote:
> D.Hendriks (Dennis) wrote:
>> Stefan Behnel wrote:
>>> D.Hendriks (Dennis) wrote:
>>>>   - Is the not calling 'close' a bug?
>>> I don't know. ElementTree doesn't specify the behaviour in the error
>>> case.
>>>
>>> http://effbot.org/elementtree/elementtree-xmlparser.htm
>>>
>>> In my tests, ET 1.3 didn't call the .close() method either. I may have
>>> to look into this a bit closer, but so far, I don't see an obligation
>>> to call it in the case of an error.
>>> I may have to look into this a bit closer, but so far, I don't see an
>>> obligation to call it in the case of an error.
>> The lxml 2.2 documentation (pdf) states: "You can reuse the parser and
>> its target as often as you like, so you should take care that the
>> .close() methods really resets the target to a usable state (also in the
>> case of an error!)." I assumed this meant that the close() method will
>> always be called, even in case of errors, to (re)set the parser target
>> into a usable state for the next parsing. Is this not true? If not, the
>> next parsing operation will most likely fail, since the state of the
>> parser target was not reset yet.
> 
> Hmm, yes, in that light, it does make sense to call .close() even after an
> error. However, the question is if existing code is prepared for such a
> call even if the parsing failed for some reason. Imagine one of the
> callbacks (like ".data()") raises an exception which needs to get
> propagated, and then we call ".close()", which also happens to raise an
> exception (maybe due to an inconsistent state). I assume that the latter
> should be ignored then, although it may really hide a bug (or even a
> resource leak) in the user code, so both exceptions are of interest.
> Dropping exceptions is a bad thing... We should at least write something
> to the parser log, I think.
> 
> This could be handled differently in Py3, though. We could raise the
> .close() exception there (instead of dropping it) and pass the original
> exception as its context (instead of raising it). You would get a
> different exception in Py3 in that case, but at least you wouldn't loose
> any information. This would effectively map to this Python code in Py3:
> 
>     try:
>         parse_to_target(parser_target, input)
>     except:
>         parser_target.close()
>         raise
>     else:
>         return parser_target.close()
> 
> and to this code in Py2:
> 
>     try:
>         parse_to_target(parser_target, input)
>     except:
>         try: parser_target.close()
>         except: pass
>         raise
>     else:
>         return parser_target.close()
> 
> Given the documented requirement that you quote above (which I likely
> wrote myself ;), it's actually a clear bug in user code if .close() fails
> to reset the target in an error case. So I think users will just have to
> fix their code. And since this only affects a case where things went wrong
> anyway, I doubt that there will be a huge impact in fixing this. You may
> get a different exception in some cases, but it makes things a lot safer
> to always call .close().
> 
> Stefan

Thanks for the reply. I agree that close() should always be called. In 
case of multiple exceptions, you could create an instance of yet another 
exception, containing the 'multiple exceptions'. This could for instance 
be called MultiParseException or whatever. This would work for both 
Python 2.x and Python 3.x...

Anyway, let me know what you decide and when it will be in SVN. Thanks!

Dennis



More information about the lxml-dev mailing list