[lxml-dev] parser target exception recovery bug?
Stefan Behnel
stefan_ml at behnel.de
Wed Jun 17 09:05:49 CEST 2009
D.Hendriks (Dennis) wrote:
> Stefan Behnel wrote:
>> D.Hendriks (Dennis) wrote:
>>> - Is the not calling 'close' a bug?
>>
>> I don't know. ElementTree doesn't specify the behaviour in the error
>> case.
>>
>> http://effbot.org/elementtree/elementtree-xmlparser.htm
>>
>> In my tests, ET 1.3 didn't call the .close() method either. I may have
>> to look into this a bit closer, but so far, I don't see an obligation
>> to call it in the case of an error.
>> I may have to look into this a bit closer, but so far, I don't see an
>> obligation to call it in the case of an error.
>
> The lxml 2.2 documentation (pdf) states: "You can reuse the parser and
> its target as often as you like, so you should take care that the
> .close() methods really resets the target to a usable state (also in the
> case of an error!)." I assumed this meant that the close() method will
> always be called, even in case of errors, to (re)set the parser target
> into a usable state for the next parsing. Is this not true? If not, the
> next parsing operation will most likely fail, since the state of the
> parser target was not reset yet.
Hmm, yes, in that light, it does make sense to call .close() even after an
error. However, the question is if existing code is prepared for such a
call even if the parsing failed for some reason. Imagine one of the
callbacks (like ".data()") raises an exception which needs to get
propagated, and then we call ".close()", which also happens to raise an
exception (maybe due to an inconsistent state). I assume that the latter
should be ignored then, although it may really hide a bug (or even a
resource leak) in the user code, so both exceptions are of interest.
Dropping exceptions is a bad thing... We should at least write something
to the parser log, I think.
This could be handled differently in Py3, though. We could raise the
.close() exception there (instead of dropping it) and pass the original
exception as its context (instead of raising it). You would get a
different exception in Py3 in that case, but at least you wouldn't loose
any information. This would effectively map to this Python code in Py3:
try:
parse_to_target(parser_target, input)
except:
parser_target.close()
raise
else:
return parser_target.close()
and to this code in Py2:
try:
parse_to_target(parser_target, input)
except:
try: parser_target.close()
except: pass
raise
else:
return parser_target.close()
Given the documented requirement that you quote above (which I likely
wrote myself ;), it's actually a clear bug in user code if .close() fails
to reset the target in an error case. So I think users will just have to
fix their code. And since this only affects a case where things went wrong
anyway, I doubt that there will be a huge impact in fixing this. You may
get a different exception in some cases, but it makes things a lot safer
to always call .close().
Stefan
More information about the lxml-dev
mailing list