[lxml-dev] parser target exception recovery bug?
D.Hendriks (Dennis)
D.Hendriks at tue.nl
Tue Jun 16 16:20:07 CEST 2009
Hello all,
Using lxml 2.2 with a custom parser target (tree builder), I've run into
a problem when the parser target raises an exception. In this case,
parsing continues, although only for 'data' (not for 'start' and 'end').
I used recover=False when creating the XMLParser.
Using the following code:
import sys
from lxml import etree

# Parser target without exceptions.
class MyTreeBuilder1(object):
    def close(self):
        print 'close'

    def start(self, tag, attrs):
        print 'start', tag, attrs

    def data(self, data):
        if len(data.strip()) > 0:
            print 'data: data=', repr(data)

    def end(self, tag):
        print 'end', tag

# Parser target with exceptions.
class MyTreeBuilder2(MyTreeBuilder1):
    def close(self):
        print 'close'

    def start(self, tag, attrs):
        print 'start', tag, attrs

    def data(self, data):
        if len(data.strip()) > 0:
            print 'data: data=', repr(data)

    def end(self, tag):
        print 'end', tag
        if tag == 'b':
            print 'ERROR'
            raise ValueError('error')

xml_data='''<a>
<b>test</b>
<d>test2</d>
<d>test2</d>
</a>'''
# Successful parsing.
print '---'
builder = MyTreeBuilder1()
parser = etree.XMLParser(target=builder, recover=False)
rslt = etree.fromstring(xml_data, parser)
# Unsuccessful parsing.
print '---'
builder = MyTreeBuilder2()
parser = etree.XMLParser(target=builder, recover=False)
rslt = etree.fromstring(xml_data, parser)
I get this output:
---
start a {}
start b {}
data: data= u'test'
end b
start d {}
data: data= u'test2'
end d
start d {}
data: data= u'test2'
end d
end a
close
---
start a {}
start b {}
data: data= u'test'
end b
ERROR
data: data= u'test2'
data: data= u'test2'
Traceback (most recent call last):
  File "lxml_parser_target_bug.py", line 49, in ?
    rslt = etree.fromstring(xml_data, parser)
  File "lxml.etree.pyx", line 2534, in lxml.etree.fromstring (src/lxml/lxml.etree.c:51135)
  File "parser.pxi", line 1523, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:76176)
  File "parser.pxi", line 1402, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:74927)
  File "parser.pxi", line 928, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:71707)
  File "parsertarget.pxi", line 135, in lxml.etree._TargetParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:82586)
  File "lxml.etree.pyx", line 230, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:6813)
  File "saxparser.pxi", line 227, in lxml.etree._handleSaxEnd (src/lxml/lxml.etree.c:78230)
  File "parsertarget.pxi", line 78, in lxml.etree._PythonSaxParserTarget._handleSaxEnd (src/lxml/lxml.etree.c:81918)
  File "lxml_parser_target_bug.py", line 33, in end
    raise ValueError('error')
ValueError: error
The first output (between --- and ---) is as expected, since it comes
from the parser target that raises no exception. The second output
(after the second ---) is not what I expected. You can see 'ERROR' at
the point where the exception is raised. After that, the parser target
still receives two 'data' events, but no 'start' or 'end' events, so
parsing clearly continued. Also, 'close' is never called. Only after
the entire input has been parsed is the exception finally re-raised.
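As a possible stopgap, the target itself can drop every event that
arrives after its first exception. The following is only a minimal
sketch: GuardedTarget and RecordingTarget are hypothetical names of
mine, not part of lxml, and the events are fed in by hand here, so the
sketch shows nothing about what lxml does internally. It contains no
print statements, so it should run unchanged on Python 2 and Python 3.

```python
class GuardedTarget(object):
    """Hypothetical wrapper: forwards events to a real parser target
    until that target raises, then silently drops all later events."""

    def __init__(self, target):
        self._target = target
        self.failed = False

    def _forward(self, name, *args):
        if self.failed:
            # An earlier callback raised; ignore everything afterwards.
            return None
        try:
            return getattr(self._target, name)(*args)
        except Exception:
            self.failed = True
            raise  # let the caller (the parser) see the original error

    def start(self, tag, attrs):
        return self._forward('start', tag, attrs)

    def end(self, tag):
        return self._forward('end', tag)

    def data(self, data):
        return self._forward('data', data)

    def close(self):
        return self._forward('close')


class RecordingTarget(object):
    """Hypothetical stand-in for MyTreeBuilder2 that records events."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrs):
        self.events.append(('start', tag))

    def end(self, tag):
        self.events.append(('end', tag))
        if tag == 'b':
            raise ValueError('error')

    def data(self, data):
        self.events.append(('data', data))

    def close(self):
        self.events.append(('close',))


# Feed the same event sequence by hand that the parser delivered above.
inner = RecordingTarget()
guarded = GuardedTarget(inner)
guarded.start('a', {})
guarded.start('b', {})
guarded.data('test')
try:
    guarded.end('b')       # the inner target raises here
except ValueError:
    pass
guarded.data('test2')      # dropped by the guard
guarded.data('test2')      # dropped by the guard
```

With lxml, the wrapper would presumably be passed as
etree.XMLParser(target=GuardedTarget(MyTreeBuilder2()), recover=False).
I have not verified that against lxml 2.2, and it only hides the late
'data' calls from the real target; it does not make 'close' get called.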
Two questions:
- Is the continued parsing (the 'data' function calls) a bug?
- Is it a bug that 'close' is never called?
Any replies would be greatly appreciated.
Dennis