[lxml-dev] Getting info from an XML file that has invalid character data in it (and how to specify recover option)

Stefan Behnel stefan_ml at behnel.de
Fri May 9 16:42:16 CEST 2008


Hi,

Ben wrote:
> Xml = etree.parse(XmlFileName)
> ##############################
> XmlFileName = r'c:/BEX03194.xml'
> parser = etree.XMLParser(recover=True)
> Xml   = etree.parse(StringIO(XmlFileName), parser)

Not sure if this is just a "find-a-short-example" error, but you are parsing
the file name here, not the file: StringIO(XmlFileName) hands the parser the
path string itself as the document. This should read

   Xml   = etree.parse(XmlFileName, parser)
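
To make the difference concrete, here is a minimal sketch. The temporary file, its contents and the variable names are made up for illustration and stand in for the real c:/BEX03194.xml. It assumes a reasonably recent lxml, where etree.parse() accepts either a path or a file-like object holding the document itself.

```python
import os
import tempfile
from io import BytesIO

from lxml import etree

# Hypothetical stand-in for the real file, written to a temp directory.
xml_bytes = b"<job><end_time>16:42</end_time></job>"
tmp_dir = tempfile.mkdtemp()
xml_file_name = os.path.join(tmp_dir, "example.xml")
with open(xml_file_name, "wb") as f:
    f.write(xml_bytes)

parser = etree.XMLParser(recover=True)

# Pass the path itself: lxml opens and reads the file.
tree_from_path = etree.parse(xml_file_name, parser)

# If a file-like object is needed, wrap the file's *contents*, not its name.
tree_from_bytes = etree.parse(BytesIO(xml_bytes), parser)

end_time_from_path = tree_from_path.findtext(".//end_time")
end_time_from_bytes = tree_from_bytes.findtext(".//end_time")
```

Both calls see the same document, so both lookups find the same text.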


> Also, I've tried the 'recover' parser option, but I'm doing something wrong,
> because I get this:
>
> C:\>python sb-lxml.py
> Traceback (most recent call last):
>   File "sb-lxml.py", line 9, in <module>
>     print Xml.findtext(".//end_time")
>   File "lxml.etree.pyx", line 1656, in lxml.etree._ElementTree.findtext
> (src/lxml/lxml.etree.c:15354)
>   File "lxml.etree.pyx", line 1489, in lxml.etree._ElementTree._assertHasRoot
> (src/lxml/lxml.etree.c:14116)
> AssertionError: ElementTree not initialized, missing root

I guess that happens when the parser "recover"s from not finding any XML at
all: it then returns an ElementTree that has no root element, and findtext()
fails on it. Maybe we should still raise an exception in this case instead of
returning an empty ElementTree. This is really an extreme case of broken data...
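
Until then, calling code can guard against the empty tree itself. The following is only a sketch: parse_root_or_none is a hypothetical helper, not part of lxml, and the except clause is there on the assumption that some lxml/libxml2 versions may raise XMLSyntaxError even in recover mode when no XML is found.

```python
from io import BytesIO

from lxml import etree


def parse_root_or_none(source, parser=None):
    """Hypothetical helper: return the root element, or None if no XML was found."""
    if parser is None:
        parser = etree.XMLParser(recover=True)
    try:
        tree = etree.parse(source, parser)
    except etree.XMLSyntaxError:
        # Assumption: the installed version raises instead of returning an empty tree.
        return None
    # getroot() gives None for a tree without a root element.
    return tree.getroot()


# Feeding the parser a path string as if it were the document (the original mistake).
root = parse_root_or_none(BytesIO(b"c:/BEX03194.xml"))
if root is None:
    message = "no XML found in input"
else:
    message = root.findtext(".//end_time")
```

Checking the root before calling findtext() avoids the "missing root" assertion from the traceback above.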

Stefan


