[lxml-dev] Parsing XML with undefined namespace
jholg at gmx.de
Thu Jul 3 12:59:33 CEST 2008
Hi,
> Is it possible to parse slightly broken XML like this?
>
> etree.parse("""<xml sanitizer="true" sanitizer:value="true"/>""")
>
> >>> lxml.etree.XMLSyntaxError: Namespace prefix sanitizer for value
> on xml is not defined, line 1, column 31
>
You can use a parser that is up to the task, i.e. one created with recover=True:
>>> parser = etree.XMLParser(recover=True)
>>> root = etree.fromstring("""<xml sanitizer="true" sanitizer:value="true"/>""", parser=parser)
>>> print root
<Element xml at 268930>
>>> print etree.tostring(root)
<xml sanitizer="true" value="true"/>
>>>
Please take a look at help(etree.XMLParser):
...
- recover - try hard to parse through broken XML
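For reference, a minimal sketch of the same approach as a standalone Python 3 script. The helper name parse_leniently is only illustrative and not part of lxml. How the attribute with the undeclared prefix ends up being serialized may depend on the lxml/libxml2 version in use, so the sketch prints what it finds rather than assuming a particular result.

```python
from lxml import etree

# The slightly broken document from the question: the "sanitizer" prefix
# is used on an attribute but never declared.
BROKEN = '<xml sanitizer="true" sanitizer:value="true"/>'


def parse_leniently(text):
    """Parse `text` with recover=True (hypothetical helper, not an lxml API).

    Returns the root element and the messages the parser logged while
    recovering.
    """
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(text, parser=parser)
    # The parser's error_log keeps the problems reported during the last run.
    messages = [entry.message for entry in parser.error_log]
    return root, messages


root, messages = parse_leniently(BROKEN)
print(root.tag)
print(dict(root.attrib))
for message in messages:
    print("recovered from:", message)
```

Looking at the logged messages is a cheap way to see what the parser had to work around, since recover=True otherwise hides the problem silently.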
Cheers,
Holger