[lxml-dev] Parsing XML with undefined namespace

jholg at gmx.de jholg at gmx.de
Thu Jul 3 12:59:33 CEST 2008


Hi,



>    Is it possible to parse slightly broken XML like this?
> 
>    etree.parse("""<xml sanitizer="true" sanitizer:value="true"/>""")
> 
>    >>> lxml.etree.XMLSyntaxError: Namespace prefix sanitizer for value on xml is not defined, line 1, column 31
> 

 You can use a parser that is up to the task:

>>> parser = etree.XMLParser(recover=True)
>>> root = etree.fromstring("""<xml sanitizer="true" sanitizer:value="true"/>""", parser=parser)
>>> print root
<Element xml at 268930>
>>> print etree.tostring(root)
<xml sanitizer="true" value="true"/>

Please take a look at help(etree.XMLParser):

 ... 

  - recover            - try hard to parse through broken XML  
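
For anyone who wants to try this as a script, here is a minimal sketch in Python 3 syntax. It assumes a reasonably recent lxml; the variable names (broken, strict_failed) are only for illustration. It contrasts the default parser with a recovering one and prints the parser's error_log, which records the problems the parser worked around. The name given to the prefixed attribute after recovery may differ between libxml2 versions, so the sketch prints the attributes rather than assuming a particular result.

```python
from lxml import etree

# The snippet from the question: the "sanitizer" prefix is never declared.
broken = '<xml sanitizer="true" sanitizer:value="true"/>'

# A default (strict) parser rejects the undeclared prefix.
try:
    etree.fromstring(broken)
    strict_failed = False
except etree.XMLSyntaxError as exc:
    strict_failed = True
    print("strict parser:", exc)

# A recovering parser returns a tree anyway.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(broken, parser=parser)
print(root.tag, dict(root.attrib))

# Problems the parser skipped over are recorded in its error log.
for entry in parser.error_log:
    print(entry.line, entry.column, entry.message)
```

Since recover=True silently repairs the input, checking error_log afterwards is probably a good habit if you need to know what was changed.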

 Cheers,

Holger 

