[lxml-dev] Problem with lxml library running on Windows

Stefan Behnel stefan_ml at behnel.de
Sat Oct 6 22:21:28 CEST 2007


Hi,

as expected, I cannot reproduce your problem on Linux.


Roberto Carrasco wrote:
> We have an issue with libxml library running on Windows.
> We are trying to read an XML document from a string over and over, but the
> program crashes in the while loop. We suspect the problem is that we cannot
> call the function etree.parse too many times when we are reading an XML
> document from a string.

lxml.etree actually optimises parsing from a StringIO object by handing its
content to the same internal implementation that fromstring() uses. So I can't
see how choosing one over the other would make a difference.
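
For reference, the two call styles can be compared side by side. This is only
a minimal sketch written for a current Python and lxml (on Python 2.5 you
would use StringIO.StringIO as in your script); the sample document and the
helper name parse_both_ways are made up for illustration.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample, much smaller than the document in the report.
SAMPLE = b'<doc id="1"><link href="1-0.img" rel="media"/></doc>'


def parse_both_ways(data):
    """Return the root element obtained via parse() and via fromstring()."""
    root_from_file = etree.parse(BytesIO(data)).getroot()
    root_from_string = etree.fromstring(data)
    return root_from_file, root_from_string


a, b = parse_both_ways(SAMPLE)
# Serialise both trees and check that they are identical.
same = etree.tostring(a) == etree.tostring(b)
print(same)
```

If both serialisations match, the choice between parse() and fromstring()
should not matter for the resulting tree.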


> We are trying to execute the piece of code shown below in this environment:
> 
>    - Windows XP Service Pack 2
>    - Python 2.5
>    - lxml 1.3.4 and 2.0 alpha 3

You are using the pre-built binaries from PyPI, right? I'm not currently sure
which version of libxml2 they use, but it should be 2.6.28 or later.
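
The versions in use can be read from lxml itself. A minimal sketch, assuming
the installed lxml exposes the version constants that current releases
provide; the helper name library_versions is only illustrative.

```python
from lxml import etree


def library_versions():
    """Collect the version tuples reported by lxml.etree."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
        "libxml2 (running)": etree.LIBXML_VERSION,
    }


for name, version in sorted(library_versions().items()):
    # Each version is a tuple of integers, e.g. (2, 6, 28).
    print("%s: %s" % (name, ".".join(str(part) for part in version)))
```

Including this output in a bug report makes it easier to tell which libxml2
build is involved.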


> The code crashes when the program reads an XML document repeatedly.
> The issue is on Windows, because in a Linux environment there is no problem
> executing it.
> 
> The question is: what are we doing wrong? Or is this a problem with the
> library running on Windows?
> 
> # -*- coding: UTF-8 -*-
> from lxml import etree
> from StringIO import StringIO
> 
> if __name__ == "__main__":
> 
>     document="""    <doc type="Ficha Visado" id="2">
>         <attribute name="Rut Afiliado" type="string">1-3</attribute>
>         <attribute name="Fecha Escaneo" type="date">2006-03-13 08:44:52</attribute>
>         <attribute name="Suc Promotor" type="string">SANTIAGO</attribute>
>         <attribute name="Apellido Paterno" type="string">PUENTE</attribute>
>         <attribute name="Fecha Visacion" type="string">13/03/2006</attribute>
>         <attribute name="Nombres Afiliado" type="string">Robertin</attribute>
>         <attribute name="Apellido Materno" type="string">MANZANO</attribute>
>         <doc type="DPS" id="1">
>             <attribute name="Fecha Escaneo" type="date">2006-03-10 15:52:29</attribute>
>             <link href="1-0.img" rel="media"/>
>             <link href="1-1.img" rel="media"/>
>         </doc>
>     </doc>"""
> 
>     j=0
>     while 1:
>         print j
>         j+=1
> 
>         #tree = etree.parse(StringIO(docRauco0))
>         tree = etree.fromstring(document)
>         images_url = tree.xpath('//link[@rel="media"][@href]')
>         image_url_name=images_url[0].attrib['href']

Just to mention it, you could simplify this to

         images_url_names = tree.xpath('//link[@rel="media"]/@href')
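
To illustrate the difference, here is a minimal sketch on a cut-down document
(the sample data and variable names are made up; it is written for a current
Python and lxml). Selecting /@href makes XPath return the attribute values
directly, so no per-element attribute lookup is needed.

```python
from lxml import etree

# Hypothetical sample with one link that should not match.
SAMPLE = """<doc type="DPS" id="1">
  <link href="1-0.img" rel="media"/>
  <link href="1-1.img" rel="media"/>
  <link href="ignored.img" rel="other"/>
</doc>"""

tree = etree.fromstring(SAMPLE)

# Original approach: select the elements, then read the attribute from each.
elements = tree.xpath('//link[@rel="media"][@href]')
names_via_elements = [element.get("href") for element in elements]

# Simplified approach: let XPath return the attribute values themselves.
names_via_xpath = [str(value) for value in tree.xpath('//link[@rel="media"]/@href')]

print(names_via_xpath)
```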


Regarding your problem: could you replace this line

         image_url_name=images_url[0].attrib['href']

with the following one and check whether it still crashes:

         image_url_name=images_url[0].get('href')
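
A bounded version of the loop makes that test easier to run and report on.
This is only a sketch under assumptions: the helper first_media_href, the
use_get switch, the iteration count and the shortened sample document are all
hypothetical, and the code is written for a current Python and lxml.

```python
from lxml import etree

# Hypothetical, shortened stand-in for the document in the report.
SAMPLE = '<doc><link href="1-0.img" rel="media"/></doc>'

ITERATIONS = 10000  # arbitrary bound instead of "while 1"


def first_media_href(data, use_get=True):
    """Parse the document and return the href of the first media link."""
    tree = etree.fromstring(data)
    links = tree.xpath('//link[@rel="media"][@href]')
    if use_get:
        return links[0].get("href")      # access via Element.get()
    return links[0].attrib["href"]       # access via the attrib mapping


results = set()
for _ in range(ITERATIONS):
    results.add(first_media_href(SAMPLE, use_get=True))

print(results)
```

Running it once with use_get=True and once with use_get=False would show
whether the crash depends on how the attribute is read.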


Apart from that, I would need some debugging information to understand what's
happening here. While there are differences between the behaviour of libxml2
under Linux and Windows, I don't currently see any that could cause the above
code to fail.

Stefan

