[lxml-dev] Problem with lxml library running on Windows

Agustín Villena agustin.villena at gmail.com
Mon Oct 8 23:42:13 CEST 2007


Hi!

I tested a simplified version of the code (attached to this post) on two
versions of Windows, with different results:

Python version:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit 
(Intel)] on win32

lxml version:
LIBXML_COMPILED_VERSION: (2, 6, 28)
LIBXML_VERSION: (2, 6, 28)
LIBXSLT_COMPILED_VERSION: (1, 1, 19)
LIBXSLT_VERSION: (1, 1, 19)
LXML_VERSION: (1, 3, 4, 0)
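
A listing like the one above can be collected with a few lines of Python. This
is only a minimal sketch, not necessarily how the values above were gathered:
the helper name lxml_versions is made up for illustration, and the syntax
targets a current Python rather than the 2.5 interpreter used in this report.
The five attribute names are the version constants that lxml.etree exposes.

```python
import sys


def lxml_versions():
    """Return the version tuples exposed by lxml.etree, keyed by attribute name.

    Returns an empty dict when lxml is not installed.
    """
    try:
        from lxml import etree
    except ImportError:
        return {}
    names = (
        "LIBXML_COMPILED_VERSION",   # libxml2 that lxml was built against
        "LIBXML_VERSION",            # libxml2 actually loaded at runtime
        "LIBXSLT_COMPILED_VERSION",  # libxslt that lxml was built against
        "LIBXSLT_VERSION",           # libxslt actually loaded at runtime
        "LXML_VERSION",              # lxml itself
    )
    return dict((name, getattr(etree, name)) for name in names)


if __name__ == "__main__":
    print(sys.version)
    for name, version in sorted(lxml_versions().items()):
        print("%s: %s" % (name, version))
```

Comparing the COMPILED and runtime tuples is worthwhile on Windows, where a
mismatch would mean the extension module is loading a different DLL than the
one it was built against.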

With the same versions of Python and lxml, the script:

- Does not crash on
  Microsoft Windows Vista Ultimate
  Version: 6.0.6000 build 6000

- Crashes after 137 iterations on
  Microsoft Windows XP Profesional
  Version: 5.1.2600 Service Pack 2 Build 2600

The generated error signature is:
AppName: python.exe
AppVer: 0.0.0.0
ModName: etree.pyd
ModVer: 0.0.0.0
Offset: 00010c90

Attached to this post is the error report generated for Microsoft after
the crash.

Cheers
   Agustin

Stefan Behnel wrote:
> Hi,
> 
> as expected, I cannot reproduce your problem on Linux.
> 
> 
> Roberto Carrasco wrote:
>> We have an issue with the libxml library running on Windows.
>> We are trying to read an XML document from a string over and over, but the
>> program crashes in the while loop. We suspect the problem is that we cannot
>> call the function etree.parse too many times when we are reading an XML
>> document from a string.
> 
> lxml.etree actually optimises parsing from a StringIO object into parsing via
> fromstring() - or rather its internal implementation. So I can't see how this
> would make a difference.
> 
> 
>> We are trying to execute the piece of code shown below in this environment:
>>
>>    - Windows XP Service Pack 2
>>    - Python 2.5
>>    - lxml 1.3.4 and 2.0 alpha 3
> 
> You are using the pre-built binaries from PyPI, right? I'm not currently sure
> which version of libxml2 they use, but it should be 2.6.28 or later.
> 
> 
>> The code crashes when the program reads an XML document repeatedly.
>> The issue is specific to Windows, because in a Linux environment there is
>> no problem executing it.
>>
>> The question is: what are we doing wrong? Or is this a problem with the
>> library running on Windows?
>>
>> # -*- coding: UTF-8 -*-
>> from lxml import etree
>> from StringIO import StringIO
>>
>> if __name__ == "__main__":
>>
>>     document="""    <doc type="Ficha Visado" id="2">
>>         <attribute name="Rut Afiliado" type="string">1-3</attribute>
>>         <attribute name="Fecha Escaneo" type="date">2006-03-13
>> 08:44:52</attribute>
>>         <attribute name="Suc Promotor" type="string">SANTIAGO</attribute>
>>         <attribute name="Apellido Paterno" type="string">PUENTE</attribute>
>>         <attribute name="Fecha Visacion"
>> type="string">13/03/2006</attribute>
>>         <attribute name="Nombres Afiliado"
>> type="string">Robertin</attribute>
>>         <attribute name="Apellido Materno" type="string">MANZANO</attribute>
>>         <doc type="DPS" id="1">
>>             <attribute name="Fecha Escaneo" type="date">2006-03-10
>> 15:52:29</attribute>
>>             <link href="1-0.img" rel="media"/>
>>             <link href="1-1.img" rel="media"/>
>>         </doc>
>>     </doc>"""
>>
>>     j=0
>>     while 1:
>>         print j
>>         j+=1
>>
>>         #tree = etree.parse(StringIO(docRauco0))
>>         tree = etree.fromstring(document)
>>         images_url = tree.xpath('//link[@rel="media"][@href]')
>>         image_url_name=images_url[0].attrib['href']
> 
> Just to mention it, you could simplify this to
> 
>          images_url_names = tree.xpath('//link[@rel="media"]/@href')
> 
> 
> Regarding your problem - instead of this line:
> 
>          image_url_name=images_url[0].attrib['href']
> 
> could you try this instead, to see if it still crashes:
> 
>          image_url_name=images_url[0].get('href')
> 
> 
> Apart from that, I would need some debugging information to understand what's
> happening here. While there are differences between the behaviour of libxml2
> under Linux and Windows, I don't currently see any that could cause the above
> code to fail.
> 
> Stefan
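
The variants suggested in the quoted reply can be compared side by side with a
bounded loop instead of "while 1". The following is a rough sketch under some
assumptions: the function names and the shortened DOCUMENT are made up for
illustration, the syntax targets a current Python rather than 2.5, and the
file-object path uses BytesIO in place of the StringIO module from the
original script. It only shows how to exercise each access path; it does not
explain the crash.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # lxml is not installed, so nothing can be exercised
    etree = None

# Shortened stand-in for the document used in the original script.
DOCUMENT = """<doc type="Ficha Visado" id="2">
    <attribute name="Rut Afiliado" type="string">1-3</attribute>
    <doc type="DPS" id="1">
        <link href="1-0.img" rel="media"/>
        <link href="1-1.img" rel="media"/>
    </doc>
</doc>"""


def href_via_attrib(tree):
    # Access path from the original script: the .attrib mapping.
    return tree.xpath('//link[@rel="media"][@href]')[0].attrib['href']


def href_via_get(tree):
    # Alternative from the reply: read the attribute with Element.get().
    return tree.xpath('//link[@rel="media"][@href]')[0].get('href')


def href_via_xpath(tree):
    # Simplified query from the reply: select the attribute values directly.
    return tree.xpath('//link[@rel="media"]/@href')[0]


def run(variant, iterations=1000, use_file_object=False):
    """Parse DOCUMENT `iterations` times and return the last href found."""
    data = DOCUMENT.encode("utf-8")
    result = None
    for _ in range(iterations):
        if use_file_object:
            # Parse from a file-like object, as etree.parse(StringIO(...)) did.
            tree = etree.parse(BytesIO(data)).getroot()
        else:
            tree = etree.fromstring(data)
        result = variant(tree)
    return result


if __name__ == "__main__":
    if etree is None:
        print("lxml is not installed")
    else:
        for variant in (href_via_attrib, href_via_get, href_via_xpath):
            print(variant.__name__, run(variant),
                  run(variant, use_file_object=True))
```

On an affected machine the interpreter would presumably die inside one of the
calls rather than raise an exception, so the last variant name printed
indicates which access path was running before the crash.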

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2551_appcompat.txt
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20071008/225338ee/attachment.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lxml_crash_windows.py
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20071008/225338ee/attachment.diff 

