[lxml-dev] Problem with lxml library running on Windows
Agustín Villena
agustin.villena at gmail.com
Mon Oct 8 23:42:13 CEST 2007
Hi!
I tested a simplified version of the code (attached to this post) on two
versions of Windows, with different results:
Python version:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
lxml version:
LIBXML_COMPILED_VERSION: (2, 6, 28)
LIBXML_VERSION: (2, 6, 28)
LIBXSLT_COMPILED_VERSION: (1, 1, 19)
LIBXSLT_VERSION: (1, 1, 19)
LXML_VERSION: (1, 3, 4, 0)
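For anyone comparing their setup with the one above, these version tuples can be read from lxml.etree itself. This is only a minimal sketch, and it assumes a current Python 3 interpreter rather than the Python 2.5 used in this report; the `versions` dict name is purely illustrative.

```python
from lxml import etree

# Collect the version tuples that lxml.etree exposes as module attributes.
versions = {
    "LXML_VERSION": etree.LXML_VERSION,
    "LIBXML_VERSION": etree.LIBXML_VERSION,
    "LIBXML_COMPILED_VERSION": etree.LIBXML_COMPILED_VERSION,
    "LIBXSLT_VERSION": etree.LIBXSLT_VERSION,
    "LIBXSLT_COMPILED_VERSION": etree.LIBXSLT_COMPILED_VERSION,
}

# Print one "NAME: (x, y, z)" line per library.
for name in sorted(versions):
    print("%s: %s" % (name, versions[name]))
```

If the COMPILED and runtime tuples differ, the binary was built against one library release and is running against another, which is worth mentioning in a crash report.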
With the same versions of Python and lxml, the script:
- Does not crash on
Microsoft Windows Vista Ultimate
Version: 6.0.6000 build 6000
- Crashes after 137 iterations on
Microsoft Windows XP Professional
Version: 5.1.2600 Service Pack 2 Build 2600
The generated error signature is:
AppName: python.exe
AppVer: 0.0.0.0
ModName: etree.pyd
ModVer: 0.0.0.0
Offset: 00010c90
Attached to this post is the error report generated for Microsoft after
the crash.
Cheers
Agustin
Stefan Behnel wrote:
> Hi,
>
> as expected, I cannot reproduce your problem on Linux.
>
>
> Roberto Carrasco wrote:
>> We have an issue with the libxml library running on Windows.
>> We are trying to read an XML document from a string over and over, but the
>> program crashes in the while loop. We suspect the problem is that we cannot
>> call the function etree.parse too many times when we are reading an XML
>> document from a string.
>
> lxml.etree actually optimises parsing from a StringIO object into parsing via
> fromstring() - or rather its internal implementation. So I can't see how this
> would make a difference.
>
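To illustrate the point about the two parsing paths, here is a small sketch that parses the same bytes once through fromstring() and once through parse() on a file-like object, then compares the serialised results. It assumes Python 3 (hence io.BytesIO instead of the StringIO module), and the short XML snippet is a cut-down stand-in for the document in the report, not the original.

```python
from io import BytesIO

from lxml import etree

# Cut-down stand-in for the document in the report (illustrative only).
xml = b'<doc id="2"><link href="1-0.img" rel="media"/></doc>'

# fromstring() returns the root Element directly.
root_from_string = etree.fromstring(xml)

# parse() returns an ElementTree; getroot() gives the comparable Element.
root_from_file = etree.parse(BytesIO(xml)).getroot()

# Both paths should produce the same serialised tree.
same = etree.tostring(root_from_string) == etree.tostring(root_from_file)
print(same)
```

Note the one visible difference: parse() hands back an ElementTree and fromstring() an Element, although both offer an xpath() method.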
>
>> We are trying to execute the piece of code shown below in this environment:
>>
>> - Windows XP Service Pack 2
>> - Python 2.5
>> - lxml 1.3.4 and 2.0 alpha 3
>
> You are using the pre-built binaries from PyPI, right? I'm not currently sure
> which version of libxml2 they use, but it should be 2.6.28 or later.
>
>
>> The code crashes when the program reads an XML document repeatedly.
>> The issue is on Windows, because in a Linux environment there is no problem
>> executing it.
>>
>> The question is: what are we doing wrong? Or is this a problem with the
>> library running on Windows?
>>
>> # -*- coding: UTF-8 -*-
>> from lxml import etree
>> from StringIO import StringIO
>>
>> if __name__ == "__main__":
>>
>>     document = """ <doc type="Ficha Visado" id="2">
>>       <attribute name="Rut Afiliado" type="string">1-3</attribute>
>>       <attribute name="Fecha Escaneo" type="date">2006-03-13 08:44:52</attribute>
>>       <attribute name="Suc Promotor" type="string">SANTIAGO</attribute>
>>       <attribute name="Apellido Paterno" type="string">PUENTE</attribute>
>>       <attribute name="Fecha Visacion" type="string">13/03/2006</attribute>
>>       <attribute name="Nombres Afiliado" type="string">Robertin</attribute>
>>       <attribute name="Apellido Materno" type="string">MANZANO</attribute>
>>       <doc type="DPS" id="1">
>>         <attribute name="Fecha Escaneo" type="date">2006-03-10 15:52:29</attribute>
>>         <link href="1-0.img" rel="media"/>
>>         <link href="1-1.img" rel="media"/>
>>       </doc>
>>     </doc>"""
>>
>>     j = 0
>>     while 1:
>>         print j
>>         j += 1
>>
>>         #tree = etree.parse(StringIO(docRauco0))
>>         tree = etree.fromstring(document)
>>         images_url = tree.xpath('//link[@rel="media"][@href]')
>>         image_url_name = images_url[0].attrib['href']
>
> Just to mention it, you could simplify this to
>
> images_url_names = tree.xpath('//link[@rel="media"]/@href')
>
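As a quick check of the simplification suggested above, this sketch compares the two XPath forms on a small document. It assumes Python 3 and a recent lxml; the extra `rel="style"` link is a hypothetical element added only to show that the predicate still filters.

```python
from lxml import etree

doc = etree.fromstring(
    '<doc>'
    '<link href="1-0.img" rel="media"/>'
    '<link href="1-1.img" rel="media"/>'
    '<link href="other.css" rel="style"/>'  # hypothetical non-media link
    '</doc>'
)

# Original form: select the elements, then read the attribute from each one.
elements = doc.xpath('//link[@rel="media"][@href]')
hrefs_via_elements = [el.get("href") for el in elements]

# Simplified form: let XPath return the attribute values directly.
hrefs_direct = doc.xpath('//link[@rel="media"]/@href')

print(hrefs_via_elements)
print(list(hrefs_direct))
```

The simplified form returns a list of string values, so no element objects need to be touched after the query.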
>
> Regarding your problem - instead of this line:
>
> image_url_name=images_url[0].attrib['href']
>
> could you try this instead, to see if it still crashes:
>
> image_url_name=images_url[0].get('href')
>
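For context on the two access styles being compared here, a minimal sketch assuming Python 3 and a recent lxml. Both return the same value when the attribute exists and differ only when it is missing; the `title` attribute is a hypothetical name that is deliberately absent.

```python
from lxml import etree

link = etree.fromstring('<link href="1-0.img" rel="media"/>')

# Dictionary-style access through the attrib mapping.
via_attrib = link.attrib["href"]

# Direct accessor on the element.
via_get = link.get("href")

# For a missing attribute, get() returns None ...
missing = link.get("title")

# ... while attrib[...] raises KeyError.
try:
    link.attrib["title"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(via_attrib, via_get, missing, raised_key_error)
```

If the crash disappears with get(), that would point at the attrib proxy object rather than at parsing or XPath evaluation.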
>
> Apart from that, I would need some debugging information to understand what's
> happening here. While there are differences between the behaviour of libxml2
> under Linux and Windows, I don't currently see any that could cause the above
> code to fail.
>
> Stefan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2551_appcompat.txt
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20071008/225338ee/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lxml_crash_windows.py
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20071008/225338ee/attachment.diff