[lxml-dev] Huge memory leak in latest 2.0

Stefan Behnel stefan_ml at behnel.de
Wed Dec 19 11:57:37 CET 2007


Stefan Behnel wrote:
> Artur Siekielski wrote:
>> I'm using the latest 2.0 version from trunk, rev. 49494 (because it supports 
>> the 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in a 
>> loop, 100-200 kB each. I have noticed that the memory used by my program 
>> increases by about 1 MB after each document processed, so after a few 
>> hundred passes the system is about to hang. Running the same code with 
>> lxml 1.3.6 doesn't cause such an increase in memory usage.
>>
>> I'm using the following library calls:
>> tree = etree.parse( <opened file>, HTMLParser(encoding=...))
>> etree.tostring(tree)
>> el.xpath(...)
>> getting children and attributes of elements
>>
>> I'm using libxml2 version 2.6.28.
>>
>> If anyone knows of a solution or workaround, please write.
> 
> Hmmm, weird. The problem doesn't result from any change in lxml, just from the
> switch to Cython 0.9.6.8+. And I don't even see any obvious problem in the
> generated code.

I fixed the problem in Cython (and Pyrex). It should work with the next
release. I attached the patch that I used in case you want to build lxml
yourself using Cython.
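
For anyone who wants to check whether their own build is affected, here is a
rough sketch of how the per-document growth could be measured. It is not the
code from the original report: growth_per_pass and process_html are
hypothetical helper names, the encoding and XPath expression are left as
parameters because the report does not give them, and it assumes a modern
Python with tracemalloc. Note that tracemalloc only sees allocations made
through Python's own allocator, so memory held inside libxml2 itself would
not show up and would need to be watched at the process level instead.

```python
import gc
import tracemalloc


def growth_per_pass(process, documents):
    """Return the change in traced memory (bytes) after each process(doc) call."""
    gc.collect()
    tracemalloc.start()
    deltas = []
    try:
        previous, _ = tracemalloc.get_traced_memory()
        for doc in documents:
            process(doc)
            # Collect garbage first so only memory that is still held is counted.
            gc.collect()
            current, _ = tracemalloc.get_traced_memory()
            deltas.append(current - previous)
            previous = current
    finally:
        tracemalloc.stop()
    return deltas


def process_html(data, encoding, xpath_expr):
    """Hypothetical stand-in for the calls listed in the report (needs lxml)."""
    from io import BytesIO
    from lxml import etree

    parser = etree.HTMLParser(encoding=encoding)
    tree = etree.parse(BytesIO(data), parser)
    etree.tostring(tree)
    for el in tree.xpath(xpath_expr):
        dict(el.attrib)  # read the attributes
        list(el)         # read the children
```

Calling growth_per_pass(lambda d: process_html(d, encoding, xpath_expr), docs)
with the same document repeated many times should give deltas that settle
near zero on a healthy build. Deltas that stay clearly positive on every pass
would point to objects being kept alive.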

Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kw-only-fixes.patch
Type: text/x-patch
Size: 1651 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20071219/2c5f77bc/attachment.bin 

