[lxml-dev] Huge memory leak in latest 2.0

Artur Siekielski artur.siekielski at gmail.com
Fri Dec 7 01:56:28 CET 2007


Hi.

I'm using the latest 2.0 version from trunk, rev. 49494 (because it supports 
the 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in a 
loop, 100-200 kB each. I have noticed that the memory used by my program 
increases by about 1 MB after each document processed, so after a few 
hundred passes the system is about to hang. Running the same code with 
lxml 1.3.6 doesn't cause such an increase in memory usage.

I'm using the following library calls:
tree = etree.parse( <opened file>, HTMLParser(encoding=...))
etree.tostring(tree)
el.xpath(...)
getting children and attributes of elements
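
A minimal sketch of how the growth described above could be measured per
document. It is not the original program: the helper names (peak_rss,
measure_growth, process_document), the "//a" XPath expression and the
"utf-8" default encoding are placeholders chosen for illustration. It
assumes a Unix-like system, since it relies on the standard resource module,
and it records peak resident memory, which can only grow or stay flat.

```python
import resource


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent (kilobytes on Linux, bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_growth(process_one, items):
    """Call process_one(item) for each item; record peak RSS after each call."""
    samples = []
    for item in items:
        process_one(item)
        samples.append(peak_rss())
    return samples


def process_document(path, encoding="utf-8"):
    """Repeat the calls listed above on one HTML file (placeholder details)."""
    # Imported here so the measuring helpers above work without lxml.
    from lxml import etree

    with open(path, "rb") as f:
        tree = etree.parse(f, etree.HTMLParser(encoding=encoding))
    etree.tostring(tree)
    for el in tree.xpath("//a"):   # placeholder XPath expression
        list(el)                   # children of the element
        dict(el.attrib)            # attributes of the element


if __name__ == "__main__":
    import sys

    paths = sys.argv[1:]
    for path, rss in zip(paths, measure_growth(process_document, paths)):
        print(path, rss)
```

Run with a list of HTML files as arguments under both lxml versions. A peak
that keeps climbing by a similar amount per file under one version and
levels off under the other would match the behaviour reported here.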

I'm using libxml2 version 2.6.28.

If anyone knows of a solution or workaround, please let me know.

Regards,
Artur


