[lxml-dev] Cleaner instances don't get garbage collected
Alejandro Valdez
alejandro.valdez at gmail.com
Sat Jan 24 11:42:21 CET 2009
Hello list, I'm new to lxml and I'm really stuck with this problem:
after my program has been running for a while, it stops with a
MemoryError exception. While the program is running I can see
Python using more and more memory until it runs out of memory.
I used objgraph (great tool) and found a lot of Cleaner,
_ListErrorLog and XMLSyntaxError instances that are never
collected by the garbage collector, even after an explicit
gc.collect(). There are nearly as many live Cleaner instances as
the program has created, which I think means they are never freed.
My program is a kind of daemon that processes a lot of HTML documents.
For each document it creates a Cleaner instance, cleans the document,
and then deletes the Cleaner instance.
Here is a snippet of the function where I use Cleaner:
    def cleanHtml(self, html):
        from lxml.html.clean import Cleaner
        cleaner = Cleaner(page_structure=False, style=True)
        cleaned = cleaner.clean_html(html)
        # explicitly release the local reference to the Cleaner
        cleaner = None
        del cleaner
        return cleaned
I'm using Python 2.5.2 and lxml 2.2-beta1. Any ideas?