[lxml-dev] Cleaner instances don't get garbage collected

Alejandro Valdez alejandro.valdez at gmail.com
Sat Jan 24 11:42:21 CET 2009


Hello list, I'm new to lxml and I'm really stuck with this problem:
after my program has been running for a while, it stops with a
MemoryError exception. While the program is running I can see that
Python uses more and more memory until it runs out.

I used objgraph (great tool) and found that there are a lot of
Cleaner, _ListErrorLog and XMLSyntaxError instances that aren't
collected by the garbage collector, even if I call gc.collect(). There
are nearly as many Cleaner instances alive as the program has created,
which I think means they are never deleted.
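
For reference, a count like the one described above can be approximated
with the standard gc module alone. This is a minimal sketch, not
objgraph's own API: the helper name count_live_instances is made up for
illustration, and it only sees objects that the garbage collector
tracks, so some extension types may not show up in it.

```python
import gc
from collections import Counter


def count_live_instances(type_names):
    """Count gc-tracked live objects whose type name is in type_names.

    Hypothetical helper for illustration; not part of objgraph or lxml.
    """
    gc.collect()  # drop collectable cycles first, so only survivors are counted
    wanted = set(type_names)
    counts = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        if name in wanted:
            counts[name] += 1
    return dict(counts)
```

Calling count_live_instances(["Cleaner", "_ListErrorLog", "XMLSyntaxError"])
before and after processing a batch of documents would show whether the
numbers keep growing.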

My program is a kind of daemon that processes a lot of HTML documents:
for each document it creates a Cleaner instance, cleans the document,
and then deletes the Cleaner instance.

Here is a snippet of the function where I use Cleaner:

    def cleanHtml(self, html):
        from lxml.html.clean import Cleaner
        # A new Cleaner is created for every document
        cleaner = Cleaner(page_structure=False, style=True)
        cleanHtml = cleaner.clean_html(html)
        # Drop the local reference to the Cleaner
        cleaner = None
        del cleaner
        return cleanHtml

I'm using Python 2.5.2 and lxml 2.2-beta1, any ideas?
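
A general note on the del in the snippet: in CPython, rebinding a name
to None or deleting it only removes that one reference, and the object
is freed only once nothing else refers to it. The sketch below shows one
way to check this with a weak reference. Resource and freed_after_del
are hypothetical names used for illustration; Resource stands in for any
per-document helper object and is not lxml's Cleaner.

```python
import gc
import weakref


class Resource(object):
    """Hypothetical stand-in for a per-document helper object."""


def freed_after_del(keep_extra_reference):
    """Return True if the object is gone once its local name is deleted."""
    obj = Resource()
    probe = weakref.ref(obj)  # a weak reference does not keep obj alive
    # Simulate some other part of the program holding on to the object.
    holder = [obj] if keep_extra_reference else []
    del obj        # removes only the local name, not necessarily the object
    gc.collect()   # also give the cycle collector a chance to run
    return probe() is None
```

With no other reference the weak reference goes dead right after del;
with the extra reference in place it stays alive. Surviving instances
therefore usually point to something else still holding a reference to
them.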

