[lxml-dev] elements in gc.garbage - ok ?

Stefan Behnel stefan_ml at behnel.de
Sat Jan 26 12:48:46 CET 2008


Hi,

Jeroen van Hilst wrote:
> While using lxml (which is a great tool!), I am experiencing some memory
> issues.
> 
> I have made a small piece of code that makes elements go into gc.garbage.

Which just shows that the GC works for them.


> I suspect this is the reason my program is consuming lots of
> memory.

In-memory trees can be larger than you'd expect, quite a multiple of the
serialised XML size. But as you suggest, the real problem usually tends to be
elsewhere.


> Can someone tell me if it is ok that this happens - or what the reason is ?
> 
> #======================
> from lxml import etree
> import gc
> 
> gc.set_debug(gc.DEBUG_LEAK)
> 
> for x in range(1,19):
>     r = etree.Element('div')
>     s = str(x)
>     #if the attr is not touched there are no messages from gc
>     r.attrib[s] = s
> 
> gc.collect()
> if gc.garbage: print(gc.garbage)
> #======================

Using "el.attrib" will create a cyclic reference between the Element proxy and
its attribute proxy. It therefore requires a GC run to free the two, which
usually doesn't pose any problems. In the case above, you use DEBUG_LEAK,
which implies DEBUG_SAVEALL:

"""
When set, all unreachable objects found will be appended to garbage rather
than being freed.
"""

So that's why they are not freed in your test case, but that has nothing to do
with lxml. They would have been freed just fine without the DEBUG_SAVEALL.
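
To see the effect of DEBUG_SAVEALL without involving lxml at all, here is a
minimal sketch using two plain Python classes. Node and AttribView are
hypothetical stand-ins, not lxml classes; they only mimic the kind of
Element/attribute proxy cycle described above. The sketch sets DEBUG_SAVEALL
alone rather than DEBUG_LEAK, simply to avoid the debug output on stderr.

```python
import gc


class AttribView:
    """Hypothetical stand-in for an attribute proxy (not an lxml class)."""

    def __init__(self, owner):
        self.owner = owner  # reference back to the owning object


class Node:
    """Hypothetical stand-in for an Element proxy (not an lxml class)."""

    def __init__(self):
        self.attrib = AttribView(self)  # Node -> AttribView -> Node cycle


def count_saved_nodes(flags):
    """Create cyclic objects under the given gc debug flags, run a
    collection, and return how many Node instances ended up in gc.garbage."""
    gc.collect()  # start from a clean state
    old_flags = gc.get_debug()
    gc.set_debug(flags)
    try:
        for _ in range(5):
            Node()  # unreachable right away, but only the GC can free it
        gc.collect()
        return sum(1 for obj in gc.garbage if isinstance(obj, Node))
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]  # drop the saved objects again
        gc.collect()       # now they are really freed


print(count_saved_nodes(gc.DEBUG_SAVEALL))  # prints 5
print(count_saved_nodes(0))                 # prints 0
```

With the flag set, every unreachable Node is parked in gc.garbage; without
it, the same objects are collected and the list stays empty.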

Take care that you do not accidentally keep a reference to the Element
instances or the .attrib instances in your program, and the GC should do the
rest. If you want to make sure the Python objects are freed as early as
possible, avoid using .attrib and use .get()/.set()/.items()/... instead.
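
A rough sketch of what that looks like in practice. The helper name build_div
and the attribute names are made up for illustration. The import falls back to
the standard library's ElementTree, which offers the same
.get()/.set()/.items() methods, so the snippet also runs where lxml is not
installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: ElementTree exposes the same
    # get()/set()/items() methods on its Element objects.
    import xml.etree.ElementTree as etree


def build_div(number):
    """Create a <div> carrying one attribute without touching .attrib.
    (Hypothetical helper, for illustration only.)"""
    el = etree.Element('div')
    el.set('n%d' % number, str(number))  # instead of el.attrib[key] = value
    return el


el = build_div(7)
value = el.get('n7')                    # instead of el.attrib['n7']
missing = el.get('absent', 'fallback')  # default for a missing attribute
pairs = sorted(el.items())              # instead of el.attrib.items()
```

None of these calls asks the Element for its .attrib object, so no
attribute proxy is created along the way.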

Stefan

