[lxml-dev] One-time memory leak?
Marius Gedminas
marius at pov.lt
Thu Feb 14 20:53:03 CET 2008
Hi!
I've been using libxml2 since before lxml was even created, and I've built
some infrastructure for catching libxml2 memory leaks in my unit tests.
Recently I've started using lxml on a completely different project and
noticed that my old leak watcher was hooked up -- because it reported a
leak.
This is most likely a false positive (the "leak" happens only once during the
program's lifetime), but I'd like to understand what exactly happens. I'm
attaching a short test program that produces this output on my machine:
    $ bin/python lxml-memleak.py
    test_libxml2_html: leaked 0 bytes
    test_libxml2_xml: leaked 0 bytes
    test_lxml_html: leaked 9423 bytes
    test_lxml_xml: leaked 9479 bytes
This is in a virtualenv sandbox with lxml 2.0.1 from cheeseshop and
system-wide libxml2 2.6.30 (plus a security patch or two) from Ubuntu
Gutsy. Each of those tests was run in a separate Python process to
avoid contamination.
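The per-process isolation described above could be sketched roughly as follows. This is an illustration, not the attached script: `run_isolated` is a hypothetical helper, and the snippets passed to it only demonstrate that state set in one child process is invisible to the next.

```python
import subprocess
import sys


def run_isolated(code):
    """Run `code` in a fresh Python interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Each child starts with freshly loaded modules, so anything one run
# initializes (or leaks) cannot contaminate the measurement of the next.
first = run_isolated("import sys; sys.flag = 1; print(hasattr(sys, 'flag'))")
second = run_isolated("import sys; print(hasattr(sys, 'flag'))")
print(first, second)
```

In a real harness the child would run a single named test and print its leak figure, which the parent would then collect.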
Note that if I run the same test more than once, I see no new leaks:
    $ bin/python lxml-memleak.py test_lxml_html 3
    test_lxml_html: leaked 9423 bytes
    test_lxml_html: leaked 0 bytes
    test_lxml_html: leaked 0 bytes
which leads me to think this "leak" is in fact harmless on-demand
initialization of some sort. I would like to improve my leak detector
to avoid false positives (it already does a funny dance with
initParser/cleanupParser to do so).
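One possible way to filter out such one-time allocations is to run the code under test once as a warm-up and measure only the later runs. The sketch below is an assumption about how such a detector might look, not the attached script: `measure_leaks`, `get_usage` and `FakeAllocator` are hypothetical names, and the fake allocator merely stands in for a real memory counter (for libxml2, something like the value reported by its debug memory mode).

```python
def measure_leaks(func, get_usage, runs=3, warmup=1):
    """Return the per-run memory growth in bytes, skipping `warmup` runs."""
    for _ in range(warmup):
        func()  # let one-time, on-demand initialization happen here
    deltas = []
    for _ in range(runs):
        before = get_usage()
        func()
        deltas.append(get_usage() - before)
    return deltas


class FakeAllocator:
    """Hypothetical stand-in for a real memory counter."""

    def __init__(self):
        self.in_use = 0
        self.initialized = False

    def parse(self):
        if not self.initialized:
            self.in_use += 9423  # one-time setup that is never freed
            self.initialized = True
        self.in_use += 100  # per-call allocation...
        self.in_use -= 100  # ...which is released again

    def usage(self):
        return self.in_use


alloc = FakeAllocator()
no_warmup = measure_leaks(alloc.parse, alloc.usage, runs=3, warmup=0)

alloc = FakeAllocator()
with_warmup = measure_leaks(alloc.parse, alloc.usage, runs=3, warmup=1)

print(no_warmup)
print(with_warmup)
```

The trade-off is that a warm-up run also hides a genuine leak that happens only on first use, so this only helps once the first-run allocation is understood to be harmless.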
I've tried looking at the lxml source code but gave up in about 30
seconds. I don't know Cython, and I can't tell which files are generated
code and which are the sources they were generated from. I cannot find the
entry point that would let me trace how lxml.etree.HTML() is implemented
("HTML" is a pretty ungreppable string). Running ltrace on a Python process
did not show any dynamic library calls to libxml2's functions.
How can I translate the short lxml code snippet
    from lxml.etree import HTML
    doc = HTML(sample_document)
    del doc
to low-level libxml2 library function calls and see where it allocates
the extra memory?
I could, of course, declare lxml to be leak-free and just disable my
leak finder, but I cannot resist the opportunity to make sure of it (and
for that I need a leak detector without false positives).
Regards,
Marius Gedminas
--
Professionalism has no place in art, and hacking is art. Software Engineering
might be science; but that's not what I do. I'm a hacker, not an engineer.
-- jwz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxml-memleak.py
Type: text/x-python
Size: 1372 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080214/dbf47b65/attachment.py