[lxml-dev] Another (last?) take on proxy deallocation
Stefan Behnel
stefan_ml at behnel.de
Sat Jun 30 07:39:53 CEST 2007
Hi all,
While Ian Bicking was working on the "lxml.html" trunk, I noticed that the way
some of the modules were implemented could crash lxml.etree during garbage
collection. I know that Martijn and I have already reimplemented the proxy code
a couple of times, each time solving more of the problems we encountered, but I
really hope this is the last time we have to do it.
I already disliked my last rewrite, as it required an additional document
traversal each time a document is deallocated. While we accept this behaviour
for disconnected tree fragments (a trade-off between different overheads), it
should not be necessary for the whole document (at least not more often than
xmlFreeDoc() requires). The problem is that Python's cyclic garbage collector
gives no guarantees about the order in which collected objects are cleaned up,
while libxml2 requires access to the document node when cleaning up a tree
fragment. So we must clean up all _Element proxies first and *then* free the
xmlDoc, i.e. a _Document must always be deallocated *after* all of its _Element
proxies have been garbage collected.
I had been thinking about a way to do this for a while and experimented with it
on a "proxy-deallocation" branch, until I realised that the best way to control
the garbage collector was the garbage collection mechanism itself, i.e.
reference counting.
So, I checked in a small patch (SVN revision 44623 on the trunk) that simply
doubles the ref-count that an _Element holds on its _Document, so that we can
control when the document's ref-count is decreased. lxml.etree now releases the
extra reference explicitly in the tp_dealloc function of the _Element class,
*after* cleaning up the proxy, so that the ref-count of the _Document never drops
to 0 before the last of its _Element proxies has been deallocated. It is then
safe to call xmlFreeDoc() on the libxml2 document from _Document.__dealloc__.
https://codespeak.net/viewvc/?view=rev&revision=44623
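To make the ordering argument concrete, here is a toy model in plain Python. It is not lxml's Cython code: FakeDoc, FakeElement and collect() are hypothetical names, ref-counts are simulated with an integer, and it assumes the worst case where the collector drops every GC-visible _Element -> _Document link before it deallocates any of the proxies, while the extra reference from the patch is one the collector cannot drop on its own.

```python
# Toy model of the ref-count trick described above. This is NOT lxml's
# Cython code: FakeDoc, FakeElement and collect() are hypothetical names,
# and reference counts are simulated with a plain integer.


class FakeDoc:
    """Stands in for _Document; 'frees' its xmlDoc when the count hits 0."""

    def __init__(self, log):
        self.refcount = 1  # the reference held by user code
        self.freed = False
        self.log = log

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # models xmlFreeDoc() in _Document.__dealloc__
            self.log.append("xmlFreeDoc")


class FakeElement:
    """Stands in for an _Element proxy that points at its document."""

    def __init__(self, doc, name, extra_ref):
        self.doc_ptr = doc  # raw pointer, stays usable after clear()
        self.name = name
        self.extra_ref = extra_ref
        self.holds_attr_ref = True
        doc.incref()  # the normal, GC-visible reference
        if extra_ref:
            doc.incref()  # the additional reference from the patch

    def clear(self):
        # Assumed GC step: drop the GC-visible reference (like tp_clear).
        if self.holds_attr_ref:
            self.holds_attr_ref = False
            self.doc_ptr.decref()

    def dealloc(self):
        # Clean up the proxy first; libxml2 needs the document for this.
        if self.doc_ptr.freed:
            self.doc_ptr.log.append(f"CRASH: {self.name} after xmlFreeDoc")
        else:
            self.doc_ptr.log.append(f"cleanup {self.name}")
        # Only then release the extra reference, as the patch does.
        if self.extra_ref:
            self.doc_ptr.decref()


def collect(extra_ref):
    """Simulate one unlucky collection order and return the event log."""
    log = []
    doc = FakeDoc(log)
    elements = [FakeElement(doc, name, extra_ref) for name in ("a", "b")]
    doc.decref()  # user code drops its last reference to the document
    for element in elements:  # collector breaks element -> doc links first
        element.clear()
    for element in elements:  # ... and only then deallocates the elements
        element.dealloc()
    return log


print(collect(extra_ref=False))
# → ['xmlFreeDoc', 'CRASH: a after xmlFreeDoc', 'CRASH: b after xmlFreeDoc']
print(collect(extra_ref=True))
# → ['cleanup a', 'cleanup b', 'xmlFreeDoc']
```

Without the extra reference, the document is "freed" as soon as the collector drops the last visible link, so both proxies are cleaned up too late. With it, xmlFreeDoc only runs once the last proxy has released its explicit reference.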
I really like this approach, and I also like that it removes the need for a
document traversal on _Document deallocation. Plus, it keeps the code on the
lxml.html branch from crashing, which is a *really* good sign. :]
So, I will also merge this into the 1.3 branch and release 1.3.1 soon. We
already have a couple of small fixes on the branch, so a bug-fix release next
week should nicely improve the quality of the official release.
Have fun,
Stefan