[lxml-dev] Segmentation fault in lxml.html after pickling

Stefan Behnel stefan_ml at behnel.de
Tue Jul 1 08:35:31 CEST 2008


Ian Bicking wrote:
> A first thought is that the document gets pickled, and then the element
> is an offset in that document.

That's a brilliant idea, but why so complicated? :)

pickle:
    doc = self.getroottree()
    return (tostring(doc), doc.getpath(self))

unpickle:
    data, path = pickle_value
    doc = fromstring(data).getroottree()
    return doc.xpath(path)[0]

would do the trick. Maybe we should serialise as XML instead of HTML, so
that we don't run into any "relaxed parser" problems (I remember a not-so-
old libxml2 HTML serialiser bug with <embed> round-trips, for example).
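The idea can be sketched as a runnable pair of helpers, assuming a current lxml and Python 3. The names `element_to_state` and `element_from_state` are hypothetical, not lxml API. The sketch serialises as XML, as suggested above, and reparses with a plain XML parser. It writes only the root element, which is an assumption that drops any DOCTYPE.

```python
from lxml import etree, html


def element_to_state(el):
    """Hypothetical helper: capture an element as (serialised doc, path)."""
    tree = el.getroottree()
    # Serialise as XML, not HTML, to keep the lenient HTML parser out of the
    # round trip. Only the root element is written, so a DOCTYPE is dropped.
    data = etree.tostring(tree.getroot())
    return data, tree.getpath(el)


def element_from_state(state):
    """Hypothetical helper: reparse the doc and return the element at path."""
    data, path = state
    tree = etree.fromstring(data).getroottree()
    # xpath() returns a list; an absolute getpath() result selects one node.
    return tree.xpath(path)[0]


page = html.document_fromstring(
    "<html><body><div><p>one</p><p>two</p></div></body></html>"
)
second = page.xpath("//p")[1]
state = element_to_state(second)
restored = element_from_state(state)
print(state[1], restored.tag, restored.text)  # /html/body/div/p[2] p two
```

The restored element is a plain `lxml.etree` element rather than an `HtmlElement`. That is the element lookup context problem raised at the end of this message.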


> There is no return value for __setstate__, and no way to indicate a
> constructor method for creating instances.  That's dumb.  I don't like
> pickle.

:)

You don't have to use __[sg]etstate__(). You can define an external
function to do it for you, just like objectify does (search
src/lxml/lxml.objectify.pyx for "pickle"). The stupid thing is that this
function has to be registered /and/ public. It's not enough to register it
and delete it afterwards...
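A minimal sketch of the external-function approach follows. It assumes that "registered" means the standard `copyreg` module (`copy_reg` in Python 2). It uses a hypothetical `Token` class in place of an element type so that it runs without lxml. It also shows why the reconstructor has to stay public: pickle stores only its module and qualified name, so once that name is deleted, pickling fails even though the registered reduce function still holds a reference to the function object.

```python
import copyreg
import pickle


class Token:
    """Hypothetical stand-in for an extension type such as an element."""

    def __init__(self, value):
        self.value = value


def rebuild_token(value):
    # Public, module-level reconstructor: pickle records it only by module
    # and name, and imports it again by that name when loading.
    return Token(value)


def reduce_token(tok):
    # A copyreg reduce function returns (callable, args); no __setstate__.
    return rebuild_token, (tok.value,)


copyreg.pickle(Token, reduce_token)
restored = pickle.loads(pickle.dumps(Token("abc")))
print(restored.value)  # abc


# Register a second reconstructor, then delete its public name.
def temp_rebuild(value):
    return Token(value)


hidden = temp_rebuild
copyreg.pickle(Token, lambda tok: (hidden, (tok.value,)))
del temp_rebuild

try:
    pickle.dumps(Token("abc"))
    lookup_failed = False
except pickle.PicklingError:
    # pickle can no longer find temp_rebuild under its recorded name.
    lookup_failed = True
print("pickling failed:", lookup_failed)
```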

Still, the problem remains that we need to ensure we keep the element
lookup context, so this is still not a general solution for lxml.etree.
But it should be suitable for lxml.html.
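For lxml.html specifically, one way to keep the lookup context through an XML round trip, assuming a current lxml, is to reparse with `lxml.html.XHTMLParser`. This is an `XMLParser` subclass that installs lxml.html's element class lookup. The sketch compares it with a plain `etree.fromstring()` call.

```python
from lxml import etree, html

page = html.document_fromstring("<html><body><p>hi</p></body></html>")
data = etree.tostring(page)  # XML serialisation of the whole document

# A plain XML parse loses lxml.html's element classes...
plain = etree.fromstring(data)
# ...while XHTMLParser puts lxml.html's class lookup on an XML parser.
restored = etree.fromstring(data, parser=html.XHTMLParser())

print(type(plain).__name__, isinstance(plain, html.HtmlElement))
print(type(restored).__name__, isinstance(restored, html.HtmlElement))
print(restored.text_content())  # text_content() is an HtmlElement method
```

An arbitrary lxml.etree parser with a custom lookup has no such well-known parser to reparse with. This is why the trick works for lxml.html but does not generalise to lxml.etree.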

Stefan


