[lxml-dev] Segmentation fault in lxml.html after pickling
Stefan Behnel
stefan_ml at behnel.de
Tue Jul 1 08:35:31 CEST 2008
Ian Bicking wrote:
> A first thought is that the document gets pickled, and then the element
> is an offset in that document.
That's a brilliant idea, but why so complicated? :)
pickle:
    doc = self.getroottree()
    return (tostring(doc), doc.getpath(self))
unpickle:
    xml, path = pickle_value
    doc = fromstring(xml).getroottree()   # re-parse the serialised document
    return doc.xpath(path)[0]             # xpath() returns a list
would do the trick. Maybe we should serialise as XML instead of HTML, so
that we don't run into any "relaxed parser" problems (I remember a
not-so-old libxml2 HTML serialiser bug with <embed> round-trips, for example).
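A runnable sketch of that idea for lxml.html, assuming Python 3's copyreg module (copy_reg in Python 2) as the registration mechanism and lxml.html's xhtml_parser for re-parsing the XML serialisation. The helper names reduce_html_element and unpickle_html_element are hypothetical. It pickles the whole document as XML together with the element's getpath() result, then finds the element again with xpath().

```python
import copyreg
import pickle

import lxml.html
from lxml import etree


def unpickle_html_element(xml_bytes, path):
    # Re-parse the document as XML (not HTML) with lxml.html's XHTML parser,
    # which carries lxml.html's element class lookup, so HtmlElements come back.
    root = etree.fromstring(xml_bytes, lxml.html.xhtml_parser)
    # getpath() produced an absolute path, so evaluate it on the tree;
    # xpath() returns a list and the path matches exactly one element.
    return root.getroottree().xpath(path)[0]


def reduce_html_element(element):
    # Store the whole document as XML plus the element's position in it.
    doc = element.getroottree()
    return unpickle_html_element, (etree.tostring(doc), doc.getpath(element))


# copyreg dispatches on the exact type, so tag-specific subclasses of
# HtmlElement (e.g. lxml.html.FormElement) would each need their own entry.
copyreg.pickle(lxml.html.HtmlElement, reduce_html_element)

root = lxml.html.document_fromstring(
    "<html><body><p>first</p><p id='x'>second</p></body></html>")
para = root.xpath("//p")[1]
restored = pickle.loads(pickle.dumps(para))
print(restored.get("id"), restored.text)  # x second
```

The restored element lives in a fresh copy of the whole document, so pickling many elements from the same tree serialises that tree once per element.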
> There is no return value for __setstate__, and no way to indicate a
> constructor method for creating instances. That's dumb. I don't like
> pickle.
:)
You don't have to use __[sg]etstate__(). You can define an external
function to do it for you, just like objectify does (search
src/lxml/lxml.objectify.pyx for "pickle"). The stupid thing is that this
function has to be registered /and/ public. It's not enough to register it
and delete it afterwards, since pickle only records the function's module
and name and has to find it again under that name.
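A stdlib-only illustration of why the function has to stay public, assuming Python 3's copyreg and pickle modules: copyreg.pickle() registers a reducer, but the reconstructor that the reducer returns is written into the pickle by module and name only. Point, make_point, reduce_point and reduce_point_private are hypothetical stand-ins for an extension type and its helpers, not lxml code.

```python
import copyreg
import io
import pickle


class Point:
    """Hypothetical stand-in for an extension type such as an lxml element."""

    def __init__(self, x, y):
        self.x, self.y = x, y


def make_point(x, y):
    # Public, module-level reconstructor: the pickle only records
    # "<module>.make_point", so it must still exist when (un)pickling.
    return Point(x, y)


def reduce_point(p):
    return make_point, (p.x, p.y)


copyreg.pickle(Point, reduce_point)

data = pickle.dumps(Point(1, 2))
restored = pickle.loads(data)
print(restored.x, restored.y, b"make_point" in data)  # 1 2 True


def reduce_point_private(p):
    def rebuild(x, y):  # local function: not reachable by module + name
        return Point(x, y)
    return rebuild, (p.x, p.y)


# A per-pickler dispatch table leaves the global registration untouched.
pickler = pickle.Pickler(io.BytesIO())
pickler.dispatch_table = {Point: reduce_point_private}
try:
    pickler.dump(Point(3, 4))
    private_ok = True
except (pickle.PicklingError, AttributeError):
    private_ok = False
print(private_ok)  # False
```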
Still, the problem remains that we need to make sure we keep the element
lookup context, so this is not a general solution for lxml.etree.
But it should be suitable for lxml.html.
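To see what losing the element lookup context means in practice, a small sketch assuming lxml.html's xhtml_parser (an XML parser that carries lxml.html's element class lookup): re-parsing the same XML with the default parser yields plain etree elements, which lack lxml.html methods such as text_content().

```python
import lxml.html
from lxml import etree

xml = b"<html><body><p>hi <b>there</b></p></body></html>"

# Default XML parser: default element lookup, plain etree elements.
plain = etree.fromstring(xml)

# XML parser configured with lxml.html's element class lookup.
html = etree.fromstring(xml, lxml.html.xhtml_parser)

print(type(plain).__name__, type(html).__name__)  # _Element HtmlElement
print(hasattr(plain, "text_content"))              # False
print(html.xpath("//p")[0].text_content())         # hi there
```

lxml.html always uses the same lookup, so an unpickler can hard-wire this parser; a generic lxml.etree unpickler cannot know which (possibly user-defined) lookup the original tree used.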
Stefan