Stefan Behnel wrote:
> Martijn Faassen wrote:
>> I'd love it if I could somehow store lxml trees in the ZODB, and that'd
>> need pickle support. Whether it could be made to be efficient I don't
>> know - you'd not want the whole tree to be pickled as a whole in case of
>> large trees, but some form of partitioning scheme into separate pickles.
>> You're right that custom-element binding would be nice in this case, and
>> that means the pickle can't simply be the XML content unless it's
>> somehow annotated first.
>>
>> Anyway, this is a rather out there use case. I am just intrigued to
>> learn that objectify elements can be pickled.
>
> It's just easier to do in objectify, as it has a pretty comprehensive
> setup for Element class mapping. If you want to be sure to get back
> exactly the same Element tree after pickling, you can just annotate() an
> objectify tree before pickling it.
>
> Doing the same thing in lxml.etree would require storing some information
> about the current Element lookup, which may be a lot of information, e.g.
> for the namespace class setup. That's a parser-local setup, so we can't
> just use the setup of the default parser either but need a concrete
> context for the unpickling.
>
> lxml.html might be considered having such a context in a similar way
> lxml.objectify has it, as it comes with its own classes and lookup scheme.

Just what would end up being pickled, do you think? The entire document?

A first thought is that the document gets pickled, and then the element
is stored as a path of child indexes into that document. Like, erm...

    class HtmlMixin:
        def __getstate__(self):
            return (self.getroottree(), self._indexes_to_self())

        def _indexes_to_self(self):
            # Child indexes leading from the root down to this element.
            result = []
            el = self
            # Test against None explicitly; truth-testing an lxml
            # element is deprecated and depends on its child count.
            while el.getparent() is not None:
                result.insert(0, el.getparent().index(el))
                el = el.getparent()
            return result

        def __setstate__(self, state):
            # Dammit... this doesn't actually work:
            doc, indexes_to_self = state
            el = doc.getroot()
            for index in indexes_to_self:
                el = el[index]
            return el

There is no return value for __setstate__, and no way to indicate a
constructor method for creating instances. That's dumb. I don't like
pickle.

For documents, if the pickle hooks worked reasonably I'd just store the
serialization of the document (as a string) plus all the special
attributes (doctype, url, etc.). Given that the hooks don't work
reasonably I'm not sure how to do it; maybe people with enough ZODB
experience to have hit this problem have an idea?

From what I can tell there's no reason to store the document as
anything but a string -- serializing and re-parsing the string is faster
than any other means of storing a document (it all ends up as strings
eventually anyway).
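
One possible way around the missing return value is pickle's __reduce__
hook, which lets an object name a module-level callable plus arguments
that pickle calls to rebuild it on load. Below is a minimal sketch of
the "serialized document plus index path" idea under that assumption.
It uses the standard library's xml.etree.ElementTree as a stand-in so it
runs without lxml; ElementRef and _rebuild are hypothetical names, and
with lxml the parent map would be replaced by getparent().

```python
import pickle
import xml.etree.ElementTree as ET


def _rebuild(xml_bytes, indexes):
    """Hypothetical module-level helper that pickle calls on load."""
    root = ET.fromstring(xml_bytes)   # re-parse the stored document
    el = root
    for index in indexes:             # walk down to the target element
        el = el[index]
    return ElementRef(root, el)


class ElementRef:
    """Hypothetical wrapper: a document root plus one element inside it."""

    def __init__(self, root, element):
        self.root = root
        self.element = element

    def _indexes_to_self(self):
        # The stdlib Element has no getparent(), so build a child -> parent map.
        parents = {child: parent
                   for parent in self.root.iter()
                   for child in parent}
        result = []
        el = self.element
        while el in parents:          # the root has no entry, so this stops there
            parent = parents[el]
            result.insert(0, list(parent).index(el))
            el = parent
        return result

    def __reduce__(self):
        # Store the document as a serialized string plus the path of
        # child indexes; pickle calls _rebuild(*args) when loading.
        return (_rebuild, (ET.tostring(self.root), self._indexes_to_self()))


root = ET.fromstring("<html><body><p>a</p><p>b</p></body></html>")
ref = ElementRef(root, root[0][1])
copy = pickle.loads(pickle.dumps(ref))
```

The restored object points at the same position in a freshly parsed
copy of the document. Document-level extras such as doctype or URL are
not handled here and would have to be carried as additional arguments.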
--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org