[lxml-dev] Converting an objectified lxml tree to a standard etree one.

Stefan Behnel stefan_ml at behnel.de
Sat Jun 20 07:51:23 CEST 2009


Hi,

John Krukoff wrote:
> Is the best way to convert an objectified (with lxml.objectify) element
> tree to a standard etree based one just to serialize and reparse? Is the
> reverse transform just as hard?

I would say so. The problem is that if I allowed changing the element lookup
while the tree is alive in Python space (which would be required, since you
need to pass at least one Element instance into lxml to request the
change), lxml would have to replace the proxies used inside the tree. All
live proxies in the tree would then become zombies, including the one you
passed. That's rather dangerous.

Deep copying the tree and returning a root node from the new parser context
would be a solution if you need the tree in memory, which I assume is the
case here. But IIRC, there isn't currently a way to deep-copy the tree so
that it uses a new element lookup.
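
For reference, the serialise-and-reparse round trip could look roughly like
the sketch below. The sample document and the variable names are made up for
illustration; the explicit etree.XMLParser() is only there to make it obvious
that the second parse uses the plain etree element lookup.

```python
from lxml import etree, objectify

# Hypothetical sample document, for illustration only.
XML = b"<root><a>1</a><b>text</b></root>"

# Start from an objectified tree.
obj_root = objectify.fromstring(XML)

# objectify -> etree: serialise, then parse with a plain etree parser,
# which uses the default element lookup.
plain_parser = etree.XMLParser()
plain_root = etree.fromstring(etree.tostring(obj_root), parser=plain_parser)

# etree -> objectify: the same round trip, parsed by objectify instead.
obj_again = objectify.fromstring(etree.tostring(plain_root))
```

Both directions produce a completely new tree, so no proxy of the old tree
is touched.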


> I care more about memory than CPU time.

A serialised byte string is several times smaller in memory than the tree
itself, so I doubt that serialising and parsing would really hurt memory
consumption that much (if you take care to drop the original tree when it's
serialised). Do your benchmarks indicate that this is a problem? It *can*
be if the tree needs to be garbage collected due to reference cycles (e.g.
when you use ".attrib"). That might hold it in memory longer than necessary.
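
A minimal sketch of "dropping the original tree when it's serialised",
assuming nothing else in your program still references the old tree. The
sample document is made up, and the explicit gc.collect() call is just one
way to deal with possible reference cycles, not something lxml requires.

```python
import gc
from lxml import etree, objectify

# Hypothetical sample document, for illustration only.
root = objectify.fromstring(b"<root><a>1</a><b>text</b></root>")

# Serialise first; 'data' is a plain byte string.
data = etree.tostring(root)

# Drop the only reference to the objectified tree before parsing again,
# and collect any reference cycles that might keep it alive.
del root
gc.collect()

# Only now build the new tree with the plain etree element lookup.
plain_root = etree.fromstring(data, parser=etree.XMLParser())
```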

As a side-note, it would be possible to compress the memory buffer during
serialisation (see IDEAS.txt), but that's not trivial to implement and
would add a compile time dependency on zlib. It also wouldn't help much if
the next thing you do is parse the tree back into memory...
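
Note that this is different from compressing the byte string yourself after
serialisation, which you can already do with the standard zlib module. The
sketch below shows that variant; it does not lower peak memory, since the
uncompressed string exists for a moment, and it only pays off if the data
sits around for a while before being parsed again. The sample data is made up.

```python
import zlib
from lxml import etree

# Hypothetical, repetitive sample document, for illustration only.
root = etree.fromstring(b"<root>" + b"<item>value</item>" * 1000 + b"</root>")

# Serialise completely, then compress the resulting byte string.
data = etree.tostring(root)
packed = zlib.compress(data)
del root, data  # keep only the compressed copy around

# Later: decompress and parse back into a tree.
restored = etree.fromstring(zlib.decompress(packed))
```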


> Additionally, there's something odd about the objectify module that
> prevents help from working:
> 
> >>> help( objectify )
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.6/site.py", line 430, in __call__
>     return pydoc.help(*args, **kwds)
>   File "/usr/lib/python2.6/pydoc.py", line 1720, in __call__
>     self.help(request)
>   File "/usr/lib/python2.6/pydoc.py", line 1766, in help
>     else: doc(request, 'Help on %s:')
>   File "/usr/lib/python2.6/pydoc.py", line 1508, in doc
>     pager(render_doc(thing, title, forceload))
>   File "/usr/lib/python2.6/pydoc.py", line 1503, in render_doc
>     return title % desc + '\n\n' + text.document(object, name)
>   File "/usr/lib/python2.6/pydoc.py", line 327, in document
>     if inspect.ismodule(object): return self.docmodule(*args)
>   File "/usr/lib/python2.6/pydoc.py", line 1086, in docmodule
>     inspect.getclasstree(classlist, 1), name)]
>   File "/usr/lib/python2.6/inspect.py", line 720, in getclasstree
>     for parent in c.__bases__:
> TypeError: 'lxml.objectify._ObjectifyElementMakerCaller' object is not iterable

Hmm, yes, that looks weird. It works with lxml.etree, but not with
lxml.objectify. Could you please file a bug report on this?

Stefan

