[lxml-dev] [lxml][objectify] optimization questions

Stefan Behnel behnel_ml at gkec.informatik.tu-darmstadt.de
Tue Oct 24 20:39:16 CEST 2006


Hi Holger,

Holger Joukl wrote:
> Stefan Behnel wrote:
>> Please test on your machine a) if the two code snippets still differ in
>> performance and b) if the new implementation resulted in any
>> noticeable slow down.
> 
> I can confirm
> a) no performance difference between recursive element printing and "manual
> element access" any more
> b) no significant slow down
> using the little timeit snippets for benchmarking.

Good, thanks.


> Some more need for clarification:
> If I understand correctly the lxml element proxy only speeds up things if
> - I hold a python reference to the element object or
> - a circular reference to the element in question prevents it from being
> gc-ed

Correct. However, as I said: do not rely on the second thing. GC runs are
unpredictable (unless you trigger them by hand).
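
To illustrate why relying on the cycle is fragile, here is a minimal sketch in
plain CPython. It involves no lxml at all: the Node class is a made-up stand-in
for any object caught in a reference cycle, and the automatic collector is
switched off only to make the timing visible.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that is part of a reference cycle."""


def make_cycle():
    # Two objects that point at each other; nothing else references them
    # once this function returns.
    a, b = Node(), Node()
    a.other, b.other = b, a
    # A weak reference lets us observe 'a' without keeping it alive.
    return weakref.ref(a)


gc.disable()  # stop automatic collections so the demo is deterministic
try:
    probe = make_cycle()
    # The cycle keeps both objects alive even though we hold no reference.
    alive_before = probe() is not None
    gc.collect()  # "running it by hand"
    # After an explicit collection the cycle is gone.
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

Without the explicit gc.collect(), the moment at which the cycle disappears
depends on allocation patterns elsewhere in the program.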


> To speed up my use case I could force-create and hold python references to
> every node before starting to operate on the tree.

... the fastest approach likely being

  cache[root] = list(root.getiterator())


> Would it also be possible to modify objectify in a way that the lifespan of
> the python _Element, once it has been instantiated, is tied to the
> existence of the underlying _c_node (xmlNode)?

Hmm, I don't know if that's a good thing in general. Keeping every Python
proxy alive eats substantially more memory than the C tree already does.

I mean, feel free to fill a cache like the one above when the XML comes in
and to delete it again when the XML goes back out. It should not be that much
slower than doing it inside objectify, but it's simple enough not to require a
dedicated API, and it gives you absolute control over the trade-off between
space and speed.
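
A minimal sketch of that fill/delete cycle might look like the following. The
helper names pin_tree and release_tree are made up for illustration and are not
part of lxml. root.iter() is the current spelling of the getiterator() call
above, and the ElementTree fallback is only there so the sketch runs without
lxml installed (stdlib elements are ordinary objects, not proxies, so the cache
buys nothing there).

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch still runs; only lxml uses element proxies.
    import xml.etree.ElementTree as etree

# Maps a root element to a list holding a reference to every element below it.
cache = {}


def pin_tree(root):
    """Create and hold Python references to all elements of the tree."""
    cache[root] = list(root.iter())
    return cache[root]


def release_tree(root):
    """Drop the held references once processing is finished."""
    cache.pop(root, None)


# XML comes in: parse it and pin the proxies.
root = etree.fromstring("<root><a><b/></a><c/></root>")
pinned = pin_tree(root)
tags = [el.tag for el in pinned]

# While the cache is alive, element access hands back the held object.
same_object = root[0] is pinned[1]

# XML goes back out: serialise and free the cache.
output = etree.tostring(root)
release_tree(root)
```

How long the cache lives is entirely up to the caller, which is the
space/speed control mentioned above.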

Stefan


