[lxml-dev] Proxy AssertionError in threaded tree traversal

Stefan Behnel stefan_ml at behnel.de
Sat Feb 10 19:52:58 CET 2007


Hi Holger,

Holger Joukl wrote:
> lately I've been running into such problems:
> 
> 2007/01/23 13:22:02:all2all_MainThread:ERROR:    cache[msg] = list(msg.getiterator())
> 2007/01/23 13:22:02:all2all_MainThread:ERROR:   File "etree.pyx", line 1562, in etree.ElementDepthFirstIterator.__next__
> 2007/01/23 13:22:02:all2all_MainThread:ERROR:   File "etree.pyx", line 1207, in etree._elementFactory
> 2007/01/23 13:22:02:all2all_MainThread:ERROR:   File "proxy.pxi", line 28, in etree.registerProxy
> 2007/01/23 13:22:02:all2all_MainThread:ERROR:<ThreadedQueue name='TQ_normal'>: AssertionError: double registering proxy!
> 
> I strongly suspect this is a threading-related problem as it occurs in a
> multithreaded test program.
> I'm also able to fix this if every thread copy.deepcopy()'s all incoming
> Elements before doing anything with them (the threads basically dispatch
> from Queues into which other threads have put Elements).
> 
> Hence my question:
> - Am I doing something nasty here which is pretty much forbidden? (I know I
>   will have to copy my Elements anyway, as my threads will want to modify
>   them.)
> - And/or should lxml guard the element proxy registration?

Although this may not answer your question (and I'm sure you've already read
it), here's the official disclaimer on threading in lxml:

http://codespeak.net/lxml/FAQ.html#can-i-use-threads-to-concurrently-access-the-lxml-api

What you observe is definitely a threading issue. The code in _elementFactory
(etree.pyx) suggests that different threads are concurrently creating proxies
for the same node.

The sad answer is: this is not quite what the threading code was initially
written for. It was meant for cases where threads do independent things
concurrently, such as a web server request dispatcher that forwards requests
to different threads that run XSLTs or the like. The problem is that not many
people use threading with lxml, so adding locking to _elementFactory would
mainly reduce performance for the majority of users for the sake of those few
who do.

Since you already suggest deep copying, that's definitely the way to go for
you. Another easy way to work around it would be to instantiate all proxies
before dispatching the trees (the usual list(root.getiterator()) bit) and keep
the list until releasing the tree.
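
To make the two workarounds concrete, here is a minimal sketch, not a
definitive recipe. The names pin_proxies, worker, inbox and results are made
up for illustration. It uses iter(), the newer spelling of getiterator(), and
falls back to the standard library's ElementTree only so that the sketch runs
where lxml is not installed. Copying before enqueuing (rather than in the
receiving thread, as in your setup) is an assumption made here so that the
original tree never crosses a thread boundary.

```python
import copy
import queue
import threading

try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without lxml installed.
    import xml.etree.ElementTree as etree


def pin_proxies(root):
    """Workaround 2: create every element proxy up front, in one thread.

    The caller must keep the returned list alive until the tree is released.
    """
    return list(root.iter())


def worker(inbox, results):
    """Consume elements from the queue until a None sentinel arrives."""
    while True:
        element = inbox.get()
        if element is None:
            break
        # The element is a private copy, so modifying it is safe.
        element.set("seen", "yes")
        results.append(etree.tostring(element))


root = etree.fromstring("<msg><a/><b/></msg>")

# Workaround 2: hold on to all proxies for as long as the tree is shared.
pinned = pin_proxies(root)

inbox = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(inbox, results))
thread.start()

# Workaround 1: hand the worker a deep copy instead of the original tree.
inbox.put(copy.deepcopy(root))
inbox.put(None)  # sentinel: tell the worker to stop
thread.join()
```

Either workaround on its own should be enough. The deep copy costs memory and
time per message, while pinning the proxies only helps as long as the threads
do not modify the shared tree.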

I'll ask on the list what others think about making lxml more thread-safe,
though, to avoid this kind of problem in the future.

Regards,
Stefan

