[lxml-dev] Unable to solve a crash on Windows with LXML

Robert Liebeskind robl at perfectworld.net
Fri Feb 6 10:09:14 CET 2009


Hi Stefan,

I have modified my code so that an etree never crosses threads.
Now the etree is converted to text and then back to an etree in
the new thread. This has resolved the issue.
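
A minimal sketch of that round-trip, for anyone who finds this thread later.
The names worker() and hand_off() are made up for illustration and are not
from the real application; the stdlib fallback import is only there so the
sketch runs without lxml installed. Only bytes cross the thread boundary,
and the tree is parsed by the thread that uses it:

```python
import queue
import threading

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch still runs.
    import xml.etree.ElementTree as etree


def worker(inbox, results):
    # Parse inside the worker, so the tree is built by the thread that uses it.
    xml_bytes = inbox.get()
    root = etree.fromstring(xml_bytes)
    root.set("processed", "yes")  # stand-in for the real processing
    results.append(etree.tostring(root))


def hand_off(root):
    # Serialise in the owning thread; the tree itself never leaves it.
    inbox = queue.Queue()
    results = []
    t = threading.Thread(target=worker, args=(inbox, results))
    t.start()
    inbox.put(etree.tostring(root))
    t.join()
    return results[0]
```

The cost is one extra serialise/parse per hand-off, which was acceptable for
the document sizes involved here.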

Do you know if lxml 2.2 has the same issue?

Thanks for your help.

Rob.

On Jan 26, 2009, at 9:30 AM, Stefan Behnel wrote:

> Hi,
>
> I'm CC-ing the list, I hope you don't mind. I think your description is
> abstract enough not to reveal anything about your application.
>
> Robert Liebeskind wrote:
>> The trace you received was from v2.2 of lxml but we continue to experience
>> the same issue with v2.5.  We use XPath extensively.  We do not use XSLT.
>
> I guess you meant 2.2beta1 and 2.1.5?
>
>
>> 1. An etree is loaded from an XML file and the data displayed for the user.
>> 2. The etree is modified as the result of user edits using a GUI
>
> I assume that this happens inside one thread.
>
>
>> 3. The etree is then copied using copy.deepcopy() to etree2
>> 4. etree2 is passed via a queue to a thread in which it is further processed.
>
> Preferably, try copying the tree inside the target thread instead of
> copying it inside another thread and passing it over. Trees inherit state
> from the thread that built them. Also, using a tree inside a thread that
> did not build it will result in some additional adaptation overhead.
>
>
>> 5. etree2 is modified as a result of processing in its own thread.
>>    During this processing, additional trees/elems are fetched from disk
>>    and used to modify/augment etree2.
>> 6. etree2 is copied to etree3
>> 7. etree3 is sent for additional processing in its own thread.
>> 8. etree2 is copied to etree4
>> 9. etree4 is sent for additional processing in its own thread.
>
> Same thing for 6/7 and 8/9. Copying the tree from inside the target thread
> will make things more stable. Even if multiple copying is not really
> memory friendly, it's very fast in lxml, so as long as we are not talking
> about documents with several megabytes, and as long as this thing really
> runs on a multi processor machine, you should be fine even with a
> work-around that copies the tree redundantly in both threads.
>
>
>> At this point the initial thread is complete and tears down.
>> The two additional spawned threads finish quickly and tear down as well.
>> These processes will succeed quite often.  They fail intermittently
>> and result in a Windows Unhandled Exception.
>
> lxml.etree uses a per-thread dictionary that holds names of tags and
> attributes. That's one of the reasons why it's so fast and memory
> friendly. In the stack trace you showed me, it seems that a tree is freed
> in a different thread than the one that built it, but (for whatever
> reason) some of its content is still linked to a dictionary of the original
> thread. In this case, the tree cleanup cannot detect that the name is
> stored in a dictionary and will free it manually. When the originating
> thread goes down, either before or after the thread that freed the tree,
> it will destroy the dictionary that stores the name, which results in a
> double free.
>
> Does that help for now?
>
> Stefan
>
>
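
For comparison, the alternative suggested above (copying the tree from inside
the target thread rather than before handing it over) might look roughly like
the sketch below. The names worker() and process_in_thread() are hypothetical,
and the stdlib fallback import is an assumption so the sketch runs without
lxml. It also assumes the source tree stays alive and unmodified while the
worker makes its copy:

```python
import copy
import queue
import threading

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch still runs.
    import xml.etree.ElementTree as etree


def worker(inbox, results):
    source_root = inbox.get()
    # Copy here, in the thread that will use and eventually free the copy.
    local_root = copy.deepcopy(source_root)
    local_root.set("stage", "worker")  # stand-in for the real processing
    results.append(etree.tostring(local_root))


def process_in_thread(root):
    # The original tree is handed over untouched; the worker never modifies it.
    inbox = queue.Queue()
    results = []
    t = threading.Thread(target=worker, args=(inbox, results))
    t.start()
    inbox.put(root)
    t.join()  # keeps the source tree alive until the worker is done
    return results[0]
```

Unlike the text round-trip, this still lets a tree object cross the thread
boundary, so it follows the advice quoted above rather than the work-around
that resolved the crash.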


