[lxml-dev] Unable to solve a crash on Windows with LXML

Robert Liebeskind robl at perfectworld.net
Mon Jan 26 09:42:04 CET 2009


Hi Stefan,

Yes, this is helpful and I will make the adjustment you suggest.

Actually I meant lxml 2.1.2 and lxml 2.1.5.  Sorry for the confusion.

Regards,

Rob.

On Jan 26, 2009, at 9:30 AM, Stefan Behnel wrote:

> Hi,
>
> I'm CC-ing the list, I hope you don't mind. I think your description is
> abstract enough not to reveal anything about your application.
>
> Robert Liebeskind wrote:
>> The trace you received was from v2.2 of lxml but we continue to
>> experience the same issue with v2.5.  We use XPath extensively.  We do
>> not use XSLT.
>
> I guess you meant 2.2beta1 and 2.1.5?
>
>
>> 1. An etree is loaded from an XML file and the data displayed for the
>> user.
>> 2. The etree is modified as the result of user edits using a GUI
>
> I assume that this happens inside one thread.
>
>
>> 3. The etree is then copied using copy.deepcopy() to etree2
>> 4. etree2 is passed via a queue to a thread in which it is further
>> processed.
>
> Try copying the tree inside the target thread, preferably instead of
> copying it inside another thread and passing it over. Trees inherit
> state from the thread that built them. Also, using a tree inside a
> thread that did not build it will result in some additional adaptation
> overhead.
>
>
>> 5. etree2 is modified as a result of processing in its own thread.
>> During this processing, additional trees/elems are fetched from disk
>> and used to modify/augment etree2.
>> 6. etree2 is copied to etree3
>> 7. etree3 is sent for additional processing in its own thread.
>> 8. etree2 is copied to etree4
>> 9. etree4 is sent for additional processing in its own thread.
>
> Same thing for 6/7 and 8/9. Copying the tree from inside the target
> thread will make things more stable. Even if multiple copying is not
> really memory friendly, it's very fast in lxml, so as long as we are not
> talking about documents of several megabytes, and as long as this thing
> really runs on a multiprocessor machine, you should be fine even with a
> work-around that copies the tree redundantly in both threads.
>
>
>> At this point the initial thread is complete and tears down.
>> The two additional spawned threads finish quickly and tear down as
>> well.  These processes will succeed quite often.  They fail
>> intermittently and result in a Windows Unhandled Exception.
>
> lxml.etree uses a per-thread dictionary that holds names of tags and
> attributes. That's one of the reasons why it's so fast and memory
> friendly. In the stack trace you showed me, it seems that a tree is
> freed in a different thread than the one that built it, but (for
> whatever reason) some of its content is still linked to a dictionary of
> the original thread. In this case, the tree cleanup cannot detect that
> the name is stored in a dictionary and will free it manually. When the
> originating thread goes down, either before or after the thread that
> freed the tree, it will destroy the dictionary that stores the name,
> which results in a double free.
>
> Does that help for now?
>
> Stefan
>
>
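
A minimal sketch of the work-around suggested above, assuming Python 3 and
the standard queue and threading modules: the tree built in the first thread
is handed over as-is, and the worker makes its own copy with copy.deepcopy()
before modifying anything. The names worker and run_pipeline are
hypothetical, and the xml.etree fallback import is an assumption added only
so the sketch still runs where lxml is not installed.

```python
import copy
import queue
import threading

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch runs without lxml.
    import xml.etree.ElementTree as etree


def worker(inbox, results):
    """Receive a tree built in another thread and copy it in this one."""
    shared_root = inbox.get()
    # Copy inside the target thread, so the copy is built (and later freed)
    # by the thread that actually works on it.
    local_root = copy.deepcopy(shared_root)
    del shared_root  # the worker only touches its own copy from here on
    marker = etree.SubElement(local_root, "processed")
    marker.text = threading.current_thread().name
    # Hand back serialized bytes rather than a live tree.
    results.put(etree.tostring(local_root))


def run_pipeline(xml_text):
    """Hypothetical driver: parse here, process a copy in a worker thread."""
    root = etree.fromstring(xml_text)  # tree built by the calling thread
    inbox, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(
        target=worker, args=(inbox, results), name="worker-1"
    )
    thread.start()
    inbox.put(root)  # pass the original; the worker makes the copy
    thread.join()
    return results.get(), etree.tostring(root)


if __name__ == "__main__":
    processed, original = run_pipeline("<doc><item/></doc>")
    print(processed)
    print(original)
```

In this sketch the original tree stays referenced by the thread that built
it, so it is released there as well; the worker only ever frees its own copy.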


