[lxml-dev] Unable to solve a crash on Windows with LXML
Robert Liebeskind
robl at perfectworld.net
Mon Jan 26 09:42:04 CET 2009
Hi Stefan,
Yes, this is helpful and I will make the adjustment you suggest.
Actually I meant lxml 2.1.2 and lxml 2.1.5. Sorry for the confusion.
Regards,
Rob.
On Jan 26, 2009, at 9:30 AM, Stefan Behnel wrote:
> Hi,
>
> I'm CC-ing the list, I hope you don't mind. I think your description is
> abstract enough not to reveal anything about your application.
>
> Robert Liebeskind wrote:
>> The trace you received was from v2.2 of lxml but we continue to experience
>> the same issue with v2.5. We use XPath extensively. We do not use XSLT.
>
> I guess you meant 2.2beta1 and 2.1.5?
>
>
>> 1. An etree is loaded from an xml file and the data displayed for the user.
>> 2. The etree is modified as the result of user edits using a GUI
>
> I assume that this happens inside one thread.
>
>
>> 3. The etree is then copied using copy.deepcopy() to etree2
>> 4. etree2 is passed via a queue to a thread in which it is further processed.
>
> Try copying the tree inside the target thread, preferably instead of
> copying it inside another thread and passing it over. Trees inherit state
> from the thread that built them. Also, using a tree inside a thread that
> did not build it will result in some additional adaptation overhead.
>
>
>> 5. etree2 is modified as a result of processing in its own thread.
>> During this processing, additional trees/elems are fetched from disk and
>> used to modify/augment etree2.
>> 6. etree2 is copied to etree3
>> 7. etree3 is sent for additional processing in its own thread.
>> 8. etree2 is copied to etree4
>> 9. etree4 is sent for additional processing in its own thread.
>
> Same thing for 6/7 and 8/9. Copying the tree from inside the target thread
> will make things more stable. Even if multiple copying is not really
> memory friendly, it's very fast in lxml, so as long as we are not talking
> about documents of several megabytes, and as long as this thing really
> runs on a multi-processor machine, you should be fine even with a
> work-around that copies the tree redundantly in both threads.
>
>
>> At this point the initial thread is complete and tears down.
>> The two additional spawned threads finish quickly and tear down as well.
>> These processes will succeed quite often. They fail intermittently
>> and result in a Windows Unhandled Exception.
>
> lxml.etree uses a per-thread dictionary that holds names of tags and
> attributes. That's one of the reasons why it's so fast and memory
> friendly. In the stack trace you showed me, it seems that a tree is freed
> in a different thread than the one that built it, but (for whatever
> reason) some of its content is still linked to a dictionary of the original
> thread. In this case, the tree cleanup cannot detect that the name is
> stored in a dictionary and will free it manually. When the originating
> thread goes down, either before or after the thread that freed the tree,
> it will destroy the dictionary that stores the name, which results in a
> double free.
>
> Does that help for now?
>
> Stefan
>
>
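For readers of the archive, here is a minimal sketch of the adjustment suggested above: the original tree is handed to the worker through a queue, and copy.deepcopy() is called inside the target thread, so the copy is built by the thread that uses it. The names worker, run_once, inbox and results are hypothetical, the "processed" attribute stands in for the real processing, and returning serialized bytes instead of a tree is an assumption of this sketch, not something taken from the thread.

```python
import copy
import queue
import threading

from lxml import etree


def worker(inbox, results):
    """Receive a tree, copy it inside this thread, and work on the copy only."""
    source_tree = inbox.get()
    # The copy is made here, in the thread that will use and release it.
    local_tree = copy.deepcopy(source_tree)
    # Hypothetical processing step: mark the root element as processed.
    local_tree.getroot().set("processed", "yes")
    # Assumption of this sketch: hand back serialized bytes, not the tree.
    results.put(etree.tostring(local_tree))


def run_once(xml_bytes):
    """Build a tree in the calling thread and process a copy in a worker."""
    tree = etree.ElementTree(etree.fromstring(xml_bytes))
    inbox = queue.Queue()
    results = queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, results))
    thread.start()
    inbox.put(tree)  # pass the original; the worker makes its own copy
    thread.join()
    # The tree built in this thread is not modified by the worker.
    assert tree.getroot().get("processed") is None
    return results.get()
```

Calling run_once(b"<root><item/></root>") returns the serialized copy with the extra attribute on the root. The same pattern would apply to steps 6/7 and 8/9: each spawned thread receives etree2 and makes its own copy.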