[lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64

Stefan Behnel stefan_ml at behnel.de
Sat Jan 5 20:39:02 CET 2008


Hi Dmitri,

Stefan Behnel wrote:
> The way XSLT is implemented in lxml is a bit tricky, as libxslt makes it hard
> to control some things that lxml relies on in libxml2 for performance reasons.
> In particular, lxml uses a thread-local hash table for constant strings, which
> is much faster than a malloc() for each string that occurs in a document.
> However, libxslt doesn't honour this dictionary and creates its own, based on
> the stylesheet dictionary. As a result, the stylesheet can leak into the
> result document through string references that point into the stylesheet's
> hash table.
> 
> libxslt offers no way to prevent this or to control the allocation. That's
> why I decided to restrict the execution of XSL transformations to threads that
> inherit the same hash table as the stylesheet; this should normally prevent
> any problems.
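The dictionary scheme and the thread restriction described above can be modelled in a few lines. This is a minimal pure-Python sketch, not lxml's Cython code: plain dicts stand in for libxml2's xmlDict hash tables, and every name (MAIN_DICT, thread_dict, dict_intern, check_thread_dict, Stylesheet, run_in_thread) is hypothetical. The check mirrors the pre-patch _checkThreadDict from parser.pxi: a stylesheet is accepted if its dictionary is the main thread's or the calling thread's own.

```python
import threading

# Hypothetical model (not lxml's code): plain dicts stand in for libxml2's
# xmlDict string hash tables. The main thread owns MAIN_DICT; every other
# thread lazily gets its own dictionary.
MAIN_DICT = {}
_local = threading.local()


def thread_dict():
    """Return the string dictionary of the calling thread."""
    if threading.current_thread() is threading.main_thread():
        return MAIN_DICT
    if not hasattr(_local, "strings"):
        _local.strings = {}
    return _local.strings


def dict_intern(d, s):
    """Store s once in dictionary d and return the shared copy."""
    return d.setdefault(s, s)


def check_thread_dict(d):
    """Mirror of the pre-patch check: main dictionary or the local one."""
    return d is MAIN_DICT or d is thread_dict()


class Stylesheet:
    """Hypothetical stand-in for a compiled XSLT bound to its dictionary."""

    def __init__(self, names):
        self.dict = thread_dict()
        self.names = [dict_intern(self.dict, n) for n in names]

    def __call__(self, doc):
        if not check_thread_dict(self.dict):
            raise RuntimeError("stylesheet is not usable in this thread")
        # The result mixes strings owned by self.dict with the input's strings,
        # which models how the stylesheet's dictionary "leaks" into the result.
        return self.names + list(doc)


def run_in_thread(fn):
    """Run fn in a fresh thread; return its result or the exception raised."""
    box = {}

    def target():
        try:
            box["value"] = fn()
        except Exception as exc:  # hand the exception back to the caller
            box["value"] = exc

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return box["value"]
```

With this model, a stylesheet built in the main thread runs in any thread, while one built in a worker thread raises RuntimeError when called from a different worker.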

Here is a trivial patch (the one against xslt.pxi) that, instead of raising an
exception, copies the stylesheet into the current thread's context and thus
works around the current thread restriction. It seems to work for me; any
chance you could give it a try?
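The xslt.pxi change can be sketched the same way, assuming (for brevity) that every thread, the main one included, owns its own dictionary. Again, Stylesheet, thread_dict and the list-based "documents" are hypothetical stand-ins; only the idea of returning self.__copy__()(...) instead of raising comes from the patch.

```python
import threading

# Hypothetical model: each thread, including the main one, has its own
# string dictionary (a plain dict standing in for libxml2's xmlDict).
_local = threading.local()


def thread_dict():
    """Return the calling thread's string dictionary, creating it lazily."""
    if not hasattr(_local, "strings"):
        _local.strings = {}
    return _local.strings


class Stylesheet:
    """Hypothetical stand-in for lxml's XSLT class."""

    def __init__(self, names):
        self.dict = thread_dict()
        self.names = [self.dict.setdefault(n, n) for n in names]

    def __copy__(self):
        # Rebuild the stylesheet so that it belongs to the calling thread.
        return Stylesheet(self.names)

    def __call__(self, doc):
        if self.dict is not thread_dict():
            # Patched behaviour: rather than raising RuntimeError, run a copy
            # that lives in this thread's dictionary.
            return self.__copy__()(doc)
        return self.names + list(doc)
```

As in the patch, a fresh copy is made on every call from a foreign thread, trading some copying work for not having to raise.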

If it doesn't work reliably, could you also try the second change (in
parser.pxi)? It restricts 'acceptable' hash tables to the local thread's
dictionary and no longer accepts the main thread's dictionary (as it did
before).
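The effect of commenting out the main-thread branch in parser.pxi can be shown by comparing the two acceptance rules side by side. This is again a hypothetical pure-Python model: check_old and check_new are illustrative names for the before/after versions of _checkThreadDict, and MAIN_DICT stands in for __GLOBAL_PARSER_CONTEXT._c_dict.

```python
import threading

# Hypothetical model: MAIN_DICT is the main thread's string dictionary and
# every other thread lazily gets its own one.
MAIN_DICT = {}
_local = threading.local()


def thread_dict():
    """Return the calling thread's string dictionary."""
    if threading.current_thread() is threading.main_thread():
        return MAIN_DICT
    if not hasattr(_local, "strings"):
        _local.strings = {}
    return _local.strings


def check_old(d):
    """Before the parser.pxi change: main dictionary or local dictionary."""
    return d is MAIN_DICT or d is thread_dict()


def check_new(d):
    """After the change: only the calling thread's own dictionary."""
    return d is thread_dict()


def in_worker(fn):
    """Evaluate fn() in a fresh thread and return its result."""
    box = {}
    t = threading.Thread(target=lambda: box.setdefault("value", fn()))
    t.start()
    t.join()
    return box["value"]
```

From a worker thread, check_old accepts the main dictionary and check_new rejects it; both still accept the worker's own dictionary.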

Stefan

=== src/lxml/xslt.pxi
==================================================================
--- src/lxml/xslt.pxi   (revision 3205)
+++ src/lxml/xslt.pxi   (local)
@@ -373,7 +373,7 @@
         cdef xmlDoc* c_doc

         if not _checkThreadDict(self._c_style.doc.dict):
-            raise RuntimeError, "stylesheet is not usable in this thread"
+            return self.__copy__()(_input, profile_run=profile_run, **_kw)

         input_doc = _documentOrRaise(_input)
         root_node = _rootNodeOrRaise(_input)
=== src/lxml/parser.pxi
==================================================================
--- src/lxml/parser.pxi (revision 3205)
+++ src/lxml/parser.pxi (local)
@@ -132,8 +132,8 @@
     """Check that c_dict is either the local thread dictionary or the global
     parent dictionary.
     """
-    if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
-        return 1 # main thread
+    #if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
+    #    return 1 # main thread
     if __GLOBAL_PARSER_CONTEXT._getThreadDict(NULL) is c_dict:
         return 1 # local thread dict
     return 0

