[Lxml-checkins] r48188 - lxml/branch/lxml-1.3/doc

scoder at codespeak.net scoder at codespeak.net
Tue Oct 30 12:43:43 CET 2007


Author: scoder
Date: Tue Oct 30 12:43:41 2007
New Revision: 48188

Modified:
   lxml/branch/lxml-1.3/doc/FAQ.txt
Log:
FAQ update from trunk

Modified: lxml/branch/lxml-1.3/doc/FAQ.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/FAQ.txt	(original)
+++ lxml/branch/lxml-1.3/doc/FAQ.txt	Tue Oct 30 12:43:41 2007
@@ -21,6 +21,7 @@
      1.3  What standards does lxml implement?
      1.4  What is the difference between lxml.etree and lxml.objectify?
      1.5  How can I make my application run faster?
+     1.6  What about that trailing text on serialised Elements?
    2  Installation
      2.1  Which version of libxml2 and libxslt should I use or require?
      2.2  Where are the Windows binaries?
@@ -35,6 +36,8 @@
      5.1  Can I use threads to concurrently access the lxml API?
      5.2  Does my program run faster if I use threads?
      5.3  Would my single-threaded program run faster if I turned off threading?
+     5.4  Why can't I reuse XSLT stylesheets in other threads?
+     5.5  My program crashes when run with mod_python/Pyro/Zope/Plone/...
    6  Parsing and Serialisation
      6.1  Why doesn't the ``pretty_print`` option reformat my XML output?
      6.2  Why can't lxml parse my XML from unicode strings?
@@ -262,7 +265,8 @@
 contribute, don't bother with the details, a Python implementation of your
 contribution is better than none.  And keep in mind that lxml's flexible API
 often favours an implementation of features in pure Python, without bothering
-with C-code at all.
+with C-code at all.  For example, the ``lxml.html`` package is entirely written
+in Python.
 
 Please contact the `mailing list`_ if you need any help.
 
@@ -436,6 +440,94 @@
 lxml from source.
 
 
+Why can't I reuse XSLT stylesheets in other threads?
+----------------------------------------------------
+
+lxml currently has the restriction that an XSLT object can only be
+used in a thread if it was created either in the thread itself or in
+the main thread.  This is due to some interfering optimisations in
+libxslt and lxml.etree.  To work around this, you can do a couple of
+things:
+
+* create all XSLT objects in the main program and reuse them wherever
+  you want.
+
+* create them in the thread where you use them and maybe cache them in
+  thread local storage (see the threading module).
+
+If your stylesheets are diverse and state-specific, you can still
+prepare them in advance if you:
+
+* use XSLT parameters that you pass at call time to configure the
+  stylesheets
+
+* create the stylesheets (partially) programmatically in the main
+  program, e.g. by adding ``xsl:output`` tags, ``xsl:include`` tags or
+  templates (be careful with the order here) to the XSL tree, and then
+  create the ``XSLT`` objects and store them in a read-only
+  dictionary.  That way, you can access and use them in any thread.
+  Note that passing the same XSL tree into multiple ``XSLT()``
+  instances will create independent stylesheets.
+
+
+My program crashes when run with mod_python/Pyro/Zope/Plone/...
+---------------------------------------------------------------
+
+These environments can use threads in a way that may not make it obvious when
+threads are created and what happens in which thread.  This makes it hard to
+ensure lxml's threading support is used in a reliable way.  Sadly, if problems
+arise, they are as diverse as the applications, so it is difficult to provide
+any generally applicable solution.  Also, these environments are so complex
+that problems become hard to debug and even harder to reproduce in a
+predictable way.  If you encounter crashes in one of these systems, but your
+code runs perfectly when started by hand, the following gives you a few hints
+for possible approaches to solve your specific problem:
+
+* make sure you use recent versions of libxml2, libxslt and lxml.  The libxml2
+  developers keep fixing bugs in each release, and lxml also tries to become
+  more robust against possible pitfalls.  So newer versions might already fix
+  your problem in a reliable way.
+
+* make sure the library versions you installed are really used.  Do not rely
+  on what your operating system tells you!  Print the version constants in
+  ``lxml.etree`` from within your runtime environment to make sure it is the
+  case.  This is especially a problem under MacOS-X when newer library
+  versions were installed in addition to the outdated system libraries.
+
+* if you use ``mod_python``, try setting this option:
+
+      PythonInterpreter main_interpreter
+
+  There was a discussion on the mailing list about this problem:
+
+      http://comments.gmane.org/gmane.comp.python.lxml.devel/2942
+
+* compile lxml without threading support by running ``setup.py`` with the
+  ``--without-threading`` option.  While this might be slower in certain
+  scenarios on multi-processor systems, it *might* also keep your application
+  from crashing, which should be worth more to you than peak performance.
+  Remember that lxml is fast anyway, so concurrency may not even be worth it.
+
+* avoid doing fancy XSLT stuff like foreign document access or passing in
+  subtrees through XSLT variables.  This might or might not work, depending on
+  your specific usage.
+
+* try copying trees at suspicious places and working with those instead of a
+  tree shared between threads.  A good candidate might be the result of an
+  XSLT or the stylesheet itself.
+
+* try keeping thread-local copies of XSLT stylesheets, i.e. one per thread,
+  instead of sharing one.  Also see the question above.
+
+* you can try to serialise suspicious parts of your code with explicit thread
+  locks, thus disabling the concurrency of the runtime system.
+
+* report back on the mailing list to see if there are other ways to work
+  around your specific problems.  Do not forget to report the version numbers
+  of lxml, libxml2 and libxslt you are using (see the question on reporting
+  a bug).
+
+
 Parsing and Serialisation
 =========================
 

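Note on the new entry 5.4: the suggestion to create XSLT objects in the
thread that uses them and cache them in thread-local storage could look
roughly like the sketch below.  This is not part of the commit.  The helper
name ``get_thread_local``, the factory ``make_transform`` and the path
``style.xsl`` are assumptions made up for illustration; only
``threading.local``, ``etree.parse`` and ``etree.XSLT`` are existing APIs.

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()


def get_thread_local(key, factory):
    """Return the object cached under `key` for the calling thread.

    `factory` is called with no arguments the first time a given
    thread asks for `key`; later calls in that thread reuse the result.
    """
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if key not in cache:
        cache[key] = factory()
    return cache[key]


def make_transform():
    # Hypothetical factory: "style.xsl" is a placeholder path.
    # lxml is imported lazily so the helper above has no lxml dependency.
    from lxml import etree
    return etree.XSLT(etree.parse("style.xsl"))
```

A worker thread would then call ``get_thread_local("style", make_transform)``
and apply the returned object to its documents, so every XSLT object is used
only in the thread that created it.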

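Note on the new entry 5.5: the advice to print the version constants in
``lxml.etree`` from within the runtime environment might be done along these
lines.  Again a sketch, not part of the commit: ``version_report`` and
``main`` are hypothetical helpers, and constants that a given lxml release
does not define are reported as ``None`` rather than raising an error.

```python
# Version constants exposed by lxml.etree; missing ones are reported as None.
VERSION_NAMES = (
    "LXML_VERSION",
    "LIBXML_VERSION",
    "LIBXML_COMPILED_VERSION",
    "LIBXSLT_VERSION",
    "LIBXSLT_COMPILED_VERSION",
)


def version_report(etree_module):
    """Map each known version constant name to its value in `etree_module`."""
    return {name: getattr(etree_module, name, None) for name in VERSION_NAMES}


def main():
    # Call this from inside the server process (mod_python handler, Zope
    # product, ...), not from a shell, so that the libraries actually
    # loaded at runtime are the ones being reported.
    from lxml import etree
    for name, value in version_report(etree).items():
        print(name, value)
```

Comparing the "compiled" and runtime values shows whether the library lxml
was built against is the same one that is loaded at runtime.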