[Lxml-checkins] r48188 - lxml/branch/lxml-1.3/doc
scoder at codespeak.net
Tue Oct 30 12:43:43 CET 2007
Author: scoder
Date: Tue Oct 30 12:43:41 2007
New Revision: 48188
Modified:
lxml/branch/lxml-1.3/doc/FAQ.txt
Log:
FAQ update from trunk
Modified: lxml/branch/lxml-1.3/doc/FAQ.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/FAQ.txt (original)
+++ lxml/branch/lxml-1.3/doc/FAQ.txt Tue Oct 30 12:43:41 2007
@@ -21,6 +21,7 @@
1.3 What standards does lxml implement?
1.4 What is the difference between lxml.etree and lxml.objectify?
1.5 How can I make my application run faster?
+ 1.6 What about that trailing text on serialised Elements?
2 Installation
2.1 Which version of libxml2 and libxslt should I use or require?
2.2 Where are the Windows binaries?
@@ -35,6 +36,8 @@
5.1 Can I use threads to concurrently access the lxml API?
5.2 Does my program run faster if I use threads?
5.3 Would my single-threaded program run faster if I turned off threading?
+ 5.4 Why can't I reuse XSLT stylesheets in other threads?
+ 5.5 My program crashes when run with mod_python/Pyro/Zope/Plone/...
6 Parsing and Serialisation
6.1 Why doesn't the ``pretty_print`` option reformat my XML output?
6.2 Why can't lxml parse my XML from unicode strings?
@@ -262,7 +265,8 @@
contribute, don't bother with the details, a Python implementation of your
contribution is better than none. And keep in mind that lxml's flexible API
often favours an implementation of features in pure Python, without bothering
-with C-code at all.
+with C-code at all. For example, the ``lxml.html`` package is entirely written
+in Python.
Please contact the `mailing list`_ if you need any help.
@@ -436,6 +440,94 @@
lxml from source.
+Why can't I reuse XSLT stylesheets in other threads?
+----------------------------------------------------
+
+lxml currently has the restriction that an XSLT object can only be
+used in a thread if it was created either in the thread itself or in
+the main thread. This is due to some interfering optimisations in
+libxslt and lxml.etree. To work around this, you can do a couple of
+things:
+
+* create all XSLT objects in the main program and reuse them wherever
+ you want.
+
+* create them in the thread where you use them and maybe cache them in
+ thread local storage (see the threading module).
+
+If your stylesheets are diverse and status specific, you can still
+prepare them in advance if you:
+
+* use XSLT parameters that you pass at call time to configure the
+ stylesheets
+
+* create the stylesheets (partially) programmatically in the main
+ program, e.g. by adding ``xsl:output`` tags, ``xsl:include`` tags or
+ templates (be careful with the order here) to the XSL tree, and then
+ create the ``XSLT`` objects and store them in a read-only
+ dictionary. That way, you can access and use them in any thread.
+ Note that passing the same XSL tree into multiple ``XSLT()``
+ instances will create independent stylesheets.
+
+
+My program crashes when run with mod_python/Pyro/Zope/Plone/...
+---------------------------------------------------------------
+
+These environments can use threads in a way that may not make it obvious when
+threads are created and what happens in which thread. This makes it hard to
+ensure lxml's threading support is used in a reliable way. Sadly, if problems
+arise, they are as diverse as the applications, so it is difficult to provide
+any generally applicable solution. Also, these environments are so complex
+that problems become hard to debug and even harder to reproduce in a
+predictable way. If you encounter crashes in one of these systems, but your
+code runs perfectly when started by hand, the following gives you a few hints
+for possible approaches to solve your specific problem:
+
+* make sure you use recent versions of libxml2, libxslt and lxml. The libxml2
+ developers keep fixing bugs in each release, and lxml also tries to become
+ more robust against possible pitfalls. So newer versions might already fix
+ your problem in a reliable way.
+
+* make sure the library versions you installed are really used. Do not rely
+ on what your operating system tells you! Print the version constants in
+ ``lxml.etree`` from within your runtime environment to make sure it is the
+ case. This is especially a problem under MacOS-X when newer library
+ versions were installed in addition to the outdated system libraries.
+
+* if you use ``mod_python``, try setting this option:
+
+ PythonInterpreter main_interpreter
+
+ There was a discussion on the mailing list about this problem:
+
+ http://comments.gmane.org/gmane.comp.python.lxml.devel/2942
+
+* compile lxml without threading support by running ``setup.py`` with the
+ ``--without-threading`` option. While this might be slower in certain
+ scenarios on multi-processor systems, it *might* also keep your application
+ from crashing, which should be worth more to you than peak performance.
+ Remember that lxml is fast anyway, so concurrency may not even be worth it.
+
+* avoid doing fancy XSLT stuff like foreign document access or passing in
+ subtrees through XSLT variables. This might or might not work, depending on
+ your specific usage.
+
+* try copying trees at suspicious places and working with those instead of a
+ tree shared between threads. A good candidate might be the result of an
+ XSLT or the stylesheet itself.
+
+* try keeping thread-local copies of XSLT stylesheets, i.e. one per thread,
+ instead of sharing one. Also see the question above.
+
+* you can try to serialise suspicious parts of your code with explicit thread
+ locks, thus disabling the concurrency of the runtime system.
+
+* report back on the mailing list to see if there are other ways to work
+ around your specific problems. Do not forget to report the version numbers
+ of lxml, libxml2 and libxslt you are using (see the question on reporting
+ a bug).
+
+
Parsing and Serialisation
=========================
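
Editor's note (not part of the commit): two of the hints in the new FAQ
entries can be sketched in a few lines of Python. The block below first prints
the version constants that ``lxml.etree`` exposes, then keeps one ``XSLT``
object per thread in thread-local storage and configures it with an XSLT
parameter at call time. The names ``STYLESHEET``, ``get_transform`` and
``greet`` are hypothetical and chosen only for illustration. This is a minimal
sketch assuming a reasonably recent lxml, not the approach prescribed by the
FAQ.

```python
import threading

from lxml import etree

# Print the versions that are really in use inside this runtime environment.
print("lxml:", etree.LXML_VERSION)
print("libxml2:", etree.LIBXML_VERSION,
      "compiled against", etree.LIBXML_COMPILED_VERSION)
print("libxslt:", etree.LIBXSLT_VERSION,
      "compiled against", etree.LIBXSLT_COMPILED_VERSION)

# Hypothetical stylesheet, configured through a parameter at call time.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="greeting"/>
  <xsl:template match="/">
    <xsl:value-of select="$greeting"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="/name"/>
  </xsl:template>
</xsl:stylesheet>
"""

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_transform():
    """Return the XSLT object of the calling thread, creating it on first use."""
    transform = getattr(_local, "transform", None)
    if transform is None:
        transform = etree.XSLT(etree.XML(STYLESHEET))
        _local.transform = transform
    return transform


def greet(name, greeting, results, index):
    """Run the thread-local stylesheet on a small document built in this thread."""
    doc = etree.Element("name")
    doc.text = name
    result = get_transform()(doc, greeting=etree.XSLT.strparam(greeting))
    results[index] = str(result)


# Usage: every worker thread compiles and caches its own stylesheet.
names = ["Alice", "Bob"]
results = [None] * len(names)
threads = [
    threading.Thread(target=greet, args=(name, "Hello", results, i))
    for i, name in enumerate(names)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Caching per thread costs one stylesheet compilation per thread. The other
workaround the FAQ names, creating all ``XSLT`` objects in the main program
and storing them in a read-only dictionary, avoids that cost.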