[Lxml-checkins] r42646 - lxml/trunk/doc

scoder at codespeak.net
Thu May 3 21:52:31 CEST 2007


Author: scoder
Date: Thu May  3 21:52:29 2007
New Revision: 42646

Modified:
   lxml/trunk/doc/FAQ.txt
   lxml/trunk/doc/performance.txt
Log:
doc on benchmark and performance

Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt	(original)
+++ lxml/trunk/doc/FAQ.txt	Thu May  3 21:52:29 2007
@@ -15,7 +15,7 @@
      1.3  What standards does lxml implement?
      1.4  Where are the Windows binaries?
      1.5  What is the difference between lxml.etree and lxml.objectify?
-     1.6  Why is my application so slow?
+     1.6  How can I make my application run faster?
      1.7  Why do I get errors about missing UCS4 symbols when installing lxml?
    2  Contributing
      2.1  Why is lxml not written in Python?
@@ -136,17 +136,18 @@
   XPath, XSLT or validation.
 
 
-Why is my application so slow?
-------------------------------
+How can I make my application run faster?
+-----------------------------------------
 
 lxml.etree is a very fast library for processing XML.  There are, however, `a
 few caveats`_ involved in the mapping of the powerful libxml2 library to the
 simple and convenient ElementTree API.  Not all operations are as fast as the
-simplicity of the API might suggest.  The `benchmark page`_ has a comparison
-to other ElementTree implementations and a number of tips for performance
-tweaking.  As with any Python application, the rule of thumb is: the more of
-your processing runs in C, the faster your application gets.  See also the
-section on threading_.
+simplicity of the API might suggest, while some use cases can heavily benefit
+from finding the right way of doing them.  The `benchmark page`_ has a
+comparison to other ElementTree implementations and a number of tips for
+performance tweaking.  As with any Python application, the rule of thumb is:
+the more of your processing runs in C, the faster your application gets.  See
+also the section on threading_.
 
 .. _`a few caveats`:  performance.html#the-elementtree-api
 .. _`benchmark page`: performance.html
@@ -182,7 +183,7 @@
 To avoid writing plain C-code and caring too much about the details of
 built-in types and reference counting, lxml is written in Pyrex_, a
 Python-like language that is translated into C-code.  Chances are that if you
-know Python, you can write code that Pyrex accepts.  Again, the C-ish style
+know Python, you can write `code that Pyrex accepts`_.  Again, the C-ish style
 used in the lxml code is just for performance optimisations.  If you want to
 contribute, don't bother with the details, a Python implementation of your
 contribution is better than none.  And keep in mind that lxml's flexible API
@@ -192,6 +193,7 @@
 Please contact the `mailing list`_ if you need any help.
 
 .. _Pyrex: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/
+.. _`code that Pyrex accepts`: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/overview.html
 
 
 How can I contribute?

Modified: lxml/trunk/doc/performance.txt
==============================================================================
--- lxml/trunk/doc/performance.txt	(original)
+++ lxml/trunk/doc/performance.txt	Thu May  3 21:52:29 2007
@@ -30,10 +30,15 @@
 attributes (-/A), with or without ASCII or unicode text (-/S/U), and either
 against a tree or its serialised form (T/X).  In the result extracts cited
 below, T1 refers to a 3-level tree with many children at the third level, T2
-is swapped around to have many children at the root element, T3 is a deep tree
-with few children at each level and T4 is a small tree, slightly broader than
-deep.  If repetition is involved, this usually means running the benchmark in
-a loop over all children of the tree root.
+is swapped around to have many children below the root element, T3 is a deep
+tree with few children at each level and T4 is a small tree, slightly broader
+than deep.  If repetition is involved, this usually means running the
+benchmark in a loop over all children of the tree root; otherwise, the
+operation is run on the root node (C/R).
+
+As an example, the character code ``(SATR T1)`` states that the benchmark was
+run on tree T1, with plain string text (S) and attributes (A).  It was run
+against the root element (R) in the tree structure of the data (T).
 
 .. contents::
 .. 
@@ -48,11 +53,11 @@
 Bad things first
 ----------------
 
-First thing to say: there *is* an overhead involved in having a C library
-mimic the ElementTree API.  As opposed to ElementTree, lxml has to generate
-Python objects on the fly when asked for them.  What this means is: the more
-of your code runs in Python, the slower your application gets.  Note, however,
-that this is true for most performance critical Python applications.
+First thing to say: there *is* an overhead involved in having a DOM-like C
+library mimic the ElementTree API.  As opposed to ElementTree, lxml has to
+generate Python objects on the fly when asked for them.  What this means is:
+the more of your code runs in Python, the slower your application gets.  Note,
+however, that this is true for most performance critical Python applications.
 
 
 Parsing and Serialising
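
Note on the benchmark codes introduced in the performance.txt hunk: the
following is a minimal sketch of how such a code could be decoded.  It is not
part of this commit or of the lxml benchmark suite.  The function name
``decode_benchmark_code`` and the fixed letter order (text, attributes,
tree/serialised, children/root) are assumptions inferred from the single
``(SATR T1)`` example, and only the trees T1 to T4 named in the text are
accepted.

```python
import re

# Letter meanings as described in the performance.txt text of this diff.
TEXT = {"-": "no text", "S": "plain (ASCII) string text", "U": "unicode text"}
ATTRIBUTES = {"-": "no attributes", "A": "attributes"}
DATA = {"T": "tree structure", "X": "serialised form"}
TARGET = {"C": "loop over the children of the root", "R": "root element"}

# Assumed layout: four flag characters, whitespace, then the tree name.
CODE_PATTERN = re.compile(r"\(?([-SU])([-A])([TX])([CR])\s+(T[1-4])\)?")


def decode_benchmark_code(code):
    """Return a dict describing a benchmark code such as '(SATR T1)'."""
    match = CODE_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError("unrecognised benchmark code: %r" % code)
    text, attributes, data, target, tree = match.groups()
    return {
        "text": TEXT[text],
        "attributes": ATTRIBUTES[attributes],
        "data": DATA[data],
        "target": TARGET[target],
        "tree": tree,
    }


if __name__ == "__main__":
    for key, value in decode_benchmark_code("(SATR T1)").items():
        print("%-10s %s" % (key, value))
```

Keeping the letter tables as plain dictionaries makes it easy to adjust the
sketch if the real benchmark output orders its flags differently.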

