[Lxml-checkins] r42646 - lxml/trunk/doc
scoder at codespeak.net
Thu May 3 21:52:31 CEST 2007
Author: scoder
Date: Thu May 3 21:52:29 2007
New Revision: 42646
Modified:
lxml/trunk/doc/FAQ.txt
lxml/trunk/doc/performance.txt
Log:
doc on benchmark and performance
Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Thu May 3 21:52:29 2007
@@ -15,7 +15,7 @@
1.3 What standards does lxml implement?
1.4 Where are the Windows binaries?
1.5 What is the difference between lxml.etree and lxml.objectify?
- 1.6 Why is my application so slow?
+ 1.6 How can I make my application run faster?
1.7 Why do I get errors about missing UCS4 symbols when installing lxml?
2 Contributing
2.1 Why is lxml not written in Python?
@@ -136,17 +136,18 @@
XPath, XSLT or validation.
-Why is my application so slow?
-------------------------------
+How can I make my application run faster?
+-----------------------------------------
lxml.etree is a very fast library for processing XML. There are, however, `a
few caveats`_ involved in the mapping of the powerful libxml2 library to the
simple and convenient ElementTree API. Not all operations are as fast as the
-simplicity of the API might suggest. The `benchmark page`_ has a comparison
-to other ElementTree implementations and a number of tips for performance
-tweaking. As with any Python application, the rule of thumb is: the more of
-your processing runs in C, the faster your application gets. See also the
-section on threading_.
+simplicity of the API might suggest, while some use cases can heavily benefit
+from finding the right way of doing them. The `benchmark page`_ has a
+comparison to other ElementTree implementations and a number of tips for
+performance tweaking. As with any Python application, the rule of thumb is:
+the more of your processing runs in C, the faster your application gets. See
+also the section on threading_.
.. _`a few caveats`: performance.html#the-elementtree-api
.. _`benchmark page`: performance.html
@@ -182,7 +183,7 @@
To avoid writing plain C-code and caring too much about the details of
built-in types and reference counting, lxml is written in Pyrex_, a
Python-like language that is translated into C-code. Chances are that if you
-know Python, you can write code that Pyrex accepts. Again, the C-ish style
+know Python, you can write `code that Pyrex accepts`_. Again, the C-ish style
used in the lxml code is just for performance optimisations. If you want to
contribute, don't bother with the details, a Python implementation of your
contribution is better than none. And keep in mind that lxml's flexible API
@@ -192,6 +193,7 @@
Please contact the `mailing list`_ if you need any help.
.. _Pyrex: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/
+.. _`code that Pyrex accepts`: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/overview.html
How can I contribute?
Modified: lxml/trunk/doc/performance.txt
==============================================================================
--- lxml/trunk/doc/performance.txt (original)
+++ lxml/trunk/doc/performance.txt Thu May 3 21:52:29 2007
@@ -30,10 +30,15 @@
attributes (-/A), with or without ASCII or unicode text (-/S/U), and either
against a tree or its serialised form (T/X). In the result extracts cited
below, T1 refers to a 3-level tree with many children at the third level, T2
-is swapped around to have many children at the root element, T3 is a deep tree
-with few children at each level and T4 is a small tree, slightly broader than
-deep. If repetition is involved, this usually means running the benchmark in
-a loop over all children of the tree root.
+is swapped around to have many children below the root element, T3 is a deep
+tree with few children at each level and T4 is a small tree, slightly broader
+than deep. If repetition is involved, this usually means running the
+benchmark in a loop over all children of the tree root, otherwise, the
+operation is run on the root node (C/R).
+
+As an example, the character code ``(SATR T1)`` states that the benchmark was
+running for tree T1, with plain string text (S) and attributes (A). It was
+run against the root element (R) in the tree structure of the data (T).
.. contents::
..
@@ -48,11 +53,11 @@
Bad things first
----------------
-First thing to say: there *is* an overhead involved in having a C library
-mimic the ElementTree API. As opposed to ElementTree, lxml has to generate
-Python objects on the fly when asked for them. What this means is: the more
-of your code runs in Python, the slower your application gets. Note, however,
-that this is true for most performance critical Python applications.
+First thing to say: there *is* an overhead involved in having a DOM-like C
+library mimic the ElementTree API. As opposed to ElementTree, lxml has to
+generate Python objects on the fly when asked for them. What this means is:
+the more of your code runs in Python, the slower your application gets. Note,
+however, that this is true for most performance critical Python applications.
Parsing and Serialising