[Lxml-checkins] r42887 - lxml/trunk/doc
scoder at codespeak.net
Wed May 9 00:15:45 CEST 2007
Author: scoder
Date: Wed May 9 00:15:45 2007
New Revision: 42887
Modified:
lxml/trunk/doc/performance.txt
Log:
doc restructuring
Modified: lxml/trunk/doc/performance.txt
==============================================================================
--- lxml/trunk/doc/performance.txt (original)
+++ lxml/trunk/doc/performance.txt Wed May 9 00:15:45 2007
@@ -196,6 +196,10 @@
are no longer referenced. ET and cET represent the tree itself through these
objects, which reduces the overhead in creating them.
+
+Child access
+------------
+
The same reason makes operations like ``getchildren()`` more costly in lxml.
Where ET and cET can quickly create a shallow copy of their list of children,
lxml has to create a Python object for each child and collect them in a list::
@@ -227,6 +231,10 @@
cET: middle_child (--TR T2) 0.2089 msec/pass
ET : middle_child (--TR T2) 0.9360 msec/pass
+
+Element creation
+----------------
+
As opposed to ET, libxml2 has a notion of documents that each element must be
in. This results in a major performance difference for creating independent
Elements that end up in independently created documents::
@@ -252,6 +260,10 @@
choice. Note, however, that the serialisation performance may even out this
advantage, especially for smaller trees and trees with many attributes.
+
+Merging different sources
+-------------------------
+
A critical action for lxml is moving elements between document contexts. It
requires lxml to do recursive adaptations throughout the moved tree structure.
@@ -285,8 +297,13 @@
cET: replace_children_element (--TC T1) 0.0238 msec/pass
ET : replace_children_element (--TC T1) 0.1628 msec/pass
-You should keep this difference in mind when you merge very large trees. On
-the other hand, deep copying a tree is fast in lxml::
+You should keep this difference in mind when you merge very large trees.
+
+
+deepcopy
+--------
+
+Deep copying a tree is fast in lxml::
lxe: deepcopy (--TC T1) 10.5221 msec/pass
cET: deepcopy (--TC T1) 220.2251 msec/pass
@@ -347,7 +364,7 @@
XPath
------
+=====
The following timings are based on the benchmark script `bench_xpath.py`_.
@@ -390,8 +407,8 @@
lxe: xpath_class_repeat (--TC T4) 1.0269 msec/pass
-An bigger example
-=================
+A bigger example
+================
A while ago, Uche Ogbuji posted a benchmark proposal at `xml.org`_ that would
read in a 3 MB XML version of the Old Testament of the Bible and look for the
@@ -521,6 +538,10 @@
API, the create-discard cycles can become a bottleneck, as elements have to be
instantiated over and over again.
+
+ObjectPath
+----------
+
ObjectPath can be used to speed up the access to elements that are deep in the
tree. It avoids step-by-step Python element instantiations along the path,
which can substantially improve the access time::
@@ -544,6 +565,10 @@
Note, however, that parsing ObjectPath expressions is not for free either, so
this is most effective for frequently accessing the same element.
+
+Caching Elements
+----------------
+
A way to improve the normal attribute access time is static instantiation of
the Python objects, thus trading memory for speed. Just create a cache
dictionary and run::
@@ -581,6 +606,10 @@
is most effective for largely immutable trees. You should consider using a
set instead of a list in this case and add new elements by hand.
+
+Further optimisations
+---------------------
+
Here are some more things to try if optimisation is required:
* A lot of time is usually spent in tree traversal to find the addressed