[Lxml-checkins] r44180 - in lxml/branch/lxml-1.3: . benchmark doc src/lxml src/lxml/tests

scoder at codespeak.net
Tue Jun 12 18:53:58 CEST 2007


Author: scoder
Date: Tue Jun 12 18:53:57 2007
New Revision: 44180

Modified:
   lxml/branch/lxml-1.3/CHANGES.txt
   lxml/branch/lxml-1.3/benchmark/bench_etree.py
   lxml/branch/lxml-1.3/doc/FAQ.txt
   lxml/branch/lxml-1.3/doc/build.txt
   lxml/branch/lxml-1.3/doc/performance.txt
   lxml/branch/lxml-1.3/doc/sax.txt
   lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi
   lxml/branch/lxml-1.3/src/lxml/etree.pyx
   lxml/branch/lxml-1.3/src/lxml/sax.py
   lxml/branch/lxml-1.3/src/lxml/serializer.pxi
   lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py
   lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py
Log:
merged in revs up to 42695 from trunk

Modified: lxml/branch/lxml-1.3/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-1.3/CHANGES.txt	(original)
+++ lxml/branch/lxml-1.3/CHANGES.txt	Tue Jun 12 18:53:57 2007
@@ -8,17 +8,25 @@
 Features added
 --------------
 
+* ``Element.addnext(el)`` and ``Element.addprevious(el)`` methods to support
+  adding processing instructions and comments around the root node
+
 * Element.attrib now has a ``pop()`` method
 
-* Support for custom Element class instantiation in lxml.sax
+* Support for custom Element class instantiation in lxml.sax: passing a
+  ``makeelement()`` function to the ElementTreeContentHandler will reuse the
+  lookup context of that function
 
 * '.' represents empty ObjectPath (identity)
 
-* ``Element.values()`` to accompany the existing ``.keys()`` and ``.items()``
-
 Bugs fixed
 ----------
 
+* Documents lost their top-level PIs and comments on serialisation
+
+* lxml.sax failed on comments and PIs. Comments are now properly ignored and
+  PIs are copied.
+
 * Raise AssertionError when passing strings containing '\0' bytes
 
 

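The changelog entries on ``addnext()``/``addprevious()`` and on top-level serialisation belong together, so a minimal sketch may help. It assumes a current lxml release on Python 3, not the 1.3 branch this commit targets; the stylesheet reference and the comment text are invented for illustration.

```python
from lxml import etree

root = etree.Element("root")

# Place a processing instruction before and a comment after the root node.
# The PI target/content and the comment text are made up for this example.
root.addprevious(etree.ProcessingInstruction("xml-stylesheet", 'href="style.xsl"'))
root.addnext(etree.Comment("generated"))

# Serialise through the ElementTree wrapper so that the root-level
# siblings are written out together with the root element.
serialised = etree.tostring(root.getroottree())

# Only PIs and comments are accepted next to the root node; trying to
# add a second element at that level raises a TypeError.
try:
    root.addnext(etree.Element("second-root"))
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None
```

Going through ``getroottree()`` is a deliberate choice in this sketch: it serialises the whole document, not just the single element.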
Modified: lxml/branch/lxml-1.3/benchmark/bench_etree.py
==============================================================================
--- lxml/branch/lxml-1.3/benchmark/bench_etree.py	(original)
+++ lxml/branch/lxml-1.3/benchmark/bench_etree.py	Tue Jun 12 18:53:57 2007
@@ -18,6 +18,19 @@
         for child in reversed(root):
             pass
 
+    def bench_first_child(self, root):
+        for i in range(1000):
+            child = root[0]
+
+    def bench_last_child(self, root):
+        for i in range(1000):
+            child = root[-1]
+
+    def bench_middle_child(self, root):
+        pos = len(root) / 2
+        for i in range(1000):
+            child = root[pos]
+
     @with_attributes(True, False)
     @with_text(text=True, utext=True)
     def bench_tostring_utf8(self, root):

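The three new benchmarks can be tried outside the benchmark suite with a small harness. The following is only a sketch: it uses the standard library's ``xml.etree.ElementTree`` and ``timeit`` instead of the suite's own tree generators and decorators, and ``build_wide_tree`` is a hypothetical helper. Note that it uses ``//`` for the middle index, since ``len(root) / 2`` yields a float on Python 3.

```python
import timeit
import xml.etree.ElementTree as ET


def build_wide_tree(n_children=1000):
    # Hypothetical stand-in for the benchmark suite's generated trees:
    # a root element with many direct children.
    root = ET.Element("root")
    for i in range(n_children):
        ET.SubElement(root, "child%d" % i)
    return root


def bench_first_child(root):
    for _ in range(1000):
        child = root[0]


def bench_last_child(root):
    for _ in range(1000):
        child = root[-1]


def bench_middle_child(root):
    # Floor division keeps the index an int on Python 3; the original
    # "/" relied on Python 2 integer division.
    pos = len(root) // 2
    for _ in range(1000):
        child = root[pos]


root = build_wide_tree()
timings = {}
for bench in (bench_first_child, bench_last_child, bench_middle_child):
    # timeit accepts a zero-argument callable; run each benchmark 10 times.
    timings[bench.__name__] = timeit.timeit(lambda: bench(root), number=10)

for name, seconds in sorted(timings.items()):
    print("%-20s %.4f s" % (name, seconds))
```

Swapping the import for ``from lxml import etree as ET`` should allow the same harness to be pointed at lxml, although the absolute numbers will not match those produced by ``bench_etree.py``.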
Modified: lxml/branch/lxml-1.3/doc/FAQ.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/FAQ.txt	(original)
+++ lxml/branch/lxml-1.3/doc/FAQ.txt	Tue Jun 12 18:53:57 2007
@@ -1,6 +1,6 @@
-==========================
-Frequently Asked Questions
-==========================
+================================
+Frequently Asked Questions (FAQ)
+================================
 
 See also the notes on compatibility_ to ElementTree_.
 
@@ -15,25 +15,28 @@
      1.3  What standards does lxml implement?
      1.4  Where are the Windows binaries?
      1.5  What is the difference between lxml.etree and lxml.objectify?
-     1.6  Why is my application so slow?
+     1.6  How can I make my application run faster?
      1.7  Why do I get errors about missing UCS4 symbols when installing lxml?
-   2  Bugs
-     2.1  My application crashes! Why does lxml.etree do that?
-     2.2  I think I have found a bug in lxml. What should I do?
-   3  Threading
-     3.1  Can I use threads to concurrently access the lxml API?
-     3.2  Does my program run faster if I use threads?
-     3.3  Would my single-threaded program run faster if I turned off threading?
-   4  Parsing and Serialisation
-     4.1  Why doesn't the ``pretty_print`` option reformat my XML output?
-     4.2  Why can't lxml parse my XML from unicode strings?
-     4.3  What is the difference between str(xslt(doc)) and xslt(doc).write() ?
-     4.4  Why can't I just delete parents or clear the root node in iterparse()?
-   5  XPath and Document Traversal
-     5.1  What are the ``findall()`` and ``xpath()`` methods on Element(Tree)?
-     5.2  Why doesn't ``findall()`` support full XPath expressions?
-     5.3  How can I find out which namespace prefixes are used in a document?
-     5.4  How can I specify a default namespace for XPath expressions?
+   2  Contributing
+     2.1  Why is lxml not written in Python?
+     2.2  How can I contribute?
+   3  Bugs
+     3.1  My application crashes! Why does lxml.etree do that?
+     3.2  I think I have found a bug in lxml. What should I do?
+   4  Threading
+     4.1  Can I use threads to concurrently access the lxml API?
+     4.2  Does my program run faster if I use threads?
+     4.3  Would my single-threaded program run faster if I turned off threading?
+   5  Parsing and Serialisation
+     5.1  Why doesn't the ``pretty_print`` option reformat my XML output?
+     5.2  Why can't lxml parse my XML from unicode strings?
+     5.3  What is the difference between str(xslt(doc)) and xslt(doc).write() ?
+     5.4  Why can't I just delete parents or clear the root node in iterparse()?
+   6  XPath and Document Traversal
+     6.1  What are the ``findall()`` and ``xpath()`` methods on Element(Tree)?
+     6.2  Why doesn't ``findall()`` support full XPath expressions?
+     6.3  How can I find out which namespace prefixes are used in a document?
+     6.4  How can I specify a default namespace for XPath expressions?
 
 
 General Questions
@@ -133,17 +136,18 @@
   XPath, XSLT or validation.
 
 
-Why is my application so slow?
-------------------------------
+How can I make my application run faster?
+-----------------------------------------
 
 lxml.etree is a very fast library for processing XML.  There are, however, `a
 few caveats`_ involved in the mapping of the powerful libxml2 library to the
 simple and convenient ElementTree API.  Not all operations are as fast as the
-simplicity of the API might suggest.  The `benchmark page`_ has a comparison
-to other ElementTree implementations and a number of tips for performance
-tweaking.  As with any Python application, the rule of thumb is: the more of
-your processing runs in C, the faster your application gets.  See also the
-section on threading_.
+simplicity of the API might suggest, while some use cases can heavily benefit
+from finding the right way of doing them.  The `benchmark page`_ has a
+comparison to other ElementTree implementations and a number of tips for
+performance tweaking.  As with any Python application, the rule of thumb is:
+the more of your processing runs in C, the faster your application gets.  See
+also the section on threading_.
 
 .. _`a few caveats`:  performance.html#the-elementtree-api
 .. _`benchmark page`: performance.html
@@ -167,6 +171,65 @@
 .. _`build instructions`: build.html
 
 
+Contributing
+============
+
+Why is lxml not written in Python?
+----------------------------------
+
+lxml interfaces with two C libraries: libxml2 and libxslt.  Accessing them at
+the C-level is required for performance reasons.
+
+To avoid writing plain C-code and caring too much about the details of
+built-in types and reference counting, lxml is written in Pyrex_, a
+Python-like language that is translated into C-code.  Chances are that if you
+know Python, you can write `code that Pyrex accepts`_.  Again, the C-ish style
+used in the lxml code is just for performance optimisations.  If you want to
+contribute, don't bother with the details: a Python implementation of your
+contribution is better than none.  And keep in mind that lxml's flexible API
+often favours an implementation of features in pure Python, without bothering
+with C-code at all.
+
+Please contact the `mailing list`_ if you need any help.
+
+.. _Pyrex: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/
+.. _`code that Pyrex accepts`: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/overview.html
+
+
+How can I contribute?
+---------------------
+
+Besides enhancing the code, there are a lot of places where you can help the
+project and its user base.  You can
+
+* spread the word and write about lxml.  Many users (especially new Python
+  users) have not yet heard about lxml, although our user base is constantly
+  growing.  If you write your own blog and feel like saying something about
+  lxml, go ahead and do so.  If we think your contribution or criticism is
+  valuable to other users, we may even put a link or a quote on the project
+  page.
+
+* provide code examples for the general usage of lxml or specific problems
+  solved with lxml.  Readable code is a very good way of showing how a library
+  can be used and what great things you can do with it.  Again, if we hear
+  about it, we can set a link on the project page.
+
+* work on the documentation.  The web page is generated from a set of ReST_
+  `text files`_.  It is meant both as a representative project page for lxml
+  and as a site for documenting lxml's API and usage.  If you have questions
+  or an idea how to make it more readable and accessible while you are reading
+  it, please send a comment to the `mailing list`_.
+
+.. _ReST: http://docutils.sourceforge.net/rst.html
+.. _`text files`: http://codespeak.net/svn/lxml/trunk/doc/
+
+* improve the docstrings.  lxml uses docstrings to support Python's integrated
+  online ``help()`` function.  However, sometimes these are not sufficient to
+  grasp the details of the function in question.  If you find such a place,
+  you can try to write up a better description and send it to the `mailing
+  list`_.
+
+
 Bugs
 ====
 
@@ -176,7 +239,7 @@
 One of the goals of lxml is "no segfaults", so if there is no clear warning in
 the documentation that you were doing something potentially harmful, you have
 found a bug and we would like to hear about it.  Please report this bug to the
-mailing list.  See the next section on how to do that.
+`mailing list`_.  See the next section on how to do that.
 
 
 I think I have found a bug in lxml. What should I do?

Modified: lxml/branch/lxml-1.3/doc/build.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/build.txt	(original)
+++ lxml/branch/lxml-1.3/doc/build.txt	Tue Jun 12 18:53:57 2007
@@ -2,8 +2,10 @@
 =============================
 
 To build lxml from source, you need libxml2 and libxslt properly installed,
-including header files (possibly shipped in -dev packages).  The build process
-also requires setuptools_.
+*including the header files*.  These are likely shipped in separate ``-dev``
+or ``-devel`` packages like ``libxml2-dev``, which you need to install.  The
+build process also requires setuptools_.  The lxml source distribution comes
+with a script called ``ez_setup.py`` that can be used to install them.
 
 .. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
 
@@ -34,18 +36,22 @@
 
   Newer versions of lxml depend on features and bug fixes that are not yet
   available in an official Pyrex release.  This includes support for the
-  external C-API of lxml, for Python 2.5 and for 64 bit architectures.
+  external C-API of lxml.etree, for Python 2.5 and for 64 bit architectures.
 
   To build lxml 1.1 and later from non-release or modified sources, you must
-  therefore install an updated Pyrex version from here:
+  therefore use an updated Pyrex version from here:
 
   http://codespeak.net/svn/lxml/pyrex/
 
-  Since version 1.1.2, the lxml source distribution includes this Pyrex
-  version.  It will be used if the 'pyrex' directory is available in the lxml
-  root directory.  If you install from SVN or delete this directory from the
-  unpacked distribution directory, the normally installed Pyrex version will
-  be used.
+  A subversion checkout of lxml will automatically retrieve the latest Pyrex
+  as external project source (``svn:externals``).  Look out for the ``Pyrex``
+  directory in the source tree.
+
+  Since version 1.1.2, the lxml source distribution also includes this Pyrex
+  version.  It will be used if the ``Pyrex`` directory is available in the
+  lxml root directory.  If you install from SVN or delete this directory from
+  the unpacked distribution directory, the normally installed Pyrex version
+  will be used.
 
 * lxml 1.0 and earlier
 
@@ -86,6 +92,10 @@
 
   python setup.py build
 
+or::
+
+  python setup.py bdist_egg
+
 If you want to test lxml from the source directory, it is better to build it
 in-place like this::
 
@@ -96,15 +106,24 @@
   make
 
 If you get errors about missing header files (e.g., ``libxml/xmlversion.h``)
-then you need to add the location of that file to the include path like::
+then you need to make sure the development packages of libxml2 and libxslt are
+properly installed.  If this doesn't help, you may have to add the location of
+the header files to the include path like::
 
-  python setup.py build_ext -i -I /usr/include/libxml2
+  python setup.py build_ext -i  -I /usr/include/libxml2
 
 where the file is in ``/usr/include/libxml2/libxml/xmlversion.h``
 
 To use lxml.etree in-place, you can place lxml's ``src`` directory on your
 Python module search path (PYTHONPATH) and then import ``lxml.etree`` to play
-with it.
+with it::
+
+  # cd lxml
+  # PYTHONPATH=src python
+  Python 2.5.1
+  Type "help", "copyright", "credits" or "license" for more information.
+  >>> from lxml import etree
+  >>>
 
 To recompile after changes, note that you may have to run ``make clean`` or
 delete the file ``src/lxml/etree.c``.  Distutils do not automatically pick up
@@ -125,8 +144,8 @@
 
   make test
 
-To run the ElementTree and cElementTree compatibility tests, make sure
-you have lxml on your PYTHONPATH first, then run::
+This also runs the ElementTree and cElementTree compatibility tests.  To call
+them separately, make sure you have lxml on your PYTHONPATH first, then run::
 
   python selftest.py
 
@@ -147,15 +166,16 @@
 
 This is the procedure to make an lxml egg for your platform:
 
-* download the lxml-x.y.tar.gz release. This contains the pregenerated C so we
-  don't run into any Pyrex issues. Unpack it and cd into it.
+* Download the lxml-x.y.tar.gz release.  This contains the pregenerated C so
+  that you don't run into any Pyrex issues.  Unpack it and cd into it.
 
 * python setup.py build
 
-* if you're on a unixy platform, cd into build/lib.your.platform and
-  strip any .so file you find there. This reduces the size of the egg.
+* If you're on a unixy platform, cd into ``build/lib.your.platform`` and strip
+  any ``.so`` file you find there.  This reduces the size of the egg
+  considerably.
 
-* python setup.py bdist_egg upload
+* ``python setup.py bdist_egg upload``
 
 The last 'upload' step only works if you have access to the lxml cheeseshop
 entry.  If not, you can just make an egg with ``bdist_egg`` and mail it to the

Modified: lxml/branch/lxml-1.3/doc/performance.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/performance.txt	(original)
+++ lxml/branch/lxml-1.3/doc/performance.txt	Tue Jun 12 18:53:57 2007
@@ -14,7 +14,7 @@
 .. _ElementTree:  http://effbot.org/zone/element-index.htm
 .. _cElementTree: http://effbot.org/zone/celementtree.htm
 
-The statements made here are backed by the benchmark scripts
+The statements made here are backed by the (micro-)benchmark scripts
 `bench_etree.py`_, `bench_xpath.py`_ and `bench_objectify.py`_ that come with
 the lxml source distribution.  The timings cited below compare lxml 1.3 (with
 libxml2 2.6.26) to the ElementTree and cElementTree versions shipped with
@@ -30,10 +30,22 @@
 attributes (-/A), with or without ASCII or unicode text (-/S/U), and either
 against a tree or its serialised form (T/X).  In the result extracts cited
 below, T1 refers to a 3-level tree with many children at the third level, T2
-is swapped around to have many children at the root element, T3 is a deep tree
-with few children at each level and T4 is a small tree, slightly broader than
-deep.  If repetition is involved, this usually means running the benchmark in
-a loop over all children of the tree root.
+is swapped around to have many children below the root element, T3 is a deep
+tree with few children at each level and T4 is a small tree, slightly broader
+than deep.  If repetition is involved, this usually means running the
+benchmark in a loop over all children of the tree root; otherwise, the
+operation is run on the root node (C/R).
+
+As an example, the character code ``(SATR T1)`` states that the benchmark was
+running for tree T1, with plain string text (S) and attributes (A).  It was
+run against the root element (R) in the tree structure of the data (T).
+
+Note that very small operations are repeated in integer loops to make them
+measurable.  It is therefore not always possible to compare the absolute
+timings of, say, a single access benchmark (which usually loops) and a 'get
+all in one step' benchmark, which already takes enough time to be measurable
+and is therefore measured as is.  Take a look at the concrete benchmarks in
+the scripts to understand how the numbers compare.
 
 .. contents::
 .. 
@@ -48,11 +60,11 @@
 Bad things first
 ----------------
 
-First thing to say: there *is* an overhead involved in having a C library
-mimic the ElementTree API.  As opposed to ElementTree, lxml has to generate
-Python objects on the fly when asked for them.  What this means is: the more
-of your code runs in Python, the slower your application gets.  Note, however,
-that this is true for most performance critical Python applications.
+First thing to say: there *is* an overhead involved in having a DOM-like C
+library mimic the ElementTree API.  As opposed to ElementTree, lxml has to
+generate Python objects on the fly when asked for them.  What this means is:
+the more of your code runs in Python, the slower your application gets.  Note,
+however, that this is true for most performance critical Python applications.
 
 
 Parsing and Serialising
@@ -132,20 +144,20 @@
 (given in seconds)::
 
   lxe:       --     S-     U-     -A     SA     UA
-       T1: 0.1029 0.1005 0.0998 0.1003 0.0998 0.1002
-       T2: 0.1035 0.1013 0.1015 0.1090 0.1089 0.1090
-       T3: 0.0276 0.0270 0.0273 0.0679 0.0673 0.0673
-       T4: 0.0004 0.0004 0.0004 0.0013 0.0013 0.0013
+       T1: 0.1155 0.1154 0.1153 0.1159 0.1181 0.1158
+       T2: 0.1183 0.1197 0.1200 0.1267 0.1261 0.1264
+       T3: 0.0341 0.0312 0.0314 0.0726 0.0717 0.0720
+       T4: 0.0005 0.0004 0.0004 0.0014 0.0014 0.0014
   cET:       --     S-     U-     -A     SA     UA
-       T1: 0.0277 0.0273 0.0273 0.0272 0.0278 0.0275
-       T2: 0.0281 0.0347 0.0281 0.0285 0.0284 0.0284
-       T3: 0.0074 0.0074 0.0074 0.0122 0.0102 0.0101
+       T1: 0.0290 0.0271 0.0275 0.0297 0.0273 0.0274
+       T2: 0.0280 0.0280 0.0281 0.0285 0.0283 0.0286
+       T3: 0.0071 0.0072 0.0071 0.0113 0.0096 0.0096
        T4: 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
   ET :       --     S-     U-     -A     SA     UA
-       T1: 0.1349 0.1962 0.2356 0.1288 0.2642 0.1351
-       T2: 0.3104 0.1344 0.3566 0.3857 0.1354 0.4677
-       T3: 0.0313 0.0325 0.0312 0.0356 0.3803 0.0364
-       T4: 0.0005 0.0005 0.0008 0.0006 0.0007 0.0006
+       T1: 0.1362 0.1985 0.2300 0.1344 0.2672 0.1335
+       T2: 0.3107 0.1386 0.3581 0.3886 0.1388 0.4277
+       T3: 0.0334 0.0332 0.0320 0.0367 0.3769 0.0375
+       T4: 0.0006 0.0005 0.0008 0.0007 0.0007 0.0006
 
 While lxml is still faster than ET in most cases (30-60%), cET can be up to
 three times faster than lxml here.  One of the reasons is that lxml must
@@ -161,6 +173,29 @@
   cET: root_getchildren          (--TR T2)    0.0150 msec/pass
   ET : root_getchildren          (--TR T2)    0.0091 msec/pass
 
+When accessing single children, however, e.g. by index, this handicap is
+negligible::
+
+  lxe: first_child               (--TR T2)    0.2499 msec/pass
+  cET: first_child               (--TR T2)    0.2048 msec/pass
+  ET : first_child               (--TR T2)    0.9291 msec/pass
+
+  lxe: last_child                (--TR T1)    0.2511 msec/pass
+  cET: last_child                (--TR T1)    0.2148 msec/pass
+  ET : last_child                (--TR T1)    0.9191 msec/pass
+
+... unless you add the time to find a child index in a bigger list, as ET and
+cET use Python lists here, which are based on arrays.  The data structure used
+by libxml2 is a linked tree, and thus, a linked list of children::
+
+  lxe: middle_child              (--TR T1)    0.2921 msec/pass
+  cET: middle_child              (--TR T1)    0.2069 msec/pass
+  ET : middle_child              (--TR T1)    0.9291 msec/pass
+
+  lxe: middle_child              (--TR T2)    1.9028 msec/pass
+  cET: middle_child              (--TR T2)    0.2089 msec/pass
+  ET : middle_child              (--TR T2)    0.9360 msec/pass
+
 As opposed to ET, libxml2 has a notion of documents that each element must be
 in.  This results in a major performance difference for creating independent
 Elements that end up in independently created documents::

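The ``middle_child`` numbers follow from the data structure, which a toy model can make concrete. This is a simplified sketch, not libxml2's actual layout: ``Node`` and ``LinkedChildren`` are hypothetical names, and only forward links are modelled. Reaching child *i* in a linked sibling chain takes *i* steps, whereas a Python list reaches it in one indexing operation.

```python
class Node:
    def __init__(self, tag):
        self.tag = tag
        self.next = None  # link to the following sibling


class LinkedChildren:
    """Children stored as a chain of siblings instead of an array."""

    def __init__(self, tags):
        self.first = None
        self.length = 0
        previous = None
        for tag in tags:
            node = Node(tag)
            if previous is None:
                self.first = node
            else:
                previous.next = node
            previous = node
            self.length += 1

    def child_at(self, index):
        """Return (node, steps), where steps counts the links followed."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        node = self.first
        steps = 0
        for _ in range(index):
            node = node.next
            steps += 1
        return node, steps


children = LinkedChildren("c%d" % i for i in range(1000))
first_node, first_steps = children.child_at(0)
middle_node, middle_steps = children.child_at(500)
```

In this model the first child costs no link traversal and the middle child costs 500, which matches the direction of the T2 timings above. libxml2 nodes also keep a pointer to their last child, which is consistent with ``last_child`` staying cheap.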
Modified: lxml/branch/lxml-1.3/doc/sax.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/sax.txt	(original)
+++ lxml/branch/lxml-1.3/doc/sax.txt	Tue Jun 12 18:53:57 2007
@@ -39,6 +39,10 @@
   >>> lxml.etree.tostring(tree.getroot())
   '<a><b foo="bar">Hello world</b></a>'
 
+By passing a ``makeelement`` function to the constructor of
+``ElementTreeContentHandler``, e.g. the one of a parser you configured, you
+can determine which element class lookup scheme should be used.
+
 
 Producing SAX events from an ElementTree or Element
 ---------------------------------------------------

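As an illustration of the ``makeelement`` paragraph added to ``sax.txt``, here is a minimal sketch. It assumes a current lxml release on Python 3; ``MyElement`` is a hypothetical custom class, and the input document is the one from the surrounding doctest.

```python
from lxml import etree, sax


class MyElement(etree.ElementBase):
    """Hypothetical custom element class used only for this example."""


# Configure a parser whose lookup scheme maps all elements to MyElement.
parser = etree.XMLParser()
parser.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=MyElement))

# Passing the parser's makeelement() lets the handler reuse that lookup.
handler = sax.ElementTreeContentHandler(makeelement=parser.makeelement)

# Feed SAX events into the handler from an existing tree.
source = etree.XML('<a><b foo="bar">Hello world</b></a>')
sax.saxify(source, handler)

root = handler.etree.getroot()
```

The root element built by the handler should then be an instance of ``MyElement``, while the tree content is the same as in the source document.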
Modified: lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi	(original)
+++ lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi	Tue Jun 12 18:53:57 2007
@@ -518,7 +518,6 @@
     c_node = child._c_node
     # store possible text node
     c_next = c_node.next
-    # XXX what if element is coming from a different document?
     tree.xmlUnlinkNode(c_node)
     # move node itself
     tree.xmlAddChild(parent._c_node, c_node)
@@ -527,6 +526,38 @@
     # parent element has moved; change them too..
     moveNodeToDocument(child, parent._doc)
 
+cdef void _appendSibling(_Element element, _Element sibling):
+    """Add a node as the next sibling directly after an element.
+    """
+    cdef xmlNode* c_next
+    cdef xmlNode* c_node
+    c_node = sibling._c_node
+    # store possible text node
+    c_next = c_node.next
+    tree.xmlUnlinkNode(c_node)
+    # move node itself
+    tree.xmlAddNextSibling(element._c_node, c_node)
+    _moveTail(c_next, c_node)
+    # uh oh, elements may be pointing to different doc when
+    # parent element has moved; change them too..
+    moveNodeToDocument(sibling, element._doc)
+
+cdef void _prependSibling(_Element element, _Element sibling):
+    """Add a node as the previous sibling directly before an element.
+    """
+    cdef xmlNode* c_next
+    cdef xmlNode* c_node
+    c_node = sibling._c_node
+    # store possible text node
+    c_next = c_node.next
+    tree.xmlUnlinkNode(c_node)
+    # move node itself
+    tree.xmlAddPrevSibling(element._c_node, c_node)
+    _moveTail(c_next, c_node)
+    # uh oh, elements may be pointing to different doc when
+    # parent element has moved; change them too..
+    moveNodeToDocument(sibling, element._doc)
+
 cdef int isutf8(char* s):
     cdef char c
     c = s[0]

Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/etree.pyx	(original)
+++ lxml/branch/lxml-1.3/src/lxml/etree.pyx	Tue Jun 12 18:53:57 2007
@@ -531,6 +531,36 @@
         """
         _appendChild(self, element)
 
+    def addnext(self, _Element element):
+        """Adds the element as a following sibling directly after this
+        element.
+
+        This is normally used to set a processing instruction or comment after
+        the root node of a document.  Note that tail text is automatically
+        discarded when adding at the root level.
+        """
+        if self._c_node.parent != NULL and not _isElement(self._c_node.parent):
+            if element._c_node.type != tree.XML_PI_NODE:
+                if element._c_node.type != tree.XML_COMMENT_NODE:
+                    raise TypeError, "Only processing instructions and comments can be siblings of the root element"
+            element.tail = None
+        _appendSibling(self, element)
+
+    def addprevious(self, _Element element):
+        """Adds the element as a preceding sibling directly before this
+        element.
+
+        This is normally used to set a processing instruction or comment
+        before the root node of a document.  Note that tail text is
+        automatically discarded when adding at the root level.
+        """
+        if self._c_node.parent != NULL and not _isElement(self._c_node.parent):
+            if element._c_node.type != tree.XML_PI_NODE:
+                if element._c_node.type != tree.XML_COMMENT_NODE:
+                    raise TypeError, "Only processing instructions and comments can be siblings of the root element"
+            element.tail = None
+        _prependSibling(self, element)
+
     def extend(self, elements):
         """Extends the current children by the elements in the iterable.
         """
@@ -1096,6 +1126,9 @@
     def items(self):
         return []
 
+    def values(self):
+        return []
+
 cdef class _Comment(__ContentOnlyElement):
     property tag:
         def __get__(self):
@@ -1749,6 +1782,8 @@
     tree.xmlAddChild(<xmlNode*>c_doc, c_node)
     return _elementFactory(doc, c_node)
 
+PI = ProcessingInstruction
+
 def SubElement(_Element _parent not None, _tag,
                attrib=None, nsmap=None, **_extra):
     """Subelement factory. This function creates an element instance, and appends it to an

Modified: lxml/branch/lxml-1.3/src/lxml/sax.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/sax.py	(original)
+++ lxml/branch/lxml-1.3/src/lxml/sax.py	Tue Jun 12 18:53:57 2007
@@ -1,5 +1,6 @@
 from xml.sax.handler import ContentHandler
 from etree import ElementTree, Element, SubElement, LxmlError
+from etree import XML, Comment, ProcessingInstruction
 
 class SaxError(LxmlError):
     pass
@@ -15,6 +16,7 @@
     """
     def __init__(self, makeelement=None):
         self._root = None
+        self._root_siblings = []
         self._element_stack = []
         self._default_ns = None
         self._ns_mapping = { None : [None] }
@@ -82,6 +84,10 @@
         if self._root is None:
             element = self._root = \
                       self._makeelement(el_name, attrs, self._new_mappings)
+            if self._root_siblings and hasattr(element, 'addprevious'):
+                for sibling in self._root_siblings:
+                    element.addprevious(sibling)
+            del self._root_siblings[:]
         else:
             element = SubElement(element_stack[-1], el_name,
                                  attrs, self._new_mappings)
@@ -89,10 +95,16 @@
 
         self._new_mappings.clear()
 
+    def processingInstruction(self, target, data):
+        pi = ProcessingInstruction(target, data)
+        if self._root is None:
+            self._root_siblings.append(pi)
+        else:
+            self._element_stack[-1].append(pi)
+
     def endElementNS(self, ns_name, qname):
         element = self._element_stack.pop()
-        tag = element.tag
-        if ns_name != _getNsTag(tag):
+        if ns_name != _getNsTag(element.tag):
             raise SaxError, "Unexpected element closed: {%s}%s" % ns_name
 
     def startElement(self, name, attributes=None):
@@ -106,10 +118,13 @@
         try:
             # if there already is a child element, we must append to its tail
             last_element = last_element[-1]
-            last_element.tail = (last_element.tail or u'') + data
+            last_element.tail = (last_element.tail or '') + data
         except IndexError:
             # otherwise: append to the text
-            last_element.text = (last_element.text or u'') + data
+            last_element.text = (last_element.text or '') + data
+
+    ignorableWhitespace = characters
+        
 
 class ElementTreeProducer(object):
     """Produces SAX events for an element and children.
@@ -124,13 +139,41 @@
         from xml.sax.xmlreader import AttributesNSImpl as attr_class
         self._attr_class = attr_class
         self._empty_attributes = attr_class({}, {})
-        
+
     def saxify(self):
         self._content_handler.startDocument()
-        self._recursive_saxify(self._element, {})
+
+        element = self._element
+        if hasattr(element, 'getprevious'):
+            siblings = []
+            sibling = element.getprevious()
+            while getattr(sibling, 'tag', None) is ProcessingInstruction:
+                siblings.append(sibling)
+                sibling = sibling.getprevious()
+            for sibling in siblings[::-1]:
+                self._recursive_saxify(sibling, {})
+
+        self._recursive_saxify(element, {})
+
+        if hasattr(element, 'getnext'):
+            sibling = element.getnext()
+            while getattr(sibling, 'tag', None) is ProcessingInstruction:
+                self._recursive_saxify(sibling, {})
+                sibling = sibling.getnext()
+
         self._content_handler.endDocument()
 
     def _recursive_saxify(self, element, prefixes):
+        content_handler = self._content_handler
+        tag = element.tag
+        if tag is Comment or tag is ProcessingInstruction:
+            if tag is ProcessingInstruction:
+                content_handler.processingInstruction(
+                    element.target, element.text)
+            if element.tail:
+                content_handler.characters(element.tail)
+            return
+
         new_prefixes = []
         build_qname = self._build_qname
         attribs = element.items()
@@ -146,10 +189,9 @@
         else:
             sax_attributes = self._empty_attributes
 
-        ns_uri, local_name = _getNsTag(element.tag)
+        ns_uri, local_name = _getNsTag(tag)
         qname = build_qname(ns_uri, local_name, prefixes, new_prefixes)
 
-        content_handler = self._content_handler
         for prefix, uri in new_prefixes:
             content_handler.startPrefixMapping(prefix, uri)
         content_handler.startElementNS((ns_uri, local_name),

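The buffering of processing instructions that arrive before the root element (``_root_siblings`` in the handler above) can be shown with the standard library alone. This is a sketch of the idea, not lxml's implementation: ``PiBufferingHandler`` and its event tuples are hypothetical, and it records events instead of building a tree.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PiBufferingHandler(ContentHandler):
    """Holds back PIs seen before the root element until the root exists."""

    def __init__(self):
        super().__init__()
        self.root_seen = False
        self.pending_pis = []  # PIs that arrived before the root element
        self.events = []

    def processingInstruction(self, target, data):
        if not self.root_seen:
            self.pending_pis.append((target, data))
        else:
            self.events.append(("pi", target, data))

    def startElement(self, name, attrs):
        if not self.root_seen:
            self.root_seen = True
            # The root now exists, so the buffered PIs can be attached
            # as its preceding siblings, in their original order.
            for target, data in self.pending_pis:
                self.events.append(("root-sibling-pi", target, data))
            del self.pending_pis[:]
        self.events.append(("start", name))


handler = PiBufferingHandler()
xml.sax.parseString(b'<?style sheet="a.css"?><root><?inner x?></root>', handler)
```

After parsing, ``handler.events`` lists the leading PI as a root sibling, then the start of ``root``, then the inner PI as ordinary content.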
Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/serializer.pxi	(original)
+++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi	Tue Jun 12 18:53:57 2007
@@ -78,8 +78,10 @@
     if write_xml_declaration:
         _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding)
 
+    _writePrevSiblings(c_buffer, c_node, encoding, pretty_print)
     tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding)
     _writeTail(c_buffer, c_node, encoding, pretty_print)
+    _writeNextSiblings(c_buffer, c_node, encoding, pretty_print)
 
 cdef void _writeDeclarationToBuffer(tree.xmlOutputBuffer* c_buffer,
                                     char* version, char* encoding):
@@ -100,6 +102,36 @@
                                pretty_print, encoding)
         c_node = c_node.next
 
+cdef void _writePrevSiblings(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node,
+                             char* encoding, int pretty_print):
+    cdef xmlNode* c_sibling
+    if c_node.parent is not NULL and _isElement(c_node.parent):
+        return
+    # we are at a root node, so add PI and comment siblings
+    c_sibling = c_node
+    while c_sibling.prev != NULL and \
+              (c_sibling.prev.type == tree.XML_PI_NODE or \
+               c_sibling.prev.type == tree.XML_COMMENT_NODE):
+        c_sibling = c_sibling.prev
+    while c_sibling != c_node:
+        tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_sibling, 0,
+                               pretty_print, encoding)
+        c_sibling = c_sibling.next
+
+cdef void _writeNextSiblings(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node,
+                             char* encoding, int pretty_print):
+    cdef xmlNode* c_sibling
+    if c_node.parent is not NULL and _isElement(c_node.parent):
+        return
+    # we are at a root node, so add PI and comment siblings
+    c_sibling = c_node.next
+    while c_sibling != NULL and \
+              (c_sibling.type == tree.XML_PI_NODE or \
+               c_sibling.type == tree.XML_COMMENT_NODE):
+        tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_sibling, 0,
+                               pretty_print, encoding)
+        c_sibling = c_sibling.next
+
 # output to file-like objects
 
 cdef class _FilelikeWriter:
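
The sibling walk added to serializer.pxi above can be modelled in plain Python. This is only a sketch under stated assumptions: ``Node``, ``link``, ``prev_siblings``, ``next_siblings`` and ``serialize`` are hypothetical names invented for illustration and are not part of lxml or libxml2. It shows the two-step pattern of ``_writePrevSiblings`` (rewind over leading PIs and comments, then emit them in document order) and the forward scan of ``_writeNextSiblings``.

```python
# Hypothetical model of the root-level sibling walk; not lxml code.
class Node:
    def __init__(self, kind, text, parent=None):
        self.kind = kind      # "element", "pi" or "comment"
        self.text = text      # serialised form, kept as a plain string here
        self.parent = parent
        self.prev = None
        self.next = None


def link(*nodes):
    """Chain nodes into a doubly linked sibling list."""
    for left, right in zip(nodes, nodes[1:]):
        left.next = right
        right.prev = left


def _is_root(node):
    # Mirrors the check in the diff: a node whose parent is an element
    # is not a root node, so its siblings are left alone.
    return not (node.parent is not None and node.parent.kind == "element")


def prev_siblings(node):
    if not _is_root(node):
        return []
    # Step 1: rewind to the first PI/comment in the run before the node.
    sibling = node
    while sibling.prev is not None and sibling.prev.kind in ("pi", "comment"):
        sibling = sibling.prev
    # Step 2: walk forward again so the output is in document order.
    out = []
    while sibling is not node:
        out.append(sibling.text)
        sibling = sibling.next
    return out


def next_siblings(node):
    if not _is_root(node):
        return []
    out = []
    sibling = node.next
    while sibling is not None and sibling.kind in ("pi", "comment"):
        out.append(sibling.text)
        sibling = sibling.next
    return out


def serialize(node):
    return prev_siblings(node) + [node.text] + next_siblings(node)


pi = Node("pi", "<?TARGET TEXT?>")
head = Node("comment", "<!-- header -->")
root = Node("element", "<root/>")
foot = Node("comment", "<!-- footer -->")
link(pi, head, root, foot)

child = Node("element", "<a/>", parent=root)
note = Node("comment", "<!-- inner -->", parent=root)
link(note, child)

root_output = serialize(root)
child_output = serialize(child)
```

Serialising ``child`` yields only the child itself, because the early return for nodes with an element parent leaves in-tree comments to the regular node dump.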

Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py	(original)
+++ lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py	Tue Jun 12 18:53:57 2007
@@ -404,6 +404,156 @@
         Element = self.etree.Element
         self.assertRaises(TypeError, Element('a').append, None)
 
+    def test_addnext(self):
+        Element = self.etree.Element
+        SubElement = self.etree.SubElement
+        root = Element('root')
+        SubElement(root, 'a')
+        SubElement(root, 'b')
+
+        self.assertEquals(['a', 'b'],
+                          [c.tag for c in root])
+        root[1].addnext(root[0])
+        self.assertEquals(['b', 'a'],
+                          [c.tag for c in root])
+
+    def test_addprevious(self):
+        Element = self.etree.Element
+        SubElement = self.etree.SubElement
+        root = Element('root')
+        SubElement(root, 'a')
+        SubElement(root, 'b')
+
+        self.assertEquals(['a', 'b'],
+                          [c.tag for c in root])
+        root[0].addprevious(root[1])
+        self.assertEquals(['b', 'a'],
+                          [c.tag for c in root])
+
+    def test_addnext_root(self):
+        Element = self.etree.Element
+        a = Element('a')
+        b = Element('b')
+        self.assertRaises(TypeError, a.addnext, b)
+
+    def test_addprevious_root(self):
+        Element = self.etree.Element
+        a = Element('a')
+        b = Element('b')
+        self.assertRaises(TypeError, a.addprevious, b)
+
+    def test_addprevious_pi(self):
+        Element = self.etree.Element
+        SubElement = self.etree.SubElement
+        PI = self.etree.PI
+        root = Element('root')
+        SubElement(root, 'a')
+        pi = PI('TARGET', 'TEXT')
+        pi.tail = "TAIL"
+
+        self.assertEquals('<root><a></a></root>',
+                          self._writeElement(root))
+        root[0].addprevious(pi)
+        self.assertEquals('<root><?TARGET TEXT?>TAIL<a></a></root>',
+                          self._writeElement(root))
+
+    def test_addprevious_root_pi(self):
+        Element = self.etree.Element
+        PI = self.etree.PI
+        root = Element('root')
+        pi = PI('TARGET', 'TEXT')
+        pi.tail = "TAIL"
+
+        self.assertEquals('<root></root>',
+                          self._writeElement(root))
+        root.addprevious(pi)
+        self.assertEquals('<?TARGET TEXT?>\n<root></root>',
+                          self._writeElement(root))
+
+    def test_addnext_pi(self):
+        Element = self.etree.Element
+        SubElement = self.etree.SubElement
+        PI = self.etree.PI
+        root = Element('root')
+        SubElement(root, 'a')
+        pi = PI('TARGET', 'TEXT')
+        pi.tail = "TAIL"
+
+        self.assertEquals('<root><a></a></root>',
+                          self._writeElement(root))
+        root[0].addnext(pi)
+        self.assertEquals('<root><a></a><?TARGET TEXT?>TAIL</root>',
+                          self._writeElement(root))
+
+    def test_addnext_root_pi(self):
+        Element = self.etree.Element
+        PI = self.etree.PI
+        root = Element('root')
+        pi = PI('TARGET', 'TEXT')
+        pi.tail = "TAIL"
+
+        self.assertEquals('<root></root>',
+                          self._writeElement(root))
+        root.addnext(pi)
+        self.assertEquals('<root></root>\n<?TARGET TEXT?>',
+                          self._writeElement(root))
+
+    def test_addnext_comment(self):
+        Element = self.etree.Element
+        SubElement = self.etree.SubElement
+        Comment = self.etree.Comment
+        root = Element('root')
+        SubElement(root, 'a')
+        comment = Comment('TEXT ')
+        comment.tail = "TAIL"
+
+        self.assertEquals('<root><a></a></root>',
+                          self._writeElement(root))
+        root[0].addnext(comment)
+        self.assertEquals('<root><a></a><!--TEXT -->TAIL</root>',
+                          self._writeElement(root))
+
+    def test_addnext_root_comment(self):
+        Element = self.etree.Element
+        Comment = self.etree.Comment
+        root = Element('root')
+        comment = Comment('TEXT ')
+        comment.tail = "TAIL"
+
+        self.assertEquals('<root></root>',
+                          self._writeElement(root))
+        root.addnext(comment)
+        self.assertEquals('<root></root>\n<!--TEXT -->',
+                          self._writeElement(root))
+
+    def test_addprevious_comment(self):
+        Element = self.etree.Element
+        SubElement = self.etree.SubElement
+        Comment = self.etree.Comment
+        root = Element('root')
+        SubElement(root, 'a')
+        comment = Comment('TEXT ')
+        comment.tail = "TAIL"
+
+        self.assertEquals('<root><a></a></root>',
+                          self._writeElement(root))
+        root[0].addprevious(comment)
+        self.assertEquals('<root><!--TEXT -->TAIL<a></a></root>',
+                          self._writeElement(root))
+
+    def test_addprevious_root_comment(self):
+        Element = self.etree.Element
+        Comment = self.etree.Comment
+        root = Element('root')
+        comment = Comment('TEXT ')
+        comment.tail = "TAIL"
+
+        self.assertEquals('<root></root>',
+                          self._writeElement(root))
+        root.addprevious(comment)
+        self.assertEquals('<!--TEXT -->\n<root></root>',
+                          self._writeElement(root))
+
     # gives error in ElementTree
     def test_comment_empty(self):
         Element = self.etree.Element

Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py	(original)
+++ lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py	Tue Jun 12 18:53:57 2007
@@ -25,6 +25,30 @@
         self.assertEquals('<a>ab<b>bb</b>ba</a>',
                           xml_out)
 
+    def test_etree_sax_comment(self):
+        tree = self.parse('<a>ab<!-- TEST -->ba</a>')
+        xml_out = self._saxify_serialize(tree)
+        self.assertEquals('<a>abba</a>',
+                          xml_out)
+
+    def test_etree_sax_pi(self):
+        tree = self.parse('<a>ab<?this and that?>ba</a>')
+        xml_out = self._saxify_serialize(tree)
+        self.assertEquals('<a>ab<?this and that?>ba</a>',
+                          xml_out)
+
+    def test_etree_sax_comment_root(self):
+        tree = self.parse('<!-- TEST --><a>ab</a>')
+        xml_out = self._saxify_serialize(tree)
+        self.assertEquals('<a>ab</a>',
+                          xml_out)
+
+    def test_etree_sax_pi_root(self):
+        tree = self.parse('<?this and that?><a>ab</a>')
+        xml_out = self._saxify_serialize(tree)
+        self.assertEquals('<?this and that?><a>ab</a>',
+                          xml_out)
+
     def test_etree_sax_attributes(self):
         tree = self.parse('<a aa="5">ab<b b="5"/>ba</a>')
         xml_out = self._saxify_serialize(tree)
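
The SAX tests above expect comments to disappear and processing instructions to survive. A minimal sketch of why, using only the standard library ``xml.sax`` module rather than lxml.sax: the ``ContentHandler`` interface has a ``processingInstruction()`` callback but no callback for comments, so a producer that drives a ``ContentHandler`` has nowhere to send them. ``RecordingHandler`` and ``record_events`` are hypothetical helper names introduced for this illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RecordingHandler(ContentHandler):
    """Collects the SAX events it receives as simple tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver text in several chunks.
        self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


def record_events(xml_bytes):
    handler = RecordingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


events = record_events(b"<a>ab<!-- TEST --><?this and that?>ba</a>")
text = "".join(event[1] for event in events if event[0] == "text")
```

Here ``events`` contains a ``("pi", "this", "and that")`` entry and ``text`` joins to ``"abba"``, while nothing from the comment reaches the handler. This matches the shape of ``test_etree_sax_comment`` and ``test_etree_sax_pi``.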

