[Lxml-checkins] r44180 - in lxml/branch/lxml-1.3: . benchmark doc src/lxml src/lxml/tests
scoder at codespeak.net
Tue Jun 12 18:53:58 CEST 2007
Author: scoder
Date: Tue Jun 12 18:53:57 2007
New Revision: 44180
Modified:
lxml/branch/lxml-1.3/CHANGES.txt
lxml/branch/lxml-1.3/benchmark/bench_etree.py
lxml/branch/lxml-1.3/doc/FAQ.txt
lxml/branch/lxml-1.3/doc/build.txt
lxml/branch/lxml-1.3/doc/performance.txt
lxml/branch/lxml-1.3/doc/sax.txt
lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi
lxml/branch/lxml-1.3/src/lxml/etree.pyx
lxml/branch/lxml-1.3/src/lxml/sax.py
lxml/branch/lxml-1.3/src/lxml/serializer.pxi
lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py
lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py
Log:
merged in revs up to 42695 from trunk
Modified: lxml/branch/lxml-1.3/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-1.3/CHANGES.txt (original)
+++ lxml/branch/lxml-1.3/CHANGES.txt Tue Jun 12 18:53:57 2007
@@ -8,17 +8,25 @@
Features added
--------------
+* ``Element.addnext(el)`` and ``Element.addprevious(el)`` methods to support
+ adding processing instructions and comments around the root node
+
* Element.attrib now has a ``pop()`` method
-* Support for custom Element class instantiation in lxml.sax
+* Support for custom Element class instantiation in lxml.sax: passing a
+ ``makeelement()`` function to the ElementTreeContentHandler will reuse the
+ lookup context of that function
* '.' represents empty ObjectPath (identity)
-* ``Element.values()`` to accompany the existing ``.keys()`` and ``.items()``
-
Bugs fixed
----------
+* Documents lost their top-level PIs and comments on serialisation
+
+* lxml.sax failed on comments and PIs. Comments are now properly ignored and
+ PIs are copied.
+
* Raise AssertionError when passing strings containing '\0' bytes
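
The new ``addnext()`` / ``addprevious()`` entries above can be illustrated with a minimal sketch. It assumes a current lxml release, where top-level siblings are written out when the whole tree (not just the root element) is serialised; the target and text values are placeholders borrowed from the tests in this commit.

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "a")

# Place a processing instruction before and a comment after the root element.
pi = etree.ProcessingInstruction("TARGET", "TEXT")
comment = etree.Comment("TEXT ")
root.addprevious(pi)
root.addnext(comment)

# Serialise the whole document so that the root's siblings are included.
output = etree.tostring(root.getroottree())
print(output)

# Only processing instructions and comments may become siblings of the root.
try:
    root.addnext(etree.Element("b"))
except TypeError:
    rejected = True
else:
    rejected = False
print("plain element rejected at root level:", rejected)
```

Inside the tree (below the root), the same two methods accept ordinary elements as well, as the ``test_addnext`` and ``test_addprevious`` cases in this commit show.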
Modified: lxml/branch/lxml-1.3/benchmark/bench_etree.py
==============================================================================
--- lxml/branch/lxml-1.3/benchmark/bench_etree.py (original)
+++ lxml/branch/lxml-1.3/benchmark/bench_etree.py Tue Jun 12 18:53:57 2007
@@ -18,6 +18,19 @@
for child in reversed(root):
pass
+ def bench_first_child(self, root):
+ for i in range(1000):
+ child = root[0]
+
+ def bench_last_child(self, root):
+ for i in range(1000):
+ child = root[-1]
+
+ def bench_middle_child(self, root):
+ pos = len(root) // 2
+ for i in range(1000):
+ child = root[pos]
+
@with_attributes(True, False)
@with_text(text=True, utext=True)
def bench_tostring_utf8(self, root):
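
For readers who want to try the three child-access benchmarks above outside the benchmark harness, the following is a rough standalone sketch. It is not the harness used by ``bench_etree.py``: the helper ``build_flat_tree`` and the tree size are assumptions made for illustration, and it runs against the standard library's ElementTree so that it has no external dependencies.

```python
import timeit
import xml.etree.ElementTree as ET


def build_flat_tree(child_count):
    """Build a root element with many direct children (hypothetical helper)."""
    root = ET.Element("root")
    for i in range(child_count):
        ET.SubElement(root, "child%d" % i)
    return root


def first_child(root):
    return root[0]


def last_child(root):
    return root[-1]


def middle_child(root):
    # Integer division keeps the index usable on all Python versions.
    return root[len(root) // 2]


root = build_flat_tree(1000)
for func in (first_child, last_child, middle_child):
    # Repeat each tiny operation 1000 times to make it measurable.
    seconds = timeit.timeit(lambda: func(root), number=1000)
    print("%-12s %.4f msec per 1000 accesses" % (func.__name__, seconds * 1000))
```

Swapping the import for ``from lxml import etree as ET`` should run the same measurements against lxml.etree, assuming lxml is installed.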
Modified: lxml/branch/lxml-1.3/doc/FAQ.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/FAQ.txt (original)
+++ lxml/branch/lxml-1.3/doc/FAQ.txt Tue Jun 12 18:53:57 2007
@@ -1,6 +1,6 @@
-==========================
-Frequently Asked Questions
-==========================
+================================
+Frequently Asked Questions (FAQ)
+================================
See also the notes on compatibility_ to ElementTree_.
@@ -15,25 +15,28 @@
1.3 What standards does lxml implement?
1.4 Where are the Windows binaries?
1.5 What is the difference between lxml.etree and lxml.objectify?
- 1.6 Why is my application so slow?
+ 1.6 How can I make my application run faster?
1.7 Why do I get errors about missing UCS4 symbols when installing lxml?
- 2 Bugs
- 2.1 My application crashes! Why does lxml.etree do that?
- 2.2 I think I have found a bug in lxml. What should I do?
- 3 Threading
- 3.1 Can I use threads to concurrently access the lxml API?
- 3.2 Does my program run faster if I use threads?
- 3.3 Would my single-threaded program run faster if I turned off threading?
- 4 Parsing and Serialisation
- 4.1 Why doesn't the ``pretty_print`` option reformat my XML output?
- 4.2 Why can't lxml parse my XML from unicode strings?
- 4.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ?
- 4.4 Why can't I just delete parents or clear the root node in iterparse()?
- 5 XPath and Document Traversal
- 5.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)?
- 5.2 Why doesn't ``findall()`` support full XPath expressions?
- 5.3 How can I find out which namespace prefixes are used in a document?
- 5.4 How can I specify a default namespace for XPath expressions?
+ 2 Contributing
+ 2.1 Why is lxml not written in Python?
+ 2.2 How can I contribute?
+ 3 Bugs
+ 3.1 My application crashes! Why does lxml.etree do that?
+ 3.2 I think I have found a bug in lxml. What should I do?
+ 4 Threading
+ 4.1 Can I use threads to concurrently access the lxml API?
+ 4.2 Does my program run faster if I use threads?
+ 4.3 Would my single-threaded program run faster if I turned off threading?
+ 5 Parsing and Serialisation
+ 5.1 Why doesn't the ``pretty_print`` option reformat my XML output?
+ 5.2 Why can't lxml parse my XML from unicode strings?
+ 5.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ?
+ 5.4 Why can't I just delete parents or clear the root node in iterparse()?
+ 6 XPath and Document Traversal
+ 6.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)?
+ 6.2 Why doesn't ``findall()`` support full XPath expressions?
+ 6.3 How can I find out which namespace prefixes are used in a document?
+ 6.4 How can I specify a default namespace for XPath expressions?
General Questions
@@ -133,17 +136,18 @@
XPath, XSLT or validation.
-Why is my application so slow?
-------------------------------
+How can I make my application run faster?
+-----------------------------------------
lxml.etree is a very fast library for processing XML. There are, however, `a
few caveats`_ involved in the mapping of the powerful libxml2 library to the
simple and convenient ElementTree API. Not all operations are as fast as the
-simplicity of the API might suggest. The `benchmark page`_ has a comparison
-to other ElementTree implementations and a number of tips for performance
-tweaking. As with any Python application, the rule of thumb is: the more of
-your processing runs in C, the faster your application gets. See also the
-section on threading_.
+simplicity of the API might suggest, while some use cases can heavily benefit
+from finding the right way of doing them. The `benchmark page`_ has a
+comparison to other ElementTree implementations and a number of tips for
+performance tweaking. As with any Python application, the rule of thumb is:
+the more of your processing runs in C, the faster your application gets. See
+also the section on threading_.
.. _`a few caveats`: performance.html#the-elementtree-api
.. _`benchmark page`: performance.html
@@ -167,6 +171,65 @@
.. _`build instructions`: build.html
+Contributing
+============
+
+Why is lxml not written in Python?
+----------------------------------
+
+lxml interfaces with two C libraries: libxml2 and libxslt. Accessing them at
+the C-level is required for performance reasons.
+
+To avoid writing plain C-code and caring too much about the details of
+built-in types and reference counting, lxml is written in Pyrex_, a
+Python-like language that is translated into C-code. Chances are that if you
+know Python, you can write `code that Pyrex accepts`_. Again, the C-ish style
+used in the lxml code is just for performance optimisations. If you want to
+contribute, don't bother with the details; a Python implementation of your
+contribution is better than none. And keep in mind that lxml's flexible API
+often favours an implementation of features in pure Python, without bothering
+with C-code at all.
+
+Please contact the `mailing list`_ if you need any help.
+
+.. _Pyrex: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/
+.. _`code that Pyrex accepts`: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/overview.html
+
+
+How can I contribute?
+---------------------
+
+Besides enhancing the code, there are a lot of places where you can help the
+project and its user base. You can
+
+* spread the word and write about lxml. Many users (especially new Python
+ users) have not yet heard about lxml, although our user base is constantly
+ growing. If you write your own blog and feel like saying something about
+ lxml, go ahead and do so. If we think your contribution or criticism is
+ valuable to other users, we may even put a link or a quote on the project
+ page.
+
+* provide code examples for the general usage of lxml or specific problems
+ solved with lxml. Readable code is a very good way of showing how a library
+ can be used and what great things you can do with it. Again, if we hear
+ about it, we can set a link on the project page.
+
+* work on the documentation. The web page is generated from a set of ReST_
+ `text files`_. It is meant both as a representative project page for lxml
+ and as a site for documenting lxml's API and usage. If you have questions
+ or an idea how to make it more readable and accessible while you are reading
+ it, please send a comment to the `mailing list`_.
+
+.. _ReST: http://docutils.sourceforge.net/rst.html
+.. _`text files`: http://codespeak.net/svn/lxml/trunk/doc/
+
+* improve the docstrings. lxml uses docstrings to support Python's integrated
+ online ``help()`` function. However, sometimes these are not sufficient to
+ grasp the details of the function in question. If you find such a place,
+ you can try to write up a better description and send it to the `mailing
+ list`_.
+
+
Bugs
====
@@ -176,7 +239,7 @@
One of the goals of lxml is "no segfaults", so if there is no clear warning in
the documentation that you were doing something potentially harmful, you have
found a bug and we would like to hear about it. Please report this bug to the
-mailing list. See the next section on how to do that.
+`mailing list`_. See the next section on how to do that.
I think I have found a bug in lxml. What should I do?
Modified: lxml/branch/lxml-1.3/doc/build.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/build.txt (original)
+++ lxml/branch/lxml-1.3/doc/build.txt Tue Jun 12 18:53:57 2007
@@ -2,8 +2,10 @@
=============================
To build lxml from source, you need libxml2 and libxslt properly installed,
-including header files (possibly shipped in -dev packages). The build process
-also requires setuptools_.
+*including the header files*. These are likely shipped in separate ``-dev``
+or ``-devel`` packages like ``libxml2-dev``, which you need to install. The
+build process also requires setuptools_. The lxml source distribution comes
+with a script called ``ez_setup.py`` that can be used to install them.
.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
@@ -34,18 +36,22 @@
Newer versions of lxml depend on features and bug fixes that are not yet
available in an official Pyrex release. This includes support for the
- external C-API of lxml, for Python 2.5 and for 64 bit architectures.
+ external C-API of lxml.etree, for Python 2.5 and for 64 bit architectures.
To build lxml 1.1 and later from non-release or modified sources, you must
- therefore install an updated Pyrex version from here:
+ therefore use an updated Pyrex version from here:
http://codespeak.net/svn/lxml/pyrex/
- Since version 1.1.2, the lxml source distribution includes this Pyrex
- version. It will be used if the 'pyrex' directory is available in the lxml
- root directory. If you install from SVN or delete this directory from the
- unpacked distribution directory, the normally installed Pyrex version will
- be used.
+ A subversion checkout of lxml will automatically retrieve the latest Pyrex
+ as external project source (``svn:externals``). Look out for the ``Pyrex``
+ directory in the source tree.
+
+ Since version 1.1.2, the lxml source distribution also includes this Pyrex
+ version. It will be used if the ``Pyrex`` directory is available in the
+ lxml root directory. If you install from SVN or delete this directory from
+ the unpacked distribution directory, the normally installed Pyrex version
+ will be used.
* lxml 1.0 and earlier
@@ -86,6 +92,10 @@
python setup.py build
+or::
+
+ python setup.py bdist_egg
+
If you want to test lxml from the source directory, it is better to build it
in-place like this::
@@ -96,15 +106,24 @@
make
If you get errors about missing header files (e.g., ``libxml/xmlversion.h``)
-then you need to add the location of that file to the include path like::
+then you need to make sure the development packages of libxml2 and libxslt are
+properly installed. If this doesn't help, you may have to add the location of
+the header files to the include path like::
- python setup.py build_ext -i -I /usr/include/libxml2
+ python setup.py build_ext -i -I /usr/include/libxml2
where the file is in ``/usr/include/libxml2/libxml/xmlversion.h``
To use lxml.etree in-place, you can place lxml's ``src`` directory on your
Python module search path (PYTHONPATH) and then import ``lxml.etree`` to play
-with it.
+with it::
+
+ # cd lxml
+ # PYTHONPATH=src python
+ Python 2.5.1
+ Type "help", "copyright", "credits" or "license" for more information.
+ >>> from lxml import etree
+ >>>
To recompile after changes, note that you may have to run ``make clean`` or
delete the file ``src/lxml/etree.c``. Distutils do not automatically pick up
@@ -125,8 +144,8 @@
make test
-To run the ElementTree and cElementTree compatibility tests, make sure
-you have lxml on your PYTHONPATH first, then run::
+This also runs the ElementTree and cElementTree compatibility tests. To call
+them separately, make sure you have lxml on your PYTHONPATH first, then run::
python selftest.py
@@ -147,15 +166,16 @@
This is the procedure to make an lxml egg for your platform:
-* download the lxml-x.y.tar.gz release. This contains the pregenerated C so we
- don't run into any Pyrex issues. Unpack it and cd into it.
+* Download the lxml-x.y.tar.gz release. This contains the pregenerated C so
+ that you don't run into any Pyrex issues. Unpack it and cd into it.
* python setup.py build
-* if you're on a unixy platform, cd into build/lib.your.platform and
- strip any .so file you find there. This reduces the size of the egg.
+* If you're on a unixy platform, cd into ``build/lib.your.platform`` and strip
+ any ``.so`` file you find there. This reduces the size of the egg
+ considerably.
-* python setup.py bdist_egg upload
+* ``python setup.py bdist_egg upload``
The last 'upload' step only works if you have access to the lxml cheeseshop
entry. If not, you can just make an egg with ``bdist_egg`` and mail it to the
Modified: lxml/branch/lxml-1.3/doc/performance.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/performance.txt (original)
+++ lxml/branch/lxml-1.3/doc/performance.txt Tue Jun 12 18:53:57 2007
@@ -14,7 +14,7 @@
.. _ElementTree: http://effbot.org/zone/element-index.htm
.. _cElementTree: http://effbot.org/zone/celementtree.htm
-The statements made here are backed by the benchmark scripts
+The statements made here are backed by the (micro-)benchmark scripts
`bench_etree.py`_, `bench_xpath.py`_ and `bench_objectify.py`_ that come with
the lxml source distribution. The timings cited below compare lxml 1.3 (with
libxml2 2.6.26) to the ElementTree and cElementTree versions shipped with
@@ -30,10 +30,22 @@
attributes (-/A), with or without ASCII or unicode text (-/S/U), and either
against a tree or its serialised form (T/X). In the result extracts cited
below, T1 refers to a 3-level tree with many children at the third level, T2
-is swapped around to have many children at the root element, T3 is a deep tree
-with few children at each level and T4 is a small tree, slightly broader than
-deep. If repetition is involved, this usually means running the benchmark in
-a loop over all children of the tree root.
+is swapped around to have many children below the root element, T3 is a deep
+tree with few children at each level and T4 is a small tree, slightly broader
+than deep. If repetition is involved, this usually means running the
+benchmark in a loop over all children of the tree root; otherwise, the
+operation is run on the root node (C/R).
+
+As an example, the character code ``(SATR T1)`` states that the benchmark was
+running for tree T1, with plain string text (S) and attributes (A). It was
+run against the root element (R) in the tree structure of the data (T).
+
+Note that very small operations are repeated in integer loops to make them
+measurable. It is therefore not always possible to compare the absolute
+timings of, say, a single access benchmark (which usually loops) and a 'get
+all in one step' benchmark, which already takes enough time to be measurable
+and is therefore measured as is. Take a look at the concrete benchmarks in
+the scripts to understand how the numbers compare.
.. contents::
..
@@ -48,11 +60,11 @@
Bad things first
----------------
-First thing to say: there *is* an overhead involved in having a C library
-mimic the ElementTree API. As opposed to ElementTree, lxml has to generate
-Python objects on the fly when asked for them. What this means is: the more
-of your code runs in Python, the slower your application gets. Note, however,
-that this is true for most performance critical Python applications.
+First thing to say: there *is* an overhead involved in having a DOM-like C
+library mimic the ElementTree API. As opposed to ElementTree, lxml has to
+generate Python objects on the fly when asked for them. What this means is:
+the more of your code runs in Python, the slower your application gets. Note,
+however, that this is true for most performance critical Python applications.
Parsing and Serialising
@@ -132,20 +144,20 @@
(given in seconds)::
lxe: -- S- U- -A SA UA
- T1: 0.1029 0.1005 0.0998 0.1003 0.0998 0.1002
- T2: 0.1035 0.1013 0.1015 0.1090 0.1089 0.1090
- T3: 0.0276 0.0270 0.0273 0.0679 0.0673 0.0673
- T4: 0.0004 0.0004 0.0004 0.0013 0.0013 0.0013
+ T1: 0.1155 0.1154 0.1153 0.1159 0.1181 0.1158
+ T2: 0.1183 0.1197 0.1200 0.1267 0.1261 0.1264
+ T3: 0.0341 0.0312 0.0314 0.0726 0.0717 0.0720
+ T4: 0.0005 0.0004 0.0004 0.0014 0.0014 0.0014
cET: -- S- U- -A SA UA
- T1: 0.0277 0.0273 0.0273 0.0272 0.0278 0.0275
- T2: 0.0281 0.0347 0.0281 0.0285 0.0284 0.0284
- T3: 0.0074 0.0074 0.0074 0.0122 0.0102 0.0101
+ T1: 0.0290 0.0271 0.0275 0.0297 0.0273 0.0274
+ T2: 0.0280 0.0280 0.0281 0.0285 0.0283 0.0286
+ T3: 0.0071 0.0072 0.0071 0.0113 0.0096 0.0096
T4: 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
ET : -- S- U- -A SA UA
- T1: 0.1349 0.1962 0.2356 0.1288 0.2642 0.1351
- T2: 0.3104 0.1344 0.3566 0.3857 0.1354 0.4677
- T3: 0.0313 0.0325 0.0312 0.0356 0.3803 0.0364
- T4: 0.0005 0.0005 0.0008 0.0006 0.0007 0.0006
+ T1: 0.1362 0.1985 0.2300 0.1344 0.2672 0.1335
+ T2: 0.3107 0.1386 0.3581 0.3886 0.1388 0.4277
+ T3: 0.0334 0.0332 0.0320 0.0367 0.3769 0.0375
+ T4: 0.0006 0.0005 0.0008 0.0007 0.0007 0.0006
While lxml is still faster than ET in most cases (30-60%), cET can be up to
three times faster than lxml here. One of the reasons is that lxml must
@@ -161,6 +173,29 @@
cET: root_getchildren (--TR T2) 0.0150 msec/pass
ET : root_getchildren (--TR T2) 0.0091 msec/pass
+When accessing single children, however, e.g. by index, this handicap is
+negligible::
+
+ lxe: first_child (--TR T2) 0.2499 msec/pass
+ cET: first_child (--TR T2) 0.2048 msec/pass
+ ET : first_child (--TR T2) 0.9291 msec/pass
+
+ lxe: last_child (--TR T1) 0.2511 msec/pass
+ cET: last_child (--TR T1) 0.2148 msec/pass
+ ET : last_child (--TR T1) 0.9191 msec/pass
+
+... unless you add the time to find a child index in a bigger list, as ET and
+cET use Python lists here, which are based on arrays. The data structure used
+by libxml2 is a linked tree, and thus, a linked list of children::
+
+ lxe: middle_child (--TR T1) 0.2921 msec/pass
+ cET: middle_child (--TR T1) 0.2069 msec/pass
+ ET : middle_child (--TR T1) 0.9291 msec/pass
+
+ lxe: middle_child (--TR T2) 1.9028 msec/pass
+ cET: middle_child (--TR T2) 0.2089 msec/pass
+ ET : middle_child (--TR T2) 0.9360 msec/pass
+
As opposed to ET, libxml2 has a notion of documents that each element must be
in. This results in a major performance difference for creating independent
Elements that end up in independently created documents::
Modified: lxml/branch/lxml-1.3/doc/sax.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/sax.txt (original)
+++ lxml/branch/lxml-1.3/doc/sax.txt Tue Jun 12 18:53:57 2007
@@ -39,6 +39,10 @@
>>> lxml.etree.tostring(tree.getroot())
'<a><b foo="bar">Hello world</b></a>'
+By passing a ``makeelement`` function to the constructor of
+``ElementTreeContentHandler``, e.g. the one of a parser you configured, you
+can determine which element class lookup scheme should be used.
+
Producing SAX events from an ElementTree or Element
---------------------------------------------------
Modified: lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi (original)
+++ lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi Tue Jun 12 18:53:57 2007
@@ -518,7 +518,6 @@
c_node = child._c_node
# store possible text node
c_next = c_node.next
- # XXX what if element is coming from a different document?
tree.xmlUnlinkNode(c_node)
# move node itself
tree.xmlAddChild(parent._c_node, c_node)
@@ -527,6 +526,38 @@
# parent element has moved; change them too..
moveNodeToDocument(child, parent._doc)
+cdef void _appendSibling(_Element element, _Element sibling):
+ """Add a sibling directly after an element.
+ """
+ cdef xmlNode* c_next
+ cdef xmlNode* c_node
+ c_node = sibling._c_node
+ # store possible text node
+ c_next = c_node.next
+ tree.xmlUnlinkNode(c_node)
+ # move node itself
+ tree.xmlAddNextSibling(element._c_node, c_node)
+ _moveTail(c_next, c_node)
+ # uh oh, elements may be pointing to different doc when
+ # parent element has moved; change them too..
+ moveNodeToDocument(sibling, element._doc)
+
+cdef void _prependSibling(_Element element, _Element sibling):
+ """Add a sibling directly before an element.
+ """
+ cdef xmlNode* c_next
+ cdef xmlNode* c_node
+ c_node = sibling._c_node
+ # store possible text node
+ c_next = c_node.next
+ tree.xmlUnlinkNode(c_node)
+ # move node itself
+ tree.xmlAddPrevSibling(element._c_node, c_node)
+ _moveTail(c_next, c_node)
+ # uh oh, elements may be pointing to different doc when
+ # parent element has moved; change them too..
+ moveNodeToDocument(sibling, element._doc)
+
cdef int isutf8(char* s):
cdef char c
c = s[0]
Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/etree.pyx (original)
+++ lxml/branch/lxml-1.3/src/lxml/etree.pyx Tue Jun 12 18:53:57 2007
@@ -531,6 +531,36 @@
"""
_appendChild(self, element)
+ def addnext(self, _Element element):
+ """Adds the element as a following sibling directly after this
+ element.
+
+ This is normally used to set a processing instruction or comment after
+ the root node of a document. Note that tail text is automatically
+ discarded when adding at the root level.
+ """
+ if self._c_node.parent != NULL and not _isElement(self._c_node.parent):
+ if element._c_node.type != tree.XML_PI_NODE:
+ if element._c_node.type != tree.XML_COMMENT_NODE:
+ raise TypeError, "Only processing instructions and comments can be siblings of the root element"
+ element.tail = None
+ _appendSibling(self, element)
+
+ def addprevious(self, _Element element):
+ """Adds the element as a preceding sibling directly before this
+ element.
+
+ This is normally used to set a processing instruction or comment
+ before the root node of a document. Note that tail text is
+ automatically discarded when adding at the root level.
+ """
+ if self._c_node.parent != NULL and not _isElement(self._c_node.parent):
+ if element._c_node.type != tree.XML_PI_NODE:
+ if element._c_node.type != tree.XML_COMMENT_NODE:
+ raise TypeError, "Only processing instructions and comments can be siblings of the root element"
+ element.tail = None
+ _prependSibling(self, element)
+
def extend(self, elements):
"""Extends the current children by the elements in the iterable.
"""
@@ -1096,6 +1126,9 @@
def items(self):
return []
+ def values(self):
+ return []
+
cdef class _Comment(__ContentOnlyElement):
property tag:
def __get__(self):
@@ -1749,6 +1782,8 @@
tree.xmlAddChild(<xmlNode*>c_doc, c_node)
return _elementFactory(doc, c_node)
+PI = ProcessingInstruction
+
def SubElement(_Element _parent not None, _tag,
attrib=None, nsmap=None, **_extra):
"""Subelement factory. This function creates an element instance, and appends it to an
Modified: lxml/branch/lxml-1.3/src/lxml/sax.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/sax.py (original)
+++ lxml/branch/lxml-1.3/src/lxml/sax.py Tue Jun 12 18:53:57 2007
@@ -1,5 +1,6 @@
from xml.sax.handler import ContentHandler
from etree import ElementTree, Element, SubElement, LxmlError
+from etree import XML, Comment, ProcessingInstruction
class SaxError(LxmlError):
pass
@@ -15,6 +16,7 @@
"""
def __init__(self, makeelement=None):
self._root = None
+ self._root_siblings = []
self._element_stack = []
self._default_ns = None
self._ns_mapping = { None : [None] }
@@ -82,6 +84,10 @@
if self._root is None:
element = self._root = \
self._makeelement(el_name, attrs, self._new_mappings)
+ if self._root_siblings and hasattr(element, 'addprevious'):
+ for sibling in self._root_siblings:
+ element.addprevious(sibling)
+ del self._root_siblings[:]
else:
element = SubElement(element_stack[-1], el_name,
attrs, self._new_mappings)
@@ -89,10 +95,16 @@
self._new_mappings.clear()
+ def processingInstruction(self, target, data):
+ pi = ProcessingInstruction(target, data)
+ if self._root is None:
+ self._root_siblings.append(pi)
+ else:
+ self._element_stack[-1].append(pi)
+
def endElementNS(self, ns_name, qname):
element = self._element_stack.pop()
- tag = element.tag
- if ns_name != _getNsTag(tag):
+ if ns_name != _getNsTag(element.tag):
raise SaxError, "Unexpected element closed: {%s}%s" % ns_name
def startElement(self, name, attributes=None):
@@ -106,10 +118,13 @@
try:
# if there already is a child element, we must append to its tail
last_element = last_element[-1]
- last_element.tail = (last_element.tail or u'') + data
+ last_element.tail = (last_element.tail or '') + data
except IndexError:
# otherwise: append to the text
- last_element.text = (last_element.text or u'') + data
+ last_element.text = (last_element.text or '') + data
+
+ ignorableWhitespace = characters
+
class ElementTreeProducer(object):
"""Produces SAX events for an element and children.
@@ -124,13 +139,41 @@
from xml.sax.xmlreader import AttributesNSImpl as attr_class
self._attr_class = attr_class
self._empty_attributes = attr_class({}, {})
-
+
def saxify(self):
self._content_handler.startDocument()
- self._recursive_saxify(self._element, {})
+
+ element = self._element
+ if hasattr(element, 'getprevious'):
+ siblings = []
+ sibling = element.getprevious()
+ while getattr(sibling, 'tag', None) is ProcessingInstruction:
+ siblings.append(sibling)
+ sibling = sibling.getprevious()
+ for sibling in siblings[::-1]:
+ self._recursive_saxify(sibling, {})
+
+ self._recursive_saxify(element, {})
+
+ if hasattr(element, 'getnext'):
+ sibling = element.getnext()
+ while getattr(sibling, 'tag', None) is ProcessingInstruction:
+ self._recursive_saxify(sibling, {})
+ sibling = sibling.getnext()
+
self._content_handler.endDocument()
def _recursive_saxify(self, element, prefixes):
+ content_handler = self._content_handler
+ tag = element.tag
+ if tag is Comment or tag is ProcessingInstruction:
+ if tag is ProcessingInstruction:
+ content_handler.processingInstruction(
+ element.target, element.text)
+ if element.tail:
+ content_handler.characters(element.tail)
+ return
+
new_prefixes = []
build_qname = self._build_qname
attribs = element.items()
@@ -146,10 +189,9 @@
else:
sax_attributes = self._empty_attributes
- ns_uri, local_name = _getNsTag(element.tag)
+ ns_uri, local_name = _getNsTag(tag)
qname = build_qname(ns_uri, local_name, prefixes, new_prefixes)
- content_handler = self._content_handler
for prefix, uri in new_prefixes:
content_handler.startPrefixMapping(prefix, uri)
content_handler.startElementNS((ns_uri, local_name),
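
The ``makeelement`` hook of ``ElementTreeContentHandler`` touched by the sax.py changes above can be exercised roughly as follows. This is a sketch under assumptions: it targets a current lxml release, and ``MyElement`` is a hypothetical class introduced only for this illustration.

```python
from lxml import etree
from lxml.sax import ElementTreeContentHandler, saxify


class MyElement(etree.ElementBase):
    """Hypothetical custom element class used only for this illustration."""


# Configure a parser whose lookup scheme returns MyElement for all elements.
parser = etree.XMLParser()
parser.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=MyElement))

source = etree.XML('<a><b foo="bar">Hello world</b></a>')

# Reuse the parser's lookup context when rebuilding the tree from SAX events.
handler = ElementTreeContentHandler(makeelement=parser.makeelement)
saxify(source, handler)

rebuilt_root = handler.etree.getroot()
print(type(rebuilt_root).__name__)
print(etree.tostring(rebuilt_root))
```

Without the ``makeelement`` argument, the handler falls back to the plain ``Element`` factory, so the rebuilt tree uses the default element classes.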
Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original)
+++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Tue Jun 12 18:53:57 2007
@@ -78,8 +78,10 @@
if write_xml_declaration:
_writeDeclarationToBuffer(c_buffer, c_doc.version, encoding)
+ _writePrevSiblings(c_buffer, c_node, encoding, pretty_print)
tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding)
_writeTail(c_buffer, c_node, encoding, pretty_print)
+ _writeNextSiblings(c_buffer, c_node, encoding, pretty_print)
cdef void _writeDeclarationToBuffer(tree.xmlOutputBuffer* c_buffer,
char* version, char* encoding):
@@ -100,6 +102,36 @@
pretty_print, encoding)
c_node = c_node.next
+cdef void _writePrevSiblings(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node,
+ char* encoding, int pretty_print):
+ cdef xmlNode* c_sibling
+ if c_node.parent is not NULL and _isElement(c_node.parent):
+ return
+ # we are at a root node, so add PI and comment siblings
+ c_sibling = c_node
+ while c_sibling.prev != NULL and \
+ (c_sibling.prev.type == tree.XML_PI_NODE or \
+ c_sibling.prev.type == tree.XML_COMMENT_NODE):
+ c_sibling = c_sibling.prev
+ while c_sibling != c_node:
+ tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_sibling, 0,
+ pretty_print, encoding)
+ c_sibling = c_sibling.next
+
+cdef void _writeNextSiblings(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node,
+ char* encoding, int pretty_print):
+ cdef xmlNode* c_sibling
+ if c_node.parent is not NULL and _isElement(c_node.parent):
+ return
+ # we are at a root node, so add PI and comment siblings
+ c_sibling = c_node.next
+ while c_sibling != NULL and \
+ (c_sibling.type == tree.XML_PI_NODE or \
+ c_sibling.type == tree.XML_COMMENT_NODE):
+ tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_sibling, 0,
+ pretty_print, encoding)
+ c_sibling = c_sibling.next
+
# output to file-like objects
cdef class _FilelikeWriter:
Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py (original)
+++ lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Tue Jun 12 18:53:57 2007
@@ -404,6 +404,156 @@
Element = self.etree.Element
self.assertRaises(TypeError, Element('a').append, None)
+ def test_addnext(self):
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ root = Element('root')
+ SubElement(root, 'a')
+ SubElement(root, 'b')
+
+ self.assertEquals(['a', 'b'],
+ [c.tag for c in root])
+ root[1].addnext(root[0])
+ self.assertEquals(['b', 'a'],
+ [c.tag for c in root])
+
+ def test_addprevious(self):
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ root = Element('root')
+ SubElement(root, 'a')
+ SubElement(root, 'b')
+
+ self.assertEquals(['a', 'b'],
+ [c.tag for c in root])
+ root[0].addprevious(root[1])
+ self.assertEquals(['b', 'a'],
+ [c.tag for c in root])
+
+ def test_addnext_root(self):
+ Element = self.etree.Element
+ a = Element('a')
+ b = Element('b')
+ self.assertRaises(TypeError, a.addnext, b)
+
+ def test_addprevious_root(self):
+ Element = self.etree.Element
+ a = Element('a')
+ b = Element('b')
+ self.assertRaises(TypeError, a.addprevious, b)
+
+ def test_addprevious_pi(self):
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ PI = self.etree.PI
+ root = Element('root')
+ SubElement(root, 'a')
+ pi = PI('TARGET', 'TEXT')
+ pi.tail = "TAIL"
+
+ self.assertEquals('<root><a></a></root>',
+ self._writeElement(root))
+ root[0].addprevious(pi)
+ self.assertEquals('<root><?TARGET TEXT?>TAIL<a></a></root>',
+ self._writeElement(root))
+
+ def test_addprevious_root_pi(self):
+ Element = self.etree.Element
+ PI = self.etree.PI
+ root = Element('root')
+ pi = PI('TARGET', 'TEXT')
+ pi.tail = "TAIL"
+
+ self.assertEquals('<root></root>',
+ self._writeElement(root))
+ root.addprevious(pi)
+ self.assertEquals('<?TARGET TEXT?>\n<root></root>',
+ self._writeElement(root))
+
+ def test_addnext_pi(self):
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ PI = self.etree.PI
+ root = Element('root')
+ SubElement(root, 'a')
+ pi = PI('TARGET', 'TEXT')
+ pi.tail = "TAIL"
+
+ self.assertEquals('<root><a></a></root>',
+ self._writeElement(root))
+ root[0].addnext(pi)
+ self.assertEquals('<root><a></a><?TARGET TEXT?>TAIL</root>',
+ self._writeElement(root))
+
+ def test_addnext_root_pi(self):
+ Element = self.etree.Element
+ PI = self.etree.PI
+ root = Element('root')
+ pi = PI('TARGET', 'TEXT')
+ pi.tail = "TAIL"
+
+ self.assertEquals('<root></root>',
+ self._writeElement(root))
+ root.addnext(pi)
+ self.assertEquals('<root></root>\n<?TARGET TEXT?>',
+ self._writeElement(root))
+
+ def test_addnext_comment(self):
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ Comment = self.etree.Comment
+ root = Element('root')
+ SubElement(root, 'a')
+ comment = Comment('TEXT ')
+ comment.tail = "TAIL"
+
+ self.assertEquals('<root><a></a></root>',
+ self._writeElement(root))
+ root[0].addnext(comment)
+ self.assertEquals('<root><a></a><!--TEXT -->TAIL</root>',
+ self._writeElement(root))
+
+ def test_addnext_root_comment(self):
+ Element = self.etree.Element
+ Comment = self.etree.Comment
+ root = Element('root')
+ comment = Comment('TEXT ')
+ comment.tail = "TAIL"
+
+ self.assertEquals('<root></root>',
+ self._writeElement(root))
+ root.addnext(comment)
+ self.assertEquals('<root></root>\n<!--TEXT -->',
+ self._writeElement(root))
+
+ def test_addprevious_comment(self):
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ Comment = self.etree.Comment
+ root = Element('root')
+ SubElement(root, 'a')
+ comment = Comment('TEXT ')
+ comment.tail = "TAIL"
+
+ self.assertEquals('<root><a></a></root>',
+ self._writeElement(root))
+ root[0].addprevious(comment)
+ self.assertEquals('<root><!--TEXT -->TAIL<a></a></root>',
+ self._writeElement(root))
+
+ def test_addprevious_root_comment(self):
+ Element = self.etree.Element
+ Comment = self.etree.Comment
+ root = Element('root')
+ comment = Comment('TEXT ')
+ comment.tail = "TAIL"
+
+ self.assertEquals('<root></root>',
+ self._writeElement(root))
+ root.addprevious(comment)
+ self.assertEquals('<!--TEXT -->\n<root></root>',
+ self._writeElement(root))
+
# gives error in ElementTree
def test_comment_empty(self):
Element = self.etree.Element
Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py (original)
+++ lxml/branch/lxml-1.3/src/lxml/tests/test_sax.py Tue Jun 12 18:53:57 2007
@@ -25,6 +25,30 @@
self.assertEquals('<a>ab<b>bb</b>ba</a>',
xml_out)
+ def test_etree_sax_comment(self):
+ tree = self.parse('<a>ab<!-- TEST -->ba</a>')
+ xml_out = self._saxify_serialize(tree)
+ self.assertEquals('<a>abba</a>',
+ xml_out)
+
+ def test_etree_sax_pi(self):
+ tree = self.parse('<a>ab<?this and that?>ba</a>')
+ xml_out = self._saxify_serialize(tree)
+ self.assertEquals('<a>ab<?this and that?>ba</a>',
+ xml_out)
+
+ def test_etree_sax_comment_root(self):
+ tree = self.parse('<!-- TEST --><a>ab</a>')
+ xml_out = self._saxify_serialize(tree)
+ self.assertEquals('<a>ab</a>',
+ xml_out)
+
+ def test_etree_sax_pi_root(self):
+ tree = self.parse('<?this and that?><a>ab</a>')
+ xml_out = self._saxify_serialize(tree)
+ self.assertEquals('<?this and that?><a>ab</a>',
+ xml_out)
+
def test_etree_sax_attributes(self):
tree = self.parse('<a aa="5">ab<b b="5"/>ba</a>')
xml_out = self._saxify_serialize(tree)
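
Note on the lxml.sax tests in this hunk: the expected results (comments dropped, PIs copied) match what the SAX ContentHandler interface can carry. It has a processingInstruction() callback but no callback for comments, which are reported only through the optional LexicalHandler extension. The following is a minimal sketch of that behaviour using the standard library's xml.sax rather than lxml.sax; EventRecorder and record_events are hypothetical names used for illustration and are not part of this commit.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Hypothetical helper: records the events a plain ContentHandler sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver text in several chunks.
        self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


def record_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded SAX events."""
    handler = EventRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in record_events(b"<a>ab<!-- TEST -->ba<?this and that?></a>"):
        print(event)
```

The comment never reaches the handler, so a tree rebuilt from these events cannot contain it, while the PI arrives as a (target, data) pair and can be recreated.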