[Lxml-checkins] r45644 - lxml/trunk/doc
scoder at codespeak.net
Tue Aug 14 11:03:43 CEST 2007
Author: scoder
Date: Tue Aug 14 11:03:43 2007
New Revision: 45644
Modified:
lxml/trunk/doc/FAQ.txt
Log:
FAQ entry on trailing .tail's on serialisation
Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Tue Aug 14 11:03:43 2007
@@ -141,6 +141,30 @@
.. _threading: #threading
+What about that trailing text on serialised Elements?
+-----------------------------------------------------
+
+The ElementTree tree model defines an Element as a container with a tag name,
+contained text, child Elements and a tail text. This means that whenever you
+serialise an Element, you will get all parts of that Element::
+
+ >>> from lxml import etree
+ >>> root = etree.XML("<root><tag>text<child/></tag>tail</root>")
+ >>> print(etree.tostring(root[0]).decode())
+ <tag>text<child/></tag>tail
+
+This is a huge simplification for the tree model, as it keeps text nodes out
+of the list of children and makes access to the text quick and simple. It is
+a benefit in most applications and simplifies many, many XML tree
+algorithms.
+
+However, in document-like XML (and especially HTML), the above result can
+surprise new users and sometimes takes a bit of extra work to handle. A good
+way to deal with this is to use helper functions that copy the Element without
+its tail. The ``lxml.html`` package also deals with this in a couple of
+places, as most HTML algorithms benefit from tail-free behaviour.
+
+
Installation
============
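A helper of the kind the new FAQ entry mentions (one that copies an Element without its tail) might look roughly like the sketch below. The name ``copy_without_tail`` is hypothetical and not part of lxml. The fallback to the standard library's ``xml.etree.ElementTree`` is an assumption made only so the snippet runs where lxml is not installed; both libraries share the text/tail model described in the entry.

```python
import copy

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree is close enough for this sketch.
    import xml.etree.ElementTree as etree


def copy_without_tail(element):
    """Hypothetical helper: deep-copy ``element`` and drop the copy's tail."""
    clone = copy.deepcopy(element)  # the copy initially carries the tail too
    clone.tail = None               # only the copy is changed
    return clone


root = etree.XML("<root><tag>text<child/></tag>tail</root>")
tag_only = copy_without_tail(root[0])

# Serialising the copy no longer emits the trailing "tail" text.
serialised = etree.tostring(tag_only, encoding="unicode")
print(serialised)

# The original tree is untouched.
print(repr(root[0].tail))
```

Working on a copy leaves the source tree intact, at the cost of duplicating the subtree. Later lxml releases also accept ``with_tail=False`` in ``etree.tostring()``, which avoids the copy when only the serialised output matters.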