[Lxml-checkins] r45645 - lxml/branch/lxml-1.3/doc

scoder at codespeak.net
Tue Aug 14 11:04:05 CEST 2007


Author: scoder
Date: Tue Aug 14 11:04:05 2007
New Revision: 45645

Modified:
   lxml/branch/lxml-1.3/doc/FAQ.txt
Log:
FAQ entry on trailing .tail's on serialisation

Modified: lxml/branch/lxml-1.3/doc/FAQ.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/FAQ.txt	(original)
+++ lxml/branch/lxml-1.3/doc/FAQ.txt	Tue Aug 14 11:04:05 2007
@@ -142,6 +142,30 @@
 .. _threading:        #threading
 
 
+What about that trailing text on serialised Elements?
+-----------------------------------------------------
+
+The ElementTree tree model defines an Element as a container with a tag name,
+contained text, child Elements and a tail text.  This means that whenever you
+serialise an Element, you will get all parts of that Element::
+
+    >>> from lxml import etree
+    >>> root = etree.XML("<root><tag>text<child/></tag>tail</root>")
+    >>> print etree.tostring(root[0])
+    <tag>text<child/></tag>tail
+
+This is a huge simplification for the tree model, as it keeps text nodes out
+of the list of children and makes access to them quick and simple.  This is
+a benefit in most applications and simplifies many, many XML tree
+algorithms.
+
+However, in document-like XML (and especially HTML), the above result can
+surprise new users, and working around it sometimes takes a bit of extra
+code.  A good way to deal with this is to use helper functions that copy the
+Element without its tail.  The ``lxml.html`` package also does this in a
+couple of places, as most HTML algorithms benefit from tail-free behaviour.
+
+
 Installation
 ============
 
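The FAQ entry in this revision suggests helper functions that copy an Element
without its tail, but does not show one.  A minimal sketch of such a helper
follows.  The name ``copy_without_tail`` is hypothetical and not part of lxml,
and the fallback import assumes that the standard library's ElementTree, which
follows the same text/tail model, is an acceptable stand-in when lxml is not
installed.

```python
import copy

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree uses the same text/tail model,
    # so it can stand in for lxml in this small example.
    import xml.etree.ElementTree as etree


def copy_without_tail(element):
    """Return a deep copy of ``element`` with its own tail text dropped.

    Hypothetical helper for illustration; not an lxml API.
    """
    clone = copy.deepcopy(element)
    # Only the copied top-level Element loses its tail.  Tails of its
    # descendants are part of its content and are left untouched.
    clone.tail = None
    return clone


root = etree.XML("<root><tag>text<child/>inner</tag>tail</root>")
tag = root[0]

# Serialise the copy; the original tree, including tag.tail, is unchanged.
serialised = etree.tostring(copy_without_tail(tag), encoding="unicode")
print(serialised)
```

Working on a copy leaves the original tree alone, so ``tag.tail`` is still
``"tail"`` afterwards.  The cost is a deep copy of the subtree, which may
matter for large Elements.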

