[Lxml-checkins] r51014 - in lxml/trunk: . doc

scoder at codespeak.net
Fri Jan 25 10:36:10 CET 2008


Author: scoder
Date: Fri Jan 25 10:36:09 2008
New Revision: 51014

Modified:
   lxml/trunk/   (props changed)
   lxml/trunk/doc/tutorial.txt
Log:
 r3316 at delle:  sbehnel | 2008-01-25 09:54:41 +0100
 tutorial update: tostring(with_tail=False) and ElementPath


Modified: lxml/trunk/doc/tutorial.txt
==============================================================================
--- lxml/trunk/doc/tutorial.txt	(original)
+++ lxml/trunk/doc/tutorial.txt	Fri Jan 25 10:36:09 2008
@@ -17,7 +17,9 @@
      1.1  Elements are lists
      1.2  Elements carry attributes
      1.3  Elements contain text
-     1.4  Tree iteration
+     1.4  Using XPath to find text
+     1.5  Tree iteration
+     1.6  Serialisation
    2  The ElementTree class
    3  Parsing from strings and files
      3.1  The fromstring() function
@@ -29,9 +31,6 @@
    4  Namespaces
    5  The E-factory
    6  ElementPath
-     6.1  findall()
-     6.2  find()
-     6.3  findtext()
 
 
 A common way to import ``lxml.etree`` is as follows::
@@ -273,10 +272,42 @@
     >>> print etree.tostring(html)
     <html><body>TEXT<br/>TAIL</body></html>
 
-These two properties are enough to represent any text content in an XML
-document.  If you want to read the text without the intermediate tags,
-however, you have to recursively concatenate all ``text`` and ``tail``
-attributes in the correct order.  A simpler way to do this is XPath_::
+The two properties ``.text`` and ``.tail`` are enough to represent any
+text content in an XML document.  This way, the ElementTree API does
+not require any `special text nodes`_ in addition to the Element
+class, which tend to get in the way fairly often (as you might know
+from classic DOM_ APIs).
+
+However, there are cases where the tail text also gets in the way.
+For example, when you serialise an Element from within the tree, you
+do not always want its tail text in the result (although you would
+still want the tail text of its children).  For this purpose, the
+``tostring()`` function accepts the keyword argument ``with_tail``::
+
+    >>> print etree.tostring(br)
+    <br/>TAIL
+    >>> print etree.tostring(br, with_tail=False) # lxml.etree only!
+    <br/>
+
+.. _`special text nodes`: http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1312295772
+.. _DOM: http://www.w3.org/TR/DOM-Level-3-Core/core.html
+
+If you want to read *only* the text, i.e. without any intermediate
+tags, you would have to recursively concatenate all ``text`` and ``tail``
+attributes in the correct order.  Again, the ``tostring()`` function
+comes to the rescue, this time using the ``method`` keyword::
+
+    >>> print etree.tostring(html, method="text")
+    TEXTTAIL
+
+
+Using XPath to find text
+------------------------
+
+.. _XPath: xpathxslt.html#xpath
+
+Another way to extract the text content of a tree is XPath_, which
+also allows you to extract the separate text chunks into a list::
 
     >>> print html.xpath("string()") # lxml.etree only!
     TEXTTAIL
@@ -315,8 +346,6 @@
     >>> print texts[1].is_tail
     True
 
-.. _XPath: xpathxslt.html#xpath
-
 
 Tree iteration
 --------------
@@ -638,7 +667,9 @@
 or whenever data comes in slowly or in chunks and you want to do other things
 while waiting for the next chunk.
 
-You can reuse the parser by calling its ``feed()`` method again::
+After calling the ``close()`` method (or when an exception was raised
+by the parser), you can reuse the parser by calling its ``feed()``
+method again::
 
     >>> parser.feed("<root/>")
     >>> root = parser.close()
@@ -814,7 +845,7 @@
 The Element creation based on attribute access makes it easy to build up a
 simple vocabulary for an XML language::
 
-    >>> from lxml.builder import ElementMaker
+    >>> from lxml.builder import ElementMaker # lxml only!
 
     >>> E = ElementMaker(namespace="http://my.de/fault/namespace",
     ...                  nsmap={'p' : "http://my.de/fault/namespace"})
@@ -858,11 +889,50 @@
 ElementPath
 ===========
 
-findall()
----------
+The ElementTree library comes with a simple XPath-like path language
+called ElementPath_.  The main difference from XPath is that you can
+use the ``{namespace}tag`` notation in ElementPath expressions.
+However, advanced features like value comparison and functions are
+not available.
+
+.. _ElementPath: http://effbot.org/zone/element-xpath.htm
+.. _`full XPath implementation`: xpathxslt.html#xpath
+
+In addition to a `full XPath implementation`_, lxml.etree supports the
+ElementPath language in the same way ElementTree does, even using
+(almost) the same implementation.  The API provides four methods here
+that you can find on Elements and ElementTrees:
+
+* ``iterfind()`` iterates over all Elements that match the path
+  expression
+
+* ``findall()`` returns a list of matching Elements
+
+* ``find()`` efficiently returns only the first match
+
+* ``findtext()`` returns the ``.text`` content of the first match
+
+Here are some examples::
+
+    >>> root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")
+
+Find a child of an Element::
+
+    >>> print root.find("b")
+    None
+    >>> print root.find("a").tag
+    a
 
-find()
-------
+Find an Element anywhere in the tree::
 
-findtext()
-----------
+    >>> print root.find(".//b").tag
+    b
+    >>> [ b.tag for b in root.iterfind(".//b") ]
+    ['b', 'b']
+
+Find Elements with a certain attribute::
+
+    >>> print root.findall(".//a[@x]")[0].tag
+    a
+    >>> print root.findall(".//a[@y]")
+    []
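
For readers who want to try the text extraction and ElementPath calls
from this change outside the doctest, here is a minimal sketch.  It
makes two assumptions that are not part of the commit: it uses Python 3
syntax, and it uses the standard library's ``xml.etree.ElementTree``
instead of ``lxml.etree`` so that it runs without third-party packages.
The tutorial text states that lxml.etree supports ElementPath in the
same way ElementTree does, so the results should match.  The
``with_tail`` keyword is marked as lxml.etree only and is therefore not
shown here.  The variable names are purely illustrative.

```python
# Sketch using the standard library's ElementTree (an assumption made so
# the example has no third-party dependency); the tutorial uses lxml.etree.
import xml.etree.ElementTree as ET

# Text extraction: method="text" drops the tags and keeps text and tail.
html = ET.fromstring("<html><body>TEXT<br/>TAIL</body></html>")
plain_text = ET.tostring(html, method="text", encoding="unicode")
print(plain_text)  # TEXTTAIL

# ElementPath: same sample document as in the tutorial examples.
root = ET.fromstring("<root><a x='123'>aText<b/><c/><b/></a></root>")

# A bare tag name only matches direct children, so "b" is not found.
direct_b = root.find("b")
a_tag = root.find("a").tag

# ".//" searches the whole subtree below the element.
all_b_tags = [b.tag for b in root.iterfind(".//b")]

# findtext() returns the .text of the first match.
a_text = root.findtext("a")

# "[@x]" keeps only the Elements that carry the attribute "x".
with_x = [el.tag for el in root.findall(".//a[@x]")]
with_y = root.findall(".//a[@y]")

print(direct_b, a_tag, all_b_tags, a_text, with_x, with_y)
```

Note that ``find()`` returns ``None`` and ``findall()`` returns an empty
list when nothing matches, so the two results need different checks.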
