[Lxml-checkins] r43438 - in lxml/trunk: doc src/lxml/tests

scoder at codespeak.net scoder at codespeak.net
Wed May 16 22:19:19 CEST 2007


Author: scoder
Date: Wed May 16 22:19:17 2007
New Revision: 43438

Added:
   lxml/trunk/doc/tutorial.txt
Modified:
   lxml/trunk/doc/main.txt
   lxml/trunk/doc/mkhtml.py
   lxml/trunk/src/lxml/tests/test_etree.py
Log:
first take on an lxml.etree tutorial

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt	(original)
+++ lxml/trunk/doc/main.txt	Wed May 16 22:19:17 2007
@@ -4,8 +4,8 @@
 .. contents::
 .. 
    1  Introduction
-   2  Download
-   3  Documentation
+   2  Documentation
+   3  Download
    4  Mailing list
    5  License
    6  Old Versions
@@ -25,42 +25,6 @@
 .. _FAQ:          FAQ.html
 
 
-Download
---------
-
-The best way to download binary versions is to visit `lxml at the Python
-cheeseshop`_.  It has the source, eggs and installers for various platforms.
-The source distribution is signed with `this key`_.
-
-.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/
-.. _`this key`: pubkey.asc
-
-The latest version is `lxml 1.3beta`_, released 2007-02-27 (`changes for 1.3beta`_).
-`Older versions`_ are listed below.
-
-.. _`lxml 1.3beta`: lxml-1.3beta.tgz
-.. _`CHANGES for 1.3beta`: changes-1.3beta.html
-.. _`Older versions`: #old-versions
-
-Please take a look at the `installation instructions`_!
-
-.. _`installation instructions`: installation.html
-
-It's also possible to check out the latest development version of lxml
-from svn directly, using a command like this::
-
-  svn co http://codespeak.net/svn/lxml/trunk lxml
-
-You can also `browse it through the web`_.  Please read `how to build lxml
-from source`_ first.  The `latest CHANGES`_ of the developer version are also
-accessible.  You can check there if a bug you found has been fixed or a
-feature you want has been implemented in the latest trunk version.
-
-.. _`how to build lxml from source`: build.html
-.. _`browse it through the web`: http://codespeak.net/svn/lxml
-.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
-
-
 Documentation
 -------------
 
@@ -74,6 +38,8 @@
 
 * lxml.etree:
 
+  * the `lxml.etree Tutorial`_
+
   * `lxml.etree specific API`_ documentation
 
   * parsing_ and validating_ XML
@@ -95,17 +61,19 @@
   * a brief comparison of `objectify and etree`_
 
 lxml.etree follows the ElementTree_ API as much as possible, building it on
-top of the native libxml2 tree.  See also the ElementTree compatibility_
-overview and the `benchmark results`_ comparing lxml to the original
-ElementTree_ and cElementTree_ implementations.
-
-Right after the ElementTree_ documentation, the most important place to look
-is the `lxml.etree specific API`_ documentation.  It describes how lxml extends the
-ElementTree API to expose libxml2 and libxslt specific functionality, such as
-XPath_, `Relax NG`_, `XML Schema`_, `XSLT`_, and `c14n`_.  Python code can be
-called from XPath expressions and XSLT stylesheets through the use of
-`extension functions`_.  lxml also offers a `SAX compliant API`_, that works
-with the SAX support in the standard library.
+top of the native libxml2 tree.  If you are new to ElementTree, start with the
+`lxml.etree Tutorial`_.  See also the ElementTree compatibility_ overview and
+the `benchmark results`_ comparing lxml to the original ElementTree_ and
+cElementTree_ implementations.
+
+Right after the `lxml.etree Tutorial`_ and the ElementTree_ documentation, the
+most important place to look is the `lxml.etree specific API`_ documentation.
+It describes how lxml extends the ElementTree API to expose libxml2 and
+libxslt specific functionality, such as XPath_, `Relax NG`_, `XML Schema`_,
+`XSLT`_, and `c14n`_.  Python code can be called from XPath expressions and
+XSLT stylesheets through the use of `extension functions`_.  lxml also offers
+a `SAX compliant API`_ that works with the SAX support in the standard
+library.
 
 There is a separate module `lxml.objectify`_ that implements a data-binding
 API on top of lxml.etree.  See the `objectify and etree`_ FAQ entry for a
@@ -120,6 +88,7 @@
 .. _ElementTree:  http://effbot.org/zone/element-index.htm
 .. _cElementTree: http://effbot.org/zone/celementtree.htm
 
+.. _`lxml.etree Tutorial`: tutorial.html
 .. _`benchmark results`: performance.html
 .. _`compatibility`: compatibility.html
 .. _`lxml.etree specific API`: api.html
@@ -140,6 +109,42 @@
 .. _`c14n`: http://www.w3.org/TR/xml-c14n
 
 
+Download
+--------
+
+The best way to download binary versions is to visit `lxml at the Python
+cheeseshop`_.  It has the source, eggs and installers for various platforms.
+The source distribution is signed with `this key`_.
+
+.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/
+.. _`this key`: pubkey.asc
+
+The latest version is `lxml 1.3beta`_, released 2007-02-27 (`changes for 1.3beta`_).
+`Older versions`_ are listed below.
+
+.. _`lxml 1.3beta`: lxml-1.3beta.tgz
+.. _`CHANGES for 1.3beta`: changes-1.3beta.html
+.. _`Older versions`: #old-versions
+
+Please take a look at the `installation instructions`_!
+
+.. _`installation instructions`: installation.html
+
+It's also possible to check out the latest development version of lxml
+from svn directly, using a command like this::
+
+  svn co http://codespeak.net/svn/lxml/trunk lxml
+
+You can also `browse it through the web`_.  Please read `how to build lxml
+from source`_ first.  The `latest CHANGES`_ of the developer version are also
+accessible.  You can check there if a bug you found has been fixed or a
+feature you want has been implemented in the latest trunk version.
+
+.. _`how to build lxml from source`: build.html
+.. _`browse it through the web`: http://codespeak.net/svn/lxml
+.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
+
+
 Mailing list
 ------------
 

Modified: lxml/trunk/doc/mkhtml.py
==============================================================================
--- lxml/trunk/doc/mkhtml.py	(original)
+++ lxml/trunk/doc/mkhtml.py	Wed May 16 22:19:17 2007
@@ -4,10 +4,11 @@
 SITE_STRUCTURE = [
     ('lxml', ('main.txt', 'intro.txt', 'FAQ.txt', 'compatibility.txt',
               'performance.txt', 'build.txt')),
-    ('Developing with lxml', ('api.txt', 'parsing.txt', 'validation.txt',
-                              'xpathxslt.txt', 'objectify.txt')),
-    ('Extending lxml', ('resolvers.txt', 'extensions.txt', 'element_classes.txt',
-                        'sax.txt', 'capi.txt')),
+    ('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt',
+                              'validation.txt', 'xpathxslt.txt',
+                              'objectify.txt')),
+    ('Extending lxml', ('resolvers.txt', 'extensions.txt',
+                        'element_classes.txt', 'sax.txt', 'capi.txt')),
     ]
 
 RST2HTML_OPTIONS = " ".join([

Added: lxml/trunk/doc/tutorial.txt
==============================================================================
--- (empty file)
+++ lxml/trunk/doc/tutorial.txt	Wed May 16 22:19:17 2007
@@ -0,0 +1,336 @@
+=======================
+The lxml.etree Tutorial
+=======================
+
+This tutorial briefly introduces the main concepts of the `ElementTree API`_ as
+implemented by lxml.etree, and some simple enhancements that make your life as
+a programmer easier.
+
+.. _`ElementTree API`: http://effbot.org/zone/element-index.htm#documentation
+
+.. contents::
+.. 
+   1  Elements and ElementTrees
+     1.1  The Element class
+     1.2  The ElementTree class
+   2  Parsing and XML literals
+     2.1  The XML() function
+     2.2  The parse() function
+   3  Namespaces
+   4  The find*() methods
+     4.1  findall()
+     4.2  find()
+     4.3  findtext()
+
+
+A common way to import ``lxml.etree`` is as follows::
+
+    >>> from lxml import etree
+
+If your code only uses the ElementTree API and does not rely on any
+functionality that is specific to ``lxml.etree``, you can also use the
+following import chain as a fall-back to the original ElementTree::
+
+    try:
+      from lxml import etree
+      print "running with lxml.etree"
+    except ImportError:
+      try:
+        # Python 2.5
+        import xml.etree.cElementTree as etree
+        print "running with cElementTree on Python 2.5+"
+      except ImportError:
+        try:
+          # Python 2.5
+          import xml.etree.ElementTree as etree
+          print "running with ElementTree on Python 2.5+"
+        except ImportError:
+          try:
+            # normal cElementTree install
+            import cElementTree as etree
+            print "running with cElementTree"
+          except ImportError:
+            try:
+              # normal ElementTree install
+              import elementtree.ElementTree as etree
+              print "running with ElementTree"
+            except ImportError:
+              print "Failed to import ElementTree from any known place"
+
+To aid in writing portable code, the examples in this tutorial make it clear
+which parts of the presented API are lxml.etree extensions to the original
+`ElementTree API`_, as defined by Fredrik Lundh's `ElementTree
+library`_.
+
+.. _`ElementTree library`: http://effbot.org/zone/element-index.htm
+
+
+The Element class
+=================
+
+An ``Element`` is the main container object for the ElementTree API.  Most of
+the XML tree functionality is accessed through this class.  Elements are
+easily created through the ``Element`` factory::
+
+    >>> root = etree.Element("root")
+
+The XML tag name of elements is accessed through the ``tag`` property::
+
+    >>> print root.tag
+    root
+
+Elements are organised in an XML tree structure.  To create child elements and
+add them to a parent element, you can use the ``append()`` method::
+
+    >>> root.append( etree.Element("child1") )
+
+However, a much more efficient and more common way to do this is through the
+``SubElement`` factory.  It accepts the same arguments as the ``Element``
+factory, but additionally requires the parent as its first argument::
+
+    >>> child2 = etree.SubElement(root, "child2")
+    >>> child3 = etree.SubElement(root, "child3")
+
+To see that this is really XML, you can serialise the tree you have created::
+
+    >>> print etree.tostring(root, pretty_print=True)
+    <root>
+      <child1/>
+      <child2/>
+      <child3/>
+    </root>
+
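The same steps can be written as a standalone script.  This is a minimal
sketch that assumes either lxml or the standard library's
``xml.etree.ElementTree`` is importable (the fallback mirrors the import chain
shown earlier).  It avoids ``pretty_print``, an lxml.etree-only keyword, and
checks the result by re-parsing rather than comparing strings, because the two
implementations format empty tags slightly differently.  The helper name
``build_tree()`` is hypothetical.

```python
try:
    from lxml import etree  # lxml.etree, if it is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib ElementTree as a fallback


def build_tree():
    """Build <root> with three children, as in the tutorial above."""
    root = etree.Element("root")
    root.append(etree.Element("child1"))  # create first, then attach
    etree.SubElement(root, "child2")      # create and attach in one step
    etree.SubElement(root, "child3")
    return root


root = build_tree()
xml_bytes = etree.tostring(root)     # serialised XML (bytes on Python 3)
reparsed = etree.fromstring(xml_bytes)  # parse it back to check the structure
print([child.tag for child in reparsed])
```

Re-parsing the output confirms that the serialised form really is XML with
the expected structure, independent of formatting details.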
+
+Elements are lists
+------------------
+
+To make access to these subelements as easy and straightforward as
+possible, elements behave much like normal Python lists::
+
+    >>> child = root[0]
+    >>> print child.tag
+    child1
+
+    >>> for child in root:
+    ...     print child.tag
+    child1
+    child2
+    child3
+
+    >>> if root:
+    ...     print "root has children!"
+    root has children!
+
+    >>> root.insert(0, etree.Element("child0"))
+    >>> start = root[:1]
+    >>> end   = root[-1:]
+
+    >>> print start[0].tag
+    child0
+    >>> print end[0].tag
+    child3
+
+    >>> root[0] = root[-1]
+    >>> for child in root:
+    ...     print child.tag
+    child3
+    child1
+    child2
+
+Note how the last element was moved to the front, replacing ``child0``, in the
+last example.  This is a difference from the original ElementTree (and from lists),
+where elements can sit in multiple positions of any number of trees.  In
+lxml.etree, elements can only sit in one position of one tree at a time.
+
+To retrieve a 'real' Python list of all children (or a *shallow copy* of the
+element children list), you can call the ``getchildren()`` method::
+
+    >>> children = root.getchildren()
+
+    >>> print type(children) is type([])
+    True
+
+    >>> for child in children:
+    ...     print child.tag
+    child3
+    child1
+    child2
+
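The list-like operations that belong to the plain ElementTree API can be
exercised portably.  This sketch assumes lxml or the stdlib
``xml.etree.ElementTree`` and uses ``list(root)`` in place of
``getchildren()``, which is deprecated in newer ElementTree versions and was
removed from the standard library in Python 3.9.  It deliberately leaves out
the element-moving assignment, whose behaviour differs between the two
implementations as described above.

```python
try:
    from lxml import etree  # lxml.etree, if it is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

root.insert(0, etree.Element("child0"))   # like list.insert()

count = len(root)                         # number of direct children
tags = [child.tag for child in root]      # iteration yields direct children
head = [child.tag for child in root[:2]]  # slicing returns a list of elements
children = list(root)                     # a plain list, like getchildren()

del root[0]                               # remove by index, like a list
remaining = [child.tag for child in root]

print(count)
print(tags)
print(head)
print(remaining)
```

``children`` was copied before the deletion, so it still holds all four
elements while the tree itself now has three.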
+The way up in the tree is provided through the ``getparent()`` method::
+
+    >>> root is root[0].getparent()  # lxml.etree only!
+    True
+
+The siblings (or neighbours) of an element are accessed through the
+``getprevious()`` and ``getnext()`` methods::
+
+    >>> root[0] is root[1].getprevious() # lxml.etree only!
+    True
+    >>> root[1] is root[0].getnext() # lxml.etree only!
+    True
+
+
+Elements carry attributes
+-------------------------
+
+XML elements support attributes.  You can create them directly in the Element
+factory::
+
+    >>> root = etree.Element("root", interesting="totally")
+    >>> print etree.tostring(root)
+    <root interesting="totally"/>
+
+Fast and direct access to these attributes is provided by the ``set()`` and
+``get()`` methods of elements::
+
+    >>> print root.get("interesting")
+    totally
+
+    >>> root.set("interesting", "somewhat")
+    >>> print root.get("interesting")
+    somewhat
+
+However, a very convenient way of dealing with them is through the dictionary
+interface of the ``attrib`` property::
+
+    >>> attributes = root.attrib
+
+    >>> print attributes["interesting"]
+    somewhat
+
+    >>> print attributes.get("hello")
+    None
+
+    >>> attributes["hello"] = "Guten Tag"
+    >>> print attributes.get("hello")
+    Guten Tag
+    >>> print root.get("hello")
+    Guten Tag
+
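The attribute calls above that are shared with the original ElementTree can
be combined in one portable snippet.  A minimal sketch, assuming lxml or the
stdlib ``xml.etree.ElementTree``; it also uses the optional default argument
of ``get()`` and the dictionary-style ``keys()`` and ``items()`` methods,
which both implementations provide.

```python
try:
    from lxml import etree  # lxml.etree, if it is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback

# Keyword arguments to the factory become attributes.
root = etree.Element("root", interesting="totally")

root.set("interesting", "somewhat")       # overwrite an existing attribute
root.attrib["hello"] = "Guten Tag"        # writes through to the element

missing = root.get("missing", "default")  # like dict.get() with a default
snapshot = dict(root.attrib)              # copy into a plain dict
names = sorted(root.keys())               # attribute names

print(sorted(root.items()))
```

Sorting the items keeps the printed result independent of attribute order.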
+
+Elements carry text
+-------------------
+
+Elements can contain text::
+
+    >>> root = etree.Element("root")
+    >>> root.text = "TEXT"
+
+    >>> print root.text
+    TEXT
+
+    >>> print etree.tostring(root)
+    <root>TEXT</root>
+
+In many XML documents (so-called *data-centric* documents), this is the only
+place where text can be found.  It is encapsulated by a leaf tag somewhere in
+the tree hierarchy.
+
+However, if XML is used for tagged text documents such as (X)HTML, text can
+also appear between different elements, right in the middle of the tree::
+
+    <html><body>Hello<br/>World</body></html>
+
+Here, the ``<br/>`` tag is surrounded by text.  This is often referred to as
+*document-style* XML.  Elements support this through their ``tail`` property.
+It contains the text that directly follows the element, up to the next element
+in the XML tree::
+
+    >>> html = etree.Element("html")
+    >>> body = etree.SubElement(html, "body")
+    >>> body.text = "TEXT"
+
+    >>> print etree.tostring(html)
+    <html><body>TEXT</body></html>
+
+    >>> br = etree.SubElement(body, "br")
+    >>> print etree.tostring(html)
+    <html><body>TEXT<br/></body></html>
+
+    >>> br.tail = "TAIL"
+    >>> print etree.tostring(html)
+    <html><body>TEXT<br/>TAIL</body></html>
+
+These two properties are enough to represent any text content in an XML
+document.  If you want to read the text without the intermediate tags,
+however, you have to recursively concatenate all ``text`` and ``tail``
+properties in the correct order.  A simpler way to do this is XPath_::
+
+    >>> print html.xpath("string()") # lxml.etree only!
+    TEXTTAIL
+
+.. _XPath: xpathxslt.html#xpath
+
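The recursive concatenation mentioned above can be written with the plain
ElementTree API for comparison.  This is a minimal sketch, not how lxml
evaluates ``string()``: the helper name ``collect_text()`` is hypothetical, and
it assumes the tree holds only elements and text (no comments or processing
instructions, whose content lxml would otherwise pick up as ``text``).

```python
try:
    from lxml import etree  # lxml.etree, if it is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback


def collect_text(element):
    """Concatenate element.text and, for every descendant, its text and
    tail, in document order.  The tail of ``element`` itself is skipped,
    because it lies outside the element."""
    parts = [element.text or ""]
    for child in element:
        parts.append(collect_text(child))  # text inside the child subtree
        parts.append(child.tail or "")     # text after the child's end tag
    return "".join(parts)


html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")
br.tail = "TAIL"

print(collect_text(html))
```

For the tree built in the tutorial this yields the same ``TEXTTAIL`` as the
XPath expression.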
+
+Tree iteration
+--------------
+
+For problems like the above, where you want to recursively traverse the tree
+and do something with its elements, tree iteration is a very convenient
+solution.  Elements provide a tree iterator for this purpose.  It yields
+elements in *document order*, i.e. in the order their tags would appear if you
+serialised the tree to XML::
+
+    >>> root = etree.Element("root")
+    >>> etree.SubElement(root, "child").text = "Child 1"
+    >>> etree.SubElement(root, "child").text = "Child 2"
+    >>> etree.SubElement(root, "another").text = "Child 3"
+
+    >>> print etree.tostring(root, pretty_print=True)
+    <root>
+      <child>Child 1</child>
+      <child>Child 2</child>
+      <another>Child 3</another>
+    </root>
+
+    >>> for element in root.getiterator():
+    ...     print element.tag, '-', element.text
+    root - None
+    child - Child 1
+    child - Child 2
+    another - Child 3
+
+If you know you are only interested in a single tag, you can pass its name to
+``getiterator()`` to have it filter for you::
+
+    >>> for element in root.getiterator("child"):
+    ...     print element.tag, '-', element.text
+    child - Child 1
+    child - Child 2
+
+In lxml.etree, elements provide `further iterators`_ for all directions in the
+tree: children, parents (or rather ancestors) and siblings.
+
+.. _`further iterators`: api.html#iteration
+
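To make *document order* concrete, the same traversal can be spelled out as a
small recursive generator: each element is yielded before its children, and
children are visited left to right.  This is an illustrative sketch, not
lxml's implementation; ``walk()`` is a hypothetical helper.  Note that
``getiterator()`` was likewise removed from the standard library in Python
3.9, and newer releases of both libraries offer ``iter()`` instead.

```python
try:
    from lxml import etree  # lxml.etree, if it is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback


def walk(element, tag=None):
    """Yield element and its descendants in document order (parent before
    children, depth first), optionally keeping only those named ``tag``."""
    if tag is None or element.tag == tag:
        yield element
    for child in element:
        for descendant in walk(child, tag):
            yield descendant


root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

for element in walk(root):
    print("%s - %s" % (element.tag, element.text))
```

The output matches the ``getiterator()`` example above, including the
``root - None`` line for the root element, whose ``text`` is unset.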
+
+
+
+The ElementTree class
+=====================
+
+
+Parsing files and XML literals
+==============================
+
+The XML() function
+------------------
+
+The parse() function
+--------------------
+
+Namespaces
+==========
+
+
+ElementPath
+===========
+
+findall()
+---------
+
+find()
+------
+
+findtext()
+----------

Modified: lxml/trunk/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_etree.py	(original)
+++ lxml/trunk/src/lxml/tests/test_etree.py	Wed May 16 22:19:17 2007
@@ -1567,6 +1567,8 @@
     suite.addTests([unittest.makeSuite(ElementIncludeTestCase)])
     suite.addTests([unittest.makeSuite(ETreeC14NTestCase)])
     suite.addTests(
+        [doctest.DocFileSuite('../../../doc/tutorial.txt')])
+    suite.addTests(
         [doctest.DocFileSuite('../../../doc/api.txt')])
     suite.addTests(
         [doctest.DocFileSuite('../../../doc/parsing.txt')])

