[Lxml-checkins] r43438 - in lxml/trunk: doc src/lxml/tests
scoder at codespeak.net
Wed May 16 22:19:19 CEST 2007
Author: scoder
Date: Wed May 16 22:19:17 2007
New Revision: 43438
Added:
lxml/trunk/doc/tutorial.txt
Modified:
lxml/trunk/doc/main.txt
lxml/trunk/doc/mkhtml.py
lxml/trunk/src/lxml/tests/test_etree.py
Log:
first take on an lxml.etree tutorial
Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Wed May 16 22:19:17 2007
@@ -4,8 +4,8 @@
.. contents::
..
1 Introduction
- 2 Download
- 3 Documentation
+ 2 Documentation
+ 3 Download
4 Mailing list
5 License
6 Old Versions
@@ -25,42 +25,6 @@
.. _FAQ: FAQ.html
-Download
---------
-
-The best way to download binary versions is to visit `lxml at the Python
-cheeseshop`_. It has the source, eggs and installers for various platforms.
-The source distribution is signed with `this key`_.
-
-.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/
-.. _`this key`: pubkey.asc
-
-The latest version is `lxml 1.3beta`_, released 2007-02-27 (`changes for 1.3beta`_).
-`Older versions`_ are listed below.
-
-.. _`lxml 1.3beta`: lxml-1.3beta.tgz
-.. _`CHANGES for 1.3beta`: changes-1.3beta.html
-.. _`Older versions`: #old-versions
-
-Please take a look at the `installation instructions`_!
-
-.. _`installation instructions`: installation.html
-
-It's also possible to check out the latest development version of lxml
-from svn directly, using a command like this::
-
- svn co http://codespeak.net/svn/lxml/trunk lxml
-
-You can also `browse it through the web`_. Please read `how to build lxml
-from source`_ first. The `latest CHANGES`_ of the developer version are also
-accessible. You can check there if a bug you found has been fixed or a
-feature you want has been implemented in the latest trunk version.
-
-.. _`how to build lxml from source`: build.html
-.. _`browse it through the web`: http://codespeak.net/svn/lxml
-.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
-
-
Documentation
-------------
@@ -74,6 +38,8 @@
* lxml.etree:
+ * the `lxml.etree Tutorial`_
+
* `lxml.etree specific API`_ documentation
* parsing_ and validating_ XML
@@ -95,17 +61,19 @@
* a brief comparison of `objectify and etree`_
lxml.etree follows the ElementTree_ API as much as possible, building it on
-top of the native libxml2 tree. See also the ElementTree compatibility_
-overview and the `benchmark results`_ comparing lxml to the original
-ElementTree_ and cElementTree_ implementations.
-
-Right after the ElementTree_ documentation, the most important place to look
-is the `lxml.etree specific API`_ documentation. It describes how lxml extends the
-ElementTree API to expose libxml2 and libxslt specific functionality, such as
-XPath_, `Relax NG`_, `XML Schema`_, `XSLT`_, and `c14n`_. Python code can be
-called from XPath expressions and XSLT stylesheets through the use of
-`extension functions`_. lxml also offers a `SAX compliant API`_, that works
-with the SAX support in the standard library.
+top of the native libxml2 tree. If you are new to ElementTree, start with the
+`lxml.etree Tutorial`_. See also the ElementTree compatibility_ overview and
+the `benchmark results`_ comparing lxml to the original ElementTree_ and
+cElementTree_ implementations.
+
+Right after the `lxml.etree Tutorial`_ and the ElementTree_ documentation, the
+most important place to look is the `lxml.etree specific API`_ documentation.
+It describes how lxml extends the ElementTree API to expose libxml2 and
+libxslt specific functionality, such as XPath_, `Relax NG`_, `XML Schema`_,
+`XSLT`_, and `c14n`_. Python code can be called from XPath expressions and
+XSLT stylesheets through the use of `extension functions`_. lxml also offers
+a `SAX compliant API`_ that works with the SAX support in the standard
+library.
There is a separate module `lxml.objectify`_ that implements a data-binding
API on top of lxml.etree. See the `objectify and etree`_ FAQ entry for a
@@ -120,6 +88,7 @@
.. _ElementTree: http://effbot.org/zone/element-index.htm
.. _cElementTree: http://effbot.org/zone/celementtree.htm
+.. _`lxml.etree Tutorial`: tutorial.html
.. _`benchmark results`: performance.html
.. _`compatibility`: compatibility.html
.. _`lxml.etree specific API`: api.html
@@ -140,6 +109,42 @@
.. _`c14n`: http://www.w3.org/TR/xml-c14n
+Download
+--------
+
+The best way to download binary versions is to visit `lxml at the Python
+cheeseshop`_. It has the source, eggs and installers for various platforms.
+The source distribution is signed with `this key`_.
+
+.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/
+.. _`this key`: pubkey.asc
+
+The latest version is `lxml 1.3beta`_, released 2007-02-27 (`changes for 1.3beta`_).
+`Older versions`_ are listed below.
+
+.. _`lxml 1.3beta`: lxml-1.3beta.tgz
+.. _`CHANGES for 1.3beta`: changes-1.3beta.html
+.. _`Older versions`: #old-versions
+
+Please take a look at the `installation instructions`_!
+
+.. _`installation instructions`: installation.html
+
+It's also possible to check out the latest development version of lxml
+from svn directly, using a command like this::
+
+ svn co http://codespeak.net/svn/lxml/trunk lxml
+
+You can also `browse it through the web`_. Please read `how to build lxml
+from source`_ first. The `latest CHANGES`_ of the developer version are also
+accessible. You can check there if a bug you found has been fixed or a
+feature you want has been implemented in the latest trunk version.
+
+.. _`how to build lxml from source`: build.html
+.. _`browse it through the web`: http://codespeak.net/svn/lxml
+.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
+
+
Mailing list
------------
Modified: lxml/trunk/doc/mkhtml.py
==============================================================================
--- lxml/trunk/doc/mkhtml.py (original)
+++ lxml/trunk/doc/mkhtml.py Wed May 16 22:19:17 2007
@@ -4,10 +4,11 @@
SITE_STRUCTURE = [
('lxml', ('main.txt', 'intro.txt', 'FAQ.txt', 'compatibility.txt',
'performance.txt', 'build.txt')),
- ('Developing with lxml', ('api.txt', 'parsing.txt', 'validation.txt',
- 'xpathxslt.txt', 'objectify.txt')),
- ('Extending lxml', ('resolvers.txt', 'extensions.txt', 'element_classes.txt',
- 'sax.txt', 'capi.txt')),
+ ('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt',
+ 'validation.txt', 'xpathxslt.txt',
+ 'objectify.txt')),
+ ('Extending lxml', ('resolvers.txt', 'extensions.txt',
+ 'element_classes.txt', 'sax.txt', 'capi.txt')),
]
RST2HTML_OPTIONS = " ".join([
Added: lxml/trunk/doc/tutorial.txt
==============================================================================
--- (empty file)
+++ lxml/trunk/doc/tutorial.txt Wed May 16 22:19:17 2007
@@ -0,0 +1,336 @@
+=======================
+The lxml.etree Tutorial
+=======================
+
+This tutorial gives a brief overview of the main concepts of the `ElementTree API`_
+as implemented by lxml.etree, and of some simple enhancements that make your
+life as a programmer easier.
+
+.. _`ElementTree API`: http://effbot.org/zone/element-index.htm#documentation
+
+.. contents::
+..
+ 1 Elements and ElementTrees
+ 1.1 The Element class
+ 1.2 The ElementTree class
+ 2 Parsing and XML literals
+ 2.1 The XML() function
+ 2.2 The parse() function
+ 3 Namespaces
+ 4 The find*() methods
+ 4.1 findall()
+ 4.2 find()
+ 4.3 findtext()
+
+
+A common way to import ``lxml.etree`` is as follows::
+
+ >>> from lxml import etree
+
+If your code only uses the ElementTree API and does not rely on any
+functionality that is specific to ``lxml.etree``, you can also use the
+following import chain as a fall-back to the original ElementTree::
+
+ try:
+     from lxml import etree
+     print "running with lxml.etree"
+ except ImportError:
+     try:
+         # Python 2.5
+         import xml.etree.cElementTree as etree
+         print "running with cElementTree on Python 2.5+"
+     except ImportError:
+         try:
+             # Python 2.5
+             import xml.etree.ElementTree as etree
+             print "running with ElementTree on Python 2.5+"
+         except ImportError:
+             try:
+                 # normal cElementTree install
+                 import cElementTree as etree
+                 print "running with cElementTree"
+             except ImportError:
+                 try:
+                     # normal ElementTree install
+                     import elementtree.ElementTree as etree
+                     print "running with ElementTree"
+                 except ImportError:
+                     print "Failed to import ElementTree from any known place"
+
+To aid in writing portable code, this tutorial makes it clear in the examples
+which part of the presented API is an extension of lxml.etree over the
+original `ElementTree API`_, as defined by Fredrik Lundh's `ElementTree
+library`_.
+
+.. _`ElementTree library`: http://effbot.org/zone/element-index.htm
+
+
+The Element class
+=================
+
+An ``Element`` is the main container object for the ElementTree API. Most of
+the XML tree functionality is accessed through this class. Elements are
+easily created through the ``Element`` factory::
+
+ >>> root = etree.Element("root")
+
+The XML tag name of elements is accessed through the ``tag`` property::
+
+ >>> print root.tag
+ root
+
+Elements are organised in an XML tree structure. To create child elements and
+add them to a parent element, you can use the ``append()`` method::
+
+ >>> root.append( etree.Element("child1") )
+
+However, a much more efficient and more common way to do this is through the
+``SubElement`` factory. It accepts the same arguments as the ``Element``
+factory, but additionally requires the parent as first argument::
+
+ >>> child2 = etree.SubElement(root, "child2")
+ >>> child3 = etree.SubElement(root, "child3")
+
+To see that this is really XML, you can serialise the tree you have created::
+
+ >>> print etree.tostring(root, pretty_print=True)
+ <root>
+   <child1/>
+   <child2/>
+   <child3/>
+ </root>
+
+
+Elements are lists
+------------------
+
+To make access to these subelements as easy and straightforward as
+possible, elements mimic the behaviour of normal Python lists::
+
+ >>> child = root[0]
+ >>> print child.tag
+ child1
+
+ >>> for child in root:
+ ...     print child.tag
+ child1
+ child2
+ child3
+
+ >>> if root:
+ ...     print "root has children!"
+ root has children!
+
+ >>> root.insert(0, etree.Element("child0"))
+ >>> start = root[:1]
+ >>> end = root[-1:]
+
+ >>> print start[0].tag
+ child0
+ >>> print end[0].tag
+ child3
+
+ >>> root[0] = root[-1]
+ >>> for child in root:
+ ...     print child.tag
+ child3
+ child1
+ child2
+
+Note how the last element was *moved* to a different position in the previous
+example, not copied. This differs from the original ElementTree (and from lists),
+where elements can sit in multiple positions of any number of trees. In
+lxml.etree, elements can only sit in one position of one tree at a time.
+
+To retrieve a 'real' Python list of all children (or a *shallow copy* of the
+element children list), you can call the ``getchildren()`` method::
+
+ >>> children = root.getchildren()
+
+ >>> print type(children) is type([])
+ True
+
+ >>> for child in children:
+ ...     print child.tag
+ child3
+ child1
+ child2
+
+The way up in the tree is provided through the ``getparent()`` method::
+
+ >>> root is root[0].getparent() # lxml.etree only!
+ True
+
+The siblings (or neighbours) of an element are accessed as next and previous
+elements::
+
+ >>> root[0] is root[1].getprevious() # lxml.etree only!
+ True
+ >>> root[1] is root[0].getnext() # lxml.etree only!
+ True
+
+
+Elements carry attributes
+-------------------------
+
+XML elements support attributes. You can create them directly in the Element
+factory::
+
+ >>> root = etree.Element("root", interesting="totally")
+ >>> print etree.tostring(root)
+ <root interesting="totally"/>
+
+Fast and direct access to these attributes is provided by the ``set()`` and
+``get()`` methods of elements::
+
+ >>> print root.get("interesting")
+ totally
+
+ >>> root.set("interesting", "somewhat")
+ >>> print root.get("interesting")
+ somewhat
+
+However, a very convenient way of dealing with them is through the dictionary
+interface of the ``attrib`` property::
+
+ >>> attributes = root.attrib
+
+ >>> print attributes["interesting"]
+ somewhat
+
+ >>> print attributes.get("hello")
+ None
+
+ >>> attributes["hello"] = "Guten Tag"
+ >>> print attributes.get("hello")
+ Guten Tag
+ >>> print root.get("hello")
+ Guten Tag
+
+
+Elements carry text
+-------------------
+
+Elements can contain text::
+
+ >>> root = etree.Element("root")
+ >>> root.text = "TEXT"
+
+ >>> print root.text
+ TEXT
+
+ >>> print etree.tostring(root)
+ <root>TEXT</root>
+
+In many XML documents (so-called *data-centric* documents), this is the only
+place where text can be found. It is encapsulated by a leaf tag somewhere in
+the tree hierarchy.
+
+However, if XML is used for tagged text documents such as (X)HTML, text can
+also appear between different elements, right in the middle of the tree::
+
+ <html><body>Hello<br/>World</body></html>
+
+Here, the ``<br/>`` tag is surrounded by text. This is often referred to as
+*document-style* XML. Elements support this through their ``tail`` property.
+It contains the text that directly follows the element, up to the next element
+in the XML tree::
+
+ >>> html = etree.Element("html")
+ >>> body = etree.SubElement(html, "body")
+ >>> body.text = "TEXT"
+
+ >>> print etree.tostring(html)
+ <html><body>TEXT</body></html>
+
+ >>> br = etree.SubElement(body, "br")
+ >>> print etree.tostring(html)
+ <html><body>TEXT<br/></body></html>
+
+ >>> br.tail = "TAIL"
+ >>> print etree.tostring(html)
+ <html><body>TEXT<br/>TAIL</body></html>
+
+These two properties are enough to represent any text content in an XML
+document. If you want to read the text without the intermediate tags,
+however, you have to recursively concatenate all ``text`` and ``tail``
+attributes in the correct order. A simpler way to do this is XPath_::
+
+ >>> print html.xpath("string()") # lxml.etree only!
+ TEXTTAIL
+
+.. _XPath: xpathxslt.html#xpath
+
+
+Tree iteration
+--------------
+
+For problems like the above, where you want to recursively traverse the tree
+and do something with its elements, tree iteration is a very convenient
+solution. Elements provide a tree iterator for this purpose. It yields
+elements in *document order*, i.e. in the order their tags would appear if you
+serialised the tree to XML::
+
+ >>> root = etree.Element("root")
+ >>> etree.SubElement(root, "child").text = "Child 1"
+ >>> etree.SubElement(root, "child").text = "Child 2"
+ >>> etree.SubElement(root, "another").text = "Child 3"
+
+ >>> print etree.tostring(root, pretty_print=True)
+ <root>
+   <child>Child 1</child>
+   <child>Child 2</child>
+   <another>Child 3</another>
+ </root>
+
+ >>> for element in root.getiterator():
+ ...     print element.tag, '-', element.text
+ root - None
+ child - Child 1
+ child - Child 2
+ another - Child 3
+
+If you know you are only interested in a single tag, you can pass its name to
+``getiterator()`` to have it filter for you::
+
+ >>> for element in root.getiterator("child"):
+ ...     print element.tag, '-', element.text
+ child - Child 1
+ child - Child 2
+
+In lxml.etree, elements provide `further iterators`_ for all directions in the
+tree: children, parents (or rather ancestors) and siblings.
+
+.. _`further iterators`: api.html#iteration
+
+
+
+
+The ElementTree class
+=====================
+
+
+Parsing files and XML literals
+==============================
+
+The XML() function
+------------------
+
+The parse() function
+--------------------
+
+Namespaces
+==========
+
+
+ElementPath
+===========
+
+findall()
+---------
+
+find()
+------
+
+findtext()
+----------
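
Note on trying the tutorial.txt examples above: they are written as Python 2 doctests. The following is a minimal sketch of the same portable calls for a Python 3 interpreter, not part of the commit. It assumes only the ElementTree-compatible subset of the API, so it should run with either lxml.etree or the standard library. The lxml-only calls (``getparent()``, ``getnext()``, ``xpath()``) are left out on purpose, and ``iter()`` and ``itertext()`` are used here as stand-ins for ``getiterator()`` and the ``xpath("string()")`` call.

```python
# Prefer lxml.etree, but fall back to the standard library's ElementTree.
# Assumption: only ElementTree-compatible calls are used, so either import works.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Build a small tree with the Element and SubElement factories.
root = etree.Element("root", interesting="totally")
root.append(etree.Element("child1"))
child2 = etree.SubElement(root, "child2")

# Elements can be iterated like lists of their children.
tags = [child.tag for child in root]
print(tags)

# Attributes: get()/set() methods and the dict-like attrib property.
root.set("interesting", "somewhat")
root.attrib["hello"] = "Guten Tag"
print(root.get("interesting"), root.get("hello"))

# text holds the content before the first child,
# tail holds the content that directly follows an element.
body = etree.SubElement(root, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")
br.tail = "TAIL"

# itertext() yields the text and tail values of a subtree in document order.
body_text = "".join(body.itertext())
print(body_text)

# iter() walks the tree in document order; a tag name filters the results.
all_tags = [el.tag for el in root.iter()]
br_tags = [el.tag for el in root.iter("br")]
print(all_tags, br_tags)
```

Serialised output is not compared in this sketch because the two implementations may write empty elements differently.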
Modified: lxml/trunk/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_etree.py (original)
+++ lxml/trunk/src/lxml/tests/test_etree.py Wed May 16 22:19:17 2007
@@ -1567,6 +1567,8 @@
suite.addTests([unittest.makeSuite(ElementIncludeTestCase)])
suite.addTests([unittest.makeSuite(ETreeC14NTestCase)])
suite.addTests(
+ [doctest.DocFileSuite('../../../doc/tutorial.txt')])
+ suite.addTests(
[doctest.DocFileSuite('../../../doc/api.txt')])
suite.addTests(
[doctest.DocFileSuite('../../../doc/parsing.txt')])
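
Note on the test_etree.py change above: registering tutorial.txt with ``doctest.DocFileSuite`` means every ``>>>`` example in the tutorial is executed as part of the test suite. Below is a minimal, self-contained sketch of that mechanism. The sample text and the file name ``sample.txt`` are hypothetical stand-ins for doc/tutorial.txt, and ``module_relative=False`` is used here only so that a plain filesystem path can be passed.

```python
import doctest
import os
import tempfile
import unittest

# Hypothetical stand-in for doc/tutorial.txt: prose with an embedded example.
SAMPLE_DOC = """\
A tiny document
===============

Examples in the text are executed and their output is compared::

    >>> 1 + 1
    2
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w") as f:
        f.write(SAMPLE_DOC)

    # Build a suite from the text file, mirroring the addTests() call style
    # used in the diff above.
    suite = unittest.TestSuite()
    suite.addTests([doctest.DocFileSuite(path, module_relative=False)])

    # Run quietly; the runner's report is discarded.
    with open(os.devnull, "w") as sink:
        result = unittest.TextTestRunner(stream=sink).run(suite)

print(result.testsRun, result.wasSuccessful())
```

Each text file becomes a single test case, so one failing example in a document marks the whole document as failed.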