[Lxml-checkins] r39292 - in lxml/trunk: . doc

scoder at codespeak.net
Wed Feb 21 16:43:47 CET 2007


Author: scoder
Date: Wed Feb 21 16:43:46 2007
New Revision: 39292

Added:
   lxml/trunk/doc/parsing.txt
      - copied, changed from r39233, lxml/trunk/doc/api.txt
   lxml/trunk/doc/validation.txt
      - copied, changed from r39233, lxml/trunk/doc/api.txt
   lxml/trunk/doc/xpathxslt.txt
      - copied, changed from r39233, lxml/trunk/doc/api.txt
Modified:
   lxml/trunk/CHANGES.txt
   lxml/trunk/doc/api.txt
   lxml/trunk/doc/main.txt
   lxml/trunk/doc/mkhtml.py
Log:
first take on a major split-up of api.txt

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt	(original)
+++ lxml/trunk/CHANGES.txt	Wed Feb 21 16:43:46 2007
@@ -17,6 +17,11 @@
 
 * The pattern for attribute names in ObjectPath was too restrictive
 
+Other changes
+-------------
+
+* major restructuring in the documentation
+
 
 1.2 (2007-02-20)
 ================

Modified: lxml/trunk/doc/api.txt
==============================================================================
--- lxml/trunk/doc/api.txt	(original)
+++ lxml/trunk/doc/api.txt	Wed Feb 21 16:43:46 2007
@@ -4,23 +4,35 @@
 
 lxml tries to follow established APIs wherever possible.  Sometimes, however,
 the need to expose a feature in an easy way led to the invention of a new API.
+This page describes the major differences and a few additions to the main
+ElementTree API.
+
+Separate pages describe the support for `parsing XML`_, executing `XPath and
+XSLT`_, `validating XML`_ and interfacing with other XML tools through the
+`SAX-API`_.
+
+lxml is extremely extensible through `XPath functions in Python`_, custom
+`Python element classes`_, custom `URL resolvers`_ and even `at the C-level`_.
+
+.. _`parsing XML`: parsing.html
+.. _`XPath and XSLT`: xpathxslt.html
+.. _`validating XML`: validation.html
+.. _`SAX-API`: sax.html
+.. _`XPath functions in Python`: extensions.html
+.. _`Python element classes`: element_classes.html
+.. _`at the C-level`: capi.html
+.. _`URL resolvers`: resolvers.txt
+
 
 .. contents::
 .. 
-   1   lxml.etree
-   2   Other Element APIs
-   3   Trees and Documents
-   4   Iteration
-   5   Parsers
-   6   iterparse and iterwalk
-   7   Error handling on exceptions
-   8   Python unicode strings
-   9   XPath
-   10  XSLT
-   11  RelaxNG
-   12  XMLSchema
-   13  xinclude
-   14  write_c14n on ElementTree
+   1  lxml.etree
+   2  Other Element APIs
+   3  Trees and Documents
+   4  Iteration
+   5  Error handling on exceptions
+   6  xinclude
+   7  write_c14n on ElementTree
 
 
 lxml.etree
@@ -167,208 +179,9 @@
   ['d']
 
 See also the section on the utility functions ``iterparse()`` and
-``iterwalk()`` below.
-
-
-Parsers
--------
+``iterwalk()`` in the `parser documentation`_.
 
-One of the differences is the parser.  There is support for both XML and
-(broken) HTML.  Both are based on libxml2 and therefore only support options
-that are backed by the library.  Parsers take a number of keyword arguments.
-The following is an example for namespace cleanup during parsing, first with
-the default parser, then with a parametrized one::
-
-  >>> xml = '<a xmlns="test"><b xmlns="test"/></a>'
-
-  >>> et     = etree.parse(StringIO(xml))
-  >>> print etree.tostring(et.getroot())
-  <a xmlns="test"><b xmlns="test"/></a>
-
-  >>> parser = etree.XMLParser(ns_clean=True)
-  >>> et     = etree.parse(StringIO(xml), parser)
-  >>> print etree.tostring(et.getroot())
-  <a xmlns="test"><b/></a>
-
-HTML parsing is similarly simple.  The parsers have a ``recover`` keyword
-argument that the HTMLParser sets by default.  It lets libxml2 try its best to
-return something usable without raising an exception.  You should use libxml2
-version 2.6.21 or newer to take advantage of this feature::
-
-  >>> broken_html = "<html><head><title>test<body><h1>page title</h3>"
-
-  >>> parser = etree.HTMLParser()
-  >>> et     = etree.parse(StringIO(broken_html), parser)
-
-  >>> print etree.tostring(et.getroot())
-  <html><head><title>test</title></head><body><h1>page title</h1></body></html>
-
-Lxml has an HTML function, similar to the XML shortcut known from
-ElementTree::
-
-  >>> html = etree.HTML(broken_html)
-  >>> print etree.tostring(html)
-  <html><head><title>test</title></head><body><h1>page title</h1></body></html>
-
-The support for parsing broken HTML depends entirely on libxml2's recovery
-algorithm.  It is *not* the fault of lxml if you find documents that are so
-heavily broken that the parser cannot handle them.  There is also no guarantee
-that the resulting tree will contain all data from the original document.  The
-parser may have to drop seriously broken parts when struggling to keep
-parsing.  Especially misplaced meta tags can suffer from this, which may lead
-to encoding problems.
-
-The use of the libxml2 parsers makes some additional information available at
-the API level.  Currently, ElementTree objects can access the DOCTYPE
-information provided by a parsed document, as well as the XML version and the
-original encoding::
-
-  >>> pub_id  = "-//W3C//DTD XHTML 1.0 Transitional//EN"
-  >>> sys_url = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
-  >>> doctype_string = '<!DOCTYPE html PUBLIC "%s" "%s">' % (pub_id, sys_url)
-  >>> xml_header = '<?xml version="1.0" encoding="ascii"?>'
-  >>> xhtml = xml_header + doctype_string + '<html><body></body></html>'
-
-  >>> tree = etree.parse(StringIO(xhtml))
-  >>> docinfo = tree.docinfo
-  >>> print docinfo.public_id
-  -//W3C//DTD XHTML 1.0 Transitional//EN
-  >>> print docinfo.system_url
-  http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
-  >>> docinfo.doctype == doctype_string
-  True
-
-  >>> print docinfo.xml_version
-  1.0
-  >>> print docinfo.encoding
-  ascii
-
-
-iterparse and iterwalk
-----------------------
-
-As known from ElementTree, the ``iterparse()`` utility function returns an
-iterator that generates parser events for an XML file (or file-like object),
-while building the tree.  The values are tuples ``(event-type, object)``.  The
-event types are 'start', 'end', 'start-ns' and 'end-ns'.
-
-The 'start' and 'end' events represent opening and closing elements and are
-accompanied by the respective element.  By default, only 'end' events are
-generated::
-
-  >>> xml = '''\
-  ... <root>
-  ...   <element key='value'>text</element>
-  ...   <element>text</element>tail
-  ...   <empty-element xmlns="testns" />
-  ... </root>
-  ... '''
-
-  >>> context = etree.iterparse(StringIO(xml))
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  end element
-  end element
-  end {testns}empty-element
-  end root
-
-The resulting tree is available through the ``root`` property of the iterator::
-
-  >>> context.root.tag
-  'root'
-
-The other types can be activated with the ``events`` keyword argument::
-
-  >>> events = ("start", "end")
-  >>> context = etree.iterparse(StringIO(xml), events=events)
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start root
-  start element
-  end element
-  start element
-  end element
-  start {testns}empty-element
-  end {testns}empty-element
-  end root
-
-You can modify the element and its descendants when handling the 'end' event.
-To save memory, for example, you can remove subtrees that are no longer
-needed::
-
-  >>> context = etree.iterparse(StringIO(xml))
-  >>> for action, elem in context:
-  ...     print len(elem),
-  ...     elem.clear()
-  0 0 0 3
-  >>> context.root.getchildren()
-  []
-
-**WARNING**: During the 'start' event, the descendants and following siblings
-are not yet available and should not be accessed.  During the 'end' event, the
-element and its descendants can be freely modified, but its following siblings
-should not be accessed.  During either of the two events, you **must not**
-modify or move the ancestors (parents) of the current element.  You should
-also avoid moving or discarding the element itself.  The golden rule is: do
-not touch anything that will have to be touched again by the parser later on.
-
-If you have elements with a long list of children in your XML file and want to
-save more memory during parsing, you can clean up the preceding siblings of
-the current element::
-
-  >>> for event, element in etree.iterparse(StringIO(xml)):
-  ...     # ... do something with the element
-  ...     element.clear()                # clean up children
-  ...     if element.getprevious():      # clean up preceding siblings
-  ...         del element.getparent()[0]
-
-You can use ``while`` instead of ``if`` if you skipped siblings using the
-``tag`` keyword argument.  The more selective your tag is, however, the more
-thought you will have to put into finding the right way to clean up the
-elements that were skipped.  Therefore, it is sometimes easier to traverse all
-elements and do the tag selection by hand in the event handler code.
-
-The 'start-ns' and 'end-ns' events notify about namespace declarations and
-generate tuples ``(prefix, URI)``::
-
-  >>> events = ("start-ns", "end-ns")
-  >>> context = etree.iterparse(StringIO(xml), events=events)
-  >>> for action, obj in context:
-  ...     print action, obj
-  start-ns ('', 'testns')
-  end-ns None
-
-It is common practice to use a list as namespace stack and pop the last entry
-on the 'end-ns' event.
-
-lxml.etree supports two extensions compared to ElementTree.  It accepts a
-``tag`` keyword argument just like ``element.getiterator(tag)``.  This
-restricts events to a specific tag or namespace.
-
-  >>> context = etree.iterparse(StringIO(xml), tag="element")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  end element
-  end element
-
-  >>> events = ("start", "end")
-  >>> context = etree.iterparse(StringIO(xml), events=events, tag="{testns}*")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start {testns}empty-element
-  end {testns}empty-element
-
-The second extension is the ``iterwalk()`` function.  It behaves exactly like
-``iterparse()``, but works on Elements and ElementTrees::
-
-  >>> root = context.root
-  >>> context = etree.iterwalk(root, events=events, tag="element")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start element
-  end element
-  start element
-  end element
+.. _`parser documentation`: parsing.html#iterparse-and-iterwalk
 
 
 Error handling on exceptions
@@ -415,467 +228,6 @@
 etc. which are described in their respective sections below.
 
 
-Python unicode strings
-----------------------
-
-lxml.etree has broader support for Python unicode strings than the ElementTree
-library.  First of all, where ElementTree would raise an exception, the
-parsers in lxml.etree can handle unicode strings straight away.  This is most
-helpful for XML snippets embedded in source code using the ``XML()``
-function::
-
-  >>> uxml = u'<test> \uf8d1 + \uf8d2 </test>'
-  >>> uxml
-  u'<test> \uf8d1 + \uf8d2 </test>'
-  >>> root = etree.XML(uxml)
-
-This requires, however, that unicode strings do not specify a conflicting
-encoding themselves and thus lie about their real encoding::
-
-  >>> etree.XML(u'<?xml version="1.0" encoding="ASCII"?>\n' + uxml)
-  Traceback (most recent call last):
-    ...
-  ValueError: Unicode strings with encoding declaration are not supported.
-
-Similarly, you will get errors when you try the same with HTML data in a
-unicode string that specifies a charset in a meta tag of the header.  You
-should generally avoid converting XML/HTML data to unicode before passing it
-into the parsers.  It is both slower and error prone.
-
-To serialize the result, you would normally use the ``tostring`` module
-function, which serializes to plain ASCII by default or a number of other
-encodings if asked for::
-
-  >>> etree.tostring(root)
-  '<test> &#63697; + &#63698; </test>'
-
-  >>> etree.tostring(root, 'UTF-8', xml_declaration=False)
-  '<test> \xef\xa3\x91 + \xef\xa3\x92 </test>'
-
-As an extension, lxml.etree has a new ``tounicode()`` function that you can
-call on XML tree objects to retrieve a Python unicode representation::
-
-  >>> etree.tounicode(root)
-  u'<test> \uf8d1 + \uf8d2 </test>'
-
-  >>> el = etree.Element("test")
-  >>> etree.tounicode(el)
-  u'<test/>'
-
-  >>> subel = etree.SubElement(el, "subtest")
-  >>> etree.tounicode(el)
-  u'<test><subtest/></test>'
-
-  >>> et = etree.ElementTree(el)
-  >>> etree.tounicode(et)
-  u'<test><subtest/></test>'
-
-The result of ``tounicode()`` can be treated like any other Python unicode
-string and then passed back into the parsers.  However, if you want to save
-the result to a file or pass it over the network, you should use ``write()``
-or ``tostring()`` with an encoding argument (typically UTF-8) to serialize the
-XML.  The main reason is that unicode strings returned by ``tounicode()``
-never have an XML declaration and therefore do not specify their encoding.
-These strings are most likely not parsable by other XML libraries.
-
-In contrast, the ``tostring()`` function automatically adds a declaration as
-needed that reflects the encoding of the returned string.  This makes it
-possible for other parsers to correctly parse the XML byte stream.  Note that
-using ``tostring()`` with UTF-8 is also considerably faster in most cases.
-
-
-XPath
------
-
-lxml.etree supports the simple path syntax of the ``findall()`` etc.  methods
-on ElementTree and Element, as known from the original ElementTree library.
-As an extension, these classes also provide an ``xpath()`` method that
-supports expressions in the complete XPath syntax.
-
-There are also specialized XPath evaluator classes that are more efficient for
-frequent evaluation: ``XPath`` and ``XPathEvaluator``.  See the `performance
-comparison`_ to learn when to use which.  Their semantics when used on
-Elements and ElementTrees are the same as for the ``xpath()`` method described
-here.
-
-.. _`performance comparison`: performance.html#xpath
-
-For ElementTree, the xpath method performs a global XPath query against the
-document (if absolute) or against the root node (if relative)::
-
-  >>> f = StringIO('<foo><bar></bar></foo>')
-  >>> tree = etree.parse(f)
-
-  >>> r = tree.xpath('/foo/bar')
-  >>> len(r)
-  1
-  >>> r[0].tag
-  'bar'
-
-  >>> r = tree.xpath('bar')
-  >>> r[0].tag
-  'bar'
-
-When ``xpath()`` is used on an element, the XPath expression is evaluated
-against the element (if relative) or against the root tree (if absolute)::
-
-  >>> root = tree.getroot()
-  >>> r = root.xpath('bar')
-  >>> r[0].tag
-  'bar'
-
-  >>> bar = root[0]
-  >>> r = bar.xpath('/foo/bar')
-  >>> r[0].tag
-  'bar'
-
-  >>> tree = bar.getroottree()
-  >>> r = tree.xpath('/foo/bar')
-  >>> r[0].tag
-  'bar'
-
-Optionally, you can provide a ``namespaces`` keyword argument, which should be
-a dictionary mapping the namespace prefixes used in the XPath expression to
-namespace URIs::
-
-  >>> f = StringIO('''\
-  ... <a:foo xmlns:a="http://codespeak.net/ns/test1" 
-  ...       xmlns:b="http://codespeak.net/ns/test2">
-  ...    <b:bar>Text</b:bar>
-  ... </a:foo>
-  ... ''')
-  >>> doc = etree.parse(f)
-  >>> r = doc.xpath('/t:foo/b:bar', {'t': 'http://codespeak.net/ns/test1', 
-  ...                                'b': 'http://codespeak.net/ns/test2'})
-  >>> len(r)
-  1
-  >>> r[0].tag
-  '{http://codespeak.net/ns/test2}bar'
-  >>> r[0].text
-  'Text'
-
-There is also an optional ``extensions`` argument which is used to define
-`extension functions`_ in Python that are local to this evaluation.
-
-.. _`extension functions`: extensions.html
-
-The return values of XPath evaluations vary, depending on the XPath expression
-used:
-
-* True or False, when the XPath expression has a boolean result
-
-* a float, when the XPath expression has a numeric result (integer or float)
-
-* a (unicode) string, when the XPath expression has a string result.
-
-* a list of items, when the XPath expression has a list as result.  The items
-  may include elements, strings and tuples.  Text nodes and attributes in the
-  result are returned as strings (the text node content or attribute value).
-  Comments are also returned as strings, enclosed by the usual ``<!--`` and
-  ``-->`` markers.  Namespace declarations are returned as tuples of strings:
-  ``(prefix, URI)``.
-
-A related convenience method of ElementTree objects is ``getpath(element)``,
-which returns a structural, absolute XPath expression to find that element::
-
-  >>> a  = etree.Element("a")
-  >>> b  = etree.SubElement(a, "b")
-  >>> c  = etree.SubElement(a, "c")
-  >>> d1 = etree.SubElement(c, "d")
-  >>> d2 = etree.SubElement(c, "d")
-
-  >>> tree = etree.ElementTree(c)
-  >>> print tree.getpath(d2)
-  /c/d[2]
-  >>> tree.xpath(tree.getpath(d2)) == [d2]
-  True
-
-
-XSLT
-----
-
-lxml.etree introduces a new class, lxml.etree.XSLT. The class can be
-given an ElementTree object to construct an XSLT transformer::
-
-  >>> f = StringIO('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="/a/b/text()" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> xslt_doc = etree.parse(f)
-  >>> transform = etree.XSLT(xslt_doc)
-
-You can then run the transformation on an ElementTree document by simply
-calling it, and this results in another ElementTree object::
-
-  >>> f = StringIO('<a><b>Text</b></a>')
-  >>> doc = etree.parse(f)
-  >>> result = transform(doc)
-
-The result object can be accessed like a normal ElementTree document::
-
-  >>> result.getroot().text
-  'Text'
-
-but, as opposed to normal ElementTree objects, can also be turned into an (XML
-or text) string by applying the str() function::
-
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-The result is always a plain string, encoded as requested by the
-``xsl:output`` element in the stylesheet.  If you want a Python unicode string
-instead, you should set this encoding to ``UTF-8`` (unless the `ASCII` default
-is sufficient).  This allows you to call the builtin ``unicode()`` function on
-the result::
-
-  >>> unicode(result)
-  u'<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-You can use other encodings at the cost of multiple recoding.  Encodings that
-are not supported by Python will result in an error::
-
-  >>> xslt_tree = etree.XML('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:output encoding="UCS4"/>
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="/a/b/text()" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> transform = etree.XSLT(xslt_tree)
-
-  >>> result = transform(doc)
-  >>> unicode(result)
-  Traceback (most recent call last):
-    [...]
-  LookupError: unknown encoding: UCS4
-
-It is possible to pass parameters, in the form of XPath expressions, to the
-XSLT template::
-
-  >>> xslt_tree = etree.XML('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="$a" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> transform = etree.XSLT(xslt_tree)
-  >>> f = StringIO('<a><b>Text</b></a>')
-  >>> doc = etree.parse(f)
-
-The parameters are passed as keyword parameters to the transform call. First
-let's try passing in a simple string expression::
-
-  >>> result = transform(doc, a="'A'")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>A</foo>\n'
-
-Let's try a non-string XPath expression now::
-
-  >>> result = transform(doc, a="/a/b/text()")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-There's also a convenience method on the tree object for doing XSL
-transformations.  This is less efficient if you want to apply the same XSL
-transformation to multiple documents, but is shorter to write for one-shot
-operations, as you do not have to instantiate a stylesheet yourself::
-
-  >>> result = doc.xslt(xslt_tree, a="'A'")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>A</foo>\n'
-
-By default, XSLT supports all extension functions from libxslt and libexslt as
-well as Python regular expressions through EXSLT.  Note that some extensions
-enable style sheets to read and write files on the local file system.  See the
-`document loader documentation`_ on how to deal with this.
-
-.. _`document loader documentation`: resolvers.html
-
-If you want to know how your stylesheet performed, pass the ``profile_run``
-keyword to the transform::
-
-  >>> result = transform(doc, a="/a/b/text()", profile_run=True)
-  >>> profile = result.xslt_profile
-
-The value of the ``xslt_profile`` property is an ElementTree with profiling
-data about each template, similar to the following::
-
-  <profile>
-    <template rank="1" match="/" name="" mode="" calls="1" time="1" average="1"/>
-  </profile>
-
-Note that this is a read-only document.  You must not move any of its elements
-to other documents.  Please deep-copy the document if you need to modify it.
-If you want to free it from memory, just do::
-
-  >>> del result.xslt_profile
-
-
-RelaxNG
--------
-
-lxml.etree introduces a new class, lxml.etree.RelaxNG. The class can
-be given an ElementTree object to construct a Relax NG validator::
-
-  >>> f = StringIO('''\
-  ... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
-  ...  <zeroOrMore>
-  ...     <element name="b">
-  ...       <text />
-  ...     </element>
-  ...  </zeroOrMore>
-  ... </element>
-  ... ''')
-  >>> relaxng_doc = etree.parse(f)
-  >>> relaxng = etree.RelaxNG(relaxng_doc)
-
-You can then validate some ElementTree document against the schema. You'll get
-back True if the document is valid against the Relax NG schema, and False if
-not::
-
-  >>> valid = StringIO('<a><b></b></a>')
-  >>> doc = etree.parse(valid)
-  >>> relaxng.validate(doc)
-  1
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> relaxng.validate(doc2)
-  0
-
-Calling the schema object has the same effect as calling its validate
-method. This is sometimes used in conditional statements::
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> if not relaxng(doc2):
-  ...     print "invalid!"
-  invalid!
-
-If you prefer getting an exception when validating, you can use the
-``assert_`` or ``assertValid`` methods::
-
-  >>> relaxng.assertValid(doc2)
-  Traceback (most recent call last):
-    [...]
-  DocumentInvalid: Document does not comply with schema
-
-  >>> relaxng.assert_(doc2)
-  Traceback (most recent call last):
-    [...]
-  AssertionError: Document does not comply with schema
-
-Starting with version 0.9, lxml now has a simple API to report the errors
-generated by libxml2. If you want to find out why the validation failed in the
-second case, you can look up the error log of the validation process and check
-it for relevant messages::
-
-  >>> log = relaxng.error_log
-  >>> print log.last_error
-  <string>:1:ERROR:RELAXNGV:ERR_LT_IN_ATTRIBUTE: Did not expect element c there
-
-You can see that the error (ERROR) happened during RelaxNG validation
-(RELAXNGV).  The message then tells you what went wrong.  Note that this error
-is local to the RelaxNG object.  It will only contain log entries that
-appeares during the validation.  The DocumentInvalid exception raised by the
-``assertValid`` method above provides access to the global error log (like all
-other lxml exceptions).
-
-Similar to XSLT, there's also a less efficient but easier shortcut method to
-do one-shot RelaxNG validation::
-
-  >>> doc.relaxng(relaxng_doc)
-  1
-  >>> doc2.relaxng(relaxng_doc)
-  0
-
-
-XMLSchema
----------
-
-lxml.etree also has a XML Schema (XSD) support, using the class
-lxml.etree.XMLSchema. This support is very similar to the Relax NG
-support. The class can be given an ElementTree object to construct a
-XMLSchema validator::
-
-  >>> f = StringIO('''\
-  ... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
-  ... <xsd:element name="a" type="AType"/>
-  ... <xsd:complexType name="AType">
-  ...   <xsd:sequence>
-  ...     <xsd:element name="b" type="xsd:string" />
-  ...   </xsd:sequence>
-  ... </xsd:complexType>
-  ... </xsd:schema>
-  ... ''')
-  >>> xmlschema_doc = etree.parse(f)
-  >>> xmlschema = etree.XMLSchema(xmlschema_doc)
-
-You can then validate some ElementTree document with this. Like with
-RelaxNG, you'll get back true if the document is valid against the XML
-schema, and false if not::
-
-  >>> valid = StringIO('<a><b></b></a>')
-  >>> doc = etree.parse(valid)
-  >>> xmlschema.validate(doc)
-  1
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> xmlschema.validate(doc2)
-  0
-
-Calling the schema object has the same effect as calling its validate
-method. This is sometimes used in conditional statements::
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> if not xmlschema(doc2):
-  ...     print "invalid!"
-  invalid!
-
-If you prefer getting an exception when validating, you can use the
-``assert_`` or ``assertValid`` methods::
-
-  >>> xmlschema.assertValid(doc2)
-  Traceback (most recent call last):
-    [...]
-  DocumentInvalid: Document does not comply with schema
-
-  >>> xmlschema.assert_(doc2)
-  Traceback (most recent call last):
-    [...]
-  AssertionError: Document does not comply with schema
-
-Error reporting works like for the RelaxNG class::
-
-  >>> log = xmlschema.error_log
-  >>> error = log.last_error
-  >>> print error.domain_name
-  SCHEMASV
-  >>> print error.type_name
-  SCHEMAV_ELEMENT_CONTENT
-
-If you were to print this log entry, you would get something like the
-following.  Note that the error message depends on the libxml2 version in
-use::
-
-  <string>:1:ERROR::SCHEMAV_ELEMENT_CONTENT: Element 'c': This element is not expected. Expected is ( b ).
-
-Similar to XSLT and RelaxNG, there's also a less efficient but easier shortcut
-method to do XML Schema validation::
-
-  >>> doc.xmlschema(xmlschema_doc)
-  1
-  >>> doc2.xmlschema(xmlschema_doc)
-  0
-
-
 xinclude
 --------
 

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt	(original)
+++ lxml/trunk/doc/main.txt	Wed Feb 21 16:43:46 2007
@@ -66,6 +66,8 @@
 
   * `lxml.etree specific API`_ documentation
 
+  * `XML validation`_ with RelaxNG and XML Schema
+
   * Python `extension functions`_ for XPath and XSLT
 
   * `custom element classes`_ for custom XML APIs
@@ -109,6 +111,7 @@
 .. _`benchmark results`: performance.html
 .. _`compatibility`: compatibility.html
 .. _`lxml.etree specific API`: api.html
+.. _`XML validation`: validation.html
 .. _`extension functions`: extensions.html
 .. _`custom element classes`: element_classes.html
 .. _`SAX compliant API`: sax.html

Modified: lxml/trunk/doc/mkhtml.py
==============================================================================
--- lxml/trunk/doc/mkhtml.py	(original)
+++ lxml/trunk/doc/mkhtml.py	Wed Feb 21 16:43:46 2007
@@ -14,7 +14,8 @@
     for name in ['main.txt', 'intro.txt', 'api.txt', 'compatibility.txt',
                  'extensions.txt', 'element_classes.txt', 'sax.txt',
                  'build.txt', 'FAQ.txt', 'performance.txt', 'resolvers.txt',
-                 'capi.txt', 'objectify.txt']:
+                 'capi.txt', 'objectify.txt', 'validation.txt',
+                 'xpathxslt.txt', 'parsing.txt']:
         path = os.path.join(doc_dir, name)
         outname = os.path.splitext(name)[0] + '.html'
         outpath = os.path.join(dirname, outname)

Copied: lxml/trunk/doc/parsing.txt (from r39233, lxml/trunk/doc/api.txt)
==============================================================================
--- lxml/trunk/doc/api.txt	(original)
+++ lxml/trunk/doc/parsing.txt	Wed Feb 21 16:43:46 2007
@@ -1,173 +1,15 @@
 =====================
-APIs specific to lxml
+Parsing XML with lxml
 =====================
 
-lxml tries to follow established APIs wherever possible.  Sometimes, however,
-the need to expose a feature in an easy way led to the invention of a new API.
+lxml provides a very simple and powerful API for parsing XML.  It supports
+one-step parsing as well as step-by-step parsing using an event-driven API.
 
 .. contents::
 .. 
-   1   lxml.etree
-   2   Other Element APIs
-   3   Trees and Documents
-   4   Iteration
-   5   Parsers
-   6   iterparse and iterwalk
-   7   Error handling on exceptions
-   8   Python unicode strings
-   9   XPath
-   10  XSLT
-   11  RelaxNG
-   12  XMLSchema
-   13  xinclude
-   14  write_c14n on ElementTree
-
-
-lxml.etree
-----------
-
-lxml.etree tries to follow the `ElementTree API`_ wherever it can.  There are
-however some incompatibilities (see `compatibility`_).  The extensions are
-documented here.
-
-.. _`ElementTree API`: http://effbot.org/zone/element-index.htm
-.. _`compatibility`:   compatibility.html
-
-If you need to know which version of lxml is installed, you can access the
-``lxml.etree.LXML_VERSION`` attribute to retrieve a version tuple.  Note,
-however, that it did not exist before version 1.0, so you will get an
-AttributeError in older versions.  The versions of libxml2 and libxslt are
-available through the attributes ``LIBXML_VERSION`` and ``LIBXSLT_VERSION``.
-
-The following examples usually assume this to be executed first::
-
-  >>> from lxml import etree
-  >>> from StringIO import StringIO
-
-
-Other Element APIs
-------------------
-
-While lxml.etree itself uses the ElementTree API, it is possible to replace
-the Element implementation by `custom element subclasses`_.  This has been
-used to implement well-known XML APIs on top of lxml.  The ``lxml.elements``
-package contains examples.  Currently, there is a data-binding implementation
-called `objectify`_, which is similar to the `Amara bindery`_ tool.
-
-Additionally, the `lxml.elements.classlookup`_ module provides a number of
-different schemes to customize the mapping between libxml2 nodes and the
-Element classes used by lxml.etree.
-
-.. _`custom element subclasses`: namespace_extensions.html
-.. _`objectify`: objectify.html
-.. _`lxml.elements.classlookup`: elements.html#lxml.elements.classlookup
-.. _`Amara bindery`: http://uche.ogbuji.net/tech/4suite/amara/
-
-
-Trees and Documents
--------------------
-
-Compared to the original ElementTree API, lxml.etree has an extended tree
-model.  It knows about parents and siblings of elements::
-
-  >>> root = etree.Element("root")
-  >>> a = etree.SubElement(root, "a")
-  >>> b = etree.SubElement(root, "b")
-  >>> c = etree.SubElement(root, "c")
-  >>> d = etree.SubElement(root, "d")
-  >>> e = etree.SubElement(d,    "e")
-  >>> b.getparent() == root
-  True
-  >>> print b.getnext().tag
-  c
-  >>> print c.getprevious().tag
-  b
-
-Elements always live within a document context in lxml.  This implies that
-there is also a notion of an absolute document root.  You can retrieve an
-ElementTree for the root node of a document from any of its elements::
-
-  >>> tree = d.getroottree()
-  >>> print tree.getroot().tag
-  root
-
-Note that this is different from wrapping an Element in an ElementTree.  You
-can use ElementTrees to create XML trees with an explicit root node::
-
-  >>> tree = etree.ElementTree(d)
-  >>> print tree.getroot().tag
-  d
-  >>> print etree.tostring(tree)
-  <d><e/></d>
-
-All operations that you run on such an ElementTree (like XPath, XSLT, etc.)
-will understand the explicitly chosen root as root node of a document.  They
-will not see any elements outside the ElementTree.  However, ElementTrees do
-not modify their Elements::
-
-  >>> element = tree.getroot()
-  >>> print element.tag
-  d
-  >>> print element.getparent().tag
-  root
-  >>> print element.getroottree().getroot().tag
-  root
-
-The rule is that all operations that are applied to Elements use either the
-Element itself as reference point, or the absolute root of the document that
-contains this Element (e.g. for absolute XPath expressions).  All operations
-on an ElementTree use its explicit root node as reference.
-
-
-Iteration
----------
-
-The ElementTree API makes Elements iterable to support iteration over their
-children.  Using the tree defined above, we get::
-
-  >>> [ el.tag for el in root ]
-  ['a', 'b', 'c', 'd']
-
-Tree traversal is commonly based on the ``element.getiterator()`` method::
-
-  >>> [ el.tag for el in root.getiterator() ]
-  ['root', 'a', 'b', 'c', 'd', 'e']
-
-lxml.etree also supports this, but additionally features an extended API for
-iteration over the children, following/preceding siblings, ancestors and
-descendants of an element, as defined by the respective XPath axis::
-
-  >>> [ el.tag for el in root.iterchildren() ]
-  ['a', 'b', 'c', 'd']
-  >>> [ el.tag for el in root.iterchildren(reversed=True) ]
-  ['d', 'c', 'b', 'a']
-  >>> [ el.tag for el in b.itersiblings() ]
-  ['c', 'd']
-  >>> [ el.tag for el in c.itersiblings(preceding=True) ]
-  ['b', 'a']
-  >>> [ el.tag for el in e.iterancestors() ]
-  ['d', 'root']
-  >>> [ el.tag for el in root.iterdescendants() ]
-  ['a', 'b', 'c', 'd', 'e']
-
-Note how ``element.iterdescendants()`` does not include the element itself, as
-opposed to ``element.getiterator()``.  The latter effectively implements the
-'descendant-or-self' axis in XPath.
-
-All of these iterators support an additional ``tag`` keyword argument that
-filters the generated elements by tag name::
-
-  >>> [ el.tag for el in root.iterchildren(tag='a') ]
-  ['a']
-  >>> [ el.tag for el in d.iterchildren(tag='a') ]
-  []
-  >>> [ el.tag for el in root.iterdescendants(tag='d') ]
-  ['d']
-  >>> [ el.tag for el in root.getiterator(tag='d') ]
-  ['d']
-
-See also the section on the utility functions ``iterparse()`` and
-``iterwalk()`` below.
+   1  Parsers
+   2  iterparse and iterwalk
+   3  Python unicode strings
 
 
 Parsers
@@ -371,50 +213,6 @@
   end element
 
 
-Error handling on exceptions
-----------------------------
-
-Libxml2 provides error messages for failures, be it during parsing, XPath
-evaluation or schema validation.  Whenever an exception is raised, you can
-retrieve the errors that occurred and "might have" led to the problem::
-
-  >>> etree.clearErrorLog()
-  >>> broken_xml = '<a>'
-  >>> try:
-  ...   etree.parse(StringIO(broken_xml))
-  ... except etree.XMLSyntaxError, e:
-  ...   pass # just put the exception into e
-  >>> log = e.error_log.filter_levels(etree.ErrorLevels.FATAL)
-  >>> print log
-  <string>:1:FATAL:PARSER:ERR_TAG_NOT_FINISHED: Premature end of data in tag a line 1
-
-This might look a little cryptic at first, but it is the information that
-libxml2 gives you.  At least the message at the end should give you a hint
-what went wrong and you can see that the fatal error (FATAL) happened during
-parsing (PARSER) line 1 of a string (<string>, or filename if available).
-Here, PARSER is the so-called error domain, see lxml.etree.ErrorDomains for
-that.  You can get it from a log entry like this::
-
-  >>> entry = log[0]
-  >>> print entry.domain_name, entry.type_name, entry.filename
-  PARSER ERR_TAG_NOT_FINISHED <string>
-
-There is also a convenience attribute ``last_error`` that returns the last
-error or fatal error that occurred::
-
-  >>> entry = e.error_log.last_error
-  >>> print entry.domain_name, entry.type_name, entry.filename
-  PARSER ERR_TAG_NOT_FINISHED <string>
-
-Alternatively, lxml.etree supports logging libxml2 messages to the Python
-stdlib logging module.  This is done through the ``etree.PyErrorLog`` class.
-It disables the error reporting from exceptions and forwards log messages to a
-Python logger.  To use it, see the descriptions of the function
-``etree.useGlobalPythonLog`` and the class ``etree.PyErrorLog`` for help.
-Note that this does not affect the local error logs of XSLT, XMLSchema,
-etc. which are described in their respective sections below.
-
-
 Python unicode strings
 ----------------------
 
@@ -482,429 +280,3 @@
 needed that reflects the encoding of the returned string.  This makes it
 possible for other parsers to correctly parse the XML byte stream.  Note that
 using ``tostring()`` with UTF-8 is also considerably faster in most cases.
-
-
-XPath
------
-
-lxml.etree supports the simple path syntax of the ``findall()`` etc.  methods
-on ElementTree and Element, as known from the original ElementTree library.
-As an extension, these classes also provide an ``xpath()`` method that
-supports expressions in the complete XPath syntax.
-
-There are also specialized XPath evaluator classes that are more efficient for
-frequent evaluation: ``XPath`` and ``XPathEvaluator``.  See the `performance
-comparison`_ to learn when to use which.  Their semantics when used on
-Elements and ElementTrees are the same as for the ``xpath()`` method described
-here.
-
-.. _`performance comparison`: performance.html#xpath
-
-For ElementTree, the xpath method performs a global XPath query against the
-document (if absolute) or against the root node (if relative)::
-
-  >>> f = StringIO('<foo><bar></bar></foo>')
-  >>> tree = etree.parse(f)
-
-  >>> r = tree.xpath('/foo/bar')
-  >>> len(r)
-  1
-  >>> r[0].tag
-  'bar'
-
-  >>> r = tree.xpath('bar')
-  >>> r[0].tag
-  'bar'
-
-When ``xpath()`` is used on an element, the XPath expression is evaluated
-against the element (if relative) or against the root tree (if absolute)::
-
-  >>> root = tree.getroot()
-  >>> r = root.xpath('bar')
-  >>> r[0].tag
-  'bar'
-
-  >>> bar = root[0]
-  >>> r = bar.xpath('/foo/bar')
-  >>> r[0].tag
-  'bar'
-
-  >>> tree = bar.getroottree()
-  >>> r = tree.xpath('/foo/bar')
-  >>> r[0].tag
-  'bar'
-
-Optionally, you can provide a ``namespaces`` keyword argument, which should be
-a dictionary mapping the namespace prefixes used in the XPath expression to
-namespace URIs::
-
-  >>> f = StringIO('''\
-  ... <a:foo xmlns:a="http://codespeak.net/ns/test1" 
-  ...       xmlns:b="http://codespeak.net/ns/test2">
-  ...    <b:bar>Text</b:bar>
-  ... </a:foo>
-  ... ''')
-  >>> doc = etree.parse(f)
-  >>> r = doc.xpath('/t:foo/b:bar', {'t': 'http://codespeak.net/ns/test1', 
-  ...                                'b': 'http://codespeak.net/ns/test2'})
-  >>> len(r)
-  1
-  >>> r[0].tag
-  '{http://codespeak.net/ns/test2}bar'
-  >>> r[0].text
-  'Text'
-
-There is also an optional ``extensions`` argument which is used to define
-`extension functions`_ in Python that are local to this evaluation.
-
-.. _`extension functions`: extensions.html
-
-The return values of XPath evaluations vary, depending on the XPath expression
-used:
-
-* True or False, when the XPath expression has a boolean result
-
-* a float, when the XPath expression has a numeric result (integer or float)
-
-* a (unicode) string, when the XPath expression has a string result.
-
-* a list of items, when the XPath expression has a list as result.  The items
-  may include elements, strings and tuples.  Text nodes and attributes in the
-  result are returned as strings (the text node content or attribute value).
-  Comments are also returned as strings, enclosed by the usual ``<!--`` and
-  ``-->`` markers.  Namespace declarations are returned as tuples of strings:
-  ``(prefix, URI)``.
-
-A related convenience method of ElementTree objects is ``getpath(element)``,
-which returns a structural, absolute XPath expression to find that element::
-
-  >>> a  = etree.Element("a")
-  >>> b  = etree.SubElement(a, "b")
-  >>> c  = etree.SubElement(a, "c")
-  >>> d1 = etree.SubElement(c, "d")
-  >>> d2 = etree.SubElement(c, "d")
-
-  >>> tree = etree.ElementTree(c)
-  >>> print tree.getpath(d2)
-  /c/d[2]
-  >>> tree.xpath(tree.getpath(d2)) == [d2]
-  True
-
-
-XSLT
-----
-
-lxml.etree introduces a new class, lxml.etree.XSLT. The class can be
-given an ElementTree object to construct an XSLT transformer::
-
-  >>> f = StringIO('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="/a/b/text()" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> xslt_doc = etree.parse(f)
-  >>> transform = etree.XSLT(xslt_doc)
-
-You can then run the transformation on an ElementTree document by simply
-calling it, and this results in another ElementTree object::
-
-  >>> f = StringIO('<a><b>Text</b></a>')
-  >>> doc = etree.parse(f)
-  >>> result = transform(doc)
-
-The result object can be accessed like a normal ElementTree document::
-
-  >>> result.getroot().text
-  'Text'
-
-but, as opposed to normal ElementTree objects, can also be turned into an (XML
-or text) string by applying the str() function::
-
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-The result is always a plain string, encoded as requested by the
-``xsl:output`` element in the stylesheet.  If you want a Python unicode string
-instead, you should set this encoding to ``UTF-8`` (unless the `ASCII` default
-is sufficient).  This allows you to call the builtin ``unicode()`` function on
-the result::
-
-  >>> unicode(result)
-  u'<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-You can use other encodings at the cost of multiple recoding.  Encodings that
-are not supported by Python will result in an error::
-
-  >>> xslt_tree = etree.XML('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:output encoding="UCS4"/>
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="/a/b/text()" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> transform = etree.XSLT(xslt_tree)
-
-  >>> result = transform(doc)
-  >>> unicode(result)
-  Traceback (most recent call last):
-    [...]
-  LookupError: unknown encoding: UCS4
-
-It is possible to pass parameters, in the form of XPath expressions, to the
-XSLT template::
-
-  >>> xslt_tree = etree.XML('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="$a" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> transform = etree.XSLT(xslt_tree)
-  >>> f = StringIO('<a><b>Text</b></a>')
-  >>> doc = etree.parse(f)
-
-The parameters are passed as keyword parameters to the transform call. First
-let's try passing in a simple string expression::
-
-  >>> result = transform(doc, a="'A'")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>A</foo>\n'
-
-Let's try a non-string XPath expression now::
-
-  >>> result = transform(doc, a="/a/b/text()")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-There's also a convenience method on the tree object for doing XSL
-transformations.  This is less efficient if you want to apply the same XSL
-transformation to multiple documents, but is shorter to write for one-shot
-operations, as you do not have to instantiate a stylesheet yourself::
-
-  >>> result = doc.xslt(xslt_tree, a="'A'")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>A</foo>\n'
-
-By default, XSLT supports all extension functions from libxslt and libexslt as
-well as Python regular expressions through EXSLT.  Note that some extensions
-enable style sheets to read and write files on the local file system.  See the
-`document loader documentation`_ on how to deal with this.
-
-.. _`document loader documentation`: resolvers.html
-
-If you want to know how your stylesheet performed, pass the ``profile_run``
-keyword to the transform::
-
-  >>> result = transform(doc, a="/a/b/text()", profile_run=True)
-  >>> profile = result.xslt_profile
-
-The value of the ``xslt_profile`` property is an ElementTree with profiling
-data about each template, similar to the following::
-
-  <profile>
-    <template rank="1" match="/" name="" mode="" calls="1" time="1" average="1"/>
-  </profile>
-
-Note that this is a read-only document.  You must not move any of its elements
-to other documents.  Please deep-copy the document if you need to modify it.
-If you want to free it from memory, just do::
-
-  >>> del result.xslt_profile
-
-
-RelaxNG
--------
-
-lxml.etree introduces a new class, lxml.etree.RelaxNG. The class can
-be given an ElementTree object to construct a Relax NG validator::
-
-  >>> f = StringIO('''\
-  ... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
-  ...  <zeroOrMore>
-  ...     <element name="b">
-  ...       <text />
-  ...     </element>
-  ...  </zeroOrMore>
-  ... </element>
-  ... ''')
-  >>> relaxng_doc = etree.parse(f)
-  >>> relaxng = etree.RelaxNG(relaxng_doc)
-
-You can then validate some ElementTree document against the schema. You'll get
-back True if the document is valid against the Relax NG schema, and False if
-not::
-
-  >>> valid = StringIO('<a><b></b></a>')
-  >>> doc = etree.parse(valid)
-  >>> relaxng.validate(doc)
-  1
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> relaxng.validate(doc2)
-  0
-
-Calling the schema object has the same effect as calling its validate
-method. This is sometimes used in conditional statements::
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> if not relaxng(doc2):
-  ...     print "invalid!"
-  invalid!
-
-If you prefer getting an exception when validating, you can use the
-``assert_`` or ``assertValid`` methods::
-
-  >>> relaxng.assertValid(doc2)
-  Traceback (most recent call last):
-    [...]
-  DocumentInvalid: Document does not comply with schema
-
-  >>> relaxng.assert_(doc2)
-  Traceback (most recent call last):
-    [...]
-  AssertionError: Document does not comply with schema
-
-Starting with version 0.9, lxml now has a simple API to report the errors
-generated by libxml2. If you want to find out why the validation failed in the
-second case, you can look up the error log of the validation process and check
-it for relevant messages::
-
-  >>> log = relaxng.error_log
-  >>> print log.last_error
-  <string>:1:ERROR:RELAXNGV:ERR_LT_IN_ATTRIBUTE: Did not expect element c there
-
-You can see that the error (ERROR) happened during RelaxNG validation
-(RELAXNGV).  The message then tells you what went wrong.  Note that this error
-is local to the RelaxNG object.  It will only contain log entries that
-appeared during the validation.  The DocumentInvalid exception raised by the
-``assertValid`` method above provides access to the global error log (like all
-other lxml exceptions).
-
-Similar to XSLT, there's also a less efficient but easier shortcut method to
-do one-shot RelaxNG validation::
-
-  >>> doc.relaxng(relaxng_doc)
-  1
-  >>> doc2.relaxng(relaxng_doc)
-  0
-
-
-XMLSchema
----------
-
-lxml.etree also has XML Schema (XSD) support, using the class
-lxml.etree.XMLSchema. This support is very similar to the Relax NG
-support. The class can be given an ElementTree object to construct an
-XMLSchema validator::
-
-  >>> f = StringIO('''\
-  ... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
-  ... <xsd:element name="a" type="AType"/>
-  ... <xsd:complexType name="AType">
-  ...   <xsd:sequence>
-  ...     <xsd:element name="b" type="xsd:string" />
-  ...   </xsd:sequence>
-  ... </xsd:complexType>
-  ... </xsd:schema>
-  ... ''')
-  >>> xmlschema_doc = etree.parse(f)
-  >>> xmlschema = etree.XMLSchema(xmlschema_doc)
-
-You can then validate some ElementTree document with this. Like with
-RelaxNG, you'll get back true if the document is valid against the XML
-schema, and false if not::
-
-  >>> valid = StringIO('<a><b></b></a>')
-  >>> doc = etree.parse(valid)
-  >>> xmlschema.validate(doc)
-  1
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> xmlschema.validate(doc2)
-  0
-
-Calling the schema object has the same effect as calling its validate
-method. This is sometimes used in conditional statements::
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> if not xmlschema(doc2):
-  ...     print "invalid!"
-  invalid!
-
-If you prefer getting an exception when validating, you can use the
-``assert_`` or ``assertValid`` methods::
-
-  >>> xmlschema.assertValid(doc2)
-  Traceback (most recent call last):
-    [...]
-  DocumentInvalid: Document does not comply with schema
-
-  >>> xmlschema.assert_(doc2)
-  Traceback (most recent call last):
-    [...]
-  AssertionError: Document does not comply with schema
-
-Error reporting works like for the RelaxNG class::
-
-  >>> log = xmlschema.error_log
-  >>> error = log.last_error
-  >>> print error.domain_name
-  SCHEMASV
-  >>> print error.type_name
-  SCHEMAV_ELEMENT_CONTENT
-
-If you were to print this log entry, you would get something like the
-following.  Note that the error message depends on the libxml2 version in
-use::
-
-  <string>:1:ERROR::SCHEMAV_ELEMENT_CONTENT: Element 'c': This element is not expected. Expected is ( b ).
-
-Similar to XSLT and RelaxNG, there's also a less efficient but easier shortcut
-method to do XML Schema validation::
-
-  >>> doc.xmlschema(xmlschema_doc)
-  1
-  >>> doc2.xmlschema(xmlschema_doc)
-  0
-
-
-xinclude
---------
-
-Simple XInclude support exists.  You can let lxml process xinclude statements
-in a document by calling the xinclude() method on a tree::
-
-  >>> data = StringIO('''\
-  ... <doc xmlns:xi="http://www.w3.org/2001/XInclude">
-  ... <foo/>
-  ... <xi:include href="doc/test.xml" />
-  ... </doc>''')
-
-  >>> tree = etree.parse(data)
-  >>> tree.xinclude()
-  >>> etree.tostring(tree.getroot())
-  '<doc xmlns:xi="http://www.w3.org/2001/XInclude">\n<foo/>\n<a xml:base="doc/test.xml"/>\n</doc>'
-
-
-write_c14n on ElementTree
--------------------------
-
-The lxml.etree.ElementTree class has a method write_c14n, which takes a file
-object as argument.  This file object will receive an UTF-8 representation of
-the canonicalized form of the XML, following the W3C C14N recommendation.  For
-example::
-
-  >>> f = StringIO('<a><b/></a>')
-  >>> tree = etree.parse(f)
-  >>> f2 = StringIO()
-  >>> tree.write_c14n(f2)
-  >>> f2.getvalue()
-  '<a><b></b></a>'
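
For readers of the removed ``write_c14n`` section above, here is a minimal sketch of the same idea in current Python.  It is an assumption-laden stand-in, not lxml's API: it uses the standard library function ``xml.etree.ElementTree.canonicalize()`` (Python 3.8 or later), which implements C14N 2.0 and is a different implementation from the ``write_c14n`` method shown in the diff.  The variable names are purely illustrative.

```python
import io
from xml.etree.ElementTree import canonicalize

xml_text = "<a><b/></a>"

# Without an output target, canonicalize() returns the canonical form
# as a string.  Empty elements are written as start/end tag pairs.
canonical = canonicalize(xml_text)
print(canonical)

# With out=..., the canonical form is written to a text file-like
# object instead, loosely comparable to handing a file object to
# write_c14n() in the lxml example above.
buffer = io.StringIO()
canonicalize(xml_text, out=buffer)
```

Both calls should produce the same canonical text; the second form may be handier when the result goes straight into a file.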

Copied: lxml/trunk/doc/validation.txt (from r39233, lxml/trunk/doc/api.txt)
==============================================================================
--- lxml/trunk/doc/api.txt	(original)
+++ lxml/trunk/doc/validation.txt	Wed Feb 21 16:43:46 2007
@@ -1,719 +1,18 @@
-=====================
-APIs specific to lxml
-=====================
+====================
+Validation with lxml
+====================
+
+Apart from DTD support in the parsers, lxml currently supports two schema
+languages: `Relax NG`_ and `XML Schema`_.  Both provide identical APIs,
+represented by a validator class with the obvious names.
 
-lxml tries to follow established APIs wherever possible.  Sometimes, however,
-the need to expose a feature in an easy way led to the invention of a new API.
+.. _`Relax NG`: http://www.relaxng.org/
+.. _`XML Schema`: http://www.w3.org/XML/Schema
 
 .. contents::
 .. 
-   1   lxml.etree
-   2   Other Element APIs
-   3   Trees and Documents
-   4   Iteration
-   5   Parsers
-   6   iterparse and iterwalk
-   7   Error handling on exceptions
-   8   Python unicode strings
-   9   XPath
-   10  XSLT
-   11  RelaxNG
-   12  XMLSchema
-   13  xinclude
-   14  write_c14n on ElementTree
-
-
-lxml.etree
-----------
-
-lxml.etree tries to follow the `ElementTree API`_ wherever it can.  There are
-however some incompatibilities (see `compatibility`_).  The extensions are
-documented here.
-
-.. _`ElementTree API`: http://effbot.org/zone/element-index.htm
-.. _`compatibility`:   compatibility.html
-
-If you need to know which version of lxml is installed, you can access the
-``lxml.etree.LXML_VERSION`` attribute to retrieve a version tuple.  Note,
-however, that it did not exist before version 1.0, so you will get an
-AttributeError in older versions.  The versions of libxml2 and libxslt are
-available through the attributes ``LIBXML_VERSION`` and ``LIBXSLT_VERSION``.
-
-The following examples usually assume this to be executed first::
-
-  >>> from lxml import etree
-  >>> from StringIO import StringIO
-
-
-Other Element APIs
-------------------
-
-While lxml.etree itself uses the ElementTree API, it is possible to replace
-the Element implementation by `custom element subclasses`_.  This has been
-used to implement well-known XML APIs on top of lxml.  The ``lxml.elements``
-package contains examples.  Currently, there is a data-binding implementation
-called `objectify`_, which is similar to the `Amara bindery`_ tool.
-
-Additionally, the `lxml.elements.classlookup`_ module provides a number of
-different schemes to customize the mapping between libxml2 nodes and the
-Element classes used by lxml.etree.
-
-.. _`custom element subclasses`: namespace_extensions.html
-.. _`objectify`: objectify.html
-.. _`lxml.elements.classlookup`: elements.html#lxml.elements.classlookup
-.. _`Amara bindery`: http://uche.ogbuji.net/tech/4suite/amara/
-
-
-Trees and Documents
--------------------
-
-Compared to the original ElementTree API, lxml.etree has an extended tree
-model.  It knows about parents and siblings of elements::
-
-  >>> root = etree.Element("root")
-  >>> a = etree.SubElement(root, "a")
-  >>> b = etree.SubElement(root, "b")
-  >>> c = etree.SubElement(root, "c")
-  >>> d = etree.SubElement(root, "d")
-  >>> e = etree.SubElement(d,    "e")
-  >>> b.getparent() == root
-  True
-  >>> print b.getnext().tag
-  c
-  >>> print c.getprevious().tag
-  b
-
-Elements always live within a document context in lxml.  This implies that
-there is also a notion of an absolute document root.  You can retrieve an
-ElementTree for the root node of a document from any of its elements::
-
-  >>> tree = d.getroottree()
-  >>> print tree.getroot().tag
-  root
-
-Note that this is different from wrapping an Element in an ElementTree.  You
-can use ElementTrees to create XML trees with an explicit root node::
-
-  >>> tree = etree.ElementTree(d)
-  >>> print tree.getroot().tag
-  d
-  >>> print etree.tostring(tree)
-  <d><e/></d>
-
-All operations that you run on such an ElementTree (like XPath, XSLT, etc.)
-will understand the explicitly chosen root as root node of a document.  They
-will not see any elements outside the ElementTree.  However, ElementTrees do
-not modify their Elements::
-
-  >>> element = tree.getroot()
-  >>> print element.tag
-  d
-  >>> print element.getparent().tag
-  root
-  >>> print element.getroottree().getroot().tag
-  root
-
-The rule is that all operations that are applied to Elements use either the
-Element itself as reference point, or the absolute root of the document that
-contains this Element (e.g. for absolute XPath expressions).  All operations
-on an ElementTree use its explicit root node as reference.
-
-
-Iteration
----------
-
-The ElementTree API makes Elements iterable to support iteration over their
-children.  Using the tree defined above, we get::
-
-  >>> [ el.tag for el in root ]
-  ['a', 'b', 'c', 'd']
-
-Tree traversal is commonly based on the ``element.getiterator()`` method::
-
-  >>> [ el.tag for el in root.getiterator() ]
-  ['root', 'a', 'b', 'c', 'd', 'e']
-
-lxml.etree also supports this, but additionally features an extended API for
-iteration over the children, following/preceding siblings, ancestors and
-descendants of an element, as defined by the respective XPath axis::
-
-  >>> [ el.tag for el in root.iterchildren() ]
-  ['a', 'b', 'c', 'd']
-  >>> [ el.tag for el in root.iterchildren(reversed=True) ]
-  ['d', 'c', 'b', 'a']
-  >>> [ el.tag for el in b.itersiblings() ]
-  ['c', 'd']
-  >>> [ el.tag for el in c.itersiblings(preceding=True) ]
-  ['b', 'a']
-  >>> [ el.tag for el in e.iterancestors() ]
-  ['d', 'root']
-  >>> [ el.tag for el in root.iterdescendants() ]
-  ['a', 'b', 'c', 'd', 'e']
-
-Note how ``element.iterdescendants()`` does not include the element itself, as
-opposed to ``element.getiterator()``.  The latter effectively implements the
-'descendant-or-self' axis in XPath.
-
-All of these iterators support an additional ``tag`` keyword argument that
-filters the generated elements by tag name::
-
-  >>> [ el.tag for el in root.iterchildren(tag='a') ]
-  ['a']
-  >>> [ el.tag for el in d.iterchildren(tag='a') ]
-  []
-  >>> [ el.tag for el in root.iterdescendants(tag='d') ]
-  ['d']
-  >>> [ el.tag for el in root.getiterator(tag='d') ]
-  ['d']
-
-See also the section on the utility functions ``iterparse()`` and
-``iterwalk()`` below.
-
-
-Parsers
--------
-
-One of the differences is the parser.  There is support for both XML and
-(broken) HTML.  Both are based on libxml2 and therefore only support options
-that are backed by the library.  Parsers take a number of keyword arguments.
-The following is an example for namespace cleanup during parsing, first with
-the default parser, then with a parametrized one::
-
-  >>> xml = '<a xmlns="test"><b xmlns="test"/></a>'
-
-  >>> et     = etree.parse(StringIO(xml))
-  >>> print etree.tostring(et.getroot())
-  <a xmlns="test"><b xmlns="test"/></a>
-
-  >>> parser = etree.XMLParser(ns_clean=True)
-  >>> et     = etree.parse(StringIO(xml), parser)
-  >>> print etree.tostring(et.getroot())
-  <a xmlns="test"><b/></a>
-
-HTML parsing is similarly simple.  The parsers have a ``recover`` keyword
-argument that the HTMLParser sets by default.  It lets libxml2 try its best to
-return something usable without raising an exception.  You should use libxml2
-version 2.6.21 or newer to take advantage of this feature::
-
-  >>> broken_html = "<html><head><title>test<body><h1>page title</h3>"
-
-  >>> parser = etree.HTMLParser()
-  >>> et     = etree.parse(StringIO(broken_html), parser)
-
-  >>> print etree.tostring(et.getroot())
-  <html><head><title>test</title></head><body><h1>page title</h1></body></html>
-
-Lxml has an HTML function, similar to the XML shortcut known from
-ElementTree::
-
-  >>> html = etree.HTML(broken_html)
-  >>> print etree.tostring(html)
-  <html><head><title>test</title></head><body><h1>page title</h1></body></html>
-
-The support for parsing broken HTML depends entirely on libxml2's recovery
-algorithm.  It is *not* the fault of lxml if you find documents that are so
-heavily broken that the parser cannot handle them.  There is also no guarantee
-that the resulting tree will contain all data from the original document.  The
-parser may have to drop seriously broken parts when struggling to keep
-parsing.  Especially misplaced meta tags can suffer from this, which may lead
-to encoding problems.
-
-The use of the libxml2 parsers makes some additional information available at
-the API level.  Currently, ElementTree objects can access the DOCTYPE
-information provided by a parsed document, as well as the XML version and the
-original encoding::
-
-  >>> pub_id  = "-//W3C//DTD XHTML 1.0 Transitional//EN"
-  >>> sys_url = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
-  >>> doctype_string = '<!DOCTYPE html PUBLIC "%s" "%s">' % (pub_id, sys_url)
-  >>> xml_header = '<?xml version="1.0" encoding="ascii"?>'
-  >>> xhtml = xml_header + doctype_string + '<html><body></body></html>'
-
-  >>> tree = etree.parse(StringIO(xhtml))
-  >>> docinfo = tree.docinfo
-  >>> print docinfo.public_id
-  -//W3C//DTD XHTML 1.0 Transitional//EN
-  >>> print docinfo.system_url
-  http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
-  >>> docinfo.doctype == doctype_string
-  True
-
-  >>> print docinfo.xml_version
-  1.0
-  >>> print docinfo.encoding
-  ascii
-
-
-iterparse and iterwalk
-----------------------
-
-As known from ElementTree, the ``iterparse()`` utility function returns an
-iterator that generates parser events for an XML file (or file-like object),
-while building the tree.  The values are tuples ``(event-type, object)``.  The
-event types are 'start', 'end', 'start-ns' and 'end-ns'.
-
-The 'start' and 'end' events represent opening and closing elements and are
-accompanied by the respective element.  By default, only 'end' events are
-generated::
-
-  >>> xml = '''\
-  ... <root>
-  ...   <element key='value'>text</element>
-  ...   <element>text</element>tail
-  ...   <empty-element xmlns="testns" />
-  ... </root>
-  ... '''
-
-  >>> context = etree.iterparse(StringIO(xml))
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  end element
-  end element
-  end {testns}empty-element
-  end root
-
-The resulting tree is available through the ``root`` property of the iterator::
-
-  >>> context.root.tag
-  'root'
-
-The other types can be activated with the ``events`` keyword argument::
-
-  >>> events = ("start", "end")
-  >>> context = etree.iterparse(StringIO(xml), events=events)
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start root
-  start element
-  end element
-  start element
-  end element
-  start {testns}empty-element
-  end {testns}empty-element
-  end root
-
-You can modify the element and its descendants when handling the 'end' event.
-To save memory, for example, you can remove subtrees that are no longer
-needed::
-
-  >>> context = etree.iterparse(StringIO(xml))
-  >>> for action, elem in context:
-  ...     print len(elem),
-  ...     elem.clear()
-  0 0 0 3
-  >>> context.root.getchildren()
-  []
-
-**WARNING**: During the 'start' event, the descendants and following siblings
-are not yet available and should not be accessed.  During the 'end' event, the
-element and its descendants can be freely modified, but its following siblings
-should not be accessed.  During either of the two events, you **must not**
-modify or move the ancestors (parents) of the current element.  You should
-also avoid moving or discarding the element itself.  The golden rule is: do
-not touch anything that will have to be touched again by the parser later on.
-
-If you have elements with a long list of children in your XML file and want to
-save more memory during parsing, you can clean up the preceding siblings of
-the current element::
-
-  >>> for event, element in etree.iterparse(StringIO(xml)):
-  ...     # ... do something with the element
-  ...     element.clear()                          # clean up children
-  ...     if element.getprevious() is not None:    # clean up preceding siblings
-  ...         del element.getparent()[0]
-
-You can use ``while`` instead of ``if`` if you skipped siblings using the
-``tag`` keyword argument.  The more selective your tag is, however, the more
-thought you will have to put into finding the right way to clean up the
-elements that were skipped.  Therefore, it is sometimes easier to traverse all
-elements and do the tag selection by hand in the event handler code.
-
-The 'start-ns' and 'end-ns' events notify about namespace declarations and
-generate tuples ``(prefix, URI)``::
-
-  >>> events = ("start-ns", "end-ns")
-  >>> context = etree.iterparse(StringIO(xml), events=events)
-  >>> for action, obj in context:
-  ...     print action, obj
-  start-ns ('', 'testns')
-  end-ns None
-
-It is common practice to use a list as namespace stack and pop the last entry
-on the 'end-ns' event.
-
-lxml.etree supports two extensions compared to ElementTree.  It accepts a
-``tag`` keyword argument just like ``element.getiterator(tag)``.  This
-restricts events to a specific tag or namespace.
-
-  >>> context = etree.iterparse(StringIO(xml), tag="element")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  end element
-  end element
-
-  >>> events = ("start", "end")
-  >>> context = etree.iterparse(StringIO(xml), events=events, tag="{testns}*")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start {testns}empty-element
-  end {testns}empty-element
-
-The second extension is the ``iterwalk()`` function.  It behaves exactly like
-``iterparse()``, but works on Elements and ElementTrees::
-
-  >>> root = context.root
-  >>> context = etree.iterwalk(root, events=events, tag="element")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start element
-  end element
-  start element
-  end element
-
-
-Error handling on exceptions
-----------------------------
-
-Libxml2 provides error messages for failures, be it during parsing, XPath
-evaluation or schema validation.  Whenever an exception is raised, you can
-retrieve the errors that occurred and "might have" led to the problem::
-
-  >>> etree.clearErrorLog()
-  >>> broken_xml = '<a>'
-  >>> try:
-  ...   etree.parse(StringIO(broken_xml))
-  ... except etree.XMLSyntaxError, e:
-  ...   pass # just put the exception into e
-  >>> log = e.error_log.filter_levels(etree.ErrorLevels.FATAL)
-  >>> print log
-  <string>:1:FATAL:PARSER:ERR_TAG_NOT_FINISHED: Premature end of data in tag a line 1
-
-This might look a little cryptic at first, but it is the information that
-libxml2 gives you.  At least the message at the end should give you a hint
-what went wrong and you can see that the fatal error (FATAL) happened during
-parsing (PARSER) line 1 of a string (<string>, or filename if available).
-Here, PARSER is the so-called error domain, see lxml.etree.ErrorDomains for
-that.  You can get it from a log entry like this::
-
-  >>> entry = log[0]
-  >>> print entry.domain_name, entry.type_name, entry.filename
-  PARSER ERR_TAG_NOT_FINISHED <string>
-
-There is also a convenience attribute ``last_error`` that returns the last
-error or fatal error that occurred::
-
-  >>> entry = e.error_log.last_error
-  >>> print entry.domain_name, entry.type_name, entry.filename
-  PARSER ERR_TAG_NOT_FINISHED <string>
-
-Alternatively, lxml.etree supports logging libxml2 messages to the Python
-stdlib logging module.  This is done through the ``etree.PyErrorLog`` class.
-It disables the error reporting from exceptions and forwards log messages to a
-Python logger.  To use it, see the descriptions of the function
-``etree.useGlobalPythonLog`` and the class ``etree.PyErrorLog`` for help.
-Note that this does not affect the local error logs of XSLT, XMLSchema,
-etc. which are described in their respective sections below.
-
-
-Python unicode strings
-----------------------
-
-lxml.etree has broader support for Python unicode strings than the ElementTree
-library.  First of all, where ElementTree would raise an exception, the
-parsers in lxml.etree can handle unicode strings straight away.  This is most
-helpful for XML snippets embedded in source code using the ``XML()``
-function::
-
-  >>> uxml = u'<test> \uf8d1 + \uf8d2 </test>'
-  >>> uxml
-  u'<test> \uf8d1 + \uf8d2 </test>'
-  >>> root = etree.XML(uxml)
-
-This requires, however, that unicode strings do not specify a conflicting
-encoding themselves and thus lie about their real encoding::
-
-  >>> etree.XML(u'<?xml version="1.0" encoding="ASCII"?>\n' + uxml)
-  Traceback (most recent call last):
-    ...
-  ValueError: Unicode strings with encoding declaration are not supported.
-
-Similarly, you will get errors when you try the same with HTML data in a
-unicode string that specifies a charset in a meta tag of the header.  You
-should generally avoid converting XML/HTML data to unicode before passing it
-into the parsers.  It is both slower and error prone.
-
-To serialize the result, you would normally use the ``tostring`` module
-function, which serializes to plain ASCII by default or a number of other
-encodings if asked for::
-
-  >>> etree.tostring(root)
-  '<test> &#63697; + &#63698; </test>'
-
-  >>> etree.tostring(root, 'UTF-8', xml_declaration=False)
-  '<test> \xef\xa3\x91 + \xef\xa3\x92 </test>'
-
-As an extension, lxml.etree has a new ``tounicode()`` function that you can
-call on XML tree objects to retrieve a Python unicode representation::
-
-  >>> etree.tounicode(root)
-  u'<test> \uf8d1 + \uf8d2 </test>'
-
-  >>> el = etree.Element("test")
-  >>> etree.tounicode(el)
-  u'<test/>'
-
-  >>> subel = etree.SubElement(el, "subtest")
-  >>> etree.tounicode(el)
-  u'<test><subtest/></test>'
-
-  >>> et = etree.ElementTree(el)
-  >>> etree.tounicode(et)
-  u'<test><subtest/></test>'
-
-The result of ``tounicode()`` can be treated like any other Python unicode
-string and then passed back into the parsers.  However, if you want to save
-the result to a file or pass it over the network, you should use ``write()``
-or ``tostring()`` with an encoding argument (typically UTF-8) to serialize the
-XML.  The main reason is that unicode strings returned by ``tounicode()``
-never have an XML declaration and therefore do not specify their encoding.
-These strings are most likely not parsable by other XML libraries.
-
-In contrast, the ``tostring()`` function automatically adds a declaration as
-needed that reflects the encoding of the returned string.  This makes it
-possible for other parsers to correctly parse the XML byte stream.  Note that
-using ``tostring()`` with UTF-8 is also considerably faster in most cases.
-
-
-XPath
------
-
-lxml.etree supports the simple path syntax of the ``findall()`` etc.  methods
-on ElementTree and Element, as known from the original ElementTree library.
-As an extension, these classes also provide an ``xpath()`` method that
-supports expressions in the complete XPath syntax.
-
-There are also specialized XPath evaluator classes that are more efficient for
-frequent evaluation: ``XPath`` and ``XPathEvaluator``.  See the `performance
-comparison`_ to learn when to use which.  Their semantics when used on
-Elements and ElementTrees are the same as for the ``xpath()`` method described
-here.
-
-.. _`performance comparison`: performance.html#xpath
-
-For ElementTree, the xpath method performs a global XPath query against the
-document (if absolute) or against the root node (if relative)::
-
-  >>> f = StringIO('<foo><bar></bar></foo>')
-  >>> tree = etree.parse(f)
-
-  >>> r = tree.xpath('/foo/bar')
-  >>> len(r)
-  1
-  >>> r[0].tag
-  'bar'
-
-  >>> r = tree.xpath('bar')
-  >>> r[0].tag
-  'bar'
-
-When ``xpath()`` is used on an element, the XPath expression is evaluated
-against the element (if relative) or against the root tree (if absolute)::
-
-  >>> root = tree.getroot()
-  >>> r = root.xpath('bar')
-  >>> r[0].tag
-  'bar'
-
-  >>> bar = root[0]
-  >>> r = bar.xpath('/foo/bar')
-  >>> r[0].tag
-  'bar'
-
-  >>> tree = bar.getroottree()
-  >>> r = tree.xpath('/foo/bar')
-  >>> r[0].tag
-  'bar'
-
-Optionally, you can provide a ``namespaces`` keyword argument, which should be
-a dictionary mapping the namespace prefixes used in the XPath expression to
-namespace URIs::
-
-  >>> f = StringIO('''\
-  ... <a:foo xmlns:a="http://codespeak.net/ns/test1" 
-  ...       xmlns:b="http://codespeak.net/ns/test2">
-  ...    <b:bar>Text</b:bar>
-  ... </a:foo>
-  ... ''')
-  >>> doc = etree.parse(f)
-  >>> r = doc.xpath('/t:foo/b:bar', {'t': 'http://codespeak.net/ns/test1', 
-  ...                                'b': 'http://codespeak.net/ns/test2'})
-  >>> len(r)
-  1
-  >>> r[0].tag
-  '{http://codespeak.net/ns/test2}bar'
-  >>> r[0].text
-  'Text'
-
-There is also an optional ``extensions`` argument which is used to define
-`extension functions`_ in Python that are local to this evaluation.
-
-.. _`extension functions`: extensions.html
-
-The return values of XPath evaluations vary, depending on the XPath expression
-used:
-
-* True or False, when the XPath expression has a boolean result
-
-* a float, when the XPath expression has a numeric result (integer or float)
-
-* a (unicode) string, when the XPath expression has a string result.
-
-* a list of items, when the XPath expression has a list as result.  The items
-  may include elements, strings and tuples.  Text nodes and attributes in the
-  result are returned as strings (the text node content or attribute value).
-  Comments are also returned as strings, enclosed by the usual ``<!--`` and
-  ``-->`` markers.  Namespace declarations are returned as tuples of strings:
-  ``(prefix, URI)``.
-
-A related convenience method of ElementTree objects is ``getpath(element)``,
-which returns a structural, absolute XPath expression to find that element::
-
-  >>> a  = etree.Element("a")
-  >>> b  = etree.SubElement(a, "b")
-  >>> c  = etree.SubElement(a, "c")
-  >>> d1 = etree.SubElement(c, "d")
-  >>> d2 = etree.SubElement(c, "d")
-
-  >>> tree = etree.ElementTree(c)
-  >>> print tree.getpath(d2)
-  /c/d[2]
-  >>> tree.xpath(tree.getpath(d2)) == [d2]
-  True
-
-
-XSLT
-----
-
-lxml.etree introduces a new class, lxml.etree.XSLT. The class can be
-given an ElementTree object to construct an XSLT transformer::
-
-  >>> f = StringIO('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="/a/b/text()" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> xslt_doc = etree.parse(f)
-  >>> transform = etree.XSLT(xslt_doc)
-
-You can then run the transformation on an ElementTree document by simply
-calling it, and this results in another ElementTree object::
-
-  >>> f = StringIO('<a><b>Text</b></a>')
-  >>> doc = etree.parse(f)
-  >>> result = transform(doc)
-
-The result object can be accessed like a normal ElementTree document::
-
-  >>> result.getroot().text
-  'Text'
-
-but, as opposed to normal ElementTree objects, can also be turned into an (XML
-or text) string by applying the str() function::
-
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-The result is always a plain string, encoded as requested by the
-``xsl:output`` element in the stylesheet.  If you want a Python unicode string
-instead, you should set this encoding to ``UTF-8`` (unless the ``ASCII`` default
-is sufficient).  This allows you to call the builtin ``unicode()`` function on
-the result::
-
-  >>> unicode(result)
-  u'<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-You can use other encodings at the cost of multiple recoding.  Encodings that
-are not supported by Python will result in an error::
-
-  >>> xslt_tree = etree.XML('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:output encoding="UCS4"/>
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="/a/b/text()" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> transform = etree.XSLT(xslt_tree)
-
-  >>> result = transform(doc)
-  >>> unicode(result)
-  Traceback (most recent call last):
-    [...]
-  LookupError: unknown encoding: UCS4
-
-It is possible to pass parameters, in the form of XPath expressions, to the
-XSLT template::
-
-  >>> xslt_tree = etree.XML('''\
-  ... <xsl:stylesheet version="1.0"
-  ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-  ...     <xsl:template match="/">
-  ...         <foo><xsl:value-of select="$a" /></foo>
-  ...     </xsl:template>
-  ... </xsl:stylesheet>''')
-  >>> transform = etree.XSLT(xslt_tree)
-  >>> f = StringIO('<a><b>Text</b></a>')
-  >>> doc = etree.parse(f)
-
-The parameters are passed as keyword parameters to the transform call. First
-let's try passing in a simple string expression::
-
-  >>> result = transform(doc, a="'A'")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>A</foo>\n'
-
-Let's try a non-string XPath expression now::
-
-  >>> result = transform(doc, a="/a/b/text()")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>Text</foo>\n'
-
-There's also a convenience method on the tree object for doing XSL
-transformations.  This is less efficient if you want to apply the same XSL
-transformation to multiple documents, but is shorter to write for one-shot
-operations, as you do not have to instantiate a stylesheet yourself::
-
-  >>> result = doc.xslt(xslt_tree, a="'A'")
-  >>> str(result)
-  '<?xml version="1.0"?>\n<foo>A</foo>\n'
-
-By default, XSLT supports all extension functions from libxslt and libexslt as
-well as Python regular expressions through EXSLT.  Note that some extensions
-enable style sheets to read and write files on the local file system.  See the
-`document loader documentation`_ on how to deal with this.
-
-.. _`document loader documentation`: resolvers.html
-
-If you want to know how your stylesheet performed, pass the ``profile_run``
-keyword to the transform::
-
-  >>> result = transform(doc, a="/a/b/text()", profile_run=True)
-  >>> profile = result.xslt_profile
-
-The value of the ``xslt_profile`` property is an ElementTree with profiling
-data about each template, similar to the following::
-
-  <profile>
-    <template rank="1" match="/" name="" mode="" calls="1" time="1" average="1"/>
-  </profile>
-
-Note that this is a read-only document.  You must not move any of its elements
-to other documents.  Please deep-copy the document if you need to modify it.
-If you want to free it from memory, just do::
-
-  >>> del result.xslt_profile
+   1  RelaxNG
+   2  XMLSchema
 
 
 RelaxNG
@@ -874,37 +173,3 @@
   1
   >>> doc2.xmlschema(xmlschema_doc)
   0
-
-
-xinclude
---------
-
-Simple XInclude support exists.  You can let lxml process xinclude statements
-in a document by calling the xinclude() method on a tree::
-
-  >>> data = StringIO('''\
-  ... <doc xmlns:xi="http://www.w3.org/2001/XInclude">
-  ... <foo/>
-  ... <xi:include href="doc/test.xml" />
-  ... </doc>''')
-
-  >>> tree = etree.parse(data)
-  >>> tree.xinclude()
-  >>> etree.tostring(tree.getroot())
-  '<doc xmlns:xi="http://www.w3.org/2001/XInclude">\n<foo/>\n<a xml:base="doc/test.xml"/>\n</doc>'
-
-
-write_c14n on ElementTree
--------------------------
-
-The lxml.etree.ElementTree class has a method write_c14n, which takes a file
-object as argument.  This file object will receive a UTF-8 representation of
-the canonicalized form of the XML, following the W3C C14N recommendation.  For
-example::
-
-  >>> f = StringIO('<a><b/></a>')
-  >>> tree = etree.parse(f)
-  >>> f2 = StringIO()
-  >>> tree.write_c14n(f2)
-  >>> f2.getvalue()
-  '<a><b></b></a>'

Copied: lxml/trunk/doc/xpathxslt.txt (from r39233, lxml/trunk/doc/api.txt)
==============================================================================
--- lxml/trunk/doc/api.txt	(original)
+++ lxml/trunk/doc/xpathxslt.txt	Wed Feb 21 16:43:46 2007
@@ -1,487 +1,14 @@
-=====================
-APIs specific to lxml
-=====================
+========================
+XPath and XSLT with lxml
+========================
 
-lxml tries to follow established APIs wherever possible.  Sometimes, however,
-the need to expose a feature in an easy way led to the invention of a new API.
+lxml supports both XPath and XSLT through libxml2 and libxslt in a standards
+compliant way.
 
 .. contents::
 .. 
-   1   lxml.etree
-   2   Other Element APIs
-   3   Trees and Documents
-   4   Iteration
-   5   Parsers
-   6   iterparse and iterwalk
-   7   Error handling on exceptions
-   8   Python unicode strings
-   9   XPath
-   10  XSLT
-   11  RelaxNG
-   12  XMLSchema
-   13  xinclude
-   14  write_c14n on ElementTree
-
-
-lxml.etree
-----------
-
-lxml.etree tries to follow the `ElementTree API`_ wherever it can.  There are
-however some incompatibilities (see `compatibility`_).  The extensions are
-documented here.
-
-.. _`ElementTree API`: http://effbot.org/zone/element-index.htm
-.. _`compatibility`:   compatibility.html
-
-If you need to know which version of lxml is installed, you can access the
-``lxml.etree.LXML_VERSION`` attribute to retrieve a version tuple.  Note,
-however, that it did not exist before version 1.0, so you will get an
-AttributeError in older versions.  The versions of libxml2 and libxslt are
-available through the attributes ``LIBXML_VERSION`` and ``LIBXSLT_VERSION``.
-
-The following examples usually assume this to be executed first::
-
-  >>> from lxml import etree
-  >>> from StringIO import StringIO
-
-
-Other Element APIs
-------------------
-
-While lxml.etree itself uses the ElementTree API, it is possible to replace
-the Element implementation by `custom element subclasses`_.  This has been
-used to implement well-known XML APIs on top of lxml.  The ``lxml.elements``
-package contains examples.  Currently, there is a data-binding implementation
-called `objectify`_, which is similar to the `Amara bindery`_ tool.
-
-Additionally, the `lxml.elements.classlookup`_ module provides a number of
-different schemes to customize the mapping between libxml2 nodes and the
-Element classes used by lxml.etree.
-
-.. _`custom element subclasses`: namespace_extensions.html
-.. _`objectify`: objectify.html
-.. _`lxml.elements.classlookup`: elements.html#lxml.elements.classlookup
-.. _`Amara bindery`: http://uche.ogbuji.net/tech/4suite/amara/
-
-
-Trees and Documents
--------------------
-
-Compared to the original ElementTree API, lxml.etree has an extended tree
-model.  It knows about parents and siblings of elements::
-
-  >>> root = etree.Element("root")
-  >>> a = etree.SubElement(root, "a")
-  >>> b = etree.SubElement(root, "b")
-  >>> c = etree.SubElement(root, "c")
-  >>> d = etree.SubElement(root, "d")
-  >>> e = etree.SubElement(d,    "e")
-  >>> b.getparent() == root
-  True
-  >>> print b.getnext().tag
-  c
-  >>> print c.getprevious().tag
-  b
-
-Elements always live within a document context in lxml.  This implies that
-there is also a notion of an absolute document root.  You can retrieve an
-ElementTree for the root node of a document from any of its elements::
-
-  >>> tree = d.getroottree()
-  >>> print tree.getroot().tag
-  root
-
-Note that this is different from wrapping an Element in an ElementTree.  You
-can use ElementTrees to create XML trees with an explicit root node::
-
-  >>> tree = etree.ElementTree(d)
-  >>> print tree.getroot().tag
-  d
-  >>> print etree.tostring(tree)
-  <d><e/></d>
-
-All operations that you run on such an ElementTree (like XPath, XSLT, etc.)
-will understand the explicitly chosen root as root node of a document.  They
-will not see any elements outside the ElementTree.  However, ElementTrees do
-not modify their Elements::
-
-  >>> element = tree.getroot()
-  >>> print element.tag
-  d
-  >>> print element.getparent().tag
-  root
-  >>> print element.getroottree().getroot().tag
-  root
-
-The rule is that all operations that are applied to Elements use either the
-Element itself as reference point, or the absolute root of the document that
-contains this Element (e.g. for absolute XPath expressions).  All operations
-on an ElementTree use its explicit root node as reference.
-
-
-Iteration
----------
-
-The ElementTree API makes Elements iterable to support iteration over their
-children.  Using the tree defined above, we get::
-
-  >>> [ el.tag for el in root ]
-  ['a', 'b', 'c', 'd']
-
-Tree traversal is commonly based on the ``element.getiterator()`` method::
-
-  >>> [ el.tag for el in root.getiterator() ]
-  ['root', 'a', 'b', 'c', 'd', 'e']
-
-lxml.etree also supports this, but additionally features an extended API for
-iteration over the children, following/preceding siblings, ancestors and
-descendants of an element, as defined by the respective XPath axis::
-
-  >>> [ el.tag for el in root.iterchildren() ]
-  ['a', 'b', 'c', 'd']
-  >>> [ el.tag for el in root.iterchildren(reversed=True) ]
-  ['d', 'c', 'b', 'a']
-  >>> [ el.tag for el in b.itersiblings() ]
-  ['c', 'd']
-  >>> [ el.tag for el in c.itersiblings(preceding=True) ]
-  ['b', 'a']
-  >>> [ el.tag for el in e.iterancestors() ]
-  ['d', 'root']
-  >>> [ el.tag for el in root.iterdescendants() ]
-  ['a', 'b', 'c', 'd', 'e']
-
-Note how ``element.iterdescendants()`` does not include the element itself, as
-opposed to ``element.getiterator()``.  The latter effectively implements the
-'descendant-or-self' axis in XPath.
-
-All of these iterators support an additional ``tag`` keyword argument that
-filters the generated elements by tag name::
-
-  >>> [ el.tag for el in root.iterchildren(tag='a') ]
-  ['a']
-  >>> [ el.tag for el in d.iterchildren(tag='a') ]
-  []
-  >>> [ el.tag for el in root.iterdescendants(tag='d') ]
-  ['d']
-  >>> [ el.tag for el in root.getiterator(tag='d') ]
-  ['d']
-
-See also the section on the utility functions ``iterparse()`` and
-``iterwalk()`` below.
-
-
-Parsers
--------
-
-One of the differences is the parser.  There is support for both XML and
-(broken) HTML.  Both are based on libxml2 and therefore only support options
-that are backed by the library.  Parsers take a number of keyword arguments.
-The following is an example for namespace cleanup during parsing, first with
-the default parser, then with a parametrized one::
-
-  >>> xml = '<a xmlns="test"><b xmlns="test"/></a>'
-
-  >>> et     = etree.parse(StringIO(xml))
-  >>> print etree.tostring(et.getroot())
-  <a xmlns="test"><b xmlns="test"/></a>
-
-  >>> parser = etree.XMLParser(ns_clean=True)
-  >>> et     = etree.parse(StringIO(xml), parser)
-  >>> print etree.tostring(et.getroot())
-  <a xmlns="test"><b/></a>
-
-HTML parsing is similarly simple.  The parsers have a ``recover`` keyword
-argument that the HTMLParser sets by default.  It lets libxml2 try its best to
-return something usable without raising an exception.  You should use libxml2
-version 2.6.21 or newer to take advantage of this feature::
-
-  >>> broken_html = "<html><head><title>test<body><h1>page title</h3>"
-
-  >>> parser = etree.HTMLParser()
-  >>> et     = etree.parse(StringIO(broken_html), parser)
-
-  >>> print etree.tostring(et.getroot())
-  <html><head><title>test</title></head><body><h1>page title</h1></body></html>
-
-Lxml has an HTML function, similar to the XML shortcut known from
-ElementTree::
-
-  >>> html = etree.HTML(broken_html)
-  >>> print etree.tostring(html)
-  <html><head><title>test</title></head><body><h1>page title</h1></body></html>
-
-The support for parsing broken HTML depends entirely on libxml2's recovery
-algorithm.  It is *not* the fault of lxml if you find documents that are so
-heavily broken that the parser cannot handle them.  There is also no guarantee
-that the resulting tree will contain all data from the original document.  The
-parser may have to drop seriously broken parts when struggling to keep
-parsing.  Especially misplaced meta tags can suffer from this, which may lead
-to encoding problems.
-
-The use of the libxml2 parsers makes some additional information available at
-the API level.  Currently, ElementTree objects can access the DOCTYPE
-information provided by a parsed document, as well as the XML version and the
-original encoding::
-
-  >>> pub_id  = "-//W3C//DTD XHTML 1.0 Transitional//EN"
-  >>> sys_url = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
-  >>> doctype_string = '<!DOCTYPE html PUBLIC "%s" "%s">' % (pub_id, sys_url)
-  >>> xml_header = '<?xml version="1.0" encoding="ascii"?>'
-  >>> xhtml = xml_header + doctype_string + '<html><body></body></html>'
-
-  >>> tree = etree.parse(StringIO(xhtml))
-  >>> docinfo = tree.docinfo
-  >>> print docinfo.public_id
-  -//W3C//DTD XHTML 1.0 Transitional//EN
-  >>> print docinfo.system_url
-  http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
-  >>> docinfo.doctype == doctype_string
-  True
-
-  >>> print docinfo.xml_version
-  1.0
-  >>> print docinfo.encoding
-  ascii
-
-
-iterparse and iterwalk
-----------------------
-
-As known from ElementTree, the ``iterparse()`` utility function returns an
-iterator that generates parser events for an XML file (or file-like object),
-while building the tree.  The values are tuples ``(event-type, object)``.  The
-event types are 'start', 'end', 'start-ns' and 'end-ns'.
-
-The 'start' and 'end' events represent opening and closing elements and are
-accompanied by the respective element.  By default, only 'end' events are
-generated::
-
-  >>> xml = '''\
-  ... <root>
-  ...   <element key='value'>text</element>
-  ...   <element>text</element>tail
-  ...   <empty-element xmlns="testns" />
-  ... </root>
-  ... '''
-
-  >>> context = etree.iterparse(StringIO(xml))
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  end element
-  end element
-  end {testns}empty-element
-  end root
-
-The resulting tree is available through the ``root`` property of the iterator::
-
-  >>> context.root.tag
-  'root'
-
-The other types can be activated with the ``events`` keyword argument::
-
-  >>> events = ("start", "end")
-  >>> context = etree.iterparse(StringIO(xml), events=events)
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start root
-  start element
-  end element
-  start element
-  end element
-  start {testns}empty-element
-  end {testns}empty-element
-  end root
-
-You can modify the element and its descendants when handling the 'end' event.
-To save memory, for example, you can remove subtrees that are no longer
-needed::
-
-  >>> context = etree.iterparse(StringIO(xml))
-  >>> for action, elem in context:
-  ...     print len(elem),
-  ...     elem.clear()
-  0 0 0 3
-  >>> context.root.getchildren()
-  []
-
-**WARNING**: During the 'start' event, the descendants and following siblings
-are not yet available and should not be accessed.  During the 'end' event, the
-element and its descendants can be freely modified, but its following siblings
-should not be accessed.  During either of the two events, you **must not**
-modify or move the ancestors (parents) of the current element.  You should
-also avoid moving or discarding the element itself.  The golden rule is: do
-not touch anything that will have to be touched again by the parser later on.
-
-If you have elements with a long list of children in your XML file and want to
-save more memory during parsing, you can clean up the preceding siblings of
-the current element::
-
-  >>> for event, element in etree.iterparse(StringIO(xml)):
-  ...     # ... do something with the element
-  ...     element.clear()                # clean up children
-  ...     if element.getprevious() is not None:  # clean up preceding siblings
-  ...         del element.getparent()[0]
-
-You can use ``while`` instead of ``if`` if you skipped siblings using the
-``tag`` keyword argument.  The more selective your tag is, however, the more
-thought you will have to put into finding the right way to clean up the
-elements that were skipped.  Therefore, it is sometimes easier to traverse all
-elements and do the tag selection by hand in the event handler code.
-
-The 'start-ns' and 'end-ns' events notify about namespace declarations and
-generate tuples ``(prefix, URI)``::
-
-  >>> events = ("start-ns", "end-ns")
-  >>> context = etree.iterparse(StringIO(xml), events=events)
-  >>> for action, obj in context:
-  ...     print action, obj
-  start-ns ('', 'testns')
-  end-ns None
-
-It is common practice to use a list as namespace stack and pop the last entry
-on the 'end-ns' event.
-
-lxml.etree supports two extensions compared to ElementTree.  It accepts a
-``tag`` keyword argument just like ``element.getiterator(tag)``.  This
-restricts events to a specific tag or namespace::
-
-  >>> context = etree.iterparse(StringIO(xml), tag="element")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  end element
-  end element
-
-  >>> events = ("start", "end")
-  >>> context = etree.iterparse(StringIO(xml), events=events, tag="{testns}*")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start {testns}empty-element
-  end {testns}empty-element
-
-The second extension is the ``iterwalk()`` function.  It behaves exactly like
-``iterparse()``, but works on Elements and ElementTrees::
-
-  >>> root = context.root
-  >>> context = etree.iterwalk(root, events=events, tag="element")
-  >>> for action, elem in context:
-  ...     print action, elem.tag
-  start element
-  end element
-  start element
-  end element
-
-
-Error handling on exceptions
-----------------------------
-
-Libxml2 provides error messages for failures, be it during parsing, XPath
-evaluation or schema validation.  Whenever an exception is raised, you can
-retrieve the errors that occurred and "might have" led to the problem::
-
-  >>> etree.clearErrorLog()
-  >>> broken_xml = '<a>'
-  >>> try:
-  ...   etree.parse(StringIO(broken_xml))
-  ... except etree.XMLSyntaxError, e:
-  ...   pass # just put the exception into e
-  >>> log = e.error_log.filter_levels(etree.ErrorLevels.FATAL)
-  >>> print log
-  <string>:1:FATAL:PARSER:ERR_TAG_NOT_FINISHED: Premature end of data in tag a line 1
-
-This might look a little cryptic at first, but it is the information that
-libxml2 gives you.  At least the message at the end should give you a hint
-what went wrong and you can see that the fatal error (FATAL) happened during
-parsing (PARSER) line 1 of a string (<string>, or filename if available).
-Here, PARSER is the so-called error domain, see lxml.etree.ErrorDomains for
-that.  You can get it from a log entry like this::
-
-  >>> entry = log[0]
-  >>> print entry.domain_name, entry.type_name, entry.filename
-  PARSER ERR_TAG_NOT_FINISHED <string>
-
-There is also a convenience attribute ``last_error`` that returns the last
-error or fatal error that occurred::
-
-  >>> entry = e.error_log.last_error
-  >>> print entry.domain_name, entry.type_name, entry.filename
-  PARSER ERR_TAG_NOT_FINISHED <string>
-
-Alternatively, lxml.etree supports logging libxml2 messages to the Python
-stdlib logging module.  This is done through the ``etree.PyErrorLog`` class.
-It disables the error reporting from exceptions and forwards log messages to a
-Python logger.  To use it, see the descriptions of the function
-``etree.useGlobalPythonLog`` and the class ``etree.PyErrorLog`` for help.
-Note that this does not affect the local error logs of XSLT, XMLSchema,
-etc. which are described in their respective sections below.
-
-
-Python unicode strings
-----------------------
-
-lxml.etree has broader support for Python unicode strings than the ElementTree
-library.  First of all, where ElementTree would raise an exception, the
-parsers in lxml.etree can handle unicode strings straight away.  This is most
-helpful for XML snippets embedded in source code using the ``XML()``
-function::
-
-  >>> uxml = u'<test> \uf8d1 + \uf8d2 </test>'
-  >>> uxml
-  u'<test> \uf8d1 + \uf8d2 </test>'
-  >>> root = etree.XML(uxml)
-
-This requires, however, that unicode strings do not specify a conflicting
-encoding themselves and thus lie about their real encoding::
-
-  >>> etree.XML(u'<?xml version="1.0" encoding="ASCII"?>\n' + uxml)
-  Traceback (most recent call last):
-    ...
-  ValueError: Unicode strings with encoding declaration are not supported.
-
-Similarly, you will get errors when you try the same with HTML data in a
-unicode string that specifies a charset in a meta tag of the header.  You
-should generally avoid converting XML/HTML data to unicode before passing it
-into the parsers.  It is both slower and error prone.
-
-To serialize the result, you would normally use the ``tostring`` module
-function, which serializes to plain ASCII by default or a number of other
-encodings if asked for::
-
-  >>> etree.tostring(root)
-  '<test> &#63697; + &#63698; </test>'
-
-  >>> etree.tostring(root, 'UTF-8', xml_declaration=False)
-  '<test> \xef\xa3\x91 + \xef\xa3\x92 </test>'
-
-As an extension, lxml.etree has a new ``tounicode()`` function that you can
-call on XML tree objects to retrieve a Python unicode representation::
-
-  >>> etree.tounicode(root)
-  u'<test> \uf8d1 + \uf8d2 </test>'
-
-  >>> el = etree.Element("test")
-  >>> etree.tounicode(el)
-  u'<test/>'
-
-  >>> subel = etree.SubElement(el, "subtest")
-  >>> etree.tounicode(el)
-  u'<test><subtest/></test>'
-
-  >>> et = etree.ElementTree(el)
-  >>> etree.tounicode(et)
-  u'<test><subtest/></test>'
-
-The result of ``tounicode()`` can be treated like any other Python unicode
-string and then passed back into the parsers.  However, if you want to save
-the result to a file or pass it over the network, you should use ``write()``
-or ``tostring()`` with an encoding argument (typically UTF-8) to serialize the
-XML.  The main reason is that unicode strings returned by ``tounicode()``
-never have an XML declaration and therefore do not specify their encoding.
-These strings are most likely not parsable by other XML libraries.
-
-In contrast, the ``tostring()`` function automatically adds a declaration as
-needed that reflects the encoding of the returned string.  This makes it
-possible for other parsers to correctly parse the XML byte stream.  Note that
-using ``tostring()`` with UTF-8 is also considerably faster in most cases.
+   1  XPath
+   2  XSLT
 
 
 XPath
@@ -714,197 +241,3 @@
 If you want to free it from memory, just do::
 
   >>> del result.xslt_profile
-
-
-RelaxNG
--------
-
-lxml.etree introduces a new class, lxml.etree.RelaxNG. The class can
-be given an ElementTree object to construct a Relax NG validator::
-
-  >>> f = StringIO('''\
-  ... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
-  ...  <zeroOrMore>
-  ...     <element name="b">
-  ...       <text />
-  ...     </element>
-  ...  </zeroOrMore>
-  ... </element>
-  ... ''')
-  >>> relaxng_doc = etree.parse(f)
-  >>> relaxng = etree.RelaxNG(relaxng_doc)
-
-You can then validate some ElementTree document against the schema. You'll get
-back True if the document is valid against the Relax NG schema, and False if
-not::
-
-  >>> valid = StringIO('<a><b></b></a>')
-  >>> doc = etree.parse(valid)
-  >>> relaxng.validate(doc)
-  1
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> relaxng.validate(doc2)
-  0
-
-Calling the schema object has the same effect as calling its validate
-method. This is sometimes used in conditional statements::
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> if not relaxng(doc2):
-  ...     print "invalid!"
-  invalid!
-
-If you prefer getting an exception when validating, you can use the
-``assert_`` or ``assertValid`` methods::
-
-  >>> relaxng.assertValid(doc2)
-  Traceback (most recent call last):
-    [...]
-  DocumentInvalid: Document does not comply with schema
-
-  >>> relaxng.assert_(doc2)
-  Traceback (most recent call last):
-    [...]
-  AssertionError: Document does not comply with schema
-
-Starting with version 0.9, lxml now has a simple API to report the errors
-generated by libxml2. If you want to find out why the validation failed in the
-second case, you can look up the error log of the validation process and check
-it for relevant messages::
-
-  >>> log = relaxng.error_log
-  >>> print log.last_error
-  <string>:1:ERROR:RELAXNGV:ERR_LT_IN_ATTRIBUTE: Did not expect element c there
-
-You can see that the error (ERROR) happened during RelaxNG validation
-(RELAXNGV).  The message then tells you what went wrong.  Note that this error
-is local to the RelaxNG object.  It will only contain log entries that
-appeares during the validation.  The DocumentInvalid exception raised by the
-``assertValid`` method above provides access to the global error log (like all
-other lxml exceptions).
-
-Similar to XSLT, there's also a less efficient but easier shortcut method to
-do one-shot RelaxNG validation::
-
-  >>> doc.relaxng(relaxng_doc)
-  1
-  >>> doc2.relaxng(relaxng_doc)
-  0
-
-
-XMLSchema
----------
-
-lxml.etree also has a XML Schema (XSD) support, using the class
-lxml.etree.XMLSchema. This support is very similar to the Relax NG
-support. The class can be given an ElementTree object to construct a
-XMLSchema validator::
-
-  >>> f = StringIO('''\
-  ... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
-  ... <xsd:element name="a" type="AType"/>
-  ... <xsd:complexType name="AType">
-  ...   <xsd:sequence>
-  ...     <xsd:element name="b" type="xsd:string" />
-  ...   </xsd:sequence>
-  ... </xsd:complexType>
-  ... </xsd:schema>
-  ... ''')
-  >>> xmlschema_doc = etree.parse(f)
-  >>> xmlschema = etree.XMLSchema(xmlschema_doc)
-
-You can then validate some ElementTree document with this. Like with
-RelaxNG, you'll get back true if the document is valid against the XML
-schema, and false if not::
-
-  >>> valid = StringIO('<a><b></b></a>')
-  >>> doc = etree.parse(valid)
-  >>> xmlschema.validate(doc)
-  1
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> xmlschema.validate(doc2)
-  0
-
-Calling the schema object has the same effect as calling its validate
-method. This is sometimes used in conditional statements::
-
-  >>> invalid = StringIO('<a><c></c></a>')
-  >>> doc2 = etree.parse(invalid)
-  >>> if not xmlschema(doc2):
-  ...     print "invalid!"
-  invalid!
-
-If you prefer getting an exception when validating, you can use the
-``assert_`` or ``assertValid`` methods::
-
-  >>> xmlschema.assertValid(doc2)
-  Traceback (most recent call last):
-    [...]
-  DocumentInvalid: Document does not comply with schema
-
-  >>> xmlschema.assert_(doc2)
-  Traceback (most recent call last):
-    [...]
-  AssertionError: Document does not comply with schema
-
-Error reporting works like for the RelaxNG class::
-
-  >>> log = xmlschema.error_log
-  >>> error = log.last_error
-  >>> print error.domain_name
-  SCHEMASV
-  >>> print error.type_name
-  SCHEMAV_ELEMENT_CONTENT
-
-If you were to print this log entry, you would get something like the
-following.  Note that the error message depends on the libxml2 version in
-use::
-
-  <string>:1:ERROR::SCHEMAV_ELEMENT_CONTENT: Element 'c': This element is not expected. Expected is ( b ).
-
-Similar to XSLT and RelaxNG, there's also a less efficient but easier shortcut
-method to do XML Schema validation::
-
-  >>> doc.xmlschema(xmlschema_doc)
-  1
-  >>> doc2.xmlschema(xmlschema_doc)
-  0
-
-
-xinclude
---------
-
-Simple XInclude support exists.  You can let lxml process xinclude statements
-in a document by calling the xinclude() method on a tree::
-
-  >>> data = StringIO('''\
-  ... <doc xmlns:xi="http://www.w3.org/2001/XInclude">
-  ... <foo/>
-  ... <xi:include href="doc/test.xml" />
-  ... </doc>''')
-
-  >>> tree = etree.parse(data)
-  >>> tree.xinclude()
-  >>> etree.tostring(tree.getroot())
-  '<doc xmlns:xi="http://www.w3.org/2001/XInclude">\n<foo/>\n<a xml:base="doc/test.xml"/>\n</doc>'
-
-
-write_c14n on ElementTree
--------------------------
-
-The lxml.etree.ElementTree class has a method write_c14n, which takes a file
-object as argument.  This file object will receive an UTF-8 representation of
-the canonicalized form of the XML, following the W3C C14N recommendation.  For
-example::
-
-  >>> f = StringIO('<a><b/></a>')
-  >>> tree = etree.parse(f)
-  >>> f2 = StringIO()
-  >>> tree.write_c14n(f2)
-  >>> f2.getvalue()
-  '<a><b></b></a>'

