[Lxml-checkins] r44181 - lxml/branch/lxml-1.3/doc
scoder at codespeak.net
Tue Jun 12 19:00:37 CEST 2007
Author: scoder
Date: Tue Jun 12 19:00:36 2007
New Revision: 44181
Modified:
lxml/branch/lxml-1.3/doc/element_classes.txt
lxml/branch/lxml-1.3/doc/resolvers.txt
lxml/branch/lxml-1.3/doc/xpathxslt.txt
Log:
doc updates from trunk (XPath/XSLT)
Modified: lxml/branch/lxml-1.3/doc/element_classes.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/element_classes.txt (original)
+++ lxml/branch/lxml-1.3/doc/element_classes.txt Tue Jun 12 19:00:36 2007
@@ -4,8 +4,8 @@
lxml has very sophisticated support for custom Element classes. You can
provide your own classes for Elements and have lxml use them by default, for
-all elements generated by a specific parser or only for a specific tag name in
-a specific namespace.
+all elements generated by a specific parser, for a specific tag name in a
+specific namespace or for an exact element at a specific position in the tree.
Custom Elements must inherit from the ``lxml.etree.ElementBase`` class, which
provides the Element interface for subclasses::
@@ -33,7 +33,7 @@
Element initialization
-----------------------
+======================
There is one thing to know up front. Element classes *must not* have a
constructor, neither must there be any internal state (except for the data
@@ -43,10 +43,12 @@
called, the object may not even be initialized yet to represent the XML tag,
so there is not much use in providing an ``__init__`` method in subclasses.
-However, there is one possible way to do things on element initialization, if
-you really need to. ElementBase classes have an ``_init()`` method that can
-be overridden. It can be used to modify the XML tree, e.g. to construct
-special children or verify and update attributes.
+Most use cases will not require any class initialisation, so you can content
+yourself with skipping to the next section for now. However, if you really
+need to set up your element class on instantiation, there is one possible way
+to do so. ElementBase classes have an ``_init()`` method that can be
+overridden. It can be used to modify the XML tree, e.g. to construct special
+children or verify and update attributes.
The semantics of ``_init()`` are as follows:
@@ -72,7 +74,7 @@
Setting up a class lookup scheme
---------------------------------
+================================
The first thing to do when deploying custom element classes is to register a
class lookup scheme on a parser. lxml.etree provides quite a number of
@@ -139,7 +141,7 @@
Default class lookup
-....................
+--------------------
This is the most simple lookup mechanism. It always returns the default
element class. Consequently, no further fallbacks are supported, but this
@@ -178,7 +180,7 @@
Namespace class lookup
-......................
+----------------------
This is an advanced lookup mechanism that supports namespace/tag-name specific
element classes. You can select it by calling::
@@ -203,14 +205,15 @@
Attribute based lookup
-......................
+----------------------
This scheme uses a mapping from attribute values to classes. An attribute
name is set at initialisation time and is then used to find the corresponding
value. It is set up as follows::
>>> id_class_mapping = {} # maps attribute values to element classes
- >>> lookup = etree.AttributeBasedElementClassLookup('id', id_class_mapping)
+ >>> lookup = etree.AttributeBasedElementClassLookup(
+ ... 'id', id_class_mapping)
>>> parser = etree.XMLParser()
>>> parser.setElementClassLookup(lookup)
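The attribute based lookup set up above can be exercised end to end as follows. This sketch assumes Python 3 and a later lxml release, where the parser method is spelled `set_element_class_lookup()` rather than `setElementClassLookup()`. The `TaggedElement` class and its `label()` method are hypothetical and exist only for illustration.

```python
from lxml import etree


class TaggedElement(etree.ElementBase):
    """Hypothetical element class, used only for this sketch."""

    def label(self):
        # No Python-level state: everything is read back from the XML itself.
        return "tagged:" + self.get("id")


# Map values of the 'id' attribute to element classes. Elements whose id
# is missing or not in the mapping fall back to the default element class.
id_class_mapping = {"special": TaggedElement}
lookup = etree.AttributeBasedElementClassLookup("id", id_class_mapping)

parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)  # later spelling of setElementClassLookup

root = etree.XML('<root><a id="special"/><a id="plain"/></root>', parser)

print(type(root[0]).__name__)              # TaggedElement
print(root[0].label())                     # tagged:special
print(isinstance(root[1], TaggedElement))  # False
```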
@@ -229,7 +232,7 @@
Custom element class lookup
-...........................
+---------------------------
This is the most customisable way of finding element classes. It allows you
to implement a custom lookup scheme in a subclass::
@@ -251,7 +254,7 @@
Implementing namespaces
------------------------
+=======================
lxml allows you to implement namespaces, in a rather literal sense. After
setting up the namespace class lookup mechanism as described above, you can
Modified: lxml/branch/lxml-1.3/doc/resolvers.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/resolvers.txt (original)
+++ lxml/branch/lxml-1.3/doc/resolvers.txt Tue Jun 12 19:00:36 2007
@@ -3,13 +3,20 @@
.. contents::
..
- 1 Document loaders in context
- 2 I/O access control in XSLT
+ 1 Resolvers
+ 2 Document loading in context
+ 3 I/O access control in XSLT
Lxml has support for custom document loaders in both the parsers and XSL
transformations. These so-called resolvers are subclasses of the
-etree.Resolver class as in the following example::
+etree.Resolver class.
+
+
+Resolvers
+---------
+
+Here is an example of a custom resolver::
>>> from lxml import etree
@@ -32,10 +39,10 @@
* ``resolve_file`` takes an open file-like object that has at least a read() method
* ``resolve_empty`` resolves into an empty document
-The ``resolve`` method may choose to return None, in which case the next
-registered resolver (or the default resolver) is consulted. It is never
-called if the resolver returns the result of any of the above ``resolve_*``
-methods.
+The ``resolve()`` method may choose to return None, in which case the next
+registered resolver (or the default resolver) is consulted. Resolving always
+terminates if ``resolve()`` returns the result of any of the above
+``resolve_*()`` methods.
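The fallback rule described above can be seen with two resolvers on one parser: one that always declines by returning None, and one that answers requests for DTD files. This is a sketch assuming Python 3 and a later lxml release. The class names and the entity are hypothetical, and `resolve_entities=True` is passed explicitly so the sketch does not depend on the parser's default for that option. Registered resolvers are tried in no guaranteed order, so the sketch relies only on the fact that a declining resolver never ends the lookup.

```python
from lxml import etree


class DecliningResolver(etree.Resolver):
    """Hypothetical resolver that never handles a request."""

    def resolve(self, url, pubid, context):
        return None  # hand the request on to the other resolvers


class DTDResolver(etree.Resolver):
    """Hypothetical resolver that answers any request for a .dtd file."""

    def resolve(self, url, pubid, context):
        if url and url.endswith(".dtd"):
            # Returning a resolve_*() result ends the lookup for this URL.
            return self.resolve_string(
                '<!ENTITY myentity "resolved from %s">' % url, context)
        return None  # not ours: let another resolver (or the default) try


parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
parser.resolvers.add(DecliningResolver())
parser.resolvers.add(DTDResolver())

xml = '<!DOCTYPE doc SYSTEM "MissingDTD.dtd"><doc>&myentity;</doc>'
root = etree.fromstring(xml, parser)
print(root.text)
```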
Resolvers are registered local to a parser::
@@ -58,7 +65,7 @@
fragment.
-Document loaders in context
+Document loading in context
---------------------------
XML documents memorise their initial parser (and its resolvers) during their
@@ -180,12 +187,16 @@
I/O access control in XSLT
--------------------------
-XSLT has an additional mechanism to control the access to certain I/O
-operations during the transformation process. This is most interesting where
-XSL scripts come from potentially insecure sources and must be prevented from
-modifying the local file system. Note, however, that there is no way to keep
-them from eating up your precious CPU time, so this should not stop you from
-thinking about what XSLT you execute.
+By default, XSLT supports all extension functions from libxslt and libexslt as
+well as Python regular expressions through EXSLT. Some extensions enable
+style sheets to read and write files on the local file system.
+
+XSLT has a mechanism to control access to certain I/O operations during
+the transformation process. This is most interesting where XSL scripts come
+from potentially insecure sources and must be prevented from modifying the
+local file system. Note, however, that there is no way to keep them from
+eating up your precious CPU time, so this should not stop you from thinking
+about what XSLT you execute.
Access control is configured using the ``XSLTAccessControl`` class. It can be
called with a number of keyword arguments that allow or deny specific
Modified: lxml/branch/lxml-1.3/doc/xpathxslt.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/xpathxslt.txt (original)
+++ lxml/branch/lxml-1.3/doc/xpathxslt.txt Tue Jun 12 19:00:36 2007
@@ -6,9 +6,19 @@
compliant way.
.. contents::
-..
+..
1 XPath
+ 1.1 The ``xpath()`` method
+ 1.2 XPath return values
+ 1.3 The ``XPath`` class
+ 1.4 The ``XPathEvaluator`` classes
+ 1.5 ``ETXPath``
2 XSLT
+ 2.1 XSLT result objects
+ 2.2 Stylesheet parameters
+ 2.3 The ``xslt()`` tree method
+ 2.4 Profiling
+
The usual setup procedure::
@@ -17,12 +27,17 @@
XPath
------
+=====
-lxml.etree supports the simple path syntax of the ``findall()`` etc. methods
-on ElementTree and Element, as known from the original ElementTree library.
-As an extension, these classes also provide an ``xpath()`` method that
-supports expressions in the complete XPath syntax.
+lxml.etree supports the simple path syntax of the `find, findall and
+findtext`_ methods on ElementTree and Element, as known from the original
+ElementTree library (ElementPath_). As an lxml-specific extension, these
+classes also provide an ``xpath()`` method that supports expressions in the
+complete XPath syntax, as well as `custom extension functions`_.
+
+.. _ElementPath: http://effbot.org/zone/element-xpath.htm
+.. _`find, findall and findtext`: http://effbot.org/zone/element.htm#searching-for-subelements
+.. _`custom extension functions`: extensions.html
There are also specialized XPath evaluator classes that are more efficient for
frequent evaluation: ``XPath`` and ``XPathEvaluator``. See the `performance
@@ -32,6 +47,10 @@
.. _`performance comparison`: performance.html#xpath
+
+The ``xpath()`` method
+----------------------
+
For ElementTree, the xpath method performs a global XPath query against the
document (if absolute) or against the root node (if relative)::
@@ -48,7 +67,7 @@
>>> r[0].tag
'bar'
-When ``xpath()`` is used on an element, the XPath expression is evaluated
+When ``xpath()`` is used on an Element, the XPath expression is evaluated
against the element (if relative) or against the root tree (if absolute)::
>>> root = tree.getroot()
@@ -66,6 +85,19 @@
>>> r[0].tag
'bar'
+The ``xpath()`` method has support for XPath variables::
+
+ >>> expr = "//*[local-name() = $name]"
+
+ >>> print root.xpath(expr, name = "foo")[0].tag
+ foo
+
+ >>> print root.xpath(expr, name = "bar")[0].tag
+ bar
+
+ >>> print root.xpath("$text", text = "Hello World!")
+ Hello World!
+
Optionally, you can provide a ``namespaces`` keyword argument, which should be
a dictionary mapping the namespace prefixes used in the XPath expression to
namespace URIs::
@@ -87,9 +119,11 @@
'Text'
There is also an optional ``extensions`` argument which is used to define
-`extension functions`_ in Python that are local to this evaluation.
+`custom extension functions`_ in Python that are local to this evaluation.
+
-.. _`extension functions`: extensions.html
+XPath return values
+-------------------
The return values of XPath evaluations vary, depending on the XPath expression
used:
@@ -101,11 +135,10 @@
* a (unicode) string, when the XPath expression has a string result.
* a list of items, when the XPath expression has a list as result. The items
- may include elements, strings and tuples. Text nodes and attributes in the
- result are returned as strings (the text node content or attribute value).
- Comments are also returned as strings, enclosed by the usual ``<!--`` and
- ``-->`` markers. Namespace declarations are returned as tuples of strings:
- ``(prefix, URI)``.
+ may include elements (also comments and processing instructions), strings
+ and tuples. Text nodes and attributes in the result are returned as strings
+ (the text node content or attribute value). Namespace declarations are
+ returned as tuples of strings: ``(prefix, URI)``.
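The kinds of return values listed above can be checked directly. A sketch assuming Python 3 and a later lxml release; the sample document is made up for illustration. Text and attribute results come back as string subclasses, so they compare equal to plain strings.

```python
from lxml import etree

root = etree.XML('<root xmlns:x="urn:x" a="1">hello<!--note--><b/></root>')

print(root.xpath("count(*)"))    # 1.0  (XPath number -> float)
print(root.xpath("boolean(b)"))  # True (XPath boolean -> bool)
print(root.xpath("string(@a)"))  # 1    (XPath string -> str)

# Node-sets become lists: attribute values and text nodes as strings,
# namespace declarations as (prefix, URI) tuples.
attrs = root.xpath("@a")                 # equal to ['1']
texts = root.xpath("text()")             # equal to ['hello']
namespaces = root.xpath("namespace::x")  # equal to [('x', 'urn:x')]

# Comments are returned as element objects, not as strings.
comment = root.xpath("comment()")[0]
print(comment.text)  # note
```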
A related convenience method of ElementTree objects is ``getpath(element)``,
which returns a structural, absolute XPath expression to find that element::
@@ -123,8 +156,98 @@
True
+The ``XPath`` class
+-------------------
+
+The ``XPath`` class compiles an XPath expression into a callable function::
+
+ >>> root = etree.XML("<root><a><b/></a><b/></root>")
+
+ >>> find = etree.XPath("//b")
+ >>> print find(root)[0].tag
+ b
+
+The compilation takes as much time as in the ``xpath()`` method, but it is
+done only once per class instantiation. This makes it especially efficient
+for repeated evaluation of the same XPath expression.
+
+Just like the ``xpath()`` method, the ``XPath`` class supports XPath
+variables::
+
+ >>> count_elements = etree.XPath("count(//*[local-name() = $name])")
+
+ >>> print count_elements(root, name = "a")
+ 1.0
+ >>> print count_elements(root, name = "b")
+ 2.0
+
+This supports very efficient evaluation of modified versions of an XPath
+expression, as compilation is still only required once.
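Compiling once and binding a different variable value on each call can be sketched as follows, assuming Python 3 and a later lxml release. The variable name `name` and the sample document are illustrative only.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><b/></root>")

# Compiled once; $name is bound separately on every call.
count_named = etree.XPath("count(//*[local-name() = $name])")

for tag in ("a", "b", "c"):
    print(tag, count_named(root, name=tag))
# a 1.0
# b 2.0
# c 0.0
```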
+
+Prefix-to-namespace mappings can be passed as the second parameter::
+
+ >>> root = etree.XML("<root xmlns='NS'><a><b/></a><b/></root>")
+
+ >>> find = etree.XPath("//n:b", {'n':'NS'})
+ >>> print find(root)[0].tag
+ {NS}b
+
+
+The ``XPathEvaluator`` classes
+------------------------------
+
+lxml.etree provides two other efficient XPath evaluators that work on
+ElementTrees or Elements respectively: ``XPathDocumentEvaluator`` and
+``XPathElementEvaluator``. They are automatically selected if you use the
+``XPathEvaluator`` helper for instantiation::
+
+ >>> root = etree.XML("<root><a><b/></a><b/></root>")
+ >>> xpatheval = etree.XPathEvaluator(root)
+
+ >>> print isinstance(xpatheval, etree.XPathElementEvaluator)
+ True
+
+ >>> print xpatheval("//b")[0].tag
+ b
+
+This class provides efficient support for evaluating different XPath
+expressions on the same Element or ElementTree.
+
+
+``ETXPath``
+-----------
+
+ElementTree supports a language named ElementPath_ in its ``find*()`` methods.
+One of the main differences between XPath and ElementPath is that the XPath
+language requires an indirection through prefixes for namespace support,
+whereas ElementTree uses the Clark notation (``{ns}name``) to avoid prefixes
+completely. The other major difference regards the capabilities of both path
+languages. Where XPath supports various sophisticated ways of restricting the
+result set through functions and boolean expressions, ElementPath only
+supports pure path traversal without nesting or further conditions. So, while
+the ElementPath syntax is self-contained and therefore easier to write and
+handle, XPath is much more powerful and expressive.
+
+lxml.etree bridges this gap through the class ``ETXPath``, which accepts XPath
+expressions with namespaces in Clark notation. It is identical to the
+``XPath`` class, except for the namespace notation. Normally, you would
+write::
+
+ >>> root = etree.XML("<root xmlns='ns'><a><b/></a><b/></root>")
+
+ >>> find = etree.XPath("//p:b", {'p' : 'ns'})
+ >>> print find(root)[0].tag
+ {ns}b
+
+``ETXPath`` allows you to change this to::
+
+ >>> find = etree.ETXPath("//{ns}b")
+ >>> print find(root)[0].tag
+ {ns}b
+
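For comparison, the Clark notation used by ElementPath can be tried with the `xml.etree.ElementTree` module from Python's standard library. It is used here as a stand-in for the original ElementTree library, which is an assumption: its path support has grown since this text was written. The namespaced name goes directly into the path, with no prefix mapping:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root xmlns='ns'><a><b/></a><b/></root>")

# Descendant search: the namespace URI is written inline as {ns}b.
all_b = root.findall(".//{ns}b")
print([el.tag for el in all_b])    # ['{ns}b', '{ns}b']

# Plain child step: only the <b> directly below <root> matches.
print(len(root.findall("{ns}b")))  # 1
```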
+
XSLT
-----
+====
lxml.etree introduces a new class, lxml.etree.XSLT. The class can be
given an ElementTree object to construct an XSLT transformer::
@@ -144,9 +267,28 @@
>>> f = StringIO('<a><b>Text</b></a>')
>>> doc = etree.parse(f)
- >>> result = transform(doc)
+ >>> result_tree = transform(doc)
-The result object can be accessed like a normal ElementTree document::
+By default, XSLT supports all extension functions from libxslt and libexslt
+as well as Python regular expressions through the `EXSLT regexp functions`_.
+Also see the documentation on `custom extension functions`_ and `document
+resolvers`_. There is a separate section on `controlling access`_ to
+external documents and resources.
+
+.. _`EXSLT regexp functions`: http://www.exslt.org/regexp/
+.. _`document resolvers`: resolvers.html
+.. _`controlling access`: resolvers.html#i-o-access-control-in-xslt
+
+
+XSLT result objects
+-------------------
+
+The result of an XSL transformation can be accessed like a normal ElementTree
+document::
+
+ >>> f = StringIO('<a><b>Text</b></a>')
+ >>> doc = etree.parse(f)
+ >>> result = transform(doc)
>>> result.getroot().text
'Text'
@@ -185,6 +327,10 @@
[...]
LookupError: unknown encoding: UCS4
+
+Stylesheet parameters
+---------------------
+
It is possible to pass parameters, in the form of XPath expressions, to the
XSLT template::
@@ -212,7 +358,11 @@
>>> str(result)
'<?xml version="1.0"?>\n<foo>Text</foo>\n'
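Because each parameter value is an XPath expression, a string has to be passed as a quoted literal, while an unquoted expression is evaluated against the input document. A sketch assuming Python 3 and a later lxml release; the stylesheet below is made up for illustration.

```python
from lxml import etree

xslt_root = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="a"/>
  <xsl:template match="/">
    <foo><xsl:value-of select="$a"/></foo>
  </xsl:template>
</xsl:stylesheet>""")
transform = etree.XSLT(xslt_root)

doc = etree.XML("<a><b>Text</b></a>").getroottree()

# A quoted XPath string literal is passed through as-is.
literal = transform(doc, a="'A'")
# An unquoted expression is evaluated against the input document.
selected = transform(doc, a="/a/b/text()")

print(literal.getroot().text)   # A
print(selected.getroot().text)  # Text
```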
-There's also a convenience method on the tree object for doing XSL
+
+The ``xslt()`` tree method
+--------------------------
+
+There's also a convenience method on ElementTree objects for doing XSL
transformations. This is less efficient if you want to apply the same XSL
transformation to multiple documents, but is shorter to write for one-shot
operations, as you do not have to instantiate a stylesheet yourself::
@@ -221,12 +371,16 @@
>>> str(result)
'<?xml version="1.0"?>\n<foo>A</foo>\n'
-By default, XSLT supports all extension functions from libxslt and libexslt as
-well as Python regular expressions through EXSLT. Note that some extensions
-enable style sheets to read and write files on the local file system. See the
-`document loader documentation`_ on how to deal with this.
+This is a shortcut for the following code::
+
+ >>> transform = etree.XSLT(xslt_tree)
+ >>> result = transform(doc, a="'A'")
+ >>> str(result)
+ '<?xml version="1.0"?>\n<foo>A</foo>\n'
+
-.. _`document loader documentation`: resolvers.html
+Profiling
+---------
If you want to know how your stylesheet performed, pass the ``profile_run``
keyword to the transform::