[Lxml-checkins] r43531 - lxml/trunk/doc
scoder at codespeak.net
Mon May 21 18:01:17 CEST 2007
Author: scoder
Date: Mon May 21 18:01:16 2007
New Revision: 43531
Modified:
lxml/trunk/doc/objectify.txt
Log:
huge restructuring in objectify.txt, Holger's section on XSI annotation
Modified: lxml/trunk/doc/objectify.txt
==============================================================================
--- lxml/trunk/doc/objectify.txt (original)
+++ lxml/trunk/doc/objectify.txt Mon May 21 18:01:16 2007
@@ -2,8 +2,8 @@
lxml.objectify
==============
-:Author:
- Stefan Behnel
+:Authors:
+ Stefan Behnel, Holger Joukl
lxml supports an alternative API similar to the Amara_ bindery or
gnosis.xml.objectify_ through a custom Element implementation. The main idea
@@ -25,22 +25,27 @@
.. contents::
..
- 1 Setting up lxml.objectify
- 2 Creating objectify trees
- 3 Element access through object attributes
- 4 Namespace handling
- 5 ObjectPath
- 6 Python data types
- 7 Defining additional data classes
- 8 Recursive string representation of elements
- 9 What is different from ElementTree?
- 10 Resetting the API
+ 1 Setting up lxml.objectify
+ 2 The lxml.objectify API
+ 2.1 Creating objectify trees
+ 2.2 Element access through object attributes
+ 2.3 Namespace handling
+ 3 ObjectPath
+ 4 Python data types
+ 5 Recursive tree dump
+ 5.1 Recursive string representation of elements
+ 6 How data types are matched
+ 6.1 Type annotations
+ 6.2 XML Schema datatype annotation
+ 6.3 The DataElement factory
+ 6.4 Defining additional data classes
+ 7 What is different from lxml.etree?
Setting up lxml.objectify
--------------------------
+=========================
-To make use of ``objectify``, you need both the ``lxml.etree`` module and
+To set up and use ``objectify``, you need both the ``lxml.etree`` module and
``lxml.objectify``::
>>> from lxml import etree
@@ -74,6 +79,13 @@
.. _`namespace specific classes`: element_classes.html#namespace-class-lookup
+The lxml.objectify API
+======================
+
+In ``lxml.objectify``, element trees provide an API that models the behaviour
+of normal Python object trees as closely as possible.
+
+
Creating objectify trees
------------------------
@@ -318,7 +330,7 @@
ObjectPath
-----------
+==========
For both convenience and speed, objectify supports its own path language,
represented by the ``ObjectPath`` class::
@@ -455,7 +467,7 @@
Python data types
------------------
+=================
The objectify module knows about Python data types and tries its best to let
element content behave like them. For example, they support the normal math
@@ -488,6 +500,67 @@
>>> print root.d % (1234, 12345)
1234 - 12345
+However, data elements continue to provide the objectify API. This means that
+sequence operations such as ``len()``, slicing and indexing (e.g. of strings)
+cannot behave as they do for the Python types. Like all other tree elements,
+data elements show the normal slicing behaviour of objectify elements::
+
+ >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
+ >>> print root.a + ' me' # behaves like a string, right?
+ test me
+ >>> len(root.a) # but there's only one 'a' element!
+ 1
+ >>> [ a.tag for a in root.a ]
+ ['a']
+ >>> print root.a[0].tag
+ a
+
+ >>> print root.a
+ test
+ >>> [ str(a) for a in root.a[:1] ]
+ ['test']
+
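For comparison, the same distinction can be made explicit with the standard library's ``xml.etree.ElementTree``, which is used here only as a stand-in because it has no objectify layer. Counting the ``a`` siblings corresponds to objectify's ``len(root.a)``, while the length of the string content has to go through the text value. A minimal sketch:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a>test</a><b>toast</b></root>")

# objectify's len(root.a) counts the sibling elements named 'a' ...
a_siblings = root.findall("a")
print(len(a_siblings))  # 1

# ... whereas the length of the string needs the text value itself.
print(len(root.find("a").text))  # 4
```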
+If you need to run sequence operations on data types, you must ask the API for
+the *real* Python value. The string value is always available through the
+normal ElementTree ``.text`` attribute. Additionally, all data classes
+provide a ``.pyval`` attribute that returns the value as a plain Python type::
+
+ >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
+ >>> root.a.text
+ 'test'
+ >>> root.a.pyval
+ 'test'
+
+ >>> root.b.text
+ '5'
+ >>> root.b.pyval
+ 5
+
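The relation between ``.text`` and ``.pyval`` can be pictured as a conversion from the string content to a Python value. The following is a hypothetical, simplified sketch; ``guess_pyval()`` is not part of lxml, and its lookup order (int, float, boolean literals, string) is an assumption that only mirrors the examples in this document:

```python
def guess_pyval(text):
    """Hypothetical, simplified stand-in for objectify's type guessing.

    Tries int, then float, then the XML Schema boolean literals, and
    falls back to the string itself. Empty or missing text is returned
    unchanged. Note that float() also accepts strings such as 'nan' or
    'inf'; this sketch does not special-case them.
    """
    if not text:
        return text
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    if text in ("true", "false"):
        return text == "true"
    return text


print(guess_pyval("test"))  # test
print(guess_pyval("5"))     # 5
print(guess_pyval("1.2"))   # 1.2
print(guess_pyval("true"))  # True
```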
+Note, however, that both attributes are read-only in objectify. If you want
+to change a value, assign the new value directly to the parent's attribute::
+
+ >>> root.a.text = "25"
+ Traceback (most recent call last):
+ ...
+ TypeError: attribute 'text' of 'StringElement' objects is not writable
+
+ >>> root.a.pyval = 25
+ Traceback (most recent call last):
+ ...
+ TypeError: attribute 'pyval' of 'StringElement' objects is not writable
+
+ >>> root.a = 25
+ >>> print root.a
+ 25
+ >>> print root.a.pyval
+ 25
+
+In other words, ``objectify`` data elements behave like immutable Python
+types. You can replace them, but not modify them.
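The "replace, don't modify" rule can be modelled in plain Python. This is a toy sketch with hypothetical class names (``DataValue``, ``Node``), not how lxml implements data elements: the child exposes a read-only ``text``, and assignment on the parent swaps in a new child object.

```python
class DataValue(object):
    """Hypothetical toy model of an immutable data element."""
    __slots__ = ("_text",)

    def __init__(self, text):
        object.__setattr__(self, "_text", text)

    @property
    def text(self):
        # read-only: there is no setter
        return self._text

    def __setattr__(self, name, value):
        # any attribute assignment on the data value itself is refused
        raise TypeError("attribute %r of %r objects is not writable"
                        % (name, type(self).__name__))


class Node(object):
    """Hypothetical parent that stores its children as attributes."""

    def __setattr__(self, name, value):
        # assigning on the parent replaces the child wholesale
        if not isinstance(value, DataValue):
            value = DataValue(str(value))
        object.__setattr__(self, name, value)


root = Node()
root.a = "test"
try:
    root.a.text = "25"          # modifying the child fails ...
except TypeError as exc:
    error = str(exc)
root.a = 25                     # ... replacing it works
print(root.a.text)  # 25
```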
+
+
+Recursive tree dump
+-------------------
To see the data types that are currently used, you can call the module level
``dump()`` function that returns a recursive string representation for
@@ -547,64 +620,46 @@
a = 2 [IntElement]
a = 3 [IntElement]
-However, data elements continue to provide the objectify API. This means that
-sequence operations such as ``len()``, slicing and indexing (e.g. of strings)
-cannot behave as the Python types. Like all other tree elements, they show
-the normal slicing behaviour of objectify elements::
-
- >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
- >>> print root.a + ' me' # behaves like a string, right?
- test me
- >>> len(root.a) # but there's only one 'a' element!
- 1
- >>> [ a.tag for a in root.a ]
- ['a']
- >>> print root.a[0].tag
- a
- >>> print root.a
- test
- >>> [ str(a) for a in root.a[:1] ]
- ['test']
-
-If you need to run sequence operations on data types, you must ask the API for
-the *real* Python value. The string value is always available through the
-normal ElementTree ``.text`` attribute. Additionally, all data classes
-provide a ``.pyval`` attribute that returns the value as plain Python type::
-
- >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
- >>> root.a.text
- 'test'
- >>> root.a.pyval
- 'test'
+Recursive string representation of elements
+-------------------------------------------
- >>> root.b.text
- '5'
- >>> root.b.pyval
- 5
+Normally, elements use the standard string representation for str() that is
+provided by lxml.etree. You can enable a pretty-print representation for
+objectify elements like this::
-Note, however, that both attributes are read-only in objectify. If you want
-to change values, just assign them directly to the attribute::
+ >>> objectify.enableRecursiveStr()
- >>> root.a.text = "25"
- Traceback (most recent call last):
- ...
- TypeError: attribute 'text' of 'StringElement' objects is not writable
+ >>> root = objectify.fromstring("""
+ ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+ ... <a attr1="foo" attr2="bar">1</a>
+ ... <a>1.2</a>
+ ... <b>1</b>
+ ... <b>true</b>
+ ... <c>what?</c>
+ ... <d xsi:nil="true"/>
+ ... </root>
+ ... """)
- >>> root.a.pyval = 25
- Traceback (most recent call last):
- ...
- TypeError: attribute 'pyval' of 'StringElement' objects is not writable
+ >>> print str(root)
+ root = None [ObjectifiedElement]
+ a = 1 [IntElement]
+ * attr1 = 'foo'
+ * attr2 = 'bar'
+ a = 1.2 [FloatElement]
+ b = 1 [IntElement]
+ b = True [BoolElement]
+ c = 'what?' [StringElement]
+ d = None [NoneElement]
+ * xsi:nil = 'true'
- >>> root.a = 25
- >>> print root.a
- 25
+This behaviour can be switched off in the same way::
-In other words, objectify data elements behave like immutable Python types.
+ >>> objectify.enableRecursiveStr(False)
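The recursive representation above follows a simple pattern: one line per element with its value and class, attributes marked with ``*``, and children indented. A rough sketch of that pattern on top of ``xml.etree.ElementTree``, where ``dump_tree()`` and the class-guessing rules are hypothetical simplifications (``xsi:nil`` handling, for example, is left out):

```python
import xml.etree.ElementTree as ET


def _data_class(text):
    # Hypothetical, simplified guess of a value and an objectify-like
    # class name for the text of a leaf element.
    for convert, name in ((int, "IntElement"), (float, "FloatElement")):
        try:
            return convert(text), name
        except (TypeError, ValueError):
            pass
    if text in ("true", "false"):
        return text == "true", "BoolElement"
    return text or "", "StringElement"


def dump_tree(element, indent=0):
    """Return the dump lines for element and, recursively, its children."""
    pad = "    " * indent
    if len(element):
        # elements with children are treated as structural elements
        lines = ["%s%s = None [ObjectifiedElement]" % (pad, element.tag)]
    else:
        value, cls = _data_class(element.text)
        lines = ["%s%s = %r [%s]" % (pad, element.tag, value, cls)]
    for name, value in sorted(element.attrib.items()):
        lines.append("%s  * %s = %r" % (pad, name, value))
    for child in element:
        lines.extend(dump_tree(child, indent + 1))
    return lines


root = ET.fromstring('<root><a attr1="foo">1</a><a>1.2</a>'
                     '<b>true</b><c>what?</c></root>')
print("\n".join(dump_tree(root)))
```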
How data types are matched
---------------------------
+==========================
Objectify uses two different types of Elements. Structural Elements (or tree
Elements) represent the object tree structure. Data Elements represent the
@@ -639,6 +694,10 @@
classes used in these cases. By default, ``tree_class`` is a class called
``ObjectifiedElement`` and ``empty_data_class`` is a ``StringElement``.
+
+Type annotations
+----------------
+
The "type hint" mechanism deploys an XML attribute defined as
``lxml.objectify.PYTYPE_ATTRIBUTE``. It may contain any of the following
string values: int, long, float, str, unicode, none::
@@ -682,12 +741,17 @@
b = 5 [IntElement]
* py:pytype = 'int'
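Reading such a type hint only requires looking up a namespaced attribute. A minimal sketch with ``xml.etree.ElementTree``, assuming the pytype namespace URI shown in the serialized examples of this document. The constant names and the converter table are hypothetical and cover only the hint names listed above (``long`` and ``unicode`` map to ``int`` and ``str`` under Python 3):

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the serialized examples of this document.
PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"
PYTYPE_ATTR = "{%s}pytype" % PYTYPE_NS

# Hypothetical mapping from the documented hint names to converters.
CONVERTERS = {"int": int, "long": int, "float": float,
              "str": str, "unicode": str, "none": lambda text: None}


def annotated_value(element):
    """Convert element text according to its pytype hint, if any."""
    hint = element.get(PYTYPE_ATTR)
    if hint is None:
        return element.text  # no hint: leave the text untouched
    return CONVERTERS[hint](element.text or "")


root = ET.fromstring('<root xmlns:py="%s">'
                     '<a py:pytype="str">5</a><b py:pytype="int">5</b>'
                     '</root>' % PYTYPE_NS)
print(repr(annotated_value(root.find("a"))))  # '5'
print(repr(annotated_value(root.find("b"))))  # 5
```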
+
+XML Schema datatype annotation
+------------------------------
+
A second way of specifying data type information uses XML Schema types as
element annotations. Objectify knows those that can be mapped to normal
Python types::
>>> root = objectify.fromstring('''\
- ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+ ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ ... xmlns:xsd="http://www.w3.org/2001/XMLSchema">
... <d xsi:type="xsd:double">5</d>
... <l xsi:type="xsd:long" >5</l>
... <s xsi:type="xsd:string">5</s>
@@ -758,6 +822,10 @@
l = 5 [IntElement]
s = 5 [IntElement]
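The ``xsi:type`` variant works the same way, except that the annotation names an XML Schema type. A minimal sketch with ``xml.etree.ElementTree``; the converter table is a hypothetical subset, and the prefix is simply stripped on the assumption that it is bound to the XML Schema namespace (resolving it properly is a separate concern):

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical subset of XML Schema types mapped to Python converters.
XSD_CONVERTERS = {"double": float, "float": float, "long": int,
                  "int": int, "integer": int, "string": str,
                  "boolean": lambda text: text in ("true", "1")}


def xsi_value(element):
    annotation = element.get(XSI_TYPE)
    if annotation is None:
        return element.text
    # Assumption: the prefix (e.g. "xsd") is bound to the XML Schema
    # namespace; it is stripped here instead of being resolved.
    local_name = annotation.split(":")[-1]
    return XSD_CONVERTERS[local_name](element.text)


root = ET.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<d xsi:type="xsd:double">5</d>'
    '<l xsi:type="xsd:long">5</l>'
    '<s xsi:type="xsd:string">5</s>'
    '</root>')
values = [(child.tag, xsi_value(child)) for child in root]
print(values)  # [('d', 5.0), ('l', 5), ('s', '5')]
```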
+
+The DataElement factory
+-----------------------
+
For convenience, the ``DataElement()`` factory creates an Element with a
Python value in one step. You can pass the required Python type name or the
XSI type name::
@@ -798,13 +866,106 @@
provide the type of a data element by hand::
>>> root = objectify.Element("root")
- >>> root.s = objectify.DataElement(5, _pytype="str")
+ >>> root.s = objectify.DataElement(5, _pytype="str")
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
s = '5' [StringElement]
* py:pytype = 'str'
-
+Likewise, the data type can be provided as an XML Schema type using the
+``_xsi`` argument of ``DataElement()``::
+
+ >>> root = objectify.Element("root")
+ >>> root.s = objectify.DataElement(5, _xsi="string")
+ >>> print objectify.dump(root)
+ root = None [ObjectifiedElement]
+ s = '5' [StringElement]
+ * py:pytype = 'str'
+ * xsi:type = 'xsd:string'
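To make the one-step idea concrete, here is a hypothetical sketch on top of ``xml.etree.ElementTree``. ``make_data_element()`` is not lxml's ``DataElement()``; it only sets the text plus the ``py:pytype`` and ``xsi:type`` attributes in the way the dump above shows. The default tag ``value`` is taken from a serialized example in this document, and the type-name mapping is an assumption:

```python
import xml.etree.ElementTree as ET

PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical: which py:pytype hint accompanies which XSD type name.
XSD_TO_PYTYPE = {"string": "str", "int": "int", "long": "long",
                 "double": "float"}


def make_data_element(value, pytype=None, xsi=None, tag="value"):
    """Hypothetical stand-in for the DataElement() idea."""
    element = ET.Element(tag)
    # str() is used for the text; a real implementation would need the
    # XSD lexical forms (e.g. 'true' rather than 'True' for booleans).
    element.text = str(value)
    if xsi is not None:
        element.set("{%s}type" % XSI_NS, "xsd:" + xsi)
        if pytype is None:
            pytype = XSD_TO_PYTYPE.get(xsi)
    if pytype is not None:
        element.set("{%s}pytype" % PYTYPE_NS, pytype)
    return element


el = make_data_element(5, xsi="string")
print(el.text)                             # 5
print(el.get("{%s}pytype" % PYTYPE_NS))    # str
print(el.get("{%s}type" % XSI_NS))         # xsd:string
```

ElementTree only declares namespaces that appear in tag or attribute names, so the ``xsd`` prefix inside the attribute value would not be declared on serialization. That is exactly the gap the default nsmap of ``DataElement()`` closes.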
+
+XML Schema types reside in the XML Schema namespace, so ``DataElement()``
+tries to prefix the ``xsi:type`` attribute value correctly for you::
+
+ >>> root = objectify.Element("root")
+ >>> root.s = objectify.DataElement(5, _xsi="string")
+
+ >>> objectify.deannotate(root, xsi=False)
+ >>> print etree.tostring(root, pretty_print=True)
+ <root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+ <s xsi:type="xsd:string">5</s>
+ </root>
+
+``DataElement()`` uses a default nsmap to set these prefixes::
+
+ >>> el = objectify.DataElement('5', _xsi='string')
+ >>> for prefix, namespace in el.nsmap.items():
+ ... print prefix, '-', namespace
+ py - http://codespeak.net/lxml/objectify/pytype
+ xsd - http://www.w3.org/2001/XMLSchema
+ xsi - http://www.w3.org/2001/XMLSchema-instance
+
+ >>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
+ xsd:string
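A prefixed value such as ``xsd:string`` only has meaning together with a mapping from prefixes to namespace URIs. The hypothetical helper below resolves such a value against an nsmap-like dictionary. It shows why ``xsd:string`` under the default nsmap and ``foo:string`` under a custom mapping denote the same type, and why an undeclared prefix is an error:

```python
def resolve_qname_value(value, nsmap):
    """Turn a prefixed attribute value such as 'xsd:string' into
    '{namespace}localname' using a prefix -> URI mapping.

    Hypothetical helper. A KeyError means the prefix is not declared.
    Unprefixed values use the default namespace, stored under the key
    None as in lxml's nsmap, if there is one.
    """
    prefix, sep, local = value.partition(":")
    if not sep:
        namespace = nsmap.get(None)
        return "{%s}%s" % (namespace, value) if namespace else value
    return "{%s}%s" % (nsmap[prefix], local)


default_nsmap = {"py": "http://codespeak.net/lxml/objectify/pytype",
                 "xsd": "http://www.w3.org/2001/XMLSchema",
                 "xsi": "http://www.w3.org/2001/XMLSchema-instance"}
custom_nsmap = {"foo": "http://www.w3.org/2001/XMLSchema"}

a = resolve_qname_value("xsd:string", default_nsmap)
b = resolve_qname_value("foo:string", custom_nsmap)
print(a)       # {http://www.w3.org/2001/XMLSchema}string
print(a == b)  # True
```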
+
+While you can set custom namespace prefixes, it is necessary to provide valid
+namespace information if you choose to do so::
+
+ >>> el = objectify.DataElement('5', _xsi='foo:string',
+ ... nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'})
+ >>> for prefix, namespace in el.nsmap.items():
+ ... print prefix, '-', namespace
+ ns0 - http://codespeak.net/lxml/objectify/pytype
+ ns1 - http://www.w3.org/2001/XMLSchema-instance
+ foo - http://www.w3.org/2001/XMLSchema
+
+ >>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
+ foo:string
+
+ >>> el = objectify.DataElement('5', _xsi='foo:string',
+ ... nsmap={'foo': 'http://www.w3.org/2001/XMLSchema',
+ ... 'myxsi': 'http://www.w3.org/2001/XMLSchema-instance'})
+ >>> for prefix, namespace in el.nsmap.items():
+ ... print prefix, '-', namespace
+ ns0 - http://codespeak.net/lxml/objectify/pytype
+ foo - http://www.w3.org/2001/XMLSchema
+ myxsi - http://www.w3.org/2001/XMLSchema-instance
+
+ >>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
+ foo:string
+
+Care must be taken if different namespace prefixes have been used for the same
+namespace. Namespace information gets merged to avoid duplicate definitions
+when adding a new sub-element to a tree, but this mechanism does not adapt the
+prefixes of attribute values::
+
+ >>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""")
+ >>> print etree.tostring(root, pretty_print=True)
+ <root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>
+
+ >>> s = objectify.DataElement("17", _xsi="string")
+ >>> print etree.tostring(s, pretty_print=True)
+ <value xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</value>
+
+ >>> root.s = s
+ >>> print etree.tostring(root, pretty_print=True)
+ <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
+ <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</s>
+ </root>
+
+It is your responsibility to fix the prefixes of attribute values if you
+choose to deviate from the standard prefixes. A convenient way to do this for
+``xsi:type`` attributes is to use the ``xsiannotate()`` utility::
+
+ >>> objectify.xsiannotate(root)
+ >>> print etree.tostring(root, pretty_print=True)
+ <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
+ <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="schema:string">17</s>
+ </root>
+
+Of course, using different prefixes for one and the same namespace is
+discouraged when building up an objectify tree.
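The core of such a prefix fix is to resolve the value against the namespace declarations it was written for, then look up a prefix for the same namespace in the tree it now lives in. A hypothetical sketch of that step (``reprefix()`` is illustrative and is not the actual ``xsiannotate()`` implementation):

```python
def reprefix(value, source_nsmap, target_nsmap):
    """Hypothetical helper: rewrite a prefixed attribute value (e.g.
    'xsd:string') so that its prefix is valid in target_nsmap."""
    prefix, sep, local = value.partition(":")
    if not sep:
        raise ValueError("expected a prefixed value such as 'xsd:string'")
    namespace = source_nsmap[prefix]
    # pick any prefix that the target declares for the same namespace
    for new_prefix, uri in target_nsmap.items():
        if uri == namespace and new_prefix is not None:
            return "%s:%s" % (new_prefix, local)
    raise KeyError("namespace %r is not declared in the target" % namespace)


XSD_NS = "http://www.w3.org/2001/XMLSchema"
element_nsmap = {"xsd": XSD_NS}  # where the value was written
tree_nsmap = {"schema": XSD_NS,  # where the element ended up
              "py": "http://codespeak.net/lxml/objectify/pytype",
              "xsi": "http://www.w3.org/2001/XMLSchema-instance"}

print(reprefix("xsd:string", element_nsmap, tree_nsmap))  # schema:string
```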
+
+
Defining additional data classes
--------------------------------
@@ -894,45 +1055,8 @@
after all references are gone and the Python object is garbage collected.
-Recursive string representation of elements
--------------------------------------------
-
-Normally, elements use the standard string representation for str() that is
-provided by lxml.etree. You can enable a pretty-print representation for
-objectify elements like this::
-
- >>> objectify.enableRecursiveStr()
-
- >>> root = objectify.fromstring("""
- ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- ... <a attr1="foo" attr2="bar">1</a>
- ... <a>1.2</a>
- ... <b>1</b>
- ... <b>true</b>
- ... <c>what?</c>
- ... <d xsi:nil="true"/>
- ... </root>
- ... """)
-
- >>> print str(root)
- root = None [ObjectifiedElement]
- a = 1 [IntElement]
- * attr1 = 'foo'
- * attr2 = 'bar'
- a = 1.2 [FloatElement]
- b = 1 [IntElement]
- b = True [BoolElement]
- c = 'what?' [StringElement]
- d = None [NoneElement]
- * xsi:nil = 'true'
-
-This behaviour can be switched off in the same way::
-
- >>> objectify.enableRecursiveStr(False)
-
-
-What is different from ElementTree?
------------------------------------
+What is different from lxml.etree?
+==================================
Such a different Element API obviously implies some side effects to the normal
behaviour of the rest of the API.
@@ -945,7 +1069,8 @@
can access all children with the ``iterchildren()`` method on elements or
retrieve a list by calling the ``getchildren()`` method.
-* The find, findall and findtext methods use a different implementation as
- they rely on the original iteration scheme. This has the disadvantage that
- they may not be 100% backwards compatible, and the additional advantage that
- they now support any XPath expression.
+* The find, findall and findtext methods require a different implementation
+ based on ETXPath. In ``lxml.etree``, they use a Python implementation based
+ on the original iteration scheme. The objectify implementation has the
+ disadvantage that it may not be 100% backwards compatible, and the additional
+ advantage that it supports any XPath expression.