[Lxml-checkins] r43531 - lxml/trunk/doc

scoder at codespeak.net
Mon May 21 18:01:17 CEST 2007


Author: scoder
Date: Mon May 21 18:01:16 2007
New Revision: 43531

Modified:
   lxml/trunk/doc/objectify.txt
Log:
huge restructuring in objectify.txt, Holger's section on XSI annotation

Modified: lxml/trunk/doc/objectify.txt
==============================================================================
--- lxml/trunk/doc/objectify.txt	(original)
+++ lxml/trunk/doc/objectify.txt	Mon May 21 18:01:16 2007
@@ -2,8 +2,8 @@
 lxml.objectify
 ==============
 
-:Author:
-  Stefan Behnel
+:Authors:
+  Stefan Behnel, Holger Joukl
 
 lxml supports an alternative API similar to the Amara_ bindery or
 gnosis.xml.objectify_ through a custom Element implementation.  The main idea
@@ -25,22 +25,27 @@
 
 .. contents::
 ..
-   1   Setting up lxml.objectify
-   2   Creating objectify trees
-   3   Element access through object attributes
-   4   Namespace handling
-   5   ObjectPath
-   6   Python data types
-   7   Defining additional data classes
-   8   Recursive string representation of elements
-   9   What is different from ElementTree?
-   10  Resetting the API
+   1  Setting up lxml.objectify
+   2  The lxml.objectify API
+     2.1  Creating objectify trees
+     2.2  Element access through object attributes
+     2.3  Namespace handling
+   3  ObjectPath
+   4  Python data types
+   5  Recursive tree dump
+     5.1  Recursive string representation of elements
+   6  How data types are matched
+     6.1  Type annotations
+     6.2  XML Schema datatype annotation
+     6.3  The DataElement factory
+     6.4  Defining additional data classes
+   7  What is different from lxml.etree?
 
 
 Setting up lxml.objectify
--------------------------
+=========================
 
-To make use of ``objectify``, you need both the ``lxml.etree`` module and
+To set up and use ``objectify``, you need both the ``lxml.etree`` module and
 ``lxml.objectify``::
 
     >>> from lxml import etree
@@ -74,6 +79,13 @@
 .. _`namespace specific classes`: element_classes.html#namespace-class-lookup
 
 
+The lxml.objectify API
+======================
+
+In ``lxml.objectify``, element trees provide an API that models the behaviour
+of normal Python object trees as closely as possible.
+
+
 Creating objectify trees
 ------------------------
 
@@ -318,7 +330,7 @@
 
 
 ObjectPath
-----------
+==========
 
 For both convenience and speed, objectify supports its own path language,
 represented by the ``ObjectPath`` class::
@@ -455,7 +467,7 @@
 
 
 Python data types
------------------
+=================
 
 The objectify module knows about Python data types and tries its best to let
 element content behave like them.  For example, they support the normal math
@@ -488,6 +500,67 @@
     >>> print root.d % (1234, 12345)
     1234 - 12345
 
+However, data elements continue to provide the objectify API.  This means that
+sequence operations such as ``len()``, slicing and indexing (e.g. of strings)
+cannot behave as they do on the Python types.  Like all other tree elements,
+they show the normal slicing behaviour of objectify elements::
+
+    >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
+    >>> print root.a + ' me' # behaves like a string, right?
+    test me
+    >>> len(root.a) # but there's only one 'a' element!
+    1
+    >>> [ a.tag for a in root.a ]
+    ['a']
+    >>> print root.a[0].tag
+    a
+
+    >>> print root.a
+    test
+    >>> [ str(a) for a in root.a[:1] ]
+    ['test']
+
+If you need to run sequence operations on data types, you must ask the API for
+the *real* Python value.  The string value is always available through the
+normal ElementTree ``.text`` attribute.  Additionally, all data classes
+provide a ``.pyval`` attribute that returns the value as a plain Python type::
+
+    >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
+    >>> root.a.text
+    'test'
+    >>> root.a.pyval
+    'test'
+
+    >>> root.b.text
+    '5'
+    >>> root.b.pyval
+    5
+
+Note, however, that both attributes are read-only in objectify.  To change a
+value, assign the new value directly to the attribute on the parent element::
+
+    >>> root.a.text  = "25"
+    Traceback (most recent call last):
+      ...
+    TypeError: attribute 'text' of 'StringElement' objects is not writable
+
+    >>> root.a.pyval = 25
+    Traceback (most recent call last):
+      ...
+    TypeError: attribute 'pyval' of 'StringElement' objects is not writable
+
+    >>> root.a = 25
+    >>> print root.a
+    25
+    >>> print root.a.pyval
+    25
+
+In other words, ``objectify`` data elements behave like immutable Python
+types.  You can replace them, but not modify them.
+
+
+Recursive tree dump
+-------------------
 
 To see the data types that are currently used, you can call the module level
 ``dump()`` function that returns a recursive string representation for
@@ -547,64 +620,46 @@
         a = 2 [IntElement]
         a = 3 [IntElement]
 
-However, data elements continue to provide the objectify API.  This means that
-sequence operations such as ``len()``, slicing and indexing (e.g. of strings)
-cannot behave as the Python types.  Like all other tree elements, they show
-the normal slicing behaviour of objectify elements::
-
-    >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
-    >>> print root.a + ' me' # behaves like a string, right?
-    test me
-    >>> len(root.a) # but there's only one 'a' element!
-    1
-    >>> [ a.tag for a in root.a ]
-    ['a']
-    >>> print root.a[0].tag
-    a
 
-    >>> print root.a
-    test
-    >>> [ str(a) for a in root.a[:1] ]
-    ['test']
-
-If you need to run sequence operations on data types, you must ask the API for
-the *real* Python value.  The string value is always available through the
-normal ElementTree ``.text`` attribute.  Additionally, all data classes
-provide a ``.pyval`` attribute that returns the value as plain Python type::
-
-    >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
-    >>> root.a.text
-    'test'
-    >>> root.a.pyval
-    'test'
+Recursive string representation of elements
+-------------------------------------------
 
-    >>> root.b.text
-    '5'
-    >>> root.b.pyval
-    5
+Normally, elements use the standard string representation for str() that is
+provided by lxml.etree.  You can enable a pretty-print representation for
+objectify elements like this::
 
-Note, however, that both attributes are read-only in objectify.  If you want
-to change values, just assign them directly to the attribute::
+    >>> objectify.enableRecursiveStr()
 
-    >>> root.a.text  = "25"
-    Traceback (most recent call last):
-      ...
-    TypeError: attribute 'text' of 'StringElement' objects is not writable
+    >>> root = objectify.fromstring("""
+    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+    ...   <a attr1="foo" attr2="bar">1</a>
+    ...   <a>1.2</a>
+    ...   <b>1</b>
+    ...   <b>true</b>
+    ...   <c>what?</c>
+    ...   <d xsi:nil="true"/>
+    ... </root>
+    ... """)
 
-    >>> root.a.pyval = 25
-    Traceback (most recent call last):
-      ...
-    TypeError: attribute 'pyval' of 'StringElement' objects is not writable
+    >>> print str(root)
+    root = None [ObjectifiedElement]
+        a = 1 [IntElement]
+          * attr1 = 'foo'
+          * attr2 = 'bar'
+        a = 1.2 [FloatElement]
+        b = 1 [IntElement]
+        b = True [BoolElement]
+        c = 'what?' [StringElement]
+        d = None [NoneElement]
+          * xsi:nil = 'true'
 
-    >>> root.a = 25
-    >>> print root.a
-    25
+This behaviour can be switched off in the same way::
 
-In other words, objectify data elements behave like immutable Python types.
+    >>> objectify.enableRecursiveStr(False)
 
 
 How data types are matched
---------------------------
+==========================
 
 Objectify uses two different types of Elements.  Structural Elements (or tree
 Elements) represent the object tree structure.  Data Elements represent the
@@ -639,6 +694,10 @@
 classes used in these cases.  By default, ``tree_class`` is a class called
 ``ObjectifiedElement`` and ``empty_data_class`` is a ``StringElement``.
 
+
+Type annotations
+----------------
+
 The "type hint" mechanism deploys an XML attribute defined as
 ``lxml.objectify.PYTYPE_ATTRIBUTE``.  It may contain any of the following
 string values: int, long, float, str, unicode, none::
@@ -682,12 +741,17 @@
         b = 5 [IntElement]
           * py:pytype = 'int'
 
+
+XML Schema datatype annotation
+------------------------------
+
 A second way of specifying data type information uses XML Schema types as
 element annotations.  Objectify knows those that can be mapped to normal
 Python types::
 
     >>> root = objectify.fromstring('''\
-    ...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+    ...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    ...          xmlns:xsd="http://www.w3.org/2001/XMLSchema">
     ...      <d xsi:type="xsd:double">5</d>
     ...      <l xsi:type="xsd:long"  >5</l>
     ...      <s xsi:type="xsd:string">5</s>
@@ -758,6 +822,10 @@
         l = 5 [IntElement]
         s = 5 [IntElement]
 
+
+The DataElement factory
+-----------------------
+
 For convenience, the ``DataElement()`` factory creates an Element with a
 Python value in one step.  You can pass the required Python type name or the
 XSI type name::
@@ -798,13 +866,106 @@
 provide the type of a data element by hand::
 
     >>> root = objectify.Element("root")
-    >>> root.s = objectify.DataElement(5,  _pytype="str")
+    >>> root.s = objectify.DataElement(5, _pytype="str")
     >>> print objectify.dump(root)
     root = None [ObjectifiedElement]
         s = '5' [StringElement]
           * py:pytype = 'str'
 
- 
+Likewise, the data type can be provided as an XML Schema type using the
+``_xsi`` argument of ``DataElement()``::
+
+    >>> root = objectify.Element("root")
+    >>> root.s = objectify.DataElement(5, _xsi="string")
+    >>> print objectify.dump(root)
+    root = None [ObjectifiedElement]
+        s = '5' [StringElement]
+          * py:pytype = 'str'
+          * xsi:type = 'xsd:string'
+
+XML Schema types reside in the XML Schema namespace, so ``DataElement()``
+tries to correctly prefix the ``xsi:type`` attribute value for you::
+
+    >>> root = objectify.Element("root")
+    >>> root.s = objectify.DataElement(5, _xsi="string")
+
+    >>> objectify.deannotate(root, xsi=False)
+    >>> print etree.tostring(root, pretty_print=True)
+    <root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+      <s xsi:type="xsd:string">5</s>
+    </root>
+
+``DataElement()`` uses a default nsmap to set these prefixes::
+
+    >>> el = objectify.DataElement('5', _xsi='string')
+    >>> for prefix, namespace in el.nsmap.items():
+    ...     print prefix, '-', namespace
+    py - http://codespeak.net/lxml/objectify/pytype
+    xsd - http://www.w3.org/2001/XMLSchema
+    xsi - http://www.w3.org/2001/XMLSchema-instance
+
+    >>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
+    xsd:string
+
+While you can set custom namespace prefixes, it is necessary to provide valid
+namespace information if you choose to do so::
+
+    >>> el = objectify.DataElement('5', _xsi='foo:string',
+    ...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'})
+    >>> for prefix, namespace in el.nsmap.items():
+    ...     print prefix, '-', namespace
+    ns0 - http://codespeak.net/lxml/objectify/pytype
+    ns1 - http://www.w3.org/2001/XMLSchema-instance
+    foo - http://www.w3.org/2001/XMLSchema
+
+    >>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
+    foo:string
+
+    >>> el = objectify.DataElement('5', _xsi='foo:string',
+    ...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema',
+    ...                 'myxsi': 'http://www.w3.org/2001/XMLSchema-instance'})
+    >>> for prefix, namespace in el.nsmap.items():
+    ...     print prefix, '-', namespace
+    ns0 - http://codespeak.net/lxml/objectify/pytype
+    foo - http://www.w3.org/2001/XMLSchema
+    myxsi - http://www.w3.org/2001/XMLSchema-instance
+
+    >>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
+    foo:string
+
+Care must be taken if different namespace prefixes have been used for the same
+namespace.  Namespace information gets merged to avoid duplicate definitions
+when adding a new sub-element to a tree, but this mechanism does not adapt the
+prefixes of attribute values::
+
+    >>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""")
+    >>> print etree.tostring(root, pretty_print=True)
+    <root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>
+
+    >>> s = objectify.DataElement("17", _xsi="string")
+    >>> print etree.tostring(s, pretty_print=True)
+    <value xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</value>
+
+    >>> root.s = s
+    >>> print etree.tostring(root, pretty_print=True)
+    <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
+      <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</s>
+    </root>
+
+It is your responsibility to fix the prefixes of attribute values if you
+choose to deviate from the standard prefixes.  A convenient way to do this for
+``xsi:type`` attributes is to use the ``xsiannotate()`` utility::
+
+    >>> objectify.xsiannotate(root)
+    >>> print etree.tostring(root, pretty_print=True)
+    <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
+      <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="schema:string">17</s>
+    </root>
+
+Of course, using different prefixes for one and the same namespace when
+building up an objectify tree is discouraged.
+
+
 Defining additional data classes
 --------------------------------
 
@@ -894,45 +1055,8 @@
 after all references are gone and the Python object is garbage collected.
 
 
-Recursive string representation of elements
--------------------------------------------
-
-Normally, elements use the standard string representation for str() that is
-provided by lxml.etree.  You can enable a pretty-print representation for
-objectify elements like this::
-
-    >>> objectify.enableRecursiveStr()
-
-    >>> root = objectify.fromstring("""
-    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
-    ...   <a attr1="foo" attr2="bar">1</a>
-    ...   <a>1.2</a>
-    ...   <b>1</b>
-    ...   <b>true</b>
-    ...   <c>what?</c>
-    ...   <d xsi:nil="true"/>
-    ... </root>
-    ... """)
-
-    >>> print str(root)
-    root = None [ObjectifiedElement]
-        a = 1 [IntElement]
-          * attr1 = 'foo'
-          * attr2 = 'bar'
-        a = 1.2 [FloatElement]
-        b = 1 [IntElement]
-        b = True [BoolElement]
-        c = 'what?' [StringElement]
-        d = None [NoneElement]
-          * xsi:nil = 'true'
-
-This behaviour can be switched off in the same way::
-
-    >>> objectify.enableRecursiveStr(False)
-
-
-What is different from ElementTree?
------------------------------------
+What is different from lxml.etree?
+==================================
 
 Such a different Element API obviously implies some side effects to the normal
 behaviour of the rest of the API.
@@ -945,7 +1069,8 @@
   can access all children with the ``iterchildren()`` method on elements or
   retrieve a list by calling the ``getchildren()`` method.
 
-* The find, findall and findtext methods use a different implementation as
-  they rely on the original iteration scheme.  This has the disadvantage that
-  they may not be 100% backwards compatible, and the additional advantage that
-  they now support any XPath expression.
+* The find, findall and findtext methods require a different implementation
+  based on ETXPath.  In ``lxml.etree``, they use a Python implementation based
+  on the original iteration scheme.  The ETXPath-based versions may therefore
+  not be 100% backwards compatible, but in return they support any XPath
+  expression.

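Note on the "How data types are matched" and "Recursive tree dump" sections
touched above: the sketch below is only a rough stand-in for the idea of
guessing a Python value from element text and printing a recursive dump.  It
uses the standard library's ElementTree on Python 3, not lxml, and the helper
names ``guess_pyval`` and ``dump_lines`` are hypothetical.  The real objectify
lookup is more involved (type hints, ``xsi:type``, ``xsi:nil``, custom data
classes), and treating missing text as ``None`` is a simplification made here.

```python
import xml.etree.ElementTree as ET


def guess_pyval(text):
    """Guess a plain Python value from element text (illustrative only)."""
    if text is None:
        return None  # simplification: no text means no value
    stripped = text.strip()
    if stripped in ("true", "false"):
        return stripped == "true"
    for convert in (int, float):
        try:
            return convert(stripped)
        except ValueError:
            pass
    return text  # fall back to the string itself


def dump_lines(element, depth=0):
    """Return one 'tag = value [type]' line per element, indented by depth."""
    # Elements with children are treated as structural and carry no value.
    value = guess_pyval(element.text) if len(element) == 0 else None
    lines = ["%s%s = %r [%s]" % ("    " * depth, element.tag, value,
                                 type(value).__name__)]
    for child in element:
        lines.extend(dump_lines(child, depth + 1))
    return lines


root = ET.fromstring("<root><a>1</a><a>1.2</a><b>true</b><c>what?</c></root>")
print("\n".join(dump_lines(root)))
```

The conversion order (bool, then int, then float, then str) is a choice made
for this sketch so that the most specific type is tried first.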

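Note on the prefix warning in "The DataElement factory": an ``xsi:type`` value
such as ``xsd:string`` is plain attribute text, so tree machinery that manages
namespace declarations does not see the prefix inside it.  The sketch below
shows a related effect with the standard library's ElementTree on Python 3
(an assumption made to keep it dependency-free; it is not lxml and does not
show lxml's merging behaviour): the attribute value survives a parse and
serialize round trip unchanged, while the declaration for the prefix it relies
on is not written out again because no tag or attribute name uses it.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

source = (
    '<root xmlns:xsi="%s" xmlns:xsd="%s">'
    '<s xsi:type="xsd:string">5</s>'
    '</root>' % (XSI, XSD)
)

root = ET.fromstring(source)

# The attribute value is opaque text; its prefix is never resolved or rewritten.
type_value = root.find("s").get("{%s}type" % XSI)

# On serialization, only namespaces used in tag or attribute names are declared.
serialized = ET.tostring(root, encoding="unicode")

print(type_value)
print(serialized)
```

This is why fixing such values is left to the user or to a dedicated helper
like the ``xsiannotate()`` utility mentioned in the diff.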