[Lxml-checkins] r47664 - lxml/trunk/doc

scoder at codespeak.net
Sun Oct 21 09:22:42 CEST 2007


Author: scoder
Date: Sun Oct 21 09:22:40 2007
New Revision: 47664

Modified:
   lxml/trunk/doc/tutorial.txt
Log:
tutorial section on serialisation

Modified: lxml/trunk/doc/tutorial.txt
==============================================================================
--- lxml/trunk/doc/tutorial.txt	(original)
+++ lxml/trunk/doc/tutorial.txt	Sun Oct 21 09:22:40 2007
@@ -334,13 +334,74 @@
 .. _`further iterators`: api.html#iteration
 
 
+Serialisation
+-------------
+
+Serialisation commonly uses the ``tostring()`` function that
+returns a string, or the ``ElementTree.write()`` method that writes to
+a file or file-like object.  Both accept the same keyword arguments,
+such as ``pretty_print`` for formatted output or ``encoding`` to select
+a specific output encoding other than plain ASCII::
+
+   >>> root = etree.XML('<root><a><b/></a></root>')
+
+   >>> print etree.tostring(root)
+   <root><a><b/></a></root>
+
+   >>> print etree.tostring(root, xml_declaration=True)
+   <?xml version='1.0' encoding='ASCII'?>
+   <root><a><b/></a></root>
+
+   >>> print etree.tostring(root, encoding='iso-8859-1')
+   <?xml version='1.0' encoding='iso-8859-1'?>
+   <root><a><b/></a></root>
+
+   >>> print etree.tostring(root, pretty_print=True)
+   <root>
+     <a>
+       <b/>
+     </a>
+   </root>
+
+
+Since lxml 2.0 (and ElementTree 1.3), the serialisation functions can
+do more than XML serialisation and optional pretty printing.  You can
+serialise to HTML or extract the text content by passing the
+``method`` keyword::
+
+   >>> root = etree.XML('<html><head/><body><p>Hello<br/>World</p></body></html>')
+
+   >>> print etree.tostring(root) # default: method = 'xml'
+   <html><head/><body><p>Hello<br/>World</p></body></html>
+
+   >>> print etree.tostring(root, method='xml') # same as above
+   <html><head/><body><p>Hello<br/>World</p></body></html>
+
+   >>> print etree.tostring(root, method='html')
+   <html><head></head><body><p>Hello<br>World</p></body></html>
+
+   >>> print etree.tostring(root, method='html', pretty_print=True)
+   <html>
+   <head></head>
+   <body><p>Hello<br>World</p></body>
+   </html>
+
+   >>> print etree.tostring(root, method='text')
+   HelloWorld
+
+For the plain text output, the ``tounicode()`` function might come in handy::
+
+   >>> etree.tounicode(root, method='text')
+   u'HelloWorld'
+
+
 The ElementTree class
 =====================
 
 An ``ElementTree`` is mainly a document wrapper around a tree with a root
 node.  It provides a couple of methods for parsing, serialisation and general
 document handling.  One of the bigger differences is that it serialises as a
-complete document, as opposed to a single Element.  This includes top-level
+complete document, as opposed to a single ``Element``.  This includes top-level
 processing instructions and comments, as well as a DOCTYPE and other DTD
 content in the document::
 
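Note on the ``method`` keyword added above: the new section says it is
shared with ElementTree 1.3, so the same idea can be tried without lxml.
The following is only a minimal sketch under that assumption. It uses the
standard library's ``xml.etree.ElementTree`` on Python 3 rather than lxml,
so ``pretty_print`` and ``tounicode()`` are not available and the exact XML
output may differ from the lxml output shown in the diff. The variable
names are illustrative.

```python
# Sketch with the standard library's ElementTree (Python 3), not lxml itself.
import io
import xml.etree.ElementTree as ET

root = ET.XML('<html><head/><body><p>Hello<br/>World</p></body></html>')

# tostring() returns bytes by default; encoding="unicode" asks for a str.
as_xml = ET.tostring(root, encoding="unicode")                  # method="xml" is the default
as_html = ET.tostring(root, encoding="unicode", method="html")  # empty <br> has no end tag
as_text = ET.tostring(root, encoding="unicode", method="text")  # text content only

# ElementTree.write() accepts the same serialisation keywords and
# writes to a file or file-like object instead of returning a value.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="iso-8859-1", xml_declaration=True)
declared = buffer.getvalue()

print(as_html)
print(as_text)
```

Requesting ``encoding="unicode"`` here plays roughly the role that
``tounicode()`` plays in the tutorial text: the result is a text string
rather than an encoded byte string.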

