[Lxml-checkins] r46298 - in lxml/trunk: doc src/lxml

scoder at codespeak.net
Tue Sep 4 09:22:22 CEST 2007


Author: scoder
Date: Tue Sep  4 09:22:21 2007
New Revision: 46298

Modified:
   lxml/trunk/doc/parsing.txt
   lxml/trunk/src/lxml/parser.pxi
Log:
doc update on the feed parser

Modified: lxml/trunk/doc/parsing.txt
==============================================================================
--- lxml/trunk/doc/parsing.txt	(original)
+++ lxml/trunk/doc/parsing.txt	Tue Sep  4 09:22:21 2007
@@ -9,8 +9,17 @@
 .. contents::
 .. 
    1  Parsers
-   2  iterparse and iterwalk
-   3  Python unicode strings
+     1.1  Parser options
+     1.2  Parsing HTML
+     1.3  Doctype information
+   2  The feed parser interface
+   3  iterparse and iterwalk
+     3.1  Selective tag events
+     3.2  Modifying the tree
+     3.3  iterwalk
+   4  Python unicode strings
+     4.1  Serialising to Unicode strings
+
 
 The usual setup procedure::
 
@@ -167,6 +176,45 @@
   ascii
 
 
+The feed parser interface
+=========================
+
+Since lxml 2.0, the parsers have a feed parser interface that is compatible with
+the `ElementTree parsers`_.  You can use it to feed data into the parser in a
+controlled step-by-step way.  Note that you can only use one interface at a
+time: the ``parse()`` or ``XML()`` functions, or the feed parser interface.
+
+.. _`ElementTree parsers`: http://effbot.org/elementtree/elementtree-xmlparser.htm
+
+To start parsing with a feed parser, just call its ``feed()`` method::
+
+  >>> parser = etree.XMLParser()
+
+  >>> for data in ('<?xml versio', 'n="1.0"?', '><roo', 't><a', '/></root>'):
+  ...     parser.feed(data)
+
+When you are done parsing, you **must** call the ``close()`` method to
+retrieve the root Element of the parse result document, and to unlock the
+parser::
+
+  >>> root = parser.close()
+
+  >>> print root.tag
+  root
+  >>> print root[0].tag
+  a
+
+If you do not call ``close()``, the parser will stay locked and subsequent
+uses will block forever.  So make sure you also close it when an exception
+is raised.
+
+Another way of achieving the same step-by-step parsing is by writing your own
+file-like object that returns a chunk of data on each ``read()`` call.  Where
+the feed parser interface allows you to actively pass data chunks into the
+parser, a file-like object passively responds to ``read()`` requests of the
+parser itself.  Depending on the data source, either way may be more natural.
+
+
 iterparse and iterwalk
 ======================
 

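The file-like alternative described at the end of the parsing.txt change can be sketched as follows. This is only an illustration under stated assumptions: ``ChunkReader`` and its chunk list are hypothetical names that do not appear in the commit, and the fallback to the standard library's ElementTree is there only so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser when lxml is unavailable.
    import xml.etree.ElementTree as etree


class ChunkReader:
    """Hypothetical file-like object: each read() returns the next chunk."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)

    def read(self, size=-1):
        # The requested size is ignored in this sketch.
        # An empty bytes object tells the parser that the input has ended.
        return next(self._chunks, b"")


# The parser pulls data by calling read() until it gets an empty result.
tree = etree.parse(ChunkReader([b"<roo", b"t><a", b"/></root>"]))
root = tree.getroot()
```

With this approach the parser requests data through ``read()``, whereas with ``feed()`` the calling code decides when each chunk is handed over.
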
Modified: lxml/trunk/src/lxml/parser.pxi
==============================================================================
--- lxml/trunk/src/lxml/parser.pxi	(original)
+++ lxml/trunk/src/lxml/parser.pxi	Tue Sep  4 09:22:21 2007
@@ -578,7 +578,7 @@
 
         This method must be called after passing the last chunk of data into
         the ``feed()`` method.  It should only be called when using the feed
-        parser interface is used, all other usage is undefined.
+        parser interface; all other usage is undefined.
         """
         cdef xmlParserCtxt* pctxt
         cdef xmlDoc* c_doc

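Both the new documentation and the ``close()`` docstring require that ``close()`` is called once feeding is finished, including when feeding fails. One way to arrange that is sketched below. The helper name ``parse_chunks`` is hypothetical, the stdlib fallback import is an assumption made so the snippet runs without lxml, and swallowing the secondary error raised by ``close()`` on an incomplete document is a design choice of this sketch, not something the commit prescribes.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser when lxml is unavailable.
    import xml.etree.ElementTree as etree


def parse_chunks(chunks):
    """Feed every chunk into a fresh parser and return the root element."""
    parser = etree.XMLParser()
    try:
        for data in chunks:
            parser.feed(data)
    except Exception:
        # Feeding failed: still call close() so the parser is released,
        # ignore the error it reports for the unfinished document,
        # then re-raise the original exception.
        try:
            parser.close()
        except Exception:
            pass
        raise
    # Normal case: close() finishes parsing and returns the root element.
    return parser.close()


root = parse_chunks([b"<roo", b"t><a", b"/></root>"])
```

The success path returns whatever ``close()`` returns, so the caller gets the root element exactly as in the documented example, while the failure path guarantees that ``close()`` has been attempted before the error propagates.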
