[Lxml-checkins] r46298 - in lxml/trunk: doc src/lxml
scoder at codespeak.net
Tue Sep 4 09:22:22 CEST 2007
Author: scoder
Date: Tue Sep 4 09:22:21 2007
New Revision: 46298
Modified:
lxml/trunk/doc/parsing.txt
lxml/trunk/src/lxml/parser.pxi
Log:
doc update on the feed parser
Modified: lxml/trunk/doc/parsing.txt
==============================================================================
--- lxml/trunk/doc/parsing.txt (original)
+++ lxml/trunk/doc/parsing.txt Tue Sep 4 09:22:21 2007
@@ -9,8 +9,17 @@
.. contents::
..
1 Parsers
- 2 iterparse and iterwalk
- 3 Python unicode strings
+ 1.1 Parser options
+ 1.2 Parsing HTML
+ 1.3 Doctype information
+ 2 The feed parser interface
+ 3 iterparse and iterwalk
+ 3.1 Selective tag events
+ 3.2 Modifying the tree
+ 3.3 iterwalk
+ 4 Python unicode strings
+ 4.1 Serialising to Unicode strings
+
The usual setup procedure::
@@ -167,6 +176,45 @@
ascii
+The feed parser interface
+=========================
+
+Since lxml 2.0, the parsers have a feed parser interface that is compatible
+with the `ElementTree parsers`_. You can use it to feed data into the parser in a
+controlled step-by-step way. Note that you can only use one interface at a
+time: the ``parse()`` or ``XML()`` functions, or the feed parser interface.
+
+.. _`ElementTree parsers`: http://effbot.org/elementtree/elementtree-xmlparser.htm
+
+To start parsing with a feed parser, just call its ``feed()`` method::
+
+ >>> parser = etree.XMLParser()
+
+ >>> for data in ('<?xml versio', 'n="1.0"?', '><roo', 't><a', '/></root>'):
+ ... parser.feed(data)
+
+When you are done parsing, you **must** call the ``close()`` method to
+retrieve the root Element of the parse result document, and to unlock the
+parser::
+
+ >>> root = parser.close()
+
+ >>> print(root.tag)
+ root
+ >>> print(root[0].tag)
+ a
+
+If you do not call ``close()``, the parser stays locked and any subsequent
+use of it will block forever. So make sure you also close it when an
+exception occurs.
+
+Another way of achieving the same step-by-step parsing is by writing your own
+file-like object that returns a chunk of data on each ``read()`` call. Where
+the feed parser interface allows you to actively pass data chunks into the
+parser, a file-like object passively responds to ``read()`` requests of the
+parser itself. Depending on the data source, either way may be more natural.
+
+
iterparse and iterwalk
======================
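The file-like alternative described at the end of the parsing.txt change above can be sketched roughly as follows. The ``ChunkSource`` class and its sample data are hypothetical names made up for this illustration. The sketch assumes that ``etree.parse()`` accepts any object with a ``read()`` method and treats an empty return value as the end of the input.

```python
from lxml import etree


class ChunkSource:
    """Hypothetical file-like object that hands out one chunk per read()."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)

    def read(self, requested_size=-1):
        # The parser pulls data by calling read(). This sketch ignores the
        # requested size and simply returns the next chunk; empty bytes
        # signal that the input is exhausted.
        return next(self._chunks, b'')


# The parser drives the data source, not the other way round.
tree = etree.parse(ChunkSource([b'<roo', b't><a', b'/></root>']))
root = tree.getroot()
print(root.tag)
print(root[0].tag)
```

Here the parser decides when to ask for more data, which tends to suit sources that already behave like files or sockets.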
Modified: lxml/trunk/src/lxml/parser.pxi
==============================================================================
--- lxml/trunk/src/lxml/parser.pxi (original)
+++ lxml/trunk/src/lxml/parser.pxi Tue Sep 4 09:22:21 2007
@@ -578,7 +578,7 @@
This method must be called after passing the last chunk of data into
the ``feed()`` method. It should only be called when using the feed
- parser interface is used, all other usage is undefined.
+ parser interface, all other usage is undefined.
"""
cdef xmlParserCtxt* pctxt
cdef xmlDoc* c_doc
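As an illustration of the ``feed()`` / ``close()`` contract that this commit documents, here is a minimal sketch that also calls ``close()`` when something fails midway. The helper name ``parse_in_chunks`` is hypothetical and not part of lxml. The sketch assumes that ``close()`` on an incomplete or already failed document raises an lxml error, which is ignored there so that the original exception propagates.

```python
from lxml import etree


def parse_in_chunks(chunks):
    """Hypothetical helper: feed chunks to a parser, return the root Element."""
    parser = etree.XMLParser()
    try:
        for chunk in chunks:
            parser.feed(chunk)
    except Exception:
        # Something failed midway (bad markup or a failing data source).
        # Still call close() so the parser is released, and ignore the
        # error it reports about the unfinished document.
        try:
            parser.close()
        except etree.LxmlError:
            pass
        raise
    # Normal case: close() ends the document and returns the root Element.
    return parser.close()


root = parse_in_chunks(['<?xml versio', 'n="1.0"?', '><roo', 't><a', '/></root>'])
print(root.tag)
print(root[0].tag)
```

Wrapping the loop this way keeps the cleanup in one place, so callers only see either a root Element or the original exception.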