[Lxml-checkins] r44936 - lxml/trunk/doc

scoder at codespeak.net scoder at codespeak.net
Thu Jul 12 00:03:26 CEST 2007


Author: scoder
Date: Thu Jul 12 00:03:25 2007
New Revision: 44936

Modified:
   lxml/trunk/doc/parsing.txt
Log:
note on parsing from filenames

Modified: lxml/trunk/doc/parsing.txt
==============================================================================
--- lxml/trunk/doc/parsing.txt	(original)
+++ lxml/trunk/doc/parsing.txt	Thu Jul 12 00:03:25 2007
@@ -28,10 +28,21 @@
 
   >>> xml = '<a xmlns="test"><b xmlns="test"/></a>'
 
-  >>> et = etree.parse(StringIO(xml))
-  >>> print etree.tostring(et.getroot())
+  >>> tree = etree.parse(StringIO(xml))
+  >>> print etree.tostring(tree.getroot())
   <a xmlns="test"><b xmlns="test"/></a>
 
+Note how the ``parse()`` function reads from a file-like object here.  If
+parsing is done from a real file, it is more common (and also more efficient)
+to pass a filename or a URL.  HTTP and FTP access is directly supported by
+libxml2, as well as gzip-compressed files (.gz).
+
+If you want to parse from memory and still provide a base URL for the document
+(e.g. to support relative paths in an XInclude), you can provide the
+``base_url`` keyword argument::
+
+  >>> tree = etree.parse("test.xml")
+
 
 Parser options
 --------------
@@ -40,8 +51,8 @@
 example is easily extended to clean up namespaces during parsing::
 
   >>> parser = etree.XMLParser(ns_clean=True)
-  >>> et     = etree.parse(StringIO(xml), parser)
-  >>> print etree.tostring(et.getroot())
+  >>> tree   = etree.parse(StringIO(xml), parser)
+  >>> print etree.tostring(tree.getroot())
   <a xmlns="test"><b/></a>
 
 The keyword arguments in the constructor are mainly based on the libxml2
@@ -81,9 +92,9 @@
   >>> broken_html = "<html><head><title>test<body><h1>page title</h3>"
 
   >>> parser = etree.HTMLParser()
-  >>> et     = etree.parse(StringIO(broken_html), parser)
+  >>> tree   = etree.parse(StringIO(broken_html), parser)
 
-  >>> print etree.tostring(et.getroot(), pretty_print=True)
+  >>> print etree.tostring(tree.getroot(), pretty_print=True)
   <html>
     <head>
       <title>test</title>


More information about the lxml-checkins mailing list