[Lxml-checkins] r48025 - lxml/trunk/doc

scoder at codespeak.net
Fri Oct 26 11:36:08 CEST 2007


Author: scoder
Date: Fri Oct 26 11:36:08 2007
New Revision: 48025

Modified:
   lxml/trunk/doc/lxmlhtml.txt
Log:
mention ElementSoup in lxmlhtml.txt

Modified: lxml/trunk/doc/lxmlhtml.txt
==============================================================================
--- lxml/trunk/doc/lxmlhtml.txt	(original)
+++ lxml/trunk/doc/lxmlhtml.txt	Fri Oct 26 11:36:08 2007
@@ -8,13 +8,24 @@
 
 .. contents::
 .. 
-   1  Running HTML doctests
-   2  Parsing HTML
-     2.1  Parsing HTML fragments
-   3  Creating HTML with the E-factory
-   4  Working with links
-   5  Cleaning up HTML
-
+   1  Parsing HTML
+     1.1  Parsing HTML fragments
+     1.2  Really broken pages
+   2  HTML Element Methods
+   3  Running HTML doctests
+   4  Creating HTML with the E-factory
+     4.1  Viewing your HTML
+   5  Working with links
+     5.1  Functions
+   6  Forms
+     6.1  Form Filling Example
+     6.2  Form Submission
+   7  Cleaning up HTML
+     7.1  autolink
+     7.2  wordwrap
+   8  HTML Diff
+   9  Examples
+     9.1  Microformat Example
 
 The main API is based on the `lxml.etree`_ API, and thus, on the ElementTree_
 API.
@@ -59,6 +70,19 @@
     on whether the string looks like a full document, or just a
     fragment.
 
+Really broken pages
+-------------------
+
+The normal HTML parser can handle broken HTML, but pages that
+stray far enough from HTML to count as 'tag soup' may still fail
+to parse.  One way to deal with this is ElementSoup_, which uses
+the well-known BeautifulSoup_ parser to build an lxml HTML
+tree.
+
+.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
+.. _ElementSoup: elementsoup.html
+
+
 HTML Element Methods
 ====================
 
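Illustration of the "Really broken pages" paragraph added above (not part of this commit): a minimal sketch, assuming a current lxml release where the BeautifulSoup bridge is importable as ``lxml.html.soupparser`` and the separate BeautifulSoup package is installed. The helper name ``parse_tag_soup`` and the sample markup are hypothetical. If BeautifulSoup is missing, the sketch falls back to the normal HTML parser.

```python
from lxml import html

try:
    # soupparser needs the separately installed BeautifulSoup package.
    from lxml.html import soupparser
except ImportError:
    soupparser = None


def parse_tag_soup(markup):
    """Return an lxml HTML root element for possibly very broken markup.

    Hypothetical helper: prefers the BeautifulSoup-based parser and
    falls back to the normal lxml HTML parser when it is unavailable.
    """
    if soupparser is not None:
        return soupparser.fromstring(markup)
    return html.fromstring(markup)


# Sample input with unclosed <p> and <b> tags.
broken = "<p>Unclosed <b>bold<p>Second paragraph"
root = parse_tag_soup(broken)

# Either way the result is a regular lxml element tree.
paragraphs = list(root.iter("p"))
print(root.tag, len(paragraphs))
print(root.text_content())
```

The exact tree shape can differ between the two parsers, so code built on this should query the tree (for example with ``iter()`` or XPath) rather than rely on a fixed nesting.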