[Lxml-checkins] r48025 - lxml/trunk/doc
scoder at codespeak.net
Fri Oct 26 11:36:08 CEST 2007
Author: scoder
Date: Fri Oct 26 11:36:08 2007
New Revision: 48025
Modified:
lxml/trunk/doc/lxmlhtml.txt
Log:
mention ElementSoup in lxmlhtml.txt
Modified: lxml/trunk/doc/lxmlhtml.txt
==============================================================================
--- lxml/trunk/doc/lxmlhtml.txt (original)
+++ lxml/trunk/doc/lxmlhtml.txt Fri Oct 26 11:36:08 2007
@@ -8,13 +8,24 @@
.. contents::
..
- 1 Running HTML doctests
- 2 Parsing HTML
- 2.1 Parsing HTML fragments
- 3 Creating HTML with the E-factory
- 4 Working with links
- 5 Cleaning up HTML
-
+ 1 Parsing HTML
+ 1.1 Parsing HTML fragments
+ 1.2 Really broken pages
+ 2 HTML Element Methods
+ 3 Running HTML doctests
+ 4 Creating HTML with the E-factory
+ 4.1 Viewing your HTML
+ 5 Working with links
+ 5.1 Functions
+ 6 Forms
+ 6.1 Form Filling Example
+ 6.2 Form Submission
+ 7 Cleaning up HTML
+ 7.1 autolink
+ 7.2 wordwrap
+ 8 HTML Diff
+ 9 Examples
+ 9.1 Microformat Example
The main API is based on the `lxml.etree`_ API, and thus, on the ElementTree_
API.
@@ -59,6 +70,19 @@
on whether the string looks like a full document, or just a
fragment.
+Really broken pages
+-------------------
+
+The normal HTML parser can handle broken HTML, but it may still fail
+on pages that are far enough from HTML to be called 'tag soup'. One
+way to deal with such pages is ElementSoup_, which uses the
+well-known BeautifulSoup_ parser to build an lxml HTML tree from
+them.
+
+.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
+.. _ElementSoup: elementsoup.html
+
+
HTML Element Methods
====================