=========
lxml.html
=========

Since version 2.0, lxml provides a dedicated package for dealing with
HTML: ``lxml.html``.  It provides a special Element API for HTML
elements, as well as a number of utilities for common tasks.

.. contents::
..
   1  Running HTML doctests
   2  Parsing HTML
     2.1  Parsing HTML fragments
   3  Creating HTML with the E-factory
   4  Working with links
   5  Cleaning up HTML

The main API is based on the `lxml.etree`_ API, and thus, on the
ElementTree_ API.

.. _`lxml.etree`: tutorial.html
.. _ElementTree: http://effbot.org/zone/element-index.htm


Parsing HTML
============

Parsing HTML fragments
----------------------

HTML Element Methods
====================

HTML elements have all the methods that come with ElementTree, but
also include some extra methods:

``.drop_tree()``:
    Drops the element and all its children.  Unlike
    ``el.getparent().remove(el)``, this does *not* remove the tail
    text; with ``drop_tree`` the tail text is merged with the previous
    element.

``.drop_tag()``:
    Drops the tag, but keeps its children and text.

``.find_class(class_name)``:
    Returns a list of all the elements with the given CSS class name.
    Note that class names are space separated in HTML, so
    ``doc.find_class('highlight')`` will find an element like ``