=====================
APIs specific to lxml
=====================

lxml tries to follow established APIs wherever possible. Sometimes, however,
the need to expose a feature in an easy way led to the invention of a new API.

lxml.etree
==========

lxml.etree tries to follow the etree API wherever it can. There are, however,
some incompatibilities (see compatibility.txt). There are also some
extensions.

The following examples usually assume that this has been executed first::

  >>> import lxml.etree
  >>> from StringIO import StringIO

Parsers
-------

One of the differences is the parser. There is support for both XML and
(broken) HTML. Both parsers are based on libxml2 and therefore only support
options that are backed by the library. Parsers take a number of keyword
arguments. The following is an example of namespace cleanup during parsing,
first with the default parser, then with a parametrized one::

  >>> xml = ''
  >>> et = lxml.etree.parse(StringIO(xml))
  >>> print lxml.etree.tostring(et.getroot())

  >>> parser = lxml.etree.XMLParser(ns_clean=True)
  >>> et = lxml.etree.parse(StringIO(xml), parser)
  >>> print lxml.etree.tostring(et.getroot())

HTML parsing is similarly simple::

  >>> broken_html = "