[lxml-dev] new ElementSoup module in lxml.html

Stefan Behnel stefan_ml at behnel.de
Mon Jul 16 15:16:03 CEST 2007


Hi,

I rewrote Fredrik's ElementSoup.py module for lxml.html, so you can now have
lxml read in tag soup with BeautifulSoup and convert it into an lxml.html tree
of Elements. While libxml2 can also parse broken HTML, it is not made to parse
a sick soup of tags. So if you need to work with web pages that only sort of
look like they might have been HTML once, the lxml.html.ElementSoup module can
help you get there.
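
To illustrate the general idea of turning tag soup into a tree of Elements,
here is a minimal standard-library sketch. It is not the ElementSoup code: it
assumes Python's html.parser as a stand-in for BeautifulSoup and
xml.etree.ElementTree as a stand-in for lxml.html Elements, and the names
SoupTreeBuilder, parse_soup and VOID_TAGS are hypothetical. It also does not
apply HTML's implied-end-tag rules, which a real soup parser handles far
better.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags that never take a closing tag (deliberately short, illustrative list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class SoupTreeBuilder(HTMLParser):
    """Hypothetical helper: build an ElementTree from sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")
        self._stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as "".
        attrib = {name: (value or "") for name, value in attrs}
        if tag == "html":
            # Merge an explicit <html> tag into the synthetic root.
            self.root.attrib.update(attrib)
            return
        elem = ET.SubElement(self._stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self._stack.append(elem)

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open tag.
        # Stray end tags with no matching open tag are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        parent = self._stack[-1]
        if len(parent):
            # Text after a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_soup(text):
    """Parse a string of tag soup and return the root Element."""
    builder = SoupTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


# The <p> is never closed and </span> was never opened.
root = parse_soup("<div><p>one<br>two</span></div>trailing")
print(ET.tostring(root, encoding="unicode"))
```

The resulting tree is always well formed, whatever the input looked like. For
the real module's entry points, see the elementsoup.txt document linked below.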

http://codespeak.net/svn/lxml/branch/html/doc/elementsoup.txt
http://codespeak.net/svn/lxml/branch/html/src/lxml/html/ElementSoup.py

Have fun,
Stefan

