[lxml-dev] new ElementSoup module in lxml.html
Stefan Behnel
stefan_ml at behnel.de
Mon Jul 16 15:16:03 CEST 2007
Hi,
I rewrote Fredrik's ElementSoup.py module for lxml.html, so you can now have
lxml read tag soup with BeautifulSoup and convert it into an lxml.html tree of
Elements. While libxml2 can also parse broken HTML, it is not designed to cope
with really sick tag soup. So if you need to work with web pages that only sort
of look like they might once have been HTML, the lxml.html.ElementSoup module
can help you get there.
http://codespeak.net/svn/lxml/branch/html/doc/elementsoup.txt
http://codespeak.net/svn/lxml/branch/html/src/lxml/html/ElementSoup.py
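To make the general idea concrete, here is a rough, stdlib-only sketch. It shows what "read tag soup with a lenient parser, then convert it into a tree of Elements" means. It is NOT the ElementSoup code and does not use BeautifulSoup or lxml. Instead it uses Python's html.parser as the forgiving tokenizer and xml.etree.ElementTree as the target tree. The names SoupToTree and soup_to_tree are hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Hypothetical helper: build an ElementTree from tag soup,
    repairing mis-nested and unclosed tags as it goes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")  # synthetic root for the whole page
        self.stack = [self.root]        # currently open elements

    def _append_text(self, data):
        # Text goes into .text of the open element, or .tail of its last child.
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        attrib = {k: (v if v is not None else "") for k, v in attrs}
        if tag == "html":
            self.root.attrib.update(attrib)  # reuse the synthetic root
            return
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, implicitly closing
        # everything opened after it; stray end tags are simply ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self._append_text(data)


def soup_to_tree(markup):
    """Parse a string of tag soup and return the root Element."""
    parser = SoupToTree()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.root


root = soup_to_tree("<div><p>Hello <b>bold <i>both</b> tail</div><br>end")
print(ET.tostring(root, encoding="unicode"))
```

The repair rules here are deliberately crude: an end tag closes everything opened after its matching start tag, and unmatched end tags are dropped. Comments and doctypes are discarded. Unlike BeautifulSoup, the sketch knows nothing about HTML's implicit closing rules (a second `<p>` nests inside the first), which is exactly the kind of knowledge a real soup parser brings.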
Have fun,
Stefan