[lxml-dev] new ElementSoup module in lxml.html

Roger Patterson rogerpatterson at gmail.com
Sat Jul 21 00:10:21 CEST 2007


Hi Stefan,
I hadn't tried to use the lxml.html module before, but it doesn't seem
to be in the trunk (only in the branch).  So I guess this means it can only be
installed from source?  (Eggs are only built from the trunk?)

In that case, does your ElementSoup.py really need lxml.html?  I
noticed ElementSoup.py only uses "makeelement" from
lxml.html.html_parser.  Could I get away with using something from the
trunk instead?
cheers,
-Roger
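A minimal sketch of why the dependency on lxml.html could be thin: if the converter only needs an element factory, any callable with an `Element(tag, attrib)`-style signature can stand in for `lxml.html.html_parser.makeelement`. The function `soup_to_element` is hypothetical (it is not the code of ElementSoup.py), and it assumes BeautifulSoup nodes expose `.name`, `.attrs` and `.contents`, with text nodes being `str` subclasses. Comments and other special strings are simply treated as text here.

```python
# Hypothetical sketch, not ElementSoup.py itself: convert a BeautifulSoup-like
# tree into Elements built by any "makeelement"-style factory.
# Assumption: tag nodes expose .name, .attrs and .contents, and text nodes
# are str subclasses (as with bs4's NavigableString).

def soup_to_element(node, makeelement):
    """Recursively convert a tag node into an Element created by makeelement."""
    attrs = node.attrs
    if not isinstance(attrs, dict):
        # BeautifulSoup 3 stores attributes as a list of (name, value) pairs
        attrs = dict(attrs)
    # bs4 may return multi-valued attributes (e.g. class) as lists
    attrib = {k: " ".join(v) if isinstance(v, list) else v
              for k, v in attrs.items()}
    elem = makeelement(node.name, attrib)
    last = None  # most recently appended child; text after it becomes its .tail
    for child in node.contents:
        if isinstance(child, str):
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = soup_to_element(child, makeelement)
            elem.append(last)
    return elem
```

With a current lxml, passing `lxml.etree.Element` as the factory should give a plain lxml tree; the trade-off is that the elements would not be the HTML-specific element classes that lxml.html provides. For a bs4 document, pass a top-level tag (for example `soup.html`) rather than the `BeautifulSoup` object itself, whose name is `[document]`.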

Stefan Behnel wrote:
> Hi,
>
> I rewrote Fredrik's ElementSoup.py module for lxml.html so that you can now
> have lxml read in tag soup with BeautifulSoup and convert it into an lxml.html
> tree of Elements. While libxml2 can also parse broken HTML, it is not designed
> to parse really sick tag soup, so if you need to work with web pages that sort of
> look like they might have been HTML once, the lxml.html.ElementSoup module can
> help you get there.
>
> http://codespeak.net/svn/lxml/branch/html/doc/elementsoup.txt
> http://codespeak.net/svn/lxml/branch/html/src/lxml/html/ElementSoup.py
>
> Have fun,
> Stefan