[lxml-dev] new ElementSoup module in lxml.html
Roger Patterson
rogerpatterson at gmail.com
Sat Jul 21 00:10:21 CEST 2007
Hi Stefan,
I hadn't tried the lxml.html module before, but it doesn't seem to be
in trunk (only in the html branch). So I guess this means it can only
be installed from source? (Are eggs only built from trunk?)
In that case, does your ElementSoup.py really need lxml.html? I
noticed that ElementSoup.py only uses "makeelement" from
lxml.html.html_parser. Can I get away with using an element factory
from trunk instead?
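To make the question concrete, here is a rough sketch of what I have in
mind. It is not your ElementSoup.py and it does not use BeautifulSoup: the
stdlib HTMLParser stands in as the lenient tokenizer, and SoupTreeBuilder
and parse_soup are names I made up for illustration. The only point is that
the tree builder takes its element factory as a parameter, assuming the
factory is called as makeelement(tag, attrib). I have not checked this
against the branch, but I would expect lxml.etree.Element from trunk to fit
the same call shape.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags that never have content, so they must not stay open on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class SoupTreeBuilder(HTMLParser):
    """Hypothetical builder: turns sloppy markup into a tree of elements
    created by a pluggable makeelement(tag, attrib) factory."""

    def __init__(self, makeelement=ET.Element):
        super().__init__()
        self.makeelement = makeelement
        # Synthetic root, so that fragments and stray text have a parent.
        self.root = makeelement("html", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        attrib = {name: (value or "") for name, value in attrs}
        element = self.makeelement(tag, attrib)
        self.stack[-1].append(element)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            # Text after a child element goes into that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def parse_soup(markup, makeelement=ET.Element):
    """Parse markup and return the synthetic root element."""
    builder = SoupTreeBuilder(makeelement)
    builder.feed(markup)
    builder.close()
    return builder.root


if __name__ == "__main__":
    # The <b> tag is never closed; the builder closes it along with <p>.
    tree = parse_soup("<p>Hello <b>world</p> tail")
    print(ET.tostring(tree, encoding="unicode"))
```

If the real module is structured along these lines, swapping the factory
would be a one-argument change, which is why I am asking whether the
lxml.html dependency is essential.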
cheers,
-Roger
Stefan Behnel wrote:
> Hi,
>
> I rewrote Fredrik's ElementSoup.py module for lxml.html so that you can now
> have lxml read in tag soup with BeautifulSoup and convert it into an lxml.html
> tree of Elements. While libxml2 can also parse broken HTML, it is not made to
> parse sick soup of tags, so if you need to work with web pages that sort of
> look like they might have been HTML once, the lxml.html.ElementSoup module can
> help you get there.
>
> http://codespeak.net/svn/lxml/branch/html/doc/elementsoup.txt
> http://codespeak.net/svn/lxml/branch/html/src/lxml/html/ElementSoup.py
>
> Have fun,
> Stefan