[lxml-dev] Converting from XML to HTML parsed trees.
Stefan Behnel
stefan_ml at behnel.de
Fri Aug 8 18:30:49 CEST 2008
Hi,
John Krukoff wrote:
> I have some XML data already parsed into an lxml ElementTree. Is there
> any easy way to reparse that using lxml.html, or is the only way to do
> it to serialize the XML to a string and reparse using one of the
> lxml.html parsers?
In lxml 2.1, lxml.html has two functions, html_to_xhtml() and xhtml_to_html(), that
might do what you want, but they will not give the tree the lxml.html API.
There are two ways to do that:
1) lxml's parser and serialiser are fast, so it may well be fast enough to
serialise the tree as HTML (method="html") and reparse it with lxml.html.
2) Create a new <html> Element using lxml.html and append all children of your
original root Element to it. They will then inherit the lxml.html API from
their new root. In this case, though, you have to make sure that you drop all
references to these Elements, as the existing Element proxy objects keep their
old API for as long as they stay alive.
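A minimal sketch of both approaches, assuming a small hypothetical sample document; the helper names parse_sample(), reparse_as_html() and move_into_html_root() are made up for illustration. The second helper keeps no references to the moved children once it returns, so elements looked up afterwards through the new root come back as lxml.html elements.

```python
from lxml import etree, html

def parse_sample():
    # Hypothetical sample input, parsed with the plain XML parser.
    return etree.fromstring(
        "<html><body><p class='intro'>Hello <b>world</b></p></body></html>"
    )

def reparse_as_html(root):
    # Approach 1: serialise the XML tree as HTML, then reparse with lxml.html.
    markup = etree.tostring(root, method="html", encoding="unicode")
    return html.fromstring(markup)

def move_into_html_root(root):
    # Approach 2: move the children under a new <html> root created by lxml.html.
    new_root = html.Element("html")
    # Take a snapshot of the children first, because append() moves each one.
    for child in list(root):
        new_root.append(child)
    # The local references to the moved children die when this function returns,
    # so later lookups through new_root create fresh lxml.html proxies.
    return new_root

# Usage: both results support lxml.html methods such as find_class().
html1 = reparse_as_html(parse_sample())
print(html1.find_class("intro")[0].text_content())

html2 = move_into_html_root(parse_sample())
print(html2.find_class("intro")[0].text_content())
```

Each tree is parsed fresh, because the second approach empties the original root.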
Just try both to see what works best for you.
Stefan