[lxml-dev] Question on clean_html
Stefan Behnel
stefan_ml at behnel.de
Tue Dec 2 09:23:05 CET 2008
Ian Bicking wrote:
> for el in list(doc.iter()):
>     if el.tag not in ['a']:
>         el.drop_tag()
>
> I'm not 100% sure what happens if you modify the tree in place like
> this, though I think list() will make it work.
drop_tag() will at least refuse to drop the root element. Iterating over
list(root.iterdescendants()) instead should work, although the result will
definitely not be a valid HTML document.
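
A minimal sketch of the iterdescendants() variant, assuming lxml.html and a
made-up input fragment (the markup and the names doc and result are only
illustrative):

```python
from lxml import html

# Hypothetical input fragment, used only for illustration.
doc = html.fromstring(
    "<div><p>See <b>the <a href='/x'>docs</a></b> and <i>more</i>.</p></div>"
)

# Take a snapshot of the descendants before modifying the tree.
# iterdescendants() does not yield the root, so drop_tag() is never
# called on an element without a parent.
for el in list(doc.iterdescendants()):
    if el.tag != "a":
        # Removes the tag but merges its text and children into the parent.
        el.drop_tag()

result = html.tostring(doc, encoding="unicode")
print(result)
```

Only the root and the <a> elements remain as elements; the text of the
dropped tags is kept in place.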
If you are really only interested in a couple of tags without a meaningful
structure, you should collect them in a list rather than cutting
everything else out of the document (which is quite costly).
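
Collecting the elements could look roughly like this, again assuming
lxml.html and an invented input fragment (links, hrefs and texts are
illustrative names):

```python
from lxml import html

# Hypothetical input fragment, used only for illustration.
doc = html.fromstring(
    "<div><p>See <b>the <a href='/x'>docs</a></b> and "
    "<a href='/y'>more</a>.</p></div>"
)

# Collect the elements of interest; the tree itself is left untouched.
links = list(doc.iter("a"))

hrefs = [a.get("href") for a in links]
texts = [a.text_content() for a in links]
print(hrefs, texts)
```

This walks the tree once and avoids rewriting it for every element that is
not wanted.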
Stefan