[lxml-dev] Question on clean_html

Stefan Behnel stefan_ml at behnel.de
Tue Dec 2 09:23:05 CET 2008


Ian Bicking wrote:
> for el in list(doc.iter()):
>      if el.tag not in ['a']:
>          el.drop_tag()
>
> I'm not 100% sure what happens if you modify the tree in place like
> this, though I think list() will make it work.

It will at least refuse to drop the root element, since drop_tag() needs a
parent to merge the element's children into. Running through
list(root.iterdescendants()) instead should work, although the result will
definitely not be a valid HTML document.
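
As a rough sketch of the iterdescendants() variant (the sample markup and
variable names here are made up for illustration, not taken from the
original question), something along these lines should leave only the root
and the <a> elements in place:

```python
import lxml.html

# Hypothetical sample input; a single top-level <div> becomes the root.
doc = lxml.html.fromstring(
    "<div><p>See <a href='/x'>this <b>link</b></a> now.</p></div>"
)

# Snapshot the descendants first, so that modifying the tree does not
# disturb the iteration. The root itself is not included, so drop_tag()
# is never called on an element without a parent.
for el in list(doc.iterdescendants()):
    if el.tag != "a":
        # Removes the tag but keeps its text and children in the parent.
        el.drop_tag()

remaining_tags = [el.tag for el in doc.iter()]
remaining_text = doc.text_content()
```

The text content is preserved because drop_tag() merges the text and
children of each dropped element into its parent.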

If you are really only interested in a couple of tags without a meaningful
structure, you should collect them in a list rather than cutting
everything else out of the document (which is quite costly).
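
A minimal sketch of the collecting approach, assuming only the link target
and link text are of interest (the sample markup and the "links" name are
hypothetical):

```python
import lxml.html

# Hypothetical sample input for illustration.
doc = lxml.html.fromstring(
    "<div><p>See <a href='/x'>this <b>link</b></a> or "
    "<a href='/y'>that</a>.</p></div>"
)

# Walk the tree once and keep only what is needed from each <a>,
# leaving the document itself untouched.
links = [(a.get("href"), a.text_content()) for a in doc.iter("a")]
```

This reads the tree without modifying it, so there is no need to copy the
element list first.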

Stefan


