[lxml-dev] Question on clean_html

Brian Neal bgneal at gmail.com
Sun Nov 30 07:03:29 CET 2008


Hi,

I would like to use lxml to remove all tags except 'a' tags. Is this possible?

I don't seem to understand the arguments to the Cleaner class. What
does allow_tags do?

I tried this:

>>> from lxml.html.clean import Cleaner
>>> c = Cleaner(allow_tags=('a',), remove_unknown_tags=False)
>>> print(c.clean_html('<b>Hi</b>'))
<b>Hi</b>

Do I instead have to list all the tags I don't want, except for 'a',
in a remove_tags keyword argument?
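
To make the goal concrete, here is a rough sketch of the result I am after. It uses the standard library's html.parser rather than lxml, and it assumes that "remove" means dropping the markup while keeping the text inside it. The names AnchorOnlyStripper and strip_all_but_anchors are made up for this illustration and are not part of lxml.

```python
from html.parser import HTMLParser


class AnchorOnlyStripper(HTMLParser):
    """Hypothetical helper: keep <a> tags and all text, drop every other tag."""

    def __init__(self, keep=("a",)):
        # convert_charrefs=False so entities such as &amp; pass through unchanged
        super().__init__(convert_charrefs=False)
        self.keep = set(keep)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # get_starttag_text() returns the start tag exactly as written
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <a/>: emit it once, with no extra end tag
        if tag in self.keep:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def strip_all_but_anchors(html):
    """Return html with every tag except <a> removed, text preserved."""
    parser = AnchorOnlyStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_all_but_anchors('<b>Hi</b>'))
```

This only shows the desired output; it does no sanitizing of attributes on the kept tags, which is the part I was hoping Cleaner would handle.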

Any hints? Thank you.

