[lxml-dev] Getting 'user-visible' text from HTML
Adam Nelson
adam at varud.com
Fri Jul 24 16:30:03 CEST 2009
Is there a shortcut method (or even a pasted script) that allows lxml to
get all the 'user-visible' text?

I'm writing a screen scraper that takes that text and looks for banned
words next to an advertiser's content. I therefore need to run a regular
expression on everything a user might see (including meta keywords,
etc.), but I don't care about the actual tags themselves, URLs, etc.

Right now, I'm just doing the regex on the entire HTML block.
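
To make the goal concrete, here is a minimal sketch of the kind of helper
I have in mind. It is only one possible approach, not an official lxml
recipe. The function name `visible_text` and the choice of which
attributes count as 'visible' (meta keywords/description, img alt) are my
own assumptions. It relies on lxml.html.fromstring(), etree.strip_elements()
and XPath text() nodes, which skip comments.

```python
import re

import lxml.html
from lxml import etree


def visible_text(html):
    """Return roughly what a user might see, as one whitespace-normalized string.

    Hypothetical helper (not part of lxml). What counts as 'visible'
    here is an assumption and may need adjusting.
    """
    doc = lxml.html.fromstring(html)

    # Remove elements whose contents are never rendered as page text.
    # with_tail=False keeps any text that follows the removed element.
    etree.strip_elements(doc, "script", "style", with_tail=False)

    # Text nodes only: tags, attribute values (such as href) and
    # comments are not matched by text().
    parts = list(doc.xpath(".//text()"))

    # Assumption: meta keywords/description and image alt text are
    # also worth checking, even though they live in attributes.
    parts.extend(doc.xpath(
        './/meta[@name="keywords" or @name="description"]/@content'
    ))
    parts.extend(doc.xpath(".//img/@alt"))

    # Join with spaces so words in adjacent elements do not run
    # together, then collapse runs of whitespace. Note that this also
    # splits a word that is broken up by inline markup.
    return re.sub(r"\s+", " ", " ".join(parts)).strip()
```

The banned-word regex would then run on the returned string instead of
the raw HTML, e.g. re.search(r"\bpills\b", visible_text(page), re.I),
where `page` is the fetched HTML.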
Thanks,
Adam