[lxml-dev] clean_html
Stefan Behnel
stefan_ml at behnel.de
Wed Jun 24 15:45:36 CEST 2009
Francesco wrote:
> Thank you very much!
>
> Now I need a way to find out the encoding of my data... Because it is a webpage,
> there must be a way to extract that information...
>
> Should I look for something like charset=XXXXXXX?
>
> Is there a way to extract that info easily after a call to urlopen?
> html = urlopen(webpage).read()
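lxml's HTML parser reads the charset declared in a page's <meta> tag (either
<meta charset="..."> or the older http-equiv="Content-Type" form), so if the web
page provides one, there is no need to override the parser encoding. Just pass
the raw bytes returned by urlopen(webpage).read() to the parser without
decoding them first.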
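A minimal sketch of this, assuming Python 3 and a current lxml. The sample markup is invented and stands in for the bytes returned by urlopen(webpage).read(). The second part, using lxml.html.HTMLParser(encoding=...) on a page without any declaration, only illustrates when an override would actually be needed.

```python
import lxml.html

# Stand-in for urlopen(webpage).read(): raw UTF-8 bytes whose <meta> tag
# declares the encoding. The markup is made up for illustration.
raw = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '<title>Caf\u00e9</title>'
    '</head><body><p>Gr\u00fc\u00dfe</p></body></html>'
).encode("utf-8")

# Pass the bytes as-is (no .decode(), no encoding argument); the HTML
# parser picks up the charset from the <meta> tag by itself.
doc = lxml.html.fromstring(raw)
title = doc.findtext(".//title")
para = doc.findtext(".//p")
print(title, para)

# Only when a page declares nothing (or declares the wrong thing) do you
# need to override the encoding, e.g. with one taken from the HTTP
# Content-Type header.
undeclared = '<html><body><p>Gr\u00fc\u00dfe</p></body></html>'.encode("utf-8")
parser = lxml.html.HTMLParser(encoding="utf-8")
doc2 = lxml.html.fromstring(undeclared, parser=parser)
para2 = doc2.findtext(".//p")
print(para2)
```

The point of leaving decoding to the parser: if you decode the bytes yourself first, you already have to know the encoding, which is exactly what the meta tag was supposed to tell the parser.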
Stefan