[lxml-dev] clean_html

Stefan Behnel stefan_ml at behnel.de
Wed Jun 24 15:45:36 CEST 2009


Francesco wrote:
> Thank you very much!
> 
> I now need a way to find out the encoding of my data. Because it is a web page,
> there must be a way to extract that information...
> 
> Should I look for something like charset=XXXXXXX?
> 
> Is there a way to extract that info easily after a call to urlopen?
> html = urlopen(webpage).read()

The HTML parser understands the charset declaration in an HTML <meta> tag,
so if the web page provides one, there is no need to override the parser
encoding.
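
As a rough sketch of the behaviour described above, assuming the page carries
a <meta> charset declaration: hand the undecoded bytes to the parser and let
it pick the encoding. The sample page and the helper name parse_page are made
up for illustration.

```python
from lxml import html

# Hypothetical sample page: Latin-1 encoded bytes that declare their
# charset in a <meta> tag, standing in for the result of urlopen(...).read().
RAW_PAGE = (
    b'<html><head>'
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    b'<title>caf\xe9</title>'
    b'</head><body><p>cr\xe8me</p></body></html>'
)


def parse_page(raw_bytes):
    """Parse undecoded bytes; the parser reads the declared charset itself."""
    return html.document_fromstring(raw_bytes)


doc = parse_page(RAW_PAGE)
title = doc.findtext('.//title')      # already decoded to a text string
paragraph = doc.findtext('.//p')
```

With a real page this would be parse_page(urlopen(webpage).read()). If a page
declares no charset at all, one fallback (again only a sketch) is to build the
parser with an explicit encoding, e.g. html.HTMLParser(encoding='iso-8859-1'),
and pass it as the parser argument.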

Stefan


