Thank you very much for your answers!

The html string is read from a file with:

    inputfile = "test.txt"  # where test.txt contains "<title>My site » Homepage</title>"
    input = open(inputfile, "rb")
    html = input.read()

How could I define the encoding for html?

Thanks,
Francesco