[lxml-dev] clean_html

Stefan Behnel stefan_ml at behnel.de
Wed Jun 24 15:18:56 CEST 2009


Piet van Oostrum wrote:
>>>>>> Francesco <cattafra at hotmail.com> (F) wrote:
>
>>F> Thank you very much for your answers!
>>F> The html string is read from a file with:
>>F> inputfile = "test.txt"
>>F> # where test.txt contains "<title>My site &raquo; Homepage</title>"
>>F> input = open(inputfile, "rb")
>>F> html = input.read()
>
> Why do you use "rb"?

Because the file contains byte-encoded data, not decoded text.
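
To illustrate the point, here is a minimal sketch, assuming Python 3 and a UTF-8 encoded file. The temporary file stands in for test.txt, and the names `sample`, `path` and `text` are made up for the example. Opening with "rb" returns the bytes exactly as stored; decoding is a separate, explicit step.

```python
import os
import tempfile

# Sample content, copied from the comment in the quoted message.
sample = b"<title>My site &raquo; Homepage</title>"

# Write the sample to a temporary file standing in for test.txt.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # "rb" returns the bytes exactly as stored on disk:
    # no decoding and no newline translation takes place.
    with open(path, "rb") as f:
        html = f.read()
finally:
    os.remove(path)

# Decoding is a separate, explicit step. UTF-8 is an assumption here;
# the real file may use a different encoding.
text = html.decode("utf-8")
```

Keeping the data as bytes leaves the choice of encoding to whoever consumes it next, rather than to the platform's default text encoding.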

Stefan


