[lxml-dev] Encoding problems with lxml

Stefan Behnel stefan_ml at behnel.de
Thu Jun 28 11:35:18 CEST 2007



Bruno Barberi Gnecco wrote:
> a) when reading pages in iso-8859-1, accented characters are converted to HTML
> sequences, such as &agrave; for ` + a. I don't want this to happen, how to avoid it?

I only noticed now that this was referring to parsing. Any reason you don't
want entities resolved here?

lxml 2.0 will allow you to keep entities in the tree, although they are rarely
of any help.
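
A minimal sketch of what keeping entities in the tree can look like, assuming
lxml 2.0 or later and that the `resolve_entities` parser option is the feature
meant here. The sample document and its `agrave` entity declaration are made
up for illustration:

```python
from lxml import etree

# Illustrative document that declares its own "agrave" entity.
xml = b'<!DOCTYPE root [<!ENTITY agrave "&#224;">]><root>voil&agrave;</root>'

# Default behaviour: the entity is resolved into the text content.
resolved = etree.fromstring(xml)
print(repr(resolved.text))

# Assumption: resolve_entities=False is the option referred to above.
# The reference stays in the tree as a separate entity node.
parser = etree.XMLParser(resolve_entities=False)
kept = etree.fromstring(xml, parser)
print(repr(kept.text))        # text up to the entity reference
print(etree.tostring(kept))   # serialization keeps the reference
```

With entities kept, the reference no longer shows up in `.text`, so code that
reads the text has to deal with the extra node, which is probably why this is
rarely of any help.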

Stefan


