[lxml-dev] Weird errors in tostring

Bruno brunobg at gmail.com
Sat Apr 12 22:38:20 CEST 2008


Hi,

I'm getting a weird error in lxml.html.tostring. It happens on one machine but
not on another. Both use lxml 2.0.2, but the one that always works has Python
2.5 and the one that fails has Python 2.4. Here's the relevant backtrace:

  File "/home/spyder/spyder/core/base.py", line 289, in treetostring
    return tostring(root, method='xml', encoding=unicode)
  File "/usr/lib/python2.4/site-packages/lxml-2.0.2-py2.4-linux-i686.egg/lxml/html/__init__.py", line 1313, in tostring
    encoding=encoding)
  File "lxml.etree.pyx", line 2455, in lxml.etree.tostring
  File "serializer.pxi", line 61, in lxml.etree._tostring
  File "serializer.pxi", line 126, in lxml.etree._tounicode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 21-24: invalid data

On the other machine all goes well. FYI, the tree (the root variable) is
built with root = lxml.html.fromstring(data). I'm parsing data in both utf8
and iso-8859-1, and this particular backtrace happened with an HTML document
correctly labelled with a meta charset=iso-8859-1.
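
For reference, here is a minimal standard-library sketch of the same class of failure: bytes that are valid iso-8859-1 being decoded as utf8. It says nothing about what lxml does internally. The sample bytes and the try_decode helper are made up for illustration, and it is written for a current Python 3 rather than the Python 2.4/2.5 setups described above.

```python
# Hypothetical sample: an HTML fragment encoded as ISO-8859-1.
data = u'<p>caf\xe9 cr\xe8me</p>'.encode('iso-8859-1')


def try_decode(raw, encoding):
    """Return (text, None) on success, or (None, error) if decoding fails."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding with the charset the document declares works.
text_latin1, err_latin1 = try_decode(data, 'iso-8859-1')

# Decoding the same bytes as UTF-8 fails on the first accented character.
text_utf8, err_utf8 = try_decode(data, 'utf-8')

print(repr(text_latin1))
if err_utf8 is not None:
    # exc.start and exc.end delimit the offending bytes in the input.
    bad = data[err_utf8.start:err_utf8.end]
    print(err_utf8.start, err_utf8.end, repr(bad))
```

Printing the offending byte range this way can help confirm whether the bytes named in the error message are iso-8859-1 characters that were treated as utf8 somewhere along the way.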

Do you have any ideas on how to trace what is going wrong?


