[lxml-dev] Struggling with unicode again
Stefan Behnel
stefan_ml at behnel.de
Wed Jun 3 19:55:20 CEST 2009
Collioud, Olivier wrote:
> I'm trying to update an etree in a WSGI application with data coming from a posted form.
> The data is converted first using urllib.unquote_plus.
>
> I know that the data (text) is then UTF-8 encoded.
>
> LXML is giving:
>
> Traceback (most recent call last):
>   File "D:/Applications/IPC_Definitions_Editor/defedit/defedit.py", line 130, in application
>     elt.text = text
>   File "lxml.etree.pyx", line 835, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:9595),
>   File "apihelpers.pxi", line 409, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:28436)
>   File "apihelpers.pxi", line 951, in lxml.etree._utf8 (src/lxml/lxml.etree.c:32423)
> AssertionError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
>
> What encoding do I need to convert 'text' to and how ?
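No encoding. You have to /de/code it from UTF-8 into Unicode and pass a
Python unicode string. As the assertion message says, lxml accepts byte
strings only if they are plain ASCII, so UTF-8 encoded bytes are rejected.
I.e. do

    elt.text = text.decode("utf-8")
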
Stefan
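
For readers on Python 3, the same idea can be sketched as follows. This is only an illustration under a few assumptions: the helper name form_value_to_text and the element name "definition" are made up for the example, and the standard library's xml.etree.ElementTree is used as a stand-in so the snippet runs without lxml installed (lxml.etree offers the same Element and .text interface). In the Python 2 setup from the thread, urllib.unquote_plus already returns a byte string, so text.decode("utf-8") is all that is needed there.

```python
from urllib.parse import unquote_to_bytes
import xml.etree.ElementTree as etree  # stand-in for lxml.etree (assumption)


def form_value_to_text(raw_value, encoding="utf-8"):
    """Percent-decode a posted form value and return a Unicode string.

    Hypothetical helper, not part of lxml or urllib.
    """
    # In form data a literal '+' stands for a space; handle it before
    # percent-decoding so that an escaped plus (%2B) survives as '+'.
    raw_bytes = unquote_to_bytes(raw_value.replace("+", " "))
    # The bytes are UTF-8 encoded, so decode them into text
    # before handing them to the XML tree.
    return raw_bytes.decode(encoding)


elt = etree.Element("definition")
elt.text = form_value_to_text("caf%C3%A9+au+lait")

# Serialize to a text string to check that the value round-trips.
serialized = etree.tostring(elt, encoding="unicode")
```

The point is the same as in the reply above: decode once at the boundary where the form data enters the application, and only ever assign text strings to elt.text.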