[lxml-dev] objectify E-factory does not handle unicode text
jholg at gmx.de
Wed Jul 4 11:28:48 CEST 2007
Hi,
playing around with the new E-factory I found that it does not handle
unicode the way the rest of the API does:
>>> STR = objectify.E.str
>>> STR(unicode("äöü", 'latin-1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/data/pydev/hjoukl/LXML/lxml-1.3/build/lib.solaris-2.8-sun4u-2.4/lxml/builder.py", line 43, in <lambda>
    return lambda *args, **kwargs: func(tag, *args, **kwargs)
  File "/data/pydev/hjoukl/LXML/lxml-1.3/build/lib.solaris-2.8-sun4u-2.4/lxml/builder.py", line 177, in __call__
    v = t(elem, item)
  File "objectify.pyx", line 1661, in objectify.__add_text
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>>
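For readers on Python 3, where str is already the unicode type, the same error message can be reproduced by encoding explicitly. This is only a sketch of the presumed cause (an implicit ASCII encode of the text on its way into the element), not the actual objectify code:

```python
# Python 3 sketch: an explicit ASCII encode of non-ASCII text fails the same
# way the implicit one does in the Python 2 session above.
text = "äöü"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Keep the message so it can be compared with the traceback above.
    message = str(exc)
else:
    message = None
```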
This is easily fixed by changing __add_text to
def __add_text(_Element elem not None, text):
    cdef tree.xmlNode* c_child
    if not python._isString(text):
        if isinstance(text, bool):
            text = str(text).lower()
        else:
            text = str(text)
    c_child = cetree.findChildBackwards(elem._c_node, 0)
    [...]
>>> STR = objectify.E.str
>>> STR(unicode("äöü", 'latin-1'))
Patches for trunk / 1.3 branch appended.
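The idea behind the change (leave values that are already strings untouched, convert only everything else) can be sketched in plain Python 3. The helper name to_text is hypothetical and exists only for this illustration. It is not part of lxml.

```python
def to_text(value):
    """Return the text an element should hold for `value` (hypothetical helper)."""
    if isinstance(value, str):
        # Already text: pass it through unchanged, so non-ASCII characters
        # never go through a str() conversion.
        return value
    if isinstance(value, bool):
        # Booleans are written in lower case: "true" / "false".
        return str(value).lower()
    # Everything else (numbers etc.) is converted with str().
    return str(value)


umlauts = to_text("äöü")   # stays "äöü"
flag = to_text(True)       # becomes "true"
number = to_text(42)       # becomes "42"
```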
Another issue with the E-factory is that it currently does not support the custom objectify classes that you can add with the PyType mechanism. For example, I'm using datetime and decimal additions, which leads to:
>>> import decimal
>>> DEC = objectify.E.decimal
>>> DEC(decimal.Decimal(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/data/pydev/hjoukl/LXML/lxml-1.3/build/lib.solaris-2.8-sun4u-2.4/lxml/builder.py", line 43, in <lambda>
    return lambda *args, **kwargs: func(tag, *args, **kwargs)
  File "/data/pydev/hjoukl/LXML/lxml-1.3/build/lib.solaris-2.8-sun4u-2.4/lxml/builder.py", line 175, in __call__
    raise TypeError("bad argument type: %r" % item)
TypeError: bad argument type: Decimal("0")
>>>
So I'd have to add decimal.Decimal to objectify.E._typemap. The nicest way to handle this would be for PyType.register() to do it for me, but
PyType uses type names rather than type objects for its purposes. Maybe the easiest thing is to give ElementMaker its own register(<type>) / unregister(<type>) methods and document them well?
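To make the proposal concrete, here is a rough Python 3 sketch of a factory that keeps a map from type objects to converter functions and exposes register/unregister. Everything in it is an assumption made for illustration: the class name RegisteringMaker, the method signatures, and the use of the standard library's xml.etree.ElementTree in place of lxml. It is not how ElementMaker is implemented.

```python
import decimal
import xml.etree.ElementTree as ET


class RegisteringMaker:
    """Toy element factory with a per-instance type map (hypothetical)."""

    def __init__(self):
        # Maps a Python type object to a function that turns a value into text.
        self._typemap = {str: str, int: str, float: repr}

    def register(self, type_, to_text=str):
        """Allow values of `type_` as children, converted with `to_text`."""
        self._typemap[type_] = to_text

    def unregister(self, type_):
        """Stop accepting values of `type_`; unknown types are ignored."""
        self._typemap.pop(type_, None)

    def __call__(self, tag, *children):
        elem = ET.Element(tag)
        for item in children:
            convert = self._typemap.get(type(item))
            if convert is None:
                # Same kind of failure as in the traceback above.
                raise TypeError("bad argument type: %r" % (item,))
            elem.text = (elem.text or "") + convert(item)
        return elem


E = RegisteringMaker()
E.register(decimal.Decimal)             # keyed by the type object, not a name
dec = E("decimal", decimal.Decimal(0))  # dec.text is now "0"
```

Keying the map by type object rather than by type name avoids the mismatch with PyType mentioned above, at the cost of a second registry that has to be kept in sync by hand.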
Holger
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trunk_efactory_unicode.patch
Type: application/octet-stream
Size: 671 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070704/65923b81/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: branch13_efactory_unicode.patch
Type: application/octet-stream
Size: 671 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070704/65923b81/attachment-0001.obj