[z3-five] unicodes and strings in Zope 2.9's ZPT with Zope 3's i18n
Philipp von Weitershausen
philipp at weitershausen.de
Tue Jul 25 10:29:01 CEST 2006
Maciej Wiśniowski wrote:
>>> Is it possible to see that code? I'm very interested in it :)
>>>
>> I know people sometimes do it, but it makes code non-portable and could
>> cause other subtle problems.
>>
> I've already found some examples of sitecustomize.py that
> change the default encoding.
> We're using Windows and Linux, so it may be non-portable, but
> I think it may still be helpful in some cases.
>
> Maybe there should be an option in zope.conf that allows changing
> this behaviour. At first I thought that
> 'zpublisher-default-encoding' or 'locale' would do the trick...
> but, as you know, they don't.
>
>> Unicode is also more fun.
>>
> But it is more difficult, especially for beginners
It's pretty easy, I think. Most people just don't explain it well enough.
Unicode is an *abstraction*, like an image you see on your screen. An
image on your screen cannot simply be saved as-is. You have to save
it in some sort of format, that is, some particular arrangement of bytes.
There are several such formats, e.g. PNG, JPG, etc. Some of them are
"lossy" because they cannot preserve all of the image information.
Same with unicode. A unicode object is a nice object-oriented
abstraction. It can hold any unicode "character". But you cannot simply
save unicode to the filesystem or send it over the wire. You have to do
this in bytes, and again, a format (= encoding) tells you how to convert
unicode into bytes. Some encodings are lossy (because they don't
support the whole Unicode range) and some of them aren't (e.g. UTF-*).
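To make the lossy/lossless distinction concrete, here is a small sketch. It is written for Python 3, where `str` plays the role of Python 2's `unicode` object and `bytes` that of the 8-bit `str`; the sample text is arbitrary. UTF-8 round-trips every character, while Latin-1 only covers U+0000 to U+00FF and has to drop characters it cannot represent.

```python
# Python 3 sketch: str is the unicode abstraction, bytes is one encoded form of it.
text = "Wiśniowski €"

# UTF-8 can represent every Unicode code point, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Latin-1 only covers U+0000..U+00FF, so 'ś' and '€' cannot be encoded as-is.
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode:", exc.object[exc.start:exc.end])

# Forcing it with errors="replace" substitutes '?': the information is gone.
lossy = text.encode("latin-1", errors="replace")
print(lossy)  # b'Wi?niowski ?'
```

Decoding `lossy` again gives back `'Wi?niowski ?'`, not the original text, which is exactly the "lossy image format" situation described above.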
To make a long story short:
- In Python, we want to work with the abstraction. Working with
8-bit strings is like doing image manipulation by editing the image's
byte representation directly. That sucks.
- When going to the filesystem or the HTTP client, we have to use some
encoding. We really only need to worry about the filesystem; talking to
the HTTP client is the ZPublisher's task. And, since the ZPublisher is a
Python component, we want to talk unicode to it.
The ZODB is also a Python component. Hence, we store unicode. SQL
database adapters are also Python components. They really SHOULD give us
unicode, not some 8-bit-encoded gibberish.
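A minimal sketch of "encode only at the filesystem boundary", again in Python 3 terms (`str` is text, `bytes` is the encoded form). The helper names `save_text` and `load_text` are hypothetical and UTF-8 is an assumed choice; the point is that everything outside these two functions deals only in unicode.

```python
import os
import tempfile


# Hypothetical helpers: encoding happens here and nowhere else.
def save_text(path, text, encoding="utf-8"):
    """Encode unicode text to bytes only when it goes to disk."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def load_text(path, encoding="utf-8"):
    """Decode bytes back into unicode text as soon as they are read."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")
    save_text(path, "Grüße, Maciej Wiśniowski")
    restored = load_text(path)
    with open(path, "rb") as f:
        raw = f.read()  # what is actually stored on disk: UTF-8 bytes

print(type(restored).__name__, type(raw).__name__)  # str bytes
```

Code that calls `load_text` never sees bytes, which is the same contract one would want from the ZPublisher or a database adapter.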
Now, I realize there's legacy data in the ZODB where 8-bit strings are
stored. If you know this, you can deal with it explicitly. But don't
introduce new legacy when writing new stuff...
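Dealing with legacy data explicitly can be as simple as decoding it once, with a known encoding, at the point where it is read. The sketch below uses Python 3 (`bytes` standing in for old 8-bit attributes); `to_unicode` is a hypothetical helper, and the UTF-8 default is an assumption, since the real encoding of legacy data depends on how it was originally stored.

```python
# Hypothetical helper for legacy values. Assumes legacy 8-bit data is UTF-8
# unless told otherwise; pass the actual encoding if it was stored differently.
def to_unicode(value, encoding="utf-8"):
    """Return text as str, decoding legacy byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


legacy_title = "Zażółć".encode("utf-8")  # stands in for an old 8-bit attribute
new_title = "Zażółć"                     # how new code should store it

assert to_unicode(legacy_title) == to_unicode(new_title) == "Zażółć"
```

Raising on anything other than text or bytes is a deliberate choice: a silent `str(value)` would hide bugs where non-text data ends up where text is expected.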
> and especially in Zope 2. It's a bit strange to write strings as u'...'.
> I've never seen this in other languages.
And I've never seen significant indentation in other languages. What's your
point? You want to reject unicode because you don't like its syntax???
> So far it seems that we have to use functions like
> yours:
>
> def u(s, encoding="utf-8"):
>     """Convert from UTF8 to Unicode (if needed)"""
>     ...
Huh? Simply do some_utf8_string.decode('utf-8'). But, again, unless
you're actually writing stuff to the filesystem, I doubt you should ever
have to manually encode or decode stuff, even in Zope 2 (provided you
use certain helpers from Five).
> for data stored in attributes and for data retrieved from database
> and be careful to use u'...' strings instead of '...'.
Anything that contains human text should be unicode. That's a very
simple rule.
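One quick way to see why human text belongs in unicode rather than in encoded bytes: character-level operations such as `len()` and slicing only mean what you expect on text. This Python 3 sketch (`str` being the unicode type, `bytes` the 8-bit form) uses an arbitrary name as sample data.

```python
name = "Wiśniowski"
encoded = name.encode("utf-8")

print(len(name))     # 10 characters
print(len(encoded))  # 11 bytes: 'ś' takes two bytes in UTF-8

# Slicing text is safe; slicing the bytes can cut a character in half.
print(name[:3])      # Wiś
try:
    encoded[:3].decode("utf-8")
except UnicodeDecodeError:
    print("first 3 bytes are not valid UTF-8")
```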
Philipp