[py-dev] py.test in Unicode context

François Pinard pinard at iro.umontreal.ca
Mon Apr 17 16:22:00 CEST 2006


Hi, people.  I hope this is an appropriate forum for discussing such
things; if not, please kindly tell me! :-)

For a while now, and increasingly so, we have been using py.test and
py.log in a few projects.  A few are bigger, most are smaller...

A few weeks ago, we started an experiment: converting a set of
programs to use Unicode internally throughout.  That is, for example,
*all* constant strings in the sources got a 'u' prepended by running
``unipy *.py``, where ``unipy`` is a script of ours.  A bit sadly,
Python is not fully ready for such usage -- comments censored :-) --
yet with a few appropriate local stunts, it seems we can manage
nevertheless.  In fact, it looks promising.  The ``unipy`` script adds
the following special line near the start of Python modules::

    from Unicode import file, isinstance, open, os, str, sys, unicode

and also removes pre-existing import statements for ``os`` and
``sys``.  The effect is that, for example, ``file`` or ``os.popen``
have a Unicode-aware filter automatically installed around the real
file object, and the same holds for ``sys.stdin`` and ``sys.stdout``,
say.  This applies only to modules using the special ``from`` line; the
real objects are left alone for modules that have not been unipy-ized.
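To make the idea of a "Unicode-aware filter" concrete, here is a minimal
sketch in present-day Python.  It is not the actual ``Unicode`` module:
the class name ``EncodingWriter`` and the UTF-8 default are assumptions
made purely for illustration.

```python
import io


class EncodingWriter:
    """Hypothetical filter: accepts text and writes encoded bytes
    to the wrapped binary stream (not the real ``Unicode`` module)."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw              # the real, byte-oriented file object
        self._encoding = encoding    # assumed default; the original is unknown

    def write(self, text):
        # Pass bytes through unchanged, encode anything textual.
        if isinstance(text, bytes):
            return self._raw.write(text)
        return self._raw.write(text.encode(self._encoding))

    def __getattr__(self, name):
        # Delegate everything else (flush, close, ...) to the real object.
        return getattr(self._raw, name)


raw = io.BytesIO()                   # stands in for the real file object
out = EncodingWriter(raw)
out.write("François\n")
encoded = raw.getvalue()
```

The point of the design is that the underlying object only ever sees
8-bit data, whatever the calling module writes.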

py.test and py.log do not behave well in such contexts, and I would
much rather not give up on them, hence my incentive for this
conversation.  I'll likely adjust a local copy of py.log, but py.test
is less easy for me.  It uses some magic by which, for example,
``sys.stdout`` is overridden in the tested module's namespace by a
``cStringIO`` object.  For one thing, ``cStringIO`` does not work with
Unicode strings, while ``StringIO`` does.  But that should not even be
a problem, because the special ``sys`` imported from our ``Unicode``
module should, for example, write only 8-bit strings to the real
``sys.stdout``.  So I would guess the interception installed by py.test
is not low-level enough: ideally, it should not touch the tested
module's namespace at all.
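As a rough illustration of what "low-level enough" could mean, the
sketch below captures output by replacing only the real ``sys.stdout``
and never touches the tested code's namespace.  It is written for
present-day Python, where ``str`` is already Unicode, so the encoding
step is left out; ``LateBoundStdout``, ``wrapped_stdout`` and
``capture`` are hypothetical names, and this is not how py.test itself
is implemented.

```python
import io
import sys


class LateBoundStdout:
    """Hypothetical stand-in for a wrapped stdout: it looks up the
    real ``sys.stdout`` at each write instead of holding a reference."""

    def write(self, text):
        return sys.stdout.write(text)


# What a converted module would see in its own namespace.
wrapped_stdout = LateBoundStdout()


def tested_function():
    wrapped_stdout.write("caf\u00e9\n")


def capture(func):
    """Run ``func`` while the real ``sys.stdout`` is swapped for a buffer."""
    saved = sys.stdout
    sys.stdout = buffer = io.StringIO()
    try:
        func()
    finally:
        sys.stdout = saved           # always restore the real stream
    return buffer.getvalue()


captured = capture(tested_function)
```

Because the wrapper resolves ``sys.stdout`` late, the capture still
works even though the tested code keeps its own wrapped object.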

Do you have any opinion, suggestion, or thought you would feel like 
sharing, on this matter?

[On a parallel line of thought, I also wonder whether the pylib project
could adopt, as one of its sub-projects, the search for a workable
solution to the problems of those like us, who try to match Python and
Unicode for real. :-)]

-- 
François Pinard   http://pinard.progiciels-bpi.ca

