[Cython] constant Py_UNICODE arrays

Aaron DeVore aaron.devore at gmail.com
Sat Nov 8 01:02:01 CET 2008


On Thu, Nov 6, 2008 at 10:26 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:

> I may be biased since I've been working on the lxml XML library for quite a
> while now, but may I ask why you use unicode strings and Py_UNICODE
> internally, instead of a UTF-8 encoded byte buffer?

The tree is built by a pure Python parser, even though the tree
itself is written in Cython. The parser already hands the strings over
as unicode objects, so it's easier for me to just store a pointer to
the PyObject. I don't know whether that costs (or gains?) any
performance, but I've found it more convenient. If there's a better
way, I would be happy to hear it.
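
To make the trade-off concrete, here is a rough sketch in plain Python
rather than Cython. The class names ObjectNode and Utf8Node are made up
for illustration and do not come from the actual tree code or from lxml.
The first keeps a reference to the string object the parser produced; the
second stores a UTF-8 byte buffer and has to decode it on every read.

```python
class ObjectNode:
    """Hypothetical node: keeps a reference to the parser's string object."""
    __slots__ = ("tag",)

    def __init__(self, tag):
        # No copy and no conversion, just another reference to the same object.
        self.tag = tag


class Utf8Node:
    """Hypothetical node: stores the tag as a UTF-8 encoded byte buffer."""
    __slots__ = ("_tag",)

    def __init__(self, tag):
        # One encode when the node is created.
        self._tag = tag.encode("utf-8")

    @property
    def tag(self):
        # One decode every time Python code asks for the tag.
        return self._tag.decode("utf-8")


# Stands in for a string handed over by the pure Python parser.
parsed = "".join(["para", "graph"])

a = ObjectNode(parsed)
b = Utf8Node(parsed)

print(a.tag is parsed)   # the node holds the very same object
print(b.tag == parsed)   # the value round-trips through UTF-8
```

Whether the byte buffer pays off presumably depends on how often the
strings cross back into Python code, since each crossing costs a decode.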

I'll take a look this evening at how Cython deals with interning strings.

-Aaron

