[Cython] Cython array type: Summary, introducing CEP 518
Dag Sverre Seljebotn
dagss at student.matnat.uio.no
Thu Jun 18 15:48:41 CEST 2009
Sorry about the medium-sized length, but I'd like this to be close to my
last email on the subject. I'd just refer to Robert's mail, but I guess
some more explanation about NumPy semantics is in order for the benefit
of non-NumPy-users, so I've made a summary of that.
Stefan Behnel wrote:
> Dag Sverre Seljebotn wrote:
>> Stefan Behnel wrote:
>>> we have three types:
>>>
>>> 1) a dynamic array type
>>> - allocates memory on creation
>>> - reallocates on (explicit) resizing, e.g. a .resize() method
>>> - supports PEP 3118 (and disables shrinking with live buffers)
>>> - returns a typed value on indexing
>>> - returns a typed array copy on slicing
>>> - behaves like a tuple otherwise
>>>
>>> 2) a typed memory view
>>> - created on top of a buffer (or array)
>>> - never allocates memory (for data, that is)
>>> - creates a new view object on slicing
>>> - behaves like an array otherwise
>> This last point is dangerous as we seem to disagree about what an array
>> is.
>
> It's what I described under 1).
>
>
>>> 3) a SIMD memory view
>>> - created on top of a buffer, array or memory view
>>> - supports parallel per-item arithmetic
>>> - behaves like a memory view otherwise
>> Good summary. Starting from this: I want int[:,:] to be the combination
>> of 2) and 3)
>
> You mean "3) and not 2)", right? Could you explain why you need a syntax
> for this if it's only a view?
I suppose I meant some variation of 3) with some extra bullet points
(slicing in particular). We need a syntax because SIMD operations must
be handled as a special case at compile time.
Robert put it well; what I want is the core NumPy array semantics on a
view to any array memory -- builtin, so that it can be optimized at
compile time. We need to return to that; trying to distill something
else and more generic out of this seems to only bring confusion.
(This is about 1) and 2) in Robert's mail only though.)
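To make the copy-versus-view distinction concrete for non-NumPy-users,
here is a minimal sketch in plain Python, using only the standard
library. It is not the proposed Cython syntax; the names data, copied,
view and sub are made up for the illustration. Slicing an array.array
copies, while slicing a memoryview gives a new view on the same memory:

```python
from array import array

data = array('i', [0, 1, 2, 3, 4, 5])

# Slicing an array allocates a new array, so this is a copy.
copied = data[2:5]
copied[0] = 99
data_after_copy_write = data[2]   # the original is untouched

# Slicing a memoryview creates a new view on the same memory.
view = memoryview(data)
sub = view[2:5]
sub[0] = 42                       # writes through to data[2]
```

The second behaviour is the one asked for here: a slice is a window
onto existing memory, and a copy only happens when requested.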
I'll make a list of what we mean by NumPy semantics below. At the very
bottom are some things which I think should *not* be included.
First:
1) Nobody is claiming this is elegant or Pythonic. It is catering for a
numerical special interest, nothing more nor less.
2) As Robert put it: He won't use it himself, but the rest of Cython
indirectly benefits from all the Cython interest from the numerical users.
3) The proposed semantics below are really not up for in-detail
discussion; what I'm really after is a "yes" or "no". I just don't
have the time, and NumPy is the de facto standard for Python numerics
and what everybody expects anyway. I don't want to invent something
entirely new.
That said, here's a long list of what I mean by NumPy semantics,
assuming both CEPs are implemented.
# make x a compile-time-optimizable 2D view on memoryview(obj)
cdef int[:,:] x = obj
# make an unassigned 1D view
cdef int[:] y
# Indexing
x[2,3]
# Access shape, stride info, raw data pointer
x.shape
x.strides
x.data
# Slicing out new view of third row (in two ways)
y = x[2,:]
y = x[2,...]
# Now, modifying y modifies what x points to too.
# Make a copy so that y points to separate memory:
y = y.copy()
# Indexing with None creates new, 1-length axis
x = y[None, :] # x.shape == (1, y.shape[0])
x = y[:, None] # x.shape == (y.shape[0], 1)
# Now, this:
x[0, 3] = 2
# modifies y[3] too.
# New view of exactly same data
x[:,:]
x[...]
# Set all entries in array to 12
x[...] = 12
# Set only first row to 10
x[0, :] = 10
# Some ways of multiplying all elements by 2
x *= 2
x[...] *= 2
x[:,:] *= 2
x += x
x[...] += x
# A more complicated expression...allocates memory
x = stdmath.sqrt(x*x + x*(x+1)/(x+2))
# A more complicated expression...overwrites existing
# memory
x[...] = stdmath.sqrt(x*x + x*(x+1)/(x+2))
# Boolean operators
cdef bint[:,:] b # perhaps we could support 8-bit bool too
b = (x == 2)
# b is now an array the shape of x, containing True where x[i,j] == 2
# Get sum of elements
import numpy as np
np.sum(x)
# As for printing/coercion to Python object, that remains
# TBD. Either memoryview, or a pretty-printing subclass
# of memoryview, implementing NumPy's __array__ protocol
# as well for better compatibility
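For comparison, some of the above can already be tried on a plain
Python memoryview. This is only a rough sketch using the standard
library, with made-up names (buf, shape, strides, element), and it
covers much less than the list above: Python's memoryview has no
multi-dimensional slicing and no arithmetic, so only shape, strides,
indexing and write-through are shown.

```python
from array import array

# 12 C ints laid out contiguously; stands in for "any array memory".
buf = array('i', range(12))

# memoryview.cast() only converts to or from a byte format, so go through
# unsigned bytes first and then reinterpret the memory as a 3x4 view of ints.
x = memoryview(buf).cast('B').cast('i', shape=[3, 4])

shape = x.shape        # (3, 4)
strides = x.strides    # (4 * x.itemsize, x.itemsize), i.e. row-major layout
element = x[2, 3]      # last element, the same memory as buf[11]

# Writing through the view changes the underlying buffer; nothing is copied.
x[0, 3] = 2
```

After the last line buf[3] is 2 as well, which is the same sharing of
memory that the x/y examples above rely on.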
Here's what I do NOT want to include from NumPy:
# Get sum and mean
x.sum()
x.mean()
# and so on, you have to do np.sum(x).
# "Fancy indexing" is a mess because the returned object
# (due to implementation constraints) is a copy, not a view,
# thus being inconsistent with the above. My stance is that
# this can go in when we can support treating it as a view,
# instead of following NumPy with making a copy. I have ideas
# for how to do this.
# Get the intersecting array of rows 1, 4 and 5 and
# columns 2 and 1
new_data_copy = x[[1,4,5], [2,1]]
# Set the same intersection to 0. This is where NumPy gets
# really inconsistent; making an exception specifically
# in __setitem__ for this case.
x[[1,4,5], [2,1]] = 0
# modified x
# If y has length 4, pick out elements 0 and 3
y[[True, False, False, True]]
...and so on.
--
Dag Sverre