[Cython] Cython array type: Summary, introducing CEP 518
Dag Sverre Seljebotn
dagss at student.matnat.uio.no
Wed Jun 17 18:57:12 CEST 2009
Robert Bradshaw wrote:
> On Jun 15, 2009, at 9:12 AM, Dag Sverre Seljebotn wrote:
>
>> Thanks to everybody who contributed to the discussion on a Cython array
>> type last week! Here's a summary in an attempt to focus the discussion.
>>
>> There are now two CEPs:
>> - CEP 517, array type: http://wiki.cython.org/enhancements/array
>> - CEP 518, SIMD operations: http://wiki.cython.org/enhancements/simd
>>
>> I mostly just added a "what does this facilitate" section near the
>> beginning of each, and emphasised the multidimensional aspect of the
>> arrays. No need to reread it.
>
> Looks good. I assume this supersedes
> http://wiki.cython.org/enhancements/buffersyntax ; are there any other
> wiki pages that are made obsolete by these proposals?
It is connected with http://wiki.cython.org/enhancements/arraytypes,
although not all points are in (like conversion to list), and
furthermore it should perhaps be made into your and Stefan's proposed
type (which you called [int] or int[]) instead with + for concatenation.
BTW, those list-like types can likely share a lot of implementation with
my proposed int[:]; the major change would be restricting it to 1D,
different arithmetic behaviour, and, say, default coercion to list
instead of memoryview (perhaps!, but let's not go there now). But it
could still be PEP 3118-backed, coerceable from C pointers, etc., and
share implementation for that. Basically it would be two different
frontends to the same underlying type.
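
A minimal sketch of the "two frontends, one underlying type" idea, in plain
Python rather than Cython. The class names IntStorage, IntList and IntArray
are hypothetical and not part of either proposal; array.array stands in for
the shared PEP 3118-backed storage.

```python
from array import array


class IntStorage:
    """Hypothetical shared backend: owns a 1D buffer of C ints."""

    def __init__(self, values):
        self.data = array("i", values)

    def view(self):
        # PEP 3118 export; either frontend can hand this out.
        return memoryview(self.data)


class IntList:
    """List-like frontend (the [int] / int[] idea): + concatenates."""

    def __init__(self, values):
        self.storage = IntStorage(values)

    def __add__(self, other):
        return IntList(self.storage.data + other.storage.data)

    def tolist(self):
        return self.storage.data.tolist()


class IntArray:
    """Array frontend (the int[:] idea): + is elementwise."""

    def __init__(self, values):
        self.storage = IntStorage(values)

    def __add__(self, other):
        a, b = self.storage.data, other.storage.data
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return IntArray(x + y for x, y in zip(a, b))

    def tolist(self):
        return self.storage.data.tolist()
```

Only the arithmetic differs between the two frontends; allocation and buffer
export go through the same storage class.
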
> I still have some questions, but I am certainly in favor of something
> like this happening.
Yes, I didn't intend to solve all the questions now, just .
I suppose I'm still waiting for Stefan's opinion though, given his last
comment last week. If positive, I think Kurt and I can do the main work
of hammering out the details, though of course you can comment then as
much (or little) as you want.
(I notice that I'm promoted to lead developer on the Cython front page
-- thanks! -- but I don't take it for granted that it should be a case
of majority vote, and at any rate I'd never push for something which
would make Stefan less interested in the project.)
> One thing that isn't quite clear is how exactly the reference
> counting/memory allocation is going to work. You give an example of
> explicitly creating an int[:,:] via int[:100,:100](). Would some kind
> of memoryview be created in the background? A string to hold the
> data? (This doesn't have to be decided now, just curious.) This is
> also needed for the copy "method," or any implicit copying that happens.
>
> On a related note, it's still a bit unclear how these things can be
> passed around and stored. Are they just a Py_buffer + PyObject*? (I'm
> hoping you're thinking they can be passed around and stored with
> ease, with allocation either taken care of by the corresponding object
> (which will clean up the memory when it gets collected) or if there
> is no object attached, the user needs to treat it as they would a raw
> pointer).
I skipped over it as it is a long story. If you really are curious:
An int[:] is a pass-by-value struct, containing subslice info and a
reference to an acquired view, which in turn is acquired from a
memory-holding object.
This seems heavy but really is necessary due to how PEP 3118 works. ("The
only problem which can't be solved by another layer of indirection is
too many layers of indirection.")
In detail:
There are three levels:
1) Memory-holding object. When Cython allocates, we need a new type
(probably inheriting directly from object), which allocates memory and
stores shape/stride information, and returns the right information in
tp_getbuffer. Would be a 20-liner in Cython unless we want to go with
PyVarObject for allocation in which case I suppose it is a 200-liner in C.
2) The acquired Py_buffer on that object. That would happen the same way
as with every array; preferably by using memoryview, or if not, another
custom Python type (need to backport memoryview anyway) or if we can't
seem to avoid it a custom refcounted struct.
3) Accessing Py_buffer directly is too inefficient, so it must be
unpacked into a custom struct on the stack which basically holds
shape/stride information and a reference to the Py_buffer-holding thing
in 2). This is the actual variable/temporary type, and is passed-by-value.
When taking a slice, the struct in 3) is copied (while adjusting the
shape/strides), increfing the view 2) in the process, and 2) holds on to 1).
Then there are fields and global variables, which call for separate
decisions; probably the most consistent, and efficient in both speed and
memory, is to store the structs of 3), although it is a bit
counter-intuitive.
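
The three levels and the slicing step can be sketched in plain Python as
follows. This is an illustration under stated assumptions, not the planned
implementation: MemoryHolder, AcquiredView, SliceStruct and acquire are
hypothetical names, the example is restricted to 1D, a memoryview stands in
for the Py_buffer, and Python object references stand in for increfs.

```python
from array import array
from dataclasses import dataclass, replace


class MemoryHolder:
    """Level 1: owns the memory (1D here for brevity)."""

    def __init__(self, n):
        self.data = array("i", [0] * n)


class AcquiredView:
    """Level 2: the acquired buffer on the memory-holding object."""

    def __init__(self, holder):
        self.holder = holder                 # 2) holds on to 1)
        self.buf = memoryview(holder.data)   # stands in for Py_buffer


@dataclass(frozen=True)
class SliceStruct:
    """Level 3: small value type with unpacked offset/shape/stride.

    Offsets and strides are counted in items, not bytes.
    """

    view: AcquiredView
    offset: int
    shape: int
    stride: int

    def __getitem__(self, i):
        if not 0 <= i < self.shape:
            raise IndexError(i)
        return self.view.buf[self.offset + i * self.stride]

    def subslice(self, start, stop, step=1):
        # Copy the struct and adjust offset/shape/stride; the copy keeps a
        # reference to the same level-2 view.
        # Assumes 0 <= start <= stop <= shape and step >= 1.
        n = max(0, (stop - start + step - 1) // step)
        return replace(self,
                       offset=self.offset + start * self.stride,
                       shape=n,
                       stride=self.stride * step)


def acquire(holder):
    """Acquire a view on holder and unpack it into a level-3 struct."""
    view = AcquiredView(holder)
    return SliceStruct(view, 0, len(holder.data), 1)
```

Taking a subslice never touches level 1 or level 2 beyond sharing the
reference, which is why the level-3 struct can be cheap to copy and pass by
value.
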
> Until CEP 518, how would arithmetic happen? (Assuming I'm too lazy to
> write the loops myself.) Would I create two numpy objects to wrap
> them, add those objects, and then "unpack" the result? For large
> enough datasets, that's not too much overhead.
I suppose the real answer is that if you're too lazy to write the loops
yourself, you'll be using ndarray[int] in the first place :-)
But yes, you can do

    cdef int[:] a = ..., b = ...
    a = np.array(a) + b

(Or, not 100% sure, but at least

    a = np.array(a) + np.array(b)

will work).
Even before NumPy supports PEP 3118 this can work, as we can coerce
int[:] to our own subclass of memoryview which implements NumPy's
__array__ special method.
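
For comparison, the "write the loops yourself" alternative looks roughly like
this when sketched in plain Python over PEP 3118 buffers (stdlib only, no
NumPy). The function name add_into is hypothetical, and only 1D buffers are
handled.

```python
from array import array


def add_into(out, a, b):
    """Elementwise out[i] = a[i] + b[i] over 1D PEP 3118 buffers."""
    mo, ma, mb = memoryview(out), memoryview(a), memoryview(b)
    if not (mo.ndim == ma.ndim == mb.ndim == 1):
        raise ValueError("only 1D buffers are handled in this sketch")
    if not (len(mo) == len(ma) == len(mb)):
        raise ValueError("length mismatch")
    for i in range(len(mo)):
        mo[i] = ma[i] + mb[i]
    return out


a = array("i", [1, 2, 3])
b = array("i", [10, 20, 30])
out = add_into(array("i", [0, 0, 0]), a, b)
```
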
--
Dag Sverre