[Cython] Draft for compile-time calculation inliner
Dag Sverre Seljebotn
dagss at student.matnat.uio.no
Sat Mar 15 23:52:18 CET 2008
> time calculations are a valuable optimization, but I think inlining
> (which one can explicitly request in the C output) and loop unrolling
> are well handled by GCC and is probably best handled at the this
> level for most things (for now at least).
Yes, I explained my rationale for that badly.
The reason I wanted to do loop unrolling is because I think it would look
very bad if the C code was littered with extra for-loops for every NumPy
lookup, see:
http://wiki.cython.org/enhancements/operators/ambitious
which I have updated and made clearer. About inlining, as long as it
wouldn't affect GCC's caching or noncaching of the stride calculations,
I'm fine without.
Details:
ctypedef class numpy ...
    def __getitem__(self, index):
        return (<int*>self.data)[self.strides[0] // 4 * index[0] +
                                 self.strides[1] // 4 * index[1]]
The above is the code as it would look in the parse tree *after*
compile-time optimization, with the knowledge that the type is int and the
number of dimensions is 2. Note that the loop has been unrolled, resulting in
that chain of + terms. Also note that those stride calculations must either be
cached by GCC or must also happen...hmm...
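To make the unrolling concrete, here is a rough plain-Python sketch (not
Cython output, and not the actual transform) that puts the generic
per-dimension loop next to its unrolled two-dimensional form. The names
offset_generic, offset_unrolled_2d and ITEMSIZE are made up for illustration,
and the item size of 4 bytes is an assumption matching the "// 4" in the
snippet above.

```python
# Hypothetical helpers for illustration; not part of Cython or NumPy.
ITEMSIZE = 4  # assumed size of a C int in bytes


def offset_generic(strides, index):
    """Element offset for any number of dimensions: one loop pass per axis."""
    offset = 0
    for stride, i in zip(strides, index):
        offset += stride // ITEMSIZE * i
    return offset


def offset_unrolled_2d(strides, index):
    """The same computation with the loop unrolled for exactly 2 dimensions."""
    return (strides[0] // ITEMSIZE * index[0] +
            strides[1] // ITEMSIZE * index[1])


# A C-contiguous 3x5 array of 4-byte ints has byte strides (20, 4).
strides = (20, 4)
data = list(range(15))  # flat buffer standing in for self.data
print(data[offset_unrolled_2d(strides, (2, 3))])
```

Both functions give the same offset for every valid index; the unrolled one
just has no loop left for the C compiler to deal with. Whether the
"stride // ITEMSIZE" parts get hoisted out of an enclosing loop is a separate
question that this sketch does not answer.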
I really should do some experiments I guess...
Dag Sverre