[Cython] Assignment nodes, buffers and transforms

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Sun Jul 20 15:30:30 CEST 2008


I've had some skirmishes with Robert and Stefan on this and I think it 
is time for a thread.

Currently, for buffers, I must somehow write code for two different 
cases that have to do with assignment:
  1. When buffers variables are somehow assigned to
  2. When buffers have items assigned to

The question is whether to write more code in Nodes.py to do this, or 
make it easier for transforms to react to assignments. The latter seem 
more intrusive to how you usually do things, however I also see more 
usecases than just buffers (especially for optimizing refcount in a 
better way while keeping the algorithmic code simple, as well as for 
parts of type inference, as well as compile-time evaluation, and just 
overall transform complexity in number of node types to deal with).

I'm not sure if I'm suggesting that there should only be one type of 
assignment node, but I'd love for them to be a bit different:

- SetItemNode
- SetAttributeNode
- SetCVarNode

...so that the node type starts corresponding to what is going on in C 
without having to examine the lhs. I feel that would simplify a lot of 
code, also existing code (and it could be made "backwards-compatible" 
through inheritance anyway). However that is a second step that relies 
on reducing to a single assignment node first (to avoid combinatorial 
explosion).

Current ways of assigning a variable x:
a) def f(x):
b) cdef type x = 2
c) x = 2
d) x, (y, z) = (2, (3, 4))
e) x = y = 2
f) except Exception, x
g) for x in C:

That's all I can think of (with is turned into c) already). All of these 
cases must be covered for both 1 and 2 above. What I've done so far is 
support a) and c) directly, and transformed b) into c) in the parsing stage.

What I *could* do now is move on and turn d) and e) into c) as well. The 
effect of all these things is to i) reduce the number of different node 
types, and ii) remove code from Nodes.py that requires you to know about 
how all code generation/analysis work and instead add code that requires 
you to know about writing tree transforms (I think that even if the 
latter has usually more lines of code, they all follow a fundamental 
principle and results in code that is more robust to changes other 
places in the code, more "isolated" code).

f) and g) can also be turned into c) but with a bit extra work and in 
less obvious ways. If you think about SetItemNode, SetAttributeNode etc. 
you see that it isn't wholly unatural for a loop to reduce into such 
instructions (that is of course what is happening today too, but less 
explicit).

All in all I'm unsure about this point -- I have to transform them all, 
or write BufferNodes to wrap namenodes in the tree anyway, in which case 
the transforms weren't really necesarry. So I ask for input. (Help me 
counter the exhilirating feeling I get from having Buffer.py "just work" 
in more cases by refactoring unrelated pieces of code :-) )

 From my own experiences, whenever I have to do something in Nodes.py or 
ExprNodes.py I invariably introduce bugs that it takes hours to find, 
because I do not really understand it. Of course I could learn, but I'd 
rather write the transforms _if that is a preferred direction anyway_. 
So the ideal preferred direction is what I'd like input on, and then 
I'll take the final call on work amount myself (and on how smart things 
I can cook up for f) and g), if they end up being hopeless then I'll 
drop it).


-- 
Dag Sverre


More information about the Cython-dev mailing list