.. raw:: latex \thispagestyle{empty} \begin{center} \begin{tabular}{c} \hline \hline \\[0.7cm] {\Huge Towards a More Effective} \\ {\Huge Implementation of Python on Top of }\\ {\Huge Statically Typed Virtual Machines} \\[1cm] {\Large Antonio Cuni} \\[.5cm] {\large Dipartimento di Informatica e Scienze dell'Informazione} \\ {\large Universit\`a di Genova, Italy} \\[2.5cm] {\large Thesis Advisor:} \\ [1cm] {\Large Dott.~Davide Ancona} \\[.5cm] {\large Dipartimento di Informatica e Scienze dell'Informazione} \\ {\large Universit\`a di Genova, Italy} \\[2.5cm] {\large Thesis Proposal} \\[1cm] {\large PhD in Computer Science} \\ {\large Dipartimento di Informatica e Scienze dell'Informazione} \\ {\large Universit\`a di Genova, Italy} \\[0.7cm] \hline \hline \end{tabular} \vfill \end{center} Abstract ======== Despite the continuously growing interest in Python shown by both the academic and industrial communities, all the available implementations of this object-oriented language still suffer from performance problems, mainly because the techniques currently adopted are not clever enough for implementing Python semantics efficiently. The purpose of this thesis will be to develop new techniques, and possibly adapt old ones which have not been yet applied to Python, in order to provide a new, more efficient implementation of the language. Since Python is dynamically typed, it is very hard for the compiler to make assumptions on the types of objects a certain expression will denote during the execution of a program. Most of current performance issues are related to this lack of compile time information, therefore we believe that the design of an expressive type system is crucial for allowing the compiler to produce highly optimized code. Since Python is not statically typed, the type system we are planning to design will not perform any static checks, but rather instrument the compiler to perform code optimization, so that, in the best cases, the compiler will produce highly optimized code, while in the worst cases the performance of the generated code will be comparable with that obtained by the currently available implementations. More formally, this means that compilation is modeled as a function depending not only on the specific Python fragment, but also on the type assigned to it. Since the degree of optimization of the generated code strictly depends on the accuracy of the type information, particular care has to be taken to design the algorithm for assigning types to Python code. To tackle this challenging issue, we plan to adopt a hybrid technique based on different kinds of approaches: conventional type inference, optional type annotations, and type feedback. Instead of writing yet another Python implementation from scratch, we will modify PyPy to suit our needs. PyPy is an extremely flexible Python interpreter written in Python itself and translated to a number of different target languages, among which the most important are C, .NET and JVM. Because of the extreme similarity of .NET and JVM and thanks to the modular design of PyPy, most of the results obtained for .NET will apply also to the JVM one, and conversely. Our work will be mostly focused on .NET and JVM, but hopefully the newly developed techniques will also improve the performances of PyPy when compiled to C. Finally, we will try to exploit the developed type system also to enhance the interoperability between Python and the underlying platforms (either .NET or JVM); until now it has been extremely difficult to integrate Python with C# and Java code, because of the lack of static types, but we are confident that the type information generated for optimized compilation will be also useful to provide a more flexible way to combine Python with C# and Java code. The problem =========== At the moment, there are four main implementations of Python as a language: CPython is the reference implementation, written in C; Jython is written in Java and targets the JVM; IronPython is written in C# and targets the CLI (i.e., the VM of .NET); finally, PyPy is written in Python itself and targets a number of different platforms, including C, JVM and .NET. In the following, we are discussing CPython, when not explicitly stated otherwise; most of concepts are equally applicable to the other implementations, though. Running Python code is slow; depending on the benchmark used, Python programs can run up to 250 times slower than the corresponding programs written in C. There are several reasons that causes Python to be so slow; here is a rough attempt to list some of what we think are the most important ones: 1. interpretation overhead; 2. boxed arithmetic and automatic overflow handling; 3. extreme introspective and reflective capabilities; 4. dynamic lookup of methods and attributes; 5. dynamic dispatch of operations. Some of the following sections show the results of some benchmarks. The benchmarking machine runs a Linux 2.6.23.8 operating system on an Intel Core Due T7200 @ 2.00 GHz CPU and 2 GB of RAM. Interpretation overhead ----------------------- Python programs are compiled into bytecode which is then interpreted by a virtual machine; this causes roughly a 2.5x slowdown compared to executing native code [D12.1]_. Both Jython and IronPython do not suffer from this problem, since they emit bytecode for the underlying virtual machine (either JVM of CLI), which is then translated into native code by the JIT compiler. Boxed arithmetic and automatic overflow handling ------------------------------------------------ In Python all values are objects, numbers included; the default type for signed integers is ``int``, which represents native integers (i.e., numbers whose length is the same as a WORD, typically 32 bit). Moreover, Python provides also a ``long`` type, representing arbitrary precision numbers; recent versions of Python takes care of switching automatically to ``long`` whenever the result of an operations cannot fit into ``int``. The most natural way to implement these features is to allocate both values of type ``int`` and ``long`` in the heap, thus giving them the status of "objects" to switch from one to the other very easily. Unfortunately, this causes a severe performance penalty; below there is a simple loop doing only arithmetic operations, without overflows:: i = 0 while i < 10000000: i = i+1 Benchmarks show that this code runs about 40 times slower than the corresponding program written in C and compiled by ``gcc`` without optimizations. However, most of the integer objects used in Python programs does not escape the function in which they are used and never or rarely overflow, so they do not really need to be *effectively* represented as objects in the heap; a smart implementations could detect places where numbers can be stored "on the stack" and unboxed, and emit efficient machine code for those cases. The same discussion applies equally well also to floating point numbers, even if in this case we do not need to check for overflows at every operation. Extreme introspective and reflective capabilities ------------------------------------------------- Python offers a lot of ways to inspect and modify a running program. For example, you can find out the methods of a class, the attributes of an instance, the names and the values of all locals variables defined in the current scope, and so on. Moreover, often it is also possible to modify those information: it is possible to add, remove or modify methods of a class, change the class of an object, etc. All these features derive directly from the naive implementation of CPython, which makes them straightforward to implement. By contrast, it is very challenging to find alternative ways to implement them efficiently. In particular, CPython allows access to all the frames that compose the current execution stack: each frame contains information such as the instruction pointer, the frame of the caller and the local variables; moreover, under some circumstances it is possible to mutate the value of the local variables of the callers, as shown by the following example:: def fill_list(name): # get the frame object of the caller frame = sys._getframe().f_back # get the variable "name" in the caller's context lst = frame.f_locals[name] lst.append(42) def foo(): mylist = [] fill_list("mylist") print mylist # prints [42] Even though the abuse of such functionalities is strongly discouraged, they are needed by some kinds of applications such as debuggers. Dynamic lookup of methods and attributes ---------------------------------------- In Python, lookup of methods and attributes of an object is performed at run-time; moreover, there are ways to get and set attributes by passing their name as a string. For example, the following snippet is a valid Python program and prints 42 twice:: # an empty class class MyClass: pass # arguments passed to foo can be of any type def foo(target, flag): # the attribute x is set only if flag is True if flag: target.x = 42 obj = MyClass() # if we pass False, obj would not have any attribute x foo(obj, True) print obj.x print getattr(obj, "x") # use a string for the attribute name As usual, if on one hand this features allow a great degree of flexibility, on the other hand they are very difficult to be implemented efficiently; since the set of methods and attributes is not statically known, it is impossible to use a fixed memory layout for objects, like happens in the implementation of statically typed object-oriented languages based on the notion of virtual method tables. Instead, classes and objects are usually implemented using a dictionary that maps attributes (represented as strings) to values. Dynamic dispatch of operations ------------------------------ Most of operations in Python are dynamically overloaded, i.e., they can be dynamically applied to values of different types. Consider for example the + operator: depending on the types of its arguments, it can either add two numbers, concatenate two strings, append two lists, or call some user-defined method. This means that every time the virtual machine executes the ``BINARY_ADD`` operation, it needs to dispatch it to the proper implementation. This could lead to poor performances, especially inside a loop:: i = 0 while i < 1000: i = i+1 In the loop above, the virtual machine always dispatches ``BINARY_ADD`` to the same operation. In this example, it is very clear how information given by a type system could allow the compiler to statically dispatch ``BINARY_ADD``. Towards a possible solution =========================== We believe that the first step towards a more efficient Python implementation is to use a compiler instead of an interpreter; as we will see in the next sections, the interpretation overhead per se is only about 2.5x, but it prevents a lot of optimizations. Even though the kernel of Python is very easy to learn and has a simple semantics, a lot of technical details have to be considered in order to implement the whole language; since there is no standard specification, compliance is measured against the reference implementation, which is CPython. Usually it is quite easy, when writing a new implementation, to get an interpreter or a compiler that is *almost compliant to CPython*, but it is very hard to get one that is *completely compliant*; indeed, there exist certain complex Python programs which cannot be run under IronPython or Jython. So far, PyPy has proven to be the closest implementation to CPython, passing over 98% of CPython own regression tests; our goal is to reuse as much as possible PyPy's implementation, in order to get a compliant compiler. To give examples of how particular constructs could be compiled into machine code, we will often present Python code and the corresponding low-level code side-by-side for clarity; the low-level code will be often presented in a higher level language such as Java, C# or C. From Python bytecode to native machine code ------------------------------------------- Both CPython and PyPy provide a Python virtual machine that runs programs by interpreting Python bytecode. The Python VM is a stack based machine with very high level operations; most of Python's semantics is encoded *inside* the implementation of the single operations. For example, consider the following Python function and the corresponding bytecode:: def add(a, b): return a+b LOAD_FAST 0 (a) LOAD_FAST 1 (b) BINARY_ADD RETURN_VALUE Despite its simplicity, this function has a complex semantics, because as already explained the + operator is dynamically overloaded. Internally, all Python objects are represented by instances of classes which are subclasses of ``W_Root``. Currently, PyPy's VM is implemented by a main loop which iterates over the bytecode and dispatch each operation to a function with the same name; ideally, we can think of ``BINARY_ADD`` as a function that takes two ``W_Root`` parameters and returns a ``W_Root``. A possible translation of the above function into C# is (assuming that all operations are static methods of the ``Operations`` class):: public static W_Root add(W_Root a, W_Root b) { return Operations.BINARY_ADD(a, b); } From the interpreter to the compiler ------------------------------------ PyPy is composed of two main parts: an interpreter written in RPython [RPY]_ (a subset of Python designed to be easily analyzable) and a framework called the *Translation Toolchain* which takes the interpreter and compiles it into efficient executables. The main advantage of this approach is that the entire Python semantics is encoded in a relatively small and simple piece of software, the interpreter. Additional features are metaprogrammed and automatically woven in by the translation toolchain; another advantage of this approach is that if the additional features are generic enough, they can be also applied to interpreters other than Python, as long as they are written in RPython; the PyPy team did some successful experiments with implementations of JavaScript and Prolog. The following figure shows the high level architecture of PyPy: .. image:: arch.pdf :scale: 40 The Python interpreter is fed into the translation toolchain which then produces different kinds of Python implementations; at the moment, it can produce, among others, executables running on C/Posix platforms (including Linux, Windows and Mac Os X), .NET and JVM; moreover, it can produce Python interpreters with a Just In Time compiler for Intel i386 and PowerPC processors; it is important to underline that the JIT compiler is not written by hand but **automatically produced** by the translation toolchain. For details about how this works, see [D05.1]_, [D08.2]_ and [VMCDLS]_. Our goal is to enhance the JIT compiler generator to make it also an Ahead Of Time (i.e. static) compiler generator; the AOT compiler will exploit the same techniques as the JIT compiler, but at compile time rather than at run-time; this will allow us to share a lot of code between the JIT compiler generator and the AOT compiler generator, leading to at least two great benefits: - the AOT compiler generator will be much simpler to develop, since a lot of code has been already written and tested; - the JIT compiler generator will be hopefully improved with the features initially developed for the AOT; for example, our main targets for the AOT will be .NET and JVM, but it is very likely that we will be able to adapt the code generators to also work in the JIT case with a small effort. JIT vs AOT ---------- Although still experimental, the JIT compiler of PyPy is an advanced and high quality component that produces very fast code; however, there are situations in which we believe the JIT compiler may not be the most appropriate solution; in particular, emitting bytecode for .NET and JVM on the fly may not be as cheap as emitting native code for real processors such as i386 or PowerPc; moreover, the gap should be even worse when we need to dynamically *modify* the generated code, since neither .NET nor JVM allows dynamic modification of the bytecode of a method. Unfortunately, to our knowledge, in literature there are no works comparing AOT versus JIT bytecode generation for object oriented virtual machines such as the JVM and the CLI. One of the goals of this thesis is to compare the two approaches by analyzing their performances. Optimize the fast paths ----------------------- As we said above, Python is a very dynamic language: classes can be created and modified on the fly, objects can change class, the same function can be invoked on objects of many different types, etc. However, the fact that Python code **could** be so dynamic does not imply that it will always be so: many application developed in Python do not exploit all the dynamicity offered by the language. Our approach will be to find and optimize what we call *fast paths* inside the code; *fast paths* capture situations that occur often during the execution of the program, so optimizing them should lead to a great performance improvement. Every time we emit an optimized fast path, we also emit code that works in the general case, thus guaranteeing the correct Python semantics in all cases. Consider the following function:: def fn(a, b): return (a+b)*(a-b) Given what we said in section 3.1, the compiler would produce something like this (in C#):: public static W_Root fn(W_Root a, W_Root b) { W_Root tmp1 = Operations.BINARY_ADD(a, b); W_Root tmp2 = Operations.BINARY_SUB(a, b); return Operations.BINARY_MUL(tmp1, tmp2); } This function can work equally well on integers, strings, lists, tuples and user-defined types which overloads the + operator. However, it is possible (and likely) that the program will call it using only one or at most a few of the possible types. Suppose we found that most of the times this function is called on integers; we could generate something like this:: public static W_Root fn(W_Root a, W_Root b) { if (a.GetType() == typeof(W_Integer) && b.GetType() == typeof(W_Integer)) { int aa = a.ToInt(); int bb = b.ToInt(); return new W_Integer(fn_fast(aa, bb)); } else { W_Root tmp1 = Operations.BINARY_ADD(a, b); W_Root tmp2 = Operations.BINARY_SUB(a, b); return Operations.BINARY_MUL(tmp1, tmp2); } } public static int fn_fast(int a, int b) { return (a+b)*(a-b); } The first branch of the ``if`` is the fast path to be followed in most cases, while the ``else`` branch is a fallback for the general case. Actually, the code above is not correct because it ignores the automatic promotion to arbitrary precision numbers in case of overflow, but it gives an idea of how we intend to specialize the fast paths; for a more general solution to the overflow problem, see the next section. To determine the *hot* fast paths, i.e. those that would lead to a great performance benefit, we need to develop a type system for Python, which is the topic of section 4. Boxed arithmetic and automatic overflow handling ------------------------------------------------ In the previous section, we already saw an example of how to generate fast paths for functions that are expected to operate on numeric values; however, the approach described there does not work in all cases because in Python native integers are automatically promoted to arbitrary precision integers in case of overflow. The idea is to have two parallel implementation of the function: one optimized for the non-overflow case (the *fast path*), and the other (the *safe path*) which will work in any case, containing code very similar to what we would get without the optimizations. In the fast path, each operation that could overflow is checked; in case of overflow, the control flow is transferred to the safe path, paying attention to copy the whole local state of the function. To make this approach work, the safe path must accept multiple entry-points, one for each checked operation. Emitting functions with multiple entry-points is also a challenge, especially if we target a high level virtual machine such as CLI or JVM. As an example, consider the following function, that computes the Nth Fibonacci's number:: def fibo(N): a, b = 1, 1 for i in xrange(N-1): a, b = b, a+b return b For the sake of simplicity, there is only one operation that could overflows, i.e. the addition inside the loop; the ``xrange`` operation cannot overflow because ``xrange`` only accepts integers, not longs. First, let's see how to implement the safe path (assuming for simplicity that ``xrange`` is never overridden):: static W_Root fibo_safe(W_Root w_N) { int N = ConvertToInt(w_N); W_Root a = W_Int(1); W_Root b = W_Int(1); for(int i=0; i D`` represents a function which takes three arguments of type ``A``, ``B`` and ``C`` respectively, and returns an object of type ``D``. The knowledge of the precise type of a function will let the compiler to generate efficient code for calling it; for simplicity, at a preliminary stage of the development of the type system, functions that accept a non-fixed number of arguments could be assigned the type ``Any``, thus forcing the compiler to use a slower but more general calling mechanism. Type assignment =============== There are several possible strategies for assigning types to an expression. Since precise type inference for dynamic languages is usually very challenging, we envisage an hybrid approach based on the three different strategies described in the following sections. Static analysis (type inference) -------------------------------- One possible approach is to perform static analysis of the source code, trying to find the most precise type through type inference. Previous works such as Starkiller [STARK]_ and Brett Cannon's thesis [BCT]_ showed that static type inference for Python is possible through the use of the Cartesian Product Algorithm [CPA]_. Static analysis can be either global or compositional (a.k.a incremental); the latter is usually preferable, but for a language as Python a whole program analysis can lead to better results, so we will try to support both. Global static analysis can be used not only to annotate variables with types, but also to discover information about how they are used. One feature that is usually hard to implement is the possibility of making changes to classes at run-time; in Python it is possible to e.g. dynamically add a method to a class:: # an empty class class MyClass: pass def greet(self, who): print 'Hello', who def main(): obj = MyClass() MyClass.greet = greet obj.greet() For example, we might extend the static analysis to determine if a given class could eventually be modified or not; if we statically know that a class will never be modified, we can apply some optimizations that would be unsafe without this assumption. Unfortunately, in a highly dynamic language like Python static analysis cannot be always precise; consider for example the ``exec`` statement: it allows one to run an arbitrary string as Python code; since the string can be created dynamically, it is not really possible to statically know the effects of its execution; it turns out that in presence of an ``exec`` (or its cousin ``eval``) most of the statically inferred type information static assumptions are no longer valid or, at least, become less precise. Consider the following code:: def evil(name): exec "%s.greet = greet" % name evil("MyClass") This code add the method ``greet`` to ``MyClass``, but in a way that it is impossible to detect with static analysis; so, the mere presence of a single ``exec`` force us to consider all classes as eventually mutable. Optional annotations and user hints ----------------------------------- In cases when the type inference engine is not enough, we might allow the user to explicitly annotate some variables with the desired type; this feature would be useful e.g. to optimize a piece of code which is executed very often. Type annotations are not the only way the user can help the compiler to produce better code; consider again the example of mutable classes. To partially solve this problem, we could create a new ``FrozenType`` metaclass that does not allow any modification after the class has been created; then, the compiler could exploit this extra knowledge to produce better code:: class MyClass: __metaclass__ = FrozenType def greet(self, who): print 'Hello', who MyClass.greet = None # TypeError (at run-time) It is important to underline that we do not need to change Python semantics or break compatibility: ``FrozenType`` would be a custom metaclass, written in Python and running on any Python implementation with the desired semantics: there would be no noticeable difference between running that program in CPython, IronPython, Jython or PyPy, provided that each of those implements metaclasses correctly. Type feedback (without dynamic recompilation) ---------------------------------------------- In a dynamic language like Python, there are cases in which static analysis could be too conservative to make some optimizations effective; furthermore, the user might not be able to provide useful hints when type inference fails. Type feedback is another possible solution based on software profiling: the program is generated without optimizations, and then, during execution useful type information are collected, which can be usefully exploited to perform better optimization when the program is re-compiled. By design, type feedback cannot be precise because you are not guaranteed to have seen *all* the types that might happen at run-time; in terms of our type system, it means that if we saw that an expression contained values of types ``A`` and ``B``, we will assign it the type ``Case(Any, A, B)``. Type feedback with dynamic specialization/recompilation -------------------------------------------------------- The main disadvantage of the approach of type feedback is that it requires to run the program in advance in order to optimize it for successive executions; moreover, it is not guaranteed that each execution path is followed, thus leading to sub-optimal results. To overcome this limitation, we might postpone the specialization of functions until they are really needed, i.e. compile them just in time, when we know all the information we need. This is exactly what PyPy's JIT does currently for the i386 and PowerPC backend; in case of JVM and CLI it would mean to dynamically generate JVM or IL bytecode, which will in turn be compiled by the native JIT of those VM; as far as we know from reading the literature, this approach has never been tried. Combinations of the previous ---------------------------- We expect best results by combining all the techniques described above: first, a static analysis of the program can find all the places where we know the precise type of some expression; then, we can gain extra information through type annotations and type feedback. Finally, dynamic code generation can still produce fast code for paths that were not covered by the previous steps. Workplan ======== The following points sketch the work plan of the proposed research: 1. Write a JVM or a CLI backend for the PyPy JIT compiler generator; at this stage the backend does not need to be complete: in particular, the ability to dynamically modify already generated code is not needed to work on subsequent points; 2. Modify the JIT compiler generator into an AOT compiler generator; 3. Implement the optimizations described in sections 3.4 to 3.7; 4. Add the ability to optimize based on manual annotations; 5. Design and implement the static type inference engine; 6. Design and implement the type feedback engine; 7. Complete the work on the JIT backend developed in point 1; in particular, we would need to find and implement an efficient way to dynamically modify generated code for .NET or JVM. Points 1 and 2 have the highest priority; however, remaining tasks need *not* be performed in a strictly sequential way; points 4, 5, 6 or 7 can be independently developed, but all depend on point 3. Related work ============ The idea of annotating only parts of a program with specific types, and all the rest with the type ``Any`` is similar to the solution proposed by *Gradual typing* [GTO]_, although the focus for that work is on type safety while our focus is on code optimization. Type feedback and static type inference for dynamic languages have been already studied for SELF [TFTI]_. This reference is particularly interesting because both the type feedback and type inference approaches it studies are closely related to how we intend to handle the problem of type assignment. In particular, the type inference engine described there makes use of the Cartesian Product Algorithm [CPA]_, as we plan to do. Our major contribution over that work will be to combine the two approaches of type feedback and type inference to give the compiler a better approximation of the real types of the expressions. Starkiller [STARK]_ is a static type inference engine for Python based on the Cartesian Product Algorithm; it gave good results, but it didn't implement the full Python semantics because it failed, e.g., on automatic overflow handling; moreover, the source code has never been released. The main contribution of this thesis over Starkiller is that our plan is to generate the compiler automatically instead of writing it by hand: this would solve Starkiller's biggest problem, lack complete compliance with CPython. Also, our work will be focused on emitting code for object-oriented virtual machines instead of directly emitting machine code. In his master thesis [BCT]_, Brett Cannon tried to improve performances of CPython through static analysis and type inference. He concluded that Python is not suitable for type inference, but we believe this result was driven by the fact that his implementation was based on CPython's bytecode interpreter. We believe we can get much better results by compiling directly to native code. .. References .. [D05.1] `Compiling Dynamic Language Implementations`, PyPy EU-Report, 2005 .. [D08.2] `JIT Compiler Architecture`, PyPy EU-Report, 2007 .. [D12.1] `High-Level Backends and Interpreter Feature Prototypes`, PyPy EU-Report, 2007 .. [VMCDLS] `PyPy's approach to virtual machine construction`, Armin Rigo, Samuele Pedroni, in OOPSLA '06: Companion to the 21st ACM SIGPLAN conference on Object-oriented programming languages, systems, and applications, pp. 944-953, ACM Press, 2006 .. [RPY] `RPython: a Step Towards Reconciling Dynamically and Statically Typed OO Languages`, Davide Ancona, Massimo Ancona, Antonio Cuni, Nicholas D. Matsakis, in OOPSLA 2007 Companion, Dynamic Language Symposium 2007, 2007 .. [OOPIC] `Optimizing Dynamically-Typed Object-Oriented Languages with Polymorphic Inline Caches`, Urs Hlzle, Craig Chambers, and David Ungar, In ECOOP'91 Conference Proceedings, Geneva, 1991. Published as Springer Verlag Lecture Notes in Computer Science 512, Springer Verlag, Berlin, 1991. .. [GTO] `Gradual Typing for Objects`, Jeremy Siek and Walid Taha in ECOOP 2007: 21st European Conference on Object-Oriented Programming .. [TFTI] `Type Feedback vs Concrete Type Inference: A Comparison of Optimisation Techniques for OO Languages`, O. Agesen and U. Holzle, in OOPSLA '95, pages 91-107, 1995 .. [STARK] `Faster than C: Static Type Inference with Starkiller`, Mike Salib, in PyCon 2004 .. [BCT] `Localized Type Inference of Atomic Types in Python`, Brett Cannon, Master Thesis, 2005. .. [CPA] `The cartesian product algorithm`, Ole Agesen, In W. Oltho, editor, ECOOP'05 - Object-Oriented Programming, volume 952 of Lecture Notes in Computer Science, pages 2-26. Springer, 1995. .. [ITR] `The Infer Type Refactoring and its Use for Interface-Based Programming`, Friedrich Steimann, Journal of Object Technology 6(2): (2007) .. [PyCC] `Python Reference Manual: Calls`, Guido Van Rossum, http://docs.python.org/ref/calls.html