\chapter{The \emph{CLI} backend} \label{cha:gencli} As we saw in section \ref{sec:bigpicture} the goal of \gencli\ is to compile RPython programs to the \emph{CLI} virtual machine. This chapter explains both how \gencli\ works and the reasons behind its design, giving the pros and the cons of the alternatives that came up during the development. Most of the code belonging to \gencli\ is located in the \texttt{pypy.translator.cli} subpackage, so referred \gencli\ modules are located in the \texttt{pypy/translator/cli/} subdirectory. \section{Target environment and language} \label{sec:cli-target} The target of \gencli\ is the \emph{Common Language Infrastructure} environment as defined by \cite{ecma-335}. While in an ideal world we might suppose \gencli\ to run fine with every implementation conforming to that standard, we know the world we live in is far from ideal, so extra efforts can be needed to mantain compatibility with more than one implementation. At the moment of writing the two most popular implementations of the standard are supported: Microsoft \textbf{Common Language Runtime (CLR)} \cite{clr} and \textbf{Mono} \cite{mono}. Then we have to choose how to generate the real executables. There are two main alternatives: generating source files in some high level language (such as C\#) or generating assembly level code in \textbf{Intermediate Language (IL)}. The \emph{IL} approach is much faster during the code generation phase, because it doesn't need to call a compiler. By contrast the high level approach has two main advantages: \begin{itemize} \item the code generation part could be easier because the target language supports \textbf{high level control structures} such as structured loops; \item the generated executables take advantage of \textbf{compiler's optimizations}. \end{itemize} In reality the first point is not an advantage in the \pypy\ context, because as we saw in section \ref{sec:flowmodel} the flow graph we start from is quite \textbf{low level} and Python loops are already expressed in terms of branches (i.e., \texttt{goto}s). About the compiler optimizations we must remember that the flow graph we receive from earlier stages is already optimized: \pypy\ implements a number of optimizations such a \textbf{constant propagation} and \textbf{dead code removal}, so it's not obvious if the compiler could do more. Moreover by emitting IL instruction we are not constrained to rely on compiler choices but can directly choose how to map \emph{ootypesystem} operations to \textbf{CLI opcodes} (see section \ref{sec:cli-instructions}): since the backend often know more than the compiler about the context, we might expect to produce more efficient code by selecting the most appropriate instruction; e.g., we can check for \textbf{arithmetic overflow} only when strictly necessary (see section \ref{sec:cli-exceptions}). The last but not least reason for choosing the low level approach is \textbf{flexibility} in how to get an executable starting from the IL code we generate: \begin{itemize} \item we can write IL code to a file, then call the \textbf{\texttt{ilasm} assembler}; \item we can directly generate code on the fly by accessing the facilities exposed by the \textbf{\texttt{System.Reflection.Emit} API}. \end{itemize} The second point is not feasible yet because at the moment there is no support for accessing system libraries, but in future it could lead to an interesting \gencli\ feature, i.e. the ability to \textbf{emitting dynamic code} at runtime. \section{Handling platform differences} \label{sec:cli-sdk} Since our goal is to support both \emph{Mono} and \emph{Microsoft CLR} we have to handle the differences between the twos; in particular the main differences are in the name of the helper tools we need to call: \begin{itemize} \item we call \texttt{ilasm} on CLR and \texttt{ilasm2} on Mono to assemble IL files; \item we call \texttt{csc} on CLR and \texttt{gmcs} on Mono to compile C\# files; \item on Mono we need to call the runtime \texttt{mono} to execute programs, while on CLR we can start them directly. \end{itemize} The code that handles these differences is located in the \texttt{sdk.py} module: it defines an abstract class exposing some methods returning the name of the helpers and one subclass for each of the two supported platforms, as shown by listing \ref{lst:sdk}. \begin{SaveVerbatim}{Side1} class MicrosoftSDK(AbstractSDK): RUNTIME = [] ILASM = 'ilasm' CSC = 'csc' \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} class MonoSDK(AbstractSDK): RUNTIME = ['mono'] ILASM = 'ilasm2' CSC = 'gmcs' \end{SaveVerbatim} \CompareWithSize{Platform SDK specification}{lst:sdk}{7cm}{6cm} Then, we choose the default SDK to use based on the platform we are running on: \texttt{MicrosoftSDK} on Windows, \texttt{MonoSDK} on other platforms. \section{Targeting the CLI Virtual Machine} \label{sec:cli-targeting} In order to write a CLI backend we have to take a number of decisions. First, we have to choose the typesystem to use: given that CLI natively supports primitives like classes and instances, \textbf{ootypesystem} is the most natural choice (see chapter \ref{cha:ootypesystem}). Once the typesystem has been chosen there is a number of steps we have to do for completing the backend: \begin{itemize} \item map ootypesystem's types to CLI \textbf{Common Type System}'s types; \item map ootypesystem's low level operation to \textbf{CLI instructions}; \item map Python \textbf{exceptions} to CLI exceptions; \item write a \textbf{code generator} that translates a flow graph into a list of CLI instructions; \item write a \textbf{class generator} that translates ootypesystem's classes into CLI classes. \end{itemize} \section{Mapping primitive types} \label{sec:cli-primitive-types} As discussed in section \ref{sec:rpython} the RTyper give us a flow graph annotated with types belonging to \emph{ootypesystem} (see chapter \ref{cha:ootypesystem}): in order to produce CLI code we need to translate these types into their \textbf{Common Type System} equivalents. For \textbf{numeric types} the conversion is straightforward, since there is a one-to-one mapping between the two typesystems, so that e.g. \texttt{Signed} maps to \texttt{int32} and \texttt{Float} maps to \texttt{float64}. For \textbf{character types} the choice is more difficult: RPython has two distinct types for plain ASCII and Unicode characters (named \texttt{Char} and \texttt{UniChar}), while .NET only supports Unicode with the \texttt{char} type. There are at least two ways to map plain \texttt{Char} to CTS: \begin{itemize} \item map \texttt{Char} to \texttt{int8} and \texttt{UniChar} to \texttt{char}, thus mantaining the original distinction between the two types: this has the advantage of being a one-to-one translation, but has the disadvantage that RPython \textbf{strings will not be recognized} as .NET strings, since they only would be sequences of bytes; \item map both \textbf{Char} and \textbf{UniChar} to \textbf{char}, so that Python strings will be treated as strings also by .NET: in this case there could be problems with existing Python modules that use strings as sequences of byte, such as the built-in \texttt{struct} module, so we need to pay special attention. \end{itemize} We think that \textbf{mapping Python strings to .NET strings} is fundamental, so we chose the second option. The code that implements the \textbf{type-mapping} is located in the module \texttt{cts.py}. \section{Mapping built-in types} \label{sec:cli-built-in-types} As we saw in section \ref{sec:oogenerics}, \emph{ootypesystem} defines a set of types that take advantage of \textbf{built-in} types offered by the platform. For the sake of simplicity we decided to write \textbf{wrappers} around .NET classes in order to match the signatures required by \emph{ootypesystem}. These wrappers are in \emph{pypylib.dll} (see section \ref{sec:cli-rte}); table \ref{tab:cli-built-in} shows the .NET classes which they are built on top of. \begin{table}[ht] \small \begin{tabular}{|l|l|} \hline String & \texttt{System.String} \\ StringBuilder & \texttt{System.Text.StringBuilder} \\ List & \texttt{System.Collections.Generic.List} \\ Dict & \texttt{System.Collections.Generic.Dictionary} \\ CustomDict & \emph{not implemented, yet} \\ DictItemsIterator & \texttt{pypy.runtime.DictItemsIterator} \\ \hline \end{tabular} \caption{\gencli\ built-in types} \label{tab:cli-built-in} \end{table} \sloppypar{Wrappers exploit inheritance for wrapping the original classes, so, for example, \texttt{pypy.runtime.List} is a subclass of \texttt{System.Collections.Generic.List} that provides methods whose names match those found in the \texttt{\_GENERIC\_METHODS} of \texttt{ootype.List}.} The only exception to this rule is the \texttt{String} class, which is not wrapped since in .NET we can not subclass \texttt{System.String}. Instead, we provide a bunch of \texttt{static methods} in \emph{pypylib.dll} that implement the methods declared by \texttt{ootype.String.\_GENERIC\_METHODS}, then we call them by explicitly passing the string object in the argument list. Listing \ref{lst:cli-adt-types} shows an excerpt of both the \texttt{List} and the \texttt{String} classes: note how the two implementations differ, because on the left we have an \textbf{instance method} (hence we use \texttt{this}), while on the right we have a plain \textbf{static method}. \begin{SaveVerbatim}{Side1} public class List: System.Collections.\ Generic.List { public int ll_length() { return this.Count; } ... } \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} public class String { public static int ll_strlen(string s) { return s.Length; } } \end{SaveVerbatim} \compare{Wrappers around built-in types}{lst:cli-adt-types} \section{Mapping instructions} \label{sec:cli-instructions} \pypy's low level operations are expressed in \textbf{Static Single Information} (SSI) form; they looks like listing \ref{lst:add}.1, where \texttt{v0} and \texttt{v1} are the arguments of the operation and \texttt{v2} is the result. By contrast the CLI virtual machine is \textbf{stack based}, that means the each operation pops its arguments from the top of the stacks and pushes its result there. The most straightforward way to translate SSI operations into stack based operations is to \textbf{explicitly load the arguments and store the result} into the appropriate places. Listing \ref{lst:add} shows an example of how basic operations are translated: the code produced works correctly but has some inefficiency issue that can be addressed during the optimization phase. \begin{SaveVerbatim}{Side1} v2 = int_add(v0, v1) \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} LOAD v0 LOAD v1 int_add STORE v2 \end{SaveVerbatim} \compare{Example of basic operation}{lst:add} The CLI Virtual Machine is fairly expressive, so the conversion between \textbf{\pypy's low level operations} and \textbf{CLI instruction} is relatively simple: many operations maps directly to the correspondent instruction, e.g \texttt{int\_add} and \texttt{int\_sub} maps to \texttt{add} and \texttt{sub}. By contrast some instructions do not have a direct correspondent and have to be rendered as a \textbf{sequence of CLI instructions}: this is the case of the ``less-equal'' and ``greater-equal'' family of instructions, that are rendered as ``greater'' or ``less'' followed by a boolean ``not'', respectively. Finally, there are some instructions that cannot be rendered directly without increasing the complexity of the code generator, such as \texttt{int\_abs} (which returns the absolute value of its argument). These operations are translated by calling some \textbf{helper function} written in C\# (see section \ref{sec:cli-rte}). The code that implements the mapping is in the modules \texttt{metavm.py} and \texttt{opcodes.py}. \section{Mapping exceptions} \label{sec:cli-exceptions} Both RPython and CLI have its own set of \textbf{exception classes}: some of these are pretty similar; e.g., we have \texttt{OverflowError}, \texttt{ZeroDivisionError} and \texttt{IndexError} on the first side and \texttt{OverflowException}, \texttt{DivideByZeroException} and \texttt{IndexOutOfRangeException} on the other side. The first attempt was to map RPython classes to their corresponding CLI ones: this worked for simple cases, but it would have triggered subtle bugs in more complex ones, because the two \textbf{exception hierarchies don't} completely \textbf{overlap}. At the moment we've choosen to build an RPython exception hierarchy completely \textbf{independent} from the CLI one, but this means that we can't rely on exceptions raised by \textbf{built-in operations}. The currently implemented solution is to do an \textbf{exception translation} on-the-fly. As an example consider the RPython \texttt{int\_add\_ovf} operation, that sums two integers and raises an \texttt{OverflowError} exception in case of overflow. For implementing it we can use the built-in \texttt{add.ovf} CLI instruction that raises \texttt{System.OverflowExcepion} when the result overflows, catch that exception and throw a new one, as shown in listing \ref{lst:ovf}. \begin{program} \begin{verbatim} .try { ldarg 'x_0' ldarg 'y_0' add.ovf stloc 'v1' leave __check_block_2 } catch [mscorlib]System.OverflowException { newobj instance void class OverflowError::.ctor() throw } \end{verbatim} \label{lst:ovf} \caption{Exception translation} \end{program} \subsection{A possible optimization} \label{sec:gencli-exception-opt} Though we haven't measured timings yet we can guess that this machinery brings to some performance penalties even in the non-overflow case; a possible optimization is to do the on-the-fly translation only when it is \textbf{strictly necessary}, i.e. only when the except clause catches an exception class whose \textbf{subclass hierarchy} is compatible with the built-in one. As an example, consider listing \ref{lst:IndexError}.1: since \texttt{IndexError} has no subclasses, we can map it to \texttt{IndexOutOfBoundException} and \textbf{directly catch this one}, as shown by listing \ref{lst:IndexError}.2 \begin{SaveVerbatim}{Side1} try: return mylist[0] except IndexError: return -1 \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} try { ldloc 'mylist' ldc.i4 0 call int32 getitem(MyListType, int32) ... } catch [mscorlib] System.IndexOutOfBoundException { // return -1 ... } \end{SaveVerbatim} \CompareWithSize{Exception mapping optimization}{lst:IndexError}{4cm}{9cm} By contrast we can't do so if the except clause catches classes that don't directly map to any built-in class, as shown by listing \ref{lst:LookupError}. \begin{SaveVerbatim}{Side1} try: return mylist[0] except LookupError: return -1 \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} .try { ldloc 'mylist' ldc.i4 0 .try { call int32 getitem(MyListType, int32) } catch [mscorlib] System.IndexOutOfBoundException { // throw a fresh exception newobj instance void class IndexError::.ctor() throw } ... } .catch LookupError { // return -1 ... } \end{SaveVerbatim} \CompareWithSize{Exception translation}{lst:LookupError}{4cm}{9cm} \section{Translating flow graphs} \label{sec:cli-translation} As we saw in section \ref{sec:flowmodel} in \pypy\ function and method bodies are represented by \textbf{flow graphs}, so we need to translate them to CLI IL code. Flow graphs are expressed in a format that is very suitable for being translated to low level code, so that phase is quite straightforward, though the code is a bit involed because we need to take care of three different types of blocks. The code doing this work is located in the \texttt{Function.render} method in the file \texttt{function.py}. First of all it \textbf{searches for variable} names and types used by each block; once they are collected it emits a \texttt{.local} IL statement used for indicating the virtual machine the number and type of local variables used. Then it sequentally renders all blocks in the graph, starting from the \textbf{start block} (see section \ref{sec:FunctionGraph}); special care is taken for the \textbf{return block} which is always rendered at last to meet CLI requirements. Each block starts with an \textbf{unique label} that is used for jumping across, followed by the low level instructions the block is composed of; finally there is some code that jumps to the appropriate next block. Conditional and unconditional jumps are rendered with their corresponding IL instructions: \texttt{br}, \texttt{brtrue}, \texttt{brfalse}. Blocks that needs to \textbf{catch exceptions} use the native facilities offered by the CLI virtual machine: the entire block is surrounded by a \texttt{.try} statement followed by as many \texttt{catch} as needed: each catching sub-block then branches to the appropriate block, as shown by listing \ref{lst:TryCatch}. \begin{SaveVerbatim}{Side1} try: # block0 ... except ValueError: # block1 ... except TypeError: # block2 ... # block3 \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} block0: .try { ... leave block3 } catch ValueError { ... leave block1 } catch TypeError { ... leave block2 } block1: ... br block3 block2: ... br block3 block3: ... \end{SaveVerbatim} \compare{Exception handling}{lst:TryCatch} \section{Translating classes} \label{sec:cli-classes} As we saw in section \ref{sec:ootypes}, the semantic of \emph{ootypesystem} classes is very similar to the .NET one, so the translation is mostly straightforward. The related code is located in the module \texttt{class\_.py}. Rendered classes are composed of four parts: \begin{itemize} \item fields; \item user defined methods; \item default constructor; \item the \texttt{ToString} method, mainly for testing purposes (see section \ref{sec:cli-testing}). \end{itemize} All user defined methods are declared as \texttt{virtual}, since \emph{ootypesystem} implicitly assumes method calls to be late bound. As a future optimization we could check if the \texttt{virtual} flag is really needed, and drop it if it's not. The constructor does nothing more than initializing class fields to their default value. Inheritance is straightforward too, as it is natively supported by CLI. The only noticeable thing is that we map \emph{ootypesystem}'s \texttt{Root} class (see section \ref{sec:ootypes}) to the CLI equivalent \texttt{System.Object}. \section{The Runtime Environment} \label{sec:cli-rte} The \textbf{runtime environment} is a collection of helper classes and functions used and referenced by many of the \gencli\ submodules. It is written in C\#, compiled to a \emph{DLL} (Dynamic Link Library), then linked to generated code at compile-time. It is composed of two files: a C\# source file containing the real code (\texttt{src/pypylib.cs}), and a Python module (\texttt{rte.py}) which ensures the library is \textbf{recompiled whenever the source if modified}, thus preventing bug due to forget to recompile the library. \texttt{pypylib} is composed of three parts: \begin{itemize} \item a set of helper functions used to implements complex RPython low-level instructions such as \texttt{runtimenew} and \texttt{ooparse\_int} (see section \ref{sec:cli-instructions}),; \item a set of helper classes wrapping built-in types, as we saw in section \ref{sec:cli-built-in-types}; \item a set of helpers used by the test framework (see section \ref{sec:cli-testing}). \end{itemize} The first two parts are contained in the \texttt{pypy.runtime} namespace, while the third is in the \texttt{pypy.test} one. \section{Testing \gencli} \label{sec:cli-testing} As the whole \pypy, \gencli\ is a test-driven project: there is at least one \textbf{unit test} for almost each single feature of the backend. This development methodology allowed us to early discover many subtle bugs and to do some big refactoring of the code with the confidence not to break anything. We made a big effort on writing good tests: at the moment of writing there are 310 \gencli\ unit tests, composed by about one thousand of lines, i.e. one third of the global three thousands lines of code \gencli\ is composed of. The core of the testing framework is in the module \texttt{pypy.translator.cli.test.runtest}; one of the most important function of this module is \texttt{compile\_function()}: it takes a Python function, compiles it to CLI and returns a Python object that runs the just created executable when called. This way we can test \gencli\ generated code just as if it were a simple Python function; we can also directly run the generated executable, whose default name is \texttt{main.exe}, from a shell: the function parameters are passed as command line arguments, and the return value is printed on the standard output, as shown by listing \ref{lst:cli-compile-function}. \begin{SaveVerbatim}{Side1} from pypy.translator.cli.test.runtest\ import compile_function def foo(x, y): return x+y, x*y f = compile_function(foo, [int, int]) assert f(3, 4) == (7, 12) \end{SaveVerbatim} \begin{SaveVerbatim}{Side2} $ mono main.exe 3 4 (7, 12) \end{SaveVerbatim} % $ % to stop the broken xemacs syntax highlighting \CompareWithSize{Implicit and explicit execution of CLI code}{lst:cli-compile-function}{8.5cm}{4.5cm} \gencli\ supports only few RPython types as parameters: \texttt{int}, \texttt{r\_uint}, \texttt{r\_longlong}, \texttt{r\_ulonglong}, \texttt{bool}, \texttt{float} and one-length strings (i.e., chars). By contrast, most types are fine for being returned: these include all primitive types, list, tuples and instances. There are some \gencli\ features whose only purpose is to support the test framework. In particular, as we saw in section \ref{sec:cli-rte}, \emph{pypylib.dll} contains some helper functions that formats CLI objects in a way that can be understood by Python, to be used when printing the result value of a function. % TODO: optimizations?