\section{RPython Front-end}\label{frontend-sect} \begin{figure} \centering \epsfig{file=architecture2, scale=0.3} \caption{The compiler architecture.}\label{arch-fig} \end{figure} The overall architecture of the compiler is summarized in Figure~\ref{arch-fig}. The RPython source objects are first analyzed by the \emph{Flow Graph Generator}, which creates flow graphs representing the control flow, but without any type information. These flow graphs are analyzed by the \emph{Annotator}, which adds annotations about the type and other details of the values in the program. In compiler terminology, these annotated flow graphs correspond to a High-Level IR, as they are defined in terms of the source language. These annotated flow graphs are then lowered by the RTyper to lower-level graphs more suitable for code generation. These lower-level graphs correspond to a mid- or low-level IR in a typical compiler, as they target a simplified model of the actual machine for which code is eventually generated. Finally, the chosen backend is used to produce actual executable code based on these lower-level graphs. In Section \ref{backend-sect} of this paper, we discuss two such backends, for the .NET and JVM platforms, in detail. The rest of this section provides an overview of the workings of the Flow Graph Generator and Annotator; for full details, however, the reader is referred to the corresponding publications of the PyPy project (\cite{RigoPedroni06, D05.1, PyPyTrans}). \subsection{Flow Graph Generator and Annotator}\label{sect:annot} The \emph{Flow Graph Generator} uses abstract interpretation to transform the live Python objects which form the source code of an RPython program into a set of control flow graphs. The resulting flow graphs are in an untyped Static Single Information (SSI) form, and the operations are at a very high level, corresponding directly to their RPython origins. SSI \cite{ananian99static} is an intermediate format derived from SSA in which no variable is modified, and no variables are live across basic blocks. The Annotator exploits the SSI format to perform a static analysis of the flow graphs which assigns each variable a specific RPython type (see also Section~\ref{typesystem}) which describes all of the possible types the variable may have at run-time. If no suitable RPython type can be found for a variable, then the program is rejected and processing stops. Note that both the Flow Graph Generator and the Annotator perform a global analysis of a program as a whole. The user is required to supply a main function that serves as an entry point, and a set of type annotations for any functions that parameter requires. While this approach works fine for PyPy's main purpose of writing interpreters, it has drawbacks when attempting to use RPython as a more general purpose language (see Section~\ref{futureWork-sect}). The types inferred by the annotator correspond directly to the RPython types. For example, a variable may be annotated with the type \lstinline{SomeList(SomeInteger())}; such a list exposes most of the operations of a standard Python list, with the difference that the types of elements are known to be integers, rather than being any object, as in Python. \subsection{RTyper} RTyper is an abbreviation for ``RPython low-level typer'' and is the component which bridges between the high-level flow graphs of the front-end and the low-level details required by the back-end. While it should be possible in theory to write a back-end which operates directly on the flow graphs, it is generally too complicated to be feasible without an intermediate processing stage. In addition, moving analysis and transformation into the RTyper allows it to be shared between backends. The RTyper performs two main kinds of transformations on the typed flow graphs: \begin{itemize} \item each annotation is translated from a high-level RPython type into an appropriate lower-level type; \item correspondingly, each high-level operation contained in a block is translated into one or few corresponding lower-level operations. \end{itemize} In fact, the RTyper itself is customized by a specific type system, which determines the exact lower-level types and operations that are used. This allows the backends for lower-level targets, such as the C language, to obtain a lower-level representation than the backends for higher-level targets, such as the JVM or CLI. The next section discussed these type systems in more detail. \subsubsection{The \lltype and \ootype type systems}\label{sect:ootype} So far, two type systems have been developed for use with the RTyper: \lltype, intended for lower-level targets, and \ootype, intended for high-lever, virtual-machine targets. The main difference between the \lltype and \ootype type systems is that the former is based on struct, array and pointer types, whereas the latter uses class and object types. To clarify the difference between the two type systems, let us consider again the annotation \lstinline{SomeList(SomeInteger())}. In \lltype this RPython type would be mapped into the following struct type: \begin{lstlisting}[language=C] struct list { int length; int* items; }; \end{lstlisting} By contrast, the \ootype will translate \lstinline{SomeList(SomeInteger())} into \lstinline{List(Signed)}, where \lstinline{Signed} is the low-level type for \lstinline{SomeInteger()}. The high-level type \lstinline{SomeList} differs from the low-level type \lstinline{List} beacuse the the former supports operations typical of RPython \footnotemark, while the latter is designed to have only operations that are presumed to be directly available in the target platform. Typically, instances of type \lstinline{List} would be mapped to a built-in library class for representing lists, such as the \lstinline{ArrayList} class offered by Java and C\#. \footnotetext{As an example of RPython-specific semantics, consider the extended index notation for accessing list elements: e.g. \lstinline{mylist[-1]} refers to the last element of the list, and \lstinline{mylist[2:5]} extracts a \emph{sublist} containing the second, third and fourth element.} The full description of the \ootype type system is available in \cite{D12.1}; it supports a set of \emph{primitive types} for numbers and strings (e.g., \lstinline{Signed}, \lstinline{Unsigned}, \lstinline{String}), some \emph{collection types} (\lstinline{List}, \lstinline{Dict}), functions and classes as first-order values (\lstinline{Class}, \lstinline{StaticMethod}) and of course user-defined classes. \ootype's object model is Java-like: \begin{itemize} \item classes have a static set of typed methods and attributes; \item only single inheritance is supported, and the class hierarchy is rooted by the predefined class \lstinline!ROOT!; \item every method is \emph{virtual}, that is, can be overridden in some subclasses; furthermore, methods can be \emph{abstract}. \item classes, attributes and methods do not have access restrictions: \ootype is only used internally by the translator, so there is no need to enforce accessibility rules. \end{itemize} Moreover, as for their RPython counterparts, collection types such as \lstinline{List} and \lstinline{Dict} are \emph{generic} in order to allow back-ends such as GenCLI to exploit native support for generic programming; back-ends for platforms like Java need to implement a type erasure technique similar to that performed by the Java compiler. % if we target an object oriented platform we can expect it %to have a builtin \lstinline{list} type with a set of standard operation such as %\lstinline{append} and \lstinline{remove}: thus, \ootype simply implements the high %level \lstinline{SomeList} operations in terms of those standard operations, %then the backend will choose the appropriate builtin type for the %concrete implementation.