\chapter{An overview of \pypy}
\label{cha:overview}

\section{What is \pypy?}
\label{sec:mission}

Here is the \emph{mission statement} of the \pypy\ project:

\begin{quote}
\pypy\ is an implementation of the Python programming language written in Python itself, flexible and easy to experiment with. Our long-term goals are to target a large variety of platforms, small and large, by providing a compiler toolsuite that can produce custom Python versions. Platform, memory and threading models are to become aspects of the translation process - as opposed to encoding low level details into the language implementation itself. Eventually, dynamic optimization techniques - implemented as another translation aspect - should become robust against language changes.
\end{quote}

\section{Architecture overview}
\label{sec:architecture}

\pypy\ is composed of two independent subsystems: the \emph{standard interpreter} and the \emph{translation process}.

The \textbf{standard interpreter} is the subsystem that implements the Python language, from the parser to the bytecode interpreter. Note that it runs fine on top of CPython, if one is willing to pay the performance penalty of double interpretation.

The \textbf{translation process} aims at producing a different (low-level) representation of our standard interpreter. It is composed of four steps:

\begin{description}
\item[Flow graph generation] a \emph{flow graph} representation of the standard interpreter is produced.
A combination of the bytecode interpreter and a \emph{flow object space} performs \emph{abstract interpretation} to record the flow of objects and execution throughout a Python program into such a \emph{flow graph};
\item[Annotation] the \emph{annotator} performs type inference on the flow graph;
\item[RTyping] the \emph{RTyper}, based on the type annotations, turns the flow graph into one that uses only low-level operations fitting the model of the target platform;
\item[Code generation] the selected \emph{backend} compiles the resulting flow graph into the target environment; examples of backends are C, LLVM and JavaScript.
\end{description}

\section{RPython and translation}
\label{sec:rpython}

One of \pypy's now-achieved objectives is to enable translation of our \textbf{standard interpreter} into a lower-level language. For our translation and type inference mechanisms to work effectively, we need to restrict the dynamism of our interpreter-level Python code at some point. In the start-up phase we are completely free to use all kinds of powerful Python constructs, including metaclasses and execution of dynamically constructed strings. However, once the initialization phase finishes, all code objects involved need to adhere to a more static subset of Python: \textbf{Restricted Python}, also known as \textbf{RPython}.

RPython code is restricted in such a way that the Annotator is able to infer consistent types. How much dynamism we allow in RPython depends on, and is restricted by, the Flow Object Space and the Annotator implementation. The more we can improve this translation phase, the more dynamism we can allow. In some cases, however, it is more feasible and practical simply to remove some of the dynamism from our interpreter-level code. It is mainly because of this trade-off that the definition of RPython has shifted over time.
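
As a minimal sketch of what inferring consistent types means in practice, consider the two plain Python functions below. The function names are hypothetical and chosen only for illustration. Both run under CPython; the first keeps every variable at a single type, which is the kind of code the Annotator is expected to accept, while the second binds \texttt{x} to an integer on one path and to a string on the other, which is the kind of code RPython typically rules out.

\begin{verbatim}
def sum_below(n):
    # 'total' and 'i' are integers on every path through the function,
    # so a single type can be inferred for each variable.
    total = 0
    for i in range(n):
        total += i
    return total


def mixed_result(flag):
    # 'x' is an integer on one branch and a string on the other.
    # This is valid Python, but no single consistent type exists for 'x'.
    if flag:
        x = 42
    else:
        x = "forty-two"
    return x
\end{verbatim}

In cases like the second function, making both branches produce the same kind of value is usually the more practical fix than extending the translation phase.
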
Although the Annotator is now fairly stable and able to process the whole of \pypy, the RPython definition will probably continue to shift marginally as we improve it.

\section{RPython typesystems}
\label{sec:typesystems}

The annotator gives us a flow graph whose variables are marked with high-level type descriptors, such as \texttt{SomeInteger}, \texttt{SomeBool} or \texttt{SomeList}. Before generating low-level code we need to assign each annotated variable a ``real'' type that fits easily in the target machine: for example, if we want to generate C source code we might translate \texttt{SomeInteger} and \texttt{SomeBool} into plain \texttt{int}, and \texttt{SomeList} into a struct containing an array of items and the length of that array.

This process is carried out by the \textbf{RTyper} and is called \emph{rtyping}: since different target machines support different primitive operations, the rtyper allows backend writers to choose which \textbf{typesystem} to use. Currently \pypy\ supports two different typesystems:

\begin{description}
\item[lltypesystem (Low Level Typesystem)] represents RPython objects in terms of structs, pointers and arrays, and is suitable for very low-level backends such as those targeting C and LLVM;
\item[ootypesystem (Object Oriented Typesystem)] \sloppypar{\hfill represents RPython objects in terms of classes and instances, and is suitable for targets with object-oriented primitives, such as Java or CLI.}
\end{description}

\section{The Big Picture}
\label{sec:bigpicture}

Figure~\ref{fig:big-picture} shows how \pypy's subsystems are related.

\begin{figure}[h]
\centering
\fbox{\scalebox{0.60}{\includegraphics{images/big-picture.png}}}
\caption{\pypy\ subsystems}
\label{fig:big-picture}
\end{figure}

The goal is to produce a \textbf{CLI backend}, i.e. a compiler that accepts RPython programs and produces .NET executables; following \pypy\ naming conventions it has been named \gencli.
Once the backend works, we can run it on top of CPython to compile the \emph{Standard Interpreter} and obtain a .NET Python interpreter. Since \pypy's Standard Interpreter aims to be compatible with CPython, ideally it will then be possible to run the entire translation chain on top of the newly created .NET Python interpreter.

As we saw in section~\ref{sec:architecture}, the translation process is composed of four steps; since our tool sits at the very end of the chain, we should look at what the earlier steps produce in order to understand how the \emph{CLI backend} works. In particular, chapter~\ref{cha:flowgraph} will examine the \emph{Flow graph generation} and \emph{Annotation} steps, while chapter~\ref{cha:ootypesystem} will examine the \emph{RTyping} step. Once we have a good understanding of the backends' starting point, chapter~\ref{cha:gencli} will take a close look at \gencli\ internals.
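
Before moving on, the rtyping example of section~\ref{sec:typesystems} can be pictured with a toy sketch. It is not \pypy\ code: annotations are modelled as plain strings, the names \texttt{LOW\_LEVEL\_C\_TYPES} and \texttt{toy\_rtype} are hypothetical, and the C struct layout is only one possible rendering of a struct containing an array of items and its length. The table simply encodes the mapping described earlier for a C-like target.

\begin{verbatim}
# Toy model only: real annotations are objects, not strings.
LOW_LEVEL_C_TYPES = {
    "SomeInteger": "int",
    "SomeBool": "int",
    # Illustrative layout: the length plus an array of items.
    "SomeList": "struct { long length; item_t *items; }",
}


def toy_rtype(annotations):
    """Map each annotated variable name to a C-like low-level type."""
    low_level = {}
    for name, annotation in annotations.items():
        if annotation not in LOW_LEVEL_C_TYPES:
            # An annotation with no low-level counterpart cannot be translated.
            raise TypeError("no low-level type for %s" % annotation)
        low_level[name] = LOW_LEVEL_C_TYPES[annotation]
    return low_level
\end{verbatim}

For instance, \texttt{toy\_rtype(\{"i": "SomeInteger", "ok": "SomeBool"\})} maps both variables to \texttt{int}; a different typesystem would amount to swapping in a different table.
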