\chapter{Conclusions and future work}
\label{cha:conclusions}

\section{Current status of \gencli}
\label{sec:gencli-status}

At the time of writing \gencli\ is \textbf{quite mature} but not yet complete: it can successfully compile a large number of test snippets (see section \ref{sec:cli-testing}) and the only two medium-sized RPython programs available, \emph{rpystone} and \emph{richards}, which are used for benchmarking purposes, as we will see in section \ref{sec:gencli-benchmarks}.

The only major feature \gencli\ lacks is support for the \texttt{CustomDict} built-in type, as we saw in section \ref{sec:cli-built-in-types}. Moreover, there are a few known bugs waiting to be fixed that could prevent compilation from succeeding, so we have not yet tried to compile the whole \pypy\ interpreter, though it is very likely that \gencli\ will be able to compile it in a few months.

Once \gencli\ is complete, there are at least three directions we might follow to improve it in the near future:

\begin{itemize}
\item optimizations;
\item integration of application-level code with the .NET runtime;
\item integration of RPython-level code with the .NET runtime, i.e. \gencli\ as a general .NET compiler.
\end{itemize}

\section{Early benchmarks}
\label{sec:gencli-benchmarks}

The \pypy\ distribution comes with two standard benchmarks for measuring performance, \textbf{rpystone} and \textbf{richards}: the first is an RPython port of the standard \emph{pystone} Python benchmark, while the second is based on a Java version of a benchmark originally written by Dr. Martin Richards in \emph{BCPL}. The main difference between the two is that \emph{rpystone} focuses on algorithmic performance, while \emph{richards} makes heavy use of object-oriented features such as inheritance and late binding. We will see later how this difference affects the performance of \gencli.
The benchmarks were run on a machine with an \emph{AMD Athlon XP-M 3000+} CPU and 512 MB of RAM, under \emph{Linux} and \emph{Mono 1.1.13.4}. The results are compared to those obtained by \emph{genc} with and without backend optimizations, which \gencli\ is not yet able to take advantage of (see section \ref{sec:gencli-backendopt}).

\begin{table}[ht]
\begin{tabular}{|l|r|r|}
\hline
\textbf{Backend} & \textbf{Result} (pystones/second) & \textbf{Factor} \\
\hline
\emph{genc} & 4,926,108 & 1.0x \\
\emph{genc} w/o optimizations & 1,592,356 & 3.1x \\
\gencli & 177,429 & 27.8x \\
\hline
\end{tabular}
\caption{\emph{rpystone} results}
\label{table-rpystone}
\end{table}

\begin{table}[ht]
\begin{tabular}{|l|r|r|}
\hline
\textbf{Backend} & \textbf{Result} (ms/iteration) & \textbf{Factor} \\
\hline
\emph{genc} & 7.43 & 1.0x \\
\emph{genc} w/o optimizations & 16.20 & 2.2x \\
\gencli & 28.65 & 3.9x \\
\hline
\end{tabular}
\caption{\emph{richards} results}
\label{table-richards}
\end{table}

Table \ref{table-rpystone} shows the results for \emph{rpystone}: as expected, \emph{genc} is much faster than \gencli, especially with optimizations turned on.

The big surprise comes when examining table \ref{table-richards}, which shows the results for the \emph{richards} benchmark. Here \gencli\ is much closer to \emph{genc}: about \textbf{3.9 times} and \textbf{1.8 times} slower than \emph{genc} with optimizations turned on and off, respectively. This is a remarkable result, considering that at the moment the code generated by \gencli\ is not optimized at all; one of the reasons is probably that the \emph{Mono Virtual Machine} is tailored for the efficient execution of the object-oriented features used by \emph{richards}.

\section{Optimizations}
\label{sec:gencli-optimizations}

There are a number of ways we can improve the speed of the code generated by \gencli.
\subsection{Backend optimizations}
\label{sec:gencli-backendopt}

Before generating code, low-level backends such as the C and LLVM ones run the \textbf{backend optimization} phase on the rtyped flow graph. This phase is designed to be run with \emph{lltypesystem}, but we might be able to use some of the optimizations with \emph{ootypesystem}, too. Available optimizations include inlining, constant folding, dead-code removal and tail-recursion optimization.

\subsection{Stack push/pop optimization}
\label{sec:gencli-stack-opt}

The CLI Virtual Machine is a \textbf{stack-based machine}, which does not fit well with the SSI form the flow graphs are generated in. At the moment \gencli\ translates the SSI statements literally, allocating a new local variable for each variable of the flow graph, as we saw in section \ref{sec:cli-instructions}.

For example, consider the RPython code and the corresponding flow graph in listing \ref{lst:stack-opt-rpython}. Listing \ref{lst:stack-opt-il}.1 shows the code as it is generated by \gencli: as you can see, the results of \texttt{add} and \texttt{sub} are stored in \texttt{v0} and \texttt{v1}, respectively, then \texttt{v0} and \texttt{v1} are reloaded onto the stack. These stores and loads are redundant, since the code works correctly even without them, as shown by listing \ref{lst:stack-opt-il}.2.
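The removal of the redundant stores and loads described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way \gencli\ is implemented: the tuple-based representation of operations, the \texttt{PURE\_OPS} table and the \texttt{emit\_block} function are hypothetical names introduced here, the sketch handles a single block, and it only elides temporaries that are produced by a side-effect-free operation and used exactly once.

```python
from collections import Counter

# Hypothetical mini-IR: each operation is (result, opname, args).
# Only these side-effect-free operations are candidates for elision
# (an assumption of this sketch); any other opname is emitted as a call.
PURE_OPS = {"int_add": "add", "int_sub": "sub"}


def emit_block(ops, inputargs):
    """Emit stack-based pseudo-IL for one block, skipping the stloc/ldloc
    pair of every pure temporary that is used exactly once."""
    uses = Counter(arg for _, _, args in ops for arg in args)
    defs = {res: (opname, args) for res, opname, args in ops}
    elided = {res for res, opname, _ in ops
              if opname in PURE_OPS and uses[res] == 1}
    out = []

    def load(var):
        if var in inputargs:
            out.append("ldarg '%s'" % var)
        elif var in elided:
            compute(var)  # leave the value on the stack at its only use
        else:
            out.append("ldloc '%s'" % var)

    def compute(res):
        opname, args = defs[res]
        for arg in args:
            load(arg)
        out.append(PURE_OPS.get(opname, "call " + opname))

    for res, opname, args in ops:
        if res in elided:
            continue  # emitted later, where its single consumer needs it
        compute(res)
        out.append("stloc '%s'" % res)
    return out


# The flow graph of bar(x, y), with directcall simplified to the callee name.
example_ops = [
    ("v0", "int_add", ["x_0", "y_0"]),
    ("v1", "int_sub", ["x_0", "y_0"]),
    ("v2", "foo", ["v0", "v1"]),
]
optimized = emit_block(example_ops, {"x_0", "y_0"})
```

Restricting the elision to pure operations keeps the sketch safe: since every SSI variable is assigned only once, recomputing a pure operation at its single use site cannot change the observable order of side effects. A real implementation would also have to deal with variables that are passed across blocks.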
\begin{SaveVerbatim}{Side1}
def bar(x, y):
    foo(x+y, x-y)
\end{SaveVerbatim}
\begin{SaveVerbatim}{Side2}
inputargs: x_0 y_0
v0 = int_add(x_0, y_0)
v1 = int_sub(x_0, y_0)
v2 = directcall((sm foo), v0, v1)
\end{SaveVerbatim}
\CompareWithSize{RPython snippet and its flow graph}{lst:stack-opt-rpython}{5.5cm}{7.7cm}

\begin{SaveVerbatim}{Side1}
.locals init (int32 v0, int32 v1, int32 v2)
block0:
    ldarg 'x_0'
    ldarg 'y_0'
    add
    stloc 'v0'
    ldarg 'x_0'
    ldarg 'y_0'
    sub
    stloc 'v1'
    ldloc 'v0'
    ldloc 'v1'
    call int32 foo(int32, int32)
    stloc 'v2'
\end{SaveVerbatim}
\begin{SaveVerbatim}{Side2}
.locals init (int32 v2)
block0:
    ldarg 'x_0'
    ldarg 'y_0'
    add
    ldarg 'x_0'
    ldarg 'y_0'
    sub
    call int32 foo(int32, int32)
    stloc 'v2'
\end{SaveVerbatim}
\compare{Unoptimized and optimized IL code}{lst:stack-opt-il}

If we check the native code generated by the \textbf{Mono JIT compiler} on \emph{x86}, we can see that this redundant code is not optimized away, so we might consider removing it ourselves; this should not be too difficult, but it is not trivial because we have to make sure that the dropped locals are used only once.

\subsection{Mapping RPython exceptions to native CLI exceptions}

We have already addressed this optimization in section \ref{sec:gencli-exception-opt}.

\section{Integrating the interpreter with the .NET Framework}
\label{sec:app-level-integration}

Once we get the \pypy\ interpreter to run on the \emph{CLI} virtual machine, we will want to integrate it with the surrounding .NET Framework. As an example, these are some of the goals we might want to achieve:

\begin{itemize}
\item let Python code access .NET libraries;
\item let Python code be called from the outside by other .NET languages;
\item integrate Python classes with .NET classes, e.g., let Python classes subclass .NET ones and vice versa;
\item allow building stand-alone executables.
\end{itemize}

These are not easy tasks, mainly because some Python constructs are not directly supported by .NET and vice versa: for example, .NET doesn't support multiple inheritance or the addition and removal of class attributes at runtime, while Python doesn't support function overloading. This means that before implementing anything we would need to carefully design how the two languages integrate.

\section{\gencli\ as a .NET compiler}

At the time of writing it is not possible to use \gencli\ to, say, compile an RPython program to a \emph{DLL} that can be easily reused by other .NET applications. The biggest problem is that the names of classes and functions are mangled to ensure they are unique, so it is impossible to design a clean interface for users.

Another issue to be considered is integration with the framework: at the moment it is not possible to access system libraries, e.g. to call \texttt{System.Console.WriteLine}.

Finally, there is the same problem we saw in section \ref{sec:app-level-integration}: RPython and .NET semantics don't completely overlap. Fortunately, in this case the problem is much easier to solve because RPython is more static: many constructs that could cause problems are not allowed (e.g., adding or removing class attributes at runtime), but there are still some small issues that need to be addressed, such as how to expose function overloading to RPython programs.

In conclusion, there is still some work to do on \gencli\ to make it a ``real'' .NET compiler, but it should not be too hard to get it done.