\section{Related Work}

Flexswitches are closely related to the concept of \emph{promotion}, described in \cite{PyPyJIT,PyPyJIT09}. Psyco is a run-time specialiser for Python that uses promotion (called ``unlift'' in \cite{DBLP:conf/pepm/Rigo04}). However, Psyco is a manually written JIT: it is not applicable to other languages and cannot be retargeted. Psyco is a good example of how to implement flexswitches for targets that do not have the limitations of the CLI.

The idea of promotion is a generalization of \emph{Polymorphic Inline Caches} (PICs) \cite{hoelzle_optimizing_1991}, as well as of the idea of using runtime feedback to produce more efficient code \cite{hoelzle_type_feedback_1994}. The main difference between the two is that PICs only work on types, whereas promotion can work on every kind of value.

PyPy-style JIT compilers are hard to write manually, so we chose to write a JIT generator. Tracing JIT compilers \cite{gal_hotpathvm_2006} also give good results but are much easier to write, making the need for an automatic generator less urgent. However, so far tracing JITs have less general allocation removal techniques, so they achieve smaller speedups on dynamic languages that rely on boxing. Another difference is that tracing JITs concentrate on loops, which makes them produce a lot less code. The amount of code generated is an issue that is being addressed by current research in PyPy \cite{PyPyTracing}.

The code generated by tracing JITs typically contains guards; in recent research on Java \cite{gal_incremental_2006}, the behaviour of these guards is extended to be similar to our promotion. This has been used twice to implement a dynamic language (JavaScript), by Tamarin\footnote{\texttt{http://www.mozilla.org/projects/tamarin/}} and in \cite{chang_efficient_2007}.

IronPython and Jython are two popular implementations of Python for, respectively, the CLI and the JVM, whose approach differs fundamentally from that of PyPy.
The source code of PyPy contains a Python interpreter, from which the JIT compiler is automatically generated: the resulting executable contains both the interpreter and the compiler, so that it is possible to compile only the desired parts of the program. On the other hand, both IronPython and Jython implement only the compiler: both compile code lazily (when a Python module is loaded), but since they do not exploit the extra information potentially available at runtime, this is closer to delayed static compilation than to true JIT compilation. As a result, they run Python programs much slower than equivalent programs written in C\#\footnote{\texttt{http://shootout.alioth.debian.org/gp4/\\benchmark.php?test=all\&lang=iron\&lang2=csharp}} or Java\footnote{\texttt{http://blog.dhananjaynene.com/2008/07/performance-\\comparison-c-java-python-ruby-jython-jruby-groovy/}}.

The \emph{Dynamic Language Runtime}\footnote{\texttt{http://www.codeplex.com/dlr}} (DLR) is a library designed to ease the implementation of dynamic languages for .NET. The DLR is closely related to IronPython\footnote{In fact, the DLR started as a spin-off of IronPython, and nowadays the latter is based on the former.} and employs the techniques described above; thus, the remarks about the differences between PyPy and IronPython apply to all DLR-based languages.

\section{Conclusion and Future Work}

In this paper we gave an overview of PyPy's JIT compiler generator, which can automatically turn an interpreter into a JIT compiler, requiring the language developers to add only a few hints to guide the generation process. Then, we presented the CLI backend for PyPy's JIT compiler generator, whose goal is to produce .NET bytecode at runtime. We showed how it is possible to circumvent intrinsic limitations of the virtual machine to implement flexswitches.
As a result, we showed that the idea of \emph{JIT layering} is worth further exploration, as it makes it possible for dynamically typed languages to be even faster than their statically typed counterparts in some cases.

As future work, we want to explore different strategies to make the frontend produce less code while maintaining comparable or better performance. In particular, we are working on a way to automatically detect loops in the user code, as tracing JITs do \cite{gal_hotpathvm_2006}. By compiling whole loops at once, the backends should be able to produce better code.

At the moment, some bugs and minor missing features prevent the CLI JIT backend from handling more complex languages such as Python and Smalltalk. We are confident that once these problems are fixed, we will get performance results comparable to TLC, as the other backends already demonstrate \cite{PyPyJIT}. However, if the current implementation of flexswitches turns out to be too slow for some purposes, alternative implementation strategies could be explored by considering the novel features offered by the new generation of virtual machines.

In particular, the \emph{Da Vinci Machine Project}\footnote{\texttt{http://openjdk.java.net/projects/mlvm/}} is exploring and implementing new features to ease the implementation of dynamic languages on top of the JVM: some of these features, such as the new \emph{invokedynamic}\footnote{\texttt{http://jcp.org/en/jsr/detail?id=292}} instruction and \emph{tail call optimization}, can probably be exploited by a potential JVM backend to generate even more efficient code.

\section*{Acknowledgements}

The authors would like to thank Carl Friedrich Bolz, Maciej Fijalkowski and the referees of ICOOOLPS'09 for helpful comments on earlier versions of this paper.