\section{Related Work}

Promotion is a concept that we have already explored in other contexts. Psyco is
a run-time specialiser for Python that uses promotion (called ``unlift'' in
\cite{DBLP:conf/pepm/Rigo04}). However, Psyco is a manually written JIT, is
not applicable to other languages and cannot be retargeted.

Moreover, the idea of promotion is a generalization of \emph{Polymorphic
Inline Caches} \cite{hoelzle_optimizing_1991}, as well as the idea of using
runtime feedback to produce more efficient code
\cite{hoelzle_type_feedback_1994}.

PyPy-style JIT compilers are hard to write manually, so we chose to write a
JIT generator. Tracing JIT compilers \cite{gal_hotpathvm_2006} also give good
results but are much easier to write, making the need for an automatic generator
less urgent. However, so far tracing JITs have less general allocation removal
techniques, which yields smaller speedups in a dynamic language with boxing.
Another difference is that tracing JITs concentrate on loops, which makes them
produce a lot less code. This issue will be addressed by future research in
PyPy.

The code generated by tracing JITs typically contains guards; in recent research
\cite{gal_incremental_2006} on Java, the behaviour of these guards is extended
to be similar to our promotion. This has been used twice to implement a dynamic
language (JavaScript), by
Tamarin\footnote{{\tt http://www.mozilla.org/projects/tamarin/}} and in
\cite{chang_efficient_2007}.

There has been an enormous amount of work on partial evaluation for compiler
generation. A good introduction is given in \cite{Jones:peval}. Most of it,
however, targets the generation of ahead-of-time compilers, which cannot produce
very good performance results for dynamic languages.

There is also some research on runtime partial evaluation. One of the earliest
examples is Tempo for C
\cite{DBLP:conf/popl/ConselN96,DBLP:conf/dagstuhl/ConselHNNV96}.
However, it is essentially an offline specializer ``packaged as a library'';
decisions about what can be specialized and how are pre-determined.

Another work in this direction is DyC \cite{grant_dyc_2000}, another runtime
specializer for C. Specialization decisions are also pre-determined, but
``polyvariant program-point specialization'' gives a coarse-grained equivalent
of our promotion. Targeting the C language makes higher-level specialization
difficult, though (e.g.\ calls to \texttt{malloc} are not removed).

Greg Sullivan introduced ``Dynamic Partial Evaluation'', a special form of
partial evaluation at runtime \cite{sullivan_dynamic_2001}, and described an
implementation for a small dynamic language based on lambda calculus. This work
is conceptually very close to our own.
% XXX there are no performance figures, we have no clue how much of this is
% implemented. not sure how to write this

Our algorithm to avoid allocation of unneeded intermediate objects fits into
the research area of escape analysis: in comparison to advanced techniques
\cite{Blanchet99escapeanalysis,Choi99escapeanalysis} our algorithm is very
simple-minded, but it is still useful in practice.

\section{Conclusion and Future Work}

In this paper we presented PyPy's JIT compiler generator, based on partial
evaluation techniques, which can automatically turn an interpreter into a JIT
compiler, requiring the language developers to add only a few \texttt{hint}s to
guide the generation process.

We showed that classical partial evaluation cannot remove all the overhead
inherent in dynamically typed languages, and how the new operation called
\emph{promotion} solves the problem, by delaying compile-time until the JIT
knows enough to produce efficient code, and by continuously intermixing
compile-time and runtime.
Moreover, we showed that our simple but still practically useful technique to
avoid allocation of unnecessary intermediate objects plays well with promotion
and helps to produce even better code.

Finally, we presented the CLI backend for PyPy's JIT compiler generator, whose
goal is to produce .NET bytecode at runtime. We showed how it is possible to
circumvent intrinsic limitations of the virtual machine to implement promotion.
As a result, we showed that the idea of \emph{JIT layering} is worth further
exploration, as it makes it possible for dynamically typed languages to be even
faster than their statically typed counterparts in some circumstances.

As future work, we want to explore different strategies to make the frontend
produce less code while maintaining comparable or better performance. In
particular, we are working on a way to automatically detect loops in the user
code, as tracing JITs do \cite{gal_hotpathvm_2006}. By compiling whole loops at
once, the backends should be able to produce better code than today.

At the moment, some bugs and minor missing features prevent the CLI JIT backend
from handling more complex languages such as Python and Smalltalk. We are
confident that once these problems are fixed, we will get performance results
comparable to TLC, as the other backends already demonstrate \cite{PyPyJIT}.
Moreover, if the current implementation of flexswitches proves to be too slow
for some purposes, we want to explore alternative implementation strategies,
also considering the new features that might be integrated into virtual
machines.