.. raw:: latex \thispagestyle{empty} \begin{center} \begin{tabular}{c} \hline \hline \\[0.7cm] {\Huge Towards a More Effective} \\ {\Huge Implementation of Python on Top of }\\ {\Huge Statically Typed Virtual Machines} \\[1cm] {\Large Antonio Cuni} \\[.5cm] {\large Dipartimento di Informatica e Scienze dell'Informazione} \\ {\large Universit\`a di Genova, Italy} \\[2.5cm] {\large Thesis Advisor:} \\ [1cm] {\Large Dott.~Davide Ancona} \\[.5cm] {\large Dipartimento di Informatica e Scienze dell'Informazione} \\ {\large Universit\`a di Genova, Italy} \\[2.5cm] {\large Thesis Progress Report} \\[1cm] {\large PhD in Computer Science} \\ {\large Dipartimento di Informatica e Scienze dell'Informazione} \\ {\large Universit\`a di Genova, Italy} \\[0.7cm] \hline \hline \end{tabular} \vfill \end{center} Introduction --------------- The main goal of the Thesis Proposal [Cuni07]_ was "to develop new techniques, and possibly adapt old ones which have not been yet applied to Python, in order to provide a new, **more efficient implementation** of the language". Our research mainly focus on providing a good and portable implementation of Python for object oriented virtual machines, like CLI and JVM. We believe that the main reason for bad performances of all current Python implementations is the lack of information about types at compile time, due to the extreme dynamicity of the language. PyPy is a mature project which provides a Python implementation for several platforms, but unfortunately it still suffers from efficiency problems. We proposed to investigate three different ways to tackle this problem: 1. developing a **JIT compiler** so that the exact type information can be exploited to perform effective optimizations at run time; 2. performing a **precise static analysis** of Python source code in order to develop an AOT (ahead of time) compiler able to exploit the precise information produced by the analyzer to statically emit efficient executable code; 3. integrating the two approaches proposed above, in order to have advantages of both approaches without their drawbacks: because of the highly dynamic features of the language, in some cases the exact type information needed to optimize code can only be retrieved at run time; on the other hand, a good static analysis can still produce efficient code in several circumstances so that JIT compilation can be minimized and exploited only for those code fragments for which the AOT compiler was not able to perform effective optimizations. We expected the third approach to encompass the other two, even thought the corresponding required amount of work is clearly much higher, since 3 subsumes both 1 and 2. This report describes how these points have been faced during the year 2008. Current status of the research ------------------------------ The first step was to develop a technique to allow a simple development and integration of an AOT and a JIT compiler for Python. Instead of writing new compilers by scratch, we reused PyPy translation toolchain which, at the beginning of the year, already provided an automatic JIT compiler generator from the the interpreter of Python written in RPython. This solution seemed to be reasonable, because it would allow us to derive in a rather simple way an AOT compiler from the generated JIT compiler; in this way, the AOT compiler could be integrated with the JIT compiler from which it was derived. Unfortunately, the automatically generated JIT compiler was only able to emit CPU machine code and, therefore, was not usable for virtual machines as CLI and JVM. We spent roughly the first half of 2008 to adapt the existing JIT compiler generator to target object oriented virtual machines; then, we spent the second half of the year developing a **CLI backend** for the JIT compiler generator. Unfortunately, these two steps required more time than expected, so it is likely that we will not be able to complete the work planned in the thesis proposal; however, the preliminary results we got from running simple benchmarks (see more details below) are encouraging and seem to prove that the approach 1 proposed above already provides an acceptable solution to the problem we planned to tackle. Because of this positive results, rather than trying to fully investigate the approach 2 and the integration of 1 and 2, we would tend to focus on the promising direction of approach 1. The CLI JIT backend is almost completed now, and all the challenging problems have been resolved. Unfortunately, some minor features are still unimplemented, thus preventing the JIT generator to be applied to the full Python language, but this is a minor implementation detail. As already mentioned, the preliminary results we got are very promising and much better than those expected. Since we still cannot experiment the approach with the full Python language, we decided to implement an interpreter for a toy language with dynamic features similar to those of Python and experiment with the JIT compiler automatically generated from the interpreter of this toy language. We experimented with some example programs in the toy language and measured the execution time of such programs by running the normal interpreter with or without the JIT compiler automatically generated from it. With the JIT compiler it was possible to obtain a **speed-up of up to 500**, which means an execution time 1.5 times slower than the same algorithm written in C# (see the posts available in the official PyPy Blog [Cuni08-1]_, [Cuni08-2]_ and the not yet published [Cuni08-3]_). These results show that a JIT compiler can produce efficient code also for object oriented virtual machines. To our knowledge, this is the first time that JIT compilation have been extensively applied to emit bytecode for virtual machines. This benchmark involves mainly numeric operations, but by construction the JIT compilers generated by PyPy can yield substantial improvements also for other operations; for example, they can easily eliminate the overhead of dynamic method dispatch, and we plan to write new benchmarks to demonstrate that. Unfortunately, sometimes the CLI is **not expressive enough** for our needs, forcing us to emit non optimal code in some cases; the results described above show that this does not prevent our toy language to run very fast, but it is unclear whether this will be true also for the full Python language. However, we are confident that it is possible to find alternate solutions that will allow us to generate much better code at the cost of more time spent during the compilation phase. Analog considerations can be made for the JVM, as its execution model is very similar to the one of the CLI. However, there are a lot of ongoing proposals and works to add new "dynamic" features to the JVM, which we could exploit for our needs. From our point of view, the most interesting feature is the new *invokedynamic* instruction; this is detailed by the *Java Specification Request 292* [JSR-292]_ and implemented by the experimental *Da Vinci Machine* [MLVM]_, whose goal is to provide facilities to implement languages other than Java, and in particular dynamic languages, on the JVM. Advancements of the state of the art -------------------------------------- There has been a lot of research in the area of dynamic languages in the last year. Comparing to the situation described in the proposal, the state of the art changed and there are new projects that needs to be taken into account. In particular, two new implementations of the JavaScript language have been released: Google's *V8* [V8]_ and Mozilla's *TraceMoneky* [TM]_. They are two very fast implementation of JavaScript, and both use a JIT compiler to achieve good performances: - V8's JIT compiler is based on well known techniques, used in an innovative way; in particular, most of its ideas derive from SELF [Chamb89]_, but to our knowledge this is the first time they are applied to a non-academic language; - on the other hand, TraceMoneky exploits innovative techniques ([Gal06]_, [Chang07]_), which seems to give very good results. Contrarily to PyPy, both these JIT compilers are manually written; in particular, in the near future PyPy should be able to produce automatically JIT compilers that are equivalent to the one developed for V8; moreover, other PyPy developers are experimenting to exploit TraceMonkey's ideas, with great results. Overall, it seems that inside the dynamic language community all the major efforts to improve performances are going towards using a JIT compiler. We think that PyPy can play a major role here, as the effort required to manually write such JIT compilers is very high, unlike the case of automatically generated JITs. The fact that a lot of researchers are working on JIT for dynamic languages is an indirect confirmation that this approach is the right one to make dynamic languages faster. On the other hand, to our knowledge all the ongoing research is still focused on producing JITs that emits CPU machine code, so by targeting object oriented virtual machines we are doing research in a new and fairly unexplored area. Moreover, as anticipated by the previous section, the Java community is seeking for enhancements to the JVM to make it a better platform for dynamic languages. These efforts resulted in the launch of the experimental *Da Vinci Machine*, whose new features could be exploited to improve our JIT backends. Being so new, there are no publications on it, but it is possible to find a lot of details on the blog of the main developer ([Rose08-1]_, [Rose08-2]_, [Rose08-3]). Updated workplan ----------------- As we said in the previous sections, developing the first prototype of the compiler generator for CLI took more time than expected; on the other hand, the early results on the JIT front are better than expected, forcing us to adapt the workplan to these new facts. The original goal was to reach performances of the same order of magnitude to those of programs written in C#; we believed that the most promising approach was the hybrid one (as described in the introduction), but it seems that the JIT compiler could already reach that goal alone. However, we cannot know until we apply our JIT compiler generator to the full Python language and measure the results. Taking these consideration into account, we propose the following updated workplan: 1. Finish to implement the missing features of the CLI JIT backend, until it can handle the full Python language; measure the performances we get, and whether they are close enough to the optimal case; 2. If the benchmarks show that the JIT approach has the potential to meet our original goals, do more work on the CLI JIT backend to produce better code in all cases, including the ones that are known to produce inefficient code at the moment; 3. Investigate about the new features developed for the JVM (see previous section) to see if and how they could be useful for our work; depending on the time left, this could mean the development of a new JVM JIT backend or (more likely) the development of some prototype to compare the performances of the old and new techniques; this prototype could be expanded in a fully working JVM JIT backend after the completion of the PhD thesis; 4. Investigate if and how static analysis of source code could still lead to performances improvements; depending on the time left, this point could be expanded only into a theoretical dissertation about the topic, not into a working prototype. Obviously, point (2) depends on the fact that the benchmarks done in point (1) show the expected good results. Otherwise, we will try to switch to the static compilation approach, by turning the JIT compiler generator into a static compiler generator. Structure of the thesis ------------------------- Depending on the actual directions followed during the last year, the topics covered by the thesis could change. However, it is possible to provide a provisional planned structure of the thesis: 1. Introduction 2. Related work and state of the art 3. Automatic generation of JIT compilers 4. Dynamic bytecode generation for object oriented Virtual Machines 5. How static analysis could/could not improve performances 6. Conclusion & future work .. References .. [Chamb89] `An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes`, Craig Chambers, David Ungar, and Elgin Lee, in OOPSLA '89 Conference Proceedings, pp. 49-70, New Orleans, LA, October, 1989. .. [Chang07] `Efficient Just-In-Time Execution of Dynamically Typed Languages Via Code Specialization Using Precise Runtime Type Inference`, Mason Chang, Michael Bebenita, Alexander Yermolovich, Andreas Gal and Michael Franz, Technical Report, 2007. .. [Cuni07] `Towards a More Effective Implementation of Python on Top of Statically Typed Virtual Machines` -- PhD Thesis Proposal, Antonio Cuni, 2007. .. [Cuni08-1] `Porting the JIT to CLI (Part 1)`, Antonio Cuni, http://morepypy.blogspot.com/2008/11/porting-jit-to-cli-part-1.html .. [Cuni08-2] `Porting the JIT to CLI (Part 2)`, Antonio Cuni, http://morepypy.blogspot.com/2008/11/porting-jit-to-cli-part-2.html .. [Cuni08-3] `Porting the JIT to CLI (Part 3)`, Antonio Cuni, http://morepypy.blogspot.com/2008/11/porting-jit-to-cli-part-3.html .. [Gal06] `Incremental Dynamic Code Generation with Trace Trees`, Andreas Gal and Michael Franz, Technical Report, 2006. .. [JSR-292] `JSR 292: Supporting Dynamically Typed Languages on the Java Platform`, http://jcp.org/en/jsr/detail?id=292 .. [MLVM] `Da Vinci Machine Project`, http://www.openjdk.org/projects/mlvm/ .. [Rose08-1] `Anonymous Classes in the VM`, http://blogs.sun.com/jrose/entry/anonymous_classes_in_the_vm .. [Rose08-2] `Method Handles in a Nutshell`, http://blogs.sun.com/jrose/entry/method_handles_in_a_nutshell .. [Rose08-3] `Dynamic Invocation in the VM`, http://blogs.sun.com/jrose/entry/dynamic_invocation_in_the_vm .. [TM] `TraceMoneky`, https://wiki.mozilla.org/JavaScript:TraceMonkey .. [V8] `V8 JavaScript Engine`, http://code.google.com/apis/v8/design.html