High performance implementation of Python for CLI/.NET with JIT compiler generator for dynamic languages.
=========================================================================================================

Introduction
------------

- Dynamic languages are popular, in particular Python, etc. etc.

- .NET/JVM are very important nowadays, but have not been designed to run
  dynamic languages

- Writing a simple implementation is easy, but it's going to be slow

- Writing an efficient implementation is hard and time consuming, and the
  result is going to be hard to maintain

- We try to take the best of both worlds


The problem
-----------

- Is Python intrinsically slow?

- Why is Python hard to implement efficiently?

  * dynamic typing

  * dynamic lookup

  * long integers

  * the-world-could-change-under-your-feet problem

  * etc. etc.

- Interpreters vs. compilers: limits of static analysis for Python

- Existing solutions

  * fast interpreters (e.g., direct threading techniques --> Forth, V8)

  * JIT compilers: SELF, Psyco, TraceMonkey

- Our solution: (automatically generated) JIT compiler


Enter PyPy
----------

- What is PyPy?

- Brief history

- Why is it useful for our goals?

- Interpreter vs. translation toolchain (hopefully we'll have a proper name
  for the latter soon :-))

- JIT compiler generator

  * ??? Brief history of various JIT generations?

- .NET backend


Characterization of the target platform
---------------------------------------

- Why CLI is a good target

- Why CLI is **not** a good target

- JIT layering

- Different implementations: CLR and Mono, each one with tons of different
  versions

- During the development, we made some implicit assumptions about the
  performance characteristics of the VM:

  * I'm *sure* I made some implicit assumptions, but I've not
    reverse-engineered my mind yet :-)

- It's hard to determine which features are fast and which are slow

  * We wrote a set of microbenchmarks, one for each important feature we use.
    For each microbenchmark, we programmatically check that the resulting
    performance meets the expectations we had when we wrote the backend.

  * When a new CLI implementation/version appears, we can easily check that
    our assumptions are still valid

- The set of implicit assumptions + the set of explicit microbenchmarks
  define our target machine. While the concrete result of the thesis is
  bound to a specific software tool, our research is generally applicable to
  all the platforms that meet (all/part of) our definition, e.g., the JVM


The failed attempt
------------------

??? I'm not sure if we should talk about this, or just kill this chapter and
pass directly to the tracing JIT

- 2nd generation of PyPy JIT

- Psyco-style, with promotion and flexswitches

- Nice proof-of-concept, but had some problems

  * x86 backend did a poor job because it sees only small pieces of code at a
    time, and cannot optimize across them

  * .NET backend does not have this problem because the code is JITted again
    by the VM: first hint/proof that JIT layering works

  * Unfortunately, it's impossible to implement flexswitches efficiently in
    .NET

  * Worked very well with a toy language, not with Python (actually, we never
    managed to compile pypy-cli-jit with this backend: how could I justify
    that I suddenly switched to another approach without trying?)

  * Cite my ICOOLPS paper


Introduction to tracing JITs
----------------------------

- 5th generation of PyPy JIT, inspired by TraceMonkey et al.

- Explain how a tracing JIT works

- etc. etc.


The PyPy JIT compiler generator
-------------------------------

- how it works

- translation time vs. compile time vs. runtime

- lower level than other existing tracing JITs: we trace the interpreter,
  not the user program (cite

- how to detect user-level hot loops: cite cfbolz et al. ICOOLPS paper

- ``hints`` required by the JIT generator

- impact on the backends:

  * they see one loop at a time: better chance to generate good code

  * we still need to emit jumps to code that does not exist yet. For CLI,
    it's as hard as flexswitches, but less critical for performance


The CLI JIT backend
-------------------

- boring description of how it works


Benchmarks
----------

- ??? What do we compare against? pypy-cli without JIT, CPython, IronPython?


Conclusion, related work, future work
-------------------------------------

- My contributions:

  - JIT layering works

  - current VMs lack features that would help a tracing JIT a lot, so they
    are sub-optimal

  - nevertheless, we have very good performance against IronPython, proving
    that the approach is interesting
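
Note on the microbenchmark checks from "Characterization of the target
platform": a minimal sketch of what "programmatically check that the
performance meets the expectations" could look like. This is only an
illustration written in plain Python; the real checks would have to run on
the CLI implementation under test. The names ``measure`` and
``check_assumption``, the ratio-based formulation and the threshold in the
usage example are all assumptions made for this sketch, not part of the
actual backend.

```python
import time


def measure(func, repeat=5, number=10_000):
    """Best wall-clock time (seconds) over `repeat` runs of `number` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            func()
        best = min(best, time.perf_counter() - start)
    return best


def check_assumption(feature, baseline, max_ratio, measure=measure):
    """Return (ok, ratio).

    `ok` is True when `feature` costs at most `max_ratio` times as much as
    `baseline`. The `measure` argument is injectable so that the check
    itself can be tested with a deterministic fake timer.
    """
    feature_time = measure(feature)
    # Guard against a zero baseline on very coarse clocks.
    baseline_time = max(measure(baseline), 1e-12)
    ratio = feature_time / baseline_time
    return ratio <= max_ratio, ratio


if __name__ == "__main__":
    # Hypothetical assumption: a dictionary lookup costs at most 20 times
    # as much as a call that does nothing. The threshold is invented.
    table = {"key": 1}
    ok, ratio = check_assumption(lambda: table["key"], lambda: None, 20.0)
    print("assumption holds" if ok else "assumption violated", ratio)
```

Expressing each assumption as a ratio against a baseline, rather than as an
absolute time, is one way to keep the same checks meaningful across machines
and across CLR/Mono versions.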