Introduction ------------ Interpretation is a common implementation technique for a wide range of programming languages whose primary objective is not efficiency. Interpreters are well-suited for small-to-medium languages, favouring a simple and clear engine. XXX aren't there "large" interpreted languages? XXX what is a large language? example? XXX armin et.al..: isn't the key in the following para that today's language design is "monolithic" and not configurable, i.e. all bigger programming languages are not designed in a modular way and thus give the user no choice which components/aspects they want to use for their target application? But that today's applications are to be developed for an increasing variety of devices and environments (Big-Iron, cars, watches, mobile phones, banks, whatever). What about a security-goal btw? XXX nico: modularity in langage is a very interesting feature. see below. As a language becomes popular, however, the costs inherent to this approach are increasingly pointed out. These costs are twofold. Firstly, no matter how much effort is made to optimize the core components, the remaining interpretative overhead is large, both in term of processor cycles and memory consumption. Secondly, interpreters are notoriously difficult to factor into relatively independent components, which makes structural changes to a large interpreter cumbersome. While the efficiency costs are well-known, the costs inherent to the lack flexibility [LHJ95]_ [H98]_ are often underestimated by the community of language users. In practice, what often occurs for newly popular languages is that the user base creates pressure about the efficiency issue and motivates a group of developers to write a native Just In Time (JIT) compiler, adding a significant amount of code and complexity and further impairing the flexibility. [S03]_ The Flexibility Goal -------------------- The flexibility problem is pernicious. It affects almost all aspects of the language implementation: * user pressure tends to shape interpreters towards performance at the expense of other aspects, like small footprints and code simplicity. * there are a lot of other trade-offs; experimenting with different solutions can be close to impossible for aspects that affect the interpreter globally. These aspects are usually frozen in early design decisions (memory management, object model, multithreading,...). * optimisation drives the interpreter code base, an error-prone and scope-limiting process. It is impossible to reuse the code base for implementation on new, non-C platforms; you need to start from scratch. Moreover, conflicting with performance and manageable complexity, there is a large number of aspects and features that users would like to see integrated in the interpreter (or would even do so themselves, if it were possible at all). These features are well-established as middleware tools that can be accessed from common languages, but would benefit a lot from direct support from the language implementation -- which they generally only have in a couple of experimental languages. Some examples: * distributed execution (SMP or Networked) * persistency * FIXME list our application-level goals here * XXX nico: could talk about security, logic programming, aspect-oriented programming, etc. The first major goal of the PyPy project is to produce an interpreter with flexibility in mind, and put this flexibility to good use. More specifically, our objectives are: 1. to create a implementation of the Python language in Python. 2. to implement a subset of the above-mentioned features (FIXME which ones?) as an immediately useful proof-of-concept. 3. to allow third-parties to implement any other interpreter-level feature more easily than they would in a traditional interpreter. The main building piece behind the proposed flexibility is -- besides using a high-level language in the first place -- the concept of Object Space, which captures at the level of the interpreter the semantics of individual operations between objects. The interpreter's main dispatch loop handles control flow and supporting data structures (frame stack, bytecode objects...); each individual operation is dispatched to the object space. Object spaces are pluggable, which means that we can very easily replace an object space implementing the normal Python semantics with another one doing different things, like accessing remote objects over a network, keeping logs, measuring performance or doing persistence -- all things that are better implemented in the interpreter itself, so that they can be fully transparent to the user of the language. More theoretically, replacing an object space with another one performing abstract operations turns our interpreter into an abstract interpreter at no effort. Such a use will be shown below. The Performance Goal, Phase 1 ----------------------------- Flexibility is often associated with poor performances, although theoretically, high-level languages are more expressive and thus are open to a wider range of optimization than low-level languages, and should then outperform them. Of course this trend is not exactly confirmed in practice so far; the only examples that one could give are derived from bad or buggy low-level algorithms being repeatedly reimplemented in C programs while higher-level languages natively provide the correct algorithms. Also note that languages can be static or dynamic independently of being high or low-level: high-level expressive languages can be static and efficient [OCAML] while inexpressive ones can be dynamic and much less efficient [PROLOG]. The PyPy project -- phase one -- aims to write a Python interpreter using Python itself as the implementation language for flexibility, with the additional constraint that core parts should be written in a more static subset of the language, a restriction that opens the door to static analysis and translation to a low-level language like C. This approach has already been taken, e.g. for the Scheme [K97]_ and Squeak [IKM97]_ language, but we have an original approach to the translation process that we will describe later. The net result will be an interpreter whose performance is comparable to today's C-based implementation, but more flexible. The translation is not a one-shot: the maintained source is the Python one. Moreover, aspects that were design decisions in the current C Python are now merely customizable behaviour of the translator. In other words, the translator is not merely a restricted-Python-to-C translator; it is an essential piece towards the flexibility goals of the previous section. Aspects like memory layout of objects are not defined by the PyPy Python source, but derived from it by the translator with the support of a small run-time system written in C. It is thus expected that the translator should be a highly configurable tool, with pluggable components, that will allow us (and third-parties) to produce customised C versions of PyPy with tailored trade-offs. Moreover, the exact set of Python source that the translator will be applied to can be varied as well, producing C versions with different capabilities. Thus, our objectives are: 4. to produce several different C versions for different identified usages of Python; 5. to allow third-parties to produce customised C version easily; 6. to adapt the translator to target other non-C platforms (Java and/or .NET and/or other). Moreover, we will also consider targetting a virtual machine that already has a high-performance JIT compiler, possibly Self (see below). JIT compilers ------------- JIT compilers have been reasonably well studied; an account of their history is given in [A03]_ . But actually writing a JIT for a given language is generally a major task. Different techniques to ease this path have been recently explored; let us cite: * to implement a dynamic language in a flexible way, it can be written on top of another one, e.g. as a dynamic translator that produces bytecodes for an existing virtual machine. If the virtual machine is already equipped with state-of-the-art JIT compilation, it is possible to leverage its benefits to the new language. This path is not only much shorter than designing a complete custom JIT, but it is also easier to maintain and evolve. This idea is explored for the Self virtual machine in [WAU99]_ . As pointed out in that paper, some source language features may not match any of the target virtual machine features. When this issue araises, we are left with the hard problem of refactoring an efficient JIT-based virtual machine. XXX state-of-the-art in Self: [HCU91]_ [HU94]_ . * a completely different approach: making it easier to derive a JIT from an existing C interpreter. DynamoRIO instrumentates the execution of compiled programs and optimizes them dynamically. It has been extended with specific support for interpreters. With minimal amounts of changes in the source of an interpreter, it can significantly reduce the processor-time interpretative overhead [S03]_ . While this offers a highly competitive gain/effort ratio, performance does not reach the levels of a hand-crafted JIT. The Performance Goal, Phase 2 ----------------------------- Building upon phase 1, we aim to add a JIT compiler to PyPy. This will be done using a different technique than described in the previous paragraph: our goal is to *generate* the JIT compiler from the high-level Python source of PyPy. This will be accomplished by integrating the technology developed in Psyco [R03]_ , a prototype JIT for the Python programming language. XXX [APJ03]_ It is more precisely a specializing JIT compiler based on abstract interpretation. Although Psyco was hand-written, large parts have a one-to-one correspondence to whole sections of the C Python interpreter. This is uncommon for JITs, but Psyco's good results give ample proof-of-concept. Moreover, this property can be explicitely related to the DynamoRIO project cited above, which instrumentates the interpreter itself. Indeed, the one-to-one correspondence between parts of C Python and Psyco is as follows: to each expression in C Python corresponds a more complex expression in Psyco, which does instrumentation and optimisations. The difference with DynamoRIO is that the latter analyses the machine code resulting from the compilation of the interpreter at run-time. In Psyco, the one-to-one correspondance is with the C source instead. It could certainly have been done by automated analysis of the C source of C Python, but this is difficult and error-prone. In the PyPy project, on the other hand, the source code of the interpreter will be written in Python instead of C. This is much easier to analyse safely. Thus, the plan is to raise the level further and rely on yet another customisation of the translator, to let it emit the complex Psyco-style expressions instead of (or in addition to) the normal C expressions that would be the direct translation. The last objective is thus: 7. to provide a flexible Python interpreter running at JIT speed. The Translator -------------- Finally, let us further demonstrate the benefits of writing an interpreter flexibly in a high-level language. So far, the translator sounds like a magic piece of difficult-to-write software. This is not so for analysing pieces of Python source code is not a very complex task. Some type inference is needed, but it can be kept simple by suitably restricting the amount of allowed dynamism. XXX rewrite all this stuff However, instead of analysing source code, we propose another approach: doing the type inference by abstract interpretation. This is essentially the way Psyco works, albeit it does it at run-time. [FIXME: has it already been done elsewhere?] [otherwise explain in a FOOTNOTE: the essential idea is to "interpret" the Python source code that we want to translate; all manipulated objects are abstracted into their type only. Operations between two such "abstract objects" return a new "abstract object", which represents the type of the result.] As we have seen above, abstract interpretation is easily achieved by plugging a custom object space into the PyPy code base. The translator can be built as an extension of this "type inference" object space, emitting translated code as a side-effect along the analysis. Thus the translator is nothing more than PyPy itself with a custom object space: when you use this special version of PyPy to "interpret" a Python function, it becomes in fact translated... In summary, to translate PyPy itself to C, we will be using PyPy! References: .. [S03] Gregory Sullivan et. al., "Dynamic Native Optimization of Native Interpreters". IVME 03. 2003. http://www.ai.mit.edu/~gregs/dynamorio.html .. file = ivme03.pdf .. [LHJ95] Sheng Liang, Paul Hudak and Mark Jones. "Monad Transformers and Modular Interpreters". 22nd ACM Symposium on Principles of Programming Languages (POPL'95). January 1995. http://java.sun.com/people/sl/papers/popl95.ps.gz .. file = popl95.ps.gz .. [H98] Paul Hudak. "Modular Domain Specific Languages and Tools". ICSR 98. 1998. http://haskell.org/frp/dsl.ps .. file = dsl.ps.gz .. [K97] Richard Kelsey, "Pre-Scheme: A Scheme Dialect for Systems Programming". 1997. http://citeseer.nj.nec.com/kelsey97prescheme.html .. file = kelsey97prescheme.pdf .. [IKM97] Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace, and Alan Kay. "Back to the future: The story of Squeak, A practical Smalltalk written in itself." In Proceedings OOPSLA'97, pages 318--326, November 1997. ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html .. file = OOPSLA.Squeak.htm .. [A03] John Aycock, "A Brief History of Just-In-Time", ACM Computing Surveys 35, 2 (June 2003), pp. 97-113. .. file = jit-history.pdf .. [WAU99] Mario Wolczko, Ole Agesen, David Ungar, "Towards a Universal Implementation Substrate for Object-Oriented Languages", OOPSLA 99 workshop on Simplicity, Performance and Portability in Virtual Machine Design, OOPSLA '99, Denver, CO, Nov 2, 1999. http://research.sun.com/research/kanban/oopsla-vm-wkshp.pdf .. file = oopsla-vm-wkshp.pdf .. [R03] Armin Rigo, http://psyco.sourceforge.net .. file = psycoguide.ps.gz .. [APJ03] John Aycock and David Pereira and Georges Jodoin, "UCPy: Reverse-Engineering Python", PyCon DC 2003, 9pp. http://pages.cpsc.ucalgary.ca/~aycock/papers/ucpy.pdf .. file = ucpy-reverse-engineering-python.pdf .. [HCU91] Urs Hölzle, Craig Chambers, and David Ungar, "Optimizing Dynamically-Typed Object-Oriented Languages with Polymorphic Inline Caches", ECOOP'91 Conference Proceedings, Geneva, 1991. Published as Springer Verlag Lecture Notes in Computer Science 512, Springer Verlag, Berlin, 1991. http://self.sunlabs.com/papers/ecoop91.ps.Z .. file = hlzle91optimizing.ps.gz .. [HU94] Urs Hölzle, David Ungar, "Reconciling Responsiveness with Performance in Pure Object-Oriented Languages", PLDI '94 and OOPSLA '94 http://www.cs.ucsb.edu/oocsb/papers/toplas96.pdf/reconciling-responsiveness-with-performance.pdf .. file = reconciling-responsiveness-with-performance.pdf .. [KLM97] Gregor Kiczales and John Lamping and Anurag Menhdhekar and Chris Maeda and Cristina Lopes and Jean-Marc Loingtier and John Irwin, "Aspect-Oriented Programming", ECOOP'97 http://www2.parc.com/csl/groups/sda/publications/papers/Kiczales-ECOOP97/for-web.pdf .. file = kiczales97aspectoriented.pdf