==========================
Draft of a PyPy work plan
==========================

1. The !PyPy Interpreter
---------------------------

The goal is to make a complete Python interpreter that runs under any
existing Python implementation.

a) develop and complete the !PyPy interpreter itself, as a regular Python
   program, until it contains all the parts of CPython that we don't want
   to move to b). Further investigate the unorthodox multimethod concepts
   that the standard object space is based on, and how to hook in the
   bytecode compiler.

b) translate all other parts of !CPython into regular Python libraries.
   These should also work without !PyPy, being just plain-Python
   replacements for existing !CPython functionality. This includes the
   bytecode compiler.

2. Translation of !RPython
------------------------------

The goal is to be able to translate arbitrary RPython source code (e.g. the
code produced in 1a) into low-level code (C, Pyrex, Java, others). This
includes making a stand-alone tool, not tied to !PyPy, for general
optimization of arbitrary but suitably restricted Python applications or
parts thereof.

a) analyse code to produce the relevant typing information. Investigate
   whether we can use the annotation object space only or whether
   additional AST-based control flow analysis is needed. Give a formal
   definition of RPython.

b) produce low-level code out of the data gathered in (a). Again
   investigate how this is best done (AST-guided translation or
   reverse-engineering of the low-level control flow gathered by the
   annotation object space). Compare the different low-level environments
   that we could target (C, Pyrex, others?).

3. Bootstrapping !PyPy
--------------------------

The goal is to put (1) and (2) together.

a) investigate the particular problems specific to the global translation
   of !PyPy, as opposed to those common to any !RPython program. According
   to the requirements and insights of (2) we will probably have to
   redesign specific parts of !PyPy, e.g.
   make the various app-level/interp-level interface designs converge.

b) build the low-level-specific run-time components of !PyPy, most notably
   the object layout, the memory management, possibly threading support,
   and multimethod dispatch. Here, if we target C code, important parts
   can be directly re-used from !CPython.

4. High-performance !PyPy-Python
-----------------------------------

The goal is to optimize the result of `3. Bootstrapping !PyPy`_, possibly
in various ways, building on its flexibility to go beyond !CPython.

a) develop several object implementations for the same types, as
   explicitly allowed by the standard object space, and develop heuristics
   to switch between implementations during execution.

b) identify which optimizations would benefit from support from the
   translator (2). These are the optimizations not easily available to
   !CPython because they would require large-scale code rewrites.

c) for each issue, work on several solutions when none is obviously better
   than the others. The meta-programming underlying (b) -- namely the work
   on the translator instead of on the resulting code -- is what gives us
   the possibility of actually implementing several very different schemes.

d) integrate existing technology that traditionally depended on closely
   following !CPython's code base, notably Psyco and Stackless. Rewrite
   each one as a meta-component that hooks into the translator (2) plus a
   dedicated run-time component (3b). Further develop these technologies
   based on the results gathered in (c), e.g. identify when these
   technologies would guide specific choices among the solutions developed
   in (a) and (b).

Annex to (a)
~~~~~~~~~~~~

Some major uses for several implementations of the built-in types:

* dictionaries as hash-table vs. plain (key, value) lists vs. b-trees, or
  with string-only or integer-only keys. Dictionaries with specific support
  for "on-change" callbacks (useful for Psyco).

* strings as plain immutable memory buffers vs.
  immutable but more complex data structures (see functional languages) vs.
  internally mutable data structures (e.g. Psyco's concatenated strings)

* ints as machine words vs. two machine words vs. internal longs vs.
  external bignum library (investigate whether completely unifying ints and
  longs is possible in the Python language at this stage).

* etc. (lists as range() or linked lists, ...)

The above are mostly independent of any particular low-level run-time
environment.

Annex to (b)
~~~~~~~~~~~~

Here are some of the main issues and tricks. Note that compatibility with
legacy C extensions can be achieved by choosing, for each of the following
issues, the same solution that !CPython chose.

* object layout and memory management strategy or strategies, e.g.
  reference counting vs. Boehm garbage collection vs. our own. Includes
  speed vs. data size trade-offs.

* code size vs. speed trade-offs (e.g. whether the final interpreter should
  still include compact precompiled bytecode or be completely translated
  into C).

* the complex issue of threading (global interpreter lock vs.
  alternatives).

* multimethod dispatching

* pointer tagging, e.g. encoding an integer object as a pointer with a
  special value instead of a real pointer to a data structure representing
  the integer.

The above are mostly specific to a particular low-level run-time.

5. Low-level targets, tools and releases
--------------------------------------------

The goal is to identify, among those low-level targets that are in
widespread use (e.g. workstation usage vs. web server vs. high-performance
computing vs. memory-starved hand-held device; C/Unix vs. Java vs. .NET
environment), which ones would benefit most from a high-performance Python
interpreter. For each of these, focus will be given to:

a) develop the translation process, run-time and those optimizations that
   depend on low-level details.

b) design interfaces for extension modules. Some can be very general (e.g.
   a pure Python one that should allow generic third-party code to hook
   into the !PyPy interpreter source code without worrying about the
   translation process). Others depend on the low-level environment and on
   the choices made for the issues of (4).

c) combine different solutions for the different issues discussed in (4).
   Gather statistics with real-world Python applications. Compare the
   results. This is where the flexibility of the whole project is vital.
   Typically, very different trade-offs need to be made on different
   environments.

d) most importantly, develop tools that allow third parties to easily
   repeat (c) in their own domain and build their own tailored versions of
   !PyPy.

e) release a few official versions pre-tailored for various common
   environments. Develop in particular a version whose goal is to simulate
   the existing !CPython interpreter to support legacy extension modules.
   Investigate whether the !PyPy core can make internal choices that are
   very different from !CPython's without sacrificing compatibility with
   legacy extension modules.

6. Infrastructure
---------------------

The goal is to address the development and maintenance issues.

a) !PyPy's own development needs an infrastructure that must continuously
   be kept up-to-date and further developed.

b) write tests. All parts of !PyPy should be extensively covered by stress
   tests. Investigate the use of test-coverage analysers.

c) investigate means of keeping !PyPy in sync with the future developments
   of !CPython, e.g. ways to relate pieces of !PyPy source and pieces of
   !CPython source. Look for existing solutions.

7. Extension of !PyPy
-------------------------

The goal is to add functionality to !PyPy that is not present in existing
Python implementations. This is an open goal. We only list a few promising
directions:

a) build alternate object spaces that provide features that are
   essentially language-transparent, e.g. distributed computing (via a
   network proxy object space), compatibility layers (e.g.
   a Python-1.5.2-compliant object space), persistence (via a persistent
   object space).

b) build language features that rely on translator support (2), i.e. which
   can be turned on or off during the production of individual versions of
   !PyPy, e.g. Stackless and continuations.

c) work on the interaction between the compiler and the main loop to allow
   custom opcodes to be defined, generated by the compiler, and interpreted
   by the main loop, thus allowing syntactic extension of the language by
   user code.

d) conversely, develop interfaces to use object spaces without the main
   loop to provide Python-like object semantics to other programming
   languages, using their own syntax and execution environment, e.g. Java.
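
To make items (a) and (d) of this section more concrete, here is a minimal sketch of what an object space and an alternate space layered on top of it could look like. All names (``W_Object``, ``TrivialObjSpace``, ``LoggingObjSpace``, ``sum_first_two``) are hypothetical and chosen for illustration only; this is not the interface of the standard object space, and the operation set is deliberately tiny.

```python
# Hypothetical sketch: these classes are NOT PyPy's actual API.

class W_Object:
    """A wrapped value, opaque to the code that uses the space."""

    def __init__(self, value):
        self.value = value


class TrivialObjSpace:
    """Implements each operation by delegating to the host Python."""

    def wrap(self, value):
        return W_Object(value)

    def unwrap(self, w_obj):
        return w_obj.value

    def add(self, w_a, w_b):
        return W_Object(w_a.value + w_b.value)

    def getitem(self, w_container, w_key):
        return W_Object(w_container.value[w_key.value])


class LoggingObjSpace:
    """Alternate space: forwards every operation to another space and
    records its name. A network proxy space could forward operations
    in a similar way (assumption, not shown here)."""

    def __init__(self, space):
        self.space = space
        self.log = []

    def __getattr__(self, name):
        # Only called for names not found on the instance itself,
        # i.e. for the operations of the underlying space.
        operation = getattr(self.space, name)

        def traced(*args):
            self.log.append(name)
            return operation(*args)

        return traced


def sum_first_two(space, w_seq):
    """Client code written purely against the space interface,
    with no bytecode main loop involved."""
    w_first = space.getitem(w_seq, space.wrap(0))
    w_second = space.getitem(w_seq, space.wrap(1))
    return space.add(w_first, w_second)
```

Because ``sum_first_two`` only talks to the space, swapping ``TrivialObjSpace()`` for ``LoggingObjSpace(TrivialObjSpace())`` changes what happens behind each operation without touching the client code, which is the sense in which such features are language-transparent.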