.. include:: crossreferences.asc

Workplan Introduction
===============================

The PyPy project addresses a number of topics, which can be grouped into
the following work packages. Each group is further broken down into tasks
as necessary.

- Development of PyPy itself, as a Python written in Python. This includes
  code generation for various targets, re-implementation of all builtin
  Python objects and some extension modules, the development of several
  object spaces, testing and documentation.

  - The PyPy Interpreter
  - The PyPy Compiler
  - Bootstrapping PyPy
  - High-performance PyPy-Python

- Validation of PyPy in real applications. Generating down-sized code for
  embedded systems, load balancing in a distributed network, code
  generators optimized for number crunching on some processor
  architectures, and re-writing numerical Python packages should
  demonstrate that PyPy is well suited to industrial-strength
  applications.

  - Supporting Embedded Devices
  - Load Balancing in a Multi-Processor environment
  - Numerical Applications

- Infrastructure tasks carried out throughout the whole project

  - Coordination and Management
  - Project Documentation and Dissemination
  - Maintenance of the development environment
  - Synchronisation with Standard Python

WP10_: Coordination and Management
----------------------------------------

WP10_ continues throughout the duration of the project and is carried out
by the project coordinator. It involves collecting and monitoring monthly
status reports, reporting to the EU, organising meetings and maintaining
an internal web site. The web site will make heavy use of collaborative
tools such as Wiki pages, a content management system, and issue trackers.

WP20_: The PyPy Interpreter
----------------------------------------

The goal is to make a complete Python interpreter that runs under any
existing Python implementation.

WP21_: Development and Completion of the PyPy Interpreter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PyPy interpreter itself should be developed and completed as a regular
Python/RPython program. This package includes all parts of CPython that we
don't want to move to WP22_. Further investigation is needed concerning
the multimethod concepts of the standard object space, and how to hook in
the bytecode compiler.

WP22_: Porting CPython C source code to regular Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Port all parts of CPython which we don't want to implement in WP21_ into
regular Python libraries. These should also work without PyPy, being just
plain-Python replacements for existing CPython functionality. This
includes the bytecode compiler, which definitely should become a regular
Python program instead of being built into the interpreter.

WP30_: The PyPy Compiler
----------------------------------------

WP31_: Translation of RPython
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RPython is a restricted version of Python, which is much more statically
defined than standard Python. It allows writing code that is rather easily
translated into something else, like C source, Pyrex code, or even passed
to an assembly code generator. The definition of RPython is slightly in
flux and should be adjusted during WP31_.

The goal is to be able to translate arbitrary RPython source code (e.g.
the one produced in WP21_) into low-level code (C, Pyrex, Java, others).
This includes making a stand-alone, not-PyPy-related tool for general
optimisation of arbitrary but suitably restricted Python applications or
parts thereof.

Bootstrapping PyPy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The goal is to put the interpreter (WP21_, WP22_) and the translator
(WP31_) together. The interpreter is written as an RPython program. The
translator has to translate this program into some low-level language. The
resulting program then needs to be supported by a special runtime library.
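
As a rough illustration of the translation step described above, the
following sketch turns one tiny, restricted Python function into C source.
It is not the PyPy translator: the helper names ``expr_to_c`` and
``function_to_c`` are hypothetical, and the sketch simply assumes that
every argument and result is a C ``long``, which stands in for the type
information a real translator would have to infer.

```python
import ast

# Hypothetical toy translator, NOT the PyPy translator. It accepts one
# function whose body is a single return of an integer expression and
# assumes every argument and the result map to a C "long".
C_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_c(node):
    """Translate a restricted expression tree into a C expression."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPERATORS:
        left = expr_to_c(node.left)
        right = expr_to_c(node.right)
        return "(%s %s %s)" % (left, C_OPERATORS[type(node.op)], right)
    # Anything else is outside the restricted subset of this sketch.
    raise TypeError("construct outside the restricted subset: %s"
                    % type(node).__name__)


def function_to_c(source):
    """Translate a module that contains exactly one simple function."""
    module = ast.parse(source)
    if len(module.body) != 1 or not isinstance(module.body[0], ast.FunctionDef):
        raise TypeError("expected exactly one function definition")
    func = module.body[0]
    if len(func.body) != 1 or not isinstance(func.body[0], ast.Return):
        raise TypeError("expected a body consisting of one return statement")
    args = ", ".join("long %s" % a.arg for a in func.args.args)
    body = expr_to_c(func.body[0].value)
    return "long %s(%s) { return %s; }" % (func.name, args, body)


SOURCE = "def scale(a, b):\n    return a + b * 2\n"
C_SOURCE = function_to_c(SOURCE)
print(C_SOURCE)  # long scale(long a, long b) { return (a + (b * 2)); }
```

The sketch rejects anything it cannot map directly (for example ``/``),
which mirrors the idea that only suitably restricted code is translatable.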

The work-flow of WP32_ is repetitive, since it will not be possible to
"get it right" on the first attempt. Analysis and redesign will have to be
repeated until the result is satisfactory.

WP32_: Specific Analysis and Redesign
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The global translation of PyPy is going to raise particular problems that
more general RPython programs do not. Since translation of RPython is the
core idea of the bootstrap process and the main target of the translator,
we need to investigate and isolate these problems, and redesign specific
parts of PyPy to better support translation, code generation and
optimisation. This will also include re-iterating the interface design
between application level and interpreter level until we reach overall
convergence.

WP33_: Low Level PyPy Runtime
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to give a working environment to the translated RPython program,
we need to build the low-level-specific **run-time** components of PyPy.
Most notable are the object layout, the memory management, and possibly
threading support, as well as an efficient multimethod dispatch. The
target language has not yet been decided. We might pursue several paths at
the same time. If producing C code is a target, important parts can be
re-used directly from CPython.

WP40_: High-performance PyPy-Python
----------------------------------------

The goal is to optimize the bootstrapped PyPy in various ways, building on
its flexibility to go beyond CPython. The main lack of flexibility in
CPython stems from the fact that all structures are hard-coded in C, and
there is no abstraction layer. PyPy does provide this abstraction layer,
since its RPython implementation is not meant to be executed directly, but
goes through a code generator which produces the actual machine code. This
layer is highly configurable.

Integration of Existing Technology
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are existing projects, notably Psyco_ and Stackless_, which have
traditionally depended on closely following CPython's code base. Both will
be rewritten as a meta-component that hooks into the translator plus a
dedicated run-time component (WP33_). As a side effect, after successful
completion of the PyPy project, they will no longer need to exist as
stand-alone projects.

WP41_: Integration of _`Stackless`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Stackless Python has implemented high-speed multitasking in a single
thread for CPython in two different ways: continuation passing and stack
switching. Both ways of task switching can be integrated into PyPy.
Furthermore, pickling of running programs has been implemented in
Stackless Python and should enable PyPy to perform load balancing between
different machines.

WP42_: Several Object Implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the standard object space allows coexistence of different
implementations of the same type, we can develop **several object
implementations**. We will develop heuristics to switch between different
implementations during execution. The goal is to study the efficiency of
different approaches, with the possibility of changing the default
implementation in favor of a different one not known in CPython. Some
object layouts will continue to exist in parallel if their efficiency is
highly application-dependent.

WP43_: Translator Optimisations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We need to identify which **optimisations** would benefit from support
from the translator. These are the optimisations not easily available to
CPython because they would require large-scale code rewrites. This
includes design considerations such as whether to use reference counting
together with garbage collection, or garbage collection only.

The meta-programming underlying WP43_ -- namely the work on the translator
instead of on the resulting code -- is what gives us the possibility of
actually implementing several very different schemes. The outcome of this
effort is of course not unique, but depends on the specific target of the
optimisation.

There will be at least two efforts at the same time:

- optimisation towards maximum performance of an integrated PyPy system
- optimisation towards minimal code size for embedded systems.

WP44_: Integration of _`Psyco`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Psyco provides techniques to increase the performance of CPython by
generating specialized machine code at run-time. Developing the C version
of Psyco has shown that more flexibility would be of paramount importance
to the project. All prior knowledge of the Psyco project will thus be
integrated into PyPy, as Python and RPython code, enabling yet more
efficient optimisations and allowing new processor architectures to be
targeted.

WP50_: Validation of PyPy
----------------------------------------

The goal is to validate both the advances over the state of the art and
their real-world applicability.

Supporting Embedded Devices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Embedded devices are often limited in processor speed and memory size,
which either limits the power of software that is implemented for these
platforms, or enforces the use of low-level approaches like C/C++ or Java.
PyPy is especially suited to support such platforms, since it can produce
highly efficient, compact code while retaining the abstraction and
ease-of-use of Python.

WP51_: Porting PyPy to an Embedded Device
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Depending on the actual processor architecture, PyPy's code generator
needs an extra platform-specific support module. Interfaces to the
necessary device drivers are needed as well, and it makes sense to develop
a PyPy simulator for the target platform.

WP52_: Evaluation whether to do a small OS in PyPy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Implementing a small operating system in PyPy raises some new questions
and opens a new category of problems. One of them is the possible need to
write an IP stack in PyPy, and a number of device drivers as well. In this
short task, we will figure out whether it is feasible to carry on with the
next task.

WP53_: A small Operating System for an Embedded Device
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*we are thinking of a printer here, but don't have facts yet*

As an extension of WP51_, it makes sense to write the whole operating
system in PyPy. We need a low-level extension to the code generator that
allows access to I/O ports. Interrupts need to be supported as well, along
with primitive access to persistent storage. Ideally, we can create a
single-threaded PyPy OS with a prioritized scheduler that runs the OS
parts together with multiple Python applications. The latter depends
heavily on WP41_.

Numerical Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyPy is extremely flexible, since it supports static and dynamic code
generation through pluggable processor architecture modules. PyPy can use
special hardware by itself, or it can cross-compile code for target
machines which will run only a downsized, runtime-only PyPy instance.

One advantage of using PyPy instead of other compiled libraries is its
ability not only to support the special hardware, but also to adjust to
the given machine properties, like cache sizes, number of parallel FPUs,
memory access speed and memory size. PyPy is able to probe these
parameters and to choose an optimum implementation for the particular
hardware configuration at startup time.
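
The startup-time selection described above could be sketched as follows.
This is only an illustration under stated assumptions: the registry
``CANDIDATES``, the helpers ``probe_machine`` and
``select_implementation``, and both ``dot_*`` functions are hypothetical
names, and the CPU count returned by ``os.cpu_count()`` merely stands in
for properties such as the number of parallel FPUs or cache sizes, which
plain Python cannot probe portably.

```python
import os

# Hypothetical sketch: all names and thresholds below are illustrative
# assumptions, not part of PyPy.


def dot_plain(xs, ys):
    """Baseline implementation with no hardware requirements."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_unrolled(xs, ys):
    """Variant that handles two elements per step; it stands in for a
    version tuned to machines with several parallel units."""
    total_even = 0.0
    total_odd = 0.0
    n = min(len(xs), len(ys))
    for i in range(0, n - 1, 2):
        total_even += xs[i] * ys[i]
        total_odd += xs[i + 1] * ys[i + 1]
    if n % 2:
        # Odd length: the last element was not covered by the loop.
        total_even += xs[n - 1] * ys[n - 1]
    return total_even + total_odd


def probe_machine():
    """Collect machine properties; only the CPU count is really probed."""
    return {"cpus": os.cpu_count() or 1}


# Ordered from most to least demanding; the last entry always matches.
CANDIDATES = [
    (lambda machine: machine["cpus"] >= 2, dot_unrolled),
    (lambda machine: True, dot_plain),
]


def select_implementation(machine):
    """Return the first implementation whose requirement is met."""
    for is_supported, implementation in CANDIDATES:
        if is_supported(machine):
            return implementation
    raise RuntimeError("no implementation matches this machine")


# The choice is made once, at startup, and reused afterwards.
dot = select_implementation(probe_machine())
print(dot.__name__, dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Which implementation is selected depends on the machine running the
sketch; both variants compute the same result for the same input.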

WP54_: Code Generator for SIMD Architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to support numerical applications better, PyPy should be extended
to support special hardware like SIMD (Single Instruction Multiple Data)
processors. We are aiming to pick one SIMD architecture, like SSE2 or
AltiVec, and extend the code generator to support and optimize for the new
instructions. We don't intend to support a completely new instruction set,
but prefer to choose an extension to an architecture that we already
support. The existence of special instructions and other extensions should
be probed using run-time checks.

Psyco should be extended to be aware of parallel instructions, and be
enabled to emit optimized code for them. PyPy also needs to be extended to
support a vectorized data type at the Python level. The specification of
that data type is part of this task.

WP55_: Enhanced Numerical Package
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rewrite a numerical Python package like NumPy using RPython. Identify the
numerical operations in the package which are candidates for
parallelization. Write an interface that allows these operations to be
implemented either in a traditional way or using SIMD instructions,
depending on the capabilities of the current code generator.

WP56_: Load Balancing in a Multi-Processor environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*XXX add some general words here from the wp*

Infrastructure
----------------------------------------

WP60_: Project Documentation and Dissemination
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Throughout the project, a set of documents will be maintained that records
the current status of the project, the results of discussions, the
planning of new sprints and their status reports, as well as the
preparation of papers for presentation at various conferences.

In addition, there is an ongoing information flow to external communities,
like the Python Business Forum (PBF) and the Python developers list
(python-dev@python.org).

WP70_: Maintenance of Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyPy's own development needs an infrastructure that must continuously be
kept up-to-date and further developed. This includes maintenance of the
Subversion package and tracking of its ongoing development, maintenance of
the Subversion repository, extensions like automatic documentation
extraction, and change notification via mailing lists.

WP80_: Synchronisation with Standard Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since Python is being further developed all the time, there is a
continuous need to keep PyPy in sync with the future developments of
CPython, e.g. by finding ways to relate pieces of PyPy source to pieces of
CPython source. This work is carried out by hand at the moment. We are
looking for existing solutions that allow this effort to be automated at
least partially.