======================================================================
Effective implementation of dynamic languages for OO virtual machines*
======================================================================

(* draft title)

Two main topics:

- efficiency

- interoperability

Maybe also an optional third topic: experimenting with new features, e.g.:

- stackless

- sandboxing

- logic programming

Efficiency
==========

The state of the art is represented by IronPython for the CLI and Jython
for the JVM. IronPython is much more advanced and efficient than Jython;
this means two things:

1. It is hard (but not impossible) to compete with IronPython;

2. If we get close to IronPython's performance, we automatically get
   the fastest Python for the JVM ever.

At the moment, pypy-cli is about 6 times slower than IronPython; after
`some benchmarks`_, we concluded that the slowdown is distributed over
different parts of the interpreter:

1. the GenCLI vs. C# slowdown: about 1.8x;

2. the interpretation overhead: about 2x;

3. the efficiency of PyPy's standard object space compared to
   IronPython's run-time environment; this can be measured indirectly
   by removing the other two factors above: about 1.8x.

Multiplied together, the three factors give roughly 6.5x, in line with
the overall figure.

.. _`some benchmarks`: http://codespeak.net/pypy/extradoc/eu-report/D12.1_H-L-Backends_and_Feature_Prototypes-2007-03-22.pdf

At the moment, there are many small improvements we could make to the
translation toolchain that would reduce the gap between C# and GenCLI
(point 1 above) and give some speedup (~10%, but it's just a guess):
for example, new optimizations, better generated IL code, etc.

Implementing JIT for CLI/JVM
----------------------------

`PyPy's JIT`_ is an experimental, state-of-the-art Just In Time
specializer for Python: the prototype compiled with genc shows that
Python functions can be made up to 60 times faster. The JIT works only
with the C backend right now, but it should be possible to port it to
CLI/JVM as well.

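
As a rough illustration of what a run-time specializer does, the sketch
below keeps one version of a function per tuple of argument types and
reuses it on later calls. It is plain Python, not PyPy's JIT: the names
``specialize``, ``make_add`` and the ``cache`` attribute are hypothetical,
and the "specialized" version is just an ordinary Python function, whereas
a real JIT would emit machine code or IL at that point.

```python
import operator


def specialize(make_version):
    """Cache one version of a function per tuple of argument types.

    `make_version(types)` is a hypothetical hook returning a callable
    tailored to those types; a real JIT would generate code here.
    """
    cache = {}

    def dispatch(*args):
        # The argument types are only known at run time.
        key = tuple(type(a) for a in args)
        version = cache.get(key)
        if version is None:
            # First call with these types: build and remember a version.
            version = make_version(key)
            cache[key] = version
        return version(*args)

    dispatch.cache = cache  # exposed only so the cache can be inspected
    return dispatch


def make_add(types):
    # Choose an implementation once, based on the observed types.
    if types == (int, int):
        def add_ints(x, y):
            return x + y  # stand-in for a fast integer-only path
        return add_ints
    return operator.add  # generic fallback for every other combination


add = specialize(make_add)

print(add(1, 2))       # builds the (int, int) version
print(add("a", "b"))   # builds the (str, str) version
print(add(3, 4))       # reuses the cached (int, int) version
print(len(add.cache))  # number of versions built so far
```

The point of the design is that the type checks happen once per call in
``dispatch``, so each cached version can assume its argument types.
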
The idea behind the JIT is to gather, at run time, extra information
about a Python function that is not available at compile time (e.g. the
types of the arguments). It then generates (or looks up in a cache) and
executes a specialized, faster version of the function.

CLI could be a perfect target platform, because it provides the
DynamicMethod_ class, which is specifically designed to emit code
dynamically.

Moreover, it would be possible to research new ways to extend the
existing virtual machines (CLI and JVM) to better support this kind of
JIT compiler.

.. _`PyPy's JIT`: http://codespeak.net/pypy/extradoc/eu-report/D08.2_JIT_Compiler_Architecture-2007-05-01.pdf
.. _DynamicMethod: http://msdn2.microsoft.com/en-us/library/system.reflection.emit.dynamicmethod.aspx

Develop an optional type-system for Python
------------------------------------------

The goal of this task is to exploit the extreme efficiency of RPython in
standard Python programs. The idea is to let the developer optionally
annotate a Python function with the expected types of its arguments: the
system will try to compile it as RPython, resulting in a highly
efficient native function.

The explicit goal of this type system would be efficiency. It would not
be meant for statically checking the program.

LLType-based interpreter
------------------------

The CLI offers two different kinds of instructions: managed and
unmanaged. Managed instructions are guaranteed to be safe and cannot
compromise the system's integrity, so they are the preferred ones to
use. At the moment GenCLI generates only managed and verifiable
instructions.

By contrast, unmanaged instructions are available mainly to implement
languages such as C or C++ efficiently, and they give full control over
memory through pointers, unmanaged arrays, etc. Being unsafe, these
instructions are likely to be much faster than their managed
counterparts.

The goal of this task is to develop an LLType-based backend for CLI
that makes full use of these instructions, to see whether there is a
speed gain. The resulting interpreter would not be as safe as the
current one, but it could still be interesting or useful in particular
contexts.

Moreover, we could explore the possibility of building an interpreter
that contains both implementations side by side: the fast one to run
trusted code and the slow but safe one to run untrusted code. This would
play very well with PyPy's own sandboxing_ feature, which does the same
for C-based interpreters.

.. _sandboxing: http://codespeak.net/pipermail/pypy-dev/2007q3/003978.html

Interoperability
================

The state of the art in this area is again represented by IronPython and
Jython; both allow Python developers to access native classes from
Python. There is still room for a number of improvements, though.

Improve current PyPy interop module
-----------------------------------

At the moment pypy-cli has a ``clr`` module, which allows Python
programmers to access .NET libraries. However, it is more a
proof of concept than a really usable module.

The goal of this task would be to improve this module so that it offers
the same degree of interoperability as IronPython and Jython.

Thanks to PyPy's modular architecture, it should be possible to develop
the module only once and reuse it for both pypy-cli and pypy-jvm.

The other way round: access Python classes from C#/Java
-------------------------------------------------------

Neither IronPython nor Jython allows Python classes and functions to be
directly accessed from native languages such as C# and Java. This is due
to the very different object models of the two worlds.

For example, consider this case::

    # Python
    class Foo:
        def bar(self, x):
            return x
        def BOOM(self):
            del Foo.bar

    // C#
    public static void Main() {
        Foo x = new Foo();
        x.BOOM();
        x.bar(42);
    }

It is not clear what the expected behaviour is here, because the class
Foo no longer provides a bar method after the call to BOOM.

The goal of the task is to develop an interoperability model which does
the "obvious thing" when there is one, while still being consistent with
the semantics of both worlds.

For example, one possibility is to create a .NET/Java method for each
Python method known at compile time: such methods do not contain the
body themselves, but delegate the call to the underlying Python object.
For instance, the above Foo class could be translated into something
equivalent to this (in C#)::

    public class Foo {
        private PyObject obj;

        public Foo() {
            this.obj = PythonEngine.Instantiate("Foo");
        }

        public PyObject bar(PyObject x) {
            return this.obj.call_method("bar", x);
        }

        public PyObject BOOM() {
            return this.obj.call_method("BOOM");
        }
    }

This is only one of the possible solutions, and there are still open
problems to solve, e.g. how to cope with overloading, how to provide
more .NET-friendly signatures instead of the ones above full of
PyObjects, etc.

Comparison between the Proxy approach and the Boxing one
--------------------------------------------------------

IronPython and Jython follow two different approaches to representing
Python objects inside the environment:

- Jython uses the "Proxy" approach: there is a class hierarchy rooted in
  PyObject, with one subclass for each kind of Python object: PyInteger,
  PyClass, PyInstance, etc. The PyObject class contains several abstract
  methods used by the runtime to operate on them (e.g., getattribute(),
  call(), add(), etc.).

- IronPython uses native .NET objects wherever possible: for example,
  Python integers are represented as boxed .NET Int32 objects.
  This means that it is not possible to rely on methods of a root
  PyObject class, because there is no root class at all. Instead,
  IronPython explicitly checks the types of the objects involved to
  select the most appropriate implementation for the particular
  operation; something like this::

      public object PyAdd(object x, object y) {
          if (x.GetType() == typeof(System.Int32) &&
              y.GetType() == typeof(System.Int32))
              return ((int)x + (int)y);
          else if ...
          else if ...
      }

At the level of source code, PyPy is implemented using the Proxy
approach: there is a W_Root class which is the root for all the wrapped
objects, and the dispatch of operations is done through multimethods.

But this does not imply that this must be the implementation used in the
compiled executable: thanks to the extreme flexibility of the
translation framework, it would be possible to add another
transformation that converts a Proxy-based approach to a Boxing-based
one, without touching the source.

This would allow a direct comparison of the two techniques on both JVM
and CLI, and let us select the best one (or maybe some combination of
the two).

Moreover, we could also explore completely different approaches: `this
article`_ describes and compares the Proxy (or Wrapper, or Adapter)
approach and a new approach based on "navigators".

.. _`this article`: http://www.szegedi.org/articles/wrappersOrNavigators.html

Compiling to DLR instead of interpreting Python bytecode
--------------------------------------------------------

Currently, Python source files are compiled to Python bytecode and
interpreted by the PyPy VM, which runs on top of the CLI.

Recently, Microsoft has been developing the Dynamic Language Runtime
(DLR), an API that allows different dynamic languages (such as
IronPython and IronRuby) to interoperate.

It would be interesting to make pypy-cli generate DLR Trees instead of
Python bytecode and measure any resulting speedup.

This would also allow pypy-cli to interoperate with IronPython,
IronRuby, and possibly other languages.

Unfortunately, this task cannot be started or completed until Microsoft
releases a stable version of the DLR.

For more information about the DLR, see these blog entries by its
creator:

- http://blogs.msdn.com/hugunin/archive/2007/04/30/a-dynamic-language-runtime-dlr.aspx

- http://blogs.msdn.com/hugunin/archive/2007/05/02/the-one-true-object-part-1.aspx

- http://blogs.msdn.com/hugunin/archive/2007/05/04/the-one-true-object-part-2.aspx

- http://blogs.msdn.com/hugunin/archive/2007/05/15/dlr-trees-part-1.aspx

Implement the DLR in RPython
----------------------------

Not to be confused with the previous task.

The idea is to develop an alternative implementation of the DLR for .NET
in RPython, and compare it with the current implementation from
Microsoft, which is written in C#.

The RPython version would have two advantages over the MS one:

- it could exploit all of PyPy's features that are not available in C#;
  in particular, it should be possible to apply PyPy's JIT to the DLR
  interpreter written in RPython, thus making it much faster than the
  current version;

- it would automatically be available for the JVM too, for which nothing
  similar exists.