=====================
XXX: find a title :-)
=====================

Abstract
========

Python is a great dynamic language, but current implementations are somewhat
slow, because Python semantics cannot be implemented efficiently with current
techniques. The purpose of this thesis is to develop new techniques, and/or
adapt existing ones that are not directly applicable to Python, and apply
them to a new, more efficient implementation of the language.

[maybe the following could/should be omitted in the short abstract]

Python is strongly but dynamically typed: the compiler and the virtual
machine do not know the type of a variable until run-time; moreover, a
variable can hold objects of different types during the execution of the
program. Most current performance issues stem from this lack of information
at compile time, so we will pay particular attention to developing a type
system that helps the compiler produce better code.

We will then explore several approaches to determining the types of
variables and functions: type inference, optional type annotations, type
feedback (with static recompilation) and just-in-time compilation.

Instead of writing yet another Python implementation from scratch, we will
modify PyPy to suit our needs. PyPy is an extremely flexible Python
interpreter written in Python itself and translated to a number of different
targets; currently, the main working targets are C, .NET and the JVM. Because
.NET and the JVM are extremely similar, and thanks to PyPy's good modularity,
most of the research done for the .NET target will also apply to the JVM
one, and vice versa. Our work will focus mostly on the .NET and JVM targets,
but hopefully the new techniques will also improve the performance of PyPy
when compiled to C.
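To make the dynamic-typing point above concrete, here is a minimal sketch in plain Python. The function names ``describe`` and ``add`` are ours and purely illustrative; they do not come from PyPy. The sketch shows one variable being rebound to objects of three different types, and a single ``+`` whose meaning is only decided at run-time from its operands.

```python
def describe(x):
    # The same variable is rebound to objects of different types;
    # nothing in the source fixes the type of `x` ahead of time.
    seen = [type(x).__name__]
    x = str(x)
    seen.append(type(x).__name__)
    x = [x]
    seen.append(type(x).__name__)
    return seen


def add(a, b):
    # `+` is resolved at run-time from the operand types, so one function
    # body serves integers, floats, strings and lists alike.
    return a + b


print(describe(42))  # ['int', 'str', 'list']
print(add(1, 2), add(1.5, 2), add("py", "py"), add([1], [2]))
```

A compiler that sees only the body of ``add`` cannot tell which of these operations to emit, which is the lack of compile-time information the abstract refers to.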
Finally, we will study if and how the newly developed type system affects
the interoperability between Python and the underlying platform (either .NET
or the JVM): until now it has been extremely difficult to integrate Python
code into, say, C# or Java code, due to the lack of static type information.

Outline
=======

* The problem

  - interpretation overhead
  - boxed arithmetic and automatic overflow handling
  - extreme introspective and reflective capabilities
  - dynamic lookup of methods and attributes
  - dynamic dispatch of operations

* Towards a possible solution

  - From Python bytecode to native machine code
  - From the interpreter to the compiler --> Exploit PyPy's JIT
  - JIT vs AOT (Ahead Of Time compiler)
  - JIT and AOT internals
  - Optimize the fast paths

    + Boxed arithmetic and automatic overflow handling
    + Dynamic lookup of methods and attributes
    + Dynamic dispatch of operations

* A type system for Python

  - Numeric types
  - Built-in types
  - User-defined types (structural subtyping)

* Type gathering strategies

  - Static analysis (type inference)
  - Optional annotations by the user
  - Type feedback (without dynamic recompilation)
  - Type feedback with dynamic specialization/recompilation
  - Combinations of the previous

* Related work

* Tentative schedule

  - Write a JIT backend for .NET or the JVM
  - Transform the JIT into an AOT
  - (maybe) Separate compilation for PyPy; whether it is needed depends on
    the details of the previous steps
  - Implement a static type inference engine, OR start with manual
    annotations first, because they are simpler
  - Make the compiler exploit type information
  - Experiment with other type gathering strategies
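The "type feedback" entries under type gathering strategies can be illustrated with a small sketch. It is only an assumption about what such a mechanism might record, not PyPy's design: the ``TypeFeedback`` class, its ``hot_signature`` method and the ``combine`` function are hypothetical names introduced here. The wrapper counts the argument types seen at run-time; a later compilation step could use the most frequent signature to pick a specialized fast path.

```python
from collections import Counter


class TypeFeedback:
    """Hypothetical helper: records the argument types seen at run-time."""

    def __init__(self, func):
        self.func = func
        self.profile = Counter()

    def __call__(self, *args):
        # Record the tuple of concrete argument types, then run the function.
        self.profile[tuple(type(a) for a in args)] += 1
        return self.func(*args)

    def hot_signature(self):
        # The most frequently observed signature is the candidate for a
        # specialized fast path in a later (re)compilation step.
        if not self.profile:
            return None
        return self.profile.most_common(1)[0][0]


@TypeFeedback
def combine(a, b):
    return a + b


for i in range(100):
    combine(i, i)
combine("a", "b")

print(combine.hot_signature())
```

In this run the integer signature dominates, so a compiler guided by the profile would specialize ``combine`` for integers and keep the generic path as a fallback for the rare string call.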