\chapter{Introduction to \emph{ootypesystem}} \label{cha:ootypesystem} As we saw in sections \ref{sec:architecture} and \ref{sec:typesystems}, the goal of the \emph{RTyper} is to turn the high-level, annotated operations of a flow graph into a low-level representation that backends can translate easily, because it makes use of types and operations \textbf{natively available} on the target platform. Of course, the exact low-level representation depends on which primitives we can assume the target platform provides: the role of a \pypy\ \textbf{typesystem} is to define a set of low-level types and operations for targeting platforms that provide a precise set of primitives. In this chapter we will examine the \textbf{Object Oriented Typesystem} (\emph{ootypesystem}), which is tailored for backends that natively support constructs like classes, exceptions, and so on. \section{The target platform} \label{sec:ootarget} There are plenty of object oriented languages and platforms around, each one with its own native features: they could be statically or dynamically typed, and they may or may not support things like multiple inheritance, classes and functions as first-class objects, generics, and so on. The goal of \emph{ootypesystem} is to find a trade-off among all the potential backends, letting each of them use the native facilities when available without preventing the other backends from working when those facilities are missing. \subsection{Types and classes} \label{sec:ootypes} \emph{ootypesystem} defines a number of primitive types that are reasonably available on all platforms, as listed in table \ref{table-oo-primitives}.
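To illustrate how a backend might consume these primitive types, the following Python sketch maps each of them to a type name of an imaginary statically typed target, and shows one consequence of the \texttt{Void} type: since \texttt{Void} values are constants known at compile time, a backend can drop \texttt{Void}-typed parameters from the signatures it emits. The target names in \texttt{PRIMITIVE\_MAP} (loosely modelled on CLI) and the \texttt{map\_signature} helper are hypothetical, not taken from any actual \pypy\ backend.

```python
# Hypothetical mapping from ootypesystem primitive types to the type names
# of an imaginary statically typed target. The right-hand names are an
# illustrative assumption, not the mapping used by a real PyPy backend.
PRIMITIVE_MAP = {
    "Bool": "bool",
    "Signed": "int32",
    "Unsigned": "unsigned int32",
    "SignedLongLong": "int64",
    "UnsignedLongLong": "unsigned int64",
    "Float": "float64",
    "Char": "char",
    "UniChar": "char",
    "Void": None,  # compile-time constants: nothing is emitted for them
}


def map_signature(arg_types, result_type):
    """Translate a signature, erasing every Void-typed parameter."""
    args = [PRIMITIVE_MAP[t] for t in arg_types if PRIMITIVE_MAP[t] is not None]
    result = PRIMITIVE_MAP[result_type]
    # A Void result becomes the target's "no return value" marker.
    return args, ("void" if result is None else result)


if __name__ == "__main__":
    print(map_signature(["Signed", "Void", "Float"], "Bool"))
```

Here a call with argument types \texttt{Signed}, \texttt{Void}, \texttt{Float} yields a two-parameter target signature, because the \texttt{Void} argument is known at compile time and carries no runtime value.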
\begin{table}[ht] \small \begin{tabular}{|l|p{9cm}|} \hline Bool & boolean values \\ Signed & signed integers (usually 32 bit) \\ Unsigned & unsigned integers (usually 32 bit) \\ SignedLongLong & signed long integers (usually 64 bit) \\ UnsignedLongLong & unsigned long integers (usually 64 bit) \\ Float & double precision floating point numbers \\ Char & ASCII characters \\ UniChar & Unicode characters \\ Void & used for constants known at compile time; it will disappear in the generated code \\ \hline \end{tabular} \caption{\emph{ootypesystem} primitive types} \label{table-oo-primitives} \end{table} The target platform is supposed to support classes and instances with \textbf{single inheritance}. Instances of user-defined classes are mapped to the \texttt{Instance} type, whose \texttt{\_superclass} attribute indicates the base class of the instance. At the root of the inheritance hierarchy there is the \texttt{Root} class, i.e. the common base class of all instances; if the target platform has a native common base class too, the backend can choose to map \texttt{Root} to it. Objects of \texttt{Instance} type can have attributes and methods: attributes are read and written by the \texttt{oogetfield} and \texttt{oosetfield} operations, while method calls are expressed by the \texttt{oosend} operation (see section \ref{sec:oo-instructions}). Classes are passed around using the \texttt{Class} type: this is a first-class type whose only goal is to allow \textbf{runtime instantiation} of the class. Backends that don't support this feature natively, such as Java, may need to use some sort of placeholder instead. \subsection{Static vs. dynamic typing} \label{sec:static_vs_dynamic} The target platform is assumed to be \textbf{statically typed}, i.e. the type of each object is known at compile time.
As usual, it is possible to convert an object from one type to another only under certain conditions; there are a number of \textbf{predefined conversions} between primitive types, such as from \texttt{Bool} to \texttt{Signed} or from \texttt{Signed} to \texttt{Float}. For each of these conversions there is a corresponding low-level operation, such as \texttt{cast\_bool\_to\_int} and \texttt{cast\_int\_to\_float} (see section \ref{sec:oo-conversion}). Moreover, it is possible to cast instances of a class up and down the inheritance hierarchy with the \texttt{ooupcast} and \texttt{oodowncast} low-level operations (see section \ref{sec:oo-instructions}). \textbf{Implicit upcasting} is not allowed, so you really need an explicit \texttt{ooupcast} to convert from a subclass to a superclass. With this design, \textbf{statically typed} backends can trivially insert the appropriate casts when needed, while \textbf{dynamically typed} backends can simply ignore some of the operations, such as \texttt{ooupcast} and \texttt{oodowncast}. Backends that support implicit upcasting, such as \emph{CLI} and \emph{Java}, can ignore just \texttt{ooupcast}. \subsection{Exception handling} \label{sec:ooexceptions} Since \textbf{flow graphs} are meant to be used also by very low-level backends such as C, they are \textbf{quite unstructured}, as we saw in section \ref{sec:Link}. This means that the target platform doesn't need to have a \textbf{native exception handling} mechanism, since at the very least the backend can handle exceptions just like \texttt{genc} does. On the other hand, most high-level platforms natively support exception handling, so \emph{ootypesystem} is designed to let them use it. In particular, exception instances are typed with the \texttt{Instance} type, so the usual inheritance hierarchy of exceptions is preserved and the native way of catching exceptions should just work.
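The casting rules above, together with the way exceptions inherit from each other, boil down to walking the \texttt{\_superclass} chain of a single-inheritance hierarchy. The following sketch models this under simplifying assumptions: \texttt{InstanceType}, \texttt{check\_upcast}, \texttt{check\_downcast} and \texttt{handler\_matches} are hypothetical names, and the checks only cover what a statically typed backend could verify at compile time.

```python
class InstanceType:
    """Toy stand-in for an ootypesystem Instance type (hypothetical)."""

    def __init__(self, name, superclass=None):
        self.name = name
        self._superclass = superclass  # None only for the root class

    def is_subclass_of(self, other):
        # Walk the single-inheritance chain up to the root.
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls._superclass
        return False


ROOT = InstanceType("Root")


def check_upcast(from_type, to_type):
    """Static check for ooupcast: to_type must be from_type or an ancestor."""
    if not from_type.is_subclass_of(to_type):
        raise TypeError(f"cannot upcast {from_type.name} to {to_type.name}")
    return to_type


def check_downcast(from_type, to_type):
    """Static check for oodowncast: to_type must be a descendant.

    Whether the cast succeeds for a particular object is only known at run time.
    """
    if not to_type.is_subclass_of(from_type):
        raise TypeError(f"cannot downcast {from_type.name} to {to_type.name}")
    return to_type


def handler_matches(raised_type, handler_type):
    """Exceptions are Instances, so a handler also catches every subclass."""
    return raised_type.is_subclass_of(handler_type)


if __name__ == "__main__":
    exc = InstanceType("Exception", ROOT)
    value_error = InstanceType("ValueError", exc)
    print(check_upcast(value_error, ROOT).name)    # Root
    print(check_downcast(ROOT, value_error).name)  # ValueError
    print(handler_matches(value_error, exc))       # True
```

A dynamically typed backend could skip both checks entirely, while a statically typed one would run them (or rely on its own compiler) before emitting the corresponding native casts.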
\subsection{Built-in types} \label{sec:oobuiltintypes} It seems reasonable to assume that high-level platforms provide built-in facilities for common types such as \emph{lists} or \emph{hashtables}. RPython standard types such as \texttt{List} and \texttt{Dict} are implemented on top of these common types, as shown by table \ref{table-oo-built-in}. \begin{table} \small \begin{tabular}{|l|p{9cm}|} \hline String & self-descriptive \\ StringBuilder & used for building strings dynamically \\ List & a variable-sized, homogeneous list of objects \\ Dict & a hashtable of homogeneous keys and values \\ CustomDict & same as \texttt{Dict}, but with custom equality and hash functions \\ DictItemsIterator & a helper class for iterating over the elements of a \texttt{Dict} \\ \hline \end{tabular} \caption{\emph{ootypesystem} built-in types} \label{table-oo-built-in} \end{table} Each of these types is a subtype of \texttt{BuiltinADTType} and has a set of \textbf{ADT (Abstract Data Type)} methods (hence the name of the base class) through which it is manipulated. Examples of ADT methods are \texttt{ll\_length} for \texttt{List} and \texttt{ll\_get} for \texttt{Dict}. From the backend's point of view, instances of built-in types are treated exactly as plain \texttt{Instance}s, so usually no special-casing is needed. The backend is supposed to provide a set of classes wrapping the native ones, in order to give the ADT methods the right signatures and semantics. Alternatively, backends can special-case the ADT types to map them directly to their native equivalents, translating the method names on the fly at compile time. \subsection{Other types} \label{sec:oo-other-types} There are a few more \emph{ootypesystem} types that don't fit into the categories above: \begin{description} \item[StaticMethod] used for representing static methods and plain functions.
As for \texttt{Class}, it is a first-class type: this means that \texttt{StaticMethod} objects can be passed around and called with the \texttt{indirect\_call} instruction (see section \ref{sec:oo-call-instructions}). \item[Meth] subclass of \texttt{StaticMethod}, used for representing bound methods. \item[Record] used for grouping together a bunch of fields, much like C structs; from the backend's point of view, the main difference from \texttt{Instance} is that \texttt{Record}s don't have methods. \end{description} \subsection{Generics} \label{sec:oogenerics} Some target platforms offer native support for \textbf{generics}, i.e. classes that can be parametrized on types, not only on values. For example, a generic list class could be declared as \texttt{List<T>}, where \texttt{T} is a type parameter; when instantiated, one could create, for instance, \texttt{List<Integer>} or \texttt{List<String>}, and the list is then treated as a list of whichever type is specified. Each subclass of \texttt{BuiltinADTType} defines a set of type parameters by creating class-level placeholders in the form \texttt{PARAMNAME\_T}; then it fills up the \texttt{\_GENERIC\_METHODS} attribute by defining the signature of each ADT method, using those placeholders in the appropriate places. As an example, look at listing \ref{lst:List}, which shows part of the implementation of the \emph{ootypesystem}'s \texttt{List} type. \begin{SaveVerbatim}{tmp}
class List(BuiltinADTType):
    # placeholders for types
    SELFTYPE_T = object()
    ITEMTYPE_T = object()
    ...

    def _init_methods(self):
        # 'ITEMTYPE_T' is used as a placeholder for indicating
        # arguments that should have ITEMTYPE type.
        # 'SELFTYPE_T' indicates 'self'
        self._GENERIC_METHODS = frozendict({
            "ll_length": Meth([], Signed),
            "ll_getitem_fast": Meth([Signed], self.ITEMTYPE_T),
            "ll_setitem_fast": Meth([Signed, self.ITEMTYPE_T], Void),
            "_ll_resize_ge": Meth([Signed], Void),
            "_ll_resize_le": Meth([Signed], Void),
            "_ll_resize": Meth([Signed], Void),
        })
    ...
\end{SaveVerbatim}
\begin{program} \UseVerbatim{tmp} \caption{Excerpt from \texttt{ootype.List}} \label{lst:List} \end{program} Thus backends that support generics can simply look for placeholders to discover where the type parameters are used. Backends that don't support generics can simply use the \texttt{Root} class instead (see section \ref{sec:ootypes}) and insert the appropriate casts where needed. Note that placeholders might also stand for primitive types, which typically require more involved casts: e.g., in Java, \texttt{int} values must be boxed into wrapper objects. \section{Low-level instructions} \label{sec:low-level-instructions} After flow graphs have been rtyped, they contain lists of low-level instructions; some of these instructions are the same as those used by \emph{lltypesystem}, while others are specific to \emph{ootypesystem}, as we will see in this section. Many low-level instructions are \textbf{strongly typed}, i.e. they can operate only on operands of a precise type; these instructions are prefixed with the name of the type. For historical reasons, these type names are not the same as the type names we saw in section \ref{sec:ootypes}, as shown by table \ref{table-ootype-mapping}. So, for example, the low-level instruction for integer addition is \texttt{int\_add}.
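Because the prefix alone determines the operand type, a backend or a sanity checker can recover the expected type from the instruction name, using the correspondence of table \ref{table-ootype-mapping}. The sketch below is a deliberately simplified checker, written under the assumption that every operand of a prefixed instruction has the prefix's type; it does not model any exceptions to this rule that the real instruction set may have, and instructions without a type prefix (such as \texttt{cast\_int\_to\_float} or \texttt{oosend}) are simply not checked. The function names are hypothetical.

```python
# Instruction-name prefix -> ootypesystem primitive type.
PREFIX_TO_TYPE = {
    "bool": "Bool",
    "int": "Signed",
    "uint": "Unsigned",
    "llong": "SignedLongLong",
    "ullong": "UnsignedLongLong",
    "float": "Float",
    "char": "Char",
    "unichar": "UniChar",
}


def expected_operand_type(opname):
    """Return the operand type implied by the prefix, or None if unprefixed."""
    prefix, sep, _ = opname.partition("_")
    if not sep:
        return None  # e.g. "oosend": no underscore, no type prefix
    return PREFIX_TO_TYPE.get(prefix)  # e.g. "cast_..." -> None


def check_operands(opname, operand_types):
    """Simplified check: every operand must have the prefix type."""
    expected = expected_operand_type(opname)
    if expected is None:
        return  # not a type-prefixed instruction; nothing to check here
    for t in operand_types:
        if t != expected:
            raise TypeError(f"{opname} expects {expected} operands, got {t}")


if __name__ == "__main__":
    check_operands("int_add", ["Signed", "Signed"])  # accepted silently
    print(expected_operand_type("ullong_rshift"))    # UnsignedLongLong
```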
\begin{table}[ht] \small \begin{tabular}{|l|l|} \hline Bool & \texttt{bool} \\ Signed & \texttt{int} \\ Unsigned & \texttt{uint} \\ SignedLongLong & \texttt{llong} \\ UnsignedLongLong & \texttt{ullong} \\ Float & \texttt{float} \\ Char & \texttt{char} \\ UniChar & \texttt{unichar} \\ \hline \end{tabular} \caption{Type names used by instructions} \label{table-ootype-mapping} \end{table} \subsection{Comparison instructions} \label{sec:oo-comparison} As the name suggests, these instructions are used to compare two values: they are composed of a prefix, indicating the type of the operands, and a suffix, indicating the actual operation: equal to, not equal to, greater than, greater than or equal to, less than, less than or equal to (\texttt{eq}, \texttt{ne}, \texttt{gt}, \texttt{ge}, \texttt{lt}, \texttt{le}, respectively). \texttt{int}, \texttt{uint}, \texttt{llong}, \texttt{ullong}, \texttt{float} and \texttt{char} provide instructions for all six comparisons, while \texttt{unichar} and \texttt{bool} provide instructions for equality and inequality only. \subsection{Arithmetic instructions} \label{sec:oo-arithmetic} As for comparison instructions, the arithmetic ones are prefixed by the name of the type they operate on. All numeric types provide instructions for negation, addition, subtraction and multiplication (\texttt{neg}, \texttt{add}, \texttt{sub} and \texttt{mul}, respectively). Moreover, integer types provide instructions for integer division and modulo (\texttt{floordiv}, \texttt{mod}), while the \texttt{float} type provides an instruction for true (floating-point) division (\texttt{truediv}). Integer types also provide bitwise operations: not, and, or, xor, left shift and right shift (\texttt{invert}, \texttt{and}, \texttt{or}, \texttt{xor}, \texttt{lshift} and \texttt{rshift}). Finally, all numeric types provide the \texttt{abs} instruction which, as the name suggests, computes the absolute value.
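The naming rules of the two subsections above are regular enough to be generated mechanically. The following sketch builds the sets of comparison and arithmetic instruction names from those rules; it assumes that the numeric types are \texttt{int}, \texttt{uint}, \texttt{llong}, \texttt{ullong} and \texttt{float}, and that the integer types are the first four of them. It only reflects the rules stated in this section, not the complete list of instructions accepted by \pypy.

```python
# Assumption: "integer types" and "numeric types" as listed below.
INTEGER_TYPES = ["int", "uint", "llong", "ullong"]
NUMERIC_TYPES = INTEGER_TYPES + ["float"]
ALL_COMPARISONS = ["eq", "ne", "gt", "ge", "lt", "le"]


def comparison_instructions():
    names = set()
    # Numeric types and char support all six comparisons.
    for t in NUMERIC_TYPES + ["char"]:
        names.update(f"{t}_{op}" for op in ALL_COMPARISONS)
    # unichar and bool support equality and inequality only.
    for t in ["unichar", "bool"]:
        names.update(f"{t}_{op}" for op in ["eq", "ne"])
    return names


def arithmetic_instructions():
    names = set()
    # Every numeric type: negation, +, -, *, absolute value.
    for t in NUMERIC_TYPES:
        names.update(f"{t}_{op}" for op in ["neg", "add", "sub", "mul", "abs"])
    # Integer types only: division, modulo and bitwise operations.
    for t in INTEGER_TYPES:
        names.update(f"{t}_{op}" for op in
                     ["floordiv", "mod", "invert", "and", "or", "xor",
                      "lshift", "rshift"])
    # float only: true division.
    names.add("float_truediv")
    return names


if __name__ == "__main__":
    print(len(comparison_instructions()), len(arithmetic_instructions()))
```

With these assumptions the rules yield 40 comparison instructions and 58 arithmetic instructions; for instance \texttt{bool\_lt} and \texttt{float\_mod} are not generated, while \texttt{char\_le} and \texttt{ullong\_rshift} are.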
\subsection{Conversion instructions} \label{sec:oo-conversion} Table \ref{table-conversion-instructions} shows the instructions used for casting and converting values from type to type; most of them are self-explanatory. The \texttt{is\_true} instruction tests the truth value of numeric types in the usual way (zero is false, non-zero is true), while \texttt{same\_as} simply renames the variable, with no conversion at all. \begin{table}[ht] \small \begin{tabular}{|l|} \hline \texttt{cast\_bool\_to\_int} \\ \texttt{cast\_bool\_to\_uint} \\ \texttt{cast\_bool\_to\_float} \\ \texttt{cast\_char\_to\_int} \\ \texttt{cast\_unichar\_to\_int} \\ \texttt{cast\_int\_to\_char} \\ \texttt{cast\_int\_to\_unichar} \\ \texttt{cast\_int\_to\_uint} \\ \texttt{cast\_int\_to\_float} \\ \texttt{cast\_int\_to\_longlong} \\ \texttt{cast\_uint\_to\_int} \\ \texttt{cast\_float\_to\_int} \\ \texttt{cast\_float\_to\_uint} \\ \texttt{truncate\_longlong\_to\_int} \\ \texttt{is\_true} \\ \texttt{same\_as} \\ \hline \end{tabular} \caption{Conversion instructions} \label{table-conversion-instructions} \end{table} \subsection{Function call} \label{sec:oo-call-instructions} There are two instructions for calling functions: \begin{description} \item[direct\_call] calls the given statically-known function. \item[indirect\_call] calls the given \texttt{StaticMethod} object (see section \ref{sec:oo-other-types}). \end{description} \subsection{Object oriented instructions} \label{sec:oo-instructions} Table \ref{table-oo-instructions} shows the \emph{ootypesystem}-specific instructions. \begin{table}[ht] \small \begin{tabular}{|l|p{9cm}|} \hline new & create a new instance of the given statically-known class \\ runtimenew & create a new instance of the given \texttt{Class} object (see section \ref{sec:ootypes}) \\ oosetfield & set the value of an object's field \\ oogetfield & get the value of an object's field \\ oosend & ``send a message'' to an object, i.e.
call a method \\ ooupcast & cast an object to one of its superclasses \\ oodowncast & cast an object to one of its subclasses \\ oois & identity test \\ oononnull & return \texttt{False} if the object is \emph{null}, \texttt{True} otherwise \\ instanceof & test if an object is an instance of the given class \\ subclassof & test if a class is a subclass of the given class \\ ooidentityhash & return the identity hash code of an object \\ oostring & convert \texttt{char}, \texttt{int}, \texttt{float} and instances to \texttt{string} \\ ooparse\_int & convert a \texttt{string} to an \texttt{int}, given the base \\ \hline \end{tabular} \caption{Object oriented instructions} \label{table-oo-instructions} \end{table}
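To give a feeling for the semantics of some of these instructions, the following sketch interprets a handful of them on a toy object model with single inheritance. Everything in it is hypothetical: \texttt{OOClass}, \texttt{OOObject} and the \texttt{execute} dispatcher are illustrative names, \emph{null} is represented by \texttt{None}, and a real backend would instead translate each instruction into the corresponding native construct of its target platform.

```python
class OOClass:
    """Toy class: a name, an optional superclass and a method table."""

    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = methods or {}

    def lookup(self, name):
        # Method lookup walks the single-inheritance chain.
        cls = self
        while cls is not None:
            if name in cls.methods:
                return cls.methods[name]
            cls = cls.superclass
        raise AttributeError(name)

    def is_subclass_of(self, other):
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.superclass
        return False


class OOObject:
    """Toy instance: a reference to its class plus a field dictionary."""

    def __init__(self, cls):
        self.cls = cls
        self.fields = {}


def execute(opname, *args):
    """Dispatch a few ootypesystem instructions with toy semantics."""
    if opname in ("new", "runtimenew"):
        # In this toy model a static class and a Class object coincide.
        (cls,) = args
        return OOObject(cls)
    if opname == "oosetfield":
        obj, field, value = args
        obj.fields[field] = value
        return None
    if opname == "oogetfield":
        obj, field = args
        return obj.fields[field]
    if opname == "oosend":
        meth_name, obj, *rest = args
        return obj.cls.lookup(meth_name)(obj, *rest)
    if opname == "oois":
        a, b = args
        return a is b
    if opname == "oononnull":
        (obj,) = args
        return obj is not None
    if opname == "instanceof":
        # This sketch treats null as an instance of nothing.
        obj, cls = args
        return obj is not None and obj.cls.is_subclass_of(cls)
    if opname == "subclassof":
        sub, sup = args
        return sub.is_subclass_of(sup)
    raise NotImplementedError(opname)


if __name__ == "__main__":
    root = OOClass("Root")
    point = OOClass("Point", root, {
        "norm1": lambda self: abs(self.fields["x"]) + abs(self.fields["y"]),
    })
    p = execute("new", point)
    execute("oosetfield", p, "x", 3)
    execute("oosetfield", p, "y", -4)
    print(execute("oosend", "norm1", p))  # 7
```

In this toy model \texttt{new} and \texttt{runtimenew} coincide, because a statically-known class and a \texttt{Class} object are both plain Python values; a statically typed backend would typically emit a direct constructor call for \texttt{new} and reserve a more dynamic mechanism for \texttt{runtimenew}.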