\chapter{Flow graph model and annotator model} \label{cha:flowgraph} This chapter is about step 1 and 2 of \pypy\ architecture, as defined in section \ref{sec:architecture}. In particular in this chapter we will inspect the \emph{flow graph model} and the \emph{annotator model}. \section{The flow graph model} \label{sec:flowmodel} In \pypy\ functions and methods are expressed by flow graphs: they group together bunch of instructions and determine the order they are executed. As an example, look at figure \ref{fig:flowgraph} shows the flow graph generated from the code in listing \ref{lst:flowgraph}. \begin{figure}[p] \centering \fbox{\includegraphics{images/flowgraph.png}} \caption{Flow graph example} \label{fig:flowgraph} \end{figure} \begin{program} \begin{verbatim} def exp(base, n): res = 1 while n > 0: res = res*base n = n-1 return res \end{verbatim} \label{lst:flowgraph} \caption{Flow graph example} \end{program} Flow graph are represented by instances of a number of classes that are grouped in the so called \emph{flow graph model}. For each class we give a short description and the list of its attributes. \subsection{\texttt{FunctionGraph}} \label{sec:FunctionGraph} Flow graphs are composed by blocks and links and are represented by instances of the \texttt{FunctionGraph} class. \begin{description} \item[startblock] the first block. It is where the control goes when the function is called. The input arguments of the startblock are the function's arguments. If the function takes a \texttt{*args} argument, the \texttt{args} tuple is given as the last input argument of the startblock. \item[returnblock] the (unique) block that performs a function return. It is empty, not actually containing any \texttt{return} operation; the return is implicit. The returned value is the unique input variable of the returnblock. \item[exceptblock] the (unique) block that raises an exception out of the function. The two input variables are the exception class and the exception value, respectively. (No other block will actually link to the exceptblock if the function does not explicitely raise exceptions.) \end{description} \subsection{\texttt{Block}} \emph{Basic blocks} are represented by instances of the \texttt{Block} class; it contains a list of operations and ends in jumps to other basic blocks. All the values that are ``live'' during the execution of the block are stored in \texttt{Variables}. Each basic block uses its own distinct Variables. \begin{description} \item[inputargs] list of fresh, distinct Variables that represent all the values that can enter this block from any of the previous blocks. \item[operations] list of low level operations to be executed sequentially. \item[exitswitch] see below \item[exits] list of \texttt{Links} representing possible jumps from the end of this basic block to the beginning of other basic blocks. \end{description} Each \texttt{Block} ends in one of the following ways: \begin{description} \item[unconditional jump] exitswitch is \texttt{None}, exits contains a single \texttt{Link}. \item[conditional jump] exitswitch is one of the Variables that appear in the \texttt{Block}, and exits contains one or more \texttt{Links} (usually 2). Each Link's exitcase gives a concrete value. This is the equivalent of a ``switch'': the control follows the \texttt{Link} whose exitcase matches the run-time value of the exitswitch Variable. It is a run-time error if the Variable doesn't match any exitcase. \item[exception catching] exitswitch is \texttt{Constant\_exception)}. The first Link has exitcase set to \texttt{None} and represents the non-exceptional path. The next Links have exitcase set to a subclass of \texttt{Exception}, and are taken when the \emph{last} operation of the basic block raises a matching exception. (Thus the basic block must not be empty, and only the last operation is protected by the handler.) \item[return or except] the returnblock and the exceptblock have operations set to an empty tuple, exitswitch to None, and exits empty. \end{description} \subsection{\texttt{Link}} \label{sec:Link} Instances of the \texttt{Link} class connect different \texttt{Block}s togheter. \begin{description} \item[prevblock] the \texttt{Block} that this \texttt{Link} is an exit of. \item[target] the target \texttt{Block} to which this \texttt{Link} points to. \item[args] a list of Variables and Constants, of the same size as the target \texttt{Block}'s inputargs, which gives all the values passed into the next block. (Note that each \texttt{Variable} used in the prevblock may appear zero, one or more times in the \texttt{args} list.) \item[exitcase] see above. \item[last\_exception] \texttt{None} or a \texttt{Variable}; see below. \item[last\_exc\_value] \texttt{None} or a \texttt{Variable}; see below. \end{description} Note that \texttt{args} uses \texttt{Variables} from the prevblock, which are matched to the target block's \texttt{inputargs} by position, as in a tuple assignment or function call would do. If the link is an exception-catching one, the \texttt{last\_exception} and \texttt{last\_exc\_value} are set to two fresh \texttt{Variables} that are considered to be created when the link is entered; at run-time, they will hold the exception class and value, respectively. These two new variables can only be used in the same link's \texttt{args} list, to be passed to the next block (as usual, they may actually not appear at all, or appear several times in \texttt{args}). \subsection{\texttt{SpaceOperation}} This class represents a recorded (or otherwise generated) basic high level operation, such as \texttt{add} or \texttt{getitem}. \begin{description} \item[opname] the name of the operation. \item[args] list of arguments. Each one is a \texttt{Constant} or a \texttt{Variable} seen previously in the basic block. \item[result] a \emph{new} \texttt{Variable} into which the result is to be stored. \end{description} Note that operations usually cannot implicitly raise exceptions at run-time; so for example, code generators can assume that a \texttt{getitem} operation on a list is safe and can be performed without bound checking. The exceptions to this rule are: \begin{enumerate} \item if the operation is the last in the block, which ends with \texttt{exitswitch == Constant(last\_exception)}, then the implicit exceptions must be checked for, generated, and caught appropriately \item calls to other functions, as per \texttt{simple\_call} or \texttt{call\_args}, can always raise whatever the called function can raise --- and such exceptions must be passed through to the parent unless they are caught as above. \end{enumerate} \subsection{\texttt{Variable}} A placeholder for a run-time value. There is mostly debugging stuff here. \begin{description} \item[name] it is good style to use the Variable object itself instead of its \texttt{name} attribute to reference a value, although the \texttt{name} is guaranteed unique. \end{description} \subsection{\texttt{Constant}} A constant value used as argument to a \texttt{SpaceOperation}, or as value to pass across a \texttt{Link} to initialize an input \texttt{Variable} in the target \texttt{Block}. \begin{description} \item[value] the concrete value represented by this \texttt{Constant}. \item[key] a hashable object representing the value. \end{description} A \texttt{Constant} can occasionally store a mutable Python object. It represents a static, pre-initialized, read-only version of that object. The flow graph should not attempt to actually mutate such \texttt{Constants}. \section{The annotator model} \label{sec:annmodel} The major goal of the annotator is to ``annotate'' each variable that appears in a flow graph. An ``annotation'' describes all the possible Python objects that this variable could contain at run-time, based on a whole-program analysis of all the flow graphs --- one per function. An ``annotation'' is an instance of \texttt{SomeObject}. There are subclasses that are meant to represent specific families of objects. Note that these classes are all meant to be instantiated; the classes \texttt{SomeXxx} themselves are not the annotations. In this section we only take a look at \emph{what} the annotator produces, not \emph{how}. For more details on how the annotator works, see \cite{translation}. Here is a brief overview of the class involved: \begin{description} \item[SomeObject] it is the base class. An instance \texttt{SomeObject()} represents any Python object. It is used for the case where we don't have enough information to be more precise. In practice, the presence of \texttt{SomeObject()} means that we have to make the annotated source code simpler or the annotator smarter. \item[SomeInteger] \sloppypar{\texttt{SomeInteger()} represents any integer. \texttt{SomeInteger(nonneg=True)} represent a non-negative integer (\texttt{>=0}).} \item[SomeBool] \texttt{SomeBool()} represents any boolean. \item[SomeString]: \texttt{SomeString()} represents any string; \texttt{SomeChar()} a string of length 1. \item[SomeTuple] \texttt{SomeTuple([s1,s2,..,sn])} represents a tuple of length \texttt{n}. The elements in this tuple are themselves constrained by the given list of annotations. For example, \texttt{SomeTuple([SomeInteger(), SomeString()])} represents a tuple with two items: an integer and a string. The lenght of the tuple must be known: we don't try to handle tuples of varying length (the program should use lists instead). \item[SomeList] it stands for a list of homogeneous type (i.e. all the elements of the list are represented by a single common \texttt{SomeXxx} annotation). \item[SomeDict] it stands for a homogeneous dictionary (i.e. all keys have the same \texttt{SomeXxx} annotation, and so have all values). \item[SomeInstance] stands for an instance of the given class or any subclass of it. For each user-defined class seen by the annotator, we maintain a \texttt{ClassDef} describing the attributes of the instances of the class; essentially, a \texttt{ClassDef} gives the set of all class-level and instance-level attributes, and for each one, a corresponding \texttt{SomeXxx} annotation. \end{description} All the \texttt{SomeXxx} instances can optionally have a \texttt{const} attribute, which means that we know exactly which Python object the \texttt{Variable} will contain. For a large part of operations when encountering \texttt{SomeXxx} with \texttt{const} set the annotator will do constant propagation and produce results with also 'const' set. This also means that based on \texttt{const} truth values the annotator will not flow into code that is not reachable given global constant values. A later graph transformation will remove such dead code.