\chapter{Defining New Types\label{defining-new-types}}

If you've read chapter \ref{core-material}, you should know enough to write any extension module that only exports functions. But there's more to Python than functions; many programming tasks are most naturally expressed using \emph{objects}, and an important facility is being able to define new types of object.

This is a large topic, and for any given task this chapter will almost certainly tell you both more and less than you need to know. Partly because the topic is relatively advanced and partly because of the large changes in this area in Python 2.2, documentation on this topic is somewhat sparse. Chapter 10 of the Python/C API reference, ``Object Implementation Support'', has some information, but you may need to ``use the source, Luke''.

Fortunately, you need look no further than the source of Python itself for several good examples of object implementations. \file{Modules/_randommodule.c} is a good first example, being relatively free of the obscurities that can affect the implementation of the most fundamental Python types -- \file{Objects/dictobject.c} and \file{Objects/typeobject.c} are not particularly good places to look for example usage!

When should you implement a new type? Sometimes it's obvious: when wrapping an object-oriented library like, say, the GTK interface library, it's only natural to define a Python extension type for each widget defined by the library. On the other hand, if you're wrapping a library that has a basically procedural interface, then, even if you want to provide an OO interface to this library, the simplest course may be to expose the procedural interface in an extension module and write the object-oriented wrapper in Python. After all, if you didn't think writing Python was less tedious than writing C, you probably wouldn't be reading this document.
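
The second approach can be sketched as follows. This is only an illustration under stated assumptions: the functions \code{counter_new}, \code{counter_add} and \code{counter_value} are hypothetical stand-ins, written here in Python, for the procedural functions that an extension module would export from C. The class \code{Counter} is likewise invented for the example.

```python
# Hypothetical stand-ins for functions a C extension module would export.
# A "handle" is modelled as a plain dict to mimic a procedural API.
def counter_new(start=0):
    """Create a new counter handle."""
    return {"value": start}


def counter_add(handle, amount):
    """Add amount to the counter behind handle."""
    handle["value"] += amount


def counter_value(handle):
    """Return the current value of the counter behind handle."""
    return handle["value"]


class Counter(object):
    """Object-oriented wrapper around the procedural counter_* functions."""

    def __init__(self, start=0):
        # The wrapper owns one handle and hides it from callers.
        self._handle = counter_new(start)

    def add(self, amount):
        counter_add(self._handle, amount)

    def value(self):
        return counter_value(self._handle)
```

With this layout the C code stays a thin layer of functions, and all the object behaviour lives in Python where it is easier to change.
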
Although this document is not really concerned with versions of Python prior to Python 2.3, most of the material before this point applies with only minor changes to older versions -- probably all the way back to the ``Great Renaming'' of Python 1.2! This now ceases to be the case. The way in which new objects are defined changed drastically (for the better) as part of the work on so-called ``new-style classes'' in Python 2.2. The material in this chapter can probably be made to work with Python 2.2 with a little effort, but certainly not before that.

One of the ways in which 2.2 changed things was that defining new types in C and Python became much more alike. A corollary of this is that having a good understanding of new-style classes in Python helps greatly when it comes to implementing types in C.

\section{Your First Extension Type}

In the spirit of example-based documentation, here's a minimal example of defining a new type in C:

\verbatiminput{simpletypemodule.c}

As you might expect, the type \code{Simple} and its instances are pretty boring; you can print and instantiate it and print its instances:

\begin{verbatim}
>>> import simpletype
>>> simpletype.Simple
>>> simpletype.Simple()
\end{verbatim}

But that's about it. Still, one has to start with something, and adding interesting behaviour can mostly be done in incremental and sometimes orthogonal steps from this example.

Let's consider the code. A fair proportion should be familiar from earlier examples -- the basic layout, the creation of the module object, etc. The first novelty is

\begin{verbatim}
typedef struct {
    PyObject_HEAD
    /* Type-specific fields go here. */
} SimpleObject;
\end{verbatim}

Instances of the \code{Simple} type are, as far as C is concerned, \code{SimpleObject} objects. As mentioned previously, it must be possible to treat a pointer to any Python object as a pointer to a \code{PyObject}. This is the purpose of the \code{PyObject_HEAD} macro.
In fact, \code{PyObject} is declared like this:

\begin{verbatim}
typedef struct _object {
    PyObject_HEAD
} PyObject;
\end{verbatim}

So we can be sure that \code{PyObject} and \code{SimpleObject} start with the same fields in the same order. A macro is needed because \code{PyObject} contains extra fields in a debug build of Python.

Note that there is no semicolon after the \code{PyObject_HEAD} macro; one is included in the macro definition. Be wary of adding one by accident; it's easy to do from habit, and your compiler might not complain, but someone else's probably will! (On Windows, MSVC is known to call this an error and refuse to compile the code.)

Almost every type you define will include type-specific fields where the comment indicates -- an object that is all behaviour and no data is a strange beast. For a concrete example, here is the corresponding definition for standard Python integers:

\begin{verbatim}
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
\end{verbatim}

Moving on, we come to the crunch: the type object. In Python terms the concept of ``a new type'' is synonymous with ``a new instance of the type \code{type}.''

Earlier, it was claimed that all Python objects live on the heap.
This is in fact only nearly true; in particular, type objects are often statically allocated, as here:

\begin{verbatim}
static PyTypeObject SimpleObjectType = {
    PyObject_HEAD_INIT(NULL)
    0,                            /*ob_size*/
    "simpletype.Simple",          /*tp_name*/
    sizeof(SimpleObject),         /*tp_basicsize*/
    0,                            /*tp_itemsize*/
    0,                            /*tp_dealloc*/
    0,                            /*tp_print*/
    0,                            /*tp_getattr*/
    0,                            /*tp_setattr*/
    0,                            /*tp_compare*/
    0,                            /*tp_repr*/
    0,                            /*tp_as_number*/
    0,                            /*tp_as_sequence*/
    0,                            /*tp_as_mapping*/
    0,                            /*tp_hash */
    0,                            /*tp_call*/
    0,                            /*tp_str*/
    0,                            /*tp_getattro*/
    0,                            /*tp_setattro*/
    0,                            /*tp_as_buffer*/
    Py_TPFLAGS_DEFAULT,           /*tp_flags*/
    "Simple objects are simple.", /* tp_doc */
};
\end{verbatim}

If you go and look up the definition of \code{PyTypeObject} in \file{Include/object.h} you'll see that it has many more fields than the definition above. The remaining fields will be filled with zeros by the C compiler, and it is common practice to not specify them explicitly unless you need them.

This is sufficiently important that we're going to pick it apart still further:

\begin{verbatim}
static PyTypeObject SimpleObjectType = {
    PyObject_HEAD_INIT(NULL)
\end{verbatim}

As noted above, defining a new type amounts to creating a new instance of the ``\code{type}'' type. Because such an instance is a Python object just as much as a Python integer is, it must start with the fields of a \code{PyObject}. This is the purpose of the \code{PyObject_HEAD_INIT} macro.

The line

\begin{verbatim}
    PyObject_HEAD_INIT(NULL)
\end{verbatim}

itself is something of a wart. What we would have liked to have written is:

\begin{verbatim}
    PyObject_HEAD_INIT(&PyType_Type)
\end{verbatim}

(as \code{SimpleObjectType} is to be an instance of type \code{type}) but this isn't strictly conforming C and is rejected by some compilers. Fortunately, the \code{PyType_Ready} function will fill in this field for us.
\begin{verbatim}
    0,                          /* ob_size */
\end{verbatim}

The \member{ob_size} field of the header is not used; its presence in the type structure is a historical artifact that is maintained for binary compatibility with extension modules compiled for older versions of Python. Always set this field to zero.

\begin{verbatim}
    "simpletype.Simple",        /* tp_name */
\end{verbatim}

The name of our type. This will appear in the default textual representation of our objects and in some error messages, for example:

\begin{verbatim}
>>> '' + simpletype.Simple()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'simpletype.Simple' objects
\end{verbatim}

Note that the name is a dotted name that includes both the module name and the name of the type within the module. The module in this case is \module{simpletype} and the name of the type is \class{Simple}, so we set the type name to ``\class{simpletype.Simple}''. Getting the name right is also vital for supporting pickling of instances of the custom type.

%% Note that it is not essential that the module name (i.e. the part of
%% \code{tp_name} that precedes the period) is the name of the module
%% defining the type. For .... XXX

\begin{verbatim}
    sizeof(SimpleObject),       /* tp_basicsize */
\end{verbatim}

This is so that Python knows how much memory to allocate when creating instances of our type.

\begin{verbatim}
    0,                          /* tp_itemsize */
\end{verbatim}

This has to do with variable length objects like lists and strings. Ignore this for now.

Skipping a number of type methods that we don't provide, we set the class flags to \constant{Py_TPFLAGS_DEFAULT}.

\begin{verbatim}
    Py_TPFLAGS_DEFAULT,         /*tp_flags*/
\end{verbatim}

All types should include this constant in their flags.

We provide a doc string for the type in \member{tp_doc}.
\begin{verbatim}
    "Simple objects are simple.", /* tp_doc */
\end{verbatim}

So far we have used nothing but the default behaviour Python supplies for objects. The only area where the default is not what we want is instantiation: by default, a type cannot be instantiated. To change this, we need to supply a \code{tp_new} method. For the usual case of an instantiable type, Python provides a function \code{PyType_GenericNew} which we can use. Unfortunately we can't just include this in the definition of \code{SimpleObjectType}, because on some platforms or compilers a structure member cannot be statically initialized with a function defined in another C module. Instead, we assign the \member{tp_new} slot in the module initialization function just before calling \cfunction{PyType_Ready()}:

\begin{verbatim}
    SimpleObjectType.tp_new = PyType_GenericNew;
    if (PyType_Ready(&SimpleObjectType) < 0)
        return;
\end{verbatim}

The \code{PyType_Ready()} function performs some sanity checks and various pieces of book-keeping.

The only remaining unfamiliar code is the code that exposes the type object to Python:

\begin{verbatim}
    Py_INCREF(&SimpleObjectType);
    PyModule_AddObject(m, "Simple", (PyObject *)&SimpleObjectType);
\end{verbatim}

That's it! All that remains is to put

\verbatiminput{simpletype-setup.py}

in a file called \file{simpletype-setup.py}, execute

\begin{verbatim}
$ python simpletype-setup.py build_ext -i
\end{verbatim}
%$ <-- bow to font-lock ;-(

and play around with your first extension type in the interactive interpreter.

The remainder of this chapter is devoted to creating extension types with more (well, any) capabilities.
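
For comparison, and only as a rough analogy, the Python-level counterpart of the C type above might look like the sketch below. It is not what \file{simpletypemodule.c} produces. It merely illustrates, on an ordinary new-style class, that a type is itself an instance of \code{type} and that the name and doc string set in \member{tp_name} and \member{tp_doc} have Python-visible equivalents. The helper \code{describe} is invented for this example.

```python
class Simple(object):
    """Simple objects are simple."""


def describe(cls):
    """Return (name of the type's own type, type name, doc string)."""
    return (type(cls).__name__, cls.__name__, cls.__doc__)


# Defining the class created a new instance of the type "type".
info = describe(Simple)

# Like the C version, the class can be instantiated but does little else.
instance = Simple()
```

Here \code{info} holds the name of the metatype, the last component of the type's name, and the doc string, which are the same pieces of information the C structure spells out field by field.
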
\section{Adding Data and Methods}

For a less trivial example, consider the following Python code:

\begin{verbatim}
class Name(object):
    def __init__(self, forname='', surname=''):
        self.forname = forname
        self.surname = surname
    def name(self):
        return "%s %s"%(self.forname, self.surname)
\end{verbatim}

This is a perfectly fine Python class with no particular reason to be rewritten in C, but as we need an example, that's exactly what we'll do:

\verbatiminput{namemodule.c}

\section{Supporting Subclassing}

Hmm, need to do some reading for this!

\section{Providing Finer Control over Data Attributes}

blah blah blah

\section{Supporting Cyclic Garbage Collection}

blah blah blah