============================================= Garbage Collection in PyPy -- Summer of Code ============================================= Thanks to Google for sponsoring this work! Goals ----- * following PyPy's main scheme: implementing Garbage Collection in Python! * flexibility, simplicity * bla * bla * bla ========= Addresses ========= * Addresses (pypy.rpython.memory.lladdress.address) provide a general method to access blobs of memory * Example usage:: addr = raw_malloc(16) addr.signed[0] = 1 assert addr.signed[0] == 1 addr.signed[1] = 42 assert (addr + 4).signed[0] == 42 raw_free(addr) * running on Python, they are simulated with extra checks:: addr.signed[0] --> Crash =========================== Explicitely managed classes =========================== * To have a higher level way to explicitely manage memory, you can declare a class to be explicitely freed:: class A(object): _malloc_flavor_ = 'raw' def __init__(self, b): self.b = b a = A(1) assert a.b == 1 free_non_gc_object(a) * running on Python there free_non_gc_object does things to remind you of the fact that the instance was freed:: a.b --> Crash ================== Garbage Collection ================== * Garbage Collection is a way to automatically free objects that can be no longer accessed by the user program (mutator) * This is generally done by starting from the *roots*: * the are the objects the program can access at the moment * basically all the pointers in the stack plus Constants * From the roots on the GC recursively follows all the embedded pointers * every object that can be reached by this method is life * every object that can not be reached is considered garbage and can be deleted (after such annoying things like finalization) ======= Example ======= .. raw:: html


.. image:: liveness.png .. raw:: html


======================== Memory Layout of the GCs ======================== * The GC needs information about the objects it collects to find the contained pointers * This information is stored in front of the object:: +---<- program sees only this | +---------+---------+----------------------------+ | gc info | type id | object data | | signed | signed | whatever ... | +---------+---------+----------------------------+ * the GC decides what it stores there -- even nothing * most GCs need the typeid, it provides access to the information a GC needs to know about a type ==================== Type query functions ==================== * ``is_varsize(typeid) --> bool`` * ``offsets_to_gc_pointers(typeid)`` --> list of offsets * ``fixed_size(typeid)`` --> size * ``varsize_item_sizes(typeid)`` --> size * ``varsize_offset_to_variable_part(typeid)`` --> offset * ``varsize_offset_to_length(typeid)`` --> offset * ``varsize_offsets_to_gcpointers_in_var_part(typeid)`` --> list of offsets * the GC uses these functions to get details about object layout * the typeids are retrieved from the memory in front of the object .. raw:: html



========== GC methods ========== * ``malloc(self, typeid, length=0)`` --> address * ``collect(self) --> None`` * ``size_gc_header(self, typeid)`` --> size * ``init_gc_object(self, addr, typeid) --> None`` * ``init_gc_object_immortal(self, addr, typeid) --> None`` * ``write_barrier(self, addr, addr_to, addr_struct) --> None`` .. raw:: html





============================== Tying the GC into the LLInterp ============================== * for now using a GC works only using the LLInterpreter, since you can't reliably find roots in C * some operations are implemented by calling methods of the GC: * ``setfield, setarrayitem --> write_barrier`` * ``malloc, malloc_varsize --> malloc`` * there has to be a way to call ``collect`` from user level, some fishing needed * ``init_gc_object_immortal`` is called by the code that converts the constants in a graph to a format the GC can use