PyPy
PyPy[howto-logicobjspace-0.9]
modified Mar 27, 2007 by Carl Friedrich Bolz

How to use the Logic Object space features of PyPy 0.9

Outline

This document gives some (outdated wrt what sits in the repository, a much better document to read would be the EU Interim Report) information about the content and usage of an extension of PyPy known as the Logic Objectspace (LO). The LO, when finished, will provide additional builtins that will allow to write:

  • concurrent programs based on coroutines scheduled by dataflow logic variables,
  • concurrent logic programs,
  • concurrent constraint problems,
  • new search "engines" to help solve logic/constraint programs.

The 0.9 preview comes without logic programming; the constraint solver is only lightly tested, is not equipped with some specialized but important propagators for linear relations on numeric variables, and might support concurrency - but that would be an accident; the dataflow scheduling of coroutines is known to fail in at least one basic and important case.

In this document, we skim over these topics, hoping to give enough information and examples for an uninformed user to understand what is going on and how to use the provided functionality.

To fire up a working PyPy with the LO, please type:

/root-of-pypy-dist/pypy/bin/py.py -o logic --withmod-_stackless

Logic Variables and Dataflow Synchronisation of Coroutines

Logic Variables

Description and examples

Logic variables are (should be, in an ideal LO) similar to Python variables in the following sense: they map names to values in a defined scope. But unlike normal Python variables, they have two states: free and bound. A bound logic variable is indistinguishable from a normal Python value, which it wraps. A free variable can only be bound once (it is also said to be a single-assignment variable). It is good practice to denote these variables with a beginning capital letter, so as to avoid confusion with normal variables.

The following snippet creates a new logic variable and asserts its state:

X = newvar()
assert is_free(X)
assert not is_bound(X)

Logic variables can be bound like that:

bind(X, 42)
assert X / 2 == 21

The single-assignment property is easily checked:

bind(X, 'hello') # would raise a FailureException
bind(X, 42)      # is admitted (it is a no-op)

In the current state of the LO, a generic Exception will be raised. It is quite obvious from this that logic variables are really objects acting as boxes for python values. No syntactic extension to Python is provided yet to lessen this inconvenience.

The bind operator is low-level. The more general operation that binds a logic variable is known as "unification". Unify is an operator that takes two arbitrary data structures and tries to assert their equality, much in the sense of the == operator, but with one important twist: unify mutates the state of the involved logic variables.

Unifying structures devoid of logic variables, like:

unify([1, 2], [1, 2])
unify(42, 43)

is equivalent to an assertion about their equality, the difference being that a FailureException will be raised instead of an AssertionError, would the assertion be violated:

assert [1, 2] == [1, 2]
assert 42 == 43

A basic example involving logic variables embedded into dictionaries:

Z, W = newvar(), newvar()
unify({'a': 42, 'b': Z},
      {'a':  Z, 'b': W})
assert Z == W == 42

Unifying one unbound variable with some value (a) means assigning the value to the variable (which then satisfies equality), unifying two unbound variables (b) aliases them (they are constrained to reference the same -future- value).

Assignment or aliasing of variables is provided underneath by the 'bind' operator.

An example involving custom data types:

class Foo(object):
    def __init__(self, a):
        self.a = a
        self.b = newvar()

f1 = Foo(newvar())
f2 = Foo(42)
unify(f1, f2)
assert f1.a == f2.a == 42    # assert (a)
assert alias_of(f1.b, f2.b)  # assert (b)
unify(f2.b, 'foo')
assert f1.b == f2.b == 'foo' # (b) is entailed indeed

The operators table

Logic variables support the following operators (with their arity):

Predicates

is_free/1
any -> bool
is_bound/1
any -> bool
alias_of/2
logic vars. -> bool

Variable Creation

newvar/0
nothing -> logic variable

Mutators

bind/2
logic var., any -> None
unify/2
any, any -> None

Micro-threads and dataflow synchronisation

Description and examples

When a piece of code tries to access a free logic variable, the thread in which it runs is blocked (suspended) until the variable becomes bound. This behaviour is known as "dataflow synchronization" and mimics exactly the dataflow variables from the Oz programming language. With respect to behaviour under concurrency conditions, logic variables come with two operators :

  • wait: this suspends the current thread until the variable is bound, it returns the value otherwise (impl. note: in the logic object space, all operators make an implicit wait on their arguments)
  • wait_needed: this suspends the current thread until the variable has received a wait message. It has to be used explicitly, typically by a producer thread that wants to produce data only when needed.

In this context, binding a variable to a value will make runnable all threads blocked on this variable.

Wait and wait_needed allow to write efficient lazy evaluating code.

Using the "uthread" builtin (which spawns a coroutine and applies the 2..n args to its first arg), here is how to implement a producer/consumer scheme:

def generate(n, limit):
    if n < limit:
        return (n, generate(n + 1, limit))
    return None

def sum(L, a):
    Head, Tail = newvar(), newvar()
    unify(L, (Head, Tail))
    if Tail != None:
        return sum(Tail, Head + a)
    return a + Head

X = newvar()
S = newvar()

unify(S, uthread(sum, X, 0))
unify(X, uthread(generate, 0, 10))

assert S == 45

Note that this eagerly generates all elements before the first of them is consumed. Wait_needed helps us write a lazy version of the generator. But the consumer will be responsible of the termination, and thus must be adapted too:

def lgenerate(n, L):
    """lazy version of generate"""
    wait_needed(L)
    Tail = newvar()
    bind(L, (n, Tail))
    lgenerate(n+1, Tail)

def lsum(L, a, limit):
    """this summer controls the generator"""
    if limit > 0:
        Head, Tail = newvar(), newvar()
        wait(L)
        unify(L, (Head, Tail))
        return lsum(Tail, a+Head, limit-1)
    else:
        return a

Y = newvar()
T = newvar()

uthread(lgenerate, 0, Y)
unify(T, uthread(lsum, Y, 0, 10))

wait(T)
assert T == 45

Please note that in the current LO, we deal with coroutines, not threads (thus we can't rely on preemptive scheduling to lessen the problem with the eager consumer/producer program). Also nested coroutines don't schedule properly yet. This impacts the ability to write a simple program like the following:

def sleep(X, Barrier):
    wait(X)
    bind(Barrier, True)

def wait_two(X, Y):
    Barrier = newvar()
    uthread(sleep, X, Barrier)
    uthread(sleep, Y, Barrier)
    wait(Barrier)
    if is_free(Y):
        return 1
    return 2

X, Y = newvar(), newvar()
o = uthread(wait_two, X, Y)
unify(X, Y)
unify(Y, 42)
assert X == Y == 42
assert o == 2

Finally, it must be noted that the bind/unify and wait pair of operations are quite similar to the asynchronous send and receive primitives commonly used for inter-process communication.

The operators table

Blocking ops

wait/1 # blocks if first argument is a free logic var., till it becomes bound
value -> value
wait_needed/1 # blocks until its arg. receives a wait
logic var. -> logic var.

Coroutine spawning

uthread/n | 1 <= n
callable, opt args. -> logic var.

Constraint Programming

The LO comes with a flexible, extensible constraint solver engine. While regular search strategies such as depth-first or breadth-first search are provided, you can write better, specialized strategies (an example would be best-search). We therein describe how to use the solver to specify and get the solutions of a constraint satisfaction problem, and then highlight how to extend the solver with new strategies.

Using the constraint engine

Specification of a problem

A constraint satisfaction problem is defined by a triple (X, D, C) where X is a set of finite domain variables, D the set of domains associated with the variables in X, and C the set of constraints, or relations, that bind together the variables of X.

Note that the constraint variables are NOT logic variables. Not yet anyway.

So we basically need a way to declare variables, their domains and relations; and something to hold these together. The later is what we call a "computation space". The notion of computation space is broad enough to encompass constraint and logic programming, but we use it there only as a box that holds the elements of our constraint satisfaction problem. Note that it is completely unrelated to the notion of object space (as in the logic object space).

A problem is a one-argument procedure defined as follows:

def simple_problem(cs):
    cs.var('x', FiniteDomain(['spam', 'egg', 'ham']))
    cs.var('y', FiniteDomain([3, 4, 5]))

This snippet defines a couple of variables and their domains, on the 'cs' argument which is indeed a computation space. Note that we didn't take a reference of the created variables. We can query the space to get these back if needed, and then complete the definition of our problem. Our problem, continued:

... x = cs.find_var('x')
    y = cs.find_var('y')
    cs.tell(make_expression([x,y], 'len(x) == y'))

    return x, y

We must be careful to return the set of variables whose candidate values we are interested in. The rest should be sufficiently self-describing...

Getting solutions

Now to get and print solutions out of this, we must:

import solver
cs = newspace()
cs.define_problem(simple_problem)

for sol in solver.solve(cs):
    print sol

The builtin solve function returns a generator. You will note with pleasure how slow the search can be on a solver running on a Python interpreter written in Python, the later running on top of cpython... It is expected that the compiled version of PyPy + LO will provide decent performance.

Table of Operators

Note that below, "variable/expression designators" really are strings.

Space creation

newspace/0

Finite domain creation

FiniteDomain/1
list of any -> FiniteDomain

Expressions

make_expression/2
list of var. designators, expression designator -> Expression
AllDistinct/1
list of var. designators -> Expression

Space methods

var/2
var. designator, FiniteDomain -> constraint variable instance
find_var/1
var. designator -> constraint variable instance
tell/1
Expression -> None
define_problem/1
procedure (space -> tuple of constraint variables) -> None

Extending the search engine

Writing a solver

Here we show how the additional builtin primitives allow you to write, in pure Python, a very basic solver that will search depth-first and return the first found solution.

As we've seen, a CSP is encapsulated into a so-called "computation space". The space object has additional methods that allow the solver implementor to drive the search. First, let us see some code driving a binary depth-first search:

1   def first_solution_dfs(space):
2       status = space.ask()
3       if status == 0:
4           return None
5       elif status == 1:
6           return space.merge()
7       else:
8           new_space = space.clone()
9           space.commit(1)
10          outcome = first_solution_dfs(space)
11          if outcome is None:
13              new_space.commit(2)
14              outcome = first_solution_dfs(new_space)
15          return outcome

This recursive solver takes a space as argument, and returns the first solution or None. Let us examine it piece by piece and discover the basics of the solver protocol.

The first thing to do is "asking" the space about its status. This may force the "inside" of the space to check that the values of the domains are compatibles with the constraints. Every inconsistent value is removed from the variable domains. This phase is called "constraint propagation". It is crucial because it prunes as much as possible of the search space. Then, the call to ask returns a positive integer value which we call the space status; at this point, all (possibly concurrent) computations happening inside the space are terminated.

Depending on the status value, either:

  • the space is failed (status == 0), which means that there is no combination of values of the finite domains that can satisfy the constraints,
  • one solution has been found (status == 1): there is exactly one valuation of the variables that satisfy the constraints,
  • several branches of the search space can be taken (status represents the exact number of available alternatives, or branches).

Now, we have written this toy solver as if there could be a maximum of two alternatives. This assumption holds for the simple_problem we defined above, where a binary "distributor" (see below for an explanation of this term) has been chosen automatically for us, but not in the general case. See the sources for a more general-purpose solver and more involved sample problems (currently, probably only conference_scheduling is up to date with the current API).

In line 8, we take a clone of the space; nothing is shared between space and newspace (the clone). We now have two identical versions of the space that we got as parameter. This will allow us to explore the two alternatives. This step is done, line 9 and 13, with the call to commit, each time with a different integer value representing the branch to be taken. The rest should be sufficiently self-describing.

This shows the two important space methods used by a search engine: ask, which waits for the stability of the space and informs the solver of its status, and commit, which tells a space which road to take in case of a fork.

Using distributors

Now, earlier, we talked of a "distributor": it is a program running in a computation space. It could be anything, and in fact, in the final version of the LO, it will be any Python program, augmented with calls to non-deterministic choice points. Each time a program embedded in a computation space reaches such a point, it blocks until some Deus ex machina makes the choice for him. Only a solver can be responsible for the actual choice (that is the reason for the name "non deterministic": the decision does not belong to the embedded program, only to the solver that drives it).

In the case of a CSP, the distributor is a simple piece of code, which works only after the propagation phase has reached a fixpoint. Its policy will determine the fanout, or branching factor, of the current computation space (or node in the abstract search space).

Here are two examples of distribution strategies:

  • take the variable with the biggest domain, and remove exactly one value from its domain; thus we always get two branches: one with the value removed, the other with only this value remaining,
  • take a variable with a small domain, and keep only one value in the domain for each branch (in other words, we "instantiate" the variable); this makes for a branching factor equal to the size of the domain of the variable.

There are a great many ways to distribute... Some of them perform better, depending on the characteristics of the problem to be solved. But there is no absolutely better distribution strategy. Note that the second strategy given as example there is what is used (and hard-wired) in the MAC algorithm.

Currently in the LO we have two builtin distributors:

  • NaiveDistributor, which distributes domains by splitting the smallest domain in 2 new domains; the first new domain has a size of one, and the second has all the other values,
  • SplitDistributor, which distributes domains by splitting the smallest domain in N equal parts (or as equal as possible). If N is 0, then the smallest domain is split in domains of size 1; a special case of this, DichotomyDistributor, for which N == 2, is also provided and is the default one.

To explicitly specify a distributor for a constraint problem, you need to say, in the procedure that defines the problem:

cs.set_distributor(NaiveDistributor())

It is not possible currently to write distributors in pure Python; this is scheduled for PyPy version 1.

Remaining space operators

For solver writers

ask/0
nothing -> a positive integer i
commit/1
integer in [1, i] -> None
merge/0
nothing -> list of values (solution)

For distributor writers

choose