Rabbit: A Compiler for Scheme/Chapter 4

4. The Target Machine

Compiled code is interfaced to the SCHEME interpreter in two ways. The interpreter must be able to recognize functional objects which happen to be compiled and to invoke them with given arguments; and compiled code must be able to invoke any function, whether interpreted or compiled, with given arguments. (This latter interface is traditionally known as the "UUO Handler" as the result of the widespread use of the PDP-10 in implementing LISP systems. [DEC] [Moon] [Teitelman]) We define here an arbitrary standard form for functional objects, and a standard means for invoking them.

In the PDP-10 MacLISP implementation of SCHEME, a function is, in general, represented as a list whose car contains one of a set of distinguished atomic symbols. (Notice that LAMBDA is not one of these; a LAMBDA-expression may evaluate to a function, but is not itself a valid function.) This set of symbols includes EXPR, SUBR, and LSUBR, denoting primitive MacLISP functions of those respective types; BETA, denoting a SCHEME function whose code is interpretive; DELTA, denoting an escape function created by the interpreter for a CATCH form, or a continuation given by the interpreter to compiled code; CBETA, denoting a SCHEME function or continuation whose code is compiled; and EPSILON, denoting a continuation created when compiled code invokes interpreted code. Each of these function types requires a different invocation convention; the interpreter must distinguish these types and invoke them in the appropriate manner. For example, to invoke an EXPR the MacLISP FUNCALL construct must be used. A BETA must be invoked by creating an appropriate environment, using the given arguments, and then interpreting the code of the function.
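To make the representation concrete, here is a minimal Python sketch of recognizing a functional object by the symbol in its car. It assumes a functional object can be modeled as a Python list whose first element is a tag string. The names `FUNCTION_TAGS`, `function_type`, and `is_function` are hypothetical and are not part of the SCHEME implementation; only the seven tags come from the text.

```python
# Assumption: a functional object is modeled as a Python list whose first
# element (the "car") is one of the distinguished tags named in the text.
FUNCTION_TAGS = {"EXPR", "SUBR", "LSUBR", "BETA", "DELTA", "CBETA", "EPSILON"}


def function_type(obj):
    """Return the tag of a functional object, or raise if it is not one."""
    if isinstance(obj, list) and obj and obj[0] in FUNCTION_TAGS:
        return obj[0]
    raise TypeError(f"not a valid function: {obj!r}")


def is_function(obj):
    """True if obj carries one of the distinguished tags in its car."""
    try:
        function_type(obj)
        return True
    except TypeError:
        return False
```

Under this model a list beginning with `"LAMBDA"` is rejected, reflecting the remark that a LAMBDA-expression may evaluate to a function but is not itself one.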

We have arbitrarily defined the CBETA interface as follows: there are a number of "registers", in the form of global variables. Nine registers called **CONT**, **ONE**, **TWO**, ..., **EIGHT** are used to pass arguments to compiled functions. **CONT** contains the continuation. The others contain the arguments prescribed by the user; if there are more than eight arguments, however, then they are passed as a list of all the arguments in register **ONE**, and the others are unused. (Any of a large variety of other conventions could have been chosen, such as the first seven arguments in seven registers and a list of all remaining arguments in **EIGHT**. We merely chose a convention which would be workable and convenient, reflect the typical finiteness of hardware register sets, and mirror familiar LISP conventions. The use of a list of arguments is analogous to the passing of an arbitrary number of arguments on a stack, sometimes known as the LSUBR convention. [Moon] [Declarative])
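The argument-passing convention can be sketched in Python as follows. This is only an illustration: the registers are modeled as entries in a dictionary rather than as global MacLISP variables, and the names `ARG_REGISTERS` and `load_arguments` are hypothetical.

```python
# Assumption: registers are modeled as dictionary entries keyed by name.
ARG_REGISTERS = ["**ONE**", "**TWO**", "**THREE**", "**FOUR**",
                 "**FIVE**", "**SIX**", "**SEVEN**", "**EIGHT**"]


def load_arguments(regs, cont, args):
    """Place a continuation and the user's arguments in the registers."""
    regs["**CONT**"] = cont
    if len(args) <= len(ARG_REGISTERS):
        # Up to eight arguments: one argument per register.
        for name, value in zip(ARG_REGISTERS, args):
            regs[name] = value
    else:
        # More than eight: a single list of all the arguments goes in
        # **ONE**, and the other argument registers are not used.
        regs["**ONE**"] = list(args)
    return regs
```

With three arguments, `**ONE**` through `**THREE**` each hold one value; with nine, `**ONE**` holds the whole list and the remaining argument registers are left alone.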

There is another register called **FUN**. A function is invoked by putting the functional object in **FUN**, its arguments in the registers already described, and the number of arguments in the register **NARGS**, and then exiting the current function. Control (at the MacLISP level) is then transferred to a routine (the "SCHEME UUO handler") which determines the type of the function in **FUN** and invokes it.

A continuation is invoked in exactly the same manner as any other kind of function, with two exceptions: a continuation does not itself require a continuation, so **CONT** need not be set up; and a continuation always takes a single argument, so **NARGS** need not be set to 1. {Note Multiple-Argument Continuations}
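The difference between the two kinds of invocation can be sketched in Python by showing which registers each one sets before control passes to the handler. The helper names `set_up_call` and `set_up_continuation_call` are hypothetical, the registers are modeled as a dictionary, and for brevity the sketch assumes at most eight arguments.

```python
ARG_REGISTERS = ["**ONE**", "**TWO**", "**THREE**", "**FOUR**",
                 "**FIVE**", "**SIX**", "**SEVEN**", "**EIGHT**"]


def set_up_call(regs, fun, cont, args):
    """Prepare the registers for invoking an ordinary function."""
    assert len(args) <= len(ARG_REGISTERS)  # simplification of this sketch
    regs["**FUN**"] = fun
    regs["**CONT**"] = cont
    regs["**NARGS**"] = len(args)
    for name, value in zip(ARG_REGISTERS, args):
        regs[name] = value


def set_up_continuation_call(regs, continuation, value):
    """Prepare the registers for invoking a continuation."""
    regs["**FUN**"] = continuation
    regs["**ONE**"] = value
    # **CONT** and **NARGS** are deliberately left untouched: a continuation
    # needs no continuation of its own and always takes one argument.
```

After either helper runs, the caller would simply exit, leaving the handler to examine `**FUN**`.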

A CBETA form has additional fixed structure. Besides the atomic symbol CBETA in the car, there is always in the cadr the address of the code, and in the cddr the environment. The form of the environment is completely arbitrary as far as the SCHEME interpreter is concerned; indeed, the CHEAPY compiler and the RABBIT compiler use completely different formats for environments for compiled functions. (Recall that this cannot matter since the only code which will ever be able to access that environment is the code belonging to the functional closure of which that environment is a part.) The "UUO handler" puts the cddr of **FUN** in the register **ENV**, and then transfers to the address in the cadr of **FUN**. When that code eventually exits, control returns to the "UUO handler", which expects the code to have set up **FUN** and any necessary arguments.
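The handler's treatment of a CBETA can be sketched in Python as a simple loop. Several things here are assumptions of the sketch rather than features of the actual implementation: a CBETA is modeled as a three-element tuple (tag, code, environment), the "address of the code" is a Python callable that receives the register dictionary, the environment is a dictionary, and a hypothetical `HALT` object is introduced only so that the loop has a way to stop. Other function types are omitted.

```python
def uuo_handler(regs):
    """Dispatch on **FUN** repeatedly until the hypothetical HALT appears."""
    while True:
        fun = regs["**FUN**"]
        tag = fun[0]
        if tag == "HALT":
            # Not part of the original design; ends the sketch's loop.
            return regs["**ONE**"]
        if tag == "CBETA":
            _, code, env = fun
            regs["**ENV**"] = env   # the cddr of **FUN** goes in **ENV**
            code(regs)              # transfer to the code in the cadr
            # On return, the code must have set up **FUN** and arguments.
        else:
            raise NotImplementedError(tag)


def add_offset_code(regs):
    """Compiled body: add the offset in the environment to the argument."""
    result = regs["**ONE**"] + regs["**ENV**"]["offset"]
    # "Return" by invoking the continuation with a single argument.
    regs["**FUN**"] = regs["**CONT**"]
    regs["**ONE**"] = result


HALT = ("HALT",)
add5 = ("CBETA", add_offset_code, {"offset": 5})
regs = {"**FUN**": add5, "**CONT**": HALT, "**ONE**": 37, "**NARGS**": 1}
result = uuo_handler(regs)  # 42
```

Note that the compiled body never calls its continuation directly; it only sets up the registers and exits, so the Python call stack does not grow from one invocation to the next.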

There is a set of "memory locations" -l1-, -l2-, ... which are used to hold intermediate quantities within a single user function. (Later we shall see that we think of these as being used to pass values between internally generated functions within a module. For this purpose we think of the "registers" and "memory locations" being arranged in a single sequence **CONT**, **ONE**, ..., **EIGHT**, -l1-, -l2-, ... There is in principle an unbounded number of these "memory locations", but RABBIT can determine (and in fact outputs as a declaration for the MacLISP compiler) the exact set of such locations used by any given function.) One may think of the "memory locations" as being local to each module, since they are never used to pass information between modules; in practice they are implemented as global MacLISP variables.

The registers **FUN**, **NARGS**, **ENV**, and the argument registers are the only global registers used by compiled SCHEME code (other than the "memory locations"). Except for global variables explicitly mentioned by the user program, all communication between compiled SCHEME functions is through these registers. It is useful to note that the continuation in **CONT** is generally analogous to the usual "control stack" which contains return addresses, and so we may think of **CONT** as our "stack pointer register".