Rabbit: A Compiler for Scheme/Chapter 4
4. The Target Machine
Compiled code is interfaced to the SCHEME interpreter in two ways. The interpreter must be able to recognize functional objects which happen to be compiled and to invoke them with given arguments; and compiled code must be able to invoke any function, whether interpreted or compiled, with given arguments. (This latter interface is traditionally known as the "UUO Handler" as the result of the widespread use of the PDP-10 in implementing LISP systems. [DEC] [Moon] [Teitelman]) We define here an arbitrary standard form for functional objects, and a standard means for invoking them.
In the PDP-10 MacLISP implementation of SCHEME, a function is, in general, represented as a list whose car contains one of a set of distinguished atomic symbols. (Notice that LAMBDA is not one of these; a LAMBDA-expression may evaluate to a function, but is not itself a valid function.) This set of symbols includes EXPR, SUBR, and LSUBR, denoting primitive MacLISP functions of those respective types; BETA, denoting a SCHEME function whose code is interpretive; DELTA, denoting an escape function created by the interpreter for a CATCH form, or a continuation given by the interpreter to compiled code; CBETA, denoting a SCHEME function or continuation whose code is compiled; and EPSILON, denoting a continuation created when compiled code invokes interpreted code. Each of these function types requires a different invocation convention; the interpreter must distinguish these types and invoke them in the appropriate manner. For example, to invoke an EXPR the MacLISP FUNCALL construct must be used. A BETA must be invoked by creating an appropriate environment, using the given arguments, and then interpreting the code of the function.
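The tagging scheme just described can be illustrated with a small sketch. This is a Python model, not MacLISP: functional objects are modelled as Python lists whose first element is a string tag, and the names `FUNCTION_TAGS` and `function_type` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical model: a functional object is a list whose first element
# (the "car") is one of the distinguished tags named in the text.
FUNCTION_TAGS = {
    "EXPR": "primitive MacLISP function",
    "SUBR": "primitive MacLISP function",
    "LSUBR": "primitive MacLISP function",
    "BETA": "SCHEME function whose code is interpretive",
    "DELTA": "escape function for CATCH, or continuation given to compiled code",
    "CBETA": "SCHEME function or continuation whose code is compiled",
    "EPSILON": "continuation created when compiled code invokes interpreted code",
}

def function_type(obj):
    """Return the tag of a functional object, or None if obj is not a valid function."""
    if isinstance(obj, list) and obj and obj[0] in FUNCTION_TAGS:
        return obj[0]
    return None

# A LAMBDA-expression is not itself a valid function, so it has no function type.
print(function_type(["BETA", "<code>", "<environment>"]))
print(function_type(["LAMBDA", ["X"], "X"]))
```

An invoker would branch on the returned tag to select the invocation convention appropriate to that type.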
We have arbitrarily defined the CBETA interface as follows: there are a
number of "registers", in the form of global variables. Nine registers called **CONT**, **ONE**, **TWO**, ..., **EIGHT** are used to pass arguments to compiled
functions. **CONT** contains the continuation. The others contain the arguments
prescribed by the user; if there are more than eight arguments, however, then
they are passed as a list of all the arguments in register **ONE**, and the
others are unused. (Any of a large variety of other conventions could have been
chosen, such as the first seven arguments in seven registers and a list of all
remaining arguments in **EIGHT**. We merely chose a convention which would be
workable and convenient, reflect the typical finiteness of hardware register
sets, and mirror familiar LISP conventions. The use of a list of arguments is
analogous to the passing of an arbitrary number of arguments on a stack,
sometimes known as the LSUBR convention. [Moon] [Declarative])
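As a rough illustration of this argument-passing convention, the following Python sketch models the registers as entries in a dictionary. The function name `load_arguments` is hypothetical, and the register names **THREE** through **SEVEN** are assumed from the elided sequence above.

```python
# Hypothetical model of the CBETA argument convention; registers are dict entries.
# The names THREE..SEVEN are assumed from the elided sequence in the text.
ARG_REGISTERS = ["ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT"]

def load_arguments(registers, cont, args):
    """Place the continuation and the user's arguments into the registers."""
    registers["CONT"] = cont
    if len(args) <= len(ARG_REGISTERS):
        # Up to eight arguments: one argument per register, in order.
        for name, value in zip(ARG_REGISTERS, args):
            registers[name] = value
    else:
        # More than eight arguments: a list of all of them goes in ONE,
        # and the other argument registers are left unused.
        registers["ONE"] = list(args)
    return registers
```

With three arguments the sketch fills **ONE**, **TWO**, and **THREE**; with nine it fills only **ONE**, which then holds the whole argument list.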
There is another register called **FUN**. A function is invoked by
putting the functional object in **FUN**, its arguments in the registers already
described, and the number of arguments in the register **NARGS**, and then
exiting the current function. Control (at the MacLISP level) is then transferred
to a routine (the "SCHEME UUO handler") which determines the type of the function
in **FUN** and invokes it.
A continuation is invoked in exactly the same manner as any other kind of
function, with two exceptions: a continuation does not itself require a
continuation, so **CONT** need not be set up; and a continuation always takes a
single argument, so **NARGS** need not be set to 1. {Note Multiple-Argument Continuations}
A CBETA form has additional fixed structure. Besides the atomic symbol
CBETA in the car, there is always in the cadr the address of the code, and in the
cddr the environment. The form of the environment is completely arbitrary as far
as the SCHEME interpreter is concerned; indeed, the CHEAPY compiler and the
RABBIT compiler use completely different formats for environments for compiled functions. (Recall that this cannot matter since the only code which will ever be
able to access that environment is the code belonging to the functional closure
of which that environment is a part.) The "UUO handler" puts the cddr of **FUN**
in the register **ENV**, and then transfers to the address in the cadr of
**FUN**. When that code eventually exits, control returns to the "UUO handler",
which expects the code to have set up **FUN** and any necessary arguments.
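The control loop described above can be sketched in Python as follows. This is only a model under stated assumptions: registers are entries in a dictionary `R`, a CBETA is a tuple `("CBETA", code, env)`, "transferring to the address" is modelled as calling a Python function, and only the CBETA case is handled. The `HALT` tag, the helper names, and the environment layouts are hypothetical and exist only so that the example terminates and can be run.

```python
R = {}  # the global "registers": FUN, NARGS, ENV, CONT, ONE, TWO, ...

def make_cbeta(code, env):
    # Tag in the car, code "address" in the cadr, environment in the cddr.
    return ("CBETA", code, env)

def uuo_handler():
    """Repeatedly invoke whatever functional object is found in FUN."""
    while True:
        fun = R["FUN"]
        if fun[0] == "CBETA":
            R["ENV"] = fun[2]  # environment goes into ENV
            fun[1]()           # run the code; it returns here with FUN set up
        elif fun[0] == "HALT":  # hypothetical terminator, not part of the source
            return R["ONE"]
        else:
            raise ValueError("unsupported function type: %r" % (fun[0],))

def add_code():
    # Body of a two-argument function: adds its arguments and an offset from ENV.
    R["ONE"] = R["ONE"] + R["TWO"] + R["ENV"]["offset"]
    R["FUN"] = R["CONT"]  # invoke the continuation; CONT and NARGS are not set

def double_code():
    # Body of a continuation: doubles its single argument, then passes it on.
    R["ONE"] = R["ONE"] * 2
    R["FUN"] = R["ENV"]["next"]

def run(fun, cont, one, two):
    R.clear()
    R.update(FUN=fun, CONT=cont, NARGS=2, ONE=one, TWO=two)
    return uuo_handler()

adder = make_cbeta(add_code, {"offset": 10})
doubler = make_cbeta(double_code, {"next": ("HALT",)})
result = run(adder, doubler, 1, 2)  # (1 + 2 + 10) * 2
```

Note that neither code body calls the next function directly; each merely sets up **FUN** and its argument and returns, leaving the handler to perform the transfer.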
There is a set of "memory locations" -l1-, -l2-, ... which are used to
hold intermediate quantities within a single user function. (Later we shall see
that we think of these as being used to pass values between internally generated
functions within a module. For this purpose we think of the "registers" and
"memory locations" being arranged in a single sequence **CONT**, **ONE**, ...,
**EIGHT**, -l1-, -l2-, ... There is in principle an unbounded number of these
"memory locations", but RABBIT can determine (and in fact outputs as a
declaration for the MacLISP compiler) the exact set of such locations used by any
given function.) One may think of the "memory locations" as being local to each
module, since they are never used to pass information between modules; in
practice they are implemented as global MacLISP variables.
The registers **FUN**, **NARGS**, **ENV**, and the argument registers are
the only global registers used by compiled SCHEME code (other than the "memory
locations"). Except for global variables explicitly mentioned by the user
program, all communication between compiled SCHEME functions is through these
registers. It is useful to note that the continuation in **CONT** is generally
analogous to the usual "control stack" which contains return addresses, and so we
may think of **CONT** as our "stack pointer register".