Design of Asynchronous Circuits Assuming Unbounded Gate Delays

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-18, NO. 12, DECEMBER 1969, p. 1110

DOUGLAS B. ARMSTRONG, MEMBER, IEEE, ARTHUR D. FRIEDMAN, AND PREMACHANDRAN R. MENON

Abstract-This paper considers the general problem of the synthesis of asynchronous combinational and sequential circuits based on the assumption that gate delays may be unbounded and that line delays are suitably constrained. Certain problems inherent to circuit realizations with unbounded gate delays are discussed and methods of solving them are proposed. Specific synthesis techniques are presented for both combinational and sequential circuits. The use of completion detection necessitated by the assumption of unbounded gate delays also causes the circuits to stop operating for approximately half of all possible single faults, thus achieving a degree of self-checking.

Index Terms-Asynchronous sequential circuits, combinational circuits, completion detection, unbounded gate delays.

I. INTRODUCTION

ASYNCHRONOUS sequential circuit design has been studied using two different sets of assumptions regarding stray delays in the circuit. In the first, due to Huffman [1], line and gate delays are assumed to be bounded and no restrictions are imposed on their relative magnitudes. The time interval between successive input changes is assumed to be sufficient to allow the circuit to settle to a stable state. In the second set of assumptions, due to Muller [2], gate delays are assumed to be unbounded but line delays are assumed to be zero. In this paper we consider the general problem of synthesis of combinational and sequential circuits based on Muller's assumption of unbounded gate delays. The line delay assumption can be relaxed somewhat, making the results particularly applicable to integrated circuits where line lengths are extremely short.
The design of circuits under the unbounded gate delay assumption leads to circuit realizations which will operate correctly independent of the gate delays, thus eliminating the need to consider the worst case of delays. This should lead to faster operating circuits, especially when there is a large spread in gate delays, or when the time for performing an operation is strongly dependent on the inputs and can vary over a wide range, as in the ripple carry adder [3] and certain other iterative circuits [4]. It can be shown that the circuits designed using the methods presented in this paper are partially self-checking in that they will stop operating for approximately half of the single faults that may occur.

Manuscript received December 28, 1967; revised May 26, 1969. D. B. Armstrong is with Bell Telephone Laboratories, Inc., Whippany, N. J. A. D. Friedman and P. R. Menon are with Bell Telephone Laboratories, Inc., Murray Hill, N. J.

The assumptions and the circuit model used throughout the paper are given in Section II. It is shown in Section III that circuits with unbounded gate delays may contain a type of hazard, called delay hazard, which has not been explicitly recognized in the literature. These hazards are defined and methods of eliminating them are discussed. Section IV discusses the conditions to be satisfied by the input and output codes. Two types of codes are considered. In one type, the code words consist of a set of data words and a special word called the spacer. In the second type, each uncoded word is coded in two different ways. Sections V and VI present methods of realizing combinational and sequential circuits using these codes. Combinational and sequential circuit modules with output storage and the means of interconnecting them are discussed in Section VII. Section VIII discusses some of the practical aspects of circuits designed under the unbounded delay assumption.
These include fan-in and fan-out restrictions, relaxation of the zero line delay assumption, and the degree of self-checking attained.

II. ASSUMPTIONS

We assume that each gate acts like an instantaneous decision element with an unbounded lumped pure delay in its output, as in Fig. 1(a). Perhaps a more realistic model would also include a delay element in each gate input, as in Fig. 1(b). These input delays will, for example, take account of the fact that changing signals on gate inputs are in reality ramp-like functions of time rather than ideal step functions, and that the gate senses an input change when the ramp signal reaches some particular threshold value. If the input delays are assumed to be bounded, then our design procedures are still valid for the model in Fig. 1(b) because the input delay may be included in the delay of the line feeding this input. The augmented line delays must satisfy the relaxed line delay assumption given in Section VIII. However, if the gate input delays are assumed to be unbounded, this is equivalent to assuming unbounded line delays. No design methods are known for the situation where line and gate delays are both unbounded.

Fig. 1. Two models of stray delays. (a) Ideal gate with a pure delay in its output. (b) Ideal gate with pure delays in its inputs and output.

When gate delays are assumed to be unbounded, no fixed time interval between successive input changes can be guaranteed to be long enough for the circuit to reach a stable state. In the model of Fig. 2, which will be used throughout the paper, input changes are initiated by request signals generated by an appropriate completion detector.

Fig. 2. Circuit model.

In Fig. 2, the source supplies coded inputs to the logic circuit when it receives an appropriate request signal from the completion detector. The logic circuit may be combinational or sequential. The completion detector contains combinational logic to determine when the outputs have become stable. The need for the connection between the inputs and the completion detector will be discussed in Section III. In the rest of the paper, the design of the logic circuit (combinational and sequential) and the completion detector will be discussed, assuming that the source operates in the prescribed manner. We have been unable to design such a source, using elements with unbounded delays, which will accept uncoded data from the outside world. It is, of course, realizable using elements with bounded delays.

III. DELAY HAZARDS

For the proper operation of the model of Fig. 2, the circuit should be transient-free. That is, the signal on any wire should not change temporarily when it is required to remain fixed, or change more than once when it is required to change once. A circuit is said to contain a hazard if for some transient-free input change, there exists some combination of stray delays for which the output may contain a transient. Two types of hazards have been defined in the literature [5], [6]. Although these terms are usually defined for a single input variable change in any transition, the definitions can be extended to include cases where several input variables change. Let $I_1$ and $I_2$ be two input states, not necessarily adjacent, applied in succession to a circuit which realizes a combinational function f.
The circuit has a static hazard if $f(I_2) = f(I_1)$ and the input sequence $I_1 I_2$ can generate the output sequence $f(I_1)\,\bar{f}(I_1)\,f(I_1)$. The circuit contains a dynamic hazard if $f(I_2) \neq f(I_1)$ and the input sequence $I_1 I_2$ can produce the output sequence $f(I_1)\,f(I_2)\,f(I_1)\,f(I_2)$ or a longer sequence.

A different type of hazard, which we shall refer to as a delay hazard, can occur in circuits with unbounded delays. A sequence of at least three input states ($I_1 I_2 I_3$) must be applied in order for the effect of a delay hazard to be observed. Explicit recognition does not seem to have been given to this type of hazard in previous work. Although transients due to delay hazards can occur for input sequences of length greater than three, we shall define them only for sequences of length three. As in the previous case, two types of delay hazards can be defined. A circuit which realizes a combinational function f contains a static delay hazard if, for some input sequence $I_1 I_2 I_3$ which may be applied to the circuit such that $f(I_3) = f(I_2)$, the output sequence $f(I_1)\,f(I_2)\,\bar{f}(I_2)\,f(I_2)$ can be produced, and the transient is not due to static hazards. The circuit contains a dynamic delay hazard if, for some input sequence $I_1 I_2 I_3$ such that $f(I_3) \neq f(I_2)$, the output sequence $f(I_1)\,f(I_2)\,f(I_3)\,f(I_2)\,f(I_3)$ or a longer sequence can be produced, and the transients in the output are not due to dynamic hazards.

Delay hazards occur when the input sequence $I_1 I_2 I_3$ is applied to a circuit and the $I_2 \to I_3$ transition may be initiated before all gates in the circuit have settled in response to $I_1 \to I_2$. The delayed signal changes which are produced by the settling of these slow gates, when occurring simultaneously with other signal changes caused by $I_2 \to I_3$, may produce a transient in f during the $I_2 \to I_3$ transition. In circuits with bounded delays, delay hazards are easily eliminated by making the interval between transitions long enough so that the circuit has time to settle completely before a new input is applied.
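The static and dynamic definitions above amount to counting output changes during a single transition. The following is a minimal sketch of that bookkeeping, assuming the output wire is observed as a list of sampled 0/1 values; the function names and the sampled-trace representation are illustrative assumptions, not part of the paper.

```python
def count_changes(trace):
    """Number of times consecutive samples of a signal differ."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)


def classify_transition(f_before, f_after, trace):
    """Classify one observed output trace for a transition I1 -> I2.

    f_before, f_after: required output values f(I1) and f(I2).
    trace: sampled output values, starting at f(I1) and ending at f(I2).
    """
    if trace[0] != f_before or trace[-1] != f_after:
        raise ValueError("trace must start at f(I1) and end at f(I2)")
    changes = count_changes(trace)
    if f_before == f_after:
        # The output is required to stay fixed: any change is a transient.
        return "transient-free" if changes == 0 else "static transient"
    # The output is required to change exactly once.
    return "transient-free" if changes == 1 else "dynamic transient"


print(classify_transition(0, 0, [0, 1, 0]))      # static transient
print(classify_transition(0, 1, [0, 1, 0, 1]))   # dynamic transient
print(classify_transition(0, 1, [0, 0, 1, 1]))   # transient-free
```

A hazard is the possibility of such a trace under some combination of delays, so a single clean trace does not show that a circuit is hazard-free.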
Though delay hazards are caused by the initiation of an input transition before the circuit has reached a stable state, the application of completion detection to the circuit outputs only may not be sufficient to eliminate them. For example, consider an OR gate whose inputs are P, Q, R, ... and whose output is U. Let U be initially 0 and let the input transition $I_1 \to I_2$ cause both P and Q to become 1. Let this transition be followed by the transition $I_2 \to I_3$ which causes Q to return to 0 while P remains 1. If P is slower to change than Q because of the delay in some gate which feeds P, then during $I_1 \to I_2$, U will go to 1 because Q goes to 1 while P is still 0. This change may propagate to a circuit output where it is detected by the completion detector. The detector may therefore conclude (erroneously) that the circuit has settled in response to $I_1 \to I_2$ and initiate the $I_2 \to I_3$ transition. The second transition causes Q to return to 0 while P is still at 0 because it has not responded to the first transition. Thus, during the $I_2 \to I_3$ transition, U undergoes the transient $1 \to 0 \to 1$ when it should remain at 1, and this transient may propagate to the circuit output.

The above example shows that a slow $0 \to 1$ change on an input to an OR (or NOR) gate may be masked during transitions in which other inputs to the OR (NOR) change to 1 or remain at 1. Then if the delayed change occurs after these other inputs have returned to 0, it causes a transient on the output of the OR (NOR). Similarly, a slow $1 \to 0$ change on the input to an AND (or NAND) gate may also create static delay hazards. The following methods can be used to eliminate these hazards.

1) Design the circuit so that not more than one input to any OR (or NOR) gate is 1 at any time, and not more than one input to any AND (or NAND) gate is 0 at any time. Under these conditions no slow changes can be masked, and a detector is required only to monitor the circuit outputs to determine completion.

2) Employ a detector to monitor the inputs to those gates for which condition 1) is not fulfilled, in addition to monitoring the circuit outputs. In this case a transition will not be considered to be complete until all the monitored wires whose signal values are supposed to change have in fact changed. The need for directly monitoring gate inputs can be eliminated by making use of the line delay assumption.

IV. INPUT AND OUTPUT CODES

A logic circuit (combinational or sequential) performs a mapping of input states into output states, both of which are represented by binary variables. In general, an input state may be followed by any other input state, resulting in the change of one or more input variables. The same is true for output states. The intermediate states that are passed through during any transition are called transient states. Note that transient states may be present when a transition occurs between two states which differ in more than one variable, even if the circuit itself produces no transients.
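When each changing variable switches exactly once but in an arbitrary order, the transient states of a transition are the interior vertices of the subcube spanned by the two end states. The following is a minimal sketch that enumerates them, assuming states are written as bit strings; the function name is illustrative.

```python
from itertools import combinations


def transient_states(a, b):
    """States strictly between bit strings a and b.

    Assumes every differing bit changes exactly once, in any order,
    so the result is the interior of the subcube spanned by a and b.
    """
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    states = set()
    # Flip every proper, non-empty subset of the differing bits.
    for k in range(1, len(diff)):
        for subset in combinations(diff, k):
            s = list(a)
            for i in subset:
                s[i] = b[i]
            states.add("".join(s))
    return sorted(states)


print(transient_states("0000", "0110"))  # ['0010', '0100']
print(transient_states("00", "01"))      # [] (adjacent states)
```

Two states that differ in k variables have $2^k - 2$ transient states, which is why adjacent states have none.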
To determine when a transition has been completed, the completion detector must be able to distinguish transient states from stable states. This can be achieved by coding the stable states, and employing a completion detector to recognize whether a state is a member of the code set. We have been unable to obtain workable circuit realizations using just one code set. However, feasible realizations employing two code sets have been obtained and we will present two such synthesis methods. In the first method, one code set consists of data words, which contain the input or output information, while the second code set consists of a single member, called the spacer. Special cases of this method have appeared in the literature [7]-[9]. In the second method, both code sets consist of data words. This method has been proposed [10], but to our knowledge has not been implemented previously.

A. Codes with a Single Data Set and a Spacer

The source is assumed to generate input data words and the input spacer alternately. Under steady-state conditions the logic circuit maps the input spacer into the output spacer, and an input data word into an output data word. Different codes may be used for the input and the output, but input and output always alternate between data words and the corresponding spacers. The system undergoes two types of transitions: that from the spacer to a data word is called an SD transition, the reverse is called a DS transition. We shall use the following symbols.

$S_I$: the input spacer
$S_O$: the output spacer
$D_I$: the set of input data words
$D_{Ii}$: any member of the set $D_I$
$D_O$: the set of output data words
$D_{Oi}$: any member of the set $D_O$.

Correct circuit operation implies that a one-to-one correspondence must be maintained between output and input data words and also the respective spacers, and the input and output of the logic circuit must alternate between the spacer and data words.
This leads to the following conditions to be satisfied by the input and output codes, which are necessary and sufficient for proper operation [4], [10].

1) In any SD (DS) output transition $S_O \to D_{Oi}$ ($D_{Oi} \to S_O$), the output does not pass through any $D_{Oj} \neq D_{Oi}$.

2) In any SD (DS) input transition $S_I \to D_{Ii}$ ($D_{Ii} \to S_I$), the input does not pass through any $D_{Ij} \neq D_{Ii}$.

The above conditions can be stated in a more convenient form if the data words and spacers are represented as vertices in appropriate Boolean cubes. If there are n input variables and q output variables, the assignment of input and output codes maps $S_I$ and $D_I$ onto vertices of an n cube, and $S_O$ and $D_O$ onto vertices of a q cube. Refer to the smallest subcube containing the $S_I$ vertex and any $D_{Ii}$ vertex (or the $S_O$ vertex and any $D_{Oi}$ vertex) as the transition subcube for the transition $S_I \to D_{Ii}$ (or $S_O \to D_{Oi}$). Then if input and output variables are permitted to change in any order, the code assignment must be such that the transition subcube for $S_I \to D_{Ii}$ (or $S_O \to D_{Oi}$) does not contain any other vertex $D_{Ik}$ (or $D_{Ok}$), $k \neq i$. We shall call this constraint the coding condition. The coding condition, taken in conjunction with the condition that the source is free of transients and the fact that the logic and detector circuits are hazard-free, is sufficient to ensure proper operation of the model of Fig. 2.

Without loss of generality we can assign the all-zeros code to the spacer. Any code set which satisfies the coding condition with respect to the all-zeros spacer will be called a valid code. Any arbitrarily chosen spacer and code set which satisfy the coding condition can be mapped into the all-zeros spacer and a valid code merely by complementing the bits in the code words corresponding to the 1's in the original spacer. Valid codes can be classified as pure or mixed codes depending on whether the weights of the code words are equal or unequal, where the weight of a code word is defined to be the number of 1's in the word. The pure code of weight m whose words are of length n will be referred to as an m/n code. The following theorems are proven in [11].

Theorem 1: Any m/n code satisfies the coding condition with respect to the all-zeros spacer [4].

Theorem 2: For code words of any given length n, the pure code of weight m = n/2 (for n even) or m = (n ± 1)/2 (for n odd) contains more elements than any valid mixed code.

The property of pure codes stated in Theorem 2 makes them more useful than mixed codes. A special case of the pure m/n code is the autosynchronous code [7]-[9], in which each uncoded variable is coded separately using a 1/2 code.

B. Codes with Two Data Sets

The need for the spacer can be eliminated if the inputs alternate between codes DA and DB and the outputs also alternate between a similar pair of codes. Using arguments similar to those for the spacer-data case, it can be seen that proper operation will result if DA and DB (and also the output codes) are such that the words passed through during a transition from a word in DA to a word in DB (and vice versa) belong to neither DA nor DB. One set of codes which satisfies this condition is obtained by using two bits to represent each uncoded bit as shown below.

uncoded bit    code DA    code DB
0              01         00
1              10         11

Another set of codes whose use will be discussed in Section V has code words of 2n bits. Each member of DA has an m/n code word in bits 1-n and zeros in the remaining bits.
Members of DB have zeros in bits 1-n and m/n coded words in the remaining bits.

V. COMBINATIONAL CIRCUITS

A. Realization using a Single Data Set and a Spacer

The model to be used is the same as that shown in Fig. 2, except that the detector will be replaced by two detectors, one for the spacer and one for data. In addition to detecting spacer and data, the detectors will provide the means for eliminating delay hazards, both in the logic and in each other. Both the input and the output spacers will be assigned the all-zeros code. Input and output data words will be coded as m/n and p/q, respectively. The logic circuit, which generates q outputs, is realized as q two-level sum-of-products functions, and may contain shared logic. All input states having weight greater than m are DON'T CARES, as they will never be entered. Input states having weights less than m are entered transiently, and the function values assigned to them must be such as to avoid static and dynamic hazards. This is most simply achieved by assigning 0 to all such states. With this constraint on the assignment, the prime implicant which covers any data state for which the output is required to be 1 and the all-1's input state is an essential prime implicant.

The above method of assignment produces the following results: 1) in any SD transition only one implicant in any logic function becomes 1, assuming no input transients, and 2) the fact that all input states of weight less than m are assigned 0, in conjunction with the line delay assumption, ensures that the inputs have settled in the desired data state by the time the output data state has been detected. This means that the data detector need monitor only the logic outputs, and can request the spacer as soon as an output data word has been detected.

It may be possible to obtain simpler logic functions by assigning 1's to some states with weight less than m. This is permissible provided no hazards are introduced.
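The effect of assigning 0 to every input state of weight less than m can be checked by brute force. The following is a minimal sketch, assuming each data word's implicant is the AND of the inputs that are 1 in that word, which is what the zero assignment yields. It checks the coding condition for an m/n code with the all-zeros spacer, and checks that along any SD transition no implicant turns on before the data word is reached. All function names are illustrative.

```python
from itertools import combinations, product


def m_of_n_code(m, n):
    """All n-bit words of weight m, as tuples of 0/1."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]


def covers(a, b):
    """True if every 1 of b is also a 1 of a, i.e. b lies in the
    transition subcube between the all-zeros spacer and a."""
    return all(x >= y for x, y in zip(a, b))


def satisfies_coding_condition(code):
    """No data word lies in the transition subcube of another one."""
    return not any(covers(a, b) for a in code for b in code if a != b)


def sd_path_states(word):
    """Every state that can be entered going from all-zeros to word."""
    return list(product(*[(0, 1) if bit else (0,) for bit in word]))


def single_implicant_per_transition(code):
    """An implicant (AND of a word's 1-inputs) is on in state s exactly
    when s covers that word. Check it is on only at the end state."""
    for d in code:
        for s in sd_path_states(d):
            active = [w for w in code if covers(s, w)]
            if active != ([d] if s == d else []):
                return False
    return True


code = m_of_n_code(2, 4)
print(len(code), satisfies_coding_condition(code))
print(single_implicant_per_transition(code))
# A mixed code in which one word covers another violates the condition.
print(satisfies_coding_condition([(1, 0, 0), (1, 1, 0)]))
```

Because no implicant is 1 in any transient input state, an output can only become 1 after the inputs have reached the data word, which is the basis of result 2) above.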
However, assigning 1's to states of weight less than m makes it possible for the outputs to reach a data state before the inputs do during SD transitions. In this case the data detector must monitor inputs as well as outputs to ensure that the logic has settled in a data state. This will result in a more complicated detector circuit than is needed when outputs only are monitored.

Since at most one implicant is 1 at any time in any function, delay hazards do not exist on the inputs to the second-level OR gate in each function for each SD or DS transition. Because the circuit inputs are guaranteed to have changed by the time a data word has been detected at the output during SD transitions (due to the line delay assumption), no delay hazards exist on the first-level AND gates for these transitions. However, during DS transitions, delay hazards may exist on the first-level AND gates because all transient input states produce the all-zeros output spacer, and thus inputs which are slow to change to 0 may be masked by other inputs which already have changed to 0.

To eliminate these residual delay hazards in the logic, it is sufficient to have the spacer detector monitor both the circuit inputs and outputs, and not request the source for a new data input until both sets of wires have returned to all-zeros. As mentioned earlier, the data detector is required to monitor only the logic outputs. The spacer detector may consist simply of a NOR gate whose inputs are the circuit inputs and outputs. During an SD transition, all the spacer detector inputs that change to 1 do so before the data detector detects an output data word. Therefore there are no delay hazards in the spacer detector.

The data detector consists of a two-level sum-of-products function containing an implicant for each output data word. Thus, only one implicant is 1 at any time and there are no delay hazards on the inputs to the second-level OR gate in this circuit. There are also no delay hazards on the first-level AND gates because the line delay assumption assures that all inputs to these gates have returned to zero by the time the spacer detector detects the all-zeros condition. Thus, the delay hazards in the logic circuit and both detectors are eliminated.

The source is assumed to satisfy the following two conditions.

1) It generates the next data word only when the spacer detector output is 1 and the data detector output is 0, and generates the spacer only when the outputs of the two detectors are 0 and 1, respectively.

2) Its outputs must not pass through any data state transiently during any transition.

Example: Fig. 3 exhibits the design of a combinational circuit which performs the logical equivalent of cyclically permuting two-bit binary numbers according to the table in Fig. 3(a). The particular assignment of 2/4 code words used is shown in Fig. 3(b), and the truth table for the coded data in Fig. 3(c). The function maps and the circuit implementation are shown in Fig. 3(d) and (e), respectively.

Fig. 3. (a) Truth table of uncoded functions. (b) Code used for input and output. (c) Truth table of coded functions. (d) Function maps. (e) Realization.

B. Realization with Alternating Data Sets

The code sets presented in Section IV can be used for realizing combinational functions. The elimination of the spacer leads to slightly faster operation at the expense of doubling the circuitry involved.

Fig. 4. Combinational circuit realization with alternating data sets-I.

Fig. 4 shows a realization using the first set of codes given in Section IV. Here the bits 0 and 1 are coded as 01 and 10 in DA and 00 and 11 in DB. The source is required to supply the inputs in double-rail form. That is, the inputs and their complements should be supplied.
Both wires associated with an input variable may be zero simultaneously, but they may not both be one simultaneously. The source interprets RA = 1, RB = 0 as a request for data in DA and RA = 0, RB = 1 as a request for data in DB. It feeds two multioutput combinational circuits A and B, which in turn are connected to a common set of R-S flip-flops (footnote 1). The system outputs are taken from the flip-flops and also alternate between DA and DB. The outputs of circuit A are specified to be zero when its inputs do not represent a word in DA. With data in code DA as input, the outputs of A excite the output flip-flops to produce output coded in DA. Similarly, B is required to excite the flip-flops to produce output coded in DB when the input is in DB and have all outputs zero otherwise. The detector generates a request RB = 1 when all outputs of B are zero, the flip-flops are in the state to which they are excited, and the flip-flop outputs constitute a data word in code DA. A request RA = 1 is generated under a similar set of conditions.

Footnote 1: Here and elsewhere in the paper, flip-flops will be assumed to be realized by pairs of cross-connected NOR gates. A flip-flop of this type is set or reset by a 1 on the appropriate input and 0 on the other. Ones on both inputs cause both outputs to be zero. Both outputs may also be zero simultaneously when the flip-flop is changing state. However, both outputs will never be one simultaneously.

Fig. 5. Combinational circuit realization with alternating data sets-II.

Fig. 5 shows a method of realization using the second set of codes presented in Section IV. Here the code words are 2n bits long, but the source supplies single-rail inputs. Each member of code set DA consists of an m/n code in bits 1-n and zeros in the other n bits. Members of DB have m/n coded data in the half where members of DA have zeros, and zeros in the other half. As discussed earlier, signals RA and RB request data in codes DA and DB, respectively.
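Both pairs of alternating codes can be checked against the requirement of Section IV that no word passed through between a DA word and a DB word belongs to either set. The following is a minimal brute-force sketch, assuming every changing bit switches exactly once in arbitrary order; the helper names and the small sizes chosen (two uncoded bits, a 2/4 half-code) are illustrative.

```python
from itertools import combinations, product


def between(a, b):
    """States strictly between words a and b (each differing bit flips once)."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    out = []
    for k in range(1, len(diff)):
        for subset in combinations(diff, k):
            s = list(a)
            for i in subset:
                s[i] = b[i]
            out.append(tuple(s))
    return out


def alternation_is_safe(DA, DB):
    """True if no intermediate word of any DA <-> DB transition is a code word.
    The set of intermediate words is the same in both directions."""
    members = set(DA) | set(DB)
    return all(s not in members
               for a, b in product(DA, DB) for s in between(a, b))


def double_rail_sets(k):
    """First code: each uncoded bit becomes 01/10 in DA and 00/11 in DB."""
    pair_a = {0: (0, 1), 1: (1, 0)}
    pair_b = {0: (0, 0), 1: (1, 1)}
    words = list(product((0, 1), repeat=k))
    DA = [tuple(x for bit in w for x in pair_a[bit]) for w in words]
    DB = [tuple(x for bit in w for x in pair_b[bit]) for w in words]
    return DA, DB


def split_half_sets(m, n):
    """Second code: an m/n word in one half of 2n bits, zeros in the other."""
    half = [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]
    zeros = (0,) * n
    return [w + zeros for w in half], [zeros + w for w in half]


print(alternation_is_safe(*double_rail_sets(2)))
print(alternation_is_safe(*split_half_sets(2, 4)))
# A deliberately bad pair: 011 is passed through on the way from 001 to 111.
print(alternation_is_safe([(0, 0, 1)], [(1, 1, 1), (0, 1, 1)]))
```

In the first code every bit pair changes exactly one wire per transition, so an intermediate word always mixes DA-style and DB-style pairs. In the second code an intermediate word is either all-zeros or has ones in both halves.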
The outputs of the combinational circuits alternate between the all-zeros spacer and p/q coded data so that when circuit A has the spacer at its outputs, circuit B will have p/q-coded data and vice versa. Thus the signals on the 2q output wires alternate between two codes similar to the input codes.

In both the methods presented above, the combinational circuits A and B may realize two entirely different sets of functions, if so desired. This allows the system to perform different operations on alternate data words.

VI. SEQUENTIAL CIRCUITS

Methods similar to those presented in Section V may also be employed for realizing sequential circuits. As in the case of combinational circuits, the inputs may be made to alternate between spacer and data or between two data sets. Due to the need for memory in sequential circuits, different techniques have to be employed to determine whether the circuit has reached a stable state.

We assume that the circuit to be realized is represented by a flow table [1] and that the flow table is normal mode [12]. A normal mode flow table is one in which any transition leads directly to a stable state and no output is required to change more than once during any transition.

Assuming that the sequential function to be realized is specified by a flow table with uncoded inputs, an augmented flow table is first obtained by the following procedure. The inputs are coded as m/n code words and an all-zeros spacer column containing only stable states is added. All state transitions are required to occur during SD input transitions. The outputs of the sequential circuit are also coded as p/q codes and the spacer output is associated with the spacer input, independent of the state of the circuit.

Fig. 6 shows a method of realizing sequential circuits whose inputs and outputs alternate between the all-zeros spacer and coded data. The realization requires a state assignment of the type discussed by Liu [13] and Friedman [14]. These assignments permit several variables to change simultaneously in a transition without critical races.
An additional requirement is that all unstable states which lead to the same stable state in any column of the flow table, and the stable state itself, have a subset of variables in common which distinguishes these states from other such sets of states in the column. We shall refer to these assignments, which form a proper subset of single transition time assignments [15], as restricted single transition time (restricted STT) assignments.

Fig. 6. Sequential circuit realization.

As shown in Fig. 6, a flip-flop is used for each state variable. The flip-flops enable the circuit to remain in its previous stable state during the spacer input, and also facilitate the detection of stable states. The flip-flop excitation functions are all made zero with the spacer input. All flip-flops are excited when the inputs constitute a data word, whether or not the flip-flops are already in the desired states.

During any SD transition at the inputs, each excitation function either remains at zero or changes to one. Because a restricted STT assignment is used, the next state for any given input is determined by a subset of the internal state (y) variables which remain fixed during the transition. This subset depends on the next state entry and the inputs and not on the state from which the transition is made. The excitation functions can therefore be realized in a two-level sum-of-products form such that not more than one implicant of each excitation function becomes one during any SD transition, thus eliminating delay hazards. The implicants which become one will depend only on the inputs and the y variables staying fixed during the transition. This may not be true for a more general single transition time assignment (footnote 2).

In any stable state with data input, all flip-flops will be in the states to which they are excited and the outputs of the sequential circuit will be coded data.
That is, the set side input and output, or the reset side input and output, of every flip-flop, and the output of the data detector connected to the output circuitry will be one. A stable state with spacer input is indicated by zeros at the inputs, outputs, and flip-flop excitations. Since the outputs of the OR gates used to detect stable states with data input will become zero only after the corresponding flip-flops become unexcited during a DS transition, the OR gate outputs may be checked instead of the flip-flop inputs. The output of the data detector is also connected to the spacer detector to eliminate delay hazards.

Because of the restricted STT assignment, the circuit outputs are also uniquely determined by the coded input and the y variables which remain fixed during the transition. A hazard-free realization in which the outputs alternate between spacer and data is therefore possible.

Footnote 2: If any column of a flow table contains the transitions $i \to k$ and $j \to k$, the set of state variables that remain fixed during the two transitions need not be the same for a general single transition time assignment. Hence, two implicants of some flip-flop excitation function may become one during a transition. However, the detectors will detect a stable state as soon as the first of the implicants becomes one, leading to a delay hazard.

Fig. 7. (a) Original flow table. (b) Augmented flow table.

Example: Fig. 7 shows the original flow table and the augmented flow table in which the inputs and outputs are coded in 1/3 and 1/2 codes, respectively. The state assignment used is shown to the right of the augmented flow table. Note that this is a restricted STT assignment of the type discussed earlier and that all transitions leading to the same total state are distinguished from other transitions in that column by one state variable. For example, the transitions $1 \to 2$ and $3 \to 2$ in column $I_2$ (010 in the augmented table) have $y_2 = 0$, whereas the only other entry in that column has $y_2 = 1$. The realization of the augmented flow table is given by the following equations, where the upper case letters refer to the flip-flop excitation functions and the subscripts R and S refer to the reset and set sides, respectively.

\[
\begin{aligned}
Y_{1S} &= x_2 + x_3 y_1 \\
Y_{2S} &= x_2 y_2 + x_3 y_1 \\
Y_{3S} &= x_2 \bar{y}_2 + x_1 y_3 \\
Y_{1R} &= x_1 + x_3 \bar{y}_1 \\
Y_{2R} &= x_1 + x_3 \bar{y}_1 + x_2 \bar{y}_2 \\
Y_{3R} &= x_3 + x_1 \bar{y}_3 + x_2 y_2 \\
Z_1 &= x_1 \bar{y}_3 + x_2 y_2 + x_3 y_1 \\
Z_2 &= x_3 \bar{y}_1 + x_2 \bar{y}_2 + x_1 y_3
\end{aligned}
\]

From the above equations, it can be readily verified that no more than one implicant of any function becomes one during any SD transition.

The methods of realizing combinational circuits with alternating data sets discussed in Section V can be extended to sequential circuits. A restricted STT state assignment is used and flip-flops are used for the state variables. The state variable flip-flops are excited by one of the two combinational circuits, depending upon the input code, as in Fig. 4.
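The claim made for the example of Fig. 7, that no more than one implicant of any excitation function becomes one during an SD transition, can be checked mechanically. The sketch below rests on stated assumptions: the next-state table, the state assignment (1 = 000, 2 = 101, 3 = 001, 4 = 110), and the placement of complemented literals are reconstructed here from the surviving figure fragments and the column-I2 remark in the text, and should be treated as hypothetical rather than as the authors' printed data.

```python
from itertools import product

# Assumed state assignment (y1, y2, y3) for states 1..4.
ASSIGN = {1: (0, 0, 0), 2: (1, 0, 1), 3: (0, 0, 1), 4: (1, 1, 0)}

# Assumed next stable state per input column (x1, x2, x3) and present state.
NEXT = {
    (0, 0, 1): {1: 1, 2: 4, 3: 1, 4: 4},
    (0, 1, 0): {1: 2, 2: 2, 3: 2, 4: 4},
    (1, 0, 0): {1: 1, 2: 3, 3: 3, 4: 1},
}


def excitations(x, y):
    """Implicant values of each set/reset function (assumed complements)."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    n = lambda v: 1 - v  # logical complement of a 0/1 value
    return {
        "Y1S": [x2, x3 & y1],
        "Y2S": [x2 & y2, x3 & y1],
        "Y3S": [x2 & n(y2), x1 & y3],
        "Y1R": [x1, x3 & n(y1)],
        "Y2R": [x1, x3 & n(y1), x2 & n(y2)],
        "Y3R": [x3, x1 & n(y3), x2 & y2],
    }


def subcube(a, b):
    """Every y state reachable while moving from a to b, endpoints included."""
    return list(product(*[(p,) if p == q else (p, q) for p, q in zip(a, b)]))


def check_example():
    for x, column in NEXT.items():
        for s, t in column.items():
            target = ASSIGN[t]
            for y in subcube(ASSIGN[s], target):
                exc = excitations(x, y)
                # At most one implicant of any function is one.
                if any(sum(terms) > 1 for terms in exc.values()):
                    return False
                # Each flip-flop is excited on exactly one side, toward target.
                for i in range(3):
                    set_on = any(exc[f"Y{i + 1}S"])
                    reset_on = any(exc[f"Y{i + 1}R"])
                    if set_on == reset_on or target[i] != int(set_on):
                        return False
    return True


print(check_example())
```

Under these assumptions the excitations also stay constant over the whole transition, because each implicant uses only the input and the state variable that remains fixed in that column.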
An appropriate request signal is generated when the circuit outputs are stable and the state variable flip-flops are stable in the states to which they are excited.

If the number of state variables required for a restricted STT assignment is greater than twice the number of variables required to merely distinguish the states, the number of state variable flip-flops needed can be reduced by the following method. Each combinational circuit is connected to a separate set of state variable flip-flops and the feedback wires are cross-connected. That is, the outputs of the state variable flip-flops connected to circuit A are fed back to the inputs of circuit B and vice versa. During any transition, one set of flip-flops is unexcited and remains unchanged, while the other set becomes excited to the next state determined by the inputs and the states of the unexcited flip-flops. Because of our line delay assumption, no critical races can occur and any state assignment which distinguishes between the internal states is sufficient for each half. The operation of this system is somewhat similar to a two-phase clocked system. The phase is determined by the input code instead of by a clock. The inputs are in turn controlled by completion signals.

VII. INTERCONNECTION OF MODULES

A digital system is not normally designed as a single large circuit, but is broken down into modules which are designed separately and then interconnected appropriately. We therefore wish to employ the circuits described in Sections V and VI as building block modules, and consider means for interconnecting them. We will confine our attention to circuits operating in the spacer-data mode. Modules operating in the alternating data mode can be interconnected in a similar manner.

Two possible modes of operation can be envisaged. In the first mode, the modules do not have output registers for storing spacer or data.
In a system comprised of storageless modules, only one data word can be processed at a time, so such a system would seem to be inherently slow. In the second mode, output storage is provided with each module, thus permitting several data words to be stored and processed simultaneously in different modules. Only the second mode will be considered. To implement it, some of the circuits described previously must be provided with output storage, and modifications to the spacer and data detection circuits are necessary, as discussed below.

The networks described by Muller [8] and Hammel [7] are examples of systems operating in the second mode, using idealized storage elements. The following realizations are similar to those of Miller [2] in that no idealized storage elements are used. The ordinary set-reset flip-flops used in our realizations contain delay hazards, which are eliminated by monitoring both outputs of each flip-flop and using them only when they are complementary.

Fig. 8. Combinational circuit module (ith module).

A combinational logic module with output storage is shown in Fig. 8. Double-rail data transmission between modules is required. The blocks labeled logic and data detector are designed in a manner identical to similarly labeled blocks in Fig. 3(e). The n wires which transmit the variables from the preceding module form the logic block inputs. The remaining n wires, which carry the complements of the variables, are inputs to the input spacer detector, which is an AND gate labeled ANDI. These wires are all 1 in the spacer state. Thus, ANDI generates a 1 when the spacer is present on the inputs and the succeeding module is requesting a spacer (RS = 1). The ANDI output is used to reset the n output flip-flops to the spacer state. However, the flip-flops will not become reset until the set inputs generated by the combinational logic become zero in response to the spacer input. The set side outputs from the flip-flops are monitored by the m/n data detector; the remaining outputs from the flip-flops are monitored by the output spacer detector, which is also an AND gate. The data and spacer detector outputs drive a flip-flop whose outputs, labeled RS for request spacer and RD for request data, are fed back to the preceding module. The purpose of this flip-flop is to eliminate the transient state (1, 1) on the (RS, RD) wires in order to avoid certain delay hazards.

Note that the RD and RS wires from the succeeding module are fed back to enable the first-level AND gates in the logic block and the ANDI gate, respectively, in the ith module. If the outputs from the ith module feed several modules (hereafter referred to as the successors), the RD signals from the successors are used to enable the inputs to the ith logic block, and the RS wires from the successors are used to enable the ith ANDI gate. Similarly, if the ith module is fed by several modules (hereafter referred to as the predecessors), the wires which carry the complement of output variables from the predecessors are inputs to the ith ANDI gate.
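The request-signal arrangement just described can be illustrated with a small behavioral sketch. The following Python model is not taken from the paper: it abstracts each module to a single stored value (`None` for the spacer, a tagged word for data), assumes a simple linear chain with a source and a sink that obey the same RD/RS rules, and fires enabled elements in random order as a stand-in for unbounded gate delays. The names `simulate`, `logic`, `chain`, and `pending` are hypothetical.

```python
import random


def simulate(words, n_modules=4, logic=lambda v: v + 1, seed=0):
    """Behavioral sketch of a linear chain of spacer-data modules.

    chain[0] is the source, chain[1..n_modules] are modules with output
    storage, and chain[-1] is the sink. None stands for the spacer; a
    (sequence id, value) tuple stands for a data word.
    """
    rng = random.Random(seed)
    pending = list(enumerate(words))  # words the source has not yet sent
    last = n_modules + 1
    chain = [None] * (last + 1)
    received = []

    def enabled_actions():
        actions = []
        for i in range(last + 1):
            pred = chain[i - 1] if i > 0 else None
            succ = chain[i + 1] if i < last else None
            if chain[i] is None:
                # Load data only if the predecessor holds data and the
                # successor holds the spacer (its RD = 1).
                pred_has_data = bool(pending) if i == 0 else pred is not None
                succ_wants_data = i == last or succ is None
                if pred_has_data and succ_wants_data:
                    actions.append((i, "data"))
            else:
                # Reset to spacer only if the predecessor holds the spacer
                # and the successor holds data (its RS = 1).
                pred_has_spacer = i == 0 or pred is None
                succ_wants_spacer = i == last or succ is not None
                if pred_has_spacer and succ_wants_spacer:
                    actions.append((i, "spacer"))
        return actions

    while True:
        actions = enabled_actions()
        if not actions:
            break
        # Random choice models arbitrary (unbounded) relative delays.
        i, kind = rng.choice(actions)
        if kind == "spacer":
            chain[i] = None
        elif i == 0:
            chain[0] = pending.pop(0)
        else:
            seq, value = chain[i - 1]
            if i == last:
                chain[i] = (seq, value)
                received.append(value)
            else:
                chain[i] = (seq, logic(value))
        # Adjacent elements never hold two different data words at once.
        for a, b in zip(chain, chain[1:]):
            assert a is None or b is None or a[0] == b[0]
    return received


if __name__ == "__main__":
    print(simulate([10, 20, 30]))  # prints [14, 24, 34]
```

In this sketch the words arrive at the sink in order for any firing sequence, and the in-loop assertion checks that neighbouring elements never hold two different words simultaneously. A fan-out or fan-in network would need the enabling conditions taken over all successors and predecessors, which this linear model does not attempt.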
The above interconnection arrangement results in the following constraints on permissible changes of state in the ith module.

1) It cannot enter a data state unless the successors are in the spacer state and the predecessors are in data states.
2) It cannot enter the spacer state unless the successors are in data states and the predecessors are in the spacer state.

The above constraints ensure that along any path in the network there will always be at least one module in the spacer state separating modules containing different (i.e., unrelated) data states. This makes it impossible for a data word to overtake and destroy a different data word when several words are being processed simultaneously in the network.

A sequential circuit module with output storage, which is a modification of the circuit in Fig. 6, is shown in Fig. 9. In principle, larger digital systems can be realized by suitably interconnecting combinational and sequential circuit modules. However, it can be shown that under certain conditions, some such systems may enter "blocking conditions" and not operate any further [16].

Fig. 9. Sequential circuit module.

VIII. PRACTICAL CONSIDERATIONS

A. Complexity of Circuits and Fan-In and Fan-Out Restrictions

When m/n codes are used, the combinational and data detector circuits contain an implicant (and therefore a first-level AND gate) for each distinct data word applied to their inputs.³ The number of data words, such as may occur in a word-organized computer, for example, can be very large, thus resulting in an infeasible number of first-level gates. Also, fan-out and fan-in indices may be extremely large for even moderate values of m and n. The use of fan-out amplifiers (which may have unbounded delays) to solve the fan-out problem requires introduction of additional detector circuits to avoid delay hazards, thus perpetuating the problem they were intended to solve. The use of factoring to reduce fan-in similarly requires additional detectors.

³ A single threshold gate is sufficient for the data detector, if such a device with adequate fan-in is available.

The above problems can be reduced significantly by partitioning the uncoded variables into several small subsets and encoding each subset separately in a p/q code. The ultimate step in this direction is the autosynchronous code [9] (referred to in Section V), in which each bit is coded separately in 1/2 code. These codes, which have also been used by Hammel [7] and Muller [8], have the added advantage that the data detectors are relatively simple. The only apparent advantage of the more general m/n codes is the smaller number of data lines required.

In order to obtain an indication of the complexity of combinational logic when realized with the autosynchronous code, a 24-bit parallel ripple-carry adder was designed. The standard, uncoded adder consisting of 24 single-bit full adders was redesigned assuming that each carry and sum bit was coded autosynchronously. The design procedures of Section V were used, and a fan-in limitation of eight was assumed. The complexity of the coded adder (as measured by the total number of gate inputs in the circuit), including its two completion detectors, was only slightly more than twice that of the uncoded adder.

B.
Relaxation of the Line Delay Assumption

An analysis of the types of circuits discussed in Sections V and VI will reveal that, in order to eliminate delay hazards and ensure proper operation, it is sufficient for the line and gate delays to be such that any change of the signal at the output of any gate propagates to all its fan-out gates before the next change reaches them. Since the output of at least one gate in a detector circuit has to change in response to the first transition before the second transition can be initiated, the requirement that line delays be less than the minimum gate delay is clearly sufficient.

The line delay restriction can be relaxed further in most cases, as shown below for the circuit of Fig. 3(e). In this circuit it must be ensured that all inputs to the NOR gate which are changing to 1 during an SD transition do in fact reach 1 before the next spacer is generated. This will be ensured provided the maximum delay in the lines from the circuit outputs to the spacer detector is less than the path delay through the data detector and the source. The reason is that signals must traverse this path before the next spacer can appear on the circuit inputs. The delays on the lines from the circuit inputs to the spacer detector need only be less than the path delay through the logic circuit, the detector and the source.

C. Effect of Faults

One practical reason that has been proposed for the generation of completion signals is that certain faults may cause the circuit to stop operating if they inhibit the generation of the completion signals. Thus, the system has a degree of self-checking. If we consider the class of faults in which some single gate input or output is stuck at one (s-a-1) or stuck at zero (s-a-0), then approximately half of the faults which may occur will cause a circuit designed according to the procedure specified in this paper to stop operating.

Fig. 10. Model of spacer-data system.

We will partially justify this conclusion by considering the basic circuit developed in Section V and reproduced in Fig. 10, where L1 is a two-level AND-OR circuit, as is the m/n detector. If any gate input or gate output in L1 is s-a-0, this will first propagate to the outputs of L1, causing some output of L1 to be 0 when it should be 1. This will prevent the generation of a data completion signal and the system will hang up. Similarly, if some OR gate input or output in L1 is s-a-1, then no spacer completion can be generated, and this malfunction will cause the circuit to halt. However, if an input to an AND gate in L1 is s-a-1, this may cause m+1 outputs to be 1 instead of m. Since the m/n detector actually detects at least m out of n, this may cause a premature generation of a spacer request signal but the circuit will not stop. Similarly, s-a-1 on the NOR gate which generates the data request will hang up the circuit but s-a-0 will not. In the m/n detector the same classes of faults will cause the circuit to halt as in L1.

In circuits containing flip-flops (e.g., sequential circuits), an additional degree of self-checking is obtained because of the fact that all flip-flops in a set are unexcited or all of them are excited. In the former case, any fault which causes any flip-flop input to be 1 when it should be 0 is easily detected. In the latter case, if any flip-flop input which should be 1 becomes 0, the flip-flop becomes unexcited and the completion signal is not generated. If, on the other hand, an input to a flip-flop which should be 0 becomes 1, both of its outputs will become zero and the completion signal will not be generated.

IX.
SUMMARY AND CONCLUSIONS

We have presented procedures for realizing combinational and sequential circuits under the assumption that the gate delays are unbounded and the line delays are appropriately constrained, without making use of any ideal gates or ideal memory elements. We have pointed out the existence of delay hazards, which to our knowledge have not been explicitly recognized previously. Much of the complexity of our circuits results from the need to eliminate these hazards.

Proper operation of the circuits presented here requires coding of the inputs and outputs. General m/n codes are sufficient for the purpose, though the use of such codes may lead to difficult problems concerning fan-in, fan-out, and complexity in the logic. It is expected that the use of the autosynchronous code will lead to much simpler realizations than m/n codes.

It was also shown that large systems can be realized by interconnecting modules, each of which generates its own completion signals. The modules may contain combinational or sequential circuits.

The primary reason for designing circuits which operate correctly independent of the magnitudes of gate delays is to obtain increased speed. However, the (necessary) introduction of completion detectors adds roughly as much delay as is already present in the circuits they act upon. This partially offsets the anticipated speed gain, and it is not clear that a net gain will be achieved in all cases. Even if the speed gain is not significant, the circuits discussed in this paper may be useful because of their partial self-checking feature.

ACKNOWLEDGMENT

The authors are indebted to Prof. J. D. Ullman of Princeton University for his assistance in the proof of Theorem 2.

REFERENCES

[1] D. A. Huffman, "The synthesis of sequential switching circuits," J. Franklin Institute, vol. 257, pp. 161-190, 275-303, March and April 1954.
[2] R. E. Miller, Switching Theory, vol. 2. New York: Wiley, 1965, chs. 9 and 10.
[3] B. Gilchrist, J. H. Pomerene, and S. Y. Wong, "Fast carry logic for digital computers," IRE Trans. Electronic Computers, vol. EC-4, pp. 133-136, December 1955.
[4] W. M. Waite, "The production of completion signals by asynchronous, iterative networks," IEEE Trans. Electronic Computers, vol. EC-13, pp. 84-86, April 1964.
[5] D. A. Huffman, "The design and use of hazard-free switching networks," J. ACM, vol. 4, pp. 47-62, 1957.
[6] E. J. McCluskey, Jr., "Transients in combinational logic," in Redundancy Techniques for Computing Systems, Wilcox and Mann, Eds. Washington, D.C.: Spartan, 1962, pp. 9-46.
[7] D. Hammel, "Ideas on asynchronous feedback networks," Proc. 5th Ann. Symp. on Switching Circuit Theory and Logical Design, Princeton University, November 11-13, 1964, pp. 4-11.
[8] D. E. Muller, "Asynchronous logics and application to information processing," Proc. Symp. on Application of Switching Theory in Space Technology, Aiken and Main, Eds. Stanford, Calif.: Stanford University Press, 1963, pp. 289-297.
[9] J. C. Sims, Jr. and H. J. Gray, "Design criteria for autosynchronous circuits," Proc. Eastern Joint Computer Conf., Philadelphia, Pa., December 3-5, 1958, pp. 94-99.
[10] B. Elspas, J. Goldberg, R. A. Short, and H. S. Stone, "Investigation of propagation-limited computer networks," Final Rept., Phase II, Stanford Research Institute, July 1965.
[11] D. B. Armstrong, A. D. Friedman, and P. R. Menon, "Design of asynchronous circuits assuming unbounded gate delays," Bell Telephone Labs. internal memorandum (unpublished).
[12] E. B. Eichelberger, "Sequential circuit synthesis using hazards and delays," Ph.D. dissertation, Princeton University, Princeton, N. J., March 1963.
[13] C. N. Liu, "A state variable assignment method for asynchronous sequential switching circuits," J. ACM, vol. 10, pp. 209-216, April 1963.
[14] A. D. Friedman, "Feedback in asynchronous sequential circuits," IEEE Trans. Electronic Computers, vol. EC-15, pp. 740-749, October 1966.
[15] J. H. Tracey, "Internal state assignment for asynchronous sequential machines," IEEE Trans. Electronic Computers, vol. EC-15, pp. 551-560, August 1966.
[16] A. D. Friedman and P. R. Menon, "Blocking conditions in asynchronous systems," Bell Telephone Labs. internal memorandum (unpublished).