# High-Performance Bidirectional Repeaters<sup>\*</sup>

S. Bobba and I. N. Hajj, Coordinated Science Lab & ECE Dept., University of Illinois at Urbana-Champaign, Urbana, Illinois 61801. E-mail: {bobba, i-hajj}@uiuc.edu

## Abstract

In this paper, we present high-performance bidirectional repeaters that recondition the signal waveform and reduce the signal degradation. We also present the application of these repeaters to the design of high-performance bidirectional busses. SPICE simulation results for long bidirectional interconnects show an almost linear increase in delay with repeaters compared to a quadratic increase in delay without repeaters. These repeaters are also applied to improve the performance of long AND domino gates. SPICE simulation results show a significant reduction in the delay of long AND domino gates with repeaters.

## 1 Introduction

A bidirectional repeater is a circuit that can drive signals in either direction to improve signal quality. In general, signal degradation occurs when signals pass through components that have significant parasitic resistance and capacitance. For instance, long on-chip interconnections can have significant resistance and capacitance, and their delay increases quadratically with interconnect length [1]. In addition, the signal waveform at the receiving end of a long interconnect is severely degraded, and the slower edge (longer rise or fall time) can increase the short-circuit current of the subsequent logic gate(s). This performance limitation can be avoided by limiting the length of the interconnects. If the signal flow is unidirectional, then unidirectional repeaters (buffers) can be inserted at appropriate locations along the long interconnect to break it into a few shorter segments [2]. This reduces the delay of the combination of repeaters and segments compared with the unbroken long interconnect. Furthermore, the signal edge at the receiving end is sharper than with the long interconnect. Long interconnects that must support bidirectional operation require bidirectional repeaters to minimize the signal degradation. In this paper, we present several bidirectional repeaters that reduce the delay and improve the signal quality.
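The quadratic-versus-linear argument for unidirectional buffer insertion can be made concrete with a first-order Elmore estimate. This is a sketch under assumptions that are not part of the text above: $r$ and $c$ denote wire resistance and capacitance per unit length, $L$ the wire length, $k$ the number of equal buffered segments, and $t_{buf}$ a fixed, hypothetical delay per buffer; driver and load effects are ignored.

$$
t_{\text{wire}} \approx \tfrac{1}{2}\, r c L^{2},
\qquad
t_{\text{segmented}} \approx k\left[\tfrac{1}{2}\, r c \left(\frac{L}{k}\right)^{2} + t_{buf}\right] = \frac{r c L^{2}}{2k} + k\, t_{buf}
$$

If the segment length $L/k$ is held fixed as the wire grows, $k$ is proportional to $L$ and both terms of $t_{\text{segmented}}$ grow linearly with $L$.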

A micro-architecture may require bidirectional and/or multiple-source busses. Such busses avoid the need for dedicated point-to-point busses that never carry concurrent signal transmissions in both directions. For instance, reading from and writing to a data cache requires movement of data to and from the cache. One obvious solution is to have two separate read and write data busses. But if the read and write operations never occur simultaneously, a single bus can be shared for both operations. This requires a bus that can propagate signals in either direction. Since signal interconnects are inherently bidirectional, a single interconnect can be used between the tristate driver and receiver to realize the bidirectional operation. Although this may be acceptable for short interconnects, for long interconnects the performance degradation can be significant. Hence, bidirectional repeaters are required to improve the performance of bidirectional busses.

Figure 1: Back-to-back connected tristate buffers with control signals

A simple bidirectional repeater can be realized by back-to-back connected tristate buffers with a control signal. Fig. 1 shows an interconnect with an intermediate bidirectional repeater. Depending on whether the control signal is LO or HI, only one of the two buffers in the bidirectional repeater is enabled. Hence the repeater drives the signal in one particular direction, selected by the control signal. This technique requires the control signal to be routed to each tristate driver, and it has several further problems. The directional control signal must be clocked and set up before the signal is transmitted to avoid drive conflicts. When multiple repeaters are used, this is further complicated by the drive conflicts that can exist until all the repeaters have received the control signal. This technique can still be used for a bidirectional bus, but it can get extremely complex for a multiple-source bus that has an arbitrary structure. A general multiple-source bus may require selective activation of tristate drivers by the use of different control signals for different tristate drivers. The overhead of generating the directional control signal and setting it up before the data signal transmission increases with the number of tristate drivers. Although a control-signal-based tristate driver is conceptually simple, these disadvantages limit its application. Hence, we consider only techniques that do not require any explicit control signal(s).

<sup>\*</sup>This work was supported by SRC under contract 98-DJ-109

Figure 2: Self-timed complementary regenerative feedback (CRF) repeater [3]
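The drive-conflict problem of a chain of control-signal-based tristate repeaters can be illustrated with a small bookkeeping model. The sketch below is purely illustrative and not a circuit from this paper: the wire is cut into numbered segments, each repeater has a hypothetical direction bit, and a conflict is flagged whenever a segment has more than one active driver. The names `drivers_per_segment` and `conflicts` are invented for this example.

```python
from typing import List


def drivers_per_segment(dirs: List[int], source_segment: int) -> List[int]:
    """Count the active drivers on each wire segment.

    dirs[i] is the direction of the repeater between segment i and i + 1:
    +1 means it drives segment i + 1, -1 means it drives segment i.
    source_segment is the segment driven by the currently active bus source.
    """
    counts = [0] * (len(dirs) + 1)
    counts[source_segment] += 1
    for i, direction in enumerate(dirs):
        target = i + 1 if direction == +1 else i
        counts[target] += 1
    return counts


def conflicts(dirs: List[int], source_segment: int) -> List[int]:
    """Return the segments that are driven by more than one driver."""
    counts = drivers_per_segment(dirs, source_segment)
    return [seg for seg, n in enumerate(counts) if n > 1]


if __name__ == "__main__":
    # Three repeaters, four segments; the source moves from the left end
    # (segment 0) to the right end (segment 3).
    print(conflicts([+1, +1, +1], 0))   # all repeaters agree with the source
    print(conflicts([+1, +1, +1], 3))   # control signal not yet updated
    print(conflicts([+1, +1, -1], 3))   # control signal partially propagated
    print(conflicts([-1, -1, -1], 3))   # all repeaters updated
```

In this toy model a conflict persists until every repeater has switched direction, which is the set-up overhead that motivates repeaters without an explicit control signal.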

In the absence of a directional control signal, the repeater must sense the transition and locally regenerate the appropriate signal values. This can be accomplished by the use of regenerative repeaters. The functionality of regenerative repeaters is described in Section 2. Fig. 2 shows the self-timed complementary regenerative feedback repeater presented in [3]. The circuit shown in Fig. 2 is extremely sensitive to noise. It has limited application to signal interconnects in a bidirectional bus because cross-talk noise of small magnitude can trigger spurious transitions. A self-timed transient sensitive accelerator for long resistive interconnects is proposed in [4]. The bidirectional repeaters we present in this paper differ from the existing repeaters in one or more of the following aspects: they are simpler and use fewer transistors while still achieving identical performance, and they function correctly in the presence of cross-talk noise. In addition, we present repeaters for improving the performance of long AND domino gates.

In the next section, we describe the idea behind regenerative bidirectional repeaters and present the proposed circuits. In Section 3, we apply one of the circuits to the design of bidirectional busses. In Section 4, we present the application of the technique to the design of high-performance long-AND domino gates. Finally, we present the conclusions in Section 5.

## 2 Regenerative Bidirectional Repeater

Regenerative repeaters do not require any additional control signals to function correctly. This implies that they must interpret the signal variations on the interconnect and provide the appropriate drive to enhance the interconnect behavior. The repeater must therefore locally regenerate the appropriate signal transition, which can be achieved using regenerative (positive) feedback. The basic self-timed bidirectional repeater (RR1) using positive feedback is shown in Fig. 3a. The PMOS and NMOS transistor sizes for both inverters I1 and I2 are $3\mu m$ and $1\mu m$, respectively. The BSIM3 transistor parameters of a 1.5 V, $0.18\mu m$ process were used in all the SPICE simulations. Fig. 3b shows the static I-V plot for the simple regenerative repeater RR1. The static I-V plot is used only for a relative comparison of the different regenerative repeaters and to demonstrate the different modes of operation of the repeaters. From the plot, it can be seen that the circuit draws current and loads the signal driver until inverter I1 switches. After that, the circuit switches into the regenerative mode and sources current. In [5], it was shown that the delay improvement due to RR1 is not significant because the feedback circuitry improves the digital signal in the later part of the transient at the cost of deteriorating it at the beginning. This is because in the initial part of the transition, the circuit tries to hold on to the previous value and fights any change. There is a performance gain only when the improvement in the later part is greater than the degradation in the initial part of the transient. Hence, the performance of regenerative repeaters can be enhanced by reducing the delay of the first inverter and ensuring that the circuit does not load the signal driver significantly. This reduces the degradation in the initial part of the transient and results in an overall performance gain.

Figure 3: (a) Back-to-back connected inverters (RR1) (b) Static I-V plot

Figure 4: (a) Regenerative repeater with delayed negative feedback (RR2) (b) Static I-V plot

Delayed negative feedback can be used to make the driver a self-resetting high-impedance driver. Consider the circuit shown in Fig. 4a. The combination of the positive and negative feedback ensures that in the steady state the output node of the driver is in a high-impedance state. Hence, in the steady state the equivalent circuit of RR2 is just a capacitance. The transistor sizes in the first inverter of RR2 are identical to those in the first inverter of RR1. The sizes of transistors MP1, MP2, MN2 and MN1 are $9\mu m$, $6\mu m$, $2\mu m$ and $3\mu m$, respectively. From the static I-V plot for RR2 (Fig. 4b), it can be seen that the current supplied by RR2 is zero until I1 switches, after which it sources current. The reduced loading in RR2 improves its dynamic behavior compared with RR1. The transistors in the self-resetting high-impedance driver have been sized to account for the delay degradation of the stack structure. The transistors in the input inverter I1 can be sized to reduce either the HL or the LH transition delay of the inverter. Since we want the performance of the repeater to be identical for both HL and LH transitions on the signal interconnect, inverter I1 should have balanced P and N drive strengths (symmetric).

Figure 5: (a) Regenerative repeater with dual asymmetric input inverters (RR3) (b) Regenerative repeater with dual asymmetric input inverters and hold transistors (RR4)
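The contrast between RR1 (loads the driver, then helps) and RR2 (high impedance, then helps) can be explored with a lumped behavioral model of a rising transition. The sketch below is an idealization and not one of the SPICE simulations reported here: the wire is a single RC node, the sensing inverter switches instantly at an assumed threshold, the self-reset of RR2 is not modeled, and every resistance and capacitance value is hypothetical.

```python
VDD = 1.5          # supply voltage quoted in the text, volts
VTH = 0.75         # assumed switching point of the sensing inverter, volts
R_DRV = 1_000.0    # hypothetical driver-plus-wire resistance, ohms
C_NODE = 1e-12     # hypothetical lumped node capacitance, farads
R_KEEP = 2_000.0   # hypothetical RR1 feedback-inverter resistance, ohms
R_BOOST = 250.0    # hypothetical RR2 output-driver resistance, ohms


def repeater_current(kind, v):
    """Current injected into the wire node by the repeater model, in amps."""
    if kind == "none":
        return 0.0
    if kind == "rr1":
        # The feedback inverter holds the old LO value until the threshold
        # is crossed, then pulls the node toward VDD.
        return -v / R_KEEP if v < VTH else (VDD - v) / R_KEEP
    if kind == "rr2":
        # High impedance until the threshold, then a strong pull-up.
        return 0.0 if v < VTH else (VDD - v) / R_BOOST
    raise ValueError(f"unknown repeater model: {kind}")


def crossing_times(kind, dt=1e-12, t_end=10e-9):
    """Forward-Euler LH transient; returns (t50, t90) in seconds."""
    v, t = 0.0, 0.0
    t50 = t90 = None
    while t < t_end:
        i_total = (VDD - v) / R_DRV + repeater_current(kind, v)
        v += dt * i_total / C_NODE
        t += dt
        if t50 is None and v >= 0.5 * VDD:
            t50 = t
        if t90 is None and v >= 0.9 * VDD:
            t90 = t
            break
    return t50, t90


if __name__ == "__main__":
    for kind in ("none", "rr1", "rr2"):
        t50, t90 = crossing_times(kind)
        print(f"{kind:>4}: t50 = {t50 * 1e9:.3f} ns, t90 = {t90 * 1e9:.3f} ns")
```

With these assumed values, the RR1 model reaches the 50% point later than the bare wire because it fights the driver, while the RR2 model leaves the early transient untouched and shortens the time to the 90% point.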

The effect of reducing both the HL and LH transition delays of the input inverter I1 can be emulated by using dual asymmetric input inverters. Fig. 5a shows a regenerative repeater (RR3) with dual asymmetric input inverters. The transistor sizes are chosen to lower the switching threshold for both the LH and HL transitions. Hence, the repeater switches into the regenerative mode in a shorter time, reducing the signal degradation in the initial part of the signal transient. The transistors in the inverter $I_n$ are sized to provide a fast HL transition $(W_n = 1\mu m, W_p = 0.6\mu m)$, which results in a lowered switching threshold for the LH signal transition. The sizes of the transistors in $I_p$ are $W_n = 0.24\mu m$ and $W_p = 3\mu m$, which results in a lowered switching threshold for the HL signal transition. The transistor sizes in the self-resetting high-impedance driver are identical in RR2 and RR3. From Fig. 6, it can be seen that the switching threshold is lowered because the circuit begins to source current at 0.5 V instead of 0.75 V.
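The way asymmetric sizing moves the inverter switching point can be sketched with the textbook long-channel (square-law) expression. This is only a rough illustration: the square-law model is crude for a $0.18\mu m$ process, equal channel lengths are assumed, and the threshold voltages and mobility ratio below are hypothetical values, not parameters from the BSIM3 model used in the simulations.

```python
import math

VDD = 1.5               # supply voltage quoted in the text, volts
VT_N = 0.4              # hypothetical NMOS threshold voltage, volts
VT_P = 0.4              # hypothetical PMOS threshold magnitude, volts
MOBILITY_RATIO = 2.5    # hypothetical electron-to-hole mobility ratio


def switching_threshold(w_n, w_p):
    """Input voltage at which NMOS and PMOS saturation currents balance."""
    # r = sqrt(k_p / k_n) for equal channel lengths.
    r = math.sqrt(w_p / (MOBILITY_RATIO * w_n))
    return (VT_N + r * (VDD - VT_P)) / (1.0 + r)


if __name__ == "__main__":
    sizes = {
        "I1 (symmetric)": (1.0, 3.0),    # W_n, W_p in micrometres
        "I_n (fast HL)": (1.0, 0.6),
        "I_p (fast LH)": (0.24, 3.0),
    }
    for name, (w_n, w_p) in sizes.items():
        print(f"{name}: V_M = {switching_threshold(w_n, w_p):.2f} V")
```

Under these assumptions $I_n$ trips below the symmetric inverter's switching point and $I_p$ trips above it, so each inverter responds earlier to the transition it is meant to detect. The absolute values should not be expected to match the simulated thresholds.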



Figure 6: Static I-V plot for RR3 and RR4



Figure 7: (a) Regenerative repeater with dual self-resetting high-impedance drivers (RR5) (b) Regenerative repeater with transmission gates (RR6)

Although RR3 performs better than the other regenerative repeaters, it is highly sensitive to noise. The noise tolerance of RR3 can be enhanced by adding hold transistors. Fig. 5b shows the regenerative repeater with dual asymmetric input inverters and hold transistors (RR4). The function of the hold transistors is identical to that of a keeper in a dynamic circuit. The hold transistors are small ($W_{mnh} = 0.24\mu m$, $W_{mph} = 0.6\mu m$) and do not load the signal driver significantly. Fig. 6 shows the static I-V plot with the hold circuit, and it can be seen that the load on the signal driver is nominal.

Two variants of RR4 are shown in Fig. 7. Fig. 7a shows a dynamic realization (RR5) in which both the trigger and the driver circuit have delayed negative feedback. Since the internal node M and the node N are connected to the outputs of self-resetting high-impedance drivers, the hold circuit is necessary to enhance the noise tolerance of the circuit. Fig. 7b shows a transmission-gate-based realization of the regenerative repeater (RR6). The delayed input signal is used as a control signal to selectively turn ON the pull-up circuit or the pull-down circuit. The static I-V plot for both these circuits is identical to that of RR4. The dynamic performance of RR4, RR5 and RR6 can be different. In this paper, we use RR4 and its modifications as they have fewer transistors and achieve performance similar to that of repeaters RR5 and RR6.



Figure 8: Bidirectional signal interconnect with repeaters

### 2.1 Limitations of Regenerative Repeaters

The key concerns in the design of regenerative repeaters are delay, noise sensitivity, area, power and meta-stability. These issues are explained in this subsection. As described before, a regenerative repeater improves the later part of the transient at the cost of delay degradation in the initial part of the transient. Hence, it is useful only when the signal transition at the node is slow. This can occur in cases such as long resistive wires and long stacks of ON devices. In the next few sections, we show that there is a nominal delay penalty with the use of regenerative repeaters for short wires or shorter stacks of devices. Also, enhancing noise tolerance implies a performance degradation for the regenerative repeater. A comparison of the area and power requirements of long AND gates with and without regenerative repeaters is presented in Section 4.

The use of the feedback structure in the regenerative repeaters can create some functional problems. Any even number of back-to-back connected inverters can get into a meta-stable state. A meta-stable state is an unstable operating point for the regenerative repeater. It cannot be determined a priori how often or for how long the circuit can remain stuck in a meta-stable state. Hence, it becomes necessary to avoid conditions that create meta-stability and thereby guarantee correctness. Meta-stability can occur when there is a conflict between two regenerative repeaters. By ensuring that there is no conflict, the conditions that create meta-stability are removed for the applications presented in the following sections.

## 3 Application to Bidirectional Bus Design

In this section, we present simulation results for a bidirectional bus with interconnects of different lengths, different resistance (R) and capacitance (C) values, and multiple regenerative repeaters. Fig. 8 shows the schematic of the bidirectional signal interconnect with repeaters. The transistors in the signal driver and receiver are of the same size $(W_n = 4\mu m, W_p = 12\mu m)$. Two sets of R, C values are used: $(90m\Omega/\mu m, 200aF/\mu m)$ and $(150m\Omega/\mu m, 300aF/\mu m)$. Interconnect lengths of 250, 500, 750, 1000, 1500, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000 and $20000\mu m$ were used. The delay is measured from the input of the signal driver to the output of the signal receiver. Since we want to ensure bidirectional symmetry with respect to signal delay, only an odd number (1, 3, 5, 7) of repeaters is placed equidistantly along the interconnect. RR4 is used as the repeater, and its transistor sizes are given in the previous section. Fig. 9a and Fig. 9b show the plots of delay variation with interconnect length for the two sets of interconnect R, C parameters. Observe that there is a significant performance improvement for long interconnects with the repeaters. For short interconnects, there is a nominal delay penalty with the repeaters. Without the repeaters, the delay increases almost quadratically with interconnect length. With repeaters, the delay can be approximated as a linear function of interconnect length. Notice that as the number of repeaters increases, the curve *flattens* out and the performance improvement for long interconnects becomes greater.

Figure 9: Delay vs. interconnect length with and without repeaters for the interconnect parameters: (a) $90m\Omega/\mu m$ and $200aF/\mu m$ (b) $150m\Omega/\mu m$ and $300aF/\mu m$
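The quadratic trend of the unrepeated wire can be estimated by hand from the two R, C parameter sets quoted above. The sketch below uses the first-order Elmore delay of a distributed RC line, which is an assumption of this example and not the SPICE methodology of the paper; it ignores the driver and receiver, so it should only be read as an order-of-magnitude guide. The function and dictionary names are invented for the example.

```python
# Wire parameters quoted in the text, per micrometre of wire.
PARAMS = {
    "set (a)": (90e-3, 200e-18),    # ohm/um, farad/um
    "set (b)": (150e-3, 300e-18),   # ohm/um, farad/um
}


def elmore_wire_delay(r_per_um, c_per_um, length_um):
    """First-order delay of an unrepeated distributed RC wire, in seconds."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


if __name__ == "__main__":
    for name, (r, c) in PARAMS.items():
        for length in (1000, 4000, 10000, 20000):
            delay_ps = elmore_wire_delay(r, c, length) * 1e12
            print(f"{name}, L = {length:>5} um: ~{delay_ps:8.1f} ps")
```

Doubling the length quadruples the estimate, which is the behavior the repeaters are meant to suppress.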

We now present an example to show the impact of cross-talk noise on coupled signal interconnects with and without regenerative repeaters (RR4). An interconnect of length $12000\mu m$ was chosen with the same driver and receiver as in the previous case. The interconnect parameters are: resistance $(150m\Omega/\mu m)$, line capacitance $(120aF/\mu m)$ and coupling capacitance $(180aF/\mu m)$. The aggressor signal interconnect has an LH transition while the victim signal interconnect is held LO. Simulations were performed with no repeater and with 7 repeaters placed at equal distances along the interconnect. Fig. 10 shows the voltage waveforms at different nodes with and without the repeaters. Observe that the aggressor far-end waveform with the repeaters is degraded in the initial part of the transient but is improved in the later part of the transient when compared with the voltage waveform without repeaters. Therefore, the signal output of the aggressor with repeaters makes a sharper and earlier transition compared to the voltage waveform without repeaters. The improvement in the later part of the aggressor far-end waveform with the repeaters implies a sharp voltage variation at nodes on the aggressor. A sharp voltage transition on the aggressor with repeaters could result in significant cross-talk noise on the victim line. But the presence of hold transistors in RR4 helps reduce the cross-talk noise. Observe the cross-talk noise voltage waveform at the far end of the victim line. Even with a faster switching aggressor, the peak cross-talk noise is comparable to the peak cross-talk noise for the slower switching aggressor without repeaters. The peak cross-talk noise can be further reduced, with an associated delay penalty, by sizing up the hold transistors. Also note that the area under the cross-talk noise waveform with repeaters is much smaller than the area under the cross-talk noise waveform without repeaters. This implies that the effective RC of the victim line with repeaters is much smaller than the effective RC of the victim line without repeaters.

Figure 10: Voltage waveforms for coupled interconnects with and without repeaters
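As a rough sanity check on the cross-talk magnitudes discussed above, a lumped charge-sharing estimate can be written down. This is an assumption-laden sketch and not part of the reported simulations: the victim is treated as a single undriven node with capacitance to ground $C_g$ and coupling capacitance $C_c$ (taken from the line and coupling capacitances quoted for this example), and the aggressor as an ideal step of height $V_{DD} = 1.5$ V.

$$
V_{peak} \lesssim \frac{C_c}{C_c + C_g}\, V_{DD} = \frac{180}{180 + 120} \times 1.5\ \text{V} = 0.9\ \text{V}
$$

Because the victim is in fact held LO by its driver and by the hold transistors in RR4, the simulated peak would be expected to fall well below this lumped ceiling.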

## 4 Application to Long AND Domino Gates

A long AND gate can be used in the realization of low-power AND-based row decoders [1]. In an AND-based implementation, only one row in the array is pulled down in a cycle. In comparison, all but one row are pulled down in an OR-based row decoder. But an OR-based decoder is much faster than an AND-based decoder. The long stack of transistors in a long AND gate severely degrades its performance. In general, the stack height is restricted to 3 or 4 transistors. This implies that multiple stages of AND gates with a smaller stack size have to be used to realize a long AND gate. There are several limitations of a multiple-stage realization. It results in an increase in the area and the number of transistors. The power dissipation of a multiple-stage AND gate is greater than the power dissipation of a single long AND gate. For instance, consider a 9-input AND gate realized using three 3-input AND gates at the first stage and a second-stage 3-input AND gate that combines the first-stage outputs. Although the transition activity of the second-stage 3-input AND gate is the same as the transition activity of the long 9-input AND gate, there exist some input patterns that cause one or two of the first-stage AND gates to switch without switching the final output. Hence, the first-stage AND gates can switch more often, leading to an increase in the power dissipation. Also, a multi-stage AND gate has a higher clock load and an associated increase in power dissipation. Hence, from a low-power point of view one may still want to implement a long AND domino gate. The question we address is: Is it possible to enhance the performance of a long AND gate?

Figure 11: Long AND domino gate with active pull-up and pull-down circuits
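The switching-activity argument for the two-stage 9-input AND gate can be checked by exhaustive enumeration. The sketch below is illustrative only and rests on two assumptions that are not in the text: every input pattern is taken to be equally likely, and a gate is counted as switching in a cycle whenever its AND function evaluates to true. The function name `count_patterns` is invented for the example.

```python
from itertools import product


def count_patterns(group_size=3, groups=3):
    """Enumerate all input patterns of a two-stage AND tree.

    Returns a tuple of:
      wasted       - patterns where some first-stage gate fires but the
                     final output does not
      final_fires  - patterns where the final output fires
      stage1_fires - total number of first-stage gate firings
      total        - number of input patterns
    """
    n_inputs = group_size * groups
    wasted = final_fires = stage1_fires = 0
    for bits in product((0, 1), repeat=n_inputs):
        stage1 = [
            all(bits[g * group_size:(g + 1) * group_size])
            for g in range(groups)
        ]
        stage1_fires += sum(stage1)
        if all(stage1):
            final_fires += 1
        elif any(stage1):
            wasted += 1
    return wasted, final_fires, stage1_fires, 2 ** n_inputs


if __name__ == "__main__":
    wasted, final_fires, stage1_fires, total = count_patterns()
    print(f"patterns with first-stage switching only: {wasted} of {total}")
    print(f"patterns that switch the final output:    {final_fires} of {total}")
    print(f"first-stage gate firings over all patterns: {stage1_fires}")
```

Under these assumptions, 168 of the 512 patterns switch one or two first-stage gates without switching the final output, while only one pattern switches the final output, which is the same activity as a single long 9-input AND gate.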

A long AND structure is avoided because of the increased time to precharge and evaluate. In addition, the delay in the evaluate stage is further degraded due to the conflict between the weak long AND stack and the keeper. Consider a long AND domino gate (Fig. 11) with all inputs HI. Observe that for a long AND domino gate, the pre-charge is from the top of the stack and the discharge is from the bottom of the stack. Hence, the nMOS transistors in the stack are pulled down from one end and pulled up from the other end. The nMOS transistors in the stack respond to signaling from both directions. This bidirectional signaling can be made faster by using bidirectional repeaters. This can be achieved using an active pull-up repeater at the bottom of the stack to speed up the precharge phase and an active pull-down at the top of the stack to speed up the evaluate phase. In essence, the stack can be pulled up or down from both ends resulting in better performance. Fig. 11 shows the active pull-up and pull-down circuits. The parameters for the transistors are kept the same as in RR3 (Fig. 5a).

SPICE simulations with different numbers of inputs and keeper sizes for the long AND gate, with and without the repeaters, were performed. In both cases, all the transistors in the stack were chosen to have the same size $(W = 1\mu m)$. The pre-charge transistor and the output inverter are identical in both cases. The delay is measured in the evaluate phase from the clock input at the bottom nMOS transistor to the output, with all the nMOS transistors in the stack turned ON. Fig. 12a shows the results for keeper size $W = 0.24\mu m$ and Fig. 12b shows the results for keeper size $W = 0.5\mu m$. From the graphs, it can be seen that the delay increases rapidly without the repeaters. With the repeaters, there is only a nominal increase in delay and it is approximately a linear function of the number of inputs. Therefore, high-performance long AND gates can be realized using the repeaters.

Figure 12: Delay variation with number of inputs with and without repeaters for keeper sizes (a) $W = 0.24\mu m$ (b) $W = 0.5\mu m$
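The rapid growth of delay with stack height without repeaters is consistent with a first-order Elmore estimate of a series stack. The following is a sketch under assumptions not stated in the text: each ON transistor is replaced by an equal resistance $R$, each of the $n$ stack nodes (including the dynamic node at the top) carries an equal capacitance $C$, the stack discharges from the bottom, and the keeper is ignored.

$$
t_{\text{stack}} \approx \sum_{i=1}^{n} (iR)\,C = \frac{n(n+1)}{2}\, RC
$$

The estimate grows quadratically with the number of inputs $n$. Driving the stack from both ends, as the active pull-up and pull-down circuits do, shortens the resistive path seen by each node, which is one way to read the near-linear trend reported with repeaters.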

There are several alternative circuit configurations that improve the performance of long AND domino gates. It is possible to use the clock to enable the active pull-up and pull-down circuits in Fig. 11 instead of using delayed negative feedback. The limitation of this approach is the increased clock load. It is also possible to use additional pre-charge devices and introduce unidirectional buffers in the stack. The limitations of this approach are the increased clock load and the increased delay due to the buffers on the critical path.

## 5 Summary

In this paper, we presented several regenerative repeaters for bidirectional operation and a description of their advantages and limitations. The repeaters were applied to improve the performance of bidirectional signal interconnects. Simulation results show that a quadratic increase in delay without repeaters can be reduced to an almost linear increase in delay with repeaters. The repeaters were also applied to improve the performance of long AND domino gates. Simulation results show significant delay improvement for long AND domino gates with repeaters.

## References

- [1] J. M. Rabaey, Digital integrated circuits. Upper Saddle River, NJ: Prentice Hall, 1996.
- [2] C. J. Alpert, A. Devgan, and S. T. Quay, "Buffer insertion for noise and delay optimization," in *Proc. of DAC*, pp. 362–367, 1998.
- [3] I. Dobbelaere, M. Horowitz, and A. El Gamal, "Regenerative feedback repeaters for programmable interconnections," in *Proc. of ISSCC*, pp. 116–117, 1995.

- [4] T. Iima, M. Mizuno, T. Horiuchi, and M. Yamashina, "Capacitive coupling immune, transient sensitive accelerator for resistive interconnect signals of subquarter micron ULSI," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 4, pp. 531–536, April 1996.
- [5] M. Shoji, Theory of CMOS digital circuits and circuit failures. Princeton, NJ: Princeton University Press, 1992.