# Driver Pre-emphasis Techniques for On-Chip Global Buses

Liang Zhang, John Wilson, Rizwan Bashirullah<sup>\*</sup>, Lei Luo, Jian Xu, and Paul Franzon

Dept. of ECE, North Carolina State University, Raleigh, NC 27606

Dept. of ECE, University of Florida, Gainesville, FL 32611

{lzhang3,jmwilson,lluo3,jxu6,paulf}@ncsu.edu

\*rizwan@tec.ufl.edu

# ABSTRACT

By using current-sensing differential buses with driver pre-emphasis techniques, power dissipation is reduced by 26.0% - 51.2% and peak current is reduced by 63.8%, compared to conventional repeater insertion techniques, for 10mm long buses in TSMC 0.25µm technology. The proposed architecture lowers the worst-case coupling capacitance to total capacitance ratio to 14.4%. It requires only 7.9% more bus routing area than single-ended designs for a 16-bit bus, and eliminates all of the repeater placement blockages. To further verify that driver pre-emphasis techniques can also be applied to voltage-mode single-ended buses, a test chip in TSMC 0.18µm technology was fabricated and measured.

# **Categories and Subject Descriptors**

B.4.3 [Input/Output and Data Communications]: Interconnections (Subsystems) – *Topology (e.g., bus, point-to-point)*.

# **General Terms**

Performance and Design.

#### Keywords

Pre-emphasis, low-power, peak current, crosstalk, current sensing, on-chip bus, differential.

## **1. INTRODUCTION**

Power consumption and the delay/noise of global interconnects have become the two major factors in deciding how long CMOS can serve the world's need for intelligent devices and communication [1]. Unlike local or intermediate interconnects, global interconnects do not scale in length since they communicate signals across a chip [2]. Together with a lack of new process/materials based solutions for long interconnects, signaling design on global interconnects has become an increasingly difficult task for circuit and architecture designers.


ISLPED'05, August 8-10, 2005, San Diego, California, USA.

Conventional repeater insertion techniques have been effective at achieving lower latency and higher data throughput for on-chip RC dominated interconnects [3], [4]. However, interrupting a line with repeaters causes layout placement blockages. More importantly, the number of required repeaters increases as the optimal repeater insertion spacing decreases with each technology node [5]. The power dissipation and latency associated with the repeaters themselves start to undermine the power/delay performance of global interconnects.

Several on-chip bus architectures have been reported to minimize the number of repeaters required. An adaptive bandwidth bus based on hybrid current/voltage mode repeaters was reported in [6], [7], but it requires pipeline latency to accommodate its computational data-paths, and its power saving is not significant for low data activity buses. A similar current-sensing technique was used in [8] for a differential bus, but it consumes more power than the traditional voltage-mode single-ended bus for data activity factors below 0.5. In other work [9], a low-swing differential interconnect architecture with distributed line equalization was proposed for global interconnects, but it increases the load on the clock wires and the number of layout blockages.

In this paper, we propose a driver pre-emphasis architecture for on-chip buses based on transmitter equalization techniques used in chip-to-chip communication [10]. High frequency signal components are pre-emphasized at the driver to improve interconnect channel bandwidth and obtain higher data rates. The rest of the paper is organized as follows. Section 2 describes how bandwidth is improved by using driver pre-emphasis techniques for on-chip RC interconnects. Section 3 describes the circuit used for current-sensing differential buses with driver pre-emphasis in TSMC 0.25µm technology. To further verify the proposed techniques, Section 4 presents the measured results for a voltage-mode single-ended bus with driver pre-emphasis in TSMC 0.18µm technology. Section 5 concludes this work.

#### 2. DRIVER PRE-EMPHASIS

Fig. 1 shows the frequency responses of a 1cm long on-chip interconnect channel, a pre-emphasis equalizer, and their combination. Interconnects are modeled as distributed RC lines ($R_0=240\Omega$/cm, $C_0=2.5$pF/cm). Pre-emphasis improves the system -3dB frequency from 0.5GHz to 1GHz for RC dominated interconnects. Therefore, driver pre-emphasis can compensate not only for the frequency-dependent attenuation of off-chip transmission lines [10], but also for the diffusion of on-chip RC interconnects, to achieve lower latency and higher data rates.

Copyright 2005 ACM 1-59593-137-6/05/0008...\$5.00.
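The distributed RC channel model can be explored with a short numerical sketch. The code below is not the model behind Fig. 1. It assumes an ideal voltage source and an open-circuited far end, for which the line transfer function is $1/\cosh\sqrt{j\omega RC}$. The function names and the bisection search are illustrative choices. Because driver resistance and receiver termination are ignored, this idealized line should not be expected to reproduce the 0.5GHz figure quoted above.

```python
import cmath
import math

# Per-unit-length values quoted in the text; the 1 cm length matches Fig. 1.
R0_OHM_PER_CM = 240.0
C0_F_PER_CM = 2.5e-12
LENGTH_CM = 1.0


def rc_line_gain(freq_hz, length_cm=LENGTH_CM):
    """Magnitude response of a distributed RC line.

    Assumption (not from the paper): ideal voltage source at the near end
    and an open circuit at the far end, so H(s) = 1 / cosh(sqrt(s * R * C)).
    """
    r_total = R0_OHM_PER_CM * length_cm
    c_total = C0_F_PER_CM * length_cm
    s = 1j * 2.0 * math.pi * freq_hz
    return abs(1.0 / cmath.cosh(cmath.sqrt(s * r_total * c_total)))


def minus_3db_frequency(gain_fn, f_lo=1e6, f_hi=1e11, iterations=200):
    """Bisect (in log frequency) for the point where the gain drops to 1/sqrt(2).

    Relies on the gain decreasing monotonically with frequency, which holds
    for the open-ended RC line above.
    """
    target = 1.0 / math.sqrt(2.0)
    for _ in range(iterations):
        f_mid = math.sqrt(f_lo * f_hi)
        if gain_fn(f_mid) > target:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


f3db = minus_3db_frequency(rc_line_gain)
print(f"idealized open-ended line: -3 dB at {f3db / 1e9:.2f} GHz")
```

Adding a source resistance or a resistive termination to this model lowers the channel bandwidth, which is the loss that the pre-emphasis equalizer is meant to recover.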

# 3. CURRENT SENSING DIFFERENTIAL BUS WITH DRIVER PRE-EMPHASIS

Current-mode (CM) signaling can be used to provide higher interconnect bandwidth when compared to the traditional full swing voltage-mode (VM) signaling, at the expense of increased DC power dissipation [11]. For the current-sensing CM circuit architecture shown in Fig. 2(a), a static current path always exists between the driver and receiver stages even if there is no data activity on the interconnect.

To reduce this static current, we propose to use a pair of differential interconnects with a bridge resistor termination $R_B$ (Fig. 2(b)). The static current is reduced by at least 50% because of the increased resistance in the current path. Because a virtual ground is set up in the middle of $R_B$ at a voltage of Vdd/2, the system RC time constant is the same as that of a single-line system. This architecture requires less CM static current and has all the advantages of differential signaling. As shown in Section 3.1, for a 16-bit bus this technique uses only 7.9% more bus routing area than the single-ended bus and requires none of the repeater area.

# 3.1 Circuit Design

Fig. 3 shows the driver and receiver circuit for a CM differential bus with driver pre-emphasis. Together with the single-ended to differential conversion circuit, a one-tap FIR filter and a simple DAC are used to reduce the driver power overhead. Minimum-size inverters are used for "invA" and "invB" to reduce static current and maintain a 100mV signal swing (200mV differential) at the receiver input for consecutive "1"s or "0"s.



Figure 1. Frequency responses of a distributed RC interconnect channel, pre-emphasis equalizer, and their combination.

Transistors P1/N1 and P2/N2 form two tri-state gates that are turned on only when there is a "0-1" or "1-0" transition. They are only 7x minimum size. These small drivers draw a small peak current and therefore reduce power supply noise, as the peak current comparison in Fig. 7 (Section 3.2) confirms.

Buffers "bufA" and "bufB" compensate for the data skew between the inverter drivers that follow them and the tri-state gates. Unlike in [7], the data sequence does not need to be pipelined or delayed before appearing at the bus input. The pre-emphasis for each bit is determined by the previously sent bit, so it does not add any extra clock period of latency.
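The transition-triggered behaviour of the tri-state gates can be illustrated with a small behavioural sketch. This is a logic-level illustration only, not the circuit of Fig. 3. The function names, the initial line state, and the relative drive strengths (1x for the always-on inverter, plus 7x when the tri-state gate is enabled) are assumptions made for the example.

```python
def strong_driver_enables(bits, initial_bit=0):
    """Per bit, True when the tri-state gate would be enabled.

    The gate turns on only for a "0-1" or "1-0" transition, i.e. when the
    current bit differs from the previously sent bit (an XOR).
    initial_bit is a hypothetical starting state of the line.
    """
    enables = []
    previous = initial_bit
    for bit in bits:
        enables.append(bit != previous)
        previous = bit
    return enables


def relative_drive_strength(bits, weak=1, strong=7, initial_bit=0):
    """Relative drive per bit: weak inverter alone, or weak plus strong gate.

    The 1x and 7x sizes follow the text; treating them as simply additive
    is an assumption of this sketch.
    """
    return [weak + strong if enabled else weak
            for enabled in strong_driver_enables(bits, initial_bit)]


data = [0, 1, 1, 1, 0, 0, 1, 0]
enables = strong_driver_enables(data)
drive = relative_drive_strength(data)
print(enables)  # [False, True, False, False, True, False, True, True]
print(drive)    # [1, 8, 1, 1, 8, 1, 8, 8]
```

Only the current and previous bits are needed, which is why no additional pipeline stage is required ahead of the bus input.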

At the receiver side, an NMOS transistor is used as the resistive termination. A differential pair with an active current mirror amplifies the 200mV differential signal swing and converts it to a single-ended output. Longer channel transistors are used in the receiver to reduce the input offset voltage. The bias circuit is shared across the 16-bit bus, and its overhead is less than $10\mu A$ per bit.



Figure 2. CM static current for (a) single-ended bus and (b) differential bus with bridge resistor termination.



Figure 3. Driver and receiver circuit for CM differential bus with driver pre-emphasis.

Bus layouts for the 16-bit differential and single-ended buses are shown in Fig. 4. Metal-4, with a minimum pitch (Pmin) of 0.8µm in TSMC 0.25µm technology, is used for the signal lines. Every differential pair is drawn at minimum pitch with 0.4µm width and 0.4µm spacing. The pairs are spaced 2µm apart, giving a pitch of 3.2µm per pair, or 2xPmin per line. The lines are 10mm long with three meanders. Dummy layers of underlying metal-3 to metal-1 with 70% coverage are used to emulate a realistic chip environment. For clarity, neither the meanders nor the dummy layers are shown in the figure. One ground line at each side of the 16-bit bus shields the low-swing signals.

To run the single-ended full-swing bus at the same speed (1GHz), wider wires at 3xPmin are used, and one Vdd/Gnd shielding line is inserted for every 4 bits to provide a signal return path. Because each differential pair is driven by a pair of 7x tri-state gates and 1x inverters, a 16x driver is used for each bit of the single-ended bus for a fair comparison. Two repeaters, with equally sized drivers, need to be inserted into each 10mm long line. The proposed differential bus uses only 7.9% more bus routing area than the single-ended bus and requires none of the active area needed for repeaters.

In the reference bench, the 3xPmin buses with two 16x repeaters are not optimized for power [12], but in this test case the total repeater capacitance is only 5% of the total line capacitance. Additional power optimization would not yield enough improvement to challenge the validity of the power comparison results in Section 3.2. 2xPmin or 1xPmin buses could be used to save routing area in the reference bench, but they would require many more repeaters to meet the delay goal. Moreover, a smaller pitch can also be used in the proposed differential bus architecture by inserting one or two repeaters with pre-emphasis. The proposed architecture always requires far fewer repeaters than the reference bench. The purpose of this work is to compare delay, power and noise performance at similar bus routing area.

METAL<sup>TM</sup> from OEA [13] is used to extract the parasitic interconnect capacitance (Table 1). For the differential bus, the total capacitance per line is,

$$C_{tot} = C_a + C_{f1} + C_{f2} + 2 \times C_{diff} + \mathrm{CCM} \times C_c \tag{1}$$

where $C_a$=0.145pF/cm is the area capacitance to the bottom layers, $C_{f1}$=0.270pF/cm and $C_{f2}$=0.094pF/cm are the two fringe capacitances, $C_{diff}$=0.806pF/cm is the coupling capacitance within one differential pair, $C_c$=0.179pF/cm is the coupling capacitance from the neighboring differential pair, and CCM is the coupling capacitance multiplier (0 when the neighboring line transitions in the same direction). The multiplier of $C_{diff}$ is fixed at 2 for differential lines, since the two lines of a pair always switch in opposite directions, so $C_{diff}$ is not counted as coupling capacitance. The coupling capacitance to total capacitance ratios ($C_c/C_{tot}$, with the multiplier included in the coupling term) are 7.8% and 14.4% for CCM=1 and 2, respectively. This is a significant improvement over the coupling capacitance ratio of 50% in deep sub-micron technologies [14] and allows for more noise rejection and less data-dependent delay.

The $C_c/C_{tot}$ reduction is the result of both the low-swing differential signaling [15] and the width/spacing configuration used in this work. If a similar configuration were used for the VM single-ended bus in the reference bench to achieve the same $C_c/C_{tot}$ ratio, the reference bench would require many more repeaters and would be uncompetitive in delay and power. In addition, smaller spacing can be used in the proposed architecture to save more bus routing area with a reasonable increase in total capacitance and noise.

For the single-ended bus, the total capacitance per line is,

$$C_{tot} = C_a + 2 \times C_f + \mathrm{CCM} \times C_c \tag{2}$$

where $C_a$=0.435pF/cm, $C_f$=0.283pF/cm, $C_c$=0.393pF/cm, and CCM is 0, 1, 2, 3 or 4 because the two neighboring lines can transition in any direction. The worst-case coupling capacitance to total capacitance ratio is 61.2%, more than four times that of the differential bus.
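Equations (1) and (2) can be checked numerically against the per-line values in Table 1. The sketch below is only an arithmetic check. The helper names are illustrative, and because the inputs are already rounded to three decimals, the computed ratios may differ from the quoted percentages by about a tenth of a percent.

```python
def total_capacitance(fixed_terms, c_c, ccm):
    """C_tot in pF/cm: the CCM-independent terms plus CCM * C_c."""
    return sum(fixed_terms) + ccm * c_c


def coupling_ratio(fixed_terms, c_c, ccm):
    """Fraction of C_tot contributed by the neighbor coupling term CCM * C_c."""
    return ccm * c_c / total_capacitance(fixed_terms, c_c, ccm)


# Differential bus, Eq. (1): C_a + C_f1 + C_f2 + 2 * C_diff, then CCM * C_c.
DIFF_FIXED = [0.145, 0.270, 0.094, 2 * 0.806]
DIFF_CC = 0.179

# Single-ended bus, Eq. (2): C_a + 2 * C_f, then CCM * C_c.
SE_FIXED = [0.435, 2 * 0.283]
SE_CC = 0.393

for label, fixed, c_c, ccm in [
    ("differential, CCM=1", DIFF_FIXED, DIFF_CC, 1),
    ("differential, CCM=2", DIFF_FIXED, DIFF_CC, 2),
    ("single-ended, CCM=4", SE_FIXED, SE_CC, 4),
]:
    c_tot = total_capacitance(fixed, c_c, ccm)
    ratio = coupling_ratio(fixed, c_c, ccm)
    print(f"{label}: C_tot = {c_tot:.3f} pF/cm, coupling ratio = {ratio:.1%}")
```

The two worst-case totals come out nearly equal, so the difference in coupling ratio comes from how much of each total is data-dependent rather than from the total load.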



Figure 4. Differential and single-ended 16-bit bus structures, meanders and dummy underlying metal layers not shown.

Table 1. Parasitic capacitance for one interconnect line

| Parameter                 | Differential | Single-ended |
|---------------------------|--------------|--------------|
| C<sub>a</sub> (pF/cm)     | 0.145        | 0.435        |
| C<sub>f1</sub> (pF/cm)    | 0.270        | 0.283        |
| C<sub>f2</sub> (pF/cm)    | 0.094        | 0.283        |
| C<sub>diff</sub> (pF/cm)  | 0.806        | n/a          |
| Worst CCM                 | 2            | 4            |
| C<sub>c</sub> (pF/cm)     | 0.179        | 0.393        |
| C<sub>tot</sub> (pF/cm)   | 2.48         | 2.57         |
| Coupling ratio            | 14.4%        | 61.2%        |

#### 3.2 Simulation Results

Fig. 5 shows the signal waveforms at the receiver input for the CM differential bus with driver pre-emphasis. All consecutive "1"s and "0"s are equalized by the pre-emphasis, and a 200mV differential signal swing is achieved. Crosstalk is shown by transitioning the two neighboring pairs in various directions. Due to its 14.4% coupling capacitance to total capacitance ratio, this bus structure has very good differential mode noise rejection, as seen in the $2^{nd}$ and $3^{rd}$ waveforms. 80mV of common mode noise is observed on the bottom waveform when the two neighboring pairs couple the differential lines in the same direction. From 1V to 1.5V, the common mode rejection ratio (CMRR) of the differential sense amplifier is 50, which is sufficient to reject this 80mV noise. The coupling on the differential signal swing is always under 20% for any direction of transitions. This makes twisting of the differential wires unnecessary and avoids the added via resistance and layout complexity.

Fig. 6 compares the power dissipation of one channel of the current-sensing differential bus with driver pre-emphasis to that of a full-swing VM single-ended bus with repeaters. At 1GHz, the proposed bus architecture reduces power by 26.0% to 51.2% for data activity factors above 0.2. It consumes more power than the conventional bus architecture only for data activity factors below 0.1, due to its 0.52mA static current (Fig. 7). The peak currents of the two bus architectures are also compared in Fig. 7. Due to its small drivers and signal swing, the CM differential bus reduces the peak current by 63.8% relative to the full-swing VM bus.



Figure 5. Signal waveforms at the receiver input with two neighboring pairs transitioning in various directions.



Figure 6. Power dissipation comparison at different data activity factors.

# 4. VM SINGLE-ENDED BUS WITH DRIVER PRE-EMPHASIS

Besides current-sensing differential buses, the driver pre-emphasis technique can also be applied to VM single-ended buses to minimize the number of repeaters required. A test chip in TSMC 0.18µm technology was fabricated and measured to demonstrate this.

# 4.1 Circuit Design

Fig. 8 shows the driver circuit. Unlike the CM differential bus driver, which emphasizes the high-frequency signal components, the VM bus driver de-emphasizes the low-frequency components to reduce inter-symbol interference (ISI) and save power. All consecutive "1"s or "0"s are attenuated by one threshold voltage (Vth) at the driver output, Dout, by transistors N2/N3 and P2/P3. Transistors P1 and N1 provide full signal swing at Dout and are sized to produce a swing from Vth to Vdd-Vth at the receiver input, Rin. Transistors P2 and N2 are 2.5x the minimum size and hold this voltage level. Fig. 9 shows the timing sketch of this circuit.
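The de-emphasis behaviour at Dout can be summarized with an idealized, logic-level sketch. It ignores all settling dynamics and is not a model of the circuit in Fig. 8. The function name, the initial bit, and the supply and threshold values used in the example (1.8V and 0.5V) are hypothetical and are not taken from the paper.

```python
def driver_output_levels(bits, vdd, vth, initial_bit=0):
    """Idealized steady-state Dout level for each bit.

    On a transition the full-swing devices (P1/N1 in the text) drive Dout
    to Vdd or ground. For a repeated bit the output is attenuated by one
    threshold voltage, to Vdd - Vth for a "1" or to Vth for a "0".
    """
    levels = []
    previous = initial_bit
    for bit in bits:
        if bit != previous:
            level = vdd if bit else 0.0        # transition: full swing
        else:
            level = vdd - vth if bit else vth  # repeated bit: de-emphasized
        levels.append(level)
        previous = bit
    return levels


# Hypothetical supply and threshold values, chosen only for illustration.
levels = driver_output_levels([1, 1, 0, 0, 1], vdd=1.8, vth=0.5)
print(levels)
```

Runs of identical bits settle to a reduced level, so the following transition starts closer to mid-rail, which is how the scheme reduces ISI.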

The photograph in Fig. 10 shows the portion of the TSMC 0.18µm CMOS test chip used in this work. Meandered metal-4 lines with a length of 10mm and a width of 4.5µm were used. Simple buses with no repeater and buses with one repeater were included for comparison. The drivers and repeaters in the comparison circuits are the same size as P1 and N1 in the driver with pre-emphasis.



Figure 7. Peak current comparison of VM bus with repeaters and CM bus with pre-emphasis.



Figure 8. Driver with pre-emphasis for VM single-ended bus.



Figure 10. Test chip photograph (labeled region: meandered bus with repeaters, 10mm).



Figure 9. Timing sketch.

#### 4.2 Measurement Results

A 127-bit pseudo-random binary sequence (PRBS) input was generated from an Agilent 81134A source. The eye diagrams at the receiver input were measured by a digital sampling oscilloscope (DSO) with infinite persistence display and are shown in Fig. 11 for the simple bus (top), the bus with one repeater (middle), and the VM bus using driver pre-emphasis (bottom). At 2GHz, the severe ISI on the simple bus results in eye closure. The repeater alleviates ISI by boosting the whole signal, while the driver pre-emphasis does so by attenuating the low-frequency signal components.
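A 127-bit PRBS is the maximal-length sequence of a 7-bit linear feedback shift register. The sketch below generates one in software for simulation purposes. The feedback polynomial $x^7 + x^6 + 1$ and the seed are assumptions. The text does not state which polynomial the pattern source was configured with.

```python
def prbs7(num_bits, seed=0x7F):
    """Generate num_bits of a PRBS-7 sequence from a 7-bit Fibonacci LFSR.

    Assumed polynomial: x^7 + x^6 + 1, which is maximal length, so the
    sequence repeats every 2**7 - 1 = 127 bits. Any non-zero 7-bit seed
    produces the same sequence with a different starting phase.
    """
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(num_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of taps 7 and 6
        state = ((state << 1) | new_bit) & 0x7F      # shift in the new bit
        bits.append(new_bit)
    return bits


sequence = prbs7(254)
first_period, second_period = sequence[:127], sequence[127:]
print(first_period == second_period)  # True: the pattern repeats after 127 bits
print(sum(first_period))              # 64 ones and 63 zeros in each period
```

A pattern this short contains runs of at most seven identical bits, which bounds the low-frequency content the bus sees during the eye measurement.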

Both repeater insertion and driver pre-emphasis increase bandwidth, but driver pre-emphasis saves power, with the trade-off being a lower signal swing. With an eye opening of 400mV, a simple inverter can be used as a receiver with a negligible increase in static power. Unlike the voltage-mode low-swing schemes in [16], which generally sacrifice both noise margin and bandwidth for power dissipation, this pre-emphasis technique improves bandwidth while trading off noise margin due to the reduction in voltage swing.

Vth variation also has an impact on noise margin. The DC points at both the driver output and the receiver input depend on Vth. If the Vth variations at the driver and receiver track each other, the DC points also track and there is no noise margin penalty. Only slow N and fast P at one side combined with fast N and slow P at the other side degrade noise margin. In this case, sense amplifiers are needed as receivers instead of simple inverters.



Figure 11. Eye diagram measurement at the receiver input for the simple bus (top), bus with repeater (middle), and VM bus with driver pre-emphasis (bottom).

Fig. 12 shows the power dissipation measurement for PRBS data at different frequencies. The simple bus does not work above 1GHz. The driver pre-emphasis bus decreases power consumption by up to 40% compared to using repeaters.



Figure 12. Power dissipation measurement at different frequencies with PRBS input.

#### 5. CONCLUSIONS

Driver pre-emphasis techniques were applied to both current-sensing differential buses and VM single-ended buses. For 10mm differential buses in TSMC 0.25µm technology, driver pre-emphasis decreased power dissipation by 26.0% - 51.2% and reduced peak current by 63.8%, compared to conventional repeater insertion techniques. For 10mm single-ended buses with driver pre-emphasis in TSMC 0.18µm technology, up to 40% power saving was measured.

#### 6. ACKNOWLEDGMENTS

The authors thank Dr. Stephen Mick, Evan Erickson, and Karthik Chandrasekar for discussions, and Dr. Steve Lipa for the wire-bonding tutorial. This work is supported by NSF under CCR-9988334 and AFRL under F29601-03-3-0135.

#### 7. REFERENCES

- [1] C. Hu, "CMOS for one more century?" Custom Integrated Circuits Conference, Keynote Speech, Oct 2004.

- [2] R. Ho, K. W. Mai, and M. A. Horowitz, "The future of wires," Proc. IEEE, vol. 89, no. 4, pp. 490-504, Apr 2001.
- [3] H. Bakoglu, *Circuits, Interconnections and Packaging for VLSI*, Addison-Wesley, 1990.
- [4] R. McInerney, et al., "Methodology for repeater insertion management in the RTL, layout, floorplan and fullchip timing databases of the Itanium microprocessor," ISPD Proc., pp. 99-104, Apr. 2000.
- [5] J. Cong, "An interconnect-centric design flow for nanometer technologies," Proc. of the IEEE, vol. 89, no. 4, pp. 505-528, Apr 2001.
- [6] R. Bashirullah, W. Liu, and R. Cavin, "Low-power design methodology for an on-chip bus with adaptive bandwidth capability," DAC, pp. 628-633, Jun 2003.
- [7] R. Bashirullah, et al., "A 16Gb/s adaptive bandwidth on-chip bus based on hybrid current/voltage mode signaling," Symp. VLSI Circuits, pp. 392-393, Jun 2004.
- [8] D. Schinkel, et al., "A 3Gb/s/ch transceiver for RC-limited on-chip interconnects," ISSCC, pp. 386-387, Feb 2005.
- [9] R. Ho, K. Mai, and M. Horowitz, "Efficient on-chip global interconnects," Symp. VLSI Circuits, pp. 271-274, Jun 2003.
- [10] W. Dally and J. Poulton, *Digital Systems Engineering*, Cambridge Univ. Press, Cambridge, UK, 1997.
- [11] E. Seevinck, P. van Beers, and H. Ontrop, "Current-mode techniques for high-speed VLSI circuits with application to current sense amplifier for CMOS SRAM's," IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 525-536, April 1991.
- [12] K. Banerjee and A. Mehrotra, "A power-optimal repeater insertion methodology for global interconnects in nanometer designs," IEEE Trans. Electron Devices, vol. 49, no. 11, pp. 2001-2007, Nov 2002.
- [13] http://www.oea.com/document/metal.pdf
- [14] M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De, "Static pulsed bus for on-chip interconnects," Symp. VLSI Circuits, pp. 78-79, Jun 2002.
- [15] D. Sylvester and H. Kaul, "Power-driven challenges in nanometer design," IEEE Design & Test of Computers, vol. 18, no. 6, pp. 12-21, Nov 2001.
- [16] H. Zhang, V. George, and J. M. Rabaey, "Low-swing on-chip signaling techniques: effectiveness and robustness," IEEE Trans. VLSI, vol. 8, no. 3, pp. 264-272, Jun 2000.