# Information Theoretic Approach to Address Delay and Reliability in Long On-Chip Interconnects

Rohit Singhal (Computer Science, Texas A&M University), Gwan Choi (Electrical and Computer Engineering, Texas A&M University), Rabi Mahapatra (Computer Science, Texas A&M University)

# ABSTRACT

With shrinking feature sizes and growing integration density in deep sub-micron technologies, global buses are fast becoming the "weakest links" in VLSI design. They have large delays and are error-prone. In system-on-chip (SoC) designs especially, where parallel interconnects run over long distances, crosstalk is detrimental to overall system performance because of the large delays and unreliability it causes. This paper presents an information theoretic approach to address delay and reliability in long interconnects. A framework to calculate the capacity of a physical wire is laid out. Results for 8-bit wide buses of varying lengths in $0.1\mu m$ technology are also presented. The wires are modeled based on their calculated parasitic (R, L, C) values and coupling (C, L) parameters. Using this model, results are obtained for the data transfer capacity of long interconnects. For wide buses, the signal delay distribution has a long tail, meaning that most signals arrive at the output much earlier than the worst-case delay. Using communication theory, these "good" signals arriving early can be used to predict or correct the "few" signals arriving late. Further, results show that for every bus configuration there exists an optimal transmission frequency that yields the maximum data transfer rate. This optimal frequency is higher than the clock frequency dictated by a pessimistic worst-case-delay design.

## 1. INTRODUCTION

Recent advances in System-on-Chip (SoC) technology have given rise to complex communication systems between different modules on a chip. Even though the problems arising from the size and magnitude of the logic have been alleviated by Moore's law, high-speed data transfer over long distances poses difficult research problems. These long interconnecting wires are traditionally modeled as RC networks [1]. The signal delay and the energy consumed are determined by the product RC. The capacitance C of a wire has two components. First, there is a wire-to-substrate capacitance $C_g$. Second, there is an inter-wire capacitance called the coupling capacitance $C_c$. As fabrication technologies advance into the deep sub-micron (DSM) region, the coupling capacitance $C_c$ becomes dominant compared to $C_g$. Past research has modeled the effects of this coupling capacitance [1,2]. The most important problem identified is "crosstalk", where adjacent signals interfere with each other, resulting in errors and large delays.

ICCAD'06, November 5-9, 2006, San Jose, CA

Copyright 2006 ACM 1-59593-389-1/06/0011...\$5.00.

Crosstalk due to inductive coupling has recently gained significance [3–5]. Each interconnecting wire has a self-inductance $L_s$ and a mutual inductance $L_m$ with its neighbors. As clock speeds rise, the problems posed by inductive coupling become more serious.

In addition to the lossy effects of R, L and C, the signal flowing through a wire is also affected by *Power Noise* and *Process Variations* [6]. Process variations result in imperfect wire dimensions, while power noise results in imperfect  $V_{dd}$  and ground values. Both result in errors at the receiver. While this paper includes the power noise as fluctuations in the driver voltages, it ignores the effects of process variations.



Figure 1: Schematics of Buffer Insertion and Coding Schemes.

Recent research has presented ideas to minimize the crosstalk effects through buffer insertion [7,8] while others present coding as a method to do this [9–13]. Other interesting techniques recently presented include the use of variable cycle design [14] and wave-pipelining [15] to speed-up the data transmission. [16] proposes a coding technique for off-chip interconnects to avoid the effects of inductive crosstalk.


In [9] it is shown, using simulations, that coding is a better alternative to buffer insertion, as it reduces not only wire delays but also the power consumed for data transmission. Another important advantage of coding over repeater/buffer insertion is that buffers usually limit the data flow to one direction, while coding allows the buses to operate in both directions.

A previous work [17] by the authors derives the Shannon information theoretic capacity [18, 19] of interconnects but, like [9], assumes only capacitive coupling. This paper extends [17] to calculate the wire capacity in the presence of both capacitive and inductive coupling as well as power noise. Further, this work compares the codes proposed or discussed in [9] against this information theoretic channel capacity. It is shown that the coding schemes BSC and DAP [9] outperform the other coding schemes considered in [9].

The rest of this paper is organized as follows. Section 2 describes the modeling of on-chip interconnects and of coupling in a wide bus. Section 3 discusses the Shannon capacity. Section 4 gives an analytical framework for calculating the capacity and provides numerical results for the data-carrying capacity of a wire in a $0.1\mu$m fabrication process. Finally, Sections 5 and 6 give the conclusions and acknowledgements respectively.

## 2. ON-CHIP INTERCONNECTS

## 2.1 Single Interconnect Model

The dimensions (width w, separation d, height h, and thickness t) of a global wire in the $0.1\mu$m technology can be found in [6]. Based on these dimensions and the properties (permittivity $\epsilon$ and permeability $\mu$) of copper, the parasitics $R$, $C_g$ and $L_s$ can be calculated. The calculation of R is straightforward. $C_g$ is calculated using the parallel plate model. $L_s$ is calculated using the equations in [20]. The wires are assumed to have constant width, height, and so on along their entire length. This is reasonable because the wires being considered are expected to carry bidirectional data; techniques such as wire tapering would result in uneven characteristics between the two directions.

## 2.2 Wide Bus Model

Two more parameters that are of interest in a wide bus are the coupling capacitance  $C_c$  and the mutual inductance  $L_m$ . While  $C_c$  can be calculated using the parallel plate model,  $L_m$  is calculated according to [20].
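As a rough illustration of the parallel-plate estimates mentioned above, the following Python sketch computes $R$, $C_g$ and $C_c$ for one wire. The wire dimensions and the dielectric constant are placeholder assumptions chosen for illustration (they are not the values from [6]), and the helper name `wire_parasitics` is hypothetical. The placeholder geometry was picked so that $C_c = 4C_g$, the ratio this paper uses in Section 4.2.

```python
# Parallel-plate estimates of wire parasitics (illustrative sketch only).
EPS0 = 8.854e-12   # vacuum permittivity, F/m
RHO_CU = 1.68e-8   # resistivity of bulk copper, ohm*m


def wire_parasitics(length, w, t, h, d, eps_r=3.9):
    """Return (R, C_g, C_c) for one wire.

    length, w, t, h, d are in meters: wire length, width, thickness,
    height above the substrate, and separation from the neighboring wire.
    eps_r is an assumed relative permittivity of the dielectric.
    """
    R = RHO_CU * length / (w * t)          # resistance of a rectangular bar
    C_g = EPS0 * eps_r * w * length / h    # bottom face to substrate
    C_c = EPS0 * eps_r * t * length / d    # side wall to one neighbor
    return R, C_g, C_c


# Hypothetical geometry for a 1 mm long wire (not taken from [6]).
R, C_g, C_c = wire_parasitics(length=1e-3, w=0.4e-6, t=0.8e-6,
                              h=0.8e-6, d=0.4e-6)
print(f"R = {R:.1f} ohm, C_g = {C_g * 1e15:.1f} fF, C_c = {C_c * 1e15:.1f} fF")
```

Fringing fields are ignored in this model, so it should be read as a first-order estimate only.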

A small section of an *n*-bit wide bus can be described as a multi-input-multi-output (MIMO) system [17], whose transfer function in the Laplace domain is the $n \times n$ matrix $F(s) = (I + L(s)C(s))^{-1}$, where I, L(s) and C(s) are each of size $n \times n$. L(s) describes the inductance and resistance of the wires, while C(s) describes their capacitance.

Based on this model for an 8-bit wide bus in $0.1\mu$m technology, Fig. 2 plots the histogram of signal propagation delay for a 1 mm long wire. The long tail of the distribution is clearly visible. This means that most signals arrive much earlier than the worst case, even without any data preprocessing. Also, the worst-case delay increases as the square of the length.

$$L(s) = \begin{bmatrix} A & B & 0 & \dots & 0 & 0 \\ D & A & B & \dots & 0 & 0 \\ 0 & D & A & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & A & B \\ 0 & 0 & 0 & \dots & D & A \end{bmatrix}, \tag{1}$$

$$A = R + sL_s, \tag{2}$$

$$B = sL_m, \tag{3}$$

$$D = -sL_m. \tag{4}$$

$$C(s) = \begin{bmatrix} Y & Z & 0 & \dots & 0 & 0 \\ Z & X & Z & \dots & 0 & 0 \\ 0 & Z & X & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & X & Z \\ 0 & 0 & 0 & \dots & Z & Y \end{bmatrix}, \tag{5}$$

$$X = s(2C_c + C_g), \tag{6}$$

$$Y = s(C_c + C_g), \tag{7}$$

$$Z = -sC_c. \tag{8}$$
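The matrix model above can be evaluated numerically. The following NumPy sketch builds $L(s)$ and $C(s)$ for one bus section and computes $F(s) = (I + L(s)C(s))^{-1}$ at a single frequency point. It assumes the off-diagonal inductive terms are $B = sL_m$ and $D = -sL_m$, which is one reading of Eqs. (3) and (4); the function names and all parasitic values are hypothetical placeholders, not the values used for the results in this paper.

```python
import numpy as np


def bus_matrices(s, n, R, L_s, L_m, C_g, C_c):
    """Build the n x n matrices L(s) and C(s) for one bus section."""
    A, B, D = R + s * L_s, s * L_m, -s * L_m
    X, Y, Z = s * (2 * C_c + C_g), s * (C_c + C_g), -s * C_c

    Lmat = (A * np.eye(n, dtype=complex)
            + B * np.eye(n, k=1)      # superdiagonal
            + D * np.eye(n, k=-1))    # subdiagonal
    Cmat = (X * np.eye(n, dtype=complex)
            + Z * np.eye(n, k=1)
            + Z * np.eye(n, k=-1))
    Cmat[0, 0] = Cmat[-1, -1] = Y     # edge wires have only one neighbor
    return Lmat, Cmat


def transfer_matrix(s, n, **params):
    """F(s) = (I + L(s) C(s))^{-1} for one section."""
    Lmat, Cmat = bus_matrices(s, n, **params)
    return np.linalg.inv(np.eye(n) + Lmat @ Cmat)


# Placeholder per-section parasitics (assumed values, for illustration only).
params = dict(R=50.0, L_s=1e-9, L_m=0.5e-9, C_g=10e-15, C_c=40e-15)
s = 2j * np.pi * 1e9                  # evaluate at 1 GHz
F = transfer_matrix(s, 8, **params)
print(F.shape)
```

Two quick sanity checks follow from the structure: at $s = 0$ the section is transparent ($F = I$), and every row of $C(s)$ sums to $sC_g$ because the coupling terms cancel when all wires move together.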



Figure 2: The Signal Delay Distribution of an 8 Bit Wide 1 mm Long Bus.

## 2.3 Coding Schemes for Wide Buses

As mentioned earlier, several coding schemes [9–13] have been proposed in the past. [9] makes the most important contribution in studying the area and power penalties of encoder/decoder designs. According to [9], the three main concerns for interconnects are

- The Delay Problem: large delays due to crosstalk,
- The Power Problem: due to $C_g$ and $C_c$, and
- The Reliability Problem: errors due to DSM noise.

The authors in [9] propose a multi-stage encoding structure, where each coding stage addresses one of the above problems individually. For example, the codes are built using a cascade of low power codes (LPC), crosstalk avoidance codes (CAC), and error correction codes (ECC). Intuitively, the slower the clock, the more "settled" the signals are, and the faster the clock, the more errors occur. This is because the signals have more time to settle to their nominal values when the clock is slower. Based on this, clock speed can be treated as an additive noise that leads to errors, with the magnitude of this noise being a function of the clock frequency. The delay problem is therefore not different from the reliability problem in "clocked buses", and the two should be combined. A unified coding approach that corrects errors should be sufficient to solve both the delay and reliability problems.

## 3. INFORMATION THEORETIC CAPACITY

## 3.1 A Few Definitions

#### 3.1.1 Self-Information

For any random event E, which occurs with a probability p(E), the self-information  $I(E) = -\log_2 p(E)$  defines the information conveyed in bits by the occurrence of that event. For example, the information conveyed by the occurrence of heads in an unbiased coin is 1 bit.

#### 3.1.2 Entropy and Conditional Entropy

The entropy H(S) of a system S of n random events $E_1, E_2, \ldots, E_n$, with probabilities of occurrence $p_1, p_2, \ldots, p_n$, is the measure of the system's uncertainty and is given as

$$H(S) = \sum_{k=1}^{n} p_k I(E_k). \tag{9}$$

The uncertainty in a system consisting of an unbiased coin with heads and tails as the events is 1 bit. The uncertainty in a system consisting of a fair six-faced die is $\log_2 6 \approx 2.58$ bits.
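The two definitions above translate directly into code. The short Python sketch below (function names are illustrative) evaluates Eq. (9) for the coin and die examples.

```python
import math


def self_information(p):
    """Information, in bits, conveyed by an event of probability p."""
    return -math.log2(p)


def entropy(probs):
    """Entropy in bits of a system whose events have the given probabilities."""
    return sum(p * self_information(p) for p in probs if p > 0)


coin = [0.5, 0.5]
die = [1 / 6] * 6
print(entropy(coin))           # 1.0
print(round(entropy(die), 2))  # 2.58
```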

The conditional entropy of a system  $S_1$ , given the outcome of a system  $S_2$ , is defined as

$$H(S_1|S_2) = -\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log_2 \frac{p_{jk}}{p_k}, \tag{10}$$

where $p_{jk} = p(E_j \cap F_k)$ and $p_k = p(F_k)$. Here *n* and *m* are the numbers of possible outcomes *E* and *F* of systems $S_1$ and $S_2$ respectively. If system $S_1$ is defined as a system with two coins, and system $S_2$ is defined as one of those coins, then the original uncertainty in $S_1$ is $H(S_1) = 2$ bits. Once the outcome of $S_2$ is known, the remaining uncertainty in $S_1$ is $H(S_1|S_2) = 1$ bit.
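The two-coin example can be checked numerically. The sketch below is an illustration with hypothetical variable names: it builds the joint distribution of $S_1$ (both coins) and $S_2$ (the first coin) and evaluates Eq. (10). Joint outcomes with zero probability are simply omitted, since they contribute nothing to the sum.

```python
import math
from itertools import product

# S1: ordered outcome of two fair coins; S2: outcome of the first coin.
outcomes_s1 = list(product("HT", repeat=2))

# Joint probabilities p_jk = p(E_j and F_k); only consistent pairs are nonzero.
p_joint = {(e, e[0]): 0.25 for e in outcomes_s1}

# Marginal probabilities p_k = p(F_k) of S2.
p_s2 = {}
for (e, f), p in p_joint.items():
    p_s2[f] = p_s2.get(f, 0.0) + p

H_s1 = -sum(0.25 * math.log2(0.25) for _ in outcomes_s1)
H_s1_given_s2 = -sum(p * math.log2(p / p_s2[f])
                     for (e, f), p in p_joint.items())

print(H_s1)            # 2.0
print(H_s1_given_s2)   # 1.0
```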

## 3.2 Capacity of Memory-less BSC Channels

For a communication channel, let the input and output be two different random systems $S_1$ and $S_2$. The capacity is defined as the reduction in input uncertainty given the outputs [18] and is given as $C = H(S_1) - H(S_1|S_2)$. For the binary symmetric channel (BSC) with equiprobable inputs, $p_0 = p_1 = 1/2$, this reduces to

$$C = 1 + p \log_2(p) + (1 - p) \log_2(1 - p), \tag{11}$$

where p is the bit-error-rate of the channel.
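Eq. (11) is easy to evaluate directly. The following Python sketch (the function name is illustrative) handles the end points $p = 0$ and $p = 1$ separately, using the usual convention $0 \log_2 0 = 0$.

```python
import math


def bsc_capacity(p):
    """Capacity in bits per channel use of a BSC with bit-error-rate p."""
    if p in (0.0, 1.0):
        return 1.0  # convention: 0 * log2(0) = 0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


for p in (0.0, 0.01, 0.11, 0.5):
    print(f"p = {p:<4}  C = {bsc_capacity(p):.3f} bits")
```

A noiseless wire carries 1 bit per use, while a wire whose output is wrong half the time carries nothing; a bit-error-rate of about 0.11 already halves the capacity.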

## 3.3 Capacity of BSC with Memory

The bit error rate in some channels is altered significantly by past transmissions. Such channels, called "Gilbert-Elliot channels" [19], appear to have multiple operating conditions or states. Consider a channel with *n* states $A_1, A_2, \ldots, A_n$, with each state having a bit-error-rate $p_{ei}$ and a probability of occurrence $p_i$. The capacity of each state is $C_i = 1 + p_{ei} \log_2(p_{ei}) + (1 - p_{ei}) \log_2(1 - p_{ei})$, and the overall channel capacity [19] is

$$C = \sum_{i=1}^{n} p_i C_i. \tag{12}$$
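As a sketch of Eq. (12), the Python function below averages the per-state BSC capacities over the state probabilities. The two-state example at the end uses made-up numbers purely for illustration; they are not measurements from this paper.

```python
import math


def bsc_capacity(p):
    """Capacity of a binary symmetric channel with bit-error-rate p, Eq. (11)."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def multistate_capacity(state_probs, state_bers):
    """Eq. (12): C = sum_i p_i * C_i over the channel states."""
    if len(state_probs) != len(state_bers):
        raise ValueError("need one bit-error-rate per state")
    if abs(sum(state_probs) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p_i * bsc_capacity(pe_i)
               for p_i, pe_i in zip(state_probs, state_bers))


# Hypothetical two-state channel: a "good" state 90% of the time, "bad" 10%.
print(f"{multistate_capacity([0.9, 0.1], [0.001, 0.2]):.3f} bits per use")
```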

## 4. FRAMEWORK AND RESULTS

## 4.1 Interconnect States and Capacity

It is apparent from the "multi-modal" distribution in Fig. 2 that an interconnect operates under several distinct conditions. These conditions can be broadly classified into the states listed in Table 1 <sup>1</sup>. There are more states than in [17] because inductive coupling, unlike capacitive coupling, is not symmetric between the left and right sides. The probability of occurrence $p_i$ of each state $A_i$ is given in Table 1. Let $C_i$ be the capacity of the channel when in state $A_i$; then

$$C = \sum_{i=1}^{13} p_i C_i. \tag{13}$$

| State    | Left Wire | Relevant Wire | Right Wire | Prob |
|----------|-----------|---------------|------------|------|
| $A_1$    | ↑         | ↑             | ↑          | 1/32 |
|          | ↓         | ↓             | ↓          |      |
| $A_2$    | ↑         | ↑             | -          | 1/16 |
|          | ↓         | ↓             | -          |      |
| $A_3$    | ↑         | ↑             | ↓          | 1/32 |
|          | ↓         | ↓             | ↑          |      |
| $A_4$    | -         | ↑             | ↑          | 1/16 |
|          | -         | ↓             | ↓          |      |
| $A_5$    | -         | ↑             | -          | 1/8  |
|          | -         | ↓             | -          |      |
| $A_6$    | -         | ↑             | ↓          | 1/16 |
|          | -         | ↓             | ↑          |      |
| $A_7$    | ↓         | ↑             | ↑          | 1/32 |
|          | ↑         | ↓             | ↓          |      |
| $A_8$    | ↓         | ↑             | -          | 1/16 |
|          | ↑         | ↓             | -          |      |
| $A_9$    | ↓         | ↑             | ↓          | 1/32 |
|          | ↑         | ↓             | ↑          |      |
| $A_{10}$ | ↑         | -             | ↑          | 1/16 |
|          | ↓         | -             | ↓          |      |
| $A_{11}$ | ↑         | -             | -          | 1/4  |
|          | ↓         | -             | -          |      |
|          | -         | -             | ↑          |      |
|          | -         | -             | ↓          |      |
| $A_{12}$ | ↑         | -             | ↓          | 1/16 |
|          | ↓         | -             | ↑          |      |
| $A_{13}$ | -         | -             | -          | 1/8  |

Table 1: The Interconnect States
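The probabilities in Table 1 can be reproduced by enumeration, assuming that every wire carries independent, equiprobable random bits (an assumption made here for illustration; the table itself does not state it). Each wire then rises with probability 1/4, falls with probability 1/4, and stays quiet with probability 1/2. The Python sketch below enumerates all 64 combinations for three adjacent wires and groups them as in Table 1, where a pattern and its mirror image with all arrows inverted count as the same state. The names `state_key` and `FLIP` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FLIP = {"↑": "↓", "↓": "↑", "-": "-"}


def transition(prev, nxt):
    """Classify one wire's bit change as rising, falling or quiet."""
    if prev == nxt:
        return "-"
    return "↑" if nxt == 1 else "↓"


def state_key(left, mid, right):
    """Map a (left, relevant, right) pattern to its Table 1 state."""
    if mid != "-":
        if mid == "↓":  # invert all arrows so the relevant wire rises
            left, right = FLIP[left], FLIP[right]
        return (left, "↑", right)                 # states A1 .. A9
    active = [x for x in (left, right) if x != "-"]
    if not active:
        return "all quiet"                        # A13
    if len(active) == 1:
        return "one neighbor switching"           # A11
    return "neighbors same" if left == right else "neighbors opposite"  # A10, A12


counts = Counter()
for l0, l1, m0, m1, r0, r1 in product((0, 1), repeat=6):
    counts[state_key(transition(l0, l1), transition(m0, m1),
                     transition(r0, r1))] += 1

probs = {state: Fraction(c, 64) for state, c in counts.items()}
for state, p in probs.items():
    print(state, p)
```

The enumeration yields 13 distinct states whose probabilities sum to 1, in agreement with the table.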

<sup>1</sup> ↑ means a 0 to 1 transition, ↓ means a 1 to 0 transition, and - means no transition.

Noise power is a function of the clock frequency and the length of the wire. Fig. 3 plots the capacity (bits per clock period per wire) as a function of bus length and clock frequency. The capacity falls as the clock rate or the length increases. Fig. 4 plots the capacity in bits per wire per second as a function of the bus length and the clock frequency. A given bus has an optimal operating frequency at which the data rate is maximum. This optimal frequency is much higher than that of the pessimistic worst-case design. As an example, the worst-case clock design for a 1 mm long bus would be 17.5 GHz, resulting in 17.5 Gbps on each wire. It is possible to drive the bus at 40 GHz and achieve up to 28 Gbps per wire with a rate 0.7 code. A rate 0.7 code means that for every 7 bits of information there are 3 parity bits to recover errors, giving a rate of 7/10 = 0.7.
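The existence of an optimal frequency can be illustrated with a toy model. In the Python sketch below, the bit-error-rate rises smoothly from 0 toward 0.5 as the clock frequency grows; this error model, its parameters `f0_ghz` and `k`, and the function names are invented for illustration and are not the RLC-based model used for Figs. 3 and 4. The data rate is the clock frequency multiplied by the per-cycle capacity of Eq. (11), so it first grows with frequency and then collapses once errors dominate.

```python
import math


def bsc_capacity(p):
    """Capacity of a binary symmetric channel with bit-error-rate p, Eq. (11)."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def toy_bit_error_rate(f_ghz, f0_ghz=60.0, k=4):
    """Hypothetical error model: near 0 at low clock rates, saturating at 0.5."""
    return 0.5 * (1.0 - math.exp(-((f_ghz / f0_ghz) ** k)))


def data_rate(f_ghz):
    """Information rate in Gbps per wire: clock rate times bits per cycle."""
    return f_ghz * bsc_capacity(toy_bit_error_rate(f_ghz))


freqs = [0.5 * i for i in range(2, 201)]          # 1 GHz to 100 GHz
rates = [data_rate(f) for f in freqs]
best = max(range(len(freqs)), key=rates.__getitem__)
print(f"optimum near {freqs[best]:.1f} GHz, {rates[best]:.1f} Gbps per wire")
```

Under this toy model the maximum lies strictly inside the swept range, mirroring the qualitative behavior described above; the numerical values themselves carry no significance.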

Another important observation is that the optimal frequencies for different wire lengths require different code rates. For example, running a 1 mm long wire at its optimal frequency requires only a rate 0.7 code, while running a 5 mm wire at its optimal frequency requires a rate 0.18 code. In other words, the coding schemes for 1 mm and 5 mm buses require roughly 50% and 500% overhead respectively. Though constructing rate 0.7 codes is fairly simple, constructing a code with a rate near 0.2 may prove tricky, if not impossible. This may force a trade-off: operating at a non-optimal frequency with a higher-rate code. Also, for very long wires, buffer insertion coupled with coding may become the ideal option.
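The arithmetic behind these figures is simple enough to spell out. In the Python sketch below (function names are illustrative), overhead is taken to mean parity bits per information bit, which is one plausible reading of the percentages quoted above.

```python
def parity_overhead(rate):
    """Parity bits per information bit for a code of the given rate."""
    return (1.0 - rate) / rate


def coded_data_rate(clock_ghz, rate):
    """Information throughput in Gbps per wire at a given clock and code rate."""
    return clock_ghz * rate


print(f"{coded_data_rate(40.0, 0.7):.1f} Gbps at 40 GHz with a rate 0.7 code")
print(f"rate 0.70 -> {100 * parity_overhead(0.70):.0f}% overhead")
print(f"rate 0.18 -> {100 * parity_overhead(0.18):.0f}% overhead")
```

Under this definition a rate 0.7 code carries about 43% overhead and a rate 0.18 code about 456%, consistent with the rounded figures of 50% and 500%.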



Figure 3: The Capacity of 8 Bit Wide Bus as a Function of Clock Frequency and Length

## 4.2 Code Performance Against Capacity

| Code     | CAC         | LPC | ECC                   | Total Extra Wires |
|----------|-------------|-----|-----------------------|-------------------|
| Hamming  | -           | -   | Hamming               | 3                 |
| HammingX | -           | -   | Hamming & parity      | 4                 |
| BIH      | -           | BI  | Hamming               | 5                 |
| FTC+HC   | CAC [10]    | -   | Hamming               | 10                |
| BSC      | Shielding   | -   | -                     | 5                 |
| DAP      | Duplication | -   | Parity                | 5                 |
| DAPX     | Duplication | -   | Parity with shielding | 6                 |
| DAPBI    | Duplication | BI  | Parity                | 7                 |

Table 2: The Code Construction [9] for a 4-bit wide bus

Figure 4: The Data Rate Limits of 8 Bit Wide Bus as a Function of Clock Frequency and Length

In [9] a wide variety of codes, with a variety of features, are proposed; they are listed in Table 2. All codes are constructed using a cascade of CAC and LPC followed by ECC codes. The techniques used in the construction of these codes are BI <sup>2</sup>, Shielding <sup>3</sup>, and Duplication <sup>4</sup>, which are explained in the footnotes. Speed-ups are offered only by codes with a CAC. The CAC reduces the worst-case capacitive coupling from $4C_c$ to $2C_c$, resulting in a speed-up of $\frac{C_g + 4C_c}{C_g + 2C_c} = \frac{17}{9} \approx 1.88$, since $C_c = 4C_g$ in this paper. Table 3 lists the properties of these codes. In the table, the data rate per wire is calculated by multiplying the reference frequency (of the no-code system) by the speed-up and the code rate. Fig. 5 compares all codes against the information capacity. BSC and DAP are the best codes in terms of data rate versus overhead.

| Code     | Rate [9] | Speed-up | 1 mm d/w (Gbps) | 2 mm d/w (Gbps) | 3 mm d/w (Gbps) | 4 mm d/w (Gbps) |
|----------|----------|----------|-----------------|-----------------|-----------------|-----------------|
| No Code  | 1        | 1        | 17.5            | 5.21            | 2.43            | 1.36            |
| Hamming  | 0.57     | 1        | 10.0            | 2.98            | 1.39            | 0.77            |
| HammingX | 0.5      | 1        | 8.77            | 2.60            | 1.22            | 0.68            |
| BIH      | 0.44     | 1        | 7.80            | 2.31            | 1.08            | 0.60            |
| FTC+HC   | 0.28     | 1.88     | 9.47            | 2.81            | 1.31            | 0.73            |
| BSC      | 0.44     | 1.88     | 14.7            | 4.37            | 2.04            | 1.14            |
| DAP      | 0.44     | 1.88     | 14.7            | 4.37            | 2.04            | 1.14            |
| DAPX     | 0.4      | 1.88     | 13.3            | 3.94            | 1.84            | 1.03            |
| DAPBI    | 0.36     | 1.88     | 12.1            | 3.58            | 1.67            | 0.93            |
| Capacity |          |          | 28              | 14              | 8               | 5               |

Table 3: The data per wire rates for different codes
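The 1 mm column of Table 3 can be approximately reproduced from Table 2. The Python sketch below takes the code rate as data wires divided by total wires for a 4-bit bus, applies the speed-up $(C_g + 4C_c)/(C_g + 2C_c)$ with $C_c = 4C_g$ to codes that include crosstalk avoidance, and multiplies by the 17.5 GHz reference frequency. The constant and function names are illustrative, and the results match the table only to within about 0.1 Gbps because the printed values are rounded.

```python
C_G = 1.0
C_C = 4.0 * C_G                                   # ratio used in this paper
CAC_SPEEDUP = (C_G + 4 * C_C) / (C_G + 2 * C_C)   # 17/9, about 1.89

DATA_BITS = 4        # Table 2 is for a 4-bit wide bus
F_REF_GHZ = 17.5     # uncoded worst-case clock for the 1 mm bus

# code name -> (extra wires from Table 2, includes crosstalk avoidance?)
CODES = {
    "Hamming": (3, False), "HammingX": (4, False), "BIH": (5, False),
    "FTC+HC": (10, True), "BSC": (5, True), "DAP": (5, True),
    "DAPX": (6, True), "DAPBI": (7, True),
}


def data_rate_per_wire(extra_wires, has_cac, f_ref=F_REF_GHZ):
    """Reference frequency times speed-up times code rate, in Gbps per wire."""
    rate = DATA_BITS / (DATA_BITS + extra_wires)
    speedup = CAC_SPEEDUP if has_cac else 1.0
    return f_ref * speedup * rate


for name, (extra, has_cac) in CODES.items():
    print(f"{name:<9} {data_rate_per_wire(extra, has_cac):5.2f} Gbps per wire")
```

Passing the other reference frequencies from the "No Code" row (5.21, 2.43 and 1.36 GHz) as `f_ref` gives the remaining columns in the same way.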

## 5. CONCLUSION

There are several techniques to counter the effects of crosstalk-induced delays, including buffer insertion and coding. Coding provides several advantages over buffer insertion, including bi-directionality, signal speed-ups and power savings. Limits to coding gains are studied, and it is observed that for a given bus length there is an optimal frequency of operation that results in the maximum data transfer. The codes presented in [9] are compared against this theoretical limit. Even though the codes in [9] are sufficient to prove an advantage over buffer/repeater insertion in wires, they perform far below the limits. This gap shows that there is considerable potential for improved codes. If the limits are ever reached, they will result in high-speed, low-power, reliable data communication on SoC networks.

<sup>2</sup> Bus Invert codes [13] for low power.

<sup>3</sup> Insert a Vdd or Gnd wire between data lines; this is done for crosstalk avoidance.

<sup>4</sup> Duplication results in crosstalk from only one neighbor.

Figure 5: The Comparison of Various Codes against Capacity

## 6. ACKNOWLEDGEMENTS

The authors would like to extend their thanks to the faculty and graduate students at Texas A&M University for their continued support in this research. The authors also want to thank NSF for its support.

## 7. REFERENCES

- [1] H. Bakoglu, "Circuits, Interconnections, and Packaging for VLSI," Addison Wesley, 1990.
- [2] M. Takahashi, M. Hashimoto, H. Onodera, "Crosstalk noise estimation for generic RC trees," In Proc. ICCD, 2001, pp. 110-116.
- [3] X. Qi, B. Kleveland, Y. Zhiping, S. Wong, R. Dutton, T. Young, "On-chip inductance modeling of VLSI interconnects," *In Proc. Solid-State Circuits Conf.*, 2000, pp. 172-173.
- [4] S.H. Choi, K. Roy, "Noise analysis under capacitive and inductive coupling for high speed circuits," *In Proc. EDTA*, 2002, pp. 365-369.

- [5] Y. Tanji, H. Asai, "Closed-form expressions of distributed RLC interconnects for analysis of on-chip inductance effects," *In Proc. DAC*, 2004, pp. 810 - 813.
- [6] Y. Cao, P. Gupta, A.B. Kahng, D. Sylvester, J. Yang, "Design sensitivities to variability: extrapolations and assessments in nanometer VLSI," *In Proc. ASIC/SOC Conf.*, 2002, pp. 411-415.
- [7] D. Liang, M.D.F. Wong, "Buffer insertion under process variations for delay minimization," In Proc. ICCD, 2005, pp. 317 - 321.
- [8] Z. Li, C.N. Sze, C.J. Alpert, J. Hu, W. Shi, "Making fast buffer insertion even faster via approximation techniques," *In Proc. ASP-DAC*, 2005, vol. 1, pp. 13-18.
- [9] S.R. Sridhara, N.R. Shanbhag, "Coding for system on chip networks: A unified framework," *IEEE Trans. on VLSI*, 2005, vol. 13-6, pp. 655-667.
- [10] C. Duan, S.P. Khatri, "Exploiting crosstalk to speed up on-chip buses," In Proc. DATE, 2004, pp. 778-783.
- [11] M. Ghoneima, Y. Ismail, "Low power coupling-based encoding for on-chip buses," *In Proc. ISCAS*, 2004, pp. 325-328.
- [12] D. Bertozzi, L. Benini, B. Ricco, "Energy-efficient and reliable low-swing signaling for on-chip buses based on redundant coding," *In Proc. ISCAS*, 2002, pp. 93-96.
- [13] M.R. Stan, W.P. Burleson, "Bus-invert coding for low power I/O," *IEEE Trans. on VLSI*, 1995, vol. 3-1, pp. 49-58.
- [14] L. Li, N. Vijaykrishnan, M. Kandemir, M.J. Irwin, "A crosstalk aware interconnect with variable cycle transmission," *In Proc. DATE*, 2004, pp. 102-107.
- [15] L. Zhang, Y. Hu, C.P. Chen, "Wave-pipelined on-chip global interconnect," *In Proc. ASP-DAC*, 2005, pp. 127-132.
- [16] B.J. LaMeres, S.P. Khatri, "Encoding-based minimization of inductive crosstalk for off-chip data transmission," *In Proc. DATE*, 2005, pp. 1318-1323.
- [17] R. Singhal, G.S. Choi, R.N. Mahapatra, "Information Theoretic Capacity of Long On-chip Interconnects in the Presence of Crosstalk," *In Proc. ISQED* 2006, pp. 407-412.
- [18] R. G. Gallager, "Information Theory and Reliable Communication," New York: Wiley. 1968.
- [19] M. Mushkin, I. Bar-David, "Capacity and coding for the Gilbert-Elliot channels" *IEEE Trans. on Info. Theory*, 1989, vol. 35-6, pp. 1277-1290.
- [20] F.W. Grover, "Inductance Calculations, Working Formulas and Tables" New York: Dover 1962.