Ruiming Li, Dian Zhou, Jin Liu

Department of Electrical Engineering School of Engineering and Computer Sciences Richardson, TX 75083 The University of Texas at Dallas {rl18,zhoud,jinliu}@utdallas.edu

## ABSTRACT

This paper studies the problems of minimizing power dissipation of an interconnect wire by simultaneously considering buffer insertion/sizing and wire sizing (BISWS). We consider two cases, namely minimizing power dissipation with optimal delay constraints, and minimizing power dissipation with a given delay penalty. We derive closed form optimal solutions for both cases. These closed form solutions can be used to efficiently estimate the power dissipation in the early stages of the VLSI designs. We observe that the power dissipation can be much different even with the same optimal delay.

# 1. INTRODUCTION

With the development of deep submicron (DSM) very large scale integrated (VLSI) designs, the interconnect delay becomes the dominant factor in the performance of these IC designs. In order to reduce the interconnect delay, many methods have been proposed such as parallel regeneration, driver and sink sizing, buffer insertion, wire sizing and simultaneously buffer insertion/sizing and wire sizing [1], [2], [3], [6], [7], [12], [14]. Some tutorials can be found in [6], [10], [11]. Since the interconnect delay is quadratically proportional to its wire length, buffer insertion is one of the most effective methods to decrease global interconnect delay. After buffer insertion, the long wires are broken into a lot of short wire segments and the optimal delay of a buffered wire is linear in its length [1]. With the technology scaling, it is expected that more and more buffers should be inserted for high-performance VLSI designs ( about 800,000 for 50nm technology estimated in [9]). The introduction of such a large number of buffers will increase the power consumption which exacerbates the power dissipation issue because of the technology scaling. A lot of references address the power dissipation issue during the buffer insertion. Lillis el al [15] presented optimal polynomial algorithms to minimize dynamic power dissipation while satisfying given timing

ICCAD'03, November 11-13, 2003, San Jose, California, USA.

Xuan Zeng\*

ASIC & System State-key Lab Microelectronics Department Fudan University Shanghai 200433, P. R. China xzeng@fudan.edu.cn

constraints for the wire sizing/buffer insertion problems. Turgis et al [16] discussed the buffer insertion limits and the application to the buffer design to satisfy minimum power delay product constraint. Zhou and Liu [19] considered the problem of minimizing the circuit area and power dissipation subject to delay constraints by buffer sizing. Nalamalpu and Burleson [17], and Adler and Friedman [18] also addressed the issue of optimizing buffer insertion and power dissipation. Banerjee and Mehrotra [4] discussed the power-optimal buffer insertion for global interconnects in Nanometer designs. They developed a methodology to compute the repeater size and interconnect length that minimizes the total interconnect power dissipation for a given delay penalty for a uniform long line. However, these discussions for the power dissipation in the buffer insertion either ignore the leakage power [15], [16], [19], [18], or ignore both the leakage and short-circuit power [17], [18], [19], or only consider the buffer sizing for a uniform line ignoring the wire sizing [4], [16], [17], [18], [19]. Furthermore most of these discussions were algorithmic, and did not provide closed form solutions. Chu and Wong [6] derived elegant closed form solutions to minimize the interconnect delay for simultaneous buffer insertion/sizing and wire sizing. They obtained closed form solutions for three versions. The first version is wire sizing (WS) without buffer insertion. The second version uses fixed number of buffers (version BISWS/m), and the third version uses the optimal number of buffers (version BISWS). They mathematically proved some frequently used optimal results such as the use of equal-length wire segments is optimal. But they did not analyze the power dissipation during the simultaneous buffer insertion/sizing and wire sizing. We notice that when the optimal delay for BISWS (BISWS/m) is achieved, the optimal solution is not unique. However, the power dissipation corresponding to different solutions which achieve the same optimal delay can be much different. We simulate the power dissipation corresponding to different optimal delay solutions given by [6] using SPICE3 for a wire with length= $4000 \mu m$ , ten wire segments, and one buffer (version BISWS/1). The driver and load are 50 times minimum device. The result is shown in Figure 1, from which we can see the maximum power dissipation is more than 3 times of its minimum with the same optimal delay. Since power dissipation becomes a serious issue in the DSM VLSI design, it is very important to minimize the power dissipation while satisfying delay constraints.

In this paper, we study the problem of minimizing power dissipation of an interconnect wire by simultaneously considering buffer insertion/sizing and wiring sizing (BISWS). We consider two cases, namely minimizing power dissipation with optimal de-

This research is supported by NSF grants CCR-0098275 and CCR-0306298

<sup>\*</sup>This research is supported partly by NSFC research project 60176017 and 90207002, Cross-Century Outstanding Scholars fund of Ministry of Education of China, National 863 plan projects 2002AA1Z1340 and 2002AA1Z1460, partly by Synopsys Inc., partly by the doctoral program foundation of Ministry of Education of China 2000024628, Shanghai Science and Technology committee project 01JC14014 and Shanghai AM R&D fund 0107, Science & Technology key project of Ministry of Education of China 02095.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Copyright 2003 ACM 1-58113-762-1/03/0011 ...\$5.00.



Figure 1: Power dissipation variation with optimal delay simulated by SPICE3 with  $0.18 \mu m$  technology

lay constraints, and minimizing power dissipation with a given delay penalty. Our main contributions are as follows, (1) we derive closed form optimal solutions to the problem of minimizing power dissipation while satisfying delay-optimal constraints for BISWS and BISWS/m. These closed form solutions can be used to efficiently estimate the power dissipation in the early stages of the VLSI designs. (2) we derive closed form optimal solutions to the problem of minimizing power dissipation without delay-optimal constraints but with delay penalty for BISWS and BISWS/m. This is very important in the sense that it is believed that the total power dissipation is excessive for the delay-optimal buffer insertion [4]. Our solutions provide efficient methods for the power-delay trade off.

The rest of the paper is organized as follows. In section 2, we give the preliminaries for BISWS (BISWS/m) and power dissipation. In section 3, we discuss the power dissipation minimization problems for BISWS (BISWS/m) with optimum delay. In section 4, we consider the power dissipation minimization problems for BISWS (BISWS/m) with delay penalty. Section 5 is the experimental results. Section 6 gives conclusion.

## 2. PRELIMINARIES

In this paper, we use the following notations of parameters, some of which are used in [6] or [4].

- $R_D$ : driver resistance
- $C_L$ : load capacitance
- $r_0$ : unit square wire resistance
- $c_0$ : unit area wire capacitance
- re: effective output resistance of a minimum device
- $c_q$ : gate capacitance of a minimum device
- $c_d$ : drain capacitance of a minimum device
- $V_{DD}$ : power supply voltage

*f*: clock frequency

 $I_{off_n}(I_{off_p})$ : leakage current per unit NMOS (PMOS) transistor width

 $W_{n_{min}}(W_{p_{min}})$ : width of the NMOS (PMOS) transistor in minimum sized inverter



Figure 2: Wire and Buffer model

We model a wire segment as a  $\pi$ -type RC circuit and a buffer as switch-level RC circuit as shown in Figure 2. Elmore delay model [13] is used to calculate the delay of a circuit.

# 2.1. Simultaneous Buffer Insertion/Sizing and Wire Sizing Problem

The simultaneous Buffer Insertion/Sizing and Wire Sizing problem (BISWS) [6] is defined as follows: given wire length L, the driver resistance  $R_D$ , the load capacitance  $C_L$ , and the total number of segments n to be used, minimize the delay D from source to sink over the number of used buffers m, the number of segments  $n_j$   $(0 \le j \le m)$  between the *j*th buffer and the (j + 1)th buffer (with  $n_0$  being the number of wire segments between the source and the first buffer and  $n_m$  being the number of segments between the last buffer and  $n_m$  being the number of segments between the last buffer and sink), the length  $l_i$  and the width  $h_i$   $(1 \le i \le n)$ of *i*th segment, and the size of *j*th buffer  $b_j$   $(0 \le j \le m)$ , with constraints  $n_0 + n_1 + \cdots + n_m = n$  and  $l_1 + l_2 + \cdots + l_n = L$ . For BISWS, Chu and Wong obtain closed form optimal solutions (THEOREM 3 in [6]). By their results, when

 $m = the \ better \ one \ of \lfloor \hat{m} \rfloor \ and \ \lceil \hat{m} \rceil,$  where

$$\hat{m} = \left( ln \frac{r_e c_g}{R_D C_L \hat{\beta}} \left( 1 + \frac{S\hat{\beta}}{2} - \sqrt{S\hat{\beta} + \left(\frac{S\hat{\beta}}{2}\right)^2} \right)^n \right) / ln\hat{\beta},$$
  
and  $\left(\frac{1}{e\hat{\beta}}\right)^{1/\hat{\beta}} = e^{c_d/c_g},$ 
(2.1)

 $n_j = an \ arbitrary \ nonnegative \ integer, \ for \ 0 \le j \le m,$ such that  $n_0 + n_1 + \dots + n_m = n,$ 

$$\begin{split} b_{j} &= \frac{r_{e}}{R_{D}} \frac{\alpha^{s_{j}}}{\beta^{j}}, for \ 1 \leq j \leq m, \\ l_{i} &= \frac{L}{n}, \\ h_{i} &= \sqrt{\frac{r_{0}C_{L}}{c_{0}R_{D}} \frac{\beta^{m}}{\alpha^{n-1}} \frac{\alpha^{i-1}}{\beta^{j}}}, for \ 1 \leq i \leq n, \\ & with \ j \ being \ the \ index \ such \ that \ s_{j} + 1 \leq i \leq s_{j+1}, \\ S &= \frac{r_{0}c_{0}L^{2}}{r_{e}c_{g}n^{2}}, \\ \beta &= \frac{(1-\alpha)^{2}}{S\alpha}, \end{split}$$

$$(2.2)$$

where

$$s_i = n_0 + n_1 + \dots + n_{i-1}, \tag{2.3}$$

and  $\alpha$  is the unique root between 0 and 1 of

$$g(\alpha) = \sqrt{\frac{r_e c_g}{R_D C_L}} S^{(m+1)/2} \alpha^{(n+m+1)/2} - (1-\alpha)^{m+1} = 0,$$
(2.4)

the optimum delay  $D_{opt}$  is given by

$$D_{opt} = mr_e c_d + \frac{r_0 c_0 L^2}{2n^2} \frac{n + 2(m+1)\alpha - n\alpha^2}{(1-\alpha)^2}.$$
 (2.5)

The problem of simultaneous Buffer Insertion/Sizing and Wire Sizing problem with m buffers (BISWS/m) is the same as BISWS except m is a given input. The solutions for BISWS/m are in the same forms as (2.2)-(2.5) except the value of m is given by input. In the following we simplify the expression of  $h_1$  for the latter use.

Finding  $h_1$  from (2.2), then substituting  $\beta$  and S in (2.2) into the expression of newly obtained  $h_1$ , we have

$$h_1 = \sqrt{\frac{r_0 C_L}{c_0 R_D} \frac{(1-\alpha)^{2m}}{\alpha^{n+m-1} S^m}}.$$
 (2.6)

From (2.4), we have,

$$\alpha^{n+m-1}S^m = \frac{R_D C_L (1-\alpha)^{2m+2}}{r_e c_g \alpha^2 S} = \frac{n^2 R_D C_L (1-\alpha)^{2m+2}}{r_0 c_0 L^2 \alpha^2}$$
(2.7)

substitute (2.7) into (2.6),

$$h_1 = \frac{\alpha r_0 L}{n R_D (1 - \alpha)}.$$
(2.8)

#### 2.2. Power Dissipation

The power dissipation of a buffer consists of switching power, leakage power and short-circuit power [5]. With the technology scaling, the leakage power and short-circuit power occupy a large percentage of total power dissipation. The total power dissipation is computed using the similar formulae in [4]. For detail, please see the reference.

# 3. OPTIMAL POWER DISSIPATION WITH OPTIMUM DELAY FOR BISWS (BISWS/M)

It is noticed that the optimal delay solution (2.1)-(2.5) for BISWS has no further restriction over the parameters  $n_j (0 \le j \le m)$ except  $n_0 + n_1 + \cdots + n_m = n$ . However the different assignments of  $n_j$  may affect the total power dissipation. As power dissipation becomes a serious issue in DSM IC designs, it is important to maintain power dissipation minimum while satisfying delay requirements. In the next, the power-optimal problems over  $n_0, n_1, \cdots n_m$ , satisfying the delay-optimal constraint for simultaneous buffer insertion/sizing and wire sizing problems are considered. We formulate the power-optimal BISWS/m (BISWS) with optimum delay problem named POBISWSOD/m (POBISWSOD) as follows.

Power-optimal Simultaneous Buffer Insertion/Sizing and Wire Sizing with *m* buffers satisfying Optimum Delay constraint problem (POBISWSOD/m): Given a BISWS/m problem, the objective is to minimize the total power dissipation over  $n_j(0 \le j \le m)$  satisfying (2.2)-(2.5).

**Power-optimal Simultaneous Buffer Insertion/Sizing and Wire Sizing with Optimum Delay problem (POBISWSOD)**: Given a BISWS problem, the objective is to minimize the total power dissipation over  $n_j (0 \le j \le m)$  satisfying (2.1)-(2.5).

# 3.1. POBISWSOD/m Problem

For a given BISWS/m problem, we first find its optimal solutions by (2.2)-(2.5), then compute the total power dissipation. The total power dissipation is the sum of switching power ( $P_{switch}$ ), shortcircuit power ( $P_{short}$ ) and leakage power ( $P_{leakage}$ ), i.e.

$$P_{total} = P_{switch} + P_{short} + P_{leakage}.$$

In the following, we compute  $P_{switch}$ ,  $P_{short}$ , and  $P_{leakage}$  using the similar basic formulae in [5, 4].

# 3.1.1. Switching Power

The switching power is computed as following,

$$P_{switch} = V_{DD}^{2} f \delta[c_{0}(l_{1}h_{1} + l_{2}h_{2} + \dots + l_{s_{1}}h_{s_{1}}) + c_{g}b_{1} + c_{d}b_{1} + c_{0}(l_{s_{1}+1}h_{s_{1}+1} + \dots + l_{s_{2}}h_{s_{2}}) + c_{g}b_{2} + \dots + c_{d}b_{m} + c_{0}(l_{s_{m}+1}h_{s_{m}+1} + \dots + l_{s_{m+1}}h_{s_{m+1}}) + C_{L}]$$

$$(3.1)$$

where  $\delta$  is the switching factor, which is the fraction of buffers on a chip that are switched during an average clock cycle.  $\delta$  can be 0.15 [5].

Substituting the parameters  $l_i$ ,  $h_i(1 \le i \le n)$  and  $b_j(1 \le j \le m)$  of (2.2) and  $h_1$  of (2.8) into (3.1), we have,

$$P_{switch} = \frac{V_{DD}^2 f \delta r_0 c_0 \alpha L^2}{n^2 R_D (1-\alpha)^2} \left[ \left( \frac{\alpha^{s_1}}{\beta} + \frac{\alpha^{s_2}}{\beta^2} + \dots + \frac{\alpha^{s_m}}{\beta^m} \right) - \left( \alpha^{s_1} + \frac{\alpha^{s_2}}{\beta} + \dots + \frac{\alpha^{s_m}}{\beta^{m-1}} \right) \right] \\ + \frac{V_{DD}^2 f \delta r_e}{R_D} (c_g + c_d) \left( \frac{\alpha^{s_1}}{\beta} + \frac{\alpha^{s_2}}{\beta^2} + \dots + \frac{\alpha^{s_m}}{\beta^m} \right) \\ + \frac{V_{DD}^2 f \delta r_c \alpha \Delta L^2}{n^2 R_D (1-\alpha)^2} - \frac{V_{DD}^2 f \delta r_c \alpha \Delta L^2}{n^2 R_D (1-\alpha)^2} \frac{\alpha^n}{\beta^m} + V_{DD}^2 f \delta C_L.$$

$$(3.2)$$

From (2.7) and the expression of  $\beta$  of (2.2), we obtain

$$\frac{\alpha^n}{\beta^m} = \frac{\alpha^{n+m} S^m}{(1-\alpha)^{2m}} = \frac{n^2 R_D C_L (1-\alpha)^2}{r_0 c_0 \alpha L^2}$$
(3.3)

Substitute (3.3) into (3.2),

$$P_{switch} = k_1 \left(\frac{\alpha^{s_1}}{\beta} + \frac{\alpha^{s_2}}{\beta^2} + \dots + \frac{\alpha^{s_m}}{\beta^m}\right) + k_2, \qquad (3.4)$$

where

$$k_{1} = \frac{V_{DD}^{2} f \delta r_{0} c_{0} \alpha L^{2}(1-\beta)}{n^{2} R_{D}(1-\alpha)^{2}} + \frac{V_{DD}^{2} f \delta r_{e}(c_{g}+c_{d})}{R_{D}},$$

$$k_{2} = \frac{V_{DD}^{2} f \delta r_{0} c_{0} \alpha L^{2}}{n^{2} R_{D}(1-\alpha)^{2}}.$$
(3.5)

#### 3.1.2. Leakage Power

Assume that  $I_{off_n} = I_{off_p}$  and  $W_{p_{min}} = 2W_{n_{min}}$  as in [4]. The leakage power is given by,

$$P_{leakage} = \frac{3}{2} V_{DD} I_{off_n} W_{n_{min}} (b_1 + b_2 + \dots + b_m)$$
  
$$= \frac{3 V_{DD} I_{off_n} W_{n_{min}} r_e}{2R_D} (\frac{\alpha^{s_1}}{\beta} + \frac{\alpha^{s_2}}{\beta^2} + \dots + \frac{\alpha^{s_m}}{\beta^m}).$$
(3.6)

#### 3.1.3. Short-circuit Power

Under the same assumption as in [4] that the input waveform is a single time-constant exponential and  $V_{t_n} = V_{t_p} = V_{DD}/4$ , where  $V_{t_n}$  and  $V_{t_p}$  are the threshold voltages. The short-circuit power is given by,

$$P_{short\_circuit} = \delta V_{DD} W_{n_{min}} I_{short\_circuit} f ln3$$

$$(b_1 \tau_0 + \dots + b_m \tau_{m-1}),$$

$$(3.7)$$

where  $\tau_i (0 \le i \le m - 1)$  is the delay between the *i*th buffer and (i + 1)th buffer (with  $\tau_0$  being the delay between the source and the first buffer). By Elmore delay [13] and Theorem 1 of [6],

$$\tau_0 = An_0 + B_0 \tau_i = An_i + B_1, \ 1 \le i \le m,$$
(3.8)

where

$$A = \frac{r_0 c_0 L^2 (1+\alpha)}{2n^2 (1-\alpha)}, \quad B_0 = \frac{\alpha r_0 c_0 L^2}{n^2 (1-\alpha)^2}, \quad B_1 = r_e c_d + B_0.$$
(3.9)

Substitute  $b_j (1 \le j \le m)$  of (2.2) and  $\tau_i (0 \le i \le m - 1)$  of (3.8) into (3.7),

$$P_{short\_circuit} = \frac{r_e \delta V_{DD} W_{n_{min}} I_{short\_circuit} fln3}{R_D} [\frac{\alpha^{s_1}}{\beta^2} (An_0 + B_0) + \frac{\alpha^{s_2}}{\beta^2} (An_1 + B_1) + \dots + \frac{\alpha^{s_m}}{\beta^m} (An_{m-1} + B_1)].$$
(3.10)

Combining (3.4), (3.6) and (3.10), we get,

$$P_{total} = k_3 \frac{\alpha^{s_1}}{\beta} + k_4 \left(\frac{\alpha^{s_2}}{\beta^2} + \dots + \frac{\alpha^{s_m}}{\beta^m}\right) + k_5 \left(\frac{\alpha^{s_1}}{\beta} n_0 + \dots + \frac{\alpha^{s_m}}{\beta^m} n_{m-1}\right) + k_2,$$
(3.11)

where

$$k_{3} = k_{1} + \frac{3V_{DD}I_{off_{n}}W_{n_{min}}r_{e}}{2R_{D}} + \frac{r_{e}\delta V_{DD}W_{n_{min}}I_{short\_circuit}fln3}{R_{D}}B_{0},$$

$$k_{4} = k_{1} + \frac{3V_{DD}I_{off_{n}}W_{n_{min}}r_{e}}{2R_{D}} + \frac{r_{e}\delta V_{DD}W_{n_{min}}I_{short\_circuit}fln3}{R_{D}}B_{1},$$

$$k_{5} = \frac{r_{e}\delta V_{DD}W_{n_{min}}I_{short\_circuit}fln3}{R_{D}}A.$$

$$(3.12)$$

By setting  $\frac{\partial P_{total}}{\partial n_i} = 0 \ (0 \le i \le m-1)$ , we have

$$n_{m-2} = \frac{1}{\beta ln\alpha} \alpha^{\frac{-k_4 - k_3 ln\alpha}{k_4 ln\alpha}} - \frac{k_4 + k_3 ln\alpha}{k_4 ln\alpha}.$$
 (3.13)

After some simplification from  $\frac{\partial P_{total}}{\partial n_i} = 0 \ (0 \le i \le m-1)$ , the following recursive relationships are obtained,

$$n_0 = \frac{1}{\beta l n \alpha} \alpha^{n_1} - \frac{k_5 + k_3 l n \alpha}{k_5 l n \alpha},$$
  

$$n_i = \frac{1}{\beta l n \alpha} \alpha^{n_{i+1}} - \frac{k_5 + k_4 l n \alpha}{k_5 l n \alpha}, \quad 1 \le i \le m-2, \quad (3.14)$$
  

$$n_{m-1} = \frac{-k_5 - k_4 l n \alpha}{k_5 l n \alpha}.$$

Note that (3.14) is the only critical point (at which all the partial derivatives are 0) of function  $P_{total}$  and it is a peak point. In the practice,  $k_5$  is much smaller than  $k_4$ . As  $\alpha$  and  $\beta$  are both between 0 and 1 [6], the values of expressions of  $n_i(0 \le i \le m - 1)$  of (3.14) are negative. However  $n_i$  is the number of wire segments between two buffers, it should be nonnegative integers between 0 and n. This means the maximum and minimum of  $P_{total}$  can only be achieved at the boundary points. It is easily seen that if  $k_5 << k_4$ , all the partial derivatives of  $P_{total}$  are negative in the feasible region of  $n_i(0 \le i \le m - 1)$  increases. Let  $p(n_0, n_1, \dots, n_{m-1}) = P_{total}$ . The following theorem is obtained,

**Theorem 1.** If  $0 \le n_i < \overline{n_i} \le n$ , and  $n_j (0 \le j \le m - 1, j \ne i)$  is fixed, then

$$p(n_0, \cdots, n_i, \cdots, n_{m-1}) > p(n_0, \cdots, \bar{n_i}, \cdots, n_{m-1}),$$
  
for  $0 \le i \le m-1.$  (3.15)

The following theorem can be derived under the assumption that  $k_4 >> k_5$ .

$$p(n_0, \dots, n_i, \dots, n_j, \dots, n_{m-1}) > p(n_0, \dots, n_i + 1, \dots, n_j - 1, \dots, n_{m-1}),$$
(3.16)  
for  $0 \le i \le j \le m - 1.$ 

**Proof.** From the expression of  $p(n_0, \dots, n_i, \dots, n_j, \dots, n_{m-1})$ ,

and noticing (2.3), we have

$$p(n_0, \cdots, n_i, \cdots, n_j, \cdots, n_{m-1}) - p(n_0, \cdots, n_i + 1, \cdots, n_j - 1, \cdots, n_{m-1}) = k_4(1 - \alpha)(\frac{\alpha^{s_{i+1}}}{\beta^{i+1}} + \frac{\alpha^{s_{i+2}}}{\beta^{i+2}} + \cdots + \frac{\alpha^{s_j}}{\beta^j}) + k_5(1 - \alpha)(n_{i+1}\frac{\alpha^{s_{i+2}}}{\beta^{i+2}} + \cdots + n_{j-1}\frac{\alpha^{s_j}}{\beta^j}) + k_5\frac{\alpha^{s_{j+1}}}{\beta^{j+1}} + [k_5n_i(1 - \alpha) + k_4 - (k_4 + k_5)\alpha]\frac{\alpha^{s_{i+1}}}{\beta^{i+1}}.$$

Since  $0 < \alpha < 1$  and  $k_4 >> k_5 > 0$ ,  $k_5 n_i(1 - \alpha) + k_4 - (k_4 + k_5)\alpha \ge 0$ , and all other terms in (3.17) are nonnegative. We have proved Theorem 2.

(3.17)

Combining Theorem 1 and Theorem 2, we have

**Theorem 3.** Given a simultaneous buffer insertion/sizing and wire sizing with m buffers (BISWS/m) problem, suppose its optimal delay is achieved, i.e. (2.2)-(2.5) are satisfied, the maximum and minimum power dissipation are at  $(n_0, n_1, \dots, n_{m-1}) = (0, 0, \dots, n)$  and  $(n_0, n_1, \dots, n_{m-1}) = (n, 0, \dots, 0)$ , respectively, where n is the total number of wire segments.

Theorem 3 tells us that when all the buffers are taped just before the load the power dissipation is optimal with the optimum delay for BISWS/m. This is because of the special property of BISWS. The wire size and buffer size are simultaneously changed when the buffer position is changed. Our SPICE simulations in section 5 also confirm this conclusion.

#### Remark

We need to point out that Theorem 3 provides us the optimal solution from the theoretical aspect. In practice, there may have some constraints such as the lower bounds for wire width of each segment. But we always can find the optimal solution by Theorem 1 and 2.

### 3.2. POBISWSOD Problem

For a given BISWS problem, we first find its optimal solutions by (2.1)-(2.5). After the value of m is fixed, the POBISWSOD problem becomes POBISWSOD/m problem which can be solved by the method of last part.

## 4. OPTIMAL POWER DISSIPATION WITH DELAY PENALTY FOR BISWS

As we mentioned before, to achieve the optimum delay, a large number of buffers need to be inserted with the technology scaling. It is believed that the total power dissipation of the optimum buffer insertion scheme can be excessive [4]. It is very important to study the optimum power dissipation with delay penalty during the buffer insertion. [4] addressed this problem for a uniform long line. In the next we consider the problems of optimizing total power dissipation under given delay penalty for BISWS.

Power-optimal Simultaneous Buffer Insertion/Sizing and Wire Sizing with Delay Penalty problem (POBISWSDP): Given a BISWS problem, and a delay penalty  $\rho(\rho > 1)$ , the objective is to minimize the total power dissipation satisfying that its delay is  $\rho D_{opt}$ , where  $D_{opt}$  is the BISWS optimum delay given by (2.5).

To solve POBISWSDP, we first solve the BISWS problem to find out the optimum delay  $D_{opt}$  of (2.5), then

$$D = \rho D_{opt}.\tag{4.1}$$

From (2.4), we get

$$\frac{r_e c_g}{R_D C_L} \alpha^n = \beta^{(m+1)}.$$
(4.2)



Figure 3: Power dissipation for different wire length with optimal delay simulated by SPICE3 with  $0.18\mu m$  technology

Hence

$$m = \frac{1}{ln\beta} \left( ln \frac{r_e c_g}{R_D C_L} + n ln\alpha \right) - 1. \tag{4.3}$$

Substituting m of (4.3) into (2.5), after simplification we have

$$\begin{pmatrix} rec_d + \frac{2\alpha}{(1-\alpha)^2} \end{pmatrix}^{\frac{\ln(r_e c_g) - \ln(R_D C_L)}{2\ln(1-\alpha) - \ln(S\alpha)}} \\ + \frac{r_{0c_0 L^2}}{2n^2} \frac{n(1-\alpha^2) + 2\alpha}{(1-\alpha)^2} - \frac{2\alpha}{(1-\alpha)^2} = D + r_e c_d,$$

$$(4.4)$$

where S is given in (2.2). We can obtain  $\alpha$  by numerically solving equation (4.4) using Newton-Raphson iterative method. Then substitute the value of  $\alpha$  into (4.3), the value of m is obtained.

Once m and  $\alpha$  are fixed, the POBISWSDP problem becomes POBISWSOD/m which can be solved by using the method of last section.

Similarly, we can solve the problem of Power-optimal Simultaneous Buffer Insertion/Sizing and Wire Sizing with m buffers with Delay Penalty problem (POBISWSDP/m).

# 5. EXPERIMENTAL RESULTS

In order to verify our theoretical results, SPICE simulation is carried out in this section. We use SPICE3 with the  $0.18\mu m$  technology for all the simulations. The value of the circuit parameters is from [4], which is based on ITRS [21] and FASTCAP [20] as shown at Table 1. The source driver and load we used are 100X of minimum device in the following simulations.

## 5.1. Simulation of POBISWSOD(POBISWSOD/m) problems

For the first simulation, we take m = 1, n = 10, and the total wire length  $L = 1500\mu m$ ,  $2500\mu m$ , and  $3500\mu m$ , respectively. The simulation result for power dissipation is shown in Figure 3, from which, we can see that the minimum power dissipation is achieved at the point  $(n_0, n_1) = (10, 0)$ , and the maximum achieved at the point  $(n_0, n_1) = (0, 10)$ . The maximum power dissipation can be 6 times of the minimum, and there is no much difference for the minimum power dissipation for different wire lengths while the



Figure 4: Power dissipation for BISWS with optimal delay simulated by SPICE3 with  $0.18\mu m$  technology. The minimum power is at  $(n_0, n_1, n_2) = (10, 0, 0)$  labeled with '\*'

Table 2: Power Dissipation for a wire with length  $15000\mu m$ , 10 segments

| m | delay | max power | min power |
|---|-------|-----------|-----------|
|   | (ns)  | (mW)      | (mW)      |
| 1 | 0.817 | 20.701    | 1.712     |
| 2 | 0.767 | 65.248    | 2.421     |

maximum power dissipation changes significantly. This indicates optimizing the power dissipation for BISWS is very necessary.

The second simulation is for a wire with length  $L = 2500\mu m$ , 10 segments and 2 buffers. When the number of wire segments between the source and the first buffer  $n_0 = 0, 1, \dots, 10$ , the power dissipation curves over the number of wire segments between the first buffer and the second buffer  $n_1$  are shown in Figure 4. This also verified our results of Theorem 1 and Theorem 2.

#### 5.2. Simulation of POBISWSDP

We simulate a wire with length  $15000\mu m$ , 10 segments. The number of buffers is m = 2 when the optimal delay is achieved. If the delay penalty is  $\rho = 1.06$ , m = 1 is the optimal solution by solving the POBISWSDP problem of section 4 and the minimum power dissipation can be saved as much as 25%. The result is shown at table 2.

#### 5.3. Comparison with uniform buffer insertion

In order to show the advantages of the simultaneous buffer insertion/sizing and wire sizing, we compare it with the traditional uniform buffer insertion (UBI) in which all the wire segments are with the same width. We simulate a wire with one buffer, ten segments for BISWS. For uniform buffer insertion the wire width is set to be the average width of all the segments in BISWS. The same buffer size is used for both UBI and BISWS. Table 3 shows the simulation results for wire length  $L = 1000\mu m, 2500\mu m, 5000\mu m$ .

Table 1: Parameters for  $0.18\mu$  m technology from [4] which is based on ITRS ([21]) and FASTCAP([20])

| $r_0(\Omega/\Box)$ | $c_0(\mu F/m^2)$ | $r_e(k\Omega)$ | $c_g(\mathbf{fF})$ | $c_d(\mathbf{fF})$ | $V_{DD}$ (V) | $I_{off_n}(\mu A/\mu m)$ | f(GHz) | $W_{min}(\mu m)$ |
|--------------------|------------------|----------------|--------------------|--------------------|--------------|--------------------------|--------|------------------|
| 0.0419             | 232.9            | 8              | 1.9                | 4.8                | 1.8          | 0.2                      | 1.2    | 0.18             |

Table 3: SPICE3 simulation for the comparison between the traditional uniform buffer insertion and BISWS for a wire with length  $L=1000\mu m$ ,  $2500\mu m$  and  $5000\mu m$  and the number of segments is 10 for BISWS.

| wire            | type  | buffer size       | delay  | power |
|-----------------|-------|-------------------|--------|-------|
| $length(\mu m)$ |       | (X of min device) | (ns)   | (mW)  |
| 1000            | UBI   | 71.30             | 0.0977 | 0.540 |
|                 | BISWS | 71.30             | 0.0954 | 0.439 |
| 2500            | UBI   | 49.47             | 0.1568 | 0.621 |
|                 | BISWS | 49.47             | 0.1432 | 0.409 |
| 5000            | UBI   | 32.18             | 0.3160 | 0.986 |
|                 | BISWS | 32.18             | 0.2409 | 0.598 |

From table 3, we can see the BISWS is better than UBI in terms of delay and power dissipation.

# 6. CONCLUSION

We have analyzed the optimal power dissipation problems for simultaneous buffer insertion/sizing and wire sizing with delay constraints. The closed form solutions to minimize total power dissipation with optimal delay or with delay penalty are derived for BISWS. These closed form solutions can be used to efficiently estimate the power dissipation in the early stages of the VLSI designs. We observe that the power dissipation can be much different even with the same optimal delay. SPICE simulations verify our theoretical results. The simulation results also show that the simultaneous buffer insertion/sizing and wiring sizing is better than traditional uniformed buffer insertion in terms of optimal delay and optimal power dissipation.

## 7. REFERENCES

- H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI. Reading, MA: Addision-Wesley, 1990.
- [2] H. B. Bakoglu and J. D. Meindl, Optimal interconnection circuits for VLSI, *IEEE Transaction on Electron Devices*, Vol. ED-32, pp. 903-909, May 1985.
- [3] C. Albert and A. Devgan, Wire segmenting for improved buffer insertion, Proc. ACM/IEEE DAC, pp. 588-593, 1997.
- [4] K.Banerjee, and A.Mehrotra, A power-optimal repeater insertion methodology for global interconnects in nanometer designs, *IEEE Transaction on Electron Devices*, Vol.49, No.11, Nov., 2002.
- [5] A.P.Chandrakasan and R.W.Brodersen, Sources of power consumption, in *Low Power Digital CMOS Design*. Norwell, MA: Kluwer,1995.
- [6] C.Chu, and D.F.Wong, Closed form solution to simultaneous buffer insertion/sizing and wire sizing, ACM Transactions on Design Automation of Electronic Systems, Vol.6, pp. 343-371, No.3, July 2001.

- [7] J. Cong and K. Leung, Optimal wire sizing under the distributed Elmore delay model, *Proc. IEEE ICCAD*, pp. 634-639, 1993.
- [8] J. Cong, T. Kong and Z. Pan, Buffer Block Planning for Interconnect Planning and Prediction, *IEEE Trans. VLSI*, VOL. 9, NO. 6, Dec 2001.
- [9] J. Cong. An interconnect-centric design flow for nanometer technologies, *Proc. IEEE*, vol. 89, pp. 505-528, Apr. 2001.
- [10] J. Cong, L. He, C.-K. Koh, and P. H. Madden, Performance optimization of VLSI interconnect layout, *Integration VLSI J.*, vol. 21, pp. 1-94, 1996.
- [11] J. Cong, L. He, K.-Y. Khoo, C.-K. Koh, and D. Z. Pan, Interconnect design for deep submicron ICs, Proc. Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 1997, pp. 478-485.
- [12] S. Dhar and M.A. Franklin, Optimum buffer circuits for driving long uniform lines, *IEEE J. Solid-State Circuits*, Vol.26, pp. 32-40, Jan. 1991.
- [13] W.C.Elmore, The transient response of damped linear network with particular regard to wideband amplifiers. J. Appl. Phys. 19, pp. 55-63, 1948.
- [14] M. Nekili and Y. Savaria, Parallel regeneration of interconnections in VLSI & ULSI circuits, Proc. IEEE Int. Symp. Circuits and Systems, May 1993.
- [15] J. Lillis, C.-K. Cheng, and T. Lin, Optimal wire sizing and buffer insertion for low power and a generalized delay model, *IEEE/ACM ICCAD*, pp. 138-143, 1995.
- [16] S. Turgis, N. Azemard, and D. Auvergne, Design and selection of buffers for minimum power-delay product, *Proc.European Design and Test Conference*, pp. 224 -228,1996.
- [17] A. Nalamalpu and W. Burleson, A practical approach to DSM repeater insertion: Satisfying delay constraints while minimizing area and power, Proc. 14th Annu. IEEE Int. ASIC/SOC Conf., pp. 152-156,2001.
- [18] V. Adler and E. G. Friedman, Repeater design to reduce delay and power in resistive interconnect, *IEEE Trans. Circuits Syst. I*, vol. 45, pp. 607-616, May 1998.
- [19] D. Zhou and X. Liu, Minimization of chip size and power consumption of high-speed VLSI buffers, *Proc. Int. Symp.* on *Physical Design*, pp. 186-191, 1997.
- [20] K. Nabors and J. K. White, FASTCAP: A multipoleaccelerated 3-Dcapacitance extraction program, *IEEE Trans. Computer-Aided Design*, Vol. 10, pp. 1447-1459, Nov. 1991.
- [21] International Technology Roadmap for Semiconductors (ITRS), Semiconductor Industry Association, San Jose, CA, 1999.