# Delay Modeling and Static Timing Analysis for MTCMOS Circuits 

Naoaki Ohkubo Kimiyoshi Usami<br>Graduate School of Engineering, Shibaura Institute of Technology

307 Fukasaku, Minuma-ku, Saitama, 337-8570 Japan
E-mail: \{m105021, usami\}@sic.shibaura-it.ac.jp


#### Abstract

One of the critical issues in MTCMOS design is how to estimate circuit delay quickly. In this paper, we propose a delay modeling and static timing analysis (STA) methodology targeting MTCMOS circuits. In the proposed method, we prepare a delay look-up table (LUT) indexed by the input slew, the output load capacitance, the virtual ground length, and the power-switch size. Using this LUT, we compute the delay of each logic cell by linear interpolation. Experimental results show that the proposed methodology estimates the critical path delay with good accuracy.


Key words: MTCMOS, Selective-MT, Delay, Static timing analysis, Leakage power, Interpolation.

## I. Background

As transistor technology advances, low-power design techniques become increasingly important. In particular, leakage power reduction is strongly required in LSIs for portable information devices to prolong battery life. In addition, high performance is needed in recent cell phones with multimedia capabilities.

Several papers have reported techniques that reduce standby leakage current while maintaining high performance. Multiple-Threshold CMOS (MTCMOS) [1] is a well-known technique that efficiently reduces standby leakage power. Figure 1 shows the structure of an MTCMOS circuit. MTCMOS has low-Vth logic gates and high-Vth power switches. These elements are connected to each other by a wire called the virtual ground (VGND) line. In the active mode, the power switch is turned on and the low-Vth logic gates operate at high speed. In the standby mode, the power switch is turned off and its high-Vth transistor cuts the leakage current.

As an improved MTCMOS methodology, the selective MT technique is presented in [2]. Selective MT applies the MTCMOS technique only to the critical path and does not share the virtual ground line, as shown in Figure 2. Low-Vth gates are used only in the critical path, and each low-Vth gate is connected to its own power switch. Since high-Vth gates are used in the non-critical paths, the technique is also effective in reducing active leakage power. To reduce the power-switch area overhead in the selective MT design, an approach that shares the virtual ground line has also been proposed in [3]. This approach improves the area overhead and the efficiency of leakage power reduction compared with the conventional selective MT technique.

Figure 1: MTCMOS circuit

Figure 2: Selective MT technique

Next, we discuss the problems in MTCMOS. One of the critical issues is the delay increase due to voltage fluctuation on the virtual ground line [4]. When logic cells and flip-flops switch and discharge their output load capacitance, the discharge current is impeded by the wire resistance of the virtual ground and the resistance of the power switches. These factors cause the virtual ground voltage to fluctuate and influence the circuit delay. The impact of this effect depends on whether the virtual ground is shared. In the shared design, circuit delay increases depending on the discharge pattern of the logic gates. In general, the power switch must be resized to avoid this delay increase. Therefore, in this case, the power-switch size needs to be optimized considering the gate discharge patterns and the delay of all the gates sharing the same virtual ground. Meanwhile, the unshared design avoids the delay increase caused by the discharge current of other logic gates. Although this allows the power-switch size to be optimized independently for each gate, the area overhead is larger than in the shared design.

For the reasons presented above, timing verification becomes difficult in MTCMOS design. To guarantee circuit timing, timing analysis must account for the delay increase due to the resistance of the virtual ground and the power switch. Conventional STA generally uses a delay look-up table (LUT) indexed by the input slew and the output load capacitance.

However, in a design that applies the MTCMOS technique, the conventional STA methodology cannot estimate the delay increase, because the virtual ground length and the power-switch size affect the fluctuation of the virtual ground. This makes MTCMOS circuit design more difficult.

In this paper, we propose a delay modeling and static timing analysis (STA) methodology targeting MTCMOS circuits. The proposed method estimates the critical path delay by simply extending the conventional STA methodology.

The rest of this paper is organized as follows: Section II presents the layout model and delay modeling in MTCMOS circuits. Section III presents the proposed STA methodology. Section IV presents experimental results and Section V concludes the paper.

## II. Delay Modeling for MTCMOS Circuits

## A. Physical Implementation of Cell-based Selective MT

This section describes the layout architecture of the selective MT. In this architecture, we assume a cell-based methodology that combines low-Vth MTCMOS logic cells and high-Vth cells. The MTCMOS logic cell has an extra virtual ground pin in addition to the conventional input and output pins. The virtual ground pin of the MTCMOS logic cell and the power-switch cell are connected by the virtual ground line. The layout architecture based on this implementation is shown in Figure 3. The virtual ground length depends on where the power switch is placed. Since the virtual ground line has wire resistance and capacitance, it is necessary to model it as an RC network. Based on this RC model, we analyze the impact on the circuit delay.

## B. Delay Modeling of MTCMOS Circuits

We discuss the factors that affect the MTCMOS circuit delay. Generally, the cell delay and output slew are determined by the input slew and the output load capacitance. However, MTCMOS has a virtual ground line and a power switch in addition to the logic transistors of a conventional CMOS circuit. The virtual ground line has wire resistance and capacitance, and the power switch has channel resistance. These factors influence the circuit delay and output slew, especially when the output of the gate makes a transition to "low". To analyze their impact, we model the MTCMOS circuit as shown in Figure 4. We investigate how the input slew, the output capacitance, the power-switch size, and the virtual ground length affect the cell delay and output slew. We conducted the analysis using HSPICE simulations with the Toshiba 90 nm device models. The input and output slews are defined as the transition times between $10 \%$ and $90 \%$ of VDD. The cell delay is defined as the time from $50 \%$ of the input transition to $50 \%$ of the output transition.

We assume that the virtual ground line has the same width as other inter-cell wires, and model it as a $\pi$-type lumped RC circuit with three $\pi$-ladder stages. This number is based on a preliminary analysis showing no significant difference between 3 and 4 stages [5]. We assume that the virtual ground is routed using two routing layers, and that the wire resistance and capacitance are the same in both layers. As default values, we chose an input slew of 1000 ps, a load capacitance of 70 fF, a virtual ground length of $50\ \mu\mathrm{m}$, and a x1 power-switch size. For example, when we investigated how the delay is affected by the input slew, we kept the other three parameters at their default values. We analyzed the output slew in the same manner as the delay. Figure 5 shows the results of the analyses.

Figure 3: Layout architecture of cell-based selective MT

Figure 4: Circuit model for HSPICE simulation (NAND)

Figure 5: Influence of the delay and the output slew by the parameters (NAND 1x)

Figure 6: Delay increase due to the discharge within the cell (AND 8x)
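The three-stage $\pi$-type lumped RC model of the virtual ground line can be sketched as follows. This is a minimal illustration, not the authors' netlist generator: the function name `pi_ladder` and the per-unit-length resistance and capacitance values are assumptions chosen only to make the example run.

```python
def pi_ladder(length_um, r_per_um, c_per_um, stages=3):
    """Return (series_resistors, node_capacitors) for an n-stage pi ladder.

    Each stage is a pi network: the stage resistance in series, with half of
    the stage capacitance on each side. Halves that meet at an interior node
    are merged into one capacitor.
    """
    if stages < 1:
        raise ValueError("stages must be >= 1")
    r_stage = r_per_um * length_um / stages
    c_stage = c_per_um * length_um / stages
    resistors = [r_stage] * stages
    # stages + 1 nodes: end nodes carry one half, interior nodes two halves.
    capacitors = [c_stage / 2] + [c_stage] * (stages - 1) + [c_stage / 2]
    return resistors, capacitors


# Hypothetical wire parameters, for illustration only.
R_PER_UM = 0.5       # ohm per micrometre (assumed)
C_PER_UM = 0.2e-15   # farad per micrometre (assumed)

# Default virtual ground length of 50 um, three stages as in the text.
resistors, capacitors = pi_ladder(50.0, R_PER_UM, C_PER_UM, stages=3)
print(resistors)
print(capacitors)
```

Whatever the number of stages, the series resistances and node capacitances add up to the total wire resistance and capacitance. Only the distribution along the line changes.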

The results show that the delay and the output slew increase almost linearly with the input slew, the output capacitance, and the virtual ground length. In addition, the cell delay and output slew increase linearly with the inverse of the power-switch size. In contrast, for the rise transition, the delay is the time needed to charge the output capacitance, so the influence of the virtual ground length and the power-switch size is not significant.
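The practical consequence of the linear dependence on the inverse of the power-switch size is that interpolation between two switch sizes is better done along the $1/\text{size}$ axis than along the size axis. The sketch below illustrates this with a hypothetical delay model of the form $a + b/\text{size}$. The coefficients in `toy_delay` are assumptions, not values from the HSPICE analysis.

```python
def lerp(x, x0, x1, z0, z1):
    """1-D linear interpolation of z at x between (x0, z0) and (x1, z1)."""
    t = (x - x0) / (x1 - x0)
    return z0 + t * (z1 - z0)


def toy_delay(size):
    """Hypothetical fall delay in ps: a constant part plus a 1/size part."""
    return 120.0 + 80.0 / size


size = 3.0                                   # x3 switch, between x2 and x4
z2, z4 = toy_delay(2.0), toy_delay(4.0)      # the two neighbouring entries

by_size = lerp(size, 2.0, 4.0, z2, z4)                    # axis = size
by_inverse = lerp(1 / size, 1 / 2.0, 1 / 4.0, z2, z4)     # axis = 1/size

print(by_size, by_inverse, toy_delay(size))
```

For a delay that is exactly linear in $1/\text{size}$, interpolating along the inverse axis reproduces the model value, while interpolating along the size axis overestimates it.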

However, the situation is different for gates such as AND or OR gates, which consist of a NAND (or NOR) gate and an inverter inside the cell. For AND gates, when the output goes "high", the output of the internal NAND circuit goes "low". This means that capacitance is discharged through the virtual ground even when the output of the gate goes "high". In particular, since a large AND cell tends to have a large parasitic capacitance at the internal NAND circuit, the influence of this effect is not negligible. We investigated this impact on an 8x AND cell. The circuit delay increases with the virtual ground length and the inverse of the power-switch size, as shown in Figure 6. This additional delay should be taken into consideration for the rise transition.

Figure 7: Two-dimensional interpolation example using LUT

## III. Proposed STA Methodology for MTCMOS

## A. Proposed Delay Modeling

In the previous section, we showed that the virtual ground length and the power-switch size influence the circuit delay, and that the delay increases linearly with the virtual ground length and the inverse of the power-switch size. Based on this observation, we propose a novel delay modeling and STA methodology for MTCMOS design. The proposed method mainly targets MTCMOS circuits that do not share the virtual ground line.

First, we present the conventional delay modeling. In the conventional delay modeling, the cell delay and the output slew are computed using a LUT indexed by the input slew and the output capacitance. The LUT holds delay values for discrete input slew and output capacitance values. If the given input slew and output capacitance are not in the LUT, the delay is estimated by interpolation using the values in the LUT. For example, as shown in Figure 7, when the input slew is 900 ps and the output capacitance is 150 fF, the delay is computed by interpolation from the delay values at the two nearest input slews (i.e., 600 ps and 1200 ps) and the two nearest output capacitances (i.e., 100 fF and 200 fF). The interpolation uses equation (1) below. For the given input slew and output capacitance, the two nearest values of each are picked from the LUT and substituted for $x$ and $y$ in equation (1). The corresponding cell delay is substituted for $Z$.

$$
\begin{equation*}
Z=A+B x+C y+D x y \tag{1}
\end{equation*}
$$

Substituting the four LUT points gives four simultaneous equations in $A$, $B$, $C$, and $D$, which we solve by Gaussian elimination to obtain the coefficients. The same procedure is used for the interpolation of the output slew.
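The two-dimensional procedure can be sketched in a few lines. The grid points below are those of the Figure 7 example (600 ps and 1200 ps, 100 fF and 200 fF). The four delay values are hypothetical, since the figure's numbers are not reproduced in the text, and the function names are illustrative.

```python
def gaussian_solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):          # back substitution
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def bilinear_coefficients(corners):
    """corners: four (x, y, z) LUT entries. Returns [A, B, C, D] of eq. (1)."""
    matrix = [[1.0, x, y, x * y] for x, y, _ in corners]
    rhs = [z for _, _, z in corners]
    return gaussian_solve(matrix, rhs)


def evaluate(coeffs, x, y):
    """Evaluate Z = A + B x + C y + D x y."""
    a, b, c, d = coeffs
    return a + b * x + c * y + d * x * y


# (input slew in ps, output capacitance in fF, delay in ps); delays assumed.
corners = [
    (600.0, 100.0, 210.0),
    (600.0, 200.0, 290.0),
    (1200.0, 100.0, 330.0),
    (1200.0, 200.0, 430.0),
]
coeffs = bilinear_coefficients(corners)
delay = evaluate(coeffs, 900.0, 150.0)
print(coeffs, delay)
```

Because the query point (900 ps, 150 fF) sits at the center of the LUT cell, the interpolated delay equals the average of the four corner delays, which is a convenient sanity check.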

In the proposed methodology, the virtual ground length and the inverse of the power-switch size are added as parameters. This is based on the fact that the cell delay and the output slew increase linearly with the virtual ground length and the inverse of the power-switch size, as presented in Section II. This trend is well suited to linear interpolation. We therefore employ a LUT indexed by the following parameters: the input slew, the output capacitance, the virtual ground length, and the power-switch size. Using this LUT, we compute the cell delay and the output slew by linear interpolation. Extending equation (1) to four-dimensional interpolation gives equation (2):

$$
\begin{equation*}
\begin{aligned}
Z = {} & A+B v+C w+D x+E y+F v w+G v x \\
& +H v y+I w x+J w y+K x y+L v w x+M v w y \\
& +N v x y+O w x y+P v w x y
\end{aligned} \tag{2}
\end{equation*}
$$

For a given input slew, output capacitance, virtual ground length, and power-switch size, the two nearest values of each are picked from the LUT and substituted for $v$, $w$, $x$, and $y$ in equation (2), which gives 16 simultaneous equations. We then solve these equations to obtain the coefficients $A$ to $P$.
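As a cross-check on the four-dimensional interpolation, the same interpolated value can be obtained without solving for the 16 coefficients. The multilinear polynomial through the 16 corner entries is unique, so a weighted sum of the corner values gives an identical result. The sketch below uses this equivalent weighted-corner form, which is a substitute formulation and not the equation-solving procedure described above. The model `toy_cell`, the grid values, and the use of $1/\text{size}$ as the fourth axis are assumptions for illustration.

```python
from itertools import product


def multilinear(point, lows, highs, corner_value):
    """Multilinear interpolation inside one LUT hypercube.

    point, lows, highs: sequences of equal length (4 for equation (2)).
    corner_value: maps a tuple of 0/1 flags (0 = low grid value, 1 = high
    grid value on each axis) to the LUT entry stored at that corner.
    """
    # Normalised position of the query on each axis: 0 at low, 1 at high.
    t = [(p - lo) / (hi - lo) for p, lo, hi in zip(point, lows, highs)]
    total = 0.0
    for flags in product((0, 1), repeat=len(point)):
        weight = 1.0
        for ti, flag in zip(t, flags):
            weight *= ti if flag else (1.0 - ti)
        total += weight * corner_value(flags)
    return total


def toy_cell(v, w, x, y):
    """Hypothetical multilinear delay model, used only to check the sketch."""
    return (20.0 + 0.1 * v + 0.5 * w + 0.05 * x + 40.0 * y
            + 0.001 * v * w + 0.02 * x * y)


# Axes: input slew (ps), output capacitance (fF), VGND length (um), 1/size.
lows = (600.0, 100.0, 150.0, 0.25)
highs = (1200.0, 200.0, 300.0, 0.5)
query = (900.0, 150.0, 200.0, 1 / 3)


def corner_value(flags):
    coords = [hi if flag else lo for flag, lo, hi in zip(flags, lows, highs)]
    return toy_cell(*coords)


delay = multilinear(query, lows, highs, corner_value)
print(delay, toy_cell(*query))
```

Since `toy_cell` contains only terms that appear in equation (2), the interpolation reproduces it exactly inside the cell. Real LUT data would show a residual error wherever the delay is not multilinear.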

Although the proposed methodology extends the interpolation to four dimensions, there are cases in which three-dimensional interpolation is sufficient. In contrast to parameters such as the input slew, the output capacitance, and the virtual ground length, the power-switch sizes available in the cell library are discrete. If the LUT contains the frequently used sizes, the exact power-switch size is likely to be found in the LUT. In this case, we do not need to interpolate between the two nearest power-switch sizes. In other words, the interpolation uses only three parameters: the input slew, the output capacitance, and the virtual ground length. This reduces the dimension of the interpolation to three and the number of simultaneous equations from 16 to 8.
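The reduction in dimension can be illustrated by a small helper that selects, for each LUT axis, either the exact grid value or the two nearest grid values. The axis values are those of LUT1 in Table 1 and the query is Ex9 of Table 2. The helper names `bracket` and `plan` are hypothetical.

```python
from bisect import bisect_left


def bracket(grid, value):
    """Return the grid values used for `value` on one sorted LUT axis.

    A single value if `value` is itself a grid point, otherwise the two
    neighbouring grid values (the two edge values if `value` is out of range).
    """
    if value in grid:
        return (value,)
    i = bisect_left(grid, value)
    i = min(max(i, 1), len(grid) - 1)
    return (grid[i - 1], grid[i])


LUT1_AXES = {
    "input_slew_ps": [10, 600, 1200, 1800],
    "output_cap_fF": [5, 100, 200, 300],
    "vgnd_length_um": [1, 150, 300, 450],
    "switch_size": [1, 2, 4, 8],
}


def plan(query):
    """Return the brackets, the axes that need interpolation, and the
    number of coefficients (2 to the power of the number of such axes)."""
    brackets = {name: bracket(LUT1_AXES[name], v) for name, v in query.items()}
    free_axes = [name for name, b in brackets.items() if len(b) == 2]
    return brackets, free_axes, 2 ** len(free_axes)


# Ex9 in Table 2: the x4 switch is an exact LUT entry.
brackets, free_axes, n_coeffs = plan({
    "input_slew_ps": 1340,
    "output_cap_fF": 35,
    "vgnd_length_um": 310,
    "switch_size": 4,
})
print(brackets, free_axes, n_coeffs)
```

For this query only three axes need two neighbouring values, so the interpolation has 8 coefficients instead of 16.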

In addition, for gates that do not discharge the parasitic capacitance of an internal circuit when the output goes "high", it is possible to prepare a four-parameter LUT for the fall transition and a LUT indexed only by the input slew and the output capacitance for the rise transition. Except for gates such as the AND and OR gates discussed previously, the rise-transition delay does not need information on the virtual ground length and the power-switch size. This technique suppresses the increase in LUT size.

## B. Design Flow

We propose a design flow that uses STA based on the delay modeling presented in the previous section. STA is generally applied after logic synthesis. In MTCMOS design, however, it is difficult to apply STA that includes the virtual ground length and the power-switch size at that stage, because these parameters are optimized after placement. We therefore perform STA after placement, using the information on the power-switch size and the virtual ground length. This STA enables us to optimize the power-switch size and location in cell-based MTCMOS design.


Figure 8: MTCMOS design flow

## IV. Experimental Results

## A. Application to a NAND Gate

We describe the application of the proposed STA methodology to the NAND gate. We prepared two tables, LUT1 and LUT2, for the interpolation, as shown in Table 1. In LUT1, we provide the delay at equal steps of the parameter values. For example, delay values are provided for every 600 ps of input slew. In contrast, in LUT2, we prepare the delay at fine steps for small input slew and at coarse steps for large input slew. In this experiment, we examined the accuracy of the interpolation using NAND circuits with the parameter values shown in Table 2. These parameter values were randomly generated within the range of values actually used in MTCMOS design. The objective is to examine whether the proposed methodology interpolates with good accuracy under any condition. The comparison of the interpolation results for the cell delay and the output slew with the SPICE simulation is shown in Figure 9.

|  | LUT 1 |  |  |  | LUT 2 |  |  |  |
| :---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Input slew (ps) | 10 | 600 | 1200 | 1800 | 10 | 400 | 900 | 1800 |
| Output capacitance (fF) | 5 | 100 | 200 | 300 | 5 | 50 | 120 | 300 |
| Virtual ground length ($\mu\mathrm{m}$) | 1 | 150 | 300 | 450 | 1 | 150 | 300 | 450 |
| Power-switch size | x1 | x2 | x4 | x8 | x1 | x2 | x4 | x8 |

Table 1: Look-up tables

|  | Ex1 | Ex2 | Ex3 | Ex4 | Ex5 | Ex6 | Ex7 | Ex8 | Ex9 | Ex10 | Ex11 | Ex12 |
| :---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Input slew (ps) | 50 | 750 | 1600 | 230 | 610 | 1720 | 20 | 1010 | 1340 | 490 | 880 | 1130 |
| Output capacitance (fF) | 110 | 20 | 220 | 4 | 85 | 100 | 55 | 290 | 35 | 230 | 135 | 185 |
| VGND length ($\mu\mathrm{m}$) | 380 | 22 | 75 | 320 | 120 | 440 | 95 | 160 | 310 | 200 | 140 | 190 |
| Power-switch size | x3 | x1 | x7 | x9 | x5 | x1 | x2 | x3 | x4 | x2 | x6 | x8 |

Table 2: Circuit parameter examples for the interpolation (NAND)

Experimental results show that LUT2 allows us to interpolate with slightly better accuracy than LUT1, especially when the cell delay is small, as shown in Figure 9 (c) and (d). This is because the delay does not increase linearly for small input slew and output capacitance values, as shown in Figure 5 (a) and (b). In addition, since the delay is small for small input slew and output capacitance, the relative error tends to become large. For these reasons, preparing delay values at fine steps for small input slew and output capacitance is preferable to obtain good accuracy.

Figure 9: Interpolation result in NAND cell

Next, we describe the application of the proposed methodology to the critical path delay calculation. Using Synopsys Design Compiler, we synthesized two modules with high-Vth cells of the Toshiba 90 nm technology. As the modules, we used the circuit "SAND" from the MCNC Benchmark [6] and a CPU control unit ("CONTROL") of the SH3-compatible processor IP [7]. These circuits have 392 and 1766 gates, respectively. We extracted the critical paths of these two circuits, as shown in Figure 10 and Figure 12. We replaced the high-Vth cells in the critical path with low-Vth MTCMOS logic cells. In addition, we connected each of them to its own power-switch cell in the SPICE netlist. We modeled the virtual ground as a $\pi$-type lumped circuit, as previously illustrated in Figure 4. The values for the output capacitance, the virtual ground length, and the power-switch size were randomly generated, as in the NAND gate experiment. The input slew of the first gate is set to 50 ps, while the input slew of each later gate is the output slew of the previous gate, as computed by interpolation with this methodology. The results are shown in Figure 11 and Figure 13 as the error between the SPICE simulation and the interpolation using LUT2.

To examine the effectiveness of this methodology, we also interpolated using the real virtual ground length in the critical-path delay calculation of CONTROL. Using Synopsys Astro, we placed CONTROL with the Toshiba 90 nm technology. We estimated the real virtual ground length as the Manhattan distance obtained from the placement result. The real virtual ground lengths are shown in Figure 12, and the experimental result is shown in Figure 14. Since the power switch is placed near the MTCMOS logic cells, the virtual ground length tends to be short in the unshared design.
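The path-level calculation described above, in which each stage's interpolated output slew becomes the next stage's input slew and the virtual ground length is estimated as a Manhattan distance, can be sketched as follows. The cell model `toy_model` is a hypothetical stand-in for the LUT interpolation, and the coordinates and stage parameters are invented for illustration.

```python
def manhattan_um(cell_xy, switch_xy):
    """Manhattan distance between a logic cell and its power switch (um)."""
    return abs(cell_xy[0] - switch_xy[0]) + abs(cell_xy[1] - switch_xy[1])


def path_delay(stages, first_slew_ps, cell_model):
    """Accumulate delay along a path, passing each output slew onward.

    stages: list of dicts with load capacitance, VGND length and switch size.
    cell_model: function (slew, cap, vgnd_len, size) -> (delay, output_slew).
    In the methodology this role is played by the LUT interpolation.
    """
    slew = first_slew_ps
    total = 0.0
    for stage in stages:
        delay, slew = cell_model(slew, stage["cap_fF"], stage["vgnd_um"],
                                 stage["size"])
        total += delay
    return total, slew


def toy_model(slew, cap, vgnd_len, size):
    """Hypothetical stand-in for the interpolated LUT (not fitted to data)."""
    delay = 10.0 + 0.1 * slew + 0.5 * cap + 0.02 * vgnd_len + 20.0 / size
    out_slew = 30.0 + 0.2 * slew + 1.0 * cap
    return delay, out_slew


# Two-stage path with assumed placement coordinates in micrometres.
stages = [
    {"cap_fF": 20.0, "vgnd_um": manhattan_um((0.0, 0.0), (12.0, 8.0)), "size": 2},
    {"cap_fF": 50.0, "vgnd_um": manhattan_um((30.0, 5.0), (35.0, 5.0)), "size": 4},
]
total_ps, last_slew_ps = path_delay(stages, first_slew_ps=50.0,
                                    cell_model=toy_model)
print(total_ps, last_slew_ps)
```

Because the slew is propagated, an interpolation error in one stage's output slew also affects the delay of the following stage, which is why the output slew is interpolated with the same care as the delay.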

In this experiment, we found that the proposed methodology interpolates with approximately $8 \%$ error for a single cell and $1 \%$ error for the critical path. This result is almost the same when the real virtual ground length is used. A large relative error appears when the output capacitance is small. More detailed analysis is needed to determine how finely the LUT should be prepared for small output capacitance.

In this experiment, we used the same LUT ranges even when the cell size differs. A large cell tends to have a large output capacitance. When the parameter values fall outside the range of the LUT, we compute the cell delay by extrapolation. However, it is difficult to obtain good accuracy with extrapolation. Hence, preparing a wide range for larger cell sizes and a narrow range for smaller cell sizes will improve the accuracy.
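Whether a query needs extrapolation can be checked directly against the LUT ranges. The short sketch below applies this check to the NAND examples, using the axis ranges from Table 1 (LUT1 and LUT2 share the same minimum and maximum values) and the parameter values from Table 2. The function name `out_of_range` is illustrative.

```python
LUT_RANGE = {                      # minimum and maximum of each Table 1 axis
    "slew_ps": (10, 1800),
    "cap_fF": (5, 300),
    "vgnd_um": (1, 450),
    "size": (1, 8),
}

EXAMPLES = {                       # Table 2, columns Ex1 to Ex12
    "slew_ps": [50, 750, 1600, 230, 610, 1720, 20, 1010, 1340, 490, 880, 1130],
    "cap_fF": [110, 20, 220, 4, 85, 100, 55, 290, 35, 230, 135, 185],
    "vgnd_um": [380, 22, 75, 320, 120, 440, 95, 160, 310, 200, 140, 190],
    "size": [3, 1, 7, 9, 5, 1, 2, 3, 4, 2, 6, 8],
}


def out_of_range():
    """Map each example name to the list of axes that need extrapolation."""
    flagged = {}
    for axis, values in EXAMPLES.items():
        lo, hi = LUT_RANGE[axis]
        for i, value in enumerate(values, start=1):
            if not lo <= value <= hi:
                flagged.setdefault(f"Ex{i}", []).append(axis)
    return flagged


print(out_of_range())   # {'Ex4': ['cap_fF', 'size']}
```

Among the twelve examples, only Ex4 (4 fF load, x9 switch) lies outside the tabulated ranges. All the others are handled by interpolation alone.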


Figure 10: Critical path and parameters (SAND)


Figure 11: Interpolation result in critical path (SAND)


Figure 12: Critical path and parameters (CONTROL)


Figure 13: Interpolation result in critical path (CONTROL)


Figure 14: Interpolation result applying the measured virtual ground length (CONTROL)

## V. Conclusion and Future Work

In this paper, we have proposed an STA methodology targeting MTCMOS circuits. We showed that the cell delay and the output slew increase linearly with the virtual ground length and the inverse of the power-switch size. Based on this observation, we proposed a delay computation scheme that uses four-dimensional linear interpolation. We found that this scheme allows us to estimate the delay with good accuracy. We showed that STA can be applied to MTCMOS circuits by simply extending the LUT parameters and the interpolation equations.

In MTCMOS design, the virtual ground is also shared among cells to reduce the area overhead. In the shared design, the factors affecting the delay become more complex, because the discharge current of a cell overlaps with the currents of the other cells. Extending the proposed methodology to shared virtual ground cases is future work.

## Acknowledgements

This work was supported by Toshiba Corporation. The authors would like to thank Mr. Masami Murakata and Mr. Takeshi Kitahara for their support and suggestions.

This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc.

## References

[1] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS," IEEE J. Solid-State Circuits, vol.30, no.8, pp.847-854, 1995.
[2] K. Usami, N. Kawabe, M. Koizumi, K. Seta, and T. Furusawa, "Automated Selective Multi-Threshold Design for Ultra-Low Standby Applications," in Proc. ISLPED, pp.202-206, August 2002.
[3] T. Kitahara, N. Kawabe, F. Minami, K. Seta, and T. Furusawa, "Area-efficient Selective Multi-Threshold CMOS Design Methodology for Standby Leakage Power Reduction," in Proc. DATE, pp.646-647, 2005.
[4] J. Kao, A. Chandrakasan, and D. Antoniadis, "Transistor Sizing Issues and Tools for Multi-Threshold CMOS Technology," in Proc. DAC, pp.409-414, 1997.
[5] K. Usami, N. Ohkubo, and M. Shirakawa, "Analysis on MTCMOS Circuit based on Lumped RC Model for Virtual Ground Line," in Proc. ISOCC, pp.116-119, 2005.
[6] "MCNC Benchmark" http://www.cbl.ncsu.edu/
[7] Y. Mitani, H. Uchida, T. Hironaka, H. J. Mattausch, and T. Koide, "The Processor IP for Research with Software Development Environment," Technical Report of IEICE, VLD2001-109, pp.121-126, 2001. (in Japanese)

