# Gate-Level Current Waveform Simulation of CMOS Integrated Circuits

Alessandro Bogliolo

Luca Benini

Giovanni De Micheli

Bruno Riccò †

CSL - Stanford University Stanford, CA-94305-9030

### Abstract

We present a new gate-level approach to current simulation. We use a symbolic model of current pulses that accurately accounts for their dependence on the switching conditions. We then construct current waveforms during event-driven logic simulation by means of pulse composition. We obtain satisfactory accuracy on time-domain current waveforms and on peak current estimates, while maintaining performance comparable with traditional gate-level simulation.

### 1 Introduction

Large currents flowing in power and ground routes adversely affect the reliability of VLSI systems. In particular, while the overall chip temperature is related to the average supply current, voltage drops and electromigration cannot be evaluated without having accurate estimates of the instantaneous and maximum currents flowing through power and ground routes (*i.e.*, without having a detailed knowledge of time-domain current waveforms). In the last few years, *pattern-independent* and *pattern-dependent* approaches have been developed to provide fast and accurate current estimates.

Pattern independent techniques exploit the concepts of probability waveform [1, 2] and maximum envelope current [3] in order to obtain current estimates directly from the input statistics. The accuracy of pattern independent estimates is impaired by the simplifying assumption that the current and the delay of each gate do not depend on its input signals. Moreover, the propagation of probability waveforms and maximum envelope currents is pattern independent only if the input signals are uncorrelated, while partial enumeration techniques are required otherwise [3].

Pattern dependent techniques are essentially based on the following observation: the current drawn by a complex CMOS gate for any given input transition has almost the same behavior as the current drawn by an elementary gate with the same driving capability and switching capacitance. Hence, only a small set of reference gates actually needs to be characterized, while any other gate simply needs to be reduced to the equivalent elementary one whenever a transition occurs at its inputs [4, 5, 6, 7, 8, 9, 10]. These techniques provide good approximations of the current behavior of gates with single input transitions, but they lose accuracy when dealing with internal charge redistributions, input glitches and misaligned multiple transitions. Accurate transistor-level estimates of supply currents and voltage drops are also provided by commercial tools for power simulation and analysis [11].

† DEIS - University of Bologna, Bologna, I-40136

To overcome most of the above mentioned problems, we developed a new logic-level simulator that provides accurate current waveforms. Starting from the approach proposed in [12] for power simulation, we constructed a more sophisticated model of CMOS gates that allows us to take into account the dependence of the current pulses on both Boolean variables (representing input signals) and analog parameters (representing internal charge status, output load and input transition times). An algorithm is also proposed to describe the effects of signal glitches and misaligned input transitions on the current drawn by a logic gate.

We restrict our scope to CMOS circuits mapped on a pre-defined cell library and we use the two-step paradigm of library characterization and event-driven logic simulation. The current behavior of the basic library cells is characterized once and for all, and current waveforms are obtained at run time by composition, with small computational overhead. We implemented our algorithms in C using VERILOG-XL as simulation platform. The experimental results obtained on benchmark circuits are in good agreement with those provided by electrical simulation. In particular, an average error of 9% has been obtained on peak current estimates.

### 2 Current model

The supply current drawn by a CMOS gate in response to an input transition has two major contributions: a *short circuit current* directly flowing to ground and a *charging current* that actually changes the charge status at some of the internal and output nodes (Fig. 1a). Both contributions are strongly affected not only by the output load, but also by the input patterns applied to the cell, by the input slopes and by the charge status at the internal nodes.

For given I/O conditions the waveform of the total current can be effectively approximated by an asymmetric triangular pulse. This is shown in Fig. 1b, where the current profile obtained by electrical simulation is compared with the triangular model. Three parameters are then sufficient to describe the approximate shape of a single current pulse: the rise time  $T_r$ , the peak value  $I_p$  and the duration  $T$ . The values of these parameters are extracted from the results of electrical simulation.
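As an illustration of the triangular model, the following minimal sketch evaluates an asymmetric triangular pulse at a given time and computes the charge it carries. The function names and the numeric values (times in ns, currents in mA) are hypothetical and are not taken from the characterized library.

```python
def triangular_pulse(t, t0, t_rise, duration, i_peak):
    """Current of an asymmetric triangular pulse at time t.

    The pulse starts at t0, rises linearly to i_peak at t0 + t_rise,
    and falls linearly back to zero at t0 + duration.
    """
    if not 0.0 < t_rise < duration:
        raise ValueError("need 0 < t_rise < duration")
    x = t - t0
    if x <= 0.0 or x >= duration:
        return 0.0
    if x <= t_rise:
        return i_peak * x / t_rise
    return i_peak * (duration - x) / (duration - t_rise)


def pulse_charge(duration, i_peak):
    """Area of the triangle, i.e. the charge carried by the pulse."""
    return 0.5 * duration * i_peak


# Hypothetical pulse: starts at 0 ns, peaks at 0.3 ns, ends at 1.1 ns.
i_at_peak = triangular_pulse(0.3, 0.0, 0.3, 1.1, 2.0)
i_half_fall = triangular_pulse(0.7, 0.0, 0.3, 1.1, 2.0)
charge = pulse_charge(1.1, 2.0)
```

Note that the charge depends only on the duration and the peak value, not on the position of the peak within the pulse.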

Gate-level current modeling reduces to a two-fold issue: i) finding an operative definition for the pulse parameters, and ii) modeling their dependences on the I/O conditions. In principle, a look-up table could be constructed for each parameter. However, the size of such a table must be large to avoid excessive inaccuracy (note that input slopes, output load and internal voltages are analog quantities that must be discretized in order to be used to address a look-up table). We propose an alternative symbolic model that exploits the use of decision diagrams and linear regressions to provide compact and accurate representations of current parameters as functions of Boolean and analog variables.

Figure 1: a) Schematic representation of a three input NOR gate. A falling transition of *a* causes a supply current due to two effects: the charging of  $C_{3g}$  and  $C_{Lg}$ , and the activation of a temporary conductive path between  $V_{dd}$  and Ground. b) Triangular approximation of a current pulse. The four parameters  ($t_0$, $T_r$, $T$ and $I_p$)  are defined using a threshold  $I_{th} = I_m/20$  to filter noise.

Figure 2: Supply current drawn by the NOR gate of Fig. 1 during the falling transition of input *a*. The current behavior is reported for different values of input slopes  $(\tau)$  and loads  $(C = C_{Lp} + C_{Lg})$ .

#### Modeling time parameters

In order to distinguish between significant currents and noise, we use a current threshold  $I_{th}$ , and we define time parameters considering only current values  $I(t) > I_{th}$ , as shown in Fig. 1b (significant estimates have been obtained by using a threshold of one twentieth (5%) of the maximum measured current).

Looking at Fig. 2, we notice that the pulse duration is tightly related to the output load, while the location of the peak is mainly related to the input transition time. Furthermore, both dependencies are almost linear.

We approximated the dependence of T and  $T_r$  on the I/O conditions as

$$T = c_0 + c_1 \tau + c_2 C_L, \qquad T_r = d_0 + d_1 \tau + d_2 C_L, \qquad (1)$$

where  $\tau$  is the input transition time,  $C_L$  is the total output load ( $C_L = C_{Lp} + C_{Lg}$ ), possibly including wiring capacitances, and the coefficients  $c_0, c_1, c_2$  and  $d_0, d_1, d_2$  are set to fit the results of electrical simulations. If more than one input signal switches at the same time, we take the average of the transition times as  $\tau$ .
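The coefficients of Eq. (1) can be obtained by ordinary least squares. The sketch below is one possible way to do this in pure Python, solving the normal equations by Gauss-Jordan elimination; the function name is hypothetical and the data are synthetic values generated from known coefficients, standing in for measurements from electrical simulation.

```python
def fit_linear_model(samples):
    """Least-squares fit of y = k0 + k1*tau + k2*c_load.

    samples: iterable of (tau, c_load, y) tuples.
    Returns the tuple (k0, k1, k2).
    """
    rows = [((1.0, tau, c_load), y) for tau, c_load, y in samples]
    # Normal equations (X^T X) k = X^T y as an augmented 3x4 matrix.
    a = [[sum(x[i] * x[j] for x, _ in rows) for j in range(3)]
         + [sum(x[i] * y for x, y in rows)] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine the three coefficients")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


# Synthetic data: T = 0.2 + 0.5*tau + 4.0*C_L on a 3x3 grid of (tau, C_L).
samples = [(tau, c, 0.2 + 0.5 * tau + 4.0 * c)
           for tau in (0.1, 0.3, 0.5) for c in (0.05, 0.1, 0.2)]
c0, c1, c2 = fit_linear_model(samples)
```

The same routine would be called once for  $T$  and once for  $T_r$ , each time on the samples belonging to a single cluster of input patterns.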

The linear models of Eq. (1) provide good approximations of the actual values of T and  $T_r$  as long as the driving capability of the pull-up (pull-down) network can be assumed to be constant. In general, different driving capabilities are associated with different input transitions, because they activate different conductive paths.

In principle, different equations should then be used for each possible input transition (*i.e.*, for  $2^{2n}$  pairs of input patterns), thus requiring  $6 \cdot 2^{2n}$  coefficients to model  $T$  and  $T_r$  for an *n*-input gate (384 coefficients for the three-input NOR gate of Fig. 1a). In practice, substantial simplifications can be made without loss of accuracy, thanks to two important observations: *i*) only the last test pattern applied to a CMOS gate actually affects its driving capability; *ii*) the pull-up and pull-down networks of a CMOS gate can assume only a small set of driving capabilities (usually much smaller than  $2^n$ ).

The complete models of  $T$  and  $T_r$  then consist of a small set of linear equations, associated with clusters of input patterns. Such a model can be effectively represented by a decision diagram [12], in which: i) the root is associated with  $T$  (or  $T_r$ ); ii) internal nodes are associated with input variables (decisions being made on the values they take at the end of the transition); iii) leaves are associated with linear equations (obtained by least-squares fitting on the results of electrical simulations).
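A decision diagram of this kind can be sketched as follows. The classes, the two-input example and all coefficient values are hypothetical and serve only to show how a lookup walks from the root to a leaf using the final input pattern, and how patterns with the same driving capability share one leaf.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """Linear equation k0 + k1*tau + k2*c_load for one cluster of patterns."""
    k0: float
    k1: float
    k2: float

    def value(self, tau, c_load):
        return self.k0 + self.k1 * tau + self.k2 * c_load


@dataclass(frozen=True)
class Node:
    """Internal node: decides on the final value of one input variable."""
    var: str
    low: "Union[Node, Leaf]"   # followed when the variable ends at 0
    high: "Union[Node, Leaf]"  # followed when the variable ends at 1


def evaluate(diagram, final_pattern, tau, c_load):
    """Walk the diagram using the pattern at the end of the transition."""
    node = diagram
    while isinstance(node, Node):
        node = node.high if final_pattern[node.var] else node.low
    return node.value(tau, c_load)


# Hypothetical model of T for a two-input gate: three driving capabilities.
pull_up = Leaf(0.30, 0.40, 5.0)
one_path = Leaf(0.20, 0.50, 4.0)    # shared by patterns 01 and 10
two_paths = Leaf(0.15, 0.50, 2.5)
diagram = Node("a", Node("b", pull_up, one_path), Node("b", one_path, two_paths))

t_01 = evaluate(diagram, {"a": 0, "b": 1}, tau=0.2, c_load=0.1)
t_10 = evaluate(diagram, {"a": 1, "b": 0}, tau=0.2, c_load=0.1)
t_11 = evaluate(diagram, {"a": 1, "b": 1}, tau=0.2, c_load=0.1)
```

Sharing leaves between patterns is what keeps the number of fitted equations well below the number of input patterns.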

#### Modeling the peak current

To define the upper vertex of the triangular pulse, we decided not to use the maximum measured current, since it is strongly noise sensitive and it depends on the minimum time-step used in the electrical simulation. We decided, instead, to use a model that preserves the total amount of charge  $(\Delta Q)$  drawn by the cell during a transition. Since  $\Delta Q$  represents the area of the current pulse,  $I_p$  is defined in order to make the area of the triangular pulse equal to that of the original one:

$$I_p = \frac{2\Delta Q}{T}. \qquad (2)$$

Estimating  $\Delta Q$  is still a difficult task, since it depends not only on the actual switching conditions, but also on the internal charge status (*i.e.*, on the inner structure of the gate and on its internal parasitic capacitances). This task, however, has already been addressed in [12], where we presented an accurate symbolic model of CMOS gate power consumption. Based on this model, a gate-level power simulator (PPP) was developed providing single-pattern single-cell power estimates with 5% maximum error from the results of electrical simulations. In this context we then assume  $\Delta Q$  to be available with sufficient accuracy during gate-level simulation, and we refer to [12] for a detailed treatment.
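To make the definitions of this section concrete, the sketch below extracts the pulse parameters from a sampled current trace: the threshold  $I_{th}$  delimits the pulse, and Eq. (2) sets the peak so that the triangle preserves the charge. Several details are assumptions made for illustration only: the charge is obtained here by trapezoidal integration of the trace (during simulation it would come from the model of [12]),  $T_r$  is taken at the largest sample, and the trace itself is a synthetic triangle rather than an electrical simulation result.

```python
def extract_pulse_parameters(times, currents, threshold_ratio=0.05):
    """Return (t0, T_r, T, I_p) of the triangular model for a sampled pulse.

    times and currents are lists of equal length; the threshold is
    threshold_ratio times the largest sample.
    """
    i_max = max(currents)
    if i_max <= 0.0:
        raise ValueError("trace contains no positive current")
    i_th = threshold_ratio * i_max
    above = [k for k, i in enumerate(currents) if i > i_th]
    t0 = times[above[0]]
    duration = times[above[-1]] - t0
    # Assumption: the peak location is the time of the largest sample.
    t_rise = times[currents.index(i_max)] - t0
    # Assumption: charge from trapezoidal integration of the whole trace.
    charge = sum(0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
                 for k in range(len(times) - 1))
    i_peak = 2.0 * charge / duration  # Eq. (2): charge-preserving peak
    return t0, t_rise, duration, i_peak


def _ideal(t, t0=0.5, t_rise=0.3, duration=1.0, i_peak=2.0):
    """Synthetic trace: an exact triangle carrying a charge of 1.0."""
    x = t - t0
    if x <= 0.0 or x >= duration:
        return 0.0
    if x <= t_rise:
        return i_peak * x / t_rise
    return i_peak * (duration - x) / (duration - t_rise)


times = [0.01 * k for k in range(201)]
currents = [_ideal(t) for t in times]
t0, t_rise, duration, i_peak = extract_pulse_parameters(times, currents)
```

Because the threshold trims the two tails of the trace, the extracted duration is slightly shorter than that of the underlying pulse and the charge-preserving peak is correspondingly slightly higher than the largest sample.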

#### Misaligned input transitions

The model proposed so far provides a good estimate of the current pulse drawn by a generic CMOS gate corresponding to a generic transition between two input patterns. It is worth noting, however, that we implicitly made the assumption that all switching inputs have the same arrival times. Unfortunately, in actual circuits internal signals are in general slightly misaligned in time.

Though misaligned transitions have a sizable effect on power consumption and current flows, they have never been modeled at the gate level, for two reasons. First, they elude any pre-characterization attempt due to the intractable number of possible combinations of signal skews. Second, the corresponding current waveforms are no longer shaped as triangular pulses.

In [12], we faced this problem in the context of power simulation and we proposed an effective solution based on the following observation: a misaligned transition of two input signals can be viewed as an intermediate situation between two limiting cases: a simultaneous double transition, and a sequence of two disjoint single transitions.

Figure 3: Approximation of the current pulse associated with a misaligned double transition at the inputs of the NOR gate of Fig. 1a.

Since PPP provided accurate power estimates in both the limiting cases, we decided to use linear interpolation between them to estimate the actual power consumption. We apply the same approach to current waveforms.

Consider, for instance, a misaligned transition between abc = 100 and abc = 010 at the inputs to the NOR gate of Fig. 1a. In particular, we assume a skew of  $\Delta T = 0.2\,\mathrm{ns}$  between the falling edge of a and the rising edge of b, giving rise to the intermediate temporary pattern abc = 000. Fig. 3 reports the current waveforms corresponding to the two limiting situations. **A**, **B** and **C** are used to denote the current pulses associated with transitions  $100 \rightarrow 000$, $000 \rightarrow 010$  and  $100 \rightarrow 010$ , respectively. According to the interpolation criterion proposed in [12], the overall charge drawn by the cell during the misaligned transitions is given by:

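$$\Delta Q = (\Delta Q^{\mathbf{A}} + \Delta Q^{\mathbf{B}}) \frac{\Delta T}{T^{\mathbf{A}}} + \Delta Q^{\mathbf{C}} \frac{T^{\mathbf{A}} - \Delta T}{T^{\mathbf{A}}}, \qquad (3)$$

where  $T^{\mathbf{A}}$  is the duration of pulse **A** and the two transitions are assumed to be disjoint if the corresponding current pulses do not overlap ( $\Delta T \geq T^{\mathbf{A}}$ ). The weights are arranged so that Eq. (3) reduces to the two limiting cases: it gives  $\Delta Q^{\mathbf{C}}$  for a simultaneous double transition ( $\Delta T = 0$ ) and  $\Delta Q^{\mathbf{A}} + \Delta Q^{\mathbf{B}}$  when the pulses just cease to overlap ( $\Delta T = T^{\mathbf{A}}$ ).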

To estimate the shape of the current pulse we must follow an event-driven approach, because our logic-level simulation model is inherently event-driven. First, notice that the second input event cannot affect the current behavior before its arrival time. Thus, when a switches we do not have any information on the future event on b. At time  $t_0^{\mathbf{A}}$  the triangular pulse (**A**) provided by the pre-characterized cell-model for a single transition of a is added to the overall current<sup>1</sup>. Let **A** have duration  $T^{\mathbf{A}} = 1.1\,\mathrm{ns}$ . When b switches, the overlapping of the two transitions is easily detected by comparing its arrival time  $(t_0^{\mathbf{B}})$  with  $t_0^{\mathbf{A}} + T^{\mathbf{A}}$ .

Instead of adding **B** to the overall current, the interpolation procedure is then invoked and a new current pulse (namely, **D**) is constructed having the time parameters of **B** and a peak value  $I_p^{\mathbf{D}}$  such that its area corresponds to the difference between the interpolated value of  $\Delta Q$  and the already considered charge  $\Delta Q^{\mathbf{A}}$ :

$$T_r^{\mathbf{D}} = T_r^{\mathbf{B}}, \quad T^{\mathbf{D}} = T^{\mathbf{B}}, \quad I_p^{\mathbf{D}} = \frac{2(\Delta Q - \Delta Q^{\mathbf{A}})}{T^{\mathbf{B}}}. \qquad (4)$$

Notice that  $I_p^{\mathbf{D}}$  does not represent a real current and it may also take negative values. Nevertheless, the overall current estimate is a good approximation of the actual behavior provided by electrical simulation, as shown in Fig. 3 by pulse E. The intuition behind this procedure is that when b arrives, we correct the error made by scheduling the full current pulse upon the arrival of a.
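The construction of the correction pulse **D** in Eq. (4) can be sketched as follows. The interpolated charge  $\Delta Q$  is taken as an input here; the class and function names are hypothetical, and only  $\Delta T = 0.2$ ns and  $T^{\mathbf{A}} = 1.1$ ns come from the example in the text, while every other number is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    t0: float        # start time (ns)
    t_rise: float    # T_r (ns)
    duration: float  # T (ns)
    i_peak: float    # I_p (may be negative for a correction pulse)

    @property
    def charge(self):
        """Signed area of the triangle."""
        return 0.5 * self.duration * self.i_peak


def overlaps(first, second):
    """True if `second` starts before `first` has ended."""
    return second.t0 < first.t0 + first.duration


def correction_pulse(pulse_a, pulse_b, delta_q):
    """Pulse D of Eq. (4): timing of B, area delta_q minus the charge of A."""
    i_peak_d = 2.0 * (delta_q - pulse_a.charge) / pulse_b.duration
    return Pulse(pulse_b.t0, pulse_b.t_rise, pulse_b.duration, i_peak_d)


pulse_a = Pulse(t0=0.0, t_rise=0.3, duration=1.1, i_peak=2.0)   # charge 1.1
pulse_b = Pulse(t0=0.2, t_rise=0.25, duration=0.9, i_peak=1.0)  # charge 0.45

# Hypothetical interpolated charge for the misaligned double transition.
pulse_d = correction_pulse(pulse_a, pulse_b, delta_q=1.3)

# Limiting case: if delta_q equals the sum of the two charges, D coincides with B.
pulse_d_disjoint = correction_pulse(pulse_a, pulse_b,
                                    pulse_a.charge + pulse_b.charge)

# If the interpolated charge is below the charge of A, the peak is negative.
pulse_d_negative = correction_pulse(pulse_a, pulse_b, delta_q=1.0)
```

In every case the charges of **A** and **D** add up to the interpolated  $\Delta Q$ , which is the property the procedure is designed to preserve.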

### 3 Event-driven simulation

During simulation, the current pulses of each gate are composed in order to obtain the overall supply current. We use a simple labeling mechanism to represent the power distribution network: gates fed by the same supply route are associated with the same label. Whenever an event occurs at the input to a gate, the corresponding current pulse is then added both to the overall supply current and to the current flow of the route corresponding to its label.

Pulse composition is the key step involved in current simulation and has a strong impact on the global performance. Since current pulses are not directly handled by traditional event-driven simulators, we propose an efficient algorithm for pulse management.

#### Pulse composition

Consider a current pulse  $I(t)$  starting at time  $t_0$  and having duration  $T$ . Since the pulse is a time-continuous function, adding it to the overall current  $I_{tot}(t)$  would affect  $I_{tot}(t)$  for every  $t \in [t_0, t_0 + T]$ , thus involving a number of operations related to the ratio between  $T$  and the time resolution. Notice however that, since we approximate  $I(t)$  with a triangular pulse (with parameters  $I_p$ ,  $T_r$ ,  $T$ ), it can be described by looking at the instantaneous changes of its slope, occurring at times  $t_0$ ,  $t_0 + T_r$  and  $t_0 + T$ . Starting from this information, a three step algorithm can then be used to construct the pulse:

1. at  $t_0$  change the slope by  $I_p/T_r$ ;
2. at  $t_0 + T_r$  change the slope by  $-I_p/T_r - I_p/(T - T_r)$ ;
3. at  $t_0 + T$  change the slope by  $I_p/(T - T_r)$ .

The three instantaneous changes in the current slope actually represent an impulsive function that is the second order derivative of the triangular pulse. Adding impulsive functions is no longer a time-continuous operation. In an event-driven context, second order derivatives of current pulses can be added by following the three step paradigm of the above algorithm. Corresponding to an input event occurring at time  $t_0$ , the three slope changes of the new current pulse are added to the overall second order derivative at time  $t_0$ ,  $t_0 + T_r$  and  $t_0 + T$ .

On the other hand, due to the linearity of derivation  $(\mathcal{D})$  and integration  $(\mathcal{I})$ , the following property holds for any pair of functions  $f(t)$  and  $g(t)$ :

$$f(t) + g(t) = \mathcal{I}\{\mathcal{D}\{f(t)\} + \mathcal{D}\{g(t)\}\}. \qquad (5)$$

During simulation, we can then construct the second order derivative of the overall current by adding the slope changes of the current pulses. The current waveform can then be obtained at the end of the simulation run by integrating twice with initial conditions  $I_{tot}(0) = 0$  and  $\mathcal{D}\{I_{tot}\}(0) = 0$ .

In summary, adding a pulse to the overall current entails only three additions (in the worst case). No additional events are generated. Integration is performed off-line, once and for all. As a consequence, current monitoring does not impose substantial performance degradation on logic simulation.
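The pulse composition scheme can be sketched as follows: each pulse contributes three slope changes to an accumulator, and the waveform is recovered afterwards by integrating twice from zero initial conditions. This is a minimal illustration with hypothetical names and values, not the C implementation; times are integer picoseconds so that coincident breakpoints share one entry, and one such accumulator could be kept per supply-route label.

```python
from collections import defaultdict


class CurrentAccumulator:
    """Accumulates the second derivative of the total current."""

    def __init__(self):
        self.slope_changes = defaultdict(float)  # time -> change of slope

    def add_pulse(self, t0, t_rise, duration, i_peak):
        """Add a triangular pulse with three additions."""
        up = i_peak / t_rise
        down = i_peak / (duration - t_rise)
        self.slope_changes[t0] += up
        self.slope_changes[t0 + t_rise] -= up + down
        self.slope_changes[t0 + duration] += down

    def waveform(self):
        """Integrate twice, starting from zero current and zero slope.

        Returns the breakpoints (time, current) of the piecewise-linear total.
        """
        points = []
        slope = 0.0
        current = 0.0
        prev_t = None
        for t in sorted(self.slope_changes):
            if prev_t is not None:
                current += slope * (t - prev_t)  # second integration
            slope += self.slope_changes[t]       # first integration
            points.append((t, current))
            prev_t = t
        return points


acc = CurrentAccumulator()
acc.add_pulse(t0=0, t_rise=300, duration=1100, i_peak=2.0)
acc.add_pulse(t0=200, t_rise=250, duration=900, i_peak=1.0)
wave = dict(acc.waveform())
peak = max(wave.values())
```

Since the total current is piecewise linear, its maximum is always attained at one of the breakpoints, so the peak can be read directly from the integrated list.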

<sup>1</sup> Hereafter we always refer to time-domain current waveforms. The addition of a pulse to the overall current is to be understood as the sum of the corresponding time-continuous functions.

### 4 Results and conclusions

The algorithms described in this paper have been implemented in C and embedded into PPP [12], which uses VERILOG-XL as simulation platform. According to the model described in Section 2, a low-power CMOS library [13] (including sequential elements and two-level cells) has been characterized using HSPICE to perform electrical simulations.

| Benchmark             | Cells | HSPICE CPU time (s) | PPP CPU time (s) | $I(t)$ error (%) | $I_p(n)$ error (%) | $T(n)$ error (%) |
|-----------------------|-------|--------------|--------|--------------|----------|------|
| C17                   | 6     | 199.4        | 1.8    | 19.3         | 8.7      | 6.1  |
| C432                  | 217   | 7867.4       | 38.8   | 27.2         | 7.6      | 5.4  |
| C499                  | 498   | 21841.8      | 107.0  | 18.2         | 5.6      | 9.1  |
| C880                  | 343   | 17713.6      | 65.2   | 13.8         | 5.9      | 5.7  |
| C7552                 | 2776  | -            | 1239.8 | -            | -        | -    |
| cmb                   | 49    | 974.6        | 8.8    | 14.8         | 6.6      | 2.9  |
| parity                | 75    | 1451.0       | 13.6   | 13.7         | 8.7      | 5.3  |
| comp                  | 163   | 5450.4       | 32.8   | 14.5         | 8.9      | 2.6  |
| alu2                  | 359   | 29222.6      | 67.2   | 16.2         | 8.8      | 5.4  |
| alu4                  | 712   | -            | 112.8  | -            | -        | -    |
| s27                   | 15    | 316.0        | 3.6    | 29.6         | 9.8      | 2.0  |
| s208                  | 80    | 3692.8       | 10.2   | 19.8         | 9.6      | 5.6  |
| s953                  | 371   | 27083.6      | 40.8   | 28.9         | 14.8     | 7.9  |
| s1196                 | 484   | 37358.6      | 70.8   | 15.6         | 5.5      | 5.3  |
| s5378                 | 1409  | -            | 161.2  | -            | -        | -    |

Table 1: Results on benchmark circuits (missing results mean that the corresponding simulation exceeded 10 hours of CPU and/or 20 Mbytes of RAM).

Accuracy and performance have been compared with HSPICE on combinational and sequential benchmark circuits mapped on our test library. Random sequences of 100 test vectors have been applied with a 20ns clock period.

Experimental results are reported in Table 1. The first two columns contain the circuit name and the number of cells used to implement it. Columns three and four report the CPU times required by HSPICE and by PPP, respectively, to run the simulation on a DECstation 5000. The speedup of PPP with respect to HSPICE is about two orders of magnitude or more on every circuit for which both times are available, ranging from a factor of about 90 (s27) to about 660 (s953), while PPP is on average 6 times slower than the simplest gate-level simulation with unit delay.

The last three columns represent the accuracy in terms of three parameters: i) the average absolute error of the time-domain current waveform  $I(t)$  (represented with 0.1ns time resolution), ii) the average absolute error of the peak current per test-pattern  $I_p(n)$ , and iii) the average absolute error of the duration of the overall current pulse per test-pattern  $T(n)$ . The average error of our approach is about 20% for time-domain current waveforms and about 10% for peak current estimates.

The current waveform obtained for benchmark circuit *alu2* is reported in Fig. 4a and compared with that provided by HSPICE. The accuracy of our current estimation is evident.

Peak and average current estimates are compared in Fig. 4b for 50 input transitions applied to circuit C7552. It is worth noting that there is no linear relation between the two measures. Moreover, they take their maximum values for completely different input transitions. This result confirms the importance of peak current estimation, because this quantity cannot be obtained from the average currents provided by traditional power simulators.



Figure 4: a) Comparison between the current waveforms by PPP and HSPICE for benchmark circuit *alu2*. b) Peak *vs.* average current drawn by benchmark circuit *C7552*.

#### Conclusions

In this paper we presented a new symbolic model of current flows in CMOS gates, and a new gate-level current simulator. We have described the three key concepts that enable accurate logic-level current estimation, namely: i) triangular approximation of the current pulses generated by CMOS gates, ii) mixed Boolean and least-squares fitting techniques to model the dependence of current waveforms on I/O conditions, iii) efficient event-driven waveform generation based on second derivatives and off-line integration. Additionally, we leverage the accurate power information provided by PPP. Together, these ideas address the main issues in accurate and efficient current simulation. The current simulator is available for evaluation at the following URL: http://akebono.stanford.edu/users/PPP.

### References

- [1] F. Najm *et al.*, "CREST: a Current Estimator for CMOS Circuits," in *Proc. of IEEE ICCAD*, 1988.
- [2] R. Burch *et al.*, "Pattern Independent Current Estimation for Reliability Analysis of CMOS Circuits," in *Proc. of DAC*, 1988.
- [3] H. Kriplani *et al.*, "Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution," *IEEE Transactions on CAD*, vol. 14, no. 8, 1995.
- [4] S. Chowdhury *et al.*, "Current Estimation in MOS IC Logic Circuits," in *Proc. of IEEE ICCAD*, 1988.
- [5] S. Chowdhury *et al.*, "Estimation of Maximum Currents in MOS IC Logic Circuits," *IEEE Transactions on CAD*, vol. 9, no. 6, 1990.
- [6] U. Jagau, "SIMCURRENT An Efficient Program for the Estimation of the Current Flow of Complex CMOS Circuits," in *Proc. of IEEE ICCAD*, 1990.
- [7] F. Rouatbi *et al.*, "Power Estimation Tool for Sub-Micron CMOS VLSI Circuits," in *Proc. of IEEE ICCAD*, 1992.
- [8] A. Nabavi-Lishi *et al.*, "Delay and Bus Current Evaluation in CMOS Logic Circuits," in *Proc. of IEEE ICCAD*, 1992.
- [9] A. Nabavi-Lishi *et al.*, "Inverter Models of CMOS Gates for Supply Current and Delay Evaluation," *IEEE Transactions on CAD*, vol. 13, no. 10, 1994.
- [10] A. Deng *et al.*, "Time Domain Current Waveform Simulation of CMOS Circuits," in *Proc. of IEEE ICCAD*, 1988.
- [11] A. Deng, "Power Analysis For CMOS/BiCMOS Circuits," in *Proc. of IWLPD*, 1994.
- [12] A. Bogliolo *et al.*, "Accurate Logic Level Power Estimation," in *Proc. of IEEE SLPE*, 1995.
- [13] T. Burd, "Current Estimation in MOS IC Logic Circuits," M.S. Report, UC Berkeley, UCB/ERL M94/89.