### New Clock-Gating Techniques for Low-Power Flip-flops

A.G.M. Strollo, E. Napoli, D. De Caro University of Naples "Federico II" Department of Electronic and Telecommunication Engineering via Claudio, 21 - Naples - Italy +39-0817683125

Email: decaro@diesun.die.unina.it

### ABSTRACT

Two novel low-power flip-flops are presented in this paper. The proposed flip-flops use new gating techniques that reduce power dissipation by deactivating the clock signal. The presented circuits overcome the clock duty-cycle limitation of previously reported gated flip-flops.

Circuit simulations that include parasitics show that a significant reduction of power dissipation is possible if the input signal has reduced switching activity. A 16-bit counter is presented as a simple low-power application.

### Keywords

CMOS digital integrated circuits, flip-flops, low-power circuits, transition probability.

### 1. INTRODUCTION

Low-power techniques are essential in modern VLSI design due to the continuous increase of clock frequency and chip complexity [3]. Various recently proposed techniques achieve low-power operation by reducing signal switching activity [1,4,15]. Such techniques are generally applied to internal nodes with high capacitive load, which contribute heavily to total power dissipation.

In particular, the clock system, composed of flip-flops and the clock distribution network, is one of the most power-consuming subsystems in a VLSI circuit [13]. As a consequence, many techniques have been proposed to reduce clock system power dissipation [5,11,12].

Disabling the clock signal (clock gating) in inactive portions of the chip is a useful approach for power dissipation reduction.

Clock gating can be applied at different hierarchical levels. It is possible to disable the clock signal that drives a large functional unit, reducing power dissipation on both its internal nodes and its clock line [14]. Other papers use clock gating at a finer level of granularity [2,9]. In these cases a single circuit that enables the activity of a whole set of flip-flops is presented.

ISLPED '00, Rapallo, Italy.

Copyright 2000 ACM 1-58113-190-9/00/0007...\$5.00.

Recently it has been shown that clock gating can be successfully applied when a different activation function is generated for each flip-flop [6,8]. Papers [6,8] show that a significant reduction of power consumption is achieved if the switching activity of the flip-flop input signal is sufficiently low. In such cases each flip-flop includes its own gating logic, and hence the introduced overhead must be kept as small as possible.

In [6,8] the use of combinational gating logic is proposed. Unfortunately, correct timing of the flip-flop is guaranteed only if the gating logic is sequential. As a consequence, the solutions proposed in [6,8] need additional effort to avoid timing violations. In [8] a sub-nanosecond pulse generator is used on the clock line, while in [6] severe constraints on the clock duty-cycle are imposed. These solutions can hardly be used when reliable operation is needed (as in the design of leaf cells), since they impose severe timing constraints.

In this paper, two novel low-power flip-flops are presented. The proposed flip-flops use gating techniques to achieve low-power operation and show no limitation on clock duty-cycle.

The first technique [11], referred to as Double Gating in the following, applies gating not to the whole flip-flop, but separately to the master latch and to the slave latch. Although the introduced overhead is doubled in this way, it will be shown that a significant reduction of power dissipation is obtained if the input signal switching activity is low.

The second technique, referred to as NC<sup>2</sup>MOS Gating in the following, uses a single gating logic for the whole flip-flop. The gating logic is sequential, with reduced overhead.

The operating principles of Double Gating and NC<sup>2</sup>MOS Gating are presented in Sections 2 and 3. Simulation results for flip-flops designed up to the layout level are presented in Section 4. The performance of a 16-bit counter built with the proposed flip-flops is presented in Section 5.

### 2. GATING BOTH MASTER AND SLAVE LATCHES (DOUBLE GATING)

The schematic of a gated latch is shown in Figure 1. The latch is positive level-sensitive (it is transparent when ckg=1 and in hold when ckg=0). The comparison between D and Q is performed by an XOR gate, while the gating logic is a simple AND gate.

The operation of the circuit is as follows. If ck is 0, then ckg is also 0 and the latch is correctly in hold state. On the other hand, when ck is high and D is different from Q, the gating logic enables the ckg signal so that the latch can correctly switch. Note that if D is equal to Q the gating logic inhibits the propagation of switching activity from ck to ckg.




Figure 1. Positive level-sensitive latch with clock gating.

A negative level-sensitive clock-gated latch is quite similar to the schematic of Figure 1. The differences are in the gating logic (implemented with an *OR* gate) and in the comparator logic (implemented with an *XNOR* gate).
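The two gated latches can be sketched behaviorally. The following Python fragment is a zero-delay logic model under stated assumptions, not a description of the transistor-level circuit; the function names are illustrative and do not come from the paper.

```python
def gated_clock_pos(ck: int, d: int, q: int) -> int:
    """Gated clock of the positive level-sensitive latch (AND gating, XOR comparator)."""
    comp = d ^ q            # 1 when D differs from Q
    return ck & comp


def gated_clock_neg(ck: int, d: int, q: int) -> int:
    """Gated clock of the negative level-sensitive latch (OR gating, XNOR comparator)."""
    comp = 1 - (d ^ q)      # XNOR: 1 when D equals Q
    return ck | comp


def latch_pos(ck: int, d: int, q: int) -> int:
    """Next state of the positive latch, transparent when ckg = 1."""
    return d if gated_clock_pos(ck, d, q) == 1 else q


def latch_neg(ck: int, d: int, q: int) -> int:
    """Next state of the negative latch, transparent when ckg = 0."""
    return d if gated_clock_neg(ck, d, q) == 0 else q
```

In this model each gated latch produces the same next state as the corresponding ungated latch for every combination of ck, D and Q, while the gated clock stays at its inactive level whenever D equals Q.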

With reference to power dissipation, let us first examine the case in which the input signal switching activity, $\alpha$, is equal to zero. In this case a reduction of power consumption is obtained if the capacitive load introduced on the clock line by the gating logic ($C_{Ck}$) is lower than the capacitance on the *ckg* node ($C_{Ckg}$). When $\alpha$ increases, the power consumption overhead introduced by the comparator and the gating logic also increases. Let us define $\alpha_{lim}$ as the switching activity value at which gated and non-gated latches dissipate equal power. For $\alpha > \alpha_{lim}$ gated latches are useless, as they dissipate more power. This is a common characteristic of gated latches and flip-flops, which are therefore best suited for applications in which data switching activity is low [6,8,11].
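The crossover activity $\alpha_{lim}$ can be illustrated with a simple sketch. It assumes, as a simplification not stated in the paper, that the power of both latches grows linearly with $\alpha$; the function name, parameter names and the numbers in the usage note are hypothetical.

```python
def alpha_lim(p_gated_idle: float, slope_gated: float,
              p_ref_idle: float, slope_ref: float) -> float:
    """Activity at which a gated and a non-gated latch dissipate equal power.

    Assumed linear model (an illustration, not taken from the paper):
        P_gated(alpha) = p_gated_idle + slope_gated * alpha
        P_ref(alpha)   = p_ref_idle   + slope_ref   * alpha
    """
    if slope_gated <= slope_ref:
        raise ValueError("the gated curve never crosses the reference curve")
    return (p_ref_idle - p_gated_idle) / (slope_gated - slope_ref)
```

With hypothetical values in arbitrary units, `alpha_lim(30.0, 250.0, 100.0, 50.0)` gives 0.35. A negative result means that the gated latch dissipates more power even when the input is idle, so gating never pays off.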

It is worth noting that the approach shown in Figure 1 cannot be applied to an edge-triggered flip-flop. In that case, in fact, a change of D while ck is high causes a transition of ckg, which triggers the flip-flop.
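The spurious trigger can be reproduced with a small sample-based sketch. It assumes a zero-delay AND/XOR gating logic feeding a hypothetical positive edge-triggered flip-flop, and it holds Q constant so that the unwanted edge is isolated; the waveforms are invented for illustration.

```python
def rising_edges(samples):
    """Indices at which a 0 -> 1 transition occurs in a list of 0/1 samples."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] == 0 and samples[i] == 1]


# ck is high during samples 1..4; D changes at sample 3, while ck is still high.
ck = [0, 1, 1, 1, 1, 0, 0]
d = [0, 0, 0, 1, 1, 1, 1]
q = 0  # flip-flop state, held constant in this sketch

# Combinational gating: ckg = ck AND (D XOR Q)
ckg = [c & (x ^ q) for c, x in zip(ck, d)]

ck_edges = rising_edges(ck)     # the only real clock edge
ckg_edges = rising_edges(ckg)   # the edge actually seen by the flip-flop
```

Here the clock rises at sample 1, but the gated clock rises at sample 3, when D changes. An edge-triggered flip-flop clocked by ckg would therefore be activated by the data input rather than by the clock.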

In previous papers this problem with gated flip-flops has been avoided by allowing D to change only when ck is 0 (for AND gating logic) or when ck is 1 (for OR gating logic). This is done both in [6], where a timing constraint on the clock duty-cycle is imposed, and in [8], where a finely tuned sub-nanosecond pulse generator is used on the clock line. In both cases it is necessary to use asymmetrical clock signals, which are difficult to control on chip.

The Double Gating, presented in this paper, overcomes the limitations shown by previously proposed flip-flop gating techniques.

A general implementation of Double Gating is presented in Figure 2. The technique uses two gated latches in a master-slave configuration [11]. In Figure 2 the first gated latch is positive level-sensitive, and so is AND gated with an XOR comparator, while the second one is negative level-sensitive. Since, as previously shown, gated latches do not present timing failure problems, the technique yields a reliable gated flip-flop that can be used with any clock duty-cycle and is suitable for standard cell design.
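The master-slave arrangement can be sketched with a zero-delay, sample-by-sample model. This is only a behavioral illustration under that assumption; the function name, the initial states and the test waveforms are hypothetical.

```python
def double_gated_ff(ck_samples, d_samples, qm=0, q=0):
    """Zero-delay model of a Double Gated, negative edge-triggered flip-flop.

    qm is the master latch output, q the slave latch output.
    Returns the list of Q values, one per input sample.
    """
    out = []
    for ck, d in zip(ck_samples, d_samples):
        # Master: positive level-sensitive, AND gating with XOR comparator.
        ckg_m = ck & (d ^ qm)
        if ckg_m == 1:
            qm = d
        # Slave: negative level-sensitive, OR gating with XNOR comparator.
        ckg_s = ck | (1 - (qm ^ q))
        if ckg_s == 0:
            q = qm
        out.append(q)
    return out


ck = [1, 1, 0, 0, 1, 1, 0, 0]
d = [1, 0, 0, 1, 1, 1, 1, 0]
q_trace = double_gated_ff(ck, d)
```

In this run Q changes only at sample 6, where ck falls, and it takes the value that D had while ck was high. The changes of D at samples 3 and 7, while ck is low, do not reach the output.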

#### 2.1 Implementation

Different flip-flops have been designed using different logic styles for the comparator and the gating logic (Transmission Gates logic, CPL, full CMOS logic).

Figure 3 shows the schematic of the most effective structure. The caption reports transistor sizing for a $0.8\,\mu$m technology.

The circuit of Figure 3 has been obtained from the scheme of Figure 2 by joining the gating logic and the comparator in a single complex CMOS gate. This is a substantial improvement since, in this way, the signal *comp* and its power dissipation are eliminated.

Figure 2. *Double Gated* flip-flop schematic for negative edge-triggered flip-flops.

The flip-flop of Figure 3 requires 42 devices, four of which are driven by the *ck* signal.

### 3. SEQUENTIAL GATING (NC<sup>2</sup>MOS GATING)

A reliable flip-flop can also be realized using the schematic of Figure 4, where a negative edge-triggered flip-flop is shown. We name this gating technique NC<sup>2</sup>MOS Gating, as the gating logic is implemented through an NC<sup>2</sup>MOS circuit.

Let us examine the gating logic. When *ck*=0, node $\overline{ckg}$ is pulled high by M1. During the subsequent high level of the *ck* signal, two different situations are possible.

- If *comp* is, or becomes, equal to 1 because *D* is different from *Q*, node $\overline{ckg}$ is pulled down. Afterwards, when *ck* goes to 0, a positive edge of $\overline{ckg}$ is produced by M1.
- If *comp* is always equal to 0 while *ck*=1, node $\overline{ckg}$ is not pulled down, and hence no active edge of *ckg* is produced when *ck* goes to 0. For *ck*=1 and *comp*=0 the gating logic is in a memory state that is made static by the inverter *I1*.



Figure 3. Proposed *Double Gated* flip-flop. The latches are simple static transmission-gate circuits. NMOS and PMOS aspect ratios are $W/L = 3.6 \mu m/0.8 \mu m$ and $W/L = 7.2 \mu m/0.8 \mu m$, respectively.



Figure 4. *NC<sup>2</sup>MOS Gated* flip-flop schematic for negative edge-triggered flip-flops.

The gating logic succeeds in disabling the flip-flop clock signal when *D* is equal to *Q* and does not exhibit timing problems. As a matter of fact, the structure uses a pull-up net for the $\overline{ckg}$ node that is realized by a single PMOS driven by the *ck* signal. In this way only the clock *ck*, and not *comp*, can drive negative edges on *ckg*, so that the flip-flop can be activated only by *ck*.
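The behavior of the sequential gating logic can be sketched as a small state model. The sketch assumes zero delays and treats the $\overline{ckg}$ node as an ideal memory element when it is neither pulled up nor pulled down; the function name and the waveforms are hypothetical.

```python
def nc2mos_ff(ck_samples, d_samples, q=0):
    """Zero-delay model of an NC2MOS gated, negative edge-triggered flip-flop.

    Returns (q_trace, ckg_bar_trace), one value per input sample.
    """
    ckg_bar = 1                 # node is high after a low phase of ck
    q_trace, ckg_bar_trace = [], []
    for ck, d in zip(ck_samples, d_samples):
        comp = d ^ q            # comparator output: 1 when D differs from Q
        prev = ckg_bar
        if ck == 0:
            ckg_bar = 1         # pull-up through the PMOS driven by ck
        elif comp == 1:
            ckg_bar = 0         # pull-down: ck = 1 and D differs from Q
        # otherwise (ck = 1, comp = 0) the node keeps its value
        if prev == 0 and ckg_bar == 1:
            q = d               # positive edge of ckg_bar clocks the flip-flop
        q_trace.append(q)
        ckg_bar_trace.append(ckg_bar)
    return q_trace, ckg_bar_trace


ck = [0, 1, 1, 0, 0, 1, 0]
d = [0, 0, 1, 1, 1, 1, 1]
q_trace, ckg_bar_trace = nc2mos_ff(ck, d)
```

In this run D changes at sample 2 while ck is high, which only pulls the node low. The active edge is produced at sample 3, when ck falls, and from then on the node stays high because D equals Q.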

The schematic of the NC<sup>2</sup>MOS Gating logic for positive edge-triggered flip-flops is quite similar. The differences are in the pull-down net, which is composed of a single NMOS driven by ck, and in the pull-up net, which consists of two series PMOS transistors, one driven by ck and the other driven by $\overline{comp}$.

#### 3.1 Implementation

The flip-flops have been implemented using different logic styles and optimized for low-power operation. Figure 5 shows the schematic of the most effective structure. The caption reports transistor sizing for a 0.8 µm technology.

Figure 5. Proposed NC<sup>2</sup>MOS flip-flop. The latches are simple static transmission-gate circuits. Weak transistors are indicated with \*; their sizing is $W/L = 1.2 \mu m/4.8 \mu m$. The MOS transistors driven by *ckg* in the *I1* inverter are minimum sized ($W/L = 1.2 \mu m/0.8 \mu m$). The remaining PMOS and NMOS sizing is $W/L = 7.2 \mu m/0.8 \mu m$ and $W/L = 3.6 \mu m/0.8 \mu m$, respectively.

Figure 6. NC<sup>2</sup>MOS flip-flop using pass-transistor logic for the comparator. Gating logic transistor sizing is the same as in Figure 5, and the NMOS driven by *comp* has $W/L = 3.6\mu m/0.8\mu m$. Comparator PMOS and NMOS sizing is $W/L = 7.2\mu m/0.8\mu m$ and $W/L = 3.6\mu m/0.8\mu m$, respectively.

The circuit of Figure 5, derived from the scheme of Figure 4, replaces the NMOS gated by *comp* with an NMOS net that performs the comparison directly. This improvement allows a significant reduction of power dissipation, due to the reduced transistor count and to the elimination of the signal *comp*. Note that the weak devices of the feedback inverter, *I1*, are composed of two series transistors, one of which is permanently ON. In this way the MOSFETs driven by the *ckg* signal in the *I1* inverter can be minimum sized, minimizing the switching capacitance of the *ckg* node and its power dissipation. Weak devices are sized to guarantee that the overall W/L ratio of the *I1* inverter is low. The flip-flop of Figure 5 requires 28 transistors, only two of which are driven by the *ck* signal.

A different implementation of the NC<sup>2</sup>MOS Gated flip-flop is presented in Figure 6, where simple pass-transistor logic is used for the comparator, yielding a transistor count of 27. In this case the reduced logic swing of the signal *comp* and the lower transistor count provide a small improvement in power dissipation with respect to the circuit of Figure 5. The main drawbacks of the implementation of Figure 6 are the larger overhead on area occupation and the increased flip-flop latency due to the slow driving of the signal *comp*.

Figure 7. Proposed layout for the NC<sup>2</sup>MOS Gated Flip-flop presented in Figure 5.

Figure 8. SPICE simulation of the proposed flip-flops.

### 4. SIMULATIONS AND RESULTS

The proposed circuit implementations of Figure 3 and Figure 5 have been designed up to the layout level for a $0.8\,\mu$m technology with a 5 V supply voltage. As an example, the layout of the circuit of Figure 5 is presented in Figure 7.

The proposed flip-flops are compared with a conventional non-gated flip-flop. The comparison circuit is a master-slave flip-flop with simple static transmission-gate latches. An inverter has been included in the flip-flop to produce the inverted clock signal. Transistor sizing for the master and slave latches is the same as that used in Figure 3 and Figure 5.

Simulation results have been obtained through SPICE simulation of the circuits extracted from the layout, with the inclusion of parasitics. The SPICE simulation of Figure 8 shows correct circuit operation for both flip-flops of Figure 3 and Figure 5. No timing failure is present, even with a 50% duty-cycle clock signal and in the presence of glitches on the D input.

Power consumption has been calculated using the test circuit of Figure 9. Sizing of inverters A and B is $W/L = 7.2\mu m/0.8\mu m$ and $W/L = 3.6\mu m/0.8\mu m$ for PMOS and NMOS, respectively. Further, their drain capacitances are set to zero in order to minimize the influence of the measurement inverters on flip-flop power dissipation. The clock frequency is 50 MHz. D is a periodic waveform with a given switching activity. Power consumption is calculated as:

$$P = \frac{1}{T^*} \int_{T^*} V_{DD} \, I_{DD}(t) \, dt \tag{1}$$

where $T^*$ is a whole period of the D signal.
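As an illustration of Equation (1), the average power can be estimated numerically from a sampled supply current. The sketch below uses the trapezoidal rule, which is a choice made here and not something stated in the paper; the function name and the sample values in the usage note are hypothetical.

```python
def average_power(t, i_dd, v_dd=5.0):
    """Average power over [t[0], t[-1]] from sampled supply current.

    t    : sample times in seconds, strictly increasing
    i_dd : supply current samples in amperes, same length as t
    v_dd : supply voltage in volts (5 V matches the supply used in the paper)
    """
    if len(t) != len(i_dd) or len(t) < 2:
        raise ValueError("need at least two matching samples")
    energy = 0.0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        # Trapezoidal rule on V_DD * I_DD over one sample interval
        energy += v_dd * 0.5 * (i_dd[k] + i_dd[k - 1]) * dt
    return energy / (t[-1] - t[0])
```

For example, a constant current of 1 mA sampled over 20 ns, `average_power([0.0, 1e-8, 2e-8], [1e-3, 1e-3, 1e-3])`, gives 5 mW. In practice the time window should cover a whole period of the D signal, as required by the definition of $T^*$.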

Different simulations have been performed varying  $\alpha$ .



Figure 9. Test circuit for power consumption measurement. *Clk* is a 50 MHz clock signal while *In* produces a periodic waveform with given switching activity.



Figure 10. Power dissipation as a function of input signal switching activity. Solid lines refer to gated flip-flops of Figure 3 and Figure 5. Dotted line refers to the comparison flip-flop. Clock frequency is 50MHz.

Simulation results are shown in Figure 10. Power consumption is reported as a function of the input signal switching activity, assuming that the input signal has no glitches. The switching activity, indicated as $\alpha$, is defined as the average number of transitions of *D* in a clock cycle. Note that, according to this definition, the maximum switching activity for a glitch-free signal is 1, while the switching activity can be greater than 1 if the presence of glitches is considered. The switching activity of a signal that behaves as a clock signal is equal to 2.
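The definition of $\alpha$ can be made concrete with a short sketch that counts transitions in a sampled waveform. The function name and the sample waveforms are hypothetical; the waveforms below are sampled twice per clock cycle over four cycles.

```python
def switching_activity(d_samples, n_cycles):
    """Average number of transitions of D per clock cycle."""
    transitions = sum(1 for a, b in zip(d_samples, d_samples[1:]) if a != b)
    return transitions / n_cycles


clock_like = [0, 1, 0, 1, 0, 1, 0, 1, 0]   # two transitions per cycle
toggling = [0, 1, 1, 0, 0, 1, 1, 0, 0]     # one transition per cycle
idle = [0, 0, 0, 0, 0, 0, 0, 0, 0]         # no transitions

alpha_clock = switching_activity(clock_like, 4)
alpha_toggle = switching_activity(toggling, 4)
alpha_idle = switching_activity(idle, 4)
```

The three waveforms give $\alpha$ equal to 2, 1 and 0, which matches the clock-like, maximum glitch-free and idle cases described above.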

The highest reduction of power dissipation is achieved when D is idle. In this case the power saving with respect to the non-gated flip-flop is 64% for the Double Gated flip-flop and 74% for the NC<sup>2</sup>MOS Gated flip-flop. Moreover, it is worth pointing out that $\alpha_{\text{lim}}$ for the Double Gated flip-flop (0.17) is lower than $\alpha_{\text{lim}}$ for the NC<sup>2</sup>MOS Gated flip-flop (0.36).

The better performance of the NC<sup>2</sup>MOS Gated flip-flop with respect to the Double Gated one is due to its reduced complexity, and hence to the smaller overhead introduced both on the $C_{Ck}$ capacitance and on the gating and comparator power dissipation. The smaller overhead on the $C_{Ck}$ capacitance reduces power dissipation for $\alpha$=0. The smaller overhead on the gating and comparator logic reduces the slope of the plot in Figure 10 and hence provides a higher $\alpha_{lim}$.



Figure 11. Test circuit for timing parameter measurement. The block indicated as $\Delta$ is a delay element used to produce a skew between Ck1 and Ck2. The clock source (Clk) has a variable period (T). The inverters have $W/L = 3.6 \mu m/0.8 \mu m$ for NMOS and $W/L = 7.2 \mu m/0.8 \mu m$ for PMOS. The combinational logic is implemented with two series inverters.



Figure 12. Power dissipation and area occupation for proposed flip-flops and comparison flip-flop.  $\alpha$  is the input signal switching activity.

The proposed gated flip-flops exhibit reduced minimum power dissipation and a significant dependence of power dissipation on the flip-flop input pattern. This behavior is common to every gated flip-flop. A similar behavior is also present in other fully static flip-flop structures, such as differential RAM flip-flops [10].

The flip-flop timing parameters have been measured using the test circuit reported in Figure 11. The circuit provides a measure of setup and hold times based on their effect on maximum clock skew and maximum clock frequency.

First, $\Delta$ is set to zero and, since the setup time limits the minimum clock period, *T* is decreased to find the minimum value of *T* that guarantees correct circuit operation (*T<sub>min</sub>*). The setup time is measured as the time interval between the active edge of *Ck2* and the edge of *D2* when *T*=*T<sub>min</sub>*.

For the hold time measurement, since the hold time limits the maximum allowed clock skew, *T* is set so as to produce no setup time failure ($T=5 \cdot T_{min}$), and $\Delta$ is increased to find the maximum value of $\Delta$ that guarantees correct circuit operation ($\Delta_{max}$). The hold time is measured as the time interval between the active edge of *Ck2* and the edge of *D2* when $\Delta = \Delta_{max}$.
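The two searches described above can be automated in several ways. The sketch below uses bisection, which is a substitute for the stepwise decrease of *T* and increase of $\Delta$ described in the text, and it assumes that correct operation changes to failure at a single boundary. The `passes` callable stands in for a circuit simulation run and is hypothetical, as are the threshold values in the usage lines.

```python
def bisect_boundary(passes, good, bad, tol):
    """Locate the pass/fail boundary between a passing and a failing value.

    passes : callable returning True when the circuit operates correctly
    good   : a value known to pass
    bad    : a value known to fail
    tol    : required resolution
    Returns the passing value closest to the boundary, within tol.
    """
    while abs(good - bad) > tol:
        mid = 0.5 * (good + bad)
        if passes(mid):
            good = mid
        else:
            bad = mid
    return good


# Stand-in predicates replacing simulator runs (values in ns, purely illustrative).
t_min = bisect_boundary(lambda t: t >= 3.2, good=20.0, bad=0.0, tol=1e-3)
delta_max = bisect_boundary(lambda dl: dl <= 1.5, good=0.0, bad=10.0, tol=1e-3)
```

The same routine serves both measurements because only the direction of the search differs: *T<sub>min</sub>* is approached from above and $\Delta_{max}$ from below.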

Table 1 presents setup time, hold time and clock-to-Q delay for the proposed and non-gated flip-flops.

Gated flip-flops present a significant increase of setup time and clock-to-Q delay. This is due to the presence of the gating logic, which causes a propagation delay between the clock and gated clock signals. As an example, the setup time of a gated flip-flop is greater than that of the non-gated one because, when D changes, an active clock edge can safely sample the D input into the flip-flop only after the D transition has propagated through the comparator and the gating logic. A similar analysis can be conducted for the clock-to-Q delay. Degraded setup time and clock-to-Q delay are typical of gated structures and have already been reported [8].

Table 1. Timing parameters for the proposed gated flip-flops (Figure 3 and Figure 5) and for the non-gated comparison flip-flop. Latency is defined as the sum of setup time and Ck-to-Q delay.

| Flip-flop                  | Setup (ns) | Hold (ns) | Ck to Q (ns) | Latency (ns) |
|----------------------------|------------|-----------|--------------|--------------|
| Double Gated FF            | 1.40       | -1.04     | 1.35         | 2.75         |
| NC<sup>2</sup>MOS Gated FF | 1.07       | -1.01     | 0.98         | 2.05         |
| Non-gated FF               | 0.41       | -0.26     | 0.57         | 0.98         |

On the other hand, gated flip-flops, owing to their more negative hold time (Table 1), are more robust with respect to clock skew.
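The latency figures of Table 1 and their ratio to the non-gated flip-flop can be checked with a few lines of arithmetic. The dictionary layout and names are illustrative; the numbers are copied from Table 1.

```python
# Setup, hold and clock-to-Q times in ns, copied from Table 1.
timing = {
    "Double Gated FF": {"setup": 1.40, "hold": -1.04, "ck_to_q": 1.35},
    "NC2MOS Gated FF": {"setup": 1.07, "hold": -1.01, "ck_to_q": 0.98},
    "Non-gated FF": {"setup": 0.41, "hold": -0.26, "ck_to_q": 0.57},
}


def latency(ff):
    """Latency as defined in Table 1: setup time plus clock-to-Q delay."""
    return ff["setup"] + ff["ck_to_q"]


reference = latency(timing["Non-gated FF"])
ratios = {name: latency(ff) / reference for name, ff in timing.items()}
```

The resulting latency ratios are about 2.8 for the Double Gated flip-flop and about 2.1 for the NC<sup>2</sup>MOS Gated flip-flop.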

Figure 12 summarizes the performance of the proposed flip-flops: power dissipation and area occupation. Note the substantial reduction of power dissipation for $\alpha$=0. As previously noted, the reduced complexity of the NC<sup>2</sup>MOS Gated flip-flop results in a smaller area overhead with respect to the Double Gated flip-flop. As a matter of fact, the NC<sup>2</sup>MOS area occupation is only 1.45 times higher than the area occupation of the comparison flip-flop.

### 5. AN APPLICATION: LOW POWER BINARY COUNTER

The proposed structures are suited for applications with reduced switching activity. An example is a binary counter, in which the input switching activity of each flip-flop is known beforehand and is equal to $2^{-k}$ for the k-th bit. Three 16-bit counters have been designed to the layout level and simulated. The Double Gated and NC<sup>2</sup>MOS low-power counters use Double Gated and NC<sup>2</sup>MOS flip-flops, respectively. The third counter uses only non-gated flip-flops. For the low-power counters, the flip-flop switching activity suggests the use of standard flip-flops for the lower-order bits (with high switching activity) and of clock-gated flip-flops for the remaining bits. As an example, the NC<sup>2</sup>MOS counter uses non-gated flip-flops for bits 0 and 1.
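The choice of which counter bits to leave ungated can be sketched by comparing each bit's activity $2^{-k}$ with $\alpha_{lim}$. The threshold rule and the function names are assumptions made for illustration; the paper does not state that this exact rule was used. The $\alpha_{lim}$ values 0.36 and 0.17 are those reported in Section 4.

```python
def bit_activity(k: int) -> float:
    """Input switching activity of the k-th counter bit (2**-k, as stated above)."""
    return 2.0 ** -k


def non_gated_bits(n_bits: int, alpha_lim: float):
    """Bits whose activity exceeds alpha_lim, for which gating would not pay off."""
    return [k for k in range(n_bits) if bit_activity(k) > alpha_lim]


nc2mos_plain = non_gated_bits(16, 0.36)        # alpha_lim of the NC2MOS Gated FF
double_gated_plain = non_gated_bits(16, 0.17)  # alpha_lim of the Double Gated FF
```

For the NC<sup>2</sup>MOS counter the rule selects bits 0 and 1, in agreement with the text. For the Double Gated counter it selects bits 0 to 2, which is only what this rule suggests and is not a split reported in the paper.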

Simulation results are shown in Figure 13. The use of clock-gated flip-flops results in a power saving of up to 52% for the NC<sup>2</sup>MOS 16-bit counter. In both low-power counters the reduced clock capacitance of the gated flip-flops allows the use of a smaller clock buffer, yielding a further reduction of total power dissipation.

### 6. CONCLUSIONS

In this paper two new flip-flop gating techniques that overcome the limitations of previously presented structures have been presented. The proposed flip-flops are particularly reliable, as they do not require any constraint on clock duty-cycle.

The proposed circuits have been designed up to the layout level in a $0.8\,\mu$m, 5 V technology. Numerical simulations of the circuits extracted from the layout, with the inclusion of parasitics, show that a significant reduction of power dissipation is obtained if the input signal switching activity is low. As an example, the NC<sup>2</sup>MOS Gated flip-flop provides a 74% reduction of power dissipation when the input signal is idle.



The main drawback of the proposed approach is the reduction of timing performance. With respect to the comparison flip-flop, latency is roughly doubled for the NC<sup>2</sup>MOS Gated flip-flop (2.05 ns versus 0.98 ns) and almost tripled for the Double Gated flip-flop (2.75 ns). It is worth pointing out that this problem has already been reported for gated flip-flops in previous papers [8].

In conclusion, the proposed flip-flops, like all gated ones, are best suited for applications in which input activity is kept low and timing is not a crucial factor. In a 16-bit counter the use of the new clock-gated flip-flops results in a power saving of up to 52%.

### 7. REFERENCES

- [1] Alidina, M., J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou, "Precomputation-Based Sequential Logic Optimization for Low Power", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2(4), 426-436 (December, 1994).
- [2] Benini, L., and G. De Micheli, "Automatic Synthesis of Low-Power Gated-Clock Finite-State-Machines", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 15(6), 630-643 (June, 1996).
- [3] Chandrakasan, A. P., S. Sheng, and R.W. Brodersen, "Low-Power CMOS Digital Design", *IEEE Journal of Solid-State Circuits*, 27(4), 473-483 (April, 1992).
- [4] Kapadia, H., L. Benini, and G. De Micheli, "Reducing Switching Activity on Datapath Buses with Control-Signal Gating", *IEEE Journal of Solid-State Circuits*, 34(3), 405-414 (March, 1999).
- [5] Kawaguchi, H., and T. Sakurai, "A reduced clock swing flip-flop (RCSFF) for 63% power reduction", *IEEE Journal of Solid-State Circuits*, 33(5), 807-811 (May, 1998).
- [6] Lang, T., E. Musoll, and J. Cortadella, "Individual Flip-Flops with Gated Clocks for Low Power Datapaths", *IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing*, 44(6), 507-516 (June, 1997).

- [7] Najm, F. N., "A survey of power estimation techniques in VLSI circuits", *IEEE Trans. on VLSI Systems*, 2(4), 446-455 (December, 1994).
- [8] Nogawa, M., and Y. Ohtomo, "A Data-Transition Look-Ahead DFF Circuit for Statistical Reduction in Power Consumption", *IEEE Journal of Solid-State Circuits*, 33(5), 702-706 (May, 1998).
- [9] Raghavan, N., V. Akella, and S. Bakshi, "Automatic Insertion of Gated Clocks at Register Transfer Level", *12th Intern. Conf. On VLSI Design*, 48-54 (January, 1999).
- [10] Stojanovic, V., and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems", *IEEE Journal of Solid-State Circuits*, 34(4), 536-548 (April, 1999).
- [11] Strollo, A.G.M., and D. De Caro, "New low power flip-flop with clock gating on master and slave latches", *Electronics Letters*, (4), 294-295 (February, 2000).
- [12] Strollo, A.G.M., E. Napoli, and C. Cimino, "Analysis of Power Dissipation in Double Edge Triggered Flip-Flop", Accepted for publication in *IEEE Transactions on VLSI Systems*.
- [13] Tiwari, V., D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, "Reducing Power in High-performance Microprocessors", *Proc. Design Automation Conf.*, 732-737 (June, 1998).
- [14] Tiwari, V., R. Donnelly, S. Malik, and R. Gonzalez, "Dynamic Power Management for Microprocessors: A Case Study", *10th Intern. Conf. On VLSI Design*, 185-192 (January, 1997).
- [15] Tiwari, V., S. Malik, and P. Ashar, "Guarded evaluation: Pushing power management to logic synthesis/design", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 17(10), 1051-1060 (October, 1998).