# A Dynamic-SDRAM-Mode-Control Scheme for Low-Power Systems with a 32-bit RISC CPU

Seiji Miura, Kazushige Ayukawa, and Takao Watanabe

Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan

E-mail: smiura@crl.hitachi.co.jp

#### Abstract

We have developed a dynamic-SDRAM-mode-control scheme for low-power systems with a 32-bit RISC CPU. The scheme is based on two dynamic changes of SDRAM modes: from active standby to standby and from standby to active standby. It reduces both the operating current and the latency of an SDRAM. An analysis using benchmark programs shows that the developed scheme reduces the SDRAM operating current by 40% and latency by 38% compared to those of standby mode. An SDRAM controller was developed based on this scheme and 0.18-um CMOS technology. The area of the controller is 0.28mm<sup>2</sup> and its operating current is 2.5mA at 1.8V and 100 MHz.

### **Keywords:**

SDRAM controller, standby mode, active-standby mode

#### **1. Introduction**

Memory technologies enabling low-power and high-speed operation are of key importance in battery-operated mobile devices such as notebook PCs and personal digital assistants. Several circuit techniques for DRAMs have been developed [1-4]. Several architectural approaches to reducing DRAM latency have also been developed [5-7]. These approaches are based on an active-standby mode, which enables the SDRAM to stand by in a row-active state. In this mode, data can be read directly from the sense amplifiers when the accessed row address is the same as the previous one (i.e., a hit), resulting in short latency and low power. Such approaches focused mainly on how to keep the hit rate high. However, if the accessed row address is different (i.e., a miss), the latency increases significantly. The miss latency in active-standby mode is even longer than the latency in standby mode, in which the SDRAM is in an idle state, because an additional operation is needed to read the data from the sense amplifiers. This long miss-latency is a serious problem that has not been solved by the approaches based on active-standby mode. Accordingly, we have developed a dynamic-SDRAM-mode-control scheme that reduces the long miss-latency. This scheme effectively reduces both the SDRAM operating current and its latency by changing the SDRAM modes dynamically: the mode is set to active standby when the access result is a hit and is changed to standby when the access result is a miss.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ISLPED'01, August 6-7, 2001, Huntington Beach, California, USA. Copyright 2001 ACM 1-58113-371-5/01/0008...\$5.00.

#### 2. SDRAM operating current and latency

Figure 1 shows a read operation in standby mode in an SDRAM. In standby mode, because the SDRAM is in an idle state, bank-active, read, and precharge operations are performed when the SDRAM is accessed. As a result, the SDRAM operating current is 107 mA and the latency is five cycles.



Figure 1. Read operation in standby mode

Figure 2 shows a read operation in active-standby mode in an SDRAM. In active-standby mode, the sense amplifiers in each of the banks in the SDRAM can work as a cache memory because the SDRAM is in a row-active state. When the access result is a hit, the operating current and the latency can be reduced because the precharge and bank-active operations are eliminated. As a result, the SDRAM operating current is 52 mA and the latency is three cycles. When the access result is a miss, the operating current is the same as that in standby mode. However, the miss latency is seven cycles, two cycles longer than the latency in standby mode, because an extra precharge command is needed.



#### 3. Dynamic-SDRAM-mode-control scheme

To reduce the operating current and the latency while also reducing the long miss-latency, we developed a dynamic-SDRAM-mode-control scheme. The basic idea is that active-standby mode is used when the access result is a hit and standby mode is used when the access result is a miss.

The design of the developed SDRAM-mode controller is shown in Figure 3. This controller consists of an address-alignment unit, a hit/miss-judgement unit, a dynamic-mode-change unit, and a command-generation and I/O-control unit. The address-alignment unit aligns the tag and the index addresses in the L1-cache with the address in the SDRAM-memory module. The hit rate depends on this alignment [5,6]. (Section 5 describes in detail how this alignment works.) The hit/miss-judgement unit evaluates whether the access results in a hit or a miss. The dynamic-mode-change unit changes the control mode of the SDRAM to active standby or standby according to the results from the hit/miss-judgement unit and the number of successive miss accesses. The command-generation and I/O-control unit sends commands and addresses to the SDRAM and manages data transfer.
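The hit/miss judgement described above can be pictured as a small table that remembers the last row accessed in each bank. The following is only a minimal sketch under that assumption, not the authors' circuit: the class name `HitMissJudge` and the explicit `bank` and `row` arguments are hypothetical, and only the four-bank organization is taken from Table I.

```python
from typing import List, Optional


class HitMissJudge:
    """Remembers the last row accessed in each bank (hypothetical model)."""

    def __init__(self, num_banks: int = 4) -> None:
        # None means the bank has not been accessed yet.
        self.last_row: List[Optional[int]] = [None] * num_banks

    def access(self, bank: int, row: int) -> bool:
        """Return True for a hit, i.e. the same row as the previous access."""
        hit = self.last_row[bank] == row
        self.last_row[bank] = row  # remember the row for the next comparison
        return hit


judge = HitMissJudge()
results = [
    judge.access(0, 5),  # first access to bank 0: miss
    judge.access(0, 5),  # same row again: hit
    judge.access(1, 5),  # different bank, not accessed before: miss
    judge.access(0, 9),  # bank 0, different row: miss
]
print(results)  # [False, True, False, False]
```

The comparison is kept independent of the current SDRAM mode, since the text states that a hit can also be detected while the controller is in standby mode.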



The read-operating mechanism in the developed SDRAM-control scheme is shown in Figure 4. The dynamic-mode-change unit maintains the active-standby mode to reduce the operating current (Iop = 52 mA) and latency (La = three cycles) when the access result is a hit. When the access result is a miss in the active-standby mode, the precharge, bank-active, and read operations are performed, and the latency increases (La = seven cycles). If there are several successive misses in the active-standby mode, the dynamic-mode-change unit changes the mode from active standby to standby after the precharge, bank-active, read, and precharge operations are performed. In standby mode, the dynamic-mode-change unit maintains the standby mode when the access result is a miss, which avoids the long miss-latency (La = five cycles). If the access result is a hit, the bank-active and read operations are performed, and the dynamic-mode-change unit changes the mode from standby to active standby.



Figure 4. SDRAM control scheme

The dynamic-mode-change unit has two thresholds for mode changing from active standby to standby:

(a) Two successive misses, or (b) four successive misses.

Ordinarily, threshold (a) is used for this mode change. However, it has a side effect for the miss-miss-hit access pattern, in which only two successive misses occur. If the controller stays in active-standby mode, the latencies of this pattern sum to 17 cycles (7 cycles + 7 cycles + 3 cycles). In contrast, under the developed scheme with threshold (a), the mode changes to standby after the second miss, and the latencies sum to 19 cycles (7 cycles + 7 cycles + 5 cycles). This means that the latency increases when the mode changes. When such runs of only two successive misses occur, threshold (b) is used for the mode change to eliminate this side effect.

Figure 5 shows how the threshold changes when the mode changes. If there are more than three successive misses, the dynamic-mode-change unit keeps the threshold at two successive misses. If there are fewer than three successive misses, the dynamic-mode-change unit changes the threshold to four successive misses.
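The mode transitions and the miss-miss-hit side effect can be reproduced with a few lines of simulation. This is a sketch only: the function `total_latency` and the "H"/"M" pattern encoding are illustrative assumptions, the threshold is held fixed for a whole run (the adaptive selection of Figure 5 is not modelled), and the latencies are the read values quoted in the text.

```python
from typing import Iterable

L_HIT, L_MISS, L_STB = 3, 7, 5  # read latencies in cycles


def total_latency(pattern: Iterable[str], threshold: int) -> int:
    """Sum the read latencies for a hit/miss pattern such as "MMH".

    The controller starts in active-standby mode; `threshold` successive
    misses switch it to standby mode, and a hit switches it back.
    """
    active_standby = True
    misses = 0
    cycles = 0
    for access in pattern:
        hit = access == "H"
        if active_standby:
            if hit:
                cycles += L_HIT
                misses = 0
            else:
                cycles += L_MISS
                misses += 1
                if misses >= threshold:
                    active_standby = False  # extra precharge, then standby
                    misses = 0
        else:
            cycles += L_STB  # bank-active and read from the idle state
            if hit:
                active_standby = True  # the row is left open after a hit
    return cycles


print(total_latency("MMH", threshold=2))     # 19: mode changed after two misses
print(total_latency("MMH", threshold=4))     # 17: stayed in active standby
print(total_latency("MMMMMM", threshold=2))  # 34: 7 + 7 + 4 * 5
```

For a long run of misses the lower threshold pays off (34 cycles instead of 42 with no mode change), whereas for the short miss-miss-hit run it costs two extra cycles, which is the trade-off behind having two thresholds.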



## 4. Latency and operating current estimations

#### 4.1 Latency estimation

We estimated the average read latency of an SDRAM (256 Mbit, PC-100, BL=4, CL=2) under the developed scheme and under the active-standby mode with a 100-MHz system clock. The latency in the active-standby mode is defined as

$$Lac = Lhit \times Hr + Lmiss \times Mr + Lstb \times (Nref / Ntotal) \qquad (1)$$

and

$$Hr + Mr + (Nref / Ntotal) = 1, \qquad (2)$$

where Hr is the hit rate in the active-standby mode, Mr is the miss rate in the active-standby mode, Nref is the number of refresh operations, Ntotal is the number of total accesses, Lhit is the hit-latency in the active-standby mode, Lmiss is the miss-latency in the active-standby mode, and Lstb is the latency in the standby mode.

When an access to the SDRAM occurs every 10 clock cycles, (Nref / Ntotal) is 0.012 and is negligible.
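As a rough check of this ratio, the arithmetic can be written out using the refresh specification listed in Table I (8192 refresh cycles per 64 ms) and the 100-MHz clock. The sketch assumes that Ntotal can be approximated by the number of regular accesses in one refresh period; the variable names are illustrative.

```python
CLOCK_HZ = 100e6          # system clock
REFRESH_PERIOD_S = 64e-3  # refresh period from Table I
REFRESH_OPS = 8192        # refresh operations per period from Table I
ACCESS_INTERVAL = 10      # clock cycles between SDRAM accesses

cycles_per_period = CLOCK_HZ * REFRESH_PERIOD_S            # about 6.4 million
accesses_per_period = cycles_per_period / ACCESS_INTERVAL  # about 640,000
refresh_ratio = REFRESH_OPS / accesses_per_period

print(f"Nref / Ntotal = {refresh_ratio:.4f}")  # Nref / Ntotal = 0.0128
```

The result is on the order of one percent, in line with the value quoted above, so dropping the refresh term changes the estimated latency very little.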

Substituting (Nref / Ntotal) = 0, Lhit=3, Lmiss=7, Lstb=5 into Equations (1) and (2), we get

$$Lac = 7 - 4Hr. \qquad (3)$$

The latency in the developed scheme is defined as

$$Ldv = Lhit \times Hr + Lmiss \times Mr + Lstb \times (Ndv / Ntotal) \qquad (4)$$

and

$$Hr + Mr + (Ndv / Ntotal) = 1, \qquad (5)$$

where Hr is the hit rate in the active-standby mode, Mr is the miss rate in the active-standby mode, Ndv is the number of read and write operations in the standby mode, Ntotal is the number of total accesses, Lhit is the hit-latency in the active-standby mode, Lmiss is the miss-latency in the active-standby mode, and Lstb is the latency in the standby mode.

Substituting Lhit=3, Lmiss=7, Lstb=5 into Equations (4) and (5) gives

$$Ldv = 3 \times Hr + 7 \times (1 - Hr) - 2 \times (Ndv / Ntotal). \qquad (6)$$

When the hit rate, Hr, is one, (Ndv / Ntotal) is zero because the developed scheme always maintains the active-standby mode. As Hr in active-standby mode decreases, (Ndv / Ntotal) increases because the developed scheme often changes from active-standby mode to standby mode. When Hr is zero, (Ndv / Ntotal) is one because the developed scheme always maintains the standby mode. Therefore, (Ndv / Ntotal) can be approximated as

$$(Ndv / Ntotal) = I_m (1-Hr)^m + I_{m-1} (1-Hr)^{m-1} + \dots + I_1 (1-Hr)^1, \qquad (7)$$

where $\sum_{k=1}^{m} I_k = 1$ and m is an integer.

When we assume that (Ndv / Ntotal) is $(1 - Hr)^2$ and substitute $(Ndv / Ntotal) = (1 - Hr)^2$ into Equation (6), Ldv can be expressed as

$$Ldv = 5 - 2Hr^2. \qquad (8)$$
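For reference, the algebra behind this result can be written out step by step, starting from $Ldv = 3Hr + 7(1 - Hr) - 2(Ndv / Ntotal)$ with the assumed $(Ndv / Ntotal) = (1 - Hr)^2$:

$$
\begin{aligned}
Ldv &= 3Hr + 7(1 - Hr) - 2(1 - Hr)^2 \\
    &= 7 - 4Hr - 2(1 - 2Hr + Hr^2) \\
    &= 7 - 4Hr - 2 + 4Hr - 2Hr^2 \\
    &= 5 - 2Hr^2 .
\end{aligned}
$$

The linear terms in Hr cancel, which is why only the quadratic term remains.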

Figure 6 shows the average read-latency under the developed scheme and in the active-standby mode.

In active-standby mode, the latency is longer than that in standby mode for hit rates below 50%.

In the developed scheme, when the hit rate equals 100%, the latency is the same as that in active-standby mode. For hit rates above 80%, the latency is almost the same as that in active-standby mode because the developed scheme stays mainly in active-standby mode to reduce the latency. For hit rates of 20% to 80%, the developed scheme often changes to standby mode to reduce the long miss-latency, so the latency is shorter than that in standby mode even for hit rates below 50%. For hit rates below 20%, the latency is almost the same as that in standby mode because the developed scheme stays mainly in standby mode. When the hit rate is zero, the latency equals that in standby mode. The latency under the developed scheme is therefore not longer than that in standby mode over the whole range of hit rates.
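The two latency models can be tabulated directly to see these regions. The sketch below simply evaluates Lac = 7 - 4Hr from Equation (3) and Ldv = 5 - 2Hr^2 from Equation (8); the function names `lac` and `ldv` are illustrative, and the model inherits the assumption that (Ndv / Ntotal) = (1 - Hr)^2.

```python
L_STB = 5  # read latency in standby mode, in cycles


def lac(hr: float) -> float:
    """Average read latency in active-standby mode, Equation (3)."""
    return 7 - 4 * hr


def ldv(hr: float) -> float:
    """Average read latency under the developed scheme, Equation (8)."""
    return 5 - 2 * hr ** 2


for pct in range(0, 101, 25):
    hr = pct / 100
    print(f"Hr={pct:3d}%  Lac={lac(hr):.2f}  Ldv={ldv(hr):.2f}  Lstb={L_STB}")
```

In this model Ldv never exceeds either Lac or the standby latency, because Lac - Ldv = 2(1 - Hr)^2 and Lstb - Ldv = 2Hr^2 are both non-negative.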



#### 4.2 Estimation of operating current

We estimated the operating current of an SDRAM (256 Mbit, PC-100, BL=4, CL=2) under the developed scheme with a 100-MHz system clock. It is defined as

$$Isd = Ihit \times Hr + Imiss \times (1 - Hr) + Iref, \qquad (9)$$

with

$$Ihit = Icol \times (6 / Tc) + Inp \times ((Tc - 6) / Tc) \qquad (10)$$

and

$$Imiss = Irow \times (8 / Tc) + Inp \times ((Tc - 8) / Tc), \qquad (11)$$

where Ihit is the operating current for a hit in active-standby mode, Imiss is the operating current for a miss in active-standby mode or standby mode, Iref is the operating current for refresh, Hr is the hit rate in active-standby mode, Icol is the operating current when an access results in a hit every six cycles in active-standby mode, Irow is the operating current when an access results in a miss every eight cycles, Inp is the standby current in the non-power-down state, and Tc is the average number of clock cycles between accesses to the SDRAM.

Substituting Icol = 73 mA, Irow = 129 mA, Inp = 20 mA, and Iref = 2 mA into Equations (9), (10), and (11) gives

$$Isd = (872 / Tc) - (554Hr / Tc) + 22. \qquad (12)$$

Figure 7 shows the SDRAM operating current at Tc of eight, ten, and thirteen cycles.

The operating current under the developed scheme is the same as that in standby mode when the hit rate is zero. As the hit rate increases, the operating current is reduced. When Tc is eight cycles and the hit rate is zero, the operating current is 131 mA, and when the hit rate equals 100%, the operating current is reduced to about half of that in standby mode.
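The current model can be evaluated numerically from Equations (9) to (11). The sketch below uses the stated values Icol = 73 mA, Irow = 129 mA, Inp = 20 mA, and Iref = 2 mA; the function names are illustrative, and the model is only meaningful for Tc of at least eight cycles, since a miss occupies eight cycles.

```python
I_COL, I_ROW, I_NP, I_REF = 73.0, 129.0, 20.0, 2.0  # currents in mA


def i_hit(tc: float) -> float:
    """Current for a hit: six active cycles, idle for the rest, Equation (10)."""
    return I_COL * (6 / tc) + I_NP * ((tc - 6) / tc)


def i_miss(tc: float) -> float:
    """Current for a miss: eight active cycles, idle for the rest, Equation (11)."""
    return I_ROW * (8 / tc) + I_NP * ((tc - 8) / tc)


def isd(hr: float, tc: float) -> float:
    """SDRAM operating current for hit rate hr and access interval tc, Equation (9)."""
    return i_hit(tc) * hr + i_miss(tc) * (1 - hr) + I_REF


for tc in (8, 10, 13):
    print(f"Tc={tc:2d}  Hr=0: {isd(0.0, tc):6.2f} mA  Hr=1: {isd(1.0, tc):6.2f} mA")
```

At Tc = 8 the model gives 131 mA at a hit rate of zero and 61.75 mA at a hit rate of 100%, which is slightly less than half.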





#### 5. Simulation using benchmark programs

The developed scheme was evaluated by using two benchmark programs on a 32-bit RISC-CPU simulator. Table I lists the specifications of the system used in the evaluation.

In this evaluation system, the CPU has a 16-KB L1-cache each for data and instructions, and the bus width between the CPU and the two SDRAM chips is 32 bits. Two benchmark programs were used: Linpack, a matrix-operation program, and Hydro2d, a program that solves the hydrodynamical Navier-Stokes equations.

Figure 8 shows the three address alignments performed in the evaluation. The least significant bits of the tag addresses in the L1-cache correspond to the bank and column addresses in the SDRAM.

Table I. System specifications

| Item | Specification |
|------|---------------|
| CPU core | 32-bit RISC type (250 MHz) |
| L1-cache | Instruction cache: 16 KB<br>Data cache: 16 KB<br>Line size: 128 bits |
| SDRAM | Bus width: 32 bits<br>Memory capacity: 256 Mbit x 2 chips<br>Number of banks: 4<br>Refresh: 64 ms / 8192 cycles<br>Burst length: 4, CAS latency: 2<br>Operating frequency: 100 MHz |
| | Operating current [mA/chip]<br>Irow = 129, Icol = 73, Istb = 20, Iref = 2 |
| | Latency [cycles]<br>Read: Lhit = 3, Lmiss = 7, Lstb = 5<br>Write: Lhit = 1, Lmiss = 5, Lstb = 3 |
| Benchmark programs | Linpack, Hydro2d |



Figure 8. Address alignments in SDRAM and L1-cache

Figure 9 compares the average latency under the developed scheme and that in the active-standby mode. The latency under the developed scheme is shorter than that in the active-standby mode over the whole hit-rate range. The latency in the active-standby mode is longer than that in the standby mode for hit rates below 50%. When we used the developed scheme, the latency became shorter than that in the standby mode even though the hit rate was below 50%. The developed scheme reduced the latency by changing the mode to active standby over the whole range of high hit rates. It reduced the long miss-latency by changing the mode to standby for the whole range of low hit rates.

The developed scheme achieved maximum hit rates of 83% for Linpack and 43% for Hydro2d. It also reduced the latency by 38% for Linpack and by 10% for Hydro2d in comparison to those in standby mode.

Using the least-squares method, we determined Ldv for Linpack and Hydro2d. For Linpack, Ldv is given by

$$Ldv(Linpack) = 4 - 2Hr + 2Hr^2 - 2Hr^3, \qquad (13)$$

where $(Ndv / Ntotal) = -(1 - Hr)^3 + 2(1 - Hr)^2$ and the coefficient of determination $\gamma^2$ is 0.991.

For Hydro2d, Ldv can be obtained as

$$Ldv(Hydro2d) = 4.4 - 2Hr^2, \qquad (14)$$

where $(Ndv / Ntotal) = (1 - Hr)^2$ and $\gamma^2$ is 0.996.

Figure 10 compares SDRAM operating current under the developed scheme and in standby mode.

For Hydro2d and Linpack, an access to the SDRAM occurs every 8 cycles and every 13 cycles, respectively. The operating current is 54 mA at a hit rate of 83% for Linpack and 101 mA at a hit rate of 43% for Hydro2d. These results show that the developed scheme reduced the operating current by 40% for Linpack and by 23% for Hydro2d in comparison to those in standby mode.
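These figures can be cross-checked against the closed-form current model Isd = 872/Tc - 554Hr/Tc + 22 of Equation (12). The sketch assumes, as stated in Section 4.2, that the standby-mode current equals the model value at a hit rate of zero; the helper names `isd` and `reduction_pct` are illustrative.

```python
def isd(hr: float, tc: float) -> float:
    """SDRAM operating current in mA from Equation (12)."""
    return 872 / tc - 554 * hr / tc + 22


def reduction_pct(hr: float, tc: float) -> float:
    """Percentage reduction relative to standby mode (hit rate of zero)."""
    return 100 * (1 - isd(hr, tc) / isd(0.0, tc))


# (hit rate, access interval in cycles) as reported for each benchmark
benchmarks = {"Linpack": (0.83, 13), "Hydro2d": (0.43, 8)}

for name, (hr, tc) in benchmarks.items():
    print(f"{name}: {isd(hr, tc):.1f} mA, reduction {reduction_pct(hr, tc):.1f}%")
```

Rounded to whole numbers, the model gives 54 mA and 40% for Linpack and 101 mA and 23% for Hydro2d, matching the reported results.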



Figure 10. Operating current reduction

### 6. Controller design

We designed an SDRAM controller based on 0.18-um CMOS technology. The layout pattern of the controller is shown in Figure 11.



Figure 11. Layout pattern of the controller

| Technology          | 0.18-um CMOS         |
|---------------------|----------------------|
| Gate count          | 5.7 k gates          |
| Area                | 0.28 mm<sup>2</sup>  |
| Operating current   | 2.5 mA               |
| Supply voltage      | 1.8 V                |
| Operating frequency | 100 MHz              |

Table II. Controller characteristics

The hit/miss-judgement unit and the dynamic-mode-change unit require 2 k gates and 1 k gates, respectively. The command-generation and I/O-control unit requires 2.7 k gates. The controller chip characteristics are summarized in Table II. The area of the controller chip containing the above units is only 0.28 mm<sup>2</sup>. The operating current of the controller is 2.5 mA at a supply voltage of 1.8 V and an operating frequency of 100 MHz.

#### 7. Conclusions

We developed a dynamic-SDRAM-mode-control scheme for low-power systems with a 32-bit RISC CPU. It changes SDRAM modes dynamically from active standby to standby and from standby to active standby; it therefore reduces both the operating current and the latency of the SDRAM.

Using benchmark programs, we tested the scheme and found that the latency was reduced by 38% and the operating current was reduced by 40% compared to corresponding values under standby mode.

We designed an SDRAM controller based on 0.18-um CMOS technology. The area of the controller is only 0.28 mm<sup>2</sup>, and its operating current is 2.5mA at a supply voltage of 1.8 V and an operating frequency of 100 MHz.

#### 8. Acknowledgements

We thank Katsutaka Kimura, Dr. Takayuki Kawahara, and Dr. Yoshio Miki for their support. We also thank Takashi Akazawa, Kouki Noguchi, Susumu Narita, Kunio Uchiyama, Osamu Nishii, Fumio Arakawa, Dr Hiroyuki Mizuno, Yusuke Kanno, and Satoru Akiyama for their comments and discussions.

### 9. References

- [1] H. Tanaka, et al., "Sub-1-μA Dynamic Reference Voltage Generator for Battery-Operated DRAMs," 1993 Symposium on VLSI Circuits Digest of Technical Papers, pp. 87-88.
- [2] K. Itoh, et al., "Trends in Low-Power RAM Circuit Technologies," Proceedings of the IEEE, vol. 83, no. 4, April 1995.
- [3] T. Inaba, et al., "250mV Bit-Line Swing Scheme for a 1V 4Gb DRAM," 1995 Symposium on VLSI Circuits Digest of Technical Papers, pp. 99-100.
- [4] T. Yamagata, et al., "Circuit Design Techniques for Low-Voltage Operation and/or Giga-Scale DRAMs," 1995 ISSCC Digest of Technical Papers, pp. 248-249.
- [5] T. Watanabe, et al., "Access Optimizer to Overcome the Future Walls of Embedded DRAMs in the Era of Systems on Silicon," 1999 ISSCC Digest of Technical Papers, pp. 370-371.
- [6] Y. Kanno, et al., "A DRAM System for Consistently Reducing CPU Wait Cycles," 1999 Symposium on VLSI Circuits Digest of Technical Papers, pp. 131-132.
- [7] Y. Kim, et al., "A Memory Access System for Merged Memory with Logic LSIs," 1999 AP-ASIC, pp. 384-387.