# Low-Power Embedded SRAM Macros with Current-Mode Read/Write Operations

Jinn-Shyan Wang, Electrical Engineering Department, Chung-Cheng University, 160, San-Hsing, Ming-Hsiung, Chia-Yi, Taiwan, 886-05-2720411 ext. 5321, ieegsw@ccunix.ccu.edu.tw

Po-Hui Yang, Electrical Engineering Department, Chung-Cheng University, 160, San-Hsing, Ming-Hsiung, Chia-Yi, Taiwan, 886-05-2720411 ext. 5321, d8442005@ccunix.ccu.edu.tw

Wayne Tseng, Electrical Engineering Department, Chung-Cheng University, 160, San-Hsing, Ming-Hsiung, Chia-Yi, Taiwan, 886-05-2720411 ext. 5321, wayne99@ms17.hinet.net

# 1. ABSTRACT

The newly proposed SRAM performs both read and write operations in the current-mode. Because of the current-mode operations, voltage swings on the bit-lines and data-lines are kept very small during read and write, so the AC power dissipated by the bit-lines and data-lines is greatly reduced. For an embedded SRAM macro used in an 8-bit µ-controller, the SRAM using the fully current-mode technique consumes only 30% of the power dissipated by the SRAM with only current-mode read operation. Experimental results show good agreement with the simulation results and prove the feasibility of the new technique.

# 2. INTRODUCTION

Embedded SRAM's are an important source of power dissipation in many VLSI chips because they contain high-capacitance buses and are frequently accessed. Several design techniques have been proposed to reduce the power dissipation of SRAM's [1][2]. These techniques are usually used to reduce the active DC current. On the other hand, to obtain a fast read, the cell signal on the bit-line is made as small as possible, transmitted to the common data-line, and amplified by a sense amplifier. The small voltage swing on the bit-line also leads to small AC power dissipation in the large bit-line capacitance. Recently, several current-mode sensing circuits [3]-[6] have been proposed to overcome the possible speed degradation due to large bit-line and/or data-line capacitances. Current sensing provides the advantage of extremely small bit-line and data-line voltage swings, which also leads to a further reduction of AC power dissipation.

The above-mentioned techniques mainly attempt to reduce the power dissipation during read. The memory cell is usually designed to have enough static noise margin, and thus it needs a nearly full supply-voltage swing on the bit-line to override the original cell data during write. This large voltage swing consumes a lot of AC power according to the $CV^2f$ law. Recently, [7] and [8] have started to pay attention to the power dissipation during write. In [7], by using a memory cell with transistor sizes different from the conventional one, write can be performed by pulling one of the bit-lines to 1V while the other is at 0V. Although the dissipation associated with write operations is reduced, this SRAM design exhibits degraded noise margins compared to the conventional one. In [8], the theory of energy recovery is applied to reduce power during write. However, the design is restricted to the pipelined SRAM and it needs a resonant clock driver, which may not be easy to implement on chip.
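To make the $CV^2f$ argument concrete, the following minimal sketch estimates the AC power needed to restore a given swing on one bit-line, using the common approximation $P = C \cdot V_{DD} \cdot \Delta V \cdot f$, which reduces to $CV^2f$ for a full swing. The 1 pF bit-line capacitance and the 10% reduced swing are hypothetical values chosen only for illustration; they are not taken from this design.

```python
def bitline_ac_power(c_bitline, v_dd, v_swing, freq):
    """Average AC power (W) drawn from the supply to restore a swing of
    v_swing on a capacitance c_bitline once per cycle at frequency freq."""
    return c_bitline * v_dd * v_swing * freq


C_BL = 1e-12   # hypothetical bit-line capacitance (F), for illustration only
V_DD = 3.3     # supply voltage (V)
FREQ = 100e6   # access frequency (Hz)

# Full-swing write: the expression reduces to C * V^2 * f.
full_swing_power = bitline_ac_power(C_BL, V_DD, V_DD, FREQ)

# Reduced swing (hypothetical 10% of V_DD): power scales linearly with swing.
small_swing_power = bitline_ac_power(C_BL, V_DD, 0.1 * V_DD, FREQ)

print(f"full swing : {full_swing_power * 1e3:.3f} mW")
print(f"small swing: {small_swing_power * 1e3:.3f} mW")
print(f"ratio      : {small_swing_power / full_swing_power:.2f}")
```

Under this approximation the bit-line AC power falls in direct proportion to the swing, which is the motivation for keeping the swing small during write as well as read.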

Besides using some conventional techniques for DC power reduction, a new SRAM design technique, which performs both read and write operations in the current-mode, is proposed in this paper to further reduce the AC power dissipation associated with bit-lines and data-lines. The new technique is adapted from a conventional design [5], which performs only the read operation in the current-mode. The new design needs a new memory cell with seven MOS transistors. In the following sections, Section 3 briefly reviews the basic concepts of the current-mode operation used in [5]. Section 4 describes the new design technique and the circuits of each block. Section 5 discusses design considerations and the layout of the new memory cell. Design examples and experimental results are presented in Section 6. Conclusions are given in Section 7.

# 3. THE CONVENTIONAL SRAM WITH A CURRENT-MODE SENSE AMPLIFIER

Fig. 1 shows the read circuit of a conventional SRAM [5] with a hybrid current-mode sense amplifier. The hybrid current-mode sense amplifier consists of a P-type current conveyor, formed by P1, P2, P3, and P4, and a clamped data-line sense amplifier (CDLSA), formed by N3, N4, N5, P5, and P6. The design uses a conventional 6-transistor memory cell.

The P-type current conveyor, proposed by Seevinck et al. in [3], is used to pass the differential currents on BL and  $\overline{BL}$  to DL and  $\overline{DL}$ . The current conveyor is effectively a current sense amplifier with a gain of one. The conveyor consists of four equally sized PMOS transistors and is selected by grounding the *YSEL* node.

The operating principle of the current conveyor is explained briefly below with the aid of Fig. 2. Suppose the cell is accessed and draws current $I$. The gate-source voltage of P1 is equal to that of P3, since their currents are equal, their sizes are equal, and both transistors are in saturation. This voltage is denoted $V_1$. Similarly, the gate-source voltages of P2 and P4 are equal and denoted $V_2$. It follows that, since *YSEL* is grounded, the left bit-line is at voltage $V_1 + V_2$, and the right bit-line is also at voltage $V_1 + V_2$. Therefore the bit-line potentials are equal, independent of the current distribution. This means that there is a virtual short across the bit-lines. Since the bit-line voltages are equal, the bit-line load currents are also equal, as are the bit-line capacitor currents. As the cell draws current $I$, the right-hand leg of the current conveyor must pass more current than the left-hand leg. In fact, the difference between these currents is $I$, the cell current. The drain currents of P3 and P4 are passed to data-lines DL and  $\overline{DL}$ . The differential data-line current is therefore equal to the cell current. Thus current sensing is obtained.
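The virtual-short argument above can be checked numerically with a minimal sketch. It assumes an ideal square-law model for the saturated PMOS devices; the transconductance parameter `k_p`, the threshold magnitude `v_tp`, and the load and cell currents are hypothetical values, and the function names are introduced here only for illustration.

```python
import math


def v_sg(i_drain, k_p=100e-6, v_tp=0.8):
    """Source-gate voltage (V) of a saturated PMOS carrying i_drain (A),
    from the ideal square law I = (k_p / 2) * (V_SG - |V_tp|)^2."""
    return v_tp + math.sqrt(2.0 * i_drain / k_p)


def conveyor(i_load, i_cell):
    """Return (v_bl, v_blb, i_diff) for the P-type current conveyor.

    The cell draws i_cell from one bit-line, so one leg carries
    i_load - i_cell and the other carries i_load.
    """
    i_low_leg = i_load - i_cell
    i_high_leg = i_load
    v1 = v_sg(i_low_leg)    # shared by the two devices in the low-current leg
    v2 = v_sg(i_high_leg)   # shared by the two devices in the high-current leg
    v_bl = v1 + v2          # one bit-line: V1 stacked on V2
    v_blb = v2 + v1         # other bit-line: V2 stacked on V1
    i_diff = i_high_leg - i_low_leg   # differential data-line current
    return v_bl, v_blb, i_diff


v_bl, v_blb, i_diff = conveyor(i_load=200e-6, i_cell=50e-6)
print(f"V(BL) = {v_bl:.4f} V, V(BLbar) = {v_blb:.4f} V")
print(f"differential data-line current = {i_diff * 1e6:.1f} uA")
```

Whatever cell current is chosen, the two bit-line voltages come out identical because each is the sum of the same two gate-source voltages, while the differential current reaching the data-lines equals the cell current.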

The sensing delay is unaffected by the bit-line capacitance since no differential capacitor discharging is required to sense the cell data. The comparison results reported in [3] show that both the delay and the current consumption of current sensing are better than those of voltage sensing.

On the other hand, in order to overcome the problem of large data-line capacitance, a clamped data-line sense amplifier (CDLSA) is used to convert the differential currents on DL and  $\overline{DL}$  into complementary output voltages. The CDLSA is derived from the clamped bit-line sense amplifier (CBLSA), proposed by Blalock et al. in [4]. The operating principle of the CDLSA is explained briefly as follows.

The CDLSA operates in two phases: precharge and sense-signal amplification. During read, N5 of the CDLSA is first turned on to precharge the output nodes of the CDLSA to equal potentials. The voltages at DL and  $\overline{DL}$ , which are close to ground because N1 and N2 operate in the linear region, are also kept nearly equal.



Fig. 1 The read circuit of a conventional current-mode SRAM.



Fig. 2 A schematic with the current conveyor.

After release of the signal REQ, the CDLSA enters the phase of sense-signal amplification. Once N5 is turned off, N3, N4, P5, and P6 then act as a high-gain positive feedback amplifier. Due to the positive feedback, the impedance looking into the source terminal of either N3 or N4 is a negative resistance, which causes N3 and N4 to begin sourcing the differential currents. The differential currents flow through N3 and N4, charging the small equivalent capacitances at the drains of N3 and N4, giving rise to a  $\Delta V$  across the output nodes of the sense amplifier. The positive feedback effect of the cross-coupled circuit amplifies the small differential voltage to CMOS-signal level. Simulation results in [4] show that the CBLSA sense amplifier may consume less power if the same sensing speed is required.

# 4. THE PROPOSED LOW-POWER CURRENT-MODE SRAM

In order to perform the write operation in the current-mode, we observe that a 6-transistor memory cell is in fact a latch circuit, which is quite similar to the CDLSA. The two access transistors of the memory cell behave as transistors N3 and N4 of the CDLSA. The two cross-coupled inverters in the memory cell also provide a positive feedback gain, just as transistors N3, N4, P5, and P6 of the CDLSA do in its signal amplification phase. In order to let the memory cell respond to a small differential current, we need to clear the cell's content prior to the write operation by using a transistor that plays the role of N5 in the CDLSA. In summary, the new memory cell should have seven transistors, with one performing cell equalization just before the write operation. The transistor for equalization must be turned off during read so as not to disturb the normal operation. After inserting an N-type current conveyor between the data input circuit and the new memory cell, the SRAM can perform the current-mode write operation in a way complementary to the read operation.
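The role of equalization followed by positive feedback can be illustrated with a deliberately simple behavioural toy model, not a circuit simulation. It assumes the differential cell voltage starts from a small value set by the injected differential current and is multiplied by a fixed loop gain at each step until it reaches the supply; `loop_gain`, the initial offsets, and the function name are hypothetical.

```python
def write_after_equalization(delta_v0, v_dd=3.3, loop_gain=1.5, max_steps=200):
    """Regenerate a small initial differential voltage to a full logic level.

    delta_v0 is the differential voltage (V) left on the equalized cell by
    the injected differential current. Returns (stored_bit, steps).
    """
    if delta_v0 == 0:
        raise ValueError("zero differential input stays metastable in this model")
    dv = delta_v0
    steps = 0
    # Positive feedback: the differential voltage grows geometrically.
    while abs(dv) < v_dd and steps < max_steps:
        dv *= loop_gain
        steps += 1
    stored_bit = 1 if dv > 0 else 0
    return stored_bit, steps


bit_one, steps_one = write_after_equalization(+0.01)
bit_zero, steps_zero = write_after_equalization(-0.01)
print("write 1 ->", bit_one, "after", steps_one, "steps")
print("write 0 ->", bit_zero, "after", steps_zero, "steps")
```

In this sketch only the sign of the small initial imbalance decides the final state, which is why the cell must be cleared first: without equalization the previously stored full-swing data would dominate a small differential input.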

Compared with the conventional SRAM with only current-mode read operation, the new SRAM needs a circuit to generate the equalization signal for the new memory cell. In addition, the input circuit must provide the differential current to the memory cell through a current conveyor. The circuit construction of some important blocks in the write path is described below. To aid the description, part of the transistor-level schematic of a new current-mode 4 x 1 SRAM is drawn in Fig. 3.



Fig. 3 Transistor-level schematic of a new current-mode 4 x 1 SRAM.

#### 4.1 Data Input Circuit

Fig. 4 shows the data input circuit. Data to be written into cells are controlled by a write-enable signal *we* and transferred to the write-data-lines through two pass transistors.

When *we* turns on the pass transistors, the differential current  $\Delta I$  is passed to *wdlp* and *wdln*. *wdlp* and *wdln* are the write-data-lines, and their voltage levels are pulled down nearly to ground by the data-line load transistors.



Fig. 4. Data input circuit.

#### 4.2 N-Type Current Conveyor

Once sufficient  $\Delta I$  appears on write-data-lines *wdlp* and *wdln*, the N-type current conveyor, shown in Fig. 5, is enabled by the signal *wy*. The differential currents can then be transferred to bit-lines *blp* and *bln* without attenuation.

The characteristics of this N-type current conveyor are quite similar to those of the P-type current conveyor [3]. When this circuit is enabled by the signal *wy*, there is also a virtual short across the write-data-lines *wdlp* and *wdln*. The voltage swings on the data-lines can therefore be kept as small as possible, and the sensing speed is insensitive to the write-data-line capacitances.



Fig. 5. N-type current conveyor.

#### 4.3 New Memory Cell

The new 7-transistor memory cell is drawn in Fig. 6.  $M_{eq}$  is used to clear the cell's content prior to the write operation, as described previously. When  $M_{eq}$  is off, the new cell performs as a conventional 6-transistor memory cell. PMOS transistors are used as the access transistors, which is different from the conventional design and will be explained in the next section.



#### 4.4 Decoders and Cell Equalizing Circuit

For embedded applications, a system clock is assumed to be available for the SRAM design. We adopt a dynamic NAND decoder for low-power consideration [2]. The signal *weq* for cell equalization is designed to be a short pulse. After the differential current is transferred into the cell, *weq* should be disabled to enable the cell's strong positive feedback operation. The word-line is then released to complete a write-operation cycle. The X-decoder circuit and the cell equalizing circuit are shown in Fig. 7, where *clk* is an internal clock signal with a short pulse and *xadr* is a word-line. The Y-decoder also uses a dynamic NAND circuit structure in this design.



Fig. 7. The X-decoder and the cell equalizing circuit.

# 5. MEMORY CELL DESIGN

When designing a memory cell, the cell area, static noise margin, access speed, and power consumption should all be taken into consideration. A conventional 6-transistor memory cell is drawn in Fig. 8, with the transistor sizes shown in the figure. The design is based on a 0.6 $\mu$ m single-poly double-metal (SPDM) CMOS technology. According to typical design guidelines, the cell ratio of  $\beta_{driver}$  to  $\beta_{access}$  (i.e., the ratio of the width of  $M_n$  to the width of  $M_{access}$ ) is designed to be 2.75, and the ratio of  $W_p$  to  $W_{access}$  is designed to be 0.66. The layout of this cell is shown in Fig. 9(a), and the cell size is 7.4 $\mu$ m x 12.9 $\mu$ m.

In the design of the new memory cell shown in Fig. 6, minimizing the cell area is an important concern because the new cell has seven transistors. Because both read and write operations are performed in the current-mode, a small differential current from the cell can be detected by the current conveyor, and a small differential data current can override the cell's content. The first decision in the design of the new cell is that the access transistors are minimum-sized PMOS transistors (without bone-shaped source/drain layout areas), which can provide enough driving capability for small differential currents. Second, the size of  $M_n$  is designed to be the same as that of  $M_{access}$ . The cell ratio is nevertheless kept the same as that in the 6-transistor memory cell, because the mobility of a PMOS transistor is 2.75 times smaller than that of an NMOS transistor in this process.  $M_p$  is also designed to have a minimum size. A minimum-sized  $M_p$  not only minimizes the source/drain areas, but also increases the ratio of  $W_p$  to  $W_{access}$ . We can conclude that the static noise margin of the new cell may be nearly equal to that of the conventional cell.  $M_{eq}$  is designed to be as large as possible without increasing the cell size (2.2µm in this design) to enhance the equalization speed.
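The cell-ratio argument can be written out as a short calculation. The sketch below assumes $\beta \propto \mu \cdot W/L$ with equal channel lengths, so only the mobility ratio and the width ratio matter; the function name is hypothetical, and the only numbers used are the 2.75 ratios quoted in the text.

```python
def cell_ratio(mobility_driver_over_access, width_driver_over_access):
    """beta_driver / beta_access, assuming beta is proportional to
    mobility * W / L and both devices share the same channel length."""
    return mobility_driver_over_access * width_driver_over_access


# Conventional 6T cell: NMOS driver and NMOS access (same mobility),
# driver 2.75 times wider than the access transistor.
conventional = cell_ratio(1.0, 2.75)

# New 7T cell: NMOS driver and PMOS access of equal size; the NMOS mobility
# is taken as 2.75 times the PMOS mobility, as stated for this process.
proposed = cell_ratio(2.75, 1.0)

print("conventional cell ratio:", conventional)
print("proposed cell ratio    :", proposed)
```

Both cases give the same ratio, which is why the driver can shrink to the size of the access transistor without changing the cell ratio.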



Fig. 8 Schematic of a 6-transistor cell.

The layout of the new 7-transistor memory cell is shown in Fig. 9(b). The cell size is  $7.2\mu m \times 12.7\mu m$ . Owing to suitable sizing, the area of the new cell is even smaller than that of the conventional cell, despite the additional transistor. The area reduction is 4.3%.
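As a quick arithmetic check, the sketch below recomputes the two cell areas from the quoted dimensions. Because the dimensions are given to one decimal place, the recomputed reduction (about 4.2%) differs slightly from the quoted 4.3%; the variable names are introduced here only for illustration.

```python
# Cell dimensions in micrometres, as quoted in the text.
conventional_dims = (7.4, 12.9)   # 6-transistor cell
proposed_dims = (7.2, 12.7)       # 7-transistor cell

conventional_area = conventional_dims[0] * conventional_dims[1]
proposed_area = proposed_dims[0] * proposed_dims[1]
reduction = (conventional_area - proposed_area) / conventional_area

print(f"6T cell area: {conventional_area:.2f} um^2")
print(f"7T cell area: {proposed_area:.2f} um^2")
print(f"area reduction: {reduction * 100:.1f} %")
```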



Fig. 9 Layouts. (a) 6-transistor cell, (b) 7-transistor cell.

# 6. DESIGN EXAMPLES AND EXPERIMENTAL RESULTS

Two SRAM's were designed to verify the feasibility of the proposed circuit, and one of them has been implemented and tested. This section will show the simulation and experimental results.

#### 6.1 128 x 8 SRAM's

First, we designed a new current-mode 128 x 8 SRAM, which has also been integrated into an 8-bit  $\mu$ -controller. Another 128 x 8 SRAM with current-mode read and voltage-mode write operations was also designed for performance comparison. For simplicity, we refer to the new SRAM as the CWCR SRAM and the latter as the VWCR SRAM hereafter. These two SRAM's are designed in a 0.6 $\mu$ m SPDM CMOS process [10]. The operating frequency of both designs is set to 100MHz (i.e., the clock cycle time is 10ns), and the supply voltage $V_{DD}$ is 3.3V. Table 1 shows the Hspice pre-layout simulation results.

| SRAM | $t_w$  | $t_{acc}$ | Bit-line swing | Power dissipation | Power ratio |
|------|--------|-----------|----------------|-------------------|-------------|
| VWCR | 4.74ns | 5.71ns    | 2.31V          | 306µW/MHz         | 1           |
| CWCR | 3.95ns | 5.94ns    | 0.14V          | 93µW/MHz          | 0.3         |

Table 1. SRAM performance comparisons

In Table 1, the write-in time  $t_w$  is defined as the interval from the clock =  $0.5V_{DD}$  at the rising edge to the cell's level  $\geq$  0.67  $V_{DD}$  for writing 1 or  $\leq$  0.33  $V_{DD}$  for writing 0. The read access time  $t_{acc}$  is defined as the interval from the clock =  $0.5V_{DD}$  at the rising edge to the output level =  $0.5V_{DD}$ . It is clear that, under the constraint of nearly equal cycle time, the CWCR SRAM consumes only 30% of the power of the VWCR SRAM.

Simulation waveforms are shown in Fig. 10. The first row shows the clock waveform. The second and third rows are the waveforms of the CWCR SRAM's bit-lines and input<7> / output<7>, and the fourth and fifth rows are those of the VWCR SRAM. Write and read operations are alternated in the simulation, and the results show that input data written into the memory cell are read out correctly. Notice that the bit-line swing of the new SRAM is suppressed to about 4% of $V_{DD}$, whereas that of the voltage-mode SRAM is as large as 70% of $V_{DD}$.
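The ratios quoted for Table 1 can be reproduced directly from the tabulated values. The short sketch below does only that arithmetic; the dictionary layout and names are introduced here for illustration.

```python
V_DD = 3.3  # supply voltage (V)

# Values copied from Table 1.
table1 = {
    "VWCR": {"swing_v": 2.31, "power_uw_per_mhz": 306.0},
    "CWCR": {"swing_v": 0.14, "power_uw_per_mhz": 93.0},
}

power_ratio = (table1["CWCR"]["power_uw_per_mhz"]
               / table1["VWCR"]["power_uw_per_mhz"])
swing_fraction = {name: row["swing_v"] / V_DD for name, row in table1.items()}

print(f"CWCR / VWCR power ratio: {power_ratio:.2f}")
for name, frac in swing_fraction.items():
    print(f"{name} bit-line swing: {frac * 100:.1f} % of V_DD")
```

The power ratio rounds to 0.30, and the swings correspond to roughly 4% and 70% of the 3.3V supply.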

Table 2 shows the average current breakdown of each block. Some observations are described below. First, the power reduction of the bit-lines is the major contribution to the total power reduction. Second, the CWCR SRAM needs an extra circuit to generate the equalization signal for the memory cells, and this in turn consumes extra power. However, this extra power is only about 1.1% of the total (the x-eq entry in Table 2). Third, the memory cells of the new SRAM also consume more power because of DC power during equalization. Finally, the small extra power consumed by the equalization circuit and the memory cells is easily compensated by the power saved in the bit-lines.



Fig. 10. Simulation waveforms.

| SRAM | Adr-buf | x-dec | x-eq  | y-dec | clock | Bit-line | Cell  | Input | Output | Total |
|------|---------|-------|-------|-------|-------|----------|-------|-------|--------|-------|
| CWCR | 0.071   | 0.154 | 0.028 | 0.183 | 0.721 | 0.644    | 0.056 | 0.400 | 0.274  | 2.530 |
| VWCR | 0.071   | 0.151 | 0     | 0.345 | 0.500 | 7.090    | 0.009 | 0.246 | 0.263  | 8.673 |

Table 2. Average current consumption of each sub-circuit
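The observations on Table 2 can be cross-checked with a few lines of arithmetic. The sketch below sums the per-block values, compares them with the listed totals, and computes the share of the equalization circuit and the bit-line saving; values are copied from Table 2 in the units used there, and all names are introduced for illustration.

```python
blocks = ["Adr-buf", "x-dec", "x-eq", "y-dec", "clock",
          "Bit-line", "Cell", "Input", "Output"]

# Per-block average currents copied from Table 2.
cwcr = dict(zip(blocks, [0.071, 0.154, 0.028, 0.183, 0.721,
                         0.644, 0.056, 0.400, 0.274]))
vwcr = dict(zip(blocks, [0.071, 0.151, 0.0, 0.345, 0.500,
                         7.090, 0.009, 0.246, 0.263]))
listed_total = {"CWCR": 2.530, "VWCR": 8.673}

cwcr_sum = sum(cwcr.values())
vwcr_sum = sum(vwcr.values())

xeq_share = cwcr["x-eq"] / listed_total["CWCR"]
bitline_saving = vwcr["Bit-line"] - cwcr["Bit-line"]
total_saving = listed_total["VWCR"] - listed_total["CWCR"]

print(f"CWCR sum {cwcr_sum:.3f} (listed {listed_total['CWCR']:.3f})")
print(f"VWCR sum {vwcr_sum:.3f} (listed {listed_total['VWCR']:.3f})")
print(f"x-eq share of CWCR total: {xeq_share * 100:.1f} %")
print(f"bit-line saving {bitline_saving:.3f} vs total saving {total_saving:.3f}")
```

The per-block sums agree with the listed totals to within rounding, and the bit-line saving alone is slightly larger than the net saving, consistent with the remaining blocks of the CWCR SRAM consuming somewhat more in aggregate.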

#### 6.2 512 x 15 SRAM's

In 1996, K. J. Schultz et al. published a paper [9] to demonstrate a very-low power  $512 \times 15$  embedded synchronous SRAM using the technique of divided word-line. For comparison, a new current-mode  $512 \times 15$  SRAM was also designed and simulated. The operational frequency is set to be 50MHz.

We use Powermill to measure the average power dissipation. The input pattern applied is the same as that in K. J. Schultz's paper. Pairs of writes and reads were alternated between two addresses, with Data = 7FFFh stored in location Address = 1FFh and Data = 0000h stored in location Address = 000h. Performance comparisons are given in Table 3.

The new current-mode SRAM consumes less power per MHz than Schultz's circuit. Although the normalized power of the new SRAM is only modestly smaller than that of the previous design, the operating speed of the new SRAM is designed to be much higher. In the simulation of the new SRAM, a clock signal with 50% duty cycle is applied. In practice, the low phase of the clock can be made shorter, so the cycle time of the new SRAM can be even smaller than 12ns.

The normalized power-delay product  $p \bullet \tau$  is defined to be the product of the normalized power and the clock cycle time.  $p \bullet \tau$  of the new SRAM is found to be only 40% of that of the previous design.

| Items             | [9]           | this work      |
|-------------------|---------------|----------------|
| technology        | 0.8µm         | 0.6µm          |
| supply voltage    | 3.3V          | 3.3V           |
| access time       | 18ns          | 6ns            |
| clock cycle time  | 21ns          | ≤ 12ns         |
| power dissipation | 4.8mW @ 20MHz | 8.42mW @ 50MHz |
| normalized power  | 0.24mW/MHz    | 0.168mW/MHz    |
| p∙τ               | 5.04pJ/MHz    | 2.02pJ/MHz     |

Table 3. Performance comparisons of two 512x15 SRAM's
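The derived rows of Table 3 follow from the measured rows by simple arithmetic: normalized power is power divided by frequency, and $p \bullet \tau$ is normalized power times clock cycle time. The sketch below reproduces them, taking the 12ns upper bound as the cycle time of the new design; the data structure and names are introduced for illustration.

```python
# Power (mW), test frequency (MHz), and clock cycle time (ns) from Table 3.
designs = {
    "[9]":       {"power_mw": 4.8,  "freq_mhz": 20.0, "cycle_ns": 21.0},
    "this work": {"power_mw": 8.42, "freq_mhz": 50.0, "cycle_ns": 12.0},
}

results = {}
for name, d in designs.items():
    normalized = d["power_mw"] / d["freq_mhz"]   # mW/MHz
    p_tau = normalized * d["cycle_ns"]           # mW*ns/MHz = pJ/MHz
    results[name] = (normalized, p_tau)
    print(f"{name:10s} normalized = {normalized:.3f} mW/MHz, "
          f"p*tau = {p_tau:.2f} pJ/MHz")

ratio = results["this work"][1] / results["[9]"][1]
print(f"p*tau ratio (this work / [9]): {ratio:.2f}")
```

The ratio comes out at about 0.40, matching the 40% figure quoted in the text.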

#### 6.3 Experimental Results

The aforementioned new current-mode 128 x 8 SRAM has been fabricated in a 0.6 $\mu$ m single-poly double-metal CMOS process. Fig. 12 shows the die photo of this 128 x 8 SRAM. The SRAM chip functions correctly, and the measured timing results are listed in Table 4. The delays due to the input/output pads have been subtracted from the measured delays to obtain the corrected results. The simulation results are quite close to the corrected results. Power consumption data are not available because only one $V_{DD}$ pad and one ground pad were allowed for the chip in our educational project.



Fig.12. The photograph of the new current-mode 128x8 SRAM

| clock rate               | $V_{DD} = 5.0$ V | $V_{DD} = 3.3$ V | $V_{DD} = 2.5$ V |
|--------------------------|------------------|------------------|------------------|
| measurement (corrected)  | 81.4MHz          | 51.1MHz          | 32.5MHz          |
| simulation (post-layout) | 80MHz            | 50MHz            | NA               |

**Table 4. Chip performances** 
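To quantify how close the simulation is to the measurement, the sketch below computes the relative deviation for the two supply voltages where both values are listed in Table 4. The names are introduced here for illustration.

```python
# Clock rates in MHz from Table 4, keyed by supply voltage in volts.
measured = {5.0: 81.4, 3.3: 51.1}
simulated = {5.0: 80.0, 3.3: 50.0}

deviation = {
    v_dd: (measured[v_dd] - simulated[v_dd]) / simulated[v_dd]
    for v_dd in simulated
}

for v_dd, dev in deviation.items():
    print(f"V_DD = {v_dd} V: measured is {dev * 100:.1f} % above simulation")
```

Both deviations are within a few percent of the post-layout simulation.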

This memory is used in an 8-bit  $\mu$ -controller, which has been tested successfully and in turn proves the feasibility of the current-mode SRAM as an embedded memory.

# 7. CONCLUSIONS

A new current-mode technique for embedded SRAM's is proposed in this paper. In order to perform the current-mode write operation, a seven-transistor SRAM cell is developed. According to the current-mode characteristics, all transistors in the memory cell are designed to be equal and minimum sized.

Using the seven-transistor SRAM cell and the N-type current conveyor, a current-mode write operation is obtained. Combined with conventional circuits used for current-mode sensing, both read and write operations are performed in the current-mode.

Simulation results indicate that, for the same speed specification, the 128 x 8 SRAM adopting the new technique consumes only 30% of the power of the conventional SRAM. Another new current-mode 512 x 15 SRAM is also designed. The power-delay product of the new design is 2.02pJ/MHz, which is only 40 percent of that of a recently published design [9].

# 8. REFERENCES

- [1] Abdellatif Bellaouar and Mohamed I. Elmasry, "Low-Power Digital VLSI Design," Kluwer Academic Publishers, 1995.
- [2] K. Itoh, K. Sasaki, and Y. Nakagome, "Trends in low-power RAM circuit technologies," Proceedings of the IEEE, Vol. 83, No. 4, pp. 524-543, Apr. 1995.
- [3] Evert Seevinck, Petrus J. van Beers, and Hans Ontrop, "Current-Mode Techniques for High-Speed VLSI Circuits with Application to Current Sense Amplifier for CMOS SRAM's," IEEE J. of Solid-State Circuits, vol. 26, no. 4, pp. 525-536, Apr. 1991.
- [4] Travis N. Blalock and Richard C. Jaeger, "A High-Speed Clamped Bit-Line Current-Mode Sense Amplifier," IEEE J. of Solid-State Circuits, vol. 26, no. 4, pp. 542-548, Apr. 1991.
- [5] P. Y. Chee, P. C. Liu, and L. Siek, "High-Speed Hybrid Current-Mode Sense Amplifier for CMOS SRAM's," Electronics Letters, vol. 28, no. 9, pp. 871-873, Apr. 1992.
- [6] Y. K. Seng and S. S. Rofail, "1.5V high speed low power CMOS current sense amplifier," Electronics Letters, vol. 31, no. 23, pp. 1991-1993, 9th Nov. 1995.
- [7] Jonas Alowersson and Per Andersson, "SRAM cells for low-power write in buffer memories," IEEE Symposium on Low Power Electronics, pp. 61-61, 1995.
- [8] N. Tzartzanis and W. C. Athas, "Energy recovery for the design of high-speed, low-power static RAMs," IEEE International Symposium on Low-Power Electronics and Design, pp. 55-60, 1996.
- [9] K. J. Schultz, et al., "Low-supply-noise low-power embedded modular SRAM," IEE Proc. Circuits Devices Systems., vol. 143, no. 2, pp. 73-82, Apr. 1996.