# A 16-bit Redundant Binary Multiplier Using Low-Power Pass-Transistor Logic SPL 

Hirofumi SAKAMOTO ${ }^{\dagger}$ Hiroyuki $\mathrm{OCHI}^{\dagger}$<br>$\dagger$ Dept. of Computer Engineering<br>Hiroshima City University<br>Asaminami-ku, Hiroshima, 731-3194 Japan<br>Tel: +81-82-830-1550, Fax: $+81-82-830-1792$<br>e-mail: ochi@ce.hiroshima-cu.ac.jp

Ken’ichiro UDA ${ }^{\ddagger}$<br>Kazuo TAKI ${ }^{\ddagger}$

Bu-Yeol LEE ${ }^{\ddagger}$<br>Takao TSUDA ${ }^{\dagger}$


#### Abstract

We have designed a 16-bit redundant binary multiplier using the pass-transistor logic SPL in a $0.35 \mu \mathrm{m}$ technology. The number of transistors is 12,349, and the area is $1,322 \mu \mathrm{m} \times 332 \mu \mathrm{m}$. The measured power dissipation and maximum delay at $T=25^{\circ} \mathrm{C}$ and $V_{DD}=3.3$ V are 33.7 mW at 100 MHz and 7.4 nsec, respectively.


## I. Introduction

In recent years, power consumption has become one of the most important issues in VLSI design, alongside area and speed. Pass-transistor logic has been intensively studied as a breakthrough for high-speed and low-power digital circuits. Various kinds of pass-transistor logic have been proposed, including CPL (Complementary Pass-transistor Logic), LEAP (LEAn integration with Pass-transistors), and SPL (Single-rail Pass-transistor Logic) [1]. While most pass-transistor logics are based on a double-rail structure to achieve high speed, LEAP and SPL are based on a single-rail structure to reduce the number of transistors and thus achieve low power. In particular, SPL allows a long series of pass-transistors on a non-critical path when doing so reduces the number of transistors. This paper reports the design and measurement results of a 16-bit multiplier using SPL.

## II. Design

## A. Multiplication Algorithm

The multiplication algorithm used in this design is based on RB (Redundant Binary) addition [2]. An RB adder, like a carry save adder, is a high-speed carry-propagation-free adder. The detailed techniques proposed in [3] are adopted in our design. As shown in Fig. 1, the multiplier consists of four blocks: (1) Booth's recoder, (2) partial product generator, (3) RB adders, and (4) RB-to-binary converter. For (4), we used a carry select adder. The inputs and output of the multiplier are signed binary numbers (2's complement).
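The four blocks can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit described in this paper: it assumes radix-4 Booth recoding (the paper does not state the radix), it uses one common textbook form of the carry-propagation-free digit rule in the spirit of [2], and it replaces the carry select adder with a plain integer subtraction. The names `booth_radix4_digits`, `to_rb`, `rb_add`, `rb_to_int`, and `multiply` are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int = 16) -> list[int]:
    """Recode a signed `bits`-bit multiplier into radix-4 digits in {-2,...,2} (assumed radix)."""
    padded = (y & ((1 << bits) - 1)) << 1      # two's-complement pattern with y[-1] = 0 appended
    digits = []
    for i in range(0, bits, 2):
        b0 = (padded >> i) & 1                 # y[i-1]
        b1 = (padded >> (i + 1)) & 1           # y[i]
        b2 = (padded >> (i + 2)) & 1           # y[i+1]
        digits.append(b0 + b1 - 2 * b2)
    return digits


def to_rb(v: int) -> list[int]:
    """Encode an integer as RB digits in {-1,0,1}, least significant digit first."""
    sign = -1 if v < 0 else 1
    mag = abs(v)
    return [sign * ((mag >> i) & 1) for i in range(mag.bit_length())] or [0]


def rb_add(x: list[int], y: list[int]) -> list[int]:
    """Add two RB numbers; each output digit depends only on a few neighbouring input digits."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    s = [0] * n                                # intermediate sum digits
    c = [0] * (n + 1)                          # c[i + 1] is the carry out of position i
    for i in range(n):
        t = x[i] + y[i]
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if t == 2:
            c[i + 1] = 1
        elif t == -2:
            c[i + 1] = -1
        elif t == 1:                           # choose (carry, sum) so the incoming carry cannot overflow
            c[i + 1], s[i] = (1, -1) if lower_nonneg else (0, 1)
        elif t == -1:
            c[i + 1], s[i] = (0, -1) if lower_nonneg else (-1, 1)
    return [s[i] + c[i] for i in range(n)] + [c[n]]


def rb_to_int(digits: list[int]) -> int:
    """RB-to-binary conversion: positive digits minus negative digits."""
    plus = sum(1 << i for i, d in enumerate(digits) if d == 1)
    minus = sum(1 << i for i, d in enumerate(digits) if d == -1)
    return plus - minus


def multiply(x: int, y: int, bits: int = 16) -> int:
    """Booth recoding, partial products, RB adder tree, then conversion."""
    terms = [to_rb(d * x << (2 * k)) for k, d in enumerate(booth_radix4_digits(y, bits))]
    while len(terms) > 1:                      # pairwise reduction, as in an adder tree
        if len(terms) % 2:
            terms.append([0])
        terms = [rb_add(terms[i], terms[i + 1]) for i in range(0, len(terms), 2)]
    return rb_to_int(terms[0])
```

With 16-bit operands the assumed recoding yields eight partial products, so the tree in this model has three levels of RB addition.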

$\ddagger$ Dept. of Computer and Systems Engineering Kobe University<br>Nada-ku, Kobe, 657-8501 Japan<br>Tel: +81-78-803-6210, Fax: $+81-78-803-6391$<br>e-mail: taki@cs.kobe-u.ac.jp

## B. Circuit and Layout

Basically, an SPL circuit is obtained from BDDs [4] by replacing each BDD node with two pass-transistors. For example, the BDDs of a 1-bit RB adder shown in Fig. 2 are converted to the SPL circuit shown in Fig. 3. Note that a trickle pull-up PMOS is attached to each output of a series of NMOS pass-transistors, because an NMOS pass-transistor cannot drive CMOS gates by itself. In the left part of the circuit in Fig. 3, inverters are inserted as intermediate buffers to improve performance.
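The node-to-transistor mapping can be sketched in a few lines. The example below is an illustration only: it uses the BDD of a 3-input XOR rather than the RB adder BDDs of Fig. 2 (which are not reproduced here), and the names `Node`, `evaluate`, and `pass_transistor_count` are hypothetical. The count covers only the two pass-transistors per node; input inverters, pull-up PMOS devices, and buffers are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A BDD node: follow `high` when `var` is 1, `low` when it is 0."""
    var: str
    low: "Node | bool"
    high: "Node | bool"


def evaluate(node, inputs: dict) -> bool:
    """Follow the conducting branch at each node until a terminal is reached."""
    while isinstance(node, Node):
        node = node.high if inputs[node.var] else node.low
    return node


def collect(node, seen=None) -> set:
    """Gather the distinct nodes reachable from `node` (shared nodes count once)."""
    seen = set() if seen is None else seen
    if isinstance(node, Node) and node not in seen:
        seen.add(node)
        collect(node.low, seen)
        collect(node.high, seen)
    return seen


def pass_transistor_count(root) -> int:
    """Two pass-transistors per BDD node, one for each outgoing edge."""
    return 2 * len(collect(root))


# BDD of a XOR b XOR c with variable order a, b, c.
c_pos = Node("c", False, True)       # c
c_neg = Node("c", True, False)       # NOT c
b_xor = Node("b", c_pos, c_neg)      # b XOR c
b_xnor = Node("b", c_neg, c_pos)     # NOT (b XOR c)
xor3 = Node("a", b_xor, b_xnor)

print(pass_transistor_count(xor3))   # 5 nodes -> 10 pass-transistors
```

Because shared subgraphs are counted once, the transistor count in this model grows with the number of BDD nodes rather than with the number of paths.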

After the circuit design is completed, SPL cells are designed manually. For example, the circuit in Fig. 3 (66 transistors) is implemented as two cells ($26.6 \mu \mathrm{m} \times 15.4 \mu \mathrm{m}$ and $37.8 \mu \mathrm{m} \times 15.4 \mu \mathrm{m}$) in the $0.35 \mu \mathrm{m}$ technology. Finally, the cells are placed and routed. Placement is done manually, while routing is performed automatically by Aquarius XO.

The total number of transistors and the area are 12,349 and $1,322 \mu \mathrm{m} \times 332 \mu \mathrm{m}$, respectively.

## III. Measurements and Conclusion

The multiplier was fabricated in a $0.35 \mu \mathrm{m}$ triple-metal CMOS process. A micrograph of the multiplier TEG is shown in Fig. 4. The power dissipation for a pseudo-random pattern, measured at $T=25^{\circ} \mathrm{C}$ and $V_{DD}=3.3$ V, is 33.7 mW at an input rate of 100 MHz. From the shmoo plot in Fig. 5, the maximum delay at $V_{DD}=3.3$ V is 7.4 nsec. Delay is measured as the clock-to-clock delay between the input and output FFs. This means that the measured delay includes the delay of the input FFs and the setup time of the output FFs (approx. 0.5 nsec and 0.2 nsec, respectively, at $V_{DD}=3.3$ V).
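As a rough reading of these figures, and assuming the three contributions simply add along the clock-to-clock path (an assumption, since only approximate FF values are given), the delay attributable to the multiplier logic itself would be about

$$
t_{\mathrm{logic}} \approx 7.4 - 0.5 - 0.2 = 6.7 \ \mathrm{nsec},
$$

where $t_{\mathrm{logic}}$ is a symbol introduced here for illustration only.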

For comparison, we performed physical design and circuit simulation of a 16-bit multiplier generated by a logic synthesis tool for the same target technology. Its power dissipation and maximum delay are 83.0 mW at 100 MHz and 5.4 nsec, respectively. Note that the energy-delay product of our multiplier is $44 \%$ smaller.
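The 44% figure can be checked from the numbers quoted in the text. The sketch below assumes that energy per operation is power divided by the 100 MHz input rate and that the energy-delay product is that energy times the maximum delay; the function name `energy_delay_product` is hypothetical. Note that the SPL values are measured (with FF overhead included in the delay), while the synthesized design's values come from circuit simulation.

```python
def energy_delay_product(power_mw: float, freq_mhz: float, delay_ns: float) -> float:
    """Return the energy-delay product in nJ*ns (mW / MHz = nJ per operation)."""
    energy_nj = power_mw / freq_mhz
    return energy_nj * delay_ns


spl_edp = energy_delay_product(33.7, 100.0, 7.4)      # this work, measured
synth_edp = energy_delay_product(83.0, 100.0, 5.4)    # synthesized design, simulated
reduction = 1.0 - spl_edp / synth_edp

print(f"SPL: {spl_edp:.2f} nJ*ns, synthesized: {synth_edp:.2f} nJ*ns")
print(f"reduction: {reduction:.0%}")
```

Under these assumptions the products are about 2.49 and 4.48 nJ·ns, which is consistent with the stated reduction of roughly 44%.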

From these results, we can see that a low-power 16-bit


Fig. 1. Block Diagram of the Multiplier


Fig. 2. BDDs of 1-bit RB Adder


Fig. 3. Circuit of 1-bit RB Adder

## References

[1] K. Taki and B.-Y. Lee: "Low Power Pass-Transistor Logic and Application Examples", IEICE Trans. Electronics, Information and Communication, Vol.J80-A, No.5, pp.1-12, May 1997, (in Japanese), or Electronics and Communication in Japan, Part 3 (SCRIPTA TECHNICA), Vol.81, No.9, pp.54-66, 1998.
[2] N. Takagi, H. Yasuura, and S. Yajima: "High-speed VLSI multiplication algorithm with a redundant binary addition tree", IEEE Trans. Comput., vol.C-34, no.9, pp.789-796, Sep. 1985.
[3] N. Takagi: "Arithmetic unit based on a high-speed multiplier with a redundant binary addition tree", Proc. SPIE, vol.1566, pp.244-251, July 1991.
[4] R.E. Bryant: "Graph-based algorithms for Boolean function manipulation", IEEE Trans. Comput., vol.C-35, no.8, pp.677-691, Aug. 1986.

