# 0.5 V CMOS Logic Delivering 200 Million 8x8 Bit Multiplications/s at less than 100 fJ based on a 50 nm T-Gate SOI Technology

Volker Dudek, Reinhard Grube, Bernd Höfflinger, Michael Schau

Institute for Microelectronics Stuttgart, Allmandring 30a, D-70569 Stuttgart, Germany, +49 (0) 711 685 5778

dudek@ieee.org

## 1. ABSTRACT

High-performance CMOS logic at a very low voltage of 0.5 V can deliver 150 million 8x8 multiplications/s at an energy level of only 30 fJ if three elements are optimized simultaneously: a 0.35 µm SOI technology enhanced with self-aligned 50 nm T-Gate transistors, a new adder with a differential Manchester carry chain including special accelerators, and the DIGILOG multiplier, a leading-one-first pseudo-log multiplier with complexity of order (n).

## 1.1 Keywords

Adder, Multiplier, T-Gate, low power, high-performance

## 2. INTRODUCTION

Very high operational throughput at very low voltages is the goal of digital signal processing, particularly video, in portable nomadic applications like cellular and satellite-based video-telephoning and video-conferencing. This paper presents a synergetic view of technology, circuits and functional architecture to maintain speed in spite of drastic reductions in power and energy.

## 3. T-Gate SOI Transistor

In conventional CMOS processing, the limits of the poly gate length and the channel length are correlated and directly dependent on the lithography resolution. Keeping in mind that the integration density and consequently the



Figure 1: Symmetrical self-aligned T-Gate transistor for SOI low power applications.

minimum transistor size are mainly given by the metallization system (metal 1 pitch), there is enough space between the source and drain contacts to decouple the source-to-drain length, the gate length and the channel length. The symmetrical T-Gate transistor presented in this paper allows a lithography-independent sub-0.1 µm channel and a longer gate for optimizing resistance and capacitance. Decoupling the length of the poly gate from the channel length leads to high transconductance, minimum Miller capacitance and high speed or bandwidth. The SOI technology for low power applications uses the symmetrical T-Gate transistor (Fig. 1). The T-Gate transistors are produced with the lithography-independent edge-defined MOS technology (EDMOS). The basic idea of lithography-independent nanometer silicon technologies is the substitution of lithography by suitable film deposition techniques [1]. The geometries that are normally defined by lithography are now defined by the thickness of deposited films, thus allowing a smaller, accurately reproducible gate length.

To realize the symmetrical T-Gate transistors, an oxide edge is defined. After the channel implantation, a spacer is produced (Fig. 2) and the oxide is etched anisotropically. After the LDD and source/drain implantations, a TEOS oxide is deposited and planarized (Fig. 3).



Figure 2: T-Gate transistor, after spacer etching.

The spacer is removed and the gate oxide is grown. The gate material is deposited and structured. No additional high temperature step is necessary, and alternate dielectrics and new gate materials are possible. The self-aligned T-Gate techniques decouple the length of the poly gate from the channel length [2].



Figure 3: T-Gate transistor before spacer etch.

The SOI T-Gate transistor can be used in the hybrid mode which leads to an improved performance of logic circuits and gate-arrays [3] with respect to speed and power consumption.

## 4. HIPERLOGIC Adder

Due to its structure, the SOI T-Gate transistor has significantly smaller capacitances than the bulk counterpart. When using the SOI T-Gate transistor in the hybrid mode, smaller supply voltages are possible. Both effects contribute to improved high-performance CMOS logic circuits (HIPERLOGIC) with respect to speed and power consumption.

The potential of the hybrid SOI T-Gate transistor is demonstrated with the HIPERLOGIC 8-Bit adder. In this adder scheme, a Differential Manchester Carry Chain (MCC) is used because of its inherent speed and noise immunity. The basic element of the carry chain is shown in Fig.4. The input signals are fed via transmission gates in



Figure 4: MCC-element with accelerator

order to transfer correctly the logical levels "0" and "1" for supply voltages as low as 0.5 volts. In the chain element there are two cross-coupled inverters working as an accelerator. The carry chain is clocked. In the first clock phase with the clock signal high ("equilibrium phase"), the

| Technology | t<sub>ox</sub> (nm) | W (µm) | L (µm) | C<sub>gate</sub> (fF) | C<sub>junct</sub> (fF) | R<sub>S,D</sub> (Ω) |
|------------|---------------------|--------|--------|-----------------------|------------------------|---------------------|
| 0.8µm SOI  | 5                   | 2.4    | 0.1    | 1.65                  | 0.67                   | 265 N<br>240 P      |
| 0.35µm SOI | 3                   | 1      | 0.05   | 0.57                  | 0.28                   | 630 N<br>1750 P     |
| 0.1µm Bulk | 3                   | 0.4    | 0.1    | 0.46                  | 0.53                   | 630 N<br>1750 P     |
| 0.1µm SOI  | 3                   | 0.4    | 0.05   | 0.23                  | 0.08                   | 630 N<br>1750 P     |

Table 1: Basic data used for PSPICE simulations

two outputs of the accelerator are set to zero volts. In the second clock phase with the clock signal low ("evaluation phase"), the output signals are established depending on the input, propagate, generate, and kill signals.
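The two clock phases and the propagate, generate, and kill signals can be illustrated with a purely behavioral model. The following is a minimal sketch, not the simulated circuit: the function names `pgk` and `mcc_add` and the representation of each differential carry as a pair of 0/1 rails are assumptions made for illustration, and transmission gates, accelerators, and timing are not modeled.

```python
def pgk(a_bit, b_bit):
    """Return (propagate, generate, kill) for one bit position."""
    return a_bit ^ b_bit, a_bit & b_bit, (1 - a_bit) & (1 - b_bit)


def mcc_add(a, b, carry_in=0, width=8):
    """Behavioral model of an adder with a differential carry chain.

    Every carry is a rail pair (c, c_bar). The pair (0, 0) is the
    equilibrium state; after evaluation exactly one rail of a pair is 1.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Equilibrium phase (clock high): both rails of every stage are at 0.
    rails = [(0, 0)] * (width + 1)
    # Evaluation phase (clock low): the carry input drives the first pair.
    rails[0] = (carry_in, 1 - carry_in)
    total = 0
    for i in range(width):
        p, g, k = pgk((a >> i) & 1, (b >> i) & 1)
        c_in, c_in_bar = rails[i]
        # Generate pulls the true rail high, kill pulls the complement
        # rail high, propagate passes the previous pair through unchanged.
        rails[i + 1] = (g | (p & c_in), k | (p & c_in_bar))
        total |= (p ^ c_in) << i
    return total, rails[width][0], rails


total, carry_out, _ = mcc_add(200, 100)
print(total, carry_out)  # 200 + 100 = 300 = 256 + 44
```

Because exactly one of propagate, generate, and kill is active per bit, the two rails stay complementary once the evaluation has rippled through the chain.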

The HIPERLOGIC 8-Bit adder was simulated with PSPICE for four different technologies. The basic data used in the simulations are given in table 1. Table 2 shows the simulation results for a supply voltage of 0.5 V. When using 0.8 µm SOI hybrid T-Gate transistors in the 8-Bit

| Technology                  | Simulation conditions  | t <sub>sim</sub><br>(ns) | E <sub>sim</sub><br>(pJ) |
|-----------------------------|------------------------|--------------------------|--------------------------|
| 0.8µm SOI<br>Hybrid T-Gate  | Fanout=<br>4 inverters | 7                        | 0.08                     |
| 0.35µm SOI<br>Hybrid T-Gate | Fanout=<br>4 inverters | 3.8                      | 0.03                     |
| 0.1µm Bulk<br>Standard      | Fanout=<br>4 inverters | 6                        | 0.023                    |
| 0.1µm SOI<br>Hybrid T-Gate  | Fanout=<br>4 inverters | 1.4                      | 0.014                    |

Table 2: Simulation results HIPERLOGIC 8-Bit adder for VDD=0.5V

adder, time and energy consumed for one addition are 7 ns and 80 fJ, respectively. Going down to smaller sizes of the transistors, time and energy are reduced. The last two rows of table 2 compare 8-Bit adders with bulk standard transistors and SOI hybrid T-Gate transistors of the same technology generation showing a superior performance of the SOI hybrid T-Gate adder.
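As a plausibility check on the device data in table 1, the gate capacitances can be compared with a simple parallel-plate estimate. The paper does not state how the C<sub>gate</sub> values were obtained, so this sketch rests on assumptions: a silicon dioxide gate dielectric with relative permittivity 3.9, and a gate area of W times L. The helper name `gate_capacitance_fF` is hypothetical.

```python
EPS_0 = 8.854e-12    # vacuum permittivity in F/m
EPS_R_SIO2 = 3.9     # assumed relative permittivity of the gate oxide


def gate_capacitance_fF(t_ox_nm, w_um, l_um):
    """Parallel-plate estimate eps * W * L / t_ox, returned in fF."""
    area_m2 = (w_um * 1e-6) * (l_um * 1e-6)
    c_farad = EPS_0 * EPS_R_SIO2 * area_m2 / (t_ox_nm * 1e-9)
    return c_farad * 1e15


# (t_ox in nm, W in um, L in um, C_gate in fF) as listed in table 1.
TABLE_1 = {
    "0.8um SOI": (5, 2.4, 0.1, 1.65),
    "0.35um SOI": (3, 1.0, 0.05, 0.57),
    "0.1um Bulk": (3, 0.4, 0.1, 0.46),
    "0.1um SOI": (3, 0.4, 0.05, 0.23),
}

for name, (t_ox, w, l, c_table) in TABLE_1.items():
    estimate = gate_capacitance_fF(t_ox, w, l)
    print(f"{name}: estimate {estimate:.2f} fF, table 1 {c_table} fF")
```

Under these assumptions the estimates agree with the tabulated values to within about 0.01 fF, which suggests that the halved channel length of the SOI T-Gate devices accounts directly for their halved gate capacitance.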

By connecting two 8-Bit adders in parallel, a 16-bit adder can be realized while using the same clock signal for both 8-bit adders. Simulated carry signals and the last sum signal for the 16-bit adder are shown in figure 5. These results were obtained for the 0.35 µm SOI technology with hybrid T-Gate transistors. The behaviour of the accelerators in the carry chain is clearly demonstrated by the carry signals. With the clock signal going down, the outputs of the accelerator rise very fast to about half the supply voltage and remain there until the carry signals of the previous stage arrive. The evaluation time of the 16-bit adder is 3 ns. Taking into account an offset time of about 3 ns (the time for which the input signals of the adder must be applied before



Figure 5: Signals of the HIPERLOGIC 16-Bit adder

the evaluation phase starts), a total time for the 16-Bit addition of about 6 ns is obtained.

## 5. The DIGILOG Multiplier

The principle of the DIGILOG multiplication is based on the addition of logarithms [4]. The logarithmic encoding of an n-bit integer number uses a piece-wise linear approximation. A splitter separates the leading "one" of the number from the remainder. The leading "one" represents the exponent and can be written as $2^j$. The upper shifter shown at the left side of figure 6 subtracts the two exponents of the numbers using only a shifter. This is possible because both exponents contain only a single "one". The remainder of the first number is added to the second number, which is shifted by the difference of the two exponents. The result of the addition represents the multiplication and must only be shifted by the exponent of the second number. This method results in a full dynamic range. Assume the numbers A and B are to be multiplied. Each number can be divided into a power of two and a remainder:

$$\mathbf{A} \equiv \mathbf{2}^{j} + \mathbf{A}_{R} \qquad \qquad \mathbf{B} \equiv \mathbf{2}^{k} + \mathbf{B}_{R}$$

The multiplication can now be split up:

$$\begin{aligned}
\mathbf{A} \cdot \mathbf{B} &= (\mathbf{2}^{j} + \mathbf{A}_{R})(\mathbf{2}^{k} + \mathbf{B}_{R}) \\
&= \mathbf{2}^{j} \cdot \mathbf{B} + \mathbf{2}^{k} \cdot \mathbf{A}_{R} + \mathbf{A}_{R} \cdot \mathbf{B}_{R}
\end{aligned}$$

The first part is obtained by shifting the number B by the exponent of A, the second part by shifting the remainder of A by the exponent of B. These two parts are added, and the third part is ignored in the first step.
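The first step can be written down directly from the split product. The following is a minimal software sketch, assuming non-negative integer operands. The function names are hypothetical, and the sketch evaluates the two shifted terms in the form of the equation rather than in the shifter arrangement of figure 6.

```python
def split_leading_one(x):
    """Split x > 0 into (j, x_r) such that x == 2**j + x_r."""
    j = x.bit_length() - 1
    return j, x - (1 << j)


def digilog_step(a, b):
    """Return (approximation, a_r, b_r) for the first DIGILOG step.

    The approximation is 2**j * B + 2**k * A_R, built from two shifts
    and one addition; the term A_R * B_R is ignored.
    """
    if a == 0 or b == 0:
        return 0, 0, 0
    j, a_r = split_leading_one(a)
    k, b_r = split_leading_one(b)
    return (b << j) + (a_r << k), a_r, b_r


print(digilog_step(13, 6))  # (68, 5, 2)
```

For A = 13 and B = 6 the sketch returns 68 against the exact product 78; the difference is the ignored term 5 · 2 = 10.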

|                              | 1 <sup>st</sup> Iteration | 2 <sup>nd</sup> Iteration | 3 <sup>rd</sup> Iteration |
|------------------------------|---------------------------|---------------------------|---------------------------|
| Worst-case error             | -25%                      | -6%                       | -1.6%                     |
| Probability of<br>Error < 1% | 10%                       | 70%                       | 99.8%                     |

 Table 3: Accuracy enhancement by using an iteration scheme
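The worst-case row of table 3 can be motivated by a short bound. This is a sketch that follows from the splitting of A and B defined above, not a derivation given in the paper; the symbol $\mathbf{P}_1$ for the result of the first step is notation introduced here. Since the first step drops only the term $\mathbf{A}_R \cdot \mathbf{B}_R$,

$$\frac{\mathbf{P}_1 - \mathbf{A} \cdot \mathbf{B}}{\mathbf{A} \cdot \mathbf{B}} = -\frac{\mathbf{A}_R}{\mathbf{A}} \cdot \frac{\mathbf{B}_R}{\mathbf{B}}, \qquad \frac{\mathbf{A}_R}{\mathbf{A}} = \frac{\mathbf{A}_R}{\mathbf{2}^{j} + \mathbf{A}_R} < \frac{1}{2} \quad \text{because } 0 \le \mathbf{A}_R < \mathbf{2}^{j},$$

and likewise for B. The relative error of the first step therefore lies between -25% and 0. Each further iteration applies the same step to the remainders, which halves the bound on each factor again, so that after m iterations

$$\left| \text{relative error} \right| < 4^{-m}: \qquad 25\%,\; 6.25\%,\; 1.5625\% \quad \text{for } m = 1, 2, 3,$$

in line with the rounded values -25%, -6% and -1.6% in table 3.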

The accuracy can be enhanced by an iteration scheme that accumulates the results in a register and multiplies the remainders in the following step (table 3). With an 8 by 8 bit multiplier, the exact result is obtained after at most seven iteration steps (worst case). If an accuracy of six percent is sufficient, it is



Figure 6: Schematic of a pipelined, two-step DIGILOG Multiplier

possible to cascade two DIGILOG multipliers one after another, so that no iteration is needed (Fig. 7).

The relative accuracy of a product of two numbers cannot be higher than that of the less accurate operand, i.e. the product of two n-bit numbers is accurate to n bits.
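The iteration scheme can be explored in software with a short exhaustive search. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the accumulator register is modeled as a plain integer, and the loop stops as soon as one remainder is zero. It does not attempt to reproduce how the hardware counts its iteration steps.

```python
def split_leading_one(x):
    """Split x > 0 into (j, x_r) such that x == 2**j + x_r."""
    j = x.bit_length() - 1
    return j, x - (1 << j)


def digilog_multiply(a, b, iterations):
    """Accumulate up to `iterations` DIGILOG steps.

    Each step adds 2**j * B + 2**k * A_R to the accumulator and then
    continues with the remainders A_R and B_R as the new operands.
    """
    acc = 0
    steps = 0
    while steps < iterations and a > 0 and b > 0:
        j, a_r = split_leading_one(a)
        k, b_r = split_leading_one(b)
        acc += (b << j) + (a_r << k)
        a, b = a_r, b_r
        steps += 1
    return acc, steps


def worst_case_error(iterations, bits=8):
    """Most negative relative error over all nonzero operand pairs."""
    worst = 0.0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            approx, _ = digilog_multiply(a, b, iterations)
            worst = min(worst, (approx - a * b) / (a * b))
    return worst


for m in (1, 2, 3):
    print(f"{m} iteration(s): worst-case error {worst_case_error(m):.2%}")
```

For 8-bit operands the worst-case errors found this way stay slightly inside the rounded values of table 3, since those are approached only as the word length grows.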

## 6. Conclusion

Improved high-performance CMOS logic circuits based on a symmetrical self-aligned T-Gate transistor on SOI for low power applications show the potential of simultaneously optimizing 50 nm MOS transistors, very-low-voltage operation, differential logic and the DIGILOG multiplier, whose complexity increases only with order (n) for n-bit words. A benchmark multiplier in a 0.35 µm technology achieves 150 MOPS at an energy level of 30 fJ.

Further improvements are predicted for 0.1 µm CMOS SOI with hybrid T-Gate transistors, performing 200 MOPS at a level of 14 fJ. These performance levels are attractive for DSP functions in nomadic, portable video-telephony.

## 7. REFERENCES

- [1] V. Dudek, W. Appel, L. Beer, G. Digele, B. Höfflinger, "Lithography-Independent Nanometer Silicon MOSFET's on Insulator", IEEE Transactions on Electron Devices ED-43(10), 1996, pp. 1626-1632.
- [2] S.J. Abou-Samra, V. Dudek, F. Ayache, A. Guyot, B. Courtois, B. Höfflinger, "Designing with 3D SOI CMOS", 8th Int. Symposium Silicon-on-Insulator Technology and Devices, Paris, Aug. 31-Sep. 7, 1997.
- [3] V. Dudek, D.O. Keck, G. Mayer, B. Höfflinger, "Digital Consideration for High-Performance, Low-Power Silicon-on-Insulator Gate Arrays", IEEE 1995 Custom Integrated Circuits Conference, pp. 2.3.1-2.3.4.
- [4] B. Höfflinger, M. Selzer, F. Warkowski, "Digital Logarithmic CMOS Multiplier For Very-High-Speed Signal-Processing", IEEE 1991 Custom Integrated Circuits Conference, pp. 16.7.1-16.7.5.