# A Low-Power Digital Matched Filter for Spread-Spectrum Systems

Shoji Goto, Takashi Yamada, Norihisa Takayama, Yoshifumi Matsushita, Yasoo Harada<sup>1</sup> SANYO Electric Co., Ltd. 180, Ohmori, Anpachi-Cho Anpachi-Gun, Gifu 503-0195, Japan +81-584-64-5218

{gotoh, yamada}@ul.rd.sanyo.co.jp

Hiroto Yasuura

Kyushu University 6-1 Kasuga-Koen, Kasuga, Fukuoka 816, Japan +81-92-583-7620

yasuura@c.csce.kyushu-u.ac.jp

# ABSTRACT

A Digital Matched Filter (DMF) is an essential device for Direct-Sequence Spread-Spectrum (DS-SS) communication systems. Reducing the power consumption of a DMF is especially critical for battery-powered terminals. The reception registers and the correlation-calculating unit dissipate the majority of the power in a DMF. In this paper we discuss this problem and propose a low-power architectural approach to a DMF. The total switching activity factor and the switched capacitance are reduced. As a result of power analysis at the gate level, the implementation of the proposed architecture in a standard 0.18-µm CMOS technology achieved a reduction in the power consumption of more than 70%.

# **Categories and Subject Descriptors**

B.5.1 [Register-Transfer-Level Implementation]: Design arithmetic and logic units, control design, styles.

## **General Terms**

Algorithms, Management, Design, Experimentation.

# **Keywords**

matched filter, spread-spectrum, CDMA, VLSI, low power.

### 1. Introduction

Spreading and de-spreading are specific operations of DS-SS communication systems. A Matched Filter (MF) calculates the correlation values between the received signal (symbol) and the known PN sequence by which the received symbol has been spread in the transmitter. It is therefore used to acquire the despreading timing of the received symbol. Figure 1 shows a block diagram of the front-end receiver, which comprises a quadrature mixer with a local oscillator, Low Pass Filters (LPF), A/D converters, MFs and a path searcher for the RAKE combiner. An MF is located at the interface between the A/D converters and the digital base-band processor. It can be implemented by using either analog or digital technology. Taking into account the desired application to battery-powered commercial mobile phones, the issue of low power consumption is of the foremost concern when choosing between analog and digital MFs. So far, several different approaches have been proposed for MF implementation, such as the SAW MF [1], the digital MF (DMF) [2]-[5], the CCD MF [6] and the Analog MF (AMF) [7]. Since differences in filter length, technology scaling and operating frequency have different effects on the power consumption of analog and digital circuits, the optimum approach needs to take into account all of these factors. An AMF is more power-efficient for shorter, faster MFs and a DMF is more power-efficient when the filters are longer or slower [8].

<sup>1</sup> Presently with the Semiconductor Technology Academic Research Center (STARC), Yokohama, 222-0033 Japan.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ISLPED'02, August 12-14, 2002, Monterey, California, USA. Copyright 2002 ACM 1-58113-475-4/02/0008...\$5.00.



Figure 1. Block diagram of the front-end receiver

Recent advances in CMOS technology are making it possible to design a DMF for practical use. For application to code acquisition or tracking in a DS-SS system such as W-CDMA or post W-CDMA, in which relatively long reference codes are (likely to be) used, a DMF is preferred not only from the viewpoint of power efficiency, but also for design flexibility. A DMF for W-CDMA might consume a good half of the total power in the base-band processing if no appropriate low-power measures were adopted. It is desirable that the power consumption in the DMF is kept below 10 mW in order to maintain a practical active standby time. In [4] and [5], the main focus is on low-power techniques, including the architecture, circuitry and design layout to implement a low-power DMF consuming less than 10 mW.

In this paper, we consider how the switching probability in a DMF can be reduced by focusing on the RTL architecture. The following are the key points of the proposal to work on this issue:

- asynchronous latch clock generation
- parallelism of correlation calculation operation
- simplified chip correlation operation

This paper is organized as follows. In Sect. 2, the basic functions and structures of the conventional approaches to a DMF are explained. The low-power DMF is proposed and discussed in Sect. 3, and analyzed in Sect. 4. Lastly, the conclusion is presented in Sect. 5.

# 2. Basic Function and Structure of a DMF

A DMF calculates the following :

$$\sum_{k=0}^{L-1} (C_k \cdot R_k) \tag{1}$$

where L,  $C_k$  and  $R_k$  denote the length of the reference code sequence, the *k*-th chip and the tapped received sample, respectively. Figure 2 shows the timing relationship between  $C_k$  and  $R_k$ , in which they are synchronized with respect to time, and a DMF outputs the maximum correlation value.  $R_k$  has usually been sampled 2-8 times per chip (over-sampling) and quantized into multiple bits.



Figure 2. Easy example of  $C_k$  and  $R_k$  waveforms

A DMF comprises reception registers for values of $R_k$, reference code registers for values of $C_k$, and a Correlation-Calculating Unit (CCU) including coefficient multipliers and summation adders. The most basic DMF is shown in Figure 3 (a), which we call "a received sample-shifting DMF". A received sample $R_k$ is latched in the leftmost register of the reception registers and shifted along the tapped delay line. On the other side, the values of $C_k$ are stored in reference code registers as filter tap coefficients. All of the samples, each multiplied by its corresponding tap coefficient, are summed and then output. This structure is the same as that of an FIR filter with L asymmetric taps. Note that the multiplier referred to here simply controls the polarity of $R_k$, because the tap coefficient is 1-bit and is expressed as $\pm 1$. A large amount of power is consumed in the reception registers, because every register is activated to transmit the multiple-bit $R_k$ to the next register by a shifting operation. The power consumption becomes higher still when L is longer.
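As an illustration of equation (1) and of the received sample-shifting structure, the following Python sketch models the filter at a behavioural level. The function name, the use of a `deque` for the reception registers and the convention that the oldest tapped sample meets chip 0 are assumptions made for this example; they are not taken from the implementation described here.

```python
from collections import deque


def sample_shifting_dmf(samples, code, n_over):
    """Behavioural model of a received sample-shifting DMF.

    samples: received samples (integers), one per system clock
    code:    reference code chips, each +1 or -1
    n_over:  over-sampling number N (samples per chip)
    Returns one correlation value per input sample.
    """
    length = len(code)
    size = n_over * length
    regs = deque([0] * size, maxlen=size)  # reception registers
    outputs = []
    for sample in samples:
        # Every register passes its content to its neighbour on each sample.
        regs.appendleft(sample)
        # Tap every n_over-th register; the oldest tapped sample meets chip 0.
        taps = [regs[n_over * (length - 1 - k)] for k in range(length)]
        outputs.append(sum(c * r for c, r in zip(code, taps)))
    return outputs
```

In this model all N·L registers change on every sample, which mirrors the reason given above for the high power consumption of the reception registers.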

A reference code-shifting DMF, as shown in Figure 3 (b), has been proposed in order to reduce the power consumption in the reception registers [7]. In this form of DMF, the shift register is not used for the reception registers but is used for the reference code registers. There is no need to take into account over-sampling, so the shift register can be smaller in both bit width and length. The dominant power consumed in the reception register bank is drastically reduced, but it is necessary to add a selector (MUX) for each tap (boxed as a tapping block) in order to recursively select  $R_k$  in the over-sampling case (not distinguished in Figure 3). It also needs a control unit for the reception registers, which is implemented by a control counter and MUXs. These overheads wipe out some of the power reduction obtained in the reception registers. When an asynchronous design is considered, it can be implemented using a clock gating technique, but there is still an overhead in the clock gating circuitry.

When considering a low-power DMF, the reference code-shifting approach should be adopted. The DMF discussed in this paper is based on such a reference code-shifting DMF, as shown in Figure 3 (b).
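The difference between the two structures can be sketched in Python as follows. This is a behavioural model only: the reception registers are written in place, one per sample, and an index rotation stands in for the reference code shift register and the per-tap MUX. The function name and the indexing convention are assumptions made for this example.

```python
def code_shifting_dmf(samples, code, n_over):
    """Behavioural model of a reference code-shifting DMF.

    samples: received samples (integers), one per system clock
    code:    reference code chips, each +1 or -1
    n_over:  over-sampling number N (samples per chip)
    Returns one correlation value per input sample.
    """
    length = len(code)
    size = n_over * length
    regs = [0] * size  # reception registers, never shifted
    outputs = []
    for t, sample in enumerate(samples):
        # Only one reception register is written per sample.
        regs[t % size] = sample
        acc = 0
        for k in range(length):
            # Chip k meets the sample that is n_over*(length-1-k) steps old.
            # Rotating this index models the shifting reference code.
            idx = (t - n_over * (length - 1 - k)) % size
            acc += code[k] * regs[idx]
        outputs.append(acc)
    return outputs
```

For the same input, this model yields the same correlation values as a shifting delay line, while only one multi-bit register is updated per sample.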



# **3.** Low-Power Consumption DMF

# 3.1 Low-power design

The power consumed in a CMOS circuit is estimated as the sum of the dynamic power and the static power. Static power is dissipated by the leakage current in inactive gates, while dynamic power is consumed when the circuit is activated. Dynamic power has two sources: the charging and discharging of the parasitic capacitance of the circuit, and the short-circuit current (a direct current from VDD to GND) that flows while the input signals are in transition. Dynamic power is the dominant component of the total power consumption in existing CMOS technologies, and it is therefore important to minimize it.

The dynamic power is given by the product of the switching probability, the clock frequency, the load capacitance and the square of the supply voltage. In this paper we are aiming at minimizing the switching probability in order to lower the dynamic power.

Besides the reception registers, the CCU consumes the next largest amount of power. Low-power design concepts for each block are enumerated as follows:

- Reception registers: a) use low-frequency asynchronous clock signals as an alternative to the gated-clock signals, b) reduce the switching activity of all the D-inputs of the flip-flops (FFs) in the reception registers,
- CCU: a) use parallel CCUs for simplicity of tapping neighboring samples that are (assumed to be) identical chips, b) halt the latter half of the correlation calculations in a "threshold judgment", c) simplify the positive/negative sign-inversion operation of a 2's complement signal with the easier bit-inversion operation.

Figure 4 shows the configuration of the proposed DMF, based on a reference code-shifting DMF, where $N$, $f_c$, and n denote the oversampling number, the chip frequency and the bit width of the received sample, respectively. The maximum operating frequency of each block is shown by the marking on the hatched boxes. A detailed explanation for each block (reception block, correlation-calculating block) will be given in the following subsections.



Figure 4. Configurations of the proposed DMF

#### **3.2** The reception block

The reception block comprises a control unit and the reception registers. The control unit generates the latch clock LCK\_*i* and the masking signal MSK\_*j* for the *i*-th and *j*-th reception registers ( $i = j \bmod (N \cdot L/2), j = 1, 2, ..., N \cdot L$ ), respectively.

Figure 5 shows a block diagram of the control unit. The system clock only operates the clock pulse counter. Each bit of the counter and its delayed signal (except for the MSB and the LSB) are utilized as the clocks of the lowest possible frequency  $(Nf_c/4 \sim 2 \cdot f_c /L)$  in the delay line, which has the structure of a shift register (1-bit,  $N \cdot L/2-1$  steps). The LSB is utilized as a delaying clock in the delay unit. The MSB is output to the reception register as LCK\_0 with a frequency of  $f_c /L$ . This is also connected to the delay line, and is delayed in succession by  $1/Nf_c$ . Other latch clocks LCK\_i are obtained by tapping in the delay line.

Figure 6 shows an input/output timing diagram for the control unit. The switching frequency of each signal is given in parentheses. LCK\_1 is generated by latching LCK\_0 at the negative edge of the clock, which is the delayed signal of the ( $\log_2 (N \cdot L)-1$ )-th bit of the counter. The other LCK\_*i* are obtained by delaying LCK\_(*i*-1) in the same manner. The received sample is denoted by $D_t$. $D_t$ ( $t \bmod N \cdot L < N \cdot L/2$ ) is stored in the reception registers at the positive edge of the clock LCK\_*i* ( $i = t \bmod N \cdot L$ ), and $D_t$ ( $N \cdot L/2 \le t \bmod N \cdot L \le N \cdot L-1$ ) is stored at the negative edge of LCK\_*i* ( $i = t \bmod N \cdot L - N \cdot L/2$ ). The number of LCK\_*i*, that is, the number of delay-line steps, is halved by introducing negative edge driven FFs for the latter half of the reception registers.
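The assignment of samples to latch clocks and clock edges can be summarized in a few lines of Python. This is a sketch under stated assumptions: registers and latch clocks are numbered from 0, the function name is hypothetical, and the rule follows the relation $i = j \bmod (N \cdot L/2)$ given for the control unit.

```python
def latch_clock_for_sample(t, n_over, length):
    """Return (i, edge): sample D_t is stored on `edge` of latch clock LCK_i.

    t:      sample index
    n_over: over-sampling number N
    length: reference code length L
    """
    size = n_over * length   # number of reception registers, N*L
    half = size // 2         # number of latch clocks, N*L/2
    slot = t % size          # register that receives D_t
    # The first half of the registers latch on the positive edge,
    # the latter half reuse the same clocks on the negative edge.
    edge = "positive" if slot < half else "negative"
    return slot % half, edge
```

With N = 4 and L = 32, for example, samples 0 and 64 share the same latch clock but use opposite edges, which is how the number of delay-line steps is halved.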

Figure 7 shows a slice of the delay line. Part a) shows the proposed approach, and part b) a conventional clock-gating approach. As explained above, the operating frequency in the hatched part is lower (by a factor of  $4 \sim N \cdot L/2$) in the proposed approach. The clock frequency of an FF to latch and delay LCK\_*i* is  $Nf_c/4$  (*i*: 1,2,5,6, ..., 4k+1,4k+2),  $Nf_c/8$  (*i*: 3,4,11,12, ..., 8k+3,8k+4),  $Nf_c/16$  (*i*: 7,8,23,24, ..., 16k+7, 16k+8), ..., and  $2f_c/L$  (*i*: 0,  $N \cdot L/4-1$,  $N \cdot L/4$), respectively, where *k* is a non-negative integer.

The average clock frequency is given by

$$
\begin{aligned}
& N \cdot f_c / 4 \cdot 1/2 + N \cdot f_c / 8 \cdot 1/4 + N \cdot f_c / 16 \cdot 1/8 + \dots + N \cdot f_c / (N \cdot L/2) \cdot 1/(N \cdot L/4) \\
&= N \cdot f_c / 8 \cdot 1 + N \cdot f_c / 8 \cdot 1/4 + N \cdot f_c / 8 \cdot 1/16 + \dots + N \cdot f_c / 8 \cdot (1/4)^{\log_2 (N \cdot L) - 3} \\
&= N \cdot f_c / 8 \cdot \sum_{n=0}^{\log_2 (N \cdot L) - 3} (1/4)^n \approx N \cdot f_c / 6
\end{aligned} \tag{2}
$$
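The series in equation (2) can be checked numerically. The short Python sketch below evaluates the weighted sum term by term, assuming that $N \cdot L$ is a power of two; the function name and the example parameters (N = 4, L = 32, a chip rate of 3.9 MHz) are chosen for illustration only.

```python
import math


def average_latch_clock(n_over, length, f_chip):
    """Evaluate the weighted sum on the first line of equation (2).

    Assumes n_over * length is a power of two.
    """
    stages = int(math.log2(n_over * length))
    total = 0.0
    for k in range(2, stages):               # k = 2 .. log2(N*L) - 1
        freq = n_over * f_chip / 2 ** k       # N*fc/4, N*fc/8, ...
        share = 1 / 2 ** (k - 1)              # 1/2, 1/4, ...
        total += freq * share
    return total


if __name__ == "__main__":
    avg = average_latch_clock(4, 32, 3.9e6)
    # Ratio of the finite sum to the closed-form estimate N*fc/6.
    print(avg / (4 * 3.9e6 / 6))
```

Because the geometric series converges quickly, the finite sum is already within one percent of $N \cdot f_c / 6$ for these parameters.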



Figure 5. Block diagram of the control unit



Figure 6. Timing diagram of the control unit



Figure 7. A slice of the delay line

On the other hand, the operating frequency of the clock-gating circuit is $N f_c$. Furthermore, the hatched part in Figure 7 b) is necessary for $N \cdot L$ reception registers, while it is needed for $N \cdot L/2$ reception registers in Figure 7 a), in which the number of latch clocks is halved. The switching activity in the latch clock generator is therefore estimated to fall to around one twelfth (= $1/6 \cdot 1/2$) of that of the clock-gating approach. The power consumption is also reduced by a factor of 12, assuming that the power consumption in the clock pulse counter and in the delay unit is negligible.

Besides the clock control, data masking is proposed. The received-signal bus is connected to $N \cdot L$ reception registers, increasing the input load of the FFs. The expected switching frequency for the bus is as high as $N \cdot f_c/2$. If the D-inputs of the FFs are suitably deactivated by a masking operation, for example with AND gates whose input load is smaller, the power dissipation may be reduced. The LCK\_*i* are utilized for generating the masking signals, which are denoted by MSK\_*j* ($j = 1, 2, ..., N \cdot L$) in Figure 5. The data inputs of the FFs are masked by MSK\_*j* according to the allocated register number. Since the reception registers include $N \cdot L \cdot n$ FFs, the total power is reduced when such a masking operation is applied to each FF. The power consumption is reduced by $N \cdot L \cdot n \cdot \alpha/2$, where $\alpha$ is the potential maximum-power reduction per FF. For the technology library used in this work, $\alpha$ is 0.6. In the case where such a masking operation is included in an FF cell in the technology library, the outer masking operation may not be necessary.

### **3.3** The correlation-calculating block

#### A. Parallelism

Parallelism can be applied in order to reduce the power dissipation in the correlation-calculating block. In the proposed DMF, multiple CCUs are deployed. The number of CCUs is equal to the over-sampling number N. As shown in Figure 4, only one MUX is needed to recursively select the correct output from N CCUs for functional equivalence. It is possible by using this approach to reduce the number of the MUXs, and to remove the glitching activity in the CCUs, which is caused by the arrival-time nonuniformity of the input signal bits from the tapping block as well. The operating frequency of the elements of the CCU is lowered by nearly a factor of N.
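A behavioural sketch of this parallel arrangement is given below in Python. It assumes one register bank per over-sampling phase, with CCU number `phase` correlating only its own bank and a single output MUX selecting the active CCU; the function name and the bank layout are assumptions made for this example and are not taken from Figure 4.

```python
def parallel_ccu_dmf(samples, code, n_over):
    """Behavioural model of a DMF with one CCU per over-sampling phase.

    samples: received samples (integers), one per system clock
    code:    reference code chips, each +1 or -1
    n_over:  over-sampling number N, also the number of CCUs
    Returns one correlation value per input sample.
    """
    length = len(code)
    # One bank of L registers per over-sampling phase, that is, per CCU.
    banks = [[0] * length for _ in range(n_over)]
    outputs = []
    for t, sample in enumerate(samples):
        phase = t % n_over                    # CCU that handles this sample
        chip_slot = (t // n_over) % length    # register inside that bank
        banks[phase][chip_slot] = sample
        # Only CCU `phase` computes a new value on this clock cycle.
        acc = 0
        for k in range(length):
            slot = (chip_slot - (length - 1 - k)) % length
            acc += code[k] * banks[phase][slot]
        outputs.append(acc)                   # output MUX selects CCU `phase`
    return outputs
```

Each CCU in this model produces a new value only once every N samples, which corresponds to the lower operating frequency noted above.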

#### **B.** Halting by the threshold judgment

The correlation value to be calculated in each CCU can be expressed as:

$$\sum_{k=0}^{L-1} (C_k \cdot R_k) = \sum_{k=0}^{\lfloor \frac{L-1}{2} \rfloor} (C_k \cdot R_k) + \sum_{k=\lfloor \frac{L+1}{2} \rfloor}^{L-1} (C_k \cdot R_k) \tag{3}$$

$R_k$ changes at a frequency of $f_c/L$, but the timing differs by $1/Nf_c$ between CCUs for each tap. $C_k$ changes at the frequency of $f_c$, and therefore one of the CCUs is allowed just one system clock cycle to complete the operation if a common reference code sequence is used. Another reference code sequence, which may be a delayed one, is necessary for that CCU.

The intermediate register and another reference code sequence result in the overhead gate count. It can also be readily understood that the lower the threshold, the less impact it has on power efficiency, because the second  $\sum$  block can rarely be halted as the threshold value becomes lower. On the other hand, the higher the threshold, the worse the DMF functional efficiency becomes, because the second  $\sum$  block is deactivated so often that it loses the main peak under the inferior propagation conditions. It is therefore very important to set the threshold at the optimum value, taking into account the influences of the actual propagation conditions, such as interference or fading of the radio wave. This approach is application specific.
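The trade-off can be made concrete with a small Python sketch of the halting idea. The details are assumptions: the sum is split at the midpoint as in equation (3), the magnitude of the first partial sum is compared with the threshold, and the partial sum itself is returned when the second half is halted. The function name is hypothetical, and the comparison actually used in the hardware may differ.

```python
def thresholded_correlation(code, taps, threshold):
    """Correlate in two halves and skip the second half below a threshold.

    code:      reference code chips, each +1 or -1
    taps:      tapped received samples, same length as code
    threshold: minimum magnitude of the first partial sum (assumed rule)
    Returns (value, second_half_computed).
    """
    # The first sum covers k = 0 .. floor((L-1)/2).
    half = (len(code) - 1) // 2 + 1
    first = sum(c * r for c, r in zip(code[:half], taps[:half]))
    if abs(first) < threshold:
        # Second summation block is halted; no switching occurs there.
        return first, False
    second = sum(c * r for c, r in zip(code[half:], taps[half:]))
    return first + second, True
```

A low threshold rarely takes the early-return branch, so little power is saved; a high threshold takes it often and risks missing the main peak, as discussed above.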

#### **C.** Simplifying the chip-correlation

Next, we consider the compensation for the gate-count increase, which is inherent in the parallelism approach, and the enhancement of the power efficiency. The received samples stored in the reception registers are *n*-bit 2's complement. In an adder tree, the number of full adders can be reduced in offset binary processing rather than 2's complement processing, since there is no concept of a positive/negative sign bit in an offset binary signal and no control measures are needed for bit overflow in the addition result. We propose the following bit manipulation for a transformation from an (*n*+1)-bit 2's complement  $r_i$  to an offset binary  $d_i$ , which also includes an operation for chip-correlation of  $r_i$  with  $C_i$ :

$$d_{i}[n] = \begin{cases} r_{i}[n] & (C_{i} = 1) \\ \sim r_{i}[n] & (C_{i} = 0) \end{cases} \tag{4}$$

$$d_{i}[n-1:1] = \begin{cases} \sim r_{i}[n-1:1] & (C_{i} = 1) \\ r_{i}[n-1:1] & (C_{i} = 0) \end{cases} \tag{5}$$

$$d_{i}[0] = r_{i}[0] \tag{6}$$

where the symbols '~' and 's[n]' denote a bit inversion and the (n+1)-th bit of the signal bus s, respectively. This transformation loses equivalence for even $r_i$, for which $d_i$ is smaller by 2 than the conventional sign inversion, $-r_i$. If $r_i$ is forcibly assumed to be odd, the error in $d_i$ for even $r_i$ is balanced over $C_i$. Then the total correlation value $X_{n+1}$ is given by

$$X_{n+1} = X'_{n+1} + \sum_{i=1}^{L} (\overline{r_i[0]} \cdot C_i) \tag{7}$$

where $X'_{n+1}$ is the accurate correlation value between the (n+1)-bit $r_i$ and $C_i$. The second term represents the total error. Note that the LSB of $r_i$ is 1, and the bit width of the reception register is n in order to store $r_i[n:1]$.

Here the accurate correlation value  $X'_{n+1}$  is

$$
\begin{aligned}
X'_{n+1} &= \sum_{i=1}^{L} (r_i \cdot C_i) = \sum_{i=1}^{L} \{ (2 \cdot r_i[n:1] + r_i[0]) \cdot C_i \} \\
&= 2 \cdot \sum_{i=1}^{L} (r_i[n:1] \cdot C_i) + \sum_{i=1}^{L} (r_i[0] \cdot C_i) = 2 \cdot X_n + \sum_{i=1}^{L} (r_i[0] \cdot C_i)
\end{aligned} \tag{8}
$$

The first term of equation (8) expresses the doubling of  $X_n$ , which is the correlation value between the *n*-bit signal  $r_i[n:1]$  and  $C_i$ . Therefore, the correlation value  $X_{n+1}$  can be rewritten according to (7) and (8) as

$$X_{n+1} = 2 \cdot X_n + \sum_{i=1}^{L} (r_i[0] \cdot C_i) + \sum_{i=1}^{L} (\overline{r_i[0]} \cdot C_i) = 2 \cdot X_n + \sum_{i=1}^{L} C_i \tag{9}$$

In DS-SS systems, the second term of equation (9) is approximately 0 or a very small constant. Therefore the accuracy of $X_{n+1}$, calculated by the proposed correlating method, is comparable to that of $X_n$.

As a result, a processing series, which includes transforming the 2's complement to the offset binary, correlating with the reference code sequence and the tree additions, can be simplified in the circuitry. In the technology library used in this work, the power reduction and the gate count reduction for correlation operation per tap are 63% and 49%, respectively.
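The bit manipulation of equations (4)-(6) can be tried out with the following Python sketch, which applies the equations literally to an (n+1)-bit 2's complement integer. The function name and the integer-based bit handling are assumptions made for illustration. Read this way, the result for $C_i = 1$ is the offset binary code of $-r_i$ when $r_i$ is odd, and the result for $C_i = 0$ is the offset binary code of $r_i$.

```python
def chip_correlate_offset_binary(r, c, n):
    """Apply equations (4)-(6) to an (n+1)-bit two's-complement sample.

    r: signed integer in the range -2**n .. 2**n - 1
    c: code bit C_i, either 1 or 0
    n: index of the MSB, so the sample is n + 1 bits wide
    Returns the (n+1)-bit offset binary value d_i as a non-negative integer.
    """
    mid_mask = (1 << (n - 1)) - 1
    bits = r & ((1 << (n + 1)) - 1)   # two's-complement bit pattern of r
    msb = (bits >> n) & 1             # r[n]
    mid = (bits >> 1) & mid_mask      # r[n-1:1]
    lsb = bits & 1                    # r[0]
    if c == 1:
        d_msb = msb                   # equation (4), C_i = 1
        d_mid = ~mid & mid_mask       # equation (5), C_i = 1
    else:
        d_msb = msb ^ 1               # equation (4), C_i = 0
        d_mid = mid                   # equation (5), C_i = 0
    return (d_msb << n) | (d_mid << 1) | lsb   # equation (6) keeps the LSB
```

Subtracting the offset $2^n$ from the returned value gives the signed result, so the function can be compared directly with an ordinary sign inversion; for even $r_i$ and $C_i = 1$ the result is smaller by 2, in line with the error term discussed above.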

# 4. Evaluation

Table 1 describes the specifications of the evaluated DMF. A DMF was designed incorporating the proposed technique. The reference code consists of 32 chips with a rate of 3.9 Mcps. The received signal is assumed to be a 2's complement signal of 6-bit resolution arriving with a frequency of 15.6 MHz. The oversampling number *N* is 4, and 32 samples out of 128-step reception registers are tapped. The threshold value $\alpha$ is chosen to be 8 dB below the average value for the main peak, based on the results of preliminary simulations.

Table 1. Specifications of the DMF

| Parameter      | Value          |
|----------------|----------------|
| Code length    | 32 chips       |
| Chip rate      | 3.9 Mcps       |
| Bit width      | 6 bits         |
| Over-sampling  | 4 samples/chip |
| Number of taps | 32 taps        |
| Filter steps   | 128 steps      |
| System clock   | 15.6 MHz       |

The dynamic power consumption of a CMOS circuit is estimated by the following well-known equation:

$$P = p_s \cdot f_c \cdot C_l \cdot V_{dd}^{2} \tag{10}$$

where $p_s$, $f_c$, $C_l$ and $V_{dd}$ represent the switching probability, the clock frequency, the load capacitance, and the supply voltage, respectively. In this paper, P at the gate level is broken down into the switching power and the internal power, and calculated by the following equation based on (10),

$$P = \frac{V_{dd}^{2}}{2} \sum_{\forall net(i)} \left( C_{l_{i}} \cdot TR_{i} \right) + \sum_{\forall cell(j)} \left( E_{j} \cdot TR_{j} \right) \quad (11)$$

where $C_{l_i}$ and $TR_i$ are the total load capacitance and the switching probability for node *i*, and $E_j$ and $TR_j$ are the internal energy and the switching probability of the output node for logic cell *j*, respectively. The first term of (11) denotes the switching power and the second denotes the internal power. $C_{l_i}$ and $E_j$ are characterized in the technology library. $TR_i$ and $TR_j$ are obtained from the gate-level logic simulation.
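As a sketch of how equation (11) might be evaluated from simulation data, the Python function below sums the two terms over lists of nets and cells. The function name, the data layout and the interpretation of $TR$ as a toggle rate in transitions per second are assumptions made for this example; any numbers passed to it are placeholders, not values from the technology library.

```python
def gate_level_power(vdd, nets, cells):
    """Evaluate the two terms of equation (11).

    vdd:   supply voltage in volts
    nets:  iterable of (load_capacitance_F, toggle_rate_per_s), one per net
    cells: iterable of (internal_energy_J, toggle_rate_per_s), one per cell
    Returns (switching_power_W, internal_power_W).
    """
    # First term: (Vdd^2 / 2) * sum over nets of C_l * TR.
    switching = 0.5 * vdd ** 2 * sum(cap * rate for cap, rate in nets)
    # Second term: sum over cells of E * TR.
    internal = sum(energy * rate for energy, rate in cells)
    return switching, internal


if __name__ == "__main__":
    # Hypothetical single net and single cell, for illustration only.
    sw, internal = gate_level_power(1.6, [(10e-15, 1e6)], [(2e-15, 1e6)])
    print(sw + internal)
```

Keeping the two terms separate makes it easier to see whether a given measure reduces the switched capacitance, the toggle rates, or the internal energy of the cells.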

Figure 8 shows the analytical results for the power consumption (A) and the gate count (B) of the proposed DMF. In order to make the comparison clear, these are shown broken down by low-power measure (DMF (a)-(f)). A 0.18-$\mu$m standard CMOS cell array technology library with a supply voltage of 1.6 V is used in the simulation. The definitions of DMF (a)-(f) are as follows:

- (a) a received sample-shifting DMF (shown in Figure 3 (a))
- (b) a reference code-shifting DMF (shown in Figure 3 (b)) using the clock-gating approach
- (c) a reference code-shifting DMF with the proposed latch-clock generator (shown in Figure 4)
- (d) a reference code-shifting DMF with the proposed data-masking approach, in addition to (c)
- (e) a reference code-shifting DMF with parallel CCUs, in addition to (d)
- (f) a reference code-shifting DMF with a threshold judgment and a simplified correlation operation in the CCU, in addition to (e)

The power consumption in the reception registers is reduced by the clock-gating approach (b), which includes the clock-gating circuits (latches and AND gates). In the case of (c), there is a bigger impact on the power efficiency, and the area cost is also reduced. The data-masking approach (d) is useful for power reduction in the reception registers themselves, but only a small improvement is achieved in the total power consumption, at the expense of a little circuit area. The trade-offs are the latch-clock buffers for the masking-signal generation in terms of power consumption and the masking signal generator in terms of circuit area. At this point the dominant power is consumed in the CCU. Parallelism of the CCUs results in MUX removal, and further power reduction can be achieved in the case of (e). However, since no management of the supply voltage is considered, the power consumption in the correlation-calculating block is almost the same; that is, the increase in total capacitance negates the frequency reduction in each CCU. Lastly, the implementation of the architecture for simplifying the correlation operation in the CCU improves the power efficiency even further, as seen in (f).

The total power consumption for the proposed DMF (f) is estimated to be 0.9 mW, which is less than 25% of the value of a conventional received sample-shifting DMF (a), and is also less than 30% of the reference code-shifting DMF with the conventional clock-gating technique. On the other hand, the gate count is nearly double the value of the DMF (a), mainly due to parallelism of the CCUs.

Table 2 shows several previously reported DMF properties for comparison with the proposed DMF. The DMFs in the table have the same functionality, but the parameters are different, such as the CMOS technologies, the number of filter taps, the operating frequency and so on. The power consumption and the gate count of the DMF are functions of the operating frequency, input quantization levels, number of taps (filter steps) and the technology scaling. Therefore, the estimated values for the proposed DMF cannot easily be compared with other reports due to the lack of uniformity of the specifications. Judging from the well-known equation (10), the estimated power consumption in the proposed DMF is thought to be advantageous. Furthermore, there is room left for improvement by means of circuit optimization for low power in the physical design stage.

## 5. Conclusion

A low-power architectural approach to a DMF has been proposed by focusing on the reception registers and the correlation-calculating unit. The key measures are 1) asynchronous latch clock generation for the reception registers, 2) halting half of the correlation summation adders by a threshold judgment and 3) simplifying the sign inversion operation for chip correlating. The power consumption of the proposed DMF is estimated by computer simulations to be as low as 0.9 mW. This is less than 25% of the value estimated for a conventional received sample-shifting DMF, when using a 0.18-µm standard CMOS cell array technology with a supply voltage of 1.6 V and a system clock frequency of 15.6 MHz. The total number of gates is increased due to the deployment of multiple correlation-calculating units. Further power reduction is expected by optimizing the physical design, such as placement and routing, clock-network optimization, transistor sizing, and so on. The application of a DMF to correlation operations, such as neighboring cell search, acquisition tracking as well as initial code acquisition is also capable of enhancing the impact of the power efficiency of the proposed DMF in the overall low-power design for mobile terminals.

#### 6. Acknowledgments

The authors would like to thank T. Tanaka and K. Shukuguchi for their cooperation and support.

# REFERENCES

- [1] Y. Takeuchi, H. Taguma, M. Nara and A. Tago, "SS demodulator using SAW device for wireless LAN applications," IEEE Technical Report, vol. CS94-50, pp. 7-12, 1994.
- [2] D. Garrett and M. Stan, "Power reduction techniques for a spread spectrum-based correlator," in Proc. ISLPED'97, pp. 85-88.

- [3] She-Hwa Yen and Chorng-Kuang Wang, "A 2V CMOS programmable pipelined digital differential matched filter for DS-CDMA system," in Proc. the 1st IEEE Asia-Pacific Conf. on ASIC, August 1999.
- [4] K. Kitamura, K. Taki, T. Ogata and Y. Murata, "Low power consumption CMOS digital matched filter," IPSJ J., vol. 42, No. 4, pp. 1016-1022, Apr. 2001.
- [5] M. L. Liou and T. D. Chiueh, "A low-power digital matched filter for direct-sequence spread-spectrum signal acquisition," IEEE J. Solid-State Circuits, vol.36, No. 6, pp. 933-943, June 2001.
- [6] E. Nishimori, C. Kimura, A. Nakagawa, and K. Tsubouchi, "CCD Matched Filter in Spread Spectrum Communication," in the 9th IEEE International Symp. on Personal, Indoor and Mobile Radio Communications, Sep. 1998.
- [7] T. Shibano, K. Iizuka, M. Miyamoto, M. Osaka, R. Miyama and A. Kito, "Matched filter for DS-CDMA of up to 50Mchip/s based on sampled analog signal processing," in ISSCC Dig. Tech. Papers, pp. 100-101, Feb. 1997.
- [8] M. D. Hahm, E. G. Friedman, and E. L. Titlebaum, "A comparison of analog and digital circuit implementations of low-power matched filters for use in portable wireless communication terminals," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. CAS II-44, No. 6, pp. 498-506, June 1997.



Figure 8. Comparison of power consumption (A) and gate count (B)

|                            | technology | supply voltage (V) | code length (filter steps) | frequency (MHz) | input (bit) | power consumption | gate count (kgates)       | N.B.                           |
|----------------------------|------------|--------------------|----------------------------|-----------------|-------------|-------------------|---------------------------|--------------------------------|
| Proposed DMF               | 0.18 µm    | 1.6                | 32 (128)                   | 15.6            | 6           | 0.9 mW            | 12.6                      |                                |
| National Taiwan Univ. [5]  | 0.80 µm    | 5.0                | 32 (128)                   | 50              | 4           | 184 mW            | NA (9.38 mm<sup>2</sup>)  | I/Q dual ch.                   |
| Kobe University [4]        | 0.18 µm    | 1.8                | 128 (256)                  | 40              | 8           | 15.26 mW          | 54.8                      | carried out clock optimization |
| National Taiwan Univ. [3]  | 0.60 µm    | 2.0                | 16 (32)                    | 2.5             | 4           | 1.6 mW            | NA (2.25 mm<sup>2</sup>)  |                                |
| University of Virginia [2] | 2.00 µm    | 5.0                | 256 (256)                  | 25              | 8           | 753 mW            | NA                        |                                |

Table 2. Comparison of DMF properties