# A Low-Power, Multichannel Gated Oscillator-Based CDR for Short-Haul Applications

Armin Tajalli<sup>1</sup>, Paul Muller<sup>2</sup>, Mojtaba Atarodi<sup>1</sup>, and Yusuf Leblebici<sup>2</sup>

<sup>1</sup>Sharif University of Technology, SUT Sharif Integrated Circuit and Systems Group (SICAS) Tehran, Iran

#### Abstract

A gated current-controlled oscillator (GCCO) based topology is used to implement a low-power multi-channel clock and data recovery (CDR) system in a 0.18um digital CMOS technology. A systematic approach is presented to design a reliable, low-power system based on the required specifications. Behavioral simulations are also used to estimate the achievable bit error rate (BER), jitter tolerance (JTOL), and frequency offset tolerance (FTOL) of the proposed CDR. Using a single 1.8V supply voltage, the proposed 20Gbps 8-channel CDR consumes only 70.2mW, or 3.51mW/Channel/Gbps, while occupying 0.045mm<sup>2</sup> of silicon area.

## **1. INTRODUCTION**

Increasing demand for higher bandwidth in serial link applications, especially for chip-to-chip interconnections and optical transceivers, makes the design of multi-channel serial data transceivers in a low-cost CMOS technology very desirable. The clock and data recovery (CDR) circuit, as a main building block in each serial data receiver, plays an important role in the feasibility of implementing a low-cost, low-power, multi-channel transceiver.

While phase-locked loop (PLL) based CDRs can operate at very high speeds and can lock exactly onto the frequency of the received data, their high power consumption, tight jitter performance, and the large silicon area consumed by the loop filter make them less attractive for multi-channel applications [1]-[3]. On the other hand, dual-loop delay-locked loop (DLL) based or phase-interpolation (PI) based CDRs are widely used in multi-channel receivers [4]. They show good jitter performance and medium power consumption. A third approach is the gated oscillator based CDR, which is very attractive mainly because its fast phase alignment and simple topology lead to low power consumption and small silicon area [5][6]. In particular, its high jitter tolerance makes it very suitable for multi-channel short-haul applications.


ISLPED'05, August 8-10, 2005, San Diego, California, USA.

Copyright 2005 ACM 1-59593-137-6/05/0008...\$5.00.

<sup>2</sup>Ecole Polytechnique Fédérale de Lausanne, EPFL Microelectronic Systems Laboratory (LSM) CH-1015 Lausanne, Switzerland



Figure 1. Multi-channel GCCO-based CDR structure using shared-PLL

Here, a systematic, structural approach is presented to demonstrate the capabilities of gated oscillator based CDRs for multi-channel, short-distance applications. It is shown that the proposed gated CCO approach offers significant advantages in terms of power dissipation and jitter tolerance. The main goal of this work is to design a high-performance multi-channel 2.5Gbps/Ch system with a power consumption below 5mW/Channel/Gbps, which is essential for implementing high-density serial link applications where hundreds of links are placed on the same chip. Behavioral simulations and analyses have been performed to calculate the jitter and frequency tolerance of a GCCO (gated current-controlled oscillator) based CDR. Meanwhile, the minimum achievable power dissipation has been determined from the required jitter specifications of the ring oscillator.

## **2. SYSTEM LEVEL DESIGN**

#### **2.1. Multi-Channel GCCO Topology**

As shown in Figure 1, in a multi-channel GCCO, a shared-PLL generates a local high frequency clock ( $f_{out}$ ) from a low-frequency crystal oscillator clock ( $f_{in}$ ) where  $f_{out}$  is exactly equal to the baud rate of the received data.



Figure 2. Simplified topology of a gated oscillator based CDR

To achieve better matching between the channels and the PLL, current controlled oscillators (CCOs) are used instead of voltage controlled oscillators (VCOs) in each channel. A copy of the control current ( $I_C$ ) produced by the PLL is delivered to the matched oscillator in each channel ( $I_{CTL}$ [1:N]). Provided the CCOs are well matched, the clock frequencies of all channels ( $Ck_{out}$ [1:N]) are identical to  $f_{out}$ .

At each data edge, an edge detector circuit, based on a delay line and an XOR gate, generates a synchronization signal (EDET) for the GCCO [Figure 2]. At an incoming data edge  $(D_{in})$ , EDET goes low for a duration defined by the delay line and freezes the output clock (Clkout) high via the first stage of the oscillator. At the rising edge of EDET, the oscillator is released and returns to free oscillation at the frequency determined by its control current, in phase with the received data. When the delayed data  $(DD_{in})$  is used for sampling instead of  $D_{in}$ , the delay introduced by the delay line does not influence the precision of the sampling. Meanwhile, parasitic delays from the XOR gate, or from the delay mismatch between the two inputs of the NAND gate in the oscillator, are compensated by proper dummy gates (not shown in the figure). Also, to ensure correct phase alignment, the total signal delay in the delay line path is set somewhat larger than  $T_0/2$ . This ensures that the edge detection signal (EDET) propagates through the CCO and affects the timing of the ring oscillator.
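The edge-detection step described above can be illustrated with a small sampled-waveform model. This is only a behavioral sketch, not the circuit from the paper: the function names `edge_detect` and `low_pulses`, the 8-samples-per-bit resolution, and the active-low polarity obtained by inverting the XOR output are all assumptions made for illustration.

```python
def edge_detect(d_in, delay):
    """Model the delay-line + XOR edge detector on a sampled binary waveform.

    Returns (dd_in, edet): dd_in is d_in delayed by `delay` samples, and edet
    is low (0) for `delay` samples after each data edge and high (1) otherwise.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    d_in = list(d_in)
    if not d_in:
        return [], []
    # Assumption: before t = 0 the delay line holds the first sample value.
    dd_in = ([d_in[0]] * delay + d_in)[:len(d_in)]
    # Inverted XOR: data and delayed data differ only right after an edge.
    edet = [1 - (a ^ b) for a, b in zip(d_in, dd_in)]
    return dd_in, edet


def low_pulses(edet):
    """Return a (start_index, length) pair for every run of zeros in edet."""
    pulses, start = [], None
    for i, value in enumerate(edet):
        if value == 0 and start is None:
            start = i
        elif value == 1 and start is not None:
            pulses.append((start, i - start))
            start = None
    if start is not None:
        pulses.append((start, len(edet) - start))
    return pulses


SAMPLES_PER_BIT = 8   # hypothetical time resolution: T0 = 8 samples
DELAY = 5             # somewhat larger than T0/2 = 4 samples
bits = [0, 1, 1, 0, 1]
d_in = [b for b in bits for _ in range(SAMPLES_PER_BIT)]
dd_in, edet = edge_detect(d_in, DELAY)
print(low_pulses(edet))  # one (start, length) pair per data edge
```

Each low pulse starts at a data transition and lasts exactly as long as the delay line, which is the window during which the oscillator would be held before being released in phase with the data.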

#### 2.2. Jitter Tolerance

Jitter tolerance (JTOL) measures the ability of a CDR to tolerate input jitter. JTOL is usually tested by adding sinusoidal jitter, swept over a given frequency range, to a data stream that already includes channel jitter [7]. The maximum jitter amplitude at which the CDR still operates at a given BER, which is a function of jitter frequency, is called the jitter tolerance. In this situation the input frequency is

$$\omega(t) = \omega_0 + \Delta \omega \cdot \cos \omega_j t \tag{1}$$

in which  $\omega$  indicates the instantaneous frequency,  $\omega_j$  is the sinusoidal jitter frequency, and





$$\Delta \omega = \pi \cdot UI_{pp} \cdot \omega_j \tag{2}$$

Here,  $UI_{pp}$  is the peak-to-peak jitter amplitude [8]. As a rough estimate, the JTOL can be calculated as

$$JTOL(s) = 0.5/[1 - JTRAN(s)] \tag{3}$$

in which JTRAN is the jitter transfer function of the CDR [9]. In a GCCO, the JTRAN could be approximated by

$$JTRAN(s) \approx e^{-T_0 s/2} \tag{4}$$

where  $T_0 = 2\pi/\omega_0$  is the nominal data period. Figure 3(a) shows the JTOL calculated from (3) and (4) compared with behavioral simulation results in the presence of channel random jitter (RJ) and deterministic jitter (DJ). As can be seen, the high bandwidth of a GCCO CDR gives it very good JTOL performance. The JTOL can also be calculated from the variation in data period when sinusoidal jitter is applied to the input:

$$T = 2\pi/(\omega_0 \pm \Delta\omega) \tag{5}$$

Therefore, for correct sampling (ignoring other types of jitter), each data edge must fall within the time interval  $T_0/2 < T < 3T_0/2$ . From (5), this requires  $\Delta \omega / \omega_0 < 1/3$ , or:

$$UI_{pp} \approx \omega_0 / (3\pi\omega_j) \tag{6}$$

which is again much higher than the minimum required JTOL specification [Figure 3(b)].
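The two JTOL estimates of this section can be evaluated numerically. The sketch below assumes the 2.5 Gbps per-channel rate quoted elsewhere in the paper; the function names `jtol_transfer` and `jtol_period` are hypothetical, and the code only evaluates the magnitude of (3) with (4) at $s = j\omega_j$ and the bound in (6), without modeling RJ or DJ.

```python
import cmath
import math

F0 = 2.5e9       # per-channel data rate in bit/s (value taken from the paper)
T0 = 1.0 / F0    # nominal data period in seconds


def jtol_transfer(f_jitter, t0=T0):
    """|JTOL| in UI from (3), with JTRAN from (4), at s = j*2*pi*f_jitter."""
    if f_jitter <= 0:
        raise ValueError("jitter frequency must be positive")
    s = 1j * 2 * math.pi * f_jitter
    jtran = cmath.exp(-t0 * s / 2)
    return abs(0.5 / (1 - jtran))


def jtol_period(f_jitter, f0=F0):
    """Peak-to-peak bound in UI from (6): omega_0 / (3 * pi * omega_j)."""
    if f_jitter <= 0:
        raise ValueError("jitter frequency must be positive")
    return f0 / (3 * math.pi * f_jitter)


# Sweep a few jitter frequencies and print both estimates side by side.
for f_j in (1e5, 1e6, 1e7, 1e8, 1e9):
    print(f"{f_j:8.0e} Hz  (3)-(4): {jtol_transfer(f_j):10.3f} UI"
          f"  (6): {jtol_period(f_j):10.3f} UI")
```

Both estimates fall off roughly as $1/\omega_j$ at low jitter frequencies, which is the behavior expected from a wide-bandwidth CDR that realigns its phase at every data edge. The model in (4) is singular where $\omega_j T_0/2$ is a multiple of $2\pi$, so the sweep deliberately stays well below that point.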

#### **2.3. Frequency Tolerance**

Unlike in conventional PLL-based CDRs, a frequency difference can exist between the gated oscillator in a channel's receiver and the incoming data stream. In practical applications, the data rate is specified to within  $\pm$ 100ppm. The frequency tolerance (FTOL) is defined as the maximum frequency difference at which the BER remains below a specified value (usually 10<sup>-12</sup>). Ideally, when there is no jitter on the data or clock, the frequency error must be smaller than

$$\left| f_{ck} - f_0 \right| < f_0 / (2n) \tag{7}$$

where  $f_0$  is data frequency,  $f_{ck}$  is oscillator frequency, and n indicates the number of consecutive identical digits (CID). Obviously, any jitter on the input data or recovered clock will degrade the FTOL. Figure 4 shows the achievable FTOL in presence of random jitter on received data and recovered clock. As can be seen, an increase in clock or data jitter will lead to a degradation of FTOL.
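As a quick check of the jitter-free bound in (7), the relative frequency tolerance can be tabulated against the CID length. The helper name `ideal_ftol` is an assumption; the function simply evaluates $1/(2n)$ and ignores the jitter-induced degradation shown in Figure 4.

```python
def ideal_ftol(n_cid):
    """Relative frequency tolerance |f_ck - f0| / f0 from (7), without jitter."""
    if n_cid < 1:
        raise ValueError("CID length must be at least 1")
    return 1.0 / (2 * n_cid)


SPEC_PPM = 100e-6  # the +/-100 ppm data-rate accuracy mentioned in the text

for n in (1, 2, 5, 10, 72):
    tol = ideal_ftol(n)
    print(f"CID = {n:3d}: FTOL < {tol * 100:6.2f} %  "
          f"({tol / SPEC_PPM:8.1f} x the 100 ppm spec)")
```

For the 5-digit CID limit of 8B10B coding this gives an ideal tolerance of 10%, leaving a large margin over a 100 ppm data-rate specification before jitter is taken into account.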

The main source of jitter on the recovered clock is the jitter accumulated while the gated oscillator runs freely, which increases with the free-running time interval as [10]

$$\sigma_{ck} = \kappa \sqrt{\Delta T} \tag{8}$$

in which  $\sigma_{ck}$  indicates the *rms* (root mean square) clock jitter accumulated over the time interval  $\Delta T$ , and  $\kappa$  is a proportionality factor that depends on the topology and power consumption of the delay stages as well as on the process. Here,  $\Delta T$  depends on the number of CIDs. The 8B10B encoding scheme used in short-distance communications limits the CID to no more than 5 digits. Therefore, based on Figure 4 and using (8), achieving an FTOL of about 9% while tolerating 5 identical bits requires  $\kappa \leq 9.4 \times 10^{-8}$ . This criterion can be used to determine the bias condition, and hence the transistor sizing, in each delay cell.
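To give a feel for the numbers in (8), the sketch below evaluates the accumulated clock jitter at the quoted bound on $\kappa$. It assumes that $\Delta T$ equals five bit periods at 2.5 Gbps, which is one reading of "depends on the number of CIDs"; the constant and function names are hypothetical.

```python
import math

KAPPA_MAX = 9.4e-8   # bound quoted in the text; units of sqrt(seconds)
BIT_RATE = 2.5e9     # per-channel data rate in bit/s
N_CID = 5            # longest run of identical digits with 8B10B coding


def accumulated_jitter(kappa, delta_t):
    """rms clock jitter in seconds after free-running for delta_t, from (8)."""
    if kappa < 0 or delta_t < 0:
        raise ValueError("kappa and delta_t must be non-negative")
    return kappa * math.sqrt(delta_t)


# Assumption: the oscillator free-runs for N_CID bit periods between edges.
delta_t = N_CID / BIT_RATE
sigma = accumulated_jitter(KAPPA_MAX, delta_t)
sigma_ui = sigma * BIT_RATE

print(f"free-running interval: {delta_t * 1e9:.2f} ns")
print(f"rms clock jitter:      {sigma * 1e12:.2f} ps ({sigma_ui:.4f} UI)")
```

Under these assumptions the clock accumulates roughly 4.2 ps rms, or about 0.01 UI, over the longest run without a data edge.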

## **3. DESIGN OF GATED-OSCILLATOR CDR**

### **3.1. Phase Noise Requirement**

Figure 4. Frequency tolerance of a GCCO CDR based on *rms* value of random jitter on clock and input data

Figure 5. Trade-off between phase-noise and power consumption in a ring oscillator [10][11]

Frequency stability and timing jitter are the two most important specifications of the oscillator in a GCCO topology. The timing jitter of ring oscillators, or its frequency-domain counterpart, phase noise, has been extensively studied in [10][11]. Equation (9) gives a good estimate of the trade-off between jitter and power consumption in a gated ring oscillator, where the minimum achievable  $\kappa$  is [11]:

$$\kappa_{\min} = \sqrt{\frac{8}{3\eta}} \cdot \sqrt{N \cdot \frac{kT}{P} \cdot \left(\frac{V_{dd}}{V_{char}} + \frac{V_{dd}}{R_L I_{SS}}\right)} \tag{9}$$

in which  $\eta$  indicates the relation between rise time and delay in each delay cell, *P* is the oscillator power dissipation, *N* is the number of delay stages in the ring oscillator,  $R_L$  is the load resistance,  $I_{SS}$  is the tail current of the delay cell,  $V_{dd}$  is the supply voltage, and  $V_{char} = V_{dsat}$  (drain-source overdrive voltage) for long-channel devices and  $V_{char} = E_C L/\gamma$  for short-channel devices [11]. As shown in Figure 5, this equation helps to determine the minimum achievable power dissipation given (8) and the required FTOL value. In this design, the bias current of the transistors, and hence the device sizing, has been chosen based on this graph. The figure also compares the estimated  $\kappa$  values derived in [10] and [11].
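Equation (9) can be inverted to estimate the smallest oscillator power that still meets a target $\kappa$. The sketch below does this under loudly stated assumptions: the cell parameters in `CELL` (four stages, $\eta = 1$, $V_{char} = 0.3$ V, a swing $R_L I_{SS} = 0.4$ V) are hypothetical illustration values and are not taken from the paper, and the temperature is assumed to be 300 K. Only the 1.8 V supply and the $\kappa$ target come from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def _supply_term(vdd, v_char, swing):
    """The bracketed factor of (9): Vdd/Vchar + Vdd/(R_L * I_SS)."""
    return vdd / v_char + vdd / swing


def kappa_min(power, n_stages, eta, vdd, v_char, swing, temp=300.0):
    """Minimum achievable kappa from (9) for a given oscillator power in W."""
    if min(power, n_stages, eta, vdd, v_char, swing, temp) <= 0:
        raise ValueError("all parameters must be positive")
    kt = K_BOLTZMANN * temp
    return math.sqrt(8.0 / (3.0 * eta)) * math.sqrt(
        n_stages * kt / power * _supply_term(vdd, v_char, swing))


def min_power(kappa_target, n_stages, eta, vdd, v_char, swing, temp=300.0):
    """Invert (9): smallest power in W for which kappa_min <= kappa_target."""
    if min(kappa_target, n_stages, eta, vdd, v_char, swing, temp) <= 0:
        raise ValueError("all parameters must be positive")
    kt = K_BOLTZMANN * temp
    return (8.0 / (3.0 * eta)) * n_stages * kt * _supply_term(
        vdd, v_char, swing) / kappa_target ** 2


# Hypothetical delay-cell parameters, for illustration only.
CELL = dict(n_stages=4, eta=1.0, vdd=1.8, v_char=0.3, swing=0.4)

p_min = min_power(9.4e-8, **CELL)
print(f"power floor for kappa <= 9.4e-8: {p_min * 1e6:.1f} uW")
print(f"kappa at twice that power:       {kappa_min(2 * p_min, **CELL):.3e}")
```

Because $\kappa_{\min}$ scales as $1/\sqrt{P}$, halving the jitter factor costs four times the power. The result is a theoretical floor for the assumed parameters, and a practical design would be expected to sit above it.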


### **3.2. Delay Stages**

To achieve good matching, all delay cells in the delay line and the ring oscillator are built from identical current-mode logic (CML) two-input multiplexer (MUX) gates optimized for this application. The minimum acceptable bias current has been chosen based on (4) and Figure 5. As shown in Figure 6, a replica bias circuit is used to control the voltage swing over process and supply voltage variations [9].

### **3.3. GCCO-Based CDR**

Based on the topology shown in Figure 1, an 8-channel CDR has been implemented in a 0.18um digital CMOS technology. The proposed shared-PLL uses exactly the same oscillator as each channel. In addition, a high-order loop filter is used to suppress the ripple on the control signal and thus keep jitter generation very low. Figure 7 shows the transistor-level simulation result when each channel is driven with a random 2.5Gbps input data stream. Table 1 summarizes the specifications of the proposed 20Gbps 8-channel CDR. Occupying 0.045mm<sup>2</sup> of silicon area, the CDR has a total power consumption of 70.2mW, or 3.51mW/Channel/Gbps, which is well suited to modern multi-channel serial link applications.







Figure 7. Eye diagram of the recovered data and clock from a 2<sup>8</sup>-1 PRBS 2.5Gbps data stream after PLL has settled to its steady-state condition

Table 1. Specifications of the Proposed 8-Channel CDR

| Parameter                            | Value                       |
|--------------------------------------|-----------------------------|
| Technology                           | 0.18um Digital CMOS         |
| Supply (V)                           | 1.6 - 2.0                   |
| Per channel bit rate (Gbps/Ch)       | 2.5                         |
| Total bit rate (Gbps)                | 20                          |
| Power consumption, PLL (mW @ 1.8V)   | 10.8                        |
| Power consumption, GCCO (mW @ 1.8V)  | 7.2                         |
| Power consumption, total (mW @ 1.8V) | 70.2                        |
| PLL settling time (µs)               | 1.3                         |
| Area of GCCO (mm<sup>2</sup>)        | 0.045 (without I/O drivers) |
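The headline efficiency figure follows directly from the table, as the short calculation below shows. The breakdown of the total is an interpretation rather than something the paper states: it assumes that the GCCO entry of 7.2 mW is a per-channel figure, in which case the remainder would be attributable to the overhead circuitry mentioned in the conclusion. The function name `power_per_gbps` is hypothetical.

```python
import math

TOTAL_MW = 70.2          # total power from Table 1
PLL_MW = 10.8            # shared PLL power from Table 1
GCCO_MW = 7.2            # GCCO power from Table 1 (assumed to be per channel)
N_CHANNELS = 8
GBPS_PER_CHANNEL = 2.5


def power_per_gbps(total_mw, n_channels, gbps_per_channel):
    """Total power divided by aggregate throughput, in mW per Gbps."""
    aggregate = n_channels * gbps_per_channel
    if aggregate <= 0:
        raise ValueError("aggregate data rate must be positive")
    return total_mw / aggregate


efficiency = power_per_gbps(TOTAL_MW, N_CHANNELS, GBPS_PER_CHANNEL)

# Assumption: total = PLL + N_CHANNELS * GCCO + overhead.
overhead_mw = TOTAL_MW - (PLL_MW + N_CHANNELS * GCCO_MW)

print(f"efficiency: {efficiency:.2f} mW per Gbps of aggregate throughput")
print(f"unallocated power under the per-channel assumption: {overhead_mw:.1f} mW")
```

The 3.51 figure is therefore the total power normalized to the 20 Gbps aggregate rate, which is below the 5 mW target stated in the introduction.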

## **4. CONCLUSION**

The design of a low-power gated oscillator based CDR for multi-channel applications was presented. A structural, systematic approach was introduced to evaluate the capabilities of a GCCO CDR for short-haul applications, especially its jitter and frequency tolerance. Meanwhile, the minimum achievable power consumption of the GCCO was determined from the required jitter performance of the CDR. A 0.18um standard digital CMOS technology was used to design an 8-channel CDR with a total power consumption (including overhead circuitry) of only 70.2mW (3.51mW/Channel/Gbps) while achieving a total data rate of 20Gbps. It has been shown that the proposed circuit is suitable for modern multi-channel short-haul applications, offering small silicon area, very low power dissipation, and high jitter tolerance.

## **5. REFERENCES**

- [1] H. Takauchi et al., "A CMOS multichannel 10-Gb/s transceiver," *IEEE J. of Solid-State Circuits*, vol. 38, pp. 2094-2100, December 2003.
- [2] Z. O. Gursoy and Y. Leblebici, "Design and realization of a 2.4 Gbps - 3.2 Gbps clock and data recovery circuit using deep-submicron digital CMOS technology," *IEEE Int. SOC Conf.*, pp. 99-102, September 2003.
- [3] H. Partovi et al., "A 62.5Gb/s multi-standard SerDes IC," *IEEE Custom Integrated Circuits Conf. (CICC)*, pp. 585-588, 2003.
- [4] K. Tanaka, M. Fukaishi, et al., "A 100Gb/s transceiver with GND-VDD common-mode receiver and flexible multi-channel aligner," *IEEE Int. Solid-State Circuits Conf.*, San Francisco, February 2002.
- [5] M. Nakamura, N. Ishihara, and Y. Akazawa, "A 156 Mbps CMOS clock recovery circuit for burst-mode transmission," *IEEE Symp. on VLSI Circuits Digest of Technical Papers*, pp. 122-123, 1996.
- [6] S. Kaeriyama and M. Mizuno, "A 10Gb/s/ch 50mW 120×120 μm<sup>2</sup> clock and data recovery circuit," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, February 2003.
- [7] J. Kim and D.-K. Jeong, "Multi-gigabit-rate clock and data recovery based on blind oversampling," *IEEE Comm. Mag.*, pp. 68-74, December 2003.
- [8] L. M. De Vito, "A versatile clock recovery architecture and monolithic implementation," in *Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design*, B. Razavi, Ed. New York: IEEE Press, 1996.
- [9] B. Razavi, *Design of Integrated Circuits for Optical Communications*, McGraw-Hill, 2003.
- [10] J. A. McNeill, "Jitter in ring oscillators," *IEEE J. of Solid-State Circuits*, vol. 32, pp. 870-879, June 1997.
- [11] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Jitter and phase noise in ring oscillators," *IEEE J. of Solid-State Circuits*, vol. 34, pp. 790-804, June 1999.