# A 16-Bit, Low-Power Microsystem with Monolithic MEMS-LC Clocking

Robert M. Senger, Eric D. Marsman, Michael S. McCorquodale

Dept. of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI 48109 USA {rsenger, emarsman, mmccorq}@umich.edu

Abstract—Single-chip systems avoid the power that chip-to-chip communication would otherwise dissipate, resulting in compact, low-power solutions for battery-powered applications. This paper describes the design and measured performance of a fully functional digital core with a low-jitter, on-chip, MEMS-*LC* clock reference. The chip has been fabricated in TSMC's 0.18 $\mu$m MM/RF bulk CMOS process. Maximum power consumption of the complete microsystem is 48.78mW when operating at 90MHz from a 1.8V power supply.

## I. INTRODUCTION

To satisfy the broad range of workload requirements for microsystems and Systems-on-a-Chip (SoCs), an adaptable microcontroller unit (MCU) must be designed with a wide spectrum of communication capabilities and operating specifications. The size, processing, and power requirements for the embedded MCU in PDAs, cell phones, remote environmental sensors, bio-medical devices, etc. vary significantly with the application. By building an MCU that can satisfy these design requirements and by leveraging an intellectual property (IP) based design methodology [1], manpower and design time can be greatly reduced without sacrificing significant power or performance.

## II. MICROSYSTEM ARCHITECTURE

Fig. 1 shows the microsystem architecture, which consists of the digital core and the CMOS-MEMS LC tank oscillator used as an on-chip clock reference. The digital core includes a 3-stage pipeline, a 16-bit data path, a 24-bit unified instruction and data address space, 64KB of on-chip SRAM, and an external memory port supporting up to 64KB. The load-store instruction set architecture (ISA) was custom designed with 77 instructions supporting eight addressing modes and single- and multi-word arithmetic, shift, logical, and control-flow operations [2]. A 3-stage pipeline was chosen to provide adequate performance for remote sensing and bio-medical applications while remaining low-power with minimal pipeline hardware overhead. The pipeline uses sixteen 16-bit general-purpose registers and four 24-bit address registers, divided evenly over two windows. The windowing scheme reduces the size of the register encoding field to enable 16-bit instructions while providing additional registers for temporary storage. A detailed analysis in [3] shows that the compiler's efficient use of the register windows achieves up to a 19% reduction in power consumption and a 30% improvement in performance compared to a non-windowed architecture. Address registers can be manipulated through direct memory-mapped access or with address update instructions.
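The effect of windowing on the register encoding field can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a plain binary register field and uses hypothetical names (`reg_field_bits`, `GPRS`, and so on); the actual instruction format is not described in this paper.

```python
def reg_field_bits(total_regs: int, windows: int) -> int:
    """Bits needed to name one register within the currently active window.

    Assumes a plain binary encoding (an assumption, not taken from the ISA).
    """
    visible = total_regs // windows      # registers visible at one time
    return (visible - 1).bit_length()    # ceil(log2(visible)) for visible >= 1


# Register counts stated in the text.
GPRS, ADDR_REGS, WINDOWS = 16, 4, 2

gpr_bits_windowed = reg_field_bits(GPRS, WINDOWS)         # 8 visible GPRs
gpr_bits_flat = reg_field_bits(GPRS, 1)                   # all 16 visible
addr_bits_windowed = reg_field_bits(ADDR_REGS, WINDOWS)   # 2 visible address regs
addr_bits_flat = reg_field_bits(ADDR_REGS, 1)             # all 4 visible

print(f"GPR field:  {gpr_bits_flat} bits flat, {gpr_bits_windowed} bits windowed")
print(f"Addr field: {addr_bits_flat} bits flat, {addr_bits_windowed} bits windowed")
```

Under these assumptions, each register operand needs one bit fewer with two windows than with a flat register file, which is the kind of saving that helps fit an instruction into 16 bits.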

Richard B. Brown

Dept. of Electrical and Computer Engineering University of Utah Salt Lake City, UT 84412 USA brown@coe.utah.edu

Fig. 1. Microsystem architecture.

The memory architecture is a banked style with the 64KB of SRAM split into four single-ported 16KB banks. This allows for instruction and data accesses to occur simultaneously without stalling the machine pipeline as long as they address different banks. To save power, unused banks are deactivated on a cycle-by-cycle basis. Compared to a single 64KB bank, this configuration dissipates 69.2% less energy per access with only a 16.2% area penalty [4]. Additional power savings are enabled by a low-power, 512-byte loop cache. Unlike traditional hardware controlled caches, the loop cache is a tagless bank of low-power memory intelligently managed by the compiler. The cache is filled with commonly executed instructions or accessed data, typically found in program loops.
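The bank-conflict rule for the four-bank SRAM can be sketched as a small model. Two things here are assumptions rather than facts from the paper: that the on-chip SRAM occupies addresses 0 to 64KB, and that each bank covers a contiguous 16KB region (the text does not say whether banks are selected by high or low address bits). The function names are hypothetical.

```python
BANK_SIZE = 16 * 1024   # bytes per bank (from the text)
NUM_BANKS = 4           # four single-ported banks (from the text)


def bank_of(addr: int) -> int:
    """Map an on-chip address to a bank, assuming contiguous 16KB regions."""
    if not 0 <= addr < BANK_SIZE * NUM_BANKS:
        raise ValueError("address outside the assumed 64KB on-chip SRAM range")
    return addr // BANK_SIZE


def cycle(ifetch_addr: int, data_addr=None):
    """Return (banks that must be active this cycle, whether the pipeline stalls).

    A stall is needed only when the instruction fetch and the data access
    target the same single-ported bank. Banks not in the returned set can be
    deactivated for this cycle.
    """
    ibank = bank_of(ifetch_addr)
    if data_addr is None:
        return {ibank}, False
    dbank = bank_of(data_addr)
    return {ibank, dbank}, ibank == dbank


print(cycle(0x0100, 0x4200))   # code in bank 0, data in bank 1
print(cycle(0x0100, 0x0200))   # both in bank 0
```

In this model a compiler or linker that places code and data in different 16KB regions avoids stalls entirely, and at most two of the four banks are active in any cycle.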

To generate a stable on-chip clock reference, a complementary, cross-coupled, negative-transconductance MEMS-LC oscillator was implemented. A detailed description of this low-jitter, 1.1GHz CMOS-compatible reference oscillator is given in [5]. With the proposed LC oscillator, neither a PLL/DLL nor an off-chip crystal is required, thus reducing system size, cost, and power. Moreover, the clock is significantly more stable, including over temperature, than alternative on-chip clock generation techniques such as ring, relaxation, or phase-shift oscillators [5]. To improve the quality factor, the inductor uses thick top metal that is released from the surrounding oxide when the bond pad openings are etched. The capacitors are metal-insulator-metal. Compensation for frequency deviation due to process or temperature variation can be achieved by modulating the current in the LC tank [5]. A buffer amplifier is required to isolate the free-running oscillator from the chained flip-flop frequency divider.

## III. TEST RESULTS

Fig. 2 is a die micrograph of the microsystem fabricated in TSMC's 0.18µm MM/RF bulk CMOS process. The 128-pin die measures 3.54mm per side and contains 3.5 million transistors. The design methodology presented in [6], which merges the digital, analog, and MEMS domains into a top-down ASIC design flow, was employed to build this chip. The microsystem was verified on an HP82000 digital tester using test vectors generated from assembly programs that were used to check the original Verilog model. These programs consisted of hand-written focused test cases, randomly generated test cases, and compiled application code.

Fig. 2. Die micrograph of the complete microsystem.

The MCU is fully functional up to a maximum operating frequency of 92.5MHz at 1.8V, where it consumes a maximum of 33.9mW. At 10MHz and 1.15V, power consumption drops to 1.41mW. When put into a 2kHz low-power idle mode, the core consumes only 740 $\mu$W from a 1.15V supply [7]. Digital output pins are available to control an off-chip voltage regulator that can modulate the power supply voltage. Fig. 3 shows the measured MCU digital core power as a function of voltage for 10, 50, and 90MHz operation.
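The benefit of supply scaling is easier to see as energy per clock cycle. The sketch below uses only the two active operating points quoted above and the standard first-order dynamic power model $P \approx C_{eff} V_{dd}^2 f$; that model is a textbook approximation introduced here for illustration, not an analysis taken from the paper, and it ignores leakage and short-circuit current. All names are hypothetical.

```python
def energy_per_cycle(power_w: float, freq_hz: float) -> float:
    """Average energy drawn per clock cycle, in joules."""
    return power_w / freq_hz


def effective_capacitance(power_w: float, freq_hz: float, vdd: float) -> float:
    """C_eff implied by the first-order model P = C_eff * Vdd^2 * f."""
    return power_w / (freq_hz * vdd ** 2)


# Measured operating points quoted in the text.
HIGH = dict(power_w=33.9e-3, freq_hz=92.5e6, vdd=1.8)
LOW = dict(power_w=1.41e-3, freq_hz=10e6, vdd=1.15)

e_high = energy_per_cycle(HIGH["power_w"], HIGH["freq_hz"])
e_low = energy_per_cycle(LOW["power_w"], LOW["freq_hz"])
c_high = effective_capacitance(**HIGH)
c_low = effective_capacitance(**LOW)

energy_ratio = e_low / e_high                    # measured energy scaling
vdd_sq_ratio = (LOW["vdd"] / HIGH["vdd"]) ** 2   # scaling predicted by Vdd^2

print(f"energy/cycle: {e_high * 1e12:.1f} pJ at 1.8 V, {e_low * 1e12:.1f} pJ at 1.15 V")
print(f"measured ratio {energy_ratio:.3f} vs Vdd^2 ratio {vdd_sq_ratio:.3f}")
print(f"implied C_eff: {c_high * 1e12:.1f} pF and {c_low * 1e12:.1f} pF")
```

With the quoted figures, the energy per cycle falls from roughly 366 pJ to 141 pJ, a ratio of about 0.38, which is close to the 0.41 predicted by the square of the supply ratio alone.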

Fig. 3. Measured power as $V_{dd}$ is scaled across frequency ranges.

A single access to the loop cache consumes 45% of the energy that an access to the SRAM consumes [7]. The custom-built compiler implements a novel dynamic loop cache filling algorithm that improves power efficiency over more traditional static filling. The dynamic algorithm was simulated and compared against static filling for loop caches of different sizes and against traditional instruction caches. Across a subset of the embedded benchmarks from MiBench and MediaBench, the dynamic filling algorithm obtained an average energy savings of 43% and outperformed all other cache configurations [8].
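The 45% per-access figure bounds what any loop cache filling policy can save. The sketch below is a deliberately simple model, assuming every access costs either one loop cache access or one SRAM access and that filling the cache is free; neither assumption comes from the paper, and this is not how the simulated results in [8] were obtained. The names are hypothetical.

```python
LOOP_CACHE_RELATIVE_ENERGY = 0.45   # loop cache access vs. SRAM access (from the text)


def relative_access_energy(hit_fraction: float) -> float:
    """Average access energy, normalized to an SRAM-only system.

    hit_fraction is the share of accesses served by the loop cache.
    """
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    return hit_fraction * LOOP_CACHE_RELATIVE_ENERGY + (1.0 - hit_fraction)


def savings(hit_fraction: float) -> float:
    """Fraction of access energy saved relative to using only the SRAM."""
    return 1.0 - relative_access_energy(hit_fraction)


for h in (0.0, 0.5, 1.0):
    print(f"hit fraction {h:.1f}: {savings(h):.1%} access energy saved")
```

In this model the savings grow linearly with the fraction of accesses served from the loop cache and cannot exceed 55%, so a filling policy is judged by how much of the dynamic access stream it manages to capture.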

The fabricated *LC* reference oscillates at 1.056GHz with a $\pm 2\%$ precision before trimming. The oscillator achieves a worst-case 48/52 duty cycle. A 1.1% frequency variation was observed over a temperature range of -40 to 100°C. The reference oscillator occupies only 0.3mm$^2$ of Si area and consumes 17.28mW from a 1.8V supply. The measured RMS period jitter is 610ppm [5]. Fig. 4 shows oscilloscope traces of the microsystem dynamically selecting different frequency divider outputs without halting the pipeline.
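The oscillator figures can be converted into absolute terms with a short calculation. The sketch below makes two assumptions that are not stated in the paper: that the ppm jitter figure is expressed relative to the reference period, and that each stage of the chained flip-flop divider halves the frequency. The names are hypothetical.

```python
F_REF_HZ = 1.056e9    # measured reference frequency (from the text)
JITTER_PPM = 610      # measured RMS period jitter (from the text)

period_s = 1.0 / F_REF_HZ
# Assumption: ppm is taken relative to the reference period.
rms_jitter_s = period_s * JITTER_PPM * 1e-6


def divider_outputs(f_ref_hz: float, stages: int) -> list:
    """Frequencies after 1..stages divide-by-two stages (assumed topology)."""
    return [f_ref_hz / 2 ** n for n in range(1, stages + 1)]


print(f"period: {period_s * 1e12:.2f} ps, RMS jitter: {rms_jitter_s * 1e12:.3f} ps")
for n, f in enumerate(divider_outputs(F_REF_HZ, 5), start=1):
    print(f"after {n} stage(s): {f / 1e6:.1f} MHz")
```

Under these assumptions the 610ppm figure corresponds to roughly 0.58 ps RMS on a period of about 947 ps, and the divider chain offers a ladder of binary-related clock rates from which the core can select.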

## IV. CONCLUSION

This work reports the single-chip integration of a flexible, low-power microsystem, which has a custom ISA and C compiler, with a CMOS-compatible MEMS-*LC* reference oscillator. Maximum active power consumption of the MCU is 33.9mW at 92.5MHz and 1.8V, with an idle mode drawing only 740 $\mu$W at 1.15V. The on-chip, 0.3mm$^2$ MEMS-*LC* reference supplies a highly accurate, low-jitter clock source while consuming 17.28mW at 1.8V.

### ACKNOWLEDGEMENTS

Fabrication of this work at TSMC was supported by the MOSIS Educational Program. The authors wish to thank Artisan for digital cell libraries and memory generators. Work was supported by the Engineering Research Centers program of the NSF under award number EEC-9986866.

### REFERENCES

- [1] M. McCorquodale, et al., "Microsystem and SoC design with UMIPS," IFIP International Conf. on VLSI SOC, pp. 324-329, 2003.
- [2] R. Senger, et al., "A 16-bit mixed-signal microsystem with integrated CMOS-MEMS clock reference," in Proc. Design Automation Conf., pp. 520-525, June 2003.
- [3] R. Ravindran, et al., "Partitioning variables across register windows to reduce spill code in a low-power processor," *IEEE Trans. Computers*, to be published.
- [4] S. Martin, et al., "A low-power microinstrument for chemical analysis of remote environments," 11th NASA Symp. on VLSI Design, Coeur d'Alene, ID, pp. 1-4, May 2003.
- [5] M. McCorquodale, "Monolithic and Top-Down Clock Synthesis with Micromachined RF Reference," Ph.D. Dissertation, Dept. of Elec. Eng. and Comp. Sci., Univ. of Michigan, Ann Arbor, MI, 2004.
- [6] M. McCorquodale, F. Gebara, K. Kraver, E. Marsman, R. Senger, and R. Brown, "A top-down microsystems design methodology and associated challenges," in *Design, Automation, and Test in Europe Designers' Forum Proc.*, pp. 292-296, Mar. 2003.
- [7] E. Marsman, et al., "A 16-bit low-power microcontroller with monolithic MEMS-LC clocking," in Proc. Intl. Symp. on Circuits and Systems, pp. 624-627, May 2005.
- [8] R. Ravindran, et al., "Compiler managed dynamic instruction placement in a low-power code cache," Code Generation and Optimization, pp. 179-190, Mar. 2005.