ISLPED'98 Abstracts
Chair: Anantha Chandrakasan
-
M0.1 High Performance DSPs - What's Hot and What's Not? [p. 1]
- Bryan Ackland, Chris Nicol
This paper compares low power techniques current in the research literature
with those used in commercial DSP design and explores why some techniques
have not yet had commercial impact. It also examines the low power needs
of future DSP applications.
Keywords: DSP, low power, architecture, circuit design.
-
M0.2 Low Power and Low Voltage CMOS Digital Circuits Techniques [p. 7]
- Christer Svensson, Atila Alvandpour
One of many important factors affecting power
consumption is the choice of circuit technique
for logic, latches and flip-flops. We analyze the
power consumption at circuit level and use the
results to guide the choice of circuit technique.
Several types of latches and flip-flops are compared
regarding power consumption and speed.
Comparing logic clearly indicates that simple
static logic in general has the lowest power
consumption. Another very important factor
affecting power consumption is the supply voltage.
We discuss the effect of low supply voltage
on the choice of circuit technique.
Keywords: Low Power, Low voltage, CMOS, Digital circuits.
Session Chair: Lou Williams
Associate Chair: T. R. Viswanathan
-
M1.1 CMOS Front End Components for Micropower RF Wireless Systems [p. 11]
- Tsung-Hsien Lin, Henry Sanchez, Razieh Rofougaran, William J. Kaiser
New applications have recently appeared for a low
power, low cost, "embedded radio". These wireless
interfaces for handheld mobile nodes and Wireless
Integrated Network Sensors (WINS) must provide
spread spectrum signaling for multi-user operation
at 902-928 MHz. Cost considerations motivate the
development of complete micropower CMOS RF
systems operating at previously unexplored low
power levels. Micropower CMOS VCO and mixer
circuits, developed for these emerging narrow-band
communication systems, are reported here. Design
methods combining high-Q inductors and weak
inversion MOSFET operation enable the lowest
reported operating power for RF front end
components including a voltage-controlled oscillator
(VCO) and mixer operating at frequencies of 400
MHz - 1 GHz. In addition, the VCO, by virtue of
its high-Q inductive components, displays the
lowest reported phase noise for 1 GHz CMOS
VCO systems for any power dissipation.
-
M1.2 A 1.4-GHz 3-mW CMOS LC Low Phase Noise VCO Using Tapped Bond Wire
Inductance [p. 16]
- Tamara I. Ahrens, Thomas H. Lee
A 1.4-GHz LC voltage-controlled oscillator has been implemented in a MOSIS
0.5-um CMOS process. Complementary cross-coupled PMOS and NMOS transistors
enhance single-ended symmetry at each of the resonant nodes, reducing
close-in phase noise. Tapped bond wires provide a resonant tank with
high Q. At an offset frequency of 100 kHz, the measured phase noise
is -107 dBc/Hz with 3mW power dissipation from a
3.0 V supply. NMOS gate capacitors achieve a 17% tuning range.
-
M1.3 A 3.8-mW 2.5-GHz Dual-Modulus Prescaler in a 0.8 um Silicon Bipolar
Production Technology [p. 20]
- Herbert Knapp, Wilhelm Wilhelm, Mira Rest, Hans-Peter Trost
This paper presents a dual-modulus ÷128/÷129 prescaler
operating up to 2.5 GHz. It consumes only 3.8 mW from a 2.3 V
supply when driving an 8 pF capacitive load. The circuit is
operational with supplies ranging from 2 V to over 7 V. With a
2 V supply it consumes only 1.38 mA while still operating up to
2 GHz.
The circuit is manufactured in a standard silicon bipolar
production process (Siemens B6HF). This 25 GHz-fT
double-polysilicon technology uses 0.8 um lithography and LOCOS
isolation. The chip is mounted in a 6-pin SOT363 SMD package.
Session Chair: Jose Monteiro
Associate Chair: Luca Benini
-
M2.1 Towards the Capability of Providing Power-Area-Delay Trade-off at the
Register Transfer Level [p. 24]
- Chun-hong Chen, Chi-ying Tsui
This paper presents a new register-transfer level (RT-level)
power estimation technique based on technology
decomposition. Given the Boolean description of a
circuit function, the power consumption of two typical
circuit implementations, namely the minimum area
implementation and the minimum delay implementation,
are estimated, respectively. This provides a
capability of obtaining a full power-delay-area trade-off
curve at the RT level. Our method makes it possible to
capture the structural and/or functional information of
a circuit without going through actual gate-level
implementation. Experimental results show that the
accuracy is very reasonable.
Keywords: RT-level, power estimation, entropy, technology
decomposition
-
M2.2 Stream Synthesis for Efficient Power Simulation Based on Spectral
Transforms [p. 30]
- Alberto Macii, Enrico Macii, Massimo Poncino, Riccardo Scarsi
One way of minimizing the time required to perform simulation-based
power estimation is that of reducing the length of the input trace
to be fed to the simulator. Obviously, the use of a
reduced stream may introduce some errors in the estimation results.
The generation (or synthesis) of the short input sequence
to be used for power simulation must then be done in such a
way that the resulting error is minimized.
This paper introduces a new stream synthesis method that uses
spectral analysis techniques based
on the discrete Fourier transform to determine a reduced sequence
of vectors that shortens the overall power simulation time
with only a very limited loss of accuracy. The effectiveness of the proposed
synthesis procedure is demonstrated by the results we have obtained
on the ISCAS'85 combinational benchmarks for a variety of input
streams characterized by different statistical and correlation
properties.
-
M2.3 Theoretical Bounds for Switching Activity Analysis in Finite-State
Machines [p. 36]
- Diana Marculescu, Radu Marculescu, Massoud Pedram
The objective of this paper is to provide lower and upper bounds for the
switching activity on the state lines in Finite State Machines (FSMs).
Using a Markov chain model for the behavior of the states of the FSM,
we derive theoretical bounds for the average Hamming distance on the
state lines which are valid irrespective of the state encoding used in
the final implementation. Such lower and upper bounds, in addition to
providing a target for any state assignment algorithm, can also be used as
parameters in a high-level model of power, and thus provide an early
indication about the performance limits of the target FSM. Experimental
results obtained for the MCNC'91 benchmark suite show that our bounds
are tighter than the bounds reported previously by other researchers and
can be effectively used in a high-level power estimation framework.
Keywords: Lower/upper bounds, Hamming distance, Markov chains, switching
activity, power estimation.
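
The quantity being bounded can be illustrated with a minimal sketch. It does not reproduce the paper's bounds; it only evaluates the average Hamming distance on the state lines for one given encoding, assuming the FSM is described by a state transition probability matrix. The function names, the example FSM and the use of power iteration (which assumes the chain converges to its steady state) are all assumptions made for illustration.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the steady-state state probabilities by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def hamming(a, b):
    """Number of bit positions in which two state codes differ."""
    return bin(a ^ b).count("1")


def average_switching(P, codes):
    """Expected number of state-line toggles per clock cycle for one encoding."""
    pi = stationary_distribution(P)
    n = len(P)
    return sum(pi[i] * P[i][j] * hamming(codes[i], codes[j])
               for i in range(n) for j in range(n))


# Hypothetical 4-state FSM that cycles 0 -> 1 -> 2 -> 3 -> 0.
# The uniform distribution is already stationary for this chain.
P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]
binary_activity = average_switching(P, [0b00, 0b01, 0b10, 0b11])
gray_activity = average_switching(P, [0b00, 0b01, 0b11, 0b10])
```

For this example the binary encoding toggles 1.5 state lines per cycle on average and the Gray encoding toggles 1.0, which shows why encoding-independent bounds are a useful target for state assignment.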
Session Chair: Bill Kaiser
Associate Chair: Jim Burr
-
M3.1 Low Power Salient Integration Mode Image Sensor with a Low Voltage
Mixed-Signal Readout Architecture [p. 42]
- Eric Y. Chou, A. J. Budrys, Kit M. Cham
CMOS image sensors are very suitable for battery-operated camera systems
due to their low power nature. In this research work, a salient integration
mode CMOS image sensor pixel design which requires only 1 or 2 transistors
per pixel and a low power readout architecture was developed in a 0.35 um
CMOS technology. High fill factor and small pixel size are achieved
at the same time for the 2T pixel design. The readout architecture includes
a low voltage low power multi-stage analog data buffer which works as a
differential to single-ended conversion mechanism for a new correlated
double sampling method. Total data bandwidth and switching power are also
greatly reduced. The architecture was developed to be scalable to 0.18 um
technology with 1.2 volt supply voltage, and lower. An experimental chip
in an array size of 256 x 256 with a pixel size of 6.3 um x 6.3 um was
fabricated in HP's 0.35 um CMOS technology. Promising experimental results
strongly indicate that the new pixel design and readout architecture are
suitable for low voltage CMOS camera chips in future generations of CMOS
technology.
Keywords: Salient integration mode image sensor, active pixel sensor,
CMOS imaging, low power mixed-signal design, deep submicron technology
-
M3.2 A Delay Distribution Squeezing Scheme with Speed-Adaptive Threshold-Voltage
CMOS (SA-Vt CMOS) for Low Voltage LSIs [p. 48]
- Masayuki Miyazaki, Hiroyuki Mizuno, Koichiro Ishibashi
In a speed-adaptive threshold-voltage
CMOS (SA-Vt CMOS) circuit, the
substrate bias is controlled so that delay in
the circuit stays constant. Distributions of
device speeds are squeezed under fast-operation
conditions. With a ring
oscillator using 0.25-um CMOS devices as
a test circuit, we found that the worst-case
operating frequency was improved from
20 MHz to 55 MHz, and the fluctuation of
the operating frequency was suppressed
from 44 % to 15 % while the supply-voltage
variation was under 0.1 V with a
1.8 V supply voltage.
-
M3.3 3D CMOS SOI for High Performance Computing [p. 54]
- S. J. Abou-Samra, P. A. Aisa, A. Guyot, B. Courtois
This paper addresses three topics: first, a new three-dimensional
CMOS-SOI on SOI technology is presented; then design
methodologies are proposed for this technology; and last, a
comparison is carried out between 2D and 3D designs. In this
technology the P-channel devices are stacked over the N-channel
ones. All gates are 100 nm long. New design constraints are
introduced. Consequently, new design methodologies have to be
developed in order to take full advantage of the outstanding
features of 3D integration, such as the reduced length of
interconnections. A 16x16 bit multiplier was designed in this
technology. Comparative results between 2D and 3D integration
are given here in terms of energy consumption, delay and area.
-
M3.4 A High Speed and Low Power SOI Inverter using Active Body-Bias [p. 59]
- Joonho Gil, Minkyu Je, Jongho Lee, Hyungcheol Shin
We propose a new high speed and low power
SOI inverter that can operate with efficient
body-bias control and free supply voltage. The
performance of the proposed circuit is
evaluated by both the BSIM3SOI circuit
simulator and the ATLAS device simulator,
and then compared with other reported SOI
circuits. The proposed circuit is shown to have
excellent characteristics. At the supply voltage
of 1.5V, the proposed circuit operates 27%
faster than the conventional SOI circuit with
the same power dissipation.
Keywords:
SOI inverter, low power, dynamic threshold, body-bias
Session Chair: Vivek Tiwari
Associate Chair: Chris Nicol
-
M4.1 Power and Performance Tradeoffs using Various Caching Strategies [p. 64]
- R. Iris Bahar, Gianluca Albera, Srilatha Manne
In this paper, we propose several different data and instruction
cache configurations and analyze their power as well as performance
implications on the processor. Unlike most existing work in
low power microprocessor design, we explore a high performance
processor with the latest innovations for performance. Using a detailed,
architectural-level simulator, we evaluate full system performance
using several different power/performance sensitive cache
configurations such as increasing cache size or associativity and
including buffers alongside L1 caches. We then use the information
obtained from the simulator to calculate the energy consumption of
the memory hierarchy of the system. As an alternative to simply increasing
cache associativity or size to reduce lower-level memory
energy consumption (which may have a detrimental effect on on-chip
energy consumption), we show that, by using buffers, energy
consumption of the memory subsystem may be reduced by as much
as 13% for certain data cache configurations and by as much as
23% for certain instruction cache configurations without adversely
affecting processor performance or on-chip energy consumption.
-
M4.2 Architectural and Compiler Support for Energy Reduction in the Memory
Hierarchy of High Performance Microprocessors [p. 70]
- Nikolaos Bellas, Ibrahim Hajj, Constantine Polychronopoulos,
George Stamoulis
In this paper we propose a technique that uses an additional mini cache
located between the I-Cache and the CPU core to buffer instructions
that are nested within loops and would otherwise be fetched continuously
from the I-Cache. This mechanism is combined with code modifications, through
the compiler, that greatly simplify the required hardware, eliminate
unnecessary instruction fetching, and consequently reduce signal switching
activity and the dissipated energy. We show that the additional cache,
dubbed L-Cache, is much smaller and simpler than the I-Cache when the
compiler assumes the role of allocating instructions in it. Through
simulation, we show that, for the SPECfp95 benchmarks, the I-Cache
remains disabled most of the time, and the "cheaper" extra cache is
used instead. We present experimental results that validate the
effectiveness of this technique, and present the energy gains for
most of the SPEC95 benchmarks.
-
M4.3 The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms [p. 76]
- Trevor Pering, Tom Burd, Robert Brodersen
The reduction of energy consumption in microprocessors
can be accomplished without
impacting the peak performance through the
use of dynamic voltage scaling (DVS). This
approach varies the processor voltage under
software control to meet dynamically varying
performance requirements. This paper presents
a foundation for the simulation and analysis
of DVS algorithms. These algorithms are
applied to a benchmark suite specifically targeted
for PDA devices.
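
The energy argument behind DVS can be sketched under first-order assumptions that are not taken from the paper: dynamic energy per cycle scales as C_eff * V^2, and the maximum clock frequency is taken to be proportional to the supply voltage. All names and numbers below are hypothetical.

```python
def scaled_voltage(f_required, f_max, v_max, v_min):
    """Lowest supply voltage that sustains f_required (assumes f proportional to V)."""
    v = v_max * f_required / f_max
    return min(v_max, max(v_min, v))


def task_energy(cycles, c_eff, v):
    """Dynamic switching energy for a task: cycles * C_eff * V^2."""
    return cycles * c_eff * v * v


# Hypothetical numbers: a task that only needs half the peak clock rate.
CYCLES = 1_000_000
C_EFF = 1e-9              # effective switched capacitance per cycle, in farads
V_MAX, V_MIN = 3.3, 1.0   # supply voltage range, in volts
F_MAX = 100e6             # peak clock frequency, in hertz

e_full = task_energy(CYCLES, C_EFF, V_MAX)    # run at peak speed, then idle
v_half = scaled_voltage(F_MAX / 2, F_MAX, V_MAX, V_MIN)
e_scaled = task_energy(CYCLES, C_EFF, v_half)  # run at half speed, lower voltage
ratio = e_scaled / e_full
```

Under these assumptions, running at half speed with a proportionally lower voltage uses about a quarter of the energy of running at full speed and idling, while the peak performance remains available when the workload demands it.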
-
M4.4 Optimizing the DRAM Refresh Count for Merged DRAM/Logic LSIs [p. 82]
- Taku Ohsawa, Koji Kai, Kazuaki Murakami
In merged DRAM/logic LSIs, the DRAM portion could suffer from shorter
data retention time because of heat and noise caused by the logic
portion. Frequent refreshes increase power consumption. Also, they
disturb normal DRAM accesses leading to performance degradation. In
order to overcome this problem, we propose several DRAM refresh
architectures. We have estimated the DRAM refresh count in executing
benchmark programs under several architecture models. As a result,
in the most effective combination of the architectures, we have obtained
more than 80% reduction against a conventional DRAM refresh architecture
for most benchmark programs. In addition, even when we have taken
normal DRAM access into account, we have obtained more than 50% reduction
for several benchmarks.
Session Chair: Sayfe Kiaei
-
P1.1 Integrated DC/DC Converter with Digital Controller [p. 88]
- Ferdinand Sluijs, Kees Hart, Wouter Groeneveld, Stephan Haag
A DC/DC converter with integrated digital controller and switches is
realized. This DC/DC converter only needs an external coil, diode and
capacitor. The main advantages of this type of digital DC/DC converter
are the fast response on load variation and the high efficiency over a
wide power range. The DC/DC converter uses low resistance CMOS switches
and operates in multi mode. This controller uses a small output
voltage window as reference for control actions.
-
P1.2 CMOS VCOs for Frequency Synthesis in Wireless Biotelemetry [p. 91]
- Rafael J. Betancourt-Zamora, Thomas H. Lee
A new phase noise model was used to optimize a
differential ring VCO for minimum power consumption.
We compare the phase noise performance
of three buffer stages using clamped,
symmetric and cross-coupled loads, respectively.
We propose a cross-coupled buffer topology
that achieves lower phase noise by
exploiting symmetry. Measured phase noise for
a 1.2mW, 150MHz VCO fabricated in 0.5um
CMOS is -103.9dBc/Hz at 500kHz offset, showing
good agreement with the theory.
Keywords : CMOS, frequency synthesis, phase noise,
ring oscillator, VCO
-
P1.3 The Impact of Data Characteristics and Hardware Topology on Hardware
Selection for Low Power DSP [p. 94]
- Gareth Keane, Jonathan Spanier, Roger Woods
Adders and multipliers are key operations in DSP
systems. The power consumption of adders is well
understood but there are few detailed results on the
choice of multipliers available. This paper considers
how the power consumption of a number of multiplier
structures such as Carry-Save array and Wallace Tree
multipliers varies with data wordlengths and different
layout strategies. In all cases, results were obtained
from EPIC PowerMill(TM) simulations of actual
synthesised circuit layouts. Analysis of the results
highlights the effects of routing and interconnect
optimization for low power operation and gives clear
indications on choice of multiplier structure and design
flow for the rapid design of DSP systems.
Keywords:
Low power DSP systems, optimum hardware selection,
multiplier structures.
-
P1.4 Low Threshold CMOS Circuits with Low Standby Current [p. 97]
- Mircea R. Stan
Multi-Voltage CMOS (MVCMOS) is a design methodology
for very low power supply voltages that uses low-threshold
transistors in series with the supply rails. The control voltages
on the gating transistors need to be outside of the Vdd - Vss
range (hence the name MVCMOS) in order to reduce the
standby current, but the resulting circuits operate at lower
supply voltages and have a lower area overhead than the
previously proposed Multi-Threshold CMOS (MTCMOS).
-
P1.5 Minimum Supply Voltage for Bulk Si CMOS GSI [p. 100]
- Azeez J. Bhavnagarwala, Blanca Austin, James D. Meindl
Limits on energy dissipation are investigated for bulk Si CMOS circuits
at each node of the 1997 National Technology Roadmap for Semiconductors (NTRS).
Physical, continuous and smooth MOSFET Transregional drain current models that
consider high-field effects in scaled devices, and permit trade-offs between
saturation drive current and subthreshold leakage current are described and
employed to model CMOS circuit performance and power dissipation at low
voltages. The Transregional models are used in conjunction with physical
threshold voltage roll-off models and stochastic interconnect distribution,
at performance, chip sizes and transistor counts forecast by the 1997 NTRS,
to project optimal supply and threshold voltages, minimizing total energy
dissipated by CMOS logic circuits. Techniques exploiting datapath
parallelism to further reduce supply voltage are shown to offer decreasing
reductions in power dissipation with technology scaling.
-
P1.6 0.5 V CMOS Logic Delivering 200 Million 8x8 Bit Multiplications/s at Less
Than 100 fJ Based on a 50 nm T-Gate SOI Technology [p. 103]
- Volker Dudek, Reinhard Grube, Bernd Höfflinger, Michael Schau
High-performance CMOS logic at a very low
voltage of 0.5 V can deliver 150 Million 8x8
multiplications/s at an energy level of only
30 fJ if 0.35 um SOI technology is enhanced
with self-aligned 50 nm T-Gate transistors and if a
new adder with a differential Manchester
chain including special accelerators and the
DIGILOG multiplier, a leading-one-first
pseudo-log multiplier with complexity of order
n, are optimized simultaneously.
Keywords :
Adder, Multiplier, T-Gate, low power, high-performance
-
P1.8 Decreasing Low-Voltage Manufacturing-Induced Delay Variations with
Adaptive Mixed-Voltage-Swing Circuits [p. 106]
- L. Richard Carley, Akshay Aggarwal, Ram K. Krishnamurthy
One of the major problems faced by the
designer when operating CMOS static logic circuits
at low power supply voltages (normalized
to VT) is that the delay spread introduced by
today's IC manufacturing variations can
increase dramatically. In this paper we describe
an approach for decreasing the delay spread
and power spread in ICs based on adaptively
servoing the circuits between static CMOS
operation and QuadRail operation. An on-chip
series-regulator employing a dummy delay
path is used to generate the adaptive low swing
power supply rails making this approach fully
compatible with a standard CMOS IC design
methodology. Simulation results are presented
demonstrating that for a 16*16+36-bit multiplier-accumulator
designed in 0.5 um CMOS
process the proposed approach decreases the
delay spread from 3.9X to 2.3X and the power
spread from 3.6X to 1.8X.
Keywords :
Low power CMOS logic, mixed-swing CMOS logic,
manufacturing variations, low voltage logic circuits.
-
P1.9 Power-Delay Tradeoffs for Radix-4 and Radix-8 Dividers [p. 109]
- Alberto Nannarelli, Tomas Lang
The use of higher radices in division reduces the number
of iterations to complete the operation, but increases the
complexity of the circuit. In this paper we explore the influence
of the radix on the power dissipation of a floating-point
divider and the power-delay tradeoffs. We compare the performance
and the energy consumption per operation for a
radix-4 and a radix-8 divider, realized in CMOS technology.
A reduction of about 40% in the energy consumption is obtained
for both radices (about 70% if low-voltage gates, for
dual voltage implementation, are available). Also the results
show that the radix-8 divider is about 20% faster and the
energy dissipated to perform a division is about the same,
with respect to the radix-4.
Session Chair: Ingrid Verbauwhede
-
P2.1 Automatic Characterization and Modeling of Power Consumption in Static
RAM's [p. 112]
- Mauro Chinosi, Roberto Zafalon, Carlo Guardiani
An automatic modeling technique is presented in this paper that allows an
accurate model of power consumption in embedded memory blocks to be built.
A software neural-network is used to create a regression tree by
automatically splitting those variables that have a discontinuous
effect on the power consumption. An application of the methodology
to the modeling of a 0.35 um CMOS embedded SRAM is presented.
Keywords: Power estimation, Memory modeling, Static RAMs
-
P2.2 Improving Sampling Efficiency for System Level Power Estimation [p. 115]
- Chih-Shun Ding, Cheng-Ta Hsieh, Massoud Pedram
In this paper, we propose an efficient statistical sampling technique
which is suitable for estimating the total power consumption of a
large VLSI system. The basic idea is to generate simulation units
for each module in the system independently and then form samples of
the system power by randomly selecting simulation units for each module.
Hence, sampling is performed both temporally (across different clock
cycles) and spatially (across different modules). A module clustering
step ensures that the module types are compatible with this sampling
strategy. Experimental results show a 4x reduction in the simulation
time compared to existing Monte-Carlo simulation techniques.
-
P2.3 Power Invariant Vector Compaction Based on Bit Clustering and Temporal
Partitioning [p. 118]
- Nicola Dragone, Roberto Zafalon, Carlo Guardiani, Cristina Silvano
Power dissipation in digital circuits is strongly pattern dependent.
Thus, to derive accurate simulation-based power estimates, a large
number of input vectors is usually required. This paper proposes
a vector compaction technique aiming at providing accurate power figures
in a shorter simulation time for complex sequential circuits characterized
by some hundreds of inputs. Starting from pair-wise spatio-temporal signal
correlations, the proposed approach applies bit clustering and temporal
partitioning to the input stream, aiming to preserve the statistical
properties of the original stream and maintain the typical switching
behavior of the circuit.
The effectiveness of the proposed approach has been demonstrated over a
significant set of industrial case studies implemented in CMOS submicron
technology. While achieving a 10x to 50x stream size reduction, the reported
results show average and maximum errors of 2.4% and 7.1%, respectively,
over the simulation-based power estimates derived from the original input
stream.
Keywords: Power Estimation, Vector Compaction, Markov Chains, Low Power
VLSI Design
-
P2.4 An Empirical Comparison of Algorithmic, Instruction, and Architectural
Power Prediction Models for High Performance Embedded DSP Processors [p. 121]
- Catherine H. Gebotys, Robert J. Gebotys
This paper presents a comparison of statistically-derived
power prediction models at the algorithmic, instruction,
and architectural levels for embedded high
performance DSP processors. The approach is general
enough to be applied to any embedded DSP processor.
Results from 168 power measurements of DSP code
show that power can be predicted at instruction and
architecture levels with less than 2% error. This result
is important for developing a general methodology
for power characterization of embedded DSP software
since low power is critical to complex DSP applications
in many cost sensitive markets.
-
P2.5 Power Calculation and Modeling in Deep Submicron [p. 124]
- Jay Abraham
Over the past few years it has become increasingly apparent that modern
IC design is no longer bounded by timing and area constraints. Power has
become significantly more important. In an era of hand held devices ranging
from mobile computing to wireless communication systems, managing and
controlling power takes on an important role. Several benefits are realized
with low power designs in addition to extended battery life. Low power
devices often run at lower junction temperatures and this leads to high
reliability and low cost cooling systems [1,2,3,6]. Calculation and modeling
of power (and delay) in deep-submicron (less than 0.25 microns) designs
poses several challenges. This paper discusses the use of the Delay and
Power Calculation System (DPCS) as a means by which EDA (Electronic
Design Automation) tools can accurately calculate and model power.
-
P2.6 Partial Bus-Invert Coding for Power Optimization of System Level Bus
[p. 127]
- Youngsoo Shin, Sook-Ik Chae, Kiyoung Choi
We present a partial bus-invert coding scheme for power
optimization of a system-level bus. In the proposed scheme,
we select a sub-group of bus lines involved in bus encoding
to avoid unnecessary inversion of bus lines not in the sub-group
thereby reducing the total number of bus transitions.
We propose a heuristic algorithm that selects the sub-group
of bus lines for bus encoding. Experiments on benchmark
examples indicate that partial bus-invert coding reduces
the total number of bus transitions by 62.6% on average, compared
to the unencoded patterns.
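
The encoding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sub-group is assumed to be given as a bit mask (the paper's heuristic for selecting it is not shown), the bus is assumed to start at all zeros, and the decision rule is the basic bus-invert rule of inverting when more than half of the sub-group lines would toggle. All names are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def partial_bus_invert(words, mask):
    """Encode words, inverting only the lines selected by mask when that saves toggles."""
    k = popcount(mask)
    prev_bus = 0                    # assumed initial state of the bus lines
    encoded, invert_bits = [], []
    for w in words:
        # Toggles that sending w unencoded would cause on the sub-group lines.
        d = popcount((w ^ prev_bus) & mask)
        invert = 1 if 2 * d > k else 0
        bus = w ^ mask if invert else w
        encoded.append(bus)
        invert_bits.append(invert)
        prev_bus = bus
    return encoded, invert_bits


def decode(encoded, invert_bits, mask):
    """Recover the original words from the bus values and the invert line."""
    return [b ^ mask if inv else b for b, inv in zip(encoded, invert_bits)]


def transitions(values):
    """Total bit toggles over a sequence, starting from an all-zero bus."""
    seq = [0] + list(values)
    return sum(popcount(a ^ b) for a, b in zip(seq, seq[1:]))


# Hypothetical 8-bit trace with the whole bus in the sub-group.
words = [0x00, 0xFF, 0x00, 0xFF]
encoded, inv = partial_bus_invert(words, 0xFF)
plain_cost = transitions(words)
coded_cost = transitions(encoded) + transitions(inv)  # data lines plus invert line
```

For this trace the unencoded bus toggles 24 times, while the encoded bus toggles 3 times, all on the invert line. Lines outside the mask are never inverted, which is what avoids the unnecessary inversions mentioned above.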
-
P2.7 The Petrol Approach to High-Level Power Estimation [p. 130]
- Rafael Peset Llopis, Kees Goossens
High-level power estimation is essential for designing complex low-power
ICs. However, the lack of flexibility of previously presented high-level
power estimation approaches, or their restriction to synthesizable code,
limits their use. In this paper we present a novel, more general and flexible
high-level power estimation approach, that avoids these limitations.
Petrol, as we call it, is not limited to specialized application domains,
synthesizable VHDL, or data path parts of a design. We show that glitches
can be usefully modeled at higher levels of abstraction. The Petrol approach
shows good correlation with gate-level power estimates. It is currently
used for commercial designs.
-
P2.8 Power Consumption of Parallel Spread Spectrum Correlator Architectures
[p. 133]
- Won Namgoong, Teresa Meng
Parallel correlation in direct-sequence spread spectrum systems allows faster
and more reliable coarse acquisition. However, the power consumed becomes
significant especially for receivers that employ a large number of parallel
correlators. In this paper, the power efficiency of various parallel
correlator architectures is explored assuming baseband sampled signals of
two samples per chip. Active correlators placed in parallel that use both
two's complement and sign-magnitude accumulators are first presented.
Functionally equivalent M-parallel passive correlators are then studied.
In this approach, the baseband sampled signals are passed through a tapped
delay-line. Each tap is then multiplied by a stationary reference
pseudonoise code and summed using a binary tree network. The passive
correlators are generally more power efficient for large M values. Further
reduction in power consumption is possible by splitting the tapped delay-line
into even and odd delays and summing using two smaller binary tree adders.
This proposed architecture consumes significantly less power compared to all
other architectures. The power dissipation of M-parallel correlator
architectures is evaluated for M = 8, 16, 32 using TSMC 0.35-um CMOS
technology at 3.3V supply voltage.
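
The passive correlator structure described above can be modeled behaviorally as a tapped delay line whose taps are multiplied by a fixed reference code and summed by a binary tree. The sketch below is a simplification and an assumption throughout: it uses one sample per tap rather than two samples per chip, it does not model the even/odd split, and the code sequence and function names are hypothetical.

```python
from collections import deque


def tree_sum(values):
    """Add values pairwise, level by level, mirroring a binary adder tree."""
    vals = list(values)
    if not vals:
        return 0
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])    # odd element passes through to the next level
        vals = nxt
    return vals[0]


def passive_correlate(samples, code):
    """Return one correlator output per input sample; code holds +1/-1 values."""
    taps = deque([0] * len(code), maxlen=len(code))
    outputs = []
    for s in samples:
        taps.append(s)              # newest sample enters, oldest one drops off
        outputs.append(tree_sum(t * c for t, c in zip(taps, code)))
    return outputs


# Hypothetical 8-chip reference code; feeding it in as the received signal
# produces a correlation peak once the delay line is aligned with the code.
code = [1, -1, 1, 1, -1, -1, 1, -1]
out = passive_correlate(code, code)
```

The reference code stays stationary while the samples shift past it, so the output peaks at the code length (8 here) on the sample where the delay line and the code align.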
-
P2.9 A Low Power Video Processor [p. 136]
- Uzi Zangi, Ran Ginosar
Multiple power saving methods were applied to a
video processor for color digital video and still
cameras. Architectural level methods failed to save
power: asynchronous design, dynamic voltage
scaling, bus switching minimization, pipeline stage
merging, reduction of switching times and clock
gating. However, changing the algorithm to work
on pixel differences yielded 3-15% power
reduction in typical cases.
-
P2.10 Power Dissipated by CMOS Gates Driving Lossless Transmission Lines
[p. 139]
- Yehea I. Ismail, Eby G. Friedman, Jose L. Neves
The dynamic and short-circuit power consumption of a CMOS gate
driving an LC transmission line
as a limiting case of an RLC transmission line is investigated in
this paper. Closed form solutions for the output voltage and
short-circuit power of a CMOS gate driving an LC
transmission line are presented. These solutions agree with
AS/X simulations within 11% error for a wide range of
transistor widths and line impedances. The ratio of the short-circuit
to dynamic power is less than 7% for CMOS gates
driving LC transmission lines where the line is matched or
underdriven. Therefore, the total power consumption is
expected to decrease as inductance effects become more
significant as compared to an RC model of the interconnect.
Moderator: Jan M. Rabaey, Bryan Ackland, Bob Brodersen,
Christer Svensson, Bruce Wooley
Session Chair: Farid N. Najm
-
T0.1 Emerging Power Management Tools for Processor Design [p. 143]
- D. T. Blaauw, A. Dharchoudhury, R. Panda, S. Sirichotiyakul,
C. Oh, T. Edwards
Power management is an increasing concern for processor design. In this
paper, we present an overview of traditional power simulation tools and
discuss two emerging power management design technologies: power
distribution integrity analysis and standby current measurement and
optimization. We present methods for accurate peak current simulation,
which is needed for power grid integrity analysis, and discuss the generation
and compression of the simulation vectors. Also, static approaches for
calculating an upper-bound on the maximum peak current are presented.
Standby leakage current is state dependent and we present methods for
calculating both the average and maximum leakage current. Finally,
optimization methods for minimizing the leakage current by either
assigning a standby state to the circuit or by using a dual-Vt process
are discussed.
-
T0.2 Recent Developments in High Integration Multi-Standard CMOS Transceivers
for Personal Communication Systems [p. 149]
- Jacque C. Rudell, Jia-Jiunn Ou, Sekhar Narayanaswami, George Chien,
Jeffrey A. Weldon, Li Lin, King-Chun Tsai, Luns Tee, Kelvin Khoo, Danelle Au,
Troy Robinson, Danilo Gerna, Masanori Otsuka, Paul Gray
Issues associated with the integration of transceiver components onto a single
silicon substrate are discussed. In particular, recently proposed receiver
and transmitter architectures for high integration are examined on the
promise of providing multi-standard capability. In addition, existing
barriers to lower power transceiver operation are examined as well as some
proposed directions for future integrated transceiver research
and development.
Session Chair: Brock Barton
Associate Chair: Rick Carley
-
T1.1 Low-Energy Embedded FPGA Structures [p. 155]
- Eric Kusse, Jan M. Rabaey
This paper introduces an energy-efficient FPGA module, intended for
embedded implementations. The main features of the proposed cell
include a rich local-interconnect network, which drastically reduces
the energy dissipated in the wiring, and a dual-voltage scheme that allows
pass-transistor networks to operate at low voltages yet maintain decent
performance. Simulations on a benchmark set demonstrate that the proposed
module succeeds in its goal of reducing energy consumption by an order of
magnitude over existing implementations.
Keywords: FPGAs, Low Energy, Dual Voltage, Pass-transistors, Power,
Embedded, Low Swing, Interconnect Network.
-
T1.2 Low Swing Interconnect Interface Circuits [p. 161]
- Hui Zhang, Jan Rabaey
This paper reviews a number of low-swing on-chip
interconnect schemes, and presents a thorough
analysis of their effectiveness and limitations.
In addition, several new interface
circuits, presenting even more energy savings,
are proposed. Some of these circuits not only
reduce the interconnect swing, but also use very
low supply voltages, so as to obtain quadratic
energy savings. The performance of each of
the presented circuits is thoroughly examined
using simulation on a benchmark interconnect
circuit. Energy savings of a factor of seven
have been observed for some of the schemes.
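As a rough illustration of the quadratic savings mentioned above, a first-order model takes the energy drawn from the supply per transition as C * Vsupply * Vswing. The function name, the load capacitance, and the voltage values below are illustrative assumptions, not figures from the paper.

```python
def transition_energy(c_farads, v_supply, v_swing):
    """First-order energy drawn from the supply to charge c_farads through v_swing."""
    return c_farads * v_supply * v_swing


C_WIRE = 1e-12  # assumed 1 pF interconnect load

full_swing = transition_energy(C_WIRE, 2.0, 2.0)
# Swing reduced while the driver still uses the full supply: linear savings.
reduced_swing = transition_energy(C_WIRE, 2.0, 0.5)
# Swing and supply both reduced: quadratic savings.
reduced_both = transition_energy(C_WIRE, 0.5, 0.5)

linear_ratio = full_swing / reduced_swing
quadratic_ratio = full_swing / reduced_both
print(f"swing only: {linear_ratio:.1f}x, swing and supply: {quadratic_ratio:.1f}x")
```

In this model, cutting only the swing by 4x saves 4x, while cutting both the swing and the supply that delivers the charge saves 16x.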
-
T1.3 True Single-Phase Energy-Recovering Logic for Low-Power, High-Speed VLSI
[p. 167]
- Suhwan Kim, Marios C. Papaefthymiou
In dynamic logic families that rely on energy recovery to achieve low
energy dissipation, the flow of data through cascaded gates is controlled
using multi-phase clocks. Consequently, these families require multiple
clock generators and can exhibit increased energy consumption on their
clock distribution networks. Moreover, they are not attractive for
high-speed design due to clock skew management problems.
In this paper, we present TSEL, the first energy-recovering logic family
that operates with a single-phase clocking scheme. TSEL outperforms
previous energy-recovering logic families in terms of energy efficiency
and operating speed. In HSPICE simulations with a standard 0.5 um technology
from MOSIS, pipelined carry-lookahead adders in TSEL function correctly
for operating frequencies exceeding 280 MHz. For operating frequencies
above 80 MHz, they dissipate considerably less energy per operation than
alternative implementations of the same adder architecture in other
energy-recovering logic families. In comparison with their CMOS counterparts,
the TSEL adders dissipate about half as much energy at 280 MHz. Our results
indicate that TSEL is an excellent candidate for high speed and low power
VLSI system design.
Session Chair: Renu Mehra
Associate Chair: Maurizio Damiani
-
T2.1 System-Level Power Estimation and Optimization [p. 173]
- Luca Benini, Robin Hodgson, Polly Siegel
Most work to date on power reduction has focused on the component
level, not at the system level. In this paper, we propose a
framework for describing the power behavior of system-level
designs. The model consists of a set of resources, an environmental
workload specification, and a power management policy,
which serves as the heart of the system model. We map this
model to a simulation-based framework to obtain an estimate
of the system's power dissipation. Accompanying this, we propose
an algorithm to optimize power management policies.
The optimization algorithm can be used in a tight loop with the
estimation engine to derive new power-management policy
algorithms for a given system-level description. We tested our
approach by applying it to a real-life low-power portable
design, achieving a power estimation accuracy of ~10%, and a
23% reduction in power after policy optimization.
-
T2.2 Memory Modeling for System Synthesis [p. 179]
- Sari L. Coumeri, Donald E. Thomas
We present our methodology for developing models of
on-chip SRAM memory organizations. The models were
created to enable the quick evaluation of energy, area,
and performance of different memory configurations
considered during synthesis. The models are defined in
terms of parameters, such as size and mode of operation,
which are known at synthesis time. Our methodology
does not require knowledge of the underlying memory
circuitry and provides models with average percentage
errors within 8%. We found that only 10 different memories
from a large span of possible memory sizes are
needed to obtain reasonably accurate models, with average
errors within 15%. We further use these models to
evaluate different low power memory organizations and
have seen energy reductions of up to 88%. In this paper
we present our modeling methodology, discuss the
important aspects in developing the models, and show
results of using the models in evaluating low power memory
organizations.
-
T2.3 Monitoring System Activity for OS-Directed Dynamic Power Management
[p. 185]
- Luca Benini, Alessandro Bogliolo, Stefano Cavallucci, Bruno Riccò
In this paper we describe a workload monitoring system
that has been specifically designed for supporting dynamic
power management in personal computers with tight power
constraints (such as laptop or notebook computers). Our
monitoring system is minimally intrusive, and has negligible
impact on system activity. Moreover, it can be used both
for on-line system monitoring and off-line data collection.
We used our monitoring tool to collect data on the usage
of system resources (disks, CPU, keyboard and mouse) for
a laptop computer, under several workload conditions. Our
analysis shows that resource usage is strongly resource and
workload dependent, and that on-line usage monitoring capability
is critical to the implementation of effective
power management policies.
Session Chair: Christian Enz
Associate Chair: Venu Gopinathan
-
T3.1 A Reconfigurable Dual Output Low Power Digital PWM Power Converter [p. 191]
- Abram Dancy, Anantha Chandrakasan
This versatile power converter controller provides
dual outputs at a fixed switching frequency and can
regulate either output voltage or target system delay
(using an external L-C filter). In the voltage regulation
mode, the output voltage is monitored with an A/D converter,
and the feedback compensation network is
implemented digitally. The generation of the PWM signal
is done with a hybrid delay line/counter approach,
which saves power and area relative to previous implementations.
Power devices are included on chip to create
the two independently regulated output PWM
signals. The key features of this design are its low
power dissipation, reconfigurability, use of either delay
or voltage feedback, and multiple outputs.
-
T3.2 Voltage Scheduling Problem for Dynamically Variable Voltage Processors
[p. 197]
- Tohru Ishihara, Hiroto Yasuura
This paper presents a model of a dynamically variable voltage
processor and basic theorems for power-delay optimization.
A static voltage scheduling problem is also proposed and formulated
as an integer linear programming (ILP) problem.
In the problem, we assume that a core processor can vary its
supply voltage dynamically, but can use only a single voltage
level at a time. For a given application program and
a dynamically variable voltage processor, a voltage schedule
that minimizes energy consumption under an execution
time constraint can be found.
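The static scheduling problem can be pictured with a small exhaustive search, used here as a stand-in for the ILP formulation, which the abstract does not spell out. This is only a sketch under stated assumptions: the three (voltage, clock) operating points, the energy model (cycles times V squared), and the function name are all hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed (voltage in V, clock in MHz) operating points; illustrative only.
LEVELS = [(5.0, 50), (4.0, 40), (2.5, 25)]


def best_schedule(total_mcycles, deadline_s, step_mcycles):
    """Search cycle allocations (in millions of cycles) across the voltage levels.

    Only one voltage is active at a time, so execution time is the sum of the
    time spent at each level. Energy is modeled as cycles * V^2.
    Returns (energy, allocation), or None if no allocation meets the deadline.
    """
    choices = range(0, total_mcycles + 1, step_mcycles)
    best = None
    for alloc in product(choices, repeat=len(LEVELS)):
        if sum(alloc) != total_mcycles:
            continue
        # Exact execution time: Mcycles / MHz = seconds.
        time_s = sum(Fraction(n, f) for n, (_, f) in zip(alloc, LEVELS))
        if time_s > deadline_s:
            continue
        energy = sum(n * v * v for n, (v, _) in zip(alloc, LEVELS))
        if best is None or energy < best[0]:
            best = (energy, alloc)
    return best


energy, alloc = best_schedule(total_mcycles=1000, deadline_s=30, step_mcycles=100)
print(alloc, energy)
```

With these assumed numbers, running everything at 4.0 V would finish in 25 s with slack to spare; the search instead moves part of the work to 2.5 V until the 30 s deadline is nearly used up.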
-
T3.3 On the Optimum Design of Regulated Cascode Operational Transconductance
Amplifiers [p. 203]
- Thomas Burger, Qiuting Huang
An optimal design procedure to achieve minimum power consumption
for a given technology and gain bandwidth is presented. Regulated
cascode gain enhancement is used to ensure sufficient DC-gain
with minimum gate length transistors. To validate the approach,
five folded cascode OTAs have been implemented, spanning a bias
range of 1 uA to 10 mA, with measured unity-gain bandwidths within
20% of the designed value. For 17 mW at 3 V, a 0.5 um CMOS
OTA achieves 630 MHz with 51 degree phase margin. The method has
been applied in the design of a 3rd order Delta-Sigma modulator
for GSM receivers. The modulator consumes 2.8 mW at 3 V and has a dynamic
range of 86 dB for a 100 kHz input signal bandwidth.
Session Chair: George Stamoulis
Associate Chair: Sarma Vrudhula
-
T4.1 Low Power Logic Synthesis under a General Delay Model
[p. 209]
- Unni Narayanan, Peichen Pan, C. L. Liu
Till now most efforts in low power logic synthesis have concentrated
on minimizing the total switching activity of a circuit under a
zero delay model. This simplification ignores
the effects of glitch transitions which may contribute as much
as 30% of the total power consumption of a circuit. Hence,
low power logic synthesis techniques which optimize power
under a zero delay model are often not successful in attaining
"real" power savings as measured under a more accurate
general delay model. In practice, to accurately estimate the
switching activity in a circuit under a general delay model
can be computationally expensive. Hence, to repeatedly call
accurate but slow power estimation tools to direct the synthesis
flow is not a viable approach in the design of low power
synthesis tools. In this paper we take advantage of a fast
method for estimating the total switching activity in a circuit
under a general delay model to synthesize low power
circuits. Specifically, we use the approximation as a basis
for algorithms that solve two problems: (1) low power technology
decomposition of gates under a general delay model, and
(2) low power retiming of sequential circuits under a general
delay model.
-
T4.2 Local Transformation Techniques for Multi-Level Logic Circuits Utilizing
Circuit Symmetries for Power Reduction
[p. 215]
- Ki-Seok Chung, C. L. Liu
In this paper, we present several optimization techniques
for power reduction utilizing circuit symmetries.
There are four kinds of symmetries that we detect in
a given circuit implementation. First, we propose an
algorithm for detecting the four different types of symmetries
in a given circuit implementation of a Boolean
function. Several re-synthesis techniques utilizing such
symmetries are proposed. These techniques enable us to
optimize power consumption and delay with no (or very
little) area overhead. We have carried out experiments
on MCNC benchmark circuits to demonstrate the efficiency
of the proposed techniques. The average power
reduction is 14% with little or no area and/or delay
overhead.
-
T4.3 A Power Optimization Method Considering Glitch Reduction by Gate Sizing
[p. 221]
- Masanori Hashimoto, Hidetoshi Onodera, Keikichi Tamaru
We propose a power optimization method considering glitch reduction
by gate sizing. Our method reduces not only the amount of
capacitive and short-circuit power consumption but also the power
dissipated by glitches which has not been exploited previously. In
the optimization method, we improve the accuracy of the statistical
glitch estimation method and devise a gate sizing algorithm that
utilizes perturbations for escaping a bad local solution. The effect
of our method is verified experimentally using 12 benchmark circuits
with a 0.5 um standard cell library. Gate sizing reduces the
number of glitch transitions by 38.2% on average and by 63.4%
at maximum. This results in the reduction of total transitions by 12.8%
on average. When the circuits are optimized for power without
delay constraints, the power dissipation is reduced by 7.4% on
average and by 15.7% at maximum compared with the minimum-sized
circuits.
Session Chair: Suresh Rajgopal
Associate Chair: Chi-Ying Tsui
-
T5.1 A Unified Approach in the Analysis of Latches and Flip-Flops for
Low-Power Systems
[p. 227]
- Vladimir Stojanovic, Vojin Oklobdzija, Raminder Bajwa
In this paper we propose a set of rules for consistent estimation of
the real performance and power features of the latch and flip-flop
structures. A new simulation and optimization approach is presented,
targeting both high-performance and power budget issues. The analysis
approach reveals the sources of performance and power consumption bottlenecks
in different design styles. Certain misleading parameters have been
properly modified and weighted to reflect the real properties of the
compared structures. Furthermore, the results of the comparison of
representative latches and flip-flops illustrate the advantages of our
approach and the suitability of different design styles for low-power
and high-performance applications.
-
T5.2 Estimation of Maximum Power Supply Noise for Deep Sub-Micron Designs
[p. 233]
- Yi-Min Jiang, Kwang-Ting Cheng, An-Chang Deng
We propose a new technique for generating a small set
of patterns to estimate the maximum power supply noise of
deep sub-micron designs. We first build the charge/discharge
current and output voltage waveform libraries for
each cell, taking power and ground pin characteristics, the
power net RC and other input characteristics as parameters.
Based on the cells' current and voltage libraries, the
power supply noise of a 2-vector sequence can be estimated
efficiently by a cell-level waveform simulator. We then
apply the Genetic Algorithm based on the efficient waveform
simulator to generate a small set of patterns producing
high power supply noise. Finally, the results are
validated by simulating the obtained patterns using a transistor
level simulator. Our experimental results show that
the patterns generated by our approach produce a tight
lower bound on the maximum power supply noise.
-
T5.3 Estimation of Standby Leakage Power in CMOS Circuits Considering
Accurate Modeling of Transistor Stacks [p. 239]
- Zhanping Chen, Mark Johnson, Liqiong Wei, Kaushik Roy
Low supply voltage requires the device threshold to be reduced
in order to maintain performance. Due to the exponential
relationship between leakage current and threshold
voltage in the weak inversion region, leakage power can no
longer be ignored. In this paper we present a technique to
accurately estimate leakage power by accurately modeling
the leakage current in transistor stacks. The standby leakage
current model has been verified by HSPICE. We demonstrate
that the dependence of leakage power on primary input
combinations can be accounted for by this model. Based
on our analysis we can determine good bounds for leakage
power in the standby mode. As a by-product of this analysis,
we can also determine the set of input vectors which can
put the circuits in the low-power standby mode. Results on
a large number of benchmarks indicate that proper input selection
can reduce the standby leakage power by more than 50% for
some circuits.
-
T5.4 Separation and Extraction of Short-Circuit Power Consumption in Digital
CMOS VLSI Circuits [p. 245]
- Atila Alvandpour, Per Larsson-Edefors, Christer Svensson
In this paper, we present a new technique which
indirectly separates and extracts the total
short-circuit power consumption of digital
CMOS circuits. We avoid a direct encounter
with the complex behavior of the short-circuit
currents. Instead, we separate the dynamic
power consumption from the total power and
extract the total short-circuit power.
The technique is based on two facts: first, the
short-circuit power consumption disappears at
a Vdd close to VT and, secondly,
the total capacitance
depends on supply voltage in a sufficiently
weak way in standard CMOS circuits.
Hence, the total effective capacitance can be
estimated at a low Vdd.
To avoid reducing Vdd below the specified forbidden
level, a polynomial is used to estimate
the power versus supply voltage down to VT
based on a small voltage sweep over the allowed
supply voltage levels. The result shows good
accuracy for the short-circuit current ranges of
interest.
Keywords: Short-circuit current, Power consumption, Power estimation.
Session Chair: Naresh Shanbhag
Associate Chair: Mary Jane Irwin
-
T6.1 Decorrelating (DECOR) Transformations for Low-Power Adaptive Filters
[p. 250]
- Sumant Ramprasad, Naresh R. Shanbhag, Ibrahim N. Hajj
Presented in this paper are decorrelating transformations
(referred to as DECOR transformations) to
reduce the power dissipation in adaptive filters. The
coefficients generated by the weight update block in
an adaptive filter are passed through a decorrelating
block such that fewer bits are required to represent
the coefficients. Thus, the size of the arithmetic
units in the filter (F-block) is reduced thereby reducing
the power dissipation. The DECOR transform is well suited
for narrow-band filters because
there is significant correlation between adjacent coefficients.
In addition, the effectiveness of DECOR
transforms increases with increase in the order of
the filter and decrease in coefficient precision. Simulation
results indicate reduction in power dissipation in the
F-block ranging from 12% to 38% for filter bandwidths ranging
from 0.15fs to 0.025fs (where fs is the sample rate).
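The link between coefficient correlation and word length can be illustrated with plain first-order differencing, used here as a simplified stand-in for the DECOR transform itself, whose exact form is not given in the abstract. The coefficient set (a quantized raised-cosine window), the scale factor, and the helper names are assumptions made for the example.

```python
import math
from itertools import accumulate


def bits_needed(values):
    """Two's-complement word length able to hold every integer in values."""
    return max(max(v, -v - 1).bit_length() for v in values) + 1


def first_difference(coeffs):
    """Keep the first coefficient, then store each coefficient minus its predecessor."""
    return [coeffs[0]] + [b - a for a, b in zip(coeffs, coeffs[1:])]


# Assumed example: slowly varying (narrow-band, low-pass) coefficients as integers.
N = 32
coeffs = [round(1000 * (0.5 - 0.5 * math.cos(2 * math.pi * (k + 0.5) / N)))
          for k in range(N)]
diffs = first_difference(coeffs)

bits_original = bits_needed(coeffs)
bits_difference = bits_needed(diffs)
# A running sum recovers the original coefficients exactly.
restored = list(accumulate(diffs))
print(bits_original, bits_difference)
```

Because adjacent coefficients are close in value, the differences span a much smaller range than the coefficients, so they fit in a shorter word.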
-
T6.2 The Logarithmic Number System for Strength Reduction in Adaptive
Filtering [p. 256]
- John R. Sacha, Mary Jane Irwin
An important technique for reducing power consumption in
VLSI systems is strength reduction, the substitution of a
less-costly operation, such as a shift, for a more-costly operation
such as a multiplication. Using a logarithmic number
representation provides several opportunities for strength reductions;
in particular, multiplication is performed as the
fixed-point addition of logarithms, and extracting a square
root is implemented via a shift. These reductions occur
transparently at the hardware level; consequently relatively
little algorithmic modification is required, and they are readily
applicable to adaptive filtering. For performing Givens
rotations in the QR decomposition recursive least squares
adaptive filter, logarithmic arithmetic is shown to compare
favorably to other strength reduction techniques, such as
CORDIC arithmetic, in terms of switched capacitance and
numerical accuracy.
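The two strength reductions named above can be sketched in a few lines. This is a minimal illustration that assumes positive operands only (sign handling is omitted) and an arbitrarily chosen 12 fractional bits; the function names are hypothetical and do not come from the paper.

```python
import math

FRAC_BITS = 12  # assumed fixed-point precision of the stored logarithm


def to_lns(x):
    """Encode a positive real as a fixed-point base-2 logarithm."""
    return round(math.log2(x) * (1 << FRAC_BITS))


def from_lns(code):
    """Decode a fixed-point base-2 logarithm back to a real number."""
    return 2.0 ** (code / (1 << FRAC_BITS))


def lns_mul(a, b):
    """Multiplication becomes a fixed-point addition of logarithms."""
    return a + b


def lns_sqrt(a):
    """Square root becomes an arithmetic right shift by one bit."""
    return a >> 1


product = from_lns(lns_mul(to_lns(3.0), to_lns(7.0)))
root = from_lns(lns_sqrt(to_lns(2.0)))
print(product, root)
```

The results are close to 21 and to the square root of 2, with a small error set by the quantization of the logarithm.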
-
T6.3 Low-Power Architecture of the Soft-Output Viterbi Algorithm [p. 262]
- David Garrett, Mircea Stan
This paper investigates the low power implementation
issues of the soft-output Viterbi algorithm
(SOVA), a building block for turbo codes.
By briefly explaining the theory of turbo codes,
and by reviewing several of the decoding algorithms,
we develop the computational requirements
for a SOVA implementation, and
ultimately develop an architecture that completes
those computations with reduced power
consumption. The architecture builds on previous
work on the Viterbi and Soft-Output Viterbi
algorithms, and incorporates a novel
orthogonal access memory structure, which
provides parallel access across sequentially
received data.
Keywords: SOVA, turbo codes, VA, low power.
-
T6.4 Low Power Methodology and Design Techniques for Processor Design
[p. 268]
- J. Patrick Brennan, Alvar Dean, Stephen Kenyon, Sebastian Ventrone
IBM's ASIC design methodology is used to develop a low power microprocessor
for the mobile (battery powered) marketplace. The design called for a
reduction of active power by a factor of 10 from an estimate of
a product designed in a standard 3 volt ASIC design system. An overview
of the design methodology and some of the innovative power reduction
techniques are presented.
Session Chair: Chuck Traylor
-
W0.1 Power Distribution in High-Performance Design [p. 274]
- Michael Benoit, Sandy Taylor, David Overhauser, Steffen Rochel
Power distribution design in high-performance chips is a task that is not
eased through the application of power reduction techniques. Although
the average power of a high-performance design can be reduced, the peak
to average power current ratio of blocks increases as a result, aggravating
the challenges faced prior to average power reduction. This paper discusses
the power distribution design challenge: to reliably deliver a predictable
voltage to all transistors under all operating conditions. Steps in power
estimation, approaches to power distribution implementation, and verification
of power distribution are reviewed. The myths versus reality of power
distribution design in high-performance chips are provided.
Keywords: Power grid, Power distribution, IR drop
-
W0.2 Low-Power Miniaturized Information Display Systems [p. 279]
- Michael Bolotski, Philip Alvelda
This paper discusses low power issues in the design of miniature information
display devices built on silicon substrates.
Keywords: LCOS, microdisplay, power, field-sequential color
Session Chair: Bill Athas
Associate Chair: Dan Dobberpuhl
-
W1.1 Low-Power Embedded SRAM Macros with Current-Mode Read/Write Operations
[p. 282]
- Jinn-Shyan Wang, Po-Hui Yang, Wayne Tseng
The newly proposed SRAM performs both
read and write operations in the current-mode.
Due to the current-mode operations, voltage
swings at bit-lines and data-lines are kept very
small during read and write. The AC power
dissipation of bit-lines and data-lines can thus
be saved efficiently. For an embedded SRAM
macro used in an 8-bit microcontroller, the SRAM
using the fully current-mode technique
dissipates only 30% of the power of
the SRAM with only current-mode
read operation. Experimental results
show good agreement with the simulation
results and prove the feasibility of the new
technique.
-
W1.2 A Three-Port Adiabatic Register File Suitable for Embedded Applications
[p. 288]
- Stephen Avery, Marwan Jabri
Adiabatic logic promises extremely low power consumption
for those applications where slower clock rates are acceptable.
However, there have been very few adiabatic memory
designs, and any circuit of even moderate complexity requires
some form of RAM. This paper presents a register file
implemented entirely with adiabatic logic, and fabricated
using a 1.2 um CMOS technology. Comparison with a conventional
CMOS logic implementation, using both measured
and simulated results, indicates significant power savings
have been realised.
-
W1.3 A Low Power SRAM using Auto-Backgate-Controlled MT-CMOS [p. 293]
- Koji Nii, Hiroshi Makino, Yoshiki Tujihashi, Chikayoshi Morishima,
Yasushi Hayakawa, Hiroyuki Nunogami, Takahiko Arakawa, Hisanori Hamano
We have proposed a low power SRAM using an effective method called
"ABC-MT-CMOS" [1]. It controls the backgates to reduce the leakage current
when the SRAM is not activated (sleep mode) while retaining the data stored
in the memory cells. We also adopted a "CSB Scheme" which clamps both
the source lines of the memory cell array and the bit lines. We designed
and fabricated test chips containing a 32K-bit gate array SRAM. The
experimental results show that the leakage current is reduced to 1/1000
in sleep mode. The active power is 0.27 mW/MHz at 1 V, which is
1/12 of that of a conventional SRAM operating at 3.3 V.
Session Chair: Anand Raghunathan
Associate Chair: Joerg Henkel
-
W2.1 Fast High-Level Power Estimation for Control-Flow Intensive Designs
[p. 299]
- Kamal S. Khouri, Ganesh Lakshminarayana, Niraj K. Jha
In this paper, we present a power estimation technique for
control-flow intensive designs that is tailored towards driving
iterative high-level synthesis systems, where hundreds of architectural
trade-offs are explored and compared. Our method is
fast and relatively accurate. The algorithm utilizes the behavioral
information to extract branch probabilities, and uses these
in conjunction with switching activity and circuit capacitance
information, to estimate the power consumption of a given architecture.
We test our algorithm using a series of experiments, each
geared towards measuring a different indicator. The first set
of experiments measures the algorithm's accuracy when compared
to the actual circuit power. The second set of experiments
measures the average tracking index, and tracking index
fidelity for a series of architectures. This index measures
how well the algorithm makes decisions when comparing the
relative power consumption of two architectures contending as
low-power candidates.
Results indicate that our algorithm achieved an average estimation
error of 11.8% and an average tracking index of 0.95
over all examples.
-
W2.2 The Energy Complexity of Register Files [p. 305]
- Victor Zyuban, P. Kogge
Register files (RF) represent a substantial portion of the energy
budget in modern processors, and are growing rapidly with the
trend towards wider instruction issue. The actual access energy
costs depend greatly on the register file circuitry used. This paper
compares various RF circuitry techniques for their energy efficiencies,
as a function of architectural parameters such as the
number of registers and the number of ports. The Port Priority Selection
technique was found to be the most energy efficient. The
dependence of register file access energy upon technology scaling
is also studied. However, as this paper shows, it appears that none
of these will be enough to prevent centralized register files from
becoming the dominant power component of next-generation superscalar
computers, and alternative methods for inter-instruction
communication need to be developed. Split register file architecture
is analyzed as a possible alternative.
-
W2.3 Power Exploration for Dynamic Data Types Through Virtual Memory
Management Refinement [p. 311]
- Julio L. da Silva Jr., Francky Catthoor, Diederik Verkest, Hugo De Man
In this paper we present our novel power exploration methodology
for applications with dynamic data types. Our methodology is crucial
to obtain effective solutions in an embedded (HW or SW) processor
context. The contributions are twofold. First we define the complete
search space for Virtual Memory Management (VMM) mechanisms in a
structured way with orthogonal decision trees. Secondly we present
our systematic methodology for exploration of the maximal
power that takes into account characteristics of the application
to heavily prune the search space guiding the choices of
a VMM mechanism. Finally we demonstrate for two industrial
examples that power can vary considerably depending
on the VMM chosen. Moreover these experiments show the
effectiveness of our exploration methodology.
|