ABSTRACTS ASPDAC 97
Microelectronics Evolution Brings Real Multimedia Era
- Tatsuo Izawa, NTT Science and Core Technology Laboratory Group, Japan
Attractive services and contents will be realized by the development of software
technology, and the reasonable price will be realized by the further
development of microelectronics hardware technology. For example, higher-speed
and lower-power technologies increase the number of signals that can be
transmitted simultaneously over a single channel, and as a result the service
cost decreases.
-
1A.1 A Co-evaluation of FPGA Architectures and the CAD System for
Telecommunication
- Tsunemasa Hayashi, Atsushi Takahara, Ken-nosuke Fukami
We propose an FPGA architecture for next generation B-ISDN telecommunications
systems. Such a system requires an FPGA that can implement a circuit of over
10K gates and that has a clock rate of 80 MHz. While FPGA architectures have
usually been discussed in terms of circuit structure alone, we consider
the circuit structure of the FPGA together with its CAD tools. We evaluate several FPGA
logic-element structures with a technology mapping method. From our experiments,
the multiplexer-based logic element is found to be suitable for implementing
such a high-speed circuit using the BDD-based technology mapping method.
-
1A.2 A Rapid Prototyping Method for Top-Down Design of System-on-Chip
Devices Using LPGAs
- Fumio Suzuki, Katsuhiko Seo, Hisao Koizumi, Masanobu Hiramine, Hiroto
Yasuura, Kazuo Okino, Zvi Or-Bach
This paper proposes two methods for a rapid prototyping of top-down
System-On-Chip(SOC) design using laser programmable gate arrays (LPGAs).
The first one is a design flow of SOC consisting of four steps:
concept-making, virtual-world prototyping, synthesis, and real-world
prototyping. The steps can be undertaken individually or in tandem and
provisional product models are transformed from upper stream (concept-making)
to lower stream (synthesis) either automatically or semi-automatically.
This method differs from ordinary rapid prototyping methods in that design
evaluation is shifted more upper stream. The SOC device is manufactured
early; The steps follow concept-making, virtual-world prototyping, synthesis
and real-world prototyping to synthesis). This method allows the device to
be evaluated in the actual operating environment. The second method we propose
is based on LPGAs and use of a real-time production fabrication system (FPFs).
With these design methods and environment, we can achieve the shortest time to
market for products offering exciting audio and video capabilities, while giving
designers the flexibility they need to rapidly produce innovative and
creative products. This paper describes the application of these
methods to develop video signal processors for LCD projectors, demonstrating
their efficiency for design tuning and performance optimization.
-
1A.3 Performance Test of Viterbi Decoder for Wideband CDMA System
- Jang-Hyun Park, Yeo-Chul Rho
This paper describes the design, the implementation, and the performance
test of the Serial Viterbi decoder (SVD) using VHDL and FPGAs. The
decoding scheme assumes the transmitted symbols were coded with
a K=9, 32 Kbps, rate 1/2 convolutional encoder with generator functions
g0 = 753 (octal) and g1 = 561 (octal), as defined in the JTC TAG-7 W-CDMA PC8
standard. The main algorithm, excluding the memories, is implemented in two
Altera FLEX81500 FPGAs. Performance test results with 3 dB Gaussian noise show
that the SVD functions correctly.
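The encoder side assumed by the decoder can be sketched as follows. This is a minimal illustration of a K=9, rate 1/2 convolutional encoder with the two octal generators quoted above; the bit-ordering convention (newest bit entering at the most significant register position) and the names `conv_encode` and `parity` are assumptions for illustration, not taken from the paper or the standard.

```python
# Generator polynomials quoted in the abstract, written in octal.
K = 9        # constraint length
G0 = 0o753   # binary 111101011
G1 = 0o561   # binary 101110001


def parity(x: int) -> int:
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits):
    """Rate 1/2 encoding: two output symbols per input bit.

    Assumed convention: the newest input bit enters at the most
    significant position of a K-bit shift register.
    """
    reg = 0
    out = []
    for b in bits:
        reg = (reg >> 1) | ((b & 1) << (K - 1))
        out.append((parity(reg & G0), parity(reg & G1)))
    return out


# Impulse response: a single 1 followed by K-1 zeros.
impulse = conv_encode([1] + [0] * (K - 1))
```

Under this convention the impulse response reads out the taps of each generator, most significant bit first, which is a quick sanity check on the tap masks.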
-
1B.1 Delay Estimation and Optimization of Logic Circuits: A Survey
- Masahiro Fujita, Rajeev Murgai
Logic synthesis has two stages of optimization: technology-independent and
technology-dependent. This paper surveys state-of-the-art methods for
estimation and optimization of delays of logic circuits at the
technology-independent stage. Although at this stage we cannot completely
predict final delays after technology mapping, there exist reasonably accurate
estimation techniques. Final delays can be reduced with optimization
techniques that use such estimation.
-
1B.2 Delay Estimation for Technology Independent Synthesis
- Yutaka Tamiya
This paper proposes "path mapping", a method of delay estimation
for technology independent combinational circuits. Path mapping provides
fast and accurate delay estimation using ideas shared with tree-covering
based technology mapping. First, path mapping performs minimum-delay technology
mapping for all paths in the circuit. Then, it finds the most
critical path among all the paths in the circuit. Finally, it reports that path's
delay as the circuit delay. Experimental results show that path mapping estimates
circuit delay more accurately than the unit delay model, and runs much faster than the
technology mapper.
-
1B.3 Performance and Reliability Driven Clock Scheduling of Sequential Logic
Circuits
- Atsushi Takahashi, Yoji Kajitani
It is known that the clock-period in a sequential circuit can be shorter than
the maximum signal delay between registers if the clock arrival time to each
register is controlled. We propose an algorithm to find the minimum clock-period
of a circuit whose signal propagation delays are given. Experimental results
on LGSynth93 benchmarks show that this technique reduces the clock-period by
up to about 16% compared with the conventional maximum signal
delay based methods. An application of this technique to improve the
reliability of circuits is considered.
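The idea that controlled clock arrival times can shorten the period can be sketched with the standard difference-constraint formulation of clock scheduling. This is not necessarily the authors' algorithm: the setup and hold constraints, the binary search, and the names `feasible` and `min_clock_period` are assumptions made for illustration, as are the delay values in the example.

```python
def feasible(n, edges, period):
    """Check whether clock arrival times s[0..n-1] exist for this period.

    edges: list of (src, dst, d_min, d_max) register-to-register paths.
    Setup: s[src] + d_max <= s[dst] + period
    Hold:  s[src] + d_min >= s[dst]
    Both are difference constraints, checked with Bellman-Ford.
    """
    # A constraint x_b - x_a <= w becomes a graph edge a -> b of weight w.
    cons = []
    for u, v, d_min, d_max in edges:
        cons.append((v, u, period - d_max))  # s[u] - s[v] <= period - d_max
        cons.append((u, v, d_min))           # s[v] - s[u] <= d_min
    dist = [0.0] * n  # implicit virtual source linked to every register
    for _ in range(n):
        changed = False
        for a, b, w in cons:
            if dist[a] + w < dist[b] - 1e-12:
                dist[b] = dist[a] + w
                changed = True
        if not changed:
            return True   # all constraints satisfied
    return False          # still relaxing after n passes: negative cycle


def min_clock_period(n, edges, tol=1e-6):
    """Binary search for the smallest feasible period."""
    lo = 0.0
    hi = max(d_max for _, _, _, d_max in edges)  # zero skew always works here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(n, edges, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Two registers in a loop: 0 -> 1 takes 10 time units, 1 -> 0 takes 2.
example = [(0, 1, 10, 10), (1, 0, 2, 2)]
period = min_clock_period(2, example)
```

In this hypothetical loop the maximum register-to-register delay is 10, but delaying the clock of register 1 lets the period approach the average delay around the loop, which is 6.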
-
1C.1 CBLO: A Clustering Based Linear Ordering Algorithm for Netlist
Partitioning
- K.S. Seong, C.-M. Yung
In this paper, we propose a clustering based linear ordering algorithm which
consists of global ordering and local ordering. In the global ordering, the
algorithm forms clusters from n given vertices and orders the clusters. In the
local ordering, the elements in each cluster are linearly ordered. The linear
order thus produced is used to obtain an optimal k-way partitioning based on
the scaled cost objective function. Experiments with 11 benchmark circuits for
k-way (2 <= k <= 10) partitioning show that the proposed algorithm yields an
average of 10.6% improvement over MELO for the k-way scaled cost partitioning.
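How a linear order can be turned into a k-way partitioning may be sketched with a dynamic program over split points. The sketch assumes the scaled cost definition commonly used in the partitioning literature, (1 / (n (k - 1))) * sum over clusters of (cut edges of the cluster / cluster size), and a plain graph with two-pin edges; the abstract does not spell out either, and `best_split` is a hypothetical name.

```python
def best_split(order, edges, k):
    """Split a linear order into k contiguous clusters of minimum scaled cost.

    order: list of vertices in linear order; edges: list of (u, v) pairs.
    Requires k >= 2. Returns (scaled_cost, clusters).
    """
    n = len(order)
    pos = {v: i for i, v in enumerate(order)}
    spans = [(pos[u], pos[v]) for u, v in edges]

    def cut(i, j):
        # Number of edges with exactly one endpoint in positions i..j-1.
        return sum((i <= a < j) != (i <= b < j) for a, b in spans)

    inf = float("inf")
    # best[j][m]: minimum sum of cut/size over m clusters covering order[:j].
    best = [[inf] * (k + 1) for _ in range(n + 1)]
    back = [[-1] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                if best[i][m - 1] == inf:
                    continue
                cost = best[i][m - 1] + cut(i, j) / (j - i)
                if cost < best[j][m]:
                    best[j][m] = cost
                    back[j][m] = i

    # Walk the back pointers to recover the clusters.
    clusters = []
    j = n
    for m in range(k, 0, -1):
        i = back[j][m]
        clusters.append(order[i:j])
        j = i
    clusters.reverse()
    return best[n][k] / (n * (k - 1)), clusters


# Two triangles joined by a single edge, already in a good linear order.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cost, parts = best_split([0, 1, 2, 3, 4, 5], triangles, 2)
```

Because the objective is a sum of per-cluster terms, the best split of a fixed order can be found exactly; the quality of the final partitioning then depends on the order itself.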
-
1C.2 Design Driven Partitioning
- Dirk Behrens, Robert Tolkiehn, Erich Barke
A new approach for partitioning VLSI digital integrated circuits is presented.
In contrast to known approaches, which use only topological information, the
presented method also exploits specific information about design modules and
higher level design structure. Based on this knowledge, the design driven
procedure creates a cluster structure that incorporates the inherent design
relationships (e.g. signal flow, logic blocks) in the best way possible.
When followed by standard iterative improvement algorithms, it produces partitions
that outperform many previously published partitioning approaches. Because of its
linear time complexity the presented clustering strategy is able to handle very
large designs. Due to its modular structure it can be easily extended to
incorporate special design features or target architectures such as emulation
systems.
-
1C.3 A RTL Partitioning Method with a Fast Min-Cut Improvement Algorithm
- Kenichi Kawaguchi, Chie Iwasaki, Michiaki Muraoka
A design flow with register-transfer-level (RTL) partitioning and an RTL
partitioning algorithm for efficient logic synthesis and layout are described
in this paper. By changing the parameter of partitioning optimization dynamically,
the algorithm improves the interconnection cost in a short CPU time.
Experimental results on large circuits show that the algorithm partitions
circuits with a large number of RTL components in a tenth to a hundredth of
the time of conventional partitioning.
-
1C.4 Acceleration of Mincut Partitioning using Hardware CAD Accelerator
TP5000
- Masahiro Sano, Shintaro Shimogori, Fumiyasu Hirose
This paper presents a new approach of data pipelining for mincut partitioning
acceleration using a parallel computer. We choose the hardware CAD accelerator
TP5000 to implement our approach. Using 10 processors in the TP5000, we obtain a
speedup of 20 to 25 times over a SPARCStation-10.
-
Computing Brokerage and Its Applications in VLSI Design
- Youn-Long Lin
With Internet access available to virtually everyone in this community, it is
interesting to investigate how the Internet will affect the future of VLSI
design and CAD. We will describe an experimental WWW-based computing broker.
Theoretically, the broker is capable of providing every user with access to
any hardware platform and any software over the Internet. It makes possible
pay-per-use of both hardware and software resources. It also automatically
manages multiple resources ranging from a few seats within an organization to
thousands of seats anywhere with Internet access. This new model of
resource usage will have a significant impact on users, software
developers, and computer vendors. Users no longer have to own or maintain
expensive computers and software tools before they can start their projects.
They will have more flexibility in allocating resources to meet the project
schedule. They will also be able to access the latest technology at lower
overall cost. Tool developers and computer vendors will have a broader customer
base with very little marketing and field support effort. This new model will
also provide a better chance for new tools and new platforms.
Key Words: Computing Brokerage; WWW; Internet; CAD; VLSI Design; Pay-per-Use
-
2A.1 A Programmable Application-Specific VLSI Architecture and Implementation
for Speech Word-Recognizer
- An-Nan Suen, Jhing-Fa Wang, Tswen-Duh Wang
In this paper, an efficient and flexible VLSI architecture and implementation
for a voice word-recognizer processor are presented. In order to achieve a
flexible and efficient VLSI realization, we use a programmable-with-specific-core
design strategy, which incorporates the best aspects of both programmable
and application-specific signal processors to achieve high speed, high accuracy,
and efficient hardware realization for the word-recognizer. The
single chip is fabricated in 0.8 um double-metal CMOS technology after
physical design and circuit verification. The chip can process 40 MHz sampled
data and contains about 70000 transistors, which occupy an area of
0.62 x 0.60 cm^2.
-
2A.2 A High Performance FIR Filter Dedicated to Digital Video Transmission
- Shun Morikawa, Keisuke Okada, Isao Shirakawa, Sumitaka Takeuchi
A digital filter is one of the fundamental elements in digital video
transmission, and the multiplier is the key factor that determines the
operation speed and silicon area of the filter. Even though the coefficients of
the filter should be programmable, it is possible to change the coefficients
in the vertical fly-back interval of television receivers. This allows the
coefficients to be preloaded into the filter, so that each coefficient can be
treated as a constant during the filtering operation. Motivated by this
property, a novel multiplier together with an FIR filter architecture is
described, which has been designed in a 0.5 um double-metal CMOS
technology.
-
2A.3 An Efficient Hierarchical Clustering Method for the Multiple Constant
Multiplication Problem
- Akihiro Matsuura, Mitsuteru Yukishita, Akira Nagoya
In this paper, we propose an efficient solution for the Multiple Constant
Multiplication (MCM) problem. The method exploits common subexpressions among
constants based on hierarchical clustering and reduces the number of shifts,
additions, and subtractions. The algorithm defines appropriate weights which
indicate the operation priorities and selects the common subexpressions that
result in the least number of local operations. It can also be extended to
various high-level synthesis tasks such as arbitrary linear transforms.
Experimental results show the effectiveness of our method.
-
2A.4 Structural Approach for Performance Driven ECC Circuit Synthesis
- Chau-Chin Su, Kathy Y. Chen, Shyh-Jye Jou
ECCGen is a logic synthesizer for error control coding circuits. It takes H
matrices as inputs and produces circuit schematics in two steps, literal
minimization and gate/pin assignment. Unlike conventional logic
synthesis tools, it takes a structural approach to avoid the combinatorial
explosion problem in Boolean function and/or truth table representations of ECC
circuits. Moreover, the structural approach also reduces the complexity of
timing and area optimization significantly when multiple-input exclusive-or
gates are used. The test results show that ECCGen achieves a reduction of 57%
in transistor count and 15% in delay time on thirteen industrial ECC
circuits.
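Why an H matrix maps naturally onto exclusive-or structures can be sketched as follows: each syndrome bit is the XOR of the received bits selected by one row of H. The (7,4) Hamming code used here is only an illustration and is not taken from the paper; `syndrome` is a hypothetical helper name.

```python
# Parity-check matrix of a (7,4) Hamming code; column i (1-based) is the
# binary representation of i. Chosen for illustration only.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(h_matrix, word):
    """Each syndrome bit is a multiple-input XOR over one row of H."""
    result = []
    for row in h_matrix:
        s = 0
        for h, w in zip(row, word):
            s ^= h & w
        result.append(s)
    return result


codeword = [1, 1, 1, 0, 0, 0, 0]   # a valid codeword for this H
corrupted = list(codeword)
corrupted[4] ^= 1                  # flip the fifth bit
```

A valid codeword gives an all-zero syndrome, and a single-bit error gives the corresponding column of H, so each row of H is directly one XOR tree in a circuit.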
-
2B.1 Statistical Estimation of Combinational and Sequential
CMOS Digital Circuit Activity Considering Uncertainty of Gate Delay
- Tan-Li Chou, Kaushik Roy
While estimating glitches or spurious transitions is challenging due to signal
correlations, the random behavior of logic gate delays makes the estimation
problem even more difficult. In this paper, we present statistical estimation
of signal activity at the internal and output nodes of combinational and
sequential CMOS logic circuits considering uncertainty of gate delays. The
methodology is based on the stochastic models of logic signals and the
probabilistic behavior of gate delays due to process variations, interconnect
parasitics, etc. We propose a statistical technique of estimating average-case
activity, which is flexible in adopting different delay models and variations.
Experimental results show that the uncertainty of gate delays has a great
impact on activity at individual nodes (more than 100%) and on total power
dissipation (which can be overestimated by up to 65%).
-
2B.2 An Entropy Measure for Power Estimation of Boolean Functions
- Chi-Hong Hwang, Allen Chung-Hao Wu
In this paper, we present a study on the relationship between entropy and the
average power consumption of circuits generated from Boolean functions. Based
on a general-delay model, an entropy-based formulation for power estimation is
derived from a large set of experimental data. The study shows that the entropy
measure provides an effective power estimate for single-output and
fully-correlated multiple-output functions. The study also shows that if
entropy is used as a power measure, the internal structure of a circuit must be
considered in order to achieve accurate power estimates for non-correlated
multiple-output functions. Experiments on a set of benchmarks demonstrate that
combining entropy-based power measures with input-output correlation analyses
of logic functions leads to a viable measure for high-level power estimation.
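As a rough illustration of the quantity involved, the output entropy of a single-output Boolean function can be computed from its truth table. The sketch assumes independent, equally likely inputs and does not reproduce the paper's power formulation, which is derived from experimental data; `output_entropy` is a hypothetical name.

```python
from itertools import product
from math import log2


def output_entropy(func, num_inputs):
    """Entropy (in bits) of a single-output Boolean function's output,
    assuming all input combinations are equally likely."""
    ones = sum(func(*bits) for bits in product((0, 1), repeat=num_inputs))
    p = ones / 2 ** num_inputs
    if p in (0.0, 1.0):
        return 0.0  # a constant output carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)


def xor2(a, b):
    return a ^ b


def and2(a, b):
    return a & b


h_xor = output_entropy(xor2, 2)  # output is 1 half of the time
h_and = output_entropy(and2, 2)  # output is 1 a quarter of the time
```

A balanced output such as XOR reaches the maximum of one bit, while a skewed output such as AND has lower entropy.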
-
2B.3 An Enhanced Iterative Improvement Method for Evaluating the Maximum
Number of Simultaneous Switching Gates for Combinational Circuits
- Kai Zhang, Haruhiko Takase, Terumine Hayashi, Hidehiko Kita
This paper presents an enhanced iterative improvement method with
multiple pins (EIIMP) to evaluate the maximum number of simultaneous
switching gates. Although the iterative improvement method is a simple
algorithm, it is powerful for this purpose. Keeping this advantage, we
enhance it in two respects. The first is to change the values of multiple
successive primary inputs at a time. The second is to rearrange the
primary inputs on the basis of closeness, which represents the number of
overlapping gates between fan-out regions. Experiments on ISCAS benchmark
circuits show that our method is effective.
-
2B.4 A Power Driven Two-Level Logic Optimizer
- Jyh-Mou Tseng, Jing-Yang Jou
In this paper we present Boolean techniques for reducing the power consumption
in two-level combinational circuits. The two-level logic optimizer performs
the logic minimization for low power targeting static PLA, general logic gates
and dynamic PLA implementations. We modify the Espresso algorithm by adding
heuristics that bias the logic minimization toward lowering the power
dissipation. In our heuristics, signal probabilities and transition densities
are two important parameters. The experimental results are promising.
-
2B.5 A Note on the Relationship Between Signal Probability and Switching
Activity
- Massoud Pedram, Qing Wu, Xunwei Wu
In current probability calculation algorithms for power estimation, switching
activity ESW of a node is calculated from its signal probability p by the
following simple relation: ESW = 2p(1-p). It is generally understood that this
simple relationship holds under the temporal independence assumption for the
node. This paper however shows that the above equation also gives the expected
value of the transition activity in any sequence that satisfies the given
signal probability (averaged over all such sequences). Therefore, this equation
can be used to calculate the switching activity under more general conditions
than previously thought.
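The relation ESW = 2p(1-p) can be compared against a brute-force average over sequences. The sketch below uses one plausible reading of "averaged over all such sequences": all binary sequences of length n with exactly k ones, so that p = k/n. Under this reading the per-step average for finite n equals 2p(1-p) multiplied by n/(n-1), which approaches 2p(1-p) as n grows. The abstract does not state the paper's exact definition, and the function names are hypothetical.

```python
from itertools import combinations


def switching_activity(p):
    """ESW = 2p(1-p), the relation quoted in the abstract."""
    return 2.0 * p * (1.0 - p)


def average_transitions_per_step(n, k):
    """Average number of transitions per adjacent pair, taken over all
    length-n binary sequences that contain exactly k ones."""
    total = 0
    count = 0
    for ones in combinations(range(n), k):
        seq = [0] * n
        for i in ones:
            seq[i] = 1
        total += sum(seq[i] != seq[i + 1] for i in range(n - 1))
        count += 1
    return total / (count * (n - 1))


n, k = 12, 6
enumerated = average_transitions_per_step(n, k)
predicted = switching_activity(k / n) * n / (n - 1)
```

No temporal independence is assumed in the enumeration; only the fraction of ones in each sequence is fixed.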
-
2C.1 Modeling and Layout Optimization of VLSI Devices and Interconnects in
Deep Submicron Design
- Jason Cong
This paper presents an overview of recent advances on modeling and layout
optimization of devices and interconnects for high-performance VLSI circuit
design under the deep submicron technology. First, we review a number of
interconnect and driver/gate delay models, which are most useful to guide the
layout optimization. Then, we summarize the available performance optimization
techniques for VLSI device and interconnect layout, including driver and
transistor sizing, transistor ordering, interconnect topology optimization,
optimal wire sizing, optimal buffer placement, and simultaneous topology
construction, buffer insertion, buffer and wire sizing. The efficiency and
impact of these techniques will be discussed in the tutorial.
-
2C.2 A New Layout-Driven Timing Model for Incremental Layout Optimization
- Fang-Jou Liu, John Lillis, Chung-Kuan Cheng
In this paper we present a new layout-driven timing model based on Asymptotic
Waveform Evaluation (AWE) for improved timing analysis during routing. Our
model enables the bottom-up computation of interconnect tree moments, and can
be easily integrated with a global router. Such an integration achieves
incremental layout optimization, i.e., timing analysis and routing are tightly
coupled, with feedback between them. Achieving this incremental layout
optimization through our timing model is the main contribution of
this work.
-
2C.3 Par-POPINS: A Timing-Driven Parallel Placement Method with the Elmore
Delay Model for Row Based VLSIs
- Tetsushi Koide, Mitsuhiro Ono, Shin'ichi Wakabayashi, Yutaka Nishimaru
In this paper, we present a parallel algorithm running on a shared memory
multi-processor workstation for timing driven standard cell layout. The
proposed algorithm is based on POPINS2.0 [13] and consists of three phases.
First, we get an initial placement by a hierarchical timing-driven mincut
placement algorithm. At the top level of partitioning hierarchy, we perform one
step of bi-partitioning by several processors, and in the lower levels of
partitioning hierarchy, partitionings of each region in a level are performed
in parallel. Next, in phase 2, iterative improvement of the sub-circuit which
contains critical paths is performed by nonlinear programming. Parallel
processing is realized by applying the nonlinear programming method to each
sub-circuit in parallel. Finally, in phase 3, the placement is transformed to
a row based layout style by a timing-driven row assignment method. We have
implemented the proposed method on a 4-CPU multi-processor workstation, and
experimental results show that the proposed method is promising.
-
Java(TM) in Electronic Design Automation
- Pete Denyer, Jean Brouwers
Increasing design complexity and the need for multi-disciplinary /
multi-national design collaboration are causing a paradigm shift in the EDA
application environment. This shift is necessary in order that time-to-profit
goals are met in increasingly compressed market windows. The envisioned
paradigm shift is enabled through Sun's Java(TM) technology. This technology
will impact significantly the development, deployment, use and support of
Electronic Design Automation (EDA) applications. This paper will examine some
of the factors influencing this forthcoming EDA revolution and review some of
the challenges yet to be resolved.
-
3A.1 Polling-based Real-time Software for MPEG2 System Protocol LSIs
- Jiro Naganuma, Makoto Endo
This paper proposes polling-based real-time software for MPEG2 System protocol
LSIs, which are typical embedded real-time systems on a chip, and
demonstrates its performance and usefulness. The polling-based real-time
software is designed and optimized by analyzing application specific function
requirements and deciding scheduling intervals and the execution cycles of
each task. It requires neither hardware for multiple interrupt handling nor
software for heavy context switching. The polling-based approach provides
sufficient performance without any hardware and software overhead for a
real-time application like the MPEG2 System protocol.
-
3A.2 Synthesis and Analysis of an Industrial Embedded Microcontroller
- Ing-Jer Huang, Li-Rong Wang, Yu-Min Wang
This paper presents a case study of synthesis and analysis of the industrial
embedded microcontroller HT48100, using the hardware/software co-synthesis tool
(PIPER-II) for microcontrollers/microprocessors. The synthesis tool accepts as
input the instruction set architecture (behavioral) specification, and produces
as outputs the pipelined RTL designs with their simulators, and the reordering
constraints which guide the assembler in generating code for the synthesized
designs. The study shows that the synthesis approach helped the
original design team to evaluate their design quality, analyze the
architectural properties and explore possible architectural improvements and
their impacts on both hardware and software. A feasible future upgrade for the
microcontroller family is identified by the study. Further cooperation with the
design team has been undertaken to integrate the synthesis methodology into
their design flow.
-
3A.3 ASAver.1: An FPGA-Based Education Board for Computer Architecture/System
Design
- Hiroyuki Ochi
This paper proposes a new approach that makes it possible for every
undergraduate student to perform experiments in developing a pipelined RISC
processor within the limited time available for the course. The approach consists
of 4 steps: in the first step, modeling of the pipelined RISC processor is
simplified by avoiding structural hazards and by ignoring other hazards, and in
the succeeding steps, students learn the difficulties of pipelining by themselves.
An educational FPGA board, ASAver.1, and the results of a feasibility study are also
shown.
-
3B.1 Property Verification in the Design of Telecom Applications
- M. Bombana, P. Cavalloro, F. Ferrandi
Industrial interest in applying formal methods to the design of
complex ASICs is noteworthy, with the aims of improving the efficiency of the design process
(reduced time-to-market) and increasing the quality of the final products
(increased competitive profile). In this paper we focus our attention on design
capture and functional verification, two critical phases in the current design
methodologies. A modular toolset built around a model checker is described. A
telecom co-processor is presented, and general properties derived. A
user-oriented taxonomy of properties is introduced to support the design
practice. Guidelines for the application of this technique are inferred from
the example and generalized.
-
3B.2 Verification Methodology of Compatible Microprocessors
- Joon-Seo Yim, Chang-Jae Park, Woo-Seung Yang, Hun-Seung Oh,
Hee-Choul Lee, Hoon Choi, Tae-Hoon Kim, Seung-Jong Lee, Nara Won, Yung-Hee Lee,
In-Cheol Park, Chong-Min Kyung
As the complexity of high-performance microprocessors increases, functional
verification becomes more difficult and emerges as the bottleneck of the design
cycle. In this paper, we suggest a functional verification methodology,
especially for compatible microprocessor design. To guarantee perfect
compatibility with previous microprocessors, we developed three C models at
different representation levels, i.e., Polaris, MCV (Micro-Code Verifier) and
StreC. The C models are co-simulated with consistency checking between two
different models. The simulation speed of the C models makes it possible to test
"real-world" application programs on the RTL design with a software board model.
To increase the confidence level of verification, Profiler reports the
verification coverage of the test vectors, which is fed back to the automatic
test program generator. A restartability feature also helps to significantly reduce
the total simulation time. Using the proposed verification methodology, we
successfully designed and verified an Intel 486-compatible microprocessor.
-
3B.3 RTL Verification of Timed Asynchronous and Heterogeneous Systems
using Symbolic Model Checking
- Peter A. Beerel, Vida Vakilotojar
This paper describes a tool-supported methodology for the
register-transfer-level formal verification of a growing hardware design
paradigm: timed asynchronous systems. These systems are networks of
communicating asynchronous and synchronous components and have correctness
constraints that depend on specified bounded delays. This paper formalizes the
verification problem and demonstrates how time-discretization, abstraction,
and non-determinism can lead to a system model comprised of communicating
finite state machines composed synchronously. The paper then describes a
translator that accepts a structural VHDL system description along with controller
specifications and generates the input to a symbolic model checker (SMV).
Finally, we describe two case studies in which concurrent verification and
design led to the correction of many errors not easily found using simulation.
-
3C.1 CB-Power: A Hierarchical Cell-Based Power Characterization and Estimation
Environment for Static CMOS Circuits
- Wen-Zen Shen, Jiing-Yuan Lin, Jyh-Ming Lu
In this paper, we present CB-Power, a hierarchical cell-based power
characterization and estimation environment for static CMOS circuits. The
environment is based on a cell characterization system for timing, power and
input capacitance and on a cell-based power estimator. The characterization
system can characterize basic, complex and transmission gates. During the
characterization, input slew rate, output loading, capacitive feedthrough
effect and the logic state dependence of nodes in a cell are all taken into
account. The characterization methodology separates the power consumption of a
cell into three components: capacitive feedthrough power, short-circuit
power, and dynamic power. With the characterization data, a cell-based power
estimator (CBPE) embedded in Verilog-XL is used for estimating the power
consumption of a circuit. CB-Power is also a hierarchical power estimator.
Macrocells such as flip-flops and adders are partitioned into primitive gates
during power estimation. Experimental results on a set of MCNC benchmark
circuits show that CB-Power is within 6% of SPICE simulation on
average, while consuming more than two orders of magnitude less CPU time.
-
3C.2 Power Consumption in CMOS Combinational Logic Blocks at High
Frequencies
- Sri Parameswaran, Hui Guo
A new model for estimating dynamic power dissipation in CMOS combinational
circuits at differing voltages is presented in this paper. The proposed model
deals with power dissipation of circuits at saturation frequencies, where the
output voltage does not reach 100% of the supply voltage and the output voltage
waveform is almost a triangular waveform. In this paper we show that the
dynamic power consumption at saturation frequencies is only dependent on the
supply voltage, and is independent of load capacitance and switching speed.
This model shows that when a circuit is working in the saturation frequency
range, as the frequency is increased, the performance/power ratio is increased.
However, this increase in performance/power ratio is at the expense of noise
margin. The model is theoretically and empirically shown to be correct. This
model can be used to design a system where the differing combinational logic
blocks are supplied with differing voltages. Such a system would consume less
power than if the system were supplied by a single voltage rail.
-
3C.3 A New Approach for an AHDL Based on System Semantics
- Youcef Bourai, Nouma Izeboudjen, Yacine Bouhabel, Amine Tafat
A new approach for an Analog Hardware Design Language (AHDL) is presented.
It is based on the system semantics principle. This principle makes it possible to
define a language that provides a unified syntax to describe the different
aspects of an Op_Amp. This is applicable by considering that the basic
components of an Op_Amp are adirectional systems. These components are
described by combinators. A set of semantic functions is applied to these
combinators to give them a meaning.
-
3D.1 EMC-Adequate Design of Printed Circuit Board as a Part of the System
Development
- W. John
The EMC-adequate design of microelectronics systems includes all actions
intended to eliminate electromagnetic interference in electronic systems.
Challenges faced in the microelectronic area include growing system complexity,
higher operating speed, and denser design at all levels of integration (chip, printed
circuit board, MCM and system).
Growing complexity, denser design and higher speed all lead to a substantial
increase in EMC problems and design time. EMC is not commonly accepted in
microelectronic design. Microelectronic designers are of the opinion that EMC
has to do with electrical and electronic systems and mandatory product
regulations rather than with requirements on the integrated circuit they are
designing.
In this contribution a concept for an EMC-adequate design of electronic systems
will be introduced. This concept is based on a generalized development
process to integrate EMC-constraints into system design. A prototype of
an environment to analyse signal integrity effects on PCBs based on a workflow
oriented integration approach will be introduced. Based on this approach the
generation of user specific design and analysis environments including
various sets of EMC-tools is possible.
-
3D.2 Multi-Pride: A System for Supporting Multi-Layered Printed Wiring
Board Design
- Toshimasa Watanabe
The purpose of the paper is to outline MULTI-PRIDE, a system for supporting
multi-layered printed wiring board design. It consists of (i) circuit
bipartition, (ii) placement and routing on each outside layer, (iii)
modification of wiring and compaction, and (iv) routing on inside layers.
-
3D.3 Crosstalk Noise in High Density and High Speed Interconnections
due to Inductive Coupling
- Tetsuhisa Mido, Kunihiro Asada
Crosstalk noise in long interconnections is studied based on capacitive
coupling and inductive coupling. It is shown that pulse noise is induced due
to inductive coupling in heterogeneous insulators. The pulse noise becomes the
predominant noise factor when lines become longer, line
resistance becomes lower, the signal rise time becomes shorter, and the dielectric
constant of materials in the gaps on lines becomes smaller.
-
CAD Methodology and Business Models for Future Products
- Daniel D. Gajski
The advances in CAD tools and fabrication technology allow system companies
today to offer new product models every year. This short design and
manufacturing cycle forces system companies to rethink their product
development methodologies and business models. The design technology plays an
essential role in this new mode of operation. In this talk we will give a
brief overview of the past trends and speculate on the future trends based on
the new focus of design automation on complete product concept, executable
specification, electrical/mechanical, software/hardware codesign and
product-level synthesis, validation and integration.
-
4A.1 Embedded Architectural Simulation within Behavioral Synthesis Environment
- A. Jemai, P. Kission, A.A. Jerraya
This paper introduces one way to integrate an interactive simulator within a
behavioral synthesis tool, thereby allowing concurrent synthesis and simulation.
Such a simulator performs dynamic analysis and execution time evaluation. This
paper also discusses an implementation of this concept resulting in a simulator,
called AMIS. This tool assists the designer in understanding the results of
behavioral synthesis and in architecture exploration.
-
4A.2 Evaluating Cost-Performance Tradeoffs for System Level Applications
- Wek-Liang Ing, Cheng-Tsung Hwang, Allen Chung-Hao Wu
Evaluation of design cost and performance is indispensable to system
partitioning. In the absence of a system-level estimation and analysis tool,
system partitioning is difficult to perform in an efficient and accurate
manner because design evaluation can only be done after the final results are
achieved. Furthermore, without cost-performance tradeoff information relating
to different design alternatives, the designer cannot make intelligent design
decisions at the early system-level partitioning stages. In this paper, we
present a system-level cost/performance evaluation approach which systematically
explores the AT (Area-Time) design-space from a system description. This allows
the designer to obtain first-hand design tradeoff information before the
partitioning process has taken place. We have also developed a system-level
interactive design evaluation system on top of the proposed approach.
Experiments on a number of examples demonstrate that our approach provides the
designer with a comprehensive system-level design evaluation method to
effectively explore all possible design alternatives in the early stages of
system development.
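
As a rough illustration of what exploring an area-time design space can mean, the sketch below filters a list of candidate designs down to those not dominated in both area and time. The candidate values and the function name are hypothetical and are not taken from the paper; the paper's own estimation approach is not described in enough detail here to reproduce.

```python
def pareto_front(points):
    """Return the (area, time) points not dominated by any other point.

    A point is dominated if another point is no worse in both area and time.
    Both quantities are assumed to be minimized.
    """
    front = []
    best_time = float("inf")
    # Visit candidates by increasing area; keep one only if it is strictly
    # faster than every cheaper-or-equal candidate seen so far.
    for area, time in sorted(set(points)):
        if time < best_time:
            front.append((area, time))
            best_time = time
    return front


# Hypothetical (area, time) estimates for six design alternatives.
candidates = [(10, 90), (12, 70), (15, 70), (20, 40), (25, 45), (30, 20)]
print(pareto_front(candidates))
```

Only the designs on this front represent genuine tradeoffs; the others cost more area without being faster.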
-
4A.3 A Quantitative Analysis for Optimizing Memory Allocation
- Youn-Sik Hong, Choong-Hee Cho, Daniel D.Gajski
The memory allocation problem has two independent goals: minimizing the number
of memories and minimizing the number of registers in one memory. Our
concern is the ordering of bindings during memory allocation. We formulate
and analyze three different memory allocation algorithms by changing their
binding order. It is shown that when we combine these subtasks and solve them
simultaneously using a heuristic cost function, significant savings (up to 20%)
can be obtained in the total area of memories.
-
4B.1 Concurrent Cell Generation and Mapping for CMOS Logic Circuits
- Mineo Kaneko, Jialin Tian
The conventional technology mapping method selects cells from a limited
standard library, and the performance of the resultant circuit depends heavily
on the characteristics of the library. To realize detailed optimization not
limited by an instance of cell library and to reduce the maintenance cost of
standard cell libraries, a novel paradigm for technology mapping, in which cell
generation and mapping can be executed concurrently, will be considered. This
paper shows an outline of a concurrent cell generation and mapping strategy,
and proposes a method to map an input Boolean network into a CMOS transistor
network. Transduction at the transistor level is introduced for cell generation
and dynamic programming is utilized for cell assignment.
-
4B.2 Logic Synthesis for Cellular Architecture FPGAs Using BDDs
- Gueesang Lee
In this paper, an efficient approach to the synthesis of CA(Cellular
Architecture)-type FPGAs is presented. To exploit the array structure of cells
in CA-type FPGAs, logic expressions called Maitra terms, which can be mapped
directly to the cell arrays are generated. In this approach, a BDD is modified
so that each node of the BDD has another branch which is an exclusive-OR of the
two branches of a node. Once the modified BDD is obtained, a traversal of the
BDD is sufficient to generate the Maitra terms needed. Since a BDD can be
traversed in O(n) steps, where n is the number of nodes in the BDD, Maitra
terms are generated very efficiently. This also removes the need for
generating minimal SOP or ESOP expressions which can be costly in some cases.
The experiments show that the proposed method generates better results than
existing methods.
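
The extra branch described above, the exclusive-OR of a node's two branches, can be illustrated on a plain truth table. The sketch below is not the paper's BDD data structure; it only checks the standard identity f = f0 XOR (x AND (f0 XOR f1)) on a majority function chosen as a hypothetical example.

```python
from itertools import product


def cofactors(f, var):
    """Return (f0, f1, f2) for variable index `var`, where f2 = f0 XOR f1."""
    def f0(bits):
        return f(bits[:var] + (0,) + bits[var + 1:])

    def f1(bits):
        return f(bits[:var] + (1,) + bits[var + 1:])

    def f2(bits):
        return f0(bits) ^ f1(bits)

    return f0, f1, f2


def majority(bits):
    """Hypothetical example function: 3-input majority."""
    a, b, c = bits
    return (a & b) | (b & c) | (a & c)


f0, f1, f2 = cofactors(majority, 0)
for bits in product((0, 1), repeat=3):
    x = bits[0]
    shannon = ((1 - x) & f0(bits)) | (x & f1(bits))  # the two ordinary branches
    davio = f0(bits) ^ (x & f2(bits))                # uses the XOR branch
    assert majority(bits) == shannon == davio
```

Having f2 available at each node is what lets an expansion use XOR terms without first deriving a separate ESOP expression.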
-
4B.3 BDD Based Lambda Set Selection in Roth-Karp Decomposition for LUT
Architecture
- Jie-Hong Jiang, Jing-Yang Jou, Juinn-Dar Huang, Jung-Shian Wei
Field Programmable Gate Arrays (FPGA's) are important devices for rapid system
prototyping. Roth-Karp decomposition is one of the most popular decomposition
techniques for Look-Up Table (LUT)-based FPGA technology mapping. In this
paper, we propose a novel algorithm based on Binary Decision Diagrams (BDD's)
for selecting good lambda set variables in Roth-Karp decomposition to minimize
the number of consumed configurable logic blocks (CLB's) in FPGAs. The
experimental results on a set of benchmarks show that our algorithm can
produce much better results than those of the previous approach [1].
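
For context, the quality of a lambda (bound) set in Roth-Karp decomposition is commonly judged by the column multiplicity of the decomposition chart. The sketch below computes it by exhaustive truth-table enumeration, which is an assumption made for brevity and is not the BDD-based algorithm of the paper; the example function is hypothetical.

```python
from itertools import combinations, product


def column_multiplicity(f, n, bound):
    """Count distinct columns of the decomposition chart for a bound set."""
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b in product((0, 1), repeat=len(bound)):
        column = []
        for r in product((0, 1), repeat=len(free)):
            bits = [0] * n
            for i, v in zip(bound, b):
                bits[i] = v
            for i, v in zip(free, r):
                bits[i] = v
            column.append(f(tuple(bits)))
        columns.add(tuple(column))
    return len(columns)


def best_lambda_set(f, n, k):
    """Pick the size-k bound set with the smallest column multiplicity."""
    return min(combinations(range(n), k),
               key=lambda s: column_multiplicity(f, n, s))


def example(bits):
    """Hypothetical 4-input function: (a XOR b) AND (c OR d)."""
    a, b, c, d = bits
    return (a ^ b) & (c | d)


print(column_multiplicity(example, 4, (0, 1)))  # bound set {a, b}
print(column_multiplicity(example, 4, (0, 2)))  # bound set {a, c}
print(best_lambda_set(example, 4, 2))
```

A multiplicity of 2 means a single intermediate signal suffices to encode the bound-set inputs, so fewer LUTs are needed than for a bound set with multiplicity 4.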
-
4C.1 General Floorplanning with L-shaped, T-shaped and Soft Blocks Based on
Bounded Slicing Grid Structure
- Maggie Kang, Wayne Wei-Ming Dai
A new method of non-slicing floorplanning is proposed, which is based on the
new representation for non-slicing floorplans proposed by [1], called bounded
slicing grid (BSG) structure. We developed a new greedy algorithm based on the
BSG structure, running in linear time, to select the alternative shape for each
soft block so as to minimize the overall area for general floorplan, including
non-slicing structures. We propose a new stochastic optimization method, named
genetic simulated annealing (GSA) [3] for general floorplanning. Based on BSG
structure, we extend SA-based local search and GA-based global crossover to
L-shaped, T-shaped blocks and obtain high density packing of rectilinear
blocks.
-
4C.2 A Building Block Placement Tool
- Jonathan Dufour, Robert McBride, Ping Zhang, Chung-Kuan Cheng
When designing integrated circuits, sub-components rarely end up being
perfectly rectangular. However, currently most block-placers only consider
rectangular components, resulting in inefficient area utilization. We propose
a placement tool that allows arbitrarily sized and shaped convex components.
It extends the rectangle-packing method proposed by Kajitani. We describe the
methods used to create the placement and give some performance results.
-
4C.3 VEAP: Global Optimization based Efficient Algorithm for VLSI Placement
- Kong Tianming, Hong Xianlong, Qiao Changge
In this paper we present a very simple, efficient, and effective placement
algorithm for row-based VLSIs. This algorithm is based on strict mathematical
analysis, and provably can find the global optima. From our experiments, this
algorithm is one of the fastest algorithms, especially for very large scale
circuits. We also point out that our algorithm can be run
in both wirelength and timing-driven modes.
-
4C.4 An Improved Objective for Cell Placement
- Yu-Wen Tsay, Hsiao-Pin Su, Youn-Long Lin
To estimate the wiring area needed by the router to connect a signal net, most
placement tools measure one half of the perimeter of the minimum rectangle
enclosing all terminals of the net. In the past, this approach was reasonable
because the half-perimeter value correlates well with the wiring area. As we
are entering the deep-submicron era, the approach is no longer appropriate
because the wiring delay must be characterized based on a distributed-RC model,
in which not only the wiring area but also the wiring topology affects the
wiring delay. In this paper, we show that the half-perimeter metric does not
correlate well with the wiring delay under the distributed-RC model. We show
that the radius of a net estimates the wiring delay more accurately than the
half-perimeter metric does. We expand the acceptance criteria of a simulated
annealing based placement tool to include moves that do not improve on the
wiring length but do reduce the radius. Overall, for a set of benchmark
circuits the critical path delays are improved by up to 15%.
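
The difference between the two metrics can be seen on a toy net. In the sketch below the radius is approximated as the largest Manhattan distance from the driver pin to any sink, which is an assumption; the paper may define the radius on the routed tree. The coordinates are hypothetical.

```python
def half_perimeter(pins):
    """Half the perimeter of the bounding box enclosing all pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def radius(source, sinks):
    """Largest Manhattan distance from the driver to any sink (assumed definition)."""
    sx, sy = source
    return max(abs(x - sx) + abs(y - sy) for x, y in sinks)


# Two nets with the same 10 x 10 bounding box but different driver positions.
corner_driver = (0, 0)
corner_sinks = [(10, 0), (0, 10), (10, 10)]
center_driver = (5, 5)
center_sinks = [(0, 0), (10, 10)]

print(half_perimeter([corner_driver] + corner_sinks), radius(corner_driver, corner_sinks))
print(half_perimeter([center_driver] + center_sinks), radius(center_driver, center_sinks))
```

Both nets have the same half-perimeter, yet the farthest sink is twice as far from the driver in the first net, which is the kind of difference a half-perimeter objective cannot see.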
-
4D.1 HK386: An x86-Compatible 32bit CISC Microprocessor
- C.M. Kyung, I.C. Park, S.K. Hong, K.S. Seong, B.S. Kong, S.J. Lee,
H. Choi, S.R. Maeng, D.T. Kim, J.S. Kim, S.H. Park, Y.J. Kang
In this paper, we describe the implementation and design methodology of a
microprocessor, called HK386. The microprocessor is compatible with Intel
80386 with respect to the behavior of each instruction set. As the extraction
of the exact behavior of each instruction set is the single most important
step in compatible chip design, we focused our effort on establishing a
reliable verification strategy ensuring complete instruction-level
compatibility. The HK386 was successfully designed and fabricated using 0.8
um CMOS technology.
-
4D.2 Super Low Power 8-bit CPU with Pass-Transistor Logic
- Kazuo Taki, Bu-Yeol Lee, Hideki Tanaka, Kenzo Konishi
A very low power 8-bit CPU core has been designed based on an original
pass-transistor logic family, SPL and SPHL. The instruction set and external
timings are compatible with the Zilog Z80. Average supply current is 740uA at
3V with a 10MHz-clock, equivalent to 26% of that of the commercial CMOS Z80
CPU cores using the same design rules (0.8um, w-metal).
-
4D.3 A Functional Memory Type Parallel Processor for Vector Quantization
- K. Kobayashi, M. Kinoshita, M. Takeuchi, H. Onodera, K. Tamaru
We propose a memory-based parallel processor for vector quantization called a
functional memory type parallel processor for vector quantization (FMPP-VQ).
It accelerates nearest neighbor search of vector quantization. All distances
between an input vector and reference vectors in a codebook are computed
simultaneously in all PEs. The minimum value of all distances is searched in
parallel. The nearest vector is obtained in O(k), where k stands for the
dimension of vectors. An LSI including four PEs has been implemented. It
operates at 25MHz clock frequency.
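
The search that the processor accelerates can be written sequentially as follows. This is a software sketch only: the squared Euclidean distance is an assumption (the abstract does not name the distance measure), and the codebook values are hypothetical. In the hardware, the per-codeword distance computations run simultaneously in the PEs.

```python
def nearest_codeword(x, codebook):
    """Return (index, distance) of the reference vector closest to x."""
    def dist2(a, b):
        # Squared Euclidean distance; assumed, not stated in the abstract.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # One distance per reference vector; the hardware computes these in parallel.
    distances = [dist2(x, ref) for ref in codebook]
    best = min(range(len(codebook)), key=distances.__getitem__)
    return best, distances[best]


codebook = [(0, 0), (10, 10), (0, 10)]   # hypothetical 2-dimensional codebook
print(nearest_codeword((1, 9), codebook))
```

Run sequentially, the cost grows with both the codebook size and the dimension k; computing all distances at once removes the dependence on codebook size.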
-
4D.4 High Speed Bit-Serial Parallel Processing on Array Architecture
- Kazuhito Ito, Takenobu Shimizugashira, Hiroaki Kunieda
Word-parallel bit-serial processing is a solution to high speed processing
suitable for VLSI. In this paper a new bit-serial parallel processing
architecture is proposed. A VLSI chip for a digital filter is designed based
on the proposed architecture and it is implemented on a gate array chip.
Through the implementation, it is verified that bit-serial parallel processing
on an array architecture achieves high speed processing and easy design.
-
4D.5 Self-Timed 1-D ICT Processor
- Johnson T.C. Pang, Oliver C.S. Choy, C.F. Chan, W.K. Cham
This paper describes an LSI implementation of a 1-D order-8 Integer Cosine
Transform (ICT) which can calculate either forward or reverse transformation.
It is a standard-cell based design using 0.7um CMOS SDLM LM process. The chip's
performance is maximized with the fast computation algorithm and self-timed
circuit technique. It consists of eight parallel self-timed pipelines. Each
self-timed block is designed based on 2-phase handshaking protocol and variable
delay concept. The die size is 5.7x4.1mm with about 76k transistors. This chip
supports 16-bit I/O data and its data rate is up to 60MHz.
-
4D.6 A Real-Time High Performance Edge Detector for Computer Vision
Applications
- Fahad Alzahrani, Tom Chen
We present a high performance edge detection architecture for real-time image
processing applications. The architecture is finely pipelined. The proposed
ASIC is capable of producing one edge-pixel every clock cycle. At a clock rate
of 10 MHz, the architecture can process 30 frames per second, where the size of
each frame is 640x480 8-bit pixels. The ASIC was laid out and fabricated
using Samsung's 0.8um double-metal CMOS process.
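
The stated figures can be checked with simple arithmetic, assuming a frame of 640 by 480 pixels and one edge pixel produced per clock cycle. The function name is hypothetical.

```python
WIDTH, HEIGHT = 640, 480       # assumed frame size in pixels
FRAMES_PER_SECOND = 30
CLOCK_HZ = 10_000_000          # 10 MHz, one edge pixel per clock cycle


def required_pixel_rate(width, height, fps):
    """Pixels per second needed to keep up with the video stream."""
    return width * height * fps


rate = required_pixel_rate(WIDTH, HEIGHT, FRAMES_PER_SECOND)
print(rate)               # pixels per second required
print(rate <= CLOCK_HZ)   # whether a 10 MHz pixel rate is sufficient
```

The required rate is about 9.2 million pixels per second, so a 10 MHz clock leaves a margin of roughly 8%.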
-
4D.7 An LSI Implementation of the Simple Serial Synchronized Multistage
Interconnection Network
- Takayuki Kamei, Masashi Sasahara, Hideharu Amano
A high speed switch is a critical component of multiprocessors. Multistage
Interconnection Networks (MINs) have been utilized as switches for connecting
processors and memory modules in multiprocessors. Unlike the crossbar, a MIN
consists of small switching elements, and provides a high bandwidth with
relatively small hardware. Most traditional MINs are blocking networks, and
packets are transferred in the store-and-forward manner between switching
elements with bit-parallel (8-64 bits) lines. Since the width of the
communication paths and the transfer manner cause a pin-limitation problem and
a complicated structure, high density implementation and a high speed clock
cannot be exploited. In order to solve these problems, we implemented the
SSS-PBSF chip. This switch uses the PBSF connection structure, which can obtain
a higher bandwidth than that of a crossbar by connecting banyan networks in a
3-dimensional direction. A Simple Serial Synchronized (SSS) style control
mechanism is adopted both for high speed operation and for solving the
pin-limitation problem.
-
4D.8 The RDT Network Router Chip
- Hiroaki Nishi, Hideharu Amano, Katsunobu Nishimura, Ken-ichiro Anjo,
Tomohiro Kudoh
The RDT network Router chip is a versatile router for the massively parallel
computer prototype JUMP-1, which is currently under development by collaboration
between 7 Japanese universities[1]. The major goal of this project is to
establish techniques for building an efficient distributed shared memory on a
massively parallel processor. For this purpose, the reduced hierarchical
bit-map directory (RHBD) schemes [2] are used for efficient cache management
of the distributed shared memory. In order to implement RHBD schemes
efficiently, we proposed a novel interconnection network, RDT (Recursive
Diagonal Torus)[3], and developed a sophisticated router chip for the RDT
which is equipped with a deadlock-free hierarchical multicast mechanism and an
acknowledge combining mechanism. By using 0.5um BiCMOS SOG technology, it
can transfer all packets synchronized with a unique CPU clock (60MHz). Long
coaxial cables (4m at maximum) are directly driven with the ECL interface of
this chip. Using dual port RAM, the packet buffers can push and pull a flit
of the packet simultaneously. The mixed design approach with schematic and VHDL
permitted the development of the complicated chip with 90,522 gates in a year.
-
4D.9 Single Cycle Access Cache for the Misaligned Data and Instruction Prefetch
- Joon-Seo Yim, Hee-Choul Lee, Tae-Hoon Kim, Bong-Il Park,
Chang-Jae Park, In-Cheol Park, Chong-Min Kyung
In microprocessors, reducing the cache access time and the pipeline stall is
critical to improve the system performance. To overcome the pipeline stall
caused by the misaligned multi-words data or multi cycle accesses of prefetch
codes which are placed over two cache lines, we proposed the Separated
Word-line Decoding (SEWD) cache. SEWD cache makes it possible to access
misaligned multiple words as well as aligned words in one clock cycle. This
feature is invaluable in most microprocessors because the branch target address
is usually misaligned, and many data accesses are misaligned. The 8K-byte SEWD
cache chip consists of 489,000 transistors on a die size of 0.853 x 0.827
cm^2 and is implemented in a 0.8 um DLM CMOS process operating at 60 MHz.
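
To make the misalignment problem concrete, the sketch below counts how many cache lines a single access touches. The 16-byte line size is a hypothetical value chosen for illustration; the abstract does not state the line size of the SEWD cache.

```python
def lines_touched(addr, size, line_bytes=16):
    """Number of cache lines covered by an access of `size` bytes at `addr`.

    line_bytes=16 is an assumed line size, not a figure from the paper.
    """
    first = addr // line_bytes
    last = (addr + size - 1) // line_bytes
    return last - first + 1


print(lines_touched(12, 4))   # 4-byte access fully inside one line
print(lines_touched(14, 4))   # 4-byte access straddling a line boundary
```

In a conventional cache the second case needs two accesses; the SEWD cache is described as serving both cases in one clock cycle.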
-
4D.10 VLSI Implementation of a Real-time Operating System
- Takumi Nakano, Yoshiki Komatsudaira, Akichika Shiomi, Masaharu Imai
This paper proposes a new approach to realize a very high performance
real-time OS using VLSI technology. In order to confirm the effectiveness
of this method, the most basic system calls have been designed. According to
the evaluation results based on a gate array implementation, the hardware portion
of system calls can be executed within 4 clocks and the task scheduler can
run in only 8 clocks simultaneously, which is about 130 to 1880 times
faster than a software implementation.
-
4D.11 A CMOS Delayed Locked Loop (DLL) for Reducing Clock Skew to Under
500ps
- Yong-Bin Kim, Tom Chen
This paper presents a variable delay line DLL circuit implemented in a 0.8um
CMOS technology. A phase detector and two charge pump circuits calibrate the
delay per stage of the delay line using a push-pull type clock synchronization
scheme. The delay line can be programmed from 6 to 18 stages. The DLL circuit is
capable of reducing clock skew from 1-3ns to below 500ps for clock frequencies
from 50MHz to 150MHz.
-
4D.12 A Current Mode Cyclic A/D Converter with a 0.8um CMOS Process
- Masaki Kondo, Hidetoshi Onodera, Keikichi Tamaru
We have developed a current mode cyclic analog-to-digital converter using a
0.8um CMOS process. Our circuit structure makes it possible to construct the
converter without any precise analog components; hence, it is highly compatible
with submicron processes. The fabricated circuit has an area of 0.014mm^2 and
achieves 8-bit resolution at a sampling rate of 40kHz with an average power
dissipation of 370uW at a 4V supply voltage.
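
For readers unfamiliar with cyclic (algorithmic) conversion, the sketch below shows the idealized bit-by-bit procedure: double the residue, compare it with the reference, subtract on a 1, and repeat. It models ideal arithmetic only and is not the circuit of the paper; the current values are hypothetical and normalized to the reference.

```python
def cyclic_adc(i_in, i_ref, bits=8):
    """Idealized cyclic A/D conversion of i_in in the range [0, i_ref).

    Each cycle doubles the residue and compares it with the reference,
    producing one bit, most significant bit first.
    """
    code = 0
    residue = i_in
    for _ in range(bits):
        residue *= 2                      # current doubling
        bit = 1 if residue >= i_ref else 0
        residue -= bit * i_ref            # subtract the reference on a 1
        code = (code << 1) | bit
    return code


print(cyclic_adc(0.5, 1.0))    # half scale
print(cyclic_adc(0.25, 1.0))   # quarter scale
```

The same doubling and comparison stage is reused for every bit, which is why the converter can be so small. From the figures above, 370uW at 40kHz corresponds to about 9.25nJ per sample.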
-
4D.13 A Current-mode, 3V, 20MHz, 9-bit equivalent CMOS Sample-and-Hold
Circuit
- Yasuhiro Sugimoto, Tetsuya Iida
A new current-mode, low-power, low-voltage and high-speed CMOS
sample-and-hold circuit has been designed and fabricated. A new current-mode
differential switching scheme has been adopted to eliminate errors caused by
feedthrough injection from the sample switches. The experimental results
show 9-bit resolution with 9mW power dissipation at a 20MHz clock frequency
from a 3V power supply.
-
5A.1 Hardware-Software Co-design: Tools for Architecting Systems-On-A-Chip
- Rajesh K. Gupta
This paper examines the issues and progress in the design of highly integrated
microelectronic systems. These microsystems rely on an array of diverse
components such as processors, memory, network interfaces, graphics and DSP
'cores'. In particular, we discuss problems in the combined design
of hardware and software for these systems. We present a decomposition
of the co-design problem, and identify the needed technologies in
specification/modeling, synthesis and validation for efficient and
error-free system designs. Co-design tools along with domain-specific design
and methodologies provide a key advantage to the system integrator in building
complex single-chip systems. We illustrate this point in the specific area of
architectural evaluation using co-simulation tools.
-
5A.2 Trade-off Evaluation in Embedded System Design Via Co-simulation
- Claudio Passerone, Luciano Lavagno, Claudio Sansoe, Massimiliano Chiodo, Alberto Sangiovanni-Vincentelli
Current design methodologies for embedded systems often force the designer to
evaluate early in the design process architectural choices that will heavily
impact the cost and performance of the final product. Examples of these choices
are hardware/software partitioning, choice of the micro-controller, and choice
of a run-time scheduling method. This paper describes how to help the designer
in this task, by providing a flexible cosimulation environment in which these
alternatives can be interactively evaluated.
-
5A.3 A Transformational Codesign Methodology
- Tommy King-Yin Cheung, Graham Hellestrand, Prasert Kanthamanon
We present a hardware/software codesign methodology using formal
transformations. The goal is to refine a given function specification of a
task to an operational structure involving both hardware and software
components. The refinement process is separated into two levels, the
algorithmic and the structural. Within each level, refinement is accomplished
by applying sequences of transformations that preserve the functionality of an
initial specification. This allows various 'correct' design alternatives to be
generated and their costs analyzed. At the algorithmic level, different
algorithm designs are explored, each producing a computational schedule that
has a different performance cost. At the structural level, different spatial
structures with different resources and performance costs are explored. These
costs which characterize the designs are used to assist in the hardware/software
partitioning. An example is used throughout to illustrate this methodology.
-
5B.1 A Testability Analysis Method for Register-Transfer Level Descriptions
- Mizuki Takahashi, Ryoji Sakurai, Hiroaki Noda, Takashi Kambe
In this paper, we propose a new testability analysis method for
Register-Transfer Level(RTL) descriptions. The proposed method is based on the
idea of testability analysis in terms of data flow and control structure which
can be extracted from RTL designs. We analyze testability of RTL descriptions
with more testability measures than those of conventional gate-level
testability, so that the method provides information for design for
testability(DFT). We have implemented the presented method and experimental
results show that we can reduce circuit cost for test and achieve highly
testable circuits by DFT using our RTL testability analysis.
-
5B.2 Non-Scan Design for Testable Data Paths Using Thru Operation
- Katsuyuki Takabatake, Michiko Inoue, Toshimitsu Masuzawa, Hideo Fujiwara
We present a new non-scan DFT technique for register-transfer (RT) level data
paths. In the technique, we add thru operations to some operational modules to
make the data path easily testable. We define a testable measure, weak
testability, and consider the problem to make the data path weakly testable
with minimum hardware overhead. We also define a measure to estimate the test
generation time. Experimental results show the effectiveness of our technique
and the proposed measure.
-
5B.3 Block-Level Fault Isolation Using Partition Theory and Logic
Minimization Techniques
- C.-J. Richard Shi
Multichip modules are emerging as a key packaging technology for mixed-signal
circuits and systems. In this paper, we consider how to localize a failure
within a chip boundary as rapidly as possible in order to expedite the rework
process and to minimize its overall impact on manufacturing throughput and
cycle time. A key contribution of this paper is to provide a unified block-level
fault isolation framework for analog and digital circuits, and to show that
optimum fault isolation reduces to set covering. This allows us to apply
directly powerful set covering techniques and solvers developed recently in
logic minimization. In addition, we present a greedy peeling heuristic with
performance bound computation. Some preliminary experimental results are
included to demonstrate the feasibility and performance of the proposed
approach.
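
Since the abstract reduces fault isolation to set covering, a generic greedy set-cover routine gives a feel for the problem. This is the textbook greedy heuristic, not the paper's greedy peeling heuristic or its performance bound. The modeling used here, in which each test distinguishes certain pairs of blocks, is a hypothetical example.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets greedily until every element of `universe` is covered.

    `subsets` maps a name to the set of elements it covers.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Choose the subset covering the most still-uncovered elements.
        name = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        if not subsets[name] & uncovered:
            raise ValueError("universe cannot be covered")
        chosen.append(name)
        uncovered -= subsets[name]
    return chosen


# Hypothetical model: elements are block pairs that must be told apart,
# and each test distinguishes some of those pairs.
pairs = {("A", "B"), ("A", "C"), ("B", "C")}
tests = {
    "t1": {("A", "B"), ("A", "C")},
    "t2": {("B", "C")},
    "t3": {("A", "B")},
}
print(greedy_set_cover(pairs, tests))
```

The greedy choice is not guaranteed to be optimal, which is why exact set-covering solvers from logic minimization are attractive for this problem.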
-
5B.4 The Use of Hierarchical Information to Test Large Controllers
- F. Fummi, D. Sciuto
Gate-level test pattern generators require insertion of scan paths to handle
the flat gate-level representation of a large sequential controller. In contrast,
we present a testing methodology based on the hierarchical finite state machine
model. Such a model is used to specify very complex control devices by means
of a top-down design approach. Our approach allows the generation of compact
test sets with very high stuck-at fault coverages, without any DfT logic.
-
5B.5 Hierarchical Fault Tracing for VLSI Sequential Circuits from CAD
Layout Data in the CAD-linked EB Test System
- Katsuyoshi Miura, Koji Nakamae, Hiromu Fujioka
A previous hierarchical fault tracing method for combinational circuits,
which requires only CAD layout data in the CAD-linked electron beam test
system, is extended to sequential circuits. The method retains the
characteristic that allows us to trace a fault hierarchically
from the top level cell to the lowest primitive cell and from the primitive
cell to the transistor-level circuit in a consistent manner independently of
circuit functions. Results of applying the method to the CAD layouts of some
sequential CMOS benchmark circuits show its superiority over the guided-probe
method, in which circuit logical functions are first extracted from the CAD
layout data and then the guided-probe testing is executed.
-
5C.1 Interconnect Capacitances, Crosstalk, and Signal Delay in High Speed
and High Density VLSI Circuits (No Paper Submitted)
- D.H. Cho, M.H. Seung, N.H. Kim, H.S. Park
-
5C.2 Monte Carlo Simulation for Single Electron Circuits
- Masaharu Kirihara, Kenji Taniguchi
In single electron circuits composed of small tunnel junctions, capacitances,
and voltage sources, a tunneling electron can be described as a discrete charge
due to the stochastic nature of a tunneling event. We developed a Monte Carlo
simulator for the numerical study of single electron circuits because
conventional simulation methods based on Kirchhoff's laws are no longer
applicable. The calculated dynamic operation of a quasi-CMOS inverter reveals
that ultra small load capacitors give rise to large output voltage fluctuation
during the logic operation. Future SET circuits should be designed with
several-electron logic rather than ultimate single-electron logic circuits in
which a bit is represented by a single electronic charge.
-
5C.3 Parallel Calculation of 3-D Parasitic Resistance and Capacitance
with Linear Boundary Elements
- Wenming Zhou, Zeyi Wang, Lan Rao
The widespread application of deep sub-micron and multilayer routing techniques
makes the influence of interconnection parasitics the main factor limiting
the performance of VLSI circuits. Parallel direct boundary element calculation
of three-dimensional (3-D) resistance and capacitance is an important method
for fast extraction. In this paper, a parallel algorithm to implement linear
element calculation by using PVM (Parallel Virtual Machine, distributed
computing software on a PC network) is introduced. The hierarchical calculation
scheme of the setup and solution processes of linear equations is discussed.
At the end, the performance and workload balance of the algorithm are
analyzed.
-
5C.4 Simulation of Gate Switching Characteristics of a Miniaturized MOSFET
based on a Non-Isothermal Non-Equilibrium Transport Model
- Won-Cheol Choi, Hirobumi Kawahima, Ryo Dang
Our device simulator is developed for the analysis of a MOSFET based on the
Thermally Coupled Energy Transport Model (TCETM). The simulator has the
ability to calculate not only steady-state characteristics but also
transient characteristics of a MOSFET. It solves the basic semiconductor
device equations, including the Poisson equation, current continuity equations
for electrons and holes, the energy balance equation for electrons, and the
heat flow equation, using the finite difference method.
Moderators: Hideharu Amano (Keio Univ.) and Tokinori Kozawa (STARC)
Chairperson: Kazuhiro Ueda (Shibaura Institute of Technology)
Presented by
-
5D.1 The EUROPRACTICE MPC Service
- C. Das
IMEC has been involved in MPC services for universities and industry since 1984.
In the beginning these services were set up to support the local
educational programme. Later, in 1989, IMEC was coordinator of the European-wide
MPC services in the EC funded project EUROCHIP. Since October 1995,
IMEC has been coordinator of the IC Manufacturing Service in the EC funded
project EUROPRACTICE.
-
5D.2 Multi-Project Chip Activities in Korea-IDEC Perspective-
- Chong-Min Kyung, In-Cheol Park, Ho-Jun Song
This paper describes the current status of multi-project chip(MPC) services in
Korea to promote full-custom and semi-custom IC design activities in
universities. Although MPC foundry services for IC designs were started on a
smaller scale more than 10 years ago, it is only recently that a systematic and
effective education and MPC foundry service program called IDEC (IC Design
Education Center) was launched with the planned support of the government and
three major semiconductor companies in Korea. In this paper, we introduce the
activities of IDEC and other MPC foundry services currently being provided.
-
5D.3 Multi-Project Chip Service for University and Industry in Taiwan
- Jen-Sheng Hwang
Acting as the bridge between the designers and manufacturing companies, Chip
Implementation Center (CIC), founded in 1992 under National Science Council,
aims at the services for the fabrication of multi-project chip, the
procurement/integration of software CAD tools, and the promotion of IC
design / testing / CAD software technology. To date, 2000 academic licenses
of software CAD tools have been obtained and 739 chips of the academics have
been fabricated through CIC.
-
5D.4 VLSI Design and Education Center (VDEC) Current Status and Future Plan
- Kunihiro Asada and Koichiro Hoh
After briefly reviewing a history of VDEC, its functions and facilities
are summarized, followed by future plans of chip implementation along with a
network society.
-
6A.1 Choosing a Digital Simulator
- John Hillawi
This paper summarises the second in a series of benchmarking efforts conducted
by DA Solutions between August 1995 and April 1996, for VHDL and Verilog
simulators. The paper discusses the methodology used and the results of an
independent public benchmark for leading VHDL and Verilog simulators, for RTL,
Gate, VITAL and Co-simulations products. The paper also makes performance
comparisons between VHDL and Verilog technologies and between PC and UNIX
solutions.
-
6A.2 A Hardware/Software Co-simulation Environment for Micro-processor
Design with HDL Simulator and OS interface
- Yoshiyuki Ito, Yuichi Nakamura
We propose a hardware/software cosimulation environment using an RTL
simulator with a software language interface. The proposed simulation
environment introduces the 'OS interface (OSIF),' which invokes system calls in
the OS on the simulation platform to execute application software. The OSIF
consists of a data adaptation facility and function correspondence management,
allowing it to cooperate with the OS of the simulation platform. We show the
results of experiments with an R3000-compatible processor model. This
environment verified our processor model with SPEC benchmarks that require
various operating system services. For example, with the lisp interpreter
program li, our detailed RTL description for the core part of the R3000 was
simulated within 20 hours on a 109 MIPS workstation.
-
6A.3 VIDE: A Visual VHDL Integrated Design Environment
- Jinian Bian, Hongxi Xue, Ming Su
In this paper, a visual VHDL integrated design environment VIDE for high
level design is presented. In VIDE, there are several graphical and
textual mixed design entry tools (VDES) and a graphical object-oriented
debugger (VDBG). VDES consists of several diagram editors and a visual
text editor, while VDBG is a debugging environment based on a hierarchical
VHDL simulator. The graphical objects can be specified as a debugging target.
-
6A.4 Advanced Processor Design using Hardware Description Language AIDL
- Takayuki Morimoto, Kazushi Saito, Hiroshi Nakamura, Taisuke Boku,
Kisaburo Nakazawa
In order to design advanced processors in a short time, designers must simulate
their designs and reflect the results to the designs at the very early stages.
However, conventional hardware description languages (HDLs) do not have enough
ability to describe designs easily and accurately at these stages. We
have therefore proposed a new hardware description language, AIDL. In this paper, in order
to evaluate the effectiveness of AIDL, we describe and compare three
processors in AIDL and VHDL descriptions.
-
6B.1 Adaptive Models for Input Data Compaction for Power Simulators
- Radu Marculescu, Diana Marculescu, Massoud Pedram
This paper presents an effective and robust technique for compacting a large
sequence of input vectors into a much smaller input sequence so as to reduce
the circuit/gate level simulation time by orders of magnitude and maintain the
accuracy of the power estimates. In particular, this paper introduces and
characterizes a family of dynamic Markov trees that can model complex
spatiotemporal correlations which occur during power estimation both in
combinational and sequential circuits. As the results demonstrate, large
compaction ratios of 1-2 orders of magnitude can be obtained without
significant loss (less than 5% on average) in the accuracy of power estimates.
-
6B.2 Fuzzy-based Circuit Partitioning in Built-in Current Testing
- Wang-Dauh Tseng, Kuochen Wang
Partitioning a digital circuit into modules before implementing it on a single
chip is key to balancing test cost against test correctness in built-in current
testing (BICT). Most partitioning methods use statistical analysis to find the
threshold value and then to determine the size of a module. These methods are
rigid and inflexible since IDDQ testing requires the measurement of an analog
quantity rather than a digital signal. In this paper, we propose a fuzzy-based
approach which provides a soft threshold to determine the module size for BICT
partitioning. Evaluation results show that our design approach indeed provides
a feasible way to exploit the design space of BICT partitioning.
-
6B.3 Reducing the Complexity of Path Classification by Reconvergence Analysis
- Paul Tafertshofer, Andreas Ganz, Manfred Henftling
In this paper we present a new and efficient method for path classification,
i.e. for determining the set of functional unsensitizable or robust dependent
paths. In a pre-processing step, the new method computes a minimal set of
reconvergence regions that need to be considered for path classification.
Functional sensitization is only performed for path segments contained in
these regions. Thus, the complexity for path classification can be reduced
from the total number of paths in the circuit to the number of paths contained
in the minimal set of reconvergence regions.
-
6B.4 Modelling and Detection of Dynamic Errors due to Reflection- and
Crosstalk-Noise
- J. Schrage
A new algorithm for the generation of test sequences to detect dynamic errors
due to reflection and crosstalk noise in combinational circuits is presented.
At the circuit level, a new approach to error modeling that includes the
duration of reflection and crosstalk errors is described. The presented
algorithm takes the high influence of error durations as well as gate and
transmission line delays on the testability into account.
-
6B.5 Fault Coverage Improvement Based on Error Signal Analysis
- Mike W.T. Wong, Y. Zhou, Y.S. Lee, Y. Min
Fault-tolerant design of analog circuits is more difficult than that of digital
circuits. Abhijit Chatterjee has proposed a continuous checksum-based technique
to design fault-tolerant linear analog circuits. However, some faults in the
passive elements cannot be detected if the checker has not been designed
appropriately. This paper addresses the fault coverage issue in the continuous
checksum based technique and proposes an error signal analysis based method
for improving fault coverage of the checker.
-
6C.1 Low-Power Multiple-Valued Current-Mode Integrated Circuit with
Current-Source Control and Its Application
- Takahiro Hanyu, Satoshi Kazama, Michitaka Kameyama
A new current-source control technique is proposed to design a low-power
high-speed multiple-valued current-mode (MVCM) integrated circuit in a low
supply voltage. The use of a differential logic circuit (DLC) with a pair of
dual-rail inputs makes the input voltage swing small, which results in a high
driving capability at a lower supply voltage, while having large static power
dissipation. In the proposed DLC using switched current control, the static
power dissipation is greatly reduced because current sources in non-active
circuit blocks are switched off. In the current control, no additional
transistors are required to control the current sources because a
current-control circuit is already used in the threshold detector. As a typical
example of arithmetic circuits, a new 1.5V-supply 54 x 54-bit multiplier
based on a 0.8um standard CMOS technology is also designed. It is about 1.3
times faster than the fastest binary multiplier under normalized power
dissipation.
-
6C.2 Analysis and Design of Multiple-Bit High-Order Sigma-Delta Modulator
- Hao-Chiao Hong, Bin-Hong Lin, Cheng-Wen Wu
The high-order Sigma-Delta modulator is an appropriate approach for
high-bandwidth, high-resolution A/D conversion. However, non-ideal effects such
as the finite op-amp gain and the capacitor mismatch have great impacts on its
performance at a low oversampling ratio. To achieve greater performance under
the inevitable non-ideal effects, we explore several multiple-bit schemes, based
on our CIQE high-order Sigma-Delta architecture, to remove the non-ideal deterioration. Design
rules of these multiple-bit schemes are developed and verified by extensive
simulations.
-
6C.3 Optimal Loop Bandwidth Design for Low Noise PLL Applications
- Kyoohyun Lim, Seunghee Choi, BeomSup Kim
This paper presents a salient method to find an optimal bandwidth for low
noise phase-locked loop (PLL) applications by analyzing a discrete-time model
of charge-pump PLLs based on ring oscillator VCOs. The analysis shows that the
timing jitter of the PLL system depends on the jitter in the ring oscillator
and an accumulation factor which is inversely proportional to the bandwidth of
the PLL. Further analysis shows that the timing jitter of the PLL system,
however, proportionally depends on the bandwidth of the PLL when an external
jitter source is applied. The analysis of the PLL timing jitter of both cases
gives the clue to the optimal bandwidth design for low noise PLL applications.
Simulation results using a C-language PLL model are compared with the
theoretical predictions and show good agreement.
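
The trade-off described in this abstract can be illustrated with a small
numeric sketch. It assumes, purely for illustration, that the two jitter
contributions simply add: a term a/B from the ring oscillator (inversely
proportional to the loop bandwidth B) and a term b*B from the external source
(proportional to B). The additive model, the constants a and b, and the
function names are assumptions, not the paper's discrete-time model.

```python
import math

def total_jitter(bandwidth, a, b):
    """Illustrative jitter measure: a/bandwidth (oscillator term)
    plus b*bandwidth (external-source term). Assumed model."""
    return a / bandwidth + b * bandwidth

def optimal_bandwidth(a, b):
    """Bandwidth minimizing a/B + b*B, obtained by setting the
    derivative -a/B**2 + b to zero."""
    return math.sqrt(a / b)

if __name__ == "__main__":
    a, b = 4.0, 1.0
    best = optimal_bandwidth(a, b)
    print(best, total_jitter(best, a, b))
```

Under this assumed model the optimum sits where the two contributions are
equal, which is why neither a very narrow nor a very wide loop is best.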
-
6C.4 ±1.5V CMOS Four-Quadrant Multiplier
- Simon C. Li
A low-voltage CMOS four-quadrant analogue multiplier using two NMOS transistors
operated in the triode region with a modified bi-directional regulated cascode
(RGC) structure is presented. The circuit can operate from a supply voltage of
±1.5V. For a differential input voltage range up to ±0.8V, this circuit
keeps nonlinearity below 0.9% and total harmonic distortion below 1%. The
-3dB bandwidth of this multiplier is 15MHz. The chip was fabricated
in the Taiwan Semiconductor Manufacturing Corporation (TSMC) 0.8um
Single-Poly-Double-Metal (SPDM) N-well process. The chip dissipates 24.4mW
and occupies an active area of 251 x 653 um^2.
Moderator : Tokinori Kozawa (STARC)
Chair : Tokinori Kozawa (STARC, Japan)
Panelist :
- Ralph Cavin (SRC,USA)
- Paul Six (IMEC, Belgium)
- Taro Okabe (STARC, Japan)
- Akihiko Morino (NEC,Japan)
- Hiroto Yasuura (Kyushu Univ., Japan)
- Youn-Long Lin (Tsing Hua Univ., Taiwan)
Some Thoughts on Process Retargettable and Reusable IC Intellectual
Property
- Neil Weste
With the observation that today's printed wire board combining a myriad
of different vendors' ICs is tomorrow's "system on a chip," many companies are
interested in methods of combining disparate designs onto one piece of silicon.
Systems companies are interested because they want to do it. Silicon managers
are interested because they sell silicon. CAD companies are interested because
it looks like a future revenue stream for them. And finally, garage shop
companies are interested because there appears to be a demand for portable
integrated circuit intellectual property.
But how will this happen technically?
From a business perspective?
At what level is portability required?
Are HDLs the answer?
What about mixed signal?
This talk will attempt to address some of these and related questions
by examining some activities around the world and also drawing on the
experience of the speaker at promoting technology independent layout designs
for over 10 years.
-
7A.1 ChipEst-FPGA: A Tool for Chip Level Area and Timing Estimation of Lookup
Table Based FPGAs for High Level Applications
- Min Xu, Fadi Kurdahi
The importance of efficient area and timing estimation techniques for
hierarchical design methodology is well-established in High-Level Synthesis
(HLS), since the estimation allows more realistic exploration of the design
space, and hierarchical design methodology matches well with HLS paradigm. In
this paper, we present ChipEst-FPGA, a chip level estimator for designs
implemented using a hierarchical design methodology for Lookup Table Based
FPGAs. In FPGAs, the wire delay may contribute to a significant portion of the
overall design delay. ChipEst-FPGA uses a realistic model which takes the
component area/delay as well as wiring effects into account. We tested our
ChipEst-FPGA on several benchmarks and the results show that we can get
accurate area and timing estimates efficiently.
-
7A.2 Bit-Serial Pipeline Synthesis and Layout for Large-Scale Configurable
Systems
- Tsuyoshi Isshiki, Wayne Wei-Ming Dai, Hiroaki Kunieda
In this paper, we present our datapath synthesis and layout tools, which are
targeted toward large-scale configurable systems with a logic capacity of up
to millions of gates. The tools consist of an easy design entry using C++, a
customized bit-serial circuit library for SRAM-based FPGAs, a bit-serial
pipeline circuit generator, and a circuit partitioner.
-
7A.3 An Optimal Scheduling Method for Parallel Processing System of Array
Architecture
- Kazuhito Ito, Tadashi Iwata, Hiroaki Kunieda
In high-level synthesis for digital signal processing systems of array
structured architecture, one of the most important procedures is the scheduling.
When operations are allocated to processors, the communication time between
processors must be taken into account. In
this paper we propose a scheduling method which derives an optimal schedule
achieving the minimum iteration period and latency for a given signal
processing algorithm on the specified processor array. The scheduling problem
is modeled as an integer linear programming (ILP) problem and solved by an ILP solver.
Furthermore, we improve the scheduling method so that it can be applied to
large scale signal processing algorithms without degrading the schedule
optimality.
-
7B.1 AQUILA: An Equivalence Verifier for Large Sequential Circuits
- Shi-Yu Huang, Kwang-Ting Cheng, Kuang-Chien Chen
In this paper, we address the problem of verifying the equivalence of two
sequential circuits. A hybrid approach that combines the advantages of
BDD-based and ATPG-based approaches is introduced. Furthermore, we incorporate
a technique called partial justification to explore the sequential similarity
between the two circuits under verification to speed up the verification
process. Compared with existing approaches, our method is much less vulnerable
to the memory explosion problem, and therefore can handle larger designs. The
experimental results show that in a few minutes of CPU time, our tool can
verify the sequential equivalence of an intensively optimized benchmark
circuit with hundreds of flip-flops against its original version.
-
7B.2 On the Representational Power of Bit-Level and Word-Level Decision
Diagrams
- Bernd Becker, Rolf Drechsler, Reinhard Enders
Several types of Decision Diagrams (DDs) have been proposed in the area of
Computer Aided Design (CAD), among them being bit-level DDs like OBDDs, OFDDs
and OKFDDs. While the aforementioned types of DDs are suitable for representing
Boolean functions at the bit-level and have proved useful for a lot of
applications in CAD, recently DDs to represent integer-valued functions, like
MTBDDs (=ADDs), EVBDDs, FEVBDDs, (*)BMDs, HDDs (=KBMDs), and K*BMDs, attract
more and more interest, e.g., using *BMDs it was for the first time possible to
verify multipliers of bit length up to n = 256. In this paper we clarify the
representational power of these DD classes. Several (inclusion) relations and
(exponential) gaps between specific classes differing in the availability of
additive and/or multiplicative edge weights and in the choice of decomposition
types are shown. It turns out for example, that K(*)BMDs, a generalization of
OKFDDs to the word-level, also 'include' OBDDs, MTBDDs and (*)BMDs. On the
other hand, it is demonstrated that a restriction of the K(*)BMD concept to
subclasses, such as OBDDs, MTBDDs, (*)BMDs as well, results in families of
functions which lose their efficient representation.
-
7B.3 Learning Heuristics for OKFDD Minimization by Evolutionary Algorithms
- Nicole Göckel, Rolf Drechsler, Bernd Becker
Ordered Kronecker Functional Decision Diagrams (OKFDDs) are a data structure
for efficient representation and manipulation of Boolean functions. OKFDDs are
very sensitive to the chosen variable ordering and the decomposition type list,
i.e. the size may vary from linear to exponential. In this paper we present
an Evolutionary Algorithm (EA) that learns good heuristics for OKFDD
minimization starting from a given set of basic operations. The difference to
other previous approaches to OKFDD minimization is that the EA does not solve
the problem directly. Rather, it develops strategies for solving the problem.
To demonstrate the efficiency of our approach experimental results are given.
The newly developed heuristics combine high quality results with reasonable
time overhead.
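
The sensitivity to variable ordering mentioned in this abstract can be made
concrete with a small sketch. It uses plain reduced ordered BDDs rather than
OKFDDs (so there is no decomposition type list), and the function name
robdd_size and the example function are illustrative assumptions. The node
count is obtained by counting, at each level, the distinct cofactors that
actually depend on that level's variable.

```python
from itertools import product

def robdd_size(f, order, n):
    """Count non-terminal nodes of the reduced ordered BDD of f.

    f takes a list of n bits; order lists variable indices from the
    root level downwards. Brute force, so only suitable for small n.
    """
    total = 0
    for level in range(n):
        nodes = set()
        for prefix in product((0, 1), repeat=level):
            # Truth table of the cofactor over the remaining variables.
            table = []
            for rest in product((0, 1), repeat=n - level):
                assignment = [0] * n
                for var, val in zip(order, prefix + rest):
                    assignment[var] = val
                table.append(f(assignment))
            half = len(table) // 2
            # A node exists only if the cofactor depends on this variable.
            if table[:half] != table[half:]:
                nodes.add(tuple(table))
        total += len(nodes)
    return total

def three_pairs(x):
    """Example function x0*x1 + x2*x3 + x4*x5."""
    return (x[0] & x[1]) | (x[2] & x[3]) | (x[4] & x[5])

if __name__ == "__main__":
    print(robdd_size(three_pairs, [0, 1, 2, 3, 4, 5], 6))
    print(robdd_size(three_pairs, [0, 2, 4, 1, 3, 5], 6))
```

For this six-variable function the paired order yields 6 non-terminal nodes
and the separated order yields 14; the gap widens exponentially as more
variable pairs are added.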
-
7B.4 On Properties of Kleene TDDs
- Yukihiro Iguchi, Tsutomu Sasao, Munehiro Matsuura
Three types of ternary decision diagrams (TDDs) are considered: AND_TDDs,
EXOR_TDDs, and Kleene_TDDs. Kleene_TDDs are useful for logic simulation in the
presence of unknown inputs. Let N(BDD : f), N(AND_TDD : f), and N(EXOR_TDD :
f) be the number of non-terminal nodes in the BDD, the AND_TDD, and the
EXOR_TDD for f, respectively. Let N(Kleene_TDD : F) be the number of
non-terminal nodes in the Kleene_TDD for F, where F is the Kleenean ternary
function corresponding to f. Then N(BDD : f) <= N(TDD : f).
For parity functions, N(BDD : f) = N(AND_TDD : f) = N(EXOR_TDD : f) =
N(Kleene_TDD : F). For unate functions, N(BDD : f) = N(AND_TDD : f). The sizes
of Kleene_TDDs are O(3^n/n) and O(n^3) for arbitrary functions and symmetric
functions, respectively. There exists a 2n-variable function for which
Kleene_TDDs require O(n) nodes with the best order, but O(3^n) nodes with
the worst order.
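
As a rough illustration of logic simulation with unknown inputs, the sketch
below evaluates a two-valued function f on inputs drawn from {0, 1, u}. It
assumes the usual reading of the ternary extension: the result is 0 or 1 when
every way of replacing the unknowns gives that value, and u otherwise. The
names U and kleene_eval are hypothetical, and the brute-force enumeration is
not the TDD-based method of the paper.

```python
from itertools import product

U = "u"  # marker for an unknown input value (assumed encoding)

def kleene_eval(f, inputs):
    """Evaluate f on inputs from {0, 1, U}.

    Returns 0 or 1 if all completions of the unknown inputs agree,
    and U otherwise.
    """
    unknown = [i for i, v in enumerate(inputs) if v == U]
    results = set()
    for bits in product((0, 1), repeat=len(unknown)):
        filled = list(inputs)
        for i, b in zip(unknown, bits):
            filled[i] = b
        results.add(f(filled))
    return results.pop() if len(results) == 1 else U

if __name__ == "__main__":
    and2 = lambda x: x[0] & x[1]
    print(kleene_eval(and2, [0, U]))  # a 0 input fixes the AND output
    print(kleene_eval(and2, [1, U]))  # output depends on the unknown
```

A parity function returns u as soon as any input is unknown, whereas an AND
gate can still resolve to 0.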
-
7C.1 A Time-Domain Method for Numerical Noise Analysis of Oscillators
- Makiko Okumura, Hiroshi Tanimoto
A numerical noise analysis method for oscillators is proposed. Noise sources
are usually small and can be considered as perturbations to a large amplitude
oscillation. Transfer functions from each noise source to the oscillator output
can be calculated by modeling the oscillator as a linear periodic time-varying
circuit. The proposed method is a time domain method and can be applied to
strongly non-linear circuits. Thermal noise, shot noise and flicker noise are
considered as noise sources. Error in the time domain method is also discussed.
-
7C.2 A New Linear-Time Harmonic Balance Algorithm for Cyclostationary
Noise Analysis in RF Circuits
- J.S. Roychowdhury, Peter Feldmann
A new technique is presented for computing noise in nonlinear circuits. The
method is based on a formulation that uses harmonic power spectral densities
(HPSDs), from which a block-structured matrix relation between the
second-order statistics of noise within a circuit is derived. The HPSD
formulation is used to devise a harmonic-balance-based noise algorithm that
requires O(nN log N) time and O(nN) memory, where n represents circuit size
and N the number of harmonics of the large-signal steady state. The method
treats device noise sources with arbitrarily shaped PSDs (including thermal,
shot and flicker noises), handles noise input correlations and computes
correlations between different outputs. The HPSD formulation is also used to
establish the non-intuitive result that bandpass filtering of cyclostationary
noise can result in stationary noise. The new technique is illustrated using
an example that exhibits noise folding and interaction between harmonic PSD
components. The results are validated against Monte-Carlo simulations. The
noise performance of a large industrial integrated RF circuit (with > 300 nodes)
is also analyzed in less than 2 hours using the new method.
-
7C.3 Enhancement of Parallelism for Tearing-based Circuit Simulation
- Koutaro Hachiya, Toshiyuki Saito, Toshiyuki Nakata, Norio Tanabe
A new circuit simulation system is presented with techniques 'Subcircuit
Balancing with Estimated Update operation count'(SBEU) and 'Asynchronous
Distributed Row-based interconnection parallelization'(A-DR). SBEU estimates
Gaussian elimination cost of each subcircuit by counting number of update
operations to achieve balanced circuit partitioning. A-DR makes it possible to
overlap numerical operations and interprocessor communications in parallel
Gaussian elimination of interconnection equations. On a 16-PE distributed
memory parallel machine, an experimental simulation shows 9.9 times speedup
over 1PE and distribution of the time consumed for each subcircuit is within
26% deviation from the median.
Design and Test of Processor-Core Based Systems
- Peter Marwedel
This tutorial responds to the rapidly increasing use of various cores for
implementing systems-on-a-chip. It specifically focusses on processor cores.
We will give some examples of cores, including DSP cores
and application-specific instruction-set processors (ASIPs). We will mention
market trends for these components, and we will touch on design procedures, in
particular the use of compilers. Finally, we will discuss the problem of testing
core-based designs. Existing solutions include boundary scan, embedded
in-circuit emulation (ICE), the use of processor resources for
stimuli/response compaction and self-test programs.
-
8A.1 Architecture Evaluation Based on the Datapath Structure and Parallel Constraint
- Masayuki Yamaguchi, Akihisa Yamada, Toshihiro Nakaoka, Takashi Kambe
This paper presents a novel way of evaluating the architecture of embedded
custom DSPs which helps designers optimize the datapath configuration and the
instruction set. Given a datapath structure, it evaluates the performance in
terms of an estimated number of steps to execute the target program on the
datapath. A concept of 'parallel constraint' is newly introduced, which
enables evaluation of the impact of instruction format design on the performance
without explicitly specifying the instruction format. The number of execution
steps is estimated by a combination of static analysis and dynamic analysis.
It enables fast and precise estimation of actual performance in the early
design stage. We show some experimental results on an actual signal processor
to demonstrate the accuracy of estimation and the usefulness of this method in
architecture design.
-
8A.2 A Constructive Method for Data Path Area Estimation During High-Level VLSI
Synthesis
- V. Natesan, Anurag Gupta, Srinivas Katkoori, Dinesh Bhatia, Ranga Vemuri
In this paper we present a fast and computationally efficient deterministic
method for estimating the area of a Register Transfer Level datapath obtained
during high level VLSI synthesis. The estimation makes use of a RT level
netlist along with a pre-synthesized library of RT level components. The
layout area is estimated using a quadratic programming based framework to get
a quick module allocation and generating a topological floorplan which is then
followed by heuristic algorithms for mapping RTL modules and their
interconnections on a standard cell based layout design style. Experiments on
a suite of benchmark examples show promising results with reliable accuracy.
-
8A.3 RT Level Power Analysis
- Jianwen Zhu, Poonam Agrawal, Daniel D. Gajski
Elevating power estimation to the architectural and behavioral levels is
essential for design exploration beyond the logic level. In contrast with purely
statistical approaches, an analytical model is presented to estimate the power
consumption in the datapath and controller for a given RT level design.
Experimental results show that an order-of-magnitude speed-up over low level
tools as well as satisfactory accuracy can be achieved. This work can also serve
as the basis for a behavioral level estimation tool.
-
8A.4 Statistical Design of Macro-models For RT-level Power Evaluation
- Qing Wu, Chihshun Ding, Chengtah Hsieh, Massoud Pedram
This paper introduces the notion of cycle-accurate macro-models for RT-level
power evaluation. These macro-models provide us with the capability to estimate
the circuit power dissipation cycle by cycle at RT-level without the need to
invoke low level simulations. The statistical framework allows us to compute
the error interval for the predicted value from the user specified confidence
level. The proposed macro-model generation strategy has been applied to a
number of RT-level blocks and detailed results and comparisons are provided.
-
8B.1 AND/OR Reasoning Graphs for Determining Prime Implicants in
Multi-Level Combinational Networks
- Dominik Stoffel, Wolfgang Kunz, Stefan Gerber
This paper presents a technique to determine prime implicants in multi-level
combinational networks. The method is based on a graph representation of
Boolean functions called AND/OR reasoning graphs. This representation follows
from a search strategy to solve the satisfiability problem that is radically
different from conventional search for this purpose (such as exhaustive
simulation, backtracking, BDDs). The paper shows how to build AND/OR reasoning
graphs for arbitrary combinational circuits and proves basic theoretical
properties of the graphs. It will be demonstrated that AND/OR reasoning graphs
allow us to naturally extend basic notions of two-level switching circuit
theory to multi-level circuits. In particular, the notions of prime implicants
and permissible prime implicants are defined for multi-level circuits and it
is proved that AND/OR reasoning graphs represent all these implicants.
Experimental results are shown for PLA factorization.
-
8B.2 Efficient Synthesis of AND/XOR Networks
- Yibin Ye, Kaushik Roy
A new graph-based synthesis method for general Exclusive Sum-of-Product forms
(ESOP) is presented in this paper. Previous research has largely concentrated
on a class of ESOP's, the Canonical Restricted Fixed/Mixed Polarity Reed-Muller
form, also known as Generalized Reed-Muller (GRM) form. However, for many
functions, the minimum GRM can be much worse than the ESOP. We have defined a
Shared Multiple Rooted XOR-based Decomposition Diagram (XORDD) to represent
functions with multiple outputs. By iteratively applying transformations and
reductions, we obtain a compact XORDD which gives a minimized ESOP. Our method
can synthesize larger circuits than previously possible. The compact ESOP
representation provides a form that is easier to synthesize for XOR heavy
multi-level circuit, such as arithmetic functions. The method successfully
minimized large functions with multiple outputs. Results are also compared to
the minimized SOP's obtained from ESPRESSO. Experimental results show that for
many circuits ESOP's have considerably more compact form than SOP's.
-
8B.3 An Optimization of AND-OR-EXOR Three-level Networks
- Debatosh Debnath, Tsutomu Sasao
In this paper, we present a design method for AND-OR-EXOR three-level networks,
where a single two-input EXOR gate is used. The network realizes an
exclusive-OR of two sum-of-products expressions (EX-SOP), where the two
sum-of-products expressions (SOP) cannot share products. The problem is to
minimize the total number of products in the two SOPs. We introduced the
u-equivalence of logic functions to develop minimization algorithms for EX-SOPs
with up to five variables. We minimized all the representative functions of
NP-equivalence classes for up to five variables and found that five-variable
functions require up to 9 products in minimum EX-SOPs. For n-variable functions,
minimum EX-SOPs require at most 9 x 2^(n-5) (n >= 6) products. This upper
bound is smaller than 2^(n-1), the upper bound for the conventional
sum-of-products expressions. Index Terms - Three-level network, AND-EXOR,
logic minimization, spectral method, NP-equivalence, u-equivalence, coordinate
representations, complexity.
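
The two upper bounds quoted in this abstract can be compared directly. The
short sketch below reads them as 9 x 2^(n-5) for minimum EX-SOPs (n >= 6) and
2^(n-1) for conventional sum-of-products expressions; the function names are
illustrative.

```python
def exsop_upper_bound(n):
    """Upper bound on products in a minimum EX-SOP, stated for n >= 6."""
    if n < 6:
        raise ValueError("the bound 9 * 2**(n - 5) is stated for n >= 6")
    return 9 * 2 ** (n - 5)

def sop_upper_bound(n):
    """Upper bound on products in a conventional SOP."""
    return 2 ** (n - 1)

if __name__ == "__main__":
    for n in range(6, 11):
        print(n, exsop_upper_bound(n), sop_upper_bound(n))
```

Since 9 x 2^(n-5) = (9/16) x 2^(n-1), the EX-SOP bound is always 9/16 of the
SOP bound, for example 18 versus 32 products at n = 6.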
-
8B.4 A New Description of CMOS Circuits at Switch-Level
- Xunwei Wu, Massoud Pedram
After analyzing the limitations of the traditional description of CMOS circuits
at the gate level, this paper introduces the notions of switching and signal
variables for describing the switching states of MOS transistors and signals in
CMOS circuits, respectively. Two connection operations for describing the
interaction between MOS transistors and signals and a new description for CMOS
circuits at the switch level are presented. This new description can be used
to express the functional relationship between inputs and the output at the
switch level. It can also be used to describe the circuit structure composed
of various transistor switches. Based on the new description, the design of
CMOS circuits at switch level can be efficiently realized. It is expected that
this will provide a basis for techniques for analyzing and optimizing delay and
power dissipation of CMOS circuits.
-
8C.1 A 2-Dimensional Transistor Placement for Cell Synthesis
- Shunji Saika, Masahiro Fukui, Noriko Shinomiya, Toshira Akino
This paper proposes a transistor placement algorithm to generate standard
cell layout in a two-dimensional placement style that is not restricted
to row-based transistor placement. The cost function constructed for
transistor placement optimization is able to optimize wirings directly
and diffusion sharing indirectly but sufficiently. This transistor
placement algorithm, applied to several standard cells, has demonstrated
the capability to generate a nearly optimal two-dimensional placement
that is comparable to manually designed placement.
-
8C.2 DP-Gen: A Datapath Generator for Multiple-FPGA Applications
- Wen-Jong Fang, Allen C.-H. Wu, Ti-Yen Yen, Tsair-Chin Lin
This paper presents a datapath generator for multiple-FPGA applications. This
datapath generator is able to generate complex datapath designs described in
HDLs. Our datapath generator uses a novel synthesis and partitioning approach
which bridges the gap between RTL/logic synthesis and physical partitioning to
fully exploit design structural hierarchy for multiple-FPGA implementations.
Experiments on a number of benchmarking circuits and industry designs
demonstrate that the generator can effectively and efficiently produce
high-density multiple-FPGA datapaths.
-
8C.3 A Simultaneous Placement and Global Routing Algorithm with Path
Length Constraints for Transport-Processing FPGAs
- Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki
In layout design of transport-processing FPGAs, it is required not only that
routing congestion be kept small but also that circuits implemented on them
operate at a higher frequency. This paper extends the previously proposed simultaneous
placement and global routing algorithm for transport-processing FPGAs whose
objective is to minimize routing congestion and proposes a new algorithm in
which the length of each critical signal path (path length) is kept within a
specified upper bound imposed on it (path length constraint). The algorithm is
based on hierarchical bipartitioning of layout regions and LUT (LookUp Table)
sets to be placed. Each bipartitioning procedure consists of three phases:
(0) estimation of path lengths, (1) bipartitioning of a set of terminals, and
(2) bipartitioning of a set of LUTs. After searching the paths with tighter
path length constraints by estimating path lengths in (0), (1) and (2) are
executed so that their path lengths are reduced with higher priority and thus
path length constraints are not violated. The algorithm has been implemented,
applied to transport-processing circuits, and compared with conventional
approaches. The results demonstrate that the algorithm resolves path length
constraints for 11 out of 13 circuits, though it increases routing congestion
by an average of 20%. After detailed routing, it achieves 100% routing for all
the circuits and decreases circuit delay by an average of 23%.
-
8C.4 Not Necessarily More Switches More Routability
- Yu-Liang Wu, Douglas Chang, Malgorzata Marek-Sadowska, Shuji Tsukiyama
It has been observed experimentally that the mapping of global to detailed
routing in conventional FPGA routing architecture (2D array) yields
unpredictable results. In [8,10,13], a different class of FPGA structures
called Greedy Routing Architectures (GRAs), where a locally optimal switch box
routing can be extended to an optimal entire chip routing, were investigated.
It was shown that GRAs have good mapping properties. An H-tree GRA [10] with
W^2+2W switches per switch box (SpSB) and a 2D array GRA [13] with 4W^2+2W
SpSB were proposed (W is the number of tracks in each switch box). Here, we
continue this work by introducing an H-tree GRA with W^2/2+2W SpSB and a 2D
array GRA with 3.5W^2+2W SpSB. These new GRAs have the same good mapping
properties but use fewer switches. We also show a class of FPGA architectures
in which the mapping problem remains NP-complete, even with 6(W-1)^2 + 6W^2
SpSB. This is close to the maximum number of SpSB, which is 6W^2.
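
To get a feel for the savings, the sketch below tabulates the switches per
switch box (SpSB) quoted in this abstract for a given number of tracks W. It
assumes the flattened exponents read as W squared (for example W^2+2W), and
the function name and dictionary keys are illustrative labels.

```python
def switches_per_switch_box(W):
    """Return the SpSB counts quoted in the abstract for W tracks."""
    return {
        "H-tree GRA [10]": W * W + 2 * W,
        "H-tree GRA (new)": W * W / 2 + 2 * W,
        "2D array GRA [13]": 4 * W * W + 2 * W,
        "2D array GRA (new)": 3.5 * W * W + 2 * W,
        "maximum": 6 * W * W,
    }

if __name__ == "__main__":
    for name, count in switches_per_switch_box(10).items():
        print(f"{name}: {count}")
```

With W = 10, the new H-tree GRA needs 70 switches instead of 120, and the new
2D array GRA needs 370 instead of 420, against a maximum of 600.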
The SEMATECH Chip Hierarchical Design System - new paradigms for deep
submicron design
Invited Talk: Greg Ledenbach
Panel : EDA Standardization including CHDS
Moderator : Hitoshi Yoshizawa, NEC
Organizer: Hisakazu Edamatsu, Matsushita Electric Industrial
Panel: The Role of Design Standardization in future complex design
-
9A.1 On the Control-subroutine Implementation of Subprogram Synthesis
- Cheng-Tsung Hwang, Hsiao-Cheng Weng, Yu-Chin Hsu, Mike Tien-Chien Lee
In this paper, synthesis of VHDL procedures and functions is studied from the
VHDL transformation point of view. Among all the proposed methods, inline
expansion and module can be integrated into a VHDL synthesis system by a
source-to-source transformation, while a control-subroutine approach requires
additional work at the higher level synthesis phases before it can link to a
logic synthesis tool. Many optimization possibilities are explored during
the process. We also present various generations of the control-subroutine
approach, including the synthesis of recursive programs, a behavioral
partitioning methodology that divides the controller into several communicating
state machines, and a methodology that mixes the execution of subprograms.
Our study shows that the combination of these approaches is flexible enough to
be adapted efficiently to various applications.
-
9A.2 A Procedure for Software Synthesis from VHDL Models
- Venkatram Krishnaswamy, Rajesh K. Gupta, Prithviraj Banerjee
In this paper we address the problem of software generation from a Hardware
Description Language (HDL). In particular, we examine the issues involved in
translating VHDL into C or C++ for use in system simulation and cosynthesis.
Because of the concurrency supported by VHDL, and a notion of timing behavior,
care must be taken to ensure behavioral correctness of the generated software.
The issues involved will be shown to be different in each of the application
areas. The ideas set forth here have been used in an efficient VHDL simulator
designed to execute on multi-processor systems. Results are presented for
simulation on uniprocessor as well as multiprocessor systems.
-
9A.3 Built-in Chaining: Introducing Complex Components into Architectural
Synthesis
- Peter Marwedel, Birger Landwehr, Rainer Dömer
In this paper, we extend the set of library components which are usually
considered in architectural synthesis by components with built-in chaining.
For such components, the result of some internally computed arithmetic function
is made available as an argument to some other function through a local
connection. These components can be used to implement chaining in a data-path
in a single component. Components with built-in chaining are combinatorial
circuits. They correspond to 'complex gates' in logic synthesis. If compared to
implementations with several components, components with built-in chaining
usually provide a denser layout, reduced power consumption, and a shorter delay
time. Multiplier/accumulators are the most prominent example of such components.
Such components require new approaches for library mapping in architectural
synthesis. In this paper, we describe an IP-based approach taken in our OSCAR
synthesis system.
-
9B.1 BDD-based Logic Partitioning for Sequential Circuits
- Ming-Ter Kuo, Yifeng Wang, Chung-Kuan Cheng, Masahiro Fujita
This paper presents a BDD-based approach to perform logic partitioning for
sequential circuits. We use a sequential machine to model a circuit and
represent the machine by its transition relation. A heuristic algorithm based
on the BDD representation of the transition relation is proposed to partition
the sequential machine with a minimum number of input/output pins. Using BDDs
and their operations, we have developed an efficient method to iteratively
improve a partition. Experimental results show that our sequential logic
partitioning algorithm significantly outperforms partitioning algorithms at
the netlist level.
-
9B.2 Cube-Embedding Based State Encoding for Low Power Design
- De-Sheng Chen, Majid Sarrafzadeh
In this paper we consider the problem of minimizing power consumption of a
sequential circuit using low power state encoding. One of the previously
published results is based on recursive matching. In general, a matched pair
can be considered as a 1-cube being embedded in a hypercube. We generalize this
idea of 1-cube embedding and propose a new encoding algorithm based on r-cube
embedding. We then present an efficient 2-cube embedding based state encoding
approach for low power design. It considers both Hamming distance and the
complexity of logic function (by estimation). Experimental results show that
this approach is competitive with other existing techniques.
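
As a rough illustration of the two notions used above, the sketch below
computes a Hamming-distance-weighted switching cost for a state encoding and
checks whether a group of codes forms an r-cube. It is not the algorithm of
the paper: the cost function is a commonly used stand-in, and the function
names, the example machine and the transition counts are all hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two state codes differ."""
    return bin(a ^ b).count("1")


def switching_cost(encoding, transition_counts):
    """Sum over transitions of (how often it occurs) * (bits that toggle)."""
    return sum(
        count * hamming(encoding[src], encoding[dst])
        for (src, dst), count in transition_counts.items()
    )


def forms_cube(codes):
    """True if the codes are exactly the 2**r points of an r-cube."""
    codes = set(codes)
    if not codes:
        return False
    first = next(iter(codes))
    varying = 0
    for code in codes:
        varying |= code ^ first  # collect bit positions that are not fixed
    return len(codes) == 1 << bin(varying).count("1")


# Hypothetical 4-state machine with assumed transition counts.
counts = {("S0", "S1"): 5, ("S1", "S2"): 3, ("S2", "S0"): 1, ("S0", "S3"): 1}
encoding_a = {"S0": 0b00, "S1": 0b11, "S2": 0b01, "S3": 0b10}
encoding_b = {"S0": 0b00, "S1": 0b01, "S2": 0b11, "S3": 0b10}
cost_a = switching_cost(encoding_a, counts)
cost_b = switching_cost(encoding_b, counts)
print(cost_a, cost_b)
```

In this toy example encoding_b places the most frequent transition S0 to S1
on adjacent codes, that is, the pair is embedded as a 1-cube, and its cost is
lower than that of encoding_a.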
-
9B.3 On Synthesis of Speed-Independent Circuits at STG Level
- Kuan-Jen Lin, Chi-Wen Kuo
Synthesizing hazard-free asynchronous circuits directly at Signal Transition
Graph (STG) level has been shown to need significantly less CPU time than
approaches at the state graph level [10, 16, 4]. However, all previous methods at
STG level were based on sufficient conditions only. Hence, the synthesized
circuits are generally inferior, due to the incomplete transformation.
In this paper, we present a new Characteristic Graph (CG) to encapsulate
all feasible solutions of the original STG in reduced size, which compares
favorably with the state graph approach. The requirements of speed independent
circuits can then be completely transformed into the CG. Furthermore, we derive
a necessary and sufficient condition for speed independent implementation
based on a predefined general circuit model, which has not yet been reported.
With CGs and this condition, we develop a heuristic synthesis algorithm which
derives solutions similar to the state-graph approach while requiring
significantly less CPU time.
-
9C.1 A Mapping from Sequence-Pair to Rectangular Dissection
- Hiroshi Murata, Kunihiro Fujiyoshi, Tomomi Watanabe, Yoji Kajitani
A fundamental issue in floorplanning is how to represent candidate solutions.
Recently, a representation called sequence-pair was proposed [1]. Seq-pair is
general enough to represent an area-minimum placement, and also efficient
because it does not represent any overlapping placement. However, seq-pair is
not expressive enough since channels are not represented. This paper gives a
mapping from seq-pair to rectangular dissection, which represents channels by
line segments. Consequently, candidate arrangements of modules and channels
are successfully represented with the generality and the efficiency inherited
from the seq-pair.
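
For readers unfamiliar with the representation, the following sketch decodes
a sequence-pair into a non-overlapping placement using the usual rule: a
module that precedes another in both sequences lies to its left, and a module
that follows another in the first sequence but precedes it in the second lies
below it. This is only the basic seq-pair decoding, not the mapping to
rectangular dissection proposed in the paper; the function name and the
module sizes are hypothetical.

```python
def pack(gamma_plus, gamma_minus, sizes):
    """Return lower-left corners and the bounding box for a sequence-pair.

    sizes maps each module name to a (width, height) tuple.
    """
    pos_plus = {m: i for i, m in enumerate(gamma_plus)}
    x, y = {}, {}
    placed = []
    # Every module left of or below b precedes b in gamma_minus,
    # so visiting modules in that order places predecessors first.
    for b in gamma_minus:
        left = [a for a in placed if pos_plus[a] < pos_plus[b]]
        below = [a for a in placed if pos_plus[a] > pos_plus[b]]
        x[b] = max((x[a] + sizes[a][0] for a in left), default=0)
        y[b] = max((y[a] + sizes[a][1] for a in below), default=0)
        placed.append(b)
    width = max(x[m] + sizes[m][0] for m in placed)
    height = max(y[m] + sizes[m][1] for m in placed)
    positions = {m: (x[m], y[m]) for m in placed}
    return positions, (width, height)


# Three hypothetical modules given as (width, height).
sizes = {"a": (2, 1), "b": (1, 2), "c": (3, 1)}
positions, bounding_box = pack(["a", "b", "c"], ["b", "a", "c"], sizes)
print(positions, bounding_box)
```

The decoded placement packs the modules without overlap but says nothing
about the routing channels between them, which is the gap the abstract
points out.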
-
9C.2 Solving Constrained Via Minimization by Compact Linear Programming
- C.-J. Richard Shi
Via minimization is an important problem in integrated circuit layout and
printed circuit board design. In this paper, a linear (non-integral) programming
approach to two-layer constrained via minimization (CVM) is presented. The
approach finds optimum solutions for routings containing no more than three-way
splits, and guarantees provably good results for the general case. Most
importantly, the size of the linear programming formulation is polynomial in terms
of the size of the CVM problem. The significance of our work lies in three
aspects. First, since linear programming can be solved in polynomial time, our
work provides, for the first time, a mathematical programming solution
with computational efficiency comparable to known combinatorial CVM algorithms.
Second, our compact linear programming approach is provably good and natural
for general CVM, while previous restricted CVM algorithms are difficult to
extend to the general case. Third, our approach can handle additional
constraints in a unified manner, and thus provides an efficient method for
performance-driven layer assignment. Our approach is based on some new
graph-theoretic and polyhedron-combinatorial results presented in this paper
on the structure of the CVM problem.
-
9C.3 Efficient Routability Checking for Global Wires in Planar Layouts
- Naoyuki Iso, Yasushi Kawaguchi, Tomio Hirata
In VLSI and printed wiring board design, the routing process usually consists of
two stages: global routing and detailed routing. Routability checking decides
whether the global wires can be transformed into detailed ones. In this paper,
we propose two graphs, the capacity checking graph and the initial flow graph,
for efficient routability checking.
-
9C.4 Topological Routing Path Search Algorithm with Incremental Routability
Test
- Toshiyuki Hama, Hiroaki Etoh
This article describes a topological routing path search algorithm embedded in
our auto-router for printed circuit boards. The algorithm searches for a
topological path that is guaranteed to be transformable into a physical wire
satisfying design rules. We propose a method for incrementally verifying design
rules during topological path search in a graph based on constrained Delaunay
triangulation, and describe several improvements to the routing path search
algorithm that remedy the overhead of the routability test and avoid
combinatorial explosion.
-
VHDL Analog and Mixed-Signal Extensions Through Examples
- Alain Vachoux
VHDL 1076.1 denotes an effort to enhance the IEEE VHDL 1076 standard with
analog and mixed-signal capabilities. At the time of this writing (November
1995), the development of the extensions is nearing completion. An IEEE ballot
to adopt the extensions as a new IEEE standard should happen in Q1 97. This
tutorial provides an overview of the proposed extensions through a number of
simple, but characteristic, examples.
|