ABSTRACTS ASPDAC 97


Keynote Address I:

Microelectronics Evolution Brings Real Multimedia Era
Tatsuo Izawa, NTT Science and Core Technology Laboratory Group, Japan

Attractive services and content will be realized through the development of software technology, and reasonable prices will be realized through the further development of microelectronics hardware technology. For example, higher-speed and lower-power technologies increase the number of signals that can be transmitted simultaneously over a single channel, and as a result the service cost decreases.


Session 1A : Rapid Prototyping

1A.1 A Co-evaluation of FPGA Architectures and the CAD System for Telecommunication
Tsunemasa Hayashi, Atsushi Takahara, Ken-nosuke Fukami

We propose an FPGA architecture for next-generation B-ISDN telecommunications systems. Such a system requires an FPGA that can implement a circuit of more than 10K gates and run at a clock rate of 80 MHz. While FPGA architecture has been discussed in terms of its circuit structure, we consider the circuit structure of the FPGA together with its CAD tools. We evaluate several FPGA logic-element structures with a technology mapping method. Our experiments show that a multiplexer-based logic element is suitable for implementing such a high-speed circuit using the BDD-based technology mapping method.

1A.2 A Rapid Prototyping Method for Top-Down Design of System-on-Chip Devices Using LPGAs
Fumio Suzuki, Katsuhiko Seo, Hisao Koizumi, Masanobu Hiramine, Hiroto Yasuura, Kazuo Okino, Zvi Or-Bach

This paper proposes two methods for rapid prototyping in top-down System-on-Chip (SOC) design using laser programmable gate arrays (LPGAs). The first is an SOC design flow consisting of four steps: concept-making, virtual-world prototyping, synthesis, and real-world prototyping. The steps can be undertaken individually or in tandem, and provisional product models are transformed from upstream (concept-making) to downstream (synthesis) either automatically or semi-automatically. This method differs from ordinary rapid prototyping methods in that design evaluation is shifted further upstream. The SOC device is manufactured early; the steps follow concept-making, virtual-world prototyping, synthesis and real-world prototyping to synthesis). This method allows the device to be evaluated in the actual operating environment. The second method we propose is based on LPGAs and the use of a real-time production fabrication system (FPFs). With these design methods and this environment, we can achieve the shortest time to market for products that offer exciting audio and video capabilities, while giving designers the flexibility they need to rapidly produce innovative and creative products. This paper describes the application of these methods to the development of video signal processors for LCD projectors, demonstrating their efficiency for design tuning and performance optimization.

1A.3 Performance Test of Viterbi Decoder for Wideband CDMA System
Jang-Hyun Park, Yeo-Chul Rho

This paper describes the design, implementation, and performance test of a Serial Viterbi Decoder (SVD) using VHDL and FPGAs. The decoding scheme assumes that the transmitted symbols were coded with a K=9, 32 Kbps, rate-1/2 convolutional encoder with generator functions g0=(753)8 and g1=(561)8 (octal), as defined in the JTC TAG-7 W-CDMA PC8 standard. The main algorithm, except for the memories, is implemented in two Altera FLEX81500 FPGAs. Performance test results with 3 dB Gaussian noise show that the SVD works well.
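As an illustration of the encoder that such a decoder assumes, the following is a minimal Python sketch of a K=9, rate-1/2 convolutional encoder using the octal generators 753 and 561 quoted in the abstract. The bit-ordering convention (newest input bit in the most significant position of the shift register) and the function names are assumptions made for this sketch; the standard's exact tap ordering and the authors' VHDL design are not given here.

```python
# Sketch of a rate-1/2, constraint-length-9 convolutional encoder.
# Generators come from the abstract; the tap/bit ordering is an assumption.
K = 9
G0 = 0o753  # binary 111101011
G1 = 0o561  # binary 101110001


def parity(x: int) -> int:
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits):
    """Encode a list of 0/1 bits; returns one (c0, c1) pair per input bit."""
    state = 0  # K-bit shift register, newest bit in the MSB (assumed convention)
    out = []
    for b in bits:
        state = ((b << (K - 1)) | (state >> 1)) & ((1 << K) - 1)
        out.append((parity(state & G0), parity(state & G1)))
    return out


if __name__ == "__main__":
    # The impulse response reads out the generator taps, MSB first.
    print(conv_encode([1] + [0] * (K - 1)))
```

With this convention, feeding a single 1 followed by zeros reproduces the two generator bit patterns on the two output streams, which is a quick sanity check for the tap masks.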


Session 1B : Delay Estimation and Optimization

1B.1 Delay Estimation on Optimization of Logic Circuits: A Survey
Masahiro Fujita, Rajeev Murgai

Logic synthesis has two stages of optimization: technology-independent and technology-dependent. This paper surveys state-of-the-art methods for estimating and optimizing the delays of logic circuits at the technology-independent stage. Although final delays after technology mapping cannot be predicted completely at this stage, reasonably accurate estimation techniques exist. Final delays can be reduced with optimization techniques that use such estimation.

1B.2 Delay Estimation for Technology Independent Synthesis
Yutaka Tamiya

This paper proposes "path mapping", a method of delay estimation for technology-independent combinational circuits. Path mapping provides fast and accurate delay estimation by sharing ideas with tree-covering-based technology mapping. First, path mapping performs minimum-delay technology mapping for all paths in the circuit. Then, it finds the most critical path among them. Finally, it reports the delay of that path as the circuit delay. Experimental results show that path mapping estimates circuit delay more accurately than the unit-delay model and runs much faster than the technology mapper.

1B.3 Performance and Reliability Driven Clock Scheduling of Sequential Logic Circuits
Atsushi Takahashi, Yoji Kajitani

It is known that the clock period of a sequential circuit can be shorter than the maximum signal delay between registers if the clock arrival time at each register is controlled. We propose an algorithm to find the minimum clock period of a circuit whose signal propagation delays are given. Experimental results on LGSynth93 benchmarks show that this technique reduces the clock period by up to about 16% compared with conventional methods based on the maximum signal delay. An application of this technique to improving the reliability of circuits is also considered.
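The idea of shortening the clock period by controlling clock arrival times can be sketched with the textbook difference-constraint formulation: for each register-to-register path from i to j with minimum delay dmin and maximum delay dmax, require s_i + dmax <= s_j + T (setup) and s_i + dmin >= s_j (hold), where s is the clock arrival time and T the period. The Python sketch below binary-searches T and tests feasibility with Bellman-Ford. It is an assumption-laden illustration (zero setup and hold margins, hypothetical function names), not the algorithm proposed in the paper.

```python
def feasible(n, paths, T):
    """Check whether clock arrival times exist for period T.

    paths: list of (i, j, dmin, dmax) register-to-register delays.
    Setup: s_i - s_j <= T - dmax   (edge j -> i, weight T - dmax)
    Hold:  s_j - s_i <= dmin       (edge i -> j, weight dmin)
    """
    edges = []
    for i, j, dmin, dmax in paths:
        edges.append((j, i, T - dmax))
        edges.append((i, j, dmin))
    # Starting all distances at 0 models a virtual source joined to every node.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True   # constraints are satisfiable
    return False          # still relaxing after n passes: negative cycle


def min_clock_period(n, paths, tol=1e-6):
    """Binary search for the smallest feasible period (within tol)."""
    lo, hi = 0.0, max(p[3] for p in paths)  # zero-skew period is always feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(n, paths, mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Two registers in a ring: delay 10 one way, 2 the other.
    ring = [(0, 1, 10, 10), (1, 0, 2, 2)]
    print(min_clock_period(2, ring))
```

In the two-register ring, the maximum signal delay is 10, but the setup constraints around the cycle only require 2T >= 12, so the search converges to a period of about 6.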


Session 1C : Circuit Partitioning

1C.1 CBLO: A Clustering Based Linear Ordering Algorithm for Netlist Partitioning
K.S. Seong, C.-M. Yung

In this paper, we propose a clustering-based linear ordering algorithm that consists of global ordering and local ordering. In the global ordering, the algorithm forms clusters from the n given vertices and orders the clusters. In the local ordering, the elements in each cluster are linearly ordered. The linear order thus produced is used to obtain an optimal k-way partitioning based on the scaled cost objective function. Experiments with 11 benchmark circuits for k-way (2 <= k <= 10) partitioning show that the proposed algorithm yields an average improvement of 10.6% over MELO for k-way scaled cost partitioning.
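How a linear order can be turned into a k-way partition may be sketched with dynamic programming over contiguous segments of the order. The sketch assumes the commonly cited scaled cost definition, (1 / (n (k - 1))) times the sum over parts of (cut edges of the part / size of the part), and simplifies nets to two-pin edges. Both the simplification and the function names are assumptions for illustration; the paper's own procedure is not described in the abstract.

```python
def best_split(order, edges, k):
    """Split a linear order into k contiguous parts minimizing scaled cost.

    order: list of vertices in linear order; edges: list of (u, v) pairs.
    Returns (scaled_cost, parts). Requires 2 <= k <= len(order).
    """
    n = len(order)
    pos = {v: i for i, v in enumerate(order)}

    def cut(i, j):
        # Number of edges with exactly one endpoint in order[i:j].
        return sum((i <= pos[u] < j) != (i <= pos[v] < j) for u, v in edges)

    inf = float("inf")
    # best[m][j]: minimum sum of cut/size over m parts covering order[:j].
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cut(i, j) / (j - i)
                if c < best[m][j]:
                    best[m][j], back[m][j] = c, i

    # Walk the back-pointers to recover the parts.
    parts, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        parts.append(order[i:j])
        j = i
    parts.reverse()
    return best[k][n] / (n * (k - 1)), parts


if __name__ == "__main__":
    chain = [("a", "b"), ("b", "c"), ("c", "d")]
    print(best_split(list("abcd"), chain, 2))
```

For a four-vertex chain and k=2, the balanced split has the lowest scaled cost, since an unbalanced split divides the same single cut edge by a part of size one.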

1C.2 Design Driven Partitioning
Dirk Behrens, Robert Tolkiehn, Erich Barke

A new approach for partitioning VLSI digital integrated circuits is presented. In contrast to known approaches, which use only topological information, the presented method also exploits specific information about design modules and higher-level design structure. Based on this knowledge, the design-driven procedure creates a cluster structure that incorporates the inherent design relationships (e.g. signal flow, logic blocks) in the best way possible. When followed by standard iterative improvement algorithms, it produces partitions that outperform many previously published partitioning approaches. Because of its linear time complexity, the presented clustering strategy is able to handle very large designs. Due to its modular structure, it can easily be extended to incorporate special design features or target architectures such as emulation systems.

1C.3 A RTL Partitioning Method with a Fast Min-Cut Improvement Algorithm
Kenichi Kawaguchi, Chie Iwasaki, Michiaki Muraoka

A design flow with register-transfer-level (RTL) partitioning and an RTL partitioning algorithm for efficient logic synthesis and layout are described in this paper. By changing the parameter of the partitioning optimization dynamically, the algorithm improves the interconnection cost in a short CPU time. Experimental results on large circuits show that the algorithm partitions circuits with a large number of RTL components in a tenth to a hundredth of the time taken by conventional partitioning.

1C.4 Acceleration of Mincut Partitioning using Hardware CAD Accelerator TP5000
Masahiro Sano, Shintaro Shimogori, Fumiyasu Hirose

This paper presents a new data pipelining approach for accelerating mincut partitioning on a parallel computer. We choose the hardware CAD accelerator TP5000 to implement our approach. Using 10 processors in the TP5000, we obtain a speedup of 20 to 25 times over a SPARCstation-10.


Session 1D : Invited Talk

Computing Brokerage and Its Applications in VLSI Design
Youn-Long Lin

With Internet access available to virtually everyone in this community, it is interesting to investigate how the Internet will affect the future of VLSI design and CAD. We will describe an experimental WWW-based computing broker. Theoretically, the broker is capable of providing every user with access to any hardware platform and any software over the Internet. It makes pay-per-use of both hardware and software resources possible. It also automatically manages multiple resources, ranging from a few seats within an organization to thousands of seats anywhere with Internet access. This new model of resource usage will have a significant impact on users, software developers, and computer vendors. Users no longer have to own or maintain expensive computers and software tools before they can start their projects. They will have more flexibility in allocating resources to meet the project schedule. They will also be able to access the latest technology at lower overall cost. Tool developers and computer vendors will have a broader customer base with very little marketing and field support effort. This new model will also provide a better chance for new tools and new platforms.
Key Words: Computing Brokerage; WWW; Internet; CAD; VLSI Design; Pay-per-Use


Session 2A : Application Specific Design

2A.1 A Programmable Application-Specific VLSI Architecture and Implementation for Speech Word-Recognizer
An-Nan Suen, Jhing-Fa Wang, Tswen-Duh Wang

In this paper, an efficient and flexible VLSI architecture and implementation for a voice word-recognizer processor are presented. In order to achieve a flexible and efficient VLSI realization, we use a programmable-with-specific-core design strategy, which incorporates the best aspects of both programmable and application-specific signal processors to achieve high speed, high accuracy, and efficient hardware realization for the word-recognizer. The single chip is fabricated in 0.8 um double-metal CMOS technology after physical design and circuit verification. The chip can process 40 MHz sampled data and contains about 70000 transistors, which occupy an area of 0.62 x 0.60 cm2.

2A.2 A High Performance FIR Filter Dedicated to Digital Video Transmission
Shun Morikawa, Keisuke Okada, Isao Shirakawa, Sumitaka Takeuchi

A digital filter is one of the fundamental elements in digital video transmission, and the multiplier is the key factor that determines the operation speed and silicon area of the filter. Even though the coefficients of the filter are desired to be programmable, it is possible to change the coefficients during the vertical fly-back interval of television receivers. This allows the coefficients to be preloaded into the filter so that each coefficient can be treated as a constant during the filtering operation. Motivated by such functionalities, a novel multiplier together with an FIR filter architecture is described, which has been designed in a 0.5 um double-metal CMOS technology.

2A.3 An Efficient Hierarchical Clustering Method for the Multiple Constant Multiplication Problem
Akihiro Matsuura, Mitsuteru Yukishita, Akira Nagoya

In this paper, we propose an efficient solution for the Multiple Constant Multiplication (MCM) problem. The method exploits common subexpressions among constants based on hierarchical clustering and reduces the number of shifts, additions, and subtractions. The algorithm defines appropriate weights that indicate the operation priorities and selects the common subexpressions that result in the least number of local operations. It can also be extended to various high-level synthesis tasks such as arbitrary linear transforms. Experimental results show the effectiveness of our method.

2A.4 Structural Approach for Performance Driven ECC Circuit Synthesis
Chau-Chin Su, Kathy Y. Chen, Shyh-Jye Jou

ECCGen is a logic synthesizer for error control coding circuits. It takes H matrices as inputs and produces circuit schematics in two steps: literal minimization and gate/pin assignment. Unlike conventional logic synthesis tools, it takes a structural approach to avoid the combinatorial explosion problem in Boolean function and/or truth table representations of ECC circuits. Moreover, the structural approach also significantly reduces the complexity of timing and area optimization when multiple-input exclusive-or gates are used. The test results show that ECCGen achieves a reduction of 57% in transistor count and 15% in delay time on thirteen industrial ECC circuits.
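To make concrete why an H matrix maps naturally onto exclusive-or structures, the Python sketch below computes a syndrome as one XOR tree per row of H. The Hamming(7,4) matrix used here is a standard textbook example chosen for illustration, not one of the industrial circuits from the paper, and the function name is hypothetical.

```python
from functools import reduce
from operator import xor

# Parity-check matrix of the Hamming(7,4) code (illustrative assumption):
# column p (1-based) is the binary representation of p, LSB in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(h_matrix, word):
    """Each syndrome bit is the XOR of the word bits selected by one row of H."""
    return [
        reduce(xor, (b for h, b in zip(row, word) if h), 0)
        for row in h_matrix
    ]


if __name__ == "__main__":
    received = [1, 1, 1, 0, 0, 1, 0]  # codeword 1110000 with position 6 flipped
    s = syndrome(H, received)
    print(s, "error position:", s[0] + 2 * s[1] + 4 * s[2])
```

Each row with several ones corresponds to a multiple-input XOR gate (or a tree of smaller ones), which is the kind of structure the abstract refers to.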


Session 2B : Power: Estimation and Synthesis

2B.1 Statistical Estimation of Combinational and Sequential CMOS Digital Circuit Activity Considering Uncertainty of Gate Delay
Tan-Li Chou, Kaushik Roy

While estimating glitches or spurious transitions is challenging due to signal correlations, the random behavior of logic gate delays makes the estimation problem even more difficult. In this paper, we present statistical estimation of signal activity at the internal and output nodes of combinational and sequential CMOS logic circuits considering uncertainty of gate delays. The methodology is based on stochastic models of logic signals and the probabilistic behavior of gate delays due to process variations, interconnect parasitics, etc. We propose a statistical technique for estimating average-case activity, which is flexible in adopting different delay models and variations. Experimental results show that the uncertainty of gate delays has a great impact on activity at individual nodes (more than 100%) as well as on total power dissipation (which can be overestimated by up to 65%).

2B.2 An Entropy Measure for Power Estimation of Boolean Functions
Chi-Hong Hwang, Allen Chung-Hao Wu

In this paper, we present a study on the relationship between entropy and the average power consumption of circuits generated from Boolean functions. Based on a general-delay model, an entropy-based formulation for power estimation is derived from a large set of experimental data. The study shows that the entropy measure provides an effective power estimate for single-output and fully-correlated multiple-output functions. The study also shows that if entropy is used as a power measure, the internal structure of a circuit must be considered in order to achieve accurate power estimates for non-correlated multiple-output functions. Experiments on a set of benchmarks demonstrate that combining entropy-based power measures with input-output correlation analyses of logic functions leads to a viable measure for high-level power estimation.

2B.3 An Enhanced Iterative Improvement Method for Evaluating the Maximum Number of Simultaneous Switching Gates for Combinational Circuits
Kai Zhang, Haruhiko Takase, Terumine Hayashi, Hidehiko Kita

This paper presents an enhanced iterative improvement method with multiple pins (EIIMP) to evaluate the maximum number of simultaneously switching gates. Although the iterative improvement method is a simple algorithm, it is powerful for this purpose. Keeping this advantage, we enhance it in two ways. The first is to change the values of multiple successive primary inputs at a time. The second is to rearrange primary inputs on the basis of closeness, which represents the number of overlapping gates between fan-out regions. Experiments on ISCAS benchmark circuits show that our method is effective.

2B.4 A Power Driven Two-Level Logic Optimizer
Jyh-Mou Tseng, Jing-Yang Jou

In this paper we present Boolean techniques for reducing the power consumption in two-level combinational circuits. The two-level logic optimizer performs logic minimization for low power, targeting static PLA, general logic gate, and dynamic PLA implementations. We modify the Espresso algorithm by adding heuristics that bias the logic minimization toward lower power dissipation. In our heuristics, signal probabilities and transition densities are two important parameters. The experimental results are promising.

2B.5 A Note on the Relationship Between Signal Probability and Switching Activity
Massoud Pedram, Qing Wu, Xunwei Wu

In current probability calculation algorithms for power estimation, switching activity ESW of a node is calculated from its signal probability p by the following simple relation: ESW = 2p(1-p). It is generally understood that this simple relationship holds under the temporal independence assumption for the node. This paper however shows that the above equation also gives the expected value of the transition activity in any sequence that satisfies the given signal probability (averaged over all such sequences). Therefore, this equation can be used to calculate the switching activity under more general conditions than previously thought.
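The averaging claim can be checked numerically for small cases. The Python sketch below enumerates every bit sequence of length n containing exactly k ones (so the signal probability is p = k/n) and averages the fraction of adjacent positions that differ. The normalization by the n - 1 adjacent pairs is an assumption of this sketch, since the abstract does not state the paper's exact definition; under it the average is 2p(1-p) * n/(n-1), which approaches 2p(1-p) as n grows.

```python
from fractions import Fraction
from itertools import combinations


def mean_toggle_rate(n, k):
    """Average toggle rate over all length-n sequences with exactly k ones.

    The toggle rate of one sequence is (number of adjacent pairs that
    differ) / (n - 1); this normalization is an assumption of the sketch.
    """
    total = Fraction(0)
    count = 0
    for ones in combinations(range(n), k):
        seq = [0] * n
        for i in ones:
            seq[i] = 1
        toggles = sum(seq[i] != seq[i + 1] for i in range(n - 1))
        total += Fraction(toggles, n - 1)
        count += 1
    return total / count


def closed_form(n, k):
    """2p(1-p) * n/(n-1) with p = k/n, written with exact fractions."""
    return Fraction(2 * k * (n - k), n * (n - 1))


if __name__ == "__main__":
    n, k = 12, 3
    p = Fraction(k, n)
    print(mean_toggle_rate(n, k), closed_form(n, k), 2 * p * (1 - p))
```

The exhaustive average matches the closed form exactly, and the gap to 2p(1-p) shrinks as the sequence length increases.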


Session 2C : Timing-Driven Layout

2C.1 Modeling and Layout Optimization of VLSI Devices and Interconnects in Deep Submicron Design
Jason Cong

This paper presents an overview of recent advances in modeling and layout optimization of devices and interconnects for high-performance VLSI circuit design in deep submicron technology. First, we review a number of interconnect and driver/gate delay models that are most useful for guiding layout optimization. Then, we summarize the available performance optimization techniques for VLSI device and interconnect layout, including driver and transistor sizing, transistor ordering, interconnect topology optimization, optimal wire sizing, optimal buffer placement, and simultaneous topology construction, buffer insertion, and buffer and wire sizing. The efficiency and impact of these techniques will be discussed in the tutorial.

2C.2 A New Layout-Driven Timing Model for Incremental Layout Optimization
Fang-Jou Liu, John Lillis, Chung-Kuan Cheng

In this paper we present a new layout-driven timing model based on Asymptotic Waveform Evaluation (AWE) for improved timing analysis during routing. Our model enables the bottom-up computation of interconnect tree moments and can be easily integrated with a global router. Such an integration achieves incremental layout optimization, i.e., timing analysis and routing are tightly coupled, with feedback between them. Achieving this incremental layout optimization through our timing model is the main contribution of this work.
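As a rough illustration of bottom-up moment computation on an interconnect tree, the Python sketch below computes only the first moment, which corresponds to the Elmore delay of an RC tree: subtree capacitances are accumulated from the leaves to the root, and each wire then contributes its resistance times its downstream capacitance. The data layout and function name are assumptions for this sketch; the paper's AWE-based model and its higher-order moments are not reproduced here.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay at every node of an RC tree.

    parent[v]: index of the parent of node v (root has -1); nodes are
               assumed to be numbered so that parent[v] < v.
    res[v]:    resistance of the wire from parent[v] to v
               (res[root] models the driver resistance).
    cap[v]:    capacitance lumped at node v.
    """
    n = len(parent)
    down = list(cap)
    # Bottom-up pass: accumulate the capacitance of each subtree.
    for v in range(n - 1, 0, -1):
        down[parent[v]] += down[v]
    delay = [0.0] * n
    # Top-down pass: add R * (downstream C) along the root-to-node path.
    for v in range(n):
        upstream = delay[parent[v]] if parent[v] >= 0 else 0.0
        delay[v] = upstream + res[v] * down[v]
    return delay


if __name__ == "__main__":
    # Driver (R=1) into node 0 (C=1), then a wire (R=2) to node 1 (C=3).
    print(elmore_delays([-1, 0], [1.0, 2.0], [1.0, 3.0]))
```

Because the subtree capacitances are cached per node, a local change to the tree only requires updating the values along one root-to-node path, which is what makes this style of computation attractive for incremental use inside a router.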

2C.3 Par-POPINS: A Timing-Driven Parallel Placement Method with the Elmore Delay Model for Row Based VLSIs
Tetsushi Koide, Mitsuhiro Ono, Shin'ichi Wakabayashi, Yutaka Nishimaru

In this paper, we present a parallel algorithm running on a shared-memory multi-processor workstation for timing-driven standard cell layout. The proposed algorithm is based on POPINS2.0 [13] and consists of three phases. First, we obtain an initial placement with a hierarchical timing-driven mincut placement algorithm. At the top level of the partitioning hierarchy, one step of bi-partitioning is performed by several processors, and at the lower levels, the partitionings of the regions in a level are performed in parallel. Next, in phase 2, iterative improvement of the sub-circuits that contain critical paths is performed by nonlinear programming. Parallel processing is realized by applying the nonlinear programming method to each sub-circuit in parallel. Finally, in phase 3, the placement is transformed to a row-based layout style by a timing-driven row assignment method. We have implemented the proposed method on a 4-CPU multi-processor workstation, and experimental results show that it is promising.


Session 2D : Invited Talk

Java™ in Electronic Design Automation
Pete Denyer, Jean Brouwers

Increasing design complexity and the need for multi-disciplinary / multi-national design collaboration are causing a paradigm shift in the EDA application environment. This shift is necessary in order that time-to-profit goals are met in increasingly compressed market windows. The envisioned paradigm shift is enabled through Sun's Java™ technology. This technology will significantly impact the development, deployment, use and support of Electronic Design Automation (EDA) applications. This paper will examine some of the factors influencing this forthcoming EDA revolution and review some of the challenges yet to be resolved.


Session 3A : Co-Design Experience

3A.1 Polling-based Real-time Software for MPEG2 System Protocol LSIs
Jiro Naganuma, Makoto Endo

This paper proposes polling-based real-time software for MPEG2 System protocol LSIs, a typical embedded real-time system on a chip, and demonstrates its performance and usefulness. The polling-based real-time software is designed and optimized by analyzing application-specific function requirements and deciding the scheduling intervals and execution cycles of each task. It requires neither hardware for multiple interrupt handling nor software for heavy context switching. The polling-based approach provides sufficient performance without any hardware or software overhead for a real-time application like the MPEG2 System protocol.

3A.2 Synthesis and Analysis of an Industrial Embedded Microcontroller
Ing-Jer Huang, Li-Rong Wang, Yu-Min Wang

This paper presents a case study of the synthesis and analysis of the industrial embedded microcontroller HT48100, using the hardware/software co-synthesis tool PIPER-II for microcontrollers/microprocessors. The synthesis tool accepts as input the instruction set architecture (behavioral) specification, and produces as outputs the pipelined RTL designs with their simulators and the reordering constraints that guide the assembler in generating code for the synthesized designs. The study shows that the synthesis approach helped the original design team to evaluate their design quality, analyze the architectural properties, and explore possible architectural improvements and their impact on both hardware and software. A feasible future upgrade for the microcontroller family is identified by the study. Further cooperation with the design team has been undertaken to integrate the synthesis methodology into their design flow.

3A.3 ASAver.1: An FPGA-Based Education Board for Computer Architecture/System Design
Hiroyuki Ochi

This paper proposes a new approach that makes it possible for every undergraduate student to carry out experiments in developing a pipelined RISC processor within the limited time available for the course. The approach consists of four steps: in the first step, modeling of the pipelined RISC processor is simplified by avoiding structural hazards and ignoring other hazards, and in the succeeding steps, students learn the difficulties of pipelining by themselves. An educational FPGA board, ASAver.1, and the results of a feasibility study are also presented.


Session 3B : Design Verification -- Case Studies

3B.1 Property Verification in the Design of Telecom Applications
M. Bombana, P. Cavalloro, F. Ferrandi

Industrial interest in applying formal methods to the design of complex ASICs is motivated by the need to improve the efficiency of the design process (reduced time-to-market) and to increase the quality of the final products (increased competitive profile). In this paper we focus our attention on design capture and functional verification, two critical phases in current design methodologies. A modular toolset built around a model checker is described. A telecom co-processor is presented, and general properties are derived. A user-oriented taxonomy of properties is introduced to support design practice. Guidelines for the application of this technique are inferred from the example and generalized.

3B.2 Verification Methodology of Compatible Microprocessors
Joon-Seo Yim, Chang-Jae Park, Woo-Seung Yang, Hun-Seung Oh, Hee-Choul Lee, Hoon Choi, Tae-Hoon Kim, Seung-Jong Lee, Nara Won, Yung-Hee Lee, In-Cheol Park, Chong-Min Kyung

As the complexity of high-performance microprocessors increases, functional verification becomes more difficult and emerges as the bottleneck of the design cycle. In this paper, we suggest a functional verification methodology aimed especially at compatible microprocessor design. To guarantee perfect compatibility with previous microprocessors, we developed three C models at different representation levels: Polaris, MCV (Micro-Code Verifier), and StreC. The C models are co-simulated with consistency checking between two different models. The simulation speed of the C models makes it possible to test "real-world" application programs on the RTL design with a software board model. To increase the confidence level of verification, Profiler reports the verification coverage of the test vectors, which is fed back to the automatic test program generator. The restartability feature also helps to significantly reduce the total simulation time. Using the proposed verification methodology, we successfully designed and verified an Intel 486-compatible microprocessor.

3B.3 RTL Verification of Timed Asynchronous and Heterogeneous Systems using Symbolic Model Checking
Peter A. Beerel, Vida Vakilotojar

This paper describes a tool-supported methodology for the register-transfer-level formal verification of a growing hardware design paradigm: timed asynchronous systems. These systems are networks of communicating asynchronous and synchronous components and have correctness constraints that depend on specified bounded delays. This paper formalizes the verification problem and demonstrates how time-discretization, abstraction, and non-determinism can lead to a system model consisting of communicating finite state machines composed synchronously. The paper then describes a translator that accepts a structural VHDL system description along with controller specifications and generates the input to a symbolic model checker (SMV). Finally, we describe two case studies in which concurrent verification and design led to the correction of many errors not easily found using simulation.


Session 3C : Circuit Modeling

3C.1 CB-Power: A Hierarchical Cell-Based Power Characterization and Estimation Environment for Static CMOS Circuits
Wen-Zen Shen, Jiing-Yuan Lin, Jyh-Ming Lu

In this paper, we present CB-Power, a hierarchical cell-based power characterization and estimation environment for static CMOS circuits. The environment is based on a cell characterization system for timing, power and input capacitance and on a cell-based power estimator. The characterization system can characterize basic, complex and transmission gates. During the characterization, input slew rate, output loading, the capacitive feedthrough effect and the logic state dependence of nodes in a cell are all taken into account. The characterization methodology separates the power consumption of a cell into three components: capacitive feedthrough power, short-circuit power, and dynamic power. With the characterization data, a cell-based power estimator (CBPE) embedded in Verilog-XL is used for estimating the power consumption of a circuit. CB-Power is also a hierarchical power estimator. Macrocells such as flip-flops and adders are partitioned into primitive gates during power estimation. Experimental results on a set of MCNC benchmark circuits show that CB-Power is within 6% of SPICE simulation on average, while the CPU time consumed is more than two orders of magnitude less.

3C.2 Power Consumption in CMOS Combinational Logic Blocks at High Frequencies
Sri Parameswaran, Hui Guo

A new model for estimating dynamic power dissipation in CMOS combinational circuits at differing voltages is presented in this paper. The proposed model deals with power dissipation of circuits at saturation frequencies, where the output voltage does not reach 100% of the supply voltage and the output voltage waveform is almost a triangular waveform. In this paper we show that the dynamic power consumption at saturation frequencies is only dependent on the supply voltage, and is independent of load capacitance and switching speed. This model shows that when a circuit is working in the saturation frequency range, as the frequency is increased, the performance/power ratio is increased. However, this increase in performance/power ratio is at the expense of noise margin. The model is theoretically and empirically shown to be correct. This model can be used to design a system where the differing combinational logic blocks are supplied with differing voltages. Such a system would consume lower power than if the system was supplied by a single voltage rail.

3C.3 A New Approach for an AHDL Based on System Semantics
Youcef Bourai, Nouma Izeboudjen, Yacine Bouhabel, Amine Tafat

A new approach for an Analog Hardware Design Language (AHDL) is presented. It is based on the system semantics principle. This principle makes it possible to define a language that provides a unified syntax to describe the different aspects of an Op_Amp. This is applicable by considering that the basic components of an Op_Amp are adirectional systems. These components are described by combinators. A set of semantic functions is applied to these combinators to give them a meaning.


Session 3D : Special Session - Printed Circuit Board (PCB) Design and Electro-magnetic Compatibility

3D.1 EMC-Adequate Design of Printed Circuit Board as a Part of the System Development
W. John

The EMC-adequate design of microelectronic systems includes all actions intended to eliminate electromagnetic interference in electronic systems. Challenges faced in the microelectronics area include growing system complexity, higher operating speed, and denser design at all levels of integration (chip, printed circuit board, MCM and system). Growing complexity, denser design and higher speed all lead to a substantial increase in EMC problems and design time. EMC is not commonly accepted in microelectronic design. Microelectronic designers hold the opinion that EMC has to do with electrical and electronic systems and mandatory product regulations rather than with requirements on the integrated circuits they are designing. In this contribution, a concept for the EMC-adequate design of electronic systems will be introduced. This concept is based on a generalized development process that integrates EMC constraints into system design. A prototype of an environment for analysing signal integrity effects on PCBs, based on a workflow-oriented integration approach, will be introduced. Based on this approach, the generation of user-specific design and analysis environments including various sets of EMC tools is possible.

3D.2 Multi-Pride: A System for Supporting Multi-Layered Printed Wiring Board Design
Toshimasa Watanabe

The purpose of the paper is to outline MULTI-PRIDE, a system for supporting multi-layered printed wiring board design. It consists of (i) circuit bipartition, (ii) placement and routing on each outside layer, (iii) modification of wiring and compaction, and (iv) routing on inside layers.

3D.3 Crosstalk Noise in High Density and High Speed Interconnections due to Inductive Coupling
Tetsuhisa Mido, Kunihiro Asada

Crosstalk noise in long interconnections is studied based on capacitive coupling and inductive coupling. It is shown that pulse noise is induced by inductive coupling in heterogeneous insulators. The pulse noise becomes the predominant noise factor as lines become longer, line resistance becomes lower, the signal rise time becomes shorter, and the dielectric constant of the materials in the gaps between lines becomes smaller.


Keynote Address II:

CAD Methodology and Business Models for Future Products
Daniel D. Gajski

The advances in CAD tools and fabrication technology allow system companies today to offer new product models every year. This short design and manufacturing cycle forces system companies to rethink their product development methodologies and business models. The design technology plays an essential role in this new mode of operation. In this talk we will give a brief overview of the past trends and speculate on the future trends based on the new focus of design automation on complete product concept, executable specification, electrical/mechanical, software/hardware codesign and product-level synthesis, validation and integration.


Session 4A : Analysis and Trade-Offs in System Synthesis

4A.1 Embedded Architectural Simulation within Behavioral Synthesis Environment
A. Jemai, P. Kission, A.A. Jerraya

This paper introduces one way to integrate an interactive simulator within a behavioral synthesis tool, thereby allowing concurrent synthesis and simulation. Such a simulator performs dynamic analysis and execution time evaluation. This paper also discusses an implementation of this concept resulting in a simulator called AMIS. This tool assists the designer in understanding the results of behavioral synthesis and in architecture exploration.

4A.2 Evaluating Cost-Performance Tradeoffs for System Level Applications
Wek-Liang Ing, Cheng-Tsung Hwang, Allen Chung-Hao Wu

Evaluation of design cost and performance is indispensable to system partitioning. In the absence of a system-level estimation and analysis tool, system partitioning is difficult to perform in an efficient and accurate manner because design evaluation can only be done after the final results are achieved. Furthermore, without cost-performance tradeoff information relating to different design alternatives, the designer can not make intelligent design decisions at the early system-level partitioning stages. In this paper, we present a system-level cost/performance evaluation approach which systematically explores the AT (Area-Time) design-space from a system description. This allows the designer to obtain first-hand design tradeoff information before the partitioning process has taken place. We have also developed a system-level interactive design evaluation system on top of the proposed approach. Experiments on a number of examples demonstrate that our approach provides the designer with a comprehensive system-level design evaluation method to effectively explore all possible design alternatives in the early stages of system development.
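
As a rough illustration of what exploring an Area-Time design space can produce, the sketch below keeps only the non-dominated (Pareto-optimal) points from a list of candidate designs. The candidate names and numbers are hypothetical, and this filter is a generic technique, not necessarily the evaluation approach used in the paper.

```python
from typing import List, Tuple

Design = Tuple[str, float, float]  # (name, area, time); all values are hypothetical


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design beats in both area and time."""
    front = []
    for name, area, time in designs:
        # A design is dominated if another one is no worse in both metrics
        # and strictly better in at least one of them.
        dominated = any(
            (a <= area and t <= time) and (a < area or t < time)
            for _, a, t in designs
        )
        if not dominated:
            front.append((name, area, time))
    return sorted(front, key=lambda d: d[1])  # order by increasing area


candidates = [
    ("all-software", 1.0, 9.0),
    ("mixed-A", 2.5, 5.0),
    ("mixed-B", 3.0, 6.0),   # larger and slower than mixed-A
    ("all-hardware", 6.0, 2.0),
]
front = pareto_front(candidates)
```

Only the points on the front are worth carrying into partitioning; in this example "mixed-B" is dropped because "mixed-A" is both smaller and faster.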

4A.3 A Quantitative Analysis for Optimizing Memory Allocation
Youn-Sik Hong, Choong-Hee Cho, Daniel D. Gajski

The memory allocation problem has two independent goals: minimizing the number of memories and minimizing the number of registers in one memory. Our concern is the ordering of bindings during memory allocation. We formulate and analyze three different memory allocation algorithms by changing their binding order. It is shown that when we combine these subtasks and solve them simultaneously using a heuristic cost function, significant savings (up to 20%) can be obtained in the total area of memories.


Session 4B : Technology Mapping

4B.1 Concurrent Cell Generation and Mapping for CMOS Logic Circuits
Mineo Kaneko, Jialin Tian

The conventional technology mapping method selects cells from a limited standard library, and the performance of the resultant circuit depends heavily on the characteristics of the library. To realize detailed optimization not limited by an instance of a cell library and to reduce the maintenance cost of standard cell libraries, a novel paradigm for technology mapping, in which cell generation and mapping can be executed concurrently, is considered. This paper shows an outline of a concurrent cell generation and mapping strategy, and proposes a method to map an input Boolean network into a CMOS transistor network. Transduction at the transistor level is introduced for cell generation, and Dynamic Programming is utilized for cell assignment.

4B.2 Logic Synthesis for Cellular Architecture FPGAs Using BDDs
Gueesang Lee

In this paper, an efficient approach to the synthesis of CA (Cellular Architecture)-type FPGAs is presented. To exploit the array structure of cells in CA-type FPGAs, logic expressions called Maitra terms, which can be mapped directly to the cell arrays, are generated. In this approach, a BDD is modified so that each node of the BDD has another branch which is an exclusive-OR of the two branches of a node. Once the modified BDD is obtained, a traversal of the BDD is sufficient to generate the Maitra terms needed. Since a BDD can be traversed in O(n) steps, where n is the number of nodes in the BDD, Maitra terms are generated very efficiently. This also removes the need for generating minimal SOP or ESOP expressions, which can be costly in some cases. The experiments show that the proposed method generates better results than existing methods.

4B.3 BDD Based Lambda Set Selection in Roth-Karp Decomposition for LUT Architecture
Jie-Hong Jiang, Jing-Yang Jou, Juinn-Dar Huang, Jung-Shian Wei

Field Programmable Gate Arrays (FPGA's) are important devices for rapid system prototyping. Roth-Karp decomposition is one of the most popular decomposition techniques for Look-Up Table (LUT)-based FPGA technology mapping. In this paper, we propose a novel algorithm based on Binary Decision Diagrams (BDD's) for selecting good lambda set variables in Roth-Karp decomposition to minimize the number of consumed configurable logic blocks (CLB's) in FPGAs. The experimental results on a set of benchmarks show that our algorithm can produce much better results than those of the previous approach [1].
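
A lambda (bound) set in Roth-Karp decomposition is commonly judged by its column multiplicity, the number of distinct columns in the decomposition chart. The sketch below counts it by brute force on a truth table. It is only an illustration of the criterion, not the BDD-based selection algorithm of the paper, and the function names and the example function are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def column_multiplicity(f: Callable[..., int], n: int, bound: Sequence[int]) -> int:
    """Count distinct columns of the decomposition chart of f for a bound set.

    f takes n arguments that are each 0 or 1; `bound` lists the argument
    indices placed in the bound (lambda) set. The rest form the free set.
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, v in zip(bound, b_vals):
                x[i] = v
            for i, v in zip(free, f_vals):
                x[i] = v
            column.append(f(*x))
        columns.add(tuple(column))  # identical columns collapse in the set
    return len(columns)


def example(a, b, c, d):
    # Hypothetical 4-input function: (a XOR b) AND (c OR d)
    return (a ^ b) & (c | d)


good = column_multiplicity(example, 4, bound=[0, 1])  # a and b grouped
bad = column_multiplicity(example, 4, bound=[0, 2])   # a and c grouped
```

Here the bound set {a, b} gives 2 distinct columns, so a single encoding function (one LUT output) is enough, while {a, c} gives 4 and needs two. Enumerating the truth table is exponential in the number of inputs, which is one reason a BDD representation is attractive.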


Session 4C : Floorplanning and Placement

4C.1 General Floorplanning with L-shaped, T-shaped and Soft Blocks Based on Bounded Slicing Grid Structure
Maggie Kang, Wayne Wei-Ming Dai

A new method of non-slicing floorplanning is proposed, which is based on the new representation for non-slicing floorplans proposed by [1], called bounded slicing grid (BSG) structure. We developed a new greedy algorithm based on the BSG structure, running in linear time, to select the alternative shape for each soft block so as to minimize the overall area for general floorplan, including non-slicing structures. We propose a new stochastic optimization method, named genetic simulated annealing (GSA) [3] for general floorplanning. Based on BSG structure, we extend SA-based local search and GA-based global crossover to L-shaped, T-shaped blocks and obtain high density packing of rectilinear blocks.

4C.2 A Building Block Placement Tool
Jonathan Dufour, Robert McBride, Ping Zhang, Chung-Kuan Cheng

When designing integrated circuits, sub-components rarely end up being perfectly rectangular. However, currently most block-placers only consider rectangular components, resulting in inefficient area utilization. We propose a placement tool that allows arbitrarily sized and shaped convex components. It extends the rectangle-packing method proposed by Kajitani. We describe the methods used to create the placement and give some performance results.

4C.3 VEAP: Global Optimization based Efficient Algorithm for VLSI Placement
Kong Tianming, Hong Xianlong, Qiao Changge

In this paper we present a very simple, efficient, and effective placement algorithm for row-based VLSIs. This algorithm is based on strict mathematical analysis, and can provably find the global optimum. In our experiments, this algorithm is one of the fastest algorithms, especially for very large scale circuits. We also point out that our algorithm can be run in both wirelength-driven and timing-driven modes.

4C.4 An Improved Objective for Cell Placement
Yu-Wen Tsay, Hsiao-Pin Su, Youn-Long Lin

To estimate the wiring area needed by the router to connect a signal net, most placement tools measure one half of the perimeter of the minimum rectangle enclosing all terminals of the net. In the past, this approach was reasonable because the half-perimeter value correlates well with the wiring area. As we enter the deep-submicron era, the approach is no longer appropriate because the wiring delay must be characterized based on a distributed-RC model, in which not only the wiring area but also the wiring topology affects the wiring delay. In this paper, we show that the half-perimeter metric does not correlate well with the wiring delay under the distributed-RC model. We show that the radius of a net estimates the wiring delay more accurately than the half-perimeter metric does. We expand the acceptance criteria of a simulated annealing based placement tool to include moves that do not improve the wiring length but do reduce the radius. Overall, for a set of benchmark circuits the critical path delays are improved by up to 15%.
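
The difference between the two metrics can be seen on a small example. The sketch below assumes that the radius of a net is the largest Manhattan distance from the driving terminal to any sink. That definition, the function names and the coordinates are assumptions made for illustration and may differ from the paper's.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def half_perimeter(terminals: List[Point]) -> int:
    """Half the perimeter of the minimum rectangle enclosing all terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def radius(source: Point, sinks: List[Point]) -> int:
    """Largest Manhattan distance from the source to any sink (assumed definition)."""
    sx, sy = source
    return max(abs(x - sx) + abs(y - sy) for x, y in sinks)


sinks = [(0, 0), (10, 10), (10, 0)]
# Two hypothetical placements of the driver inside the same bounding box.
corner_source = (0, 10)
center_source = (5, 5)

hpwl_corner = half_perimeter([corner_source] + sinks)
hpwl_center = half_perimeter([center_source] + sinks)
r_corner = radius(corner_source, sinks)
r_center = radius(center_source, sinks)
```

Both placements have a half-perimeter of 20, so a half-perimeter objective cannot tell them apart, yet the radius drops from 20 to 10 when the driver moves to the center. A move of this kind is what a radius-aware acceptance criterion would reward.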


Session 4D: University LSI Design Contest Presentation

4D.1 HK386: An x86-Compatible 32bit CISC Microprocessor
C.M. Kyung, I.C. Park, S.K. Hong, K.S. Seong, B.S. Kong, S.J. Lee, H. Choi, S.R. Maeng, D.T. Kim, J.S. Kim, S.H. Park, Y.J. Kang

In this paper, we describe the implementation and design methodology of a microprocessor, called HK386. The microprocessor is compatible with Intel 80386 with respect to the behavior of each instruction set. As the extraction of the exact behavior of each instruction set is the single most important step in compatible chip design, we focused our effort on establishing the reliable verification strategy ensuring the complete instruction level compatibility. The HK386 was successfully designed and fabricated using 0.8 um CMOS technology.

4D.2 Super Low Power 8-bit CPU with Pass-Transistor Logic
Kazuo Taki, Bu-Yeol Lee, Hideki Tanaka, Kenzo Konishi

A very low power 8-bit CPU core has been designed based on an original pass-transistor logic family, SPL and SPHL. The instruction set and external timings are compatible with the Zilog Z80. Average supply current is 740uA at 3V with a 10MHz clock, equivalent to 26% of that of the commercial CMOS Z80 CPU cores using the same design rules (0.8um, w-metal).

4D.3 A Functional Memory Type Parallel Processor for Vector Quantization
K. Kobayashi, M. Kinoshita, M. Takeuchi, H. Onodera, K. Tamaru

We propose a memory-based parallel processor for vector quantization called a functional memory type parallel processor for vector quantization (FMPP-VQ). It accelerates nearest neighbor search of vector quantization. All distances between an input vector and reference vectors in a codebook are computed simultaneously in all PEs. The minimum value of all distances is searched in parallel. The nearest vector is obtained in O(k), where k stands for the dimension of vectors. An LSI including four PEs has been implemented. It operates at 25MHz clock frequency.
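
For reference, the nearest neighbor search that such a processor accelerates can be written sequentially as below. This is a minimal software sketch, not the FMPP-VQ design. The squared Euclidean distance, the function name and the sample codebook are assumptions, since the abstract does not state the distance measure.

```python
from typing import List, Sequence, Tuple


def nearest_codeword(x: Sequence[int], codebook: List[Sequence[int]]) -> Tuple[int, int]:
    """Return (index, squared distance) of the codebook vector closest to x."""
    best_index, best_dist = -1, None
    for i, ref in enumerate(codebook):
        # One distance costs O(k) for k-dimensional vectors.
        dist = sum((a - b) ** 2 for a, b in zip(x, ref))
        if best_dist is None or dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


codebook = [(0, 0), (4, 4), (10, 0)]   # hypothetical 2-dimensional codebook
index, dist = nearest_codeword((3, 5), codebook)
```

Run sequentially, the loop costs O(Nk) for N reference vectors. Computing all N distances at once, one per PE, removes the factor N, which is consistent with the O(k) figure quoted in the abstract.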

4D.4 High Speed Bit-Serial Parallel Processing on Array Architecture
Kazuhito Ito, Takenobu Shimizugashira, Hiroaki Kunieda

Word-parallel bit-serial processing is a solution to high speed processing suitable for VLSI. In this paper a new bit-serial parallel processing architecture is proposed. A VLSI chip for a digital filter is designed based on the proposed architecture and it is implemented on a gate array chip. Through the implementation, it is verified that bit-serial parallel processing on an array architecture achieves high speed processing and easy design.
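
To make the term concrete, the sketch below models one bit-serial adder in software: each step consumes one bit of each operand, least significant bit first, and carries one bit forward. It is a generic illustration and not the architecture proposed in the paper; the function name and the word width are assumptions.

```python
def bit_serial_add(a: int, b: int, width: int) -> int:
    """Add two unsigned words one bit per step, least significant bit first."""
    carry = 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry      # full-adder sum for this bit position
        result |= (total & 1) << i         # sum bit goes to the output stream
        carry = total >> 1                 # carry is held for the next step
    return result  # the final carry is dropped, so the sum wraps modulo 2**width


# Word-parallel use: in hardware each pair would have its own serial adder.
pairs = [(5, 9), (200, 100), (255, 1)]
sums = [bit_serial_add(a, b, 8) for a, b in pairs]
```

Each adder needs only one full adder and a carry flip-flop regardless of the word length, so many of them fit side by side on an array; throughput then comes from processing many words at once rather than from a wide datapath.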

4D.5 Self-Timed 1-D ICT Processor
Johnson T.C. Pang, Oliver C.S. Choy, C.F. Chan, W.K. Cham

This paper describes an LSI implementation of a 1-D order-8 Integer Cosine Transform (ICT) which can calculate either the forward or the reverse transformation. It is a standard-cell based design using a 0.7um CMOS SDLM LM process. The chip's performance is maximized with the fast computation algorithm and self-timed circuit technique. It consists of eight parallel self-timed pipelines. Each self-timed block is designed based on a 2-phase handshaking protocol and the variable delay concept. The die size is 5.7x4.1mm with about 76k transistors. This chip supports 16-bit I/O data and its data rate is up to 60MHz.

4D.6 A Real-Time High Performance Edge Detector for Computer Vision Applications
Fahad Alzahrani, Tom Chen

We present a high performance edge detection architecture for real-time image processing applications. The architecture is finely pipelined. The proposed ASIC is capable of producing one edge-pixel every clock cycle. At a clock rate of 10 MHz, the architecture can process 30 frames per second, where the size of each frame is 640x480 8-bit pixels. The ASIC was laid out and fabricated using Samsung's 0.8um double-metal CMOS process.
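
The frame rate claim can be checked with simple arithmetic, assuming the frame is 640 by 480 pixels and the pipeline delivers one edge pixel per clock cycle as stated. The variable names below are illustrative only.

```python
WIDTH, HEIGHT = 640, 480       # frame size in pixels
FRAMES_PER_SECOND = 30
CLOCK_HZ = 10_000_000          # 10 MHz, one edge pixel per clock cycle

pixels_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND
utilization = pixels_per_second / CLOCK_HZ
```

The required rate is 9,216,000 pixels per second, about 92% of the 10 MHz pixel budget, so 30 frames per second fits with little headroom.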

4D.7 An LSI Implementation of the Simple Serial Synchronized Multistage Interconnection Network
Takayuki Kamei, Masashi Sasahara, Hideharu Amano

A high speed switch is a critical component of multiprocessors. The Multistage Interconnection Network (MIN) has been utilized as a switch for connecting processors and memory modules in multiprocessors. Unlike the crossbar, it consists of small switching elements, and provides a high bandwidth with relatively small hardware. Most traditional MINs are blocking networks, and packets are transferred in the store-and-forward manner between switching elements over bit-parallel (8-64 bits) lines. Since the width of the communication paths and the transfer manner cause a pin-limitation problem and a complicated structure, high density implementation and a high speed clock are not utilized. In order to solve these problems, we implemented the SSS-PBSF chip. This switch uses the PBSF connection structure, which can obtain a higher bandwidth than that of a crossbar by connecting banyan networks in the 3-dimensional direction. The Simple Serial Synchronized (SSS) style control mechanism is adopted both for high speed operation and for solving the pin-limitation problem.

4D.8 The RDT Network Router Chip
Hiroaki Nishi, Hideharu Amano, Katsunobu Nishimura, Ken-ichiro Anjo, Tomohiro Kudoh

The RDT network Router chip is a versatile router for the massively parallel computer prototype JUMP-1, which is currently under development in a collaboration between 7 Japanese universities [1]. The major goal of this project is to establish techniques for building an efficient distributed shared memory on a massively parallel processor. For this purpose, the reduced hierarchical bit-map directory (RHBD) schemes [2] are used for efficient cache management of the distributed shared memory. In order to implement the RHBD schemes efficiently, we proposed a novel interconnection network, RDT (Recursive Diagonal Torus) [3], and developed a sophisticated router chip for the RDT that is equipped with a deadlock-free hierarchical multicast mechanism and an acknowledge combining mechanism. By using 0.5um BiCMOS SOG technology, it can transfer all packets synchronized with a unique CPU clock (60MHz). Long coaxial cables (4m at maximum) are directly driven by the ECL interface of this chip. Using dual port RAM, the packet buffers allow a flit of the packet to be pushed and pulled simultaneously. The mixed design approach with schematic and VHDL permits the development of the complicated chip with 90,522 gates in a year.

4D.9 Single Cycle Access Cache for the Misaligned Data and Instruction Prefetch
Joon-Seo Yim, Hee-Choul Lee, Tae-Hoon Kim, Bong-Il Park, Chang-Jae Park, In-Cheol Park, Chong-Min Kyung

In microprocessors, reducing the cache access time and the pipeline stall is critical to improving system performance. To overcome the pipeline stall caused by misaligned multi-word data or multi-cycle accesses of prefetch codes which are placed over two cache lines, we propose the Separated Word-line Decoding (SEWD) cache. The SEWD cache makes it possible to access misaligned multiple words as well as aligned words in one clock cycle. This feature is invaluable in most microprocessors because the branch target address is usually misaligned, and many data accesses are misaligned. The 8K-byte SEWD cache chip consists of 489,000 transistors on a die size of 0.853 x 0.827 cm^2 and is implemented in a 0.8 um DLM CMOS process operating at 60 MHz.

4D.10 VLSI Implementation of a Real-time Operating System
Takumi Nakano, Yoshiki Komatsudaira, Akichika Shiomi, Masaharu Imai

This paper proposes a new approach to realize a very high performance real-time OS using VLSI technology. In order to confirm the effectiveness of this method, the most basic system calls have been designed. According to the evaluation results based on a gate array implementation, hardware portion of system calls can be executed within 4 clocks and the task scheduler can be performed in only 8 clocks simultaneously, which are about 130 to 1880 times faster than software implementation.

4D.11 A CMOS Delayed Locked Loop (DLL) for Reducing Clock Skew to Under 500ps
Yong-Bin Kim, Tom Chen

This paper presents a variable delay line DLL circuit implemented in a 0.8um CMOS technology. A phase detector and two charge pump circuits calibrate the delay per stage of the delay line using a push-pull type clock synchronization scheme. The delay line can be programmed from 6 to 18 stages. The DLL circuit is capable of reducing clock skew from 1-3ns to below 500ps for clock frequencies from 50MHz to 150MHz.

4D.12 A Current Mode Cyclic A/D Converter with a 0.8um CMOS Process
Masaki Kondo, Hidetoshi Onodera, Keikichi Tamaru

We have developed a current mode cyclic analog-to-digital converter using a 0.8um CMOS process. Our circuit structure makes it possible to construct the converter without any precise analog components, hence, it is well compatible with submicron processes. The fabricated circuit has an area of 0.014mm^2 and performs 8-bit resolution at a sampling rate of 40kHz and average power dissipation of 370uW at 4V supply voltage.

4D.13 A Current-mode, 3V, 20MHz, 9-bit equivalent CMOS Sample-and-Hold Circuit
Yasuhiro Sugimoto, Tetsuya Iida

A new current-mode, low-power, low-voltage and high-speed CMOS sample-and-hold circuit has been designed and fabricated. A new current-mode differential switching scheme has been adopted to eliminate errors caused by feedthrough injection from the sample switches. The experimental results show 9-bit resolution at 9mW power dissipation and a 20MHz clock frequency from a 3V power supply.


Session 5A : Co-Design: Architecture and Partitioning

5A.1 Hardware-Software Co-design: Tools for Architecting Systems-On-A-Chip
Rajesh K. Gupta

This paper examines the issues and progress in the design of highly integrated microelectronic systems. These microsystems rely on an array of diverse components such as processors, memory, network interfaces, graphics and DSP 'cores'. In particular, we discuss problems in the combined design of hardware and software for these systems. We present a decomposition of the co-design problem, and identify the needed technologies in specification/modeling, synthesis and validation for efficient and error-free system designs. Co-design tools along with domain-specific design and methodologies provide a key advantage to the system integrator in building complex single-chip systems. We illustrate this point in the specific area of architectural evaluation using co-simulation tools.

5A.2 Trade-off Evaluation in Embedded System Design Via Co-simulation
Claudio Passerone, Luciano Lavagno, Claudio Sansoe, Massimiliano Chiodo, Alberto Sangiovanni-Vincentelli

Current design methodologies for embedded systems often force the designer to evaluate early in the design process architectural choices that will heavily impact the cost and performance of the final product. Examples of these choices are hardware/software partitioning, choice of the micro-controller, and choice of a run-time scheduling method. This paper describes how to help the designer in this task, by providing a flexible cosimulation environment in which these alternatives can be interactively evaluated.

5A.3 A Transformational Codesign Methodology
Tommy King-Yin Cheung, Graham Hellestrand, Prasert Kanthamanon

We present a hardware/software codesign methodology using formal transformations. The goal is to refine a given function specification of a task to an operational structure involving both hardware and software components. The refinement process is separated into two levels, the algorithmic and the structural. Within each level, refinement is accomplished by applying sequences of transformations that preserve the functionality of an initial specification. This allows various 'correct' design alternatives to be generated and their costs analyzed. At the algorithmic level, different algorithm designs are explored, each producing a computational schedule that has a different performance cost. At the structural level, different spatial structures with different resources and performance costs are explored. These costs which characterize the designs are used to assist in the hardware/software partitioning. An example is used throughout to illustrate this methodology.


Session 5B : Hierarchical/High-Level Testing

5B.1 A Testability Analysis Method for Register-Transfer Level Descriptions
Mizuki Takahashi, Ryoji Sakurai, Hiroaki Noda, Takashi Kambe

In this paper, we propose a new testability analysis method for Register-Transfer Level(RTL) descriptions. The proposed method is based on the idea of testability analysis in terms of data flow and control structure which can be extracted from RTL designs. We analyze testability of RTL descriptions with more testability measures than those of conventional gate-level testability, so that the method provides information for design for testability(DFT). We have implemented the presented method and experimental results show that we can reduce circuit cost for test and achieve highly testable circuits by DFT using our RTL testability analysis.

5B.2 Non-Scan Design for Testable Data Paths Using Thru Operation
Katsuyuki Takabatake, Michiko Inoue, Toshimitsu Masuzawa, Hideo Fujiwara

We present a new non-scan DFT technique for register-transfer (RT) level data paths. In the technique, we add thru operations to some operational modules to make the data path easily testable. We define a testability measure, weak testability, and consider the problem of making the data path weakly testable with minimum hardware overhead. We also define a measure to estimate the test generation time. Experimental results show the effectiveness of our technique and the proposed measure.

5B.3 Block-Level Fault Isolation Using Partition Theory and Logic Minimization Techniques
C.-J. Richard Shi

Multichip modules are emerging as a key packaging technology for mixed-signal circuits and systems. In this paper, we consider how to localize a failure within a chip boundary as rapidly as possible in order to expedite the rework process and to minimize its overall impact on manufacturing throughput and cycle time. A key contribution of this paper is to provide a unified block-level fault isolation framework for analog and digital circuits, and to show that optimum fault isolation reduces to set covering. This allows us to directly apply powerful set covering techniques and solvers developed recently in logic minimization. In addition, we present a greedy peeling heuristic with performance bound computation. Some preliminary experimental results are included to demonstrate the feasibility and performance of the proposed approach.
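
Once a problem is cast as set covering, even a textbook greedy heuristic gives a usable answer. The sketch below shows that standard heuristic; it is not the greedy peeling heuristic of the paper. The mapping used in the example (elements are fault pairs to be told apart, subsets are the pairs each test distinguishes) and all names are hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[str], subsets: Dict[str, Set[str]]) -> List[str]:
    """Pick subsets greedily until every element of the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the subset covering the most uncovered elements;
        # iterating over sorted names breaks ties alphabetically.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical instance: which fault pairs each test can distinguish.
fault_pairs = {"p1", "p2", "p3", "p4", "p5"}
tests = {
    "t1": {"p1", "p2", "p3"},
    "t2": {"p3", "p4"},
    "t3": {"p4", "p5"},
    "t4": {"p1"},
}
selected = greedy_set_cover(fault_pairs, tests)
```

On this instance two tests, t1 and t3, cover every pair. The greedy choice is not guaranteed to be optimal in general, which is why exact solvers from logic minimization are of interest.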

5B.4 The Use of Hierarchical Information to Test Large Controllers
F. Fummi, D. Sciuto

Gate-level test pattern generators require insertion of scan paths to handle the flat gate-level representation of a large sequential controller. In contrast, we present a testing methodology based on the hierarchical finite state machine model. Such a model is used to specify very complex control devices by means of a top-down design approach. Our approach allows the generation of compact test sets with very high stuck-at fault coverages, without any DfT logic.

5B.5 Hierarchical Fault Tracing for VLSI Sequential Circuits from CAD Layout Data in the CAD-linked EB Test System
Katsuyoshi Miura, Koji Nakamae, Hiromu Fujioka

A previous hierarchical fault tracing method for combinational circuits, which requires only CAD layout data in the CAD-linked electron beam test system, is extended to sequential circuits. The method retains the characteristic that it allows us to trace a fault hierarchically from the top level cell to the lowest primitive cell, and from the primitive cell to the transistor-level circuit, in a consistent manner independently of circuit functions. Results of applying the method to the CAD layouts of some sequential CMOS benchmark circuits show its superiority over the guided-probe method, in which circuit logical functions are first extracted from the CAD layout data and then the guided-probe testing is executed.


Session 5C : Technology Related Issues

5C.1 Interconnect Capacitances, Crosstalk, and Signal Delay in High Speed and High Density VLSI Circuits (No Paper Submitted)
D.H. Cho, M.H. Seung, N.H. Kim, H.S. Park

5C.2 Monte Carlo Simulation for Single Electron Circuits
Masaharu Kirihara, Kenji Taniguchi

In single electron circuits composed of small tunnel junctions, capacitances, and voltage sources, a tunneling electron can be described as a discrete charge due to the stochastic nature of a tunneling event. We developed a Monte Carlo simulator for the numerical study of single electron circuits because conventional simulation methods based on Kirchhoff's laws are no longer applicable. The calculated dynamic operation of a quasi-CMOS inverter reveals that ultra small load capacitors give rise to large output voltage fluctuation during the logic operation. Future SET circuits should be designed with several-electron logic rather than ultimate single electronic logic circuits in which a bit is represented with an electronic charge.

5C.3 Parallel Calculation of 3-D Parasitic Resistance and Capacitance with Linear Boundary Elements
Wenming Zhou, Zeyi Wang, Lan Rao

The widespread application of deep sub-micron and multilayer routing techniques has made interconnect parasitics the main factor limiting the performance of VLSI circuits. Parallel direct boundary element calculation of three-dimensional (3-D) resistance and capacitance is an important method for fast extraction. In this paper, a parallel algorithm to implement linear element calculation by using PVM (Parallel Virtual Machine, distributed computing software on a PC network) is introduced. The hierarchical calculation scheme of the setup and solution processes of linear equations is discussed. Finally, the performance and workload balance of the algorithm are analyzed.

5C.4 Simulation of Gate Switching Characteristics of a Miniaturized MOSFET based on a Non-Isothermal Non-Equilibrium Transport Model
Won-Cheol Choi, Hirobumi Kawahima, Ryo Dang

Our device simulator is developed for the analysis of a MOSFET based on the Thermally Coupled Energy Transport Model (TCETM). The simulator has the ability to calculate not only steady-state characteristics but also transient characteristics of a MOSFET. It solves the basic semiconductor device equations, including the Poisson equation, current continuity equations for electrons and holes, the energy balance equation for electrons and the heat flow equation, using the finite difference method.


Session 5D : (Special Session) Multiproject Chip Services in Asia and the South Pacific

Moderators: Hideharu Amano (Keio Univ.) and Tokinori Kozawa (STARC)
Chairperson : Kazuhiro Ueda (Shibaura Institute of Technology)

Presented by

5D.1 The EUROPRACTICE MPC Service
C. Das

IMEC has been involved in MPC services for universities and industry since 1984. In the beginning these services were set up to support the local educational programme. Later on, in 1989, IMEC was coordinator of the Europe-wide MPC services in the EC-funded project EUROCHIP. Since October 1995, IMEC has been coordinator of the IC Manufacturing Service in the EC-funded project EUROPRACTICE.

5D.2 Multi-Project Chip Activities in Korea-IDEC Perspective-
Chong-Min Kyung, In-Cheol Park, Ho-Jun Song

This paper describes the current status of multi-project chip (MPC) services in Korea to promote full-custom and semi-custom IC design activities in universities. Although MPC foundry services for IC designs were started on a smaller scale more than 10 years ago, it is only recently that a systematic and effective education and MPC foundry service program called IDEC (IC Design Education Center) was launched with the planned support of the government and three major semiconductor companies in Korea. In this paper, we introduce the activities of IDEC and other MPC foundry services currently being provided.

5D.3 Multi-Project Chip Service for University and Industry in Taiwan
Jen-Sheng Hwang

Acting as the bridge between the designers and manufacturing companies, Chip Implementation Center (CIC), founded in 1992 under National Science Council, aims at the services for the fabrication of multi-project chip, the procurement/integration of software CAD tools, and the promotion of IC design / testing / CAD software technology. To date, 2000 academic licenses of software CAD tools have been obtained and 739 chips of the academics have been fabricated through CIC.

5D.4 VLSI Design and Education Center (VDEC) Current Status and Future Plan
Kunihiro Asada and Koichiro Hoh

After briefly reviewing the history of VDEC, its functions and facilities are summarized, followed by future plans of chip implementation along with a network society.


Session 6A : Simulation Environment

6A.1 Choosing a Digital Simulator
John Hillawi

This paper summarises the second in a series of benchmarking efforts conducted by DA Solutions between August 1995 and April 1996, for VHDL and Verilog simulators. The paper discusses the methodology used and the results of an independent public benchmark for leading VHDL and Verilog simulators, for RTL, Gate, VITAL and Co-simulations products. The paper also makes performance comparisons between VHDL and Verilog technologies and between PC and UNIX solutions.

6A.2 A Hardware/Software Co-simulation Environment for Micro-processor Design with HDL Simulator and OS interface
Yoshiyuki Ito, Yuichi Nakamura

We proposed a hardware/software cosimulation environment using an RTL simulator with a software language interface. The proposed simulation environment introduces the 'OS interface (OSIF),' which invokes system calls in the OS on the simulation platform to execute application software. The OSIF consists of a data adaptation facility and function correspondence management, allowing it to cooperate with the OS of the simulation platform. We show the results of experiments with an R3000-compatible processor model. This environment verified our processor model with SPEC benchmarks that require various operating system services. For example, with the lisp interpreter program li, our detailed RTL description for the core part of the R3000 was simulated within only 20 hours on a 109 MIPS workstation.

6A.3 VIDE: A Visual VHDL Integrated Design Environment
Jinian Bian, Hongxi Xue, Ming Su

In this paper, a visual VHDL integrated design environment VIDE for high level design is presented. In VIDE, there are several graphical and textual mixed design entry tools (VDES) and a graphical object-oriented debugger (VDBG). VDES consists of several diagram editors and a visual text editor, while VDBG is a debugging environment based on a hierarchical VHDL simulator. The graphical objects can be specified as a debugging target.

6A.4 Advanced Processor Design using Hardware Description Language AIDL
Takayuki Morimoto, Kazushi Saito, Hiroshi Nakamura, Taisuke Boku, Kisaburo Nakazawa

In order to design advanced processors in a short time, designers must simulate their designs and reflect the results in the designs at the very early stages. However, conventional hardware description languages (HDLs) do not have enough ability to describe designs easily and accurately at these stages. Therefore, we have proposed a new hardware description language, AIDL. In this paper, in order to evaluate the effectiveness of AIDL, we describe three processors in both AIDL and VHDL and compare the descriptions.


Session 6B : Testing for Non-Conventional Fault Models

6B.1 Adaptive Models for Input Data Compaction for Power Simulators
Radu Marculescu, Diana Marculescu, Massoud Pedram

This paper presents an effective and robust technique for compacting a large sequence of input vectors into a much smaller input sequence so as to reduce the circuit/gate level simulation time by orders of magnitude and maintain the accuracy of the power estimates. In particular, this paper introduces and characterizes a family of dynamic Markov trees that can model complex spatiotemporal correlations which occur during power estimation both in combinational and sequential circuits. As the results demonstrate, large compaction ratios of 1-2 orders of magnitude can be obtained without significant loss (less than 5% on average) in the accuracy of power estimates.

6B.2 Fuzzy-based Circuit Partitioning in Built-in Current Testing
Wang-Dauh Tseng, Kuochen Wang

Partitioning a digital circuit into modules before implementing it on a single chip is key to balancing test cost against test correctness in built-in current testing (BICT). Most partitioning methods use statistical analysis to find the threshold value and then to determine the size of a module. These methods are rigid and inflexible since IDDQ testing requires the measurement of an analog quantity rather than a digital signal. In this paper, we propose a fuzzy-based approach which provides a soft threshold to determine the module size for BICT partitioning. Evaluation results show that our design approach indeed provides a feasible way to explore the design space of BICT partitioning.

6B.3 Reducing the Complexity of Path Classification by Reconvergence Analysis
Paul Tafertshofer, Andreas Ganz, Manfred Henftling

In this paper we present a new and efficient method for path classification, i.e. for determining the set of functional unsensitizable or robust dependent paths. In a pre-processing step, the new method computes a minimal set of reconvergence regions that need to be considered for path classification. Functional sensitization is only performed for path segments contained in these regions. Thus, the complexity for path classification can be reduced from the total number of paths in the circuit to the number of paths contained in the minimal set of reconvergence regions.

6B.4 Modelling and Detection of Dynamic Errors due to Reflection- and Crosstalk-Noise
J. Schrage

A new algorithm for the generation of test sequences to detect dynamic errors due to reflection and crosstalk noise in combinational circuits is presented. A new circuit-level approach to error modeling, which includes the duration of reflection and crosstalk errors, is described. The presented algorithm takes into account the strong influence of error durations, as well as gate and transmission line delays, on testability.

6B.5 Fault Coverage Improvement Based on Error Signal Analysis
Mike W.T. Wong, Y. Zhou, Y.S. Lee, Y. Min

Fault-tolerant design of analog circuits is more difficult than that of digital circuits. Abhijit Chatterjee has proposed a continuous checksum-based technique to design fault-tolerant linear analog circuits. However, some faults in the passive elements cannot be detected if the checker has not been designed appropriately. This paper addresses the fault coverage issue in the continuous checksum based technique and proposes an error signal analysis based method for improving fault coverage of the checker.


Session 6C : Circuit Design and Methodology

6C.1 Low-Power Multiple-Valued Current-Mode Integrated Circuit with Current-Source Control and Its Application
Takahiro Hanyu, Satoshi Kazama, Michitaka Kameyama

A new current-source control technique is proposed to design a low-power high-speed multiple-valued current-mode (MVCM) integrated circuit at a low supply voltage. The use of a differential logic circuit (DLC) with a pair of dual-rail inputs makes the input voltage swing small, which results in a high driving capability at a lower supply voltage, but at the cost of large static power dissipation. In the proposed DLC using switched current control, the static power dissipation is greatly reduced because current sources in non-active circuit blocks are switched off. In the current control, no additional transistors are required to control the current sources because a current-control circuit is already used in the threshold detector. As a typical example of arithmetic circuits, a new 1.5V-supply 54 x 54-bit multiplier based on a 0.8 um standard CMOS technology is also designed. It is about 1.3 times faster than the fastest binary multiplier under normalized power dissipation.

6C.2 Analysis and Design of Multiple-Bit High-Order Σ-Δ Modulator
Hao-Chiao Hong, Bin-Hong Lin, Cheng-Wen Wu

The high-order Σ-Δ modulator is an appropriate approach for high-bandwidth, high-resolution A/D conversion. However, non-ideal effects such as the finite op-amp gain and the capacitor mismatch have great impacts on its performance at a low oversampling ratio. To achieve greater performance under the inevitable non-ideal effects, we explore several multiple-bit schemes, based on our CIQE high-order Σ-Δ architecture, to remove the non-ideal deterioration. Design rules of these multiple-bit schemes are developed and verified by extensive simulations.

6C.3 Optimal Loop Bandwidth Design for Low Noise PLL Applications
Kyoohyun Lim, Seunghee Choi, BeomSup Kim

This paper presents a salient method to find an optimal bandwidth for low noise phase-locked loop (PLL) applications by analyzing a discrete-time model of charge-pump PLLs based on ring oscillator VCOs. The analysis shows that the timing jitter of the PLL system depends on the jitter in the ring oscillator and an accumulation factor which is inversely proportional to the bandwidth of the PLL. Further analysis shows, however, that the timing jitter of the PLL system is proportional to the bandwidth of the PLL when an external jitter source is applied. Analyzing the PLL timing jitter in both cases gives the clue to the optimal bandwidth design for low noise PLL applications. Simulation results using a C-language PLL model are compared with the theoretical predictions and show good agreement.

6C.4 ±1.5V CMOS Four-Quadrant Multiplier
Simon C. Li

A low-voltage CMOS four-quadrant analogue multiplier using two NMOS transistors operated in the triode region with a modified bi-directional regulated cascode (RGC) structure is presented. The circuit can operate from a supply voltage of ±1.5 V. For a differential input voltage range up to ±0.8 V, this circuit keeps nonlinearity below 0.9% and total harmonic distortion below 1%. The -3dB bandwidth of this multiplier is 15 MHz. The chip was fabricated in the Taiwan Semiconductor Manufacturing Corporation (TSMC) 0.8 um Single-Poly-Double-Metal (SPDM) N-well process. The chip dissipates 24.4 mW and occupies a 251 x 653 um^2 active area.


Session 6D : (Panel Discussion) Collaboration between University and Industry

Moderator : Tokinori Kozawa (STARC)
Chair : Tokinori Kozawa (STARC, Japan)

Panelist :

  • Ralph Cavin (SRC, USA)
  • Paul Six (IMEC, Belgium)
  • Taro Okabe (STARC, Japan)
  • Akihiko Morino (NEC, Japan)
  • Hiroto Yasuura (Kyushu Univ., Japan)
  • Youn-Long Lin (Tsing Hua Univ., Taiwan)


Keynote Address III:

Some Thoughts on Process Retargettable and Reusable IC Intellectual Property
Neil Weste

With the observation that today's printed wire board combining a myriad of different vendors' ICs is tomorrow's "system on a chip," many companies are interested in methods of combining disparate designs onto one piece of silicon. Systems companies are interested because they want to do it. Silicon managers are interested because they sell silicon. CAD companies are interested because it looks like a future revenue stream for them. And finally, garage shop companies are interested because there appears to be a demand for portable integrated circuit intellectual property.

But how will this happen technically?
From a business perspective?
At what level is portability required?
Are HDLs the answer?
What about mixed signal?

This talk will attempt to address some of these and related questions by examining some activities around the world and also drawing on the experience of the speaker at promoting technology independent layout designs for over 10 years.


Session 7A : High-Level Synthesis Techniques for FPGA's and Regular Arrays

7A.1 ChipEst-FPGA: A Tool for Chip Level Area and Timing Estimation of Lookup Table Based FPGAs for High Level Applications
Min Xu, Fadi Kurdahi

The importance of efficient area and timing estimation techniques for hierarchical design methodology is well-established in High-Level Synthesis (HLS), since the estimation allows more realistic exploration of the design space, and hierarchical design methodology matches well with the HLS paradigm. In this paper, we present ChipEst-FPGA, a chip level estimator for designs implemented using a hierarchical design methodology for Lookup Table Based FPGAs. In FPGAs, the wire delay may contribute a significant portion of the overall design delay. ChipEst-FPGA uses a realistic model which takes the component area/delay as well as wiring effects into account. We tested ChipEst-FPGA on several benchmarks and the results show that we can get accurate area and timing estimates efficiently.

7A.2 Bit-Serial Pipeline Synthesis and Layout for Large-Scale Configurable Systems
Tsuyoshi Isshiki, Wayne Wei-Ming Dai, Hiroaki Kunieda

In this paper, we present our datapath synthesis and layout tools, which target large-scale configurable systems with a logic capacity of up to millions of gates. The tools consist of an easy design entry using C++, a customized bit-serial circuit library for SRAM-based FPGAs, a bit-serial pipeline circuit generator, and a circuit partitioner.

7A.3 An Optimal Scheduling Method for Parallel Processing System of Array Architecture
Kazuhito Ito, Tadashi Iwata, Hiroaki Kunieda

In high-level synthesis for digital signal processing systems with an array-structured architecture, one of the most important procedures is scheduling. When operations are allocated to processors, it is mandatory to take into account the communication time between processors. In this paper we propose a scheduling method which derives an optimal schedule achieving the minimum iteration period and latency for a given signal processing algorithm on the specified processor array. The scheduling problem is modeled as an integer linear programming (ILP) problem and solved by an ILP solver. Furthermore, we improve the scheduling method so that it can be applied to large scale signal processing algorithms without degrading the schedule optimality.


Session 7B : Decision Diagrams and Their Applications

7B.1 AQUILA: An Equivalence Verifier for Large Sequential Circuits
Shi-Yu Huang, Kwang-Ting Cheng, Kuang-Chien Chen

In this paper, we address the problem of verifying the equivalence of two sequential circuits. A hybrid approach that combines the advantages of BDD-based and ATPG-based approaches is introduced. Furthermore, we incorporate a technique called partial justification to explore the sequential similarity between the two circuits under verification to speed up the verification process. Compared with existing approaches, our method is much less vulnerable to the memory explosion problem, and therefore can handle larger designs. The experimental results show that in a few minutes of CPU time, our tool can verify the sequential equivalence of an intensively optimized benchmark circuit with hundreds of flip-flops against its original version.

7B.2 On the Representational Power of Bit-Level and Word-Level Decision Diagrams
Bernd Becker, Rolf Drechsler, Reinhard Enders

Several types of Decision Diagrams (DDs) have been proposed in the area of Computer Aided Design (CAD), among them bit-level DDs like OBDDs, OFDDs and OKFDDs. While the aforementioned types of DDs are suitable for representing Boolean functions at the bit-level and have proved useful for many applications in CAD, DDs that represent integer-valued functions, like MTBDDs (=ADDs), EVBDDs, FEVBDDs, (*)BMDs, HDDs (=KBMDs), and K*BMDs, have recently attracted more and more interest; e.g., using *BMDs it was for the first time possible to verify multipliers of bit length up to n = 256. In this paper we clarify the representational power of these DD classes. Several (inclusion) relations and (exponential) gaps between specific classes differing in the availability of additive and/or multiplicative edge weights and in the choice of decomposition types are shown. It turns out, for example, that K(*)BMDs, a generalization of OKFDDs to the word-level, also 'include' OBDDs, MTBDDs and (*)BMDs. On the other hand, it is demonstrated that a restriction of the K(*)BMD concept to subclasses, such as OBDDs, MTBDDs, and (*)BMDs, results in families of functions which lose their efficient representation.

7B.3 Learning Heuristics for OKFDD Minimization by Evolutionary Algorithms
Nicole Göckel, Rolf Drechsler, Bernd Becker

Ordered Kronecker Functional Decision Diagrams (OKFDDs) are a data structure for efficient representation and manipulation of Boolean functions. OKFDDs are very sensitive to the chosen variable ordering and the decomposition type list, i.e. the size may vary from linear to exponential. In this paper we present an Evolutionary Algorithm (EA) that learns good heuristics for OKFDD minimization starting from a given set of basic operations. Unlike previous approaches to OKFDD minimization, the EA does not solve the problem directly. Rather, it develops strategies for solving the problem. Experimental results are given to demonstrate the efficiency of our approach. The newly developed heuristics combine high quality results with reasonable time overhead.
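
The "decomposition type list" mentioned above assigns one of three standard decompositions to each variable: Shannon, positive Davio, or negative Davio. The following minimal sketch only checks that all three reproduce the original function; it does not build a diagram and is not the authors' algorithm. The example function `majority` and all helper names are assumptions chosen for illustration.

```python
from itertools import product

def majority(x):
    # Hypothetical example function: 3-input majority.
    return int(sum(x) >= 2)

def cofactor(f, i, value):
    """Return f with variable i fixed to `value`."""
    return lambda x: f(x[:i] + (value,) + x[i + 1:])

def shannon(f, i):
    # f = (not x_i) * f0  OR  x_i * f1
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda x: ((1 - x[i]) & f0(x)) | (x[i] & f1(x))

def positive_davio(f, i):
    # f = f0  XOR  x_i * (f0 XOR f1)
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda x: f0(x) ^ (x[i] & (f0(x) ^ f1(x)))

def negative_davio(f, i):
    # f = f1  XOR  (not x_i) * (f0 XOR f1)
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda x: f1(x) ^ ((1 - x[i]) & (f0(x) ^ f1(x)))

def equivalent(f, g, n):
    """Compare two n-input functions on every input assignment."""
    return all(f(x) == g(x) for x in product((0, 1), repeat=n))

if __name__ == "__main__":
    for decomposition in (shannon, positive_davio, negative_davio):
        for i in range(3):
            print(decomposition.__name__, i,
                  equivalent(decomposition(majority, i), majority, 3))
```

Because every decomposition type is valid for every variable, the choice only affects the size of the resulting diagram, which is what makes the type list a search problem.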

7B.4 On Properties of Kleene TDDs
Yukihiro Iguchi, Tsutomu Sasao, Munehiro Matsuura

Three types of ternary decision diagrams (TDDs) are considered: AND_TDDs, EXOR_TDDs, and Kleene_TDDs. Kleene_TDDs are useful for logic simulation in the presence of unknown inputs. Let N(BDD : f), N(AND_TDD : f), and N(EXOR_TDD : f) be the number of non-terminal nodes in the BDD, the AND_TDD, and the EXOR_TDD for f, respectively. Let N(Kleene_TDD : F) be the number of non-terminal nodes in the Kleene_TDD for F, where F is the Kleenean ternary function corresponding to f. Then N(BDD : f) <= N(TDD : f). For parity functions, N(BDD : f) = N(AND_TDD : f) = N(EXOR_TDD : f) = N(Kleene_TDD : F). For unate functions, N(BDD : f) = N(AND_TDD : f). The sizes of Kleene_TDDs are O(3^n/n) and O(n^3) for arbitrary functions and symmetric functions, respectively. There exists a 2n-variable function for which Kleene_TDDs require O(n) nodes with the best order, but O(3^n) nodes with the worst order.
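
As a rough illustration of logic simulation with unknown inputs, the sketch below evaluates the ternary extension F of a Boolean function f by brute force. It assumes the usual definition: F is 0 (or 1) when f is 0 (or 1) for every way of filling in the unknown inputs, and unknown otherwise. It enumerates assignments rather than building a Kleene_TDD, and the names `U`, `kleene_extension`, `and2` and `xor2` are hypothetical.

```python
from itertools import product

U = "u"  # marker for an unknown input or output value

def kleene_extension(f):
    """Return F, the ternary extension of the Boolean function f."""
    def F(x):
        unknown = [i for i, v in enumerate(x) if v == U]
        results = set()
        # Try every 0/1 completion of the unknown positions.
        for fill in product((0, 1), repeat=len(unknown)):
            y = list(x)
            for i, v in zip(unknown, fill):
                y[i] = v
            results.add(f(tuple(y)))
        # A single common result is known; otherwise the output is unknown.
        return results.pop() if len(results) == 1 else U
    return F

def and2(x):
    return x[0] & x[1]

def xor2(x):
    return x[0] ^ x[1]

if __name__ == "__main__":
    F_and, F_xor = kleene_extension(and2), kleene_extension(xor2)
    print(F_and((0, U)))  # a 0 input forces the AND output
    print(F_and((1, U)))  # output depends on the unknown input
    print(F_xor((0, U)))  # parity always depends on every input
```

The brute-force loop costs 2^k evaluations for k unknown inputs, which is the kind of cost a decision-diagram representation of F is meant to avoid.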


Session 7C : Circuit Analysis and Simulation

7C.1 A Time-Domain Method for Numerical Noise Analysis of Oscillators
Makiko Okumura, Hiroshi Tanimoto

A numerical noise analysis method for oscillators is proposed. Noise sources are usually small and can be considered as perturbations to a large amplitude oscillation. Transfer functions from each noise source to the oscillator output can be calculated by modeling the oscillator as a linear periodic time-varying circuit. The proposed method is a time domain method and can be applied to strongly non-linear circuits. Thermal noise, shot noise and flicker noise are considered as noise sources. Error in the time domain method is also discussed.

7C.2 A New Linear-Time Harmonic Balance Algorithm for Cyclostationary Noise Analysis in RF Circuits
J.S. Roychowdhury, Peter Feldmann

A new technique is presented for computing noise in nonlinear circuits. The method is based on a formulation that uses harmonic power spectral densities (HPSDs), from which a block-structured matrix relation between the second-order statistics of noise within a circuit is derived. The HPSD formulation is used to devise a harmonic-balance-based noise algorithm that requires O(nN log N) time and O(nN) memory, where n represents circuit size and N the number of harmonics of the large-signal steady state. The method treats device noise sources with arbitrarily shaped PSDs (including thermal, shot and flicker noises), handles noise input correlations and computes correlations between different outputs. The HPSD formulation is also used to establish the non-intuitive result that bandpass filtering of cyclostationary noise can result in stationary noise. The new technique is illustrated using an example that exhibits noise folding and interaction between harmonic PSD components. The results are validated against Monte-Carlo simulations. The noise performance of a large industrial integrated RF circuit (with > 300 nodes) is also analyzed in less than 2 hours using the new method.

7C.3 Enhancement of Parallelism for Tearing-based Circuit Simulation
Koutaro Hachiya, Toshiyuki Saito, Toshiyuki Nakata, Norio Tanabe

A new circuit simulation system is presented with the techniques 'Subcircuit Balancing with Estimated Update operation count' (SBEU) and 'Asynchronous Distributed Row-based interconnection parallelization' (A-DR). SBEU estimates the Gaussian elimination cost of each subcircuit by counting the number of update operations to achieve balanced circuit partitioning. A-DR makes it possible to overlap numerical operations and interprocessor communications in parallel Gaussian elimination of interconnection equations. On a 16-PE distributed memory parallel machine, an experimental simulation shows a 9.9-times speedup over 1 PE, and the distribution of the time consumed for each subcircuit is within 26% deviation from the median.


Session 7D : Tutorial

Design and Test of Processor-Core Based Systems
Peter Marwedel

This tutorial responds to the rapidly increasing use of various cores for implementing systems-on-a-chip. It specifically focusses on processor cores. We will give some examples of cores, including DSP cores and application-specific instruction-set processors (ASIPs). We will mention market trends for these components, and we will touch on design procedures, in particular the use of compilers. Finally, we will discuss the problem of testing core-based designs. Existing solutions include boundary scan, embedded in-circuit emulation (ICE), the use of processor resources for stimuli/response compaction and self-test programs.


Session 8A : Estimation from High-Level/RTL Descriptions

8A.1 Architecture Evaluation Based on the Datapath Structure and Parallel Constraint
Masayuki Yamaguchi, Akihisa Yamada, Toshihiro Nakaoka, Takashi Kambe

This paper presents a novel way of evaluating the architecture of embedded custom DSPs which helps designers optimize the datapath configuration and the instruction set. Given a datapath structure, it evaluates the performance in terms of an estimated number of steps to execute the target program on the datapath. A concept of 'parallel constraint' is newly introduced, which enables evaluation of the impact of instruction format design on the performance without explicitly specifying the instruction format. The number of execution steps is estimated by a combination of static analysis and dynamic analysis. It enables fast and precise estimation of actual performance in the early design stage. We show some experimental results on an actual signal processor to demonstrate the accuracy of estimation and the usefulness of this method in architecture design.

8A.2 A Constructive Method for Data Path Area Estimation During High-Level VLSI Synthesis
V. Natesan, Anurag Gupta, Srinivas Katkoori, Dinesh Bhatia, Ranga Vemuri

In this paper we present a fast and computationally efficient deterministic method for estimating the area of a Register Transfer Level datapath obtained during high level VLSI synthesis. The estimation makes use of an RT level netlist along with a pre-synthesized library of RT level components. The layout area is estimated using a quadratic programming based framework that produces a quick module allocation and generates a topological floorplan, followed by heuristic algorithms for mapping RTL modules and their interconnections onto a standard cell based layout design style. Experiments on a suite of benchmark examples show promising results with reliable accuracy.

8A.3 RT Level Power Analysis
Jianwen Zhu, Poonam Agrawal, Daniel D. Gajski

Elevating power estimation to the architectural and behavioral levels is essential for design exploration beyond the logic level. In contrast with purely statistical approaches, an analytical model is presented to estimate the power consumption in the datapath and controller for a given RT level design. Experimental results show that an order-of-magnitude speed-up over low level tools, as well as satisfactory accuracy, can be achieved. This work can also serve as the basis for a behavioral level estimation tool.

8A.4 Statistical Design of Macro-models For RT-level Power Evaluation
Qing Wu, Chihshun Ding, Chengtah Hsieh, Massoud Pedram

This paper introduces the notion of cycle-accurate macro-models for RT-level power evaluation. These macro-models provide us with the capability to estimate the circuit power dissipation cycle by cycle at RT-level without the need to invoke low level simulations. The statistical framework allows us to compute the error interval for the predicted value from the user specified confidence level. The proposed macro-model generation strategy has been applied to a number of RT-level blocks and detailed results and comparisons are provided.


Session 8B : Logic Synthesis and Modeling

8B.1 AND/OR Reasoning Graphs for Determining Prime Implicants in Multi-Level Combinational Networks
Dominik Stoffel, Wolfgang Kunz, Stefan Gerber

This paper presents a technique to determine prime implicants in multi-level combinational networks. The method is based on a graph representation of Boolean functions called AND/OR reasoning graphs. This representation follows from a search strategy to solve the satisfiability problem that is radically different from conventional search for this purpose (such as exhaustive simulation, backtracking, BDDs). The paper shows how to build AND/OR reasoning graphs for arbitrary combinational circuits and proves basic theoretical properties of the graphs. It will be demonstrated that AND/OR reasoning graphs allow us to naturally extend basic notions of two-level switching circuit theory to multi-level circuits. In particular, the notions of prime implicants and permissible prime implicants are defined for multi-level circuits and it is proved that AND/OR reasoning graphs represent all these implicants. Experimental results are shown for PLA factorization.

8B.2 Efficient Synthesis of AND/XOR Networks
Yibin Ye, Kaushik Roy

A new graph-based synthesis method for general Exclusive Sum-of-Product forms (ESOP) is presented in this paper. Previous research has largely concentrated on a class of ESOP's, the Canonical Restricted Fixed/Mixed Polarity Reed-Muller form, also known as Generalized Reed-Muller (GRM) form. However, for many functions, the minimum GRM can be much worse than the ESOP. We have defined a Shared Multiple Rooted XOR-based Decomposition Diagram (XORDD) to represent functions with multiple outputs. By iteratively applying transformations and reductions, we obtain a compact XORDD which gives a minimized ESOP. Our method can synthesize larger circuits than previously possible. The compact ESOP representation provides a form that is easier to synthesize for XOR-heavy multi-level circuits, such as arithmetic functions. The method successfully minimized large functions with multiple outputs. Results are also compared to the minimized SOP's obtained from ESPRESSO. Experimental results show that for many circuits ESOP's have a considerably more compact form than SOP's.
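
A standard textbook case of the ESOP versus SOP gap is the n-variable parity function, which is not taken from the paper's benchmarks. Its ESOP needs n single-literal products, while its minimum SOP needs 2^(n-1) products because no two on-set minterms are adjacent and so none can be merged. The sketch below checks this by enumeration; all function names are hypothetical.

```python
from itertools import product

def parity(x):
    return sum(x) % 2

def on_set(f, n):
    """All input assignments for which f evaluates to 1."""
    return [x for x in product((0, 1), repeat=n) if f(x)]

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

def sop_products_for_parity(n):
    minterms = on_set(parity, n)
    # No two on-set minterms differ in exactly one bit, so no pair can be
    # merged into a larger cube: every minterm is its own product term.
    assert all(hamming(a, b) >= 2
               for a in minterms for b in minterms if a != b)
    return len(minterms)

def esop_products_for_parity(n):
    # x1 XOR x2 XOR ... XOR xn: one single-literal product per variable.
    return n

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, esop_products_for_parity(n), sop_products_for_parity(n))
```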

8B.3 An Optimization of AND-OR-EXOR Three-level Networks
Debatosh Debnath, Tsutomu Sasao

In this paper, we present a design method for AND-OR-EXOR three-level networks, where a single two-input EXOR gate is used. The network realizes an exclusive-OR of two sum-of-products expressions (EX-SOP), where the two sum-of-products expressions (SOP) cannot share products. The problem is to minimize the total number of products in the two SOPs. We introduce the u-equivalence of logic functions to develop minimization algorithms for EX-SOPs with up to five variables. We minimized all the representative functions of NP-equivalence classes for up to five variables and found that five-variable functions require up to 9 products in minimum EX-SOPs. For n-variable functions, minimum EX-SOPs require at most 9 x 2^(n-5) (n >= 6) products. This upper bound is smaller than 2^(n-1), the upper bound for the conventional sum-of-products expressions. Index Terms - Three-level network, AND-EXOR, logic minimization, spectral method, NP-equivalence, u-equivalence, coordinate representations, complexity.
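
To make the two upper bounds concrete, the short sketch below tabulates 9 x 2^(n-5) for EX-SOPs against 2^(n-1) for conventional SOPs. It is only arithmetic on the stated bounds, and the function names are hypothetical. The ratio is 9/16 for every n >= 6.

```python
def ex_sop_bound(n):
    """Upper bound on products in a minimum EX-SOP, stated for n >= 6."""
    if n < 6:
        raise ValueError("the bound 9 * 2**(n - 5) is stated for n >= 6")
    return 9 * 2 ** (n - 5)

def sop_bound(n):
    """Upper bound on products in a minimum SOP of n variables."""
    return 2 ** (n - 1)

if __name__ == "__main__":
    for n in range(6, 11):
        print(n, ex_sop_bound(n), sop_bound(n))
```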

8B.4 A New Description of CMOS Circuits at Switch-Level
Xunwei Wu, Massoud Pedram

After analyzing the limitations of the traditional description of CMOS circuits at the gate level, this paper introduces the notions of switching and signal variables for describing the switching states of MOS transistors and signals in CMOS circuits, respectively. Two connection operations for describing the interaction between MOS transistors and signals and a new description for CMOS circuits at the switch level are presented. This new description can be used to express the functional relationship between inputs and the output at the switch level. It can also be used to describe the circuit structure composed of various transistor switches. Based on the new description, the design of CMOS circuits at switch level can be efficiently realized. It is expected that this will provide a basis for techniques for analyzing and optimizing delay and power dissipation of CMOS circuits.


Session 8C : Module Generation and FPGA Layout

8C.1 A 2-Dimensional Transistor Placement for Cell Synthesis
Shunji Saika, Masahiro Fukui, Noriko Shinomiya, Toshira Akino

This paper proposes a transistor placement algorithm to generate standard cell layout in a two-dimensional placement style that is not restricted to row-based transistor placement. The cost function constructed for transistor placement optimization is able to optimize wirings directly and diffusion sharing indirectly but sufficiently. This transistor placement algorithm, applied to several standard cells, has demonstrated the capability to generate a nearly optimal two-dimensional placement that is comparable to manually designed placement.

8C.2 DP-Gen: A Datapath Generator for Multiple-FPGA Applications
Wen-Jong Fang, Allen C.-H. Wu, Ti-Yen Yen, Tsair-Chin Lin

This paper presents a datapath generator for multiple-FPGA applications. This datapath generator is able to generate complex datapath designs described in HDLs. Our datapath generator uses a novel synthesis and partitioning approach which bridges the gap between RTL/logic synthesis and physical partitioning to fully exploit design structural hierarchy for multiple-FPGA implementations. Experiments on a number of benchmarking circuits and industry designs demonstrate that the generator can effectively and efficiently produce high-density multiple-FPGA datapaths.

8C.3 A Simultaneous Placement and Global Routing Algorithm with Path Length Constraints for Transport-Processing FPGAs
Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki

In layout design of transport-processing FPGAs, it is required not only that routing congestion be kept small but also that circuits implemented on them operate at a higher operation frequency. This paper extends the previously proposed simultaneous placement and global routing algorithm for transport-processing FPGAs, whose objective is to minimize routing congestion, and proposes a new algorithm in which the length of each critical signal path (path length) is limited to a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (LookUp Table) sets to be placed. Each bipartitioning procedure consists of three phases: (0) estimation of path lengths, (1) bipartitioning of a set of terminals, and (2) bipartitioning of a set of LUTs. After searching for the paths with tighter path length constraints by estimating path lengths in (0), phases (1) and (2) are executed so that those path lengths are reduced with higher priority and thus path length constraints are not violated. The algorithm has been implemented, applied to transport-processing circuits, and compared with conventional approaches. The results demonstrate that the algorithm resolves path length constraints for 11 out of 13 circuits, though it increases routing congestion by an average of 20%. After detailed routing, it achieves 100% routing for all the circuits and decreases circuit delay by an average of 23%.

8C.4 Not Necessarily More Switches More Routability
Yu-Liang Wu, Douglas Chang, Malgorzata Marek-Sadowska, Shuji Tsukiyama

It has been observed experimentally that the mapping of global to detailed routing in conventional FPGA routing architecture (2D array) yields unpredictable results. In [8,10,13], a different class of FPGA structures called Greedy Routing Architectures (GRAs), where a locally optimal switch box routing can be extended to an optimal entire chip routing, were investigated. It was shown that GRAs have good mapping properties. An H-tree GRA [10] with W^2+2W switches per switch box (SpSB) and a 2D array GRA [13] with 4W^2+2W SpSB were proposed (W is the number of tracks in each switch box). Here, we continue this work by introducing an H-tree GRA with W^2/2+2W SpSB and a 2D array GRA with 3.5W^2+2W SpSB. These new GRAs have the same good mapping properties but use fewer switches. We also show a class of FPGA architectures in which the mapping problem remains NP-complete, even with 6(W-1)2 + 6W2 SpSB. This is close to the maximum number of SpSB, which is 6W^2.
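
For a feel of the savings, the sketch below evaluates the switch-per-switch-box formulas for the earlier and newly introduced GRAs, reading the flattened exponents as W^2, together with the 6W^2 maximum. It is plain arithmetic on those formulas; the dictionary keys and the choice W = 8 are assumptions made for illustration.

```python
def switches_per_switch_box(W):
    """Evaluate the SpSB formulas for W tracks per switch box."""
    return {
        "H-tree GRA (earlier)": W * W + 2 * W,
        "H-tree GRA (new)": W * W / 2 + 2 * W,
        "2D array GRA (earlier)": 4 * W * W + 2 * W,
        "2D array GRA (new)": 3.5 * W * W + 2 * W,
        "maximum": 6 * W * W,
    }

if __name__ == "__main__":
    for name, count in switches_per_switch_box(8).items():
        print(f"{name}: {count}")
```

In both cases the new architecture saves W^2/2 switches per switch box relative to the earlier one.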


Session 8D : (Panel Discussion) Design Standardization for 2001

The SEMATECH Chip Hierarchical Design System - new paradigms for deep submicron design
Invited Talk: Greg Ledenbach

Panel : EDA Standardization including CHDS
Moderator : Hitoshi Yoshizawa, NEC
Organizer: Hisakazu Edamatsu, Matsushita Electric Industrial

Panel: The Role of Design Standardization in future complex design


Session 9A : Aspects of Hardware and Software Synthesis

9A.1 On the Control-subroutine Implementation of Subprogram Synthesis
Cheng-Tsung Hwang, Hsiao-Cheng Weng, Yu-Chin Hsu, Mike Tien-Chien Lee

In this paper, synthesis of VHDL procedures and functions is studied from the VHDL transformation point of view. Among all the proposed methods, inline expansion and module can be integrated into a VHDL synthesis system by a source-to-source transformation, while a control-subroutine approach requires additional work at the higher level synthesis phases before it can link to a logic synthesis tool. Many optimization possibilities are explored during the process. We also present various generations of the control-subroutine approach, including the synthesis of recursive programs, a behavioral partitioning methodology that divides the controller into several communicating state machines, and a methodology that mixes the execution of subprograms. Our study shows that the combination of these approaches is flexible enough to be adapted efficiently to various applications.

9A.2 A Procedure for Software Synthesis from VHDL Models
Venkatram Krishnaswamy, Rajesh K. Gupta, Prithviraj Banerjee

In this paper we address the problem of software generation from a Hardware Description Language (HDL). In particular, we examine the issues involved in translating VHDL into C or C++ for use in system simulation and cosynthesis. Because of the concurrency supported by VHDL, and a notion of timing behavior, care must be taken to ensure behavioral correctness of the generated software. The issues involved will be shown to be different in each of the application areas. The ideas set forth here have been used in an efficient VHDL simulator designed to execute on multi-processor systems. Results are presented for simulation on uniprocessor as well as multiprocessor systems.

9A.3 Built-in Chaining: Introducing Complex Components into Architectural Synthesis
Peter Marwedel, Birger Landwehr, Rainer Dömer

In this paper, we extend the set of library components which are usually considered in architectural synthesis by components with built-in chaining. For such components, the result of some internally computed arithmetic function is made available as an argument to some other function through a local connection. These components can be used to implement chaining in a data-path in a single component. Components with built-in chaining are combinatorial circuits. They correspond to 'complex gates' in logic synthesis. Compared with implementations that use several components, components with built-in chaining usually provide a denser layout, reduced power consumption, and a shorter delay time. Multiplier/accumulators are the most prominent example of such components. Such components require new approaches for library mapping in architectural synthesis. In this paper, we describe an IP-based approach taken in our OSCAR synthesis system.
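
To make the library-mapping problem concrete, the sketch below rewrites a multiplication that feeds an addition into a single multiplier/accumulator node. It is only a greedy pattern rewrite on expression trees, assumed here for illustration; it is not the IP-based approach of the OSCAR system, and the tuple encoding and the name `map_to_mac` are hypothetical.

```python
def map_to_mac(expr):
    """Greedily replace add(mul(a, b), c) with a single mac(a, b, c) node.

    Expressions are nested tuples such as ("add", ("mul", "a", "b"), "c");
    leaves are plain variable names.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: nothing to map
    op, *args = expr
    args = [map_to_mac(arg) for arg in args]
    if op == "add":
        left, right = args
        if isinstance(left, tuple) and left[0] == "mul":
            return ("mac", left[1], left[2], right)
        if isinstance(right, tuple) and right[0] == "mul":
            return ("mac", right[1], right[2], left)
    return (op, *args)


mapped = map_to_mac(("add", ("mul", "a", "b"), "c"))
```

A greedy rewrite like this commits to the first match it sees; choosing among overlapping matches is where an optimizing formulation would differ.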


Session 9B : Sequential Synthesis

9B.1 BDD-based Logic Partitioning for Sequential Circuits
Ming-Ter Kuo, Yifeng Wang, Chung-Kuan Cheng, Masahiro Fujita

This paper presents a BDD-based approach to logic partitioning for sequential circuits. We use a sequential machine to model a circuit and represent the machine by its transition relation. A heuristic algorithm based on the BDD representation of the transition relation is proposed to partition the sequential machine with a minimum number of input/output pins. Using BDDs and their operations, we have developed an efficient method to iteratively improve a partition. Experimental results show that our sequential logic partitioning algorithm significantly outperforms partitioning algorithms at the netlist level.

9B.2 Cube-Embedding Based State Encoding for Low Power Design
De-Sheng Chen, Majid Sarrafzadeh

In this paper we consider the problem of minimizing the power consumption of a sequential circuit using low power state encoding. One of the previously published results is based on recursive matching. In general, a matched pair can be considered as a 1-cube being embedded in a hypercube. We generalize this idea of 1-cube embedding and propose a new encoding algorithm based on r-cube embedding. We then present an efficient 2-cube embedding based state encoding approach for low power design. It considers both Hamming distance and the complexity of the logic function (by estimation). Experimental results show that this approach is competitive with other existing techniques.
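
The Hamming-distance part of the objective can be illustrated with a commonly used cost: the sum, over all state transitions, of the transition weight times the number of state bits that flip. The sketch below evaluates that cost for two candidate encodings. It is not the cube-embedding algorithm of the paper and ignores the logic-complexity estimate; the three-state machine, its transition counts, and the function names are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Number of bit positions in which two state codes differ."""
    return bin(code_a ^ code_b).count("1")


def switching_cost(encoding, transition_weights):
    """Weighted count of state-register bit flips over all transitions."""
    return sum(
        weight * hamming_distance(encoding[src], encoding[dst])
        for (src, dst), weight in transition_weights.items()
    )


# Hypothetical transition counts for a three-state machine.
weights = {("A", "B"): 5, ("B", "C"): 3, ("C", "A"): 2}

# Two candidate 2-bit encodings of the same machine.
far_apart = {"A": 0b00, "B": 0b11, "C": 0b01}
adjacent = {"A": 0b00, "B": 0b01, "C": 0b11}

cost_far = switching_cost(far_apart, weights)
cost_adjacent = switching_cost(adjacent, weights)
```

Giving the most frequent transition (A to B) codes at Hamming distance 1 lowers the cost, which is the intuition behind embedding frequently communicating states in a small cube.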

9B.3 On Synthesis of Speed-Independent Circuits at STG Level
Kuan-Jen Lin, Chi-Wen Kuo

Synthesizing hazard-free asynchronous circuits directly at the Signal Transition Graph (STG) level has been shown to need significantly less CPU time than approaches at the state graph level [10, 16, 4]. However, all previous methods at the STG level were based on sufficient conditions only. Hence, the synthesized circuits are generally inferior, due to the incomplete transformation. In this paper, we present a new Characteristic Graph (CG) to encapsulate all feasible solutions of the original STG in reduced size, which compares favorably with the state graph approach. The requirements of speed-independent circuits can then be completely transformed into the CG. Furthermore, we derive a necessary and sufficient condition for speed-independent implementation based on a predefined general circuit model, which has not yet been reported. With CGs and this condition, we develop a heuristic synthesis algorithm which derives solutions similar to the state-graph approach while requiring significantly less CPU time.


Session 9C : Theoretical Aspects of Layout Design

9C.1 A Mapping from Sequence-Pair to Rectangular Dissection
Hiroshi Murata, Kunihiro Fujiyoshi, Tomomi Watanabe, Yoji Kajitani

A fundamental issue in floorplanning is how to represent candidate solutions. Recently, a representation called sequence-pair was proposed [1]. Seq-pair is general enough to represent an area-minimum placement, and is also efficient because it does not represent any overlapping placement. However, seq-pair is not expressive enough since channels are not represented. This paper gives a mapping from seq-pair to rectangular dissection, which represents channels by line segments. Consequently, candidate arrangements of modules and channels are successfully represented with the generality and the efficiency inherited from the seq-pair.
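
For readers unfamiliar with the representation, the sketch below decodes a sequence-pair into a non-overlapping placement using the usual rule: a module that precedes another in both sequences lies to its left, and a module that follows another in the first sequence but precedes it in the second lies below it. This shows only the standard seq-pair decoding, not the mapping to rectangular dissection given in the paper; the function name `place` and the module sizes are hypothetical.

```python
def place(seq_pos, seq_neg, sizes):
    """Decode a sequence-pair into lower-left coordinates for each module.

    sizes maps a module name to (width, height).
    """
    pos_rank = {module: i for i, module in enumerate(seq_pos)}
    x, y = {}, {}
    # Every module left of or below m appears before m in seq_neg,
    # so processing in seq_neg order sees all predecessors first.
    for i, module in enumerate(seq_neg):
        x[module] = 0
        y[module] = 0
        for other in seq_neg[:i]:
            width, height = sizes[other]
            if pos_rank[other] < pos_rank[module]:
                # other precedes module in both sequences: other is to the left.
                x[module] = max(x[module], x[other] + width)
            else:
                # other follows in seq_pos but precedes in seq_neg: other is below.
                y[module] = max(y[module], y[other] + height)
    return x, y


sizes = {"a": (2, 1), "b": (2, 2), "c": (1, 3)}
x, y = place(["a", "b", "c"], ["b", "a", "c"], sizes)
```

Here module b sits at the origin, a is stacked on top of b, and c stands to the right of both, filling a 3 by 3 bounding box.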

9C.2 Solving Constrained Via Minimization by Compact Linear Programming
C.-J. Richard Shi

Via minimization is an important problem in integrated circuit layout and printed circuit board design. In this paper, a linear (non-integral) programming approach to two-layer constrained via minimization (CVM) is presented. The approach finds optimum solutions for routings containing no more than three-way splits, and guarantees provably good results for the general case. Most importantly, the size of the linear programming formulation is polynomial in the size of the CVM problem. The significance of our work lies in three aspects. First, since linear programming can be solved in polynomial time, our work provides, for the first time, a mathematical programming solution with computational efficiency comparable to known combinatorial CVM algorithms. Second, our compact linear programming approach is provably good and natural for general CVM, while previous restricted CVM algorithms are difficult to extend to the general case. Third, our approach can handle additional constraints in a unified manner, and thus provides an efficient method for performance-driven layer assignment. Our approach is based on some new graph-theoretic and polyhedron-combinatorial results presented in this paper on the structure of the CVM problem.

9C.3 Efficient Routability Checking for Global Wires in Planar Layouts
Naoyuki Iso, Yasushi Kawaguchi, Tomio Hirata

In VLSI and printed wiring board design, the routing process usually consists of two stages: global routing and detailed routing. Routability checking decides whether the global wires can be transformed into detailed ones. In this paper, we propose two graphs, the capacity checking graph and the initial flow graph, for efficient routability checking.

9C.4 Topological Routing Path Search Algorithm with Incremental Routability Test
Toshiyuki Hama, Hiroaki Etoh

This article describes a topological routing path search algorithm embedded in our auto-router for printed circuit boards. The algorithm searches for a topological path that is guaranteed to be transformable into a physical wire satisfying design rules. We propose a method for incrementally verifying design rules during topological path search in a graph based on constrained Delaunay triangulation, and describe several improvements to the routing path search algorithm that remedy the overhead of the routability test and avoid combinatorial explosion.


Session 9D : Tutorial

VHDL Analog and Mixed-Signal Extensions Through Examples
Alain Vachoux

VHDL 1076.1 denotes an effort to enhance the IEEE VHDL 1076 standard with analog and mixed-signal capabilities. At the time of this writing (November 1995), the development of the extensions is nearing completion. An IEEE ballot to adopt the extensions as a new IEEE standard should happen in Q1 97. This tutorial provides an overview of the proposed extensions through a number of simple, but characteristic, examples.