SIGDA Super Compendium, FPGA 1994, Abstracts

<body> <h3><a href="../../../../../main_ss.htm">Main Page</a></BR> FPGA 1994 : [ <a href="fpgatoc.htm">Table of Contents</a> ] [ <a href="fpgasess.htm">Session Index</a> ] [ <a href="fpgaai.htm">Author Index</a> ]</H3> <hr> <h2>FPGA 1994 ABSTRACTS</h2> <b> Sessions: [<a href="#sess1">1</a>] [<a href="#poster1">Posters</a>] [<a href="#sess2">2</a>] [<a href="#poster2">Posters</a>] [<a href="#sess3">3</a>] [<a href="#poster3">Posters</a>] [<a href="#sess4">4</a>] [<a href="#poster4">Posters</a>] [<a href="#panel">Panel</a>] [<a href="#sess5">5</a>] [<a href="#poster5">Posters</a>] [<a href="#sess6">6</a>] [<a href="#poster6">Posters</a>] [<a href="#sess7">7</a>] [<a href="#poster7">Posters</a>] [<a href="#sess8">8</a>] [<a href="#poster8">Posters</a>]<p> </b> <hr> <h2>Session 1: <a NAME="sess1"> Field-Programmable Systems </a></h2> Chair: Chi-Ping Hsu, Quickturn Design Systems <dl> <dt> <a href="../../pdffiles/fpga94_1_1.pdf" name="1_1"><img src="../../../../../icons/pdficon.gif"></a> High-Speed Emulation of Communication Circuits on a Multiple-FPGA System <dd><i> K. Yamada, Hiroshi Nakada, A. Tsutsui, and Naohisa Ohta, NTT, Japan </i><p> This paper proposes a prototype emulation system based on Xilinx LCA for digital signal transport circuits. This system has a high speed interface, over 150Mbps, to emulate digital signal transport circuits at real-time operation speeds. This paper also introduces a design method that uses a multiple-FPGA system to emulate large scale digital circuits for high speed communication such as digital signal transport. Excellent emulator performance is achieved with the introduction of a circuit partitioning technique that utilizes the high-level partitioning described by the designer, and an automatic latch insertion technique. Experimental results show that the proposed method achieves the high-speed emulation of digital signal transport circuits at the rate of 156 Mbps. <p> <dt> <a href="../../pdffiles/fpga94_1_2.pdf" name="1_2"><img src="../../../../../icons/pdficon.gif"></a> The Virtual Wire Emulation System: A Gate-Efficient ASIC Prototyping Environment <dd><i> R. Tessier, J. Babb, M. Dahl, S. Hanono, A. Agarwal, MIT </i><p> FPGA-based ASIC development systems have become important tools in contemporary ASIC design. Existing systems exhibit low per-FPGA gate utilization (10 to 20 percent) due to limited inter-chip communication. Attempts at overcoming this limitation through the use of high dimensional interconnection topologies have met with limited success. This paper focuses on the prototype hardware and software interfaces that have been developed for an FPGA-based ASIC emulation system based on a new technique for overcoming inter-chip communication limitations. This tech- nique, referred to as virtual wires, intelligently multiplexes each physical FPGA wire among a number of logical wires. The Virtual Wires Emulation System exhibits high FPGA gate utilization while achieving system speeds comparable to existing logic emulators. A two-dimensional mesh interconnection topology of FPGAs is used to eliminate the cost of signal switching elements and to facilitate scalability. A system capable of emulating 20,000 gates has been constructed for under $3000. This system includes both prototype emulation hardware and a Virtual Wires netlist compiler. Currently, this system is being used as both a simulation accelerator for the LSI Logic LSIM and Cadence Verilog simulators and as an in-circuit emulator. Results from mapping netlists, such as the 18K gate Sparcle microprocessor [2], to this system for simulation acceleration and in-circuit emulation indicate that virtual wires can substantially increase FPGA utilization without adversely affecting emulation speed.<BR> Keywords: FPGA, logic emulation, prototyping, mesh topology, static routing, virtual wires. <P> <dt> <a href="../../pdffiles/fpga94_1_3.pdf" name="1_3"><img src="../../../../../icons/pdficon.gif"></a> A Fast-FPGA Prototyping System that Uses Inexpensive High-Performance FPIC <dd><i> D. Brasen, M. Slimane-Kadi, G. Saucier, Institut National Polytechnique de Grenoble/CSI </i><p> Field-Programmable Gate Arrays (FPGAs) are rapidly replacing low end Gate-Arrays because of their fast turn around time and low cost. However, system level Printed Circuit Board (PCB) concerns such as package partitioning and inter-package routing remain as difficult as ever. In this paper, we present a fast prototyping system that uses inexpensive high-performance Field Programmable Interconnect Chips (FPIC). The FPIC form a fixed crossbar that is 100% routable when used in cooperation with automatic package partitioning and crossbar routing algorithms. Regular PCB performance can be obtained in less time to market and less cost. <p> </dl> <hr> <h2><a NAME="poster1"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_1_p1.pdf" name="1_p1"><img src="../../../../../icons/pdficon.gif"></a> Springbok: A Rapid-Prototyping System for Board-Level Designs <dd><i> S. Hauck, G. Borriello, C. Ebeling, University of Washington </i><p> Multi-chip, board-level designs form a large portion of today's digital system designs. Unfortunately, traditional methods for debugging these designs, such as wire wrap, prototype fabrication and software simulation, are inadequate. Wire-wrap is complex and error-prone, prototype fabrication is time-consuming and it is difficult to isolate errors, and simulation is too slow fir full testing. Recent advances in FPGA-based systems offer hope for significant improvements in board-level prototyping. In this paper, we present Springbok, an integrated system for board-level prototyping that promises near-speed testing with the construction and modification simplicity of software simulation. It is comprised of an integrated software system for mapping designs to a flexible, extensible hardware implementation platform. This allows designs to take advantage of FPGA flexibility while using many of the complex chips that will implement the final design. <P> <dt> <a href="../../pdffiles/fpga94_1_p2.pdf" name="1_p2"><img src="../../../../../icons/pdficon.gif"></a> Improved Multiple-Way Circuit Partitioning Algorithms <dd><i> A. Dasdan, C. Aykanat, Bilkent University, Turkey </i><p> We propose two novel approaches to direct multiple-way circuit partitioning. Each approach includes a generic algorithm. In this paper, we discuss one of those algorithms in detail but we also give the basis of the other algorithm. The generic algorithm is used to generate three algorithms. The proposed algorithms outperform the other Kernighan-Lin based circuit partitioning algorithms on benchmark circuits. Since circuit partitioning plays an important role in the implementation of FPGA logic designs, the proposed algorithms are expected to provide great help in this process. <p> <dt> <a href="../../pdffiles/fpga94_1_p3.pdf" name="1_p3"><img src="../../../../../icons/pdficon.gif"></a> FPGA Package Partitioning for Performance <dd><i> D. Brasen, G. Saucier, Institut National Polytechnique de Grenoble/CSI </i><p> FPGA packages have maximum Size constraints much larger than the number of 10 pins. The resulting IO bottleneck during circuit partitioning means more required packages and more ordinary signal wires crossing. between the packages. This cuts more critical timing paths between packages and drastically decreases the circuit operational frequency. In this paper, a bottom-up circuit partitioning method with cone structures is presented. The solution can minimize the delay of critical paths that slow down the circuit without changing the number of partitioned packages. <p> </dl> <hr> <h2>Session 2: <a NAME="sess2"> FPGA Architecture </a></h2> Chair: Jonathan Rose, University of Totonto <dl> <dt> <a href="../../pdffiles/fpga94_2_1.pdf" name="2_1"><img src="../../../../../icons/pdficon.gif"></a> Routing Architecture for Hierarchical Field Programmable Gate Arrays <dd><i> A. Agarwal, D. Lewis, University of Toronto </i><p> This paper proposes an architecture that implements a hierarchical routing structure for FPGAs, called a hierarchical FPGA (HFPGA). The architectural parame- ters of this architecture are defined, and two switch matrix topologies for routing channels proposed. A set of new tools has been used to place and route several circuits on this architecture, with the goal of comparing the cost of HFPGAs to conventional symmetrical FPGAs. The results show that HFPGAs can implement circuits with fewer routing switches, and fewer switches in total, compared to symmetrical FGPAs, although they have the potential disadvantage that they may require more logic blocks due to coarser granularity. <p> <dt> <a href="../../pdffiles/fpga94_2_2.pdf" name="2_2"><img src="../../../../../icons/pdficon.gif"></a> An Effective Hardware/Software Solution for Fine Grained Architectures <dd><i> G.J. Jones, D.M. Wedgwood, Pilkington Microelectronics Ltd., England </i><p> This paper describes the development of a new fine grained sea-of-gates FPGA architecture. The architecture has been developed simultaneously with a proprietary set of autolayout tools and has been designed to be sympathetic towards these tools. New structures have been added that aid the autolayout process and the interaction of these structures with the software is explained. In addition the paper describes some of the problems traditionally associated with the fine grained approach and looks at how these have been addressed in this new architecture. <p> <dt> <a href="../../pdffiles/fpga94_2_3.pdf" name="2_3"><img src="../../../../../icons/pdficon.gif"></a> A Datapath Oriented Architectures for FPGAs <dd><i> D. Cherepacha, D. Lewis, University of Toronto </i><p> This paper describes a new Field-Programmable Gate Array (FPGA) architecture which reduces the density gap between FPGAs and Mask-Programmed Gate Arrays (MPGAs) for datapath oriented circuits. This is primarily achieved by operating on data as a number of four-bit slices. The interconnection network incorporates distinct sets of resources for routing control and data signals. These features reduce circuit area by sharing programming bits among four-bit slices, lowering the total number of storage cells required. Experimental results suggest that the implementation area of a design's datapath can be reduced by a factor of two through the use bit sharing. This paper discusses the requirements of logic blocks and routing structures that can be used to implement typical circuits containing a number of regularly structured datapaths of various sizes, as well as a small number of irregularities. <P> </dl> <hr> <h2><a NAME="poster2"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_2_p1.pdf" name="2_p1"><img src="../../../../../icons/pdficon.gif"></a> Reduced Look Up Tables with Increased Functionality <dd><i> K. Gudger </i><p> This paper proposes a method to reduce the size of 'n' input FPGA look up tables (LUTs) to less than 2n bits. The reduced LUT (RLUT) requires more inputs, which also increases the functionality of the LUT. The increase in the number of inputs is a trade-off against the smaller size of the RLUT. RLUTs for n = 4 should decrease total chip area while increasing logic capacity. The RLUT performance is equal to or better than standard LUTs. <p> <dt> <a href="../../pdffiles/fpga94_2_p2.pdf" name="2_p2"><img src="../../../../../icons/pdficon.gif"></a> Binary Decision Diagram Oriented FPGAs <dd><i> A. Stauffer, P. Marchal, Swiss Federal Institute of Technology </i><p> This paper investigates very fine-grained architectures for a new family of FPGAs and compares different VLSI solutions. The new architectures take advantage of Binary Decision Trees (BDTs) for logic synthesis and Binary Decision Diagrams (BDDs) for logic optimization. Implementation can further be improved by correct ordering of the decision variables in the vertices (OBDDs). The two very fine-grained building cells, proposed herewith, can automatically implement the OBDD structure of any given function. They realize a direct MUX/DMUX implementation of OBDDs. As the topology of BDDs reflects a cellular organization, this technique allows the suppression of the technology mapping phase needed with conventional FPGAs. Two examples of the PREP benchmarks are summarized in this paper. Performance and cost of the new architectures are then discussed as well as the benefit brought by the MUX/DMUX solution from a power consumption point of view. Finally we present open avenues and conclusions. <P> </dl> <hr> <h2>Session 3: <a NAME="sess3"> Technology Mapping I </a></h2> Chair: Jason Cong, UCLA <dl> <dt> <a href="../../pdffiles/fpga94_3_1.pdf" name="3_1"><img src="../../../../../icons/pdficon.gif"></a> Performance Driven Technology Mapping for Lookup-Table Based FPGAs Using the General Delay Model <dd><i> A. Mathur, C.L. Liu, University of Illinois, Urbana-Champaign </i><p> In this paper we present an efficient algorithm for performance driven technology mapping for table-look-up based FPGA architectures using the general delay model. Since our algorithm does not impose any restrictions on the allowed values of edge delays, it can use delay information generated by a layout phase to intelligently guide the technology mapping in an iterative loop. The algorithm uses a set of node weights to measure the criticality of the nodes in a boolean network. A biased depth first search is then used to generate a topological ordering of the nodes and this ordering is used to guide the clustering step. An important feature of our algorithm is that it exploits reconvergent paths in the network automatically. The technology mapped circuits produced by our algorithm were compared with those produced by other algorithms. In terms of delay, our algorithm produces results with delays that are 30% less than those produced by flowmap on the average, showing that minimizing the number of levels in the technology mapped circuit does not ensure a low delay realization. In terms of the number of logic blocks, our results are comparable to those produced by flowmap.<P> <dt> <a href="../../pdffiles/fpga94_3_2.pdf" name="3_2"><img src="../../../../../icons/pdficon.gif"></a> Technology Mapping for Heterogeneous FPGAs <dd><i> J. He, J. Rose, University of Toronto </i><p> Truly heterogenous FPGAs, those with two different kinds of logic block, don't exist in the commercial world in part because logic synthesis for them is difficult. The difficulty arises because FPGAs are pre-fabricated, and so the ratio of the number of each type of block is fixed, which requires a constraint on the mapping that is not present for either homogenous FPGAs or ASICs. This paper presents a general approach for technology mapping into heterogenous lookup-table based FPGAs. It is applied both at the boolean network node level and at a more global level across a collection of mapped sub-circuits. This latter portion of the algorithm is optimal, and is applicable to any heterogenous FPGA, not simply those based on lookup tables. In comparison to a reasonably obvious alternative approach (adjusting the output of an homogenous mapping algorithm) the new algorithm achieves, depending on the heterogenous architecture, an average improvement of up to 48% over a large set of benchmark circuits. <P> <dt> <a href="../../pdffiles/fpga94_3_3.pdf" name="3_3"><img src="../../../../../icons/pdficon.gif"></a> Functional Decision Diagrams for Technology Mapping to Lookup-Table FPGA <dd><i> E. Schubert, U. Kebschull, W. Rosenstiel, Universitat Tubingen, Germany </i><p> This paper addresses the problem of technology mapping to lookup-table based field-programmable gate arrays. It uses a different approach than others for the decomposition of the network into feasible nodes, the Davio expansion. To perform this it uses an efficient representation of the decomposed nodes, Functional Decision Diagrams (FDD). The influence of FDD optimisation methods to the mapped solution is shown. Certain methods for the covering phase into lookup tables were discussed, which are targeted to area, delay and mutability optimisation. <p> </dl> <hr> <h2><a NAME="poster3"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_3_p1.pdf" name="3_p1"><img src="../../../../../icons/pdficon.gif"></a> An Efficient Graph-Based Technology Mapping Algorithm for FPGAs Using Lookup Tables <dd><i> B. Kapoor, Texas Instruments, Dallas </i><p> A graph-based technology mapper for optimizing area in lookup table-based FPGA. designs has been described. It is based on an efficient circuit partitioning algorithm which has been developed with the goal of minimizing the number of lookup tables required to implement a given circuit. The partitioned circuit is then merged using a heuristic algorithm for the matching problem. These algorithms and heuristics are described in detail. Some results obtained on the ISCAS85 benchmark circuits are included. <P> <dt> <a href="../../pdffiles/fpga94_3_p2.pdf" name="3_p2"><img src="../../../../../icons/pdficon.gif"></a> Logic Module Independent Mapping for Table-Lookup FPGAs <dd><i> U. Weinmann, W. Rosenstiel, Forschungszetrum Informatik an der Universitat Karlsruhe </i><p> This paper presents an architecture independent method of technology mapping for table-lookup FPGAs. The internal architecture of logic blocks is a parameter for the mapping step. The first part of the paper describes a new mapping algorithm, which results were compared to existing mapping tools for the Xilinx 4000 series. In a second step the tool was used to examine different new architectures with respect to chip area and delay minimization. The mapping of those architectures showed an improved factor between gate equivalent and used chip area in comparison to commercially available architectures. <P> <dt> <a href="../../pdffiles/fpga94_3_p3.pdf" name="3_p3"><img src="../../../../../icons/pdficon.gif"></a> Use of Binary Decision Diagrams for FPGA Mapping <dd><i> T. Besson, V.V. Le, S. Tixier, S. Saucier, Instit National Polytechnique de Grenoble </i><p> This paper reports on mapping methods for FPGA working directly on Binary Decision Diagram and will be illustrated both in a multiplexor and LUT based FPGAs respectively Actel and Altera Flex products. The Actel mapper is presently included in the Actel design kit. <p> <dt> <a href="../../pdffiles/fpga94_3_p4.pdf" name="3_p4"><img src="../../../../../icons/pdficon.gif"></a> State Assignment Selection for FPGAs and CPLDs <dd><i> H. Belhadj, A. Fortas, G. Saucier, Instit National Polytechnique de Grenoble </i><p> This paper deals with the selection of state assignment options for finite state machines implemented in various targets. This choice depends on the intrinsic controller complexity, the partitioning strategy, the area/delay constraints and the target technologies. Industrial controllers as well as a sub-set of MCNC benchmarks have been synthesized and analyzed after place and route to define the selection strategy. <P> <dt> <a href="../../pdffiles/fpga94_3_p5.pdf" name="3_p5"><img src="../../../../../icons/pdficon.gif"></a> On the Lookup-Table Minimization Problem for FPGA Technology Mapping <dd><i> A. Farrahi, M. Sarrafzadeh, Northwestern University </i><p> One of the main objectives in the process of mapping a digital circuit onto a LUT-based FPGA structure is minimizing the total number of lookup-tables needed to implement the circuit. This will increase the size of the circuit that can be implemented using the available FPGA structure. In this paper we show that even restricted cases of the lookup-table minimization for FPGA technology mapping are NP-complete (even when K is a small constant), and that it can be solved optimally for all values of K on a tree input in O(min{nlogn, nK}) time where n is the number of nodes in the network and K is the input capacity of the LUTs. We study this because most practical digital circuits are "close to tree" in topology. standpoint. Furthermore, our algorithm for the trees forms the basis of our algorithm for general circuits. Based on our algorithm for trees we present a polynomial time heuristic algorithm for general Boolean networks. Experimental results confirm substantial decrease on the number of LUTs on a number of MCNC logic synthesis benchmarks. We obtain 10% to 80% improvement on the number of LUTs compared to the previous algorithms (even though we allow very limited operations, e.g., we do not exploit Boolean properties of the network).<BR> Key words: Field Programmable Gate Array, Technology Mapping, Lookup-Table Mini- mization, NP-completeness, Directed Acyclic Graphs. <P> </dl> <hr> <h2>Session 4: <a NAME="sess4"> Field-Programmable Analog Arrays and Yield-Enhancement </a></h2> Chair: Bahram Ahanin, Altera <dl> <dt> <a href="../../pdffiles/fpga94_4_1.pdf" name="4_1"><img src="../../../../../icons/pdficon.gif"></a> MOS Transconductor-Based Field-Programmable Analog Array <dd><i> G. Gulak, E. Lee, University of Toronto </i><p> An area-efficient technique for implementation of a field-programmable analog array is proposed. The design consists of configurable analog blocks (CABs) that contain op-amps and/or passive elements. The connections between CABs are realized using four-transistor MOS transconductors. The conductance is controlled by varying the gate voltages as defined by either multi-valued memories or external/internal signals. A 1.2 µm CMOS IC with four CABs and 48 transconductors has been fabricated. Several analog circuits including: filter biquad, VCO, precision rectifier, quadrature oscillator, four-quadrant multiplier, square root circuit and sample-and-hold circuit have been successfully prototyped and tested on this IC. <P> <dt> <a href="../../pdffiles/fpga94_4_2.pdf" name="4_2"><img src="../../../../../icons/pdficon.gif"></a> High Speed Field Programmable Analog Array Architecture Design <dd><i> E. Pierzchala, M. Perkowski, Portland State University </i><p> In this paper we address the problem of the best architecture for high speed (high-frequency) field programmable analog array (FPAA). An architecture for current-mode FPAA is presented, which is suitable for a wide class of circuits, including continuous-time filters, differential and algebraic equation solvers, dynamic systems, and fuzzy and multi-valued logic circuits. Examples demonstrate an implementation of a ladder continuous-time filter, an analog rank filter, a circuit for tracking a product of two matrices, a linear equations solver, and a circuit for solving a linear programming problem by the method of steepest descent. The device achieves high performance through the reduced use of global signal interconnections, in favor of local ones. The FPAA is being implemented in a bipolar transistor array technology. <P> <dt> <a href="../../pdffiles/fpga94_4_3.pdf" name="4_3"><img src="../../../../../icons/pdficon.gif"></a> A Novel Approach to Defect Tolerant Design for SRAM Based FPGAs <dd><i> J.L. Kelly, P.A. Ivey, University of Sheffield, England </i><p> A new approach to redundancy for Field Programmable Gate Arrays (FPGAs) is proposed in this paper. The scheme is a novel reconfiguration network drawing on work in the design of wafer scale SIMD processors. The scheme calls for modifications to the wiring segments as well as the inclusion of a spare element at the end of each row. The proposed approach has removed the need for poly (anti) fuses by use of a dedicated on-chip reconfiguration processor. We show that using such a technique it should be possible to construct arrays up to 10 times larger than commercial devices with no detrimental effect on yield. The scheme has been designed to be applicable to any SRAM based FPGA and to keep full software compatibility with existing configuration tools. <P> </dl> <hr> <h2><a NAME="poster4"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_4_p1.pdf" name="4_p1"><img src="../../../../../icons/pdficon.gif"></a> FPGAs with Self Repair Capabilities <dd><i> S. Durand, C. Piguet, Swiss Fed. Inst. of Tech. </i><p> In this paper, we present a new FPGA based on a multiplexer cell with selfdiagnosis and selfrepair capabilities. It is shown that the regular structure of FPGAs is particularly well suited for reconfigurable circuits. Error detection and reconfiguration mechanisms using spare cells are also presented. <P> </dl> <hr> <h2><a NAME="panel"> Panel: Open, Closed Untouched, Irrelevant and Unprofitable: Questions Worth Pursuing in FPGAs </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_panel.pdf"><img src="../../../../../icons/pdficon.gif"></a> </dl> <hr> <h2>Session 5: <a NAME="sess5"> FPGA-Based Computing Architectures and Paradigms </a></h2> Chair: Duncan Buell, Supercomputing Research Centre <dl> <dt> <a href="../../pdffiles/fpga94_5_1.pdf" name="5_1"><img src="../../../../../icons/pdficon.gif"></a> Mesh Routing Topologies for FPGA Arrays <dd><i> S. Hauck, G. Borriello, C. Ebeling, University of Washington </i><p> There is currently great interest in' using fixed arrays of FPGAs for logic emulators, custom computing devices, and software accelerators. An important part of designing such a system is determining the proper routing topology to use to interconnect the FPGAs. This topology can have a great effect on the area and delay of the resulting system. Tree, Bipartite Graph, and Mesh interconnection schemes have all been proposed for use in FPGA-based systems. In this paper we examine Mesh interconnection schemes, and propose several constructs for more efficient topologies. These reduce inter-chip routing costs by more than 50% over the basic 4-way Mesh. <P> <dt> <a href="../../pdffiles/fpga94_5_2.pdf" name="5_2"><img src="../../../../../icons/pdficon.gif"></a> Programming the Hawaii Parallel Computer <dd><i> A. Lew, R. Halverson Jr., University of Hawaii </i><p> A new application of field programmable gate-ways is featured in the prototype of the University of Hawaii parallel computer (HPC). User programs are compiled and then executed partly or completely in one or more field programmable gate-arrays (FPGAs). Some compile-for-FPGA systems have yet to effectively implement full high-level language loop constructs. In this paper we show how the conditional logic used for generating jump addresses can be consolidated into one jump address calculation and computed in a FPGA. In addition, we show how this coprocessing can be easily added to reduce the number of memory transactions and cycles it takes to execute a program. An overview of the HPC shared memory architecture is presented. A shortest path programming example begins by describing the steps for automatically generating the FPGA source code and concludes with the results of a load-store analysis comparing execution with and without the FPGA. <P> <dt> <a href="../../pdffiles/fpga94_5_3.pdf" name="5_3"><img src="../../../../../icons/pdficon.gif"></a> Unifying FPGAs and SIMD Arrays <dd><i> M. Bolotski, A. Dellon, T. Knight Jr., MIT </i><p> Field-Programmable Gate Arrays (FPGAs) and Single-Instruction Multiple-Data (SIMD) processing arrays share many architectural features. In both architectures, an array of simple, fine-grained logic elements is employed to provide high-speed, customisable, bit-wise computation. In this paper, we present a unified computational array model which encompasses both FPGAs and SIMD arrays. Within this framework, we examine the differences and similarities between these array structures and touch upon techniques and lessons which can be transfered between the architectures. The unified model also exposes promising prospects for hybrid array architectures. We introduce the Dynamically Programmable Gate Array (DPGA) which combines the best features from FPGAs and SIMD arrays into a single array architecture. <P> </dl> <hr> <h2><a NAME="poster5"> Posters : Placement and Routing </a></h2> Chair: David Marple, Crosspoint <dl> <dt> <a href="../../pdffiles/fpga94_5_p1.pdf" name="5_p1"><img src="../../../../../icons/pdficon.gif"></a> Algorithms for FPGA Switch Module Routing <dd><i> M.D.F. Wong, S. Thakur, S. Muthukrishnan, The University of Texas at Austin </i><p> We consider a switch module routing problem for FPGAs. This problem was introduced in [7] for evaluating switch module designs. Its solution is also useful for FPGA routabilty analysis. Only an approximate algorithm was previously known for this problem. In this paper, we present an optimal algorithm for the problem based on integer linear programming. Experimental results consistently show that our algorithm is very efficient in practical sized switch modules. We also identify interesting special cases of the problem which can be solved in polynomial time. <P> <dt> <a href="../../pdffiles/fpga94_5_p2.pdf" name="5_p2"><img src="../../../../../icons/pdficon.gif"></a> On Minimizing Clock Skew During FPGA Placement <dd><i> M.D.F. Wong, K. Zhu, The University of Texas at Austin </i><p> Unlike traditional ASIC technologies, the geometrical structures of clock trees in an FPGA are usually fixed and cannot be changed for different circuit designs. Moreover, the clock pins are connected to the clock trees via programmable switches. As a result, the load capacitances of a clock tree may be changed depending on the utilization and distribution of logic modules in an FPGA. To minimize clock skew, it is necessary to distribute the load capacitances, or equivalently the logic modules used by the circuit design, carefully according to the circuit design. In this paper we present an algorithm for selecting logic modules used for circuit placement such that the clock skew is minimized. The algorithm uses a dynamic programming technique and can be applied to the clock tree architectures used in commercial FPGAs. Furthermore, the algorithm can be extended to handle buffered clock trees and multi-phase clock trees. Experimental results show that the algorithm can reduce clock skews significantly as compared with the traditional placement algorithms which do not consider clock skew minimization. <P> <dt> <a href="../../pdffiles/fpga94_5_p3.pdf" name="5_p3"><img src="../../../../../icons/pdficon.gif"></a> Switch Bound Allocation in Timing-Driven Routing of FPGAs <dd><i> M.D.F. Wong, K. Zhu, The University of Texas at Austin </i><p> In segmented channel routing of row-based FPGAs [6, 7], the routability and interconnection delays depend on the choice of the upper bounds on the number of programmable switches used in routing connections in the channel. In this paper, we present an algorithm for determining such upper bounds for all connections of a net simultaneously so that the predefined source-to-sink delay bound of the net is satisfied and the routability of the net is maximized. Distributed RC delay model is used to compute source-to-sink delay. The algorithm can also be applied to timing-driven routing of symmetrical-array based FPGAs and FPICs. Preliminary experimental results show that our algorithm can significantly improve routability and delay bound satisfiability as compared with the traditional uniform switch bound allocation approach [6, 7]. <P> </dl> <hr> <h2>Session 6: <a NAME="sess6"> Applications of FPGAs </a></h2> Chair: Stephen Trimberger, Xilinx Inc. <dl> <dt> <a href="../../pdffiles/fpga94_6_1.pdf" name="6_1"><img src="../../../../../icons/pdficon.gif"></a> A Field-Programmable Gate Array Implementation of a Self-Adapting and Scalable Connectionist Network <dd><i> A. Ferrucci, M. Martin, T. Geocaris, M. Schlag, P.K. Chan, University of California, Santa Cruz </i><p> <dt> <a href="../../pdffiles/fpga94_6_2.pdf" name="6_2"><img src="../../../../../icons/pdficon.gif"></a> Taking Advantage of Reconfigurable Logic <dd><i> B.K. Fawcett, Xilinx Inc. </i><p> Logic implemented in an SRAM-based FPGA is reconfigurable; that is, changes can be made to the system's logic functions by reprogramming the FPGA(s) in the system. Examples are cited of systems that make use of this in-system reconfigurability. These applications can be divided into three main categories based on how the FPGA's reconfigurability is applied: systems with built-in diagnostics, adaptable system designs, and systems with multi-purpose hardware. <P> <dt> <a href="../../pdffiles/fpga94_6_3.pdf" name="6_3"><img src="../../../../../icons/pdficon.gif"></a> A Variable Precision Multiplier for Field Programmable Gate Arrays <dd><i> M. Louie, M. Ercegovac, University of California, Los Angeles </i><p> Off-the-shelf arithmetic chips provide high speed but offer only a limited choice of precisions. With Field Programmable Gate Arrays (FPGAs), good performance arithmetic hardware can be developed in precisions where custom chips are unavailable. We present a modular multiplier design for the Xilinx XC4010 FPGA that is easily tailored to any precision. Each module extends the precision in 4-bit increments, and modules are appended as needed. Our design avoids the common problem of large fanout delay in the critical path to provide a cycle time that is independent of precision. High precision multipliers span multiple chips without additional delays or additional logic block modules for interfacing. <P> </dl> <hr> <h2><a NAME="poster6"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_6_p1.pdf" name="6_p1"><img src="../../../../../icons/pdficon.gif"></a> Space Efficient Neural Net Implementation <dd><i> M. Gshwind, V. Salapura, O. Maischberer, Technische Universitat Wien </i><p> We show how field-programable gate arrays can be used to efficiently implement neural nets. By implementing the training phase in software and the actual application in hardware, conflicting demands can be met: training benefits from a fast edit-debug cycle, and once the design has stabilized, a hardware implementation results in higher performance. While neural nets have been implemented in hardware in the past, larger digital nets have not been possible due to the real-estate requirements of single neurons. We present a bit-serial encoding scheme and computation model, which allows space-efficient computation of the sum of weighted inputs, thereby facilitating the implementation of complex neural networks. <P> <dt> <a href="../../pdffiles/fpga94_6_p2.pdf" name="6_p2"><img src="../../../../../icons/pdficon.gif"></a> Implementation of a Local Controller for the Reconfigurable Machine Using the Xilinx's 4005 FPGA <dd><i> S.S. Erdogan, N. Kuo, A. Wahab, Nanyang Technological University, Singapore </i><p> The Reconfigurable Machine (RM) is a parallel architecture that is housed on the IBM/AT machine as a host computer. RM is built using of Xilinx's 4005 Field Programmable Gate Array (FPGA) and associated fast SRAMs. This paper describes the architecture and implementation of the Local Controller, which is also implemented using Xilinx's 4005 FPGA, specifically optimized to handle the communication between the RM and host computer. The Xilinx FPGA and it's associated CAD system XACT, together with the Mentor Graphics EDA system is used for implementing the control circuit easily and efficiently. <P> </dl> <hr> <h2>Session 7: <a NAME="sess7"> Partitioning and Technology Mapping </a></h2> Chair: Ewald Detfens, Exemplar <dl> <dt> <a href="../../pdffiles/fpga94_7_1.pdf" name="7_1"><img src="../../../../../icons/pdficon.gif"></a> Area/Pin-Constrained Circuit Clustering for Delay Minimization <dd><i> M.D.F. Wong, H. Yang, The University of Texas at Austin </i><p> We consider the problem of circuit partitioning for multiple-chip implementation. One motivation for studying this problem is the current needs of good partitioning tools for implementing a circuit on multiple FPGA chips. We allow replication of logic gates as it would reduce circuit delay. Circuit partitioning with replication of logic gates is also called circuit clustering. In this paper, we present a circuit clustering algorithm that minimizes circuit delay subject to area and pin constraints on each chip, under the general delay model. Our algorithm is optimal under either the area constraint only or the pin constraint only. We tested our algorithm on a set of benchmark circuits and consistently obtained optimal or near-optimal results. <P> <dt> <a href="../../pdffiles/fpga94_7_2.pdf" name="7_2"><img src="../../../../../icons/pdficon.gif"></a> Min-Cut Replication for Improved Partitions <dd><i> J. Hwang, Quickturn Design Systems & Stanford </i><p> Min-cut replication is an optimization that has been recently identified as a method for improving partitions with application, for example, to logic emulation systems based on field-programmable gate arrays. By using unused gates on the FPGAs, min-cut replication can reduce the number of pins required, allowing relaxation of the partitioning pin constraint. The increased partitioning flexibility can result in partitions containing significantly fewer FPGAs. We describe Ti.PIa, a software tool incorporating rain-cut replication, and present the results of applying min-cut replication to designs partitioned by commercial software that itself introduces replication for timing correctness. The results indicate that min-cut replication can reduce the number of FPGAs required to implement a large design. <P> <dt> <a href="../../pdffiles/fpga94_7_3.pdf" name="7_3"><img src="../../../../../icons/pdficon.gif"></a> Using Encoding for Functional Decomposition with Application to Look Up Table Architectures <dd><i> R. Murgai, R. Brayton, A. Sangiovanni-Vincentelli, University of California, Berkeley </i><p> In this paper, we revisit the classical problem of functional decomposition [1, 2J that arises so often in logic synthesis. One basic problem that has remained largely unaddressed to the best of our knowledge is that of decomposing a function such that the resulting sub-functions are simple, i.e., have small number of cubes or literals. In this paper, we show how to solve this problem. We demonstrate that this problem is intimately related to the encoding problem, which is also of fundamental importance in sequential synthesis, especially state-machine synthesis. We formulate the decomposition problem using encoding. In general, an input-output encoding formulation has to be employed to solve the problem. However, we show that for programmable gate array architectures which use look-up tables, the input encoding formulation suffices, provided we use minimum-length codes. The last condition is actually not a constraint, since each extra code bit means that an extra table has to be used (and that could be expensive). The unused codes are used as don't cares for simplifying the sub-functions. We compare the original implementation of functional decomposition, which ignores the encoding problem, with the new version that uses encoding while doing decomposition. We obtain an average improvement of over 20% on a set of standard benchmarks for look-up table architectures. <P> </dl> <hr> <h2><a NAME="poster7"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_7_p1.pdf" name="7_p1"><img src="../../../../../icons/pdficon.gif"></a> A Technique for Synthesizing Data Part Using FPGAs <dd><i> M. Balakrishnan, A.R. Naseer, A. Kumar, Indian Institute of Technology Delhi, India </i><p> <dt> <a href="../../pdffiles/fpga94_7_p2.pdf" name="7_p2"><img src="../../../../../icons/pdficon.gif"></a> Vertex Ordering in Partitioning-Based Fitter for an Application Specific EPLD Device <dd><i> M. Jeske, T. Gao, A. Coppola, Portland State University </i><p> This paper presents a vertex ordering heuristic which speeds up the search process in partitioning-based fitting algorithm. The fitting algorithm maps a sequential circuit onto a special application EPLD device, CY7C361 from Cypress Semiconductor. The same approach can also be used for other EPLD and FPGA architectures. The vertex ordering heuristic has decreased the search time from hours to seconds for more complex designs. The quality of results is not effected as the algorithm remains exact. <P> <dt> <a href="../../pdffiles/fpga94_7_p3.pdf" name="7_p3"><img src="../../../../../icons/pdficon.gif"></a> RTL Synthesis System Using FPGA Macro Bloc Capabilities <dd><i> M.C. Bertrand, A. Mignotte, O. Khalil, G. Saucier, Institut National Polytechnique </i><p> For any technology targets, it is very important to make a consistent use of library macro blocks. It is obvious that the FPGA vendors designed carefully the basic blocks and that for some targets placement and routing is pretty well handled or prepared for these blocks. Therefore, and especially for FPGA targets, the resynthesis of arithmetic operators would lead to poor results compared to designs where the library macro blocks are used. It seems very important to realize a good link from a VHDL description to fully use library macro block capabilities. In addition, most of the designers use a meet-in-the-middle design methodology. This means that they design some basic, dedicated, efficient basic blocks and they assemble or use them in a more complex design. Therefore it is very important to support a VHDL description using the functionalities of the predesigned basic blocks. The ASYL RTL VHDL synthesis system presented in this paper provides facilities for both library macro blocks handling and predesigned designees blocks. <P> </dl> <hr> <h2>Session 8: <a NAME="sess8"> Placement, Routing and Timing Modelling </a></h2> Chair: Sinan Kaptanoglu, Actel <dl> <dt> <a href="../../pdffiles/fpga94_8_1.pdf" name="8_1"><img src="../../../../../icons/pdficon.gif"></a> A New Approach to FPGA Routing Based on Multi-Weighted Graphs <dd><i> G. Robins, M.J. Alexander, University of Virginia </i><p> Field-programmable gate arrays (FPGAs) are an inexpensive and flexible "low risk" design alternative to custom integrated circuits. While FPGA partitioning and technology mapping have been well-studied, FPGA routing has received much less attention. In this paper we propose a unified general framework for FPGA routing, which allows simultaneous optimization of multiple competing objectives under a smooth designer-controlled tradeoff. Our approach is based on a new and general multi-weighted graph formulation, enabling a good theoretical performance characterization, as well as an effective, practical implementation. Our router is architecture-independent, computationally efficient, and performs very well on industrial benchmarks. Finally, our multi-weighted graph formulation is quite general and is directly applicable to many other areas of CAD. <P> <dt> <a href="../../pdffiles/fpga94_8_2.pdf" name="8_2"><img src="../../../../../icons/pdficon.gif"></a> A Simultaneous Placement and Global Routing Algorithm for Symmetric FPGAs <dd><i> N. Togawa, M. Sato, T. Ohtsuki, Waseda University </i><p> An FPGA layout algorithm is presented, which deals with placement and global routing simultaneously by fully exploiting its regular structure. It is based on a simple and fast top-down hierarchical bi-partitioning, with placement and global routes represented by positions of logic-blocks and pseudo-blocks, respectively. For each critical signal path, its upper bound in length is given as a constraint. During the bi-partitioning, the algorithm calculates estimated lengths of the paths and, based on them, partitions the set of blocks so as to satisfy the constraints. In this way, the signal path length is controlled during layout design. Experimental results for several benchmark circuits demonstrates its efficiency and effectiveness. <P> <dt> <a href="../../pdffiles/fpga94_8_3.pdf" name="8_3"><img src="../../../../../icons/pdficon.gif"></a> Fast Delay Estimation in Segmented Channel FPGAs <dd><i> M. Chew, J-C. Lien, Actel Corp. </i><p> Two existing methods are used in computing interconnection delays for FPGAs. The first method, based on SPICE-like circuit simulation, provides very accurate results but requires intensive computations. The second method, based on (some variation of) a bounding box perimeter wire length estimate for the routing, is extremely fast. However, its inaccuracy is high (up to 50 % ) for FPGAs. Neither methods can be incorporated into optimization applications requiring reasonable accurate delays and a large number(100,000 or more) of delay estimates. In this paper, we propose a third method, applicable to FPGAs with segmented channels to quickly estimate net delays accurately (within 10% of SPICE). This method allows place and route (and other applications) to take advantage of the more accurate delays during iterative operations. <P> </dl> <hr> <h2><a NAME="poster8"> Posters </a></h2> <dl> <dt> <a href="../../pdffiles/fpga94_8_p1.pdf" name="8_p1"><img src="../../../../../icons/pdficon.gif"></a> Timing Modelling for Antifuse Based FPGAs <dd><i> N.S. Nagaraj, P. Krivacek, M. Harward, Texas Instruments Inc., India </i><p> An accurate timing model plays a key role in performance driven synthesis, placement and routing besides timing verification. Timing modeling for antifuse based FPGAs is different from standard cells or gate arrays due to technology and architecture differences. In this paper, the methodology followed to develop a timing model for an enthuse based FPGA architecture is presented. The methodology includes extraction of Interconnect parasitics in the form of RC trees from the place and route results. A reduced order approximation obtained for the driving point admittance of RC trees Is coupled with SAS equations to compute load dependent delay and output slew. Rapid Interconnect Circuit Evaluator(RICE) is used for computing the wire delay. Experimental results of using the timing model to find correlation between delay and fanout is presented. <P> <dt> <a href="../../pdffiles/fpga94_8_p2.pdf" name="8_p2"><img src="../../../../../icons/pdficon.gif"></a> Computational Complexity of 2-D FPGA Routing for Arbitrary Switch Box Topologies <dd><i> Y-L. Wu, S. Tsukiyama, Chuo Uaniversity, Japan </i><p> In this paper, we consider the problem of mapping a given global routes to detailed routes (tracks) for two dimensional homogeneous FPGAs. We prove that for an arbitrary fixed switch box topology of flexibility three, this problem remains NP-complete with or without doglegs.<BR> Index Term: FPGA Detailed Routing, NP-Complete, Switch Box Topology, multi-Dimensional Interval Packing.<P> </dl>