DATE 2006 DESIGNERS' FORUM, ABSTRACTS
Moderators: B. Kasser, STMicroelectronics, FR; G. Bertoni, STMicroelectronics, IT
-
Architectures for Efficient Face Authentication in Embedded Systems [p. 1]
-
N. Aaraj, S. Ravi, A. Raghunathan and N. K. Jha
Biometrics represent a promising approach for reliable and secure user
authentication. However, they have not yet been widely adopted in embedded
systems, particularly in resource-constrained devices such as cell
phones and personal digital assistants (PDAs). In this paper, we investigate
the challenges involved in using face-based biometrics for authenticating
a user to an embedded system. To enable high authentication
accuracy, we consider robust face verifiers based on principal component
analysis/linear discriminant analysis (PCA-LDA) algorithms and
Bayesian classifiers, and their combined use (multi-modal biometrics).
Since embedded systems are severely constrained in their processing capabilities,
algorithms that provide sufficient accuracy tend to be computationally
expensive, leading to unacceptable authentication times. On
the other hand, achieving acceptable performance often comes at the
cost of degradation in the quality of results.
Our work aims at developing embedded processing architectures that
improve face verification speed with minimal hardware requirements,
and without any compromise in verification accuracy. We analyze the
computational characteristics of face verifiers when running on an embedded
processor, and systematically identify opportunities for accelerating
their execution. We then present a range of targeted hardware and
software enhancements that include the use of fixed-point arithmetic,
various code optimizations, application-specific custom instructions and
co-processors, and parallel processing capabilities in multi-processor
systems-on-chip (SoCs).
We evaluated the proposed architectures in the context of open-source
face verification algorithms running on a commercial embedded processor
(Xtensa from Tensilica). Our work shows that fast, in-system verification
is possible even in the context of many resource-constrained embedded
systems. We also demonstrate that high authentication accuracy
can be achieved with minimum hardware overheads, while requiring no
modifications to the core face verification algorithms.
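The fixed-point arithmetic mentioned in this abstract can be illustrated with a minimal sketch. The Q15 format, the helper names, and the sample vectors are assumptions chosen for illustration and are not taken from the paper; projecting a feature vector onto one basis vector is modeled here as a plain dot product.

```python
Q = 15  # number of fractional bits (assumed Q15 format)

def to_fixed(x: float) -> int:
    """Convert a float in [-1, 1) to a Q15 integer."""
    return int(round(x * (1 << Q)))

def to_float(x: int) -> float:
    """Convert a Q15 integer back to a float."""
    return x / (1 << Q)

def fixed_dot(a, b) -> int:
    """Dot product of two Q15 vectors, returned in Q15."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y          # each product is Q30; accumulate in a wide integer
    return acc >> Q           # rescale the Q30 sum back to Q15

# Hypothetical feature vector and basis vector
features = [0.25, -0.5, 0.125, 0.75]
basis = [0.5, 0.25, -0.5, 0.125]

exact = sum(f * b for f, b in zip(features, basis))
approx = to_float(fixed_dot([to_fixed(v) for v in features],
                            [to_fixed(v) for v in basis]))
```

On a processor without a floating-point unit, the integer multiply-accumulate loop above is the kind of kernel that a custom instruction or co-processor could plausibly target.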
-
Software Implementation of Tate Pairing over GF(2^m) [p. 7]
-
G. Bertoni, L. Breveglieri, P. Fragneto, G. Pelosi and L. Sportiello
Recently, interest in the Tate pairing over binary
fields has decreased due to the existence of efficient attacks
on the discrete logarithm problem in the subgroups of such
fields. We show that choosing fields of large size to make
these attacks infeasible does not degrade the
computational performance of the pairing. We describe and
evaluate by simulation an implementation of the Tate pairing
that achieves good timing results, comparable
with those reported in the literature but with a higher level
of security.
-
Optimization of Regular Expression Pattern Matching Circuits on FPGA [p. 12]
-
C.-H. Lin, C.-T. Huang, C.-P. Jiang and S. C. Chang
Regular expressions are widely used in Network
Intrusion Detection Systems (NIDS) to represent patterns of
network attacks. Since traditional software-only NIDS
cannot keep up with increasing network speeds, many
previous works propose hardware architectures on FPGA
to accelerate attack detection. The challenge of hardware
implementation is to fit the regular expressions for the
large number of attacks onto FPGAs. Although the
minimization of logic equations has been studied
intensively in the CAD area, the minimization of multiple
regular expressions has been largely neglected. This paper
presents a novel architecture allowing our algorithm to
extract and share common sub-regular expressions.
Experimental results show that our sharing scheme
significantly reduces the area of regular expression
circuits.
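The benefit of sharing common parts among patterns can be sketched in a much simplified form. The example below is an assumption-laden stand-in: it treats patterns as literal strings and shares only common prefixes through a trie, whereas the paper extracts common sub-regular expressions, which is more general. The pattern strings and function names are hypothetical.

```python
END = None  # dictionary key marking the end of a pattern

def build_trie(patterns):
    """Merge literal patterns so that common prefixes are stored once."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def count_states(node):
    """Count character-matching states in the trie."""
    return sum(1 + count_states(child)
               for key, child in node.items() if key is not END)

# Hypothetical attack signatures
patterns = ["GET /admin", "GET /cgi-bin", "POST /cgi-bin"]

unshared_states = sum(len(p) for p in patterns)   # one matcher per character
shared_states = count_states(build_trie(patterns))
```

Here the shared structure needs 30 states instead of 35, because the prefix "GET /" is stored once. Sharing sub-expressions that occur in the middle or at the end of patterns (such as "/cgi-bin" above) would save more, which a prefix-only trie cannot capture.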
-
Satisfiability-Based Framework for Enabling Side-Channel Attacks on Cryptographic Software [p. 18]
-
N. R. Potlapally, A. Raghunathan, S. Ravi, N. K. Jha and R. B. Lee
Many electronic systems contain implementations
of cryptographic algorithms in order to provide security. It is
well known that cryptographic algorithms, irrespective of their
theoretical strength, can be broken through weaknesses in their
implementation. In particular, side-channel attacks, which exploit
unintended information leakage from the implementation, have been
established as a powerful way of attacking cryptographic systems.
All side-channel attacks can be viewed as consisting of two phases
- an observation phase, wherein information is gathered from
the target system, and an analysis or deduction phase in which
the collected information is used to infer the cryptographic key.
Thus far, most side-channel attacks have focused on extracting
information that directly reveals the key, or variables from which
the key can be easily deduced.
We propose a new framework for performing side-channel attacks
by formulating the analysis phase as a search problem that
can be solved using modern Boolean analysis techniques such
as satisfiability solvers. This approach can substantially enhance
the scope of side-channel attacks by allowing a potentially wide
range of internal variables to be exploited (not just those that are
"simply" related to the key). For example, software implementations
take great care in protecting secret keys through the use of on-chip
key generation and storage. However, they may inadvertently
expose the values of intermediate variables in their computations.
We demonstrate how to perform side-channel attacks on software
implementations of cryptographic algorithms based on the use of
a satisfiability solver for reasoning about the secret keys from the
values of the exposed variables. Our attack technique is automated,
and does not require mathematical expertise on the part of the
attacker. We demonstrate the merit of the proposed technique by
successfully applying it to two popular cryptographic algorithms,
DES and 3DES.
-
An 830mW, 586kbps 1024-Bit RSA Chip Design [p. 24]
-
C. Yeh, E.-F. Hsu, K.-W. Cheng, J.-S. Wang and N.-J. Chang
This paper presents an RSA hardware design that
simultaneously achieves high performance and low power.
A bit-oriented, split modular multiplication
algorithm and architecture are proposed to fully exploit
the radix-4 computational capability. Further, we
identify the switching profile of RSA data and
accordingly propose power-optimized designs for the
storage elements and key computational components.
The complete RSA modular exponentiation hardware
has been implemented using cell-based 0.18μm CMOS
technology. Post-layout simulation shows that the
design delivers an average performance of 586kbps at
460MHz, 1.8V while consuming only 830mW.
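The reported figures can be turned into per-operation estimates with a short calculation. This is a back-of-the-envelope sketch under two assumptions that the abstract does not state: that "kbps" means 1000 bits per second, and that throughput is counted as one 1024-bit result per modular exponentiation.

```python
# Figures quoted in the abstract
key_bits = 1024
throughput_bps = 586e3     # assumption: 1 kbps = 1000 bit/s
clock_hz = 460e6
power_w = 0.830

# Derived estimates (assumption: one 1024-bit result per exponentiation)
time_per_op_s = key_bits / throughput_bps
cycles_per_op = clock_hz * time_per_op_s
energy_per_op_j = power_w * time_per_op_s

print(f"time per exponentiation:   {time_per_op_s * 1e3:.2f} ms")
print(f"cycles per exponentiation: {cycles_per_op:,.0f}")
print(f"energy per exponentiation: {energy_per_op_j * 1e3:.2f} mJ")
```

Under these assumptions one exponentiation takes roughly 1.75 ms, about 800,000 clock cycles, and about 1.45 mJ.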
-
Platform Independent Debug Port Controller Architecture with Security Protection for Multi-Processor
System-on-Chip ICs [p. 30]
-
D. Akselrod, A. Ashkenazi and Y. Amon
A Debug Port Controller (DPC) architecture, designed
for re-use in multiple System-on-Chip (SoC) Integrated
Circuits (ICs) is presented. The DPC incorporates
security protection against unauthorized access along
with advanced debugging features such as long chain
debugging, universal BIST engines control, and generic
serial interfaces. An implemented security architecture of
DPC is presented together with an overall IC security
scheme. DPC is the most important part of this IC
security scheme. The suggested architecture demonstrates
extensive use of the debug process, and re-use of the
DPC in multiple SoC ICs without the need to adapt
its design for a specific SoC. The implementation of
the DPC for IEEE1149.1 standard is presented and
the hardware realization of the proposed architecture
is described in detail. The DPC that incorporates the
proposed architecture has been designed in a 90 nm
CMOS process as an integral part of several SoC ICs.
Moderators: C. Heer, Infineon Technologies, DE; H. Blume, TU Aachen, DE
-
Automated Conversion from LUT-Based FPGA to a LUT-Based MPGA with Fast Turnaround Time [p. 36]
-
F.-J. Veredas, M. Scheppler and H.-J. Pfleiderer
Mask Programmable Gate Arrays (MPGAs) see a growing
importance because of the increase of design cost
and turnaround times in ultra-deep submicron technologies
which mostly impact ASICs. Several design methodologies
have been proposed in recent years for converting an evaluated
Field-Programmable Gate-Array (FPGA) prototype design
into an MPGA. An automatic conversion flow is
essential to success. In this paper, we present a conversion
flow for a Look-up Table-based (LUT-based) MPGA
without applying re-synthesis but preserving the gate-level
netlist and reusing the placement. The resulting flow has a
special routing tool and buffer insertion algorithm for timing
integrity. The experimental investigations use a commercial
FPGA and industrial benchmarks.
-
Energy-Efficient FPGA Interconnect Design [p. 42]
-
M. Meijer, R. Krishnan and M. Bennebroek
Despite recent advances in FPGA devices and embedded
cores, their deployment in commercial products remains
rather limited due to practical constraints on, for
example, cost, size, performance, and/or energy consumption.
In this paper, we address the latter bottleneck and
propose a novel FPGA interconnect architecture that reduces
energy consumption without sacrificing performance
and size. It is demonstrated that the delay of a full-swing,
fully-buffered interconnect architecture can be
matched by a low-swing solution that dissipates significantly
less power and contains a mix of buffer and pass-gate
switches. The actual energy savings depend on the
specifics of the interconnect design and applications involved.
For the considered fine-grain FPGA example,
energy savings are observed to range from a factor of 4.7 for
low-load critical nets to a factor of 2.8 for high-load critical
nets. The results are obtained from circuit simulations in a
0.13 μm CMOS technology for various benchmarks.
-
A New Approach to Compress the Configuration Information of Programmable Devices [p. 48]
-
M. Martina, G. Masera, A. Molino, F. Vacca, L. Sterpone and M. Violante
During the last decade programmable devices have
become impressively widespread, entering some traditional
ASIC market domains. In particular, multi-million gate FPGAs
have become a very appealing low-cost solution even
for consumer applications. However, one of the big issues
that can arise with modern FPGA devices is the need for
large and expensive external non-volatile memory to keep
the configuration data. In this work we developed an alternative
technique to compress FPGA bitstreams based on
the knowledge of the device internal structure. The proposed
method uses a two-step coder: in the first step
the bitstream is adaptively "filtered" to remove data redundancy,
while in the second step an arithmetic coder is used
to actually compress the information. The effectiveness of
the proposed technique has been demonstrated on a set of
case studies. As a result, conventional approaches are outperformed:
the proposed technique reaches a compression ratio of 4.26
against 3.3.
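The idea of filtering before entropy coding can be sketched as follows. Everything specific here is an assumption for illustration: the filter is a plain XOR of each configuration frame with the previous one (not the paper's adaptive, structure-aware filter), the frame contents are synthetic, and the zeroth-order entropy is used only as an estimate of what an ideal arithmetic coder could approach.

```python
import math
from collections import Counter

def xor_filter(frames):
    """Step 1 (assumed filter): replace each frame by its XOR with the previous one."""
    out = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        out.append(bytes(a ^ b for a, b in zip(prev, cur)))
    return out

def entropy_bits_per_byte(frames):
    """Zeroth-order entropy, a rough estimate of the cost after entropy coding (step 2)."""
    counts = Counter(b"".join(frames))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Synthetic bitstream: eight identical 16-byte frames (highly regular, like unused fabric)
frame = bytes(i % 7 for i in range(16))
frames = [frame] * 8

raw_entropy = entropy_bits_per_byte(frames)
filtered_entropy = entropy_bits_per_byte(xor_filter(frames))
```

Because repeated frames turn into runs of zero bytes after filtering, the symbol distribution becomes heavily skewed and the entropy estimate drops, which is what makes the subsequent entropy coder more effective.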
-
Design and Implementation of a Rendering Algorithm in a SIMD Reconfigurable Architecture (MorphoSys) [p. 52]
-
J. Davila, A. de Torres, J. M. Sanchez, M. Sanchez-Elez, N. Bagherzadeh and F. Rivera
In this paper we analyze a 3D image rendering
algorithm and the different mapping schemes to implement
it in a SIMD reconfigurable architecture. 3D image rendering
is computationally intensive and has an important restriction on
execution time due to the requirement to get interactive
results. We demonstrate that the execution of this
algorithm in MorphoSys can take advantage of the
available parallel resources, as well as of the possibility of
one cycle configuration change. In this paper we show that
it is possible to implement the rendering algorithm in our
coarse grain reconfigurable architecture, obtaining values
over 100 fps.
-
Application Specific Instruction Processor Based Implementation of a GNSS Receiver on an FPGA [p. 58]
-
G. Kappen and T. G. Noll
In this paper the concept of a reconfigurable hardware
macro to be used as a generic building block in low-power,
low-cost SoC for multioperable GNSS positioning
is described, featuring sufficient computational power and
flexibility. The central processing unit of the reconfigurable
hardware macro is an ASIP accelerated by additional
eFPGA and weakly configurable ASIC based coprocessors.
The different hardware building blocks (i.e.
ASIP, eFPGA, ASIC) of the target architecture are motivated
with state of the art GNSS receiver algorithms. To
explore the design space of the target architecture and to
develop appropriate partitioning cost functions a GNSS
receiver testbed was realised on an FPGA board. The
testbed utilises a programmable ASIP, designed and generated
with the processor description language LISA, as a
central processing unit. As a first accelerating coprocessor
the correlator was realised. Exemplary optimisations
of the ASIP / co-processor architecture as well as
the achieved improvements are described.
-
A Methodology for FPGA to Structured-ASIC Synthesis and Verification [p. 64]
-
M. Hutton, R. Yuan, J. Schleicher, G. Baeckler, S. Cheung, K. K. Chua and H. K. Phoon
Structured-ASIC design provides a mid-way point
between FPGA and cell-based ASIC design for
performance, area and power, but suffers from the same
increasing verification burden associated with cell-based
design. In this paper we address the verification issue with
a methodology and fabric to directly tie FPGA prototype
and functional in-system verification with a clean
migration path to structured ASIC. The most important
aspects of this methodology are the use of physically
identical blocks for difficult-to-verify PLLs, I/O and RAM
and a structured re-synthesis of FPGA logic blocks to
target cells that guarantees anchor points for easy formal
verification.
Moderators: M. de Marinis, SensorDynamics, DE; D. Strle, Ljubljana U, SL
-
Synthesis of System Verilog Assertions [p. 70]
-
S. Das, R. Mohanty, P. Dasgupta and P. P. Chakrabarti
In recent years, Assertion-Based Verification has become
widely accepted as a key technology in the pre-silicon validation
of system-on-chip (SOC) designs. The System Verilog
language integrates the specification of assertions with the
hardware description. In this paper we show that there are
several compelling reasons for synthesizing assertions in
hardware, and present an approach for synthesizing System
Verilog Assertions (SVA) in hardware. Our method investigates
the structure of SVA properties and decomposes them
into simple communicating parallel hardware units that together
act as a monitor for the property. We present a tool
that performs this synthesis, and also show that the chip
area required by the monitors for an industry-standard ABV
IP for the ARM AMBA AHB protocol is quite modest.
-
Generating Finite State Machines from SystemC [p. 76]
-
A. Habibi, H. Moinudeen and S. Tahar
SystemC is a system level language proposed to raise the
abstraction level for embedded systems design and verification.
In this paper, we propose to generate Finite State Machines
(FSM) from SystemC designs using two algorithms
originally proposed for the generation of FSM from Abstract
State Machines (ASM). This proposal enables the integration
of SystemC with existing tools for test case generation
from FSM. This enables two important applications: (1)
using the FSM graph structure to produce test
suites allowing functional testing of SystemC designs; and
(2) performing conformance testing, where the FSM serves
as a precise model of the observable behavior of the system
used to validate lower abstraction levels of the design
(e.g., Register Transfer Level (RTL)).
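How an FSM graph can drive test generation is sketched below. This is not the paper's algorithm: it is a simple transition-coverage strategy (reach each state by a shortest input sequence, then fire each outgoing transition once), and the three-state machine and its input names are hypothetical.

```python
from collections import deque

def shortest_inputs(fsm, start, target):
    """Breadth-first search for the shortest input sequence from start to target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        for inp, nxt in fsm[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [inp]))
    return None  # target is unreachable

def transition_tests(fsm, start):
    """One test per transition: reach the source state, then apply the input."""
    tests = []
    for state, edges in fsm.items():
        prefix = shortest_inputs(fsm, start, state)
        if prefix is None:
            continue
        for inp in edges:
            tests.append(prefix + [inp])
    return tests

# Hypothetical FSM: state -> {input: next_state}
fsm = {
    "IDLE": {"req": "BUSY"},
    "BUSY": {"done": "IDLE", "err": "FAIL"},
    "FAIL": {"reset": "IDLE"},
}
tests = transition_tests(fsm, "IDLE")
```

Each resulting input sequence can be replayed against the design and against a lower-level model, with the FSM supplying the expected state after every step.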
-
Flexible Specification and Application of Rule-Based Transformations in an Automotive Design Flow [p. 82]
-
J. H. Oetjens, J. Gerlach and W. Rosenstiel
This paper addresses an XML-based design environment,
which provides a powerful basis for the manipulation
of hardware design descriptions. The contribution of
the paper is a flexible specification entry for the definition
of transformation rules, which allows a designer to specify
transformations on his/her own without having XML
expertise. The specification entry provides a guided and
graphically supported mechanism to define transformation
rules. This opens up a new approach, in which the
specification and verification of a transformation rule is
carried out by using simple design examples, to be applied
to arbitrary complex designs subsequently. A new
key characteristic of our approach is that both transformation
environment and transformation entry tool are
based on a very compact definition of the hardware description
language grammar in use, and both of them are
fully automatically generated from that basic grammar
definition. This makes our approach highly open for other
hardware and system specification languages. The paper
describes the transformation environment and transformation
entry tool, and demonstrates its application in
terms of two automotive-typical transformations, addressing
power aspects on the one hand, and safety aspects on
the other.
-
A Mixed-Signal Verification Kit for Verification of Analogue-Digital Circuits [p. 88]
-
G. Bonfini, M. Chiavacci, R. Mariani and E. Pescari
This paper presents an innovative approach for
analogue and mixed-signal verification. It consists of a
"verification kit" that makes use of concepts used in
state-of-the-art digital verification, such as automatic
results collection, coverage elaboration, data checking
capability, pseudo-random and constrained stimuli
generation. Using a Bandgap cell as case study, the
paper shows how the presented approach allows a
precise definition of the verification space and a saving
of more than 50% of the total verification effort with respect
to traditional verification methodologies. The paper
also shows how the approach can be extended to more
complex mixed-signal systems.
-
A Complete and Fully Qualified Design Flow for Verification of Mixed-Signal SoC with Embedded
Flash Memories [p. 94]
-
P. Daglio
Today almost all the people in the industry are
talking widely about full chip mixed-signal
simulation, both in pre-layout and post-layout
conditions, basically for two main reasons: a large
range of applications is moving from fully digital to
mixed-signal and full chip simulation with parasitic
components, together with IR drop analysis, is
becoming strictly mandatory before going to silicon.
In fact, the cost of a mask set for a 90nm or a 65nm
technology is growing exponentially,
exceeding one million dollars for a single mask set.
For these reasons, it is strategic to set up a very
complete mixed-signal design flow allowing
designers to go to the silicon in a safe way with the
minimum risk of failure. Nowadays, various
approaches to the same problem are pursued by
different organizations, sometimes privileging the
fully digital modeling of the mixed-signal system and
some other times setting the digital part in VHDL
and keeping the analog part at transistor level,
simulating the whole chip with a mixed-signal
simulator. Which is the right approach? What are
the status and the reliability of the tools on the
market? What is the acceptable trade-off among
simulation speed, code coverage and precision of
simulation results? This paper tries to answer
these questions by proposing a fully qualified and
complete mixed-signal flow for SoC verification,
implemented to design applications also containing
embedded flash memories.
-
Software-Friendly HW/SW Co-Simulation: An Industrial Case Study [p. 100]
-
J. Noguera, L. Baldez, N. Simon and L. Abello
This paper proposes a novel HW/SW co-simulation approach
that minimizes the impact on software designers.
We propose a SystemC-based system that enables the
software team to test their software with their own tools
and environment using an accurate simulated ASIC (Application
Specific Integrated Circuit) model.
The solution presented here enables a smooth and early
ASIC and SW integration, which reduces the project development
time and improves the ASIC design quality (i.e.,
SW engineers can help in the ASIC verification and ASIC
engineers can help in the SW development). In this solution,
the real and full software (i.e., multi-threaded application)
runs in its native environment with minimal
changes and interfaces with a simulated ASIC model using
sockets. We have tested this approach on a pilot project,
which has demonstrated the feasibility of this co-development
methodology.
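The socket-based coupling between software and a simulated ASIC can be sketched as follows. This is only an illustration in Python rather than SystemC: the message layout, the register semantics, and all function names are assumptions, and a thread with a dictionary of registers stands in for the simulated ASIC model.

```python
import socket
import struct
import threading

MSG = struct.Struct("!BII")     # assumed layout: opcode, register address, data
OP_READ, OP_WRITE = 0, 1

def recv_exact(conn, n):
    """Read exactly n bytes, or return None if the peer closed the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def asic_model(conn):
    """Stand-in for the simulated ASIC: a register file served over a socket."""
    regs = {}
    while True:
        raw = recv_exact(conn, MSG.size)
        if raw is None:
            return
        op, addr, data = MSG.unpack(raw)
        if op == OP_WRITE:
            regs[addr] = data
        conn.sendall(struct.pack("!I", regs.get(addr, 0)))

sw_side, hw_side = socket.socketpair()
threading.Thread(target=asic_model, args=(hw_side,), daemon=True).start()

def write_reg(addr, data):
    """Driver-level write as the software under test would issue it."""
    sw_side.sendall(MSG.pack(OP_WRITE, addr, data))
    recv_exact(sw_side, 4)      # wait for the acknowledgement

def read_reg(addr):
    """Driver-level read returning the register value from the model."""
    sw_side.sendall(MSG.pack(OP_READ, addr, 0))
    return struct.unpack("!I", recv_exact(sw_side, 4))[0]
```

Only the lowest driver layer (`write_reg` and `read_reg` here) needs to know that the hardware is simulated, which is one way the application above it could run with minimal changes.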
Moderators: C. Grassmann, Infineon Technologies, DE; W. Mueller, C-LAB/Paderborn U, DE
-
Modeling and Simulation of Mobile Gateways Interacting with Wireless Sensor Networks [p. 106]
-
F. Fummi, D. Quaglia, F. Ricciato and M. Turolla
Sensor networks are emerging wireless technologies;
their integration with the existing 2.5G, 3G mobile networks
is a key issue to provide advanced services, e.g., health control.
However this integration poses new challenges in the
design and simulation of the involved embedded systems
since it requires the cooperation of simulation tools that
model hardware, software, and network aspects and their
interactions. We present the modeling and simulation of a
network scenario, core of a telecom provider's future portfolio,
in which an ARM-based mobile handset is used as the
gateway between a wireless sensor network (WSN) and remote
users through a wide area network (WAN). Initially,
the gateway and the WSN are modeled at system level with
SystemC while the wide area network is modeled with NS-2.
Then, HW/SW partitioning is performed on the gateway
and an instruction set simulator of the ARM processor is
used for the cycle-accurate execution of the RTOS and the
application software.
-
A Hardware-Engine for Layer-2 Classification in Low-Storage, Ultra-High Bandwidth Environments [p. 112]
-
V. Papaefstathiou and I. Papaefstathiou
Ethernet is the most common Layer-2
network protocol, and it is currently being deployed beyond
the tight borders of LANs. In order to accommodate the
needs of MANs and WANs, several QoS mechanisms
employed at the MAC sublayer of Ethernet have been
proposed. These QoS mechanisms require identification of
network flows and the classification of Ethernet packets
according to certain Ethernet header fields. In this paper,
we propose a classification engine employed at the MAC
sublayer which uses an innovative hashing scheme and
internal replacement of MAC Vendor IDs; the Hash Based
Classification Engine (HBCE) compacts the tables
containing the rules associated with certain MAC addresses
and supports extremely high-speed decisions, at a rate of
more than 100 Gb/s, while its memory needs are
significantly lower compared to those of the similar
schemes currently used. This engine has been implemented
in hardware utilizing less than 0.1 mm² in a state-of-the-art
CMOS technology. As a result, HBCE is a very promising
candidate for next-generation Ethernet equipment that
needs to support classification at the Data Link Layer at multi-Gigabit
per second network speeds, whereas due to its very
low memory requirements and low implementation
complexity, it can also be employed very efficiently in
lower-bandwidth wireless environments that utilize MAC
mechanisms.
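The combination of hashing and vendor-ID replacement can be illustrated with a small software model. The abstract does not describe the actual hashing scheme or table layout, so everything below is an assumption: the 24-bit vendor prefix is replaced by a small internal index, the compacted key is hashed with a simple XOR-fold, and collisions are resolved by chaining. The class and method names are hypothetical.

```python
class CompactMacTable:
    """Toy classifier mapping 48-bit MAC addresses to flow classes."""

    def __init__(self, num_buckets=64):
        self.vendor_ids = {}                     # 24-bit vendor prefix -> small index
        self.buckets = [[] for _ in range(num_buckets)]

    def _compact_key(self, mac, create):
        """Replace the vendor prefix with its internal index (assumed scheme)."""
        vendor, device = mac >> 24, mac & 0xFFFFFF
        if vendor not in self.vendor_ids:
            if not create:
                return None
            self.vendor_ids[vendor] = len(self.vendor_ids)
        return (self.vendor_ids[vendor] << 24) | device

    def _bucket(self, key):
        """Select a bucket with a simple XOR-fold hash (assumed hash function)."""
        return self.buckets[(key ^ (key >> 12)) % len(self.buckets)]

    def add_rule(self, mac, flow_class):
        key = self._compact_key(mac, create=True)
        self._bucket(key).append((key, flow_class))

    def classify(self, mac):
        key = self._compact_key(mac, create=False)
        if key is None:
            return None                          # unknown vendor, so no rule applies
        for stored_key, flow_class in self._bucket(key):
            if stored_key == key:
                return flow_class
        return None

table = CompactMacTable()
table.add_rule(0x001A2B3C4D5E, "voice")
table.add_rule(0x001A2B000001, "best-effort")
```

Since many addresses share a vendor prefix, storing a short index instead of the full 24 bits shortens every table entry, which is one plausible source of the memory savings described above.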
-
ASIP Architecture for Multi-Standard Wireless Terminals [p. 118]
-
D. Lo Iacono, J. Zory, E. Messina, N. Piazzese, G. Saia and A. Bettinelli
This paper presents the Block Processing Engine
(BPE), an Application Specific Instruction-Set Processor
(ASIP) explicitly designed for the implementation of multi-standard
wireless terminals. Thanks to a high level of
parallelism and a consistent use of pipeline, the BPE
architecture fully satisfies stringent real-time constraints
imposed by emerging technologies. Its efficiency has been
proven through the implementation, the physical synthesis
for the CMOS 90nm STM technology and the FPGA
prototyping on the ARM Versatile platform of a dual-standard
Frequency Domain Equalizer (FDE) supporting
the 3GPP HSDPA and the IEEE 802.11a standards.
-
Interconnection Framework for High-Throughput, Flexible LDPC Decoders [p. 124]
-
F. Quaglio, F. Vacca, C. Castellano, A. Tarable and G. Masera
This paper presents a possible interconnection structure
suitable for use in a flexible LDPC decoder.
The main feature of the proposed approach is the possibility
of implementing parallel or semi-parallel decoders
with a reduced communication complexity. To the best of
our knowledge this is the first work detailing the implementation
of a fully flexible LDPC decoder, able to support
any type of code. To prove the effectiveness of this
approach, a complete decoder has been implemented on a
XC2V8000, achieving a decoding throughput of 529 Mbps
on a (1920,640) code.
-
Low Cost LDPC Decoder for DVB-S2 [p. 130]
-
J. Dielissen, A. Hekstra and V. Berg
Because of its excellent bit-error-rate performance, the
Low-Density Parity-Check (LDPC) algorithm is gaining increased
attention in communication standards and literature.
The new Digital Video Broadcast via Satellite standard
(DVB-S2) is the first broadcast standard to include
an LDPC code, and the first implementations are available.
In our investigation of generic LDPC-implementations we
found that scalable sub-block parallelism enables efficient
implementations for a wide range of applications. For the
DVB-S2 case, using sub-block parallelism we obtain half
the chip-size of known solutions. For the required performance
in the normative configurations for the broadcast
service (90Mbps), the area is even
-
3dID: A Low-Power, Low-Cost Hand Motion Capture Device [p. 136]
-
M. Sama, V. Pacella, E. Farella, L. Benini and B. Riccó
This paper presents a novel input device design for capturing
gestures. The system is based on commodity components
and combines accelerometers, gyroscopes and bend
sensors. It is a low-power, low-cost hand device, characterized
by extreme wearability thanks to wireless communication
support and small form-factor. It can be used
as a stand-alone platform or combined with other wireless
sensor nodes in a body area network. The system has
been tested as input interface for moving a virtual three-dimensional
hand in real-time.
Organiser/Moderator: C. K. Lennard, ARM Ltd, UK
-
Industrially Proving SPIRIT Consortium Standards for Design Chain Integration [p. 142]
-
V. Berman, S. Fazzari, M. Indovina, C. Ussery, M. Strik, J. Wilson, O. Florent, F. Rémond, P. Bricaud
There has traditionally been significant
engineering overhead required for the integration
of multi-vendor tool and IP design methodologies.
Making design-chain integration efficient is the key
objective of the SPIRIT Consortium. This Special
Session paper provides an insight into how the
specifications of the SPIRIT Consortium are being
adopted in the industry today. We present 3
production design-flow stories which show
improved efficiency gained through use of the
SPIRIT Consortium specifications. These include
an IP generator for hierarchical VLIW processor
design, a full hardware / software SoC integration
design flow managed through generators, and
methodology support for a flow from electronic
system level (ESL) design through to the 65 nm
CMOS process.
Moderators: K. Goossens, Philips Research, NL; M. Coppola, STMicroelectronics, FR
-
Networks on Chips for High-End Consumer-Electronics TV System Architectures [p. 148]
-
F. Steenhof, H. Duque, B. Nilsson, K. Goossens and R. Peset Llopis
Consumer electronics products, such as high-end (digital) TVs,
contain complex systems on chip (SOC) that offer high computational
performance at low cost. Traditionally, these SOCs are
application-specific standard products (ASSPs) with limited programmability.
We describe why TV SOCs must become more flexible,
and why companion chips together with networks on chips
(NOC) are a crucial enabling technology. In particular, networks
that span multiple chips will become important in the near future.
We demonstrate our ideas by extending a commercially available
SOC for picture improvement in high-end TVs with the
Æthereal NOC. Our first unoptimised results indicate that replacing
the original interconnect (consisting of dedicated links and
multiplexers for bypasses) by programmable NOC increases the
SOC area by 4% and its power dissipation by 12%. The new,
flexible SOC allows new tasks to be spliced in at any point in the
task graph. Both analytical performance verification and system
simulations at RTL VHDL show that the extended SOC meets its
functional requirements. Using the Æthereal design flow the extended
architecture was designed, implemented, and verified in 12
person months.
To the best of our knowledge, this is the first application of a
NOC to a commercial SOC. The quantitative results indicate that
even retrofitting a NOC to an existing architecture is beneficial at
acceptable cost.
-
Simulation and Analysis of Network on Chip Architectures: Ring, Spidergon and 2D Mesh [p. 154]
-
L. Bononi and N. Concer
NoC architectures can be adopted to support general
communications among multiple IPs over multi-processor
Systems on Chip (SoCs). In this work we illustrate the
modeling and simulation-based analysis of some recent
architectures for Network on Chip (NoC). Specifically,
the Ring, Spidergon and 2D Mesh NoC topologies have
been compared, both under uniform load and under more
realistic load assumptions in the SoC domain. The main
performance indexes considered are NoC throughput and
latency, as a function of variable data-injection rates,
source and destination distributions, variable number of
nodes. Results show that the Spidergon topology is a
good trade-off among performance, the scalability of the
most efficient architectures inherited from parallel
computing systems design, and the SoC constraints of simple
management and small energy and area requirements.
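A first-order feel for the three topologies can be obtained from their average hop counts. The sketch below is not the paper's simulation model: it ignores traffic, contention, and routing policy, and only measures shortest-path distance. It assumes the usual description of Spidergon as a bidirectional ring in which each node also has a link to the diametrically opposite node; the 16-node size is chosen for illustration.

```python
from collections import deque

def ring(n):
    """Bidirectional ring: each node links to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def spidergon(n):
    """Ring plus one 'across' link to the opposite node (n must be even)."""
    assert n % 2 == 0
    return {i: [(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)}

def mesh(rows, cols):
    """2D mesh without wrap-around links."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            if r > 0:
                nbrs.append((r - 1) * cols + c)
            if r < rows - 1:
                nbrs.append((r + 1) * cols + c)
            if c > 0:
                nbrs.append(r * cols + c - 1)
            if c < cols - 1:
                nbrs.append(r * cols + c + 1)
            graph[r * cols + c] = nbrs
    return graph

def average_hops(graph):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

hops_ring = average_hops(ring(16))
hops_spidergon = average_hops(spidergon(16))
hops_mesh = average_hops(mesh(4, 4))
```

For 16 nodes this gives about 4.27 hops for the ring, 2.60 for Spidergon, and 2.67 for the 4x4 mesh, while Spidergon uses three links per node against up to four in the mesh. Hop count is only one ingredient of the throughput and latency results the abstract refers to.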
-
GALS Networks on Chip: A New Solution for Asynchronous Delay-Insensitive Links [p. 160]
-
G. Campobello, M. Castano, C. Ciofi and D. Mangano
In this paper a cost effective solution for asynchronous
delay-insensitive on-chip communication is proposed. Our
solution is based on the Berger coding scheme and
achieves a very low wire overhead. For instance, the
results of our evaluation show that a 64-bit link can be
built paying a wire overhead of 10% and 30 equivalent
two-input gates per wire. As a general rule, when the
number of bits to be transmitted increases, the wire
overhead decreases and the gate overhead remains almost
the same.
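The trend in wire overhead follows from how a Berger code is built: the check symbol is the binary count of zeros in the data word, so k data bits need about log2(k+1) check bits. The sketch below shows a textbook Berger encoder and checker, not the paper's link circuit; the function names are hypothetical.

```python
def berger_check_bits(k):
    """Number of check bits for k data bits: ceil(log2(k + 1))."""
    return k.bit_length()

def berger_encode(data_bits):
    """Append the binary count of zeros in the data as the check symbol."""
    zeros = data_bits.count(0)
    r = berger_check_bits(len(data_bits))
    check = [(zeros >> i) & 1 for i in reversed(range(r))]
    return data_bits + check

def berger_valid(word, k):
    """Recompute the check symbol from the first k bits and compare."""
    return berger_encode(word[:k])[k:] == word[k:]

def wire_overhead(k):
    """Check bits relative to data bits."""
    return berger_check_bits(k) / k

word = berger_encode([1, 0, 1, 1, 0, 0, 1, 0])   # four zeros -> check symbol 0100
corrupted = [0] + word[1:]                        # a single 1 -> 0 error in the data
```

A plain Berger code over 64 data bits needs 7 check bits, roughly 11%, which is in the same range as the 10% quoted above; how the paper arrives at its exact figure is not stated in the abstract. For 128 data bits the overhead falls to 8 check bits, about 6%, which matches the stated trend.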
-
Flexible MPSoC Platform with Fast Interconnect Exploration for Optimal System Performance for a
Specific Application [p. 166]
-
F. Dumitrascu, I. Bacivarov, L. Pieralisi, M. Bonaciu and A.A. Jerraya
One of the key elements in Multi-Processor Systems-on-Chip
(MPSoC) design is to select the optimal on-chip
interconnect architecture, in order to maximize the
overall system performance.
This paper proposes a flexible MPSoC platform,
designed for a target application, which allows
customizing the interconnect by selecting various
architectures. It allows fast building of executable models
from architecture specifications and performance
evaluation using the cycle-accurate cosimulation.
We experimented with a DivX encoder application using
three different interconnects: DMS (Distributed Memory
Server), AMBA bus and Octagon Network-on-Chip (NoC).
The simulation results for performance metrics
such as average latency, throughput and execution time
allowed us to compare these different interconnect
architectures, to verify the application's real-time
constraints and to propose further optimizations.
-
STAX: Statistical Crosstalk Target Set Compaction [p. 172]
-
S. Nazarian, M. Pedram, S.K. Gupta and M.A. Breuer
This paper presents STAX, a crosstalk target set
compaction framework to reduce the complexity of the
crosstalk ATPG process by pruning non-fault-producing
targets. In general, existing pruning techniques do not
employ their processes in a cost-effective manner. Neither
do they handle process variations properly. To address the
first weakness, this paper presents a framework to
determine a sequence of available analysis and pruning
tool invocations to prune as many of the crosstalk targets
as fast as possible. As a result, an initially enormous
collection of crosstalk targets is usually reduced to a very
small set of targets via a vectorless process. A statistical
static timing analyzer is developed and embedded to
address the second shortcoming of existing approaches.
Experimental results on the ISCAS'85 benchmarks demonstrate
that STAX greatly improves the runtime compared to other
crosstalk target pruning methodologies, including ATPG,
with no prior target set compaction.
KEYWORDS
ATPG, fault-producing target, compaction degree, pruning
power, safe target, statistical static timing analyzer
-
A Fast-Lock Mixed-Mode DLL with Wide-Range Operation and Multiphase Outputs [p. 178]
-
K.-H. Cheung and Y.-L. Lo
Moderators: L. Fanucci, Pisa U, IT; J. Gerlach, Robert Bosch GmbH, DE
-
How OEMs and Suppliers Can Face the Network Integration Challenges [p. 183]
-
K. Richter and R. Ernst
Systems integration is a major challenge in many industries. Systematic analysis of the complex integration effects,
especially with respect to timing and performance, significantly improves the design process, enables optimizations, and
increases the quality and profit of a product. It also helps to improve supply-chain communications. This paper surveys a
set of interesting experiments we have conducted on a real-world automotive communication network using our new
SymTA/S system-level schedulability analysis technology. We demonstrate that, and how, analysis technology helps
answer key integration questions while carefully respecting the established business models.
-
A Practical Implementation of the Fault-Tolerant Daisy-Chain Clock Synchronization Algorithm on CAN [p. 189]
-
F. C. Carvalho, C. E. Pereira, E. T. Silva, Jr. and E. P. Freitas
Networked processing units are becoming widely used in
the automotive embedded system domain aiming not only to
reduce vehicle weight and cost but also to assist the driver
to cope with critical situations. Because these
embedded networked systems are directly involved in human
safety, they face stringent dependability requirements,
which can only be guaranteed if active redundancy
is employed. Considering that the processing units
are usually connected by a shared serial medium, the underlying
communication platform is the most important building
block. It must provide low-level support for deterministic
data transmission as well as a global time base to coordinate
the actions of replicated units. Within this context, this
paper presents the development of the fault-tolerant Daisy-Chain
clock synchronization algorithm over the CAN protocol,
resulting in a highly optimized communication architecture
for safety-critical applications. Implementation issues
and some practical results are also discussed
in the paper.
-
On the Verification of Automotive Protocols [p. 195]
-
G. Zarri, F. Colucci, F. Dupuis, R. Mariani, M. Pasquariello, G. Risaliti and C. Tibaldi
Verification quality is a must for functional safety in
electronic systems. In the automotive domain, the
verification flow is historically based on a layered
approach, where each level (model, design and system) has
its own verification and validation methodology. Very
often, these methodologies are poorly connected to one
another, or not connected at all, and it is still common to
see some of the most critical verification tasks confined
to post-silicon validation, where the cost of fixing issues
can be prohibitive for deeply integrated electronic systems.
This paper presents the architecture of verification
components that can be applied in all the different levels
and shows how they have been successfully applied to
the verification of systems integrating LIN, CAN and
FlexRay protocols.
-
FlexRay Transceiver in a 0.35 μm CMOS High-Voltage Technology [p. 201]
-
F. Baronti, P. D'Abramo, M. Knaipp, R. Minixhofer, R. Roncella, R. Saletti, M. Schrems, R. Serventi
and V. Vescoli
This paper presents one of the first fully functional
FlexRay transceivers manufactured in a 0.35 μm CMOS
High-Voltage technology, which provides high voltage
MOS devices together with standard 3.3 V gates. The
circuit operates as interface between a generic controller
and the copper wire FlexRay physical bus, to be used in
fault tolerant and fail safe applications. In particular, the
transceiver meets the operating requirements of the
automotive environment. The design was validated by
means of simulations and experimental measurements on
fabricated prototypes.
-
Space-Efficient FPGA-Accelerated Collision Detection for Virtual Reality [p. 206]
-
A. Raabe, S. Hochguertel, J. Anlauf and G. Zachmann
We present a space-efficient, FPGA-optimized architecture
to detect collisions among virtual objects. The design
consists of two main modules, one for traversing a hierarchical
acceleration data structure, and one for intersecting
triangles. This paper focuses on the former.
The design is based on a novel algorithm for testing discretely
oriented polytopes for overlap in 3D space. In addition,
we derive a new overlap test algorithm that can be
implemented using fixed-point arithmetic without producing
false negatives and with bounded error.
SystemC simulation results on different levels of abstraction
show that real-time collision detection of complex objects
at rates required by force-feedback and physically-based
simulations can be obtained. In addition, synthesis
results show that the design can still be fitted into a
six-million-gate FPGA. Furthermore, we compare our FPGA-based
design with a fully parallelized ASIC-targeted architecture
and a software implementation.
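
For orientation, the sketch below shows the conventional
interval-based overlap test for discretely oriented polytopes
(k-DOPs), not the novel algorithm of the paper. It assumes a
14-DOP built from seven fixed directions and a hypothetical
fixed-point scale of 256 steps per unit. Interval bounds are
rounded outward, so the integer test can report an overlap that
the exact DOPs would not have, but it never misses one.

```python
import math
from itertools import product

# Seven directions of a 14-DOP (an assumption; the paper may use another k).
DIRECTIONS = [
    (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 1), (1, -1, 1), (1, 1, -1), (1, -1, -1),
]

SCALE = 256  # hypothetical fixed-point resolution: 8 fractional bits


def build_dop(points):
    """Return one integer (lo, hi) interval per direction enclosing the points."""
    dop = []
    for d in DIRECTIONS:
        proj = [sum(pc * dc for pc, dc in zip(p, d)) for p in points]
        # Round outward so the fixed-point interval contains the exact one.
        lo = math.floor(min(proj) * SCALE)
        hi = math.ceil(max(proj) * SCALE)
        dop.append((lo, hi))
    return dop


def dops_overlap(a, b):
    """Two DOPs overlap only if their intervals overlap along every direction."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def cube(origin, size):
    """Eight corners of an axis-aligned cube (helper for the example only)."""
    return [tuple(o + size * s for o, s in zip(origin, corner))
            for corner in product((0, 1), repeat=3)]


if __name__ == "__main__":
    a = build_dop(cube((0.0, 0.0, 0.0), 1.0))
    print(dops_overlap(a, build_dop(cube((0.5, 0.5, 0.5), 1.0))))
    print(dops_overlap(a, build_dop(cube((3.0, 0.0, 0.0), 1.0))))
```

The test needs only integer comparisons once the intervals are
built, which is one reason this family of bounding volumes suits
fixed-point hardware.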
-
Mixed-Signal Design of a Digital Input Power Amplifier for Automotive Audio Applications [p. 212]
-
S. Saponara and P. Terreni
For digital-input power amplifiers in automotive audio
applications, this paper presents an exhaustive
exploration of the large mixed-signal design space to
find optimal trade-offs among different cost functions:
distortion, efficiency, circuit complexity and sensitivity.
Different architectural solutions are modelled and
compared in a Simulink/Spice framework. All building
blocks (i.e. oversampling filter, noise shaping, type of
PWM modulation, type of feedback, power stage, LC
filter) are optimized considering the whole system
performance. A novel mixed-signal scheme is finally
derived and prototyped.
-
Automatic SystemC Design Configuration for a Faster Evaluation of Different Partitioning Alternatives [p. 217]
-
N. Bannow, K. Haug and W. Rosenstiel
In this paper we present a SystemC-based [1] methodology
for rapid prototyping that accelerates the exploration of
complex systems in order to optimize the system
architecture. The approach automatically configures
system components with regard to the memory mapping
of modules. It reduces the implementation effort that
conventional approaches spend on re-assigning and
re-configuring modules in a system by hand.
This not only saves the time needed for manual adaptation
but also reduces the chance of introducing errors, a known
risk of complex manual modifications. The new approach
for automatic system configuration is derived as one of
the results and features that come along with the Module-Adapter
(MA) based approach that we have proposed in
different presentations [2], [3], [4]. One of the main
goals our proposed methodology has to fulfill is to meet
industrial requirements such as applicability to complex
system development, integration of existing IP, improved
code quality and decreased development effort. The
automated system configuration, as well as the whole
MA-based approach, greatly supports designers in the
concept phase in simulating a design before the
implementation starts.
-
Multi-Sensor Configurable Platform for Automotive Applications [p. 219]
-
L. Serafini, F. Carrai, T. Ramacciotti and V. Zolesi
This paper presents a configurable and generic
platform architecture suitable to interface several kinds of
sensors for automotive applications. A platform-based
design approach is pursued to reduce time-to-market. The
platform is essentially a library of hardware and software
reconfigurable resources. It is based on a microprocessor
core plus a set of analog and digital peripherals
dedicated to signal acquisition, data processing, storage
and transmission. A particular instance of this platform
has been developed. The prototype electronic board
produced is able to acquire temperature, humidity,
pressure and perform voltage/current measurements and
settings. The results achieved prove the validity of the
proposed approach in terms of system performance and
high reconfigurability of the generic platform.
Moderators: M. Heijligers, Philips Research, NL; L. Benini, DEIS - Bologna U, IT
-
Design and Implementation of a Modular and Portable IEEE 754 Compliant Floating-Point Unit [p. 221]
-
K. Karuri, R. Leupers, G. Ascheid, H. Meyr and M. Kedia
Multimedia and communication algorithms from the embedded
system domain often make extensive use of
floating-point arithmetic. Due to the complexity and expense
of floating-point hardware, final implementations
of these algorithms are usually carried out using
floating-point emulation in software, or conversion (manually
or automatically) of the floating-point operations to
fixed-point operations. Such strategies often lead to semi-optimal
and imprecise software implementations.
This paper presents the design and implementation of a
Floating-Point Unit (FPU) for an Application Specific Instruction
set Processor (ASIP) suitable for the embedded systems
domain. Using a state-of-the-art Architecture Description
Language (ADL) based ASIP design framework,
the FPU is implemented in such a modular way that it can
be easily adapted to any other RISC-like processor. The implemented
operations are fully compliant with the IEEE 754
standard, which facilitates portable software development.
The benchmarking, in terms of energy, area and speed,
of the designed FPU highlights the trade-offs of having a
hardware FPU w.r.t. software emulation of floating-point
operations.
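
To illustrate what software emulation of floating-point
operations has to manipulate, the sketch below splits an IEEE 754
single-precision value into its sign, exponent and fraction fields
and reassembles it. It assumes the binary32 format (the abstract
does not state which formats the FPU supports), and the function
names are illustrative only.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to binary32, into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1 if normal
    return sign, exponent, fraction


def float32_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble a binary32 value from its three fields."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, -2.5, 0.1):
        s, e, f = float32_fields(value)
        print(value, s, e, hex(f), float32_from_fields(s, e, f))
```

An emulation library performs every add or multiply on these
integer fields, with explicit alignment, normalization and
rounding steps that a hardware FPU carries out in its datapath.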
-
A Novel FPGA-Based Implementation of Time Adaptive Clustering for Logical Story Unit Segmentation [p. 227]
-
S. Arifin and P. Y. K. Cheung
Time Adaptive Clustering (TAC) is a cognitive Logical
Story Unit (LSU) segmentation algorithm that is found to
show good and consistent results. This paper presents an efficient
hardware implementation for approximating the TAC
algorithm. The design consists of three main blocks. The
first block generates similarity values needed in the clustering
process. To take full advantage of the parallelism of
Field Programmable Gate Arrays (FPGA) devices, a video
shot sequence is divided into subsets and processed in parallel
by the second block. The third block combines all the
output results of each subset. The design is implemented on
a Xilinx Virtex-II xc2v3000 on an RC203E board and
it runs 27 times faster than a Pentium 4-based PC at 3.4
GHz.
-
ASIP Design and Synthesis for Non Linear Filtering in Image Processing [p. 233]
-
L. Fanucci, M. Cassiano, S. Saponara, D. Kammler, E. M. Witte, O. Schleibusch, G. Ascheid, R. Leupers
and H. Meyr
This paper presents an Application Specific Instruction
Set Processor (ASIP) design for the implementation of a
class of nonlinear image processing algorithms, the
Retinex-like filters. Starting from high-level descriptions,
algorithmic optimization is performed first. Then a
processor architecture and an instruction set are
customized with particular regard to the algorithmic
computations in order to achieve the specified timing at
reasonable complexity. Taking advantage of the
programmability of processor architectures, the
flexibility of the system is increased, allowing e.g.
dynamic parameter adjustment and color treatment.
ASIP implementation results in 0.13 μm CMOS
technology are presented.
-
A 124.8Msps, 15.6mW Field-Programmable Variable-Length Codec for Multimedia Applications [p. 239]
-
C. Yeh, C.-C. Wang, L.-C. Lee and J.-S. Wang
Variable-length coding is one of the key
compression methods for multimedia bitstreams. To
accommodate new or user-defined variable-length codes
(VLC) for maximal compression in various applications,
we propose a variable-length codec that supports field
programmability along with very competitive
performance indices. The design has 33% fewer
transistors than its field-programmable predecessor.
Moreover, measurement on the real chip demonstrates
that the design is capable of processing 124.8
mega-symbols (Msym) per second for MPEG4, while
consuming only 15.6mW at 1.4V. When measured by
μW/Msym, the realized variable-length codec is even 5%
better than the state-of-the-art non-programmable
MPEG2 variable-length decoder that hardwires the
entire design into random logic.
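
The efficiency metric can be reproduced from the two measured
figures. The sketch below assumes that "μW/Msym" means power in
microwatts divided by throughput in mega-symbols per second; the
function name is illustrative.

```python
def power_per_throughput_uw(power_mw: float, throughput_msym_per_s: float) -> float:
    """Microwatts consumed per mega-symbol-per-second of throughput."""
    return power_mw * 1000.0 / throughput_msym_per_s


if __name__ == "__main__":
    # Figures quoted in the abstract: 15.6 mW at 124.8 Msym/s.
    print(power_per_throughput_uw(15.6, 124.8))
```

With the quoted figures this works out to about 125 μW per Msym/s.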
-
The Vector Fixed Point Unit of the Synergistic Processor Element of the Cell Architecture Processor [p. 244]
-
N. Maeding, J. Leenstra, J. Pille, R. Sautter, S. Buettner, S. Ehrenreich and W. Haller
A Vector Fixed Point Unit (FXU) is designed to speed up
multi-media processing. The FXU implements SIMD style
integer arithmetic and permute operations. The adder,
rotator and permute structure enables the use of static
circuits only. The FXU was fabricated using IBM 90nm
CMOS SOI technology.
-
Design and Test of Fixed-Point Multimedia Co-Processor for Mobile Applications [p. 249]
-
J.-H. Sohn, J.-H. Woo, J. Yoo and H.-J. Yoo
In this research, a fixed-point multimedia co-processor is
designed for, and tested in, an ARM-10 based mobile
graphics processor for portable 2-D and 3-D multimedia
applications. The fixed-point co-processor architecture
with dual operations realizes advanced 3-D graphics
algorithms and various streaming multimedia functions
in a single hardware unit while consuming low power. The
instruction-wise clock gating on the fixed-point SIMD
datapath allows fine-grained power control in an
application-specific manner. The co-processor takes
10.2 mm² in a 0.18 μm 6-metal standard CMOS logic
process and achieves 50 Mvertices/s graphics
performance with 75.4 mW power consumption. The
implemented chip is successfully demonstrated on a
development board equipped with a software graphics
library and an evaluation environment.
|