Seminars by CECS

PhD Defense: Optimizing Many-Threads-to-Many-Cores Mapping in Parallel Electronic System Level Simulation

Name: Guantao Liu

Date: March 2, 2017

Time: 4:00 PM

Location: Engineering Hall 3206

Committee: Rainer Doemer (Chair), Kwei-Jay Lin, Mohammad Al Faruque


In hardware/software co-design, Discrete Event Simulation (DES) has been in use for decades to verify and validate the functionality of Electronic System Level (ESL) models. Since parallel computing platforms are readily available today, many Parallel Discrete Event Simulation (PDES) approaches have been proposed to improve simulation performance. However, as thread parallelism increases in ESL designs and core counts multiply on multi-core and many-core platforms, thread-to-core mapping becomes critical in PDES.

In this dissertation, we propose a computation- and communication-aware approach to optimize thread mapping for parallel ESL simulation, with the aims of load balancing and communication minimization. As we identify that the order of dispatching parallel threads has a significant influence on the total simulation time, and Longest Job First (LJF) shows better performance than the Linux default thread dispatch policy, we first propose a segment-aware LJF scheduler for PDES. Our segment-aware scheduler can accurately predict the run time of the thread segments ahead, and thus make better dispatching decisions. Next, we define the concept of core distance for multi-core and many-core architectures, which quantifies core-to-core communication latency and characterizes processor hierarchies. For many-core architectures using directory-based cache coherence protocols, we observe that core-to-core transfers are not always significantly faster than main memory accesses, and the core-to-core communication latency depends not only on the physical placement on the chip, but also on the location of the distributed cache tag directory. Thus, using a memory ping-pong benchmark, we quantify the core distance on a ring-network many-core platform and propose an algorithm to optimize thread-to-core mapping in order to minimize on-chip communication overhead. Altogether, based on a static analysis of communication patterns and core distance and a dynamic profiling of computation load, our proposed framework utilizes a heuristic graph partitioning algorithm and automatically generates an optimized thread mapping, which minimizes inter-chip communication overhead. In our systematic evaluation, our approach consistently shows a significant performance gain on top of the order-of-magnitude speedup of PDES.
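
As an illustration of why dispatch order matters, the following minimal sketch compares Longest Job First against plain arrival order when jobs are handed to whichever core becomes free first. It is not the dissertation's segment-aware scheduler: the function name, the segment run times, and the two-core setup are hypothetical, and arrival order is only a stand-in baseline, not a model of the Linux dispatch policy.

```python
import heapq

def makespan(durations, num_cores, longest_first=False):
    """Greedy list scheduling: each job goes to the core that frees up first."""
    order = sorted(durations, reverse=True) if longest_first else list(durations)
    # Min-heap holding the time at which each core becomes free.
    free_at = [0.0] * num_cores
    heapq.heapify(free_at)
    for d in order:
        start = heapq.heappop(free_at)       # earliest available core
        heapq.heappush(free_at, start + d)   # core is busy until start + d
    return max(free_at)

# Hypothetical predicted run times of seven thread segments.
segments = [2, 3, 4, 6, 2, 2, 9]
arrival_order = makespan(segments, 2)
ljf = makespan(segments, 2, longest_first=True)
print(arrival_order, ljf)  # 18.0 14.0
```

In this toy case the long segment that arrives last leaves one core idle under arrival order, while dispatching the longest segments first balances both cores.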

The contributions of this dissertation include a segment-aware multi-core scheduler, core distance profiling, a communication-aware thread mapping framework, together with an open-source software package for Out-of-Order PDES.

PhD Defense: Low Power Reliable Design using Pulsed Latch Circuits

Name: Wael Mahmoud Elsharkasy

Date: February 15, 2017

Time: 11:00 AM

Location: Engineering Hall 3206

Committee: Prof. Fadi J. Kurdahi, Prof. Ahmed Eltawil, Rainer Doemer


System-on-Chip (SoC) design has faced many challenges over the past decade. With today's applications centered around the Internet-of-Everything (IoE), these challenges are expected to become more critical. Among them are reducing power consumption for better energy efficiency, overcoming different sources of variation to ensure reliable operation, and reducing design area to lower cost and increase integration. As a result, chip designers face many problems in trying to build reliable systems that integrate complex functionality on a minimum die size and within a limited power budget. Among the different circuit components in every chip, memory components are of great concern. They consume the majority of the chip area and power, in addition to affecting the entire chip's performance and reliability. These include large memory arrays, caches, register files and the different sequential elements in the logic paths. Sequential elements play a critical role in modern synchronous CMOS circuits. Indeed, they can represent up to 50% of the standard cells used in a chip. In addition, the power consumption of the clock tree, including these elements, can be more than half of the total chip power. They are also second only to memory in how strongly they are affected by different sources of variation. Hence, efficient implementation of these elements is of great importance for the design of energy-efficient and reliable integrated circuits. Pulsed latches have been proposed as an efficient replacement for flip-flops in the implementation of sequential elements. They can achieve higher performance than traditional flip-flops and can be designed to be smaller in area and more power efficient. However, the operation of pulsed latches is more sensitive to process, voltage and temperature (PVT) variations.

In this thesis, we propose a methodology to study the reliability of pulsed latches and use it to evaluate the effect of PVT variations on their behavior. In addition, novel approaches to enhance the reliability of pulsed latches without significant degradation in performance, area or power are presented. Also, since sequential elements can be used to build small register files, pulsed latch implementations of register files are discussed and compared to other traditional implementations, including SRAM and flip-flops. Finally, since multiport register files are beneficial for quite a few applications, novel implementations of multiport register files are also presented. The proposed implementation is shown to greatly reduce the significant overhead in area, power and latency associated with the traditional way of designing multiport register files.

PhD Defense: Runtime Memory Management in Many-core Systems

Name: Hossein Tajik

Date: November 15, 2016

Time: 3:00PM – 4:00PM

Location: DBH 3011 Conference Room

Committee: Nikil Dutt (Chair), Tony Givargis, Alex Nicolau


With the increasing number of cores on a chip, we are moving towards an era where many-core platforms will soon be ubiquitous. Efficient use of tens to hundreds of cores on a chip and their memory resources comes with unique challenges.

In this dissertation, we propose SPMPool: a scalable platform for sharing Software Programmable Memories (SPMs). The SPMPool approach exploits underutilized memory resources by dynamically sharing SPM resources between applications running on different cores and adapts to the overall memory requirements of multiple applications that are concurrently executing on the many-core platform. We propose both central and distributed management schemes for SPMPool and study the efficiency of auction-based mechanisms in solving the memory mapping problem. We also propose offline and online memory phase detection methods in order to increase the adaptivity of memory management to temporal changes in memory requirements of a single application. The runtime memory management schemes proposed in this dissertation enable better performance and power for many-core systems.

PhD Defense: Scalable Runtime support for Edge-To-Cloud integration of Distributed Sensing Systems

Name: Brett Chien

Date: November 29, 2016

Time: 10:00AM

Location: EH 2210

Committee: Pai H. Chou (Chair)


While the Internet-of-Things (IoT) has drawn growing attention from researchers and the public, building a complete system from the edge sensing units to the cloud services requires a massive amount of effort. Researchers with strong interests in the collected information are often lost among the various technologies involved, including distributed sensing embedded systems, bridge devices between the Internet and the local network, and data backend services.

This work takes a cross-system, script-based, and semantic-enhanced approach to address the lack of suitable runtime support. We propose a threaded-code runtime support for edge sensing systems, a script-based wrapper on Physical-to-Cyber bridges, and scalable middleware for the backend services.

With the proposed runtime support, we are able to apply distributed sensing systems to real-world applications quickly and explore insights from the collected information. As a result, a building structure monitoring system has been installed, allowing civil researchers to develop algorithms to prevent disaster events. Body area sensing systems for ECG monitoring, CO2 detection, and body movement have also been developed, enabling baby screening and the detection of potential heart problems. The results show that, with the proposed runtime support, applications can be realized quickly and scalably.

PhD Defense: Resource Aggregation for Collaborative Projected Video from Multiple Mobile Devices

Name: Hung Nguyen

Date: November 17, 2016

Time: 1:30 P.M.

Location: EH 3206

Committee: Fadi Kurdahi (Chair), Aditi Majumder (Co-Chair)


We explore and develop an embedded real-time system and associated algorithms that enable an aggregation of resource-limited, low-quality, projection-enabled mobile devices to collaboratively produce a higher quality video stream for a superior viewing experience. Such resource aggregation across multiple projector-enabled devices can lead to per-unit resource savings while moving the cost to the aggregate.

The pico-projectors that are embedded in mobile devices such as cell phones have a much lower resolution and brightness than standard projectors. Tiling (putting the projection area of multiple projectors in a rectangular array overlapping them slightly around the boundary) and superimposing (putting the projection area of multiple projectors right on top of each other) multiples of such projectors, registered via automated registration through the cameras residing within those mobile devices, result in different ways of aggregating resources across these multiple devices. Evaluation of our proof-of-concept system shows significant improvement for each mobile device in two primary factors of bandwidth usage and power consumption when using a collaborative federation of projection-embedded mobile devices.

The portable, low-power, lightweight, small pico-projectors are key components of projection-enabled mobile devices for the future. Because of their reduced weight and dimensions and their portable nature, the calibrated integrated systems are prone to physical destabilization of the projected image during a presentation. Thus, automatic re-calibration and projected video stabilization during the presentation become essential requirements for enhancing the user experience. The design, algorithms, and implementation methods for these features will be presented in the second part of the dissertation.

PhD Defense: Specification and Runtime Verification of Distributed Multiprocessor Systems: Languages, Tools and Architectures

Name: Ahmed Nassar

Date: September 2, 2016

Time: 12:00 P.M.

Location: EH 3403

Committee: Fadi Kurdahi (Chair), Rainer Doemer, Ahmed Eltawil

Abstract: Post-deployment runtime verification (RV) has recently emerged as a complementary technology to extend the coverage of conventional software verification and testing methods. This thesis is an attempt to tackle three major barriers that need to be surmounted before RV technologies come into widespread use:
Barrier-1: Lack of an expressive, yet efficiently monitorable, specification language. Distributed software behavior is projected onto an observation interface consisting of data-carrying (or parameterized) events, such as Linux system calls including argument values, and self-replicating deterministic finite automata (SR-DFAs) are introduced for RV purposes as well as anomaly-based intrusion detection in embedded and general-purpose software systems based on these parametric traces.
Barrier-2: The substantial performance and power overhead of pure software RV frameworks. NUVA, which stands for nonuniform verification architecture, is a distributed automata-based RV architecture for software specifications in the form of SR-DFAs. NUVA has been implemented over a cache-coherent nonuniform-memory-access (ccNUMA) multiprocessor and can be deployed on the FPGA fabric that will reside on all next-generation processor chips. The core of NUVA is a coherent distributed automata transactional memory (ATM) that efficiently maintains states of a dynamic population of automata checkers organized into a rooted dynamic directed acyclic graph (DAG) concurrently shared among all processor nodes.
Barrier-3: Formal specifications are hard to formulate and maintain for evolving complex embedded and general-purpose software systems. Therefore, specification mining has long been envisioned to play a key role in software verification, modification and documentation. However, in order to scale beyond simple, library/API-level properties having short temporal spans, specification mining tools need to support more expressive specification languages that can capture complex, application-level properties. This thesis introduces a bio-inspired complete specification mining methodology for SR-DFAs using an iterative and interactive mining tool, called ParaMiner. ParaMiner relies on novel mining algorithms invoking multiple-sequence alignment (MSA) techniques to enable learning specifications from temporal slices of software behavior while overcoming the initial-state uncertainty problem.
SR-DFAs and ParaMiner have been leveraged in a new specification-based intrusion detection (ID) framework that protects distributed, reactive computing systems against cyberattacks having very sparse signatures, arbitrarily long time spans and wide attack fronts. Such attacks lie outside the scope of conventional anomaly-based ID methods which typically work with short event windows and ignore manipulated data objects, such as files and sockets. We demonstrate the effectiveness of the constructed SR-DFAs at classifying as well as resolving subtle behaviors typical of cyberattacks with varying evasion parameter values.
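
To make the idea of monitoring data-carrying events more concrete, the sketch below checks a parametric trace by keeping one small automaton instance per parameter value. It is a deliberately simplified illustration and does not reproduce the SR-DFA semantics defined in the thesis: the property (a file descriptor must be opened before it is read or closed), the state names, and the event names are all hypothetical.

```python
# Hypothetical property: every file descriptor must be opened before it is
# read or closed, and must not be used after it has been closed.
TRANSITIONS = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}

def monitor(trace):
    """Return the events that violate the property, tracking one automaton per fd."""
    state = {}        # fd -> current state; an instance is created on first use
    violations = []
    for event, fd in trace:
        current = state.get(fd, "closed")
        nxt = TRANSITIONS.get((current, event))
        if nxt is None:
            violations.append((event, fd))   # no transition defined: flag it
        else:
            state[fd] = nxt
    return violations

# A trace of (event, parameter) pairs, e.g. system calls with their fd argument.
trace = [("open", 3), ("open", 4), ("read", 3),
         ("close", 3), ("read", 3), ("close", 4)]
bad = monitor(trace)
print(bad)  # [('read', 3)]
```

Each distinct parameter value gets its own state, so interleaved activity on descriptor 4 does not mask the read on descriptor 3 after it was closed.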

PhD Defense:

Name:  Aras Pirbadian

Date: August 29, 2016

Time: 3:30pm

Location:  EH 3403

Committee: Ahmed Eltawil (Chair), Fadi Kurdahi, Rainer Doemer

With the continued scaling of chip manufacturing technologies, the significance of process variation in the performance of systems is increasing. Specifically, process variation results in growing voltage and frequency overhead margins required to ensure error-free operation of circuits. However, the traditional practice of overdesigning systems to cover process variation is no longer an efficient design methodology in an age with high demands for processing power and limited energy supplies. In this dissertation, a novel analytical model is proposed to predict the required margin accurately in the early stages of design space exploration. The model can be used to optimize the system overhead in error-free calculations, or to relax the bound imposed by full correctness in error-tolerant parts of systems and optimize the energy vs. performance trade-off. Additionally, unlike other existing efforts, this model considers the statistics of the circuit's inputs, enabling it to closely predict full circuit simulation results in a short time. Finally, this model is used in an adaptive carry select/ripple carry adder configuration to demonstrate the potential achievable power savings.

Growing variation in newer technology nodes is not always a negative side effect. The increased inherent randomness in the process manufacturing technology can be utilized to develop unique physically unclonable functions (PUFs). These functions are irreproducible hardware-based authenticating systems, which do not require memory-based storage. A low overhead delay-based PUF using the variation of the silicon manufacturing is also proposed in the second part of this work. The proposed PUF uses a simple and efficient structure to convert the randomness of the manufacturing process into random responses to fixed challenges in identically designed circuits.
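
The abstract does not describe the structure of the proposed PUF, so the sketch below substitutes a well-known textbook design, the arbiter PUF, simply to show how fixed challenges plus chip-specific delay variation yield chip-specific response bits. Everything here is an assumption for illustration: the additive delay model, the Gaussian delay spread, the seeds standing in for individual chips, and all function names.

```python
import random

def make_chip(num_stages, seed):
    """Draw per-stage delay differences that stand in for process variation."""
    rng = random.Random(seed)
    # One (straight, crossed) delay-difference pair per stage.
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(num_stages)]

def response(chip, challenge):
    """Race two signals through the stages; the sign of the final gap is the bit."""
    gap = 0.0
    for (straight, crossed), bit in zip(chip, challenge):
        # A crossed stage swaps the two paths, which flips the accumulated gap.
        gap = gap + straight if bit == 0 else -gap + crossed
    return 1 if gap > 0 else 0

# The same 64 fixed challenges are applied to two identically designed chips.
rng = random.Random(0)
challenges = [[rng.randrange(2) for _ in range(16)] for _ in range(64)]
chip_a = make_chip(16, seed=1)
chip_b = make_chip(16, seed=2)
sig_a = [response(chip_a, c) for c in challenges]
sig_b = [response(chip_b, c) for c in challenges]
```

A given chip always answers a challenge the same way in this noise-free model, while a second chip built from the same design produces a different response vector because its delays differ.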

PhD Defense: Progression and Edge Intelligence Framework for IoT Systems

Name:  Zhenqiu Huang

Date: July 29, 2016

Time: 10:00am

Location:  EH 4106

Committee: Kwei-Jay Lin (Chair), Fadi Kurdahi, Mohammad Al Faruque

Abstract: This thesis studies the issues in building and managing future Internet of Things (IoT)
systems. IoT systems consist of distributed components with services for sensing, processing,
and controlling through devices deployed in our living environment as part of the global
cyber-physical ecosystem.

Systems with perpetually running IoT devices may use a lot of energy. One challenge is to
implement good management policies for energy saving. In addition, a large scale of devices
may be deployed in wide geographical areas through low bandwidth wireless communication
networks. This brings the challenge of con

PhD Defense: Frameworks and Algorithms for Wearable Medical Applications

Name:  SeungJae Lee

Date:  April 5, 2016

Time:  3:00p.m.

Location:  EH 4106

Committee:  Pai H. Chou (Chair), Fadi Kurdahi, Tony Givargis


Wearable embedded systems with sensing, communication, and computing capabilities have given rise to innovations in e-health and telemedicine in general. The scope of such systems ranges from devices and mobile apps to cloud backend and analysis algorithms, all of which must be well integrated. To manage the development, operation, and evolution of such complex systems, a systematic framework is needed. This dissertation makes contributions in two parts. The first is a framework for defining the structure of a wide range of wearable medical applications with modern cloud support. The second part includes several algorithms that can be plugged into this framework to make these systems more efficient in terms of processing performance and data size. We propose a novel QT analysis algorithm that can take advantage of a GPU as well as a server-client environment, and we show competitive results in terms of both performance and energy consumption with or without parallelization. We also propose ECG compression techniques using a trained overcomplete dictionary. After the dictionary is constructed through a learning process with a given dataset, the signal can be compressed by sparse estimation using the trained dictionary. We further propose reconstructing the ECG signal from undersampled data based on a compressive sensing framework that can reconstruct ECG signals precisely from fewer samples so long as the signal is sparse or compressible. Together, these algorithms operating in the context of our proposed framework validate the effectiveness of our structured approach to the framework for wearable medical applications.


PhD Defense: Power Optimization for Medical Sensing Systems

Name:  Jun Luan

Location:  EH 4106

Date: February 24, 2016

Time: 1:00pm

Committee: Pai Chou (Chair), Mohammad Al Faruque, Fadi Kurdahi


Medical sensing systems collect and analyze the patients’ physiological data for monitoring, aid or diagnostic purposes. System designers are faced with stringent requirements on not only correctness and safety but also power. Reference designs and multi-purpose platforms help to significantly shorten the development cycle.

This work takes a cross-layer, system-level, platform-based approach to addressing the problem of saving power in a class of portable medical systems. We propose a low-power medical sensing system that can be used to monitor Electrocardiography (ECG), Photoplethysmogram (PPG), and muscle tension. It also includes a hand gesture recognition system to aid mobility-impaired patients.

We explore the theory and application of a compressive sensing framework to medical signal processing. A novel compressive sensing-based ECG compression algorithm and a dominant frequency extraction-based PPG heart-rate calculation algorithm are proposed to reduce the system power. The unique combination of hardware structure and software signal-processing algorithms makes low-power design possible. The system test results show that the proposed system is superior to existing works in terms of power consumption and system size.
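
The abstract does not spell out the PPG algorithm, so the following is only a generic sketch of dominant-frequency heart-rate estimation: take the strongest spectral component within a plausible heart-rate band and convert it to beats per minute. The function name, the 0.7 to 3.5 Hz band, the sampling rate, and the synthetic signal are assumptions made for the example.

```python
import cmath
import math

def heart_rate_bpm(samples, fs, lo_hz=0.7, hi_hz=3.5):
    """Estimate heart rate as the strongest DFT bin inside a plausible band."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]   # remove the DC offset
    best_k, best_power = None, -1.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if not lo_hz <= freq <= hi_hz:
            continue
        # Direct DFT of bin k; an FFT would be used for real workloads.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("no DFT bin falls inside the search band")
    return 60.0 * best_k * fs / n

# Synthetic 10 s PPG-like signal: a 1.2 Hz pulse plus a weaker second harmonic.
fs = 50.0
t = [i / fs for i in range(500)]
ppg = [1.0 + math.sin(2 * math.pi * 1.2 * x) + 0.3 * math.sin(2 * math.pi * 2.4 * x)
       for x in t]
rate = heart_rate_bpm(ppg, fs)
print(rate)  # 72.0
```

Restricting the search to a physiological band keeps the estimate from locking onto slow baseline drift or high-frequency noise, and evaluating only those bins keeps the computation small.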

PhD Defense: A Centralized IoT Middleware System for Devices Working Across Application Domains Using Self-descriptive Capability Profile

Name: Chengjia Huo

Location: EH 4404

Date: September 30, 2015

Time: 1pm

Committee: Pai H. Chou (chair), Phillip Sheu, Rainer Doemer


The Internet of Things (IoT) has been receiving growing attention in recent years as the next wave of the computing revolution, made possible by all types of networks of things (NoTs), in which devices are powered by low-cost, miniature, low-power systems-on-chip (SoCs) with computing and communication capabilities and are bridged to the Internet with the assistance of gateways. More and more NoT devices are designed to provide more than one functionality to fulfill different requirements from the application domains. We believe that the true power of IoT is that the functionalities of devices can work across application domains. In order to reveal the potential of IoT, the description of a device’s capability needs to represent the functionalities that the device can provide. We find that previous solutions for describing a device’s capability focus mainly on hiding the vendor-specific interfaces made by different manufacturers, but they do not reflect the different functionalities that a device provides. In this thesis, the concept of a device capability profile is proposed. Unlike the previous solutions, the device capability profile, specified in the firmware of a device, allows the device to work across different application domains. Together with the device capability profile, a centralized IoT middleware framework, called rimware, is proposed. Rimware tracks every device’s capability and state in a centralized manner and provides different ways for application domains to query the device’s functionalities. In addition, rimware utilizes the device capability profile to enforce security and privacy throughout the communication with the devices. Moreover, tasks can be scheduled through rimware, which enables functionalities from multiple devices to work together to fulfill the requirements of application domains. Optimization is applied in cases where one device works on multiple tasks simultaneously.

We also provide BlueRim, an implementation of rimware specifically designed for BLE devices, which takes advantage of BLE’s very long battery life on the device side and of cloud functionality on the centralized side. The fundamental features of rimware have been validated in several real-world applications from different domains while incurring minimal code size and communication overhead on BLE devices. We believe that our approach represents an important technology in taking IoT closer to realizing its full potential.

PhD Defense: CARL-SJR: A Socially Assistive Neurorobot for Autism Therapy and Research

Name: Ting-Shuo Chou

Date: May 21, 2015

Time: 3:00pm – 5:00pm

Location: Social & Behavioral Sciences Gateway 2200 Conference Room

Committee: Jeffrey Krichmar (Chair), Nikil Dutt, Alexandru Nicolau


Neurodevelopmental disorders, such as Attention-Deficit–Hyperactivity
Disorder (ADHD) and Autism Spectrum Disorder (ASD), have core clinical
symptoms of inattention, hyperactivity, and impulsivity (often hyper-
and hypo-responsiveness). These symptoms are often accompanied by
reduced motor coordination and impaired sensory processing. We introduce
a Socially Assistive Robot (SAR) with the goal of automating therapy for
children with neurodevelopmental disorders. The novel robot, which is
called Cognitive Anteater Robotics Laboratory – Spiking Judgment Robot
(CARL-SJR), is designed for therapy and diagnosis. CARL-SJR is
autonomous and capable of tactile sensing and interaction. A spiking
neural network (SNN) model and neurally inspired algorithms control
CARL-SJR’s behavior. By providing a large tactile sensing surface that
encourages touching with hand movements, CARL-SJR especially addresses
impairments in tactile sensitivity and social interaction observed in
children with neurodevelopmental disorders. First, using CARL-SJR, we
conducted a pilot study in which children with different
neurodevelopmental disorders showed different behavioral metrics and
tactile movements. The results suggest CARL-SJR might serve as a
diagnostic tool for developmental
disorders. Second, we showed that the information carried by temporal
coding is higher than the traditional rate coding when decoding spike
trains in response to tactile movements. Third, we implemented online
learning capabilities on CARL-SJR, where the robot could associate a
user’s preferred color pattern displayed on the robot with the user’s
hand sweep across the robot’s body. The emergent behaviors and neural
activities in the SNN are consistent with biological recordings. The
underlying neural mechanism in the SNN also serves as an alternative
explanation of how brains encode timing and associate (or learn) two
temporally separated events.

PhD Defense: Ensuring Reliability and Fault-Tolerance for the Cyber-Physical System Design

Name: Volkan Gunes

Date: May 27, 2015

Time: 11:00AM – 12:00PM

Location: Donald Bren Hall 3013 Conference Room

Committee: Tony Givargis (Chair), Alexandru Nicolau, Ian Harris, Steffen Peter


The cyber-physical system (CPS) is a term describing a broad range of
complex, multi-disciplinary, physically-aware next generation engineered
systems that integrate embedded computing technologies (cyber part) into
the physical world. Sensors play an important role in this integration
because they provide the data extracted from the physical world for the
cyber systems to fulfill the decision making process. However, this
process is likely to be misled by incorrect data due to sensor fault

In this dissertation, the main focus is on sensor fault mitigation and
achieving high reliability in CPS operations. One of the challenges we
consider is timely event (e.g., motion as a phenomenon) detection in CPS
under possible faulty sensor conditions. In this regard, our
demonstrative example of CPS is the falling ball example (FBE) using
binary event detectors (i.e., motion sensors), a controller, and a
camera for timely motion detection of a falling ball. Another challenge
we consider is satisfying thermal comfort and energy efficiency under
certain faulty sensor conditions in a multi-room building incorporating
temperature sensors, controllers, and heating, ventilation, and air
conditioning (HVAC) systems as a CPS application. For both cases, we
adopt a model-based design (MBD) methodology to analyze the effect of
sensor faults on the desired system outcome. We specify well-defined
fault semantics for the event detectors and temperature sensors to make
the problem definition clearer. We provide a MATLAB/Simulink
simulation framework for our CPS examples. Besides having the
traditional CPS model that comprises the cyber, interface (e.g. sensors
and actuators) and physical models, we develop fault models and a system
evaluation model in Simulink and incorporate them into the CPS model.

We explore various techniques for fault mitigation from a holistic design
perspective. Therefore, the approaches presented in this study
contribute to the design of fault-tolerant CPSs. Furthermore,
considering compute demands of large scale CPSs, we introduce the XGRID
embedded many-core system-on-chip architecture. XGRID makes use of a
novel, FPGA-like, programmable interconnect infrastructure, offering
scalability and deterministic communication using hardware supported
message passing among cores. We provide a conceptual mapping of control
algorithms for the automation of a multi-room building onto target XGRID

Our findings regarding reliable CPS design show that the physical system
attributes (e.g., sensor placement and environmental effects) can be a
more dominant factor than the cyber system attributes on the system
outcome. In addition, sensor faults may lead to unsatisfactory system
outcome in CPSs since CPSs heavily rely on sensor readings for decision
making. Therefore, the analysis of temporal and spatial correlations
between sensor readings helps mitigate certain types of sensor faults
and enables CPSs to utilize sensors’ data more efficiently for decision

PhD Defense: Temperature-Aware Design for SoCs using Thermal Gradient Analysis

Name: Jun Yong Shin

Date: May 18th, 2015

Time: 2:00PM

Location: EH2430, Harut colloquia room

Committee Chair: Nikil Dutt


Over the last few decades, chip performance has increased steadily due to continuous and aggressive technology scaling. However, this scaling also leaves chips vulnerable to several issues: high power densities in particular areas of a chip can result in hotspots and thermal gradients, which can permanently damage the chip and reduce the reliability of the entire system using it. As a result, a large number of dynamic thermal management solutions have been proposed in recent years for use in multi-core architectures, and accurate temperature information over the entire chip area has become indispensable, especially for fine-grained dynamic thermal management solutions. Naturally, on-chip thermal sensors have come to play an important role in providing accurate information on the temperature distribution of a chip, but some issues remain regarding their allocation. First, due to power, area and routing constraints, it is preferable to limit the number of on-chip thermal sensors on a die, and their placement needs to be considered carefully in order to increase the accuracy of full-chip thermal profile reconstruction, especially when only a small number of sensors can be implemented. Second, due to the limited reading accuracy of low-power, small-sized on-chip thermal sensors, some way to improve their reading accuracy is desirable.

In this work, we first address how to improve the reading accuracy of low-power, small-sized on-chip thermal sensors, such as Ring-Oscillator (RO) based sensors, at runtime at the software level. Second, we address how to allocate a proper number of sensors on a die in order to obtain accurate full-chip thermal information at runtime. Finally, a temperature-aware routing method for global interconnects is presented that minimizes delay and reduces the probability of chip failure due to electromigration.

PhD Defense: Resilient On-Chip Memory Design in the Nano Era

Name: Abbas Banaiyanmofrad

Date: May 20, 2015

Time: 3pm – 5pm

Location: Donald Bren Hall 3011 Conference Room

Committee: Nikil Dutt (Chair), Alex Nicolau, Alex Veidenbaum

Aggressive technology scaling in the nano-scale regime makes chips more susceptible to failures. This causes multiple reliability challenges in the design of modern chips, including manufacturing defects, wear-out, and parametric variations. With the increasing number, amount, and hierarchy of on-chip memory blocks in emerging computing systems, the reliability of the memory sub-system becomes an increasingly challenging design issue. Existing resilient memory design schemes are unable to effectively address the key features of scalability, interconnect-awareness, and cost-effectiveness for these platforms. In this thesis, we propose different approaches to address resilient on-chip memory design in computing systems ranging from traditional single-core processors to emerging many-core platforms. We classify our proposed approaches in five main categories: 1) Flexible and low-cost approaches to protect cache memories in single-core processors against permanent faults and transient errors, 2) Scalable fault-tolerant approaches to protect last-level caches with non-uniform cache access in chip multiprocessors, 3) Interconnect-aware cache protection schemes in network-on-chip architectures, 4) Application-aware memory resiliency for the approximate computing era, and 5) System-level design space exploration, analysis, and optimization for redundancy-aware on-chip memory resiliency in many-core platforms.

In summary, the premise of this thesis is to provide multiple solutions in different layers of the system hierarchy, targeting a variety of architectures from embedded single-core microprocessors to emerging large many-core platforms, to address cost-efficient error-resiliency of on-chip memory components.