Seminars by CECS

PhD Defense: Transaction-Level Modeling of Deep Neural Networks for Efficient Parallelism and Memory Accuracy

Name: Emad Arasteh

Chair: Prof. Rainer Doemer

Date: July 7, 2022

Time: 10:00 am

Location: EH 3206

Committee: Prof. Fadi Kurdahi, Prof. Ian Harris

Title: Transaction-Level Modeling of Deep Neural Networks for Efficient Parallelism and Memory Accuracy


The emergence of data-intensive applications, such as Deep Neural Networks (DNNs), exacerbates the well-known memory bottleneck in computer systems and demands early attention in the design flow. Electronic System-Level (ESL) design using SystemC Transaction Level Modeling (TLM) enables effective performance estimation, design space exploration, and gradual refinement. In this dissertation, we present our exploratory modeling framework for hardware-software codesign based on SystemC TLM with particular focus on exposing parallelism and memory contention. We demonstrate the effectiveness of our approach for representative complex DNNs such as GoogLeNet and Single Shot MultiBox Detector.

First, we study the impact of communication mechanisms on the available parallelism in TLM models. Specifically, we demonstrate the impact of varying synchronization mechanisms and buffering schemes on the exposed parallelism using different modeling styles of a DNN. We measure the performance of aggressive out-of-order parallel discrete event simulation and analyze the available parallelism in the models. Our study suggests that higher parallel simulation performance indicates better models that expose more parallelism.

Second, we explore the critical aspects of modeling and analysis of timing accuracy and memory contention. A major hurdle in tackling the memory bottleneck is the detection of memory contention late in the design cycle once detailed timed or cycle-accurate models are developed. A memory bottleneck detected at such a late stage can severely limit the available design choices or even require costly redesign. To explore new architectures prior to RTL implementation, we propose a novel TLM-2.0 loosely-timed contention-aware (LT-CA) modeling style that offers high-speed simulation close to traditional loosely-timed (LT) models, yet shows the same accuracy for memory contention as low level approximately-timed (AT) models.
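The contrast between plain loosely-timed and contention-aware timing can be illustrated with a toy Python sketch. This is not the dissertation's actual model; the initiator names, latency numbers, and the simple overlap-counting rule are invented for illustration:

```python
# Toy contrast between loosely-timed (LT) and contention-aware (LT-CA)
# memory timing. All names and delay numbers are invented for illustration.

BASE_LATENCY = 10  # nominal shared-memory access latency (time units)

def lt_delay(accesses):
    """Plain LT model: every access takes the nominal latency."""
    return {a: BASE_LATENCY for a in accesses}

def lt_ca_delay(accesses):
    """LT-CA sketch: an access overlapping n other initiators' outstanding
    accesses to the shared memory is stretched by n extra latency slots,
    approximating serialization at the memory port."""
    delays = {}
    for (start, initiator) in accesses:
        overlapping = sum(
            1 for (s, other) in accesses
            if other != initiator and s <= start < s + BASE_LATENCY)
        delays[(start, initiator)] = BASE_LATENCY * (1 + overlapping)
    return delays

# Two initiators hitting the shared memory almost simultaneously:
accesses = [(0, "cpu0"), (2, "cpu1"), (40, "cpu0")]
print(lt_delay(accesses))     # uniform latency, contention invisible
print(lt_ca_delay(accesses))  # cpu1's overlapping access is stretched
```

The point of the sketch is that the contention adjustment is computed from information an LT model already has (access start times), so simulation can stay fast while contention becomes visible.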

Finally, we further refine the TLM-2.0 AT model by adding a cycle-accurate model of a memory subsystem. This model provides higher timing accuracy for contention analysis and hence a more accurate performance estimate. We revise our LT-CA memory delay modeling to reach accuracy comparable to the cycle-accurate TLM model of the shared memory subsystem. The high amount of contention on the shared memory suggests new processor architectures with local memories.

Short Bio: 

Emad Arasteh is a Ph.D. candidate in Computer Engineering at the University of California, Irvine and a graduate researcher at the Center for Embedded and Cyber-physical Systems (CECS). He received his M.Sc. degree in Electronic Design from Lund University in Sweden. His current research interests include system-level modeling and design of embedded systems, scalable memory system design, and massively parallel processor architecture. Previously, he worked as a hardware and software engineer at semiconductor companies in Sweden and the USA, including Ericsson, Canon (Axis Communications), and Samsung.

PhD Defense: Internet of Things-Based Collaborative Position-Sensing Systems for Cardiopulmonary Resuscitation and Indoor Localization

Name: Chen, Hsin-Chung “Andrew”

Chair: Pai Chou

Date: Wednesday, March 16, 2022

Time: 9:00 am


Committee: Nader Bagherzadeh, Rainer Doemer

Title: Internet of Things-Based Collaborative Position-Sensing Systems for Cardiopulmonary Resuscitation and Indoor Localization


Ubiquitous IoT devices have promoted the growth of applications that require target positioning. While precise positioning remains a challenge for IoT devices, existing methods mainly focus on optimizing the positioning algorithms or expanding the sensing modalities. One overlooked opportunity is to leverage the devices' wireless communication capability to exchange sensing data of different modalities with nearby devices. This work proposes collaborative sensing frameworks to enhance the accuracy of position inference.

Our approach to collaborative sensing is divided into selection of reliable devices, the data-exchange mechanism, and the data-fusion algorithm. We demonstrate our approach on two different position-sensing systems: one for chest compression-depth estimation and one for indoor pedestrian localization. The key components used in the frameworks include quality assessment of sensor data using mutual information, data-fusion algorithms based on the chest-compression model and the pedestrian-encounter model, and the data-exchange mechanism using Bluetooth Low Energy (BLE). A real-time position-estimation method based on our chest-compression model is proposed to remove the noise and handle the cumulative error. A collaborative conditional random field algorithm is developed to reduce the convergence distance in the localization estimation. Experimental results show our collaborative-sensing approach outperforms the existing solutions in terms of real-time estimated position.
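The quality-assessment idea above — scoring a nearby device's data by how much information it shares with one's own readings — can be sketched with a plain histogram estimate of mutual information. This is a generic illustration, not the dissertation's implementation:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Histogram (plug-in) estimate of I(X;Y) in bits for two equally
    long, discretized sensor streams."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items())

# A device whose readings track the reference carries information about
# it (high score); an unrelated stream carries none (score near zero).
ref   = [0, 1, 0, 1, 1, 0, 1, 0]
good  = [0, 1, 0, 1, 1, 0, 1, 0]
other = [1, 1, 0, 0, 1, 1, 0, 0]
print(mutual_information(ref, good))   # 1.0 bit
print(mutual_information(ref, other))  # 0.0 bits
```

A framework would rank candidate collaborators by such a score and fuse data only from devices above a threshold.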

This dissertation proposes an alternative approach to optimizing position-sensing systems through collaborative sensing. Our work represents a first step towards universal positioning on IoT devices and provides research opportunities for further improvement and exploration.

Short Bio: Hsinchung Chen is a Ph.D. candidate in computer engineering. His research interests include signal processing, machine learning, and embedded system design.

PhD Defense: Scalable Scientific Computation Acceleration Using Hardware-Accelerated Compression

Name: Gongjin Sun

Chair: Sang Woo Jun

Date: March 7, 2022

Time: 1:00 pm

Location: Zoom

Committee: Sang Woo Jun, Nikil Dutt, and Elaheh (Eli) Bozorgzadeh

Title: Scalable Scientific Computation Acceleration Using Hardware-Accelerated Compression


Hardware accelerators such as GPUs and FPGAs can often provide enormous computing capabilities and power efficiency, as long as the working set fits in the on-board memory capacity of the accelerator. But if the working set does not fit, data must be streamed from the larger host memory or storage, causing performance to be limited by the slow communication bandwidth between the accelerator and the host. While compression is an effective method to reduce data storage and movement overhead, it has not been very useful in solving this issue due to efficiency and performance limitations. This is especially true for scientific computing accelerators with heavy floating-point arithmetic, because efficiently compressing floating-point numbers requires complex, floating-point specific algorithms.

This dissertation addresses the host-side bandwidth issue of accelerators, specifically FPGA accelerators, using a series of hardware-optimized compression algorithms. Since typical compression algorithms are not designed with efficient hardware implementation in mind, we explore and implement variants of existing algorithms for high performance and efficiency. We demonstrate the impact of our ideas using two classes of applications: grid-based scientific computing and high-dimensional nearest-neighbor search. We have implemented a scientific computing accelerator platform (BurstZ+), which uses a class of novel error-controlled lossy floating-point compression algorithms (ZFP-V Series). We demonstrate that BurstZ+ can completely remove the host-accelerator communication bottleneck for accelerators. Evaluated against hand-optimized kernel accelerator implementations without compression, our single-pipeline BurstZ+ prototype outperforms an accelerator without compression by almost 4×, and even an accelerator with enough memory for the entire dataset by over 2×. We have also developed a near-storage high-dimensional nearest-neighbor search accelerator (ZipNN), which uses a hardware-optimized group varint compression algorithm to remove the host-side communication bottleneck. Our ZipNN prototype outperforms an accelerator without compression by 6×, and even much costlier in-memory multithreaded software implementations by over 2×.
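Group varint itself is a well-known encoding: each group of four integers is prefixed by one tag byte whose four 2-bit fields give each value's byte length, so a decoder never branches per byte. A plain-software sketch (the hardware pipeline in ZipNN is of course different) looks like this:

```python
def group_varint_encode(values):
    """Pack uint32s in groups of four: one tag byte holds four 2-bit
    fields giving (byte length - 1) of each value, then the values
    follow little-endian in that many bytes."""
    out = bytearray()
    for i in range(0, len(values), 4):
        group = values[i:i + 4] + [0] * (4 - len(values[i:i + 4]))
        lens = [max(1, (v.bit_length() + 7) // 8) for v in group]
        tag = 0
        for j, length in enumerate(lens):
            tag |= (length - 1) << (2 * j)
        out.append(tag)
        for v, length in zip(group, lens):
            out += v.to_bytes(length, "little")
    return bytes(out)

def group_varint_decode(data, count):
    """Inverse of the encoder; `count` trims the zero padding."""
    values, pos = [], 0
    while len(values) < count:
        tag = data[pos]
        pos += 1
        for j in range(4):
            length = ((tag >> (2 * j)) & 3) + 1
            values.append(int.from_bytes(data[pos:pos + length], "little"))
            pos += length
    return values[:count]

nums = [3, 300, 70000, 5, 1000000]
enc = group_varint_encode(nums)
print(len(enc), group_varint_decode(enc, len(nums)) == nums)
```

The fixed group structure is what makes the scheme hardware-friendly: the tag byte alone determines where every value in the group lies.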

PhD Defense: Computation-Communication Co-Optimization in the Era of Networked Embedded Device

Name: Seyyed Ahmad Razavi Majomard

Chair: Eli Bozorgzadeh

Date: February 28, 2022

Time: 1:00 pm

Location: Zoom

Committee: Eli Bozorgzadeh (chair), Solmaz Kia, Ian Harris

Title: Computation-Communication Co-Optimization in the Era of Networked Embedded Device


Networked embedded systems are now widely used. These systems consist of multiple embedded devices connected through wired or wireless networks to accomplish a common goal, such as a rescue mission or a surveillance system. They are driven by tight coordination between computational components, sensors, and the interactions among devices. Because each device has limited computation and communication capacity, considering only computation or only communication when designing an application may result in enormous overhead in the other; computation-communication-aware design is therefore essential for reliable and fast execution. In this dissertation, we explore a graph-based methodology to decentralize an application among a team of cooperative agents (networked embedded devices). As a case study, we focus on Unscented Kalman Filter (UKF) based Cooperative Localization (CL), where there is a tightly coupled data dependency between agents. CL is a popular method for localizing a team of communicating robots in GPS-denied environments. Fast and accurate localization is essential because a delayed estimate may lead to mission failure. We first distribute the UKF-based CL among the robots, then reduce the communication overhead using a replication technique. Computation replication increases the CPU workload on the agents, so we provide a method to selectively replicate computation, minimizing the overall CPU overhead while meeting a user-defined application latency.

Due to the limited computation capacity of embedded devices, the data for applications such as DNN inference has to be sent to a node with a stronger computation unit, such as a local server, the edge, or the cloud. Moreover, in some systems, agents are unwilling to share their information with anyone but a trusted node (e.g., the edge), so computation offloading is unavoidable. The edge serves multiple applications, and its computation resources are shared among them. For an edge with a fixed time-slotted schedule, the data arrival time affects system performance: data frames may arrive after their allocated time slot, or arrive too early and wait on the edge until the computation resource becomes available. We provide a feedback-loop-based framework that reduces the data frame wait time on the edge by staggering the end device's sensing time. Because the on-device computation and network delays vary, computing the staggering time is challenging. In our framework, the edge computes, from the arrival time distribution, the target arrival time that minimizes the overall wait time. The staggering time is the difference between the target arrival time and the estimated arrival time; since arrivals are noisy, the edge uses estimation methods to compute the estimated arrival time and then sends a command to the end device to stagger its sensing. In this dissertation, we illustrate how communication-computation co-optimization can improve the performance of networked embedded systems. We provide solutions for application decentralization that consider the computation-communication tradeoff, and we show that coordination between networked embedded devices and the edge can improve responsiveness despite network noise, using only the data arrival times and a feedback loop.
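The staggering computation can be sketched in a few lines, here with an exponentially weighted moving average standing in for the estimation method. The estimator choice, the smoothing factor, and the example numbers are illustrative assumptions, not the dissertation's:

```python
def staggered_offset(arrivals, slot_start, alpha=0.2):
    """Feedback sketch: smooth noisy arrival times with an EWMA, then
    command the device to shift sensing so the estimated arrival lands
    on the edge's slot start. Positive offset = sense later, negative
    offset = sense earlier."""
    estimate = arrivals[0]
    for t in arrivals[1:]:
        estimate = alpha * t + (1 - alpha) * estimate
    return slot_start - estimate

# Frames consistently arrive ~2 ms after the slot opens at t = 10 ms,
# so the edge tells the device to start sensing about 2 ms earlier.
print(staggered_offset([12.1, 11.9, 12.0, 12.0], slot_start=10.0))
```

In a running system the edge would recompute this offset periodically, closing the feedback loop as delays drift.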

PhD Defense: A Multiple Compiler Approach for Improved Performance and Efficiency

Name: Aniket Shivam

Chair: Alexander V. Veidenbaum

Date: May 25, 2021

Time: 3:30 PM

Location: Zoom

Committee: Alexander V. Veidenbaum, Alexandru Nicolau, and Tony Givargis

Title: A Multiple Compiler Approach for Improved Performance and Efficiency


Production compilers have achieved a high level of maturity in terms of generating efficient code. Compilers embed numerous code optimization techniques, with special focus on loop nest optimizations, developed over the last four decades. The code generated by any two production compilers can turn out to be very different based on the pros and cons of their respective Intermediate Representation (IR), the loop transformations they implement and their ordering, the cost models used, and even instruction selection (such as vector instructions) and scheduling. The compilers also need to predict the effect on overall performance of a multi-core processor's complex pipelines, multiple functional units, complex memory hierarchy, etc. Hence, the performance of the code a given compiler produces for a program segment may not necessarily be matched by other compilers. Additionally, there is no way of knowing how close a compiler gets to optimal performance or whether there is any headroom for improvement.

The complexity and rigidity of the compilation process make it very difficult to modify a given compiler to improve the performance of generated code for every case where it couldn’t produce the best possible code. Therefore, this thesis presents a compilation approach that turns the differences between compilation processes and performance optimizations in each compiler from a weakness to a strength. This approach is implemented as a novel compilation framework, the MCompiler. This meta-compilation framework allows different segments of a program to be compiled using an ensemble of compilers/optimizers and combined into a single executable. Utilizing the highest performing code for each segment, identified via Exploratory Search, can lead to a significant overall improvement in performance. The framework is shown to produce performance improvements for serial (including auto-vectorized), auto-parallelized, and hand-optimized (using OpenMP) parallel code.

Next, this thesis explores the possibility of learning which compiler will produce the best code for a segment. This is accomplished using Machine Learning. The Machine Learning models learn about inherent characteristics of loop nests and then predict which code optimizer is the most suited for each loop nest in an application. These Machine Learning models are then incorporated into the MCompiler to predict the best code optimizer, during compilation, for each code segment of the application. This feature allows the MCompiler to replace the expensive Exploratory Search with Machine Learning predictions and still keep performance very close to the Exploratory Search.

Finally, this thesis expands the compilation approach to achieve energy efficiency on modern architectures. Prior research has advocated both for and against the hypothesis that optimizing for performance translates into optimizing for energy efficiency. No production compiler optimizes for energy efficiency directly, expecting optimizing for performance to translate into higher energy efficiency. Optimizing for performance is complex for recent generations of processors and, with automatic DVFS management in these processors, optimizing for energy efficiency would add another level of complexity for compilers with no guarantee of success. Using the MCompiler, this thesis shows how the performance-oriented compiler optimizations can be used to achieve energy efficiency.

PhD Defense: Data-Driven Modeling and Analysis for Trustworthy Cyber-Physical Systems

Name: Sina Faezi

Chair: Dr. Mohammad Al Faruque

Date: April 21, 2021

Time: 10:00 AM

Location: Zoom

Committee: Dr. Mohammad Al Faruque (UCI), Dr. Philip Brisk (UCR), Dr. Zhou Li (UCI)

Title: “Data-Driven Modeling and Analysis for Trustworthy Cyber-Physical Systems”


In the age of digitization, a layer of cyber software sits on a hardware circuit and controls the physical systems around us. The tight integration of cyber and physical components is referred to as Cyber-Physical Systems (CPS). The interaction between cyber and physical components brings unique challenges which traditional modeling tools struggle to resolve. Particularly, they often fail to model the unintentional physical manifestation of cyber-domain information flows (side-channel signals) which may result in trust issues in the system.

In this thesis, we take a data-driven approach to model a CPS behavior when it is exposed to various information flows. First, we demonstrate how it is possible to extract valuable cyber-domain information by recording the acoustic noise generated by a DNA synthesizer. Then, we consider an integrated circuit as a CPS by itself and monitor the chip through electromagnetic and power side-channels to detect hardware Trojans (HT) in the chip.

An HT is a malicious modification of the hardware implementation of a circuit design that may lead to various security issues over the life-cycle of a chip. One of the major challenges for HT detection is its reliance on a trusted reference chip (a.k.a. a golden chip). In practice, however, manufacturing a golden chip is costly and often considered infeasible. This thesis investigates a creative neural network design and training methodology that eliminates the need for a golden chip. Furthermore, it proposes hierarchical temporal memory (HTM) as a data-driven approach that can be updated over the chip’s life-cycle and uses it for run-time HT detection.

Bio: Sina Faezi is a Ph.D. candidate in computer engineering at the University of California, Irvine (UCI). He works under Professor M. Al Faruque in the Autonomous and Intelligent Cyber-Physical Systems (AICPS) laboratory on data-driven modeling and analysis for cyber-physical systems. He creates data-driven models and uses them to tackle practical issues such as durability, security, and process control in cyber-physical systems. During his Ph.D., he has published numerous articles in prestigious conferences and received the Broadcom Foundation Graduate Engineering Fellowship. He completed his B.Sc. in electrical engineering at the Sharif University of Technology in 2015 and received his M.S. degree in computer engineering from UCI in 2017.


PhD Defense: One-Class Classification with Hyperdimensional Computing

Name: Neftali D. Watkinson Medina

Chair: Alexandru Nicolau

Date: December 7, 2020

Time: 11:00am

Location: Zoom

Committee: Alexandru Nicolau, Alexander Veidenbaum, and Tony Givargis

Title: One-Class Classification with Hyperdimensional Computing


Contemporary research in cognitive and neurological sciences confirms that human brains perform object detection and classification by identifying membership to a single class. When observing a scene with various objects, we can quickly point out and answer queries about the object we recognize, without needing to know what the unknown objects are. Within the field of machine learning (ML), the closest algorithm that emulates this behavior is one-class classification (OCC). With this approach, models are trained using samples of a single class in order to identify membership or anomalies from query instances. However, research about OCC is scarce and most approaches focus on repurposing models that were designed for binary or multi-class classification, resulting in suboptimal performance. A novel, neuro-inspired approach to computing, called Hyperdimensional (HD) computing, promises to be closer than traditional approaches to how humans encode information. With HD computing we have the opportunity to design OCC models without having to manipulate multi-class models. This makes for a more straightforward approach that can be easily tuned to the problem requirements.

In this dissertation I present Hyperdimensional One-class classification (HD-OCC). The modeling approach uses the power of HD computing to identify anomalies among sampled data. Also, I discuss how hyperdimensional encoding works for OCC. The encoding process is similar to those used in multi-class classification and can be reused across models.

HD-OCC is tested using three different use-case scenarios. The first focuses on predicting future diagnosis of type 2 diabetes among members of the Pima Indian community. This experiment illustrates the impact of linear encoding within HD-OCC and provides a baseline comparison against ML algorithms. The second experiment uses patient data to model sepsis and predict septic shock in patients within the intensive care unit; this real-case scenario adds a different challenge by introducing sequential features to the dataset. Finally, HD-OCC is applied to image processing by using pulmonary CT scans to detect patients with anomalies, including patients with a COVID-19 infection. The results show that HD-OCC performs well, that it is versatile enough to be applied to different types of input, and that HD computing is a promising framework for driving research closer to true artificial intelligence.
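The basic HD computing recipe behind such a one-class model — encode samples as high-dimensional vectors, bundle the training samples into a single class prototype, and threshold the similarity of a query — can be sketched as follows. The dimensionality, the level-vector encoding, and the threshold are illustrative choices, not the dissertation's:

```python
import random

DIM = 10000          # hypervector dimensionality (illustrative)
random.seed(0)

def rand_hv():
    """Random bipolar hypervector."""
    return [random.choice((-1, 1)) for _ in range(DIM)]

# One random "level" hypervector per quantized feature value (0..3).
LEVELS = {v: rand_hv() for v in range(4)}

def encode(sample):
    """A sample (list of quantized values) becomes the elementwise sum
    of its level hypervectors."""
    return [sum(LEVELS[v][d] for v in sample) for d in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) *
                  (sum(y * y for y in b) ** 0.5))

def train(samples):
    """Bundle (sum) all normal samples into one class prototype."""
    proto = [0] * DIM
    for s in samples:
        proto = [p + x for p, x in zip(proto, encode(s))]
    return proto

def is_normal(proto, sample, threshold=0.5):
    """One-class decision: similar enough to the prototype = member."""
    return cosine(proto, encode(sample)) >= threshold

proto = train([[0, 1, 2], [0, 1, 2], [0, 2, 2]])
print(is_normal(proto, [0, 1, 2]))  # resembles the trained class
print(is_normal(proto, [3, 3, 3]))  # near-orthogonal: flagged anomalous
```

Because random hypervectors in high dimensions are nearly orthogonal, anything unlike the training data scores close to zero similarity, which is exactly what a one-class classifier needs.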


PhD Defense: Security Modeling and Analysis for Emerging Intelligent Transportation Systems

Name: Anthony Lopez

Chair: Dr. Mohammad Al Faruque

Date: December 4, 2020

Time: 3:00PM – 5:00PM

Location: Zoom

Committee: Dr. Mohammad Al Faruque (Chair), Dr. Wenlong Jin and Dr. Zhou Li

Title: “Security Modeling and Analysis for Emerging Intelligent Transportation Systems”


The massive deployment of embedded systems, including various sensors, on-board and road-side computing units, wireless communication among vehicles and infrastructure via the enabling technology of the Internet of Things (IoT), and intelligent algorithms is changing the transportation sector, leading to novel systems known as Intelligent Transportation Systems (ITS). However, these newer technologies bring unforeseen safety and security concerns. Despite the recent interest in and importance of ITS security, there have been few efforts to consolidate, structure, and unify this large body of research. There has also been an increasing divergence between academic research and industrial practice in the area, each of which has evolved independently with little interaction and in some cases with little understanding of the assumptions, issues, trade-offs, and scales considered by the other. In addition to the lack of a clear consolidation and summary of related ITS security work, research on modeling and analysis tools for ITS security is also lacking.

For these reasons, this dissertation tackles these challenges by providing 1) a consolidation of ITS security research in terms of both V2X and IoT aspects (with a focus on battery systems) and 2) two methodologies to model and analyze the performance of ITS under attack. Both methodologies are designed as standalone open-source tools that ITS designers, engineers, and researchers may use to promote the growth of ITS security. The first methodology focuses on modeling attacks and analyzing their impacts on vulnerable connected Fixed-Time Traffic Signal Control Systems. The second methodology is presented hand-in-hand with an attack taxonomy and focuses on a more advanced ITS use-case, Vehicular Communication (V2X) Advisory Speed Limit Control (ASL), studying various attack types on different components of the ITS.

Bio: Anthony Lopez is a Ph.D. student studying Computer Engineering at the University of California Irvine (UCI), USA in the Embedded and Cyber-Physical Systems Lab under Professor Mohammad Al Faruque. He earned a B.S. from UC San Diego, USA and an M.S. from UC Irvine, both in Computer Engineering. His research focuses on the secure design, modeling, analysis and simulation of cyber-physical transportation systems. He is an IEEE student member and an NSF Graduate Research Fellowship Program awardee.


PhD Defense: Programmable Accelerators for Lattice-based Cryptography

Name: Hamid Nejatollahi

Advisor: Nikil Dutt

Date: June 11, 2020

Time: 10:00 AM

Committee: Ian Harris, Rainer Doemer

Thesis: “Programmable Accelerators for Lattice-based Cryptography”


Advances in computing steadily erode computer security at its foundation, calling for fundamental innovations to strengthen the weakening cryptographic primitives and security protocols. While many alternatives have been proposed for symmetric key cryptography and related protocols (e.g., lightweight ciphers and authenticated encryption), the alternatives for public-key cryptography are limited to post-quantum cryptography primitives and their protocols. In particular, lattice-based cryptography is a promising candidate, both in terms of foundational properties and in its application to traditional security problems such as key exchange, digital signatures, and encryption/decryption. At the same time, the emergence of new computing paradigms, such as Cloud Computing and the Internet of Everything, demands that innovations in security extend beyond their foundational aspects to the actual design and deployment of these primitives and protocols under emerging design constraints such as latency, compactness, energy efficiency, and agility. In this thesis, we propose a methodology to design programmable hardware accelerators for lattice-based algorithms, and we use the proposed methodology to implement flexible and energy-efficient post-quantum cache- and DMA-based accelerators for the most promising submissions to the NIST standardization contest. We validate our methodology by integrating our accelerators into an HLS-based SoC infrastructure based on the X86 processor and evaluating overall performance. In addition, we adopt the systolic architecture to accelerate polynomial multiplication, the heart of a subset of LBC algorithms (i.e., ideal LBC), on field-programmable gate arrays (FPGAs). Finally, we propose a high-throughput Processing In-Memory (PIM) accelerator for the number-theoretic transform (NTT) based polynomial multiplier.
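The NTT mentioned above is the standard fast polynomial multiplication tool in ideal lattice-based schemes: transform both polynomials, multiply pointwise modulo a prime, and transform back. A software reference version follows; the thesis targets hardware, and the prime and parameters here are generic textbook choices, not those of any particular NIST candidate:

```python
P = 998244353   # NTT-friendly prime (2^23 * 119 + 1), generic choice
G = 3           # a primitive root modulo P

def ntt(a, invert=False):
    """Iterative radix-2 number-theoretic transform over Z_P
    (length of `a` must be a power of two)."""
    a = a[:]
    n = len(a)
    # bit-reversal permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly stages
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)  # modular inverse root
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a

def poly_mul(f, g):
    """Multiply two polynomials (coefficient lists, low order first)
    modulo P: forward NTT, pointwise product, inverse NTT."""
    size = 1
    while size < len(f) + len(g) - 1:
        size <<= 1
    fa = ntt(f + [0] * (size - len(f)))
    fb = ntt(g + [0] * (size - len(g)))
    prod = ntt([x * y % P for x, y in zip(fa, fb)], invert=True)
    return prod[:len(f) + len(g) - 1]

print(poly_mul([1, 2, 3], [4, 5]))  # (1+2x+3x^2)(4+5x) = 4+13x+22x^2+15x^3
```

The butterfly loop nest is the part the thesis maps to systolic and PIM hardware: every stage is a regular, data-parallel pass over the coefficient array.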


PhD Defense: Approximate and Bit-width Configurable Arithmetic Logic Unit Design for Deep Learning Accelerator

Name: Xiaoliang Chen

Chair: Prof. Fadi Kurdahi

Date: June 2, 2020

Time: 10:00 AM

Location: Zoom

Committee: Prof. Fadi J. Kurdahi (Chair), Prof. Ahmed M. Eltawil and Prof. Rainer Doemer

Title: “Approximate and Bit-width Configurable Arithmetic Logic Unit Design for Deep Learning Accelerator”


As key building blocks for digital signal processing, image processing, and deep learning, adders, multi-operand adders, and multiply-accumulate (MAC) units have drawn much attention recently. Two popular ways to improve arithmetic logic unit (ALU) performance and energy efficiency are approximate computing and precision-scalable design. Approximate computing achieves better performance or energy efficiency by trading off accuracy. Precision-scalable design provides the capability of allocating just enough hardware resources to meet application requirements.

In this thesis, we first present a correlation-aware predictor (CAP) based approximate adder, which utilizes the spatial-temporal correlation of input streams to predict carry-in signals for sub-block adders. CAP uses fewer prediction bits to reduce the overall adder delay. For highly correlated input streams, we found that CAP can reduce adder delay by ~23.33% and save ~15.9% area at the same error rate compared to prior works.
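The general idea of carry prediction from temporal correlation — not the exact CAP design — can be sketched as follows: sub-blocks add in parallel with predicted carry-ins, so the critical path is one sub-block, and when inputs change slowly, last cycle's true carries are usually a correct prediction. Block width, operand width, and the sample inputs are illustrative:

```python
BLOCK = 4  # sub-adder width in bits; 16-bit operands -> four sub-blocks
MASK = (1 << BLOCK) - 1

def true_block_carries(a, b, width=16):
    """Exact carry-in of each sub-block when computing a + b."""
    carries, c = [], 0
    for i in range(0, width, BLOCK):
        carries.append(c)
        c = (((a >> i) & MASK) + ((b >> i) & MASK) + c) >> BLOCK
    return carries

def approx_add(a, b, predicted, width=16):
    """All sub-blocks add in parallel using *predicted* carry-ins; the
    sum is wrong whenever a prediction misses the true carry."""
    result = 0
    for idx, i in enumerate(range(0, width, BLOCK)):
        s = ((a >> i) & MASK) + ((b >> i) & MASK) + predicted[idx]
        result |= (s & MASK) << i
    return result

# Temporal correlation: predict this cycle's carries with last cycle's
# true carries. Slowly changing inputs make the prediction hit.
pairs = [(0x1234, 0x0F0F), (0x1235, 0x0F0E), (0x9999, 0x0001)]
predicted = [0, 0, 0, 0]
for a, b in pairs:
    ok = approx_add(a, b, predicted) == ((a + b) & 0xFFFF)
    print(hex(a), hex(b), "exact" if ok else "approximate")
    predicted = true_block_carries(a, b)
```

In the sample stream, the second input pair barely differs from the first, so the recycled carries are all correct and the approximate sum is exact.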

Inspired by the success of approximate multipliers using approximate compressors, we propose a pipelined approximate-compressor-based speculative multi-operand adder (AC-MOA). All compressors are replaced with approximate ones to reduce the overall delay of the bit-array reduction tree, and an efficient error detection and correction block compensates for the errors with one extra cycle. Experimental results show the proposed 8-bit 8-operand AC-MOA achieves a 1.47×–1.66× speedup over the conventional baseline design.

Recent research on deep learning algorithms shows that bit-width can be reduced without losing accuracy. To benefit from the fact that bit-width requirements vary across deep learning applications, bit-width configurable designs can be used to improve hardware efficiency. In this thesis, a bit-width configurable MAC (BC-MAC) is proposed. BC-MAC uses a spatial-temporal approach to support variable precision requirements for both activations and weights. Its basic processing element (PE) is a multi-operand adder, and multiple multi-operand adders can be combined to support input operands of any precision. Bit-serial summation accumulates partial addition results to perform MAC operations, and Booth encoding further boosts throughput. Synthesis results on TSMC 16nm technology and simulation results show the proposed MAC achieves higher area efficiency and energy efficiency than state-of-the-art designs, making it a promising ALU for deep learning accelerators.

PhD Defense: Understanding and Guaranteeing Security, Privacy, and Safety of Smart Homes

Name: Rahmadi Trimananda

Chair: Professor Brian Demsky

Date: Thursday, August 20, 2020

Time: 11:00 AM – 1:00 PM

Location: Zoom

Committee: Dr. Brian Demsky, Dr. Athina Markopoulou, Dr. Harry Xu

Title: Understanding and Guaranteeing Security, Privacy, and Safety of Smart Homes


Smart homes are becoming increasingly popular. Unfortunately, they come with security, privacy, and safety issues. In this work, we explore new methods and techniques to better understand and guarantee the security, privacy, and safety of smart homes. To tackle the existing problems, we view the smart home from three different sides: devices, platforms, and apps.

On the devices side, we discovered that smart home devices are vulnerable to passive inference attacks based on network traffic, even in the presence of encryption. We first present this passive inference attack and the techniques we developed to exploit this vulnerability on smart home devices. We created PingPong, a tool that can automatically extract packet-level signatures for device events (e.g., a light bulb turning ON/OFF) from network traffic. We evaluated PingPong on popular smart home devices ranging from smart plugs and thermostats to cameras, voice-activated devices, and smart TVs. We were able to: (1) automatically extract previously unknown signatures that consist of simple sequences of packet lengths and directions; (2) use those signatures to detect the devices or specific events with an average recall of more than 97%; (3) show that the signatures are unique among hundreds of millions of packets of real-world network traffic; (4) show that our methodology is also applicable to publicly available datasets; and (5) demonstrate its robustness in different settings: events triggered by local and remote smartphones, as well as by home-automation systems. Furthermore, we analyze existing techniques (e.g., packet padding) as possible defenses against passive inference attacks.
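The packet-level signature idea can be sketched in a few lines: a signature is an ordered run of (direction, length) pairs, and detection is a scan for that run in the traffic trace. The event names, packet lengths, and exact-match rule below are illustrative; PingPong's extraction and matching are more involved:

```python
# A signature is an ordered run of (direction, packet_length) pairs for
# one device event. Names and lengths below are made up, not measured.
SIGNATURES = {
    "plug_on":  [("out", 556), ("in", 1293)],
    "plug_off": [("out", 557), ("in", 1294)],
}

def match_events(trace):
    """Scan a (direction, length) packet trace for consecutive-packet
    signature matches; returns (position, event) pairs."""
    hits = []
    for name, sig in SIGNATURES.items():
        k = len(sig)
        for i in range(len(trace) - k + 1):
            if trace[i:i + k] == sig:
                hits.append((i, name))
    return sorted(hits)

trace = [("out", 100), ("out", 556), ("in", 1293), ("out", 557), ("in", 1294)]
print(match_events(trace))  # [(1, 'plug_on'), (3, 'plug_off')]
```

Note that nothing here needs the payload: lengths and directions survive encryption, which is exactly why the attack works.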

On the platforms side, smart home platforms such as SmartThings enable homeowners to manage devices in sophisticated ways to save energy, improve security, and provide conveniences. Unfortunately, we discovered that smart home platforms contain vulnerabilities, potentially impacting home security and privacy.  Aside from the traditional defense techniques to enhance the security and privacy of smart home devices, we also created Vigilia, a system that shrinks the attack surface of smart home IoT systems by restricting the network access of devices.  As existing smart home systems are closed, we have created an open implementation of a similar programming and configuration model in Vigilia and extended the execution environment to maximally restrict communications by instantiating device-based network permissions. We have implemented and compared Vigilia with forefront IoT-defense systems; our results demonstrate that Vigilia outperforms these systems and incurs negligible overhead.

On the apps side, smart home platforms allow developers to write apps that make smart home devices work together to accomplish tasks such as home security and energy conservation, providing the convenience of remotely controlling and automating home appliances. A smart home app typically implements narrow functionality, so homeowners may need to install multiple apps to obtain the desired functionality. These apps can conflict with each other, and such conflicts can result in undesired actions such as locking the door during a fire. We study conflicts between apps on Samsung SmartThings, the most popular platform for developing and deploying smart home IoT devices. By collecting and studying 198 official and 69 third-party apps, we found significant app conflicts in three categories: (1) close to 60% of app pairs that access the same device, (2) more than 90% of app pairs with physical interactions, and (3) around 11% of app pairs that access the same global variable. Our results suggest that the problem of conflicts between smart home apps is serious and can create potential safety risks. We then developed an automatic conflict detection tool that uses model checking to automatically detect up to 96% of the conflicts.
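The first conflict category above (two apps acting on the same device) can be illustrated with a toy direct check: flag any pair of apps that may drive the same device attribute to different values. All app names and actions below are invented for illustration; the dissertation's tool uses model checking over real SmartThings apps, which this sketch does not replicate.

```python
# Toy illustration of one conflict category from the abstract: two apps that
# command the same device attribute to different values. App rules are
# hypothetical; the real tool model-checks actual SmartThings app behavior.

def find_conflicts(apps):
    """apps: dict app_name -> set of (device, attribute, value) actions.
    Returns (app1, app2, device, attribute) tuples where the two apps
    command the same device attribute to different values."""
    conflicts = []
    names = sorted(apps)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for (dev_a, attr_a, val_a) in apps[a]:
                for (dev_b, attr_b, val_b) in apps[b]:
                    if (dev_a, attr_a) == (dev_b, attr_b) and val_a != val_b:
                        conflicts.append((a, b, dev_a, attr_a))
    return conflicts

# The "locking the door during a fire" scenario from the abstract:
apps = {
    "FireEscape":   {("front_door", "lock", "unlocked")},
    "SecurityLock": {("front_door", "lock", "locked")},
}
print(find_conflicts(apps))  # [('FireEscape', 'SecurityLock', 'front_door', 'lock')]
```

A direct pairwise check like this misses conflicts that arise only along specific execution paths or through physical interactions, which is why the dissertation relies on model checking instead.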


PhD Defense: Advancing Compiler and Simulator Techniques for Highly Parallel Simulation of Embedded Systems

Name: Zhongqi Cheng

Chair: Prof. Rainer Doemer

Date: May 29th, 2020

Time: 02:00 PM

Location:  Zoom

Committee: Prof. Rainer Doemer (Chair), Prof. Mohammad Al Faruque, and Prof. Aparna Chandramowlishwaran

Title: “Advancing Compiler and Simulator Techniques for Highly Parallel Simulation of Embedded Systems”


As an Electronic System Level (ESL) design language, the IEEE SystemC standard is widely used for testing, validation, and verification of embedded system models. Discrete Event Simulation (DES) has been the default SystemC simulation semantics for decades. However, due to the sequential nature of DES, Parallel DES has recently gained an increasing amount of attention for performing high-speed simulation on parallel computing platforms. To further exploit the parallel computation power of modern multi- and many-core platforms, Out-of-order Parallel Discrete Event Simulation (OoO PDES) has been proposed. In OoO PDES, threads comply with a partial order, so that different simulation threads may run in different time cycles to increase the parallelism of execution. The Recoding Infrastructure for SystemC (RISC) has been introduced as a tool flow to fully support OoO PDES.

To preserve the SystemC semantics under OoO PDES, a compiler-based approach statically analyzes the race conditions in the input model. However, this approach imposes a severe restriction: the source code of the input design must be available in a single file, which does not scale and precludes the use of Intellectual Property (IP) components and hierarchical file structures. In this dissertation, we propose a partial-graph based approach that scales the static analysis to support separate files and IP reuse. Specifically, we propose the Partial Segment Graph (PSG) data structure, which abstracts the behaviors and communication of modules within a single translation unit. These partial graphs are then combined at the top level to reconstruct the complete behaviors and communication of the entire model.
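The combination step can be pictured as linking per-unit graphs whose cross-unit references are left dangling until the top level. The data layout and names below are invented for illustration; the actual RISC Partial Segment Graph carries segment and variable-access information rather than bare edges.

```python
# Toy sketch of the partial-graph idea: each translation unit contributes a
# partial graph whose unresolved cross-unit references are linked when the
# partial graphs are combined at the top level.

class PartialGraph:
    def __init__(self, edges, exports, imports):
        self.edges = set(edges)    # (src, dst) edges resolved within this unit
        self.exports = exports     # name -> node: definitions this unit provides
        self.imports = imports     # (src, name): references into other units

def combine(partials):
    """Reconstruct the complete graph by resolving cross-unit references."""
    defined = {}
    for pg in partials:
        for name, node in pg.exports.items():
            defined[name] = node
    full = set()
    for pg in partials:
        full |= pg.edges
        for src, name in pg.imports:
            full.add((src, defined[name]))  # link reference to its definition
    return full

# Unit A defines f (entry a1) and references g; unit B defines g (entry b1).
unit_a = PartialGraph({("a1", "a2")}, {"f": "a1"}, [("a2", "g")])
unit_b = PartialGraph({("b1", "b2")}, {"g": "b1"}, [])
print(sorted(combine([unit_a, unit_b])))
# [('a1', 'a2'), ('a2', 'b1'), ('b1', 'b2')]
```

The key property is that each unit can be analyzed, shipped, and even obfuscated separately; only the small partial graph must be exposed to the top-level analysis.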

We also propose new algorithms to support static analysis for the modern SystemC TLM-2.0 standard. SystemC TLM-2.0 is widely used in industrial ESL designs for better interoperability and higher simulation speed. However, it has been identified as an obstacle for parallel SystemC simulation due to the disappearance of explicit channels. To solve this problem, we propose a compile-time approach that statically analyzes potential conflicts among threads in SystemC TLM-2.0 loosely- and approximately-timed models. A new Socket Call Path (SCP) technique provides the compiler with socket binding information for precise static analysis. Based on SCP, we propose an algorithm that analyzes entangled variable pairs for automatic and accurate conflict analysis.
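At its core, conflict analysis over entangled variables reduces to a classic data-race condition: two threads conflict if their socket call paths can reach the same target-side variable and at least one access is a write. The data layout below is invented for illustration; RISC derives the actual access sets from TLM-2.0 socket bindings and the Segment Graph.

```python
# Hypothetical illustration of entangled-variable conflict analysis: two
# threads conflict if they may access the same variable and at least one
# access is a write. Thread and variable names are invented.

def entangled_pairs(accesses):
    """accesses: dict thread -> set of (variable, mode) with mode 'r' or 'w'.
    Returns (thread1, thread2, variable) tuples with a potential conflict."""
    pairs = []
    threads = sorted(accesses)
    for i, t1 in enumerate(threads):
        for t2 in threads[i + 1:]:
            for v1, m1 in accesses[t1]:
                for v2, m2 in accesses[t2]:
                    if v1 == v2 and "w" in (m1, m2):
                        pairs.append((t1, t2, v1))
    return pairs

# Two initiator threads reaching the same target-side buffer via sockets:
accesses = {
    "initiator_A": {("mem_buf", "w")},
    "initiator_B": {("mem_buf", "r"), ("status", "r")},
}
print(entangled_pairs(accesses))  # [('initiator_A', 'initiator_B', 'mem_buf')]
```

The difficulty the SCP technique addresses is computing these access sets in the first place: without socket binding information, the compiler cannot tell which target module (and hence which variables) a `b_transport` call ultimately reaches.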

Besides the work on the compiler side, we also focus on increasing the simulation speed of OoO PDES. We observe that the granularity of the Segment Graph (SG) data structure used in static analysis has a high impact on OoO PDES performance. This motivates us to propose a set of coding guidelines that help RISC users properly refine their SystemC models for higher simulation speed.

Furthermore, in this dissertation, we propose an algorithm that directly optimizes the event delivery strategy in OoO PDES. Event delivery in OoO PDES has traditionally been very conservative, often postponing the execution of waiting threads due to unknown future behaviors of the SystemC model and thereby becoming a bottleneck for simulation speed. The proposed algorithm takes advantage of predictions of future thread behaviors and therefore allows waiting threads to resume execution earlier, resulting in significantly increased simulation speed.
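The intuition behind predictive delivery can be sketched as a simple check: a waiting thread may resume at time `now` only if no other thread's predicted future can still notify the awaited event at or before `now`. The prediction table and scheduling details below are invented for illustration; the real OoO PDES scheduler derives predictions from the Segment Graph.

```python
# Illustrative sketch of predictive event delivery in OoO PDES: instead of
# conservatively waiting for all threads, a waiting thread resumes once no
# other thread's predicted future can still notify the awaited event earlier.

def can_deliver(now, waiter_event, others):
    """others: list of (earliest_possible_notify_time, events_it_may_notify).
    Returns True if the waiting thread may safely resume at time `now`."""
    for earliest, may_notify in others:
        if waiter_event in may_notify and earliest <= now:
            return False  # this thread might still notify at or before `now`
    return True

# Prediction: one thread may notify event "e" no earlier than t=30,
# another may notify only "f" from t=5 onward.
others = [(30, {"e"}), (5, {"f"})]
print(can_deliver(10, "e", others))  # True: "e" cannot arrive before t=30
print(can_deliver(40, "e", others))  # False: a notify of "e" is still possible
```

The conservative scheduler effectively treats every `earliest` as 0 and every `may_notify` set as containing all events, so a waiting thread stalls until every other thread has passed it in time; the prediction prunes exactly those stalls.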

To summarize, the contributions of this dissertation include: 1) a scalable RISC tool flow for statically analyzing and protecting third-party IPs in models with multiple files, 2) an advanced static analysis approach for modern SystemC TLM-2.0 models, 3) a set of coding guidelines for RISC users to achieve higher simulation speed, and 4) a more efficient event delivery algorithm in the OoO PDES scheduler using prediction information.

Together, these compiler and simulator advances enable OoO PDES for larger and more modern models and thus significantly improve the design of embedded systems, ultimately leading to better devices at lower cost.


PhD Defense: Runtime Resource Management of Emerging Applications in Heterogeneous Architectures

Name: Kasra Moazzemi

Advisor: Nikil Dutt

Date: May 28, 2020

Time: 12:00 PM

Location: Zoom

Title: “Runtime Resource Management of Emerging Applications in Heterogeneous Architectures”

Abstract: Runtime resource management for heterogeneous computing systems is becoming increasingly complex as workloads on these platforms grow more diverse and conflicts grow between heterogeneous architectural components and their resource demands. The goal of runtime resource management mechanisms is to achieve the overall system goal for dynamic workloads while coordinating system resources in a robust and adaptive fashion.

To address the complexities of heterogeneous computing systems, state-of-the-art techniques that use heuristics or machine learning have been proposed. Conventional control theory, on the other hand, can provide formal guarantees, but may face unmanageable complexity when modeling the system dynamics of heterogeneous computing platforms. In this thesis, we first analyze a variety of runtime resource management methods and introduce a classification that captures the utilized resources and metrics. We cover heuristic, machine learning, and control-theoretic methods used to manage system metrics such as performance, power, energy, temperature, Quality-of-Service (QoS), and reliability.

In addition, we explore a variety of dynamic resource management frameworks that provide significant gains in terms of self-optimization and self-adaptivity. These include simulation infrastructures, hardware platforms enhanced with multi-layer management mechanisms, and corresponding software frameworks that enable management policies for these systems in an effective and adaptive manner. Ultimately, we address the problem of optimizing energy efficiency, power consumption, performance, and QoS in heterogeneous systems by proposing adaptive runtime policies. The methods proposed in this thesis take into account the constraints and requirements defined by the user, dynamic workloads, and the coordination between conflicting objectives. The projects presented in this dissertation respond effectively to abrupt changes in heterogeneous computing systems by dynamically adapting to changing application and system behavior at runtime, and are thus able to provide significant improvements over commonly used static resource management methods.


PhD Defense: Security Monitor for Mobile Devices: Design and Applications

Name: Saeed Mirzamohammadi

Chair: Ardalan Amiri Sani

Date: May 21, 2020

Time: 3:00 PM

Location: Zoom

Committee: Ardalan Amiri Sani, Sharad Mehrotra, Gene Tsudik, Sharad Agarwal (MSR)

Title: Security Monitor for Mobile Devices: Design and Applications


Android’s underlying Linux kernel is rapidly becoming a more attractive target for attackers. In 2014, reported kernel bugs accounted for 4 percent of all bugs discovered in Android. This share increased drastically to 9 and 44 percent in 2015 and 2016, respectively. An attacker can use these kernel bugs to obtain kernel privilege and gain complete control of the mobile device.

In this talk, we present the Security Monitor, a small, trustworthy, and extensible software layer that provides various security services with a small Trusted Computing Base (TCB). The Security Monitor is designed and built on two ARM hardware features: the virtualization hardware and ARM TrustZone. The security services within the Security Monitor enforce certain privacy and security guarantees for the system. We demonstrate three end-to-end systems that leverage the Security Monitor to provide different security services. First, we present Viola, which provides trustworthy sensor notifications using low-level checks in the Security Monitor. Second, we present Ditio, which provides trustworthy auditing of sensor activities by recording them in the Security Monitor. Third, we present Tabellion, which enables the secure formation of electronic contracts through secure primitives in the Security Monitor.


PhD Defense: Brain Inspired Neural Network Models of Visual Motion Perception and Tracking in Dynamic Scenes

Name: Hirak Jyoti Kashyap

Chair: Jeffrey L. Krichmar

Date: May 15, 2020

Time: 12:00 PM Pacific Time

Location: Zoom

Committee: Jeffrey L. Krichmar, Nikil Dutt, Charless C. Fowlkes, Emre Neftci

Title: Brain Inspired Neural Network Models of Visual Motion Perception and Tracking in Dynamic Scenes


For self-driving vehicles, aerial drones, and autonomous robots to be successfully deployed in the real world, they must be able to navigate complex environments and track objects. While Artificial Intelligence and Machine Vision have made significant progress in dynamic scene understanding, they are not yet as robust and computationally efficient as humans or other primates at these tasks. For example, current state-of-the-art visual tracking methods become inaccurate when applied to arbitrary test videos. We suggest that ideas from cortical visual processing can inspire real-world solutions for motion perception and tracking that are both robust and efficient. In this context, the thesis makes the following contributions. First, a method for estimating 6DoF ego-motion and pixel-wise object motion is introduced, based on a learned overcomplete motion field basis set. The method uses motion field constraints for training and a novel differentiable sparsity regularizer to achieve state-of-the-art ego- and object-motion performance on benchmark datasets. Second, a Convolutional Neural Network (CNN) is presented that learns hidden neural representations analogous to the response characteristics of dorsal Medial Superior Temporal area (MSTd) neurons for optic flow and object motion. The findings suggest that goal-driven training of CNNs may automatically give rise to MSTd-like response properties in model neurons. Third, a recurrent neural network model of predictive smooth pursuit eye movements is presented that generates pursuit initiation and predictive pursuit behaviors similar to those observed in humans. The model provides computational mechanisms for the formation and rapid update of an internal model of target velocity, which is widely credited with enabling zero-lag tracking and smooth pursuit of occluded objects. Finally, a spike-based stereo algorithm and its fully neuromorphic implementation are presented that reconstruct dynamic visual scenes at 400 frames per second with one watt of power consumption using the IBM TrueNorth processor. Taken together, the presented models and implementations demonstrate how the dorsal visual pathway in the brain performs efficient motion perception and inform ideas for efficient computational vision systems.