Seminars by CECS

PhD Defense: A Multiple Compiler Approach for Improved Performance and Efficiency

Name: Aniket Shivam

Chair: Alexander V. Veidenbaum

Date: May 25, 2021

Time: 3:30 PM

Location: Zoom

Committee: Alexander V. Veidenbaum, Alexandru Nicolau, and Tony Givargis

Title: A Multiple Compiler Approach for Improved Performance and Efficiency


Production compilers have achieved a high level of maturity in generating efficient code. They embed numerous code optimization techniques, with a special focus on loop nest optimizations, developed over the last four decades. The code generated by any two production compilers can differ substantially, depending on the strengths and weaknesses of their respective Intermediate Representations (IRs), the loop transformations they implement and the order in which they apply them, the cost models they use, and even instruction selection (such as vector instructions) and scheduling. Compilers must also predict how a multi-core processor, with its complex pipelines, multiple functional units, complex memory hierarchy, and so on, will affect overall performance. Hence, the performance a given compiler achieves on a program segment may not be matched by other compilers. Additionally, there is no way of knowing how close a compiler gets to optimal performance or whether there is headroom for improvement.

The complexity and rigidity of the compilation process make it very difficult to modify a given compiler to improve the performance of generated code in every case where it could not produce the best possible code. Therefore, this thesis presents a compilation approach that turns the differences between compilation processes and performance optimizations in each compiler from a weakness into a strength. This approach is implemented as a novel compilation framework, the MCompiler. This meta-compilation framework allows different segments of a program to be compiled using an ensemble of compilers/optimizers and combined into a single executable. Utilizing the highest-performing code for each segment, identified via Exploratory Search, can lead to a significant overall improvement in performance. The framework is shown to produce performance improvements for serial (including auto-vectorized), auto-parallelized, and hand-optimized (using OpenMP) parallel code.
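The selection step of such an Exploratory Search can be sketched in a few lines: time each compiler's code for each segment, keep the fastest variant of each, and link the winners together. The segment names, compiler set, and timings below are purely hypothetical; the real MCompiler drives actual compilers and measures runs on hardware.

```python
def exploratory_search(timings):
    """Pick the fastest code variant for each program segment.
    timings: {segment: {compiler: measured_seconds}}."""
    return {seg: min(variants, key=variants.get)
            for seg, variants in timings.items()}

# Hypothetical measurements for two loop nests under three optimizers.
timings = {
    "loop_nest_A": {"gcc": 2.1, "clang": 1.7, "icc": 1.9},
    "loop_nest_B": {"gcc": 0.8, "clang": 1.1, "icc": 0.9},
}
best = exploratory_search(timings)
# mixed binary: best variant per segment; single: best one-compiler total
mixed_time = sum(timings[s][c] for s, c in best.items())
single_time = min(sum(timings[s][c] for s in timings)
                  for c in ("gcc", "clang", "icc"))
```

With these toy numbers the mixed binary (2.5 s) beats the best single compiler (2.8 s), illustrating why combining compilers per segment can outperform any one of them.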

Next, this thesis explores the possibility of learning which compiler will produce the best code for a segment. This is accomplished using Machine Learning: models learn the inherent characteristics of loop nests and then predict which code optimizer is best suited for each loop nest in an application. These models are incorporated into the MCompiler to predict, during compilation, the best code optimizer for each code segment of the application. This allows the MCompiler to replace the expensive Exploratory Search with Machine Learning predictions while keeping performance very close to that of the Exploratory Search.
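A toy stand-in for this prediction step: represent each loop nest by a feature vector and return the optimizer label of the nearest labeled example. The feature set (trip count, nest depth, arithmetic intensity) and the optimizer labels are invented for illustration; the thesis's trained models and features are surely richer.

```python
def predict_optimizer(features, examples):
    """Nearest-neighbor stand-in for the trained models: return the
    optimizer label of the closest known loop nest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(examples, key=lambda ex: dist(ex[0], features))[1]

# Illustrative (trip_count, nest_depth, arith_intensity) -> label pairs.
examples = [
    ((1000.0, 2.0, 0.5), "optimizer_A"),
    ((10.0, 4.0, 3.0), "optimizer_B"),
]
pred = predict_optimizer((900.0, 2.0, 0.6), examples)
```

The query loop nest resembles the first example, so the model picks `optimizer_A` without compiling and timing every variant.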

Finally, this thesis expands the compilation approach to achieve energy efficiency on modern architectures. Prior research has advocated both for and against the hypothesis that optimizing for performance translates into optimizing for energy efficiency. No production compiler optimizes for energy efficiency directly, expecting optimizing for performance to translate into higher energy efficiency. Optimizing for performance is complex for recent generations of processors and, with automatic DVFS management in these processors, optimizing for energy efficiency would add another level of complexity for compilers with no guarantee of success. Using the MCompiler, this thesis shows how the performance-oriented compiler optimizations can be used to achieve energy efficiency.

PhD Defense: Data-Driven Modeling and Analysis for Trustworthy Cyber-Physical Systems

Name: Sina Faezi

Chair: Dr. Mohammad Al Faruque

Date: April 21, 2021

Time: 10:00 AM

Location: Zoom

Committee: Dr. Mohammad Al Faruque (UCI), Dr. Philip Brisk (UCR), Dr. Zhou Li (UCI)

Title: “Data-Driven Modeling and Analysis for Trustworthy Cyber-Physical Systems”


In the age of digitization, a layer of cyber software sits on a hardware circuit and controls the physical systems around us. This tight integration of cyber and physical components is referred to as a Cyber-Physical System (CPS). The interaction between cyber and physical components brings unique challenges that traditional modeling tools struggle to resolve. In particular, they often fail to model the unintentional physical manifestation of cyber-domain information flows (side-channel signals), which may result in trust issues in the system.

In this thesis, we take a data-driven approach to model a CPS behavior when it is exposed to various information flows. First, we demonstrate how it is possible to extract valuable cyber-domain information by recording the acoustic noise generated by a DNA synthesizer. Then, we consider an integrated circuit as a CPS by itself and monitor the chip through electromagnetic and power side-channels to detect hardware Trojans (HT) in the chip.

An HT is a malicious modification of the hardware implementation of a circuit design, which may lead to various security issues over the life-cycle of a chip. One of the major challenges for HT detection is its reliance on a trusted reference chip (a.k.a. a golden chip). However, in practice, manufacturing a golden chip is costly and often considered infeasible. This thesis investigates a creative neural network design and training methodology that eliminates the need for a golden chip. Furthermore, it proposes hierarchical temporal memory (HTM) as a data-driven approach that can be updated over the chip's life-cycle, and uses it for run-time HT detection.

Bio: Sina Faezi is a Ph.D. candidate in computer engineering at the University of California, Irvine (UCI). He works under Professor M. Al Faruque in the Autonomous and Intelligent Cyber-Physical Systems (AICPS) laboratory on data-driven modeling and analysis for cyber-physical systems. He creates data-driven models and uses them to tackle practical issues such as durability, security, and process control in cyber-physical systems. During his Ph.D., he has published numerous articles in prestigious conferences and received the Broadcom Foundation Graduate Engineering fellowship. He completed his B.Sc. in electrical engineering at the Sharif University of Technology in 2015 and received his M.S. degree in computer engineering from UCI in 2017.


PhD Defense: One-Class Classification with Hyperdimensional Computing

Name: Neftali D. Watkinson Medina

Chair: Alexandru Nicolau

Date: December 7, 2020

Time: 11:00am

Location: Zoom

Committee: Alexandru Nicolau, Alexander Veidenbaum, and Tony Givargis

Title: One-Class Classification with Hyperdimensional Computing


Contemporary research in cognitive and neurological sciences confirms that human brains perform object detection and classification by identifying membership to a single class. When observing a scene with various objects, we can quickly point out and answer queries about the object we recognize, without needing to know what the unknown objects are. Within the field of machine learning (ML), the closest algorithm that emulates this behavior is one-class classification (OCC). With this approach, models are trained using samples of a single class in order to identify membership or anomalies from query instances. However, research about OCC is scarce and most approaches focus on repurposing models that were designed for binary or multi-class classification, resulting in suboptimal performance. A novel, neuro-inspired approach to computing, called Hyperdimensional (HD) computing, promises to be closer than traditional approaches to how humans encode information. With HD computing we have the opportunity to design OCC models without having to manipulate multi-class models. This makes for a more straightforward approach that can be easily tuned to the problem requirements.

In this dissertation, I present Hyperdimensional One-Class Classification (HD-OCC). The modeling approach uses the power of HD computing to identify anomalies among sampled data. I also discuss how hyperdimensional encoding works for OCC: the encoding process is similar to those used in multi-class classification and can be reused across models.
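A minimal sketch of the idea, assuming a standard bipolar HD formulation (not necessarily the exact encoding used in the dissertation): encoded training samples of the single class are bundled into one prototype hypervector, and a query is flagged as an anomaly when its similarity to the prototype falls below a threshold. Here intra-class variation is modeled crudely by flipping a fraction of components of a base vector.

```python
import random

D = 10000                # hypervector dimensionality
rng = random.Random(0)   # seeded for reproducibility

def rand_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def flip(hv, rate):
    # crude stand-in for intra-class variation after encoding
    return [-x if rng.random() < rate else x for x in hv]

def bundle(hvs):
    # component-wise majority vote forms the single-class prototype
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def similarity(a, b):
    # normalized dot product in [-1, 1]
    return sum(x * y for x, y in zip(a, b)) / len(a)

base = rand_hv()                          # stands in for the class structure
train = [flip(base, 0.10) for _ in range(20)]
prototype = bundle(train)

def is_anomaly(query, threshold=0.3):
    return similarity(prototype, query) < threshold
```

In high dimensions, unrelated hypervectors are nearly orthogonal (similarity close to 0), so even a simple threshold separates in-class queries from anomalies.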

HD-OCC is tested in three different use-case scenarios. The first focuses on predicting future diagnosis of type 2 diabetes among members of the Pima Indian community. This experiment illustrates the impact of linear encoding within HD-OCC and provides a baseline comparison against ML algorithms. The second experiment uses patient data to model sepsis and predict septic shock in patients within the intensive care unit; this real-world scenario adds a different challenge by introducing sequential features to the dataset. Finally, HD-OCC is applied to image processing, using pulmonary CT scans to detect patients with anomalies, including patients with a COVID-19 infection. The results show that HD-OCC performs well and is versatile enough to be applied to different types of input, and that HD computing is a promising framework for driving research closer to true artificial intelligence.


PhD Defense: Security Modeling and Analysis for Emerging Intelligent Transportation Systems

Name: Anthony Lopez

Chair: Dr. Mohammad Al Faruque

Date: December 4, 2020

Time: 3:00PM – 5:00PM

Location: Zoom

Committee: Dr. Mohammad Al Faruque (Chair), Dr. Wenlong Jin and Dr. Zhou Li

Title: “Security Modeling and Analysis for Emerging Intelligent Transportation Systems”


Massive deployment of embedded systems, including various sensors, on-board and road-side computing units, wireless communication among vehicles and infrastructure via the enabling technology of the Internet of Things (IoT), and intelligent algorithms, is changing the transportation sector, leading to novel systems known as Intelligent Transportation Systems (ITS). However, with these newer technologies come unforeseen safety and security concerns. In spite of the recent interest in ITS security and its importance, there have been few efforts to consolidate, structure, and unify this large body of research. There has also been an increasing divergence between academic research and industrial practice in the area, each of which has evolved independently with little interaction and, in some cases, with little understanding of the assumptions, issues, trade-offs, and scales considered by the other. In addition to lacking a clear consolidation and summary of related ITS security work, research on modeling and analysis tools for ITS security is also scarce.

For these reasons, this dissertation tackles these challenges by providing 1) a consolidation of ITS security research in terms of both V2X and IoT aspects (with a focus on battery systems) and 2) two methodologies to model and analyze the performance of ITS under attack. Both methodologies are designed as standalone, open-source tools that ITS designers, engineers, and researchers may use to promote the growth of ITS security. The first methodology focuses on modeling attacks and analyzing their impacts on vulnerable connected Fixed-Time Traffic Signal Control Systems. The second methodology is presented hand-in-hand with an attack taxonomy that focuses on a more advanced ITS use-case, Vehicular Communication (V2X) Advisory Speed Limit Control (ASL), and involves the study of various attack types on different components of the ITS.

Bio: Anthony Lopez is a Ph.D. student studying Computer Engineering at the University of California Irvine (UCI), USA in the Embedded and Cyber-Physical Systems Lab under Professor Mohammad Al Faruque. He earned a B.S. from UC San Diego, USA and an M.S. from UC Irvine, both in Computer Engineering. His research focuses on the secure design, modeling, analysis and simulation of cyber-physical transportation systems. He is an IEEE student member and an NSF Graduate Research Fellowship Program awardee.


PhD Defense: Programmable Accelerators for Lattice-based Cryptography

Name: Hamid Nejatollahi

Advisor: Nikil Dutt

Date: June 11, 2020

Time: 10:00 AM

Committee: Ian Harris, Rainer Doemer

Thesis: “Programmable Accelerators for Lattice-based Cryptography”


Advances in computing steadily erode computer security at its foundation, calling for fundamental innovations to strengthen the weakening cryptographic primitives and security protocols. While many alternatives have been proposed for symmetric-key cryptography and related protocols (e.g., lightweight ciphers and authenticated encryption), the alternatives for public-key cryptography are limited to post-quantum cryptographic primitives and their protocols. In particular, lattice-based cryptography is a promising candidate, both in terms of foundational properties and in its application to traditional security problems such as key exchange, digital signatures, and encryption/decryption. At the same time, the emergence of new computing paradigms, such as Cloud Computing and the Internet of Everything, demands that innovations in security extend beyond their foundational aspects to the actual design and deployment of these primitives and protocols while satisfying emerging design constraints such as latency, compactness, energy efficiency, and agility. In this thesis, we propose a methodology to design programmable hardware accelerators for lattice-based algorithms, and we use the proposed methodology to implement flexible and energy-efficient post-quantum cache- and DMA-based accelerators for the most promising submissions to the NIST standardization contest. We validate our methodology by integrating our accelerators into an HLS-based SoC infrastructure based on the x86 processor and evaluating overall performance. In addition, we adopt a systolic architecture to accelerate polynomial multiplication, which is at the heart of a subset of LBC algorithms (i.e., ideal LBC), on field-programmable gate arrays (FPGAs). Finally, we propose a high-throughput Processing In-Memory (PIM) accelerator for the number-theoretic transform (NTT) based polynomial multiplier.
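The workhorse operation such accelerators target is multiplication in the ring Z_q[x]/(x^n + 1). A schoolbook reference makes the negacyclic wrap-around explicit; an NTT-based multiplier computes the same product with O(n log n) butterflies instead of O(n^2) multiply-accumulates. The parameters below are toy values, not those of any NIST submission (real schemes use sizes on the order of n = 256).

```python
def polymul_negacyclic(a, b, n, q):
    """Schoolbook product in Z_q[x]/(x^n + 1): the relation x^n == -1
    makes the reduction negacyclic."""
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:                        # x^(i+j) wraps to -x^(i+j-n)
                res[k - n] = (res[k - n] - ai * bj) % q
    return res
```

For example, multiplying by the constant polynomial 1 returns the input unchanged, while x * x^3 reduces to -1 mod q, showing the sign flip at the wrap boundary.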


PhD Defense: Approximate and Bit-width Configurable Arithmetic Logic Unit Design for Deep Learning Accelerator

Name: Xiaoliang Chen

Chair: Prof. Fadi Kurdahi

Date: June 2, 2020

Time: 10:00 AM

Location: Zoom

Committee: Prof. Fadi J. Kurdahi (Chair), Prof. Ahmed M. Eltawil and Prof. Rainer Doemer

Title: “Approximate and Bit-width Configurable Arithmetic Logic Unit Design for Deep Learning Accelerator”


As key building blocks for digital signal processing, image processing, deep learning, and other domains, adders, multi-operand adders, and multiply-accumulate (MAC) units have drawn much attention recently. Two popular ways to improve arithmetic logic unit (ALU) performance and energy efficiency are approximate computing and precision-scalable design. Approximate computing achieves better performance or energy efficiency by trading off accuracy. Precision-scalable design provides the capability of allocating just enough hardware resources to meet application requirements.

In this thesis, we first present a correlation-aware predictor (CAP) based approximate adder, which utilizes the spatial-temporal correlation of input streams to predict carry-in signals for sub-block adders. CAP uses fewer prediction bits to reduce the overall adder delay. For highly correlated input streams, we found that CAP can reduce adder delay by ~23.33% and save ~15.9% area at the same error rate compared to prior works.
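A behavioral sketch of the idea (a simplified model, not the CAP circuit itself): the adder is split into sub-blocks that add in parallel, each taking its carry-in from a predictor instead of waiting for carry propagation, so latency drops to a single sub-block add at the cost of occasional errors when the prediction is wrong.

```python
BLOCK = 4  # sub-block width in bits

def block_add(a, b, carry_pred, width=16):
    """Sum with sub-blocks computed in parallel: carry-ins come from the
    predictor, not from carry propagation."""
    mask = (1 << BLOCK) - 1
    result = 0
    for i in range(width // BLOCK):
        s = ((a >> (i * BLOCK)) & mask) + ((b >> (i * BLOCK)) & mask) + carry_pred(i)
        result |= (s & mask) << (i * BLOCK)
    return result

def exact_carries(a, b, width=16):
    # oracle predictor: reproduces the true carries (perfect correlation)
    mask, c, carries = (1 << BLOCK) - 1, 0, {0: 0}
    for i in range(width // BLOCK):
        s = ((a >> (i * BLOCK)) & mask) + ((b >> (i * BLOCK)) & mask) + c
        c = s >> BLOCK
        carries[i + 1] = c
    return carries
```

An always-zero predictor drops the carry out of block 0 for 0x00FF + 0x0001 and yields the approximate sum 0x00F0; with the oracle predictor the result is the exact 0x0100. CAP's contribution is making the prediction accurate for correlated streams while keeping it cheap.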

Inspired by the success of approximate multipliers built from approximate compressors, we propose a pipelined approximate-compressor-based speculative multi-operand adder (AC-MOA). All compressors are replaced with approximate ones to reduce the overall delay of the bit-array reduction tree, and an efficient error detection and correction block compensates for the errors with one extra cycle. Experimental results show that the proposed 8-bit, 8-operand AC-MOA achieves a 1.47X–1.66X speedup over the conventional baseline design.

Recent research on deep learning algorithms has shown that bit-width can be reduced without losing accuracy. Because bit-width requirements vary across deep learning applications, bit-width configurable designs can be used to improve hardware efficiency. In this thesis, a bit-width configurable MAC (BC-MAC) is proposed. BC-MAC uses a spatial-temporal approach to support variable precision requirements for both activations and weights. The basic processing element (PE) of BC-MAC is a multi-operand adder, and multiple multi-operand adders can be combined to support input operands of any precision. Bit-serial summation accumulates partial addition results to perform MAC operations, and Booth encoding is employed to further boost throughput. Synthesis results on TSMC 16nm technology and simulation results show that the proposed MAC achieves higher area and energy efficiency than state-of-the-art designs, making it a promising ALU for deep learning accelerators.
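The bit-serial scheme can be modeled behaviorally (unsigned weights only, no Booth encoding, so a much-simplified view of BC-MAC): each cycle, a multi-operand adder reduces the activations gated by one weight bit, and the accumulator adds the result with the corresponding shift. The cycle count scales with the configured weight bit-width, which is exactly where reduced precision buys throughput.

```python
def bit_serial_mac(activations, weights, w_bits):
    """Dot product computed one weight bit per cycle: a multi-operand
    sum of activations whose weight has the current bit set, shifted by
    the bit position and accumulated (unsigned weights only)."""
    acc = 0
    for bit in range(w_bits):
        partial = sum(a for a, w in zip(activations, weights) if (w >> bit) & 1)
        acc += partial << bit
    return acc
```

For example, `bit_serial_mac([3, 5], [2, 3], 4)` accumulates 5 (bit 0) and then 8 << 1 (bit 1), giving 21 = 3*2 + 5*3.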

PhD Defense: Understanding and Guaranteeing Security, Privacy, and Safety of Smart Homes

Name: Rahmadi Trimananda

Chair: Professor Brian Demsky

Date: Thursday, August 20, 2020

Time: 11:00 AM – 1:00 PM

Location: Zoom

Committee: Dr. Brian Demsky, Dr. Athina Markopoulou, Dr. Harry Xu

Title: Understanding and Guaranteeing Security, Privacy, and Safety of Smart Homes


Smart homes are becoming increasingly popular. Unfortunately, they come with security, privacy, and safety issues. In this work, we explore new methods and techniques to better understand and guarantee the security, privacy, and safety of smart homes. To tackle the existing problems, we view the smart home from three different sides: devices, platforms, and apps.

On the devices side, we discovered that smart home devices are vulnerable to passive inference attacks based on network traffic, even in the presence of encryption. We first present this passive inference attack and the techniques we developed to exploit this vulnerability on smart home devices. We created PingPong, a tool that can automatically extract packet-level signatures for device events (e.g., a light bulb turning ON/OFF) from network traffic. We evaluated PingPong on popular smart home devices ranging from smart plugs and thermostats to cameras, voice-activated devices, and smart TVs. We were able to: (1) automatically extract previously unknown signatures that consist of simple sequences of packet lengths and directions; (2) use those signatures to detect the devices or specific events with an average recall of more than 97%; (3) show that the signatures are unique among hundreds of millions of packets of real-world network traffic; (4) show that our methodology is also applicable to publicly available datasets; and (5) demonstrate its robustness in different settings: events triggered by local and remote smartphones, as well as by home-automation systems. Furthermore, we present existing techniques (e.g., packet padding) as possible defenses against passive inference attacks, along with their analyses.
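The matching side of such signatures can be sketched simply: a signature is an ordered list of (direction, length) pairs that fires when the exact sequence appears in the observed stream. The packet lengths below are invented for illustration, and a real matcher would also have to tolerate timing variation and interleaved traffic.

```python
def matches(signature, packets):
    """signature: ordered (direction, length) pairs; packets: observed
    stream for one device. Fires when the signature appears contiguously."""
    n = len(signature)
    return any(packets[i:i + n] == signature
               for i in range(len(packets) - n + 1))

# Hypothetical "plug ON" signature; lengths are made up for illustration.
plug_on = [("out", 556), ("in", 1293)]
stream = [("in", 66), ("out", 556), ("in", 1293), ("out", 66)]
```

Because the lengths and directions survive encryption, a passive observer on the network path can run exactly this kind of check without ever decrypting a payload.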

On the platforms side, smart home platforms such as SmartThings enable homeowners to manage devices in sophisticated ways to save energy, improve security, and provide conveniences. Unfortunately, we discovered that smart home platforms contain vulnerabilities, potentially impacting home security and privacy.  Aside from the traditional defense techniques to enhance the security and privacy of smart home devices, we also created Vigilia, a system that shrinks the attack surface of smart home IoT systems by restricting the network access of devices.  As existing smart home systems are closed, we have created an open implementation of a similar programming and configuration model in Vigilia and extended the execution environment to maximally restrict communications by instantiating device-based network permissions. We have implemented and compared Vigilia with forefront IoT-defense systems; our results demonstrate that Vigilia outperforms these systems and incurs negligible overhead.

On the apps side, smart home platforms allow developers to write apps that make smart home devices work together to accomplish tasks, e.g., home security and energy conservation; smart home devices provide the convenience of remotely controlling and automating home appliances. A smart home app typically implements narrow functionality, and thus homeowners may need to install multiple apps to fully implement the desired functionality. These apps can conflict with each other, and the conflicts can result in undesired actions such as locking the door during a fire. We study conflicts between apps on Samsung SmartThings, the most popular platform for developing and deploying smart home IoT devices. By collecting and studying 198 official and 69 third-party apps, we found significant app conflicts in 3 categories: (1) close to 60% of app pairs that access the same device, (2) more than 90% of app pairs with physical interactions, and (3) around 11% of app pairs that access the same global variable. Our results suggest that the problem of conflicts between smart home apps is serious and can create potential safety risks. We then developed an automatic conflict detection tool that uses model checking to automatically detect up to 96% of the conflicts.
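The flavor of conflict counted in the first category can be illustrated with a toy detector that flags two apps sending opposite commands to the same device. This is a drastic simplification with hypothetical app names; the actual tool model-checks app behavior rather than comparing static command lists.

```python
def find_conflicts(app_actions):
    """app_actions: {app: set of (device, command)}. Flags app pairs that
    issue opposite commands to the same device."""
    opposite = {"on": "off", "off": "on", "lock": "unlock", "unlock": "lock"}
    conflicts = set()
    apps = sorted(app_actions)
    for i, a in enumerate(apps):
        for b in apps[i + 1:]:
            for dev, cmd in app_actions[a]:
                if (dev, opposite.get(cmd)) in app_actions[b]:
                    conflicts.add((a, b, dev))
    return conflicts

# Hypothetical apps: a fire-safety app unlocks the door an intruder-guard locks.
apps = {
    "fire_alarm":  {("front_door", "unlock")},
    "night_guard": {("front_door", "lock")},
    "mood_light":  {("lamp", "on")},
}
```

Here `fire_alarm` and `night_guard` race on `front_door`, the lock-during-a-fire hazard described above, while `mood_light` conflicts with neither.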


PhD Defense: Advancing Compiler and Simulator Techniques for Highly Parallel Simulation of Embedded Systems

Name: Zhongqi Cheng

Chair: Prof. Rainer Doemer

Date: May 29th, 2020

Time: 02:00 PM

Location:  Zoom

Committee: Rainer Doemer (Chair), Prof. Mohammad Al Faruque and Prof. Aparna Chandramowlishwaran

Title: “Advancing Compiler and Simulator Techniques for Highly Parallel Simulation of Embedded Systems”


As an Electronic System Level (ESL) design language, the IEEE SystemC standard is widely used for testing, validation, and verification of embedded system models. Discrete Event Simulation (DES) has been used for decades as the default SystemC simulation semantics. However, due to the sequential nature of DES, Parallel DES has recently gained an increasing amount of attention for performing high-speed simulations on parallel computing platforms. To further exploit the parallel computation power of modern multi- and many-core platforms, Out-of-order Parallel Discrete Event Simulation (OoO PDES) has been proposed. In OoO PDES, threads comply with a partial order such that different simulation threads may run in different time cycles to increase the parallelism of execution. The Recoding Infrastructure for SystemC (RISC) has been introduced as a tool flow to fully support OoO PDES.

To preserve SystemC semantics under OoO PDES, a compiler-based approach statically analyzes the race conditions in the input model. However, this imposes a severe restriction: the source code for the input design must be available in a single file, which does not scale and precludes the use of Intellectual Property (IP) blocks and hierarchical file structures. In this dissertation, we propose a partial-graph based approach that scales the static analysis to support separate files and IP reuse. Specifically, the Partial Segment Graph (PSG) data structure is proposed to abstract the behaviors and communication of modules within a single translation unit. These partial graphs are then combined at the top level to reconstruct the complete behaviors and communication of the entire model.

We also propose new algorithms to support the static analysis for modern SystemC TLM-2.0 standard. SystemC TLM-2.0 is widely used in industrial ESL designs for better interoperability and higher simulation speed. However, it is identified as an obstacle for parallel SystemC simulation due to the disappearance of channels. To solve the problem, we propose a compile time approach to statically analyze potential conflicts among threads in SystemC TLM-2.0 loosely- and approximately-timed models. A new Socket Call Path (SCP) technique is introduced which provides the compiler with socket binding information for precise static analysis. Based on SCP, an algorithm is proposed to analyze entangled variable pairs for automatic and accurate conflict analysis.

Besides the works on the compiler side, we focus as well on increasing the simulation speed of OoO PDES. We observe that the granularity of the Segment Graph (SG) data structure used in static analysis has a high impact on OoO PDES. This motivates us to propose a set of coding guidelines for the RISC users to properly refine their SystemC model for a higher simulation speed.

Furthermore, in this dissertation, an algorithm is proposed to directly optimize the event delivery strategy in OoO PDES. Event delivery in OoO PDES has been very conservative: it often postpones the execution of waiting threads due to unknown future behaviors of the SystemC model and, in turn, becomes a bottleneck for simulation speed. The algorithm we propose takes advantage of predictions of future thread behaviors and therefore allows waiting threads to resume execution earlier, resulting in significantly increased simulation speed.

To summarize, the contributions of this dissertation include: 1) a scalable RISC tool flow for statically analyzing and protecting third-party IPs in models with multiple files, 2) an advanced static analysis approach for modern SystemC TLM-2.0 models, 3) a set of coding guidelines for RISC users to achieve higher simulation speed, and 4) a more efficient event delivery algorithm in the OoO PDES scheduler using prediction information.

Together, these compiler and simulator advances enable OoO PDES for larger and modern model simulation and thus improve the design of embedded systems significantly, leading to better devices at lower cost in the end.


PhD Defense: Runtime Resource Management of Emerging Applications in Heterogeneous Architectures

Name: Kasra Moazzemi

Advisor: Nikil Dutt

Date: May 28, 2020

Time: 12:00 PM

Location: Zoom

Title: “Runtime Resource Management of Emerging Applications in Heterogeneous Architectures”

Abstract: Runtime resource management for heterogeneous computing systems is becoming increasingly complex as workloads on these platforms grow more diverse and conflicts arise between heterogeneous architectural components and their resource demands. The goal of these runtime resource management mechanisms is to achieve the overall system goal for dynamic workloads while coordinating system resources in a robust and adaptive fashion.

To address the complexities in heterogeneous computing systems, state-of-the-art techniques that use heuristics or machine learning have been proposed. On the other hand, conventional control theory can provide formal guarantees but may face unmanageable complexity when modeling the system dynamics of heterogeneous computing platforms. In this thesis, we first analyze a variety of runtime resource management methods and introduce a classification that captures the resources and metrics they utilize. We cover heuristic, machine learning, and control-theoretic methods used to manage the performance, power, energy, temperature, Quality-of-Service (QoS), and reliability of the system.

In addition, we explore a variety of dynamic resource management frameworks that provide significant gains in self-optimization and self-adaptivity. This includes simulation infrastructures, hardware platforms enhanced with multi-layer management mechanisms, and corresponding software frameworks that enable management policies for these systems in an effective and adaptive manner. Ultimately, we address the problem of optimizing energy efficiency, power consumption, performance, and QoS in heterogeneous systems by proposing adaptive runtime policies. The methods proposed in this thesis take into account the constraints and requirements defined by the user, dynamic workloads, and the coordination of conflicting objectives. The projects presented in this dissertation respond effectively to abrupt changes in heterogeneous computing systems by dynamically adapting to changing application and system behavior at runtime, and are thus able to provide significant improvements over commonly used static resource management methods.


PhD Defense: Security Monitor for Mobile Devices: Design and Applications

Name: Saeed Mirzamohammadi

Chair: Ardalan Amiri Sani

Date: May 21, 2020

Time: 3:00 PM

Location: Zoom

Committee: Ardalan Amiri Sani, Sharad Mehrotra, Gene Tsudik, Sharad Agarwal (MSR)

Title: Security Monitor for Mobile Devices: Design and Applications


Android’s underlying Linux kernel is rapidly becoming a more attractive target for attackers. In 2014, reported kernel bugs made up 4 percent of all bugs discovered in Android; this share increased drastically to 9 and 44 percent in 2015 and 2016, respectively. An attacker can use these kernel bugs to gain kernel privilege and take complete control of the mobile device.

In this talk, we present the Security Monitor, small, trustworthy, and extensible software that provides various security services with a small Trusted Computing Base (TCB). The Security Monitor is designed and built on two ARM hardware features: virtualization hardware and ARM TrustZone. The security services within the Security Monitor enforce certain privacy and security guarantees for the system. We demonstrate three end-to-end systems that leverage the Security Monitor to provide different security services. First, we present Viola, which provides trustworthy sensor notifications using low-level checks in the Security Monitor. Second, we present Ditio, which provides trustworthy auditing of sensor activities by recording them in the Security Monitor. Third, we present Tabellion, which enables the secure formation of electronic contracts through secure primitives in the Security Monitor.


PhD Defense: Brain Inspired Neural Network Models of Visual Motion Perception and Tracking in Dynamic Scenes

Name: Hirak Jyoti Kashyap

Chair: Jeffrey L. Krichmar

Date: May 15, 2020

Time: 12:00 PM Pacific Time

Location: Zoom

Committee: Jeffrey L. Krichmar, Nikil Dutt, Charless C. Fowlkes, Emre Neftci

Title: Brain Inspired Neural Network Models of Visual Motion Perception and Tracking in Dynamic Scenes


For self-driving vehicles, aerial drones, and autonomous robots to be successfully deployed in the real world, they must be able to navigate complex environments and track objects. While Artificial Intelligence and Machine Vision have made significant progress in dynamic scene understanding, they are not yet as robust or computationally efficient as humans and other primates at these tasks. For example, current state-of-the-art visual tracking methods become inaccurate when applied to arbitrary test videos. We suggest that ideas from cortical visual processing can inspire real-world solutions for motion perception and tracking that are robust and efficient. In this context, the thesis makes the following contributions. First, a method for estimating 6DoF ego-motion and pixel-wise object motion is introduced, based on a learned overcomplete motion-field basis set. The method uses motion-field constraints for training and a novel differentiable sparsity regularizer to achieve state-of-the-art ego- and object-motion performance on benchmark datasets. Second, a Convolutional Neural Network (CNN) is presented that learns hidden neural representations analogous to the response characteristics of dorsal Medial Superior Temporal area (MSTd) neurons for optic flow and object motion. The findings suggest that goal-driven training of CNNs might automatically give rise to MSTd-like response properties in model neurons. Third, a recurrent neural network model of predictive smooth pursuit eye movements is presented that generates pursuit initiation and predictive pursuit behaviors similar to those observed in humans. The model provides computational mechanisms for the formation and rapid update of an internal model of target velocity, which is widely credited with enabling zero-lag tracking and smooth pursuit of occluded objects.
Finally, a spike-based stereo algorithm and its fully neuromorphic implementation are presented, reconstructing dynamic visual scenes at 400 frames per second with one watt of power consumption on the IBM TrueNorth processor. Taken together, the presented models and implementations demonstrate how the dorsal visual pathway in the brain performs efficient motion perception and inform ideas for efficient computational vision systems.
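The abstract does not specify the form of the differentiable sparsity regularizer; one common smooth surrogate for an L1 penalty on basis coefficients is the Charbonnier penalty, sketched below as a generic illustration (not the thesis's exact regularizer):

```python
import math

def smooth_l1_sparsity(coeffs, eps=1e-3):
    """Differentiable surrogate for an L1 sparsity penalty.

    sum_i sqrt(c_i^2 + eps^2) approaches |c_i| as eps -> 0,
    but stays smooth at zero so gradients are well defined.
    """
    return sum(math.sqrt(c * c + eps * eps) for c in coeffs)

def smooth_l1_gradient(coeffs, eps=1e-3):
    """Gradient of the penalty w.r.t. each basis coefficient."""
    return [c / math.sqrt(c * c + eps * eps) for c in coeffs]
```

Penalizing the basis coefficients this way encourages most of them toward zero, so each motion field is explained by only a few elements of the overcomplete basis.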

PhD Defense: Design and Implementation of Robust Full-Duplex Wireless Network

Name: Sergey Shaboyan

Chair: Prof. Ahmed Eltawil

Date: March 10th, 2020

Time: 09:30 AM

Location:  EH 3206

Committee: Prof. Ahmed Eltawil (Chair), Prof. Ender Ayanoglu, Prof. Zhiying Wang

Title: “Design and Implementation of Robust Full-Duplex Wireless Network”


Recently, Full-Duplex (FD) communication has gained significant interest due to a demonstrable increase in throughput and spectral efficiency. Conventional Half-Duplex (HD) communication systems use either time-duplexing or frequency-duplexing to avoid self-interference. In contrast, full-duplex systems transmit and receive simultaneously on the same frequency band, thus optimally utilizing available resources. The main challenge in FD systems is managing the self-interference (SI) signal at each node, which is typically orders of magnitude larger than the intended signal of interest (SOI). To achieve sufficient SI suppression, FD systems rely on cancellation across multiple domains such as spatial, analog, and digital. However, a number of practical, FD-specific challenges that impact quality of service arise when at least one node in a network operates in full-duplex mode.

In this thesis, we consider practical issues of wireless networks containing a full-duplex node. The ultimate goal of this work is to design and implement real-time, end-to-end networks consisting of at least one FD node that are capable of improving network performance under a limited-bandwidth constraint. First, we identify synchronization issues in a network consisting of a full-duplex base station communicating with half-duplex nodes. Novel synchronization techniques specific to full-duplex networks are proposed that allow compensation of synchronization errors in time and frequency. The proposed techniques are implemented and tested experimentally on a real-time full-duplex wireless network. Second, we characterize the impact of a dynamic environment on the received self-interference in an FD system equipped with a reconfigurable antenna as a passive SI suppression mechanism. The self-interference channel delay profile is measured using the FD system operating at 5 MHz, 10 MHz, and 20 MHz bandwidths. The measured channel profiles collected under suppressing and non-suppressing antenna patterns are compared, and channel changes in static as well as dynamic environments are highlighted. We then statistically model the SI channel by fitting probability distributions to the SI channel data. Third, the thesis proposes a Wi-Fi-compliant active self-interference cancellation technique for amplify-and-forward as well as decode-and-forward full-duplex relays. Finally, we design and implement an end-to-end wireless network extended with the aid of a custom-designed amplify-and-forward full-duplex relay. We then analyze the relay coverage limitation under stability and transmit-power constraints.
The network performance is analyzed as a function of relay location for constant-gain and constant-transmit-power modes, suggesting the optimal relay location that maximizes the signal-to-interference-plus-noise ratio (SINR) at the destination node. We evaluate the overall network performance by simulation as well as experimentally in outdoor and indoor environments.

PhD Defense: Novel Monitoring, Detection, and Optimization Techniques for Neurological Disorders

Name: Seyede Mahya Safavi

Chair: Prof. Pai Chou

Date: February 20th, 2020

Time: 10:00 AM

Location:  EH 3404

Committee: Prof. Pai Chou (Chair), Prof. Beth Lopour, Prof. Phillip Sheu

Title: “Novel Monitoring, Detection, and Optimization Techniques for Neurological Disorders”


Advances in chronically implanted neural recording devices have led to the advent of assistive devices for rehabilitation and for restoring lost sensorimotor functions in patients suffering from paralysis. Electrocorticogram (ECoG) signals can capture high-gamma sub-band activity known to be related to hand movements. In the first part of this work, we propose a finger movement detection technique based on ECoG source localization. Finger flexion and extension originate in slightly different areas of the motor cortex, so the origin of the brain activity is used as the distinctive feature for decoding finger movement.

Real-time brain source localization is challenging due to the extensive iterations required by existing solutions. In the second part of this work, we propose two techniques to reduce the computational complexity of the Multiple Signal Classification (MUSIC) algorithm. In the first, the cortex surface is partitioned into several regions, and a novel nominating procedure selects a small number of regions to be searched for brain activity. In the second, an electrode selection technique based on the Cramér-Rao bound of the localization errors selects the best set of an arbitrary number of electrodes. The proposed techniques lead to a 90% reduction in computational complexity while maintaining good concordance in localization error with the standard MUSIC algorithm.
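The nominate-then-refine idea can be illustrated generically: score each region coarsely, keep the top candidates, and run the expensive scan only inside them. In this sketch, `score_fn` stands in for the MUSIC pseudospectrum and the region dictionaries are hypothetical:

```python
def nominate_regions(regions, score_fn, k=2):
    """Step 1: score each cortical region coarsely (e.g., at its
    centroid) and nominate the top-k candidates, so the expensive
    fine-grained scan runs only inside those regions."""
    scored = sorted(regions, key=lambda r: score_fn(r["centroid"]),
                    reverse=True)
    return scored[:k]

def fine_search(region, score_fn):
    """Step 2: exhaustive scan restricted to a nominated region."""
    return max(region["points"], key=score_fn)
```

Because the fine scan visits only the nominated regions instead of the whole cortex surface, most of the per-point pseudospectrum evaluations are skipped, which is where the complexity reduction comes from.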

Epilepsy is a neurological disorder with multiple comorbid conditions, including cardiovascular and respiratory disorders. The cardiovascular imbalance is of particular importance since the mechanisms of Sudden Unexpected Death in Epilepsy (SUDEP) are still unknown. Ictal tachycardia is the most well-known cardiac imbalance during seizures. In the third part of this dissertation, we used an optical sensing modality, the photoplethysmogram (PPG), to investigate variations in ictal blood flow in the limbs. Six different features related to hemodynamics were derived from PPG pulse morphology. A consistent pattern of ictal change was observed across all subjects and seizures. These variations suggest an increase in vascular resistance due to an increase in sympathetic tone. Timing analysis of the PPG features revealed that some feature variations can precede ictal tachycardia by 50 seconds. These features were used to train a neural network based on the Long Short-Term Memory (LSTM) architecture for automatic seizure detection, reducing the false-alarm rate by 50% compared to other heart-rate-variability-based detectors.
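The abstract does not enumerate the six morphology features; two commonly used PPG pulse features, given purely as hypothetical illustrations of the kind of per-pulse measurement involved, can be computed as:

```python
def ppg_pulse_features(pulse, fs):
    """Two illustrative morphology features from one PPG pulse,
    assuming the segment starts at the pulse foot (minimum):
      amplitude - foot-to-peak excursion (related to blood volume)
      rise_time - seconds from foot to systolic peak
    pulse: samples of a single segmented pulse; fs: sampling rate (Hz).
    """
    peak_idx = max(range(len(pulse)), key=lambda i: pulse[i])
    amplitude = pulse[peak_idx] - pulse[0]
    rise_time = peak_idx / fs
    return {"amplitude": amplitude, "rise_time": rise_time}
```

Tracking how such per-pulse features drift over consecutive beats is what lets morphology changes be detected tens of seconds before the heart rate itself rises.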

PhD Defense: Efficient Offline and Online Training of Memristive Neuromorphic Hardware

Name: Mohammed Fouda

Chair: Prof. Ahmed Eltawil

Date: February 6th, 2020

Time: 9:30 AM

Location:  EH 3206

Committee: Ahmed Eltawil (Chair), Prof. Fadi Kurdahi, Prof. Nikil Dutt, Prof. Emre Neftci

Title: “Efficient Offline and Online Training of Memristive Neuromorphic Hardware”


Brain-inspired neuromorphic systems have witnessed rapid development over the last decade from both algorithmic and hardware perspectives. Neuromorphic hardware promises to be more energy- and speed-efficient than traditional von Neumann architectures. Thanks to recent progress in solid-state devices, different nanoscale nonvolatile memory devices, such as RRAMs (memristors), STT-RAM, and PCM, support computations that mimic the biological synaptic response. The most important advantage of these devices is their ability to be sandwiched between interconnect wires, creating crossbar array structures that are inherently able to perform matrix-vector multiplication (MVM) in one step. Despite the great potential of RRAMs, they suffer from numerous nonidealities limiting performance, including high variability, asymmetric and nonlinear weight updates, limited endurance, retention loss, and stuck-at-fault (SAF) defects, in addition to the interconnect wire resistance that creates sneak paths. This thesis focuses on the application of RRAMs to neuromorphic computation while accounting for the impact of device nonidealities on neuromorphic hardware.
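The one-step MVM property follows from Kirchhoff's current law: driving the crossbar rows with a voltage vector produces, on each column, a current equal to the conductance-weighted sum of the inputs. A minimal idealized sketch (ignoring the nonidealities discussed above):

```python
def crossbar_mvm(conductances, voltages):
    """Ideal resistive crossbar: applying voltage V[i] to row i
    yields on column j the current I[j] = sum_i G[i][j] * V[i]
    (Kirchhoff's current law), i.e., a matrix-vector product
    computed in a single analog step."""
    cols = len(conductances[0])
    return [sum(conductances[i][j] * voltages[i]
                for i in range(len(voltages)))
            for j in range(cols)]
```

In real arrays, sneak paths and wire resistance perturb these column currents away from the ideal product, which is exactly what the techniques below compensate for.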

In this thesis, we propose software-level solutions to mitigate the impact of nonidealities that severely affect offline (ex-situ) training, without resorting to expensive SPICE or numerical simulations. We propose two techniques to incorporate the effect of the sneak path problem, in addition to device variability, during training with negligible overhead. The first technique is inspired by the impact of the sneak path problem on the stored weights (devices' conductances) and is referred to as the mask technique: a mask, which can be obtained from measured weights of fabricated hardware, is element-wise multiplied by the weights during training. The other solution is a neural network estimator trained using our SPICE-like simulator. Test validation results, obtained through our SPICE-like framework, show significant improvement in performance, close to the baseline BNNs and QNNs, demonstrating the efficiency of the proposed methods. Both techniques capture the problem well for multilayer perceptron networks on the MNIST dataset with negligible runtime overhead. In addition, the proposed neural estimator outperforms the mask technique on more challenging datasets such as CIFAR-10. Furthermore, other nonidealities, such as SAF defects and retention, are evaluated.
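At its core, the mask technique is an element-wise multiplication of the ideal weight matrix by a degradation mask applied in the forward pass; a minimal sketch with hypothetical mask values:

```python
def apply_sneak_path_mask(weights, mask):
    """Mask technique (sketch): element-wise multiply the ideal
    weights by a mask capturing how sneak paths attenuate the
    stored conductances, so training sees the degraded weights.
    The mask values used here are hypothetical; in practice they
    would come from measurements of fabricated hardware or a
    SPICE-like simulation."""
    return [[w * m for w, m in zip(wrow, mrow)]
            for wrow, mrow in zip(weights, mask)]
```

Because the degradation is baked into every forward pass, backpropagation adapts the weights to compensate for it, at the cost of only one extra element-wise product per layer.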

We also develop a model to incorporate the stochastic, asymmetric, nonlinear weight update in online (in-situ) training. We propose two solutions for this problem: (1) a compensation technique, tested on a small-scale problem of separating two mixed Laplacian sources using online independent component analysis; and (2) stochastic rounding, tested on a spiking neural network with deep local learning dynamics, showing only a 1-2% drop from the baseline accuracy for three different RRAM devices. We also propose error-triggered learning to overcome the limited-endurance problem, with only 0.3% and 3% accuracy drops for the N-MNIST and DVSGesture datasets and roughly 33X and 100X reductions in the number of writes, respectively.
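Stochastic rounding quantizes each weight update to the device's programmable step while remaining unbiased in expectation, so small gradient contributions are not systematically lost. A generic sketch (the step size and values are illustrative, not device-specific):

```python
import math
import random

def stochastic_round(value, step):
    """Round `value` to a multiple of `step`, rounding up with
    probability equal to the fractional remainder. The result is
    unbiased in expectation - useful when RRAM weight updates are
    coarser than the computed gradient step."""
    lower = math.floor(value / step) * step
    frac = (value - lower) / step
    return lower + step if random.random() < frac else lower
```

Averaged over many updates, the rounded values track the true update, which is why accuracy degrades only slightly even with coarse device steps.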

Finally, we discuss the prospects of this neuromorphic hardware for developing new algorithms on existing resistive crossbar arrays, taking their nonidealities into account.

PhD Defense: Towards Engineering Computer Vision Systems: From Web to FPGAs

Final Defense – Sajjad Taheri

Date: August 26, 2019

Time: 2:00 pm

Location: Donald Bren Hall 4011

Committee: Alex Nicolau(chair), Alex Veidenbaum(co-chair), Nikil Dutt

Title: Towards Engineering Computer Vision Systems: From Web to FPGAs

Computer vision has many applications that impact our daily lives, in areas such as automation, entertainment, and healthcare. However, computer vision is very challenging, partly due to the intrinsically difficult nature of the problem and partly due to the complexity and size of the visual data that need to be processed. Deploying computer vision in many practical use cases requires sophisticated algorithms and efficient implementations. In this dissertation, we consider two platforms that are suitable for computer vision processing yet have not been easily accessible to algorithm designers and developers: the Web and FPGA-based accelerators. Through the development of open-source software, we highlight the challenges associated with vision development on each platform and demonstrate opportunities to mitigate them.
The Web is the world’s most ubiquitous computing platform and hosts a plethora of visual content. For historical reasons, such as insufficient JavaScript performance and a lack of API support for acquiring and manipulating images, computer vision has not been mainstream on the Web. We show that, in light of recent web developments such as vastly improved JavaScript performance and the addition of APIs such as WebRTC, efficient computer vision processing can be realized on web clients. Through novel engineering techniques, we translate a popular open-source computer vision library (OpenCV) from C++ to JavaScript and optimize its performance for the web environment. We demonstrate that hundreds of computer vision functions run in browsers with performance close to that of their original C++ versions.
Field-Programmable Gate Arrays (FPGAs) are a promising solution for mitigating the computational cost of vision algorithms through hardware pipelining and parallelism while providing excellent power efficiency. However, an efficient FPGA implementation of a vision algorithm requires hardware design expertise and a considerable number of engineering person-hours. We show how high-level graph-based specifications, such as OpenVX, can significantly improve FPGA design productivity. Since such abstractions exclude implementation details, different implementation configurations that satisfy various design constraints, such as performance and power consumption, can be explored systematically. They also enable a variety of local and global optimizations to be applied across the algorithm graph.
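The benefit of a graph-based specification is that the full dataflow is declared before anything executes, so a back end can analyze the whole graph and choose implementations globally (fusing nodes, pipelining, or mapping them to FPGA). A toy Python sketch of that idea, illustrative only (OpenVX itself is a C API):

```python
class Graph:
    """Minimal sketch of a graph-based vision specification, in the
    spirit of OpenVX but not its actual API: nodes are declared
    first, so a back end sees the entire dataflow graph before
    choosing how to execute it."""

    def __init__(self):
        self.nodes = []  # (function, input keys, output key)

    def add_node(self, fn, inputs, output):
        """Declare a processing node; nothing runs yet."""
        self.nodes.append((fn, inputs, output))

    def run(self, data):
        """Reference interpreter: execute nodes in declaration
        order. A real back end could instead fuse, pipeline, or
        offload nodes once the whole graph is known."""
        for fn, inputs, output in self.nodes:
            data[output] = fn(*[data[k] for k in inputs])
        return data
```

Because the graph is separate from its execution, the same specification can be retargeted to different constraints (throughput, power) without rewriting the algorithm.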