Name: Emad Arasteh
Chair: Prof. Rainer Doemer
Date: July 7, 2022
Time: 10:00 am
Location: EH 3206
Committee: Prof. Fadi Kurdahi, Prof. Ian Harris
Title: Transaction-Level Modeling of Deep Neural Networks for Efficient Parallelism and Memory Accuracy
The emergence of data-intensive applications, such as Deep Neural Networks (DNNs), exacerbates the well-known memory bottleneck in computer systems and demands early attention in the design flow. Electronic System-Level (ESL) design using SystemC Transaction Level Modeling (TLM) enables effective performance estimation, design space exploration, and gradual refinement. In this dissertation, we present our exploratory modeling framework for hardware-software codesign based on SystemC TLM with particular focus on exposing parallelism and memory contention. We demonstrate the effectiveness of our approach for representative complex DNNs such as GoogLeNet and Single Shot MultiBox Detector.
First, we study how communication mechanisms affect the parallelism available in TLM models. Specifically, we show how different synchronization mechanisms and buffering schemes, applied across several modeling styles of a DNN, change the amount of parallelism exposed. We measure the performance of aggressive out-of-order parallel discrete event simulation and analyze the parallelism available in the models. Our study suggests that higher parallel simulation performance indicates better models that expose more parallelism.
Second, we explore the critical aspects of modeling and analyzing timing accuracy and memory contention. A major hurdle in tackling the memory bottleneck is that memory contention is typically detected late in the design cycle, once detailed timed or cycle-accurate models are developed. A memory bottleneck detected at such a late stage can severely limit the available design choices or even require costly redesign. To explore new architectures prior to RTL implementation, we propose a novel TLM-2.0 loosely-timed contention-aware (LT-CA) modeling style that offers simulation speed close to that of traditional loosely-timed (LT) models, yet matches the memory-contention accuracy of lower-level approximately-timed (AT) models.
Finally, we further refine the TLM-2.0 AT model by adding a cycle-accurate model of the memory subsystem. This model provides higher timing accuracy for contention analysis and hence a more accurate performance estimate. We revise our LT-CA memory delay modeling to reach accuracy comparable to the cycle-accurate TLM model of the shared memory subsystem. The high contention observed on the shared memory motivates new processor architectures with local memories.
Emad Arasteh is a Ph.D. candidate in Computer Engineering at the University of California, Irvine, and a graduate researcher at the Center for Embedded and Cyber-physical Systems (CECS). He received the M.Sc. degree in Electronic Design from Lund University in Sweden. His current research interests include system-level modeling and design of embedded systems, scalable memory system design, and massively parallel processor architectures. Previously, he worked as a hardware and software engineer at semiconductor companies in Sweden and the USA, including Ericsson, Canon (Axis Communications), and Samsung.