Speaker: Professor Sungjoo Yoo, Computing and Memory Architecture Lab, Seoul National University
Title: Reviving Processing-in-Memory for Large Data Workload on Existing Computer Architecture
Date: Tuesday, July 14, 2015
Time: 11:00 AM
Location: Donald Bren Hall 3011
Host: Nikil Dutt
Abstract: Processing-in-memory (PIM) is rebounding from its unsuccessful attempts in the 1990s for two main reasons: recent advances in 3D stacking technologies and emerging large-data workloads. In this talk, we present two of our recent works: PIM for large-data workloads, and combining PIM with existing computer architectures.
Graph data are becoming increasingly popular in many areas such as machine learning and social network analysis. Graph computation processes a query against the graph database, e.g., finding the most popular person. It is characterized by computation parallelism (per-vertex parallel computation) and significant random memory accesses (to neighbor vertices), a combination for which conventional architectures are poorly suited. We present Tesseract, a programmable PIM accelerator for large-scale graph processing. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for the memory access patterns of graph processing, which operate based on hints provided by our programming model.
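To illustrate the workload characteristics described above (not Tesseract's actual programming interface), the following sketch runs one PageRank-style iteration: the outer loop is the per-vertex parallel work, and the inner loop performs random accesses to neighbor vertices scattered across memory.

```python
# Illustrative single iteration of PageRank-style per-vertex computation.
# The graph is stored as adjacency lists; scattering rank to neighbors
# produces the random memory accesses characteristic of graph workloads.

def pagerank_step(out_edges, rank, damping=0.85):
    n = len(rank)
    contrib = [0.0] * n
    for v, neighbors in enumerate(out_edges):   # per-vertex parallel work
        if neighbors:
            share = rank[v] / len(neighbors)
            for u in neighbors:                 # random accesses to neighbor vertices
                contrib[u] += share
    return [(1 - damping) / n + damping * c for c in contrib]

# Tiny 3-vertex example: edges 0->1, 0->2, 1->2, 2->0
edges = [[1, 2], [2], [0]]
rank = [1 / 3, 1 / 3, 1 / 3]
rank = pagerank_step(edges, rank)
```

In a PIM setting, each memory partition would run this per-vertex work close to the data it owns, instead of streaming every neighbor value through the processor's memory hierarchy.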
In order to make the best use of PIM in more areas, PIM architectures must be integrated with existing systems in a seamless manner. Current PIM proposals fall short in two common respects: unconventional programming models for in-memory computation units (treated as programmable co-processors) and an inability to utilize large on-chip caches. We propose a new PIM architecture that (1) preserves existing sequential programming models and (2) automatically decides whether to execute PIM operations in memory or in the processors, depending on the locality of data. The key idea is to implement simple in-memory computation using compute-capable memory commands and to invoke it through specialized instructions, which we call PIM-enabled instructions. This allows PIM operations to be interoperable with existing programming models, cache coherence protocols, and virtual memory mechanisms with no modification.
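The locality-based dispatch idea can be sketched as a toy software model (not the actual hardware mechanism, and all names here are illustrative): a PIM-enabled operation produces the same result wherever it runs, but executes near the cores when its target line is cached and in memory when it is not.

```python
# Toy model of locality-aware dispatch for a PIM-enabled instruction.
# The "cached" set stands in for the on-chip cache; a real design would
# consult hardware locality information instead.

class LocalityAwarePIM:
    def __init__(self):
        self.cached = set()   # addresses currently resident in the cache
        self.where = []       # trace of where each operation executed

    def pim_add(self, memory, addr, value):
        """PIM-enabled add: the result is identical either way;
        only the execution location differs."""
        if addr in self.cached:
            loc = "host"      # hot data: compute near the cores, use the cache
        else:
            loc = "memory"    # cold data: compute in memory, skip the cache
        self.where.append(loc)
        memory[addr] = memory.get(addr, 0) + value
        return loc

mem = {}
pim = LocalityAwarePIM()
pim.pim_add(mem, 0x10, 5)    # cold line: executed in memory
pim.cached.add(0x10)         # pretend a host access brought the line into the cache
pim.pim_add(mem, 0x10, 5)    # hot line: executed near the host cores
```

Because the operation is semantically identical in both locations, the program above needs no knowledge of where each add ran, which is what keeps the sequential programming model unchanged.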