Name: Hirak Jyoti Kashyap
Chair: Jeffrey L. Krichmar
Date: May 15, 2020
Time: 12:00 PM Pacific Time
Committee: Jeffrey L. Krichmar, Nikil Dutt, Charless C. Fowlkes, Emre Neftci
Title: Brain Inspired Neural Network Models of Visual Motion Perception and Tracking in Dynamic Scenes
For self-driving vehicles, aerial drones, and autonomous robots to be successfully deployed in the real world, they must be able to navigate complex environments and track objects. While Artificial Intelligence and Machine Vision have made significant progress in dynamic scene understanding, they are not yet as robust or computationally efficient as humans and other primates at these tasks. For example, current state-of-the-art visual tracking methods become inaccurate when applied to arbitrary test videos. We suggest that ideas from cortical visual processing can inspire real-world solutions for motion perception and tracking that are both robust and efficient. In this context, the thesis makes the following contributions. First, a method for estimating 6DoF ego-motion and pixel-wise object motion is introduced, based on a learned overcomplete basis set of motion fields. The method uses motion field constraints for training and a novel differentiable sparsity regularizer to achieve state-of-the-art ego- and object-motion performance on benchmark datasets. Second, a Convolutional Neural Network (CNN) is presented that learns hidden neural representations analogous to the response characteristics of dorsal Medial Superior Temporal area (MSTd) neurons for optic flow and object motion. The findings suggest that goal-driven training of CNNs may automatically give rise to MSTd-like response properties in model neurons. Third, a recurrent neural network model of predictive smooth pursuit eye movements is presented that generates pursuit initiation and predictive pursuit behaviors similar to those observed in humans. The model provides computational mechanisms for the formation and rapid updating of an internal model of target velocity, which is widely thought to underlie zero-lag tracking and smooth pursuit of occluded objects.
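To give a flavor of the first contribution, the sketch below illustrates the two ingredients named in the abstract: representing a motion field as a linear combination of an overcomplete basis, and penalizing the coefficients with a differentiable (smoothed-L1) sparsity surrogate. This is a minimal toy illustration, not the thesis method; the function names, the specific surrogate (sqrt(c^2 + eps)), and the array shapes are assumptions for exposition only.

```python
import numpy as np

def smooth_l1(coeffs, eps=1e-6):
    """Differentiable surrogate for the L1 sparsity penalty on basis
    coefficients: sum_k sqrt(c_k^2 + eps) is smooth at zero, unlike |c_k|."""
    return float(np.sum(np.sqrt(coeffs**2 + eps)))

def reconstruct(basis, coeffs):
    """Motion field as a linear combination of overcomplete basis flows.

    basis:  (K, H, W, 2) array of K candidate flow fields
    coeffs: (K,) mixing coefficients; sparsity encourages using few of them
    """
    return np.tensordot(coeffs, basis, axes=1)  # -> (H, W, 2) flow field

# Toy example: 3 random basis flow fields on a 4x4 grid.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 4, 4, 2))
coeffs = np.array([0.9, 0.0, 0.1])  # mostly one basis field active
flow = reconstruct(basis, coeffs)
penalty = smooth_l1(coeffs)
```

In a learned model, a penalty like this would be added to the reconstruction loss so that gradient descent drives most coefficients toward zero while keeping the objective differentiable.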
Finally, a spike-based stereo algorithm and its fully neuromorphic implementation are presented that reconstruct dynamic visual scenes at 400 frames per second while consuming one watt of power on the IBM TrueNorth processor. Taken together, the presented models and implementations demonstrate how the dorsal visual pathway in the brain performs efficient motion perception and inform ideas for efficient computational vision systems.
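A common intuition behind spike-based stereo is coincidence detection: events on the same image row that fire at nearly the same time in the left and right sensors are likely projections of the same point, and their column offset gives the disparity. The toy sketch below shows that matching rule in plain Python; it is an illustrative assumption for this announcement, not the thesis algorithm or its TrueNorth implementation, and the event format `(time, row, col)` and parameters are hypothetical.

```python
def match_events(left, right, window=1.0, max_disp=30):
    """Pair left/right events on the same row that fire within `window`
    time units of each other; disparity is the column offset of the pair.

    left, right: lists of (time, row, col) events
    returns: list of (row, left_col, disparity) matches
    """
    matches = []
    for (tl, rl, cl) in left:
        best = None  # (time difference, disparity) of closest coincidence
        for (tr, rr, cr) in right:
            if rl != rr or abs(tl - tr) > window:
                continue  # different row, or not coincident in time
            d = cl - cr  # disparity: left column minus right column
            if 0 <= d <= max_disp and (best is None or abs(tl - tr) < best[0]):
                best = (abs(tl - tr), d)
        if best is not None:
            matches.append((rl, cl, best[1]))
    return matches
```

A neuromorphic implementation replaces these nested loops with populations of coincidence-detector neurons, one per candidate disparity, which is what makes high frame rates at low power possible.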