Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision

Cereb Cortex. 2018 Dec 1;28(12):4136-4160. doi: 10.1093/cercor/bhx268.

Abstract

A convolutional neural network (CNN) driven by image recognition has been shown to explain cortical responses to static pictures in ventral-stream areas. Here, we further showed that such a CNN could reliably predict and decode functional magnetic resonance imaging (fMRI) data from humans watching natural movies, despite its lack of any mechanism to account for temporal dynamics or feedback processing. Using separate data, encoding and decoding models were developed and evaluated for describing the bi-directional relationships between the CNN and the brain. Through the encoding models, the CNN-predicted areas covered not only the ventral stream but also the dorsal stream, albeit to a lesser degree; each single-voxel response was visualized as the specific pixel pattern that drove the response, revealing the distinct representation at individual cortical locations; and cortical activation was synthesized from natural images with high throughput to map category representation, contrast, and selectivity. Through the decoding models, fMRI signals were directly decoded to estimate the feature representations in both visual and semantic spaces, for direct visual reconstruction and semantic categorization, respectively. These results corroborate, generalize, and extend previous findings, and highlight the value of using deep learning, as an all-in-one model of the visual cortex, to understand and decode natural vision.
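
The encoding direction described above (CNN features predicting voxel responses, evaluated on separate data) can be illustrated with a minimal sketch. The abstract does not state which regression method was used, so ridge regression is an assumption here, as are all names (`fit_ridge`, `voxelwise_correlation`), the array sizes, and the synthetic data, which stand in for real CNN features and fMRI time series.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins (assumption): rows are time points, columns of X are
# CNN feature dimensions, columns of Y are voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 20, 5
true_weights = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ true_weights + 0.5 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ true_weights + 0.5 * rng.normal(size=(n_test, n_voxels))

# Fit on the training set, then score predictions on the held-out set.
W = fit_ridge(X_train, Y_train, alpha=1.0)
accuracy = voxelwise_correlation(Y_test, X_test @ W)
```

Under the same assumptions, a decoding model would swap the roles of the two matrices, for example `fit_ridge(Y_train, X_train)`, so that voxel responses predict feature representations. Fitting and scoring on different sets mirrors the "separate data" design mentioned in the abstract.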

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Brain Mapping
  • Deep Learning*
  • Female
  • Humans
  • Image Processing, Computer-Assisted
  • Magnetic Resonance Imaging
  • Models, Neurological*
  • Pattern Recognition, Visual / physiology*
  • Visual Cortex / physiology*
  • Young Adult