Reconstructing visual experiences from brain activity evoked by natural movies
Shinji Nishimoto et al. Curr Biol. 2011.
Abstract
Quantitative modeling of human brain activity can provide crucial insights about cortical representations [1, 2] and can form the basis for brain decoding devices [3-5]. Recent functional magnetic resonance imaging (fMRI) studies have modeled brain activity elicited by static visual patterns and have reconstructed these patterns from brain activity [6-8]. However, blood oxygen level-dependent (BOLD) signals measured via fMRI are very slow [9], so it has been difficult to model brain activity elicited by dynamic stimuli such as natural movies. Here we present a new motion-energy [10, 11] encoding model that largely overcomes this limitation. The model describes fast visual information and slow hemodynamics by separate components. We recorded BOLD signals in occipitotemporal visual cortex of human subjects who watched natural movies and fit the model separately to individual voxels. Visualization of the fit models reveals how early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder [8] by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
Copyright © 2011 Elsevier Ltd. All rights reserved.
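A key point of the abstract is that fast visual information and slow hemodynamics are captured by separate model components. The sketch below is a rough illustration of that separation (not the paper's exact parameterization): 1 Hz motion-energy features are combined with a set of per-voxel weights over hemodynamic delays to predict a BOLD time course. The function name, lag range, and array shapes are assumptions made for illustration only.

```python
import numpy as np

def predict_bold(features, hrf_weights):
    """Two-stage encoding prediction: fast visual features, slow hemodynamics.

    features:    (n_seconds, n_filters) motion-energy features at 1 Hz
                 (the fast visual component).
    hrf_weights: (n_lags, n_filters) per-voxel weights over hemodynamic
                 delays, fit separately to each voxel (the slow component).
    Returns the predicted BOLD time course for one voxel.  The lag range
    and shapes are illustrative, not the paper's exact parameterization.
    """
    n_lags, n_filters = hrf_weights.shape
    n_sec = features.shape[0]
    pred = np.zeros(n_sec)
    for lag in range(n_lags):
        delay = lag + 3                        # assumed hemodynamic delays, in seconds
        shifted = np.zeros_like(features)
        if delay < n_sec:
            shifted[delay:] = features[:n_sec - delay]
        pred += shifted @ hrf_weights[lag]     # weighted sum over filters at this delay
    return pred
```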
Conflict of interest statement
The authors declare no conflict of interest.
Figures

Figure 1. A, Stimuli first pass through a fixed set of nonlinear spatio-temporal motion-energy filters (shown in detail in panel B), and then through a set of hemodynamic response filters fit separately to each voxel. The summed output of the filter bank provides a prediction of BOLD signals. B, The nonlinear motion-energy filter bank consists of several filtering stages. Stimuli are first transformed into the Commission internationale de l'éclairage (CIE) L*A*B* color space and the color channels are stripped off. Luminance signals then pass through a bank of 6,555 spatio-temporal Gabor filters differing in position, orientation, direction, spatial and temporal frequency (see Supplemental Information for details). Motion energy is calculated by squaring and summing Gabor filters in quadrature. Finally, signals pass through a compressive nonlinearity and are temporally down-sampled to the fMRI sampling rate (1 Hz).
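The filtering stages in panel B (luminance extraction, quadrature Gabor filtering, squaring and summing, compressive nonlinearity, 1 Hz down-sampling) can be sketched in a few lines of Python. This is a minimal illustration with a tiny, hypothetical filter bank, not the paper's 6,555-filter implementation; all function names and parameter values here are assumptions.

```python
import numpy as np

def gabor_quadrature_pair(size=16, n_frames=10, sf=0.1, tf=0.15, ori=0.0):
    """Build one quadrature pair of space-time Gabor filters.

    sf: spatial frequency (cycles/pixel), tf: temporal frequency
    (cycles/frame), ori: orientation in radians.  Values are illustrative;
    the paper's bank spans 6,555 such filters.
    """
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    t = np.arange(n_frames) - n_frames // 2
    env = np.exp(-(x**2 + y**2) / (2 * (size / 4) ** 2))      # spatial envelope
    phase = 2 * np.pi * sf * (x * np.cos(ori) + y * np.sin(ori))
    drift = 2 * np.pi * tf * t[:, None, None]                  # temporal drift
    even = env * np.cos(phase[None] + drift)                   # (T, H, W)
    odd = env * np.sin(phase[None] + drift)
    return even, odd

def motion_energy(luminance, filters, fps=15):
    """Down-sampled, log-compressed motion energy for one movie.

    luminance: (n_frames, H, W) array of L* (luminance) values.
    filters: list of (even, odd) quadrature pairs.
    Returns (n_seconds, n_filters) features at the 1 Hz fMRI rate.
    """
    n_frames = luminance.shape[0]
    feats = []
    for even, odd in filters:
        T = even.shape[0]
        resp = np.zeros(n_frames)
        # valid temporal positions only; each filter is applied to one fixed
        # spatial patch for brevity (the real bank tiles many positions)
        for f in range(n_frames - T + 1):
            clip = luminance[f:f + T, :even.shape[1], :even.shape[2]]
            e = (clip * even).sum()
            o = (clip * odd).sum()
            resp[f + T // 2] = e**2 + o**2      # quadrature motion energy
        feats.append(np.log1p(resp))            # compressive nonlinearity
    feats = np.stack(feats, axis=1)
    n_sec = n_frames // fps
    # average within each second to reach the 1 Hz fMRI sampling rate
    return feats[:n_sec * fps].reshape(n_sec, fps, -1).mean(axis=1)
```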

Figure 2. A, (top) The static encoding model includes only Gabor filters that are not sensitive to motion. (bottom) Prediction accuracy of the static model is shown on a flattened map of the cortical surface of one subject (S1). Prediction accuracy is relatively poor. B, The non-directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies, but motion in opponent directions is pooled. Prediction accuracy of this model is better than that of the static model. C, The directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies and directions. This model provides the most accurate predictions of all models tested. D and E, Voxel-wise comparisons of prediction accuracy between the three models. The directional motion-energy model performs significantly better than the other two models, although the difference between the non-directional and directional motion models is small. See also Figure S1 for subject- and area-wise comparisons. F, The spatial receptive field of one voxel (left), and its spatial and temporal frequency selectivity (right). This receptive field is located near the fovea, and it is high-pass for spatial frequency and low-pass for temporal frequency. This voxel thus prefers static or slow-speed motion. G, Receptive field for a second voxel. This receptive field is located in the lower periphery, and it is band-pass for spatial frequency and high-pass for temporal frequency. This voxel thus prefers higher-speed motion than the voxel in F. H, Comparison of retinotopic angle maps estimated using (top) the motion-energy encoding model and (bottom) conventional multi-focal mapping on a flattened cortical map [47]. The angle maps are similar, even though they were estimated using independent data sets and methods. I, Comparison of eccentricity maps estimated as in panel H. The maps are similar except in the far periphery, where the multi-focal mapping stimulus was coarse. J, Optimal speed projected onto a flattened map as in panel H. Voxels near the fovea tend to prefer slow-speed motion, while those in the periphery tend to prefer high-speed motion. See also Figure S1B for subject-wise comparisons.
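Panels A-E compare encoding models by how well each voxel's predicted responses match held-out BOLD signals. A minimal sketch of that voxel-wise fit-and-score step is below; ridge regression is used purely for illustration, whereas the paper fits separate hemodynamic response filters to each voxel, and the variable names and regularization value are assumptions.

```python
import numpy as np

def fit_and_score(X_train, Y_train, X_test, Y_test, alpha=100.0):
    """Voxel-wise linear encoding fit and prediction accuracy.

    X_*: (n_timepoints, n_features) motion-energy features, already
         expanded over candidate hemodynamic delays.
    Y_*: (n_timepoints, n_voxels) BOLD signals.
    Returns r: (n_voxels,) correlation between predicted and observed
    responses on the held-out test set.
    """
    n_feat = X_train.shape[1]
    # closed-form ridge solution, one weight vector per voxel
    W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(n_feat),
                        X_train.T @ Y_train)
    Y_hat = X_test @ W
    # per-voxel Pearson correlation between prediction and observation
    Yh = (Y_hat - Y_hat.mean(0)) / (Y_hat.std(0) + 1e-12)
    Yo = (Y_test - Y_test.mean(0)) / (Y_test.std(0) + 1e-12)
    return (Yh * Yo).mean(axis=0)
```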

Figure 3. A, Identification accuracy for one subject (S1). The test data in our experiment consisted of 486 volumes (seconds) of BOLD signals evoked by the test movies. The estimated model yielded 486 volumes of BOLD signals predicted for the same movies. The brightness of the point in the mth column and nth row represents the log-likelihood (see Supplemental Information) of the BOLD signals evoked at the mth second given the BOLD signal predicted at the nth second. The highest log-likelihood in each column is designated by a red circle and thus indicates the choice of the identification algorithm. B, Temporal offset between the correct timing and the timing identified by the algorithm, for the same subject shown in panel A. The algorithm was correct to within ± one volume (second) 95% of the time (464/486); chance performance is less than 1% (3/486, i.e., three volumes centered on the correct timing). C, Scaling of identification accuracy with set size. To understand how identification accuracy scales with the size of the stimulus set, we enlarged the identification stimulus set to include additional stimuli drawn from a natural movie database (but not actually used in the experiment). For all three subjects, identification accuracy (within ± one volume) is greater than 75% even when the set of potential movies includes one million clips. This is far above chance (gray dashed line).
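The identification procedure in panel A amounts to picking, for each observed volume, the predicted volume with the highest log-likelihood, and scoring a hit if the pick falls within ± one volume of the correct timing. A minimal sketch follows; the isotropic Gaussian noise model is a simplifying assumption (the paper's likelihood uses a noise model estimated from the data; see Supplemental Information), and the function names are hypothetical.

```python
import numpy as np

def identify(observed, predicted, noise_cov_inv=None):
    """Match each observed volume to the best-explaining predicted volume.

    observed, predicted: (n_volumes, n_voxels) BOLD arrays for the test
    movies (486 volumes in the experiment).
    Returns picks: (n_volumes,) index of the predicted volume with the
    highest log-likelihood for each observed volume.
    """
    n_vol = observed.shape[0]
    picks = np.empty(n_vol, dtype=int)
    for m in range(n_vol):
        diff = predicted - observed[m]              # (n_candidates, n_voxels)
        if noise_cov_inv is None:
            ll = -0.5 * (diff ** 2).sum(axis=1)     # isotropic Gaussian log-likelihood
        else:
            ll = -0.5 * np.einsum('ij,jk,ik->i', diff, noise_cov_inv, diff)
        picks[m] = np.argmax(ll)                    # the "red circle" in each column
    return picks

def accuracy_within_one_volume(picks):
    """Fraction of volumes identified to within ± one second."""
    correct = np.abs(picks - np.arange(len(picks))) <= 1
    return correct.mean()
```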

Figure 4. A, First row: Three frames from a natural movie used in the experiment, taken one second apart. Second through sixth rows: frames from the five clips with the highest posterior probability. The maximum a posteriori (MAP) reconstruction is shown in row two. Seventh row: The averaged high posterior (AHP) reconstruction. The MAP provides a good reconstruction of the second and third frames, while the AHP provides more robust reconstructions across frames. B and C, Additional examples of reconstructions, in the same format as panel A. D, Reconstruction accuracy (correlation in motion-energy; see Supplemental Information) for all three subjects. Error bars indicate ± 1 s.e.m. across one-second clips. Both the MAP and AHP reconstructions are significant, though the AHP reconstructions are significantly better than the MAP reconstructions. Dashed lines show chance performance (P=0.01). See also Figure S2.
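Conceptually, the decoder ranks clips from a sampled natural-movie prior by the likelihood of the observed BOLD response under each clip's predicted response, takes the top-ranked clip as the MAP reconstruction, and averages the highest-posterior clips to form the AHP reconstruction. The sketch below illustrates this with an isotropic likelihood and an unweighted average; these simplifications, along with the function names and the top_k value, are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def reconstruct(observed_volume, prior_clips, encoding_predict, top_k=100):
    """MAP and averaged-high-posterior (AHP) reconstruction of one second.

    observed_volume: (n_voxels,) BOLD response for one second.
    prior_clips: (n_clips, T, H, W) one-second clips sampled from a
    natural-movie library (the sampled natural movie prior).
    encoding_predict: function mapping a clip to its predicted
    (n_voxels,) BOLD response under the fit encoding models.
    With a flat prior over library clips, ranking by posterior reduces
    to ranking by likelihood.
    """
    # log-likelihood of the observed response given each candidate clip
    log_lik = np.array([
        -0.5 * np.sum((encoding_predict(clip) - observed_volume) ** 2)
        for clip in prior_clips
    ])
    order = np.argsort(log_lik)[::-1]               # highest posterior first
    map_clip = prior_clips[order[0]]                # MAP reconstruction
    ahp_clip = prior_clips[order[:top_k]].mean(0)   # averaged high posterior
    return map_clip, ahp_clip
```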
Similar articles
- Bayesian reconstruction of natural images from human brain activity. Naselaris T, Prenger RJ, Kay KN, Oliver M, Gallant JL. Neuron. 2009 Sep 24;63(6):902-15. doi: 10.1016/j.neuron.2009.09.006. PMID: 19778517. Free PMC article.
- Integration of EEG source imaging and fMRI during continuous viewing of natural movies. Whittingstall K, Bartels A, Singh V, Kwon S, Logothetis NK. Magn Reson Imaging. 2010 Oct;28(8):1135-42. doi: 10.1016/j.mri.2010.03.042. Epub 2010 Jun 25. PMID: 20579829.
- Han K, Wen H, Shi J, Lu KH, Zhang Y, Fu D, Liu Z. Neuroimage. 2019 Sep;198:125-136. doi: 10.1016/j.neuroimage.2019.05.039. Epub 2019 May 16. PMID: 31103784. Free PMC article.
- Human cortical areas underlying the perception of optic flow: brain imaging studies. Greenlee MW. Int Rev Neurobiol. 2000;44:269-92. doi: 10.1016/s0074-7742(08)60746-1. PMID: 10605650. Review.
- Modeling correlated noise is necessary to decode uncertainty. van Bergen RS, Jehee JFM. Neuroimage. 2018 Oct 15;180(Pt A):78-87. doi: 10.1016/j.neuroimage.2017.08.015. Epub 2017 Aug 8. PMID: 28801251. Review.
Cited by
- On the encoding of natural music in computational models and human brains. Kim SG. Front Neurosci. 2022 Sep 20;16:928841. doi: 10.3389/fnins.2022.928841. eCollection 2022. PMID: 36203808. Free PMC article. Review.
- Nakai T, Nishimoto S. Commun Biol. 2022 Nov 14;5(1):1245. doi: 10.1038/s42003-022-04221-y. PMID: 36376490. Free PMC article.
- Bordier C, Puja F, Macaluso E. Neuroimage. 2013 Feb 15;67:213-26. doi: 10.1016/j.neuroimage.2012.11.031. Epub 2012 Nov 29. PMID: 23202431. Free PMC article.
- Visual dictionaries as intermediate features in the human brain. Ramakrishnan K, Scholte HS, Groen II, Smeulders AW, Ghebreab S. Front Comput Neurosci. 2015 Jan 15;8:168. doi: 10.3389/fncom.2014.00168. eCollection 2014. PMID: 25642183. Free PMC article.
- Machine learning in neuroimaging: from research to clinical practice. Nenning KH, Langs G. Radiologie (Heidelb). 2022 Dec;62(Suppl 1):1-10. doi: 10.1007/s00117-022-01051-1. Epub 2022 Aug 31. PMID: 36044070. Free PMC article. Review.
References
- Wu MC, David SV, Gallant JL. Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci. 2006;29:477–505. PubMed.
- Haynes JD, Rees G. Decoding mental states from brain activity in humans. Nat Rev Neurosci. 2006;7:523–534. PubMed.
- Kay KN, Gallant JL. I can see what you see. Nat Neurosci. 2009;12:245. PubMed.