
Reconstructing visual experiences from brain activity evoked by natural movies

Shinji Nishimoto et al. Curr Biol. 2011.

Abstract

Quantitative modeling of human brain activity can provide crucial insights about cortical representations [1, 2] and can form the basis for brain decoding devices [3-5]. Recent functional magnetic resonance imaging (fMRI) studies have modeled brain activity elicited by static visual patterns and have reconstructed these patterns from brain activity [6-8]. However, blood oxygen level-dependent (BOLD) signals measured via fMRI are very slow [9], so it has been difficult to model brain activity elicited by dynamic stimuli such as natural movies. Here we present a new motion-energy [10, 11] encoding model that largely overcomes this limitation. The model describes fast visual information and slow hemodynamics by separate components. We recorded BOLD signals in occipitotemporal visual cortex of human subjects who watched natural movies and fit the model separately to individual voxels. Visualization of the fit models reveals how early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder [8] by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.

Copyright © 2011 Elsevier Ltd. All rights reserved.
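
The abstract describes a two-component model in which fast motion-energy features of the movie are mapped to slow BOLD responses through hemodynamic weights fit separately to each voxel. The sketch below illustrates that idea only in outline: the delay values and the use of ridge regression are assumptions for illustration, not the paper's actual fitting procedure (which is specified in its Supplemental Information).

```python
import numpy as np

def build_delayed_features(features, delays=(2, 3, 4, 5)):
    """Stack copies of the feature matrix shifted by several delays (in volumes)
    so that slow hemodynamics can be captured by one weight per feature and delay."""
    n_time, n_feat = features.shape
    delayed = np.zeros((n_time, n_feat * len(delays)))
    for i, d in enumerate(delays):
        delayed[d:, i * n_feat:(i + 1) * n_feat] = features[:n_time - d]
    return delayed

def fit_voxel(features, bold, alpha=1.0):
    """Fit one voxel's weights; ridge regression is an illustrative choice,
    not necessarily the procedure used in the paper."""
    X = build_delayed_features(features)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ bold)

def predict_voxel(features, weights):
    """Predict the voxel's BOLD time course from stimulus features."""
    return build_delayed_features(features) @ weights

# Toy usage with random data standing in for motion-energy features and BOLD.
rng = np.random.default_rng(0)
feats = rng.standard_normal((600, 50))
bold = rng.standard_normal(600)
weights = fit_voxel(feats, bold)
prediction = predict_voxel(feats, weights)
```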

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Schematic diagram of the motion-energy encoding model

A, Stimuli first pass through a fixed set of nonlinear spatio-temporal motion-energy filters (shown in detail in panel B), and then through a set of hemodynamic response filters fit separately to each voxel. The summed output of the filter bank provides a prediction of BOLD signals. B, The nonlinear motion-energy filter bank consists of several filtering stages. Stimuli are first transformed into the Commission internationale de l'éclairage (CIE) L*a*b* color space and the color channels are discarded. Luminance signals then pass through a bank of 6,555 spatio-temporal Gabor filters differing in position, orientation, direction, and spatial and temporal frequency (see Supplemental Information for details). Motion energy is calculated by squaring and summing the outputs of quadrature pairs of Gabor filters. Finally, signals pass through a compressive nonlinearity and are temporally down-sampled to the fMRI sampling rate (1 Hz).
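
As a rough illustration of one channel of such a filter bank, the sketch below builds a single quadrature pair of space-time Gabor filters, squares and sums their outputs, applies a compressive nonlinearity (a log here), and downsamples to 1 Hz. All parameter values, the choice of log compression, and the fixed filter position are assumptions for illustration; the actual 6,555-filter bank is specified in the Supplemental Information.

```python
import numpy as np

def gabor_pair(size=16, frames=8, sf=0.15, tf=0.25, theta=0.0, sigma=4.0):
    """Return an even/odd (quadrature) pair of space-time Gabor filters.
    sf: cycles/pixel, tf: cycles/frame, theta: orientation in radians."""
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    t = np.arange(frames)[:, None, None]
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    phase = 2 * np.pi * (sf * u - tf * t)   # drifting carrier gives direction tuning
    return envelope * np.cos(phase), envelope * np.sin(phase)

def motion_energy(movie, even, odd):
    """Quadrature energy of one filter applied at a fixed spatial position,
    computed over a sliding window of frames."""
    frames, h, w = even.shape
    n_out = movie.shape[0] - frames + 1
    patch = movie[:, :h, :w]                # fixed receptive-field location
    resp_e = np.array([np.sum(patch[i:i + frames] * even) for i in range(n_out)])
    resp_o = np.array([np.sum(patch[i:i + frames] * odd) for i in range(n_out)])
    return np.log1p(resp_e ** 2 + resp_o ** 2)   # square, sum, compress

# Toy usage: a random luminance movie at 15 frames/s; downsample energy to 1 Hz.
rng = np.random.default_rng(1)
movie = rng.standard_normal((150, 32, 32))
energy = motion_energy(movie, *gabor_pair())
energy_1hz = energy[::15]                    # crude 1 Hz downsampling
```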

Figure 2. The directional motion-energy model captures motion information

A, (top) The static encoding model includes only Gabor filters that are not sensitive to motion. (bottom) Prediction accuracy of the static model is shown on a flattened map of the cortical surface of one subject (S1). Prediction accuracy is relatively poor. B, The non-directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies, but motion in opponent directions is pooled. Prediction accuracy of this model is better than that of the static model. C, The directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies and directions. This model provides the most accurate predictions of all models tested. D and E, Voxel-wise comparisons of prediction accuracy between the three models. The directional motion-energy model performs significantly better than the other two models, although the difference between the non-directional and directional motion models is small. See also Figure S1 for subject- and area-wise comparisons. F, The spatial receptive field of one voxel (left), and its spatial and temporal frequency selectivity (right). This receptive field is located near the fovea, and it is high-pass for spatial frequency and low-pass for temporal frequency. This voxel thus prefers static or slow-speed motion. G, Receptive field for a second voxel. This receptive field is located in the lower periphery, and it is band-pass for spatial frequency and high-pass for temporal frequency. This voxel thus prefers higher-speed motion than the voxel in F. H, Comparison of retinotopic angle maps estimated using (top) the motion-energy encoding model and (bottom) conventional multi-focal mapping on a flattened cortical map [47]. The angle maps are similar, even though they were estimated using independent data sets and methods. I, Comparison of eccentricity maps estimated as in panel H. The maps are similar except in the far periphery, where the multi-focal mapping stimulus was coarse. J, Optimal speed projected onto a flattened map as in panel H. Voxels near the fovea tend to prefer slow-speed motion, while those in the periphery tend to prefer high-speed motion. See also Figure S1B for subject-wise comparisons.
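
The voxel-wise comparisons in panels A-E rest on a prediction-accuracy score for each voxel on held-out data. A minimal sketch, assuming accuracy is simply the correlation between measured and predicted BOLD (the paper's exact metric is given in its Supplemental Information):

```python
import numpy as np

def prediction_accuracy(measured, predicted):
    """Correlation between measured and predicted BOLD for each voxel.
    Both inputs have shape (time, voxels)."""
    m = measured - measured.mean(axis=0)
    p = predicted - predicted.mean(axis=0)
    return (m * p).sum(axis=0) / np.sqrt((m ** 2).sum(axis=0) * (p ** 2).sum(axis=0))

# Toy comparison of two hypothetical models on held-out data (100 voxels).
rng = np.random.default_rng(2)
bold = rng.standard_normal((486, 100))
pred_weak = bold + 2.0 * rng.standard_normal(bold.shape)    # e.g., a static model
pred_strong = bold + 1.0 * rng.standard_normal(bold.shape)  # e.g., a directional model
acc_weak = prediction_accuracy(bold, pred_weak)
acc_strong = prediction_accuracy(bold, pred_strong)
fraction_improved = (acc_strong > acc_weak).mean()
```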

Figure 3. Identification analysis

A, Identification accuracy for one subject (S1). The test data in our experiment consisted of 486 volumes (seconds) of BOLD signals evoked by the test movies. The estimated model yielded 486 volumes of BOLD signals predicted for the same movies. The brightness of the point in the mth column and nth row represents the log-likelihood (see Supplemental Information) of the BOLD signals evoked at the mth second given the BOLD signal predicted at the nth second. The highest log-likelihood in each column is designated by a red circle and thus indicates the choice of the identification algorithm. B, Temporal offset between the correct timing and the timing identified by the algorithm, for the same subject shown in panel A. The algorithm was correct to within ± one volume (second) 95% of the time (464/486); chance performance is less than 1% (3/486; i.e., three volumes centered on the correct timing). C, Scaling of identification accuracy with set size. To understand how identification accuracy scales with the size of the stimulus set, we enlarged the identification stimulus set to include additional stimuli drawn from a natural movie database (but not actually used in the experiment). For all three subjects, identification accuracy (within ± one volume) is greater than 75% even when the set of potential movies includes one million clips. This is far above chance (gray dashed line).
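
The identification procedure matches each second of evoked BOLD signal to the second of predicted signal with the highest log-likelihood. The sketch below assumes an isotropic Gaussian noise model, so maximizing the log-likelihood reduces to minimizing squared distance; the paper's actual likelihood is defined in its Supplemental Information.

```python
import numpy as np

def identify(evoked, predicted):
    """For each evoked volume, return the index of the best-matching predicted
    volume. Negative squared Euclidean distance stands in for a Gaussian
    log-likelihood; both inputs have shape (time, voxels)."""
    dists = (
        (evoked ** 2).sum(axis=1)[:, None]
        + (predicted ** 2).sum(axis=1)[None, :]
        - 2.0 * evoked @ predicted.T
    )
    return np.argmin(dists, axis=1)

# Toy usage: 486 volumes, 200 voxels, noisy "evoked" copies of the predictions.
rng = np.random.default_rng(3)
predicted = rng.standard_normal((486, 200))
evoked = predicted + 0.5 * rng.standard_normal(predicted.shape)
picks = identify(evoked, predicted)
accuracy_within_one = np.mean(np.abs(picks - np.arange(486)) <= 1)
```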

Figure 4. Reconstructions of natural movies from BOLD signals

A, First row: Three frames from a natural movie used in the experiment, taken one second apart. Second through sixth rows: frames from the five clips with the highest posterior probability. The maximum a posteriori (MAP) reconstruction is shown in row two. Seventh row: The averaged high posterior (AHP) reconstruction. The MAP provides a good reconstruction of the second and third frames, while the AHP provides more robust reconstructions across frames. B and C, Additional examples of reconstructions, in the same format as panel A. D, Reconstruction accuracy (correlation in motion energy; see Supplemental Information) for all three subjects. Error bars indicate ± 1 s.e.m. across one-second clips. Both the MAP and AHP reconstructions are significant, and the AHP reconstructions are significantly better than the MAP reconstructions. Dashed lines show chance performance (P=0.01). See also Figure S2.
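
The two reconstructions shown here differ only in how the posterior over prior clips is used: the MAP reconstruction takes the single clip with the highest posterior, while the AHP reconstruction averages the top clips. A minimal sketch, assuming posterior-weighted averaging of the top 100 clips (the paper's exact averaging rule is given in its Supplemental Information):

```python
import numpy as np

def map_and_ahp(prior_clips, log_posterior, top_k=100):
    """prior_clips: (n_clips, frames, height, width) clips sampled from a natural
    movie prior; log_posterior: (n_clips,) log posterior of each clip given the
    measured BOLD signals. Returns the MAP clip and a posterior-weighted average."""
    order = np.argsort(log_posterior)[::-1]
    map_clip = prior_clips[order[0]]                 # single most probable clip
    top = order[:top_k]
    weights = np.exp(log_posterior[top] - log_posterior[top].max())
    weights /= weights.sum()
    ahp_clip = np.tensordot(weights, prior_clips[top], axes=1)  # weighted average
    return map_clip, ahp_clip

# Toy usage with random stand-ins for the prior clips and their posteriors.
rng = np.random.default_rng(4)
clips = rng.random((1000, 3, 16, 16))
log_post = rng.standard_normal(1000)
map_rec, ahp_rec = map_and_ahp(clips, log_post)
```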

References

    1. Wu MC, David SV, Gallant JL. Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci. 2006;29:477–505.
    2. Naselaris T, Kay KN, Nishimoto S, Gallant JL. Encoding and decoding in fMRI. Neuroimage. 2011;56:400–410.
    3. Kamitani Y, Tong F. Decoding the visual and subjective contents of the human brain. Nat Neurosci. 2005;8:679–685.
    4. Haynes JD, Rees G. Decoding mental states from brain activity in humans. Nat Rev Neurosci. 2006;7:523–534.
    5. Kay KN, Gallant JL. I can see what you see. Nat Neurosci. 2009;12:245.
