
Getting the Cocktail Party Started: Masking Effects in Speech Perception

Samuel Evans et al. J Cogn Neurosci. 2016 Mar.

Abstract

Spoken conversations typically take place in noisy environments, and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous fMRI while they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioral task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream and that individuals who perform better in speech-in-noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment; activity was found within right-lateralized frontal regions consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise.


Figures

Figure 1

(A) Oscillograms and spectrograms of masking stimuli. SP = speech; ROT = rotated speech; SMN = speech-modulated noise. (B) The organisation of a set of trials (left, rounded boxes) and the statistical models (right). Epoch Model (red): the first column in the design matrix models the presence of the voice of speaker 1, and thus models all auditory trials for their full duration, excluding "silent" implicit baseline trials. Additional columns model the presence of competing masking sounds derived from speaker 2, with each column representing a different masking sound. These events partially overlap with the events specified in the first column. This design identifies the unique variance associated with hearing speaker 1 in the clear and the additional effect of masking with competing sounds. Onset Model (red + orange): the design matrix models events in the same way as the Epoch Model (red), with additional events modelling the onsets of clear speech and masking (orange). This allows the identification of unique variance associated with the onset of masking. (C) Behavioural post-scanning accuracy for each condition. Bar graphs of the mean beta value for each condition, with within-subject error bars representing one standard error (Loftus & Masson, 1994).
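The epoch and onset models described in (B) follow a standard event-related GLM construction: boxcar regressors spanning the masked epochs, plus additional regressors for masking onsets, each convolved with a haemodynamic response function before being entered as columns of the design matrix. The sketch below illustrates that construction in Python with simulated timings; the TR, number of scans, onsets, durations, and the double-gamma HRF are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of an epoch-style design matrix like the one in Figure 1B.
# All timings and the HRF shape are assumed for illustration only.
import numpy as np
from scipy.stats import gamma

TR = 2.0                       # assumed repetition time (s)
n_scans = 300                  # assumed number of volumes
frame_times = np.arange(n_scans) * TR

def hrf(t):
    """Simple double-gamma haemodynamic response function."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def regressor(onsets, durations):
    """Boxcar for a set of epochs, convolved with the HRF, sampled at scan times."""
    t = np.arange(int(n_scans * TR))            # 1 s resolution
    box = np.zeros(t.size)
    for onset, dur in zip(onsets, durations):
        box[(t >= onset) & (t < onset + dur)] = 1.0
    conv = np.convolve(box, hrf(np.arange(32)))[: box.size]
    return np.interp(frame_times, t, conv)

# Column 1: voice of speaker 1, present for the full duration of every auditory trial.
speech_epochs = regressor(onsets=[10, 110, 210, 310, 410], durations=[60] * 5)

# Additional columns: one per masker type, partially overlapping column 1.
sp_masker  = regressor(onsets=[30],  durations=[40])   # competing speech
rot_masker = regressor(onsets=[130], durations=[40])   # rotated speech
smn_masker = regressor(onsets=[230], durations=[40])   # speech-modulated noise

X = np.column_stack([speech_epochs, sp_masker, rot_masker, smn_masker,
                     np.ones(n_scans)])                 # constant term
print(X.shape)  # (300, 5)
```

The onset model in the caption would simply append further columns built the same way, using brief events time-locked to the start of clear speech and of each masker.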

Figure 2

Masking and intelligibility networks. White – regions responding to clear speech as compared to the resting baseline. Red – regions responding more to clear than to masked speech. Blue – regions responding more to masked than to clear speech. Orange – regions in which activity correlated at the whole-brain level with accuracy in comprehension of speech in the post-scanning masking tasks (averaged across conditions). Bar graphs of the mean beta value for each condition, with within-subject error bars representing one standard error (Loftus & Masson, 1994). Scatter plot of the relationship between neural activity and comprehension of masked speech in the post-scanning masking tasks.
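The within-subject error bars cited in these captions (Loftus & Masson, 1994) are computed from the subject × condition error term of a repeated-measures design rather than from between-subject variability, so the bars reflect the precision of the condition differences. A minimal sketch of that calculation is below, using made-up data; the numbers of subjects and conditions and the simulated values are assumptions for illustration.

```python
# Within-subject standard error in the style of Loftus & Masson (1994):
# pool the subject-by-condition interaction variance and divide by n.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_conditions = 20, 4                       # assumed design size
data = rng.normal(loc=[1.0, 1.2, 1.5, 1.1], scale=0.3,
                  size=(n_subjects, n_conditions))     # simulated beta values

grand_mean = data.mean()
subject_means = data.mean(axis=1, keepdims=True)
condition_means = data.mean(axis=0, keepdims=True)

# Residuals after removing subject and condition main effects
# (i.e. the subject x condition interaction).
residuals = data - subject_means - condition_means + grand_mean
df_error = (n_subjects - 1) * (n_conditions - 1)
ms_error = (residuals ** 2).sum() / df_error

# Pooled within-subject standard error, the same for every condition.
se_within = np.sqrt(ms_error / n_subjects)
print(condition_means.ravel(), se_within)
```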

Figure 3

Regions showing increasing activation in response to masking sounds with increasing informational content. Bar graphs of the mean beta value for each condition, with within-subject error bars representing one standard error (Loftus & Masson, 1994).

Figure 4

(A) Activation overlap map for the different contrasts, including the conjunction of [SP > ROT] ∩ [SP > SMN] (red box) with a plot of the response. (B) Region-of-interest analyses comparing the neural response to the intelligibility of the masking stimulus [SP > ROT] for bilateral anterior and posterior STS. Plots show mean beta values for each condition, with within-subject error bars representing one standard error (Loftus & Masson, 1994). (C) Lateralisation curve for [SP > ROT] within temporal cortex.
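The lateralisation curve in panel (C) is the kind of summary typically built from a laterality index, LI = (L - R) / (L + R), evaluated over a range of statistical thresholds, where L and R count suprathreshold voxels in homologous left- and right-hemisphere regions. The sketch below shows one way such a curve could be computed; the simulated t-values and the threshold range are assumptions, not the paper's data or procedure.

```python
# Sketch of a lateralisation curve: laterality index as a function of threshold.
# The voxelwise t-values here are simulated, not real data.
import numpy as np

rng = np.random.default_rng(1)
left_t = rng.normal(1.0, 1.5, size=5000)    # t-values, left temporal voxels (assumed)
right_t = rng.normal(0.5, 1.5, size=5000)   # t-values, right temporal voxels (assumed)

thresholds = np.linspace(0.0, 5.0, 26)
curve = []
for thr in thresholds:
    n_left = np.count_nonzero(left_t > thr)
    n_right = np.count_nonzero(right_t > thr)
    total = n_left + n_right
    curve.append((n_left - n_right) / total if total else np.nan)

# LI near +1 means strongly left-lateralised activation; near -1, right-lateralised.
for thr, li in zip(thresholds[::5], curve[::5]):
    print(f"t > {thr:.1f}: LI = {li:+.2f}")
```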

Figure 5

(A) Average effect of masking onsets – activation associated with the onset of masking sounds in the presence of on-going scanner noise and target speech. The red rendering shows the effect of masking onsets and the time course of the response in selected regions in plots (1) and (2). The blue rendering shows regions responding more to masked epochs, as compared to clear speech, and the time course of the responses in selected regions in plots (3) and (4). (B) Modulation of onset effects by informational content. (C) Clear speech onsets – activation associated with the onset of clear speech in the presence of on-going scanner noise. (D) Conjunction of clear speech and masking onsets. Bar graphs show the mean beta value for each condition, with within-subject error bars representing one standard error (Loftus & Masson, 1994). (E) Lateralisation curves for frontal cortex for (i) masking onsets and (ii) clear speech onsets.
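One common way to compute a conjunction such as the one in panel (D) (regions responding to both clear speech onsets and masking onsets) is the minimum-statistic approach, in which a voxel survives only if it exceeds the threshold in both contrasts. The sketch below illustrates that approach with simulated t-maps and an assumed voxelwise threshold; it is not a description of the paper's actual analysis pipeline.

```python
# Minimum-statistic conjunction of two contrasts on simulated t-maps.
import numpy as np

rng = np.random.default_rng(2)
shape = (4, 4, 4)                               # toy volume for illustration
t_clear_onsets = rng.normal(2.0, 1.0, shape)    # t-map: clear speech onsets (simulated)
t_masking_onsets = rng.normal(2.0, 1.0, shape)  # t-map: masking onsets (simulated)

t_threshold = 3.1                               # assumed voxelwise threshold

# A voxel is in the conjunction only if both effects exceed threshold,
# i.e. the minimum of the two statistics exceeds threshold.
conjunction_t = np.minimum(t_clear_onsets, t_masking_onsets)
conjunction_mask = conjunction_t > t_threshold
print("voxels in conjunction:", int(conjunction_mask.sum()))
```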
