Review
Between sound and perception: reviewing the search for a neural code

https://doi.org/10.1016/S0378-5955(01)00259-3

Abstract

This review investigates the roles of representation, transformation and coding as part of a hierarchical process between sound and perception. This is followed by a survey of how speech sounds, and elements thereof, are represented in the activity patterns along the auditory pathway. Then the evidence is reviewed for a place representation in auditory cortex of the texture features of sound (frequency, periodicity pitch, harmonicity in vowels, and the direction and speed of frequency modulation) and for a temporal and synchrony representation of sound contours (onsets, offsets, voice onset time, and low-rate amplitude modulation). Contours mark changes and transitions in sound, and auditory cortex appears particularly sensitive to these dynamic aspects of sound. Texture determines which neurons, both cortical and subcortical, are activated by the sound, whereas the contours modulate the activity of those neurons. Because contours are temporally represented in the majority of neurons activated by the texture aspects of sound, each of these neurons is part of an ensemble formed by the combination of contour and texture sensitivity. A multiplexed coding of complex sound is proposed whereby the contours set up widespread synchrony across those neurons, in all auditory cortical areas, that are activated by the texture of sound.

Introduction

Approximately 35 years after the publication of Kiang’s monograph on the discharge patterns of auditory nerve (AN) fibers in the cat (Kiang et al., 1965), the physiology of the AN is well known. An important aspect thereof is the identification of the targets of the myelinated type I and unmyelinated type II fibers in the three subdivisions of the cochlear nucleus (CN) (Ryugo, 1992, Liberman, 1991, Liberman, 1993). For the type I fibers, the frequency tuning curves (FTCs), period histograms and post-stimulus time histograms (PSTHs) for simple stimuli, e.g. clicks and tone bursts, are well documented. The responses to more complex stimuli, such as elements of speech, can generally be predicted from those to simpler ones such as tones, two-tone combinations and clicks (Sachs, 1984). Perhaps within 10 years of Young’s (1998) detailed review of the CN we will reach a comparable level of understanding there, but the multiplicity of cell types and circuitry (Rhode, 1991, Oertel, 1999) makes this a more difficult endeavor. The ventral CN (VCN) extracts and enhances the frequency and timing information that is multiplexed in the firing patterns of the AN fibers, and distributes the results via two main pathways: the sound localization path and the sound identification path. The anterior part of the VCN (AVCN) mainly serves sound localization; its two types of bushy cells provide input to the superior olivary complex (SOC), where interaural time differences (ITDs) and interaural level differences (ILDs) are mapped for each frequency separately (Carr, 1993). The posterior part of the VCN (PVCN) extracts across-frequency timing aspects through its broadly tuned octopus cells, whereas its stellate cells, as well as those of the AVCN, compute estimates of the spectral representation of sound. This temporal and spectral information is carried, via the monaural nuclei of the lateral lemniscus (LL), to the central nucleus of the inferior colliculus (ICC).
This sound identification path carries a level-tolerant representation of complex spectra (e.g. of vowels) created by the chopper (stellate) neurons in the VCN (May et al., 1998). The temporal and spectral aspects of sound are both mapped topographically in the ICC, but along mutually orthogonal axes (Langner, 1992). The output from the SOC also arrives at the ICC, after some additional elaboration by the neurons in the dorsal nucleus of the LL (Wu and Kelly, 1995). In the ICC, ITDs and ILDs are combined into frequency-specific maps of interaural differences (Yin and Chan, 1988). Combining the frequency-specific ITD and ILD maps from the ICC results in a map of sound location in the external nucleus of the inferior colliculus (ICX). This auditory space map is subsequently represented in the deep layers of the superior colliculus (SC) (Middlebrooks, 1988) and aligned with the retinotopic map of visual space and the motor map of gaze (Hyde and Knudsen, 2000, Knudsen et al., 1987).
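The scale of the interaural time differences that the SOC must map can be illustrated with a small calculation. The sketch below uses Woodworth's spherical-head approximation; the head radius and speed of sound are assumed round values, not figures taken from this review:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound_ms=343.0):
    """Interaural time difference (s) for a distant source, using
    Woodworth's spherical-head model: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound_ms * (theta + math.sin(theta))

# ITD grows from zero at the midline to its maximum for a source
# on the axis through the ears (90 degrees azimuth).
for azimuth in (0, 30, 60, 90):
    print(f"{azimuth:2d} deg -> {woodworth_itd(azimuth) * 1e6:5.0f} us")
```

For these assumed values the model tops out near 650 μs for a source on the interaural axis, the same order of magnitude as the roughly 800 μs upper bound usually quoted for humans.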

The inferior colliculi (ICs) and SCs form an important endpoint of the time-specialized part of the auditory nervous system (Trussell, 1997). In the IC, topographic maps are found for the frequency, periodicity, and location of a sound. The spatial map is both necessary and sufficient for adequate orientation to a sound source (Cohen and Knudsen, 1999). The ICC is the first level where physiological correlates of critical-band properties, such as level independence, are present (Ehret and Merzenich, 1988a, Ehret and Merzenich, 1988b, Schreiner and Langner, 1997).

Over the last two decades the notion has slowly gained acceptance that the auditory system evolved to allow the perception of sounds that are of survival value, and that the auditory system therefore has to be studied using such sounds. This neuroethological emphasis (Ohlemiller et al., 1994) has greatly advanced our understanding of the brains of auditory specialists such as echolocating bats (Suga, 1988, Suga, 1996) and the barn owl (Konishi et al., 1988). Speech is not fundamentally different, in the acoustic sense, from animal vocalizations, albeit less stereotyped (Suga, 1992). Human speech cannot have carried any weight in the evolutionary pressure that led to vertebrate hearing, and thus one cannot expect vertebrate auditory systems, including the human one, to show any particular selectivity or sensitivity to speech. However, one may assume that human speech developed according to the constraints posed by the auditory and vocalization systems. This is reflected in the fact that human speech and animal sounds, not only those of other primates, share the same three basic elements: steady-state harmonically related frequencies, frequency modulations (FMs) and noise bursts (Mullennix and Pisoni, 1989, Suga, 1988, Suga, 1992, Fitch et al., 1997).

Most of our knowledge about the auditory thalamus and auditory cortex has been obtained by stimulating with those sound elements that speech and animal vocalizations have in common: noise bursts, FM, and harmonic complexes interspersed with silent gaps. Frequency, noise-burst bandwidth and preference for FM appear to be topographically mapped in cortical areas of auditory generalists (Schreiner, 1995) and specialists (Suga, 1994) alike. In contrast, sensitivity to low-frequency amplitude modulation (AM), as well as to gaps and voice onset times (VOTs), appears to be distributed across most neurons in at least three cortical areas and is reflected as modulations in firing rate that are synchronous across areas (Eggermont, 1994a, Eggermont, 1998c, Eggermont, 1999, Eggermont, 2000a). Many auditory cortical areas are tonotopically organized (Phillips, 1995, Schreiner, 1995) and are presumably specialized to represent a limited, and likely different, set of particularly important sound features (Ehret, 1997), although none of these specializations, except in the mustache bat’s cortex, has been definitively identified. One would expect that separate auditory cortical areas would need to integrate biologically important sound features with other perceptual and cognitive tasks. It is therefore likely that individual cortical areas fulfill a role similar to that of the various cell types and subdivisions in the CN and brainstem. It seems likely that no more than a few independent channels or types of processing can coexist within an area (Kaas, 1987). The information extracted by each cortical area could be used to create clustered representations of sound location (e.g. in the frontal eye fields; Cohen and Knudsen, 1999) and sound meaning (e.g. in the mammalian equivalents of premotor areas such as the hyperstriatum ventrale pars caudale (HVc) in birds and Broca’s area in humans; Doupe and Kuhl, 1999).

The neural code employed by a sensory system is likely determined by the innate structure of the system, i.e. it is the connection of the pathways and the properties of their neurons that produce the coded representation (Ehret, 1997). These anatomical connections and their neuronal specialization determine what kind of neurophysiological representation of the stimulus will occur. At higher levels of the central nervous system (CNS) these representations will be modulated by neural activity reflecting internal states such as drowsiness or arousal and also by the recent perceptual history of the animal (Merzenich and deCharms, 1996).

Perkel and Bullock (1969) asked more than three decades ago: ‘Is the code of the brain about to be broken?’ They subsequently listed a large number of potential neural codes that ‘made sense’ to the neuroscientists of that time. As we will see, ‘making sense’ or ‘having meaning’ is crucial to the notion of a code; it indicates that coding occurs in context. The endless list of potential codes in that review also suggested that the concept of ‘code’ was very broadly defined. The specific mention of ‘the code of the brain’ suggests a belief that there is only one neural code for all perceptual phenomena.

We perhaps know more about the neurophysiological substrates of the vowel /ϵ/ than about those of any other speech sound. We know nearly everything about the neural activity evoked by the vowel /ϵ/ in AN fibers (Delgutte, 1984, Sachs, 1984) and in various cell types of the CN (May et al., 1998). On the basis of those neural responses, an investigator can identify that vowel from a selection of other vowels with near certainty. The auditory CNS can do so too, but lacks the experimenter’s a priori knowledge of which vowels were presented. How does the CNS accomplish this identification? The neural responses to /ϵ/ will likely change dramatically between the CN and auditory cortex. Is there, in the end, a unique code for /ϵ/? If so, how is that code formed out of the neural activities evoked by the sound? The vowel /ϵ/ can be characterized by a unique spectrum that appears to be represented in a level-tolerant way in a population of T-multipolar cells (choppers) in the VCN (Young, 1998). It most likely can also be uniquely represented in the population autocorrelogram of the phase-locked firings of a population of bushy cells or octopus cells in the VCN (Oertel, 1999). However, we do not have any account of the neural activity caused by /ϵ/ in, for instance, the IC or the auditory cortex. We do know the multi-unit activity (MUA) that is produced in some parts of auditory cortex by other phonemes such as /da/, /ta/, /ba/, and /pa/ (Steinschneider et al., 1994, Eggermont, 1995a, Eggermont, 1995b). We even know, through positron emission tomography and functional magnetic resonance imaging, which areas of the human brain are differentially activated metabolically by the presentation of voices versus other sounds (Belin et al., 2000). However, vowel representation will depend strongly on context, i.e. on which consonants precede or follow it. It is therefore not clear whether a representation of a word can be generated from the representations of its phonemes in isolation.
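The population-autocorrelogram idea mentioned above can be made concrete with a toy calculation: pool all-order interspike intervals across idealized fibers, each perfectly phase-locked to a different harmonic of a vowel's fundamental, and the pitch period emerges as the dominant pooled interval. The spike trains and the 125 Hz fundamental below are invented for illustration, not data from the studies cited:

```python
from collections import Counter

F0 = 125.0               # hypothetical vowel fundamental frequency (Hz)

def spike_train(freq_hz, dur_ms=160.0):
    """Idealized fiber firing on every cycle of one harmonic."""
    period = 1000.0 / freq_hz
    return [k * period for k in range(int(dur_ms / period))]

# Three fibers, each phase-locked to a different harmonic of F0.
fibers = [spike_train(F0), spike_train(2 * F0), spike_train(3 * F0)]

def pooled_interval_histogram(trains, max_ms=40.0, bin_ms=1.0):
    """Population autocorrelogram: all-order interspike intervals,
    pooled across fibers and binned to the nearest millisecond."""
    counts = Counter()
    for t in trains:
        for i in range(len(t)):
            for j in range(i + 1, len(t)):
                dt = t[j] - t[i]
                if dt <= max_ms:
                    counts[round(dt / bin_ms)] += 1
    return counts

hist = pooled_interval_histogram(fibers)
peak = max(hist, key=hist.get)
print(f"dominant pooled interval: {peak} ms")   # 8 ms = 1/F0
```

No single fiber has to represent the fundamental: 8 ms is simply the shortest interval common to all three harmonic periodicities, so it dominates the pooled histogram. Real AN fibers skip cycles and jitter, but pooling across the population still yields a peak at 1/F0.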

There is evidence that different acoustic representations exist for identical phonemes. Identical cues for particular phonemes can also give rise to different percepts as a function of context. As a result of these contextual effects, it has been difficult to isolate acoustic attributes or features that satisfy the perceptual invariance that is observed in practice. Thus, there is no simple one-to-one relationship between the acoustic segments of the speech waveform and the way they are perceived (Mullennix and Pisoni, 1989). The implications thereof for the existence of a neural code have not yet been explored.

The relative importance of the spectral and temporal information present in the sounds used to convey speech has been elucidated by research in cochlear implant users, in whom limited place-specific information is delivered to the remaining AN fibers. What minimal information has to be presented to the AN fibers, in the case of complete sensory hearing loss, so that the receiver of a cochlear implant can fully understand speech? It seems that, at least under optimal listening conditions, spectral information is far less important than temporal information for the recognition of phonemes and words in simple sentences (Shannon et al., 1995). Event-related potential (ERP) research in normal-hearing and implanted human subjects has pointed to a potential role for non-primary auditory pathways in signaling temporal differences in sounds that can or cannot be sensed by the auditory cortex (Ponton et al., 2000).

This review will present an extensive, albeit not exhaustive, selection of what is known about neural responses in the auditory system of laboratory animals, such as monkeys, cats and bats, related to the identification of complex sound. I will discuss the transformations in the neural activity that take place along the way from AN to cortex. The review will also speculate further, extending a previous review (Eggermont, 1998b), on that elusive interface between stimulus and perception: the neural code.

Section snippets

Biologically important features of sound have shaped the auditory nervous system

Worden and Galambos (1972) aptly noted that ‘the full capacity of the sensory processor may not be revealed except through study of its response to stimuli that pose analytical tasks of the kind that shaped its evolutionary development’. Thus, regularities in the acoustic biotope, consisting of individual vocalizations and background sounds that are part of the natural habitat of the animal (Aertsen et al., 1979, Nelken et al., 1999, Smolders et al., 1979), are likely manifested in the response

Representations and codes: defining the neurophysiological emphasis

I propose that in the field of neurophysiology, the concept ‘neural code’ be reserved for a unique specification, a set of rules, that relates behavior to neural activity. The neural code is then to be equated with the result of the last non-identity transformation of neural activity between sensory receptor and cortex. In this sense it may be analogous to the genetic code present in DNA (words of three nucleic acid bases, out of four available, code for specific amino acids) that is translated

Stimulus representation in terms of firing rate or firing times

Usually one reads about rate coding, or about temporal or synchrony coding, in the activity patterns of the AN. In our restricted nomenclature this is translated into the representation of a specific sound in the overall or synchronized firing rates of AN fibers or in the interspike intervals of AN fiber activity. In the following I will transcribe part of Cariani’s (1995) classification for neural code types into those for neural representations.
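The distinction drawn here between representations in overall firing rate, in interspike intervals, and in synchronized firing can be illustrated on a single toy spike train (all spike times invented). Vector strength, the conventional measure of phase locking, serves as the synchrony measure:

```python
import math

# Invented spike times (s) for a neuron driven by a 100 Hz tone.
spikes = [0.0101, 0.0302, 0.0401, 0.0603, 0.0702, 0.0901, 0.1102, 0.1301]
duration = 0.15          # recording length (s)
stim_freq = 100.0        # stimulus frequency (Hz)

# 1. Overall-rate representation: the mean firing rate.
rate = len(spikes) / duration

# 2. Interval representation: the interspike-interval distribution.
isis = [b - a for a, b in zip(spikes, spikes[1:])]

# 3. Synchrony representation: vector strength. Each spike becomes a
#    unit vector at its stimulus phase; the length of the mean vector
#    is 1 for perfect phase locking and 0 for uniform phases.
phases = [2 * math.pi * stim_freq * t for t in spikes]
vs = math.hypot(sum(map(math.cos, phases)),
                sum(map(math.sin, phases))) / len(spikes)

print(f"rate = {rate:.1f} spikes/s, "
      f"mean ISI = {sum(isis) / len(isis) * 1000:.1f} ms, VS = {vs:.2f}")
```

The same spike train thus supports all three read-outs; which of them the CNS actually uses is exactly the representation-versus-code question at issue in this section.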

Neural representations reflecting the

Neural representations as maps

A topographic map is defined as an ensemble of neurons combining two or more neural representations with parameters systematically ordered along a spatial dimension in a given nucleus or area. Usually this takes the form of a spatially coded parameter (e.g. CF) and some other parameter (e.g. average firing rate, first-spike latency) (Schreiner, 1995, Ehret, 1997). As an example, the rate–place scheme’s central representation is formed by differences in spatially organized firing rates, i.e. as
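A minimal sketch of such a rate–place scheme, with an invented tonotopic array and Gaussian tuning on a log-frequency axis (all parameters hypothetical), shows how a stimulus frequency can be read off as the place of maximal firing:

```python
import math

# Toy tonotopic array: 25 model neurons with log-spaced CFs
# spanning 5 octaves from 0.5 to 16 kHz.
cfs = [0.5 * 2 ** (i * 5 / 24) for i in range(25)]

def rate_response(cf_khz, tone_khz, peak_rate=100.0, bw_octaves=0.5):
    """Firing rate of a neuron with Gaussian log-frequency tuning."""
    d_oct = math.log2(tone_khz / cf_khz)
    return peak_rate * math.exp(-0.5 * (d_oct / bw_octaves) ** 2)

tone = 4.0  # kHz stimulus
profile = [rate_response(cf, tone) for cf in cfs]

# The 'place code': the position of the activity peak along the
# array recovers the stimulus frequency.
best = max(range(len(cfs)), key=lambda i: profile[i])
print(f"peak at CF = {cfs[best]:.2f} kHz for a {tone} kHz tone")
```

The read-out uses only where activity peaks along the spatial axis, not how strongly any single neuron fires, which is what distinguishes a place representation from a pure rate representation.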

Information processing: probing the efficiency in neural representations

In general, calculating whether the information needed to represent a particular stimulus property is present in the firings of a given neural population is much easier than to determine whether the CNS actually utilizes all or part of this information to modify its behavior (Johnson, 2000). A clear example is found in the relationship between AN activity and the threshold of hearing. In barn owls, the threshold for phase locking, i.e. being able to tell the frequency of the sound from the
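The first half of that calculation, asking how much stimulus information is present in a response variable, is a standard mutual-information computation (cf. Borst et al., 1999). The joint distribution below (two stimuli, two firing-rate categories; all probabilities invented) shows the mechanics; whether the CNS actually exploits those bits is the separate, harder behavioral question:

```python
import math

# Invented joint probabilities p(stimulus, response category).
joint = {
    ("tone A", "low rate"):  0.40,
    ("tone A", "high rate"): 0.10,
    ("tone B", "low rate"):  0.10,
    ("tone B", "high rate"): 0.40,
}

def mutual_information(p_joint):
    """I(S;R) in bits: sum of p(s,r) * log2(p(s,r) / (p(s) p(r)))."""
    p_s, p_r = {}, {}
    for (s, r), p in p_joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_r[r] = p_r.get(r, 0.0) + p
    return sum(p * math.log2(p / (p_s[s] * p_r[r]))
               for (s, r), p in p_joint.items() if p > 0)

bits = mutual_information(joint)
print(f"I(S;R) = {bits:.3f} bits")   # prints I(S;R) = 0.278 bits
```

Here the response carries about 0.28 of the 1 bit needed to identify the stimulus perfectly; a diagonal joint distribution would yield the full bit.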

Parallel distributed processing and a specialization for representing time characterize the auditory system

Sound is special among sensory stimuli. Sound source location, in contrast to the visual and somatosensory systems where stimulus location is mapped directly onto the receptor surface, has to be computed from interaural spectral and timing differences. The position of a sound source produces only minute differences in time of arrival at the two ears: in humans at most about 800 μs, for a sound located along the axis through the ears. On that basis, we are able to distinguish differences in

Representation of selected speech sound features across the auditory pathway

I will describe the neural processing of speech sounds by initially focusing on some characteristic elements of speech such as vowels, fricatives and consonants. Then I will describe the representation of formant transitions and FM, VOTs and AM. This will be followed by a survey of the neural representation of vocalizations. For all of this one has to keep in mind that standard speech and other species-specific vocalizations are usually produced, and received, at an SPL at which most

Transformation of representations along the auditory pathway

Single-unit response properties change considerably from AN to auditory cortex. AN fibers have response functions that are monotonic, have relatively narrow frequency tuning curves, and can follow carrier frequencies up to 5 kHz and AM frequencies up to 3 kHz. Depending on the cortical area, response functions can be dominantly non-monotonic or monotonic, frequency tuning can be narrow or broad, single- or multi-peaked, phase locking to pure tones is absent and that to AM ceases below 64–128

Is texture mapped, do contours synchronize?

Some of the auditory features, for which we reviewed the neural representation at various stations along the auditory pathway, include common onset and offset of sound, common rates of AM and FM, harmonicity and common spatial origin. It is useful for the subsequent synthesis to reemphasize that these auditory features can be categorized into contour and texture components of sound. Contour components are those temporal aspects of sound that covary across frequency, overlap with IBPs, and are

Proposal for a neural code: synchronized topographic maps

We have reviewed evidence for a place representation of texture features of sound, such as frequency, periodicity pitch, harmonicity in vowels, direction and speed of FM, and for a temporal and synchrony representation of sound contours, such as onsets, VOTs, and low rate (<32 Hz) AM. For most of these proposed representations, the evidence is not overwhelming or only provided by one laboratory. For instance, for the topographic maps for periodicity pitch and vowel primary frequency differences

Acknowledgements

This investigation was supported by grants from the Alberta Heritage Foundation for Medical Research and the Natural Sciences and Engineering Research Council of Canada, and by the Campbell McLaurin Chair for Hearing Deficiencies.

References (330)

  • J.J. Eggermont, Is there a neural code? Neurosci. Biobehav. Rev. (1998)
  • J.J. Eggermont et al., Spectro-temporal characterization of auditory neurons: redundant or necessary. Hear. Res. (1981)
  • J.J. Eggermont et al., Moderate noise trauma in juvenile cats results in profound cortical topographic map changes in adulthood. Hear. Res. (2000)
  • G. Ehret et al., Complex sound analysis (frequency resolution, filtering and spectral integration) by single units of the inferior colliculus of the cat. Brain Res. Rev. (1988)
  • G. Ehret et al., Regional variations of noise-induced changes in operating range in cat AI. Hear. Res. (2000)
  • W.J.M. Epping et al., Single-unit characteristics in the auditory midbrain of the immobilized grassfrog. Hear. Res. (1985)
  • W.J.M. Epping et al., Sensitivity of neurons in the auditory midbrain of the grassfrog to temporal characteristics of sound. II. Stimulation with amplitude modulated sound. Hear. Res. (1986)
  • R.D. Frisina et al., Encoding of amplitude modulation in the gerbil cochlear nucleus: I. A hierarchy of enhancement. Hear. Res. (1990)
  • D.D. Gehr et al., Neuronal responses in cat primary auditory cortex to natural and altered species-specific calls. Hear. Res. (2000)
  • C.D. Geisler, Representation of speech sounds in the auditory nerve. J. Phon. (1988)
  • Abeles, M., 1982. Local Cortical Circuits: an Electrophysiological Study. Springer Verlag,...
  • Abeles, M., 1988. Neural codes for higher brain functions. In: Markowitsch, H.J. (Ed.), Information Processing by the...
  • A.M.H.J. Aertsen et al., Dynamics of neuronal firing correlation: modulation of ‘effective connectivity’. J. Neurophysiol. (1989)
  • A.M.H.J. Aertsen et al., Spectro-temporal receptive fields of auditory neurons in the grassfrog. I. Characterization of tonal and natural stimuli. Biol. Cybern. (1980)
  • A.M.H.J. Aertsen et al., The spectro-temporal receptive field. A functional characterization of auditory neurons. Biol. Cybern. (1981)
  • A.M.H.J. Aertsen et al., Spectro-temporal receptive fields of auditory neurons in the grassfrog. II. Analysis of the stimulus–event relation for tonal stimuli. Biol. Cybern. (1980)
  • A.M.H.J. Aertsen et al., Spectro-temporal receptive fields of auditory neurons in the grassfrog. III. Analysis of the stimulus–event relation for natural stimuli. Biol. Cybern. (1981)
  • A.M.H.J. Aertsen et al., Neural representation of the acoustic biotope: on the existence of stimulus–event relations for sensory neurons. Biol. Cybern. (1979)
  • Aitkin, L., 1986. The Auditory Midbrain. Humana Press, Clifton,...
  • Barlow, H.B., 1961. Possible principle underlying the transformation of sensory messages. In: Rosenblith, W. (Ed.),...
  • P. Belin et al., Voice-selective areas in human auditory cortex. Nature (2000)
  • A. Bieser, Processing of twitter-call fundamental frequencies in insula and auditory cortex of squirrel monkeys. Exp. Brain Res. (1998)
  • A. Bieser et al., Auditory responsive cortex in the squirrel monkey: neural responses to amplitude-modulated sounds. Exp. Brain Res. (1996)
  • C.C. Blackburn et al., The representations of the steady-state vowel sound /ϵ/ in the discharge patterns of cat anteroventral cochlear nucleus neurons. J. Neurophysiol. (1990)
  • Black, I.B., 1991. Information in the Brain: a Molecular Perspective. The MIT Press, Cambridge,...
  • E. de Boer, Correlation studies applied to the frequency resolution of the cochlea. J. Audit. Res. (1967)
  • E. de Boer, Reverse correlation. II. Initiation of nerve impulses in the inner ear. Proc. R. Neth. Acad. Sci. (1969)
  • E. de Boer et al., On cochlear encoding: potentialities and limitations of the reverse correlation technique. J. Acoust. Soc. Am. (1978)
  • A. Borst et al., Information theory and neural coding. Nat. Neurosci. (1999)
  • M. Brosch et al., Correlations between neural discharges are related to receptive field properties in cat primary auditory cortex. Eur. J. Neurosci. (1999)
  • Brugge, J.F., 1992. An overview of central auditory processing. In: Popper, A.N., Fay, R.R. (Eds.), The Mammalian...
  • E. Buchfellner et al., Gap detection in the starling (Sturnus vulgaris). II. Coding of gaps by forebrain neurons. J. Comp. Physiol. A (1989)
  • D.V. Buonomano et al., Temporal information transformed into a spatial code by a neural network with realistic properties. Science (1995)
  • P. Cariani, As if time really mattered: temporal strategies for neural coding of sensory information. Commun. Cogn. Artif. Intell. (1995)
  • P. Cariani, Emergence of new signal-primitives in neural systems. Intellectica (1997)
  • P. Cariani, Temporal coding of periodicity pitch in the auditory system: an overview. Neural Plast. (1999)
  • Cariani, P., 1999b. Neural timing nets for auditory computation. In: Greenberg, S., Slaney, M. (Eds.), Computational...
  • P. Cariani et al., Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J. Neurophysiol. (1996)
  • C.E. Carr, Processing of temporal information in the brain. Annu. Rev. Neurosci. (1993)
  • L.H. Carney et al., A temporal analysis of auditory-nerve fiber responses to spoken stop consonant-vowel syllables. J. Acoust. Soc. Am. (1986)