Neural Wetwares for Phonological Software

Neural Correlates of Phonological Primes

How Are the Primes of Phonological Computation Implemented in the Brain?

Spoken communication relies on the transmission of a continuous, analog signal that carries the message from the speaker to the listener. Yet the consensus is that speech processing, i.e. the perception and articulation of speech, requires analyses in terms of algebraic units referred to as distinctive features (Chomsky & Halle, 1968; Hale & Reiss, 2008). Individual sounds in words, such as the k-sound in ‘cat’, are decomposed into bundles of distinctive features. An obvious associated issue is whether these features encode articulatory/motor information or perceptual/acoustic information. Emerging neurobiological evidence cautions against a purely acoustic (Guenther et al., 2006), a purely articulatory (Holt & Lotto, 2008), or even a simple mirror-neuron-driven hypothesis (Hickok et al., 2011). Rather, Hickok et al. (2011) explicitly argue that speech processing requires an integration of acoustic/perceptual and articulatory/motor information in the brain, primarily carried out by area Spt, a brain region situated in the Sylvian fissure at the parietal-temporal boundary. Crucially, Spt activates both during passive perception of speech sounds and during sub-vocal/covert articulation. Sub-vocal activation implies that Spt is not driven by overt auditory feedback, suggesting instead that it is involved in sensorimotor integration (Hickok et al., 2011). Further, Hickok and Poeppel (2007) report different sub-regional patterns of activity associated with the sensory and motor phases, suggesting distinct neuronal subpopulations for each phase. More broadly, while Spt is not speech-specific, activating reliably for the perception of tonal melodies and for humming tasks, speech-induced activity in Spt is highly correlated with activity in the pars opercularis of Broca’s region.
Activity in Spt is also reported to be motor-effector selective (Pa & Hickok, 2008), with noticeably more robust activity when motor tasks involve the vocal apparatus rather than manual effectors. Spt lies between networks of auditory (superior temporal sulcus) and motor (pars opercularis, premotor cortex) regions, and diffusion tensor imaging studies suggest that Spt and the posterior sector of Broca’s region are densely connected at the anatomical level. It is thus both functionally and anatomically well positioned to support sensory-motor integration of the type required for linguistic processing of speech signals.

Mesgarani et al. (2014) report that subjects implanted with multi-electrode arrays exhibit systematic selectivity of electrodes to phonetic properties specific to classes of speech sounds (e.g., stops vs. fricatives) during perception. Such classes of sounds are usually defined in terms of phonological features, and Mesgarani and colleagues identify populations of neurons that are specifically tuned to the spectrotemporal cues associated with individual classes. While this is positive evidence for the representation of acoustic-phonetic information in the brain, our design seeks to fill several gaps in that work. First, note that while the classes of sounds discussed by Mesgarani et al. (2014) (e.g., fricatives, stops) are defined in terms of phonological features, any given class is defined by a set of valued features (e.g., fricatives might be defined as +Continuant, −Distributed, …). Thus, a population of neurons responding to fricative sounds is, in fact, being activated by multiple different features. While this is not a concern from a general neuroscience perspective, where such localization has multiple utilities, such broad class labels serve only a taxonomic purpose in biolinguistic analyses. Rather, such analyses crucially hinge on the neurocognitive reality of individual phonological features. For instance, vowel harmony is an often discussed cross-linguistic phenomenon. An often noted fact is the recurrence of the feature ±ATR in the harmony processes of numerous languages. This feature is also usually associated with the ‘vocalic’ class of sounds. From a biolinguistic perspective, however, evidence for the neural correlates of ±ATR is far more intriguing than correlates for ‘vocalic’ sounds. The former localizes the computational primitives of the theory in the brain, while the latter merely identifies higher-order constructs of coarser granularity.
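The confound can be made concrete with a small sketch. The feature values below are illustrative, hypothetical SPE-style assignments, not a commitment to any particular analysis; the point is only that a natural class is picked out by several shared feature values at once, so class-level selectivity underdetermines which feature a neural population is tracking:

```python
# Illustrative (hypothetical) feature bundles for a handful of phonemes.
FEATURES = {
    "s": {"continuant": +1, "voice": -1, "strident": +1, "sonorant": -1},
    "z": {"continuant": +1, "voice": +1, "strident": +1, "sonorant": -1},
    "f": {"continuant": +1, "voice": -1, "strident": +1, "sonorant": -1},
    "t": {"continuant": -1, "voice": -1, "strident": -1, "sonorant": -1},
    "d": {"continuant": -1, "voice": +1, "strident": -1, "sonorant": -1},
}

def shared_feature_values(phonemes):
    """Return the feature values common to every phoneme in the class."""
    first, *rest = [FEATURES[p] for p in phonemes]
    return {f: v for f, v in first.items()
            if all(other.get(f) == v for other in rest)}

# The 'fricative' class here shares continuant, strident, and sonorant values,
# so a population selective for fricatives could be driven by any of them.
print(shared_feature_values(["s", "z", "f"]))
```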
In other words, the ability to distinguish vowels from consonants does not go very far in decoding linguistic information from speech signals, and the same holds for other such classes. To this end, we adopt stimuli that systematically contrast speech sounds in terms of single distinctive features (e.g., sip vs. zip differ only in terms of ±Voice). Such minimal pairs of words contrast in just one sound in a specific position, here the ‘s’ vs. the ‘z’ sound in initial position. Further, these two sounds themselves, when broken down into their atomic constituents, contrast in terms of just one feature, ±Voice. Thus, the relative activation of populations of neurons to this specific contrastive pair stands to shed light on the phonological representation of ±Voice at the neurobiological level. Repeated over the set of features acknowledged in the literature, this design can be fruitfully applied to develop a detailed map of phonological features in the brain.
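The stimulus-selection logic can be sketched as follows. The lexicon and feature values are again illustrative (hypothetical), and real stimulus construction would of course draw on a full feature system; what the sketch shows is the selection criterion itself, i.e. admitting only those phoneme pairs that disagree on exactly one feature:

```python
# Illustrative (hypothetical) feature bundles for candidate contrast sounds.
FEATURES = {
    "s": {"continuant": +1, "voice": -1, "strident": +1},
    "z": {"continuant": +1, "voice": +1, "strident": +1},
    "t": {"continuant": -1, "voice": -1, "strident": -1},
    "d": {"continuant": -1, "voice": +1, "strident": -1},
}

def contrasting_features(p1, p2):
    """Features on which two phonemes disagree."""
    f1, f2 = FEATURES[p1], FEATURES[p2]
    return [f for f in f1 if f1[f] != f2[f]]

def single_feature_pairs(phonemes):
    """All phoneme pairs that differ in exactly one feature."""
    pairs = []
    for i, p1 in enumerate(phonemes):
        for p2 in phonemes[i + 1:]:
            diff = contrasting_features(p1, p2)
            if len(diff) == 1:
                pairs.append((p1, p2, diff[0]))
    return pairs

# ("s", "z", "voice") is the contrast that licenses sip vs. zip as stimuli.
print(single_feature_pairs(["s", "z", "t", "d"]))
```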

Second, our design is characterized by its focus on the neurobiological correlates of phonological features as such, and not on a subset of what they represent (e.g., Mesgarani et al. (2014) focus solely on acoustic representations). In actual phonological processing, features are used bidirectionally: they encode phonological information for production and decode it during perception. Volenec and Reiss (2020) thus rightly point out that a neurobiological account of distinctive features must take both production and perception into account. Similar arguments are advanced by Hickok et al. (2011) who, as noted above, point out that empirical evidence identifies Spt as a prime candidate for such information integration. To this end, our design collects neural responses across the entire range of speech-processing modes: production, perception, and silent articulation (speech planning without overt articulation). Our study will specifically target area Spt, alongside a network of regions including Broca’s area, the STG, premotor cortex, and the cerebellum. A secondary focus will be on the anterior part of the insular cortex. Hickok and colleagues report that activation in Spt is both (a) motor-effector selective, with a bias for the vocal articulators, and (b) divided into sub-regional activity for motor and sensory phases. Our design specifically aims to compare feature-driven activations across these modalities of speech processing. Since the same feature set underlies both production and perception of speech, such a comparison is a necessary step toward a coherent account of how a set of postulated cognitive primitives (features) is instantiated in the brain and how these primitives produce the observed behavioral effects.
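The shape of the planned comparison can be sketched in a few lines. Everything below is synthetic and stands in for the eventual analysis pipeline: the data are random draws rather than recordings, and the simple mean-difference contrast is a placeholder for whatever estimator is ultimately used; the sketch only shows the structure of computing a per-feature contrast (here ±Voice) separately for each processing mode so the resulting profiles can be compared:

```python
import numpy as np

rng = np.random.default_rng(0)
modes = ["perception", "production", "covert_articulation"]
n_trials = 40

# Alternate ±Voice condition labels across trials (+1 = +Voice, -1 = -Voice);
# responses are synthetic stand-ins for a recording site's trial responses.
labels = np.array([+1, -1] * (n_trials // 2))
responses = {m: rng.normal(size=n_trials) for m in modes}

def voice_contrast(resp, labels):
    """Mean response on +Voice trials minus mean response on -Voice trials."""
    return resp[labels == +1].mean() - resp[labels == -1].mean()

# One ±Voice contrast value per processing mode, ready for comparison.
for m in modes:
    print(m, round(float(voice_contrast(responses[m], labels)), 3))
```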