MERGE | Syntax

A large-scale behavioral phenotyping study of the stratification of syntactic competence.

One Operation, Multiple Realities?

A central claim of modern syntactic theory is that human language is driven by a single structure-generating operation known as Merge. This operation allows us to combine simple elements into complex, hierarchical expressions. While the computation itself may be unitary and discrete, the way humans actually use it in real-time comprehension and production appears to be highly stratified.

We observe that children acquire different structures at different stages, and clinical cases often show selective loss of specific syntactic abilities. This raises a fundamental question: Is syntax a uniform continuum of processing, or is it supported by distinct algorithmic subsystems that handle different classes of structural objects?

How Indian Languages Provide the Answer

To test whether the behavioral signature of syntax is universal, we must look beyond European languages. Indian languages (from both the Indo-Aryan and Dravidian families) offer a unique laboratory. With their rich case marking, extensive scrambling (flexible word order), and agglutinative morphology, they decouple the “linear order” of a sentence from its “hierarchical structure.”

If we find the same discrete “tiers” of comprehension in speakers of Tamil (a Dravidian language) as in speakers of Hindi or Bengali (Indo-Aryan languages), we would have strong behavioral evidence for a universal neurocognitive architecture for syntax.

The Study: Large-Scale Behavioral Phenotyping

This project is a large-scale, data-driven effort to map the latent structure of language comprehension. Rather than imposing theoretical categories from the top down, we use unsupervised hierarchical cluster analysis to see how comprehension abilities naturally group across three levels of task:

  1. Command-level: Simple action mapping and prohibitions.
  2. Modifier-level: Phrase-internal combinations (e.g., “the big red ball”).
  3. Syntactic-level: Relational binding, tense contrasts, and role reversal under scrambling.

By identifying these “comprehension tiers,” we aim to build a bridge between abstract grammatical theory and the biological implementation of language in the brain. Whether the results show a smooth continuum or sharp, all-or-nothing thresholds, they will constrain our models of language evolution and development.
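
To make the clustering idea concrete, here is a minimal sketch in Python of how comprehension items might be grouped by their pass/fail profiles. Everything in it is an assumption for illustration: the item names, the eight imaginary participants, and the scores are invented and are not study data. Hamming distance with average linkage is one possible choice among many, not necessarily the procedure the project uses. A real analysis would more likely rely on an established library routine such as SciPy's `scipy.cluster.hierarchy.linkage`.

```python
from itertools import combinations

# Hypothetical pass/fail scores (1 = pass, 0 = fail), invented for illustration.
# Each row is one comprehension item; each column is one imaginary participant.
SCORES = {
    "follow_command": [1, 1, 1, 1, 1, 1, 1, 1],  # command-level
    "prohibition":    [1, 1, 1, 1, 1, 1, 1, 0],  # command-level
    "adjective_noun": [0, 0, 1, 1, 1, 1, 1, 1],  # modifier-level
    "two_modifiers":  [0, 0, 1, 1, 0, 1, 1, 1],  # modifier-level
    "role_reversal":  [0, 0, 0, 0, 0, 1, 1, 1],  # syntactic-level
    "tense_contrast": [0, 0, 0, 0, 0, 1, 1, 0],  # syntactic-level
}


def hamming(a, b):
    """Fraction of participants on whom two items disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(c1, c2, scores):
    """Mean pairwise distance between the items of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(hamming(scores[i], scores[j]) for i, j in pairs) / len(pairs)


def agglomerate(scores, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in scores]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], scores),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


tiers = agglomerate(SCORES, n_clusters=3)
for tier in tiers:
    print(sorted(tier))
```

Fixing `n_clusters=3` here is a simplification. The number of tiers is itself an empirical question, so in practice one would inspect the distances at which clusters merge (for example in a dendrogram) instead of assuming the answer in advance.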