PNAS · 2023 · 13 min read

Distributed feedforward & feedback, or: how the brain talks to itself while it talks.

Speech production, in textbooks, is a little assembly line: an idea, a plan, a motor command, a mouth. The cortex appears not to have read the textbook. We ran a causal analysis, and we watched feedforward and feedback fire in the same windows, in the same regions, on every word.
[Figure: panel A, feedforward (motor → auditory) among IFG, STG, and preCG; panel B, feedback (auditory → motor) among the same regions, shown simultaneously — both graphs live in the same 300 ms window, in the same cortex, every syllable.]
Fig. 1 — Simultaneous feedforward and feedback during speech production. Neither is the "real" direction — both are.

The textbook picture of speech production looks like a pipeline. Broca's area starts the plan, the motor cortex sends the command, the mouth makes the sound, and the auditory cortex — somewhat passively — listens in to check that nothing went wrong. It is a clean picture. It is also, mostly, fiction. This paper is about how fictional it is, and what you have to replace it with to fit the data.

The short version: speech production is not a pipeline. It is two circles drawn on top of each other.

What we measured, and why ECoG

We had intracranial electrocorticography (ECoG) from a set of patients undergoing clinical monitoring at NYU Langone. ECoG is the only technique I know that lets you measure millisecond-resolution activity from dozens of cortical sites at once, in a person who is speaking naturally, in a sentence you can actually parse. fMRI has seconds; scalp EEG has centimeters. ECoG has both axes a neuroscientist wants, at the cost of only being available in people who were going to have surgery anyway.

We recorded during a simple paradigm: listen to a word, and then repeat it. The paradigm matters, because it lets us separate the listening signal from the speaking signal and — crucially — look at what happens during the transition.

The causal method, in plain language

You can't just correlate activity in region A with activity in region B and call it a connection. Correlation is symmetric; information flow is not. The technique we used — Granger-family causal analysis, specifically a non-parametric spectral variant — asks a directional question: does past activity in A help predict future activity in B above and beyond B's own past? If yes, there is directed information from A to B in that frequency band.

And then the trick, because this is what the paper really hinges on: you do the analysis twice. Once from frontal regions to temporal. Once the other way around. For each time window, for each frequency band, for each pair of electrodes.
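The idea is easier to see in code than in words. The paper's actual method is a non-parametric spectral Granger variant on real ECoG; the sketch below is the simpler time-domain, regression-based version, run on synthetic data where I have wired in coupling in both directions (all names and coupling values here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def granger_f(x, y, order=5):
    """Does the past of x help predict y beyond y's own past?
    Returns the log variance-ratio (>= 0; larger = more directed influence)."""
    n = len(y)
    Y = y[order:]
    # design matrix from y's own past lags 1..order
    own = np.column_stack([y[order - k : n - k] for k in range(1, order + 1)])
    # same, plus x's past lags 1..order
    full = np.column_stack(
        [own] + [x[order - k : n - k] for k in range(1, order + 1)]
    )

    def rss(X):
        X = np.column_stack([np.ones(len(Y)), X])  # intercept
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r

    return np.log(rss(own) / rss(full))

# Synthetic "duet": a drives b AND b drives a, at different lags.
T = 5000
a = np.zeros(T)
b = np.zeros(T)
for t in range(2, T):
    a[t] = 0.5 * a[t - 1] + 0.3 * b[t - 2] + rng.normal()
    b[t] = 0.5 * b[t - 1] + 0.3 * a[t - 1] + rng.normal()

# Run the analysis twice — once each direction. Both come out positive.
print("a -> b:", granger_f(a, b))
print("b -> a:", granger_f(b, a))
```

Because the "full" model nests the "own" model, the statistic is never negative; the question is only whether it is meaningfully above zero in each direction, in each window, in each band.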

Interactive · slide the time cursor, watch both flows
[Interactive figure: time cursor from −300 to +300 ms around speech onset; feedforward (IFG → STG) and feedback (STG → IFG) traces, with live FF and FB values at the cursor.]
Slide the cursor. Both traces are nonzero almost everywhere. That is the whole point of the paper.
Simulated traces · matching Fig. 3 of the PNAS paper in shape, not in magnitude

The result, stated as carefully as I can

We expected to find feedforward dominance around speech onset, and feedback dominance a few hundred milliseconds later when the auditory cortex "heard" the output and corrected. That is the cartoon.

What we found instead: feedforward and feedback flows coexist through the entire speech-production window. At 100ms before onset, both are nonzero. At 0ms, both are nonzero and both are larger. At 200ms after onset, both are still nonzero. The timing of peaks differs, the frequency bands differ, but neither ever goes to baseline while the other is active.
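The coexistence claim is a windowed one: compute the directed statistic in short windows sliding across the trial, in both directions, and check that neither direction ever drops to baseline. Here is a toy version on synthetic bidirectionally coupled signals — again a regression-based stand-in for the paper's spectral method, with hypothetical names (`frontal`, `temporal`) and made-up coupling strengths:

```python
import numpy as np

rng = np.random.default_rng(1)

def granger_f(x, y, order=4):
    """Log variance-ratio: does the past of x improve prediction of y?"""
    n = len(y)
    Y = y[order:]
    own = np.column_stack([y[order - k : n - k] for k in range(1, order + 1)])
    full = np.column_stack(
        [own] + [x[order - k : n - k] for k in range(1, order + 1)]
    )

    def rss(X):
        X = np.column_stack([np.ones(len(Y)), X])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r

    return np.log(rss(own) / rss(full))

# Two signals coupled in BOTH directions for the whole recording.
T = 4000
frontal = np.zeros(T)
temporal = np.zeros(T)
for t in range(2, T):
    frontal[t] = 0.5 * frontal[t - 1] + 0.25 * temporal[t - 1] + rng.normal()
    temporal[t] = 0.5 * temporal[t - 1] + 0.25 * frontal[t - 2] + rng.normal()

# Slide a 500-sample window; both directions stay above zero in every window.
for start in range(0, T - 500, 500):
    w = slice(start, start + 500)
    ff = granger_f(frontal[w], temporal[w])  # "feedforward"
    fb = granger_f(temporal[w], frontal[w])  # "feedback"
    print(f"window @{start}: FF={ff:.3f}  FB={fb:.3f}")
```

In the pipeline cartoon, you would instead expect the FF column to dominate early and collapse while the FB column rises; in the duet regime, both columns are nonzero in every window.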

Speech production is not a hand-off. It is a duet. Frontal regions are singing to temporal regions while temporal regions are singing back. If you cut either flow, the whole thing falls apart.

Why this matters for BCI

If speech production were pipelined, a brain-computer interface could sample at the "output" end — motor cortex, say — and reconstruct speech from commands. That model is common, and it works, sort of. But it underuses the signal.

If production is a duet, the posterior regions carry usable speech information during production, not just during perception. That is part of why our decoders in the speech decoding paper work as well as they do with temporal electrodes and no motor coverage. This paper is, in a sense, the mechanistic explanation for why the decoding paper's numbers are even possible.

Granger-F, mean across patients (n = 12):
feedforward — IFG → preCG: 0.45 · preCG → STG: 0.55
feedback — STG → preCG: 0.42 · STG → IFG: 0.38
Fig. 2 — All four edges are nonzero. The feedback edges are a little smaller on average, but they are never zero.

The harder story

A reviewer once pushed back with the reasonable objection that "speech-production feedback" and "speech perception of your own voice" are, in principle, confounded. How do you know the feedback flow isn't just the hearing of your own output, echoing back?

We answered it with a series of controls I am proud of: latency-matched listening conditions, anticipation trials where the subject was about to speak but didn't, and a frequency-band decomposition where feedback in production lives in a band the auditory-perception signal doesn't occupy. The reviewer was satisfied. I was satisfied. I will be the first to say the controls are not perfect. This kind of paper never is.

Personal note

This was the first paper where I felt like the field gave something back. We showed the figures to a group of speech neuroscientists early, and instead of the usual polite nodding, half the room had an anecdote. Someone had seen hints of this in a case study in 2011. Someone else had seen an inconsistent follow-up that they now, in hindsight, understood. A duet model unifies a lot of little loose ends. That was the moment I understood what people mean when they say a paper "lands."
