The textbook picture of speech production looks like a pipeline. Broca's area starts the plan, the motor cortex sends the command, the mouth makes the sound, and the auditory cortex — somewhat passively — listens in to check that nothing went wrong. It is a clean picture. It is also, mostly, fiction. This paper is about how fictional it is, and what you have to replace it with to fit the data.
The short version: speech production is not a pipeline. It is two circles drawn on top of each other.
What we measured, and why ECoG
We had intracranial electrocorticography (ECoG) recordings from a set of patients undergoing clinical monitoring at NYU Langone. ECoG is the only technique I know that lets you measure millisecond-resolution activity from dozens of cortical sites at once, in a person who is speaking naturally, in sentences you can actually parse. fMRI is stuck at seconds of temporal resolution; scalp EEG at centimeters of spatial resolution. ECoG has both axes a neuroscientist wants, at the cost of only being available in people who were going to have a surgery anyway.
We recorded during a simple paradigm: listen to a word, and then repeat it. The paradigm matters, because it lets us separate the listening signal from the speaking signal and — crucially — look at what happens during the transition.
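For concreteness, here is a minimal sketch of how that separation works in practice: epoch the recording around the auditory cue and again around speech onset. The sampling rate, window lengths, and variable names here are illustrative, not the paper's actual preprocessing.

```python
import numpy as np

def epoch(data, onsets, fs, pre=0.5, post=1.0):
    """Cut fixed-length windows around event times.

    data: (n_electrodes, n_samples) array; onsets: event times in seconds;
    fs: sampling rate in Hz. Returns (n_events, n_electrodes, n_window).
    """
    pre_n, post_n = int(pre * fs), int(post * fs)
    return np.stack([data[:, int(t * fs) - pre_n : int(t * fs) + post_n]
                     for t in onsets])

# Hypothetical usage: separate listening epochs from speaking epochs.
# listen_epochs = epoch(ecog, auditory_onsets, fs=512)
# speak_epochs  = epoch(ecog, speech_onsets, fs=512)
```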
The causal method, in plain language
You can't just correlate activity in region A with activity in region B and call it a connection. Correlation is symmetric; information flow is not. The technique we used — Granger-family causal analysis, specifically a non-parametric spectral variant — asks a directional question: does past activity in A help predict future activity in B above and beyond B's own past? If yes, there is directed information from A to B in that frequency band.
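To make the directional question concrete, here is a minimal time-domain sketch of the Granger logic. The paper's analysis is a non-parametric spectral variant, resolved by frequency band; this numpy-only version, with hypothetical names, just shows what "A's past helps predict B's future beyond B's own past" means operationally.

```python
import numpy as np

def granger_gain(a, b, order=10):
    """Directed measure A -> B: does A's past improve prediction of B
    beyond B's own past?

    a, b: 1-D arrays of equal length (e.g. band-limited power from two electrodes).
    Returns log(var_reduced / var_full); values > 0 indicate directed information.
    """
    n = len(b)
    rows = range(order, n)
    past_b = np.array([b[t - order:t] for t in rows])   # B's own history
    past_a = np.array([a[t - order:t] for t in rows])   # A's history
    target = b[order:]

    def residual_var(X):
        X = np.column_stack([np.ones(len(X)), X])        # add intercept
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.var(target - X @ coef)

    var_reduced = residual_var(past_b)                     # reduced model: B's past only
    var_full = residual_var(np.hstack([past_b, past_a]))   # full model: add A's past
    return np.log(var_reduced / var_full)
```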
And then the trick, because this is what the paper really hinges on: you do the analysis twice. Once from frontal regions to temporal. Once the other way around. For each time window, for each frequency band, for each pair of electrodes.
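Sketched out, that sweep looks something like the following. Window bounds, electrode labels, and the `granger_gain` helper above are all illustrative; the paper's version operates per frequency band in the spectral domain.

```python
import itertools

def sweep(signals, frontal, temporal, windows, order=10):
    """signals: dict electrode -> 1-D array; windows: list of (start, stop) sample indices."""
    results = []
    for (start, stop), f, t in itertools.product(windows, frontal, temporal):
        a, b = signals[f][start:stop], signals[t][start:stop]
        results.append({
            "window": (start, stop),
            "pair": (f, t),
            "feedforward": granger_gain(a, b, order),  # frontal -> temporal
            "feedback": granger_gain(b, a, order),     # temporal -> frontal
        })
    return results
```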
The result, stated as carefully as I can
We expected to find feedforward dominance around speech onset, and feedback dominance a few hundred milliseconds later when the auditory cortex "heard" the output and corrected. That is the cartoon.
What we found instead: feedforward and feedback flows coexist through the entire speech-production window. At 100 ms before onset, both are nonzero. At 0 ms, both are nonzero and both are larger. At 200 ms after onset, both are still nonzero. The timing of peaks differs, the frequency bands differ, but neither ever goes to baseline while the other is active.
Speech production is not a hand-off. It is a duet. Frontal regions are singing to temporal regions while temporal regions are singing back. If you cut either flow, the whole thing falls apart.
Why this matters for BCI
If speech production were pipelined, a brain-computer interface could sample at the "output" end — motor cortex, say — and reconstruct speech from commands. That model is common, and it works, sort of. But it underuses the signal.
If production is a duet, the posterior regions carry usable speech information during production, not just during perception. That is part of why our decoders in the speech decoding paper work as well as they do with temporal electrodes and no motor coverage. This paper is, in a sense, the mechanistic explanation for why the decoding paper's numbers are even possible.
The harder story
A reviewer once pushed back with a reasonable objection: "speech-production feedback" and "speech-perception of your own voice" are, in principle, confounded. How do you know the feedback flow isn't just the hearing of your own output, echoing back?
We answered it with a series of controls I am proud of: latency-matched listening conditions, anticipation trials where the subject was about to speak but didn't, and a frequency-band decomposition where feedback in production lives in a band the auditory-perception signal doesn't occupy. The reviewer was satisfied. I was satisfied. I will be the first to say the controls are not perfect. This kind of paper never is.
Personal note
This was the first paper where I felt like the field gave something back. We showed the figures to a group of speech neuroscientists early, and instead of the usual polite nodding, half the room had an anecdote. Someone had seen hints of this in a case study in 2011. Someone else had seen an inconsistent follow-up that they now, in hindsight, understood. A duet model unifies a lot of little loose ends. That was the moment I understood what people mean when they say a paper "lands."