WONDERLAND.OS / FUN_PAPERS
VOL.I · NO.05 · SPRING MMXXVI · 20 ENTRIES
◂ RETURN TO BASE
SYSTEM: ONLINE · LATENCY: 0.003s · ENTRIES LOADED: 20/20 · AUTHOR_ID: XC_1995

FUN & CURIOUS
PAPERS

A shelf of writing I enjoyed making — some decode brains, some decode markets, some stare at medical images, one or two wonder aloud what it means to live next to a machine that can almost do everything.
// XUPENG CHEN · AUTHOR'S SHELF · 20 ENTRIES · 19 PAPERS · 1 ESSAY

These are not reading lists; they are stories. I wrote them on trains, in small cafés, and one of them on the floor of a hospital waiting room. If you're curious enough to click one, you already deserve the punchline. Each card below opens into a proper write-up — with figures, numbers, interactive toys and the occasional confession — not just an abstract.

The pieces are grouped into four shelves. Brains & speech is where I spend most of my time; seven entries on decoding the cortex. AI & the world around it is the shelf where I'm a tourist, armed with a pen; three economics pieces and two LLM papers on what happens when thinking becomes cheap. Vision & the edges is medical images, drones, pixels — five entries. Applied ML is a new shelf with three time-series / finance papers I contributed to.

SHELF_I · BRAINS & SPEECH

7 entries · the main shelf
01
/ 20
NAT.MACH.INTELL
FIRST AUTHOR · Nature Machine Intelligence · 2024 · 48 participants

A neural speech decoding framework, or: how to listen to a brain talk.

An ECoG-to-speech pipeline with a differentiable synthesizer — it turns cortical signals into a voice that still sounds like the person who owns the cortex. Works in real time, works on the right hemisphere, works with low-density clinical grids. Peak PCC = 0.806.
BCI · ECoG · Deep Learning · Speech Synthesis
▸ open_entry.dat
06
/ 20
PNAS
Co-author · PNAS · 2023 · causal analysis

Distributed feedforward & feedback cortical processing, how the brain talks to itself while it talks.

Speech production, in textbooks, is a little assembly line. The cortex appears not to have read the textbook. We ran a causal analysis and watched feedforward and feedback processing fire in the same windows, on every word.
Neuroscience · ECoG · Causality · PNAS
▸ open_entry.dat
07
/ 20
NAT.COMM.
FIRST AUTHOR · Nature Communications · 2024 · cross-subject

Subject-agnostic transformer-based neural speech decoding, from surface and depth electrodes.

Most decoders memorise the subject they were trained on. We built a transformer that generalises across people — ECoG grids, sEEG depth probes, different implants, different regions — without retraining. One of the projects I am quietly proudest of.
BCI · Transformer · Generalization · sEEG
▸ open_entry.dat
08
/ 20
ARXIV CS.LG
Co-author · arXiv · 2509.09015 · 2025 · fMRI decoding

VoxelFormer, parameter-efficient visual decoding from many brains at once.

Reconstructing what someone is seeing from their fMRI usually means one model per subject and an enormous adapter. VoxelFormer asks: what if the voxels themselves are the tokens and the model is one, shared, tiny transformer?
fMRI · Transformer · Visual Decoding
▸ open_entry.dat
09
/ 20
ARXIV Q-BIO
Co-author · arXiv · 2509.08703 · 2025 · ROC-AUC 0.87

Predicting speech arrest during cortical stimulation, before anyone touches the scalpel.

Before epilepsy surgery, neurologists stimulate individual electrodes to find the ones that freeze the patient mid-word. It's slow, painful, sometimes you run out of time. We built an ML model that predicts which electrodes cause arrest from resting-state data alone.
Clinical · ECoG · Surgery Planning
▸ open_entry.dat
12
/ 20
PNAS
Co-author · PNAS · 10.1073/pnas.2404121121 · 2024 · ECoG

A corollary discharge circuit in human speech, the 120ms signal the brain sends itself.

Every animal that moves needs a copy of the motor command sent to sensory cortices — so they know "I did that" instead of being startled. We mapped the human version. It starts in ventral preCG, travels 120ms to STG, and carries a spectral prediction of the about-to-be-produced sound.
Neuroscience · ECoG · Causality · PNAS
▸ open_entry.dat
13
/ 20
ISBI
FIRST AUTHOR · IEEE ISBI · doc 9098589 · 2020 · first PhD paper

Stimulus speech decoding with GAN transfer learning, the paper where I learned to cheat honestly.

ECoG-paired audio data is scarce. Training a speech generator from scratch on it is impossible. We pretrained a GAN on free natural speech, froze it, then trained a tiny ECoG encoder into its latent space. The generator is the prior; the encoder is the only learned piece.
BCI · GAN · Transfer Learning
▸ open_entry.dat
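The trick in this card, a frozen generative prior with a small learned encoder in front of it, can be sketched in a toy linear form. Everything below is illustrative, not the paper's implementation: a fixed random matrix stands in for the pretrained GAN generator, and a least-squares solve stands in for gradient-training the encoder.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins (assumption: linear maps instead of real networks).
# G: frozen "generator" from latent space to spectrogram space,
#    imagined as pretrained elsewhere on abundant natural speech.
latent_dim, ecog_dim, spec_dim, n = 8, 32, 64, 500
G = rng.normal(size=(latent_dim, spec_dim))           # frozen prior, never updated

# Scarce paired data: ECoG features X and target speech spectrograms Y.
X = rng.normal(size=(n, ecog_dim))
Z_true = X @ rng.normal(size=(ecog_dim, latent_dim))  # hidden true encoding
Y = Z_true @ G + rng.normal(scale=0.01, size=(n, spec_dim))

# Train ONLY the encoder: fit X -> latent so that Enc(X) @ G matches Y.
# Because G has full row rank, Y @ pinv(G) recovers the latent targets.
Enc, *_ = np.linalg.lstsq(X, Y @ np.linalg.pinv(G), rcond=None)

recon = (X @ Enc) @ G
err = np.linalg.norm(recon - Y) / np.linalg.norm(Y)
```

The point the sketch preserves: the generator is the prior and stays frozen; the encoder is the only learned piece, so it can be fit from very little paired data.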

SHELF_II · AI & THE WORLD AROUND IT

5 entries · essays + LLM research
02
/ 20
ARXIV Q-FIN
CO-FIRST AUTHOR · arXiv · 2604.03272 · 2026 · with Shuchen Meng

AI & systemic risk, when every trader quietly has the same brain.

Three channels — performative prediction, algorithmic herding, cognitive dependency — combine into a convex coupling. The systemic-risk multiplier grows superlinearly in AI adoption; we prove it, then watch the saddle-node bifurcation into monoculture.
Finance · Systemic Risk · Game Theory
▸ open_entry.dat
04
/ 20
ARXIV ECON
FIRST AUTHOR · arXiv · 2603.09209 · 2026 · essay

Abundant intelligence, deficient demand.

What if the AI bust isn't a productivity crash but a distribution mismatch? A displacement spiral, a ghost-GDP wedge, and contracts anchored to a cognitive scarcity that no longer exists. The paper is a macro stress test wearing a philosopher's hat.
Macro · AI Economics · Stress Test
▸ open_entry.dat
05
/ 20
ARXIV CS.LG
CO-FIRST AUTHOR · arXiv · 2603.05565 · 2026 · with Shuchen Meng

When AI levels the playing field, and then tilts it the other way.

If AI collapses skill differences, inequality should fall. It doesn't. Skill homogenization compresses labor returns, but asset concentration does the opposite — so we get two regimes, and the regime you land in depends on who owns the compute.
Inequality · AI Labor · Capital
▸ open_entry.dat
14
/ 20
ARXIV CS.CL
Co-author · arXiv · 2505.11423 · 2025 · 15 models tested

When thinking fails, why chain-of-thought hurts instruction-following.

Reasoning models crush math. Everyone knows that. We found that "think step by step" consistently hurts plain instruction-following on 15 models tested. The mechanism: constraint attention collapses during CoT. The fix: a tiny routing classifier that picks when to reason.
LLM · Reasoning · Alignment
▸ open_entry.dat
15
/ 20
AAAI 2026
Co-author · AAAI · vol. 40 no. 37 · 2026 · RewardBench

ENCORE, entropy-guided reward composition for safety models.

Multi-head reward models judge responses against rules. Some rules are noisy; their scores are all over the place. We show that downweighting high-entropy rules — the simplest possible fix — beats SOTA on RewardBench-safety by +4.5 points. No retraining. Theoretically grounded via Bradley–Terry.
LLM · Safety · RLHF
▸ open_entry.dat
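The fix is simple enough to sketch in a screenful. A hypothetical numpy illustration of entropy-guided downweighting, not the paper's ENCORE implementation; the histogram binning and the softmax temperature below are my own assumptions.

```python
import numpy as np

def entropy_weighted_reward(rule_scores, bins=10, temperature=1.0):
    """Combine per-rule reward scores, downweighting noisy (high-entropy) rules.

    rule_scores: array of shape (n_samples, n_rules), each column one rule's
    scores over a calibration set of responses.
    Returns (weights, combined) where combined has shape (n_samples,).
    """
    n_samples, n_rules = rule_scores.shape
    entropies = np.empty(n_rules)
    for j in range(n_rules):
        # Histogram each rule's score distribution and take its Shannon entropy.
        hist, _ = np.histogram(rule_scores[:, j], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        entropies[j] = -(p * np.log(p)).sum()
    # Softmax over negative entropy: the noisier a rule, the smaller its weight.
    w = np.exp(-entropies / temperature)
    w /= w.sum()
    return w, rule_scores @ w
```

No retraining in sight: the reward heads are untouched, only the way their outputs are summed changes.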

SHELF_III · VISION & THE EDGES

5 entries · medical imaging, drones, pixels
03
/ 20
ARXIV CS.CV
FIRST AUTHOR · arXiv · 2410.20327 · 2024 · Med-VQA

R-LLaVA, or teaching a VLM where to look.

Med-VQA models tend to read medical images the way a tourist reads a map — all at once. We hand them a bounding box instead. Tiny annotation, big accuracy gain; the model finally notices what a doctor would have circled.
Vision-Language · Medical · VQA
▸ open_entry.dat
10
/ 20
ARXIV ROBOTICS
Co-author · arXiv · 2603.04277 · 2026 · UAVs

VANGUARD, recovering scale from parked cars.

A drone in a GPS-denied alley knows what things are but not the metric scale of its image. We used the one object whose size is culturally standardised — the sedan — as a scale bar. Cheap, lightweight, survives vision-language spatial hallucinations.
UAV · Geometry · Perception
▸ open_entry.dat
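The scale-bar idea reduces to one division. A minimal sketch under an assumed ground-plane setup, with the sedan length (≈4.5 m) taken from the card; the function names are mine, not the paper's.

```python
def metres_per_pixel(car_px: float, car_len_m: float = 4.5) -> float:
    """Ground-plane scale implied by a detected parked sedan:
    a culturally standardised real-world length divided by its
    measured length in pixels."""
    return car_len_m / car_px

def estimate_length(object_px: float, car_px: float,
                    car_len_m: float = 4.5) -> float:
    """Metric length of another object via the same scale factor."""
    return object_px * metres_per_pixel(car_px, car_len_m)
```

For example, an object spanning 200 px next to a 450 px sedan comes out at 2 m. The ratio is only valid while both objects sit at comparable depth, which is what makes parked cars on the same ground plane a convenient reference.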
11
/ 11
ARXIV CS.CV
Co-author · arXiv · 2603.22371 · 2026 · clinical

Cerebral palsy severity from video, skeletons plus the things clinicians already measure.

Clinical gait descriptors are expensive; skeleton-tracking is cheap. We fused them. The cheap stream does the heavy lifting; the expensive stream lends the interpretability clinicians actually want. 70.86% accuracy and a story a clinician can follow.
Clinical · Multimodal · Skeleton Tracking
▸ open_entry.dat
19
/ 20
ECCV 2020
Co-author · ECCV · Ch. 7 · Springer · 2020 · Harvard / NYU

Active learning for connectomics, two streams are better than one.

Electron-microscopy brain volumes are a billion voxels each. Experts pay by the hour to annotate. A two-stream query-suggestion scheme — one supervised signal and one unsupervised — picks what to label next. −40% labels to reach the same segmentation accuracy.
Connectomics · Active Learning · ECCV
▸ open_entry.dat
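The two-stream idea can be sketched as a scoring rule. This is a hypothetical blend, not the chapter's exact criterion; the 50/50 weighting and the min-max normalization are my assumptions.

```python
import numpy as np

def query_scores(probs, feats, labeled_feats, alpha=0.5):
    """Two-stream query suggestion: blend supervised uncertainty
    (prediction entropy) with unsupervised novelty (distance to the
    nearest already-labeled sample in feature space). Higher score
    means a better candidate for the next annotation."""
    p = np.clip(probs, 1e-12, 1.0)
    uncertainty = -(p * np.log(p)).sum(axis=1)           # supervised stream
    d = np.linalg.norm(feats[:, None, :] - labeled_feats[None, :, :], axis=2)
    novelty = d.min(axis=1)                              # unsupervised stream

    def norm(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    return alpha * norm(uncertainty) + (1 - alpha) * norm(novelty)
```

A sample the model is unsure about *and* that looks unlike anything labeled so far scores highest, which is the label-efficiency argument in one line.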
20
/ 20
ARXIV CS.MM
Co-author · arXiv · 2403.11155 · 2024 · NYU Video Lab

Interactive 360° video streaming, FoV-adaptive coding with temporal prediction.

VR video is expensive because 90% of it is behind the viewer. We proposed a two-zone scheme: a premium core uses temporal+spatial prediction; a rotation margin uses intra-only so head turns don't freeze. −40% bandwidth vs all-intra.
VR · Video Coding · Streaming
▸ open_entry.dat

SHELF_IV · APPLIED ML · TIME-SERIES & FINANCE

3 entries · contributions to industry-facing work
16
/ 20
IEEE 2024
Co-author · IEEE · doc 10704454 · 2024 · QLoRA

QLoRA earnings predictions, a 7B model that read SEC filings.

Quarterly earnings reports are dense and full of non-obvious signal. We instruction-tuned a 7B LLM with QLoRA on (report → 5-day return) pairs. The management tone shift — a thing buy-side analysts already watch — drove the edge. Trained on a single A100, it beat zero-shot GPT-4.
Finance · LLM · QLoRA
▸ open_entry.dat
17
/ 20
IEEE 2024
Co-author · IEEE · doc 10800545 · 2024 · RMB/USD

Explainable exchange-rate forecasting, when TSMixer quietly beats the transformers.

LSTM vs CNN vs transformer vs TSMixer on daily RMB/USD. TSMixer won, not because it's bigger, but because exchange-rate dependencies are long and shallow — a perfect match for its architectural prior. SHAP produced per-feature attributions that a trader can actually read.
Finance · TSMixer · SHAP
▸ open_entry.dat
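The "architectural prior" the card credits is easy to write down: TSMixer alternates tiny MLPs across the time axis and the feature axis. A stripped sketch of one mixing block; normalization and dropout are omitted, and the parameter shapes are my assumptions.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with ReLU, applied along the last axis."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

def tsmixer_block(X, time_params, feat_params):
    """One TSMixer mixing block on X of shape (time, features).

    Time mixing runs an MLP across the time axis (so every feature
    channel sees the whole history); feature mixing runs an MLP across
    the channel axis. Each step adds a residual connection.
    """
    X = X + mlp(X.T, *time_params).T   # mix along time
    X = X + mlp(X, *feat_params)       # mix along features
    return X
```

Long, shallow dependencies suit this block well: one time-mixing matrix spans the entire window at once, with no attention stack in between.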
18
/ 20
IEEE 2024
Co-author · IEEE · doc 10695966 · 2024 · wearables

Heart-rate prediction, from ARIMA to transformers — the newer isn't always better.

We ran ARIMA, LSTM, CNN, transformer, and TSMixer on the same wearable heart-rate data. ARIMA wins short horizons. TSMixer wins medium ones. Transformers win nowhere. Model-mismatch story, not a compute-budget story.
Health · Time-Series · Benchmark
▸ open_entry.dat