IEEE · 2024 · 6 min read

Earnings reports into stock predictions: a tiny LLM that reads SEC filings for a living.

Quarterly earnings reports are dense, mandatory, textual and full of non-obvious signal. We instruction-tuned a small LLM with QLoRA on earnings-call transcripts and 10-Ks, paired with subsequent 5-day returns. The finetune beat zero-shot GPT-4 and ran on a single GPU.
[Figure: 10-Q / earnings-call transcript → 7B LLM, 4-bit quantised, with LoRA adapters (0.4% trainable) → predicted 5-day returns: AAPL +1.8%, NVDA +4.2%, META −1.4%, GOOG +0.6%]
Fig. 1 — Input: quarterly report. Training signal: 5-day returns after the filing. Adapter: QLoRA (0.4% of base weights).

Earnings reports are a strange dataset. They arrive at predictable times, they are mandatory, they are written in a register that has drifted very little since the 90s, and they carry signal that is not in the price history, because it is text and no one has rigorously priced it yet. Classical NLP choked on them — the vocabulary is domain-specific and the relevant inferences require cross-sectional context. Modern LLMs have no such issue, but naively prompting GPT-4 with a 10-K and asking "will the stock go up?" gives you a polite nothing.

The fix, which is cheap

We used QLoRA to instruction-tune a 7B open-weights model on a supervised dataset of (earnings transcript → 5-day forward return) pairs, framed as a "predict the return bucket" classification. 4-bit quantisation plus low-rank adapters means the whole finetune fits on a single A100. No proprietary data, no enormous compute budget, no custom tokeniser.
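Concretely, the supervised pairs can be framed like this — a minimal sketch in which the bucket thresholds and the prompt template are illustrative guesses, not taken from the paper:

```python
# Sketch of the supervised-pair construction: (earnings text -> 5-day return bucket).
# The +/-2% thresholds and the instruction wording are hypothetical.

def return_bucket(fwd_return: float) -> str:
    """Map a 5-day forward return to a coarse classification label."""
    if fwd_return <= -0.02:
        return "down"
    if fwd_return >= 0.02:
        return "up"
    return "flat"

def to_instruction_example(transcript: str, fwd_return: float) -> dict:
    """Format one (text, return) pair as an instruction-tuning example."""
    return {
        "instruction": "Classify the 5-day post-filing return of this company "
                       "as up, flat, or down, based on the earnings text.",
        "input": transcript,
        "output": return_bucket(fwd_return),
    }

ex = to_instruction_example("We see continued headwinds in our core segment...", -0.034)
print(ex["output"])  # -> down
```

The classification framing (rather than regressing the raw return) keeps the target in token space, which is what makes the instruction-tuning recipe apply unchanged.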

Interactive · how much does text add beyond numbers?
Sharpe (5-day) · long-short portfolio: price only (baseline) 0.8 · numbers + text (GPT-4 zero-shot) 1.1 · QLoRA finetune (ours) 2.3
The textual signal decays with horizon. Shorter windows = bigger edge. At 1 day the finetuned model is crushing; at 20 days everything is ≈0.
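For context on how a long-short Sharpe like the one above is computed: go long the predicted "up" names, short the predicted "down" names each period, and annualise the mean and volatility of the resulting return stream. A minimal sketch — the equal weighting and the annualisation factor are assumptions, not from the paper:

```python
import math
import statistics

def long_short_return(preds, rets):
    """One period's return of an equal-weight long-short portfolio.
    preds: 'up'/'flat'/'down' labels per name; rets: realised returns per name."""
    longs = [r for p, r in zip(preds, rets) if p == "up"]
    shorts = [r for p, r in zip(preds, rets) if p == "down"]
    long_leg = sum(longs) / len(longs) if longs else 0.0
    short_leg = sum(shorts) / len(shorts) if shorts else 0.0
    return long_leg - short_leg

def annualised_sharpe(period_returns, periods_per_year=52):
    """Sharpe ratio with risk-free rate assumed 0; 52 assumes weekly rebalance."""
    mu = statistics.mean(period_returns)
    sd = statistics.stdev(period_returns)
    return (mu / sd) * math.sqrt(periods_per_year)
```

The horizon dependence drops out of this construction directly: widen the holding window and the text-driven edge in the "up"/"down" calls washes out against everything else moving the price.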
Illustrative · shape matches Fig. 3 of the paper qualitatively

What surprised us

Management tone was more predictive than the numbers in the filing. Specifically, the delta between this quarter's management commentary and the previous quarter's, scored on a handful of lexical dimensions. A company that goes from "cautiously optimistic" to "we see headwinds" in the guidance section predicts a drawdown whose magnitude is larger than you would get from the numbers alone. The LLM picks this up natively; a numbers-only model can't.

The model learned what every buy-side analyst already knows: the tone shift is the signal. But it learned it from text alone, without a person explaining what to look for.
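As a caricature of that tone-shift signal: score each quarter's commentary against small optimism/caution word lists, then difference across quarters. The word lists here are hypothetical stand-ins — the paper's point is that the LLM learns this implicitly, without hand-built lexicons:

```python
# Toy lexical tone score; the word lists are illustrative, not from the paper.
OPTIMISM = {"optimistic", "strong", "momentum", "growth", "confident"}
CAUTION = {"headwinds", "uncertainty", "softness", "challenging", "cautious"}

def tone_score(text: str) -> float:
    """Net tone in [-1, 1]: (+1 all optimism, -1 all caution, 0 if no hits)."""
    words = [w.strip(".,") for w in text.lower().split()]
    pos = sum(w in OPTIMISM for w in words)
    neg = sum(w in CAUTION for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def tone_delta(prev_q: str, this_q: str) -> float:
    """Negative delta = management tone deteriorated quarter-over-quarter."""
    return tone_score(this_q) - tone_score(prev_q)

delta = tone_delta(
    "We remain optimistic about growth and momentum.",
    "We see headwinds and continued softness in a challenging environment.",
)
print(delta)  # -> -2.0 (tone flipped from fully positive to fully negative)
```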
+1.5 Sharpe vs baseline · 1×A100 compute footprint · 4-bit quantisation

Caveats, honestly

This is not a trading system. It is a paper. Backtests are backtests. Live markets have transaction costs, capacity constraints, and regime shifts a backtest doesn't see. If you deploy a 7B open-weights LLM as your signal source and do not think carefully about cost-of-capital and crowding, you will lose money. The paper's contribution is the methodological one — that QLoRA on earnings text is a viable formulation — and not an investment recommendation.
