IEEE · 2024 · 6 min read

Explainable exchange-rate forecasting, or: when TSMixer quietly beats the transformers.

We put LSTM, CNN, transformer and TSMixer on RMB/USD daily data with a rigorous feature-selection pipeline. TSMixer won. The paper's real value isn't the winner — it's the why, which we spent more time on than the benchmark.
[Chart: RMB/USD, actual vs predicted on the validation window — actual RMB/USD, LSTM (baseline), TSMixer (ours).]
Fig. 1 — TSMixer hugs the actual trajectory more tightly than the LSTM baseline, particularly on the turning points.

Exchange rates are famously hard. The random-walk baseline is embarrassingly good. Any model worth reporting has to beat "yesterday's price" by a margin that survives transaction costs, and that is a higher bar than the DL literature tends to acknowledge. This paper is a careful study of what does and doesn't clear that bar for RMB/USD daily returns.
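The bar described above is easy to state in code. A minimal sketch of the random-walk baseline (predict today's rate as yesterday's) and the relative-improvement measure a model must clear; the function names here are illustrative, not from the paper:

```python
import numpy as np

def rw_rmse(prices):
    """RMSE of the random-walk baseline: predict today's price as yesterday's."""
    prices = np.asarray(prices, dtype=float)
    preds = prices[:-1]   # yesterday's price
    actual = prices[1:]   # today's price
    return float(np.sqrt(np.mean((actual - preds) ** 2)))

def rel_improvement(model_rmse, baseline_rmse):
    """Fractional RMSE improvement over the random walk (positive = better)."""
    return 1.0 - model_rmse / baseline_rmse
```

Any reported gain should be computed against `rw_rmse` on the same validation window, before transaction costs are even considered.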

What we actually did

Feature selection was the project. We ran four models — an LSTM, a 1D-CNN, an attention-only transformer, and TSMixer — with three feature sets: price only; price + China-US trade volumes; price + trade + cross-rates (EUR/RMB, JPY/USD). The best model wasn't the flashiest. TSMixer — which is basically two MLPs and an intuition about time-channel mixing — was the one that generalised.
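The three feature sets and the sliding windows that feed them into the sequence models can be sketched as follows. The column names are hypothetical (the paper does not list its exact identifiers), and the lookback of 30 days is an assumption:

```python
import numpy as np

# Hypothetical column names -- the paper's exact identifiers are not given.
FEATURE_SETS = {
    "price":             ["rmb_usd"],
    "price+trade":       ["rmb_usd", "cn_us_trade_volume"],
    "price+trade+cross": ["rmb_usd", "cn_us_trade_volume", "eur_rmb", "jpy_usd"],
}

def make_windows(X, lookback=30, horizon=1):
    """Turn a (T, C) feature matrix into (N, lookback, C) model inputs and
    (N,) next-step targets on channel 0 (the exchange rate itself)."""
    n = len(X) - lookback - horizon + 1
    inputs = np.stack([X[i:i + lookback] for i in range(n)])
    targets = X[lookback + horizon - 1: lookback + horizon - 1 + n, 0]
    return inputs, targets
```

All four models consume the same `(lookback, channels)` windows, so the comparison isolates architecture rather than input handling.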

The reason is unglamorous. Exchange-rate series have long but shallow dependencies: persistence over weeks, not deep multi-step causal chains. Transformers over-fit the attention patterns in training data. TSMixer's architectural prior — "mix across time with a channel-wise MLP" — happens to match the data.
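The "two MLPs and an intuition" description can be made concrete. A minimal sketch of one TSMixer-style block, with layer normalization and dropout omitted for brevity; dimensions and hidden size are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

class MixerBlock:
    """One TSMixer-style block: a time-mixing MLP (shared across channels)
    followed by a channel-mixing MLP (shared across time steps), each with
    a residual connection."""
    def __init__(self, lookback, channels, hidden=16, scale=0.1):
        self.Wt1 = rng.normal(0, scale, (lookback, hidden))
        self.Wt2 = rng.normal(0, scale, (hidden, lookback))
        self.Wc1 = rng.normal(0, scale, (channels, hidden))
        self.Wc2 = rng.normal(0, scale, (hidden, channels))

    def __call__(self, x):  # x: (lookback, channels)
        # Time mixing: transpose so the MLP acts along the time axis.
        t = x + (relu(x.T @ self.Wt1) @ self.Wt2).T
        # Channel mixing: the MLP acts along the feature axis.
        return t + relu(t @ self.Wc1) @ self.Wc2

block = MixerBlock(lookback=30, channels=5)
out = block(np.zeros((30, 5)))  # shape preserved: (30, 5)
```

The prior is visible in the structure: the time-mixing MLP applies the same weights to every channel, which is exactly the "persistence over weeks, not deep causal chains" assumption.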

[Interactive chart: validation-set RMSE (lower is better) for the random walk, LSTM, transformer, and TSMixer (ours), one bar group per feature set. Simulated data; the shape matches §4 of the paper.]
Click through the feature sets. TSMixer's advantage grows with features; the transformer saturates. That's the story.

Explainability

A forecast you can't explain is useless to a trader. We used SHAP over the time-channel mixing weights to produce per-feature attributions for each prediction. China-US trade volume was the largest contributor by a wide margin. EUR/RMB co-moved enough to be useful. CPI surprises showed up only at macro-event windows — sparse but high-magnitude contributions, which SHAP captures naturally.
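The paper's attributions come from SHAP over the mixing weights. As a rough, self-contained proxy for that idea (not the paper's pipeline), permutation importance measures how much a model's error grows when one feature's values are shuffled; sparse but high-magnitude contributors like CPI surprises show up the same way:

```python
import numpy as np

def permutation_importance(predict, X, y, rng=None):
    """Per-feature score: increase in MSE when that feature's column is
    shuffled across samples. A crude stand-in for per-feature SHAP
    attributions, not the paper's method."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # break the feature-target link for column j
        scores.append(float(np.mean((predict(Xp) - y) ** 2) - base))
    return np.array(scores)
```

A feature that the model ignores scores near zero; a dominant driver like trade volume scores high, matching the ranking the SHAP analysis produced.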

The transformer's attention map was a mess of fake signals. TSMixer's channel weights were legible. That, more than the RMSE number, is why this is the model we'd ship.
Headline numbers: −18% RMSE vs the random walk · TSMixer is the winning model · SHAP provides per-feature attribution.

What this wasn't

It wasn't a trading paper. We did not backtest a strategy. We forecast the rate, explained the forecast, and stopped. The gap between a good forecaster and a profitable trader is real and involves risk management we did not solve. A reader who is about to deploy this in production should read paper 16's caveats; they apply here verbatim.
