ML Architecture · 6 min read · December 8, 2024

Transfer Learning for Financial Time Series: Pre-Training the Exit LSTM

Pre-training the LSTM on raw price sequences before fine-tuning it for exit prediction cut convergence time by 40% through transfer learning.

LSTM · Transfer Learning · Pre-training

The Data Problem for Exit Models

Exit prediction is harder than entry prediction because exit labels are sparser. Every bar can potentially be an entry, but exit labels exist only for bars where a trade is actually open. With 4,505 trades, each held for an average of 12 bars, there are about 54,000 exit-relevant training examples. That sounds like a lot, but for an LSTM with 128 hidden units, it is borderline.
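
To make the sparsity concrete, here is a minimal sketch of how exit-relevant examples could be enumerated: only bars between a trade's entry and exit contribute a sample. The `Trade` fields and the 50-bar lookback are illustrative assumptions, not the actual S22 data schema.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    entry_bar: int   # index of the bar where the trade was opened
    exit_bar: int    # index of the bar where the trade was closed

def exit_training_windows(trades, lookback=50):
    """Yield (window_start, window_end) index pairs for exit training.

    Only bars while a trade is open produce an exit-relevant example, which is
    why ~4,505 trades x ~12 bars held gives roughly 54,000 samples in total.
    """
    for t in trades:
        for bar in range(t.entry_bar, t.exit_bar + 1):
            if bar >= lookback:                  # need a full lookback window
                yield (bar - lookback, bar)      # features end at `bar`; label is exit/hold at `bar`
```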

S22 solves this by pre-training the LSTM on a much larger auxiliary task: raw price sequence prediction. The model learns to predict the next bar's close from the previous 50 bars. This task has millions of training examples (one per bar across all instruments and timeframes) and teaches the LSTM general patterns in price sequences.
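
The post does not name a framework, so the sketch below assumes PyTorch and OHLCV input features; the 128 hidden units match the figure quoted above. It shows the pre-training setup: a 50-bar window in, the next bar's close out.

```python
import torch
import torch.nn as nn

class PriceLSTM(nn.Module):
    """LSTM backbone with a next-close regression head used for pre-training."""
    def __init__(self, n_features=5, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.next_close = nn.Linear(hidden, 1)   # pre-training head

    def forward(self, x):                        # x: (batch, 50, n_features)
        out, _ = self.lstm(x)
        return self.next_close(out[:, -1])       # predict the next bar's close

model = PriceLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def pretrain_step(batch_x, batch_y):
    """One pre-training step; batch_y is the close of the bar after each window."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    return loss.item()
```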

Transfer Learning Results

After pre-training on raw sequences, the LSTM's weights already encode useful representations of price dynamics. Fine-tuning for exit prediction starts from these informed weights rather than random initialization. The result is 40% faster convergence (100 epochs instead of 170) and 3% better validation loss at convergence.
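
Continuing the sketch above, fine-tuning could look like the following: keep the pre-trained LSTM weights, swap the regression head for an exit-probability head, and train at a lower learning rate. The class name, checkpoint path, and learning rates are illustrative, not the actual S22 code.

```python
import torch
import torch.nn as nn

class ExitLSTM(nn.Module):
    """Exit classifier built on the pre-trained LSTM backbone."""
    def __init__(self, pretrained: "PriceLSTM", hidden=128):
        super().__init__()
        self.lstm = pretrained.lstm              # transferred weights, not random init
        self.exit_head = nn.Linear(hidden, 1)    # new head: probability of exiting now

    def forward(self, x):
        out, _ = self.lstm(x)
        return torch.sigmoid(self.exit_head(out[:, -1]))

pretrained = PriceLSTM()
pretrained.load_state_dict(torch.load("pretrain_checkpoint.pt"))  # hypothetical path

exit_model = ExitLSTM(pretrained)
# Fine-tune the whole network at a lower learning rate so the transferred
# representations are adjusted rather than overwritten.
optimizer = torch.optim.Adam(exit_model.parameters(), lr=1e-4)
loss_fn = nn.BCELoss()
```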

The faster convergence is the bigger practical benefit. Less training time means faster iteration during development and less risk of overfitting from training for too many epochs. The 3% loss improvement is modest but compounds across all exit decisions.

When Pre-Training Helps and When It Does Not

Pre-training helped most for instruments with fewer historical trades (the CRYPTO cluster, which has the shortest data history). For data-rich instruments like EURUSD, the benefit was smaller because the fine-tuning dataset was already large enough to learn good representations from scratch. This pattern is consistent with the transfer learning literature: the benefit is largest when the target task has limited data.

For V7, the practical implication is that pre-training enables the LSTM exit model to perform well on newer or less-traded instruments where direct training data is limited. As the system expands to more instruments, this pre-training advantage will become more important.