Machine Learning · 11 min read · September 12, 2024

LSTM Exit Timing: Formulating Exits as Sequence Prediction

How LSTM networks can predict optimal exit timing by learning from trade trajectory sequences.

LSTM · Neural Networks · Exit Strategy

Why Exits Are Harder Than Entries

There is a saying in trading: "Entries are easy, exits are hard." After four years of system building, I can confirm it.

Entry signals are static decisions. Given current market conditions, should I open a position? You have a snapshot of features and predict a binary outcome. Standard classification problem.

Exits are sequential decisions. The optimal exit depends on:

How long you have been in the trade
The trajectory of profit/loss since entry
How far the trade went in your favor (MFE)
How far it went against you (MAE)
Current market regime
Whether the original signal conditions still hold

This is a sequence prediction problem, and LSTMs (Long Short-Term Memory networks) are specifically designed for sequence data.

The Trade Trajectory Representation

Each open trade generates a sequence of observations at every bar:

Trade-specific features (10):

bars_held: integer count since entry
current_r: current R-multiple (profit/loss divided by initial risk)
mfe_r: maximum favorable excursion in R
mae_r: maximum adverse excursion in R
r_velocity: rate of change in R over last 5 bars
mfe_ratio: current_r / mfe_r (how much of the peak have we given back?)
entry_confidence: original L1/L2 signal strength
trade_direction: LONG (+1) or SHORT (-1)
time_of_day: normalized session time
day_of_week: encoded as cyclic features

Market state features (20):

Standard price action, momentum, volatility features
Regime indicators (Hurst, ADX regime, cluster state)
Current vs entry conditions (has the regime changed since entry?)

Total: 30 features per bar, forming a variable-length sequence.
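
As a concrete sketch, here is how the trade-specific block might be computed on each bar. The function name and exact definitions are illustrative rather than the production code, and note that the cyclic day-of-week encoding expands one conceptual feature into two values.

```python
import numpy as np

def trade_features(r_series, entry_confidence, direction, session_frac, day_of_week):
    """Trade-specific features for the current bar of an open trade.

    r_series: running R-multiple per bar since entry. All names and
    definitions here are illustrative, not the production implementation.
    """
    bars_held = len(r_series)
    current_r = r_series[-1]
    mfe_r = max(r_series)                    # maximum favorable excursion (R)
    mae_r = min(r_series)                    # maximum adverse excursion (R)
    # Rate of change in R over the last 5 bars (0 early in the trade)
    r_velocity = (r_series[-1] - r_series[-6]) / 5 if bars_held > 5 else 0.0
    # How much of the peak we have kept (guard against division by ~0)
    mfe_ratio = current_r / mfe_r if mfe_r > 1e-9 else 0.0
    # Cyclic day-of-week encoding so Friday sits next to Monday
    dow_sin = np.sin(2 * np.pi * day_of_week / 5)
    dow_cos = np.cos(2 * np.pi * day_of_week / 5)
    return np.array([bars_held, current_r, mfe_r, mae_r, r_velocity,
                     mfe_ratio, entry_confidence, direction,
                     session_frac, dow_sin, dow_cos], dtype=np.float32)
```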

LSTM Architecture

The L3 exit LSTM uses a relatively simple architecture:

Input: 30 features per timestep
LSTM layers: 2 stacked LSTM layers, 64 hidden units each
Dropout: 0.3 between layers (regularization)
Output: Single sigmoid neuron (exit probability)
Loss function: Binary cross-entropy
Optimizer: Adam with learning rate 1e-3, cosine annealing schedule

Why 64 hidden units? I tested 32, 64, 128, and 256; 64 gave the best bias-variance tradeoff. Larger models overfit to the training trade trajectories.

Why 2 layers? A single layer could not capture complex trajectory patterns (like "reached 2R, pulled back to 1R, now recovering to 1.5R"). Three layers showed no improvement over two and trained 40% slower.
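
In PyTorch (an assumption; the article does not name the framework), that architecture is only a few lines. The sketch below emits an exit probability at every timestep so the per-bar labels described in the next section can be used directly; at inference, the last element is the exit probability for the current bar.

```python
import torch
import torch.nn as nn

class ExitLSTM(nn.Module):
    """Sketch of the L3 exit model: 2x64 LSTM, dropout 0.3, sigmoid head."""

    def __init__(self, n_features=30, hidden=64, layers=2, dropout=0.3):
        super().__init__()
        # 2 stacked LSTM layers; dropout applies between the layers
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden,
                            num_layers=layers, dropout=dropout,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)  # single exit-probability neuron

    def forward(self, x):
        out, _ = self.lstm(x)                              # (batch, seq, hidden)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, seq) exit prob

model = ExitLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # T_max illustrative
```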

Training Data Construction

The trickiest part of LSTM exit models is constructing training labels. What constitutes an "optimal" exit?

I use hindsight-optimal labeling:

1. For each historical trade, calculate the maximum R that could have been achieved
2. The optimal exit point is the bar where R was maximized
3. Label each bar as EXIT=1 if it is within 2 bars of the optimal exit, EXIT=0 otherwise
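
A sketch of that labeling for a single trade, assuming the running R-multiple per bar is already computed:

```python
import numpy as np

def label_trajectory(r_series, window=2):
    """Hindsight-optimal labels: EXIT=1 within `window` bars of the R peak."""
    r = np.asarray(r_series, dtype=float)
    optimal = int(np.argmax(r))                   # bar where R was maximized
    labels = np.zeros(len(r), dtype=np.float32)
    lo, hi = max(0, optimal - window), min(len(r), optimal + window + 1)
    labels[lo:hi] = 1.0                           # EXIT window around the peak
    return labels
```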

This creates a class imbalance (most bars are HOLD, few are EXIT), which I handle with:

Focal loss instead of standard BCE (downweights easy HOLD predictions; see the sketch after this list)
Temporal oversampling of exit windows
Class weights proportional to inverse frequency
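
For reference, focal loss is a small modification of BCE (Lin et al., 2017). The gamma and alpha values below are generic defaults, not tuned values:

```python
import torch

def binary_focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Focal loss for binary exit labels; downweights easy HOLD predictions.

    p: predicted exit probabilities in (0, 1); y: 0/1 labels.
    gamma and alpha here are generic defaults, not tuned settings.
    """
    eps = 1e-7
    p = p.clamp(eps, 1 - eps)
    p_t = torch.where(y == 1, p, 1 - p)           # prob assigned to the true class
    alpha_t = torch.where(y == 1, torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()
```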

Pre-Training with S22

The S22 LSTM Pre-training module addresses the cold-start problem. Instead of training from random initialization:

1. Pre-train the LSTM on a general sequence prediction task (next-bar return prediction) using all available market data
2. Fine-tune on the exit timing task with actual trade trajectories
3. Freeze the first LSTM layer, fine-tuning only the second layer and the output head

Pre-training improved exit timing accuracy by approximately 8% on out-of-sample data. The first layer learns general market dynamics; the second layer specializes in exit prediction.
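
In PyTorch terms this is straightforward: in a stacked nn.LSTM, layer-0 parameters carry the "_l0" suffix, so freezing the first layer means turning off their gradients. A sketch, with an illustrative checkpoint path and fine-tuning learning rate:

```python
model = ExitLSTM()
# Load only the LSTM body from the S22 pre-training checkpoint; the
# pre-training head (next-bar return) is discarded. Path is illustrative.
model.lstm.load_state_dict(torch.load("s22_lstm_body.pt"))

# Freeze layer 0: its parameter names end in "_l0"
for name, param in model.lstm.named_parameters():
    if name.endswith("_l0"):
        param.requires_grad = False   # keeps the general market dynamics fixed

# Only the second LSTM layer and the output head receive gradients
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)   # lower fine-tuning LR is an assumption
```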

Integration with Regime Rules

The LSTM output (exit probability 0-1) does not directly trigger exits. It is combined with regime-specific giveback rules:

Exit triggered when ANY of:

1. LSTM exit probability > 0.70
2. Current R < -(giveback * MFE), where giveback depends on the Hurst regime
3. Trade hits its maximum bars limit (varies by cluster)
4. Stop loss hit (initial risk * 1.0)

The LSTM handles "smart" exits, recognizing trajectory patterns that suggest the optimal exit point. The rule-based exits handle risk management, cutting losses and preventing excessive giveback.
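
Put together, the per-bar exit decision is a short disjunction. This sketch mirrors the four rules above; giveback comes from the Hurst regime and max_bars from the trade's cluster:

```python
def should_exit(exit_prob, current_r, mfe_r, bars_held, giveback, max_bars):
    """OR-combination of the LSTM exit with the rule-based risk exits."""
    return (exit_prob > 0.70                     # 1. LSTM sees a likely peak
            or current_r < -(giveback * mfe_r)   # 2. regime giveback rule
            or bars_held >= max_bars             # 3. maximum-bars time stop
            or current_r <= -1.0)                # 4. stop loss at 1x initial risk
```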

Results

Comparing L3 LSTM exits vs fixed trailing stop exits:

| Metric          | LSTM Exit | Fixed Trailing |
|-----------------|-----------|----------------|
| Avg R per trade | 0.118R    | 0.092R         |
| Win Rate        | 59.2%     | 56.8%          |
| Profit Factor   | 1.60      | 1.43           |
| Avg Hold Time   | 8.3 bars  | 11.7 bars      |

The LSTM exits faster (8.3 vs 11.7 bars average) while capturing more profit per trade. It is particularly good at identifying trades that have peaked and are about to reverse. The mfe_ratio feature is the strongest predictor of exit timing.

Practical Considerations

Sequence length limits: I cap sequences at 50 bars. Trades lasting longer than 50 bars are rare, and the LSTM does not have enough training data for very long sequences.

Inference speed: LSTM inference needs to happen every bar for every open trade. With batch processing across positions, this takes approximately 2ms per bar. Fast enough for M15 but potentially an issue for tick-level strategies.

Model staleness: The LSTM is retrained quarterly on the most recent 2 years of trade data. Market dynamics evolve, and the exit patterns that were optimal in 2023 may not be optimal in 2025.

Complementary RL agent (S13): The PPO agent provides a second opinion on exit timing. When LSTM and RL agree, exit confidence is high. When they disagree, the system defaults to the more conservative option (usually exit).
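
The arbitration is roughly the following sketch; the exact logic is an assumption beyond what is described above:

```python
def arbitrate_exit(lstm_prob, rl_wants_exit, threshold=0.70):
    """Combine LSTM and PPO exit opinions; disagreement exits conservatively."""
    lstm_wants_exit = lstm_prob > threshold
    if lstm_wants_exit and rl_wants_exit:
        return True, "high-confidence"    # both models agree
    if lstm_wants_exit or rl_wants_exit:
        return True, "conservative"       # disagreement: default to exiting
    return False, "hold"
```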

Does the Complexity Earn Its Place?

The LSTM improved average R per trade from 0.092 to 0.118. That is a real improvement. But it is important to ask: how much of that is the LSTM being smart, and how much is the rule-based exits being bad? When I tested a well-tuned trailing stop (not the naive fixed one), the gap narrowed significantly. The LSTM still won, but by maybe 0.01R per trade, not 0.026R.

Being honest about effect sizes matters. The LSTM adds value, but most of the exit performance comes from the regime-conditional giveback rules, which are simple and interpretable. Complexity should earn its place, and measuring exactly how much value each component provides is the only way to know whether it has.