System Design · 15 min read · November 20, 2024

Building a 3-Layer ML Trading Pipeline

Architectural decisions behind separating signal generation (L1), entry timing (L2), and exit management (L3) in the V7 Engine.

Tags: Architecture, ML Pipeline, System Design

Why Three Layers?

The V7 Engine did not start as a 3-layer system. It started as a single model that tried to do everything: predict direction, time the entry, and decide when to exit. It was terrible.

The fundamental insight was that these are three different prediction problems with different feature requirements, different time horizons, and different loss functions. Combining them into one model creates a muddled optimization target.

Separating them allows each layer to specialize and be validated independently.

L1: Signal Generation

Problem: Given current market conditions, should I be looking at this instrument right now?

Method: XGBoost classification models trained per asset cluster. Each cluster (FOREX, METALS, INDEX, CRYPTO, COMMODITY, EQUITY) has its own model with cluster-specific thresholds.

Features: 38 features covering price action, momentum, volatility, microstructure, and regime indicators. Calculated on M15 bars.

Output: A probability score in [0, 1]. When the score exceeds the cluster-specific threshold, a signal is generated.

Why XGBoost: Gradient boosting handles tabular data extremely well. It is interpretable (feature importance), fast to train, and resistant to irrelevant features. Neural networks showed no improvement over XGBoost for this layer.

Why per-cluster: FOREX and CRYPTO have completely different statistical properties. A universal model either overfits to the dominant cluster or underperforms on all of them. Cluster-specific models with cluster-specific thresholds outperform by approximately 3-5 percentage points of win rate.
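The per-cluster structure above can be sketched as a thin wrapper around one trained classifier per cluster. This is an illustrative sketch, not the production code: the cluster names and the 38-feature M15 input come from the article, while the wrapper, the `scan` helper, and the threshold value are assumptions (any classifier exposing `predict_proba`, such as an XGBoost model, would slot in).

```python
from dataclasses import dataclass

import numpy as np

# Cluster names from the article; one model + threshold pair per cluster.
CLUSTERS = ("FOREX", "METALS", "INDEX", "CRYPTO", "COMMODITY", "EQUITY")


@dataclass
class ClusterSignalModel:
    """Wraps one trained classifier (e.g. XGBoost) and its cluster threshold."""
    model: object      # any classifier exposing predict_proba
    threshold: float   # cluster-specific probability cutoff (illustrative)

    def score(self, features: np.ndarray) -> float:
        # features: one (38,) vector of M15-bar features
        return float(self.model.predict_proba(features.reshape(1, -1))[0, 1])

    def signal(self, features: np.ndarray) -> bool:
        return self.score(features) >= self.threshold


def scan(models, latest):
    """Return instruments whose cluster model fires on the latest M15 bar.

    models: {cluster_name: ClusterSignalModel}
    latest: {instrument: (cluster_name, feature_vector)}
    """
    fired = []
    for instrument, (cluster, feats) in latest.items():
        if models[cluster].signal(feats):
            fired.append(instrument)
    return fired
```

The point of the wrapper is that the threshold travels with the model, so tuning one cluster's cutoff cannot silently affect another's.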

L2: Entry Gate

Problem: Given an L1 signal, should I enter NOW or wait?

Method: XGBoost calibrated model with timing-specific features. Acts as a filter that can SKIP or ENTER based on immediate market microstructure.

Features: Spread-ATR ratio, session timing, recent volatility acceleration, order flow imbalance, correlation regime.

Output: ENTER or SKIP decision.

The Over-Skip Problem: L2 has a tendency to be too conservative, skipping 60%+ of valid signals. This was a known issue that required a rule-based fallback for cases where L2 confidence was near the boundary. In practice, the rule-based fallback catches about 15% of trades that L2 would have incorrectly skipped.

Why a separate model: Entry timing requires microstructure features (spread, volume) that change on a bar-by-bar basis. L1 features are more stable (regime, trend indicators). Mixing these timescales in one model degraded both predictions.
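The ENTER/SKIP decision with the near-boundary fallback described above can be sketched as follows. All numeric values here (the enter threshold, the boundary band, and the spread-ATR cutoff) are illustrative assumptions, as is the choice of spread-ATR ratio as the fallback rule; the article only states that a rule-based fallback handles cases where L2 confidence is near the boundary.

```python
def entry_decision(prob_enter: float,
                   spread_atr_ratio: float,
                   enter_threshold: float = 0.55,
                   boundary_band: float = 0.10,
                   max_spread_atr: float = 0.15) -> str:
    """Return 'ENTER' or 'SKIP' for an L1 signal (illustrative values)."""
    if prob_enter >= enter_threshold:
        return "ENTER"
    # Near-boundary fallback: when the model is on the fence, consult a
    # simple microstructure rule instead of defaulting to SKIP. This is
    # the guardrail against the over-skip problem described above.
    if prob_enter >= enter_threshold - boundary_band:
        return "ENTER" if spread_atr_ratio <= max_spread_atr else "SKIP"
    return "SKIP"
```

The fallback only activates in the band just below the threshold, so a clearly low-confidence signal is still skipped outright.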

L3: Exit Management

Problem: Given an open trade, when should I close it?

Method: LSTM neural network trained on trade trajectory sequences.

Features: 30 features including bars held, current R-multiple, max favorable excursion (MFE), max adverse excursion (MAE), regime state, and trailing performance.

Output: EXIT probability. Combined with regime-specific giveback rules (35% for trending, 25% for mean-reverting).

Why LSTM: Exits are inherently sequential. The decision depends on the trajectory of the trade, not just a snapshot. LSTMs are designed for sequence prediction. The model learns patterns like "when a trade reaches 2R MFE and starts retracing, exit probability increases."
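Combining the LSTM's EXIT probability with the regime-specific giveback rules can be sketched like this. The giveback fractions (35% trending, 25% mean-reverting) come from the article; the probability threshold and the exact giveback formula (retracement from MFE as a fraction of MFE, in R-multiples) are assumptions for the sketch.

```python
# Giveback fractions from the article; dict keys are an assumption.
GIVEBACK = {"trending": 0.35, "mean_reverting": 0.25}


def should_exit(exit_prob: float, current_r: float, mfe_r: float,
                regime: str, prob_threshold: float = 0.6) -> bool:
    """Exit if the LSTM fires, or if the trade gave back too much of its MFE.

    exit_prob: LSTM exit probability for the current bar
    current_r: current R-multiple of the open trade
    mfe_r:     max favorable excursion reached so far, in R
    """
    if exit_prob >= prob_threshold:
        return True
    if mfe_r > 0:
        giveback = (mfe_r - current_r) / mfe_r
        if giveback >= GIVEBACK[regime]:
            return True
    return False
```

Either trigger alone closes the trade; the giveback rule acts as a deterministic backstop when the model is slow to react to a retracement.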

The RL Component (S13): A PPO-based RL agent provides a secondary exit signal. It is formulated as an MDP where the state is the trade trajectory, actions are HOLD/EXIT, and reward is the final R-multiple. The RL agent is more aggressive about cutting losses and letting winners run compared to the LSTM.
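The MDP formulation above can be made concrete with a minimal environment stub, written without any RL library. The state/action/reward structure (trajectory state, HOLD/EXIT actions, final R-multiple as reward) follows the article; the trajectory representation as a running R-multiple path and the terminal-only reward shape are assumptions.

```python
HOLD, EXIT = 0, 1


class ExitEnv:
    """One episode = one trade; each step reveals the next bar of its trajectory."""

    def __init__(self, r_path):
        self.r_path = r_path   # running R-multiple at each bar held
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.r_path[0]

    def step(self, action: int):
        """Return (state, reward, done). Reward is paid only on termination."""
        if action == EXIT or self.t == len(self.r_path) - 1:
            # Terminal reward: the realized R-multiple at the exit bar.
            return self.r_path[self.t], self.r_path[self.t], True
        self.t += 1
        return self.r_path[self.t], 0.0, False
```

A PPO agent trained on episodes like this is rewarded directly in realized R, which is what pushes it toward cutting losses early and holding winners longer than the LSTM does.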

How the Layers Interact

The pipeline is strictly sequential: L1 then L2 then L3.

Signal flow:

1. L1 scans all instruments every M15 bar, generates probability scores
2. Instruments above threshold pass to L2 for ENTER/SKIP decision
3. If ENTER, position opened and passed to L3 for management
4. L3 monitors each open trade independently, EXIT when conditions met

No information flows backward. L3 never influences L1. This prevents feedback loops and makes debugging straightforward. You can validate each layer independently.
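The one-way flow can be sketched as a single per-bar orchestration function. The layer objects here are stand-ins (any objects exposing `signal`, `decide`, and `should_exit` in the shapes sketched earlier); the point is purely that data moves L1 → L2 → L3 and never back.

```python
def run_bar(l1, l2, l3, instruments, open_trades):
    """Process one M15 bar through the strictly sequential pipeline.

    Returns (still_open, closed). No layer ever feeds information upstream.
    """
    # 1. L1 scans every instrument on the new bar.
    signals = [i for i in instruments if l1.signal(i)]
    # 2. L2 filters each signal to ENTER or SKIP.
    entries = [i for i in signals if l2.decide(i) == "ENTER"]
    # 3. New positions are handed to L3; nothing flows backward.
    open_trades = open_trades + entries
    # 4. L3 manages each open trade independently.
    still_open, closed = [], []
    for trade in open_trades:
        (closed if l3.should_exit(trade) else still_open).append(trade)
    return still_open, closed
```

Because each layer is a pure function of its own inputs, any layer can be replaced by a recorded log of its outputs during debugging, which is what makes independent validation cheap.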

Results

The 3-layer architecture achieves:

59.2% win rate across all clusters
+533.9R over 7.5 years (4,505 trades)
1.49% max drawdown with DD-triggered risk scaling
Profit factor: 1.60

By comparison, the single-model approach achieved approximately a 52% win rate with higher drawdown. The separation of concerns provided roughly a 7 percentage point improvement in win rate alone.

Architecture Lessons

1. Separate what can be separated. If two prediction problems have different time horizons, different features, or different loss functions, they should be separate models.
2. Specialize by asset class. Universal models underperform cluster-specific ones. The cost is more models to maintain, but the performance benefit is worth it.
3. Strict sequential flow. No feedback loops between layers. It is tempting to let L3 inform L1 about regime changes, but this creates complexity that is nearly impossible to debug.
4. Validate independently AND together. Each layer should pass validation alone. Then validate the integrated pipeline. A system where L1 is great but L2 kills it is worse than one where all layers are decent.
5. Rule-based fallbacks are okay. Pure ML everything sounds great in theory, but practical systems need guardrails. L2's rule-based fallback handles edge cases that the ML model struggles with.

How This Actually Got Built

The 3-layer architecture was not designed top-down. It evolved because each single-model attempt failed in a specific, identifiable way. The honest response to "my entry model is also bad at exits" is not to add more features to the same model. It is to recognize that you are solving two different problems and separate them. Architectural decisions should follow evidence, not elegance. The system that works is more valuable than the system that looks clean on a whiteboard.