The Bug That Took Two Months: Why Feature Preprocessing Is Not Optional
Consistent feature scaling between training and inference using saved Z-score parameters, eliminating train/test preprocessing mismatch.
The Two-Month Bug
For two months, live shadow-trading results consistently underperformed backtest expectations by 8%. Signals were weaker, exits were later, and the whole system felt sluggish. The cause turned out to be a preprocessing mismatch: training standardized features with Z-scores using parameters (mean and std) computed on the training set, while inference applied the same standardization using parameters computed on the live data buffer.
Under Z-score standardization, z = (x − μ) / σ, so the same raw feature value maps to different standardized values whenever μ and σ come from different reference distributions. An ATR reading of 25 would land in one place on the training scale and somewhere else entirely on the live scale: the models were seeing features on a different scale than the one they were trained on.
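Here is a minimal sketch of the mismatch. The numbers are illustrative, not from the actual system; the point is only that the same raw reading produces wildly different z-scores under the two reference distributions:

```python
import numpy as np

# Training-time reference distribution for ATR (illustrative numbers).
train_atr = np.array([18.0, 22.0, 25.0, 30.0, 35.0])
train_mean, train_std = train_atr.mean(), train_atr.std()

# Live buffer during a calmer regime: different reference statistics.
live_atr = np.array([10.0, 12.0, 14.0, 15.0, 17.0])
live_mean, live_std = live_atr.mean(), live_atr.std()

x = 25.0  # the same raw ATR reading

z_train = (x - train_mean) / train_std  # scale the model was trained on
z_buggy = (x - live_mean) / live_std    # scale the buggy inference path used

print(f"training scale:   {z_train:+.2f}")  # ~ -0.17, an unremarkable value
print(f"buggy live scale: {z_buggy:+.2f}")  # ~ +4.72, looks like an extreme outlier
```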
The Fix: Saved Scaler Parameters
S23 saves the training-time mean and standard deviation for each of the 38 features. During inference, these saved parameters are loaded and applied, so live feature values are standardized with training-time statistics and the model sees features on exactly the scale it was trained on.
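A sketch of the pattern, not the actual S23 code; the file layout and function names are assumptions:

```python
import json
import numpy as np

def fit_scaler(X: np.ndarray, path: str) -> None:
    """Compute per-feature mean/std on the training set and persist them."""
    params = {
        "mean": X.mean(axis=0).tolist(),
        "std": X.std(axis=0).tolist(),
    }
    with open(path, "w") as f:
        json.dump(params, f)

def load_and_apply(x: np.ndarray, path: str) -> np.ndarray:
    """Standardize live features using the saved training-time statistics."""
    with open(path) as f:
        params = json.load(f)
    mean = np.asarray(params["mean"])
    std = np.asarray(params["std"])
    return (x - mean) / std  # same scale the model saw during training
```

The same effect can be had by fitting scikit-learn's StandardScaler once at training time and serializing it with joblib. The essential design choice is that fitting happens exactly once, on training data, and inference only ever loads.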
The fix took one day to implement. Finding the bug took two months. The lesson is that preprocessing code is as much part of the model as the weights themselves.
Why This Matters More Than You Think
Feature preprocessing mismatches are the most common source of train/test divergence in production ML systems. They are also the hardest to detect because the system still produces reasonable-looking outputs. It does not crash. It does not throw errors. It just performs slightly worse.
S23's value is not in the standardization itself, which is basic statistics. The value is in formalizing the pipeline so that preprocessing parameters are versioned, hashed, and validated alongside model weights. In V7's frozen model manifest, the scaler parameters have their own hash entries. If they change, the entire validation chain breaks and forces re-verification.
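A sketch of how scaler parameters might be hashed and checked alongside the weights. The manifest schema, file names, and function names here are hypothetical, not V7's actual format:

```python
import hashlib
import json

def file_sha256(path: str) -> str:
    """Hash an artifact file so the manifest can detect any change to it."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest_path: str) -> None:
    """Fail loudly if weights OR scaler params differ from the frozen hashes."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    for name, entry in manifest["artifacts"].items():
        actual = file_sha256(entry["path"])
        if actual != entry["sha256"]:
            raise RuntimeError(f"{name} hash mismatch: re-verification required")

# Hypothetical manifest: the scaler gets its own entry next to the weights.
# {
#   "artifacts": {
#     "model_weights": {"path": "model.bin",   "sha256": "..."},
#     "scaler_params": {"path": "scaler.json", "sha256": "..."}
#   }
# }
```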