Letting K-Means Define Market States Nobody Named
Unsupervised K-Means clustering on market features achieves a 0.68 silhouette score and enables regime-conditional alpha generation.
Beyond Named Regimes
Traditional regime detection uses labels humans invented: trending, ranging, volatile, quiet. But markets do not read our labels. K-Means clustering lets the data define its own states based on the actual statistical properties of each period, finding structure that human labels miss.
S09 runs K-Means with k=4 on a feature set including ATR percentile, ADX, Hurst exponent, autocorrelation at lag-1 and lag-5, and realized volatility. The resulting clusters do not map neatly to traditional labels. Cluster 2, for example, captures a "high-vol-but-persistent" state that would be classified as both "volatile" and "trending" under traditional labels.
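As a rough illustration of that step, here is a minimal sketch using scikit-learn. The column names in FEATURES, the helper fit_regimes, the standardization step, and the random placeholder data are all assumptions made for the example. The source only states that K-Means runs with k=4 on these six kinds of features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical column order for the six features named in the text.
FEATURES = ["atr_pct", "adx", "hurst", "autocorr_1", "autocorr_5", "realized_vol"]


def fit_regimes(X, k=4, seed=0):
    """Standardize the features and fit K-Means with k clusters.

    Standardization is an assumption: K-Means uses Euclidean distance,
    so features on different scales (ADX vs. Hurst) would otherwise
    dominate or vanish.
    """
    scaler = StandardScaler().fit(X)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    model.fit(scaler.transform(X))
    return scaler, model


# Placeholder data standing in for real per-period market features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(FEATURES)))

scaler, model = fit_regimes(X, k=4)
labels = model.predict(scaler.transform(X))  # one regime id (0..3) per period
```

The fitted scaler and model are returned together so that new periods can be assigned to a regime with the same transformation that was used at fit time.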
Silhouette Score and Cluster Quality
The silhouette score ranges from -1 to 1, and a value of 0.68 indicates that the clusters are well separated in feature space: each market state is meaningfully different from the others. As a common rule of thumb, scores above 0.5 indicate reasonable structure and scores above 0.7 indicate strong structure. At 0.68, the clusters are capturing real regime differences, not just fitting noise.
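For reference, the silhouette of a single point compares a, its mean distance to the other points in its own cluster, with b, its mean distance to the points in the nearest other cluster: s = (b - a) / max(a, b). The overall score is the average of s over all points. The sketch below computes it with scikit-learn on synthetic, deliberately well-separated blobs. The data and cluster centers are invented for illustration and are not the S09 features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Four synthetic cluster centers in a 6-dimensional feature space.
centers = np.array([
    [0, 0, 0, 0, 0, 0],
    [10, 10, 10, 10, 10, 10],
    [0, 10, 0, 10, 0, 10],
    [10, 0, 10, 0, 10, 0],
], dtype=float)
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=0)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Overall score: mean silhouette over all points, in [-1, 1].
overall = silhouette_score(X, labels)

# Per-point values allow a per-cluster breakdown, which shows whether
# one weak cluster is hiding behind a good average.
per_point = silhouette_samples(X, labels)
per_cluster = {int(c): float(per_point[labels == c].mean()) for c in np.unique(labels)}

print(f"overall silhouette: {overall:.2f}")
print(per_cluster)
```

Checking the per-cluster values alongside the overall score is a cheap sanity test before trusting any single regime label downstream.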
Cluster assignments feed into the L1 pipeline as a categorical feature. A separate L1 model is trained for each asset cluster, and within each model the K-Means regime label provides context. The models can learn, for example, that certain technical patterns are predictive in regime 1 but not in regime 3, creating regime-conditional alpha that a single global model would average away.
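One way to wire this up is sketched below, assuming the regime label is one-hot encoded and the rows are then split by asset cluster before training. The function names, array shapes, and toy values are hypothetical. The source does not describe how the L1 models consume the label or what kind of models they are.

```python
import numpy as np


def add_regime_feature(X_tech, regime, k=4):
    """Append a one-hot encoding of the regime id (0..k-1) to the features.

    One-hot encoding keeps the label categorical, so a model does not
    treat regime 3 as "three times" regime 1.
    """
    one_hot = np.eye(k)[regime]
    return np.hstack([X_tech, one_hot])


def split_by_asset_cluster(X, y, asset_cluster):
    """Return {asset_cluster_id: (X_subset, y_subset)}, one entry per model."""
    return {
        int(c): (X[asset_cluster == c], y[asset_cluster == c])
        for c in np.unique(asset_cluster)
    }


# Toy inputs: 6 rows, 3 technical features, regime ids from K-Means,
# and an asset-cluster id per row (all values invented).
X_tech = np.arange(18, dtype=float).reshape(6, 3)
regime = np.array([0, 1, 2, 3, 1, 0])
asset_cluster = np.array([0, 0, 1, 1, 0, 1])
y = np.array([1, 0, 1, 0, 0, 1])

X_full = add_regime_feature(X_tech, regime, k=4)
datasets = split_by_asset_cluster(X_full, y, asset_cluster)
```

Each entry of datasets would then be handed to its own model, so the regime columns act as context inside a model that is already specific to one group of assets.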
What Unsupervised Learning Adds
The honest contribution of S09 is modest in raw R terms. Adding the K-Means regime as a feature improved total backtest R by approximately 2.3%. The real value is in consistency: monthly variance of returns dropped by 15% when regime-conditional models replaced regime-agnostic ones. For an FTMO-constrained system, where consistent monthly returns matter more than maximum total return, that variance reduction is worth more than a small gain in total R. The system with K-Means regimes produces smoother equity curves, which translates to lower breach probability and more predictable performance.
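The variance comparison itself is simple arithmetic. The sketch below shows one way to compute a relative reduction in monthly return variance. The function name is hypothetical and the return series are synthetic, constructed so that the reduction comes out at 15%. They are not the backtest results.

```python
import numpy as np


def variance_reduction(baseline, candidate):
    """Relative drop in sample variance: 1 - var(candidate) / var(baseline)."""
    var_base = np.var(baseline, ddof=1)
    var_cand = np.var(candidate, ddof=1)
    return 1.0 - var_cand / var_base


# Synthetic monthly returns (in R) for a regime-agnostic system.
baseline = np.array([2.0, -1.0, 3.0, 0.5, -0.5, 1.5])

# Same mean, with deviations shrunk so that the variance is 85% of baseline.
candidate = baseline.mean() + np.sqrt(0.85) * (baseline - baseline.mean())

reduction = variance_reduction(baseline, candidate)
print(f"variance reduction: {reduction:.0%}")
```

Because the mean is unchanged in this construction, the example isolates the effect described in the text: the same average return delivered with less month-to-month dispersion.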