Quantitative Trading System

This project is my Bayesian approach to algorithmic trading: a market-long strategy that uses probabilistic models and regime detection to guide trading decisions. Below are a live dashboard showing performance metrics and the detailed model validation results.

Live Dashboard

Paper trading via Alpaca Markets. Signals, positions, and equity curve update daily.

Model Validation

All performance figures are subjected to rigorous statistical testing before any conclusions are drawn. The results below use two complementary permutation frameworks: a block bootstrap test on strategy returns (preserving autocorrelation structure) and a walk-forward label-shuffling test on the ML regime detector. Both indicate genuine predictive signal beyond what random chance would produce.

Strategy Validation Overview

The strategy uses a 70-day rolling lookback window designed to capture multi-month equity momentum, a well-documented market anomaly. The block bootstrap null preserves temporal autocorrelation within 20-day blocks but destroys the multi-month momentum signal that the strategy exploits. The actual strategy significantly outperforms this null (p = 0.026 over 2,500 null runs), which supports the view that the 70-day window captures genuine predictive signal rather than overfitted noise. Critically, raw momentum alone is insufficient to explain the system's performance. Without regime-based position scaling, the market-long strategy is fully exposed during bear markets. The regime detector I designed (below) addresses this directly.
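The block-based null described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes the null is built by reordering 20-day blocks of the underlying return series and re-running the strategy on each reordered history, which is one reading of the description. The function names, the single-asset `toy_momentum_strategy` stand-in, and the synthetic data are all hypothetical.

```python
import numpy as np

def block_shuffle(returns, block_len, rng):
    """Reorder contiguous blocks; the order of days inside each block is kept."""
    starts = np.arange(0, len(returns), block_len)
    return np.concatenate([returns[s:s + block_len] for s in rng.permutation(starts)])

def sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    sd = daily_returns.std(ddof=1)
    return 0.0 if sd == 0 else float(np.sqrt(periods_per_year) * daily_returns.mean() / sd)

def toy_momentum_strategy(asset_returns, lookback=70):
    """Hypothetical stand-in: hold the asset when its trailing 70-day return is positive."""
    csum = np.concatenate([[0.0], np.cumsum(np.log1p(asset_returns))])
    trailing = csum[lookback:] - csum[:-lookback]  # window ending on each day
    position = np.zeros(len(asset_returns))
    position[lookback:] = trailing[:-1] > 0        # signal from day t trades day t+1
    return position * asset_returns

def block_null_test(asset_returns, strategy_fn, metric_fn,
                    block_len=20, n_runs=2500, seed=0):
    """Compare the observed metric with its distribution under block-shuffled histories."""
    rng = np.random.default_rng(seed)
    observed = metric_fn(strategy_fn(asset_returns))
    null = np.array([
        metric_fn(strategy_fn(block_shuffle(asset_returns, block_len, rng)))
        for _ in range(n_runs)
    ])
    # One-sided p-value with the usual +1 correction so it is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (n_runs + 1)
    return observed, null, float(p_value)

# Usage on synthetic returns (illustrative only, not real market data).
rng = np.random.default_rng(42)
fake_returns = rng.normal(0.0004, 0.01, size=1500)
obs, null, p = block_null_test(fake_returns, toy_momentum_strategy, sharpe, n_runs=200)
print(f"observed Sharpe {obs:.2f}, p = {p:.3f}")
```

Because whole blocks are moved, short-range autocorrelation inside each 20-day block survives, while any dependence spanning several blocks, such as a 70-day trend, is broken up.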

Block bootstrap equity curve fan

Validation Metrics

The Sharpe ratio, Calmar ratio, and total return of the actual strategy are shown against the full null distribution of 2,500 block-shuffled runs. The actual strategy sits in the right tail on all three metrics. Sharpe and Calmar rank in the top 2–3% of the null. The risk-adjusted metrics are where the strategy most clearly separates itself, which is expected given that the regime detector's primary function is drawdown reduction rather than return maximization. Total return ranks at the 92nd percentile: some null runs achieve higher raw returns by chance through favorable block orderings, but without the risk controls that produce the superior risk-adjusted profile.
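For reference, the three metrics and a percentile rank against a null distribution could be computed along these lines. This is a sketch using common textbook definitions (zero risk-free rate, 252 trading days per year, Calmar as annualized return over absolute maximum drawdown). The project's exact conventions are not stated, so these are assumptions, and the function names are illustrative.

```python
import numpy as np

def total_return(r):
    """Compounded return over the whole period."""
    return float(np.prod(1.0 + np.asarray(r)) - 1.0)

def max_drawdown(r):
    """Most negative peak-to-trough decline of the equity curve (a value <= 0)."""
    equity = np.concatenate([[1.0], np.cumprod(1.0 + np.asarray(r))])
    peak = np.maximum.accumulate(equity)
    return float((equity / peak - 1.0).min())

def sharpe_ratio(r, periods_per_year=252):
    """Annualized mean over annualized volatility, zero risk-free rate."""
    r = np.asarray(r)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def calmar_ratio(r, periods_per_year=252):
    """Annualized compounded return divided by the absolute maximum drawdown."""
    r = np.asarray(r)
    annualized = (1.0 + total_return(r)) ** (periods_per_year / len(r)) - 1.0
    mdd = max_drawdown(r)
    return float("inf") if mdd == 0 else annualized / abs(mdd)

def percentile_rank(observed, null_values):
    """Percentage of null runs that fall strictly below the observed value."""
    return 100.0 * float(np.mean(np.asarray(null_values) < observed))
```

A percentile rank of 92 on total return, for example, means 8% of the null runs matched or beat the strategy on that metric.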

Block bootstrap multi-metric panel

Regime Detector Validation

The ML regime detector is a Gradient Boosting classifier that predicts one of three market states (bull, bear, or neutral) and scales position sizes accordingly. It is trained and validated using a strict walk-forward protocol across 7 out-of-sample years (2020–2026), so it never sees future data during training. The permutation test below indicates that the regime features carry genuine predictive signal well beyond what random chance would produce.

Regime detector permutation test histogram

The detector uses five features: the MA50/MA200 crossover ratio, price distance from MA200, 3-month momentum, the ratio of short-term to long-term volatility, and market breadth (the fraction of universe stocks trading above their MA50). These features were selected because they are well established in the academic literature as predictive of market regimes, and because a small, simple feature set is less prone to overfitting given the limited training history available (~7 years).
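A rough pandas sketch of how these five features might be computed is shown below. Only the 50-day and 200-day moving averages come from the text. The 63-day momentum window (about three trading months), the 20-day and 120-day volatility windows, the column names, and the synthetic prices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def regime_features(index_close, universe_close,
                    mom_days=63, short_vol=20, long_vol=120):
    """Build the five regime features from an index series and a universe price table."""
    ma50 = index_close.rolling(50).mean()
    ma200 = index_close.rolling(200).mean()
    daily = index_close.pct_change()

    # Breadth: share of universe stocks closing above their own 50-day average.
    uni_ma50 = universe_close.rolling(50).mean()
    above = (universe_close > uni_ma50).astype(float).where(uni_ma50.notna())

    feats = pd.DataFrame({
        "ma_ratio": ma50 / ma200,
        "dist_ma200": index_close / ma200 - 1.0,
        "momentum_3m": index_close.pct_change(mom_days),
        "vol_ratio": daily.rolling(short_vol).std() / daily.rolling(long_vol).std(),
        "breadth": above.mean(axis=1),
    })
    return feats.dropna()  # drop warm-up rows where any window is incomplete

# Usage on synthetic prices (illustrative only).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
universe = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, (300, 5)), axis=0)),
    index=dates, columns=list("ABCDE"),
)
index_close = universe.mean(axis=1)
features = regime_features(index_close, universe)
```

Every feature on a given date uses only prices up to and including that date, which is the property a walk-forward protocol relies on.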

Validation uses a strict walk-forward protocol: the model is trained on a rolling 3-year window and evaluated on the following out-of-sample year, repeated across 2020–2026 to produce 7 out-of-sample test folds. The pipeline has been checked for lookahead bias. The permutation test shuffles the regime labels within each calendar year across 200 runs, building a null distribution of walk-forward accuracy under the hypothesis that the features contain no real predictive information. The observed accuracy of 86.5% sits 7.74 standard deviations above the null mean, entirely off the right tail of the distribution. Because 200 permutations cannot resolve an empirical p-value below 1/201 ≈ 0.005, the reported p<0.0001 should be read as the normal-approximation figure implied by that z-score.
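The walk-forward permutation test can be sketched as below. To keep the example dependency-free it swaps in a simple nearest-centroid classifier for the Gradient Boosting model, so it illustrates the testing procedure, not the detector itself. It also assumes labels are shuffled in both training and test years. All function names are hypothetical.

```python
import numpy as np

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def walk_forward_accuracy(X, y, years, fit_predict, train_years=3):
    """Train on the previous `train_years` calendar years, test on the next one."""
    uniq = np.unique(years)
    correct = total = 0
    for i in range(train_years, len(uniq)):
        train = np.isin(years, uniq[i - train_years:i])
        test = years == uniq[i]
        pred = fit_predict(X[train], y[train], X[test])
        correct += int((pred == y[test]).sum())
        total += int(test.sum())
    return correct / total

def shuffle_within_year(y, years, rng):
    """Permute labels inside each calendar year, keeping per-year class counts."""
    y_perm = y.copy()
    for yr in np.unique(years):
        idx = np.flatnonzero(years == yr)
        y_perm[idx] = rng.permutation(y[idx])
    return y_perm

def permutation_test(X, y, years, fit_predict, n_runs=200, seed=0):
    """Return observed accuracy, its z-score against the null, and an empirical p-value."""
    rng = np.random.default_rng(seed)
    observed = walk_forward_accuracy(X, y, years, fit_predict)
    null = np.array([
        walk_forward_accuracy(X, shuffle_within_year(y, years, rng), years, fit_predict)
        for _ in range(n_runs)
    ])
    z = (observed - null.mean()) / null.std(ddof=1)
    # The empirical p-value cannot go below 1 / (n_runs + 1).
    p_emp = (1 + np.sum(null >= observed)) / (n_runs + 1)
    return observed, float(z), float(p_emp)
```

With data starting three years before the first test year, `walk_forward_accuracy` produces one fold per test year. Shuffling within a year preserves each year's mix of regimes, so the null isolates the link between features and labels and leaves class frequencies untouched.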

Two design choices specifically address the asymmetric cost structure of a long-only strategy. First, bear samples are upweighted by a factor of 3× during training via asymmetric sample weights, reflecting the fact that failing to reduce exposure during a bear market is much more costly than missing out on upside during a bull market or being overly cautious during a neutral one. Second, a confidence threshold prevents low-conviction bull predictions from triggering full exposure; such predictions default to the neutral scale factor instead. Bear predictions are not subject to this threshold, since the cost of a missed bear market (a false negative) is much higher than that of a false alarm. The practical result is strong bear detection, which is the asymmetrically important case, without the chronic over-prediction of bull regimes in flat markets that plagued earlier model iterations.
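The two design choices can be illustrated with a short sketch. The 3× bear weight comes from the text. The class encoding, the 0.6 confidence threshold, and the three scale factors are placeholder values chosen for the example, not the system's actual settings.

```python
import numpy as np

BEAR, NEUTRAL, BULL = 0, 1, 2                    # hypothetical class encoding
SCALE = {BEAR: 0.25, NEUTRAL: 0.6, BULL: 1.0}    # hypothetical exposure multipliers

def bear_sample_weights(y, bear_weight=3.0):
    """Give bear-regime samples 3x the weight of bull and neutral samples."""
    return np.where(np.asarray(y) == BEAR, bear_weight, 1.0)

def position_scale(proba, bull_threshold=0.6):
    """Map class probabilities to an exposure multiplier.

    A bull call below the confidence threshold is downgraded to neutral.
    Bear calls are never gated by the threshold.
    """
    proba = np.asarray(proba)
    pred = int(proba.argmax())
    if pred == BULL and proba[BULL] < bull_threshold:
        pred = NEUTRAL
    return SCALE[pred]

# Usage: a weak bull call falls back to the neutral scale factor.
weights = bear_sample_weights([BEAR, NEUTRAL, BULL, BEAR])
weak_bull = position_scale([0.10, 0.35, 0.55])
strong_bull = position_scale([0.10, 0.20, 0.70])
bear_call = position_scale([0.40, 0.30, 0.30])
```

If the classifier is scikit-learn's `GradientBoostingClassifier` (the text does not name the library), weights like these would typically be passed through the `sample_weight` argument of `fit`.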

A methodological note: the regime labels are derived from the same MA and momentum inputs used as features, introducing some circularity. The model is partly learning to reproduce its own labeling function rather than predicting a fully independent ground truth, which inflates the absolute accuracy figure somewhat. The more meaningful validation is therefore economic: does applying the regime scalar actually improve the strategy's Sharpe ratio and reduce maximum drawdown in out-of-sample backtests? The answer is yes, and the equity curve above reflects that improvement. The permutation test result should be interpreted as confirmation that the regime features are internally consistent and non-degenerate, rather than a claim of 86.5% predictive power in isolation from the broader strategy.