Fixing Gold Market Overfitting: A Predictive Machine Learning Approach with ONNX and Gradient Boosting
Case Study: The “Golden Gauss” Architecture
Author: Daglox Kankwanda
ORCID: 0009-0000-8306-0938
Technical Paper: Zenodo Repository (DOI: 10.5281/zenodo.18646499)
Contents
1. Introduction
2. The Core Problems in Algorithmic Trading
3. Methodology
4. System Architecture
5. Feature Engineering
6. Validation and Results
7. Trade Management
8. Honest Limitations
9. Conclusion
10. Implementation & Availability
References
1. Introduction
The algorithmic trading domain, particularly in retail markets, faces a fundamental credibility problem. The pattern is predictable and pervasive: systems show impressive backtest performance, followed by rapid degradation in forward testing, culminating in account destruction during live deployment. This failure mode stems from a single root cause: optimization for in-sample performance without rigorous out-of-sample validation.
The mathematical reality is simple: given sufficient degrees of freedom, any model can “memorize” historical price patterns. Such memorization produces impressive backtest metrics while providing zero predictive power for future market behavior. The model has learned the noise, not the signal.
Beyond overfitting, traditional indicator-based approaches suffer from a fundamental timing deficiency. Technical indicators, by construction, are reactive: they process historical data to generate signals after price movements have already begun.
Core Thesis: A truly useful trading system must identify the conditions preceding significant price activity, not the activity itself. The goal is prediction, not confirmation.
This article presents a methodology that synthesizes machine learning research insights into a practical, deployable trading system for XAUUSD (Gold) markets, demonstrated through the “Golden Gauss” architecture.
2. The Core Problems in Algorithmic Trading
2.1 The Overfitting Crisis
The proliferation of “AI-powered” trading systems in retail markets has created a credibility crisis, with most systems exhibiting catastrophic failure when deployed on unseen data due to severe overfitting.
Figure 1: Conceptual illustration of the typical Expert Advisor lifecycle. Models optimized for historical performance frequently fail catastrophically when deployed on unseen market conditions.
2.2 The Latency Problem in Technical Analysis
Technical indicators are inherently reactive:
- By the time RSI crosses the overbought threshold, the price has already moved significantly
- By the time a MACD crossover confirms, the optimal entry window has passed
- By the time a breakout is “confirmed,” stop-loss requirements have expanded significantly

Figure 2: Comparison of timing between reactive technical indicators and predictive machine learning approaches. Traditional indicators confirm moves after the optimal entry has passed, while predictive systems identify setup conditions before execution.
2.3 Literature Context
The application of machine learning to financial time-series prediction has evolved significantly. Several consistent findings are relevant:

| Finding | Implication |
|---|---|
| Gradient Boosting Dominance on Tabular Data | Despite the marketing appeal of “deep learning,” ensemble methods consistently outperform neural networks on structured financial data |
| Feature Engineering Criticality | Quality of engineered features typically determines model success more than architectural choices |
| Temporal Validation Requirements | Standard cross-validation that shuffles data is inappropriate for financial time-series due to lookahead bias |
| Cross-Asset Information | Financial instruments do not trade in isolation; correlated instruments provide valuable context |
3. Methodology
3.1 The Predictive Labeling Methodology
Standard approaches to training trading models label data at the point where price movement occurs. This creates a fundamental problem: if the model learns features calculated from the same bars that are labeled, it effectively learns to recognize moves that are already happening rather than moves that are about to happen.
The Golden Gauss architecture employs a methodology that maintains temporal separation between feature calculation and label placement:
- The labeling process identifies profitable zones where price moved significantly in a specific direction
- All features are calculated from market data that occurred before the labeled zone begins

Figure 3: Manual labeling interface showing XAUUSD price action with identified directional zones. The labeled BUY and SELL regions represent profitable moves used as training targets; the model learns to predict these moves using features calculated from preceding market data.
Implications: This temporal separation ensures the model learns to recognize preconditions (the market microstructure patterns that precede significant moves) rather than characteristics of the moves themselves.
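The temporal separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Zone` structure, the `lookback` length, and the toy momentum feature are hypothetical and are not taken from the Golden Gauss pipeline. The point it shows is that the feature window ends strictly before the first bar of the labeled zone.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    start: int       # index of the first bar inside the labeled zone
    end: int         # index of the last bar inside the labeled zone
    direction: str   # "BUY" or "SELL"


def feature_window(closes, zone, lookback):
    """Return the bars available for features: strictly before zone.start."""
    first = zone.start - lookback
    if first < 0:
        raise ValueError("not enough history before the zone")
    # The slice end is exclusive, so the bar at zone.start is never included.
    return closes[first:zone.start]


def momentum_feature(window):
    """Toy feature (hypothetical): relative change across the pre-zone window."""
    return (window[-1] - window[0]) / window[0]


closes = [100.0, 100.5, 100.2, 100.8, 101.0, 103.0, 105.0, 106.0]
zone = Zone(start=5, end=7, direction="BUY")   # bars 5..7 form the labeled move
window = feature_window(closes, zone, lookback=4)
feature = momentum_feature(window)
print(window, feature)
```

Because the slice excludes every bar from `zone.start` onward, no price from the labeled move can leak into the feature value.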
3.2 Quality-Filtered Training Labels
Not all price movements are meaningful or tradeable. Many are:
- Too small to overcome transaction costs (spread + commission)
- Too erratic to execute cleanly
- Part of larger consolidation patterns without directional follow-through
The labeling process applies strict filtering criteria, identifying only zones where price moved with sufficient magnitude and directional consistency. This ensures the model learns only from setups that exceeded minimum profitability thresholds.
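A filter of this kind might look like the following sketch. The two thresholds (`min_move`, `min_consistency`) and the definition of consistency as the share of bar-to-bar steps in the zone's direction are assumptions chosen for illustration; the article does not disclose the actual criteria.

```python
def passes_quality_filter(closes, direction, min_move, min_consistency):
    """Check a candidate zone for magnitude and directional consistency.

    closes:    closing prices across the candidate zone
    direction: +1 for a bullish zone, -1 for a bearish zone
    """
    steps = [b - a for a, b in zip(closes, closes[1:])]
    if not steps:
        return False
    # Net move measured in the zone's own direction.
    net_move = (closes[-1] - closes[0]) * direction
    # Share of bar-to-bar steps that agree with the zone's direction.
    aligned = sum(1 for s in steps if s * direction > 0)
    consistency = aligned / len(steps)
    return net_move >= min_move and consistency >= min_consistency


clean_rally = [2000.0, 2001.5, 2003.0, 2002.5, 2005.0, 2007.0]
choppy = [2000.0, 2003.0, 1999.0, 2004.0, 1998.0, 2001.0]
tiny = [2000.0, 2000.1, 2000.2, 2000.3]

kept = [
    passes_quality_filter(zone, +1, min_move=3.0, min_consistency=0.7)
    for zone in (clean_rally, choppy, tiny)
]
print(kept)
```

Only the clean rally survives: the choppy zone fails on both net move and consistency, and the tiny zone is too small to cover a hypothetical 3.0 cost threshold.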
3.3 Dual-Model Directional Architecture
Market dynamics exhibit fundamental asymmetry between bullish and bearish behavior:
- Accumulation patterns differ structurally from distribution patterns
- Fear-driven selling typically executes faster than greed-driven buying
- Support behavior differs from resistance behavior
- Volume characteristics differ between advances and declines
To respect this asymmetry, the architecture employs two independent binary models:

| Model | Output | Training Data |
|---|---|---|
| BUY Model | P(Bullish Move Imminent) | Trained only on bullish labels |
| SELL Model | P(Bearish Move Imminent) | Trained only on bearish labels |

Each model is a binary classifier detecting only its respective directional setup. This prevents the confusion that occurs when a single model attempts to learn contradictory patterns simultaneously.
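One way to derive two independent binary targets from a single labeled series is sketched below. The label strings and the choice to treat every non-matching bar (including opposite-direction bars) as a negative example are assumptions. The encoding 1 = "setup present" is also illustrative; the class index that represents the target in an exported model depends on how the labels were encoded at training time.

```python
def split_directional_targets(labels):
    """Turn one label series into two binary target vectors.

    labels: per-bar strings "BUY", "SELL", or "NONE" (hypothetical encoding).
    Returns (buy_targets, sell_targets), each a list of 0/1 values.
    """
    buy_targets = [1 if lab == "BUY" else 0 for lab in labels]
    sell_targets = [1 if lab == "SELL" else 0 for lab in labels]
    return buy_targets, sell_targets


labels = ["NONE", "BUY", "BUY", "NONE", "SELL", "NONE"]
buy_y, sell_y = split_directional_targets(labels)
print(buy_y)
print(sell_y)
```

Each vector would then be used to fit its own classifier on the same feature matrix, so neither model has to represent both directions at once.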
3.4 Walk-Forward Validation Protocol
Standard machine learning cross-validation, which shuffles data randomly, is inappropriate for financial time-series due to temporal dependencies and lookahead bias risks.
The system uses strict walk-forward validation with full chronological separation:
- Training data extends through December 31, 2024
- All architectural decisions, hyperparameters, and feature engineering choices were finalized using only this data
- The model was then frozen and validated on a 13-month out-of-sample period (January 2025 through January 2026)

Figure 4: Temporal data separation for walk-forward validation. Training data extends through the end of 2024; all 2025-2026 evaluation represents strictly out-of-sample performance on data not used for training.
Critical Rules:
- No shuffling of time-series data
- Evaluation period analysis only after all model decisions are finalized
- No iterative “peeking” at evaluation results to adjust parameters
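A chronological split that follows these rules can be sketched as below. The cutoff date comes from the text; the row layout (`(timestamp, payload)` tuples) and the function name are assumptions made for the example.

```python
from datetime import datetime

# Last moment that belongs to the training period (from the text: December 31, 2024).
TRAIN_END = datetime(2024, 12, 31, 23, 59, 59)


def chronological_split(rows, train_end=TRAIN_END):
    """Split rows into training and out-of-sample sets by time, never by shuffling.

    rows: iterable of (timestamp, payload) tuples in any order.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] <= train_end]
    oos = [r for r in ordered if r[0] > train_end]
    return train, oos


rows = [
    (datetime(2025, 3, 1, 14, 0), "c"),
    (datetime(2024, 6, 1, 14, 0), "a"),
    (datetime(2024, 12, 31, 17, 59), "b"),
    (datetime(2025, 1, 2, 14, 1), "d"),
]
train, oos = chronological_split(rows)
print([r[1] for r in train], [r[1] for r in oos])
```

The split itself is trivial; the discipline lies in touching the `oos` rows only once, after every modeling decision has been frozen.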
4. System Architecture
The system comprises two distinct but integrated components:
- Training Pipeline: implemented in Python for model development and validation
- Execution Engine: implemented in MQL5 for real-time deployment within MetaTrader 5

Figure 5: High-level architecture of the system. The training pipeline (top) processes historical data through feature engineering and model training, exporting via ONNX. The execution engine (bottom) calculates features in real time, obtains probability scores, and applies trade management logic for position execution.
4.1 Model Architecture Selection
The choice of model architecture was driven by empirical evaluation against criteria specific to financial time-series prediction:

| Criterion | Priority |
|---|---|
| Performance on structured/tabular data | Critical |
| Robustness to noise and outliers | Critical |
| Handling of regime changes | High |
| Training data efficiency | High |
| Inference speed for live deployment | High |
| Interpretability (feature importance) | Medium |

Based on extensive testing, Gradient Boosting Decision Trees (GBDT) were selected. This choice aligns with consistent findings in the machine learning literature that GBDT architectures outperform deep learning approaches on structured financial data.
Why Not Neural Networks?
While “Neural Network” generates marketing appeal, the technical reality for tabular financial data is different:
- GBDTs handle feature interactions naturally without explicit specification
- GBDTs are more robust to noise and outliers in financial data
- GBDTs require significantly less training data
- GBDTs provide interpretable feature importance rankings
- GBDTs train faster, enabling more extensive hyperparameter search
4.2 ONNX Deployment
The model is exported via ONNX (Open Neural Network Exchange) for platform-agnostic deployment, enabling Python-trained models to execute at C++ speeds inside MT5.
A critical requirement is training-serving parity: feature calculations in MQL5 must be mathematically identical to those performed during Python training. Any discrepancy creates “training-serving skew” that degrades model performance.
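A parity check can be as simple as comparing the feature vector computed in Python with the vector logged by the MQL5 side for the same bar. The sketch below assumes both vectors are already available as lists of numbers. The function name and the tolerances are hypothetical, and a small tolerance is used because the model consumes 32-bit floats, so bit-exact equality is rarely achievable in practice.

```python
import math


def find_feature_skew(python_features, mql5_features, rel_tol=1e-5, abs_tol=1e-6):
    """Return the indices of features that differ beyond tolerance."""
    if len(python_features) != len(mql5_features):
        raise ValueError("feature count mismatch between training and serving")
    return [
        i
        for i, (p, m) in enumerate(zip(python_features, mql5_features))
        if not math.isclose(p, m, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


py_vec = [0.25, 14.2, -0.731, 1.0]          # computed in the training pipeline
mq_vec = [0.25, 14.2000001, -0.702, 1.0]    # logged by the execution engine
skewed = find_feature_skew(py_vec, mq_vec)
print(skewed)
```

Here only index 2 is reported: the tiny difference at index 1 is within float tolerance, while the difference at index 2 points to a genuinely different calculation.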
4.3 The MQL5-ONNX Interface
The bridge between Python training and MQL5 execution relies on the native ONNX API introduced in MetaTrader 5 Build 3600. The primary engineering challenge is ensuring the input tensor shape matches the Python export exactly, and correctly interpreting the classifier’s dual-output structure.
Below is the structural logic used to initialize and run inference with the Gradient Boosting model within the Expert Advisor:
Model Initialization
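// Embed the exported model in the Expert Advisor as a byte array.
#resource "\\Files\\BULLISH_Model.onnx" as uchar ExtModelBuy[]

long g_onnx_buy = INVALID_HANDLE;   // ONNX session handle for the BUY model
const int SNIPER_FEATURES = 239;    // must equal the feature count used in training

bool InitializeONNXModels()
{
    Print("Loading ONNX models...");

    // Create the inference session from the embedded buffer.
    g_onnx_buy = OnnxCreateFromBuffer(ExtModelBuy, ONNX_DEFAULT);
    if(g_onnx_buy == INVALID_HANDLE)
    {
        Print("[FAIL] Failed to load BUY model, error ", GetLastError());
        return false;
    }

    // Fix the input tensor shape: one row of SNIPER_FEATURES values.
    ulong input_shape_buy[] = {1, SNIPER_FEATURES};
    if(!OnnxSetInputShape(g_onnx_buy, 0, input_shape_buy))
    {
        Print("[FAIL] Failed to set BUY model input shape, error ", GetLastError());
        // Release the session so a half-initialized handle is never used.
        OnnxRelease(g_onnx_buy);
        g_onnx_buy = INVALID_HANDLE;
        return false;
    }

    Print("  [OK] BUY model loaded successfully");
    return true;
}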
Probability Inference
The classifier outputs two tensors: predicted labels and class probabilities. For probability-based execution, we extract the probability of the target class:
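bool GetBuyPrediction(const float &features[], double &probability)
{
    probability = 0.0;

    if(g_onnx_buy == INVALID_HANDLE)
    {
        Print("[FAIL] BUY model not loaded");
        return false;
    }

    // Reject vectors that do not match the shape set at initialization.
    if(ArraySize(features) != SNIPER_FEATURES)
    {
        Print("[FAIL] BUY model expects ", SNIPER_FEATURES, " features, got ", ArraySize(features));
        return false;
    }

    // Copy the features into a local input buffer.
    float input_data[];
    ArrayResize(input_data, SNIPER_FEATURES);
    ArrayCopy(input_data, features);

    // Output 0: predicted label (int64). Output 1: class probabilities (float).
    long output_labels[];
    float output_probs[];

    ArrayResize(output_labels, 1);
    ArrayResize(output_probs, 2);
    ArrayInitialize(output_labels, 0);
    ArrayInitialize(output_probs, 0.0f);

    // Run inference; the output arrays are filled in model output order.
    if(!OnnxRun(g_onnx_buy, ONNX_NO_CONVERSION, input_data, output_labels, output_probs))
    {
        int error = GetLastError();
        Print("[FAIL] BUY ONNX inference failed: ", error);
        return false;
    }

    // Class 0 is the target (bullish) class for this model.
    probability = (double)output_probs[0];

    return true;
}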
Key Implementation Details:
- Dual-Output Structure: Gradient Boosting classifiers exported via ONNX produce two outputs: the predicted label and the probability distribution across classes. The probability output is used for threshold-based execution.
- Class Mapping: Class 0 represents the target condition (BULLISH for the BUY model). The probability output_probs[0] directly indicates model confidence in an imminent bullish move.
- Shape Validation: Strict shape checking at initialization catches training-serving mismatches immediately rather than producing silent prediction errors during live trading.
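The same class-mapping and validation logic can be expressed on the Python side when checking an exported model before deployment. The sketch below operates on a plain list standing in for the probability tensor, so it does not depend on any ONNX library. The helper name and the sanity checks are assumptions, and `TARGET_CLASS = 0` simply mirrors the mapping stated above and should be verified against the actual export.

```python
N_CLASSES = 2
TARGET_CLASS = 0  # mirrors the article's class mapping; verify against your own export


def target_probability(output_probs, target_class=TARGET_CLASS):
    """Validate a class-probability vector and return the target-class entry."""
    if len(output_probs) != N_CLASSES:
        raise ValueError(f"expected {N_CLASSES} probabilities, got {len(output_probs)}")
    if any(p < 0.0 or p > 1.0 for p in output_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if abs(sum(output_probs) - 1.0) > 1e-3:
        raise ValueError("probabilities do not sum to 1")
    return float(output_probs[target_class])


p_bullish = target_probability([0.91, 0.09])
print(p_bullish)
```

Reading the wrong index silently inverts every signal, which is why an explicit check of this kind is worth running once per exported model.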
4.4 Execution Configuration

| Parameter | Value |
|---|---|
| Symbol | XAUUSD only |
| Timeframe | M1 (feature calculation) |
| Active Hours | 14:00–18:00 (broker time, configurable) |
| Probability Threshold | 88% |
| Stop Loss | Fixed initial; dynamically managed |
| Take Profit | Target-based with ratchet protection |
| Prohibited Strategies | No grid, no martingale |
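The symbol, session, and threshold settings combine into a simple entry gate, sketched here in Python for readability. The constant and function names are hypothetical. Treating the session end as exclusive and the threshold as inclusive are assumptions, since the configuration does not specify boundary behavior.

```python
from datetime import time

ACTIVE_START = time(14, 0)   # broker time, from the configuration table
ACTIVE_END = time(18, 0)
PROB_THRESHOLD = 0.88


def entry_allowed(symbol, broker_time, probability):
    """Return True only if symbol, session window, and probability all qualify."""
    in_session = ACTIVE_START <= broker_time < ACTIVE_END   # end treated as exclusive
    return symbol == "XAUUSD" and in_session and probability >= PROB_THRESHOLD


print(entry_allowed("XAUUSD", time(14, 30), 0.91))
print(entry_allowed("XAUUSD", time(18, 0), 0.95))
```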
5. Feature Engineering
The system processes 239 engineered features across multiple research-backed domains. These features were developed through academic literature review, domain expertise in market microstructure, and iterative empirical testing with strict validation protocols.
5.1 Feature Categories Overview

| Category | Conceptual Focus |
|---|---|
| Volatility Regime | Market state classification, tradeable vs. non-tradeable conditions |
| Momentum | Multi-scale rate of change, trend persistence |
| Volume Dynamics | Participation levels, unusual activity detection |
| Price Structure | Support/resistance proximity, range position |
| Cross-Asset | Correlated instrument signals, correlation regime shifts |
| Microstructure | Directional pressure and short-horizon stress proxies |
| Temporal | Session timing, cyclical patterns |
| Sequential | Pattern recognition, run-length analysis |
5.2 Key Driving Features
The following features consistently ranked among the most influential according to global SHAP importance analysis:
- ADX Trend Strength (14-period): Measures trend strength, independent of direction
- VWAP Volatility Deviation: Distance of price from intraday VWAP, normalized by recent volatility
- Volatility Regime Classifier: ATR relative to its moving average, indicating low-, normal-, or high-volatility states
- MACD Histogram Momentum: Captures short-term momentum and potential reversals
- 60-minute Gold/DXY Rolling Correlation: Rolling correlation between XAUUSD and DXY returns
- 60-minute Gold/USDJPY Rolling Correlation: Rolling correlation between XAUUSD and USDJPY returns
- Directional Volatility Regime: Signed volatility feature combining EMA-based trend strength with the current ATR regime
- Order-Flow Persistence: Proxy for how long directional moves persist across recent candles
- EMA Spread Dynamics: Distances and slopes between fast and slow EMAs
The presence of well-known indicators (ADX, MACD) alongside proprietary regime and correlation features demonstrates that the model enhances, rather than replaces, established market relationships with higher-resolution timing signals.
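As an illustration of one listed feature, the Volatility Regime Classifier (ATR relative to its moving average) could be computed along the following lines. The moving-average period and the 0.8 / 1.2 cutoffs are hypothetical values chosen for the example; the article does not publish its parameters.

```python
def volatility_regime(atr_values, ma_period=5, low=0.8, high=1.2):
    """Classify the latest ATR reading relative to its own moving average.

    Returns "low", "normal", or "high". Cutoffs are illustrative assumptions.
    """
    if len(atr_values) < ma_period:
        raise ValueError("not enough ATR history")
    baseline = sum(atr_values[-ma_period:]) / ma_period
    ratio = atr_values[-1] / baseline
    if ratio < low:
        return "low"
    if ratio > high:
        return "high"
    return "normal"


calm = [1.0, 1.0, 1.0, 1.0, 1.0]
spike = [1.0, 1.0, 1.0, 1.0, 2.0]
quiet = [1.0, 1.0, 1.0, 1.0, 0.5]
print(volatility_regime(calm), volatility_regime(spike), volatility_regime(quiet))
```

The ratio form makes the feature scale-free, so the same cutoffs remain meaningful whether gold trades near 1,800 or 2,600.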
5.3 Cross-Asset Intelligence
Gold (XAUUSD) does not trade in isolation. Its price action is influenced by:
- US Dollar Dynamics: Typically inverse correlation; dollar strength often pressures gold prices
- Safe-Haven Flows: Correlation with other safe-haven assets during risk-off periods
- Yield Expectations: Relationship with real interest rate proxies
The feature set incorporates lagged returns from correlated instruments, rolling correlations at multiple time scales, divergence detection, and regime change signals.
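A rolling correlation of returns between gold and a second instrument can be sketched with the standard library alone. The helper names and the toy price series are assumptions. On M1 bars, a 60-minute correlation would correspond to `window=60`, while a window of 4 is used here only to keep the example short.

```python
import math


def returns(prices):
    """Simple bar-to-bar returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rolling_correlation(a_prices, b_prices, window):
    """Correlation of returns over each trailing window of the given length."""
    ra, rb = returns(a_prices), returns(b_prices)
    return [pearson(ra[i - window:i], rb[i - window:i])
            for i in range(window, len(ra) + 1)]


gold = [2000.0, 2002.0, 2001.0, 2004.0, 2003.0, 2006.0]
dxy = [104.0, 103.9, 103.95, 103.8, 103.85, 103.7]
corr = rolling_correlation(gold, dxy, window=4)
print(corr)
```

In this toy series the dollar index moves against gold on every bar, so both windows show a strongly negative correlation. A shift of that value toward zero or positive territory is the kind of regime change the cross-asset features are meant to capture.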
6. Validation and Results
The validation approach follows a single principle: demonstrate generalization, not memorization. Any model can achieve impressive results on data it has seen. The only meaningful evaluation is performance on strictly unseen data.
6.1 Out-of-Sample Performance
All 2025 performance represents true out-of-sample (OOS) results. The model architecture, hyperparameters, and feature set were frozen before any 2025 data was evaluated.

Figure 6: Backtest equity and balance curves from Jan 2021 to Jan 2026. The period Jan 2021–Dec 2024 represents data included in model training; the period Jan 2025–Jan 2026 constitutes strictly out-of-sample evaluation.

| Metric | Full Period (Jan 2021–Jan 2026) | OOS Only (Jan 2025–Jan 2026) |
|---|---|---|
| Win Rate | 88.71% | 83.67% |
| Total Trades | 1,030 | 319 |
| Profit Factor | 1.77 | 1.50 |
| Sharpe Ratio | 9.90 | 13.9 |
| Max Drawdown (0.01 lot) | ~$500 | ~$313 |
| Recovery Factor | 11.57 | 3.66 |
| Avg Holding Time | 30 min 30 sec | 30 min 30 sec |

Interpretation: The out-of-sample period demonstrates continued profitability, with most metrics degrading gracefully relative to the full period:
- Win rate decreases from 88.71% (full period) to 83.67% (OOS only), a controlled reduction of about 5 percentage points indicating that the model generalizes rather than memorizes
- Profit factor holds at 1.50, confirming positive expectancy on unseen data
- The higher OOS Sharpe ratio (13.9 vs 9.90) provides strong evidence against overfitting
This performance gap is expected and healthy. The controlled degradation is consistent with genuine pattern generalization.
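For readers who want to reproduce the headline metrics from their own trade logs, win rate and profit factor reduce to a few lines. The sketch below uses a made-up list of per-trade profits. It also includes a small identity, valid under the assumption of no breakeven trades, that converts a reported win rate and profit factor into the implied ratio of average loss to average win.

```python
def win_rate(pnls):
    """Fraction of trades with positive profit."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def implied_loss_to_win_ratio(rate, factor):
    """Average loss / average win implied by a win rate and profit factor.

    Follows from factor = (rate * avg_win) / ((1 - rate) * avg_loss),
    assuming every trade is either a win or a loss.
    """
    return rate / ((1.0 - rate) * factor)


pnls = [4.0, 3.0, -10.0, 5.0, 4.0, 2.0, -2.0, 3.0, 4.0, 5.0]  # toy trade log
print(win_rate(pnls), profit_factor(pnls))
print(implied_loss_to_win_ratio(0.8871, 1.77))   # full-period figures from the table
print(implied_loss_to_win_ratio(0.8367, 1.50))   # OOS figures from the table
```

Applied to the table above, the identity gives an average loss of roughly 4.4 times the average win for the full period and roughly 3.4 times for the OOS period, which is how a win rate near 88% coexists with a profit factor below 2.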
6.2 Probability Threshold Analysis
The model outputs continuous probability scores. Analysis reveals the relationship between probability levels and trade outcomes:

| Probability Range | Trades | Win Rate |
|---|---|---|
| 0.880 – 0.897 | 231 | 88.3% |
| 0.897 – 0.923 | 167 | 90.4% |
| 0.923 – 0.950 | 190 | 93.2% |
| 0.950 – 0.976 | 107 | 87.9% |
| 0.976 – 0.993 | 27 | 96.3% |

Why 88% Minimum Threshold? The 88% threshold was determined through systematic analysis as the optimal entry point balancing trade frequency against quality. Below this threshold, false-positive rates increase significantly.
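The frequency-versus-quality tradeoff behind a threshold choice can be examined with a simple sweep over candidate thresholds. The signal list below is invented purely to show the mechanics and does not reproduce the article's data. Such a sweep should be run on training-period signals only, so that the threshold is not tuned on the evaluation period.

```python
def threshold_sweep(signals, thresholds):
    """For each threshold, count the trades taken and compute their win rate.

    signals: list of (probability, won) pairs, where won is a bool.
    Returns a list of (threshold, trades, win_rate) tuples.
    """
    rows = []
    for t in thresholds:
        taken = [won for p, won in signals if p >= t]
        rate = sum(taken) / len(taken) if taken else None
        rows.append((t, len(taken), rate))
    return rows


signals = [
    (0.80, False), (0.82, True), (0.85, False), (0.86, True), (0.88, True),
    (0.90, True), (0.91, False), (0.93, True), (0.95, True), (0.98, True),
]
sweep = threshold_sweep(signals, [0.80, 0.88, 0.95])
for row in sweep:
    print(row)
```

Raising the threshold shrinks the trade count while lifting the win rate; the chosen value is the point where the remaining sample is still large enough to trust.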
6.3 Exit Composition Analysis

| Exit Type | Percentage | Interpretation |
|---|---|---|
| Ratchet Profit (SL_WIN) | 87.1% | Dynamic profit capture |
| Take Profit (TP) | 3.2% | Full target reached |
| Stop Loss (SL_LOSS) | 9.7% | Controlled losses |

The vast majority of winning trades exit via the ratchet system, capturing profits dynamically rather than waiting for the full TP.
6.4 Temporal Consistency

| Year | Trades | Win Rate | Status |
|---|---|---|---|
| 2021 | 172 | 93.6% | Training |
| 2022 | 125 | 93.6% | Training |
| 2023 | 64 | 87.5% | Training |
| 2024 | 124 | 93.5% | Training |
| 2025 | 237 | 85.2% | Out-of-Sample |
| 2026 | n/a | n/a | n/a |

All years were profitable, with consistent performance patterns across training and out-of-sample periods.
7. Trade Management
The system implements a comprehensive trade management layer that extends beyond simple entry execution.
7.1 Probability-Based Decision Making
Unlike systems that generate discrete “buy” or “sell” signals, the architecture calculates probability scores in real time on each new bar:
- Entry Decision: Probability must exceed the 88% threshold before a position is opened
- Direction Selection: The higher probability between the BUY and SELL models determines direction
- Exit Timing: Probability changes inform position closure decisions
- Hold/Close Logic: Continuous probability monitoring during open positions
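The entry and direction rules above can be sketched as a single function over the two model outputs. The function name is hypothetical, and two details are assumptions: the threshold is treated as inclusive, and an exact tie between the models is treated as no signal.

```python
def select_direction(p_buy, p_sell, threshold=0.88):
    """Pick a trade direction from the BUY and SELL model probabilities.

    Returns "BUY", "SELL", or None when no model clears the threshold
    (or when the two probabilities tie, which is treated as ambiguous).
    """
    if max(p_buy, p_sell) < threshold:
        return None
    if p_buy == p_sell:
        return None
    return "BUY" if p_buy > p_sell else "SELL"


print(select_direction(0.93, 0.10))
print(select_direction(0.20, 0.91))
print(select_direction(0.70, 0.60))
```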
7.2 Entry Validation and Filtering
- Dual-Model Confirmation: Both BUY and SELL model probabilities are assessed to confirm directional bias and filter ambiguous conditions
- Regime Filtering: Additional filters detect unfavorable market regimes (extreme volatility events, low liquidity periods)
- Conditional Execution: Trade execution proceeds only after probability thresholds are satisfied and regime filters confirm favorable conditions
7.3 Ratchet Profit Protection
Problem Addressed: Price may move 80% of the way toward the take-profit level and then reverse; without active management, this unrealized profit would be lost.
Ratchet Solution: As price moves favorably, the system progressively locks in profit by tightening exit conditions, ensuring that significant favorable moves are captured even when the full take-profit is not reached.
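The general shape of a profit ratchet can be sketched as follows for a long position. The step table (how much profit is locked at each level of progress toward the target) is entirely hypothetical; the article does not publish its ratchet levels, and the sketch assumes idealized fills exactly at the stop. The exit labels reuse the SL_WIN, SL_LOSS, and TP names from the exit composition table.

```python
# (progress toward TP reached, fraction of the TP distance locked in) - hypothetical
RATCHET_STEPS = [(0.40, 0.10), (0.60, 0.35), (0.80, 0.60)]


def simulate_long(entry, take_profit, initial_stop, prices):
    """Walk through prices, ratcheting the stop upward; return (exit_price, reason)."""
    target = take_profit - entry
    stop = initial_stop
    for price in prices:
        if price >= take_profit:
            return take_profit, "TP"
        if price <= stop:
            reason = "SL_WIN" if stop > entry else "SL_LOSS"
            return stop, reason   # idealized fill at the stop level
        progress = (price - entry) / target
        for reached, locked in RATCHET_STEPS:
            if progress >= reached:
                # The stop only ever moves in the favorable direction.
                stop = max(stop, entry + locked * target)
    return prices[-1], "OPEN"


exit_price, reason = simulate_long(2000.0, 2010.0, 1995.0, [2002.0, 2005.0, 2008.5, 2004.0])
print(exit_price, reason)
```

In this example price reaches 85% of the target and then reverses. Because the stop had already been raised, the trade closes with a locked-in gain instead of giving the move back.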
7.4 Ratchet Loss Minimization
Problem Addressed: Even high-confidence predictions occasionally fail; waiting for the fixed stop-loss results in the maximum loss on every losing trade.
Ratchet Solution: When price moves adversely, the system actively manages the exit to minimize loss rather than passively waiting for stop-loss execution, reducing the average loss per unsuccessful trade.
8. Honest Limitations
8.1 What This System Is NOT
- Not infallible: Approximately 15–18% of signals result in suboptimal entries, depending on market conditions
- Not universal: Trained only for XAUUSD with its specific market microstructure and session dynamics
- Not static: Periodic retraining (every 3–6 months) is required as markets evolve
- Not guaranteed: Out-of-sample validation demonstrates methodology soundness but does not guarantee future performance
8.2 Known Risk Factors

| Risk | Description | Mitigation |
|---|---|---|
| Regime Change | Market structure evolves through policy shifts and geopolitical events | Periodic retraining protocol |
| Execution Risk | Slippage during volatility can degrade realized results | Session-aware execution, active hours restriction |
| Edge Decay | Predictive edges face decay as markets evolve | Retraining with methodology preservation |
| Concentration | Exclusive XAUUSD focus provides no diversification | User responsibility for portfolio allocation |
8.3 Execution Assumptions
All reported results are based on historical simulations. No additional slippage model has been applied, and real-world execution may lead to materially different performance. These statistics should be interpreted as estimates under ideal execution conditions.
9. Conclusion
This article presented a methodology for fixing two fundamental failures that characterize retail algorithmic trading (overfitting to historical noise and reactive signal generation) through rigorous machine learning practices.
The core innovations demonstrated in the Golden Gauss architecture include:
- Predictive labeling that enables genuine anticipation of price moves
- Dual-model directional specialization that respects market asymmetry
- Probability-driven execution that quantifies confidence before trade entry
- Intelligent trade management that minimizes losses when predictions prove suboptimal
On strictly out-of-sample data (January 2025 through January 2026), evaluated only after all model decisions were finalized, the system demonstrates approximately 83.67% directional accuracy at the 88% probability threshold. The controlled performance differential from training metrics indicates genuine pattern learning rather than memorization.
Key Takeaways for Practitioners
- Never shuffle time-series data during validation: this creates lookahead bias and data leakage
- Out-of-sample performance is the only meaningful metric for evaluating live trading potential
- Probability thresholds enable accuracy/frequency tradeoffs: higher thresholds yield fewer but higher-quality signals
- Dual binary models respect the asymmetry between bullish and bearish market dynamics
- Trade management amplifies edge: ratchet mechanisms maximize wins and minimize losses
- All systems have limitations: honest acknowledgment enables appropriate deployment and risk management
The retail algorithmic trading industry suffers from systematic misalignment between vendor incentives and user outcomes. The methodology presented here (strict temporal separation, documented performance degradation, bounded confidence claims) offers a template for honest system evaluation that prioritizes sustainable operation over marketing appeal.
Informed critique of the validation methodology and underlying assumptions is welcomed. Progress in algorithmic trading requires systems designed to survive scrutiny rather than avoid it.
10. Implementation & Availability
The architecture described in this paper, specifically the predictive labeling engine and the ONNX probability inference, has been fully implemented in the Golden Gauss AI system.
To support further research and validation, the complete system is available for testing in the MQL5 Market. The package includes the “Visualizer” mode, which renders the probability cones and “Kill Zones” directly on the chart, allowing traders to observe the model’s decision-making process in real time.
Risk Disclaimer: Trading forex and CFDs involves substantial risk of loss and is not suitable for all investors. Past performance, whether in backtesting or live trading, does not guarantee future results. The validation results presented represent historical analysis under specific market conditions that may not persist. Traders should only use capital they can afford to lose and should consider their financial situation before trading.
References
Cao, L. J. and Tay, F. E. H. (2001). Financial forecasting using support vector machines. Neural Computing & Applications, 10(2), 184-192.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794.
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
Bailey, D. H. and López de Prado, M. (2014). The probability of backtest overfitting. Journal of Computational Finance, 17(4), 39-69.
Pardo, R. (2008). The Evaluation and Optimization of Trading Strategies (2nd ed.). Wiley.
Krauss, C., Do, X. A., and Huck, N. (2017). Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. European Journal of Operational Research, 259(2), 689-702.
Baur, D. G. and McDermott, T. K. (2010). Is gold a safe haven? International evidence. Journal of Banking & Finance, 34(8), 1886-1898.
ONNX Runtime Developers (2021). ONNX Runtime: High performance inference and training accelerator. Available: https://onnxruntime.ai/







