🤖 PPO Trading Bot

RESEARCH
Architecture
CNN + LSTM
Proximal Policy Optimisation
Symbol
XAUUSDT
Gold / USD, 1h bars
Best Return
+7.18%
In-sample · MLP baseline · 1M steps
Status
Training
v3: reward reshaping run

Latest Equity Curve

[Figure: equity curve]

Training History

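| Run | Steps | Return | Sharpe | Max DD | Notes |
|---|---|---|---|---|---|
| MLP baseline | 1M | +7.18% | 0.443 | -6.46% | Close+Open only |
| CNN+LSTM v1 | 1M | +80.15% | 0.242 | -60.15% | Overlevered |
| CNN+LSTM v2 | 3M | +3.03% | 0.181 | -2.88% | Too conservative |
| CNN+LSTM v3 | 3M | n/a | n/a | n/a | In progress |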
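
The Return, Sharpe and Max DD columns can all be derived from an equity curve. The sketch below shows one common way to compute them; it is not necessarily how this dashboard does it. The function names are hypothetical, the risk-free rate is assumed to be zero, and the Sharpe ratio is annualised on the assumption of 24 × 365 one-hour bars per year.

```python
import math


def total_return(equity):
    """Fractional return from the first to the last equity value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                 # running high-water mark
        worst = min(worst, value / peak - 1.0)  # deepest dip below it so far
    return worst


def sharpe_ratio(equity, periods_per_year=24 * 365):
    """Annualised Sharpe ratio of per-bar returns.

    Assumptions (not stated in the source): risk-free rate of 0 and
    hourly bars traded around the clock.
    """
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    std = math.sqrt(var)
    if std == 0.0:
        return 0.0  # flat curve: define the ratio as 0 instead of dividing by 0
    return mean / std * math.sqrt(periods_per_year)


if __name__ == "__main__":
    curve = [100.0, 110.0, 99.0, 120.0]  # toy data, not from any run above
    print(f"return  {total_return(curve):+.2%}")
    print(f"max DD  {max_drawdown(curve):+.2%}")
    print(f"sharpe  {sharpe_ratio(curve):.3f}")
```

Reading the two columns together explains the "Overlevered" note on v1: a large return paired with a drawdown above 60% yields a lower Sharpe ratio than the far more modest MLP baseline.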

Model Architecture

| Component | Detail |
|---|---|
| Features | Close-norm, Ret(1/5/20), Vol-norm, RSI-14 (6 channels × 40 bars) |
| CNN branch | Conv1d(2×64, k=3) → GlobalAvgPool → 64 dims |
| LSTM branch | 2-layer LSTM hidden=64 → last hidden → 64 dims |
| Account state | Balance / Equity / Margin → 32 dims |
| Orders | Entry price, volume, profit → 32 dims |
| Policy head | MLP [256 → 128] → 3 actions |
| Total params | ~311k |
| Training device | CPU · 8 parallel workers |
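
To show how the branches fit together, here is a rough tally of the fused feature width and the parameter count of the listed components. It rests on several assumptions the table does not confirm: "2×64" is read as two Conv1d layers of 64 filters each, the account and order encoders are single linear layers over three scalars, and the LSTM uses the PyTorch `nn.LSTM` parameter layout. The helper names are hypothetical. The tally covers only the policy path, so it should not be expected to reproduce the ~311k total.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1d_params(c_in, c_out, kernel):
    """Weights plus biases of a 1-D convolution."""
    return c_in * c_out * kernel + c_out


def lstm_layer_params(n_in, hidden):
    """One LSTM layer: 4 gates, each with input weights, recurrent
    weights and two bias vectors (the PyTorch nn.LSTM layout)."""
    return 4 * (n_in * hidden + hidden * hidden + 2 * hidden)


CHANNELS = 6  # Close-norm, Ret(1), Ret(5), Ret(20), Vol-norm, RSI-14
# The 40-bar window length does not enter the count: global average
# pooling and "last hidden" both collapse the time axis.

# Assumption: two stacked Conv1d layers, 64 filters each, kernel size 3.
cnn = conv1d_params(CHANNELS, 64, 3) + conv1d_params(64, 64, 3)

# 2-layer LSTM with hidden size 64.
lstm = lstm_layer_params(CHANNELS, 64) + lstm_layer_params(64, 64)

# Assumption: one linear layer per encoder, three scalar inputs each.
account = linear_params(3, 32)  # balance, equity, margin
orders = linear_params(3, 32)   # entry price, volume, profit

# The four branch outputs are concatenated before the policy head.
fused = 64 + 64 + 32 + 32

# Policy head: fused -> 256 -> 128 -> 3 action logits.
head = (linear_params(fused, 256)
        + linear_params(256, 128)
        + linear_params(128, 3))

total = cnn + lstm + account + orders + head

if __name__ == "__main__":
    print("fused width:", fused)
    print("policy-path params:", total)
```

Under these assumptions the fused vector is 192 wide. Any gap between this tally and the reported total would have to come from parts the table does not list, which this sketch makes no attempt to guess.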