🤖 PPO Trading Bot

RESEARCH

Architecture

CNN + LSTM

Proximal Policy Optimisation

Symbol

XAUUSDT

Gold / USD — 1h bars

Best Return

+7.18%

In-sample · MLP baseline · 1M steps · highest Sharpe of the completed runs (0.443)

Status

Training

v3 — reward reshaping run

Latest Equity Curve

[Figure: equity curve]

Training History

Run	Steps	Return	Sharpe	Max DD	Notes
MLP baseline	1M	+7.18%	0.443	-6.46%	Close+Open only
CNN+LSTM v1	1M	+80.15%	0.242	-60.15%	Over-leveraged
CNN+LSTM v2	3M	+3.03%	0.181	-2.88%	Too conservative
CNN+LSTM v3	3M	—	—	—	In progress
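
The table does not state how Return, Sharpe and Max DD are computed, so the following is only a minimal sketch of the usual definitions applied to an equity curve. The function names, the zero risk-free rate, the sample standard deviation, and the optional annualisation factor are all assumptions for illustration, not the project's actual evaluation code.

```python
import math


def total_return(equity):
    """Fractional change from the first to the last equity value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def sharpe_ratio(equity, periods_per_year=None):
    """Mean over standard deviation of per-bar returns.

    Assumes a zero risk-free rate. If periods_per_year is given, the
    per-bar ratio is scaled by its square root (an assumed convention).
    """
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    std = math.sqrt(var)
    if std == 0.0:
        return 0.0
    sharpe = mean / std
    if periods_per_year is not None:
        sharpe *= math.sqrt(periods_per_year)
    return sharpe


# Hypothetical four-point equity curve, not data from any run above.
curve = [100.0, 110.0, 99.0, 120.0]
print(f"{total_return(curve):.2%}")   # 20.00%
print(f"{max_drawdown(curve):.2%}")   # -10.00%
```

Reading the table with these definitions, v1's +80.15% return comes with a -60.15% drawdown and a lower Sharpe than the MLP baseline, which is why a high return alone does not make it the better run.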

Model Architecture

Component	Detail
Features	Close-norm, Ret(1/5/20), Vol-norm, RSI-14 — 6 channels × 40 bars
CNN branch	Conv1d(2×64, k=3) → GlobalAvgPool → 64 dims
LSTM branch	2-layer LSTM hidden=64 → last hidden → 64 dims
Account state	Balance / Equity / Margin → 32 dims
Orders	Entry price, volume, profit → 32 dims
Policy head	MLP [256 → 128] → 3 actions
Total params	~311k
Training device	CPU · 8 parallel workers
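
To make the Features row concrete, here is a rough NumPy sketch of how a 6-channel × 40-bar observation could be assembled from 1h bars. Several details are guesses: "Close-norm" is taken as close relative to the latest close, "Vol-norm" as volume relative to its window mean (it could equally mean volatility), and RSI-14 uses a simple moving average rather than Wilder smoothing. The helper names (`pct_return`, `rsi`, `build_observation`) are hypothetical.

```python
import numpy as np

WINDOW = 40  # bars per observation, from the Features row


def pct_return(close, k):
    """k-bar fractional return; NaN for the first k bars."""
    out = np.full(len(close), np.nan)
    out[k:] = close[k:] / close[:-k] - 1.0
    return out


def rsi(close, period=14):
    """Simple-average RSI scaled to [0, 1]; NaN until `period` bars exist."""
    delta = np.diff(close, prepend=close[0])
    gain = np.where(delta > 0, delta, 0.0)
    loss = np.where(delta < 0, -delta, 0.0)
    out = np.full(len(close), np.nan)
    for t in range(period, len(close)):
        g = gain[t - period + 1 : t + 1].mean()
        l = loss[t - period + 1 : t + 1].mean()
        # RSI/100 = g / (g + l); use 0.5 when the price did not move.
        out[t] = 0.5 if g + l == 0 else g / (g + l)
    return out


def build_observation(close, volume, t):
    """Return a (6, WINDOW) float32 array for the window ending at bar t."""
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    start = t - WINDOW + 1
    if start < 20:  # the 20-bar return needs 20 bars of history
        raise ValueError("not enough history before the window")
    sl = slice(start, t + 1)
    channels = [
        close[sl] / close[t] - 1.0,            # Close-norm (assumed form)
        pct_return(close, 1)[sl],              # Ret(1)
        pct_return(close, 5)[sl],              # Ret(5)
        pct_return(close, 20)[sl],             # Ret(20)
        volume[sl] / volume[sl].mean() - 1.0,  # Vol-norm (assumed form)
        rsi(close)[sl],                        # RSI-14
    ]
    return np.stack(channels).astype(np.float32)


# Synthetic bars for illustration only, not XAUUSDT data.
rng = np.random.default_rng(0)
close = 1900.0 + np.cumsum(rng.normal(0.0, 2.0, 200))
volume = rng.uniform(100.0, 1000.0, 200)
obs = build_observation(close, volume, t=len(close) - 1)
print(obs.shape)  # (6, 40)
```

Both the CNN and LSTM branches would consume this same window. Per the table, their 64-dim outputs are joined with the two 32-dim account and order embeddings, which gives 64 + 64 + 32 + 32 = 192 inputs to the policy head if the four are simply concatenated.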