feat: enhance backtesting panel with flash messages and pairing checks
CI / lint-test-build (push) Failing after 12s

This commit is contained in:
2026-06-07 21:51:09 +02:00
parent 5e7732b85f
commit f221464daa
4 changed files with 269 additions and 79 deletions
+130
View File
@@ -0,0 +1,130 @@
# Backtesting Architecture
> Detailed design and implementation of the backtesting subsystem.
> See [`README.md`](README.md#63-backtesting-flow) for the high-level user flow.
## Data Flow
```txt
market_snapshots (DB) ─┐
├──→ load_replay_events_from_db() ──→ list[ReplayBookEvent]
JSONL file ─────────────┘
BacktestReplayEngine.run()
BacktestReport
BacktestJobRepository.store_report()
```
Two event sources:
- **DB mode** (default) — loads snapshots from `market_snapshots` table. Supports symbol/time filtering.
- **File mode** — reads JSONL files from disk (legacy, used by `backtest_replay.py` script).
## Core Types
### `ReplayClock`
Timekeeper for simulation. Ensures events advance monotonically. Supports `advance_ms()` to model execution latency.
### `ReplayBookEvent`
One atomic book state at a point in time. Fields: `occurred_at`, `symbol`, `bids: tuple[BookLevel]`, `asks: tuple[BookLevel]`.
### `BacktestConfig`
| Field | Default | Description |
| ------------------------ | -------- | ----------------------------------------------------- |
| `fee_rate` | `0.0` | 0.0 → API-sourced fee from `kraken_account_snapshots` |
| `min_profit_threshold` | `0.0005` | Minimum net profit to attempt trade |
| `trade_capital` | `100.0` | Capital allocated per trade |
| `quote_asset` | `"USD"` | Base currency for P&L |
| `slippage_bps` | `4.0` | Simulated slippage in basis points |
| `execution_latency_ms` | `20.0` | Simulated latency per leg |
| `max_depth_levels` | `10` | Order book depth for detection |
| `max_concurrent_trades` | `1` | Max simultaneous trades |
| `min_order_size_by_pair` | `None` | Per-pair min order size overrides |
### `BacktestReport`
| Field | Type | Description |
| -------------------------------- | -------------- | ---------------------------------- |
| `started_at` / `finished_at` | datetime | Simulation window |
| `processed_events` | int | Events consumed |
| `opportunities_seen` | int | Detected opportunities |
| `trades_executed` | int | Successful trades |
| `win_rate` | float or None | Fraction of profitable trades |
| `fill_rate` | float or None | Average fill ratio |
| `realized_pnl_usd` | float | Net P&L after slippage |
| `max_drawdown_usd` | float | Peak-to-trough equity drop |
| `miss_reasons` | dict[str, int] | Counters for skipped opportunities |
| `execution_latency_p50/95/99_ms` | float or None | Latency percentiles |
## Simulation Client
`_SimulatedRestClient` replaces the real Kraken REST client during backtesting.
- **Slippage model:** `fill_ratio = max(0.85, 1.0 - (slippage_bps / 10000.0) * 8.0)`
- **Latency model:** Clock advances by `execution_latency_ms` before each simulated fill
- Orders always fill (status = `"closed"`) at the modeled ratio
## Job Worker
`backtest_worker` is an `asyncio.Task` started in `create_app()` lifespan:
```python
backtest_task = asyncio.create_task(
backtest_worker(backtest_queue, db),
name="backtest_worker",
)
```
Workflow per job:
1. Dequeue `(job_id, config_dict)` from `asyncio.Queue`
2. Update status → `"running"` in `backtest_jobs` table
3. Load events (DB or file)
4. Build currency graph → triangular cycles
5. Instantiate `BacktestReplayEngine``engine.run()`
6. Store report → update status → `"completed"` (or `"failed"` on exception)
## Sweep Pipeline
`run_parameter_search` performs grid search over backtest parameters:
1. **Split** events into train/test windows by time ratio
2. **Build grid** — cartesian product of `theta_values × trade_capital_values × pair_universes × staleness_threshold_values`
3. **For each parameter set:**
- Filter events to pair universe + apply staleness gate
- Build cycles restricted to pair universe
- Run engine on train window → `train_report`
- Run engine on test window → `test_report`
- Score = `realized_pnl + win_rate_bonus + fill_rate_bonus - max_drawdown`
- Compute generalization gap = `|train_score - test_score| / max(train_score, test_score)`
4. **Evaluate promotion:**
- `PromotionCriteria` checks: min test P&L, min win rate ≥ 0.5, min fill rate ≥ 0.9, max drawdown ≤ $25, generalization gap ≤ 0.5
- Results passing all criteria are flagged `promotion_ready`
## UI
> See `backtesting.html` → `partials/backtesting_panel.html`.
- **Shell page** loads the panel via `hx-get="/dashboard/fragment/backtesting"`
- **Run form** — starting balances, time range, profit threshold (required); fee profile, slippage, latency (advanced/collapsible)
- **Status card** — current job status + message
- **Recent jobs table** — lists last 20 jobs with status, events, trades, P&L; each row has a detail button
- **Job detail** — `GET /dashboard/backtesting/job/{id}` returns report HTML
Pairings are managed on the `/dashboard/config/pairings` page. Backtest uses DB-enabled pairings by default when no symbols are specified.
## Source Files
| File | Role |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `backtesting/replay.py` | `ReplayClock`, `ReplayBookEvent`, `BacktestConfig`, `BacktestReport`, `_SimulatedRestClient`, `BacktestReplayEngine`, `load_replay_events`, `load_replay_events_from_db` |
| `backtesting/runner.py` | `run_backtest_job`, `backtest_worker`, `_build_cycles_from_events`, `_parse_balances` |
| `backtesting/sweep.py` | `SweepParameters`, `SweepResult`, `SweepArtifacts`, `PromotionCriteria`, `split_events_time_windows`, `build_parameter_grid`, `run_parameter_search`, `persist_sweep_results` |