feat: add backtesting functionality with UI and API endpoints
CI / lint-test-build (push) Successful in 2m31s

- Introduced backtesting page and fragment in the dashboard for running backtests and viewing recent reports.
- Implemented backtest run logic with configuration options including event path, starting balances, trade capital, and fee profiles.
- Added recent backtest reports storage and retrieval.
- Created a new strategy module for statistical arbitrage experiments with validation on configuration parameters.
- Updated settings to include parameters for the statistical arbitrage strategy.
- Enhanced dashboard controls to support the new strategy mode.
- Added unit tests for backtesting functionality and strategy validation.
- Updated templates for backtesting UI integration.
This commit is contained in:
2026-06-02 09:28:22 +02:00
parent f612c8533a
commit 38e1d64437
17 changed files with 1089 additions and 165 deletions
+5 -119
View File
@@ -349,125 +349,11 @@ Example pushed image tag shape:
git.allucanget.biz/allucanget/arbitrade:latest
```
## Project Layout
## Architecture Docs
```text
arbitrade/
├── .gitea/workflows/ci.yml
├── .github/instructions/TODO.md
├── PLAN.md
├── pyproject.toml
├── src/arbitrade/
│ ├── api/
│ ├── config/
│ ├── storage/
│ ├── logging_setup.py
│ └── main.py
├── tests/
└── web/templates/
```
Implementation detail moved into arc42 docs:
## Next Work
- [arc42 overview](docs/architecture/arc42.md) - system context, building blocks, runtime, deployment, quality goals, risks.
- [current implementation snapshot](docs/architecture/current-implementation.md) - codebase state, active routes, backtesting, strategy flags, deployment flow.
Next planned implementation slice:
- Kraken REST client skeleton
- native Kraken WebSocket client
- in-memory order book cache
- latency instrumentation
## Troubleshooting
PowerShell blocks activation script:
```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned
```
Then activate again:
```powershell
.\.venv\Scripts\Activate.ps1
```
If app import fails, confirm editable install ran:
```powershell
uv pip install -e .[dev]
```
If DuckDB file missing, start app once or create `data/` directory manually.
## Security Hardening
Threat model notes:
- Primary risk surfaces: environment secrets, dashboard auth credentials, exchange API key scope, and dependency supply chain.
- Assumed attacker model: leaked repository content, leaked CI logs/artifacts, or unauthorized dashboard access.
- High-impact outcomes to prevent: credential exfiltration, unauthorized withdrawals, and unsafe live-trading control changes.
Hardening checklist:
- Use least-privilege Kraken API keys: query + trade only; never enable withdrawal.
- Rotate API keys immediately if secret scan flags a potential exposure.
- Keep dashboard auth enabled in non-local environments and avoid default/shared credentials.
- Run `pip-audit -r requirements/latest-runtime.in` in CI; treat vulnerability findings as release blockers.
- Run `python scripts/security_scan.py` before release and after major merges.
- Store secrets in environment/secret manager; never commit `.env` or key material.
## Performance Hardening
Profile scenarios:
- `book_update_burst`
- `execution_spike`
- `reconnect_storm`
## Backtesting
Run a deterministic replay backtest from a JSONL event stream:
```powershell
python scripts/backtest_replay.py --events path\to\replay.jsonl --starting-balances USD=1000.0
```
Run parameter sweep with train/test split and promotion scoring:
```powershell
python scripts/backtest_sweep.py --events path\to\replay.jsonl --starting-balances USD=1000.0 --output ops/backtesting/parameter_sweep_results.json
```
Replay event format:
```json
{
"timestamp": "2026-06-01T12:00:00Z",
"symbol": "BTC/USD",
"bids": [[100.0, 1.0]],
"asks": [[101.0, 1.0]]
}
```
Notes:
- Events are replayed in timestamp order.
- The replay engine reuses the production detector, pre-trade validation, trade limits, and execution sequencer.
- The simulated execution path applies configurable slippage and execution latency so reports include deterministic trade/miss statistics.
- Parameter sweep splits replay data into in-sample and out-of-sample windows, ranks configurations by out-of-sample score, and flags overfit via train/test generalization-gap checks.
- Sweep output persists ranked combinations and promotion-ready candidates for paper-trading canary promotion decisions.
- Latency baseline and threshold artifacts:
- `ops/performance/latency_baseline.json`
- `ops/performance/latency_thresholds.json`
CI guardrail:
- `.gitea/workflows/ci.yml` runs `scripts/check_latency_regression.py` and fails on regression.
Measured optimization impact (2026-06-01):
- `MetricsCalculator.compute()` switched from Python row scans to DuckDB SQL aggregates/quantiles.
- Benchmark (`scripts/benchmark_metrics_compute.py`):
- Python scan avg: `12.623 ms`
- SQL aggregate avg: `11.039 ms`
- Speedup: `1.14x`
For navigation from README, use the docs above instead of this file for deep architecture detail.