# Arbitrade

Low-latency cryptocurrency arbitrage bot scaffold for Kraken.

Current stack:

- Python 3.12+
- FastAPI + HTMX/Jinja2
- DuckDB for dev/test/prod
- Native Kraken WebSocket planned for market-data hot path
- Gitea Actions + Gitea container registry

Project plan lives in [PLAN.md](PLAN.md).
Task checklist lives in [.github/instructions/TODO.md](.github/instructions/TODO.md).

## Current Status

Bootstrap complete for foundation layer:

- repo initialized
- typed settings and env loading
- structured logging
- encrypted secret helpers
- DuckDB connection + base schema
- FastAPI app with health endpoint
- Gitea Actions CI scaffold
- Docker / docker-compose scaffold

Not implemented yet:

- Kraken REST client
- Kraken native WebSocket client
- arbitrage detection engine
- trade execution
- dashboard beyond health/bootstrap page

## Prerequisites

- Python 3.12+
- `uv` for env/package management
- Git
- Docker Desktop or Docker Engine
- Gitea account on `git.allucanget.biz` for push/CI/registry access

Optional:

- PowerShell 7 on Windows

## Repository Setup

Clone repo:

```powershell
git clone https://git.allucanget.biz/allucanget/arbitrade.git
Set-Location arbitrade
```

If repo already exists locally, confirm remote:

```powershell
git remote -v
```

Expected origin:

```text
https://git.allucanget.biz/allucanget/arbitrade.git
```

## Local Development Setup

Create virtualenv with `uv`:

```powershell
uv venv
```

Activate env on Windows:

```powershell
.\.venv\Scripts\Activate.ps1
```

Install app + dev dependencies:

```powershell
uv pip install -e ".[dev]"
```

Dependency source of truth:

- Runtime dependencies live in `requirements/latest-runtime.in`.
- Dev dependencies live in `requirements/latest-dev.in`.
- `pyproject.toml` reads both files dynamically during package install.

Create local env file:

```powershell
Copy-Item .env.example .env
```

Minimum `.env` values:

```env
APP_ENV=dev
APP_HOST=0.0.0.0
APP_PORT=8000
LOG_LEVEL=INFO
LOG_JSON=true
DUCKDB_PATH=./data/arbitrade.duckdb
FERNET_KEY=
KRAKEN_API_KEY=
KRAKEN_API_SECRET=
KRAKEN_API_KEY_PERMISSIONS=query,trade
```

Notes:

- Leave Kraken creds empty until Kraken integration lands.
- If Kraken creds are set, both key and secret are required.
- `KRAKEN_API_KEY_PERMISSIONS` must include `query` and `trade` and must not include any withdrawal scope.
- `FERNET_KEY` is optional. If empty, the secret helper falls back to a keyring-backed generated key.
- On Windows, the app uses the default `asyncio` event loop. On other platforms, it switches to `uvloop` automatically.

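The credential rules in the notes above can be checked in a few lines. The sketch below is hypothetical: `validate_kraken_settings` is an invented name, it takes a plain mapping such as `os.environ`, and treating any scope containing `withdraw` as a withdrawal scope is an assumption about naming. The project's typed settings may enforce these rules differently.

```python
from collections.abc import Mapping


def validate_kraken_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems with the Kraken credential settings.

    Hypothetical helper mirroring the documented rules; an empty list
    means the settings are acceptable.
    """
    problems: list[str] = []
    key = env.get("KRAKEN_API_KEY", "").strip()
    secret = env.get("KRAKEN_API_SECRET", "").strip()

    # Both credentials may be empty (integration not wired up yet),
    # but if one is set the other is required too.
    if bool(key) != bool(secret):
        problems.append("KRAKEN_API_KEY and KRAKEN_API_SECRET must be set together")

    # Permissions are a comma-separated list, e.g. "query,trade".
    raw = env.get("KRAKEN_API_KEY_PERMISSIONS", "")
    scopes = {part.strip().lower() for part in raw.split(",") if part.strip()}

    missing = {"query", "trade"} - scopes
    if missing:
        problems.append(f"missing required scopes: {', '.join(sorted(missing))}")

    # Assumed naming: any scope mentioning "withdraw" is a withdrawal scope.
    if any("withdraw" in scope for scope in scopes):
        problems.append("withdrawal scope must not be granted")

    return problems
```

Calling it as `validate_kraken_settings(os.environ)` at startup and refusing to boot on a non-empty result is one way to fail fast on a misconfigured key.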
## Run App

Start app:

```powershell
python -m arbitrade.main
```

Health endpoints:

- HTML: `http://localhost:8000/`
- JSON: `http://localhost:8000/health`

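To confirm the JSON health endpoint from a script (for example in a smoke test), a standard-library probe is enough. This is a sketch: `check_health` is a hypothetical name, and it only checks for HTTP 200 rather than assuming any particular response body.

```python
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError/HTTPError (non-2xx status), refused connections and
        # timeouts all derive from OSError.
        return False
```

The same function works against a deployed instance, e.g. `check_health("https://<your-domain>")`.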
## Database

DuckDB is used everywhere: local dev, tests, production.

Default database file:

```text
./data/arbitrade.duckdb
```

Schema bootstrap runs automatically on app startup.

Current tables:

- `schema_migrations`
- `opportunities`
- `trades`
- `portfolio_snapshots`

Audit trail table:

- `audit_events` (append-only operational decision log)

Audit retention and compaction guidance:

- Keep at least 30 days of `audit_events` in the active DB for incident triage.
- Archive older rows to a timestamped export file before deletion.
- Example monthly archive workflow:

```sql
-- Run both statements in one transaction: DuckDB's NOW() returns the
-- transaction start time, so COPY and DELETE use the same cutoff and
-- no row is deleted without being archived.
-- Replace YYYYMM with the archive month.
BEGIN TRANSACTION;

COPY (
    SELECT *
    FROM audit_events
    WHERE occurred_at < NOW() - INTERVAL 30 DAY
) TO 'data/audit_events_archive_YYYYMM.parquet' (FORMAT PARQUET);

DELETE FROM audit_events
WHERE occurred_at < NOW() - INTERVAL 30 DAY;

COMMIT;
```

- Back up archive files and the main DuckDB file together.
- For production, run archive + backup as scheduled maintenance (cron/task scheduler).

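When the archive runs from a scheduled job rather than by hand, pinning the cutoff as a literal timestamp guarantees that COPY and DELETE cover the same rows, and the file name can be derived from the run date. The helper below is a hypothetical sketch that only builds SQL strings; it assumes `occurred_at` is stored as a UTC `TIMESTAMP` and that archives are named after the month the job runs.

```python
from datetime import datetime, timedelta


def build_audit_archive_sql(now: datetime, keep_days: int = 30) -> list[str]:
    """Build COPY + DELETE statements that share one literal cutoff.

    Hypothetical helper: `now` is assumed to be UTC wall-clock time.
    """
    cutoff = (now - timedelta(days=keep_days)).strftime("%Y-%m-%d %H:%M:%S")
    # Archive file named after the month in which the job runs.
    archive_path = f"data/audit_events_archive_{now:%Y%m}.parquet"
    predicate = f"occurred_at < TIMESTAMP '{cutoff}'"
    return [
        "BEGIN TRANSACTION",
        f"COPY (SELECT * FROM audit_events WHERE {predicate}) "
        f"TO '{archive_path}' (FORMAT PARQUET)",
        f"DELETE FROM audit_events WHERE {predicate}",
        "COMMIT",
    ]
```

With the `duckdb` Python package, a job could open `con = duckdb.connect("data/arbitrade.duckdb")` and call `con.execute(stmt)` for each returned statement, passing `datetime.now(timezone.utc)` as `now`.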
## Quality Checks

Run tests:

```powershell
pytest -q
```

Run Ruff:

```powershell
ruff check .
```

Run Black check:

```powershell
black --check .
```

Run mypy:

```powershell
mypy src
```

Run dependency vulnerability audit:

```powershell
pip-audit -r requirements/latest-runtime.in
```

Run secret scan (worktree + git history):

```powershell
python scripts/security_scan.py
```

Generate latency profile baseline:

```powershell
python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json
```

Run latency regression guardrails:

```powershell
python scripts/check_latency_regression.py --baseline ops/performance/latency_baseline.json --thresholds ops/performance/latency_thresholds.json --iterations 600
```

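The regression check boils down to comparing each scenario's fresh measurement against its baseline and failing when the ratio exceeds an allowed limit. The sketch below illustrates that idea only; the flat `{scenario: milliseconds}` shape, ratio-based thresholds, the 1.2 default, and the name `find_latency_regressions` are assumptions, not the actual format of `latency_baseline.json`/`latency_thresholds.json` or the logic in `scripts/check_latency_regression.py`.

```python
def find_latency_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    max_ratio: dict[str, float],
    default_ratio: float = 1.2,
) -> dict[str, float]:
    """Return {scenario: current/baseline ratio} for scenarios over their limit.

    Hypothetical data shape: each mapping is scenario name -> latency in ms
    (for example a p99 value).
    """
    regressions: dict[str, float] = {}
    for scenario, base_ms in baseline.items():
        if scenario not in current or base_ms <= 0:
            continue  # nothing meaningful to compare against
        ratio = current[scenario] / base_ms
        if ratio > max_ratio.get(scenario, default_ratio):
            regressions[scenario] = round(ratio, 3)
    return regressions
```

A CI wrapper would then exit non-zero when the result is non-empty, e.g. `raise SystemExit(1 if regressions else 0)`.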
Install pre-commit hooks:

```powershell
pre-commit install
```

Run hooks manually:

```powershell
pre-commit run --all-files
```

## Docker

Build locally:

```powershell
docker build -t arbitrade:local .
```

Container dependency install flow:

- Docker installs runtime dependencies from `requirements/latest-runtime.in`.
- Docker then installs the package with `--no-deps` so dependency resolution is driven by requirements files.

Run with compose:

```powershell
docker compose up --build
```

Compose mounts local `data/` folder into container at `/app/data`.

Important:

- [docker-compose.yml](docker-compose.yml) uses `git.allucanget.biz/allucanget/arbitrade:latest` as the default image reference.

## Coolify Deployment (Nixpacks)

Use this when deploying directly from Git in Coolify without the Dockerfile path.

### 1) Create application in Coolify

- In Coolify, create a new `Application` from your Git repository.
- Branch: `main` (or your release branch).
- Build Pack: `Nixpacks`.
- Root Directory: `.`

### 2) Configure build and start behavior

Set these in Coolify application settings:

- Build Command: leave empty (let Nixpacks auto-detect Python).
- Install Command: leave empty (Nixpacks will install from `pyproject.toml`, which reads `requirements/latest-runtime.in`).
- Start Command: `python -m arbitrade.main`
- Port: `8000`

### 3) Configure health check and networking

- Health Check Path: `/health`
- Exposed Port: `8000`
- Use Coolify-generated domain or attach your own domain.

### 4) Configure persistent storage

Add a persistent volume in Coolify:

- Mount Path: `/app/data`

This preserves DuckDB and other runtime artifacts across restarts/redeploys.

### 5) Configure environment variables

Add runtime environment variables in Coolify (UI: Environment Variables):

- `APP_ENV=prod`
- `APP_HOST=0.0.0.0`
- `APP_PORT=8000`
- `DUCKDB_PATH=/app/data/arbitrade.duckdb`
- `LOG_LEVEL=INFO`
- `LOG_JSON=true`
- `KRAKEN_API_KEY=...`
- `KRAKEN_API_SECRET=...`
- `KRAKEN_API_KEY_PERMISSIONS=query,trade`

Recommended:

- Configure `FERNET_KEY` in Coolify secrets (do not commit it).
- Keep all exchange keys/secrets in Coolify secret variables only.

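If you need a value for `FERNET_KEY`: a Fernet key is 32 random bytes encoded as URL-safe base64. With `cryptography` installed, `Fernet.generate_key()` produces one; the standard-library sketch below uses the same construction. `generate_fernet_key` is a hypothetical helper name, not part of the codebase.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new Fernet-compatible key as text.

    32 bytes from the OS CSPRNG, URL-safe base64 encoded (44 characters),
    the same construction cryptography's Fernet.generate_key() uses.
    """
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
```

Generate the key once, store it as a Coolify secret, and keep it stable: data encrypted with one Fernet key cannot be decrypted with a different one.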
### 6) Deploy and verify

- Trigger deploy in Coolify.
- Verify app boot logs show startup completed.
- Verify `GET /health` returns success on deployed URL.

## Gitea CI / Registry Setup

CI file:

- [.gitea/workflows/ci.yml](.gitea/workflows/ci.yml)

Required Gitea Actions secrets:

- `REGISTRY_USERNAME`
- `REGISTRY_TOKEN`
- `REGISTRY_NAMESPACE`

Expected namespace (most likely):

```text
allucanget
```

Example registry login:

```powershell
docker login git.allucanget.biz
```

Example pushed image tag shape:

```text
git.allucanget.biz/allucanget/arbitrade:<tag>
```

## Project Layout

```text
arbitrade/
├── .gitea/workflows/ci.yml
├── .github/instructions/TODO.md
├── PLAN.md
├── pyproject.toml
├── src/arbitrade/
│   ├── api/
│   ├── config/
│   ├── storage/
│   ├── logging_setup.py
│   └── main.py
├── tests/
└── web/templates/
```

## Next Work

Next planned implementation slice:

- Kraken REST client skeleton
- native Kraken WebSocket client
- in-memory order book cache
- latency instrumentation

## Troubleshooting

PowerShell blocks activation script:

```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned
```

Then activate again:

```powershell
.\.venv\Scripts\Activate.ps1
```

If app import fails, confirm the editable install ran:

```powershell
uv pip install -e ".[dev]"
```

If the DuckDB file is missing, start the app once or create the `data/` directory manually.

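Creating that directory from Python takes one `pathlib` call. The sketch below is hypothetical (`ensure_duckdb_parent` is not part of the codebase) and defaults to the documented `DUCKDB_PATH` value; DuckDB creates the database file itself on first connect, but it is not expected to create missing parent directories.

```python
from pathlib import Path


def ensure_duckdb_parent(db_path: str = "./data/arbitrade.duckdb") -> Path:
    """Create the directory that will hold the DuckDB file, if missing.

    Safe to call repeatedly: exist_ok=True makes it a no-op once created.
    """
    parent = Path(db_path).parent
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```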
## Security Hardening

Threat model notes:

- Primary risk surfaces: environment secrets, dashboard auth credentials, exchange API key scope, and dependency supply chain.
- Assumed attacker model: leaked repository content, leaked CI logs/artifacts, or unauthorized dashboard access.
- High-impact outcomes to prevent: credential exfiltration, unauthorized withdrawals, and unsafe live-trading control changes.

Hardening checklist:

- Use least-privilege Kraken API keys: query + trade only; never enable withdrawal.
- Rotate API keys immediately if secret scan flags a potential exposure.
- Keep dashboard auth enabled in non-local environments and avoid default/shared credentials.
- Run `pip-audit --skip-editable` in CI; treat vulnerability findings as release blockers.
- Run `python scripts/security_scan.py` before release and after major merges.
- Store secrets in environment/secret manager; never commit `.env` or key material.

## Performance Hardening

Profile scenarios:

- `book_update_burst`
- `execution_spike`
- `reconnect_storm`

Latency baseline and threshold artifacts:

- `ops/performance/latency_baseline.json`
- `ops/performance/latency_thresholds.json`

CI guardrail:

- `.gitea/workflows/ci.yml` runs `scripts/check_latency_regression.py` and fails on regression.

Measured optimization impact (2026-06-01):

- `MetricsCalculator.compute()` switched from Python row scans to DuckDB SQL aggregates/quantiles.
- Benchmark (`scripts/benchmark_metrics_compute.py`):
  - Python scan avg: `12.623 ms`
  - SQL aggregate avg: `11.039 ms`
  - Speedup: `1.14x`

## Backtesting

Run a deterministic replay backtest from a JSONL event stream:

```powershell
python scripts/backtest_replay.py --events path\to\replay.jsonl --starting-balances USD=1000.0
```

Replay event format (one JSON object per line in the `.jsonl` file; pretty-printed here for readability):

```json
{
  "timestamp": "2026-06-01T12:00:00Z",
  "symbol": "BTC/USD",
  "bids": [[100.0, 1.0]],
  "asks": [[101.0, 1.0]]
}
```

Notes:

- Events are replayed in timestamp order.
- The replay engine reuses the production detector, pre-trade validation, trade limits, and execution sequencer.
- The simulated execution path applies configurable slippage and execution latency, so reports include deterministic trade/miss statistics.
