Arbitrade

Low-latency cryptocurrency arbitrage bot scaffold for Kraken.

Current stack:

  • Python 3.12+
  • FastAPI + HTMX/Jinja2
  • DuckDB for dev/test/prod
  • Native Kraken WebSocket planned for market-data hot path
  • Gitea Actions + Gitea container registry

Project plan lives in PLAN.md. Task checklist lives in .github/instructions/TODO.md.

Current Status

Bootstrap complete for foundation layer:

  • repo initialized
  • typed settings and env loading
  • structured logging
  • encrypted secret helpers
  • DuckDB connection + base schema
  • FastAPI app with health endpoint
  • Gitea Actions CI scaffold
  • Docker / docker-compose scaffold

Not implemented yet:

  • Kraken REST client
  • Kraken native WebSocket client
  • arbitrage detection engine
  • trade execution
  • dashboard beyond health/bootstrap page

Prerequisites

  • Python 3.12+
  • uv for env/package management
  • Git
  • Docker Desktop or Docker Engine
  • Gitea account on git.allucanget.biz for push/CI/registry access

Optional:

  • PowerShell 7 on Windows

Repository Setup

Clone repo:

git clone https://git.allucanget.biz/allucanget/arbitrade.git
Set-Location arbitrade

If repo already exists locally, confirm remote:

git remote -v

Expected origin:

https://git.allucanget.biz/allucanget/arbitrade.git

Local Development Setup

Create virtualenv with uv:

uv venv

Activate env on Windows:

.\.venv\Scripts\Activate.ps1

Install app + dev dependencies:

uv pip install -e ".[dev]"

Dependency source of truth:

  • Runtime dependencies live in requirements/latest-runtime.in.
  • Dev dependencies live in requirements/latest-dev.in.
  • pyproject.toml reads both files dynamically during package install.

Create local env file:

Copy-Item .env.example .env

Minimum .env values:

APP_ENV=dev
APP_HOST=0.0.0.0
APP_PORT=8000
LOG_LEVEL=INFO
LOG_JSON=true
DUCKDB_PATH=./data/arbitrade.duckdb
FERNET_KEY=
KRAKEN_API_KEY=
KRAKEN_API_SECRET=
KRAKEN_API_KEY_PERMISSIONS=query,trade

Notes:

  • Leave Kraken creds empty until Kraken integration lands.
  • If Kraken creds are set, both key and secret are required.
  • KRAKEN_API_KEY_PERMISSIONS must include query,trade and must not include withdrawal scope.
  • FERNET_KEY is optional. If it is empty, the secret helper falls back to keyring-backed key generation.
  • On Windows, the app falls back to the default asyncio event loop. On non-Windows platforms, uvloop is installed automatically.
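
The credential rules in the notes above can be checked before startup. The following is a minimal sketch, not the project's actual settings code: the function name validate_kraken_settings is hypothetical, and it assumes the withdrawal scope appears in KRAKEN_API_KEY_PERMISSIONS as a value starting with "withdraw".

```python
from typing import Mapping


def validate_kraken_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the values pass.

    Hypothetical helper that mirrors the README notes, not the real loader.
    """
    problems: list[str] = []
    key = env.get("KRAKEN_API_KEY", "").strip()
    secret = env.get("KRAKEN_API_SECRET", "").strip()

    # Both empty is allowed; exactly one being set is not.
    if bool(key) != bool(secret):
        problems.append("KRAKEN_API_KEY and KRAKEN_API_SECRET must be set together")

    raw = env.get("KRAKEN_API_KEY_PERMISSIONS", "")
    scopes = {part.strip().lower() for part in raw.split(",") if part.strip()}

    missing = {"query", "trade"} - scopes
    if missing:
        problems.append("missing required permissions: " + ", ".join(sorted(missing)))

    # Assumption: any scope name starting with "withdraw" denotes withdrawal access.
    if any(scope.startswith("withdraw") for scope in scopes):
        problems.append("withdrawal permission must not be enabled")

    return problems
```

Passing os.environ (or a dict parsed from .env) to the function returns every violated rule at once, which makes misconfiguration easier to report than failing on the first error.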

Run App

Start app:

python -m arbitrade.main

Health endpoints:

  • HTML: http://localhost:8000/
  • JSON: http://localhost:8000/health
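
For scripted checks, the JSON health endpoint can be probed with the standard library alone. This is a small sketch under the assumption that a healthy app answers GET /health with HTTP 200; the function name is_healthy is illustrative and not part of the codebase.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True when GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, DNS failures and 4xx/5xx responses.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```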

Database

DuckDB used everywhere: local dev, tests, production.

Default database file:

./data/arbitrade.duckdb

Schema bootstrap runs automatically on app startup.

Current tables:

  • schema_migrations
  • opportunities
  • trades
  • portfolio_snapshots

Audit trail table:

  • audit_events (append-only operational decision log)

Audit retention and compaction guidance:

  • Keep at least 30 days of audit_events in active DB for incident triage.
  • Archive older rows to a timestamped export file before deletion.
  • Example monthly archive workflow:
COPY (
  SELECT *
  FROM audit_events
  WHERE occurred_at < NOW() - INTERVAL 30 DAY
) TO 'data/audit_events_archive_YYYYMM.parquet' (FORMAT PARQUET);

DELETE FROM audit_events
WHERE occurred_at < NOW() - INTERVAL 30 DAY;
  • Back up archive files and the main DuckDB file together.
  • For production, run archive + backup as scheduled maintenance (cron/task scheduler).
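
When the archive workflow above runs from a scheduled job, the two statements can be generated around a single fixed cutoff, so the COPY and the DELETE are guaranteed to cover the same rows even if they run in separate transactions. The sketch below is one possible way to do that; the helper name build_archive_statements is hypothetical, and it assumes occurred_at is stored in UTC and that YYYYMM refers to the month in which the archive runs.

```python
from datetime import datetime, timedelta, timezone


def build_archive_statements(now: datetime, retention_days: int = 30) -> tuple[str, str]:
    """Return (copy_sql, delete_sql) that share one fixed cutoff timestamp."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    cutoff = now_utc - timedelta(days=retention_days)

    # Assumption: occurred_at holds UTC timestamps.
    cutoff_literal = f"TIMESTAMP '{cutoff:%Y-%m-%d %H:%M:%S}'"
    # Assumption: the YYYYMM placeholder is the month of the archive run.
    archive_path = f"data/audit_events_archive_{now_utc:%Y%m}.parquet"

    copy_sql = (
        "COPY (SELECT * FROM audit_events "
        f"WHERE occurred_at < {cutoff_literal}) "
        f"TO '{archive_path}' (FORMAT PARQUET);"
    )
    delete_sql = f"DELETE FROM audit_events WHERE occurred_at < {cutoff_literal};"
    return copy_sql, delete_sql


if __name__ == "__main__":
    for statement in build_archive_statements(datetime.now(timezone.utc)):
        print(statement)
```

The printed statements can be executed in order on a DuckDB connection. Run the DELETE only after the COPY has succeeded and the Parquet file has been backed up.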

Quality Checks

Run tests:

pytest -q

Run Ruff:

ruff check .

Run Black check:

black --check .

Run mypy:

mypy src

Run dependency vulnerability audit:

pip-audit -r requirements/latest-runtime.in

Run secret scan (worktree + git history):

python scripts/security_scan.py

Generate latency profile baseline:

python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json

Run latency regression guardrails:

python scripts/check_latency_regression.py --baseline ops/performance/latency_baseline.json --thresholds ops/performance/latency_thresholds.json --iterations 600

Install pre-commit hooks:

pre-commit install

Run hooks manually:

pre-commit run --all-files

Docker

Build locally:

docker build -t arbitrade:local .

Container dependency install flow:

  • Docker installs runtime dependencies from requirements/latest-runtime.in.
  • Docker then installs the package with --no-deps so dependency resolution is driven by requirements files.

Run with compose:

docker compose up --build

Compose mounts local data/ folder into container at /app/data.

Important:

  • docker-compose.yml uses git.allucanget.biz/allucanget/arbitrade:latest as the default image reference.

Coolify Deployment (Prebuilt Image)

Use this when deploying from the image published by CI instead of building from Git inside Coolify.

1) Create application in Coolify

  • In Coolify, create a new Application using Docker Image / Public Image / Private Registry Image.
  • Image: git.allucanget.biz/allucanget/arbitrade:latest
  • Registry: git.allucanget.biz
  • If registry auth is required, configure the same registry credentials in Coolify.

2) Configure build and start behavior

Set these in Coolify application settings:

  • Build Command: leave empty.
  • Install Command: leave empty.
  • Start Command: leave empty unless you explicitly want to override the image default.
  • Port: 8000

3) Configure health check and networking

  • Health Check Path: /health
  • Exposed Port: 8000
  • Use Coolify-generated domain or attach your own domain.

4) Configure persistent storage

Add a persistent volume in Coolify:

  • Mount Path: /app/data

This preserves DuckDB and other runtime artifacts across restarts/redeploys.

5) Configure environment variables

Add runtime environment variables in Coolify (UI: Environment Variables):

  • APP_ENV=prod
  • APP_HOST=0.0.0.0
  • APP_PORT=8000
  • DUCKDB_PATH=/app/data/arbitrade.duckdb
  • LOG_LEVEL=INFO
  • LOG_JSON=true
  • KRAKEN_API_KEY=...
  • KRAKEN_API_SECRET=...
  • KRAKEN_API_KEY_PERMISSIONS=query,trade

Recommended:

  • Configure FERNET_KEY in Coolify secrets (do not commit it).
  • Keep all exchange keys/secrets in Coolify secret variables only.

Coolify should own runtime configuration through environment variables. CI only publishes the image.

6) Deploy and verify

  • Trigger deploy in Coolify after CI publishes git.allucanget.biz/allucanget/arbitrade:latest.
  • Verify app boot logs show startup completed.
  • Verify GET /health returns success on deployed URL.

Gitea CI / Registry Setup

CI file: .gitea/workflows/ci.yml

Required Gitea Actions secrets:

  • REGISTRY_USERNAME
  • REGISTRY_TOKEN

Example registry login:

docker login git.allucanget.biz

Example pushed image tag shape:

git.allucanget.biz/allucanget/arbitrade:latest

Project Layout

arbitrade/
├── .gitea/workflows/ci.yml
├── .github/instructions/TODO.md
├── PLAN.md
├── pyproject.toml
├── src/arbitrade/
│   ├── api/
│   ├── config/
│   ├── storage/
│   ├── logging_setup.py
│   └── main.py
├── tests/
└── web/templates/

Next Work

Next planned implementation slice:

  • Kraken REST client skeleton
  • native Kraken WebSocket client
  • in-memory order book cache
  • latency instrumentation

Troubleshooting

PowerShell blocks activation script:

Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned

Then activate again:

.\.venv\Scripts\Activate.ps1

If app import fails, confirm editable install ran:

uv pip install -e ".[dev]"

If DuckDB file missing, start app once or create data/ directory manually.
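
Creating the data/ directory can also be scripted. The snippet below is a minimal sketch that creates the parent directory of a DuckDB path if it does not exist yet; the function name ensure_db_directory is hypothetical, and the default path is taken from the DUCKDB_PATH value shown in the .env example.

```python
import os
from pathlib import Path


def ensure_db_directory(db_path: str) -> Path:
    """Create the parent directory of db_path if needed and return it."""
    parent = Path(db_path).expanduser().resolve().parent
    # exist_ok avoids an error when the directory is already present.
    parent.mkdir(parents=True, exist_ok=True)
    return parent


if __name__ == "__main__":
    path = os.environ.get("DUCKDB_PATH", "./data/arbitrade.duckdb")
    print(f"database directory ready: {ensure_db_directory(path)}")
```

The database file itself is not created here; per the Database section, schema bootstrap runs on app startup.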

Security Hardening

Threat model notes:

  • Primary risk surfaces: environment secrets, dashboard auth credentials, exchange API key scope, and dependency supply chain.
  • Assumed attacker model: leaked repository content, leaked CI logs/artifacts, or unauthorized dashboard access.
  • High-impact outcomes to prevent: credential exfiltration, unauthorized withdrawals, and unsafe live-trading control changes.

Hardening checklist:

  • Use least-privilege Kraken API keys: query + trade only; never enable withdrawal.
  • Rotate API keys immediately if secret scan flags a potential exposure.
  • Keep dashboard auth enabled in non-local environments and avoid default/shared credentials.
  • Run pip-audit -r requirements/latest-runtime.in in CI; treat vulnerability findings as release blockers.
  • Run python scripts/security_scan.py before release and after major merges.
  • Store secrets in environment/secret manager; never commit .env or key material.

Performance Hardening

Profile scenarios:

  • book_update_burst
  • execution_spike
  • reconnect_storm

Latency baseline and threshold artifacts:

  • ops/performance/latency_baseline.json
  • ops/performance/latency_thresholds.json

CI guardrail:

  • .gitea/workflows/ci.yml runs scripts/check_latency_regression.py and fails on regression.

Measured optimization impact (2026-06-01):

  • MetricsCalculator.compute() switched from Python row scans to DuckDB SQL aggregates/quantiles.
  • Benchmark (scripts/benchmark_metrics_compute.py):
    • Python scan avg: 12.623 ms
    • SQL aggregate avg: 11.039 ms
    • Speedup: 1.14x

Backtesting

Run a deterministic replay backtest from a JSONL event stream:

python scripts/backtest_replay.py --events path\to\replay.jsonl --starting-balances USD=1000.0

Replay event format:

{
  "timestamp": "2026-06-01T12:00:00Z",
  "symbol": "BTC/USD",
  "bids": [[100.0, 1.0]],
  "asks": [[101.0, 1.0]]
}

Notes:

  • Events are replayed in timestamp order.
  • The replay engine reuses the production detector, pre-trade validation, trade limits, and execution sequencer.
  • The simulated execution path applies configurable slippage and execution latency so reports include deterministic trade/miss statistics.
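
To prepare or sanity-check a replay file for the Backtesting section, the event format can be parsed and ordered with the standard library. This is an illustrative sketch only and not the loader used by scripts/backtest_replay.py; the function name load_replay_events and the added parsed_timestamp key are hypothetical. It assumes one JSON object per line, as is usual for JSONL.

```python
import json
from datetime import datetime
from typing import Iterable

REQUIRED_FIELDS = {"timestamp", "symbol", "bids", "asks"}


def load_replay_events(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines into events sorted by timestamp (stable for ties)."""
    events: list[dict] = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        missing = REQUIRED_FIELDS - event.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing fields {sorted(missing)}")
        # Replace a trailing "Z" so parsing also works on older Python versions.
        event["parsed_timestamp"] = datetime.fromisoformat(
            event["timestamp"].replace("Z", "+00:00")
        )
        events.append(event)
    events.sort(key=lambda item: item["parsed_timestamp"])
    return events
```

Because a file object iterates over its lines, an open replay file can be passed directly as the lines argument.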