
Performance Hardening

This folder contains the latency profiling baseline (latency_baseline.json) and the guardrail thresholds (latency_thresholds.json) that CI enforces.

Scenarios

The synthetic latency profiler covers three representative load patterns:

  • book_update_burst: rapid market-data deltas with moderate detection load.
  • execution_spike: heavier detection/execution pressure.
  • reconnect_storm: frequent reconnect/reset behavior.

Profiling Commands

Generate a fresh profile:

python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json

Check current performance against the baseline and thresholds:

python scripts/check_latency_regression.py \
  --baseline ops/performance/latency_baseline.json \
  --thresholds ops/performance/latency_thresholds.json \
  --iterations 600

CI executes the same guardrail check.
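
The comparison behind a guardrail check like this can be sketched as follows. This is a minimal illustration, not the logic of check_latency_regression.py: the function name find_regressions, the dictionary layout, and the 1.5x ratio limits are all assumptions, and the "current" numbers are made up for the example. Only the baseline values come from this README.

```python
# Hypothetical data layout (an assumption, not the real JSON schema):
#   baseline / current: {scenario: {"p95": ms, "p99": ms}}
#   max_ratio:          {"p95": allowed_ratio, "p99": allowed_ratio}

def find_regressions(baseline, current, max_ratio):
    """Return (scenario, metric, baseline_ms, current_ms) for each breach."""
    failures = []
    for scenario, base_metrics in baseline.items():
        for metric, base_ms in base_metrics.items():
            cur_ms = current[scenario][metric]
            # A metric regresses when it exceeds baseline * allowed ratio.
            if cur_ms > base_ms * max_ratio[metric]:
                failures.append((scenario, metric, base_ms, cur_ms))
    return failures


baseline = {"book_update_burst": {"p95": 0.0132, "p99": 0.0198}}
current = {"book_update_burst": {"p95": 0.0140, "p99": 0.0320}}  # made-up run
max_ratio = {"p95": 1.5, "p99": 1.5}  # illustrative limits only

failures = find_regressions(baseline, current, max_ratio)
for scenario, metric, base_ms, cur_ms in failures:
    print(f"{scenario} {metric}: {cur_ms:.4f} ms exceeds "
          f"{base_ms:.4f} ms x {max_ratio[metric]}")
```

In this made-up run only p99 breaches its limit, so a check built this way would report a single failure and could exit non-zero to fail the CI job.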

Baseline Snapshot (2026-06-01)

Key end-to-end latency baselines from latency_baseline.json:

  • book_update_burst: p95 = 0.0132 ms, p99 = 0.0198 ms
  • execution_spike: p95 = 0.0139 ms, p99 = 0.0177 ms
  • reconnect_storm: p95 = 0.0114 ms, p99 = 0.0134 ms
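
For readers unfamiliar with the p95/p99 notation, the sketch below shows one common way to derive such values from raw latency samples, using the nearest-rank definition. It is an assumption that the profiler works this way; profile_latency.py may use a different percentile definition (for example, interpolation), and the helper name nearest_rank_percentile and the sample data are hypothetical.

```python
import math


def nearest_rank_percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Synthetic samples for illustration: 0.001 ms, 0.002 ms, ..., 0.100 ms.
samples_ms = [i / 1000 for i in range(1, 101)]

p95 = nearest_rank_percentile(samples_ms, 95)
p99 = nearest_rank_percentile(samples_ms, 99)
print(f"p95 = {p95:.4f} ms, p99 = {p99:.4f} ms")
```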

Optimization Note

MetricsCalculator.compute() now computes its aggregations and quantiles in DuckDB SQL, which reduces Python-side row scans.

Measured benchmark (scripts/benchmark_metrics_compute.py):

  • Python scan baseline: 12.623 ms
  • SQL aggregate implementation: 11.039 ms
  • Speedup: 1.14x
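
The speedup figure follows directly from the two timings above. The short check below reproduces it and also expresses the same result as a percentage of time saved; the variable names are illustrative.

```python
python_scan_ms = 12.623    # Python scan baseline, from the benchmark above
sql_aggregate_ms = 11.039  # SQL aggregate implementation, from the benchmark above

# Speedup is old time divided by new time.
speedup = python_scan_ms / sql_aggregate_ms
# The same improvement expressed as a share of the original runtime.
time_saved_pct = (python_scan_ms - sql_aggregate_ms) / python_scan_ms * 100

print(f"speedup: {speedup:.2f}x")            # speedup: 1.14x
print(f"time saved: {time_saved_pct:.1f}%")  # time saved: 12.5%
```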