allucanget/arbitrade


zwitschi cc11082ea7

feat: Implement latency profiling and guardrails for performance monitoring

- Added synthetic latency profiler scenarios and CLI scripts for baseline generation and regression checks.
- Introduced latency baseline and threshold artifacts for CI enforcement.
- Enhanced CI workflow with latency guardrail checks.
- Updated documentation to include latency profiling commands and performance metrics.
- Added unit tests for latency guardrail evaluation.

2026-06-01 14:47:52 +02:00

latency_baseline.json

latency_thresholds.json

README.md

Performance Hardening

This folder contains latency profiling baselines and guardrail thresholds used in CI.

Scenarios

The profiler covers representative load patterns:

book_update_burst: rapid market-data deltas with moderate detection load.
execution_spike: heavier detection/execution pressure.
reconnect_storm: frequent reconnect/reset behavior.

Profiling Commands

Generate a fresh profile:

python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json
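
As a rough illustration of what a profile run of this kind involves, the sketch below times a callable for a fixed number of iterations and summarizes the samples as p95/p99 in milliseconds. It is not the code in scripts/profile_latency.py: `percentile`, `profile_callable`, the stand-in workload, and the JSON layout are all hypothetical.

```python
import json
import time


def percentile(samples, q):
    """Linear-interpolated percentile of samples for q in [0, 1]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def profile_callable(fn, iterations=600):
    """Time fn() `iterations` times and return latency stats in ms."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples_ms.append((time.perf_counter_ns() - start) / 1e6)
    return {
        "iterations": iterations,
        "p95_ms": percentile(samples_ms, 0.95),
        "p99_ms": percentile(samples_ms, 0.99),
    }


if __name__ == "__main__":
    # Hypothetical workload standing in for one profiler scenario.
    profile = {"book_update_burst": profile_callable(lambda: sum(range(100)))}
    print(json.dumps(profile, indent=2))
```

Using a monotonic nanosecond clock (`time.perf_counter_ns`) matters here because the baselines in this folder are in the range of hundredths of a millisecond.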

Check current performance against the baseline and thresholds:

python scripts/check_latency_regression.py \
  --baseline ops/performance/latency_baseline.json \
  --thresholds ops/performance/latency_thresholds.json \
  --iterations 600

CI executes the same guardrail check.
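
A guardrail check of this kind typically compares each scenario's current percentiles against the baseline plus an allowed margin. The sketch below assumes a hypothetical threshold format (a maximum allowed regression in percent per metric); the actual schema of latency_thresholds.json and the logic in scripts/check_latency_regression.py may differ.

```python
def evaluate_guardrail(baseline, current, max_regression_pct):
    """Return a list of violation messages; an empty list means the check passes.

    baseline / current: {scenario: {metric: latency_ms}}
    max_regression_pct: {metric: allowed increase over baseline, in percent}
    """
    violations = []
    for scenario, base_metrics in baseline.items():
        for metric, allowed_pct in max_regression_pct.items():
            base = base_metrics[metric]
            observed = current[scenario][metric]
            limit = base * (1 + allowed_pct / 100)
            if observed > limit:
                violations.append(
                    f"{scenario}.{metric}: {observed:.4f} ms exceeds "
                    f"limit {limit:.4f} ms (baseline {base:.4f} ms)"
                )
    return violations


# Baseline values are from this README; current values and margins are made up.
baseline = {"book_update_burst": {"p95": 0.0132, "p99": 0.0198}}
current = {"book_update_burst": {"p95": 0.0140, "p99": 0.0310}}
thresholds = {"p95": 25.0, "p99": 50.0}

for message in evaluate_guardrail(baseline, current, thresholds):
    print(message)
```

A CI wrapper around such a function would exit with a non-zero status whenever the returned list is non-empty.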

Baseline Snapshot (2026-06-01)

Key end-to-end latency baselines from latency_baseline.json:

book_update_burst: p95 = 0.0132 ms, p99 = 0.0198 ms
execution_spike: p95 = 0.0139 ms, p99 = 0.0177 ms
reconnect_storm: p95 = 0.0114 ms, p99 = 0.0134 ms

Optimization Note

MetricsCalculator.compute() now computes its aggregations and quantiles in DuckDB SQL, which reduces Python-side row scans.

Measured benchmark (scripts/benchmark_metrics_compute.py):

Python scan baseline: 12.623 ms
SQL aggregate implementation: 11.039 ms
Speedup: 1.14x (12.623 / 11.039, about 12.5% less time)
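
To illustrate the kind of change described in this note, the sketch below computes the same summary twice: once by scanning rows in Python and once with a single DuckDB aggregate query. It is not the MetricsCalculator code; the function names, the `samples` table, and the `latency_ms` column are hypothetical, and the SQL variant assumes the third-party `duckdb` package is installed.

```python
def summarize_python(latencies_ms):
    """Python-side scan: sort and interpolate in the interpreter."""
    ordered = sorted(latencies_ms)
    pos = (len(ordered) - 1) * 0.95
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    p95 = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    return len(ordered), sum(ordered) / len(ordered), p95


def summarize_sql(latencies_ms):
    """Single DuckDB query: count, mean, and p95 computed in the engine."""
    import duckdb  # third-party dependency: pip install duckdb

    con = duckdb.connect(":memory:")
    con.execute("CREATE TABLE samples (latency_ms DOUBLE)")
    con.executemany(
        "INSERT INTO samples VALUES (?)", [(v,) for v in latencies_ms]
    )
    row = con.execute(
        "SELECT count(*), avg(latency_ms), quantile_cont(latency_ms, 0.95) "
        "FROM samples"
    ).fetchone()
    con.close()
    return row
```

In a real calculator the rows would presumably already live in DuckDB, so the insert step shown here would not be part of the measured path.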