feat: Implement latency profiling and guardrails for performance monitoring
CI / lint-test-build (push) Failing after 19s

- Added synthetic latency profiler scenarios and CLI scripts for baseline generation and regression checks.
- Introduced latency baseline and threshold artifacts for CI enforcement.
- Enhanced CI workflow with latency guardrail checks.
- Updated documentation to include latency profiling commands and performance metrics.
- Added unit tests for latency guardrail evaluation.

@@ -0,0 +1,48 @@
# Performance Hardening

This folder contains latency profiling baselines and guardrail thresholds used in CI.

## Scenarios

The profiler covers representative load patterns:

- `book_update_burst`: rapid market-data deltas with moderate detection load.
- `execution_spike`: heavier detection/execution pressure.
- `reconnect_storm`: frequent reconnect/reset behavior.

## Profiling Commands

Generate a fresh profile:

```powershell
python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json
```

Check current performance against the baseline and thresholds:

```powershell
python scripts/check_latency_regression.py `
  --baseline ops/performance/latency_baseline.json `
  --thresholds ops/performance/latency_thresholds.json `
  --iterations 600
```

CI executes the same guardrail check.
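
The guardrail logic itself is not shown in this folder. As a minimal sketch, assuming the check compares each scenario's `end_to_end` p95/p99 against a per-scenario limit and falls back to `default` when a scenario has no override, it could look like the following. The function names `resolve_limits` and `find_violations` are hypothetical and are not taken from `check_latency_regression.py`; the inline sample data is trimmed for illustration.

```python
import json


def resolve_limits(thresholds: dict, scenario: str) -> dict:
    """Return the limits for a scenario, falling back to the default entry."""
    # Assumption: per-scenario entries override "default" key by key.
    limits = dict(thresholds["default"])
    limits.update(thresholds.get("scenarios", {}).get(scenario, {}))
    return limits


def find_violations(profile: dict, thresholds: dict) -> list[str]:
    """List every end-to-end percentile that exceeds its limit."""
    violations = []
    for name, scenario in profile["scenarios"].items():
        measured = scenario["stages"]["end_to_end"]
        for key, limit in resolve_limits(thresholds, name).items():
            if measured[key] > limit:
                violations.append(
                    f"{name}.{key}: {measured[key]:.4f} ms > {limit} ms"
                )
    return violations


# Trimmed sample inputs in the same shape as the JSON artifacts.
thresholds = json.loads(
    '{"default": {"p95_ms": 3.0, "p99_ms": 3.5},'
    ' "scenarios": {"execution_spike": {"p95_ms": 3.2, "p99_ms": 3.8}}}'
)
profile = {
    "scenarios": {
        "book_update_burst": {
            "stages": {"end_to_end": {"p95_ms": 0.0132, "p99_ms": 0.0198}}
        }
    }
}

print(find_violations(profile, thresholds) or "no violations")
```

A CI step built this way would exit non-zero whenever the returned list is non-empty.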

## Baseline Snapshot (2026-06-01)

Key end-to-end latency baselines from `latency_baseline.json` (rounded to four decimal places):

- `book_update_burst`: p95 = 0.0132 ms, p99 = 0.0198 ms
- `execution_spike`: p95 = 0.0139 ms, p99 = 0.0177 ms
- `reconnect_storm`: p95 = 0.0114 ms, p99 = 0.0134 ms
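
Unrounded values in the baseline file, such as `0.013204999999999996`, are consistent with percentiles that interpolate linearly between neighboring samples, although the profiler's exact method is not shown here. A minimal sketch of that technique follows; the `percentile` helper and the sample data are illustrative and are not taken from `profile_latency.py`.

```python
def percentile(samples: list[float], q: float) -> float:
    """Linearly interpolated percentile; q is in the range [0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Fractional index into the sorted samples.
    rank = (q / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    # Blend the two neighboring samples by the fractional part of the rank.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical end-to-end timings in milliseconds.
samples_ms = [0.007, 0.008, 0.009, 0.010, 0.020]
summary = {f"p{q}_ms": percentile(samples_ms, q) for q in (50, 95, 99)}
print(summary)
```

With few samples, p95 and p99 are dominated by the single slowest measurement, which is one reason to keep the iteration count as high as the 600 used for the baseline.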

## Optimization Note

`MetricsCalculator.compute()` was optimized to use DuckDB SQL aggregations and quantiles, reducing Python-side row scans.

Measured benchmark (`scripts/benchmark_metrics_compute.py`):

- Python scan baseline: 12.623 ms
- SQL aggregate implementation: 11.039 ms
- Speedup: 1.14x
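
For reference, the speedup figure is the ratio of the two timings above. The short calculation below reproduces it and also expresses the gain as a reduction in run time; the variable names are illustrative.

```python
python_scan_ms = 12.623    # Python scan baseline from the benchmark above
sql_aggregate_ms = 11.039  # SQL aggregate implementation

# Speedup is old time divided by new time.
speedup = python_scan_ms / sql_aggregate_ms
# Equivalent view: the share of run time that was saved.
reduction_pct = (1 - sql_aggregate_ms / python_scan_ms) * 100

print(f"speedup: {speedup:.2f}x, time reduction: {reduction_pct:.1f}%")
# prints "speedup: 1.14x, time reduction: 12.5%"
```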
@@ -0,0 +1,96 @@
{
  "iterations": 600,
  "scenarios": {
    "book_update_burst": {
      "iterations": 600,
      "stages": {
        "ingest": {
          "p50_ms": 0.0028,
          "p95_ms": 0.0056,
          "p99_ms": 0.0083
        },
        "detect": {
          "p50_ms": 0.0034,
          "p95_ms": 0.005899999999999999,
          "p99_ms": 0.0081
        },
        "risk": {
          "p50_ms": 0.0002,
          "p95_ms": 0.0003,
          "p99_ms": 0.0006
        },
        "execution": {
          "p50_ms": 0.0006,
          "p95_ms": 0.0012,
          "p99_ms": 0.0020009999999999993
        },
        "end_to_end": {
          "p50_ms": 0.007,
          "p95_ms": 0.013204999999999996,
          "p99_ms": 0.019801
        }
      }
    },
    "execution_spike": {
      "iterations": 600,
      "stages": {
        "ingest": {
          "p50_ms": 0.0029,
          "p95_ms": 0.003,
          "p99_ms": 0.00431099999999999
        },
        "detect": {
          "p50_ms": 0.0097,
          "p95_ms": 0.0101,
          "p99_ms": 0.012404999999999996
        },
        "risk": {
          "p50_ms": 0.0002,
          "p95_ms": 0.00019999999999999998,
          "p99_ms": 0.0003
        },
        "execution": {
          "p50_ms": 0.0006,
          "p95_ms": 0.0007,
          "p99_ms": 0.001000999999999999
        },
        "end_to_end": {
          "p50_ms": 0.0135,
          "p95_ms": 0.0139,
          "p99_ms": 0.017701999999999996
        }
      }
    },
    "reconnect_storm": {
      "iterations": 600,
      "stages": {
        "ingest": {
          "p50_ms": 0.0029,
          "p95_ms": 0.0039,
          "p99_ms": 0.0047
        },
        "detect": {
          "p50_ms": 0.0051,
          "p95_ms": 0.006,
          "p99_ms": 0.007101999999999998
        },
        "risk": {
          "p50_ms": 0.0002,
          "p95_ms": 0.00019999999999999998,
          "p99_ms": 0.0003009999999999991
        },
        "execution": {
          "p50_ms": 0.0006,
          "p95_ms": 0.0007999999999999999,
          "p99_ms": 0.0011009999999999991
        },
        "end_to_end": {
          "p50_ms": 0.0088,
          "p95_ms": 0.0114,
          "p99_ms": 0.013403999999999998
        }
      }
    }
  },
  "generated_at": "2026-06-01T12:35:48.836000+00:00"
}
@@ -0,0 +1,16 @@
{
  "default": {
    "p95_ms": 3.0,
    "p99_ms": 3.5
  },
  "scenarios": {
    "execution_spike": {
      "p95_ms": 3.2,
      "p99_ms": 3.8
    },
    "reconnect_storm": {
      "p95_ms": 3.4,
      "p99_ms": 4.0
    }
  }
}