feat: Implement latency profiling and guardrails for performance monitoring
CI / lint-test-build (push) Failing after 19s

- Added synthetic latency profiler scenarios and CLI scripts for baseline generation and regression checks.
- Introduced latency baseline and threshold artifacts for CI enforcement.
- Enhanced CI workflow with latency guardrail checks.
- Updated documentation to include latency profiling commands and performance metrics.
- Added unit tests for latency guardrail evaluation.
2026-06-01 14:47:52 +02:00
parent c17f41aaf8
commit cc11082ea7
16 changed files with 900 additions and 56 deletions
@@ -0,0 +1,48 @@
# Performance Hardening
This folder contains latency profiling baselines and guardrail thresholds used in CI.
## Scenarios
The profiler covers representative load patterns:
- `book_update_burst`: rapid market-data deltas with moderate detection load.
- `execution_spike`: heavier detection/execution pressure.
- `reconnect_storm`: frequent reconnect/reset behavior.
## Profiling Commands
Generate a fresh profile:
```powershell
python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json
```
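The contents of `scripts/profile_latency.py` are not shown here, so the following is only a minimal sketch of how per-iteration latencies could be collected and summarised into the `p50_ms`/`p95_ms`/`p99_ms` fields. The `percentile` and `profile` helpers and the linear-interpolation method are assumptions for illustration, not the script's actual implementation.

```python
import time


def percentile(samples, pct):
    """Linear-interpolation percentile of a non-empty sequence (pct in 0..100)."""
    ordered = sorted(samples)
    rank = (pct / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def profile(workload, iterations=600):
    """Time `workload` once per iteration and summarise the latencies in ms."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        workload()
        samples_ms.append((time.perf_counter_ns() - start) / 1_000_000)
    return {f"p{p}_ms": percentile(samples_ms, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Hypothetical stand-in workload; the real scenarios drive the pipeline stages.
    print(profile(lambda: sum(range(100))))
```

Timing each iteration separately, rather than timing the whole loop, is what makes tail percentiles such as p99 available.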
Check current performance against the baseline and thresholds:
```powershell
python scripts/check_latency_regression.py `
--baseline ops/performance/latency_baseline.json `
--thresholds ops/performance/latency_thresholds.json `
--iterations 600
```
CI executes the same guardrail check.
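As a rough illustration of what such a guardrail check might do, the sketch below compares measured end-to-end percentiles against the limits in `latency_thresholds.json`, using the `default` entry for scenarios without an override. The function names, and the assumption that the limits are absolute ceilings in milliseconds applied to end-to-end latency, are hypothetical; the real logic lives in `scripts/check_latency_regression.py`.

```python
def resolve_limits(thresholds, scenario):
    """Return the limits for a scenario, falling back to the default entry."""
    limits = dict(thresholds["default"])
    limits.update(thresholds.get("scenarios", {}).get(scenario, {}))
    return limits


def find_violations(measured, thresholds):
    """List (scenario, metric, value, limit) tuples where value exceeds limit."""
    violations = []
    for scenario, metrics in measured.items():
        limits = resolve_limits(thresholds, scenario)
        for metric, limit in limits.items():
            value = metrics.get(metric)
            if value is not None and value > limit:
                violations.append((scenario, metric, value, limit))
    return violations


# Values copied from latency_thresholds.json in this commit.
thresholds = {
    "default": {"p95_ms": 3.0, "p99_ms": 3.5},
    "scenarios": {
        "execution_spike": {"p95_ms": 3.2, "p99_ms": 3.8},
        "reconnect_storm": {"p95_ms": 3.4, "p99_ms": 4.0},
    },
}

# End-to-end values from the baseline snapshot in this README.
measured = {
    "book_update_burst": {"p95_ms": 0.0132, "p99_ms": 0.0198},
    "execution_spike": {"p95_ms": 0.0139, "p99_ms": 0.0177},
    "reconnect_storm": {"p95_ms": 0.0114, "p99_ms": 0.0134},
}

if __name__ == "__main__":
    problems = find_violations(measured, thresholds)
    print("OK" if not problems else problems)
```

Under these assumptions, a CI step would exit non-zero whenever the list of violations is non-empty.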
## Baseline Snapshot (2026-06-01)
Key end-to-end latency baselines from `latency_baseline.json`, rounded to four decimal places:
- `book_update_burst`: p95 = 0.0132 ms, p99 = 0.0198 ms
- `execution_spike`: p95 = 0.0139 ms, p99 = 0.0177 ms
- `reconnect_storm`: p95 = 0.0114 ms, p99 = 0.0134 ms
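The snapshot above can be reproduced from the baseline file. The sketch below assumes the `scenarios -> stages -> end_to_end` layout of `latency_baseline.json` and embeds a trimmed copy of its values so that it runs on its own; the `end_to_end_snapshot` helper is hypothetical.

```python
import json

# Trimmed copy of latency_baseline.json: only the end_to_end stage is kept.
baseline_text = """
{
  "scenarios": {
    "book_update_burst": {"stages": {"end_to_end":
      {"p50_ms": 0.007, "p95_ms": 0.013204999999999996, "p99_ms": 0.019801}}},
    "execution_spike": {"stages": {"end_to_end":
      {"p50_ms": 0.0135, "p95_ms": 0.0139, "p99_ms": 0.017701999999999996}}},
    "reconnect_storm": {"stages": {"end_to_end":
      {"p50_ms": 0.0088, "p95_ms": 0.0114, "p99_ms": 0.013403999999999998}}}
  }
}
"""


def end_to_end_snapshot(baseline):
    """Extract end-to-end p95/p99 per scenario, rounded to four decimals."""
    return {
        name: {
            key: round(data["stages"]["end_to_end"][key], 4)
            for key in ("p95_ms", "p99_ms")
        }
        for name, data in baseline["scenarios"].items()
    }


snapshot = end_to_end_snapshot(json.loads(baseline_text))

if __name__ == "__main__":
    for name, values in snapshot.items():
        print(f"- `{name}`: p95 = {values['p95_ms']:.4f} ms, "
              f"p99 = {values['p99_ms']:.4f} ms")
```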
## Optimization Note
`MetricsCalculator.compute()` was optimized to use DuckDB SQL aggregations and quantiles, reducing Python-side row scans.
Measured benchmark (`scripts/benchmark_metrics_compute.py`):
- Python scan baseline: 12.623 ms
- SQL aggregate implementation: 11.039 ms
- Speedup: 1.14x
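The speedup figure follows directly from the two timings; a short check of the arithmetic, which also expresses the gain as a relative time reduction:

```python
python_scan_ms = 12.623    # Python scan baseline
sql_aggregate_ms = 11.039  # SQL aggregate implementation

speedup = python_scan_ms / sql_aggregate_ms
reduction_pct = (python_scan_ms - sql_aggregate_ms) / python_scan_ms * 100

print(f"{speedup:.2f}x, {reduction_pct:.1f}% less time")
# prints "1.14x, 12.5% less time"
```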
@@ -0,0 +1,96 @@
{
"iterations": 600,
"scenarios": {
"book_update_burst": {
"iterations": 600,
"stages": {
"ingest": {
"p50_ms": 0.0028,
"p95_ms": 0.0056,
"p99_ms": 0.0083
},
"detect": {
"p50_ms": 0.0034,
"p95_ms": 0.005899999999999999,
"p99_ms": 0.0081
},
"risk": {
"p50_ms": 0.0002,
"p95_ms": 0.0003,
"p99_ms": 0.0006
},
"execution": {
"p50_ms": 0.0006,
"p95_ms": 0.0012,
"p99_ms": 0.0020009999999999993
},
"end_to_end": {
"p50_ms": 0.007,
"p95_ms": 0.013204999999999996,
"p99_ms": 0.019801
}
}
},
"execution_spike": {
"iterations": 600,
"stages": {
"ingest": {
"p50_ms": 0.0029,
"p95_ms": 0.003,
"p99_ms": 0.00431099999999999
},
"detect": {
"p50_ms": 0.0097,
"p95_ms": 0.0101,
"p99_ms": 0.012404999999999996
},
"risk": {
"p50_ms": 0.0002,
"p95_ms": 0.00019999999999999998,
"p99_ms": 0.0003
},
"execution": {
"p50_ms": 0.0006,
"p95_ms": 0.0007,
"p99_ms": 0.001000999999999999
},
"end_to_end": {
"p50_ms": 0.0135,
"p95_ms": 0.0139,
"p99_ms": 0.017701999999999996
}
}
},
"reconnect_storm": {
"iterations": 600,
"stages": {
"ingest": {
"p50_ms": 0.0029,
"p95_ms": 0.0039,
"p99_ms": 0.0047
},
"detect": {
"p50_ms": 0.0051,
"p95_ms": 0.006,
"p99_ms": 0.007101999999999998
},
"risk": {
"p50_ms": 0.0002,
"p95_ms": 0.00019999999999999998,
"p99_ms": 0.0003009999999999991
},
"execution": {
"p50_ms": 0.0006,
"p95_ms": 0.0007999999999999999,
"p99_ms": 0.0011009999999999991
},
"end_to_end": {
"p50_ms": 0.0088,
"p95_ms": 0.0114,
"p99_ms": 0.013403999999999998
}
}
}
},
"generated_at": "2026-06-01T12:35:48.836000+00:00"
}
@@ -0,0 +1,16 @@
{
"default": {
"p95_ms": 3.0,
"p99_ms": 3.5
},
"scenarios": {
"execution_spike": {
"p95_ms": 3.2,
"p99_ms": 3.8
},
"reconnect_storm": {
"p95_ms": 3.4,
"p99_ms": 4.0
}
}
}
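To put these limits in context, the sketch below computes how much headroom each p95 limit leaves over the recorded end-to-end baseline. It assumes the limits apply to end-to-end latency in milliseconds, which the files do not state explicitly; the variable names are illustrative.

```python
# End-to-end p95 values from latency_baseline.json.
baseline_p95_ms = {
    "book_update_burst": 0.013204999999999996,
    "execution_spike": 0.0139,
    "reconnect_storm": 0.0114,
}

# p95 limits from latency_thresholds.json; book_update_burst uses "default".
limit_p95_ms = {
    "book_update_burst": 3.0,
    "execution_spike": 3.2,
    "reconnect_storm": 3.4,
}

# Ratio of limit to baseline: how many times slower a run may get before failing.
headroom = {
    name: limit_p95_ms[name] / baseline_p95_ms[name] for name in baseline_p95_ms
}

if __name__ == "__main__":
    for name, factor in headroom.items():
        print(f"{name}: limit is about {factor:.0f}x the baseline p95")
```

With the values in this commit the limits sit roughly 230x to 300x above the synthetic baseline, so under this reading the guardrail would catch only very large regressions.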