feat: Implement latency profiling and guardrails for performance monitoring

- Added synthetic latency profiler scenarios and CLI scripts for baseline generation and regression checks.
- Introduced latency baseline and threshold artifacts for CI enforcement.
- Enhanced CI workflow with latency guardrail checks.
- Updated documentation to include latency profiling commands and performance metrics.
- Added unit tests for latency guardrail evaluation.
2026-06-01 14:47:52 +02:00
parent c17f41aaf8
commit cc11082ea7
16 changed files with 900 additions and 56 deletions
@@ -0,0 +1,48 @@
+# Performance Hardening
+
+This folder contains latency profiling baselines and guardrail thresholds used in CI.
+
+## Scenarios
+
+The profiler covers representative load patterns:
+
+- `book_update_burst`: rapid market-data deltas with moderate detection load.
+- `execution_spike`: heavier detection/execution pressure.
+- `reconnect_storm`: frequent reconnect/reset behavior.
+
+## Profiling Commands
+
+Generate a fresh profile:
+
+```powershell
+python scripts/profile_latency.py --iterations 600 --output ops/performance/latency_baseline.json
+```
+
+Check current performance against the baseline and thresholds:
+
+```powershell
+python scripts/check_latency_regression.py `
+  --baseline ops/performance/latency_baseline.json `
+  --thresholds ops/performance/latency_thresholds.json `
+  --iterations 600
+```
+
+CI executes the same guardrail check.
+
+## Baseline Snapshot (2026-06-01)
+
+Key end-to-end latency baselines from `latency_baseline.json`:
+
+- `book_update_burst`: p95 = 0.0132 ms, p99 = 0.0198 ms
+- `execution_spike`: p95 = 0.0139 ms, p99 = 0.0177 ms
+- `reconnect_storm`: p95 = 0.0114 ms, p99 = 0.0134 ms
+
+## Optimization Note
+
+`MetricsCalculator.compute()` was optimized to use DuckDB SQL aggregations and quantiles, reducing Python-side row scans.
+
+Measured benchmark (`scripts/benchmark_metrics_compute.py`):
+
+- Python scan baseline: 12.623 ms
+- SQL aggregate implementation: 11.039 ms
+- Speedup: 1.14x
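The speedup in the optimization note can be reproduced from the two reported timings. The check below uses only those numbers; the variable names are illustrative and do not come from the benchmark script.

```python
# Timings reported for scripts/benchmark_metrics_compute.py in the note above.
python_scan_ms = 12.623
sql_aggregate_ms = 11.039

# Speedup is the old time divided by the new time.
speedup = python_scan_ms / sql_aggregate_ms

# The same result expressed as the share of time saved per call.
time_saved_pct = (python_scan_ms - sql_aggregate_ms) / python_scan_ms * 100

print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved_pct:.1f}%")
```

A 1.14x speedup means roughly 12.5% less time per call.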
@@ -0,0 +1,96 @@
+{
+  "iterations": 600,
+  "scenarios": {
+    "book_update_burst": {
+      "iterations": 600,
+      "stages": {
+        "ingest": {
+          "p50_ms": 0.0028,
+          "p95_ms": 0.0056,
+          "p99_ms": 0.0083
+        },
+        "detect": {
+          "p50_ms": 0.0034,
+          "p95_ms": 0.005899999999999999,
+          "p99_ms": 0.0081
+        },
+        "risk": {
+          "p50_ms": 0.0002,
+          "p95_ms": 0.0003,
+          "p99_ms": 0.0006
+        },
+        "execution": {
+          "p50_ms": 0.0006,
+          "p95_ms": 0.0012,
+          "p99_ms": 0.0020009999999999993
+        },
+        "end_to_end": {
+          "p50_ms": 0.007,
+          "p95_ms": 0.013204999999999996,
+          "p99_ms": 0.019801
+        }
+      }
+    },
+    "execution_spike": {
+      "iterations": 600,
+      "stages": {
+        "ingest": {
+          "p50_ms": 0.0029,
+          "p95_ms": 0.003,
+          "p99_ms": 0.00431099999999999
+        },
+        "detect": {
+          "p50_ms": 0.0097,
+          "p95_ms": 0.0101,
+          "p99_ms": 0.012404999999999996
+        },
+        "risk": {
+          "p50_ms": 0.0002,
+          "p95_ms": 0.00019999999999999998,
+          "p99_ms": 0.0003
+        },
+        "execution": {
+          "p50_ms": 0.0006,
+          "p95_ms": 0.0007,
+          "p99_ms": 0.001000999999999999
+        },
+        "end_to_end": {
+          "p50_ms": 0.0135,
+          "p95_ms": 0.0139,
+          "p99_ms": 0.017701999999999996
+        }
+      }
+    },
+    "reconnect_storm": {
+      "iterations": 600,
+      "stages": {
+        "ingest": {
+          "p50_ms": 0.0029,
+          "p95_ms": 0.0039,
+          "p99_ms": 0.0047
+        },
+        "detect": {
+          "p50_ms": 0.0051,
+          "p95_ms": 0.006,
+          "p99_ms": 0.007101999999999998
+        },
+        "risk": {
+          "p50_ms": 0.0002,
+          "p95_ms": 0.00019999999999999998,
+          "p99_ms": 0.0003009999999999991
+        },
+        "execution": {
+          "p50_ms": 0.0006,
+          "p95_ms": 0.0007999999999999999,
+          "p99_ms": 0.0011009999999999991
+        },
+        "end_to_end": {
+          "p50_ms": 0.0088,
+          "p95_ms": 0.0114,
+          "p99_ms": 0.013403999999999998
+        }
+      }
+    }
+  },
+  "generated_at": "2026-06-01T12:35:48.836000+00:00"
+}
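The README snapshot can be derived from this artifact by rounding each scenario's `end_to_end` percentiles to four decimals. The sketch below assumes exactly that; `end_to_end_summary` is a hypothetical helper, not part of the committed scripts, and the embedded JSON is an excerpt of the file above with only the `end_to_end` stage.

```python
import json

# Excerpt of latency_baseline.json: only the end_to_end stage of each scenario.
BASELINE_EXCERPT = json.loads("""
{
  "scenarios": {
    "book_update_burst": {"stages": {"end_to_end":
      {"p50_ms": 0.007, "p95_ms": 0.013204999999999996, "p99_ms": 0.019801}}},
    "execution_spike": {"stages": {"end_to_end":
      {"p50_ms": 0.0135, "p95_ms": 0.0139, "p99_ms": 0.017701999999999996}}},
    "reconnect_storm": {"stages": {"end_to_end":
      {"p50_ms": 0.0088, "p95_ms": 0.0114, "p99_ms": 0.013403999999999998}}}
  }
}
""")


def end_to_end_summary(baseline, digits=4):
    """Return {scenario: (p95_ms, p99_ms)} for the end_to_end stage, rounded."""
    summary = {}
    for name, scenario in baseline["scenarios"].items():
        stage = scenario["stages"]["end_to_end"]
        summary[name] = (round(stage["p95_ms"], digits), round(stage["p99_ms"], digits))
    return summary


for name, (p95, p99) in end_to_end_summary(BASELINE_EXCERPT).items():
    print(f"{name}: p95 = {p95} ms, p99 = {p99} ms")
```

Rounding at display time hides floating-point noise such as `0.013204999999999996` without altering the stored artifact.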
@@ -0,0 +1,16 @@
+{
+  "default": {
+    "p95_ms": 3.0,
+    "p99_ms": 3.5
+  },
+  "scenarios": {
+    "execution_spike": {
+      "p95_ms": 3.2,
+      "p99_ms": 3.8
+    },
+    "reconnect_storm": {
+      "p95_ms": 3.4,
+      "p99_ms": 4.0
+    }
+  }
+}
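The thresholds file has a `default` block plus per-scenario overrides, so a scenario with no entry (here `book_update_burst`) presumably falls back to the defaults. The sketch below shows one way a guardrail could resolve and apply these limits. It assumes override-on-top-of-default merging and a simple "measured value must not exceed the limit" rule. `resolve_limits` and `find_violations` are hypothetical names, and the actual logic in `scripts/check_latency_regression.py` is not shown in this commit excerpt and may differ, for example by also comparing against the baseline.

```python
# Contents of latency_thresholds.json, copied from the hunk above.
THRESHOLDS = {
    "default": {"p95_ms": 3.0, "p99_ms": 3.5},
    "scenarios": {
        "execution_spike": {"p95_ms": 3.2, "p99_ms": 3.8},
        "reconnect_storm": {"p95_ms": 3.4, "p99_ms": 4.0},
    },
}


def resolve_limits(thresholds, scenario):
    """Start from the default limits, then apply any per-scenario override."""
    limits = dict(thresholds["default"])
    limits.update(thresholds.get("scenarios", {}).get(scenario, {}))
    return limits


def find_violations(measured, thresholds):
    """Return one message per metric whose measured value exceeds its limit."""
    violations = []
    for scenario, stats in measured.items():
        for metric, limit in resolve_limits(thresholds, scenario).items():
            value = stats.get(metric)
            if value is not None and value > limit:
                violations.append(f"{scenario}: {metric}={value} ms exceeds {limit} ms")
    return violations


# End-to-end values taken from the baseline snapshot (rounded to four decimals).
measured = {
    "book_update_burst": {"p95_ms": 0.0132, "p99_ms": 0.0198},
    "execution_spike": {"p95_ms": 0.0139, "p99_ms": 0.0177},
    "reconnect_storm": {"p95_ms": 0.0114, "p99_ms": 0.0134},
}

print(resolve_limits(THRESHOLDS, "book_update_burst"))
print(find_violations(measured, THRESHOLDS))
```

With the committed baseline, every scenario sits far below its limit (about 0.01 ms measured against limits of 3 ms or more), so under these assumptions the check would fail only on a very large slowdown.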