calminer-docs/specifications/monte_carlo_simulation.md
zwitschi 29f16139a3 feat: documentation update
2025-11-11 18:34:02 +01:00

# Monte Carlo Simulation Specification
## 1. Purpose
Define the configuration, inputs, and outputs for CalMiner's Monte Carlo
simulation engine used to evaluate project scenarios with stochastic cash-flow
assumptions. The engine augments deterministic profitability metrics by
sampling cash-flow distributions and aggregating resulting Net Present Value
(NPV), Internal Rate of Return (IRR), and Payback Period statistics.
## 2. Scope
- Applies to scenario-level profitability analysis executed via
`services/simulation.py`.
- Covers configuration dataclasses (`SimulationConfig`, `CashFlowSpec`,
`DistributionSpec`) and supported distribution families.
- Outlines expectations for downstream reporting and visualization modules that
consume simulation results.
## 3. Inputs
### 3.1 Cash Flow Specifications
Each Monte Carlo run receives an ordered collection of `CashFlowSpec` entries.
Each spec pairs a deterministic `CashFlow` (amount, period index/date) with an
optional `DistributionSpec`. When no distribution is provided the deterministic
value is used for every iteration.
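The pairing described above can be pictured with a minimal sketch. The field names (`amount`, `period_index`, `cash_flow`, `distribution`) are assumptions chosen for illustration and may differ from the dataclasses in `services/simulation.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CashFlow:
    amount: float       # deterministic baseline amount
    period_index: int   # hypothetical field name for the period index


@dataclass(frozen=True)
class DistributionSpec:
    type: str           # e.g. "normal"
    parameters: dict    # parameters for the chosen family


@dataclass(frozen=True)
class CashFlowSpec:
    cash_flow: CashFlow
    distribution: Optional[DistributionSpec] = None  # None means deterministic


def is_stochastic(spec: CashFlowSpec) -> bool:
    """True when the spec is sampled rather than reused as-is."""
    return spec.distribution is not None


# An ordered collection: a fixed outlay followed by an uncertain inflow.
specs = [
    CashFlowSpec(CashFlow(amount=-1000.0, period_index=0)),
    CashFlowSpec(CashFlow(500.0, 1), DistributionSpec("normal", {"std_dev": 50.0})),
]
```

In this sketch the first entry would contribute `-1000.0` in every iteration, while the second would be resampled each time.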
### 3.2 Simulation Configuration
`SimulationConfig` controls execution:
| Field | Description |
| ------------------------------------- | ----------------------------------------------------------------- |
| `iterations` | Number of Monte Carlo iterations (must be > 0). |
| `discount_rate` | Annual discount rate (decimal) passed to NPV helper. |
| `seed` | Optional RNG seed to ensure reproducible sampling. |
| `metrics` | Tuple of requested metrics (`npv`, `irr`, `payback`). |
| `percentiles` | Percentile cutoffs (0–100) computed for each metric. |
| `compounds_per_year` | Compounding frequency reused by financial helpers. |
| `return_samples` | When `True`, raw metric samples are returned alongside summaries. |
| `residual_value` / `residual_periods` | Optional residual cash flow inputs reused by NPV. |
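
As an illustration of the fields in the table, the sketch below models the configuration as a frozen dataclass. The default values and the validation placement in `__post_init__` are assumptions, not the actual definition; only the field names and the `ValueError` conditions come from this specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SimulationConfig:
    iterations: int
    discount_rate: float
    seed: Optional[int] = None
    metrics: Tuple[str, ...] = ("npv", "irr", "payback")  # assumed default
    percentiles: Tuple[float, ...] = (5.0, 50.0, 95.0)    # assumed default
    compounds_per_year: int = 1                           # assumed default
    return_samples: bool = False
    residual_value: Optional[float] = None
    residual_periods: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject configurations this specification describes as invalid.
        if self.iterations <= 0:
            raise ValueError("iterations must be greater than zero")
        if any(not 0 <= p <= 100 for p in self.percentiles):
            raise ValueError("percentiles must lie within 0-100")


config = SimulationConfig(iterations=1000, discount_rate=0.08, seed=42)
```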
### 3.3 Context Metadata
Optional dictionaries provide dynamic parameters when sourcing distribution
means or other values:
- `scenario_context`: scenario-specific values (e.g., salvage mean, cost
overrides).
- `metadata`: shared configuration (e.g., global commodity price expectations).
## 4. Distributions
`DistributionSpec` defines stochastic behaviour:
| Property | Description |
| ------------ | ------------------------------------------------------------------------------- |
| `type` | `normal`, `lognormal`, `triangular`, or `discrete`. |
| `parameters` | Mapping of required parameters per distribution family. |
| `source` | How base parameters are sourced: `static`, `scenario_field`, or `metadata_key`. |
| `source_key` | Identifier used for non-static sources. |
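
One way the `source` and `source_key` properties could interact with the context dictionaries from section 3.3 is sketched below. The function `resolve_parameter` and its signature are hypothetical, and `DistributionConfigError` is redefined locally as a stand-in for the error class named in this specification.

```python
class DistributionConfigError(Exception):
    """Local stand-in for the error class named in this specification."""


def resolve_parameter(source, source_key, parameters, name,
                      scenario_context=None, metadata=None):
    """Return parameter `name`, looked up according to `source`."""
    if source == "static":
        # Static parameters are read directly from the spec's own mapping.
        return parameters.get(name)
    lookup = {"scenario_field": scenario_context, "metadata_key": metadata}
    if source not in lookup:
        raise DistributionConfigError(f"unknown source: {source!r}")
    context = lookup[source] or {}
    if source_key not in context:
        raise DistributionConfigError(f"missing context value for {source_key!r}")
    return float(context[source_key])


# A salvage mean taken from the scenario rather than hard-coded in the spec.
mean = resolve_parameter("scenario_field", "salvage_mean", {}, "mean",
                         scenario_context={"salvage_mean": 120})
```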
### 4.1 Parameter Validation
- `normal`: requires non-negative `std_dev`; defaults `mean` to baseline cash
flow amount when omitted.
- `lognormal`: requires `mean` (mu in log space) and non-negative `sigma`.
- `triangular`: requires `min`, `mode`, `max` with constraint `min <= mode <= max`.
- `discrete`: requires paired `values`/`probabilities` sequences; probabilities
must be non-negative and sum to 1.0.
Invalid definitions raise `DistributionConfigError` before sampling.
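
The rules above could be enforced along the following lines. This is a sketch only: the function name `validate_distribution`, the local `DistributionConfigError` stand-in, and the `1e-9` tolerance on the probability sum are assumptions.

```python
import math


class DistributionConfigError(Exception):
    """Local stand-in for the error class named in this specification."""


def _require(params, *names):
    missing = [n for n in names if n not in params]
    if missing:
        raise DistributionConfigError(f"missing parameters: {missing}")


def validate_distribution(dist_type, params):
    """Raise DistributionConfigError if `params` violates the family's rules."""
    if dist_type == "normal":
        _require(params, "std_dev")  # `mean` may default to the baseline amount
        if params["std_dev"] < 0:
            raise DistributionConfigError("std_dev must be non-negative")
    elif dist_type == "lognormal":
        _require(params, "mean", "sigma")
        if params["sigma"] < 0:
            raise DistributionConfigError("sigma must be non-negative")
    elif dist_type == "triangular":
        _require(params, "min", "mode", "max")
        if not params["min"] <= params["mode"] <= params["max"]:
            raise DistributionConfigError("expected min <= mode <= max")
    elif dist_type == "discrete":
        _require(params, "values", "probabilities")
        values, probs = params["values"], params["probabilities"]
        if not values or len(values) != len(probs):
            raise DistributionConfigError("values and probabilities must pair up")
        if any(p < 0 for p in probs):
            raise DistributionConfigError("probabilities must be non-negative")
        if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):  # assumed tolerance
            raise DistributionConfigError("probabilities must sum to 1.0")
    else:
        raise DistributionConfigError(f"unsupported distribution: {dist_type!r}")
```

Running the check once per spec before the iteration loop keeps configuration errors out of the per-iteration failure counts.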
## 5. Algorithm Overview
1. Seed a NumPy `Generator` (`default_rng(seed)`) unless a generator instance is
supplied.
2. For each iteration:
   - Realise cash flows by sampling distributions or using deterministic
     values.
   - Compute requested metrics using shared helpers from
     `services/financial.py`:
     - NPV via `net_present_value` (respecting `residual_value` inputs).
     - IRR via `internal_rate_of_return`; non-converging or invalid trajectories
       return `NaN` and increment `failed_runs`.
     - Payback via `payback_period`; scenarios whose cumulative cash flow never
       becomes non-negative record `NaN`.
3. Aggregate results into per-metric arrays; calculate summary statistics:
mean, sample standard deviation, min/max, and configured percentiles using
`numpy.percentile`.
4. Assemble `SimulationResult` containing summary descriptors and optional raw
samples when `return_samples` is enabled.
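
The steps above can be sketched for the NPV metric alone. To stay dependency-free, this example swaps the NumPy `Generator` for the standard library's `random.Random`, represents each spec as a hypothetical `(baseline, distribution)` tuple, and uses `simple_npv`, an illustrative per-period discounting helper that is not the `net_present_value` function in `services/financial.py`.

```python
import random


def sample_amount(rng, baseline, dist):
    """Realise one cash flow: sample it, or fall back to the baseline."""
    if dist is None:
        return baseline
    kind, p = dist["type"], dist["parameters"]
    if kind == "normal":
        return rng.gauss(p.get("mean", baseline), p["std_dev"])
    if kind == "lognormal":
        return rng.lognormvariate(p["mean"], p["sigma"])
    if kind == "triangular":
        # Note the stdlib argument order: low, high, mode.
        return rng.triangular(p["min"], p["max"], p["mode"])
    if kind == "discrete":
        return rng.choices(p["values"], weights=p["probabilities"], k=1)[0]
    raise ValueError(f"unsupported distribution: {kind!r}")


def simple_npv(rate, amounts):
    """Discount one amount per period, starting at period 0."""
    return sum(a / (1 + rate) ** t for t, a in enumerate(amounts))


def run_npv_simulation(specs, iterations, discount_rate, seed=None):
    rng = random.Random(seed)  # seeded once, shared by all iterations
    samples = []
    for _ in range(iterations):
        amounts = [sample_amount(rng, base, dist) for base, dist in specs]
        samples.append(simple_npv(discount_rate, amounts))
    return samples


specs = [
    (-1000.0, None),
    (600.0, {"type": "normal", "parameters": {"std_dev": 50.0}}),
    (600.0, {"type": "triangular",
             "parameters": {"min": 400.0, "mode": 600.0, "max": 700.0}}),
]
npv_samples = run_npv_simulation(specs, iterations=500, discount_rate=0.1, seed=7)
```

Because the generator is seeded once and consumed in a fixed order, repeating the call with the same seed yields the same sample list.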
## 6. Outputs
`SimulationResult` includes:
- `iterations`: total iteration count executed.
- `summaries`: mapping of `SimulationMetric` to `MetricSummary` objects with:
  - `mean`, `std_dev`, `minimum`, `maximum`.
  - `percentiles`: mapping of configured percentile cutoffs to values.
  - `sample_size`: number of successful (non-NaN) samples.
  - `failed_runs`: count of iterations producing `NaN` for the metric.
- `samples`: optional mapping of metric to raw `numpy.ndarray` of samples when
detailed analysis is required downstream.
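
A summary with the fields listed above might be derived from raw samples as follows. This is a pure-Python sketch: `summarise` and `percentile` are hypothetical names, the dictionary stands in for `MetricSummary`, and the percentile helper uses linear interpolation, which is the default behaviour of `numpy.percentile`. Excluding `NaN` samples from the statistics and reporting a standard deviation of `0.0` for a single sample are both assumptions.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0-100) on pre-sorted data."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarise(samples, percentiles=(5, 50, 95)):
    valid = sorted(s for s in samples if not math.isnan(s))
    failed = len(samples) - len(valid)  # NaN entries count as failed runs
    if not valid:
        return {"sample_size": 0, "failed_runs": failed}
    return {
        "mean": statistics.fmean(valid),
        # Sample standard deviation; 0.0 for one sample is an assumption.
        "std_dev": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        "minimum": valid[0],
        "maximum": valid[-1],
        "percentiles": {q: percentile(valid, q) for q in percentiles},
        "sample_size": len(valid),
        "failed_runs": failed,
    }


summary = summarise([1.0, 2.0, float("nan"), 3.0, 4.0])
```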
## 7. Error Handling
- Invalid distribution configuration or missing context values raise
  `DistributionConfigError`.
- Zero iterations or invalid percentile ranges raise `ValueError`.
- Financial helper exceptions (`ConvergenceError`, `PaybackNotReachedError`)
are captured per iteration and converted to `NaN` samples to preserve
aggregate results while flagging failure counts.
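
The per-iteration capture described in the last bullet can be sketched as a small wrapper. The exception classes are redefined locally as stand-ins, `safe_metric` is a hypothetical name, and the `payback_period` shown here is a simplified whole-period version written for this example, not the helper in `services/financial.py`.

```python
import math


class ConvergenceError(Exception):
    """Local stand-in for the financial helper exception."""


class PaybackNotReachedError(Exception):
    """Local stand-in for the financial helper exception."""


def safe_metric(compute, *args):
    """Run a metric helper, converting known failures into NaN."""
    try:
        return compute(*args)
    except (ConvergenceError, PaybackNotReachedError):
        return math.nan


def payback_period(amounts):
    """Simplified payback: first period with non-negative cumulative cash flow."""
    cumulative = 0.0
    for period, amount in enumerate(amounts):
        cumulative += amount
        if cumulative >= 0:
            return float(period)
    raise PaybackNotReachedError("cumulative cash flow never becomes non-negative")


recovered = safe_metric(payback_period, [-100.0, 60.0, 60.0])
never = safe_metric(payback_period, [-100.0, 10.0])
```

Only the two named exceptions are caught, so unexpected errors still surface instead of being silently counted as failed runs.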
## 8. Usage Guidance
- Scenario services should construct `CashFlowSpec` instances from persisted
financial inputs and optional uncertainty definitions stored alongside the
scenario.
- Reporting routes can request raw samples when producing histogram or violin
plots; otherwise rely on `MetricSummary` statistics for tabular output.
- Visualizations implementing FR-005 should leverage percentile outputs to
render fan charts or confidence intervals.
- When integrating with scheduling workflows, persist the deterministic seed to
ensure repeated runs remain comparable.
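
For the seed-persistence point, a scheduler could store the seed alongside the run so a later execution can replay it. The sketch below is an assumption-laden illustration: `prepare_run_record`, the record layout, and the use of JSON and `random.Random` are hypothetical and not part of the CalMiner codebase.

```python
import json
import random
import secrets


def prepare_run_record(scenario_id, seed=None):
    """Attach an explicit seed to a run so it can be replayed later."""
    if seed is None:
        seed = secrets.randbits(32)  # pick a seed once, then persist it
    return {"scenario_id": scenario_id, "seed": seed}


record = prepare_run_record("scenario-1")
stored = json.dumps(record)                  # what a scheduler might persist
replayed_seed = json.loads(stored)["seed"]   # what a later run would load

# The same seed reproduces the same draw.
first_draw = random.Random(record["seed"]).gauss(0.0, 1.0)
second_draw = random.Random(replayed_seed).gauss(0.0, 1.0)
```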
## 9. Testing
`tests/test_simulation.py` covers deterministic parity with financial helpers,
seed reproducibility, context parameter sourcing, failure accounting for metrics
that cannot be computed, error handling for misconfigured distributions, and
sample-return functionality. Additional regression cases should accompany new
metrics or distribution families.
## 10. References
- Implementation: `calminer/services/simulation.py`
- Financial helpers: `calminer/services/financial.py`
- Tests: `calminer/tests/test_simulation.py`
- Related specification: `calminer-docs/specifications/financial_metrics.md`