# Monte Carlo Simulation Specification

## 1. Purpose

Define the configuration, inputs, and outputs for CalMiner's Monte Carlo simulation engine used to evaluate project scenarios with stochastic cash-flow assumptions. The engine augments deterministic profitability metrics by sampling cash-flow distributions and aggregating resulting Net Present Value (NPV), Internal Rate of Return (IRR), and Payback Period statistics.

## 2. Scope

- Applies to scenario-level profitability analysis executed via `services/simulation.py`.
- Covers configuration dataclasses (`SimulationConfig`, `CashFlowSpec`, `DistributionSpec`) and supported distribution families.
- Outlines expectations for downstream reporting and visualization modules that consume simulation results.

## 3. Inputs

### 3.1 Cash Flow Specifications

Each Monte Carlo run receives an ordered collection of `CashFlowSpec` entries. Each spec pairs a deterministic `CashFlow` (amount, period index/date) with an optional `DistributionSpec`. When no distribution is provided the deterministic value is used for every iteration.

### 3.2 Simulation Configuration

`SimulationConfig` controls execution:

| Field                                 | Description                                                       |
| ------------------------------------- | ----------------------------------------------------------------- |
| `iterations`                          | Number of Monte Carlo iterations (must be > 0).                   |
| `discount_rate`                       | Annual discount rate (decimal) passed to NPV helper.              |
| `seed`                                | Optional RNG seed to ensure reproducible sampling.                |
| `metrics`                             | Tuple of requested metrics (`npv`, `irr`, `payback`).             |
| `percentiles`                         | Percentile cutoffs (0–100) computed for each metric.              |
| `compounds_per_year`                  | Compounding frequency reused by financial helpers.                |
| `return_samples`                      | When `True`, raw metric samples are returned alongside summaries. |
| `residual_value` / `residual_periods` | Optional residual cash flow inputs reused by NPV.                 |

### 3.3 Context Metadata

Optional dictionaries provide dynamic parameters when sourcing distribution means or other values:

- `scenario_context`: scenario-specific values (e.g., salvage mean, cost overrides).
- `metadata`: shared configuration (e.g., global commodity price expectations).

## 4. Distributions

`DistributionSpec` defines stochastic behaviour:

| Property     | Description                                                                     |
| ------------ | ------------------------------------------------------------------------------- |
| `type`       | `normal`, `lognormal`, `triangular`, or `discrete`.                             |
| `parameters` | Mapping of required parameters per distribution family.                         |
| `source`     | How base parameters are sourced: `static`, `scenario_field`, or `metadata_key`. |
| `source_key` | Identifier used for non-static sources.                                         |

### 4.1 Parameter Validation

- `normal`: requires non-negative `std_dev`; defaults `mean` to baseline cash flow amount when omitted.
- `lognormal`: requires `mean` (mu in log space) and non-negative `sigma`.
- `triangular`: requires `min`, `mode`, `max` with constraint `min <= mode <= max`.
- `discrete`: requires paired `values`/`probabilities` sequences; probabilities must be non-negative and sum to 1.0.

Invalid definitions raise `DistributionConfigError` before sampling.

## 5. Algorithm Overview

1. Seed a NumPy `Generator` (`default_rng(seed)`) unless a generator instance is supplied.
2. For each iteration:
   - Realise cash flows by sampling distributions or using deterministic values.
   - Compute requested metrics using shared helpers from `services/financial.py`:
     - NPV via `net_present_value` (respecting `residual_value` inputs).
     - IRR via `internal_rate_of_return`; non-converging or invalid trajectories return `NaN` and increment `failed_runs`.
     - Payback via `payback_period`; scenarios failing to hit non-negative cumulative cash flow record `NaN`.
3. Aggregate results into per-metric arrays; calculate summary statistics: mean, sample standard deviation, min/max, and configured percentiles using `numpy.percentile`.
4. Assemble `SimulationResult` containing summary descriptors and optional raw samples when `return_samples` is enabled.

## 6. Outputs

`SimulationResult` includes:

- `iterations`: total iteration count executed.
- `summaries`: mapping of `SimulationMetric` to `MetricSummary` objects with:
  - `mean`, `std_dev`, `minimum`, `maximum`.
  - `percentiles`: mapping of configured percentile cutoffs to values.
  - `sample_size`: number of successful (non-NaN) samples.
  - `failed_runs`: count of iterations producing `NaN` for the metric.
- `samples`: optional mapping of metric to raw `numpy.ndarray` of samples when detailed analysis is required downstream.

## 7. Error Handling

- Invalid configuration or missing context raises `DistributionConfigError`.
- Zero iterations or invalid percentile ranges raise `ValueError`.
- Financial helper exceptions (`ConvergenceError`, `PaybackNotReachedError`) are captured per iteration and converted to `NaN` samples to preserve aggregate results while flagging failure counts.

## 8. Usage Guidance

- Scenario services should construct `CashFlowSpec` instances from persisted financial inputs and optional uncertainty definitions stored alongside the scenario.
- Reporting routes can request raw samples when producing histogram or violin plots; otherwise rely on `MetricSummary` statistics for tabular output.
- Visualizations implementing FR-005 should leverage percentile outputs to render fan charts or confidence intervals.
- When integrating with scheduling workflows, persist the deterministic seed to ensure repeated runs remain comparable.

## 9. Testing

`tests/test_simulation.py` covers deterministic parity with financial helpers, seed reproducibility, context parameter sourcing, failure accounting for metrics that cannot be computed, error handling for misconfigured distributions, and sample-return functionality. Additional regression cases should accompany new metrics or distribution families.

## 10. References

- Implementation: `calminer/services/simulation.py`
- Financial helpers: `calminer/services/financial.py`
- Tests: `calminer/tests/test_simulation.py`
- Related specification: `calminer-docs/specifications/financial_metrics.md`
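As an illustration of the parameter validation (Section 4.1), the iteration loop (Section 5), and the summary fields (Section 6), the following is a minimal sketch, not the CalMiner implementation. Every name in it is hypothetical: specs are plain `(baseline, distribution_dict)` tuples rather than `CashFlowSpec` objects, `DistributionConfigError` is a local stand-in class, and `npv` and `payback` are simplified substitutes for the helpers in `services/financial.py`, whose signatures are not given here. The NPV stand-in discounts once per period index and ignores `compounds_per_year` and residual inputs; IRR is omitted. Excluding `NaN` samples before computing statistics is also an assumption, inferred from the `sample_size` and `failed_runs` descriptions.

```python
import numpy as np


class DistributionConfigError(ValueError):
    """Local stand-in for the error type named in the specification."""


def sample_amount(rng, baseline, dist=None):
    """Draw one cash-flow amount; use the baseline when no distribution is given."""
    if dist is None:
        return baseline
    kind, p = dist["type"], dist["parameters"]
    if kind == "normal":
        if p["std_dev"] < 0:
            raise DistributionConfigError("std_dev must be non-negative")
        return rng.normal(p.get("mean", baseline), p["std_dev"])
    if kind == "lognormal":
        if p["sigma"] < 0:
            raise DistributionConfigError("sigma must be non-negative")
        return rng.lognormal(p["mean"], p["sigma"])
    if kind == "triangular":
        if not p["min"] <= p["mode"] <= p["max"]:
            raise DistributionConfigError("require min <= mode <= max")
        return rng.triangular(p["min"], p["mode"], p["max"])
    if kind == "discrete":
        probs = np.asarray(p["probabilities"], dtype=float)
        if len(probs) != len(p["values"]):
            raise DistributionConfigError("values/probabilities must be paired")
        if (probs < 0).any() or not np.isclose(probs.sum(), 1.0):
            raise DistributionConfigError("probabilities must be >= 0 and sum to 1.0")
        return rng.choice(p["values"], p=probs)
    raise DistributionConfigError(f"unsupported distribution type: {kind}")


def npv(rate, amounts):
    """Simplified NPV: discount each amount by its period index (assumption)."""
    return sum(a / (1.0 + rate) ** t for t, a in enumerate(amounts))


def payback(amounts):
    """First period with non-negative cumulative cash flow, else NaN."""
    total = 0.0
    for t, a in enumerate(amounts):
        total += a
        if total >= 0:
            return float(t)
    return float("nan")


def summarise(samples, percentiles):
    """Summary statistics over non-NaN samples, plus failure accounting."""
    arr = np.asarray(samples, dtype=float)
    ok = arr[~np.isnan(arr)]
    summary = {"sample_size": int(ok.size), "failed_runs": int(arr.size - ok.size)}
    if ok.size:
        summary.update(
            mean=float(ok.mean()),
            std_dev=float(ok.std(ddof=1)) if ok.size > 1 else 0.0,  # sample std dev
            minimum=float(ok.min()),
            maximum=float(ok.max()),
            percentiles={q: float(np.percentile(ok, q)) for q in percentiles},
        )
    return summary


def run_simulation(specs, iterations, discount_rate, seed=None, percentiles=(5, 50, 95)):
    """Sample cash flows per iteration, compute metrics, then aggregate."""
    if iterations <= 0:
        raise ValueError("iterations must be > 0")
    rng = np.random.default_rng(seed)
    npv_samples, payback_samples = [], []
    for _ in range(iterations):
        amounts = [sample_amount(rng, base, dist) for base, dist in specs]
        npv_samples.append(npv(discount_rate, amounts))
        payback_samples.append(payback(amounts))
    return {
        "npv": summarise(npv_samples, percentiles),
        "payback": summarise(payback_samples, percentiles),
    }


# Example: one deterministic outflow followed by three uncertain inflows.
example_specs = [
    (-1000.0, None),
    (400.0, {"type": "normal", "parameters": {"std_dev": 50.0}}),
    (400.0, {"type": "triangular", "parameters": {"min": 300.0, "mode": 400.0, "max": 550.0}}),
    (400.0, {"type": "discrete", "parameters": {"values": [200.0, 600.0], "probabilities": [0.5, 0.5]}}),
]
example_result = run_simulation(example_specs, iterations=2000, discount_rate=0.08, seed=42)
```

Running the sketch twice with the same `seed` yields identical summaries, which mirrors the reproducibility expectation in Section 8. Iterations where the cumulative cash flow never turns non-negative show up in `failed_runs` for the payback metric instead of distorting its mean.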