feat: documentation update

- Completed export workflow implementation (query builders, CSV/XLSX serializers, streaming API endpoints, UI modals, automated tests).
- Added export modal UI and client script to trigger downloads directly from dashboard.
- Documented import/export field mapping and usage guidelines in FR-008.
- Updated installation guide with export environment variables, dependencies, and CLI/CI usage instructions.
2025-11-11 18:34:02 +01:00
parent 02906bc960
commit 29f16139a3
11 changed files with 751 additions and 15 deletions

View File

@@ -1,3 +1,8 @@
# Change Log
<!-- TODO: Add description for CHANGELOG.md -->
## 2025-11-10
- Completed export workflow implementation (query builders, CSV/XLSX serializers, streaming API endpoints, UI modals, automated tests).
- Added export modal UI and client script to trigger downloads directly from dashboard.
- Documented import/export field mapping and usage guidelines in FR-008.
- Updated installation guide with export environment variables, dependencies, and CLI/CI usage instructions.

View File

@@ -2,14 +2,42 @@
We welcome contributions to Calminer! If you would like to contribute, please follow these guidelines:
## Fork the Repository
Create a personal copy of the repository on your GitHub account.
## Create a Branch
Before making changes, create a new branch for your feature or bug fix.
## Make Your Changes
Implement your changes in the new branch.
## Write Tests
Ensure that your changes are covered by tests.
## Run Test Suite With Coverage
Execute the default pytest run to enforce the 80% project-wide coverage threshold and review missing lines in the terminal report.
```bash
pytest
```
## Run Export Test Suite
Before opening a pull request, run the export-focused pytest module to verify CSV/XLSX streaming endpoints.
```bash
pytest tests/test_export_routes.py
```
This ensures the API headers, download content, and modal routes remain functional.
## Submit a Pull Request
Once you are satisfied with your changes, submit a pull request to the main repository.
Thank you for your interest in contributing to Calminer!

View File

@@ -39,12 +39,21 @@ Before you begin, ensure that you have the following prerequisites installed on
3. **Access the Application**
Once the containers are up and running, you can access the Calminer application by navigating to `http://localhost:8003` in your web browser.
If you are running the application on a remote server, replace `localhost` with the server's IP address or domain name.
4. **Database Initialization**
The first time you run the application, the database will be initialized automatically. Ensure that the database container is running and accessible.
The application container executes `/app/scripts/docker-entrypoint.sh` before launching the API. This entrypoint runs `python -m scripts.run_migrations`, which applies all Alembic migrations and keeps the schema current on every startup. No additional action is required when using Docker Compose, but you can review the logs to confirm the migrations completed successfully.
For local development without Docker, run the same command after setting your environment variables:
```bash
# activate your virtualenv first
python -m scripts.run_migrations
```
The script is idempotent; it will only apply pending migrations.
5. **Seed Default Accounts and Roles**
@@ -65,15 +74,24 @@ Before you begin, ensure that you have the following prerequisites installed on
You can rerun the script safely; it updates existing roles and user details without creating duplicates.
### Export Dependencies
Export and monitoring workflows require the following Python packages in addition to the core dependencies:
- `pandas`
- `openpyxl`
- `prometheus-client`
These libraries are already listed in `requirements.txt`. Ensure they are installed in your virtual environment if you are not using Docker.
### Environment Variables for Export Features
While exports reuse the existing database configuration, you may optionally set the following variables to adjust behavior:
- `CALMINER_EXPORT_MAX_ROWS` — override default pagination when generating exports (optional).
- `CALMINER_EXPORT_METADATA` — enable (`true`) or disable (`false`) the metadata sheet in Excel exports by default (the UI form still allows per-request overrides).
Set these variables in your `.env` file or compose environment section before launching the stack.
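For example, a `.env` fragment enabling both options might look like this (values are illustrative, not recommended defaults):

```
# Optional export tuning (illustrative values)
CALMINER_EXPORT_MAX_ROWS=50000
CALMINER_EXPORT_METADATA=true
```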
## Docker Configuration
@@ -83,6 +101,45 @@ The `docker-compose.yml` file contains the configuration for the Calminer applic
The application uses environment variables to configure various settings. You can set these variables in a `.env` file in the root directory of the project. Refer to the `docker-compose.yml` file for a list of available environment variables and their default values.
Key variables relevant to import/export workflows:
| Variable | Default | Description |
| ----------------------------- | --------- | ------------------------------------------------------------------------------- |
| `CALMINER_EXPORT_MAX_ROWS` | _(unset)_ | Optional safety guard to limit the number of rows exported in a single request. |
| `CALMINER_EXPORT_METADATA` | `true` | Controls whether metadata sheets are generated by default during Excel exports. |
| `CALMINER_IMPORT_STAGING_TTL` | `300` | Controls how long staged import tokens remain valid before expiration. |
| `CALMINER_IMPORT_MAX_ROWS` | _(unset)_ | Optional guard to prevent excessively large import files. |
### Running Export Workflows Locally
1. Activate your virtual environment and ensure dependencies are installed:
```bash
pip install -r requirements.txt
```
2. Start the FastAPI application (or use `docker compose up`).
3. Use the `/exports/projects` or `/exports/scenarios` endpoints to request CSV/XLSX downloads:
```bash
curl -X POST http://localhost:8000/exports/projects \
-H "Content-Type: application/json" \
-d '{"format": "csv"}' --output projects.csv
curl -X POST http://localhost:8000/exports/projects \
-H "Content-Type: application/json" \
-d '{"format": "xlsx"}' --output projects.xlsx
```
4. The Prometheus metrics endpoint is available at `/metrics` once the app is running. Ensure your monitoring stack scrapes it (e.g., Prometheus target `localhost:8000`).
5. For automated verification in CI pipelines, invoke the dedicated pytest module:
```bash
pytest tests/test_export_routes.py
```
### Volumes
The application uses Docker volumes to persist data. The following volumes are defined in the `docker-compose.yml` file:
@@ -92,6 +149,16 @@ The application uses Docker volumes to persist data. The following volumes are d
Ensure that these volumes are properly configured to avoid data loss during container restarts or removals.
## Stopping the Application
To stop the application, run the following command in the terminal:
```bash
docker compose down
```
This command will stop and remove the containers, networks, and volumes created by Docker Compose.
## Troubleshooting
If you encounter any issues during the installation or deployment process, refer to the following troubleshooting tips:

View File

@@ -0,0 +1,81 @@
# Export Operations Runbook
## Purpose
This runbook provides step-by-step guidance for operators to execute project and scenario exports, monitor their status, and troubleshoot common issues.
## Prerequisites
- Access to the CalMiner web UI with role `analyst`, `project_manager`, or `admin`.
- Direct API access (curl or HTTP client) if performing scripted exports.
- Environment variables configured per [Installation Guide](installation.md), especially:
- `CALMINER_EXPORT_MAX_ROWS`
- `CALMINER_EXPORT_METADATA`
## Success Path
### Export via Web UI
1. Sign in to CalMiner.
2. Navigate to the dashboard and click **Export** next to either _Recent Projects_ or _Scenario Alerts_.
3. In the modal dialog:
- Choose **CSV** or **Excel (.xlsx)**.
- Toggle **Include metadata sheet** (Excel only) as needed.
- Click **Download**.
4. Confirm that the browser downloads a file named `projects-YYYYMMDD-HHMMSS.csv` (or `.xlsx`).
### Export via API (curl)
```bash
# CSV export of projects
curl -X POST https://<host>/exports/projects \
-H "Content-Type: application/json" \
-d '{"format": "csv"}' \
--output projects.csv
# Excel export of scenarios
curl -X POST https://<host>/exports/scenarios \
-H "Content-Type: application/json" \
-d '{"format": "xlsx"}' \
--output scenarios.xlsx
```
Expected response headers:
- `Content-Type: text/csv` or `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
- `Content-Disposition: attachment; filename=...`
## Troubleshooting
| Symptom | Likely Cause | Resolution |
| ---------------------------------------- | ---------------------------------------------- | --------------------------------------------------------------------------- |
| `403 Forbidden` | User lacks analyst/project_manager/admin role. | Assign appropriate role or escalate to administrator. |
| `400 Bad Request` with validation errors | Unsupported format or malformed filters. | Verify payload matches schema (`format` = `csv` or `xlsx`); review filters. |
| Empty dataset | No matching records for filters. | Validate data exists; adjust filters or check project/scenario status. |
| Large exports time out | Dataset exceeds `CALMINER_EXPORT_MAX_ROWS`. | Increase limit (with caution) or export narrower dataset. |
## Monitoring & Logging
- Success and error events are logged via structured events (`import.preview`, `import.commit`, `export`). Ensure your log sink (e.g., ELK) captures JSON payloads and index fields such as `dataset`, `status`, `row_count`, and `token` for filtering.
- Prometheus endpoint: `GET /metrics`
- Sample scrape config:
```yaml
scrape_configs:
- job_name: calminer
static_configs:
- targets: ["calminer.local:8000"]
```
- Key metrics:
- `calminer_import_total` / `calminer_export_total` — counters labelled by `dataset`, `action`, and `status` (imports) or `dataset`, `status`, and `format` (exports).
- `calminer_import_duration_seconds` / `calminer_export_duration_seconds` — histograms for measuring operation duration.
- Alerting suggestions:
- Trigger when `calminer_export_total{status="failure"}` has increased over the last 5 minutes.
- Trigger when 95th percentile of `calminer_export_duration_seconds` exceeds your SLA threshold.
- Dashboard recommendations:
- Plot export/import throughput split by dataset and format.
- Surface recent failures with `detail` and `error` metadata pulled from the `import_export_logs` table.
- Combine logs, metrics, and DB audit records to trace user actions end-to-end.
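The alerting suggestions above might be expressed as Prometheus alerting rules along these lines (rule names and the 30-second latency threshold are placeholders; tune them to your SLA):

```yaml
groups:
  - name: calminer-export-alerts
    rules:
      - alert: CalminerExportFailures
        expr: increase(calminer_export_total{status="failure"}[5m]) > 0
      - alert: CalminerExportLatencyHigh
        expr: histogram_quantile(0.95, rate(calminer_export_duration_seconds_bucket[5m])) > 30
        for: 10m
```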

View File

@@ -75,6 +75,8 @@ The system employs a relational data model to manage and store information effic
A detailed [Data Model](08_concepts/02_data_model.md) documentation is available.
Discounted cash-flow metrics (NPV, IRR, Payback) referenced by the economic portion of the data model are described in the [Financial Metrics Specification](../specifications/financial_metrics.md), while stochastic extensions leverage the Monte Carlo engine documented in the [Monte Carlo Simulation Specification](../specifications/monte_carlo_simulation.md).
All data interactions are handled through the [Data Access Layer](05_building_block_view.md#data-access-layer), ensuring consistency and integrity across operations.
## Security

View File

@@ -32,6 +32,7 @@ The data model for the system is designed to capture the essential entities and
- [Product Sales](#product-sales)
- [Investment Model](#investment-model)
- [Economic Model](#economic-model)
- [Discounted Cash Flow Metrics](#discounted-cash-flow-metrics)
- [Risk Model](#risk-model)
- [Parameter](#parameter)
- [Scenario](#scenario)
@@ -74,6 +75,9 @@ erDiagram
Unit ||--o{ QualityMetric : used_in
Unit ||--o{ Parameter : used_in
MiningTechnology ||--o{ Parameter : has_many
MiningTechnology ||--o{ QualityMetric : has_many
MiningTechnology ||--o{ MonteCarloSimulation : has_many
```
## User
@@ -483,6 +487,14 @@ erDiagram
FinancialModel ||--o{ EconomicModel : includes
```
#### Discounted Cash Flow Metrics
CalMiner standardises the computation of NPV, IRR, and Payback Period through the shared helpers in `services/financial.py`. The detailed algorithms, assumptions (compounding frequency, period anchoring, residual handling), and regression coverage are documented in [Financial Metrics Specification](../../specifications/financial_metrics.md). Scenario evaluation services and downstream analytics should rely on these helpers to ensure consistency across API, UI, and reporting features.
#### Monte Carlo Simulation
Stochastic profitability analysis builds on the deterministic helpers by sampling cash-flow distributions defined per scenario. The configuration contracts, supported distributions, and result schema are described in [Monte Carlo Simulation Specification](../../specifications/monte_carlo_simulation.md). Scenario evaluation flows should call `services/simulation.py` to generate iteration summaries and percentile data for reporting and visualisation features.
### Risk Model
The Risk Model identifies and evaluates potential risks associated with mining projects. It includes risk factors, their probabilities, and potential impacts on project outcomes.

View File

@@ -33,6 +33,8 @@ Exporting analysis results in multiple formats is essential for users who need t
- The export functionality should be accessible from relevant areas of the application (e.g., project dashboards, analysis results pages).
- The system should log export activities for auditing and monitoring purposes.
- Import and export flows must share a consistent schema contract so that data exported from the platform can be re-imported without loss.
- Export endpoints must respect role-based access rules (analyst, project_manager, admin) and return streaming responses with proper content-disposition headers.
- UI tooling must support triggered downloads via modal forms with format and metadata controls.
## Import/Export Field Mapping

View File

@@ -0,0 +1,184 @@
# Financial Metrics Specification
## 1. Purpose
Define the standard methodology CalMiner uses to compute discounted-cash-flow
profitability metrics, including Net Present Value (NPV), Internal Rate of Return
(IRR), and Payback Period. These calculations underpin scenario evaluation,
reporting, and investment decision support within the platform.
## 2. Scope
- Applies to scenario-level profitability analysis using cash flows stored in
the application database.
- Covers deterministic cash-flow evaluation; stochastic extensions (e.g., Monte
Carlo) may overlay these metrics but should reference this specification.
- Documents the assumptions implemented in `services/financial.py` and the
related pytest coverage in `tests/test_financial.py`.
## 3. Inputs and Definitions
| Symbol | Name | Description | Units / Domain | Notes |
| -------- | ----------------------- | ------------------------------------------- | ----------------- | --------------------------------------------------------- |
| $CF_t$ | Cash flow at period $t$ | Currency amount (positive or negative) | Scenario currency | Negative values typically represent investments/outflows. |
| $t$ | Period index | Fractional period (0 = anchor) | Real number | Derived from explicit index or calendar date. |
| $r$ | Discount rate | Decimal representation of the annual rate | $r > -1$ | Scenario configuration provides default rate. |
| $m$ | Compounds per year | Compounding frequency | Positive integer | Defaults to 1 (annual). |
| $RV$ | Residual value | Terminal value realised after final period | Scenario currency | Optional. |
| $t_{RV}$ | Residual periods | Timing of residual value relative to anchor | Real number | Defaults to last period + 1. |
### Period Anchoring and Timing
Cash flows can be supplied with either:
1. `period_index` — explicit integers/floats overriding all other timing.
2. `date` — calendar dates. The earliest dated flow anchors the timeline and
subsequent dates convert the day difference into fractional periods using a
365-day year divided by `compounds_per_year`.
3. Neither — flows default to sequential periods based on input order.
This aligns with `normalize_cash_flows` in `services/financial.py`, ensuring all
calculations receive `(amount, periods)` tuples.
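As a simplified stand-in for `normalize_cash_flows` (assuming the 365-day convention described above), the date-based conversion can be sketched as:

```python
from datetime import date

def periods_from_dates(dated_flows, compounds_per_year=1):
    """Convert (amount, date) pairs into (amount, fractional_period) tuples.

    The earliest date anchors period 0; day deltas are divided by
    365 / compounds_per_year to yield fractional periods.
    """
    anchor = min(d for _, d in dated_flows)
    period_days = 365.0 / compounds_per_year
    return [(amount, (d - anchor).days / period_days)
            for amount, d in dated_flows]

flows = [(-500_000, date(2024, 1, 1)),
         (180_000, date(2024, 7, 1)),
         (200_000, date(2025, 1, 1))]
normalized = periods_from_dates(flows, compounds_per_year=4)
```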
## 4. Net Present Value (NPV)
### Formula
For a set of cash flows $CF_t$ with discount rate $r$ and compounding frequency
$m$:
$$
\text{NPV} = \sum_{t=0}^{n} \frac{CF_t}{\left(1 + \frac{r}{m}\right)^{t}} +
\begin{cases}
\dfrac{RV}{\left(1 + \frac{r}{m}\right)^{t_{RV}}} & \text{if residual value present} \\
0 & \text{otherwise}
\end{cases}
$$
### Implementation Notes
- `discount_factor` computes $(1 + r/m)^{-t}$; NPV iterates over the normalised
flows and sums `amount * factor`.
- Residual values default to one period after the final cash flow when
`residual_periods` is omitted.
- Empty cash-flow sequences return 0 unless a residual value is supplied.
## 5. Internal Rate of Return (IRR)
### Definition
IRR is the discount rate $r$ for which NPV equals zero:
$$
0 = \sum_{t=0}^{n} \frac{CF_t}{\left(1 + \frac{r}{m}\right)^{t}}
$$
### Solver Behaviour
- Newton-Raphson iteration starts from `guess` (default 10%).
- Derivative instability or non-finite values trigger a fallback to a bracketed
bisection search between:
- Lower bound: $-0.99 \times m$
- Upper bound: $10.0$ (doubles until the root is bracketed or attempts exceed 12)
- Raises `ConvergenceError` when no sign change is found or the bisection fails
within the iteration budget (`max_iterations` default 100; bisection uses
double this limit).
- Validates that the cash-flow series includes at least one negative and one
positive value; otherwise IRR is undefined and a `ValueError` is raised.
### Caveats
- Multiple sign changes may yield multiple IRRs. The solver returns the root it
finds within the configured bounds; scenarios must interpret the result in
context.
- Rates less than `-1 * m` imply nonphysical periodic rates and are excluded.
## 6. Payback Period
### Definition
The payback period is the earliest period $t$ where cumulative cash flows become
non-negative. With fractional interpolation (default behaviour), the period is
calculated as:
$$
\text{Payback} = t_{prev} + \left(\frac{-\text{Cumulative}_{prev}}{CF_t}\right)
\times (t - t_{prev})
$$
where $t_{prev}$ is the previous period with negative cumulative cash flow.
### Implementation Notes
- Cash flows are sorted by period to ensure chronological accumulation.
- When `allow_fractional` is `False`, the function returns the first period with
non-negative cumulative total without interpolation.
- `PaybackNotReachedError` is raised if the cumulative total never becomes
non-negative.
## 7. Examples
### Example 1: Baseline Project
- Initial investment: $-1,000,000$ at period 0.
- Annual inflows: 300k, 320k, 340k, 360k, 450k (periods 1-5).
- Discount rate: 8% annual, `compounds_per_year = 1`.
| Period | Cash Flow (currency) |
| ------ | -------------------- |
| 0 | -1,000,000 |
| 1 | 300,000 |
| 2 | 320,000 |
| 3 | 340,000 |
| 4 | 360,000 |
| 5 | 450,000 |
- `net_present_value` ≈ 392,902
- `internal_rate_of_return` ≈ 0.210
- `payback_period` ≈ 3.11 periods
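These figures can be cross-checked with a plain-Python sketch of the formulas above (standalone helpers, not the `services/financial.py` API; a simple bisection stands in for the Newton-Raphson solver):

```python
def npv(rate, flows):
    """NPV of flows indexed from period 0, annual compounding."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

def irr(flows, lo=-0.99, hi=10.0, tol=1e-9):
    """Bisection on the sign change of NPV(rate)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def payback(flows):
    """Fractional period where cumulative cash flow turns non-negative."""
    cum = 0.0
    for t, cf in enumerate(flows):
        prev = cum
        cum += cf
        if cum >= 0:
            return t - 1 + (-prev / cf) if t > 0 else 0.0
    raise ValueError("payback never reached")

flows = [-1_000_000, 300_000, 320_000, 340_000, 360_000, 450_000]
print(npv(0.08, flows), irr(flows), payback(flows))
```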
### Example 2: Residual Value with Irregular Timing
- Investment: -500,000 on 2024-01-01
- Cash inflows on irregular dates (2024-07-01: 180k, 2025-01-01: 200k,
2025-11-01: 260k)
- Residual value 150k realised two years after final inflow
- Discount rate: 10%, `compounds_per_year = 4`
NPV discounts each cash flow by converting day deltas to quarterly periods. The
residual is discounted at `t_{RV} = last_period + 2` (because the override is
supplied).
## 8. Testing Strategy
`tests/test_financial.py` exercises:
- `normalize_cash_flows` with date-based, index-based, and sequential cash-flow
inputs.
- NPV calculations with and without residual values, including discount-rate
sensitivity checks.
- IRR convergence success cases, invalid inputs, and non-converging scenarios.
- Payback period exact, fractional, and never-payback cases.
Developers extending the financial metrics should add regression tests covering
new assumptions or solver behaviour.
## 9. Integration Notes
- Scenario evaluation services should pass cash flows as `CashFlow` instances to
reuse the shared normalisation logic.
- UI and reporting layers should display rates as percentages but supply them as
decimals to the service layer.
- Future Monte Carlo or sensitivity analyses can reuse the same helpers to
evaluate each simulated cash-flow path.
## 10. References
- Internal implementation: `calminer/services/financial.py`
- Tests: `calminer/tests/test_financial.py`
- Related specification: `calminer-docs/specifications/price_calculation.md`
- Architecture context: `calminer-docs/architecture/08_concepts/02_data_model.md`

View File

@@ -0,0 +1,141 @@
# Monte Carlo Simulation Specification
## 1. Purpose
Define the configuration, inputs, and outputs for CalMiner's Monte Carlo
simulation engine used to evaluate project scenarios with stochastic cash-flow
assumptions. The engine augments deterministic profitability metrics by
sampling cash-flow distributions and aggregating resulting Net Present Value
(NPV), Internal Rate of Return (IRR), and Payback Period statistics.
## 2. Scope
- Applies to scenario-level profitability analysis executed via
`services/simulation.py`.
- Covers configuration dataclasses (`SimulationConfig`, `CashFlowSpec`,
`DistributionSpec`) and supported distribution families.
- Outlines expectations for downstream reporting and visualization modules that
consume simulation results.
## 3. Inputs
### 3.1 Cash Flow Specifications
Each Monte Carlo run receives an ordered collection of `CashFlowSpec` entries.
Each spec pairs a deterministic `CashFlow` (amount, period index/date) with an
optional `DistributionSpec`. When no distribution is provided the deterministic
value is used for every iteration.
### 3.2 Simulation Configuration
`SimulationConfig` controls execution:
| Field | Description |
| ------------------------------------- | ----------------------------------------------------------------- |
| `iterations` | Number of Monte Carlo iterations (must be > 0). |
| `discount_rate` | Annual discount rate (decimal) passed to NPV helper. |
| `seed` | Optional RNG seed to ensure reproducible sampling. |
| `metrics` | Tuple of requested metrics (`npv`, `irr`, `payback`). |
| `percentiles`                         | Percentile cutoffs (0-100) computed for each metric.              |
| `compounds_per_year` | Compounding frequency reused by financial helpers. |
| `return_samples` | When `True`, raw metric samples are returned alongside summaries. |
| `residual_value` / `residual_periods` | Optional residual cash flow inputs reused by NPV. |
### 3.3 Context Metadata
Optional dictionaries provide dynamic parameters when sourcing distribution
means or other values:
- `scenario_context`: scenario-specific values (e.g., salvage mean, cost
overrides).
- `metadata`: shared configuration (e.g., global commodity price expectations).
## 4. Distributions
`DistributionSpec` defines stochastic behaviour:
| Property | Description |
| ------------ | ------------------------------------------------------------------------------- |
| `type` | `normal`, `lognormal`, `triangular`, or `discrete`. |
| `parameters` | Mapping of required parameters per distribution family. |
| `source` | How base parameters are sourced: `static`, `scenario_field`, or `metadata_key`. |
| `source_key` | Identifier used for non-static sources. |
### 4.1 Parameter Validation
- `normal`: requires non-negative `std_dev`; defaults `mean` to baseline cash
flow amount when omitted.
- `lognormal`: requires `mean` (mu in log space) and non-negative `sigma`.
- `triangular`: requires `min`, `mode`, `max` with constraint `min <= mode <= max`.
- `discrete`: requires paired `values`/`probabilities` sequences; probabilities
must be non-negative and sum to 1.0.
Invalid definitions raise `DistributionConfigError` before sampling.
## 5. Algorithm Overview
1. Seed a NumPy `Generator` (`default_rng(seed)`) unless a generator instance is
supplied.
2. For each iteration:
- Realise cash flows by sampling distributions or using deterministic
values.
- Compute requested metrics using shared helpers from
`services/financial.py`:
- NPV via `net_present_value` (respecting `residual_value` inputs).
- IRR via `internal_rate_of_return`; non-converging or invalid trajectories
return `NaN` and increment `failed_runs`.
- Payback via `payback_period`; scenarios failing to hit non-negative
cumulative cash flow record `NaN`.
3. Aggregate results into per-metric arrays; calculate summary statistics:
mean, sample standard deviation, min/max, and configured percentiles using
`numpy.percentile`.
4. Assemble `SimulationResult` containing summary descriptors and optional raw
samples when `return_samples` is enabled.
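As an illustration only, the loop above can be sketched with the standard library (the real engine uses NumPy's `default_rng` and the shared `services/financial.py` helpers; every name below is a placeholder, not CalMiner's API):

```python
import random
import statistics

def npv(rate, flows):
    """Deterministic NPV of flows indexed from period 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

def simulate_npv(base_flows, stdevs, rate, iterations, seed=None):
    """Sample each flow from a normal around its baseline; summarise NPVs."""
    rng = random.Random(seed)  # seeded generator for reproducible sampling
    samples = []
    for _ in range(iterations):
        flows = [rng.gauss(mu, sd) if sd else mu
                 for mu, sd in zip(base_flows, stdevs)]
        samples.append(npv(rate, flows))
    ordered = sorted(samples)
    return {
        "mean": statistics.fmean(samples),
        "std_dev": statistics.stdev(samples),
        "p10": ordered[int(0.10 * iterations)],
        "p90": ordered[int(0.90 * iterations)],
    }

summary = simulate_npv(
    base_flows=[-1_000_000, 300_000, 320_000, 340_000, 360_000, 450_000],
    stdevs=[0, 30_000, 30_000, 30_000, 30_000, 45_000],
    rate=0.08, iterations=2_000, seed=42,
)
```

Re-running with the same seed reproduces the summary exactly, which is the behaviour the `seed` field of `SimulationConfig` guarantees.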
## 6. Outputs
`SimulationResult` includes:
- `iterations`: total iteration count executed.
- `summaries`: mapping of `SimulationMetric` to `MetricSummary` objects with:
- `mean`, `std_dev`, `minimum`, `maximum`.
- `percentiles`: mapping of configured percentile cutoffs to values.
- `sample_size`: number of successful (non-NaN) samples.
- `failed_runs`: count of iterations producing `NaN` for the metric.
- `samples`: optional mapping of metric to raw `numpy.ndarray` of samples when
detailed analysis is required downstream.
## 7. Error Handling
- Invalid configuration or missing context raises `DistributionConfigError`.
- Zero iterations or invalid percentile ranges raise `ValueError`.
- Financial helper exceptions (`ConvergenceError`, `PaybackNotReachedError`)
are captured per iteration and converted to `NaN` samples to preserve
aggregate results while flagging failure counts.
## 8. Usage Guidance
- Scenario services should construct `CashFlowSpec` instances from persisted
financial inputs and optional uncertainty definitions stored alongside the
scenario.
- Reporting routes can request raw samples when producing histogram or violin
plots; otherwise rely on `MetricSummary` statistics for tabular output.
- Visualizations implementing FR-005 should leverage percentile outputs to
render fan charts or confidence intervals.
- When integrating with scheduling workflows, persist the deterministic seed to
ensure repeated runs remain comparable.
## 9. Testing
`tests/test_simulation.py` covers deterministic parity with financial helpers,
seed reproducibility, context parameter sourcing, failure accounting for metrics
that cannot be computed, error handling for misconfigured distributions, and
sample-return functionality. Additional regression cases should accompany new
metrics or distribution families.
## 10. References
- Implementation: `calminer/services/simulation.py`
- Financial helpers: `calminer/services/financial.py`
- Tests: `calminer/tests/test_simulation.py`
- Related specification: `calminer-docs/specifications/financial_metrics.md`

View File

@@ -1,4 +1,141 @@
# Product Price Calculation Specification
## 1. Purpose
Provide a detailed reference for calculating product sale prices across supported metals, including adjustments for ore grade, recovery rate, moisture, and impurities. This document extends the initial variable list with explicit formula definitions, validation rules, and example workflows.
## 2. Scope
- Applies to primary commodity outputs (e.g., copper, gold, lithium) cited in scenario models.
- Supports integration into scenario profitability pipelines and exportable reporting.
- Covers penalties/credits based on quality metrics (water content, impurity assays) and market adjustments.
## 3. Inputs & Parameters
| Symbol | Name | Description | Units / Domain | Validation |
| ------------- | ----------------------- | ---------------------------------------------------- | ---------------------------- | ----------------------- |
| $M$ | Metal | Commodity identifier (e.g., `copper`, `gold`) | Enum `MetalType` | Required |
| $Q_{ore}$ | Ore tonnage processed | Total ore mass entering processing | metric tonnes | $Q_{ore} > 0$ |
| $G$           | Head grade              | Percentage of target metal in ore (mass fraction)    | % (0-100)                    | $0 < G \leq 100$        |
| $R$           | Recovery rate           | Plant recovery efficiency                            | % (0-100)                    | $0 < R \leq 100$        |
| $T$ | Treatment charge | Base processing fee negotiated with smelter/refiner | currency / tonne concentrate | $T \geq 0$ |
| $S$ | Smelting charge | Additional fee tied to concentrate handling | currency / tonne concentrate | $S \geq 0$ |
| $M_{moist}$   | Moisture content        | Percentage water in concentrate                      | % (0-100)                    | $0 \leq M_{moist} < 40$ |
| $M_{imp}^{i}$ | Impurity content        | Mass of impurity _i_ (e.g., As, Pb, Zn) in ppm       | ppm                          | $0 \leq M_{imp}^{i}$    |
| $F_{moist}$ | Moisture penalty factor | Currency impact per excess moisture percentage | currency / % | Optional |
| $F_{imp}^{i}$ | Impurity penalty factor | Currency impact per ppm over threshold | currency / ppm | Optional |
| $Adj_{prem}$ | Premiums/credits | Adders (e.g., gold credits in copper concentrate) | currency | Optional |
| $FX$ | FX rate | Convert pricing currency to scenario currency | currency conversion rate | $FX > 0$ |
## 4. Derived Values
1. **Metal content** (payable basis):
   $$ Q_{metal} = Q_{ore} \times \frac{G}{100} \times \frac{R}{100} $$
2. **Payable mass after deductions** (if the contractual payable percentage is below 100%): introduce payable percentage $K_M$ (default 100%).
   $$ Q_{pay} = Q_{metal} \times \frac{K_M}{100} $$
3. **Gross revenue** in reference currency:
   $$ Rev_{gross}^{ref} = Q_{pay} \times P_{ref} $$
4. **Treatment and smelting charges**:
   $$ Charges = T + S $$
5. **Moisture penalty** (if moisture exceeds threshold $M_{moist}^{thr}$):
   $$ Pen_{moist} = \max(0, M_{moist} - M_{moist}^{thr}) \times F_{moist} $$
6. **Impurity penalty** (sum over impurities with thresholds $M_{imp}^{i,thr}$):
   $$ Pen_{imp} = \sum_i \max(0, M_{imp}^{i} - M_{imp}^{i,thr}) \times F_{imp}^{i} $$
7. **Net revenue before premiums**:
   $$ Rev_{net}^{ref} = Rev_{gross}^{ref} - Charges - Pen_{moist} - Pen_{imp} $$
8. **Apply premiums/credits**:
   $$ Rev_{adj}^{ref} = Rev_{net}^{ref} + Adj_{prem} $$
9. **Convert to scenario currency**:
   $$ Rev_{adj} = Rev_{adj}^{ref} \times FX $$
## 5. Workflow Examples
### 5.1 Copper Concentrate
- Ore feed $Q_{ore} = 100,000$ t, head grade $G = 1.2\%$, recovery $R = 90\%$.
- Reference price $P_{ref} = \$8,500$/t, payable percentage $K_M = 96\%$.
- Treatment and smelting charges $T+S = \$100/t$ concentrate equivalent.
- Moisture threshold 8%, actual 10%, penalty factor $F_{moist} = \$3,000$ per %.
- Arsenic impurity: threshold 0 ppm (premium for As-free), actual 100 ppm, penalty factor $F_{imp}^{As} = \$2$ per ppm.
Calculations:
1. $Q_{metal} = 100,000 \times 0.012 \times 0.9 = 1,080$ t.
2. $Q_{pay} = 1,080 \times 0.96 = 1,036.8$ t.
3. $Rev_{gross}^{ref} = 1,036.8 \times 8,500 = \$8,812,800$.
4. $Pen_{moist} = (10 - 8) \times 3,000 = \$6,000$.
5. $Pen_{imp} = (100 - 0) \times 2 = \$200$.
6. $Rev_{net}^{ref} = 8,812,800 - 100,000 - 6,000 - 200 = \$8,706,600$.
7. Assume premiums $Adj_{prem} = \$50,000$ and $FX = 1.0$; final $Rev_{adj} = \$8,756,600$.
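The steps above can be reproduced with a small script (a hypothetical sketch of the §4 pipeline; the function name and signature are illustrative, not CalMiner's API):

```python
def adjusted_revenue(q_ore, grade_pct, recovery_pct, payable_pct,
                     price_ref, charges, moisture_pct, moisture_thr,
                     f_moist, impurities, premiums=0.0, fx=1.0):
    """Apply the derived-value chain: metal content, payable mass,
    gross revenue, charges, penalties, premiums, FX conversion."""
    q_metal = q_ore * grade_pct / 100 * recovery_pct / 100
    q_pay = q_metal * payable_pct / 100
    rev_gross = q_pay * price_ref
    pen_moist = max(0.0, moisture_pct - moisture_thr) * f_moist
    # impurities: iterable of (actual_ppm, threshold_ppm, penalty_factor)
    pen_imp = sum(max(0.0, a - thr) * f for a, thr, f in impurities)
    return (rev_gross - charges - pen_moist - pen_imp + premiums) * fx

rev = adjusted_revenue(
    q_ore=100_000, grade_pct=1.2, recovery_pct=90, payable_pct=96,
    price_ref=8_500, charges=100_000, moisture_pct=10, moisture_thr=8,
    f_moist=3_000, impurities=[(100, 0, 2)], premiums=50_000,
)
print(rev)  # 8,756,600.0
```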
### 5.2 Gold Doré
- Ore feed 50,000 t, head grade 2.5 g/t (convert to % mass: 0.00025%), recovery 92%.
- Reference price $P_{ref} = \$1,900$/oz, convert to per tonne of metal ($1\ \text{oz} = 0.0311035\ \text{kg}$).
- Treat $Q_{metal}$ in kilograms or troy ounces as implementation convenience.
- No moisture/impurity penalties; add refining charge 1.5% of revenue.
Implementation note: Provide conversion helpers for precious metals (grams per tonne to ounces, etc.).
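A minimal sketch of such conversion helpers; the function names are hypothetical and the constant is the standard troy-ounce mass.

```python
GRAMS_PER_TROY_OZ = 31.1035  # one troy ounce in grams


def recovered_metal_kg(ore_tonnes: float, grade_g_per_t: float,
                       recovery_pct: float) -> float:
    """Recovered metal mass in kilograms from a g/t head grade."""
    # tonnes of ore * grams per tonne = grams of metal; divide by 1,000 for kg
    return ore_tonnes * grade_g_per_t * (recovery_pct / 100) / 1_000


def price_per_tonne(price_per_oz: float) -> float:
    """Convert a reference price quoted per troy ounce to per tonne of metal."""
    # 1 tonne = 1,000,000 g; divide by grams per ounce to get ounces per tonne
    return price_per_oz * 1_000_000 / GRAMS_PER_TROY_OZ
```

For the gold doré example, 50,000 t at 2.5 g/t with 92% recovery yields 115 kg of metal.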
## 6. Validation & Error Handling
- Ensure required inputs are provided; raise descriptive validation errors per scenario when data missing.
- Bound checks for percentage inputs; clamp or reject values outside (0,100].
- Penalty factors default to zero if not configured; log warnings if impurities exceed supported list.
- FX rate defaults to 1.0 when scenario currency matches reference currency; enforce positive values.
## 7. Data Sources & Configuration
- Reference prices sourced from market data integration (e.g., LME, LBMA). Provide placeholder configuration until connectors exist.
- Penalty thresholds and factors configurable per smelter contract; persisted in the `pricing_settings` tables (`pricing_settings`, `pricing_metal_settings`, `pricing_impurity_settings`).
- The default configuration lives in the `pricing_settings` row with slug `default`. `services.bootstrap.bootstrap_pricing_settings` runs during FastAPI startup (and via `scripts/initial_data.py`) to ensure this row exists and mirrors the configured baseline metadata.
- Runtime consumers obtain metadata through the FastAPI dependency `dependencies.get_pricing_metadata`, which loads the persisted defaults with impurity overrides. If the requested slug is missing, the dependency reseeds the table from `Settings.pricing_metadata()` before returning a fresh `PricingMetadata` instance.
- Deployment environments can still influence the initial bootstrap by setting the following environment variables **before the database is seeded**. They are only read when creating or reseeding the `default` record:
| Environment Variable | Default | Bootstrap Usage |
| ------------------------------------------- | ------- | ------------------------------------------------------------------------------- |
| `CALMINER_PRICING_DEFAULT_PAYABLE_PCT` | `100.0` | Initial payable percentage stored in the default pricing settings row. |
| `CALMINER_PRICING_DEFAULT_CURRENCY` | `USD` | Initial currency recorded in the persisted metadata (can be set to `null`). |
| `CALMINER_PRICING_MOISTURE_THRESHOLD_PCT` | `8.0` | Initial moisture threshold applied when seeding the database. |
| `CALMINER_PRICING_MOISTURE_PENALTY_PER_PCT` | `0.0` | Initial moisture penalty factor captured in the persisted configuration record. |
Operators should update the database record (or project-specific overrides) after bootstrap to align with active smelter contracts. Subsequent application restarts reuse the stored values without re-reading environment variables.
## 8. Output Schema
Define structured result for integration:
```json
{
"metal": "copper",
"ore_tonnage": 100000,
"head_grade_pct": 1.2,
"recovery_pct": 90,
"payable_metal_tonnes": 1036.8,
"reference_price": 8500,
"gross_revenue": 8812800,
"moisture_penalty": 6000,
"impurity_penalty": 200,
"treatment_smelt_charges": 100000,
"premiums": 50000,
"net_revenue": 8756600,
"currency": "USD"
}
```
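The schema can be mirrored by a dataclass for type-safe construction and serialization; `PricingResult` is an illustrative name, not the application's actual class.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PricingResult:
    # Fields mirror the JSON output schema above, in the same order.
    metal: str
    ore_tonnage: float
    head_grade_pct: float
    recovery_pct: float
    payable_metal_tonnes: float
    reference_price: float
    gross_revenue: float
    moisture_penalty: float
    impurity_penalty: float
    treatment_smelt_charges: float
    premiums: float
    net_revenue: float
    currency: str


result = PricingResult(
    metal="copper", ore_tonnage=100_000, head_grade_pct=1.2,
    recovery_pct=90, payable_metal_tonnes=1_036.8, reference_price=8_500,
    gross_revenue=8_812_800, moisture_penalty=6_000, impurity_penalty=200,
    treatment_smelt_charges=100_000, premiums=50_000,
    net_revenue=8_756_600, currency="USD",
)
payload = json.dumps(asdict(result))  # matches the documented structure
```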
## 9. Dependencies & Next Steps
- Align with FR-006 (performance monitoring) to collect calculation metrics.
- Coordinate with reporting requirements to expose inputs/outputs in exports.
- Next implementation steps: build the pricing service module, integrate with scenario evaluation, and author unit tests using the example data above.
# Pricing Settings Data Model
## Objective
Persist pricing configuration values that currently live in environment variables so projects can own default payable percentages, currency, and penalty factors without redeploying the application.
## Core Entity: `pricing_settings`
Holds the defaults that are injected into `PricingMetadata` when evaluating scenarios.
| Column | Type | Nullable | Default | Notes |
| -------------------------- | ----------------------- | -------- | -------- | -------------------------------------------------------------------------- |
| `id` | Integer | No | Identity | Primary key |
| `name` | String(128) | No | — | Human readable label displayed in the UI |
| `slug` | String(64) | No | — | Unique code used for programmatic lookup (e.g. `default`, `contract-a`) |
| `description` | Text | Yes | `NULL` | Optional descriptive text |
| `default_currency` | String(3) | Yes | `NULL` | Normalised ISO-4217 code; fallback when scenario currency is absent |
| `default_payable_pct` | Numeric(5,2) | No | `100.00` | Default payable percentage applied when not supplied with an input |
| `moisture_threshold_pct` | Numeric(5,2) | No | `8.00` | Percentage moisture threshold before penalties apply |
| `moisture_penalty_per_pct` | Numeric(14,4) | No | `0.0000` | Currency amount deducted per percentage point above the moisture threshold |
| `metadata` | JSON | Yes | `NULL` | Future extension bucket (e.g. FX assumptions) |
| `created_at` | DateTime(timezone=True) | No | `now()` | Creation timestamp |
| `updated_at` | DateTime(timezone=True) | No | `now()` | Auto updated timestamp |
## Child Entity: `pricing_metal_settings`
Stores overrides that apply to specific commodities (payable percentage or alternate thresholds).
| Column | Type | Nullable | Default | Notes |
| -------------------------- | ----------------------- | -------- | -------- | ------------------------------------------------------------------------------ |
| `id` | Integer | No | Identity | Primary key |
| `pricing_settings_id` | Integer (FK) | No | — | References `pricing_settings.id` with cascade delete |
| `metal_code` | String(32) | No | — | Normalised commodity identifier (e.g. `copper`, `gold`) |
| `payable_pct` | Numeric(5,2) | Yes | `NULL` | Contractual payable percentage for this metal; overrides parent value when set |
| `moisture_threshold_pct` | Numeric(5,2) | Yes | `NULL` | Optional metal specific moisture threshold |
| `moisture_penalty_per_pct` | Numeric(14,4) | Yes | `NULL` | Optional metal specific penalty factor |
| `data` | JSON | Yes | `NULL` | Additional metal settings (credits, payable deductions) |
| `created_at` | DateTime(timezone=True) | No | `now()` | Creation timestamp |
| `updated_at` | DateTime(timezone=True) | No | `now()` | Auto updated timestamp |
`metal_code` should have a unique constraint together with `pricing_settings_id` to prevent duplication.
## Child Entity: `pricing_impurity_settings`
Represents impurity penalty factors and thresholds that are injected into `PricingMetadata.impurity_thresholds` and `PricingMetadata.impurity_penalty_per_ppm`.
| Column | Type | Nullable | Default | Notes |
| --------------------- | ----------------------- | -------- | -------- | ---------------------------------------------------- |
| `id` | Integer | No | Identity | Primary key |
| `pricing_settings_id` | Integer (FK) | No | — | References `pricing_settings.id` with cascade delete |
| `impurity_code` | String(32) | No | — | Identifier such as `As`, `Pb`, `Zn` |
| `threshold_ppm` | Numeric(14,4) | No | `0.0000` | Contractual impurity allowance |
| `penalty_per_ppm` | Numeric(14,4) | No | `0.0000` | Currency penalty applied per ppm above the threshold |
| `notes` | Text | Yes | `NULL` | Optional narrative about the contract rule |
| `created_at` | DateTime(timezone=True) | No | `now()` | Creation timestamp |
| `updated_at` | DateTime(timezone=True) | No | `now()` | Auto updated timestamp |
Add a unique constraint on `(pricing_settings_id, impurity_code)`.
## Mapping to `PricingMetadata`
- `default_currency`, `default_payable_pct`, `moisture_threshold_pct`, and `moisture_penalty_per_pct` map directly to the dataclass fields.
- `pricing_metal_settings` rows provide per-metal overrides. During load, prefer metal-specific values when present, falling back to the parent record. These values hydrate a composed `PricingMetadata` or supplementary structure passed to the evaluator.
- `pricing_impurity_settings` rows populate `PricingMetadata.impurity_thresholds` and `PricingMetadata.impurity_penalty_per_ppm` dictionaries.
## Usage Notes
- Global defaults can be represented by a `pricing_settings` row referenced by new projects. Future migrations will add `projects.pricing_settings_id` to point at the desired configuration.
- The `metadata` JSON column gives room to store additional contract attributes (e.g. minimum lot penalties, premium formulas) without immediate schema churn.
- Numeric precision follows existing financial models (two decimal places for percentages, four for monetary/ppm penalties) to align with current tests.
## Bootstrap & Runtime Loading
- `services.bootstrap.bootstrap_pricing_settings` ensures a baseline record (slug `default`) exists during FastAPI startup and when running `scripts/initial_data.py`. Initial values come from `config.settings.Settings.pricing_metadata()`, allowing operators to shape the first record via environment variables.
- When projects lack an explicit configuration, the bootstrap associates them with the default record through `UnitOfWork.set_project_pricing_settings`, guaranteeing every project has pricing metadata.
- At request time, `dependencies.get_pricing_metadata` loads the persisted defaults using `UnitOfWork.get_pricing_metadata(include_children=True)`. If the slug is missing, it reseeds the default record from the bootstrap metadata before returning a `PricingMetadata` instance.
- Callers therefore observe a consistent fallback chain: project-specific settings → default database record → freshly seeded defaults derived from environment values.