# Architecture Documentation
## Overview
CalMiner is a FastAPI application that collects mining project inputs, persists scenario-specific records, and surfaces aggregated insights. The platform targets Monte Carlo-driven planning, with deterministic CRUD features in place and simulation logic staged for future work.
Frontend components are server-rendered Jinja2 templates, with Chart.js powering the dashboard visualization.
The backend leverages SQLAlchemy for ORM mapping to a PostgreSQL database.
## System Components
- **FastAPI backend** (`main.py`, `routes/`): hosts REST endpoints for scenarios, parameters, costs, consumption, production, equipment, maintenance, simulations, and reporting. Each router encapsulates request/response schemas and DB access patterns.
- **Service layer** (`services/`): houses business logic. `services/reporting.py` produces statistical summaries, while `services/simulation.py` provides the Monte Carlo integration point.
- **Persistence** (`models/`, `config/database.py`): SQLAlchemy models map to PostgreSQL tables in schema `bricsium_platform`. Relationships connect scenarios to derived domain entities.
- **Presentation** (`templates/`, `components/`): server-rendered views support data entry (scenario and parameter forms) and the dashboard visualization powered by Chart.js.
- **Middleware** (`middleware/validation.py`): applies JSON validation before requests reach routers.
- **Testing** (`tests/unit/`): pytest suite covering route and service behavior.
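A minimal sketch of how these components might be wired together in `main.py`; the router module names and the middleware class name are assumptions based on the layout above, not verbatim from the codebase.

```python
# main.py (illustrative sketch; router module names and the middleware
# class name are assumptions, not verbatim from the codebase)
from fastapi import FastAPI

from middleware.validation import JSONValidationMiddleware  # assumed class name
from routes import parameters, reporting, scenarios, simulations

app = FastAPI(title="CalMiner")

# Validation middleware inspects JSON payloads before any router runs.
app.add_middleware(JSONValidationMiddleware)

# Each domain router encapsulates its own schemas and DB access patterns.
app.include_router(scenarios.router, prefix="/api/scenarios")
app.include_router(parameters.router, prefix="/api/parameters")
app.include_router(simulations.router, prefix="/api/simulations")
app.include_router(reporting.router, prefix="/api/reporting")
```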
## Runtime Flow
1. Users navigate to form templates or API clients to manage scenarios, parameters, and operational data.
2. FastAPI routers validate payloads with Pydantic models, then delegate to SQLAlchemy sessions for persistence.
3. Simulation runs (placeholder `services/simulation.py`) will consume stored parameters to emit iteration results via `/api/simulations/run`.
4. Reporting requests POST simulation outputs to `/api/reporting/summary`; the reporting service calculates aggregates (count, min/max, mean, median, percentiles, standard deviation, variance, and tail-risk metrics at the 95% confidence level). A sample request appears after this list.
5. `templates/Dashboard.html` fetches summaries, renders metric cards, and plots distribution charts with Chart.js for stakeholder review.
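To make step 4 concrete, a hypothetical client call to the summary endpoint might look like the following; the response keys shown are assumptions consistent with the contract described later in this document.

```python
import httpx

# Simulation outputs are posted as an array of {"result": float} objects.
payload = [{"result": 101.4}, {"result": 98.2}, {"result": 110.7}]

resp = httpx.post("http://localhost:8000/api/reporting/summary", json=payload)
resp.raise_for_status()

summary = resp.json()
print(summary["mean"], summary["percentile_95"])  # assumed response keys
```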
### Dashboard Flow Review — 2025-10-20
- The dashboard template depends on a future-facing HTML endpoint (e.g., `/dashboard`) that the current `routes/ui.py` router does not expose; wiring an explicit route is required before the page is reachable from the FastAPI app (a route sketch follows this list).
- Client-side logic calls `/api/reporting/summary` with raw simulation outputs and expects `result` fields, so any upstream changes to the reporting contract must maintain this schema.
- Initialization always loads the bundled sample data first, which is useful for demos but masks server errors; consider adding a failure banner when `/api/reporting/summary` is unavailable.
- No persistent storage backs the dashboard yet; users must paste or load JSON manually, aligning with the current MVP scope but highlighting an integration gap with the simulation results table.
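A minimal sketch of the missing route; the template filename comes from this document, while the router and template setup are assumptions about `routes/ui.py`.

```python
# routes/ui.py (illustrative; router and template setup are assumptions)
from fastapi import APIRouter, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates

router = APIRouter()
templates = Jinja2Templates(directory="templates")

@router.get("/dashboard", response_class=HTMLResponse)
async def dashboard(request: Request):
    # Serve the server-rendered dashboard template described above.
    return templates.TemplateResponse("Dashboard.html", {"request": request})
```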
### Reporting Pipeline and UI Integration
1. **Data Sources**
- Scenario-linked calculations (costs, consumption, production) produce raw figures stored in dedicated tables (`capex`, `opex`, `consumption`, `production_output`).
- Monte Carlo simulations (currently transient) generate arrays of `{ "result": float }` objects that the dashboard or downstream tooling passes directly to reporting endpoints.
2. **API Contract**
- `POST /api/reporting/summary` accepts a JSON array of result objects and validates shape through `_validate_payload` in `routes/reporting.py`.
- On success it returns a structured payload (`ReportSummary`) containing count, mean, median, min/max, standard deviation, and percentile values, all as floats.
3. **Service Layer**
- `services/reporting.generate_report` converts the sanitized payload into descriptive statistics using Python's standard library (`statistics` module) to avoid external dependencies.
- The service remains stateless; no database read/write occurs, which keeps summary calculations deterministic and idempotent.
- Extended KPIs (surfaced in the API and dashboard; a computation sketch appears after this list):
- `variance`: population variance computed as the square of the population standard deviation.
- `percentile_5` and `percentile_95`: lower and upper tail interpolated percentiles for sensitivity bounds.
- `value_at_risk_95`: the 5th-percentile threshold; 95% of simulated outcomes fall at or above this value.
- `expected_shortfall_95`: mean of all outcomes at or below the `value_at_risk_95`, highlighting tail exposure.
4. **UI Consumption**
- `templates/Dashboard.html` posts the user-provided dataset to the summary endpoint, renders metric cards for each field, and charts the distribution using Chart.js.
- `SUMMARY_FIELDS` now includes variance, 5th/10th/90th/95th percentiles, and tail-risk metrics (VaR/Expected Shortfall at 95%); tooltip annotations surface the tail metrics alongside the percentile line chart.
- Error handling surfaces HTTP failures inline so users can address malformed JSON or backend availability issues without leaving the page.
5. **Future Integration Points**
- Once `/api/simulations/run` persists to `simulation_result`, the dashboard can fetch precalculated runs per scenario, removing the manual JSON step.
- Additional reporting endpoints (e.g., scenario comparisons) can reuse the same service layer, ensuring consistency across UI and API consumers.
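A minimal sketch of how the extended KPIs could be derived with the standard library alone, assuming at least two data points; the function name and the `inclusive` interpolation method are assumptions, not taken from `services/reporting.py`.

```python
# Illustrative reimplementation of the extended KPIs; the real
# services/reporting.py may differ in naming and interpolation method.
import statistics

def tail_metrics(results: list[float]) -> dict[str, float]:
    stdev = statistics.pstdev(results)  # population standard deviation
    # statistics.quantiles with n=100 yields the 99 interior percentile
    # cut points; index 4 is the 5th percentile, index 94 the 95th.
    cuts = statistics.quantiles(results, n=100, method="inclusive")
    p5, p95 = cuts[4], cuts[94]
    # VaR(95%) is the 5th-percentile threshold; expected shortfall averages
    # every outcome at or below that threshold.
    tail = [r for r in results if r <= p5]
    return {
        "variance": stdev ** 2,
        "percentile_5": p5,
        "percentile_95": p95,
        "value_at_risk_95": p5,
        "expected_shortfall_95": statistics.fmean(tail),
    }
```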
## Data Model Highlights
- `scenario`: central entity describing a mining scenario; owns relationships to cost, consumption, production, equipment, and maintenance tables.
- `capex`, `opex`: monetary tracking linked to scenarios.
- `consumption`: resource usage entries parameterized by scenario and description.
- `parameter`: scenario inputs with base `value` and optional distribution linkage via `distribution_id`, `distribution_type`, and JSON `distribution_parameters` to support simulation sampling.
- `production_output`: production metrics per scenario.
- `equipment` and `maintenance`: equipment inventory and maintenance events with dates/costs.
- `simulation_result`: staging table for future Monte Carlo outputs (not yet populated by `run_simulation`).
Foreign keys enforce referential integrity between domain tables and their scenarios, enabling per-scenario analytics; the model sketch below illustrates the pattern.
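A hedged sketch of the scenario-centric relationship pattern in SQLAlchemy's declarative style; column names beyond those mentioned in this document are assumptions.

```python
# models sketch (illustrative; real models may declare more columns)
from sqlalchemy import Column, ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Scenario(Base):
    __tablename__ = "scenario"
    __table_args__ = {"schema": "bricsium_platform"}

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    # One-to-many link to a derived domain entity.
    capex_items = relationship("Capex", back_populates="scenario")

class Capex(Base):
    __tablename__ = "capex"
    __table_args__ = {"schema": "bricsium_platform"}

    id = Column(Integer, primary_key=True)
    scenario_id = Column(
        Integer, ForeignKey("bricsium_platform.scenario.id"), nullable=False
    )
    amount = Column(Numeric, nullable=False)

    scenario = relationship("Scenario", back_populates="capex_items")
```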
## Integrations and Future Work
- **Monte Carlo engine**: `services/simulation.py` will incorporate stochastic sampling (e.g., NumPy, SciPy) to populate `simulation_result` and feed reporting.
- **Persistence of results**: `/api/simulations/run` currently returns in-memory results; the next iteration should persist to `simulation_result` and reference scenarios (sketched after this list).
- **Authentication**: not yet implemented; all endpoints are open.
- **Deployment**: documentation focuses on local development; containerization and CI/CD pipelines remain to be defined.
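One possible shape for the persistence step named above; the model columns and session handling are assumptions, not current code.

```python
# Hypothetical persistence step for /api/simulations/run; the model,
# column names, and session handling are assumptions, not current code.
from sqlalchemy import Column, Float, ForeignKey, Integer
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SimulationResult(Base):
    __tablename__ = "simulation_result"
    __table_args__ = {"schema": "bricsium_platform"}

    id = Column(Integer, primary_key=True)
    scenario_id = Column(Integer, ForeignKey("bricsium_platform.scenario.id"))
    result = Column(Float, nullable=False)

def persist_results(db: Session, scenario_id: int, results: list[float]) -> None:
    # One row per Monte Carlo iteration, keyed to the owning scenario.
    db.add_all(
        SimulationResult(scenario_id=scenario_id, result=value) for value in results
    )
    db.commit()
```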
For extended diagrams and setup instructions, see:
- [docs/development_setup.md](development_setup.md) — environment provisioning and tooling.
- [docs/testing.md](testing.md) — pytest workflow and coverage expectations.
- [docs/mvp.md](mvp.md) — roadmap and milestone scope.
- [docs/implementation_plan.md](implementation_plan.md) — feature breakdown aligned with the TODO tracker.
- [docs/architecture_overview.md](architecture_overview.md) — supplementary module map and request flow diagram.