feat: add continuous integration and deployment documentation, including CI stages, local testing, and Kubernetes deployment guidelines
Once you are satisfied with your changes, submit a pull request to the main repository.

## Continuous Integration

Calminer uses Gitea Actions for automated testing, linting, and deployment. The CI pipeline is defined in `.gitea/workflows/cicache.yml` and runs on pushes and pull requests to the `main` and `develop` branches.

### Pipeline Stages

1. **Lint**: Checks code style with Ruff and Black.
2. **Test**: Runs pytest with coverage enforcement (80% threshold), using a PostgreSQL service. Uploads `coverage.xml` and `pytest-report.xml` artifacts.
3. **Build**: Builds the Docker image and pushes it to the registry only on `main` branch pushes (not PRs), if registry secrets are configured.

### Workflow Behavior

- Triggers on push/PR to `main` or `develop`.
- Linting must pass before tests run.
- Tests must pass before the build runs.
- Coverage below 80% fails the test stage.
- Artifacts are available for PR inspection.
- Docker push occurs only for `main` branch commits with valid registry credentials.

### Local Testing

To replicate CI locally:

```bash
# Install test deps
pip install -r requirements-test.txt

# Run linting
ruff check .
black --check .

# Run tests with coverage
pytest --cov=. --cov-report=term-missing --cov-fail-under=80

# Build image
docker build -t calminer .
```

Thank you for your interest in contributing to Calminer!

cd calminer
```

2. **Environment Configuration**

Copy the appropriate environment file for your deployment:

```bash
# For development
cp .env.development .env

# For staging
cp .env.staging .env

# For production
cp .env.production .env

# Then edit .env with your actual database credentials
```

3. **Build and Start the Docker Containers**

Run the following command to build and start the Docker containers:

```bash
# For development (includes live reload and source mounting)
docker compose up --build

# For staging
docker compose -f docker-compose.yml -f docker-compose.staging.yml up --build

# For production
docker compose -f docker-compose.yml -f docker-compose.prod.yml up --build
```

This command will build the Docker images and start the containers as defined in the `docker-compose.yml` file.

4. **Access the Application**

Once the containers are up and running, you can access the Calminer application by navigating to `http://localhost:8003` in your web browser.
If you are running the application on a remote server, replace `localhost` with the server's IP address or domain name.

5. **Database Initialization**

The application container executes `/app/scripts/docker-entrypoint.sh` before launching the API. This entrypoint runs `python -m scripts.run_migrations`, which applies all Alembic migrations and keeps the schema current on every startup. No additional action is required when using Docker Compose, but you can review the logs to confirm that the migrations completed successfully.

The script is idempotent; it will only apply pending migrations.

6. **Seed Default Accounts and Roles**

After the schema is in place, run the initial data seeding utility so the default roles and administrator account exist:

The script reads the standard database environment variables (see below) and supports the following overrides:

- `CALMINER_SEED_ADMIN_EMAIL` (default `admin@calminer.local` for dev, `admin@calminer.com` for prod)
- `CALMINER_SEED_ADMIN_USERNAME` (default `admin`)
- `CALMINER_SEED_ADMIN_PASSWORD` (default `ChangeMe123!` — change in production)
- `CALMINER_SEED_ADMIN_ROLES` (comma-separated list; always includes `admin`)

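For example, the overrides can be exported before invoking the seeding utility. The values below are hypothetical placeholders, not project defaults:

```bash
# Hypothetical overrides for the seeding utility (variable names from the list above)
export CALMINER_SEED_ADMIN_EMAIL='ops@example.com'
export CALMINER_SEED_ADMIN_USERNAME='opsadmin'
export CALMINER_SEED_ADMIN_ROLES='auditor'   # 'admin' is always added regardless
```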
### Environment Variables

The application uses environment variables to configure various settings. You can set these variables in a `.env` file in the root directory of the project. Refer to the provided `.env.*` files for examples and default values.

Key variables relevant to import/export workflows:

| Variable                      | Development | Staging | Production | Description                                                                     |
| ----------------------------- | ----------- | ------- | ---------- | ------------------------------------------------------------------------------- |
| `CALMINER_EXPORT_MAX_ROWS`    | `1000`      | `50000` | `100000`   | Optional safety guard to limit the number of rows exported in a single request. |
| `CALMINER_EXPORT_METADATA`    | `true`      | `true`  | `true`     | Controls whether metadata sheets are generated by default during Excel exports. |
| `CALMINER_IMPORT_STAGING_TTL` | `300`       | `600`   | `3600`     | Controls how long staged import tokens remain valid before expiration.          |
| `CALMINER_IMPORT_MAX_ROWS`    | `10000`     | `50000` | `100000`   | Optional guard to prevent excessively large import files.                       |

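As one concrete sketch, a staging `.env` could pin these limits explicitly (values copied from the table above; adjust to your workload):

```bash
# Staging import/export limits (values from the table above)
CALMINER_EXPORT_MAX_ROWS=50000
CALMINER_EXPORT_METADATA=true
CALMINER_IMPORT_STAGING_TTL=600
CALMINER_IMPORT_MAX_ROWS=50000
```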
### Docker Environment Parity

The Docker Compose configurations ensure environment parity across development, staging, and production:

- **Development**: Uses `docker-compose.override.yml` with live code reloading, debug logging, and relaxed resource limits.
- **Staging**: Uses `docker-compose.staging.yml` with health checks, moderate resource limits, and staging-specific configuration.
- **Production**: Uses `docker-compose.prod.yml` with strict resource limits, production logging, and a required external database configuration.

All environments use the same base `docker-compose.yml` and share common environment variables for consistency.

### Volumes

Ensure that these volumes are properly configured to avoid data loss during container restarts or removals.

## Kubernetes Deployment

For production deployments, Calminer can be deployed on a Kubernetes cluster using the provided manifests in the `k8s/` directory.

### K8s Prerequisites

- A Kubernetes cluster (e.g., minikube for local testing, or a cloud provider such as GKE or EKS)
- `kubectl` configured to access the cluster
- Helm (optional, for advanced deployments)

### K8s Deployment Steps

1. **Clone the Repository and Build Image**

```bash
git clone https://git.allucanget.biz/allucanget/calminer.git
cd calminer
docker build -t registry.example.com/calminer:latest .
docker push registry.example.com/calminer:latest
```

2. **Update Manifests**

Edit the manifests in `k8s/` to match your environment:

- Update the image registry in `deployment.yaml`
- Update the host in `ingress.yaml`
- Update the secrets in `secret.yaml` with base64-encoded values

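Values in `secret.yaml` must be base64-encoded. A quick way to produce them is `base64` from coreutils; note `echo -n` so the trailing newline is not encoded into the secret:

```bash
# Encode a secret value for use in k8s/secret.yaml
echo -n 'admin' | base64            # → YWRtaW4=

# Decode to double-check what you committed
echo 'YWRtaW4=' | base64 --decode   # → admin
```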
3. **Deploy to Kubernetes**

```bash
kubectl apply -f k8s/
```

4. **Verify Deployment**

```bash
kubectl get pods
kubectl get services
kubectl get ingress
```

5. **Access the Application**

The application will be available at the ingress host (e.g., `https://calminer.example.com`).

### Environment Parity

The Kubernetes deployment uses the same environment variables as Docker Compose, ensuring consistency across environments. Secrets are managed via Kubernetes Secrets, and configuration via ConfigMaps.

### Scaling

The deployment is configured with 3 replicas for high availability. You can scale as needed:

```bash
kubectl scale deployment calminer-app --replicas=5
```

### Monitoring

Ensure your monitoring stack (e.g., Prometheus) scrapes the `/metrics` endpoint from the service.

## CI/CD Pipeline

Calminer uses Gitea Actions for continuous integration and deployment. The CI/CD pipeline is defined in `.gitea/workflows/cicache.yml` and includes the following stages:

### CI Stages

1. **Lint**: Runs Ruff for Python linting, Black for code formatting, and Bandit for security scanning.
2. **Test**: Executes the full pytest suite with coverage reporting (80% minimum), using a PostgreSQL service container.
3. **Build**: Builds Docker images using Buildx and pushes them to the Gitea registry on the `main` branch.

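The stage ordering can be sketched as a Gitea Actions workflow. This is an illustrative outline only, not the contents of `.gitea/workflows/cicache.yml`; job names, action versions, and the PostgreSQL image are assumptions:

```yaml
# Illustrative sketch; see .gitea/workflows/cicache.yml for the real pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ruff black bandit
      - run: ruff check . && black --check . && bandit -r .
  test:
    needs: lint        # linting must pass before tests run
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements-test.txt
      - run: pytest --cov=. --cov-fail-under=80
  build:
    needs: test        # tests must pass before the build runs
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t calminer .
```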
### CD Stages

1. **Deploy**: Deploys to the staging or production Kubernetes cluster when the commit message contains `[deploy staging]` or `[deploy production]`.

### Required Secrets

The following secrets must be configured in your Gitea repository:

- `REGISTRY_URL`: Gitea registry URL
- `REGISTRY_USERNAME`: Registry username
- `REGISTRY_PASSWORD`: Registry password
- `STAGING_KUBE_CONFIG`: Base64-encoded kubeconfig for the staging cluster
- `PROD_KUBE_CONFIG`: Base64-encoded kubeconfig for the production cluster

### Deployment Triggers

- **Automatic**: Images are built and pushed on every push to the `main` or `develop` branch.
- **Staging Deployment**: Include `[deploy staging]` in your commit message to trigger a staging deployment.
- **Production Deployment**: Include `[deploy production]` in your commit message to trigger a production deployment.

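The trigger is a plain substring match on the commit message; a shell sketch of the idea (the real check lives in `.gitea/workflows/cicache.yml` and may differ):

```bash
# Sketch: detect a deployment marker in a commit message
msg="fix: raise staging import limits [deploy staging]"

case "$msg" in
  *"[deploy production]"*) echo "deploy: production" ;;
  *"[deploy staging]"*)    echo "deploy: staging" ;;
  *)                       echo "deploy: none" ;;
esac
# → deploy: staging
```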
### Monitoring CI/CD

- View pipeline status in the Gitea Actions tab.
- Test artifacts (coverage and pytest reports) are uploaded for each run.
- Docker build logs are available for troubleshooting build failures.
- Deployment runs publish Kubernetes rollout diagnostics under the `deployment-logs` artifact (`/logs/deployment/`), which includes pod listings, deployment manifests, and recent container logs.

## Troubleshooting

`specifications/kpis.md` (new file):

# Key Performance Indicators (KPIs) for Calminer

## Overview

This document defines the key performance indicators (KPIs) that Calminer tracks to ensure system scalability, reliability, and optimal user experience, as specified in FR-006.

## KPI Categories

### Application Performance Metrics

#### Response Time

- **Metric**: HTTP request duration (95th percentile)
- **Target**: < 500ms for API endpoints, < 2s for UI pages
- **Collection**: Automatic via MetricsMiddleware
- **Alert Threshold**: > 1s (API), > 5s (UI)

#### Error Rate

- **Metric**: HTTP error responses (4xx/5xx) as a percentage of total requests
- **Target**: < 1% overall, < 0.1% for 5xx errors
- **Collection**: Automatic via MetricsMiddleware
- **Alert Threshold**: > 5% (4xx), > 0.5% (5xx)

#### Throughput

- **Metric**: Requests per second (RPS)
- **Target**: > 100 RPS sustained
- **Collection**: Automatic via MetricsMiddleware
- **Alert Threshold**: < 10 RPS sustained

### Data Processing Metrics

#### Import/Export Duration

- **Metric**: Time to complete import/export operations
- **Target**: < 30s for small datasets (< 10k rows), < 5min for large datasets
- **Collection**: Via `monitoring.metrics.observe_import`/`observe_export`
- **Alert Threshold**: > 10min for any operation

#### Data Volume

- **Metric**: Rows processed per operation
- **Target**: Support up to 100k rows per import/export
- **Collection**: Via import/export service instrumentation
- **Alert Threshold**: Operations failing on > 10k rows

### System Resource Metrics

#### Database Connections

- **Metric**: Active database connections
- **Target**: < 80% of max connections
- **Collection**: Prometheus gauge (`DB_CONNECTIONS`)
- **Alert Threshold**: > 90% of max connections

#### Memory Usage

- **Metric**: Application memory consumption
- **Target**: < 512MB per worker
- **Collection**: Container metrics (Kubernetes/Docker)
- **Alert Threshold**: > 1GB per worker

#### CPU Usage

- **Metric**: Application CPU utilization
- **Target**: < 70% sustained
- **Collection**: Container metrics (Kubernetes/Docker)
- **Alert Threshold**: > 85% sustained

### User Experience Metrics

#### Concurrent Users

- **Metric**: Active user sessions
- **Target**: Support 100+ concurrent users
- **Collection**: Session tracking via AuthSessionMiddleware
- **Alert Threshold**: > 200 concurrent users (capacity planning)

#### Session Duration

- **Metric**: Average user session length
- **Target**: 10-30 minutes typical
- **Collection**: Session tracking
- **Alert Threshold**: < 1 minute average (usability issue)

### Business Metrics

#### Project/Scenario Operations

- **Metric**: Projects/scenarios created per hour
- **Target**: 50+ operations per hour
- **Collection**: Repository operation logging
- **Alert Threshold**: < 5 operations per hour (adoption issue)

#### Simulation Performance

- **Metric**: Monte Carlo simulation completion time
- **Target**: < 10s for typical scenarios
- **Collection**: Simulation service instrumentation
- **Alert Threshold**: > 60s for any simulation

## Monitoring Implementation

### Data Collection

- **HTTP Metrics**: Automatic collection via MetricsMiddleware
- **Business Metrics**: Service-level instrumentation
- **System Metrics**: Container orchestration (Kubernetes)
- **Storage**: `performance_metrics` table + Prometheus

### Alerting

- **Response Time**: P95 > 1s for 5 minutes
- **Error Rate**: > 5% for 10 minutes
- **Resource Usage**: > 90% for 15 minutes
- **Data Processing**: > 3 failures in 1 hour

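Assuming the middleware exposes a standard Prometheus histogram and counter (the metric names below are assumptions, not verified against the codebase), the first two alerts could be expressed as rules like:

```yaml
# Illustrative Prometheus alerting rules; metric names are assumed
groups:
  - name: calminer
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
        for: 5m
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"4..|5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
```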
### Dashboards

- **Real-time**: Current performance via the `/metrics` endpoint
- **Historical**: Aggregated metrics via the `/performance` endpoint
- **Health**: Detailed health checks via the `/health` endpoint

## Scaling Guidelines

### Horizontal Scaling Triggers

- CPU > 70% sustained
- Memory > 80% sustained
- RPS > 80% of target

### Vertical Scaling Triggers

- Memory > 90% sustained
- Database connections > 80%

### Auto-scaling Configuration

- Min replicas: 2
- Max replicas: 10
- Scale up: CPU > 70% for 5 minutes
- Scale down: CPU < 30% for 10 minutes

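These values map directly onto a Kubernetes HorizontalPodAutoscaler. A minimal sketch, assuming the deployment name `calminer-app` used in the deployment guide, with the scale-down stabilization window approximating the 10-minute rule:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: calminer-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: calminer-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # scale up above 70% CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # roughly the 10-minute scale-down rule
```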