Testing, CI, and Quality Assurance

This chapter centralizes the project's testing strategy, CI configuration, and quality targets.

Overview

CalMiner uses a combination of unit, integration, and end-to-end tests to ensure quality.

Frameworks

  • Backend: pytest for unit and integration tests.
  • Frontend: pytest with Playwright for E2E tests.
  • Database: pytest fixtures with psycopg2 for DB tests.
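
As a sketch of the database-fixture approach from the list above, a reusable pytest fixture wrapping psycopg2 might look like this (the environment variable and fallback DSN are assumptions, not project settings):

    import os

    import psycopg2
    import pytest


    @pytest.fixture
    def db_connection():
        """Yield a psycopg2 connection and roll back after each test."""
        # TEST_DATABASE_URL is a hypothetical variable; substitute the project's real DSN.
        conn = psycopg2.connect(os.environ.get("TEST_DATABASE_URL", "dbname=calminer_test"))
        try:
            yield conn
        finally:
            conn.rollback()  # discard uncommitted changes so tests stay isolated
            conn.close()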

Test Types

  • Unit Tests: Test individual functions/modules.
  • Integration Tests: Test API endpoints and DB interactions.
  • E2E Tests: Playwright for full user flows.

CI/CD

  • Use Gitea Actions for CI/CD; workflows live under .gitea/workflows/.
  • test.yml runs on every push, provisions a temporary Postgres 16 service, waits for readiness, executes the setup script in dry-run and live modes, then fans out into parallel matrix jobs for the unit (pytest tests/unit) and end-to-end (pytest tests/e2e) suites. Playwright browsers are installed only for the E2E job.
  • build-and-push.yml runs only after the Run Tests workflow finishes successfully (triggered via workflow_run on main). Once tests pass, it builds the Docker image with docker/build-push-action@v2, reuses cache-backed layers, and pushes to the Gitea registry.
  • deploy.yml runs only after the build workflow reports success on main. It connects to the target host (via appleboy/ssh-action), pulls the Docker image tagged with the build commit SHA, and restarts the container with that exact image reference.
  • Mandatory secrets: REGISTRY_USERNAME, REGISTRY_PASSWORD, REGISTRY_URL, SSH_HOST, SSH_USERNAME, SSH_PRIVATE_KEY.
  • Run tests on pull requests to shared branches; enforce coverage target ≥80% (pytest-cov).

Running Tests

  • Unit: pytest tests/unit/
  • E2E: pytest tests/e2e/
  • All: pytest

Test Directory Structure

Organize tests under the tests/ directory, mirroring the application structure:

tests/
  unit/
    test_<module>.py
  e2e/
    test_<flow>.py
  fixtures/
    conftest.py

Fixtures and Test Data

  • Define reusable fixtures in tests/fixtures/conftest.py.
  • Use temporary in-memory databases or isolated schemas for DB tests.
  • Load sample data via fixtures for consistent test environments.
  • Leverage the seeded_ui_data fixture in tests/unit/conftest.py to populate scenarios with related cost, maintenance, and simulation records for deterministic UI route checks.
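
For illustration only, a seeding fixture of this general shape might look like the sketch below; the actual seeded_ui_data fixture lives in tests/unit/conftest.py, and the table and column names here are assumptions:

    import pytest


    @pytest.fixture
    def seeded_scenario(db_connection):
        """Insert one scenario plus a related cost row for deterministic assertions."""
        # Table and column names are illustrative assumptions, not the real schema.
        with db_connection.cursor() as cur:
            cur.execute("INSERT INTO scenarios (name) VALUES (%s) RETURNING id", ("Baseline",))
            scenario_id = cur.fetchone()[0]
            cur.execute(
                "INSERT INTO costs (scenario_id, amount) VALUES (%s, %s)",
                (scenario_id, 1000),
            )
        db_connection.commit()
        return scenario_id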

E2E (Playwright) Tests

The E2E test suite, located in tests/e2e/, uses Playwright to simulate user interactions in a live browser environment. These tests are designed to catch issues in the UI, frontend-backend integration, and overall application flow.

Fixtures

  • live_server: A session-scoped fixture that launches the FastAPI application in a separate process, making it accessible to the browser (a condensed sketch follows this list).
  • playwright_instance, browser, page: Standard pytest-playwright fixtures for managing the Playwright instance, browser, and individual pages.
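
A condensed sketch of what a live_server fixture of this shape typically looks like (the application module path, port, and readiness polling are assumptions):

    import subprocess
    import time
    import urllib.error
    import urllib.request

    import pytest


    @pytest.fixture(scope="session")
    def live_server():
        """Run the FastAPI app in a separate process and wait until it answers HTTP."""
        # "calminer.main:app" and port 8000 are assumptions; adjust to the real entry point.
        proc = subprocess.Popen(["uvicorn", "calminer.main:app", "--port", "8000"])
        base_url = "http://127.0.0.1:8000"
        for _ in range(50):  # poll up to ~5 seconds for readiness
            try:
                urllib.request.urlopen(base_url)
                break
            except urllib.error.HTTPError:
                break  # the server answered, even if with an error status
            except urllib.error.URLError:
                time.sleep(0.1)
        yield base_url
        proc.terminate()
        proc.wait()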

Smoke Tests

  • UI Page Loading: test_smoke.py contains a parameterized test that systematically navigates to all UI routes to ensure they load without errors, have the correct title, and display a primary heading (see the sketch after this list).
  • Form Submissions: Each major form in the application has a corresponding test file (e.g., test_scenarios.py, test_costs.py) that verifies the page loads, an item can be created by filling the form, a success message appears, and the UI updates accordingly.
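
A minimal sketch of the parameterized smoke-test pattern, assuming an illustrative route list and title fragment rather than the project's real values:

    import pytest

    # The route list is illustrative; the real test enumerates every UI route.
    UI_ROUTES = ["/", "/scenarios", "/costs"]


    @pytest.mark.parametrize("route", UI_ROUTES)
    def test_ui_route_loads(page, live_server, route):
        """Every UI route should load, carry the expected title, and show a heading."""
        page.goto(f"{live_server}{route}")
        assert "CalMiner" in page.title()  # assumed title fragment
        assert page.locator("h1").first.is_visible()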

Running E2E Tests

To run the Playwright tests:

pytest tests/e2e/

To run in headed mode:

pytest tests/e2e/ --headed

Mocking and Dependency Injection

  • Use unittest.mock to mock external dependencies.
  • Inject dependencies via function parameters or FastAPI's dependency overrides in tests.
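
A minimal sketch combining both techniques, assuming hypothetical import paths and an illustrative endpoint:

    from unittest.mock import MagicMock

    from fastapi.testclient import TestClient

    from calminer.main import app, get_db  # assumed import path and dependency name


    def test_list_scenarios_with_fake_db():
        """Swap the real DB dependency for a mock for the duration of one test."""
        fake_db = MagicMock()
        app.dependency_overrides[get_db] = lambda: fake_db
        try:
            client = TestClient(app)
            response = client.get("/scenarios")  # assumed endpoint
            assert response.status_code == 200
        finally:
            app.dependency_overrides.clear()  # avoid leaking overrides between tests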

Code Coverage

  • Install pytest-cov to generate coverage reports.
  • Run with coverage: pytest --cov --cov-report=term (use --cov-report=html when visualizing hotspots).
  • Aim for 95%+ overall coverage (the CI gate enforces ≥80%; see CI/CD above). Prioritize historically low-coverage modules: services/simulation.py, services/reporting.py, middleware/validation.py, and routes/ui.py.
  • Latest snapshot (2025-10-21): pytest --cov=. --cov-report=term-missing reports 91% overall coverage.

CI Integration

test.yml encapsulates the steps below:

  • Check out the repository and set up Python 3.10.
  • Configure the runner's apt proxy (if available), install project dependencies (requirements + test extras), and download Playwright browsers.
  • Run pytest (extend with --cov flags when enforcing coverage).
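
The readiness wait mentioned earlier (and reused by the composite action proposed below) can be as small as the following sketch; the connection values are assumptions that CI supplies via the service definition in practice:

    import time

    import psycopg2

    # Connection values are illustrative; CI injects the real service credentials.
    DSN = "host=localhost port=5432 dbname=calminer user=postgres password=postgres"

    for attempt in range(30):
        try:
            psycopg2.connect(DSN).close()
            print("Postgres is ready")
            break
        except psycopg2.OperationalError:
            time.sleep(1)
    else:
        raise SystemExit("Postgres did not become ready in time")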

The pip cache step is temporarily disabled in test.yml until the self-hosted cache service is exposed (see docs/ci-cache-troubleshooting.md).

build-and-push.yml adds:

  • Registry login using repository secrets.
  • Docker image build/push with GHA cache storage (cache-from/cache-to set to type=gha).

deploy.yml handles:

  • SSH into the deployment host.
  • Pull the tagged image from the registry.
  • Stop, remove, and relaunch the calminer container exposing port 8000.

When adding new workflows, mirror this structure to ensure secrets, caching, and deployment steps remain aligned with the production environment.

Workflow Optimization Opportunities

test.yml

  • Run the apt-proxy setup once via a composite action or preconfigured runner image if additional matrix jobs are added.
  • Collapse dependency installation into a single pip install -r requirements-test.txt call (includes base requirements) once caching is restored.
  • Investigate caching or pre-baking Playwright browser binaries to eliminate >650 MB cold downloads per run.

build-and-push.yml

  • Skip QEMU setup or explicitly constrain Buildx to linux/amd64 to reduce startup time.
  • Confirm the cache-from / cache-to settings (registry or type=gha) are reusing Docker build layers between runs.

deploy.yml

  • Extract deployment script into a reusable shell script or compose file to minimize inline secrets and ease multi-environment scaling.
  • Add a post-deploy health check (e.g., curl readiness probe) before declaring success.
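
One possible shape for that probe, written here as a standalone Python sketch (the /health endpoint and timings are assumptions; an inline curl loop in the workflow would serve equally well):

    import sys
    import time
    import urllib.error
    import urllib.request

    # /health is an assumed endpoint; substitute the application's real readiness route.
    URL = "http://localhost:8000/health"

    for attempt in range(30):
        try:
            with urllib.request.urlopen(URL, timeout=2) as resp:
                if resp.status == 200:
                    print("deployment healthy")
                    sys.exit(0)
        except urllib.error.URLError:
            pass
        time.sleep(2)

    sys.exit("deployment did not become healthy within 60 seconds")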

Priority Overview

  1. Restore shared caching for Python wheels and Playwright browsers once infrastructure exposes the cache service (highest impact on runtime and bandwidth; requires coordination with CI owners).
  2. Verify Docker layer caching in build-and-push.yml keeps build cycles short (medium effort, immediate benefit to release workflows).
  3. Add post-deploy health verification to deploy.yml (low effort, improves confidence in automation).
  4. Streamline redundant setup steps in test.yml (medium effort once cache strategy is in place; consider composite actions or base image updates).

Setup Consolidation Opportunities

  • Run Tests matrix jobs each execute the apt proxy configuration, pip installs, database wait, and setup scripts. A composite action or shell script wrapper could centralize these routines and parameterize target-specific behavior (unit vs e2e) to avoid copy/paste maintenance as additional jobs (lint, type check) are introduced.
  • Both the test and build workflows perform a checkout step; while unavoidable per workflow, shared git submodules or sparse checkout rules could be encapsulated in a composite action to keep options consistent.
  • The database setup script currently runs twice (dry-run and live) for every matrix leg. Evaluate whether the dry-run remains necessary once migrations stabilize; if retained, consider adding an environment variable toggle to skip redundant seed operations for read-only suites (e.g., lint).
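
One way such a toggle could look inside scripts/setup_database.py, with a hypothetical variable name and an illustrative seed statement:

    import os

    # CALMINER_SKIP_SEED is a hypothetical toggle, not an existing project setting.
    SKIP_SEED = os.environ.get("CALMINER_SKIP_SEED", "0") == "1"


    def maybe_seed(conn):
        """Seed sample data unless the calling workflow opts out (e.g., a lint-only job)."""
        if SKIP_SEED:
            print("CALMINER_SKIP_SEED=1 set; skipping seed operations")
            return
        with conn.cursor() as cur:
            cur.execute("INSERT INTO scenarios (name) VALUES (%s)", ("Sample",))
        conn.commit()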

Proposed Shared Setup Action

  • Location: .gitea/actions/setup-python-env/action.yml (composite action).
  • Inputs:
    • python-version (default 3.10): forwarded to actions/setup-python.
    • install-playwright (default false): when true, run python -m playwright install --with-deps.
    • install-requirements (default requirements.txt requirements-test.txt): space-delimited list of requirement files that the pip install step iterates over.
    • run-db-setup (default true): toggles database wait + setup scripts.
    • db-dry-run (default true): controls whether the dry-run invocation executes.
  • Steps encapsulated:
    1. Set up Python via actions/setup-python@v5 using provided version.
    2. Configure apt proxy via shared shell snippet (with graceful fallback when proxy offline).
    3. Iterate over requirement files and execute pip install -r <file>.
    4. If install-playwright == true, install browsers.
    5. If run-db-setup == true, run the wait-for-Postgres Python snippet and call scripts/setup_database.py, honoring the db-dry-run toggle.
  • Usage sketch (in test.yml):
      - name: Prepare Python environment
        uses: ./.gitea/actions/setup-python-env
        with:
          install-playwright: ${{ matrix.target == 'e2e' }}
          db-dry-run: true
  • Benefits: centralizes proxy logic and dependency installs, reduces duplication across matrix jobs, and keeps future lint/type-check jobs lightweight by disabling database setup.
  • Implementation status: action available at .gitea/actions/setup-python-env and consumed by test.yml; extend to additional workflows as they adopt the shared routine.
  • Obsolete steps removed: individual apt proxy, dependency install, Playwright, and database setup commands pruned from test.yml once the composite action was integrated.

CI Owner Coordination Notes

Key Findings

  • Self-hosted runner: ASUS System Product Name chassis with AMD Ryzen 7 7700X (8 physical cores / 16 threads) and 63.2 GB usable RAM; act_runner configuration not overridden, so only one workflow job runs concurrently today.
  • Unit test matrix job: completes 117 pytest cases in roughly 4.1 seconds after Postgres spins up; Docker services consume ~150 MB for postgres:16-alpine, with minimal sustained CPU load once tests begin.
  • End-to-end matrix job: pytest tests/e2e averages 21–22 seconds of execution, but a cold run downloads ~179 MB of apt packages plus ~470 MB of Playwright browser bundles (Chromium, Firefox, WebKit, FFmpeg), exceeding 650 MB of network transfer and adding several gigabytes of disk writes if caches are absent.
  • Both jobs reuse existing Python package caches when available; absent a shared cache service, repeated Playwright installs remain the dominant cost driver for cold executions.

Open Questions

  • Can we raise the runner concurrency above the default single job, or provision an additional runner, so the test matrix can execute without serializing queued workflows?
  • Is there a central cache or artifact service available for Python wheels and Playwright browser bundles to avoid ~650 MB downloads on cold starts?
  • Are we permitted to bake Playwright browsers into the base runner image, or should we pursue a shared cache/proxy solution instead?

Outreach Draft

Subject: CalMiner CI parallelization support

Hi <CI Owner>,

We recently updated the CalMiner test workflow to fan out unit and Playwright E2E suites in parallel. While validating the change, we gathered the following:

- Runner host: ASUS System Product Name with AMD Ryzen 7 7700X (8 cores / 16 threads), ~63 GB RAM, default `act_runner` concurrency (1 job at a time).
- Unit job finishes in ~4.1 s once Postgres is ready; light CPU and network usage.
- E2E job finishes in ~22 s, but a cold run pulls ~179 MB of apt packages plus ~470 MB of Playwright browser payloads (>650 MB download, several GB disk writes) because we do not have a shared cache yet.

To move forward, could you help with the following?

1. Confirm whether we can raise the runner concurrency limit or provision an additional runner so parallel jobs do not queue behind one another.
2. Let us know if a central cache (Artifactory, Nexus, etc.) is available for Python wheels and Playwright browser bundles, or if we should consider baking the browsers into the runner image instead.
3. Share any guidance on preferred caching or proxy solutions for large binary installs on self-hosted runners.

Once we have clarity, we can finalize the parallel rollout and update the documentation accordingly.

Thanks,
<Your Name>