Enhance documentation for AI generation: update image and video generation scenarios, integrate OpenRouter API details, and clarify submit-and-poll patterns for video generation.

Co-authored-by: Copilot <copilot@github.com>
2026-04-27 19:33:43 +02:00
parent 58c2cb4490
commit 1c96ae17fc
4 changed files with 119 additions and 24 deletions
@@ -75,14 +75,19 @@ Operational endpoints for application management.
 Model listing and multi-modal generation via openrouter.ai.
-| Method | Path                         | Auth required | Description                                            |
+| Method | Path                         | Auth required | Description                                                                                                         |
-| ------ | ---------------------------- | ------------- | ------------------------------------------------------ |
+| ------ | ---------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------- |
-| GET    | `/ai/models`                 | ✓             | List available OpenRouter models                       |
+| GET    | `/ai/models`                 | ✓             | List available OpenRouter models                                                                                    |
-| POST   | `/ai/chat`                   | ✓             | Multi-turn chat completion                             |
+| POST   | `/ai/chat`                   | ✓             | Multi-turn chat completion                                                                                          |
-| POST   | `/generate/text`             | ✓             | Single-prompt text generation (optional system prompt) |
+| POST   | `/generate/text`             | ✓             | Single-prompt text generation (optional system prompt)                                                              |
-| POST   | `/generate/image`            | ✓             | Text-to-image generation                               |
+| POST   | `/generate/image`            | ✓             | Text-to-image (DALL-E via `/images/generations` or FLUX/GPT-5 Image Mini via `/chat/completions` with `modalities`) |
-| POST   | `/generate/video`            | ✓             | Text-to-video generation                               |
+| POST   | `/generate/video`            | ✓             | Text-to-video (Sora 2 Pro, Veo 3.1 Fast) — returns `polling_url`                                                    |
-| POST   | `/generate/video/from-image` | ✓             | Image-to-video generation                              |
+| POST   | `/generate/video/from-image` | ✓             | Image-to-video — returns `polling_url`                                                                              |
 | GET    | `/generate/video/status`     | ✓             | Poll video generation status via `polling_url`                                                                      |
 **Video generation flow:** The `/generate/video` and `/generate/video/from-image` endpoints submit a job to OpenRouter's `/api/v1/videos` endpoint and return immediately with `status: "queued"` and a `polling_url`. Clients poll `/generate/video/status?polling_url=...` every 5 seconds until `status` is `"completed"` (returns `unsigned_urls`) or `"failed"`.
 **Image generation routing:** The router auto-detects the model type — models containing `"flux"` or `"gpt-5-image-mini"` are routed to `/chat/completions` with `modalities: ["image"]`, while others (e.g. DALL-E 3) use the legacy `/images/generations` endpoint.
 ### White Box DB Service (`db.py`)
@@ -28,30 +28,42 @@ Describes concrete behavior and interactions of the system's building blocks in
 ## Scenario 3: Image Generation
-1. User submits image generation form
+1. User submits image generation form with prompt, model, size, aspect ratio, and resolution
-2. Flask POSTs to `POST /generate/image`
+2. Flask POSTs to `POST /generate/image` with JWT header
-3. AI Service calls openrouter.ai image model
+3. Router auto-detects model type:
-4. Image URL returned to Flask
+   - **FLUX / GPT-5 Image Mini**: calls `/chat/completions` with `modalities: ["image"]` and `image_config`
-5. Flask renders page with generated image
+   - **DALL-E 3**: calls `/images/generations` with `size` and `n`
 4. Image URL (base64 data URL or hosted URL) returned to Flask
 5. Flask renders page with generated image(s)
 ## Scenario 3a: Image Generation with Aspect Ratio & Resolution
 1. User selects aspect ratio (e.g. `16:9`) and resolution (`2K`) on the image generation form
 2. Flask POSTs `aspect_ratio` and `image_size` to `POST /generate/image`
 3. Backend passes these as `image_config` to the chat completions endpoint (for FLUX/GPT-5 Image Mini)
 4. Generated image respects the requested aspect ratio and resolution
 ## Scenario 4: Video Generation (Text-to-Video)
-1. User submits video generation form with prompt and model selection
+1. User submits video generation form with prompt, model, aspect ratio, resolution, and duration
 2. Flask POSTs to `POST /generate/video` with JWT header
 3. Auth Service validates JWT
-4. AI Service calls OpenRouter `/video/generations`
+4. Backend calls OpenRouter `POST /api/v1/videos` with model, prompt, aspect_ratio, resolution, duration_seconds
-5. OpenRouter returns a job response (`status: "queued"` or `"completed"`)
+5. OpenRouter returns `{"id": "...", "polling_url": "..."}` with `status: "queued"`
-6. FastAPI returns `VideoResponse` to Flask
+6. FastAPI returns `VideoResponse` with `polling_url` to Flask
-7. Flask renders result page; if status is `queued`, the UI may poll or notify asynchronously
+7. Flask renders result page with polling UI
 8. Frontend JavaScript polls `GET /generate/video/status?polling_url=...` every 5 seconds
 9. When `status` becomes `"completed"`, the response includes `unsigned_urls` — the video is displayed in a `<video>` element
 10. If `status` becomes `"failed"`, an error message is shown
-## Scenario 5: Image-to-Video Generation
+## Scenario 4a: Video Generation (Image-to-Video)
-1. User uploads or provides an image URL and a text prompt
+1. User provides an image URL, motion prompt, model, aspect ratio, resolution, and duration
 2. Flask POSTs to `POST /generate/video/from-image` with JWT header
-3. AI Service calls OpenRouter `/video/generations/from-image`
+3. Backend calls OpenRouter `POST /api/v1/videos` with `image_url`, prompt, and parameters
-4. Returns `VideoResponse` with `video_url` when completed
+4. Same polling flow as Scenario 4
-## Scenario 6: Token Refresh
+## Scenario 5: Token Refresh
 1. Access token expires (TTL 15 min)
 2. Client POSTs current refresh token to `POST /auth/refresh`
@@ -59,9 +71,17 @@ Describes concrete behavior and interactions of the system's building blocks in
 4. Old JTI is revoked; new JTI inserted into `refresh_tokens`
 5. New access token + new refresh token returned to client
-## Scenario 7: Admin User Management
+## Scenario 6: Admin User Management
 1. Admin logs in and receives access token with `role: admin`
 2. Admin GETs `/admin/stats` to view user and token counts
 3. Admin DELETEs `/users/{id}` to remove a user — refresh tokens for that user are cascade-deleted
 4. Admin PUTs `/users/{id}/role` to promote a user to admin or demote to user
 ## Scenario 7: User Profile Update
 1. Authenticated user navigates to `/users/profile`
 2. User submits updated email and/or new password
 3. Flask POSTs to `PUT /users/me` with JWT header
 4. Auth Service validates credentials and updates user record in DuckDB
 5. Session `user_email` is updated; user sees success message
@@ -72,3 +72,25 @@ Refresh tokens store a JTI (JWT ID) UUID in the `refresh_tokens` table. On each
 ### Future: AI Generation History
 AI generation metadata (model, prompt, cost, result URLs) can be stored as JSON columns in a future `generation_history` table in DuckDB, enabling per-user analytics and usage dashboards at zero extra infrastructure cost.
 ## OpenRouter API Integration
 ### Image Generation
 Image generation uses two different OpenRouter endpoints depending on the model:
 - **Legacy endpoint** (`/images/generations`): Used by DALL-E 3 and similar models. Returns `data[].url` and `data[].b64_json`.
 - **Chat completions** (`/chat/completions` with `modalities: ["image"]`): Used by FLUX.2 Klein 4B and GPT-5 Image Mini. Returns `choices[0].message.images[].image_url.url` as base64 data URLs.
 The router auto-detects the model type and routes accordingly. Image configuration (`aspect_ratio`, `image_size`) is passed via `image_config` for chat-based models.
 ### Video Generation
 Video generation uses OpenRouter's `/api/v1/videos` endpoint with a **submit-and-poll** pattern:
 1. `POST /api/v1/videos` with `model`, `prompt`, `aspect_ratio`, `resolution`, `duration_seconds`
 2. Response: `{"id": "job_id", "polling_url": "https://..."}` with `status: "queued"`
 3. Poll `GET polling_url` every 5 seconds until `status` is `"completed"` or `"failed"`
 4. Completed response includes `unsigned_urls: [str]` array with video download URLs
 Supported models: `openai/sora-2-pro`, `google/veo-3.1-fast`. Both text-to-video and image-to-video use the same `/api/v1/videos` endpoint (image-to-video includes `image_url` in the request body).
@@ -63,3 +63,51 @@ Refer to section 4 (Solution Strategy) where the most important decisions are al
 **Decision:** Route all AI generation requests through the [OpenRouter](https://openrouter.ai) API, which exposes an OpenAI-compatible REST interface for hundreds of models.
 **Consequences:** Single API key and base URL for all model providers. Model switching requires only a change to the `model` field in the request payload. If OpenRouter is unavailable, all generation endpoints return `502 Bad Gateway`. Pricing and rate limits are governed by OpenRouter's policies per model.
 ---
 ## ADR-006: Use submit-and-poll pattern for video generation
 **Status:** accepted
 **Context:** OpenRouter's video generation models (Sora 2 Pro, Veo 3.1 Fast) do not return video URLs immediately. Video generation is a long-running operation (typically 30-120 seconds) that requires polling.
 **Decision:** Use the `/api/v1/videos` endpoint with a two-step pattern: (1) `POST` to submit the job and receive a `polling_url`, (2) `GET` the `polling_url` every 5 seconds until `status` is `"completed"` or `"failed"`. The Flask frontend proxies polling requests via `GET /generate/video/status?polling_url=...` and the frontend JavaScript polls this endpoint automatically.
 **Consequences:** The video generation endpoint returns immediately with `status: "queued"` and a `polling_url`. The frontend displays a "Processing..." message and polls for updates. When complete, the video is displayed in a `<video>` element. This adds complexity to the frontend but is necessary for long-running operations. If OpenRouter's polling endpoint is unavailable, the frontend shows an error after a timeout.
 ---
 ## ADR-007: Auto-detect image generation model type
 **Status:** accepted
 **Context:** OpenRouter supports image generation through two different endpoints: the legacy `/images/generations` endpoint (DALL-E 3) and the chat completions endpoint with `modalities: ["image"]` (FLUX.2 Klein 4B, GPT-5 Image Mini). These endpoints have different request/response formats.
 **Decision:** The `/generate/image` router auto-detects the model type by checking if the model slug contains `"flux"` or `"gpt-5-image-mini"`. If so, it routes to `/chat/completions` with `modalities: ["image"]` and `image_config` (aspect_ratio, image_size). Otherwise, it uses `/images/generations` with `size` and `n`.
 **Consequences:** Users can specify any image generation model in the form without needing to know which endpoint it uses. The router handles the routing transparently. Adding new image models requires only updating the detection logic if they use a different endpoint.
 ---
 ## ADR-008: Flask session-based auth with role caching
 **Status:** accepted
 **Context:** The Flask frontend needs to know the user's authentication state and role for route protection (`@login_required`, `@admin_required`) without making an extra API call on every request.
 **Decision:** Store the JWT access token, refresh token, user email, and user role in the Flask server-side session cookie after login. The `@login_required` decorator checks for `access_token` in the session. The `@admin_required` decorator checks `session["user_role"] == "admin"`. This avoids an extra API call to `/users/me` on every request.
 **Consequences:** The user role is cached in the session and may become stale if an admin changes a user's role while the user is logged in. The user must log out and log back in to see the updated role. This is acceptable for the expected usage pattern. The session cookie is signed (Flask's default) to prevent tampering.
 ---
 ## ADR-009: Separate generation pages in frontend
 **Status:** accepted
 **Context:** The original `/generate` page handled text, image, and video generation in a single form, which became unwieldy as more generation types were added.
 **Decision:** Create separate Flask routes and Jinja2 templates for each generation type: `/generate/text`, `/generate/image`, `/generate/video`. The `/generate` route redirects to `/generate/text`. The navigation bar includes a "Generate" dropdown with links to each sub-page. The video page uses tabs for text-to-video and image-to-video.
 **Consequences:** Each generation type has its own URL, making it bookmarkable and shareable. The navigation is clearer with a dropdown menu. Adding new generation types (e.g., audio) follows the same pattern. The `/generate` redirect provides a sensible default entry point.