updating docs

updating documentation
2026-05-25 01:04:49 +02:00 · 2026-05-16 01:59:43 +02:00
25 changed files with 2680 additions and 226 deletions
--- a/README.md
+++ b/README.md
@@ -1,170 +1,107 @@
-# lila
+# Lila
 **Learn words. Beat friends.**
-lila is a vocabulary trainer built around a Duolingo-style quiz loop: a word appears in one language, you pick the correct translation from four choices. It supports singleplayer and real-time multiplayer, and is designed to work across multiple language pairs without schema changes.
+Lila is a vocabulary trainer that turns the media you love into language practice. Learn vocabulary from a Shakira song, the first chapter of _Harry Potter_, or an episode of _Breaking Bad_ — then challenge your friends in real-time multiplayer quizzes.
 Live at [lilastudy.com](https://lilastudy.com).
 ---
 ## Quickstart
 ```bash
 # 1. Clone and install
 git clone git@git.lilastudy.com:forgejo-lila/lila.git
 cd lila
 pnpm install
 # 2. Environment
 cp .env.example .env
 # 3. Start local services (PostgreSQL, Valkey)
 docker compose up -d
 # 4. Build shared packages
 pnpm --filter @lila/shared build
 pnpm --filter @lila/db build
 # 5. Run migrations and seed data
 pnpm --filter @lila/db migrate
 pnpm --filter @lila/db seed
 # 6. Start dev servers
 pnpm dev
 ```
 API: `http://localhost:3000` · Web: `http://localhost:5173`
 See [DEPLOYMENT.md](DEPLOYMENT.md) for production infrastructure details.
 ---
 ## Documentation Index
 | Document                                     | What you'll find there                                                  |
 | -------------------------------------------- | ----------------------------------------------------------------------- |
 | [STATUS.md](STATUS.md)                       | Current state — what's working, what's blocked, what we're building now |
 | [BACKLOG.md](BACKLOG.md)                     | Prioritized task list: now / next / later / changelog                   |
 | [ARCHITECTURE.md](ARCHITECTURE.md)           | Monorepo structure, layered architecture, data flow                     |
 | [DECISIONS.md](DECISIONS.md)                 | Why we chose X over Y — tool choices, schema design, trade-offs         |
 | [DATA_PIPELINE.md](DATA_PIPELINE.md)         | Kaikki → CEFR enrichment → production PostgreSQL                        |
 | [MODEL_STRATEGY.md](MODEL_STRATEGY.md)       | LLM voter architecture for sense-disambiguated CEFR assignment          |
 | [LLM_SETUP.md](LLM_SETUP.md)                 | Local and cloud LLM provider configuration                              |
 | [DEPLOYMENT.md](DEPLOYMENT.md)               | Hetzner VPS, Caddy, Docker Compose, CI/CD, backups                      |
 | [design/GAME_MODES.md](design/GAME_MODES.md) | Planned multiplayer and singleplayer game modes                         |
 ---
 ## Stack
-| Layer        | Technology                         |
-| ------------ | ---------------------------------- |
-| Monorepo     | pnpm workspaces                    |
-| Frontend     | React 18, Vite, TypeScript         |
-| Routing      | TanStack Router                    |
-| Server state | TanStack Query                     |
-| Styling      | Tailwind CSS                       |
-| Backend      | Node.js, Express, TypeScript       |
-| Database     | PostgreSQL + Drizzle ORM           |
-| Validation   | Zod (shared schemas)               |
-| Auth         | Better Auth (Google + GitHub)      |
-| Realtime     | WebSockets (`ws` library)          |
-| Testing      | Vitest, supertest                  |
-| Deployment   | Docker Compose, Caddy, Hetzner VPS |
-| CI/CD        | Forgejo Actions                    |
+| Layer      | Technology                                                    |
+| ---------- | ------------------------------------------------------------- |
+| Monorepo   | pnpm workspaces                                               |
+| Frontend   | React 18, Vite, TanStack Router, TanStack Query, Tailwind CSS |
+| Backend    | Node.js, Express, TypeScript, WebSockets (`ws`)               |
+| Database   | PostgreSQL + Drizzle ORM                                      |
+| Auth       | Better Auth (Google + GitHub)                                 |
+| Validation | Zod (shared between frontend and backend)                     |
+| Testing    | Vitest, supertest                                             |
+| Deployment | Docker Compose, Caddy, Hetzner VPS                            |
+| CI/CD      | Forgejo Actions                                               |
+
+---
+
+## Current Status
 - ✅ Singleplayer quiz (5 languages: en↔it/de/es/fr)
 - ✅ Multiplayer lobby + real-time game (2–4 players, simultaneous answers, 15s timer)
 - ✅ Auth (Google + GitHub)
 - ✅ Live deployment with CI/CD
 - 🔄 Migrating vocabulary data from OpenWordNet to **Kaikki** (sense-disambiguated translations)
 - 🔄 Phase 7 hardening (rate limiting, error boundaries, monitoring)
 See [STATUS.md](STATUS.md) for the full picture.
 ---
 ## Repository Structure
-```tree
+```
 lila/
 ├── apps/
-│   ├── api/        — Express backend
-│   └── web/        — React frontend
+│   ├── api/           — Express backend
+│   └── web/           — React frontend
 ├── packages/
-│   ├── shared/     — Zod schemas and types shared between frontend and backend
-│   └── db/         — Drizzle schema, migrations, models, seeding scripts
-├── scripts/        — Python scripts for vocabulary data extraction
-└── documentation/  — Project docs
-```
+│   ├── shared/        — Zod schemas + constants (API/web contract)
+│   └── db/            — Drizzle schema, migrations, models, seeding
+├── data-pipeline/     — Kaikki extraction → enrichment → PostgreSQL sync
+├── documentation/     — Project docs
+└── Caddyfile, docker-compose.yml, etc.
+```
 `packages/shared` is the contract between frontend and backend. All request/response shapes are defined there as Zod schemas and never duplicated.
 ---
 ## Architecture
 Requests flow through a strict layered architecture:
 ```text
 HTTP Request → Router → Controller → Service → Model → Database
 ```
 Each layer only talks to the layer directly below it. Controllers handle HTTP only. Services contain business logic only. Models contain database queries only. All database code lives in `packages/db` — the API never imports Drizzle directly for queries.
 ---
 ## Data Model
 Words are modelled as language-neutral concepts (`terms`) with per-language `translations`. Adding a new language requires no schema changes — only new rows. CEFR levels (A1–C2) are stored per translation for difficulty filtering.
 Core tables: `terms`, `translations`, `term_glosses`, `decks`, `deck_terms`
 Auth tables (managed by Better Auth): `user`, `session`, `account`, `verification`
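 Vocabulary data is currently sourced from WordNet and the Open Multilingual Wordnet (OMW); a migration to Kaikki for sense-disambiguated translations is in progress (see [DATA_PIPELINE.md](DATA_PIPELINE.md)).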
 ---
 ## API
 ```text
 POST /api/v1/game/start     — start a quiz session (auth required)
 POST /api/v1/game/answer    — submit an answer (auth required)
 GET  /api/v1/health         — health check (public)
 ALL  /api/auth/*            — Better Auth handlers (public)
 ```
 The correct answer is never sent to the frontend — all evaluation happens server-side.
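A minimal sketch of that server-side split, assuming simplified question shapes: `StoredQuestion`, `PublicQuestion`, `toPublicQuestion`, and `evaluateAnswer` are hypothetical names, not the actual `packages/shared` schemas. The answer key stays on the server, and the client only learns it from the result returned after it has answered.

```typescript
// Hypothetical, simplified shapes; the real ones are Zod schemas in packages/shared.
interface AnswerOption {
  optionId: number;
  text: string;
}

// Server-side view of a question: carries the answer key.
interface StoredQuestion {
  questionId: string;
  prompt: string;
  options: AnswerOption[];
  correctOptionId: number;
}

// Client-side view: everything except the answer key.
type PublicQuestion = Omit<StoredQuestion, "correctOptionId">;

interface AnswerResult {
  isCorrect: boolean;
  correctOptionId: number;
  selectedOptionId: number;
}

// Copies allowlisted fields instead of spreading `q`,
// so the answer key can never ride along by accident.
function toPublicQuestion(q: StoredQuestion): PublicQuestion {
  return {
    questionId: q.questionId,
    prompt: q.prompt,
    options: q.options.map((o) => ({ optionId: o.optionId, text: o.text })),
  };
}

// Evaluation happens only on the server, against the stored answer key.
function evaluateAnswer(q: StoredQuestion, selectedOptionId: number): AnswerResult {
  return {
    isCorrect: selectedOptionId === q.correctOptionId,
    correctOptionId: q.correctOptionId,
    selectedOptionId,
  };
}
```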
 ---
 ## Multiplayer
 Rooms are created via REST, then managed over WebSockets. Messages are typed via a Zod discriminated union. The host starts the game; all players answer simultaneously within a 15-second server-enforced timer. Lobby membership is stored in PostgreSQL; in-game room state is held in memory (a Valkey-backed store is planned).
 ---
 ## Infrastructure
 ```text
 Internet → Caddy (HTTPS)
            ├── lilastudy.com      → web (nginx, static files)
            ├── api.lilastudy.com  → api (Express)
            └── git.lilastudy.com  → Forgejo (git + registry)
 ```
 Deployed on a Hetzner VPS (Debian 13, ARM64). Images are cross-compiled for ARM64 and pushed to the Forgejo container registry. CI/CD runs via Forgejo Actions on push to `main`. Daily database backups are synced to the dev laptop via rsync.
 See `documentation/DEPLOYMENT.md` for the full infrastructure setup.
 ---
 ## Local Development
 ### Prerequisites
 - Node.js 20+
 - pnpm 9+
 - Docker + Docker Compose
 ### Setup
 ```bash
 # Install dependencies
 pnpm install
 # Create your local env file (used by docker compose + the API)
 cp .env.example .env
 # Start local services (PostgreSQL, Valkey)
 docker compose up -d
 # Build shared packages
 pnpm --filter @lila/shared build
 pnpm --filter @lila/db build
 # Run migrations and seed data
 pnpm --filter @lila/db migrate
 pnpm --filter @lila/db seed
 # Start dev servers
 pnpm dev
 ```
 The API runs on `http://localhost:3000` and the frontend on `http://localhost:5173`.
 ---
 ## Testing
 ```bash
 # All tests
 pnpm test
 # API only
 pnpm --filter api test
 # Frontend only
 pnpm --filter web test
 ```
 ---
-## Roadmap
-| Phase | Description                                                            | Status |
-| ----- | ---------------------------------------------------------------------- | ------ |
-| 0     | Foundation — monorepo, tooling, dev environment                        | ✅     |
-| 1     | Vocabulary data pipeline + REST API                                    | ✅     |
-| 2     | Singleplayer quiz UI                                                   | ✅     |
-| 3     | Auth (Google + GitHub)                                                 | ✅     |
-| 4     | Multiplayer lobby (WebSockets)                                         | ✅     |
-| 5     | Multiplayer game (real-time, server timer)                             | ✅     |
-| 6     | Production deployment + CI/CD                                          | ✅     |
-| 7     | Hardening (rate limiting, error boundaries, monitoring, accessibility) | 🔄     |
-See `documentation/roadmap.md` for task-level detail.
+## License
+TBD
--- a/apps/web/README.md
+++ b/apps/web/README.md
@@ -1,73 +0,0 @@
 # React + TypeScript + Vite
 This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.
 Currently, two official plugins are available:
 - [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Oxc](https://oxc.rs)
 - [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/)
 ## React Compiler
 The React Compiler is not enabled on this template because of its impact on dev & build performances. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).
 ## Expanding the ESLint configuration
 If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:
 ```js
 export default defineConfig([
  globalIgnores(["dist"]),
  {
    files: ["**/*.{ts,tsx}"],
    extends: [
      // Other configs...
      // Remove tseslint.configs.recommended and replace with this
      tseslint.configs.recommendedTypeChecked,
      // Alternatively, use this for stricter rules
      tseslint.configs.strictTypeChecked,
      // Optionally, add this for stylistic rules
      tseslint.configs.stylisticTypeChecked,
      // Other configs...
    ],
    languageOptions: {
      parserOptions: {
        project: ["./tsconfig.node.json", "./tsconfig.app.json"],
        tsconfigRootDir: import.meta.dirname,
      },
      // other options...
    },
  },
 ]);
 ```
 You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:
 ```js
 // eslint.config.js
 import reactX from "eslint-plugin-react-x";
 import reactDom from "eslint-plugin-react-dom";
 export default defineConfig([
  globalIgnores(["dist"]),
  {
    files: ["**/*.{ts,tsx}"],
    extends: [
      // Other configs...
      // Enable lint rules for React
      reactX.configs["recommended-typescript"],
      // Enable lint rules for React DOM
      reactDom.configs.recommended,
    ],
    languageOptions: {
      parserOptions: {
        project: ["./tsconfig.node.json", "./tsconfig.app.json"],
        tsconfigRootDir: import.meta.dirname,
      },
      // other options...
    },
  },
 ]);
 ```
--- a/documentation/ARCHITECTURE.md
+++ b/documentation/ARCHITECTURE.md
@@ -0,0 +1,229 @@
 # Architecture
 > How Lila is structured, how data flows, and why the boundaries are where they are.
 ---
 ## Monorepo Layout
 ```
 lila/
 ├── apps/
 │   ├── api/              — Express backend (HTTP + WebSocket)
 │   └── web/              — React frontend (Vite, TanStack Router)
 ├── packages/
 │   ├── shared/           — Zod schemas + constants (API/web contract)
 │   └── db/               — Drizzle schema, migrations, models, seeding
 ├── data-pipeline/        — Kaikki extraction → enrichment → PostgreSQL sync
 ├── documentation/        — Project docs
 ├── Caddyfile             — Reverse proxy routing
 ├── docker-compose.yml    — Local dev stack
 └── pnpm-workspace.yaml   — Workspace definition
 ```
 **Package boundaries:**
 | Package           | Owns                                                              | Consumed by                           |
 | ----------------- | ----------------------------------------------------------------- | ------------------------------------- |
 | `packages/shared` | Zod schemas, constants, derived TypeScript types                  | `apps/api`, `apps/web`, `packages/db` |
 | `packages/db`     | Drizzle schema, DB connection, all model/query functions          | `apps/api`                            |
 | `apps/api`        | Router, controllers, services, error handling, WebSocket handlers | —                                     |
 | `apps/web`        | React components, routes, client-side state                       | —                                     |
 **Rule:** `apps/api` never imports `drizzle-orm` for queries. It only calls functions exported from `packages/db`.
 ---
 ## Layered Architecture (HTTP)
 ```
 HTTP Request
     ↓
  Router        — maps URL + HTTP method to a controller
     ↓
 Controller     — handles HTTP only: validates input (Zod safeParse),
                  calls service, sends response or next(error)
     ↓
  Service       — business logic only: no HTTP, no direct DB access
     ↓
  Model         — database queries only: no business logic
     ↓
  Database      — PostgreSQL via Drizzle ORM
 ```
 **The rule:** each layer only talks to the layer directly below it.
 - **Controller** never touches the database.
 - **Service** never reads `req.body`.
 - **Model** never knows what a quiz is.
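To make the layer rules concrete, here is a framework-free sketch. `translationModel`, `quizService`, `startQuizController`, and the `RequestLike`/`ResponseLike` shapes are hypothetical stand-ins (the real controllers use Express and the real models use Drizzle); the point is only which layer is allowed to call which.

```typescript
// --- Model: data access only. An in-memory array stands in for PostgreSQL here. ---
interface TranslationRow {
  termId: string;
  languageCode: string;
  text: string;
}

const translationRows: TranslationRow[] = [
  { termId: "t1", languageCode: "it", text: "cane" },
  { termId: "t2", languageCode: "it", text: "gatto" },
  { termId: "t1", languageCode: "en", text: "dog" },
];

const translationModel = {
  findByLanguage(languageCode: string, limit: number): TranslationRow[] {
    return translationRows.filter((r) => r.languageCode === languageCode).slice(0, limit);
  },
};

// --- Service: business logic only. Takes plain arguments, never an HTTP request. ---
const quizService = {
  pickPromptWords(languageCode: string, rounds: number): string[] {
    return translationModel.findByLanguage(languageCode, rounds).map((r) => r.text);
  },
};

// --- Controller: HTTP concerns only. Validates input, calls the service, shapes the response. ---
interface RequestLike {
  body: unknown;
}
interface ResponseLike {
  status: number;
  body: unknown;
}

function startQuizController(req: RequestLike): ResponseLike {
  const body = (req.body ?? {}) as { languageCode?: unknown; rounds?: unknown };
  const languageCode = body.languageCode;
  const rounds = body.rounds;
  if (typeof languageCode !== "string" || typeof rounds !== "number") {
    return { status: 400, body: { error: "languageCode (string) and rounds (number) are required" } };
  }
  return { status: 200, body: { words: quizService.pickPromptWords(languageCode, rounds) } };
}
```

Because the service takes plain arguments, it can be unit-tested without building a fake HTTP request, and the model can be replaced by a Drizzle query without touching the controller.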
 ### Error Flow
 ```
 Controller throws ValidationError (400) or calls next(error)
     ↓
 Central errorHandler middleware in app.ts
     ↓
 Maps AppError subclasses to HTTP status codes
     ↓
 Unknown errors → 500
 ```
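A sketch of the status-code mapping the central handler performs, assuming an `AppError` base class that carries `statusCode`. `ValidationError` (400) comes from the flow above; `NotFoundError` and `toErrorResponse` are hypothetical names. The function is kept free of Express types so it can be tested on its own.

```typescript
// Base class: every expected, user-facing error carries its HTTP status.
class AppError extends Error {
  readonly statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
    this.name = new.target.name;
    // Keeps `instanceof` working even if the code is down-compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

// Hypothetical extra subclass, for illustration only.
class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string };
}

// The mapping an Express error-handling middleware could delegate to.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  // Unknown errors: generic 500; internal details are not echoed to the client.
  return { status: 500, body: { error: "Internal server error" } };
}
```

In `app.ts`, a four-argument error middleware registered after the routes could then delegate to it, for example `app.use((err, req, res, next) => { const { status, body } = toErrorResponse(err); res.status(status).json(body); })`.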
 ---
 ## WebSocket Architecture
 The WebSocket server is attached to the same Node HTTP server that serves the Express app. It upgrades connections on the `/ws` path.
 ```
 WS Connection Upgrade
     ↓
 Auth middleware — validates Better Auth session from cookie
     ↓
 Message Router — dispatches by `type` field (Zod discriminated union)
     ↓
 Handler (lobby or game) — business logic, broadcasts state
     ↓
 In-memory stores (lobby game state, game session state)
 ```
 **Message protocol:** All WebSocket messages are validated against Zod schemas defined in `packages/shared/src/schemas/lobby.ts` and `packages/shared/src/schemas/game.ts`. The `type` field is a discriminated union — the router switches on it and validates the payload against the corresponding schema.
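The dispatch step can be sketched without Zod: the real code validates frames with a Zod discriminated union (`z.discriminatedUnion("type", [...])`), while this hand-written parser plays the same role so the example has no dependencies. The message types `lobby:join`, `lobby:start`, and `game:answer` appear in the multiplayer flow below; their payload fields here (`code`, `questionId`, `optionId`) and the function names are assumptions.

```typescript
// Hypothetical message shapes; the real ones are Zod schemas in packages/shared.
type ClientMessage =
  | { type: "lobby:join"; code: string }
  | { type: "lobby:start" }
  | { type: "game:answer"; questionId: string; optionId: number };

// Validates an untrusted frame. Returns null for anything that doesn't match a known shape.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  switch (msg.type) {
    case "lobby:join":
      return typeof msg.code === "string" ? { type: "lobby:join", code: msg.code } : null;
    case "lobby:start":
      return { type: "lobby:start" };
    case "game:answer":
      return typeof msg.questionId === "string" && typeof msg.optionId === "number"
        ? { type: "game:answer", questionId: msg.questionId, optionId: msg.optionId }
        : null;
    default:
      return null;
  }
}

// The router switches on `type`; TypeScript narrows the payload in each branch.
function dispatch(msg: ClientMessage): string {
  switch (msg.type) {
    case "lobby:join":
      return `lobby handler: join room ${msg.code}`;
    case "lobby:start":
      return "lobby handler: start game";
    case "game:answer":
      return `game handler: option ${msg.optionId} for ${msg.questionId}`;
    default: {
      // Compile-time exhaustiveness check: a new message type without a case fails here.
      const unhandled: never = msg;
      return unhandled;
    }
  }
}
```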
 **State storage:**
 - **Lobby membership** — stored in PostgreSQL (`lobbies`, `lobby_players` tables) for durability
 - **Game/room state** — stored in-memory (`InMemoryLobbyGameStore`, `InMemoryGameSessionStore`). Valkey migration is planned.
 ---
 ## Database Schema (Core)
 **Concept:** Words are language-neutral concepts (`terms`) with per-language `translations`. Adding a new language requires no schema changes — only new rows.
 ### Core Tables
 | Table          | Purpose                                                                          |
 | -------------- | -------------------------------------------------------------------------------- |
 | `terms`        | Language-neutral concept: `id`, `pos` (noun/verb/adj/adv), `source`, `source_id` |
 | `translations` | Per-language word: `term_id` (FK), `language_code`, `text`, `cefr_level` (A1–C2) |
 | `term_glosses` | Per-language definition: `term_id` (FK), `language_code`, `text`                 |
 | `decks`        | Curated wordlists: `source_language`, `validated_languages`, frequency tier      |
 | `deck_terms`   | Junction: which terms belong to which deck                                       |
 ### Auth Tables (managed by Better Auth)
 | Table          | Purpose                                                                           |
 | -------------- | --------------------------------------------------------------------------------- |
 | `user`         | Account: `id`, `name`, `email`, `image`                                           |
 | `session`      | Active sessions: `id`, `user_id`, `token`, `expires_at`                           |
 | `account`      | Social provider links: `user_id`, `provider` (google/github), `providerAccountId` |
 | `verification` | Email verification tokens (unused for social-only auth)                           |
 **Key constraints:**
 - `language_code` is CHECK-constrained against `SUPPORTED_LANGUAGE_CODES` (`en`, `it`, `de`, `es`, `fr`)
 - `pos` is CHECK-constrained against `SUPPORTED_POS` (`noun`, `verb`, `adjective`, `adverb`)
 - `cefr_level` is nullable `varchar(2)` with CHECK `A1`–`C2`
 - `translations` has UNIQUE `(term_id, language_code, text)` — allows synonyms, prevents exact duplicates
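The constraints above are enforced by PostgreSQL; this sketch mirrors them in application code to show what the schema permits. `TranslationTable`, `checkTranslation`, and the camelCase field names are hypothetical. It shows that a synonym (`dog`, `hound`) is just another row for the same `term_id` and language, while an exact duplicate is rejected by the UNIQUE key.

```typescript
// Mirrors of the CHECK constraints listed above.
const SUPPORTED_LANGUAGE_CODES: readonly string[] = ["en", "it", "de", "es", "fr"];
const CEFR_LEVELS: readonly string[] = ["A1", "A2", "B1", "B2", "C1", "C2"];

interface TranslationRow {
  termId: string;
  languageCode: string;
  text: string;
  cefrLevel: string | null; // nullable, like the cefr_level column
}

// Returns the constraint violations for a row; an empty list means the row is valid.
function checkTranslation(row: TranslationRow): string[] {
  const problems: string[] = [];
  if (SUPPORTED_LANGUAGE_CODES.indexOf(row.languageCode) === -1) {
    problems.push(`unsupported language_code ${row.languageCode}`);
  }
  if (row.cefrLevel !== null && CEFR_LEVELS.indexOf(row.cefrLevel) === -1) {
    problems.push(`invalid cefr_level ${row.cefrLevel}`);
  }
  return problems;
}

// In-memory stand-in for the translations table, enforcing UNIQUE (term_id, language_code, text).
class TranslationTable {
  private readonly rows: TranslationRow[] = [];
  private readonly uniqueKeys = new Set<string>();

  insert(row: TranslationRow): void {
    const problems = checkTranslation(row);
    if (problems.length > 0) throw new Error(problems.join("; "));
    const key = JSON.stringify([row.termId, row.languageCode, row.text]);
    if (this.uniqueKeys.has(key)) throw new Error(`duplicate translation ${key}`);
    this.uniqueKeys.add(key);
    this.rows.push(row);
  }

  // All synonyms of one concept in one language.
  textsFor(termId: string, languageCode: string): string[] {
    return this.rows
      .filter((r) => r.termId === termId && r.languageCode === languageCode)
      .map((r) => r.text);
  }
}
```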
 ---
 ## Data Flow: Quiz Session
 ### Singleplayer
 ```
 User clicks "Start Quiz"
     ↓
 POST /api/v1/game/start  (GameRequestSchema: source_lang, target_lang, pos, difficulty, rounds)
     ↓
 gameController.validate → gameService.createGameSession
     ↓
 termModel.getGameTerms(filters) + termModel.getDistractors(filters)
     ↓
 Service shuffles options, stores session in GameSessionStore
     ↓
 Returns GameSession { sessionId, questions[] } — correct answer NEVER sent to frontend
     ↓
 User answers → POST /api/v1/game/answer (AnswerSubmissionSchema)
     ↓
 Service evaluates server-side, returns AnswerResult { isCorrect, correctOptionId, selectedOptionId }
 ```
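The "shuffles options" step could look like this. The document does not say which shuffle the service uses; this sketch assumes a Fisher-Yates shuffle with an injectable random source (`Rng`, `shuffle`, and `buildOptions` are hypothetical names) so tests can be deterministic. The correct entry is tagged before shuffling, and only its final index is kept as `correctOptionId` in the server-side session.

```typescript
// Returns a float in [0, 1), like Math.random. Injectable so tests can be deterministic.
type Rng = () => number;

// Fisher-Yates shuffle on a copy; every permutation is equally likely given a uniform rng.
function shuffle<T>(items: readonly T[], rng: Rng = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // 0 <= j <= i
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

interface BuiltQuestion {
  options: { optionId: number; text: string }[];
  correctOptionId: number; // stays in the server-side session store
}

// Tags the correct entry before shuffling, so its position is tracked without comparing strings.
function buildOptions(correctText: string, distractors: readonly string[], rng: Rng = Math.random): BuiltQuestion {
  const tagged = [
    { text: correctText, isCorrect: true },
    ...distractors.map((text) => ({ text, isCorrect: false })),
  ];
  const shuffled = shuffle(tagged, rng);
  return {
    options: shuffled.map((o, optionId) => ({ optionId, text: o.text })),
    correctOptionId: shuffled.findIndex((o) => o.isCorrect),
  };
}
```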
 ### Multiplayer
 ```
 Host creates lobby → POST /api/v1/lobbies → returns room code
     ↓
 Players join via code → POST /api/v1/lobbies/:code/join
     ↓
 All players connect WebSocket → send lobby:join with room code
     ↓
 Server broadcasts lobby:state (player list) to all connections in room
     ↓
 Host clicks "Start" → WS lobby:start
     ↓
 Server generates questions via MultiplayerGameService, broadcasts game:question
     ↓
 Players submit answers via WS game:answer within 15s server timer
     ↓
 On all-answered or timeout → evaluate, broadcast game:answer_result
     ↓
 After N rounds → broadcast game:finished with final scores
 ```
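The "all-answered or timeout" rule is easiest to get right when a round can close exactly once, whichever trigger comes first. A sketch, assuming hypothetical `Round`/`runRound` names and a `broadcast` callback standing in for sending `game:answer_result` to the room; a real handler would pass `15000` as `timeoutMs`.

```typescript
interface RoundOutcome {
  correct: string[]; // answered with the correct option
  wrong: string[]; // answered with another option
  missed: string[]; // did not answer before the round closed
}

class Round {
  private readonly answers = new Map<string, number>();
  private outcome: RoundOutcome | null = null;

  constructor(
    private readonly playerIds: readonly string[],
    private readonly correctOptionId: number,
  ) {}

  // Accepts the first answer per known player while the round is open; anything else is ignored.
  submit(playerId: string, optionId: number): boolean {
    if (this.outcome !== null) return false;
    if (this.playerIds.indexOf(playerId) === -1 || this.answers.has(playerId)) return false;
    this.answers.set(playerId, optionId);
    return true;
  }

  allAnswered(): boolean {
    return this.answers.size === this.playerIds.length;
  }

  // Evaluates once; later calls return the same outcome.
  close(): RoundOutcome {
    if (this.outcome !== null) return this.outcome;
    const outcome: RoundOutcome = { correct: [], wrong: [], missed: [] };
    for (const id of this.playerIds) {
      const answer = this.answers.get(id);
      if (answer === undefined) outcome.missed.push(id);
      else if (answer === this.correctOptionId) outcome.correct.push(id);
      else outcome.wrong.push(id);
    }
    this.outcome = outcome;
    return outcome;
  }
}

// Closes the round on timeout, or as soon as every player has answered (clearing the timer).
function runRound(round: Round, timeoutMs: number, broadcast: (outcome: RoundOutcome) => void) {
  const timer = setTimeout(() => broadcast(round.close()), timeoutMs);
  return {
    answer(playerId: string, optionId: number): void {
      if (round.submit(playerId, optionId) && round.allAnswered()) {
        clearTimeout(timer);
        broadcast(round.close());
      }
    },
  };
}
```

Making `close()` idempotent and rejecting answers after it means a late `game:answer` arriving just after the timer fires cannot change the result.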
 ---
 ## The `packages/shared` Contract
 `packages/shared` is the **single source of truth** for all data shapes crossing the API boundary.
 **What lives here:**
 - `constants.ts` — `SUPPORTED_LANGUAGE_CODES`, `SUPPORTED_POS`, `DIFFICULTY_LEVELS`, `CEFR_LEVELS`, `GAME_ROUNDS`
 - `schemas/game.ts` — `GameRequestSchema`, `GameSessionSchema`, `GameQuestionSchema`, `AnswerOptionSchema`, `AnswerSubmissionSchema`, `AnswerResultSchema`
 - `schemas/lobby.ts` — `LobbyCreateSchema`, `LobbyJoinSchema`, `LobbyStateSchema`, `WebSocketMessageSchema` (discriminated union)
 - `schemas/auth.ts` — Auth-related shared types
 **Why this matters:** If a shape changes, TypeScript compilation fails in `apps/api` and `apps/web` wherever the old shape is still used, so the two sides cannot silently drift apart.
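Deriving types from the constants means a value list and its type cannot disagree. In `packages/shared` this is presumably done through Zod (`z.infer<typeof GameRequestSchema>`); the sketch below uses plain `as const` arrays instead so it runs without dependencies. The constant values come from the key constraints above; `GameRequest` and `parseGameRequest` are simplified, hypothetical versions (difficulty is omitted because its allowed values are not listed here).

```typescript
// `as const` keeps the literal values, so types can be derived from the same arrays.
const SUPPORTED_LANGUAGE_CODES = ["en", "it", "de", "es", "fr"] as const;
const SUPPORTED_POS = ["noun", "verb", "adjective", "adverb"] as const;

type LanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number]; // "en" | "it" | "de" | "es" | "fr"
type PartOfSpeech = (typeof SUPPORTED_POS)[number];

// Runtime guards built from the same arrays, so the check and the type cannot disagree.
function isLanguageCode(value: unknown): value is LanguageCode {
  return typeof value === "string" && (SUPPORTED_LANGUAGE_CODES as readonly string[]).indexOf(value) !== -1;
}

function isPartOfSpeech(value: unknown): value is PartOfSpeech {
  return typeof value === "string" && (SUPPORTED_POS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical request shape, loosely following the GameRequestSchema fields.
interface GameRequest {
  source_lang: LanguageCode;
  target_lang: LanguageCode;
  pos: PartOfSpeech;
  rounds: number;
}

function parseGameRequest(raw: Record<string, unknown>): GameRequest | null {
  const { source_lang, target_lang, pos, rounds } = raw;
  if (!isLanguageCode(source_lang) || !isLanguageCode(target_lang)) return null;
  if (!isPartOfSpeech(pos) || typeof rounds !== "number") return null;
  return { source_lang, target_lang, pos, rounds };
}
```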
 ---
 ## GameSessionStore Abstraction
 The service layer stores session state through an interface, not a concrete implementation:
 ```typescript
 interface GameSessionStore {
  createSession(session: GameSession): Promise<void>;
  getSession(sessionId: string): Promise<GameSession | null>;
  // ...
 }
 ```
 **Current:** `InMemoryGameSessionStore` — Map-based, lives in `apps/api` process memory. Lost on restart.
 **Planned:** `ValkeyGameSessionStore` — Redis-compatible, persists across restarts, enables horizontal scaling.
 The same pattern applies to `LobbyGameStore` (in-memory lobby game state).
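A minimal sketch of an in-memory implementation of the interface above, with a simplified, hypothetical `GameSession` shape (the real one lives in `packages/shared`). Only the two methods shown in the interface are implemented; the elided ones are left out. Storing a serialized copy is a design choice, not something this document specifies: it makes the in-memory store behave like a Valkey-backed one, where every read returns a fresh object.

```typescript
// Hypothetical, simplified session shape.
interface GameSession {
  sessionId: string;
  questions: { questionId: string; correctOptionId: number }[];
}

interface GameSessionStore {
  createSession(session: GameSession): Promise<void>;
  getSession(sessionId: string): Promise<GameSession | null>;
}

class InMemoryGameSessionStore implements GameSessionStore {
  // Map-based, lives in process memory: everything is lost on restart.
  private readonly sessions = new Map<string, string>();

  async createSession(session: GameSession): Promise<void> {
    // Serialize on write, as a Valkey store would, so later mutations by the caller don't leak in.
    this.sessions.set(session.sessionId, JSON.stringify(session));
  }

  async getSession(sessionId: string): Promise<GameSession | null> {
    const raw = this.sessions.get(sessionId);
    return raw === undefined ? null : (JSON.parse(raw) as GameSession);
  }
}
```

Because the service only depends on `GameSessionStore`, swapping in a Valkey implementation later should not require changes in the service layer.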
 ---
 ## Key Design Decisions (Quick Reference)
 | Decision                          | Where it's explained          |
 | --------------------------------- | ----------------------------- |
 | Why Drizzle over Prisma           | `DECISIONS.md` → ORM          |
 | Why `ws` over Socket.io           | `DECISIONS.md` → WebSocket    |
 | Why server-side answer evaluation | `DECISIONS.md` → Architecture |
 | Why Better Auth over Keycloak     | `DECISIONS.md` → Auth         |
 | Why terms/translations schema     | `DECISIONS.md` → Data Model   |
 | Why Caddy over Nginx/Traefik      | `DECISIONS.md` → Deployment   |
 ---
 ## Further Reading
 - [DATA_PIPELINE.md](DATA_PIPELINE.md) — How vocabulary data gets from Kaikki into PostgreSQL
 - [DEPLOYMENT.md](DEPLOYMENT.md) — Production infrastructure and ops
 - [MODEL_STRATEGY.md](MODEL_STRATEGY.md) — LLM voter architecture for CEFR assignment
 - [design/GAME_MODES.md](design/GAME_MODES.md) — Planned multiplayer modes
--- a/documentation/BACKLOG.md
+++ b/documentation/BACKLOG.md
@@ -14,6 +14,9 @@ Things that are actively in progress or should be picked up immediately. Mostly
 Clearly planned work, not yet started. No hard ordering — sequence based on what unblocks real users first.
 - **404 handling for unknown subdomains and routes** `[ux]`
  Unknown subdomains (e.g., `foo.lilastudy.com`) and client-side routes return raw errors or blank pages. Add catch-all 404 handling: Caddy-level redirect for unknown subdomains, frontend catch-all route for unknown paths.
 - Decide how to keep Forgejo updated regularly.
 - Pause the SQL backup script on the dev laptop until the database has moved from OpenWordNet to Kaikki.
--- a/documentation/DATA_PIPELINE.md
+++ b/documentation/DATA_PIPELINE.md
--- a/documentation/DECISIONS.md
+++ b/documentation/DECISIONS.md
--- a/documentation/DEPLOYMENT.md
+++ b/documentation/DEPLOYMENT.md
--- a/documentation/LLM_SETUP.md
+++ b/documentation/LLM_SETUP.md
--- a/documentation/MODEL_STRATEGY.md
+++ b/documentation/MODEL_STRATEGY.md
@@ -100,6 +100,7 @@ These sub-stages require a model that understands sense context from examples. S
 **Context enrichment via Wiktionary API:** Before calling any model for the gloss or example sub-stage, the pipeline queries the Wiktionary API for the headword. The API returns the full Wiktionary entry including all senses, usage notes, and examples. This structured data is added to the prompt as additional context, giving the model a much clearer picture of which specific sense it is working with.
 This directly fixes the two hardest failure cases:
 - **Category header glosses** ("Terms relating to people.") — the Wiktionary entry contains the real definition which the model can use to generate a proper gloss
 - **Short ambiguous glosses** — the additional sense context prevents the model from guessing the wrong meaning
@@ -135,39 +136,46 @@ Three voters means a correct majority requires at least two models to agree. Eve
 ## Open questions
 ### Wiktionary API context extraction
 The Wiktionary API returns the full entry for a word including all senses. For a word like "free" with 8+ senses, dumping the entire entry into the prompt wastes tokens and may confuse the model. The open question is how to extract only the relevant sense — options include matching by sense_index, fuzzy-matching the Kaikki gloss against Wiktionary glosses, or letting the model see all senses and identify the correct one itself.
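One of the options above, fuzzy-matching the Kaikki gloss against each Wiktionary gloss, can be prototyped with plain token overlap. This sketch assumes Jaccard similarity over lowercased words, a small stopword list, and a `0.2` cutoff below which no sense is chosen; all three are tuning assumptions, and the function names are hypothetical. The glosses in the test are illustrative, not actual Wiktionary text.

```typescript
// Small stopword list so function words don't dominate the overlap (an assumption; tune as needed).
const STOPWORDS = new Set(["a", "an", "the", "of", "or", "and", "to", "in", "at", "for", "with", "by", "on", "as"]);

// Lowercased word set; [a-z] is enough for English glosses but drops accented letters.
function glossTokens(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return new Set(words.filter((w) => !STOPWORDS.has(w)));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Index of the Wiktionary sense whose gloss best matches the Kaikki gloss, or -1 if none reaches minScore.
function bestSenseIndex(kaikkiGloss: string, wiktionaryGlosses: readonly string[], minScore = 0.2): number {
  const target = glossTokens(kaikkiGloss);
  let bestIndex = -1;
  let bestScore = -1;
  wiktionaryGlosses.forEach((gloss, i) => {
    const score = jaccard(target, glossTokens(gloss));
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  });
  return bestScore >= minScore ? bestIndex : -1;
}
```

A `-1` result could trigger the fallback of sending all senses and letting the model pick, so the cheap matcher only trims the prompt when it is fairly sure.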
 ### Batching prompt design
 Batching 5-10 entries per API call multiplies effective daily capacity significantly. The prompt and validation logic for batched requests is more complex — the model must return a structured JSON object keyed by entry ID, and partial failures (one entry in a batch fails validation) need careful handling. Not yet designed or tested.
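A sketch of the partial-failure handling described above, assuming the batched response is a JSON object keyed by entry ID and that each entry needs a non-empty `gloss` (the `EnrichedEntry` shape and the function names are hypothetical). Valid entries are kept and only the failed ones go back into the queue, so one bad entry does not cost the whole batch.

```typescript
interface EnrichedEntry {
  gloss: string;
}

interface BatchSplit {
  accepted: Map<string, EnrichedEntry>;
  retry: string[]; // entry IDs to resend, singly or in a later batch
}

function isEnrichedEntry(value: unknown): value is EnrichedEntry {
  if (typeof value !== "object" || value === null) return false;
  const gloss = (value as { gloss?: unknown }).gloss;
  return typeof gloss === "string" && gloss.trim().length > 0;
}

// Splits one batched model response into accepted entries and IDs to retry.
// A malformed entry only fails itself; unparseable JSON fails the whole batch.
function splitBatchResponse(requestedIds: readonly string[], rawResponse: string): BatchSplit {
  const accepted = new Map<string, EnrichedEntry>();
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawResponse);
  } catch {
    return { accepted, retry: requestedIds.slice() };
  }
  const byId: Record<string, unknown> =
    typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : {};
  const retry: string[] = [];
  for (const id of requestedIds) {
    const entry = Object.prototype.hasOwnProperty.call(byId, id) ? byId[id] : undefined;
    if (isEnrichedEntry(entry)) {
      accepted.set(id, { gloss: entry.gloss.trim() });
    } else {
      retry.push(id);
    }
  }
  // Keys the model returned that were never requested are ignored.
  return { accepted, retry };
}
```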
 ### Groq and Gemini API integration
 Neither Groq nor Gemini is integrated into the pipeline yet. Both use OpenAI-compatible APIs so integration is straightforward — add provider configs to `stage-3-enrich/config.ts` and set API keys in `.env`. The batching prompt design needs to be finalised first.
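If provider configs are plain data, adding Groq and Gemini is mostly two entries. The shape below (`ProviderConfig`, `apiKeyEnvVar`), the helper names, and the environment variable names are hypothetical, not the actual contents of `stage-3-enrich/config.ts`. The base URLs are the OpenAI-compatible endpoints Groq and Google document for their APIs; check them against the current provider docs before relying on them.

```typescript
interface ProviderConfig {
  name: string;
  baseUrl: string; // OpenAI-compatible API root
  apiKeyEnvVar: string; // name of the .env variable holding the key
}

const PROVIDERS: readonly ProviderConfig[] = [
  { name: "groq", baseUrl: "https://api.groq.com/openai/v1", apiKeyEnvVar: "GROQ_API_KEY" },
  {
    name: "gemini",
    baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai",
    apiKeyEnvVar: "GEMINI_API_KEY",
  },
];

// OpenAI-compatible chat endpoint under the provider's base URL (trailing slashes tolerated).
function chatCompletionsUrl(provider: ProviderConfig): string {
  return provider.baseUrl.replace(/\/+$/, "") + "/chat/completions";
}

// Fails fast with a readable message instead of sending an unauthenticated request.
function resolveApiKey(provider: ProviderConfig, env: Record<string, string | undefined>): string {
  const key = env[provider.apiKeyEnvVar];
  if (!key) throw new Error(`Missing ${provider.apiKeyEnvVar} for provider ${provider.name}`);
  return key;
}
```

At startup, calling `resolveApiKey(provider, process.env)` for each configured provider would surface a missing `.env` entry before any entries are processed.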
 ### OpenRouter free model rotation
 OpenRouter's `openrouter/free` router selects a model at random from available free models. This means output style and quality vary between requests, which complicates round 2 voting where models review each other's candidates. May need to pin specific free models rather than using the router.
 ### Qwen3.5-9B performance on hard cases
 The 9B model has not yet been tested. It is expected to handle rare and specialized senses better than the 4B model but this has not been verified. Needs a test run against the same 50 entries used to evaluate the 4B model.
 ### Llama.cpp Gemma 4 bug
 The llama.cpp chat template bug preventing reliable JSON output from Gemma 4 E4B may be fixed in a future release. The model fits in 4GB VRAM and would be a useful additional local voter if the bug is resolved. Worth checking periodically.
 ### Full dataset scale
 The current pipeline runs on a 500-entry sample per language. The full Kaikki English file contains approximately 1.3 million entries, of which a fraction will pass the POS and translation filters. The exact count and the time required to run all sub-stages across all models at full scale is not yet known.
 ### Category header glosses
 Kaikki occasionally uses category headers ("Terms relating to people.", "Terms relating to things.") as glosses. These are not real definitions and no model produces useful output for them. Options include pre-filtering them before the gloss sub-stage and generating a gloss purely from examples, or flagging them as a special case for human review.
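The pre-filtering option could start as a simple pattern check before the gloss sub-stage. The only pattern named in this document is "Terms relating to …", so the sketch matches just that (case-insensitive) and routes those entries to a separate path; `isCategoryHeaderGloss` and `partitionByGlossKind` are hypothetical names.

```typescript
// The only pattern named so far; extend the list as other category-style glosses turn up.
const CATEGORY_HEADER_PATTERNS: readonly RegExp[] = [/^terms relating to .+\.?$/i];

function isCategoryHeaderGloss(gloss: string): boolean {
  const trimmed = gloss.trim();
  return CATEGORY_HEADER_PATTERNS.some((p) => p.test(trimmed));
}

// Real glosses go to the model as usual; category headers go down a separate path
// (generate a gloss from examples, or flag for human review).
function partitionByGlossKind<T extends { gloss: string }>(entries: readonly T[]): { normal: T[]; categoryHeader: T[] } {
  const normal: T[] = [];
  const categoryHeader: T[] = [];
  for (const entry of entries) {
    if (isCategoryHeaderGloss(entry.gloss)) categoryHeader.push(entry);
    else normal.push(entry);
  }
  return { normal, categoryHeader };
}
```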
 wget -O models/llama-3.1-8b-instruct-q4_k_m.gguf \
-  "https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
+ "https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
 # Q4_K_M (5.68GB — hybrid mode, better quality)
 wget -O models/qwen3.5-9b-q4_k_m.gguf \
-  "https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf"
+ "https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf"
 # Q3_K_S (4.32GB — might fit fully in VRAM)
 wget -O models/qwen3.5-9b-q3_k_s.gguf \
-  "https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q3_K_S.gguf"
+ "https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q3_K_S.gguf"
--- a/documentation/STARTUP_ROADMAP.md
+++ b/documentation/STARTUP_ROADMAP.md
@@ -0,0 +1,356 @@
 # Lila — Feature & Startup Strategy Roadmap
 > **Context for any LLM reading this:** Lila is a language learning/vocabulary app with two core differentiators: (1) **media-based practice** — users learn vocabulary extracted from real media they love (e.g., a Shakira song, the first chapter of _Harry Potter_, or an episode of _Breaking Bad_), and (2) **multiplayer modes** — users practice vocabulary together or competitively in real-time sessions. The app is currently at an early MVP stage. The existing MVP was built around OpenWordNet, which is being replaced because it produces unreliable translations (sense-disambiguation issues). The team is migrating the data pipeline to **Kaikki**, which structures entries per word sense and links translations to specific senses rather than vague general concepts. This migration is the current technical priority. The project is a TypeScript monorepo (pnpm workspaces) with an Express/WebSocket API (`apps/api`), a React frontend using TanStack Router (`apps/web`), a data ingestion pipeline (`data-pipeline`) backed by SQLite/Drizzle, shared packages (`packages/db`, `packages/shared`), and a Docker Compose deployment behind a Caddy reverse proxy. Documentation restructuring (human-readable vs. AI-optimized docs) was handled in a separate parallel workstream and is now complete (see Stream 1).
 ---
 ## Current State (Ground Truth — 2026-05-15)
 ### What Works Today ✅
 - **Singleplayer quiz** — Duolingo-style, 5 languages (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
 - **Multiplayer** — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
 - **Auth** — Google + GitHub via Better Auth
 - **Deployment** — Live at lilastudy.com, Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
 - **Database** — PostgreSQL with Drizzle ORM, daily backups
 ### What's In Progress / Blocked 🚧
 - **Kaikki data pipeline migration** — Stages 1 (extract) and 2 (reverse link) are complete on sample data. Stage 3 (enrich) is being rewritten for the sub-stage architecture. Stages 4–6 have not started.
 - **Guest play** — No try-before-signup flow yet. Auth required for all game routes.
 - **Game session store** — Still in-memory. A Valkey container exists locally but is not wired up yet.
 - **Media ingestion** — Not started. No pipeline for subtitles/lyrics → vocab extraction yet.
 ### The Strategic Gap
 The app is currently a **generic vocabulary quiz**. The media-based practice feature (the differentiator) does not exist yet. It depends on:
 1. Kaikki pipeline reaching production (fixes translation quality)
 2. A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
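The first steps of that prototype (subtitles → text → candidate vocabulary) can be sketched with a minimal SRT parser and a word counter. This is an assumption-laden starting point: it handles plain SRT cues only, counts surface forms without lemmatization, and does no Kaikki sense lookup; `parseSrt` and `candidateVocabulary` are hypothetical names, and the sample cue text in the test is made up. Keeping each word's first cue timestamp is what would later let a quiz point back to the moment in the media.

```typescript
interface Cue {
  start: string; // "HH:MM:SS,mmm"
  end: string;
  text: string;
}

// Minimal SRT parser: blocks separated by blank lines, each with an index, a "start --> end" line, then text.
function parseSrt(srt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = srt.replace(/\r\n/g, "\n").trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timing = lines.findIndex((line) => line.indexOf("-->") !== -1);
    if (timing === -1) continue; // not a cue block
    const [start, end] = lines[timing].split("-->").map((s) => s.trim());
    // Join wrapped lines and drop formatting tags such as <i>...</i>.
    const text = lines.slice(timing + 1).join(" ").replace(/<[^>]+>/g, "").trim();
    if (text.length > 0) cues.push({ start, end, text });
  }
  return cues;
}

interface VocabCandidate {
  word: string;
  count: number;
  firstSeenAt: string; // start time of the first cue containing the word
}

// Surface-form word counts, most frequent first. The Latin-1 letter range covers most lowercase
// en/es/it/de/fr letters (not e.g. French œ); no lemmatization or sense lookup happens here.
function candidateVocabulary(cues: readonly Cue[]): VocabCandidate[] {
  const byWord = new Map<string, VocabCandidate>();
  for (const cue of cues) {
    const words: string[] = cue.text.toLowerCase().match(/[a-zß-öø-ÿ]+/g) ?? [];
    for (const word of words) {
      const seen = byWord.get(word);
      if (seen) seen.count++;
      else byWord.set(word, { word, count: 1, firstSeenAt: cue.start });
    }
  }
  return Array.from(byWord.values()).sort((a, b) => b.count - a.count || a.word.localeCompare(b.word));
}
```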
 ---
 ## Stream 1: Documentation Restructure (Parallel Track)
 **Status:** ✅ Complete. Human-readable branch (README, STATUS, ARCHITECTURE, BACKLOG, DECISIONS, DEPLOYMENT, DATA_PIPELINE, MODEL_STRATEGY, LLM_SETUP, design/GAME_MODES) and AI-context branch (00–06, prompts/meta.md, 99-current-task.md) are live in `documentation/`.
 ---
 ## Stream 2: Feature Roadmap (Three Lanes)
 ### Lane A — Attract & Keep Users
 **Goal:** A user lands on Lila, understands the value in 10 seconds, and completes a satisfying vocabulary practice session in under 2 minutes.
 **Current Reality Check:**
 - Singleplayer and multiplayer quizzes are **already working and deployed**.
 - The app is functional but **not differentiated** — it's a generic vocabulary quiz right now.
 - The "wow" moment requires the **media-based practice feature**, which does not exist yet.
 **Must-Haves for First Users:**
 1. **Guest Play (Zero-Friction Onboarding)** `[in backlog next]`
   - No signup required for first session.
   - Capture email or OAuth only after the user experiences value.
   - Critical for viral loops and investor demos.
   - **Status:** Planned in BACKLOG.md. Not yet implemented.
 2. **One Polished Media Demo** `[not started]`
   - Pick **ONE** piece of media and make it flawless end-to-end: subtitles/lyrics → Kaikki-based vocab extraction with sense-disambiguated translations → playable quiz with timestamps/context.
   - **Language pair:** en→es (biggest market, most content)
   - **Media candidates:** _Breaking Bad S01E01_, a Shakira song, or _Harry Potter and the Sorcerer's Stone Ch. 1_.
   - This is the primary "wow" moment. Differentiates Lila from all other vocabulary apps.
   - **Blocker:** Requires (a) Kaikki pipeline in production, and (b) a media ingestion prototype.
 3. **One Additional Multiplayer Mode** `[design exists, not implemented]`
   - Proves the mode-agnostic lobby architecture works and adds variety beyond the current simultaneous-answer flow.
   - **Recommended first mode:** Race to the Top (target score, no round limit) — simplest to implement, changes only scoring logic.
   - Alternative: TV Quiz Show (buzzer — first to press answers) — most visually distinct, but requires new answer flow.
   - **Status:** Lobby infrastructure is mode-agnostic. Each mode adds game logic only. See `design/GAME_MODES.md` for full designs.
   - **Why it matters:** Duolingo has no multiplayer. Anki has no multiplayer. Real-time modes are a genuine differentiator even without media.
 4. **Social Proof / Shareable Output** `[not started]`
   - Post-game card: "I learned 12 words from _La Tortura_ — can you beat my score?"
   - Image export or copy-paste text for Reddit, Discord, Twitter.
   - This is the organic growth engine.
   - **Blocker:** Requires media demo to exist first.
 **Already Shipped (Don't Rebuild):**
 - ✅ Singleplayer quiz (5 languages, POS/difficulty filters)
 - ✅ Multiplayer lobby + real-time game (2–4 players, simultaneous answers, 15s timer, scoring)
 - ✅ Auth (Google + GitHub)
 - ✅ Live deployment with CI/CD
 **Nice-to-Haves (Post-Launch):**
 - Additional multiplayer modes (Chain Link, Elimination Round, Cooperative Challenge)
 - Leaderboards
 - Spaced repetition review queue
 ---
 ### Lane B — Investor-Ready
 **Goal:** Walk into a pitch with engagement metrics and a defensibility story tied to Lila's unique data pipeline.
 **Checklist:**
 1. **Metrics Instrumentation** `[not started]`
   - Track: DAU/MAU, session length, quiz completion rate, multiplayer match completion rate, Day 1 / Day 7 retention.
   - Tool: PostHog, Mixpanel, or Plausible (self-hosted).
   - Need 4–6 weeks of real-user data.
   - **Note:** The app is live but has no analytics. This is a prerequisite for any investor conversation.
 2. **Growth Mechanic** `[not started]`
   - The shareable card (Lane A.4) must be live and instrumented.
   - Measure k-factor (viral coefficient). Even 0.3 is a story.
   - **Blocker:** Requires media demo.
 3. **Defensibility Story** `[partially true, not yet proven]`
   - **Data moat:** Lila's Kaikki → media mapping pipeline produces sense-disambiguated vocabulary tied to specific media timestamps. Competitors using generic word lists or OpenWordNet-style dumps cannot match the precision.
   - **Current reality:** The Kaikki pipeline exists but is not in production. The media mapping pipeline does not exist yet.
   - **What investors would ask:** "You have a quiz app. Where's the media feature you pitched?"
   - **Requirement:** Media demo + Kaikki production data must be live before investor conversations.
 4. **Monetization Hypothesis** `[deferred to business co-founder]`
   - Not the technical founder's priority right now.
   - Will be owned by business co-founder or advisor after traction.
   - Options to test later: freemium (free curated media, premium for uploads/unlimited multiplayer/stats), B2B schools, affiliate links to streaming/books/music.
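Item 2's k-factor is commonly computed as invites sent per user multiplied by the conversion rate of those invites. A k-factor of 0.3 therefore means every 10 users bring in 3 more. The following is a minimal sketch that assumes each shared post-game card counts as one invite. The `ViralFunnel` shape and the `kFactor` name are hypothetical, not existing Lila code.

```typescript
// Hypothetical counts for one measurement window (e.g. one week).
interface ViralFunnel {
  activeUsers: number; // users who finished at least one game
  invitesSent: number; // shareable cards posted (assumed to equal invites)
  signupsFromInvites: number; // new users attributed to those cards
}

// k = (invites per user) * (signups per invite)
function kFactor(f: ViralFunnel): number {
  if (f.activeUsers <= 0 || f.invitesSent <= 0) return 0;
  const invitesPerUser = f.invitesSent / f.activeUsers;
  const conversionRate = f.signupsFromInvites / f.invitesSent;
  return invitesPerUser * conversionRate;
}

// 100 players share 200 cards; 30 strangers sign up through them.
const k = kFactor({ activeUsers: 100, invitesSent: 200, signupsFromInvites: 30 });
console.log(k.toFixed(2)); // 0.30
```

The product simplifies to `signupsFromInvites / activeUsers`. Keeping the two factors separate shows whether to work on the share rate or on conversion.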
 **Investor Timeline:**
 - **Now → Month 2:** Finish Kaikki pipeline + ship media demo + add metrics.
 - **Month 2–3:** Soft launch to 100 strangers, gather retention data.
 - **Month 3+:** Investor-ready if retention curves look good.
 ---
 ### Lane C — Co-Founder-Ready
 **Goal:** A potential co-founder looks at Lila and thinks, "This person can build, and there's a real product here."
 **Checklist:**
 1. **Clean Codebase + Documentation** `[in progress]`
   - Documentation restructure is complete.
   - README must get a new dev from `git clone` to `docker compose up` in < 5 minutes.
   - **Status:** Docs are done. Code cleanliness is ongoing (BACKLOG.md `next`/`later` items).
 2. **Live Demo with Real Users** `[partially done]`
   - App is live at lilastudy.com with real auth and multiplayer.
   - **Gap:** No real users yet. The current app is a generic quiz — not compelling enough for strangers to stick around.
   - **Requirement:** Media demo must be live before pitching to potential co-founders.
 3. **Clear Vision Doc** `[not written]`
   - 1-page: What Lila is, what it isn't, and the 18-month arc.
   - Include: target languages, target media types, target user persona, and what "success" looks like at 6 / 12 / 18 months.
 **Co-Founder Search: Deferred**
 - **Not needed now.** No savings, no traction, no differentiated product. A business co-founder can't raise money or design monetization from a generic quiz.
 - **Revisit in Month 6+** after media demo + 100 users + retention data.
 - **Exception:** If Reaktor.berlin accepts a solo founder, take it. If they require a team, evaluate then, but don't rush a bad match.
 ---
 ## Stream 3: Building the Startup (Technical Founder Journey)
 ### Phase 0 — Runway Acquisition (Now → June)
 **Goal:** Secure full-time building capacity.
 **Profile:** EU citizen, Berlin-based, no savings, currently employed but being fired on May 28. Eligible for Arbeitslosengeld I (24 months of employment history).
 **Primary Path:**
 1. **Register at Arbeitsamt** — May 28 (immediately after firing)
 2. **Apply for Arbeitslosengeld I** — Same day
 3. **Apply for Gründungszuschuss** — Within 4 weeks of starting self-employment
   - Requires: business plan (1–2 pages), viability check by counselor
   - Provides: 9–12 months basic income (~€1,500–2,000/month)
   - **Best case:** Full-time building, no equity cost, no co-founder needed
   - **Probability:** High (70–80%) — low burn rate, working app, clear technical path
 **Parallel Path:**
 - **Reaktor.berlin** — Already applied. Solo founders are accepted (lower odds). €25K for 6 months, 2.5% equity.
  - If accepted + Gründungszuschuss approved: Choose Gründungszuschuss (no equity, longer runway)
  - If accepted + Gründungszuschuss rejected: Take Reaktor, build solo for 6 months
  - If rejected: Continue with Gründungszuschuss or Arbeitslosengeld I
 **Backup Path:**
 - **Arbeitslosengeld I only** — If Gründungszuschuss rejected. €1,000–1,500/month. Continue building full-time.
 - **Part-time job** — If Arbeitslosengeld I insufficient. Slower progress but sustainable.
 **Not Pursuing:**
 - Full-time job + nights/weekends — Burnout risk, Lila stalls
 - Freelance/consulting — No current skill set or client network, too time-consuming to build
 - EXIST/INVEST/YC/Techstars — Wrong stage, too competitive, too long timeline
 ---
 ### Phase 1 — Differentiate the MVP (Month 1–3)
 **Duration:** 2–3 months  
 **Rule:** Build first, measure second, align third.
 **Assumption:** Gründungszuschuss approved (full-time building). If not, same tasks but slower.
 **Tasks:**
 1. **Finish Kaikki pipeline** (Stage 3–6)
   - Complete enrich sub-stage rewrite
   - Run full sample, validate quality
   - Production sync to PostgreSQL
   - **Timeline:** 2–4 weeks
 2. **Build media ingestion prototype**
   - Pick ONE media piece (Breaking Bad S01E01, Shakira song, or Harry Potter Ch. 1)
   - Pipeline: subtitles/lyrics → text extraction → vocabulary identification → Kaikki sense-matching → quiz generation
   - UI: media selection → quiz with context ("This word appears at 00:04:23")
   - **Language pair:** en→es
   - **Timeline:** 2–4 weeks (parallel with Kaikki pipeline)
 3. **Ship guest play**
   - Make auth optional on game routes
   - "Try without account" button on landing page
   - Capture email/OAuth after first session
   - **Timeline:** 1 week
 4. **Add one additional multiplayer mode**
   - Race to the Top recommended (simplest: target score, no round limit)
   - Changes only scoring logic, reuses existing lobby infrastructure
   - **Timeline:** 1–2 weeks
 5. **Add metrics instrumentation**
   - PostHog or Plausible
   - Track: signups, quiz starts, completions, multiplayer matches, retention
   - **Timeline:** 1 week
 6. **Soft launch to 100 strangers**
   - Reddit (r/languagelearning, r/Anki, r/Refold), language-learning Discords, Hacker News (Show HN)
   - Collect qualitative feedback
   - **Timeline:** 1 week (after media demo is live)
 ---
 ### Phase 2 — Validate & Measure (Month 3–4)
 **Goal:** Prove that the media feature resonates and that retention curves exist.
 **Tasks:**
 - Analyze metrics: Do users who try media-based practice return more than singleplayer-only users?
 - Iterate on media selection and quiz UX based on feedback
 - Polish shareable output (social cards)
 - Fix hardening items from BACKLOG.md Phase 7
 **Decision gate:** If 100 users show positive retention signals (Day 1 > 30%, Day 7 > 10%), proceed to Phase 3. If not, iterate on media feature or pivot.
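The Day 1 / Day 7 thresholds are read here as classic (exact-day) retention: the share of a signup cohort that is active again exactly N days after signing up. The following is a minimal sketch under that assumption. The `UserActivity` shape is hypothetical, and analytics tools also offer other definitions, such as active on or after day N.

```typescript
// Hypothetical activity record: day indices relative to a fixed epoch.
interface UserActivity {
  signupDay: number;
  activeDays: Set<number>;
}

// Exact-day retention: fraction of the cohort active on signupDay + n.
function dayNRetention(cohort: UserActivity[], n: number): number {
  if (cohort.length === 0) return 0;
  const retained = cohort.filter((u) => u.activeDays.has(u.signupDay + n));
  return retained.length / cohort.length;
}

// Phase 2 decision gate: Day 1 > 30% and Day 7 > 10%.
function passesGate(cohort: UserActivity[]): boolean {
  return dayNRetention(cohort, 1) > 0.3 && dayNRetention(cohort, 7) > 0.1;
}

const cohort: UserActivity[] = [
  { signupDay: 0, activeDays: new Set([0, 1, 7]) },
  { signupDay: 0, activeDays: new Set([0, 1]) },
  { signupDay: 2, activeDays: new Set([2]) },
  { signupDay: 2, activeDays: new Set([2, 3]) },
];
console.log(dayNRetention(cohort, 1)); // 0.75
console.log(dayNRetention(cohort, 7)); // 0.25
console.log(passesGate(cohort)); // true
```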
 ---
 ### Phase 3 — Funding or Revenue (Month 4–6)
 **Goal:** Secure runway beyond Gründungszuschuss/Arbeitslosengeld I.
 **If metrics are positive:**
 - Apply to accelerators: Reaktor.berlin (next batch), Y Combinator, Techstars Berlin
 - Angel outreach: Berlin ed-tech angels, former founders in language learning
 - German public funding: EXIST (federal program, requires a university partnership), INVEST (federal grant paid to angel investors, not Berlin-specific)
 **If metrics are weak:**
 - Iterate on media feature or pivot value proposition
 - Consider B2B angle (schools, language institutes) if consumer traction is low
 - Part-time work to extend runway while iterating
 **Co-founder search:** Not a priority. If funding requires a team (some accelerators), evaluate then. Otherwise, hire contractors or employees for specific gaps.
 ---
 ### Phase 4 — Scale (Month 6+)
 **Goal:** Grow user base, build team, secure Series A or sustainable revenue.
 **Only relevant if Phase 3 succeeds.** Otherwise, continue iterating.
 ---
 ## Suggested Execution Order
 ### Month 0 (Now → May 28)
 - **Week 1:** Finish Kaikki Stage 3 enrich sub-stage rewrite. Run full sample, validate quality.
 - **Week 2:** Register at Arbeitsamt (May 28). Apply for Arbeitslosengeld I. Ask about Gründungszuschuss.
 ### Month 1 (June)
 - **Week 1–2:** Submit Gründungszuschuss application. Prepare business plan (1–2 pages).
 - **Week 3:** Start media ingestion prototype (parallel). Pick one media piece, get text extraction working.
 - **Week 4:** Continue Kaikki pipeline + media prototype.
 ### Month 2 (July)
 - **Week 1–2:** Complete media ingestion prototype. End-to-end: media → quiz.
 - **Week 3:** Ship guest play. Add additional multiplayer mode (Race to the Top).
 - **Week 4:** Add metrics (PostHog/Plausible). Polish shareable output.
 ### Month 3 (August)
 - **Week 1:** Soft launch to 100 strangers. Gather feedback.
 - **Week 2–3:** Iterate based on feedback. Fix hardening items from BACKLOG.md.
 - **Week 4:** Analyze metrics. Decision gate: proceed or iterate?
 ### Month 4–6 (September–November)
 - If metrics positive: Apply to accelerators, angels, grants.
 - If metrics weak: Iterate or pivot.
 - If Gründungszuschuss expires: Transition to Arbeitslosengeld I, part-time work, or funding.
 ---
 ## Open Questions (Answered)
 ### Product Reality Check
 - [x] What actually works today? — Singleplayer quiz, multiplayer, auth, deployment
 - [x] What is broken or placeholder? — Kaikki pipeline Stage 3, guest play, in-memory session store
 - [x] What language pairs are supported? — en↔it/de/es/fr
 - [x] Exact blocker on Kaikki Stage 3? — No blocker, work in progress
 ### Target Audience
 - [x] Who is the ideal first user? — Immersion learner (Netflix watcher) + social learner
 - [x] What languages for launch? — en→es for media demo (biggest market), all 5 languages live
 ### Business Model
 - [x] Monetization hypothesis? — **Deferred** to business co-founder or advisor after traction
 - [x] Unit economics? — **Deferred** until product-market fit
 ### Competitive Landscape
 - [x] Direct competitors? — Duolingo, Anki, LingQ, FluentU, Quizlet
 - [x] What they do poorly? — No real media integration, no real-time multiplayer, no sense-disambiguated translations
 ### Runway & Constraints
 - [x] Full-time or nights/weekends? — Transitioning to full-time via Gründungszuschuss
 - [x] Funding/savings? — No savings. EU citizen, Berlin-based, eligible for Arbeitslosengeld I
 - [x] Hard deadline? — None. Self-paced. Gründungszuschuss application deadline: within 4 weeks of May 28
 ### Co-Founder Search
 - [x] Local or remote? — Berlin-based, but open
 - [x] What do they do? — **Not needed now.** Revisit Month 6+ after traction
 - [x] Known candidates? — None. Starting from zero
 - [x] Equity mindset? — **Deferred** until co-founder search begins
 ---
 _End of Lila feature & startup strategy doc._
--- a/documentation/STATUS.md
+++ b/documentation/STATUS.md
@ -0,0 +1,46 @@
 # Status — 2026-05-15
 > Last updated: 2026-05-15. Update this file after every deploy or when switching tasks.
 ## What Works Today ✅
 - **Singleplayer quiz** — Duolingo-style, 5 languages (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
 - **Multiplayer** — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
 - **Auth** — Google + GitHub via Better Auth, cross-subdomain cookies, session middleware on protected routes
 - **Deployment** — Live at lilastudy.com, Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
 - **Database** — PostgreSQL with Drizzle ORM, daily backups, idempotent seeding
 ## What's Broken / Blocked 🚧
 - **Data quality** — Production still uses OpenWordNet/OMW translations. Kaikki pipeline (sense-disambiguated) is in progress but not yet synced to production.
 - **Guest play** — Auth is required for all game routes. No try-before-signup flow.
 - **Game session store** — Still in-memory (`InMemoryGameSessionStore`). Valkey container exists in local dev but not wired up.
 - **Rate limiting** — Partially implemented on auth endpoints; game endpoints not yet covered.
 - **React error boundaries** — Not implemented; runtime crashes take down the whole app.
 - **Monitoring** — No uptime alerts or centralized logging on the VPS.
 ## What I'm Working On Now 🔄
 **Primary:** Rewriting the Kaikki data pipeline enrich script for sub-stage architecture (round1_gloss → round1_example → round1_translations → round1_cefr).
 **Secondary:** Phase 7 hardening backlog items (see BACKLOG.md `next` section).
 ## Next 2-Week Goal 🎯
 Finish Kaikki Stage 3 (enrich) sub-stage rewrite → run full sample → compare quality → decide on production sync timeline.
 ## The Big Picture
 Lila is a **deployed, working vocabulary quiz app**. The core loop (singleplayer + multiplayer) is solid. The next strategic milestone is **media-based practice** (learn vocab from a song/TV episode/book chapter), but that depends on:
 1. Kaikki data pipeline reaching production (fixes translation quality)
 2. A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
 Until then, the app is a generic vocabulary quiz — functional but not differentiated.
 ## Quick Links
 - [BACKLOG.md](BACKLOG.md) — Prioritized tasks (`now` / `next` / `later`)
 - [DATA_PIPELINE.md](DATA_PIPELINE.md) — Pipeline stages and current progress
 - [DEPLOYMENT.md](DEPLOYMENT.md) — Infrastructure ops
--- a/documentation/ai-context/00-project-overview.md
+++ b/documentation/ai-context/00-project-overview.md
@ -0,0 +1,116 @@
 # 00 — Project Overview
 > **Purpose:** Give any LLM instant context on what Lila is, what makes it different, and what's currently built vs. planned. Concatenate this file with domain-specific files (01–06) and 99-current-task.md before handing to an LLM.
 > **Last updated:** 2026-05-15
 > **Depends on:** Nothing (this is the entry point)
 ---
 ## What Lila Is
 Lila is a vocabulary learning app with two core differentiators:
 1. **Media-based practice** — Users learn vocabulary extracted from real media they love: a Shakira song, the first chapter of _Harry Potter_, an episode of _Breaking Bad_. The app extracts vocabulary from subtitles/lyrics/text and turns it into quiz questions.
 2. **Multiplayer modes** — Users practice vocabulary together or competitively in real-time sessions (2–4 players, simultaneous answers, live scoring).
 The core learning loop is Duolingo-style: a word appears in one language, the user picks the correct translation from four choices.
 Live at [lilastudy.com](https://lilastudy.com).
 ---
 ## Current State (2026-05-15)
 ### What Works Today
 - **Singleplayer quiz** — 5 languages (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
 - **Multiplayer** — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
 - **Auth** — Google + GitHub via Better Auth
 - **Deployment** — Live on Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
 - **Database** — PostgreSQL with Drizzle ORM, daily backups
 ### What's In Progress / Blocked
 - **Kaikki data pipeline migration** — Replacing OpenWordNet/OMW with sense-disambiguated Kaikki data. Stage 1 (extract) and Stage 2 (reverse link) complete on sample data. Stage 3 (enrich) being rewritten for sub-stage architecture.
 - **Guest play** — No try-before-signup flow yet. Auth required for all game routes.
 - **Game session store** — Still in-memory. Valkey container exists locally but not wired up.
 - **Media ingestion** — Not started. No pipeline for subtitles/lyrics → vocab extraction yet.
 ### The Strategic Gap
 The app is currently a **generic vocabulary quiz**. The media-based practice feature (the differentiator) does not exist yet. It depends on:
 1. Kaikki pipeline reaching production (fixes translation quality)
 2. A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
 ---
 ## Tech Stack
 | Layer         | Technology                                                     |
 | ------------- | -------------------------------------------------------------- |
 | Monorepo      | pnpm workspaces                                                |
 | Frontend      | React 18, Vite, TanStack Router, TanStack Query, Tailwind CSS  |
 | Backend       | Node.js, Express, TypeScript, WebSockets (`ws` library)        |
 | Database      | PostgreSQL + Drizzle ORM                                       |
 | Auth          | Better Auth (Google + GitHub)                                  |
 | Validation    | Zod (shared between frontend and backend in `packages/shared`) |
 | Testing       | Vitest, supertest                                              |
 | Deployment    | Docker Compose, Caddy, Hetzner VPS                             |
 | CI/CD         | Forgejo Actions                                                |
 | Data Pipeline | Kaikki (Wiktionary) → SQLite (`pipeline.db`) → PostgreSQL      |
 ---
 ## Repository Structure
 ```
 lila/
 ├── apps/
 │   ├── api/              — Express backend (HTTP + WebSocket)
 │   └── web/              — React frontend (Vite, TanStack Router)
 ├── packages/
 │   ├── shared/           — Zod schemas + constants (API/web contract)
 │   └── db/               — Drizzle schema, migrations, models, seeding
 ├── data-pipeline/        — Kaikki extraction → enrichment → PostgreSQL sync
 └── documentation/        — Project docs (human + AI-context branches)
 ```
 **Key rule:** `packages/shared` is the single source of truth for all data shapes crossing the API boundary. Both frontend and backend import from it. If a schema changes, TypeScript compilation fails in both places simultaneously.
 ---
 ## Key Architecture Principles
 1. **Layered architecture** — Router → Controller → Service → Model → Database. Each layer only talks to the layer below it.
 2. **Server-side answer evaluation** — The correct answer is never sent to the frontend. All evaluation happens server-side.
 3. **Zod discriminated unions for WebSockets** — All WS messages are typed via Zod schemas in `packages/shared`. The router switches on the `type` field.
 4. **GameSessionStore abstraction** — Session state is stored through an interface (`InMemoryGameSessionStore` now, `ValkeyGameSessionStore` planned).
 5. **Language-neutral data model** — `terms` are concepts; `translations` are per-language words. Adding a language requires no schema changes.
 ---
 ## Key Decisions (Summary)
 | Topic       | Decision                    | Why                                           |
 | ----------- | --------------------------- | --------------------------------------------- |
 | ORM         | Drizzle, not Prisma         | No binary, no engine, closer to SQL           |
 | WebSocket   | `ws` library, not Socket.io | 2–4 players, explicit Zod protocol sufficient |
 | Auth        | Better Auth, not Keycloak   | Embedded middleware, no separate service      |
 | Answer eval | Server-side only            | Correct answer never sent to frontend         |
 | Data source | Kaikki, not OMW             | Sense-disambiguated translations              |
 ---
 ## Further Reading (AI-Context Files)
 | File                                                 | What it covers                                               |
 | ---------------------------------------------------- | ------------------------------------------------------------ |
 | [01-architecture.md](01-architecture.md)             | Monorepo structure, layered architecture, data flow diagrams |
 | [02-data-model.md](02-data-model.md)                 | Database schema, tables, relationships, constraints          |
 | [03-api-contract.md](03-api-contract.md)             | REST endpoints, request/response schemas, Zod types          |
 | [04-websocket-protocol.md](04-websocket-protocol.md) | WS message types, game flow, auth, state management          |
 | [05-data-pipeline.md](05-data-pipeline.md)           | Kaikki pipeline stages, enrich sub-stages, sync              |
 | [06-deployment.md](06-deployment.md)                 | Docker, Caddy, CI/CD, backups                                |
 | [prompts/meta.md](prompts/meta.md)                   | How to work with LLMs on this codebase                       |
 | [99-current-task.md](99-current-task.md)             | Template: fill this out before giving a task to an LLM       |
--- a/documentation/ai-context/01-architecture.md
+++ b/documentation/ai-context/01-architecture.md
@ -0,0 +1,156 @@
 # 01 — Architecture
 > **Purpose:** Give an LLM the structural context needed to navigate the codebase and understand data flow. Concatenate with 00-project-overview.md and 99-current-task.md.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md
 ---
 ## Monorepo Boundaries
 ```
 lila/
 ├── apps/
 │   ├── api/              — Express backend: routers, controllers, services, WS handlers
 │   └── web/              — React frontend: routes, components, hooks, client state
 ├── packages/
 │   ├── shared/           — Zod schemas, constants, derived types. THE CONTRACT.
 │   └── db/               — Drizzle schema, migrations, models (termModel, lobbyModel), seeding
 ├── data-pipeline/        — Kaikki extraction → enrichment → sync to PostgreSQL
 └── documentation/        — Human docs + ai-context/
 ```
 **Critical rule:** `apps/api` never imports `drizzle-orm` for queries. It only calls functions exported from `packages/db`. All database code lives in `packages/db`.
 ---
 ## Layered Architecture (HTTP)
 ```
 HTTP Request
     ↓
  Router        — maps URL + method to controller (Express Router)
     ↓
 Controller     — validates input (Zod safeParse), calls service, sends response
                  or next(error) for errorHandler middleware
     ↓
  Service       — business logic only. No HTTP, no direct DB access.
                  Calls model functions from packages/db.
     ↓
  Model         — database queries only. No business logic.
                  Lives in packages/db/src/models/
     ↓
  Database      — PostgreSQL via Drizzle ORM
 ```
 **Error flow:** Controller throws `ValidationError` (400) or `NotFoundError` (404) → caught by `errorHandler` middleware in `app.ts` → mapped to HTTP status. Unknown errors → 500.
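The following is a minimal sketch of the status mapping described above. The class names `ValidationError` and `NotFoundError` come from this doc. Their fields and the `toHttpError` helper are assumptions. The sketch is framework-free so it runs on its own.

```typescript
// Error classes named in this doc; their shape here is an assumption.
class ValidationError extends Error {
  readonly status = 400;
}
class NotFoundError extends Error {
  readonly status = 404;
}

interface HttpErrorResponse {
  status: number;
  body: { error: string };
}

// Known errors keep their status and message; anything else becomes a
// generic 500 so internal details are not sent to the client.
function toHttpError(err: unknown): HttpErrorResponse {
  if (err instanceof ValidationError || err instanceof NotFoundError) {
    return { status: err.status, body: { error: err.message } };
  }
  return { status: 500, body: { error: "Internal server error" } };
}

console.log(toHttpError(new ValidationError("rounds must be 3 or 10")).status); // 400
console.log(toHttpError(new NotFoundError("session not found")).status); // 404
console.log(toHttpError(new Error("connection refused")).status); // 500
```

In Express, an error-handling middleware takes four arguments (`err, req, res, next`). It would call `toHttpError(err)` and then send the result with `res.status(status).json(body)`.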
 ---
 ## WebSocket Architecture
 The WS server attaches to the same Express HTTP server and handles upgrades on the `/ws` path.
 ```
 WS Connection Upgrade
     ↓
 Auth middleware — validates Better Auth session from cookie on upgrade
     ↓
 Message Router — dispatches by `type` field (Zod discriminated union)
     ↓
 Handler (lobby or game) — business logic, broadcasts state to room
     ↓
 In-memory stores (lobby game state, game session state)
 ```
 **Message protocol:** All WS messages validated against Zod schemas in `packages/shared/src/schemas/lobby.ts` and `packages/shared/src/schemas/game.ts`. Router switches on `type` field.
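The following is a minimal sketch of the dispatch step. It uses a plain TypeScript discriminated union so it runs without dependencies. In the codebase, the union would be derived from the Zod schemas, for example via `z.discriminatedUnion("type", [...])` and `z.infer`. The payload fields shown here are assumptions.

```typescript
// Hypothetical message shapes; the real ones are Zod schemas in
// packages/shared/src/schemas/lobby.ts and game.ts.
type ClientMessage =
  | { type: "lobby:join"; code: string }
  | { type: "lobby:start"; code: string }
  | { type: "game:answer"; questionId: string; optionId: number };

const KNOWN_TYPES = new Set<string>(["lobby:join", "lobby:start", "game:answer"]);

// Stand-in for schema.safeParse: checks only the `type` discriminator,
// whereas a Zod schema would also validate every payload field.
function parseMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const type = (data as { type?: unknown }).type;
  return typeof type === "string" && KNOWN_TYPES.has(type) ? (data as ClientMessage) : null;
}

// Switching on `type` narrows each branch to its payload; the `never`
// assignment makes a forgotten case a compile-time error.
function route(msg: ClientMessage): string {
  switch (msg.type) {
    case "lobby:join":
      return `join lobby ${msg.code}`;
    case "lobby:start":
      return `start game in ${msg.code}`;
    case "game:answer":
      return `answer ${msg.optionId} to ${msg.questionId}`;
    default: {
      const unhandled: never = msg;
      return unhandled;
    }
  }
}

const msg = parseMessage('{"type":"lobby:join","code":"WOLF-42"}');
if (msg) console.log(route(msg)); // join lobby WOLF-42
```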
 **State storage:**
 - Lobby membership → PostgreSQL (`lobbies`, `lobby_players` tables) — durable
 - Game/room state → in-memory (`InMemoryLobbyGameStore`, `InMemoryGameSessionStore`) — ephemeral, lost on restart. Valkey migration planned.
 ---
 ## Data Flow: Singleplayer Quiz
 ```
 POST /api/v1/game/start (GameRequestSchema)
     ↓
 Controller validates → Service.createGameSession
     ↓
 termModel.getGameTerms(filters) + termModel.getDistractors(filters)
     ↓
 Service shuffles options, stores session in GameSessionStore
     ↓
 Returns GameSession { sessionId, questions[] }
     ↓
 [frontend] User selects option → confirms → POST /api/v1/game/answer
     ↓
 Service evaluates server-side (correct answer NEVER sent to frontend)
     ↓
 Returns AnswerResult { isCorrect, correctOptionId, selectedOptionId }
 ```
 **Key design:** Correct answer is stored server-side only (in GameSessionStore). Frontend only sees `optionId` (0–3) and `text`. Prevents cheating.
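The following is a minimal sketch of that split. It assumes each question has one correct translation plus three distractors. The names `StoredQuestion`, `buildQuestion`, `toClient`, and `evaluate` are illustrative, not the actual `gameService` API. Only `optionId`, `text`, and the `AnswerResult` fields come from this doc.

```typescript
interface ClientOption {
  optionId: number; // 0-3
  text: string;
}

// Full question as kept server-side (e.g. inside GameSessionStore).
interface StoredQuestion {
  prompt: string;
  options: ClientOption[];
  correctOptionId: number;
}

// What the frontend receives: the same question minus the answer.
type ClientQuestion = Omit<StoredQuestion, "correctOptionId">;

interface AnswerResult {
  isCorrect: boolean;
  correctOptionId: number;
  selectedOptionId: number;
}

// Fisher-Yates shuffle of the correct word plus distractors.
// `random` is injectable so tests can be deterministic.
function buildQuestion(
  prompt: string,
  correct: string,
  distractors: string[],
  random: () => number = Math.random,
): StoredQuestion {
  const items = [
    { text: correct, isCorrect: true },
    ...distractors.map((text) => ({ text, isCorrect: false })),
  ];
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return {
    prompt,
    options: items.map((item, optionId) => ({ optionId, text: item.text })),
    correctOptionId: items.findIndex((item) => item.isCorrect),
  };
}

// Builds the client payload explicitly so correctOptionId can never leak.
function toClient(q: StoredQuestion): ClientQuestion {
  return { prompt: q.prompt, options: q.options };
}

function evaluate(q: StoredQuestion, selectedOptionId: number): AnswerResult {
  return {
    isCorrect: selectedOptionId === q.correctOptionId,
    correctOptionId: q.correctOptionId,
    selectedOptionId,
  };
}

const q = buildQuestion("cat", "gato", ["perro", "casa", "libro"]);
console.log(JSON.stringify(toClient(q))); // contains no correctOptionId field
console.log(evaluate(q, q.correctOptionId).isCorrect); // true
```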
 ---
 ## Data Flow: Multiplayer Game
 ```
 Host creates lobby → POST /api/v1/lobbies → returns room code (e.g. WOLF-42)
     ↓
 Players join via code → POST /api/v1/lobbies/:code/join
     ↓
 All players WS connect → send lobby:join with room code
     ↓
 Server broadcasts lobby:state (player list) to all in room
     ↓
 Host clicks "Start" → WS lobby:start
     ↓
 MultiplayerGameService generates questions, broadcasts game:question
     ↓
 Players submit answers via WS game:answer within 15s server timer
     ↓
 On all-answered or timeout → evaluate, broadcast game:answer_result
     ↓
 After N rounds → broadcast game:finished with final scores
 ```
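The "all-answered or timeout" step can be sketched as a per-question object that resolves exactly once, whichever happens first. This is an illustration under assumptions, not `MultiplayerGameService` itself. The `Round` class and its callback are hypothetical, and scoring and broadcasting are left to the `onResolve` callback.

```typescript
type OnResolve = (answers: ReadonlyMap<string, number>) => void;

// One question's answer window: resolves once, either when every player
// has answered or when the server timer fires.
class Round {
  private readonly answers = new Map<string, number>(); // playerId -> optionId
  private done = false;
  private readonly timer: ReturnType<typeof setTimeout>;

  constructor(
    private readonly playerIds: string[],
    private readonly onResolve: OnResolve,
    timeoutMs = 15_000,
  ) {
    this.timer = setTimeout(() => this.resolve(), timeoutMs);
  }

  submit(playerId: string, optionId: number): void {
    // Ignore late answers, unknown players, and repeat submissions.
    if (this.done || !this.playerIds.includes(playerId) || this.answers.has(playerId)) return;
    this.answers.set(playerId, optionId);
    if (this.answers.size === this.playerIds.length) this.resolve();
  }

  private resolve(): void {
    if (this.done) return;
    this.done = true;
    clearTimeout(this.timer);
    this.onResolve(this.answers);
  }
}

const round = new Round(["ana", "ben"], (answers) => {
  console.log(`resolved with ${answers.size} answers`); // resolved with 2 answers
});
round.submit("ana", 2);
round.submit("ben", 1); // last player answered: resolves now and clears the timer
```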
 ---
 ## GameSessionStore Abstraction
 ```typescript
 // Interface defined in apps/api; the GameSession type comes from
 // packages/shared/src/schemas/game.ts.
 interface GameSessionStore {
   createSession(session: GameSession): Promise<void>;
   getSession(sessionId: string): Promise<GameSession | null>;
   // ...
 }
 ```
 **Current:** `InMemoryGameSessionStore` — Map-based, process memory, lost on restart.
 **Planned:** `ValkeyGameSessionStore` — Redis-compatible, persists across restarts.
 Same pattern for `LobbyGameStore`.
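The following is a minimal sketch of a Map-based store behind this interface. `GameSession` is reduced to a hypothetical two-field shape, and only the two methods shown above are implemented. Callers already `await` both methods, so a Valkey-backed class can replace this one without any changes to the callers.

```typescript
// Hypothetical, simplified session shape for illustration only.
interface GameSession {
  sessionId: string;
  questionIds: string[];
}

interface GameSessionStore {
  createSession(session: GameSession): Promise<void>;
  getSession(sessionId: string): Promise<GameSession | null>;
}

// Process-memory store: fast, but every session is lost on restart.
class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, GameSession>();

  async createSession(session: GameSession): Promise<void> {
    this.sessions.set(session.sessionId, session);
  }

  async getSession(sessionId: string): Promise<GameSession | null> {
    return this.sessions.get(sessionId) ?? null;
  }
}

// Code written against the interface works with any implementation.
async function demo(store: GameSessionStore): Promise<void> {
  await store.createSession({ sessionId: "s1", questionIds: ["q1", "q2"] });
  console.log((await store.getSession("s1"))?.questionIds.length); // 2
  console.log(await store.getSession("missing")); // null
}

void demo(new InMemoryGameSessionStore());
```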
 ---
 ## Key Files by Concern
 | Concern         | Key Files                                                                              |
 | --------------- | -------------------------------------------------------------------------------------- |
 | HTTP routing    | `apps/api/src/routes/apiRouter.ts`, `gameRouter.ts`, `lobbyRouter.ts`                  |
 | Controllers     | `apps/api/src/controllers/gameController.ts`, `lobbyController.ts`                     |
 | Services        | `apps/api/src/services/gameService.ts`, `multiplayerGameService.ts`, `lobbyService.ts` |
 | Models          | `packages/db/src/models/termModel.ts`, `lobbyModel.ts`                                 |
 | WS handlers     | `apps/api/src/ws/handlers/gameHandlers.ts`, `lobbyHandlers.ts`                         |
 | WS router       | `apps/api/src/ws/router.ts`                                                            |
 | WS auth         | `apps/api/src/ws/auth.ts`                                                              |
 | Shared schemas  | `packages/shared/src/schemas/game.ts`, `lobby.ts`, `auth.ts`                           |
 | Constants       | `packages/shared/src/constants.ts`                                                     |
 | DB schema       | `packages/db/src/db/schema.ts`                                                         |
 | Auth config     | `apps/api/src/lib/auth.ts`                                                             |
 | Auth middleware | `apps/api/src/middleware/authMiddleware.ts`                                            |
--- a/documentation/ai-context/02-data-model.md
+++ b/documentation/ai-context/02-data-model.md
@ -0,0 +1,221 @@
 # 02 — Data Model
 > **Purpose:** Database schema reference for LLMs working on features that query or modify data. Concatenate with 00-project-overview.md and 99-current-task.md.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md
 ---
 ## Core Tables
 ### `terms` — Language-neutral concepts
 | Column       | Type      | Constraints                                  | Notes                                                  |
 | ------------ | --------- | -------------------------------------------- | ------------------------------------------------------ |
 | `id`         | uuid      | PK                                           |                                                        |
 | `pos`        | varchar   | CHECK: `noun`, `verb`, `adjective`, `adverb` | Part of speech                                         |
 | `source`     | varchar   |                                              | Pipeline that created this term (e.g. `kaikki`, `omw`) |
 | `source_id`  | varchar   | UNIQUE(`source`, `source_id`)                | Idempotency key for imports                            |
 | `synset_id`  | varchar   | nullable                                     | WordNet synset ID. Nullable for non-WordNet terms.     |
 | `created_at` | timestamp | default now()                                |                                                        |
 **Rule:** One row per concept. The word "cat" (animal) and "cat" (nautical) are different concepts, so they are separate rows with different `source_id` values.
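The `UNIQUE(source, source_id)` key is what makes re-running an import safe. The following is a minimal in-memory sketch of that behaviour, standing in for an `INSERT ... ON CONFLICT DO NOTHING` against PostgreSQL. The `source_id` values and the `importTerms` helper are made up for illustration.

```typescript
type Pos = "noun" | "verb" | "adjective" | "adverb";

interface TermRow {
  source: string;
  sourceId: string;
  pos: Pos;
}

// Composite key mirroring UNIQUE(source, source_id).
const keyOf = (t: TermRow): string => `${t.source}\u0000${t.sourceId}`;

// Inserts rows whose key is new; returns how many were actually inserted.
function importTerms(table: Map<string, TermRow>, rows: TermRow[]): number {
  let inserted = 0;
  for (const row of rows) {
    const key = keyOf(row);
    if (!table.has(key)) {
      table.set(key, row);
      inserted++;
    }
  }
  return inserted;
}

const table = new Map<string, TermRow>();
const batch: TermRow[] = [
  { source: "kaikki", sourceId: "cat-noun-animal", pos: "noun" },
  { source: "kaikki", sourceId: "cat-noun-nautical", pos: "noun" },
];
console.log(importTerms(table, batch)); // 2
console.log(importTerms(table, batch)); // 0, re-running the import is a no-op
```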
 ---
 ### `translations` — Per-language words
 | Column          | Type       | Constraints                         | Notes                                    |
 | --------------- | ---------- | ----------------------------------- | ---------------------------------------- |
 | `id`            | uuid       | PK                                  |                                          |
 | `term_id`       | uuid       | FK → terms.id                       |                                          |
 | `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` |                                          |
 | `text`          | varchar    |                                     | The actual word                          |
 | `cefr_level`    | varchar(2) | nullable, CHECK: `A1`–`C2`          | Difficulty of THIS word in THIS language |
 | `created_at`    | timestamp  | default now()                       |                                          |
 **Unique constraint:** (`term_id`, `language_code`, `text`) — allows synonyms (e.g. "dog" and "hound" for same term), prevents exact duplicates.
 **Key design:** `cefr_level` is on `translations`, not `terms`. "House" in English is A1; "domicile" is also English but B2 — same concept, different words, different difficulty.
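Because difficulty sits on the word, a difficulty filter selects translations rather than terms. The following is a minimal sketch that assumes a "maximum level" filter and excludes words with a `null` level. The real quiz filter may behave differently, and the `atOrBelow` name is hypothetical.

```typescript
const CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"] as const;
type Cefr = (typeof CEFR_LEVELS)[number];

interface Translation {
  termId: string;
  languageCode: "en" | "it" | "de" | "es" | "fr";
  text: string;
  cefrLevel: Cefr | null; // nullable in the schema
}

// Keeps translations in `lang` at or below `maxLevel`; unrated words are excluded.
function atOrBelow(rows: Translation[], lang: Translation["languageCode"], maxLevel: Cefr): Translation[] {
  const max = CEFR_LEVELS.indexOf(maxLevel);
  return rows.filter(
    (r) => r.languageCode === lang && r.cefrLevel !== null && CEFR_LEVELS.indexOf(r.cefrLevel) <= max,
  );
}

// Same term (same termId), two English words, two difficulties.
const rows: Translation[] = [
  { termId: "t-house", languageCode: "en", text: "house", cefrLevel: "A1" },
  { termId: "t-house", languageCode: "en", text: "domicile", cefrLevel: "B2" },
];
console.log(atOrBelow(rows, "en", "A2").map((r) => r.text)); // [ 'house' ]
```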
 ---
 ### `term_glosses` — Definitions per language
 | Column          | Type       | Constraints                         | Notes                  |
 | --------------- | ---------- | ----------------------------------- | ---------------------- |
 | `id`            | uuid       | PK                                  |                        |
 | `term_id`       | uuid       | FK → terms.id                       |                        |
 | `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` |                        |
 | `text`          | text       |                                     | Definition/explanation |
 | `created_at`    | timestamp  | default now()                       |                        |
 **Unique constraint:** (`term_id`, `language_code`) — one gloss per term per language. Prevents left joins from multiplying question rows.
 **Note:** Italian gloss coverage is sparse (~2% of terms have Italian glosses). UI falls back to English gloss when no gloss exists for the user's language.
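The fallback rule can be sketched as a lookup by language, which the `(term_id, language_code)` uniqueness makes unambiguous. The following is a minimal sketch. The `pickGloss` name and the `null` result when no English gloss exists either are assumptions.

```typescript
type Lang = "en" | "it" | "de" | "es" | "fr";

interface Gloss {
  termId: string;
  languageCode: Lang;
  text: string;
}

// Returns the gloss in the user's language, else the English gloss, else null.
function pickGloss(glosses: Gloss[], termId: string, userLang: Lang): Gloss | null {
  const forTerm = glosses.filter((g) => g.termId === termId);
  return (
    forTerm.find((g) => g.languageCode === userLang) ??
    forTerm.find((g) => g.languageCode === "en") ??
    null
  );
}

const glosses: Gloss[] = [
  { termId: "t1", languageCode: "en", text: "a small domesticated feline" },
  { termId: "t1", languageCode: "de", text: "kleine Hauskatze" },
];
console.log(pickGloss(glosses, "t1", "de")?.text); // kleine Hauskatze
// No Italian gloss exists, so the English one is returned:
console.log(pickGloss(glosses, "t1", "it")?.text); // a small domesticated feline
```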
 ---
 ### `decks` — Curated wordlists
 | Column                | Type         | Constraints                                       | Notes                                                   |
 | --------------------- | ------------ | ------------------------------------------------- | ------------------------------------------------------- |
 | `id`                  | uuid         | PK                                                |                                                         |
 | `name`                | varchar      |                                                   | e.g. `en-core-1000`                                     |
 | `source_language`     | varchar(2)   | CHECK                                             | Language the wordlist was built from                    |
 | `validated_languages` | varchar(2)[] | CHECK: source_language NOT IN validated_languages | Languages with complete translations for all deck terms |
 | `description`         | text         | nullable                                          |                                                         |
 | `created_at`          | timestamp    | default now()                                     |                                                         |
 **Design:** One deck per frequency tier per source language. POS, difficulty, and category are query filters, not separate decks. Decks must not overlap — each term appears in exactly one tier.
 **Source:** SUBTLEX frequency lists (per-language editions, same methodology).
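Below is a minimal sketch of a check for the no-overlap rule. It assumes the rule applies within one source language, since a term can presumably appear in both an `en` deck and a `de` deck. The row shape and `findOverlappingTerms` are hypothetical. In SQL, the same check would be a `GROUP BY` on `deck_terms` joined to `decks`, with `HAVING COUNT(*) > 1`.

```typescript
// Shape of one row from deck_terms joined with decks (hypothetical field names).
interface DeckTermRow {
  deckId: string;
  sourceLanguage: string;
  termId: string;
}

// Returns "sourceLanguage:termId" keys that appear in more than one deck.
function findOverlappingTerms(rows: DeckTermRow[]): string[] {
  const decksByKey = new Map<string, Set<string>>();
  for (const row of rows) {
    const key = `${row.sourceLanguage}:${row.termId}`;
    const decks = decksByKey.get(key) ?? new Set<string>();
    decks.add(row.deckId);
    decksByKey.set(key, decks);
  }
  const overlaps: string[] = [];
  decksByKey.forEach((decks, key) => {
    if (decks.size > 1) overlaps.push(key);
  });
  return overlaps.sort();
}
```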
 ---
 ### `deck_terms` — Junction table
 | Column       | Type      | Constraints   | Notes |
 | ------------ | --------- | ------------- | ----- |
 | `deck_id`    | uuid      | FK → decks.id |       |
 | `term_id`    | uuid      | FK → terms.id |       |
 | `created_at` | timestamp | default now() |       |
 **PK:** (`deck_id`, `term_id`)
 ---
 ## Auth Tables (managed by Better Auth)
 Better Auth creates and owns these tables. Do not modify directly.
 ### `user`
 | Column           | Type      | Notes                |
 | ---------------- | --------- | -------------------- |
 | `id`             | varchar   | PK                   |
 | `name`           | varchar   | Display name         |
 | `email`          | varchar   |                      |
 | `email_verified` | boolean   |                      |
 | `image`          | varchar   | nullable, avatar URL |
 | `created_at`     | timestamp |                      |
 | `updated_at`     | timestamp |                      |
 ### `session`
 | Column       | Type      | Notes         |
 | ------------ | --------- | ------------- |
 | `id`         | varchar   | PK            |
 | `user_id`    | varchar   | FK → user.id  |
 | `token`      | varchar   | Session token |
 | `expires_at` | timestamp |               |
 | `ip_address` | varchar   | nullable      |
 | `user_agent` | text      | nullable      |
 | `created_at` | timestamp |               |
 ### `account` — Social provider links
 | Column          | Type      | Notes                |
 | --------------- | --------- | -------------------- |
 | `id`            | varchar   | PK                   |
 | `user_id`       | varchar   | FK → user.id         |
 | `account_id`    | varchar   | Provider's user ID   |
 | `provider_id`   | varchar   | `google` or `github` |
 | `access_token`  | text      | nullable             |
 | `refresh_token` | text      | nullable             |
 | `id_token`      | text      | nullable             |
 | `expires_at`    | timestamp | nullable             |
 **Note:** One user can have multiple accounts (Google + GitHub linked to same user).
 ### `verification`
 Email verification tokens. Unused for social-only auth but managed by Better Auth.
 ---
 ## Lobby Tables (Multiplayer)
 ### `lobbies`
 | Column        | Type      | Constraints                                 | Notes                                        |
 | ------------- | --------- | ------------------------------------------- | -------------------------------------------- |
 | `id`          | uuid      | PK                                          |                                              |
 | `code`        | varchar   | UNIQUE                                      | Human-readable room code (e.g. `WOLF-42`)    |
 | `host_id`     | varchar   | FK → user.id                                |                                              |
 | `status`      | varchar   | CHECK: `waiting`, `in_progress`, `finished` |                                              |
 | `max_players` | integer   | default 4                                   |                                              |
 | `settings`    | jsonb     | nullable                                    | Game mode, round count, timer duration, etc. |
 | `created_at`  | timestamp | default now()                               |                                              |
 | `updated_at`  | timestamp | default now()                               | Used for stale recovery                      |
 ### `lobby_players`
 | Column         | Type      | Constraints     | Notes                        |
 | -------------- | --------- | --------------- | ---------------------------- |
 | `id`           | uuid      | PK              |                              |
 | `lobby_id`     | uuid      | FK → lobbies.id |                              |
 | `user_id`      | varchar   | FK → user.id    |                              |
 | `display_name` | varchar   |                 | Player's shown name in lobby |
 | `is_host`      | boolean   | default false   |                              |
 | `joined_at`    | timestamp | default now()   |                              |
 **Unique constraint:** (`lobby_id`, `user_id`) — one entry per player per lobby.
 ---
 ## Key Relationships
 ```
 terms (1) ←──→ (N) translations
 terms (1) ←──→ (N) term_glosses
 terms (N) ←──→ (N) decks via deck_terms
 user (1) ←──→ (N) sessions
 user (1) ←──→ (N) accounts
 user (1) ←──→ (N) lobbies (as host)
 user (1) ←──→ (N) lobby_players
 lobbies (1) ←──→ (N) lobby_players
 ```
 ---
 ## Query Patterns
 ### Get quiz terms (singleplayer)
 ```sql
 SELECT t.id, t.pos, src.text AS source_text, tgt.text AS target_text, g.text AS gloss
 FROM terms t
 JOIN translations src ON src.term_id = t.id AND src.language_code = ?
 JOIN translations tgt ON tgt.term_id = t.id AND tgt.language_code = ?
 LEFT JOIN term_glosses g ON g.term_id = t.id AND g.language_code = ?
 WHERE t.pos = ? AND tgt.cefr_level IN (?)
 LIMIT ?
 ```
 ### Get distractors
 ```sql
 SELECT tr.text
 FROM translations tr
 JOIN terms t ON t.id = tr.term_id  -- pos lives on terms, not translations
 WHERE tr.language_code = ? AND t.pos = ? AND tr.cefr_level IN (?)
   AND tr.term_id != ? AND tr.text != ?
 ORDER BY RANDOM()
 LIMIT 3
 ```
 **Note:** This is the N+1 query mentioned in BACKLOG.md. Each question fetches 3 distractors separately. Batching is planned.
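Here is one way the batching could work, sketched under some assumptions. A single query fetches a shared candidate pool for the whole game, using the same language, POS, and CEFR filters. Each question's distractors are then drawn in memory, with the same exclusions as the per-question query. All names are hypothetical. Unlike the SQL above, the sketch also drops duplicate texts, so the same word is never offered twice.

```typescript
// One row of a hypothetical shared candidate pool, fetched once per game.
interface Candidate {
  termId: string;
  text: string;
}

interface QuestionTarget {
  termId: string; // the question's own term (excluded, like `term_id != ?`)
  correctText: string; // the correct answer (excluded, like `text != ?`)
}

// Picks up to `count` distractors per question from the shared pool.
// `random` must return a number in [0, 1), like Math.random.
function pickDistractors(
  questions: QuestionTarget[],
  pool: Candidate[],
  count = 3,
  random: () => number = Math.random,
): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const q of questions) {
    // Deduplicate identical texts that belong to different terms.
    const texts = Array.from(
      new Set(
        pool
          .filter((c) => c.termId !== q.termId && c.text !== q.correctText)
          .map((c) => c.text),
      ),
    );
    // Partial Fisher–Yates shuffle: only the first `picks` positions are drawn.
    const picks = Math.min(count, texts.length);
    for (let i = 0; i < picks; i++) {
      const j = i + Math.floor(random() * (texts.length - i));
      const tmp = texts[i];
      texts[i] = texts[j];
      texts[j] = tmp;
    }
    result.set(q.termId, texts.slice(0, picks));
  }
  return result;
}
```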
 ---
 ## Deferred Schema Extensions (Not Yet Implemented)
 These tables are planned but do not exist yet. All are additive — they reference existing `terms` rows via FK.
 | Table                 | Purpose                                         | Trigger                 |
 | --------------------- | ----------------------------------------------- | ----------------------- |
 | `noun_forms`          | Gender, singular, plural, articles per language | Grammar quiz mode       |
 | `verb_forms`          | Conjugation tables per language                 | Grammar quiz mode       |
 | `term_pronunciations` | IPA + audio URLs per language                   | Pronunciation quiz mode |
 | `user_decks`          | Which decks a user studies                      | User customization      |
 | `user_term_progress`  | Spaced repetition state per user/term/language  | SRS review queue        |
 | `quiz_answers`        | Answer history for stats/analytics              | User stats dashboard    |
--- a/documentation/ai-context/03-api-contract.md
+++ b/documentation/ai-context/03-api-contract.md
@ -0,0 +1,367 @@
 # 03 — API Contract
 > **Purpose:** REST and WebSocket endpoint reference with exact Zod schemas. Concatenate with 00-project-overview.md and 99-current-task.md.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md, 02-data-model.md
 ---
 ## REST Endpoints
 ### Health
 ```
 GET /api/v1/health
 ```
 **Response:** `{ "status": "ok" }`
 **Auth:** None (public)
 ---
 ### Game — Start Session
 ```
 POST /api/v1/game/start
 ```
 **Request body** (GameRequestSchema):
 ```typescript
 {
  source_language: SupportedLanguageCode,  // "en" | "it" | "de" | "es" | "fr"
  target_language: SupportedLanguageCode,
  pos: SupportedPos,                        // "noun" | "verb" | "adjective" | "adverb"
  difficulty: DifficultyLevel,              // "easy" | "intermediate" | "hard"
  rounds: GameRounds                        // "3" | "10" (string enum, converted to number in service)
 }
 ```
 **Validation rules:**
 - `source_language` !== `target_language`
 - Both languages in `SUPPORTED_LANGUAGE_CODES`
 - `pos` in `SUPPORTED_POS`
 - `difficulty` in `DIFFICULTY_LEVELS`
 - `rounds` in `GAME_ROUNDS`
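`GameRequestSchema` itself is a Zod schema in `packages/shared`. The sketch below restates the same rules without dependencies so each one is visible. The constant arrays copy the values listed above. `validateGameRequest` and `includes` are hypothetical helpers. Note that `rounds` must be the string `"3"` or `"10"`, not a number.

```typescript
const SUPPORTED_LANGUAGE_CODES = ["en", "it", "de", "es", "fr"] as const;
const SUPPORTED_POS = ["noun", "verb", "adjective", "adverb"] as const;
const DIFFICULTY_LEVELS = ["easy", "intermediate", "hard"] as const;
const GAME_ROUNDS = ["3", "10"] as const;

// Type guard: true when `value` is one of the strings in `list`.
function includes<T extends string>(list: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (list as readonly string[]).includes(value);
}

// Returns human-readable problems; an empty array means the body is valid.
function validateGameRequest(body: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (!includes(SUPPORTED_LANGUAGE_CODES, body.source_language)) errors.push("unsupported source_language");
  if (!includes(SUPPORTED_LANGUAGE_CODES, body.target_language)) errors.push("unsupported target_language");
  if (body.source_language === body.target_language) {
    errors.push("source_language must differ from target_language");
  }
  if (!includes(SUPPORTED_POS, body.pos)) errors.push("unsupported pos");
  if (!includes(DIFFICULTY_LEVELS, body.difficulty)) errors.push("unsupported difficulty");
  if (!includes(GAME_ROUNDS, body.rounds)) errors.push("unsupported rounds");
  return errors;
}
```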
 **Response** (GameSessionSchema):
 ```typescript
 {
  sessionId: string,        // UUID
  questions: GameQuestion[]
 }
 ```
 **GameQuestionSchema:**
 ```typescript
 {
  questionId: string,       // UUID
  prompt: string,           // Word in source language
  gloss: string | null,     // Definition (falls back to English if target lang gloss missing)
  options: AnswerOption[]   // 4 items, shuffled
 }
 ```
 **AnswerOptionSchema:**
 ```typescript
 {
  optionId: number,         // 0–3
  text: string              // Translation in target language
 }
 ```
 **Note:** The correct answer is NOT included in the response. The frontend only sees `optionId` and `text`. The server stores `questionId → correctOptionId` mapping in the GameSessionStore.
 **Auth:** Required (session middleware)
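Below is a minimal sketch of how a question could be assembled so the answer never leaves the server. It assumes the three distractor texts are already known and differ from the correct text, which is what the distractor query's `text != ?` filter ensures. `buildQuestion` and the `correctAnswers` map are hypothetical stand-ins for the service and the GameSessionStore.

```typescript
interface AnswerOption {
  optionId: number; // 0–3
  text: string;
}

interface PublicQuestion {
  questionId: string;
  prompt: string;
  gloss: string | null;
  options: AnswerOption[];
}

// Hypothetical server-side map: questionId → correctOptionId. Never sent to clients.
const correctAnswers = new Map<string, number>();

// Shuffles the correct text in with three distractors (Fisher–Yates) and returns
// the client-facing question plus the correctOptionId to keep on the server.
function buildQuestion(
  questionId: string,
  prompt: string,
  gloss: string | null,
  correctText: string,
  distractors: [string, string, string],
  random: () => number = Math.random,
): { question: PublicQuestion; correctOptionId: number } {
  const texts = [correctText, ...distractors];
  for (let i = texts.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = texts[i];
    texts[i] = texts[j];
    texts[j] = tmp;
  }
  const options = texts.map((text, optionId) => ({ optionId, text }));
  // indexOf is safe because distractors never equal the correct text.
  const correctOptionId = texts.indexOf(correctText);
  return { question: { questionId, prompt, gloss, options }, correctOptionId };
}
```

The caller would store the answer with `correctAnswers.set(question.questionId, correctOptionId)` and send only `question` to the client.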
 ---
 ### Game — Submit Answer
 ```
 POST /api/v1/game/answer
 ```
 **Request body** (AnswerSubmissionSchema):
 ```typescript
 {
  sessionId: string,        // UUID
  questionId: string,       // UUID
  selectedOptionId: number  // 0–3
 }
 ```
 **Response** (AnswerResultSchema):
 ```typescript
 {
  questionId: string,
  isCorrect: boolean,
  correctOptionId: number,   // 0–3
  selectedOptionId: number   // 0–3
 }
 ```
 **Error cases:**
 - Session not found → 404 NotFoundError
 - Question not in session → 404 NotFoundError
 - Invalid optionId → 400 ValidationError
 **Auth:** Required
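Here is a sketch of the answer check against the stored `questionId → correctOptionId` mapping. The nested-`Map` store and the error classes are assumptions. The option range is checked first to mirror request validation, which presumably runs before the service looks anything up.

```typescript
// Hypothetical error classes carrying the HTTP status listed above.
class ValidationError extends Error {
  readonly statusCode = 400;
}
class NotFoundError extends Error {
  readonly statusCode = 404;
}

interface AnswerSubmission {
  sessionId: string;
  questionId: string;
  selectedOptionId: number;
}

interface AnswerResult {
  questionId: string;
  isCorrect: boolean;
  correctOptionId: number;
  selectedOptionId: number;
}

// Hypothetical store shape: sessionId → (questionId → correctOptionId).
type SessionStore = Map<string, Map<string, number>>;

function evaluateAnswer(store: SessionStore, submission: AnswerSubmission): AnswerResult {
  const { sessionId, questionId, selectedOptionId } = submission;
  if (!Number.isInteger(selectedOptionId) || selectedOptionId < 0 || selectedOptionId > 3) {
    throw new ValidationError("selectedOptionId must be an integer from 0 to 3");
  }
  const session = store.get(sessionId);
  if (!session) throw new NotFoundError(`Session ${sessionId} not found`);
  const correctOptionId = session.get(questionId);
  if (correctOptionId === undefined) throw new NotFoundError(`Question ${questionId} not in session`);
  return { questionId, isCorrect: selectedOptionId === correctOptionId, correctOptionId, selectedOptionId };
}
```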
 ---
 ### Lobby — Create
 ```
 POST /api/v1/lobbies
 ```
 **Request body:** None (host's auth session determines host_id)
 **Response:**
 ```typescript
 {
  id: string,           // UUID
  code: string,         // Human-readable room code (e.g. "WOLF-42")
  host_id: string,
  status: "waiting",
  max_players: number,
  settings: object | null,
  created_at: string
 }
 ```
 **Auth:** Required
 ---
 ### Lobby — Join
 ```
 POST /api/v1/lobbies/:code/join
 ```
 **Path param:** `code` — room code (e.g. "WOLF-42")
 **Response:** Same as create (the lobby object)
 **Error cases:**
 - Lobby not found → 404
 - Lobby full → 400
 - Already joined → 200 (idempotent)
 **Auth:** Required
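The join rules above can be sketched with an in-memory map standing in for the `lobbies` and `lobby_players` tables. The record shape and `joinLobby` are hypothetical. The ordering matters: membership is checked before capacity, so a retry from a player already in a full lobby still gets 200.

```typescript
interface LobbyRecord {
  code: string;
  maxPlayers: number;
  playerIds: string[];
}

type JoinOutcome =
  | { status: 200; lobby: LobbyRecord }
  | { status: 400 | 404; error: string };

// Hypothetical in-memory version of the join rules.
function joinLobby(lobbies: Map<string, LobbyRecord>, code: string, userId: string): JoinOutcome {
  const lobby = lobbies.get(code);
  if (!lobby) return { status: 404, error: "Lobby not found" };
  // Membership first: an existing member re-joining is idempotent, even when full.
  if (lobby.playerIds.includes(userId)) return { status: 200, lobby };
  if (lobby.playerIds.length >= lobby.maxPlayers) return { status: 400, error: "Lobby full" };
  lobby.playerIds.push(userId);
  return { status: 200, lobby };
}
```

In PostgreSQL, the `(lobby_id, user_id)` unique constraint is what actually guards against two concurrent joins. An `INSERT ... ON CONFLICT DO NOTHING` is one way to make the insert itself idempotent.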
 ---
 ### Auth
 ```
 ALL /api/auth/*     — Better Auth handlers (public)
 ```
 Better Auth mounts its own router at `/api/auth/*`. Handles:
 - `/api/auth/sign-in/social` — initiate social login
 - `/api/auth/callback/:provider` — OAuth callback
 - `/api/auth/sign-out` — clear session
 - `/api/auth/get-session` — get current session
 **Auth:** Mixed (some public, some require valid session)
 ---
 ## WebSocket Protocol
 All WS messages are JSON objects with a `type` field. Together they form a discriminated union on `type`: the router validates the payload against the schema for that type.
 ### Connection
 1. Client opens WebSocket to `wss://api.lilastudy.com/ws`
 2. Server validates Better Auth session from cookie on upgrade
 3. Connection established
 ### Client → Server Messages
 #### `lobby:join`
 ```typescript
 {
  type: "lobby:join",
  payload: {
    code: string  // Room code (e.g. "WOLF-42")
  }
 }
 ```
 #### `lobby:leave`
 ```typescript
 {
  type: "lobby:leave",
  payload: {
    code: string
  }
 }
 ```
 #### `lobby:start`
 ```typescript
 {
  type: "lobby:start",
  payload: {
    code: string
  }
 }
 ```
 Only the host can send this. Triggers game start.
 #### `game:answer`
 ```typescript
 {
  type: "game:answer",
  payload: {
    code: string,
    questionId: string,
    optionId: number  // 0–3
  }
 }
 ```
 Must be sent within the 15-second server timer.
 ---
 ### Server → Client Messages
 #### `lobby:state`
 ```typescript
 {
  type: "lobby:state",
  payload: {
    code: string,
    players: {
      id: string,
      display_name: string,
      is_host: boolean
    }[],
    status: "waiting" | "in_progress" | "finished",
    settings: object | null
  }
 }
 ```
 Broadcast to all players in the lobby on any membership change.
 #### `game:question`
 ```typescript
 {
  type: "game:question",
  payload: {
    questionId: string,
    prompt: string,
    gloss: string | null,
    options: { optionId: number, text: string }[],
    timeLimit: number  // seconds (15)
  }
 }
 ```
 Broadcast when the game starts or a new round begins.
 #### `game:answer_result`
 ```typescript
 {
  type: "game:answer_result",
  payload: {
    questionId: string,
    results: {
      playerId: string,
      displayName: string,
      isCorrect: boolean,
      selectedOptionId: number,
      score: number
    }[]
  }
 }
 ```
 Broadcast after all players have answered or the timer expires.
 #### `game:finished`
 ```typescript
 {
  type: "game:finished",
  payload: {
    finalScores: {
      playerId: string,
      displayName: string,
      score: number
    }[],
    winner: {
      playerId: string,
      displayName: string
    } | null  // null for ties
  }
 }
 ```
 Broadcast after all rounds complete.
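Here is a sketch of how the `game:finished` payload could be derived from final scores. `buildGameFinished` is hypothetical. It treats any shared top score as a tie and makes no attempt at speed-based tiebreaking.

```typescript
interface PlayerScore {
  playerId: string;
  displayName: string;
  score: number;
}

interface GameFinishedPayload {
  finalScores: PlayerScore[];
  winner: { playerId: string; displayName: string } | null; // null for ties
}

function buildGameFinished(scores: PlayerScore[]): GameFinishedPayload {
  // Highest score first; copy so the caller's array is not mutated.
  const finalScores = [...scores].sort((a, b) => b.score - a.score);
  const first = finalScores[0];
  const second = finalScores[1];
  // No winner if nobody played or the top two scores are level.
  if (first === undefined || (second !== undefined && second.score === first.score)) {
    return { finalScores, winner: null };
  }
  return { finalScores, winner: { playerId: first.playerId, displayName: first.displayName } };
}
```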
 ---
 ## Zod Schema Locations
 All schemas live in `packages/shared/src/schemas/`:
 | Schema                 | File       | Used by                                 |
 | ---------------------- | ---------- | --------------------------------------- |
 | GameRequestSchema      | `game.ts`  | API controller, frontend GameSetup      |
 | GameSessionSchema      | `game.ts`  | API service, frontend quiz flow         |
 | GameQuestionSchema     | `game.ts`  | API service, frontend QuestionCard      |
 | AnswerOptionSchema     | `game.ts`  | API service, frontend OptionButton      |
 | AnswerSubmissionSchema | `game.ts`  | API controller, frontend submit handler |
 | AnswerResultSchema     | `game.ts`  | API controller, frontend ScoreScreen    |
 | LobbyCreateSchema      | `lobby.ts` | API controller                          |
 | LobbyJoinSchema        | `lobby.ts` | API controller                          |
 | LobbyStateSchema       | `lobby.ts` | WS handler, frontend lobby UI           |
 | WebSocketMessageSchema | `lobby.ts` | WS router (discriminated union)         |
 **Rule:** Never duplicate these schemas. Import from `packages/shared` in both API and frontend.
 ---
 ## Error Responses
 All errors follow this shape:
 ```typescript
 {
  error: string,      // Human-readable message
  statusCode: number  // HTTP status
 }
 ```
 **Common status codes:**
 - 400 — ValidationError (bad input, schema mismatch)
 - 401 — Unauthorized (no valid session)
 - 404 — NotFoundError (session, question, or lobby not found)
 - 500 — Unknown error (logged, generic message to client)
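Below is a sketch of mapping errors to this shape. `ValidationError` and `NotFoundError` are named above. `UnauthorizedError`, the `statusCode` fields, and `toErrorResponse` are assumptions for illustration. Unknown errors are logged and replaced with a generic message, so internals never reach the client.

```typescript
// Hypothetical error classes; the status codes match the list above.
class ValidationError extends Error {
  readonly statusCode = 400;
}
class UnauthorizedError extends Error {
  readonly statusCode = 401;
}
class NotFoundError extends Error {
  readonly statusCode = 404;
}

interface ErrorResponse {
  error: string;
  statusCode: number;
}

// Known errors keep their message and status; anything else becomes a generic 500.
function toErrorResponse(err: unknown, log: (e: unknown) => void = console.error): ErrorResponse {
  if (err instanceof ValidationError || err instanceof UnauthorizedError || err instanceof NotFoundError) {
    return { error: err.message, statusCode: err.statusCode };
  }
  log(err); // full details stay in the server log
  return { error: "Internal server error", statusCode: 500 };
}
```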
--- a/documentation/ai-context/04-websocket-protocol.md
+++ b/documentation/ai-context/04-websocket-protocol.md
@ -0,0 +1,237 @@
 # 04 — WebSocket Protocol
 > **Purpose:** Deep dive into WebSocket lifecycle, state management, and edge cases for LLMs working on multiplayer features. Concatenate with 00-project-overview.md and 99-current-task.md.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md, 03-api-contract.md
 ---
 ## Connection Lifecycle
 ### 1. Upgrade
 ```
 Client: GET wss://api.lilastudy.com/ws
        Headers: Cookie: better-auth.session=...
 Server: Validates session via Better Auth (reads cookie, looks up in DB)
        → Valid: 101 Switching Protocols, connection established
        → Invalid: 401 Unauthorized, connection rejected
 ```
 **Auth is mandatory.** No anonymous WebSocket connections. Guest play (if implemented) would need a different auth strategy here.
 ### 2. Message Routing
 After connection, all messages flow through:
 ```
 Raw JSON message
     ↓
 Zod safeParse against WebSocketMessageSchema (discriminated union on `type`)
     ↓
 Router switches on `type` → dispatches to handler
     ↓
 Handler executes business logic → broadcasts to room
 ```
 **Invalid messages:** Parse failures are logged and silently dropped. The client receives no error response — this is intentional to prevent error spam from malformed clients.
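The routing steps above can be sketched with a hand-written parser standing in for Zod's `safeParse` on `WebSocketMessageSchema`. The real router uses Zod. `parseClientMessage`, `routeMessage`, and `Handlers` are hypothetical, and only the client → server messages from 03-api-contract.md are modeled.

```typescript
// Client → server messages as a discriminated union on `type`.
type ClientMessage =
  | { type: "lobby:join"; payload: { code: string } }
  | { type: "lobby:leave"; payload: { code: string } }
  | { type: "lobby:start"; payload: { code: string } }
  | { type: "game:answer"; payload: { code: string; questionId: string; optionId: number } };

// Stand-in for WebSocketMessageSchema.safeParse: null means "invalid".
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, payload } = data as { type?: unknown; payload?: unknown };
  if (typeof type !== "string" || typeof payload !== "object" || payload === null) return null;
  const fields = payload as Record<string, unknown>;
  const code = fields.code;
  if (typeof code !== "string") return null;
  switch (type) {
    case "lobby:join":
    case "lobby:leave":
    case "lobby:start":
      return { type, payload: { code } };
    case "game:answer": {
      const { questionId, optionId } = fields;
      if (typeof questionId !== "string" || typeof optionId !== "number") return null;
      if (!Number.isInteger(optionId) || optionId < 0 || optionId > 3) return null;
      return { type, payload: { code, questionId, optionId } };
    }
    default:
      return null;
  }
}

type Handlers = {
  [K in ClientMessage["type"]]: (msg: Extract<ClientMessage, { type: K }>) => void;
};

// Invalid messages are logged and dropped; the client gets no reply.
function routeMessage(raw: string, handlers: Handlers, log: (note: string) => void = console.warn): void {
  const msg = parseClientMessage(raw);
  if (msg === null) {
    log(`Dropped invalid WS message: ${raw.slice(0, 100)}`);
    return;
  }
  switch (msg.type) {
    case "lobby:join":
      return handlers["lobby:join"](msg);
    case "lobby:leave":
      return handlers["lobby:leave"](msg);
    case "lobby:start":
      return handlers["lobby:start"](msg);
    case "game:answer":
      return handlers["game:answer"](msg);
  }
}
```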
 ### 3. Disconnect
 When a client disconnects (browser close, network loss, page navigation):
 ```
 Connection close event
     ↓
 Handler removes player from lobby (if in one)
     ↓
 Broadcasts updated lobby:state to remaining players
     ↓
 If game in progress and player disconnects:
     → Player is marked as "disconnected" (not removed from game state)
     → Their answer slot is treated as "no answer" (timeout)
     → Game continues
 ```
 **No automatic reconnect.** The client must manually reconnect and re-join the lobby. Graceful reconnect with state restoration is planned (BACKLOG.md `next`).
 ---
 ## State Management
 ### Two-Tier Storage
 | State Type       | Storage                                                          | Durability | Use Case                                          |
 | ---------------- | ---------------------------------------------------------------- | ---------- | ------------------------------------------------- |
 | Lobby membership | PostgreSQL (`lobbies`, `lobby_players`)                          | Durable    | Who is in which room, who is host                 |
 | Game state       | In-memory (`InMemoryLobbyGameStore`, `InMemoryGameSessionStore`) | Ephemeral  | Current question, scores, timer, answers received |
 **Why the split?** Lobby membership must survive server restarts (players shouldn't be kicked on deploy). Game state is ephemeral by design — a game lasts minutes, and losing state on restart is acceptable for MVP.
 ### In-Memory Store Structure
 ```typescript
 // Conceptual; the actual implementation is in apps/api/src/gameSessionStore/
 // GameQuestion is the type inferred from GameQuestionSchema (packages/shared/src/schemas/game.ts)
 type PlayerId = string;
 interface InMemoryGameState {
  [lobbyCode: string]: {
    status: "waiting" | "question" | "result" | "finished";
    currentRound: number;
    totalRounds: number;
    currentQuestion: GameQuestion | null;
    answers: Map<PlayerId, { optionId: number; timestamp: number }>;
    scores: Map<PlayerId, number>;
    timer: NodeJS.Timeout | null; // 15s server timer
    questionStartTime: number; // For speed-based tiebreaking
  };
 }
 ```
 ---
 ## The 15-Second Timer
 ### Implementation
 ```
 Host sends lobby:start
     ↓
 Server generates questions, stores in game state
     ↓
 Broadcast game:question to all players
     ↓
 START 15-second timer (NodeJS setTimeout)
     ↓
 Player answers collected in Map<playerId, answer>
     ↓
 Timer expires OR all players answered
     ↓
 STOP timer, evaluate answers, broadcast game:answer_result
     ↓
 If more rounds: wait 3s → broadcast next game:question → restart timer
     ↓
 If last round: broadcast game:finished
 ```
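Below is a minimal sketch of one round's timer. It assumes answers are collected per player and that the round closes exactly once, either on timeout or when the last player answers. The `Scheduler` interface is an assumption that keeps the sketch testable; in production, `set` and `clear` would wrap `setTimeout` and `clearTimeout`. Ignoring a player's second answer is also an assumption, not documented behavior.

```typescript
interface Answer {
  optionId: number;
  timestamp: number;
}

// Injected so the sketch can be tested without real time passing.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

class RoundTimer {
  private readonly answers = new Map<string, Answer>();
  private handle: unknown = null;
  private closed = false;
  private readonly playerIds: string[];
  private readonly onClose: (answers: Map<string, Answer>) => void;
  private readonly scheduler: Scheduler;
  private readonly timeLimitMs: number;

  constructor(
    playerIds: string[],
    onClose: (answers: Map<string, Answer>) => void,
    scheduler: Scheduler,
    timeLimitMs = 15_000,
  ) {
    this.playerIds = playerIds;
    this.onClose = onClose;
    this.scheduler = scheduler;
    this.timeLimitMs = timeLimitMs;
  }

  start(): void {
    this.handle = this.scheduler.set(() => this.close(), this.timeLimitMs);
  }

  // Returns false (and ignores the answer) when the round has already closed,
  // the player already answered, or the player is not part of this round.
  submit(playerId: string, optionId: number, now: number = Date.now()): boolean {
    if (this.closed || this.answers.has(playerId) || !this.playerIds.includes(playerId)) {
      return false;
    }
    this.answers.set(playerId, { optionId, timestamp: now });
    if (this.answers.size === this.playerIds.length) {
      this.scheduler.clear(this.handle); // everyone answered: cancel the timer early
      this.close();
    }
    return true;
  }

  private close(): void {
    if (this.closed) return; // timeout and "all answered" may both get here; close once
    this.closed = true;
    this.onClose(this.answers);
  }
}
```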
 ### Timer Edge Cases
 | Scenario                   | Behavior                                                                  |
 | -------------------------- | ------------------------------------------------------------------------- |
 | Player answers at 14.9s    | Valid, collected before timer expiry                                      |
 | Player answers at 15.1s    | Rejected, treated as timeout. Timer already fired.                        |
 | All players answer early   | Timer is cleared early, round proceeds immediately                        |
 | No one answers             | All players get 0 points for that round, next round starts                |
 | Host disconnects mid-game  | Game continues, any player can see results. No "host transfer" logic yet. |
 | Non-host sends lobby:start | Silently ignored (or rejected — check implementation)                     |
 ---
 ## Message Broadcasting
 ### Room-Based Broadcasting
 The server maintains a mapping of `lobbyCode → Set<WebSocket connections>`. When a message needs to broadcast:
 ```typescript
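 // Pseudo-code from ws/connections.ts (assumes the `ws` package; WebSocketMessage is
 // the type inferred from WebSocketMessageSchema in packages/shared)
 import { WebSocket } from "ws";
 const roomConnections = new Map<string, Set<WebSocket>>(); // lobbyCode → sockets
 function broadcastToRoom(code: string, message: WebSocketMessage) {
  const connections = roomConnections.get(code);
  if (!connections) return; // no sockets registered for this room
  const data = JSON.stringify(message); // serialize once for all recipients
  for (const ws of connections) {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(data);
    }
  }
 }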
 ```
 **Self-broadcast:** The sender receives their own broadcast. The frontend must handle this (e.g., ignore their own lobby:state if they already updated optimistically).
 ### Message Ordering
 WebSocket guarantees in-order delivery per connection, but not across connections, so race conditions can still occur:
 - Player A sends `game:answer` at 14.5s
 - Player B's connection lags, so B receives `game:answer_result` before the server has processed B's own `game:answer` (there is no per-answer ack)
 - **The frontend must treat `game:answer_result` as authoritative** and must not assume its own answer was counted
 ---
 ## Edge Cases & Failure Modes
 ### Mid-Game Disconnect
 ```
 Player disconnects during question phase
     ↓
 Connection close handler triggered
     ↓
 Player NOT removed from game state (they might reconnect)
     ↓
 Timer continues
     ↓
 On timer expiry: player has no answer → treated as wrong
     ↓
 Result broadcast includes "disconnected" status for that player
 ```
 **Current gap:** No reconnect-with-state-restoration. Player must re-join lobby and game state is not recovered. Planned in BACKLOG.md `next`.
 ### Double Join
 ```
 Player joins lobby ABC
     ↓
 Player joins lobby ABC again (accidental double-click, retry)
     ↓
 Server: idempotent — player already in lobby, return 200
     ↓
 No duplicate entries in lobby_players table
 ```
 ### Rapid Start/Stop
 ```
 Host clicks "Start" twice rapidly
     ↓
 First click: game starts, state changes to "in_progress"
     ↓
 Second click: server checks state, sees "in_progress", ignores
 ```
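The double-start guard and the host check can be sketched as a synchronous check-and-set; the names are hypothetical. Node runs the synchronous part of each message handler without interleaving, so two `lobby:start` messages cannot both pass the check as long as the status flips before any `await`. If the status lived only in PostgreSQL, a conditional `UPDATE ... WHERE status = 'waiting'` would play the same role. This sketch ignores a non-host silently, which is one of the two behaviors the edge-case table leaves open.

```typescript
type LobbyStatus = "waiting" | "in_progress" | "finished";

interface LobbyStartState {
  hostId: string;
  status: LobbyStatus;
}

// Returns true only for the first valid start; later clicks and non-hosts are ignored.
function tryStartGame(lobby: LobbyStartState, senderId: string): boolean {
  if (senderId !== lobby.hostId) return false; // only the host may start
  if (lobby.status !== "waiting") return false; // already started or finished
  lobby.status = "in_progress"; // flip synchronously, before any async work
  return true;
}
```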
 ### Client-Side Message Loss
 If a client's `game:answer` never reaches the server (network blip):
 - Server never receives the answer
 - Timer expires
 - Player gets 0 points for that round
 - **No retry mechanism** — client sends once, no ack expected
 ---
 ## Planned Improvements (Not Yet Implemented)
 From BACKLOG.md `next`:
 1. **Graceful WS reconnect** — Exponential back-off, restore game state on reconnection if game still in progress
 2. **Heartbeat/ping** — Detect stale connections faster than TCP timeout
 3. **Valkey for game state** — Replace in-memory store with Redis-compatible storage for horizontal scaling and persistence across restarts
 4. **Configurable game settings** — Host sets round count, timer duration, target score via lobby settings jsonb column
 5. **Additional game modes** — TV Quiz Show, Race to the Top, Chain Link, Elimination Round, Cooperative Challenge (see design/GAME_MODES.md)
 ---
 ## Key Files
 | File                                              | Purpose                                       |
 | ------------------------------------------------- | --------------------------------------------- |
 | `apps/api/src/ws/index.ts`                        | WebSocket server setup, attach to HTTP server |
 | `apps/api/src/ws/auth.ts`                         | Session validation on upgrade                 |
 | `apps/api/src/ws/router.ts`                       | Message routing by `type`                     |
 | `apps/api/src/ws/connections.ts`                  | Connection management, room mapping           |
 | `apps/api/src/ws/handlers/lobbyHandlers.ts`       | lobby:join, lobby:leave, lobby:start          |
 | `apps/api/src/ws/handlers/gameHandlers.ts`        | game:answer                                   |
 | `apps/api/src/services/multiplayerGameService.ts` | Game logic, timer, scoring                    |
 | `apps/api/src/lobbyGameStore/`                    | In-memory lobby state storage                 |
 | `packages/shared/src/schemas/lobby.ts`            | WS message Zod schemas                        |
 | `packages/shared/src/schemas/game.ts`             | Game state Zod schemas                        |
--- a/documentation/ai-context/05-data-pipeline.md
+++ b/documentation/ai-context/05-data-pipeline.md
@ -0,0 +1,173 @@
 # 05 — Data Pipeline
 > **Purpose:** Condensed reference for LLMs working on the Kaikki data pipeline. Covers stages, data flow, and current blockers. For full operational details (llama.cpp setup, provider configs, hardware specs), see the human-readable DATA_PIPELINE.md.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md
 ---
 ## Pipeline Overview
 ```
 Kaikki JSONL (Wiktionary extracts)
     ↓
 Stage 1: Extract → Parse into pipeline.db (SQLite)
     ↓
 Stage 2: Reverse Link → Insert missing reverse translations
     ↓
 Stage 3: Enrich → LLMs review glosses, examples, translations, assign CEFR
     ↓
 Stage 4: Merge → Resolve LLM votes into final values
     ↓
 Stage 4b: Tiebreak → Run unused models on flagged entries
     ↓
 Stage 5: Compare / QA → Generate COVERAGE.md quality report
     ↓
 Stage 6: Sync → Upsert resolved records into production PostgreSQL
 ```
 **Current state:** Stages 1 and 2 are complete on sample data. The Stage 3 enrich script is being rewritten for the sub-stage architecture. Stages 4–6 have not started.
 ---
 ## Stage 1: Extract
 **Input:** `data-pipeline/stage-1-extract/sources/*.jsonl` (Kaikki files, not in git)
 **Output:** `pipeline.db` — `vocabulary_entries` and `entry_translations` tables
 **What it does:**
 - Parses Kaikki JSONL for all 5 languages (en, de, es, fr, it)
 - Filters to 4 POS: noun, verb, adjective, adverb
 - Each Kaikki sense becomes one `vocabulary_entries` row
 - Translations stored in `entry_translations` with sense hints
 **Key design:** Kaikki is structured per word sense. Each headword has multiple senses, and translations are linked to a specific sense. This prevents the sense-disambiguation problems of OpenWordNet/OMW.
 ---
 ## Stage 2: Reverse Link Sync
 **Pure script, no LLMs.**
 For each translation pair (e.g., English "thrill" → German "begeistern"), checks if the reverse exists (German "begeistern" → English "thrill"). If the German entry exists but lacks the English back-link, inserts it automatically.
 **Why:** Ensures LLMs in Stage 3 only generate translations that are genuinely missing — not translations findable by simple reverse lookup.
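Here is a minimal sketch of the reverse-link check at word level. The real stage works on `vocabulary_entries` and `entry_translations` rows with sense hints, which this sketch ignores. The `Translation` shape and `findMissingReverseLinks` are hypothetical.

```typescript
interface Translation {
  fromLang: string;
  fromWord: string;
  toLang: string;
  toWord: string;
}

// For each A→B where an entry for B exists but B→A is absent, emit B→A.
function findMissingReverseLinks(
  translations: Translation[],
  existingEntries: Set<string>, // keys like "de:begeistern"
): Translation[] {
  const key = (t: Translation) => `${t.fromLang}:${t.fromWord}->${t.toLang}:${t.toWord}`;
  const present = new Set(translations.map(key));
  const missing: Translation[] = [];
  for (const t of translations) {
    const reverse: Translation = { fromLang: t.toLang, fromWord: t.toWord, toLang: t.fromLang, toWord: t.fromWord };
    const targetEntryExists = existingEntries.has(`${t.toLang}:${t.toWord}`);
    if (targetEntryExists && !present.has(key(reverse))) {
      missing.push(reverse);
      present.add(key(reverse)); // never emit the same reverse link twice
    }
  }
  return missing;
}
```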
 ---
 ## Stage 3: Enrich (In Progress — Being Rewritten)
 **Current blocker:** The original single-prompt design had problems (skipped invalid translations, triggered reasoning mode, 20% manual review). Being rewritten as four ordered sub-stages.
 ### Sub-Stage Architecture
 Each model processes every entry through four sub-stages in order:
 1. **`round1_gloss`** — Review existing gloss. Confirm if clear, generate better one if not.
 2. **`round1_example`** — Review examples. Confirm if natural, generate one better sentence.
 3. **`round1_translations`** — Validate translations with verified gloss as context. Confirm valid, reject invalid, generate missing.
 4. **`round1_cefr`** — Assign CEFR level (A1–C2) to headword and each confirmed translation.
 **Why this order:** CEFR sub-stage only sees clean, verified data. Bad translations are rejected before reaching CEFR assignment.
 **Voter strategy:** Multiple models vote independently. Each model = one vote per sub-stage. Current plan:
 - Primary: Local Qwen3.5-9B (overnight runs, unlimited)
 - Secondary: Groq Llama 3.3 70B (cloud, batched)
 - Tertiary: Gemini AI Studio (cloud, batched)
 **Context enrichment:** Before calling models for gloss/example, the pipeline queries the Wiktionary API for the headword and adds the full entry (all senses, usage notes) to the prompt. This is meant to fix glosses that are just category headers and glosses that are too short to be unambiguous.
 ---
 ## Stage 4: Merge
 Resolves LLM votes into final values per entry.
 **Rules:**
 - Kaikki source data wins automatically (never overridden)
 - CEFR: level with most votes wins
 - Text fields (gloss, example, translation): candidate with most votes wins
 - No majority → flag for tiebreaker
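Below is a sketch of the vote rules. It assumes "most votes wins" means a plurality and "no majority" means a tie for first place; the text above could also be read as requiring more than half the votes. `resolveVotes` is hypothetical.

```typescript
type MergeResult<T> =
  | { status: "resolved"; value: T }
  | { status: "flagged"; tied: T[] };

// Kaikki source data wins outright. Otherwise the candidate with the most votes
// wins, and a tie for first place is flagged for the tiebreak stage.
function resolveVotes<T extends string>(kaikkiValue: T | null, votes: T[]): MergeResult<T> {
  if (kaikkiValue !== null) return { status: "resolved", value: kaikkiValue };
  const counts = new Map<T, number>();
  for (const vote of votes) counts.set(vote, (counts.get(vote) ?? 0) + 1);
  let best = 0;
  counts.forEach((n) => {
    best = Math.max(best, n);
  });
  const leaders: T[] = [];
  counts.forEach((n, value) => {
    if (n === best) leaders.push(value);
  });
  // Zero votes also ends up flagged, with an empty `tied` list.
  return leaders.length === 1
    ? { status: "resolved", value: leaders[0] }
    : { status: "flagged", tied: leaders };
}
```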
 **Difficulty mapping:**
 | CEFR   | Difficulty   |
 | ------ | ------------ |
 | A1, A2 | easy         |
 | B1, B2 | intermediate |
 | C1, C2 | hard         |
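Here is the mapping as a lookup table, plus its inverse. The inverse is presumably how a requested difficulty becomes the `cefr_level IN (?)` list in the quiz query, but that is an assumption. The names are hypothetical.

```typescript
type Cefr = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
type Difficulty = "easy" | "intermediate" | "hard";

const CEFR_TO_DIFFICULTY: Record<Cefr, Difficulty> = {
  A1: "easy",
  A2: "easy",
  B1: "intermediate",
  B2: "intermediate",
  C1: "hard",
  C2: "hard",
};

// Inverse lookup: all CEFR levels that map to the given difficulty, in A1→C2 order.
function cefrLevelsFor(difficulty: Difficulty): Cefr[] {
  return (Object.keys(CEFR_TO_DIFFICULTY) as Cefr[]).filter((c) => CEFR_TO_DIFFICULTY[c] === difficulty);
}
```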
 ---
 ## Stage 4b: Tiebreak
 Runs automatically after merge if flagged entries remain. Queries unused models (not yet voted) and re-runs merge. Repeats until resolved or no unused models remain.
 **If still unresolved:** Sync is blocked. Add more models to config and re-run.
 ---
 ## Stage 5: Compare / QA
 Read-only. Generates `COVERAGE.md` with per-language breakdown:
 - Total entries, POS distribution
 - Translation coverage per language pair
 - CEFR coverage and difficulty breakdown
 - Gloss/example coverage by source (Kaikki vs LLM)
 - Per-model contribution stats
 Run this before syncing to production.
 ---
 ## Stage 6: Sync
 Upserts all `status = "final"` entries from `pipeline.db` to production PostgreSQL.
 **Behavior:**
 - Missing → insert
 - Present but changed → update
 - Present and unchanged → skip
 **Idempotent.** Safe to re-run.
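Below is a sketch of the insert / update / skip decision. It assumes each record can be compared through a content hash; the real comparison isn't documented. Running the plan a second time after applying it yields only `skip`, which is the idempotency property stated above. In PostgreSQL, `INSERT ... ON CONFLICT ... DO UPDATE ... WHERE` can express the same rule in a single statement.

```typescript
interface SyncRecord {
  id: string;
  hash: string; // hypothetical content hash of the final record
}

type SyncAction = "insert" | "update" | "skip";

// production: id → content hash of the row currently in PostgreSQL.
function planSync(finalRecords: SyncRecord[], production: Map<string, string>): Map<string, SyncAction> {
  const plan = new Map<string, SyncAction>();
  for (const record of finalRecords) {
    const existing = production.get(record.id);
    const action: SyncAction =
      existing === undefined ? "insert" : existing === record.hash ? "skip" : "update";
    plan.set(record.id, action);
  }
  return plan;
}
```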
 ---
 ## Key Constraints
 | Constant   | Values                                |
 | ---------- | ------------------------------------- |
 | Languages  | `en`, `it`, `de`, `es`, `fr`          |
 | POS        | `noun`, `verb`, `adjective`, `adverb` |
 | CEFR       | `A1`, `A2`, `B1`, `B2`, `C1`, `C2`    |
 | Difficulty | `easy`, `intermediate`, `hard`        |
 Adding a new value requires updating `packages/shared/src/constants.ts` AND a database migration before re-running the pipeline.
 ---
 ## Current Blockers
 1. **Enrich sub-stage rewrite** — Stage 3 script needs redesign and testing
 2. **Cloud provider integration** — Groq and Gemini not yet wired into pipeline
 3. **Batching prompt design** — 5–10 entries per API call for efficiency; not yet designed
 4. **Full dataset scale unknown** — Currently running on 500-entry samples. Full Kaikki English file has ~1.3M entries. Exact filtered count and runtime estimate not yet known.
 ---
 ## Key Files
 | File                                                         | Purpose                                                   |
 | ------------------------------------------------------------ | --------------------------------------------------------- |
 | `data-pipeline/pipeline.ts`                                  | Orchestrator — runs stages in order, handles resumability |
 | `data-pipeline/stage-1-extract/scripts/extract.ts`           | Parse Kaikki JSONL                                        |
 | `data-pipeline/stage-2-reverse-link/scripts/reverse-link.ts` | Insert reverse translations                               |
 | `data-pipeline/stage-3-enrich/scripts/enrich.ts`             | LLM enrichment (being rewritten)                          |
 | `data-pipeline/stage-3-enrich/config.ts`                     | Provider configs (local, OpenRouter, etc.)                |
 | `data-pipeline/db/schema.sql`                                | pipeline.db schema                                        |
 | `data-pipeline/db/import.ts`                                 | Import stage 1 output into pipeline.db                    |
 | `packages/shared/src/constants.ts`                           | Language codes, POS, CEFR, difficulty constants           |
--- a/documentation/ai-context/06-deployment.md
+++ b/documentation/ai-context/06-deployment.md
@ -0,0 +1,144 @@
 # 06 — Deployment
 > **Purpose:** Condensed infrastructure reference for LLMs working on deployment, CI/CD, or ops tasks. For full setup details (VPS provisioning, Forgejo configuration, backup scripts), see the human-readable DEPLOYMENT.md.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md
 ---
 ## Infrastructure Overview
 ```
 Internet
     ↓
 Caddy (Docker container, ports 80/443)
     ├── lilastudy.com       → web container (nginx:alpine, static files)
     ├── api.lilastudy.com   → api container (Express, port 3000)
     └── git.lilastudy.com   → forgejo container (git + registry, port 3000)
 SSH (port 2222) → forgejo container (git push/pull)
 ```
 **VPS:** Hetzner, Debian 13, ARM64 (aarch64), 4GB RAM  
 **Domain:** lilastudy.com, wildcard `*.lilastudy.com` configured  
 **Only Caddy (80/443) and Forgejo SSH (2222) face the internet.** All other services communicate over the internal Docker network.
 ---
 ## Docker Compose Stack
 All services share the `lila-network` Docker network:
 | Service  | Image                                            | Ports (internal) | Notes                                              |
 | -------- | ------------------------------------------------ | ---------------- | -------------------------------------------------- |
 | caddy    | caddy:alpine                                     | 80, 443          | Only container with published ports                |
 | api      | `git.lilastudy.com/forgejo-lila/lila-api:latest` | 3000             | Multi-stage Dockerfile, runs migrations on startup |
 | web      | `git.lilastudy.com/forgejo-lila/lila-web:latest` | 80               | nginx:alpine, SPA fallback via try_files           |
 | database | postgres:16                                      | 5432             | Named volume `lila-db` for persistence             |
 | forgejo  | forgejo:...                                      | 3000, 2222       | Git + container registry, SSH on 2222              |
 **No ports exposed on internal services.** Only Caddy (80/443) and Forgejo SSH (2222) are public.
 ---
 ## Build & Deploy Flow
 ```
 Dev laptop: git push to main
     ↓
 Forgejo Actions triggers (runner on VPS)
     ↓
 Build API image (target: runner)
 Build Web image (target: production, VITE_API_URL baked in)
     ↓
 Push both to git.lilastudy.com registry
     ↓
 SSH into VPS, docker compose pull, restart containers
     ↓
 API container runs migrations on startup (migrate.js before server.js)
     ↓
 App updated (~2–5 min total)
 ```
 **No cross-compilation:** CI builds images natively on the ARM64 VPS, so no QEMU emulation is needed. The dev laptop was used for the initial image pushes before CI/CD was set up.
 ---
 ## Environment-Driven Config
 The same code runs in dev and production; environment variables control the differences:
 | Variable          | Dev                                  | Production                                        |
 | ----------------- | ------------------------------------ | ------------------------------------------------- |
 | `DATABASE_URL`    | `postgres://...@localhost:5432/lila` | `postgres://...@database:5432/lila`               |
 | `BETTER_AUTH_URL` | `http://localhost:3000`              | `https://api.lilastudy.com`                       |
 | `CORS_ORIGIN`     | `http://localhost:5173`              | `https://lilastudy.com`                           |
 | `COOKIE_DOMAIN`   | undefined                            | `.lilastudy.com`                                  |
 | `VITE_API_URL`    | `http://localhost:3000`              | `https://api.lilastudy.com` (baked at build time) |
 **Note:** `VITE_API_URL` is baked into the frontend at Docker build time via `--build-arg`. It cannot be changed at runtime.
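 A minimal sketch of how the API might read these variables at startup, assuming a hypothetical `loadConfig` helper (the real code in `apps/api` may be organized differently). Required variables fail fast at boot; `COOKIE_DOMAIN` stays optional because it is unset in dev. `VITE_API_URL` is left out because only the web build reads it.
 ```ts
 // Hypothetical config loader for the API process; keys mirror the table above.
 type Env = Record<string, string | undefined>;
 interface ApiConfig {
   databaseUrl: string;
   betterAuthUrl: string;
   corsOrigin: string;
   // Undefined in dev, which leaves the auth cookie host-only.
   cookieDomain: string | undefined;
 }
 // Throws at startup instead of failing later on first use.
 function requireVar(env: Env, name: string): string {
   const value = env[name];
   if (value === undefined || value === "") {
     throw new Error(`Missing required environment variable: ${name}`);
   }
   return value;
 }
 function loadConfig(env: Env): ApiConfig {
   return {
     databaseUrl: requireVar(env, "DATABASE_URL"),
     betterAuthUrl: requireVar(env, "BETTER_AUTH_URL"),
     corsOrigin: requireVar(env, "CORS_ORIGIN"),
     cookieDomain: env.COOKIE_DOMAIN || undefined,
   };
 }
 // Usage: const config = loadConfig(process.env);
 ```
 Failing fast keeps a misconfigured container from passing its healthcheck with half-working auth.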
 ---
 ## Database
 ### Migrations
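 Drizzle migrations run automatically on API container startup. The Dockerfile's `CMD` needs a shell for `&&` to chain the two scripts:
 ```dockerfile
 # Exec form runs no shell, so "&&" must go through `sh -c`; `exec` hands PID 1 to the server.
 CMD ["sh", "-c", "node dist/src/migrate.js && exec node dist/src/server.js"]
 ```
 **Deploy order enforced automatically:** migrations run before the server starts, and if `migrate.js` exits with an error, `&&` keeps the server from starting.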
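 The same "migrate, then serve" gate can also be expressed inside a single TypeScript entry point. This is only a sketch: `bootstrap`, `runMigrations`, and `startServer` are hypothetical names, and the project currently chains the two compiled scripts in `CMD` instead.
 ```ts
 // Sketch: mirrors `migrate.js && server.js` within one process.
 // runMigrations and startServer are hypothetical stand-ins for the real modules.
 type Step = () => Promise<void>;
 async function bootstrap(runMigrations: Step, startServer: Step): Promise<void> {
   // If runMigrations rejects, this await throws and startServer is never called,
   // just as `&&` skips the second command after a non-zero exit code.
   await runMigrations();
   await startServer();
 }
 // Usage: bootstrap(runMigrations, startServer).catch(() => process.exit(1));
 ```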
 ### Backups
 - Daily cron job at 3:00 AM: `pg_dump` → compressed SQL → `~/backups/`
 - 7-day retention on VPS
 - Dev laptop auto-syncs new backups on login via `rsync`
 - **Offsite storage:** Planned (Hetzner Object Storage or S3-compatible)
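 The 7-day retention rule comes down to "select dumps dated before now minus 7 days". A minimal TypeScript sketch of that selection, assuming a hypothetical `lila-YYYY-MM-DD.sql.gz` naming scheme (the actual cron script and filenames are not documented here). Files that don't match the pattern are never selected.
 ```ts
 // Hypothetical naming scheme lila-YYYY-MM-DD.sql.gz; the real script may differ.
 const BACKUP_NAME = /^lila-(\d{4})-(\d{2})-(\d{2})\.sql\.gz$/;
 const DAY_MS = 24 * 60 * 60 * 1000;
 // Returns backups dated more than `retentionDays` before `now` (dates read as UTC).
 function expiredBackups(fileNames: string[], now: Date, retentionDays = 7): string[] {
   const cutoff = now.getTime() - retentionDays * DAY_MS;
   return fileNames.filter((name) => {
     const match = BACKUP_NAME.exec(name);
     if (!match) return false; // never select files we don't recognize
     const taken = Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
     return taken < cutoff;
   });
 }
 ```
 With the 3:00 AM job, this keeps the seven most recent daily dumps and selects anything older.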
 ### Seeding
 Idempotent (`onConflictDoNothing`). Safe to re-run for adding new languages without affecting existing data or user tables.
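 Drizzle's `onConflictDoNothing()` emits PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`, so a row whose unique key already exists is skipped rather than raising an error or being overwritten. The in-memory model below illustrates why that makes re-seeding safe; the `Term` shape and its `(language, lemma)` key are hypothetical, not the real schema.
 ```ts
 // In-memory model of INSERT ... ON CONFLICT DO NOTHING; illustrates semantics only.
 interface Term { language: string; lemma: string; } // hypothetical row shape
 // Hypothetical unique key (language, lemma); the real constraints may differ.
 const keyOf = (t: Term): string => `${t.language}:${t.lemma}`;
 function seed(table: Map<string, Term>, rows: Term[]): number {
   let inserted = 0;
   for (const row of rows) {
     const key = keyOf(row);
     if (table.has(key)) continue; // conflict: the existing row is left untouched
     table.set(key, row);
     inserted++;
   }
   return inserted; // number of rows actually written
 }
 ```
 Re-running with an extra language only writes the new rows, which is why adding a language does not disturb existing data.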
 ---
 ## Auth & OAuth
 **Better Auth** embedded in Express API. No separate auth service.
 **Social providers:**
 - Google OAuth — consent screen in testing mode (100-user cap). Must publish before reaching 80 users.
 - GitHub OAuth — configured for both dev and production redirect URIs
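 **Cross-subdomain cookies:** `COOKIE_DOMAIN=.lilastudy.com` sets a `Domain` attribute on the auth cookie, so browsers send it to `lilastudy.com` and every subdomain (including `api.lilastudy.com`). Under RFC 6265 the leading dot is ignored; what matters is that the attribute is set.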
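 Browsers scope cookies with RFC 6265 domain matching: with a `Domain` attribute, the cookie goes to that domain and all of its subdomains; without one, it is host-only and returns only to the host that set it. A simplified TypeScript model (it skips the public-suffix and IP-address rules); `cookieAppliesTo` is a hypothetical helper for illustration.
 ```ts
 // Simplified RFC 6265 cookie scoping: no public-suffix or IP-address handling.
 // originHost set the cookie; cookieDomain is its Domain attribute (e.g. COOKIE_DOMAIN).
 function cookieAppliesTo(requestHost: string, originHost: string, cookieDomain?: string): boolean {
   const host = requestHost.toLowerCase();
   if (!cookieDomain) {
     // No Domain attribute: host-only cookie, sent back to the exact setting host.
     return host === originHost.toLowerCase();
   }
   // A leading dot is ignored, so ".lilastudy.com" behaves like "lilastudy.com".
   const domain = cookieDomain.replace(/^\./, "").toLowerCase();
   return host === domain || host.endsWith(`.${domain}`);
 }
 ```
 This is why dev works with `COOKIE_DOMAIN` unset: the API on `localhost` sets a host-only cookie, and cookies are not isolated by port.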
 ---
 ## Known Issues & Limitations
 | Issue                             | Impact                                                                             | Status                       |
 | --------------------------------- | ---------------------------------------------------------------------------------- | ---------------------------- |
 | lila-web has no healthcheck       | Vite dev server has no health endpoint; `depends_on` uses API healthcheck as proxy | Acceptable for dev           |
 | Valkey memory overcommit warning  | Harmless in dev. Fix before production: `vm.overcommit_memory = 1`                 | Documented                   |
 | No centralized monitoring/logging | No uptime alerts or log aggregation on VPS                                         | Planned (BACKLOG.md)         |
 | Backups only on VPS + dev laptop  | No offsite protection against VPS failure                                          | Planned (BACKLOG.md)         |
 | Google OAuth in testing mode      | 100-user cap                                                                       | Must publish before 80 users |
 ---
 ## Key Files
 | File                            | Purpose                                                |
 | ------------------------------- | ------------------------------------------------------ |
 | `docker-compose.yml` (root)     | Local dev stack                                        |
 | `docker-compose.yml` (VPS)      | Production stack                                       |
 | `apps/api/Dockerfile`           | Multi-stage: deps → dev → builder → runner             |
 | `apps/web/Dockerfile`           | Multi-stage: deps → dev → builder → production (nginx) |
 | `apps/web/nginx.conf`           | SPA fallback routing                                   |
 | `Caddyfile`                     | Reverse proxy routing, automatic HTTPS                 |
 | `.forgejo/workflows/deploy.yml` | CI/CD pipeline                                         |
 | `apps/api/src/migrate.ts`       | Drizzle migration runner                               |
--- a/documentation/ai-context/99-current-task.md
+++ b/documentation/ai-context/99-current-task.md
@ -0,0 +1,102 @@
 # 99 — Current Task
 > **Purpose:** Fill out this template before giving a task to an LLM. Concatenate with 00-project-overview.md and relevant domain files (01–06). After the task is complete, ask the LLM to review this checklist and suggest doc updates.
 > **Last updated:** 2026-05-15
 > **Depends on:** 00-project-overview.md, prompts/meta.md
 ---
 ## Task Description
 **What I'm building / fixing / refactoring:**
 [Describe the task in 1–2 sentences. Be specific.]
 Example: "Implement guest play flow so users can try a 3-round quiz without creating an account."
 ---
 ## Context
 **Which parts of the codebase does this touch?**
 - [ ] Frontend (`apps/web/`)
 - [ ] Backend API (`apps/api/`)
 - [ ] Database schema (`packages/db/`)
 - [ ] Shared schemas (`packages/shared/`)
 - [ ] WebSocket protocol (`apps/api/src/ws/`)
 - [ ] Data pipeline (`data-pipeline/`)
 - [ ] Infrastructure / deployment (`docker-compose.yml`, Caddyfile, etc.)
 - [ ] Documentation
 **Relevant files I already know about:**
 [List files you've identified. The LLM may ask for additional ones.]
 Example:
 - `apps/api/src/controllers/gameController.ts` — needs guest variant
 - `apps/api/src/middleware/authMiddleware.ts` — needs optional auth path
 - `packages/shared/src/schemas/game.ts` — needs GuestGameRequestSchema
 ---
 ## Constraints & Requirements
 **Must have:**
 - [ ]
 - [ ]
 **Nice to have:**
 - [ ]
 - [ ]
 **Must NOT break:**
 - [ ] Existing auth flow (logged-in users still work normally)
 - [ ] WebSocket protocol (if applicable)
 - [ ] Database schema (additive changes only unless migration planned)
 - [ ] Zod schemas in `packages/shared` (no silent drift)
 **Known blockers or open questions:**
 - [ ]
 ---
 ## Definition of Done
 - [ ] Code implemented and tested
 - [ ] No TypeScript errors (`pnpm typecheck` passes)
 - [ ] Tests pass (`pnpm test`)
 - [ ] Manual verification in dev environment
 - [ ] Commit message follows convention (see prompts/meta.md)
 - [ ] Feature branch merged to main
 ---
 ## Post-Work Checklist
 After the task is complete, ask the LLM:
 > "Review the post-work checklist in 99-current-task.md. Which documentation files need updates based on what we just changed?"
 The LLM should check:
 | File                               | Check if...                                                             |
 | ---------------------------------- | ----------------------------------------------------------------------- |
 | `documentation/STATUS.md`          | Task changes what's working or what's blocked                           |
 | `documentation/BACKLOG.md`         | Task completes a backlog item or creates a new one                      |
 | `documentation/DECISIONS.md`       | Task involved choosing between alternatives with long-term consequences |
 | `documentation/ARCHITECTURE.md`    | Task changes monorepo structure, data flow, or layer boundaries         |
 | `documentation/ai-context/*.md`    | Task changes schemas, endpoints, protocol, or pipeline stages           |
 | `packages/shared/src/schemas/*.ts` | Task changes request/response shapes or WS message types                |
 | `README.md`                        | Task changes quickstart steps, stack, or current status                 |
 **Expected output format:**
 ```
 - FILE: [filename] — REASON: [what changed and why the doc needs updating]
 ```
 ---
 ## Notes
 [Any additional context, links, or scratch notes for this specific task.]
--- a/documentation/ai-context/WORKFLOW.md
+++ b/documentation/ai-context/WORKFLOW.md
@ -0,0 +1,260 @@
 # Workflow — Working with LLMs on Lila
 > **Purpose:** The process for using AI assistants effectively on this codebase. Covers task scoping, context selection, conversation management, verification, and doc updates. Complements `prompts/meta.md` (which covers prompt templates and methodology).
 > **Last updated:** 2026-05-15
 ---
 ## Before Starting a Task
 ### 1. Define the Task
 Write a clear, specific description in 1–2 sentences. Avoid vague goals.
 **Bad:** "Fix multiplayer"  
 **Good:** "Handle the case where a player disconnects mid-game and reconnects within 10 seconds — restore their game state without restarting the round."
 ### 2. Fill Out `99-current-task.md`
 Copy `documentation/ai-context/99-current-task.md` and fill in:
 - What you're building/fixing
 - Which parts of the codebase it touches (check the boxes)
 - Known files already involved
 - Constraints (must have, nice to have, must NOT break)
 - Definition of done
 This forces you to scope the task before involving the LLM.
 ### 3. Select Context Files
 Don't feed all AI-context files for every task. Pick the minimum the LLM needs.
 | Task Type                                               | Feed These Files                                                                                                   |
 | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
 | Frontend-only (UI component, route, styling)            | `00-project-overview.md` + `03-api-contract.md` + `99-current-task.md`                                             |
 | Backend-only (new endpoint, service logic, model query) | `00-project-overview.md` + `01-architecture.md` + `02-data-model.md` + `99-current-task.md`                        |
 | Full-stack feature (touches API + frontend + schema)    | `00-project-overview.md` + `01-architecture.md` + `02-data-model.md` + `03-api-contract.md` + `99-current-task.md` |
 | WebSocket / multiplayer                                 | `00-project-overview.md` + `04-websocket-protocol.md` + `99-current-task.md`                                       |
 | Data pipeline (Kaikki, enrichment, sync)                | `00-project-overview.md` + `05-data-pipeline.md` + `99-current-task.md`                                            |
 | Deployment / infrastructure                             | `00-project-overview.md` + `06-deployment.md` + `99-current-task.md`                                               |
 | Auth / security                                         | `00-project-overview.md` + `03-api-contract.md` + `99-current-task.md`                                             |
 | Cross-cutting refactor                                  | `00-project-overview.md` + `01-architecture.md` + `02-data-model.md` + `03-api-contract.md` + `99-current-task.md` |
 **Always include:** `00-project-overview.md` (ground truth) and `99-current-task.md` (task scope).  
 **Never include:** `prompts/meta.md` (that's for you, not the LLM).
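 If you script the bundling, the table above maps directly onto a lookup. A sketch with hypothetical `TaskType` keys and a `contextFiles` helper; it returns bare filenames in the step 5 order (overview first, task scope last).
 ```ts
 // Mirrors the step 3 table: domain files (01–06) needed per task type.
 type TaskType =
   | "frontend" | "backend" | "full-stack" | "websocket"
   | "pipeline" | "deployment" | "auth" | "refactor";
 const DOMAIN_FILES: Record<TaskType, string[]> = {
   frontend: ["03-api-contract.md"],
   backend: ["01-architecture.md", "02-data-model.md"],
   "full-stack": ["01-architecture.md", "02-data-model.md", "03-api-contract.md"],
   websocket: ["04-websocket-protocol.md"],
   pipeline: ["05-data-pipeline.md"],
   deployment: ["06-deployment.md"],
   auth: ["03-api-contract.md"],
   refactor: ["01-architecture.md", "02-data-model.md", "03-api-contract.md"],
 };
 // Step 5 order: overview first, domain files next, task scope last.
 function contextFiles(task: TaskType): string[] {
   return ["00-project-overview.md", ...DOMAIN_FILES[task], "99-current-task.md"];
 }
 ```
 Keeping the mapping as data means the table and the script can be diffed against each other when either changes.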
 ### 4. Check for Decision Conflicts
 If the task touches any area from the Decision Index in `00-project-overview.md`, also feed the relevant section from `documentation/DECISIONS.md`.
 Examples:
 - Changing ORM or query patterns → feed `DECISIONS.md` → ORM section
 - Adding a new WebSocket library → feed `DECISIONS.md` → WebSocket section
 - Changing auth provider → feed `DECISIONS.md` → Auth section
 ---
 ## During the Task
 ### 5. Feed Context in Order
 ```
 1. 00-project-overview.md
 2. Relevant domain file(s) (01–06)
 3. 99-current-task.md (filled out)
 4. [Optional] Relevant DECISIONS.md section
 5. [Optional] Specific code files the LLM asks for
 ```
 **Why this order:** The LLM sees the big picture first, then the domain details, then the specific task, so it reads the task against context it already has. In practice this tends to reduce hallucinated file paths and schemas.
 ### 6. Start with the Base Prompt
 Use the template from `prompts/meta.md`:
 ```
 I'm working on Lila, a vocabulary learning app. Here's the project context:
 [PASTE: selected context files]
 My current task: [from 99-current-task.md]
 Please follow these rules:
 [Rules 1–8 from the Base Prompt Template in prompts/meta.md]
 ```
 ### 7. Work File-by-File, Section-by-Section
 The LLM will suggest files to modify. Go through them one at a time:
 1. **LLM explains the change** — concept first, code second
 2. **You review** — does this make sense? Does it violate any constraints from 99-current-task.md?
 3. **LLM shows the code** — section by section, not the whole file at once
 4. **You apply** — copy-paste into your editor; don't let the LLM write files directly
 5. **You verify** — TypeScript compiles, tests pass, manual check
 **Rule:** Never let the LLM modify more than one file before you review it.
 ### 8. Verify LLM Assumptions
 LLMs hallucinate file paths, schema shapes, and API endpoints. Periodically ask:
 > "Which files from the context did you actually look at?"  
 > "What schema from packages/shared are you using here?"  
 > "Show me the exact Zod schema for this request body."
 If the LLM's answer doesn't match the context files, correct it immediately. Wrong assumptions compound.
 ### 9. When to Start a New Conversation
 Start a fresh chat when:
 | Scenario                       | Why                                                                                          |
 | ------------------------------ | -------------------------------------------------------------------------------------------- |
 | Conversation exceeds ~25 turns | LLM coherence degrades; starts contradicting earlier context                                 |
 | Task pivots significantly      | "We were fixing a bug, now we're redesigning the feature" — fresh context prevents confusion |
 | LLM seems confused             | Repeating questions, forgetting constraints, suggesting things already ruled out             |
 | You took a break over 2 hours  | Context window state is opaque; safer to restart                                             |
 | Multiple failed attempts       | The LLM is stuck in a bad pattern; reset gives it a clean slate                              |
 **How to restart:** Paste the same context files + updated 99-current-task.md (mark what's already done). Summarize progress in 2–3 sentences.
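 The restart table can double as a quick self-check. This sketch uses the table's rough thresholds (~25 turns, 2 hours); treating "multiple failed attempts" as two or more is an assumption, and `restartReasons` is a hypothetical helper.
 ```ts
 // Heuristics from the step 9 table; thresholds are rules of thumb, not hard limits.
 interface ConversationState {
   turns: number;
   hoursSinceLastMessage: number;
   taskPivoted: boolean;
   llmSeemsConfused: boolean;
   failedAttempts: number;
 }
 // Returns every reason that applies; an empty array means "keep going".
 function restartReasons(s: ConversationState): string[] {
   const reasons: string[] = [];
   if (s.turns > 25) reasons.push("conversation exceeds ~25 turns");
   if (s.taskPivoted) reasons.push("task pivoted");
   if (s.llmSeemsConfused) reasons.push("LLM seems confused");
   if (s.hoursSinceLastMessage > 2) reasons.push("break longer than 2 hours");
   if (s.failedAttempts >= 2) reasons.push("multiple failed attempts"); // assumed threshold
   return reasons;
 }
 ```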
 ### 10. Handle Multi-File Changes
 For tasks touching 3+ files, establish a sequence:
 ```
 1. Shared schemas (packages/shared) — foundation everything else depends on
 2. Database models (packages/db) — if schema or queries change
 3. Backend service + controller (apps/api) — business logic
 4. Backend tests (apps/api) — verify the service
 5. Frontend API client + types (apps/web) — consume the new contract
 6. Frontend components (apps/web) — UI changes
 7. Frontend tests (apps/web) — verify the UI
 8. Integration / e2e tests — full flow
 ```
 **Exception:** If the task is frontend-only, skip steps 2–4. If backend-only, skip 5–7.
 Tell the LLM: "We'll go in this order. Start with [file]."
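 The ordering and its exceptions, encoded as data so the skip rules are explicit. The `Scope` names and the `changeSequence` helper are hypothetical.
 ```ts
 // Step 10 order; index + 1 is the step number used in the exceptions.
 type Scope = "full-stack" | "frontend-only" | "backend-only";
 const STEPS = [
   "Shared schemas (packages/shared)",
   "Database models (packages/db)",
   "Backend service + controller (apps/api)",
   "Backend tests (apps/api)",
   "Frontend API client + types (apps/web)",
   "Frontend components (apps/web)",
   "Frontend tests (apps/web)",
   "Integration / e2e tests",
 ];
 function changeSequence(scope: Scope): string[] {
   // Frontend-only skips steps 2–4; backend-only skips steps 5–7.
   const skip =
     scope === "frontend-only" ? [2, 3, 4] :
     scope === "backend-only" ? [5, 6, 7] : [];
   return STEPS.filter((_, i) => !skip.includes(i + 1));
 }
 ```
 Shared schemas (step 1) and integration tests (step 8) survive every scope, which matches the rule that `packages/shared` is the foundation for both sides.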
 ---
 ## After the Task
 ### 11. Final Verification
 Before declaring done:
 - [ ] `pnpm typecheck` passes (no TypeScript errors)
 - [ ] `pnpm test` passes (all tests green)
 - [ ] `pnpm lint` passes (no ESLint errors)
 - [ ] Manual verification in dev environment
 - [ ] No console errors in browser
 - [ ] No server errors in API logs
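 The automated part of this checklist can run in one go from a small Node script that stops at the first failing command. A sketch assuming `pnpm` is on `PATH` (on Windows, `spawnSync` may need `shell: true`); `runChecks` is hypothetical, and the manual checks still have to be done by hand.
 ```ts
 import { spawnSync } from "node:child_process";
 // A Runner returns the process exit code, so tests can substitute a fake.
 type Runner = (command: string, args: string[]) => number;
 // status is null if the process could not start or was killed; count that as a failure.
 const runPnpm: Runner = (command, args) =>
   spawnSync(command, args, { stdio: "inherit" }).status ?? 1;
 const CHECKS: string[][] = [["typecheck"], ["test"], ["lint"]];
 function runChecks(run: Runner = runPnpm): { passed: boolean; failedAt?: string } {
   for (const args of CHECKS) {
     // Stop at the first failure, like chaining the commands with &&.
     if (run("pnpm", args) !== 0) {
       return { passed: false, failedAt: `pnpm ${args.join(" ")}` };
     }
   }
   return { passed: true };
 }
 // Usage: process.exit(runChecks().passed ? 0 : 1);
 ```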
 ### 12. Ask for Doc Updates
 Prompt the LLM:
 > "Review the post-work checklist in 99-current-task.md. Which documentation files need updates based on what we just changed?"
 Expected output format:
 ```
 - FILE: documentation/STATUS.md — REASON: Guest play flow is now live
 - FILE: documentation/ai-context/03-api-contract.md — REASON: New endpoint added
 - FILE: packages/shared/src/schemas/game.ts — REASON: New schema added
 ```
 ### 13. Update Docs Yourself
 The LLM suggests; you apply. Docs are your responsibility, not the LLM's.
 Priority order:
 1. `STATUS.md` — if "what works today" changed
 2. `BACKLOG.md` — if a task was completed or discovered
 3. `packages/shared/src/schemas/*.ts` — if request/response shapes changed
 4. `ai-context/*.md` — if architecture, API, or protocol changed
 5. `DECISIONS.md` — if you made a new architectural choice
 6. `README.md` — if quickstart or stack changed
 ### 14. Generate Ticket File (If Significant)
 For significant completed tasks, create a ticket in `documentation/tickets/`:
 | Prefix   | Use when...                                          | Example                               |
 | -------- | ---------------------------------------------------- | ------------------------------------- |
 | `adr-`   | Decision between options with long-term consequences | `adr-websocket-reconnect-strategy.md` |
 | `feat-`  | New feature shipped                                  | `feat-guest-play.md`                  |
 | `fix-`   | Bug fixed                                            | `fix-race-condition-lobby-join.md`    |
 | `chore-` | Routine maintenance, refactoring, tooling            | `chore-batch-distractor-queries.md`   |
 **Ticket contents:**
 - What was done (summary)
 - Why it was needed (context)
 - What files changed (list)
 - Any follow-up work (notes)
 - Setup guide if applicable (how to verify it works)
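 Ticket names follow `<prefix>-<slug>.md`. A sketch of a helper that enforces the prefix table and builds the slug; `ticketFileName` and its slug rules (lowercase, runs of non-alphanumerics collapsed to one hyphen) are assumptions inferred from the examples above.
 ```ts
 // Allowed prefixes from the table above.
 const PREFIXES = ["adr", "feat", "fix", "chore"] as const;
 type TicketPrefix = (typeof PREFIXES)[number];
 function ticketFileName(prefix: TicketPrefix, title: string): string {
   const slug = title
     .toLowerCase()
     .replace(/[^a-z0-9]+/g, "-") // spaces and punctuation become a single hyphen
     .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
   if (!slug) throw new Error("Ticket title must contain letters or digits");
   return `documentation/tickets/${prefix}-${slug}.md`;
 }
 ```
 For example, `ticketFileName("fix", "Race condition: lobby join")` yields `documentation/tickets/fix-race-condition-lobby-join.md`.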
 ---
 ## Common Anti-Patterns
 | Anti-Pattern                                | Why It Fails                                    | Fix                                              |
 | ------------------------------------------- | ----------------------------------------------- | ------------------------------------------------ |
 | Feeding all ai-context files for every task | Bloated context, LLM loses focus, wastes tokens | Use the file selection table (step 3)            |
 | Letting the LLM write files directly        | You don't understand the code, can't debug it   | Copy-paste into your editor, review line by line |
 | Skipping verification                       | "It compiles" ≠ "it works"                      | Run tests, manual check, no console errors       |
 | Not updating docs                           | Future You is confused, LLMs get stale context  | Post-work checklist is non-negotiable            |
 | One long conversation for everything        | LLM forgets constraints, contradicts itself     | Restart at ~25 turns or on pivot                 |
 | Accepting code you don't understand         | You can't maintain it, can't explain it         | Ask "explain this line" until you do             |
 ---
 ## Quick Reference
 ### File Selection Cheat Sheet
 ```
 Frontend only     → 00 + 03 + 99
 Backend only      → 00 + 01 + 02 + 99
 Full-stack        → 00 + 01 + 02 + 03 + 99
 Multiplayer/WS    → 00 + 04 + 99
 Data pipeline     → 00 + 05 + 99
 Deployment        → 00 + 06 + 99
 Auth              → 00 + 03 + 99
 Big refactor      → 00 + 01 + 02 + 03 + 99
 ```
 ### Conversation Restart Template
 ```
 I'm continuing work on Lila. Here's the current context:
 [PASTE: 00-project-overview.md]
 [PASTE: relevant domain file(s)]
 Previously, we [brief summary of what was done].
 Current task: [updated 99-current-task.md, marking completed items]
 Let's continue from [specific file/section].
 ```
 ### Verification Checklist
 ```
 □ pnpm typecheck
 □ pnpm test
 □ pnpm lint
 □ Manual dev verification
 □ No browser console errors
 □ No server errors
 □ Doc updates applied
 □ Ticket file created (if significant)
 ```
--- a/documentation/ai-context/prompts/meta.md
+++ b/documentation/ai-context/prompts/meta.md
@ -0,0 +1,172 @@
 # Prompts — Meta
 > **Purpose:** Reusable prompt templates and working methodology for LLM-assisted development on Lila. Use these as preambles when starting a new task with any LLM.
 > **Last updated:** 2026-05-15
 ---
 ## Working Methodology
 This project is a learning exercise. The goal is to understand the code, not just to ship it.
 ### How to use an LLM for help
 1. **Paste the relevant AI-context files as context** (00-project-overview.md + domain files + 99-current-task.md)
 2. **Describe what you're working on and what you're stuck on**
 3. **Ask for hints and explanations, not raw solutions** — understand the concept, then implement it yourself
 4. **After completing a task, ask the LLM what docs need updating**
 ### Refactoring workflow
 After completing a task: share the code, ask what to refactor and why. The LLM should explain the concept, not write the implementation.
 ---
 ## Base Prompt Template
 Use this as the opening when starting any task with an LLM:
 ```
 I'm working on Lila, a vocabulary learning app. Here's the project context:
 [PASTE: 00-project-overview.md]
 [PASTE: relevant domain file(s) from ai-context/]
 My current task: [describe what you're building or fixing]
 Please follow these rules:
 1. Tell me which files you need to see to get the full context of the problem.
   Do not assume you know the codebase — ask for files.
 2. Walk me through the problem and the solution in text only.
   Explain the concept before showing code.
 3. If we need to update multiple files, let's go through them one by one,
   no matter how many files there are.
 4. If we go through a file, we'll do it slowly section by section,
   no matter how many sections.
 5. Suggest a feature branch name. Tell me when it's time to git commit
   and provide a commit message.
 6. If we have multiple options, provide options that reflect current
   industry standards and best practices. Explain the trade-offs.
 7. Never assume anything. Always ask for clarification if uncertain.
 8. For every completed task, tell me which documentation files need updates.
   Use this format:
   - FILE: [filename] — REASON: [what changed and why the doc needs updating]
 Let's start.
 ```
 ---
 ## Task-Specific Prompt Templates
 ### Generate a Feature
 ```
 [Base prompt template above]
 Additional context:
 - This is a [feature/bugfix/refactor] task
 - It touches these areas: [frontend/backend/database/websocket/pipeline]
 - The user-facing behavior should be: [describe]
 - Technical constraints: [e.g., must work with existing Zod schemas, must not break WebSocket protocol]
 ```
 ### Review Code for Bugs
 ```
 [Base prompt template above]
 Additional context:
 - I'm seeing this symptom: [error message, unexpected behavior]
 - It happens when: [reproduction steps]
 - I've checked these files already: [list]
 - Focus on: [race conditions, null handling, async flow, type safety, etc.]
 ```
 ### Generate Tests
 ```
 [Base prompt template above]
 Additional context:
 - Test type: [unit/integration/e2e]
 - What to test: [function/component/endpoint]
 - Current test coverage: [none/existing but incomplete]
 - Mocking strategy: [mock DB/mock WS/mock auth]
 ```
 ### Debug an Issue
 ```
 [Base prompt template above]
 Additional context:
 - Error message: [paste full error]
 - Stack trace: [paste if available]
 - Recent changes: [what was modified before it broke]
 - Environment: [dev/production/local/CI]
 ```
 ---
 ## Post-Work Doc Update Checklist
 After completing any task, the LLM should check these files for needed updates:
 | File                               | Check if...                                                             |
 | ---------------------------------- | ----------------------------------------------------------------------- |
 | `documentation/STATUS.md`          | Task changes what's working or what's blocked                           |
 | `documentation/BACKLOG.md`         | Task completes a backlog item or creates a new one                      |
 | `documentation/DECISIONS.md`       | Task involved choosing between alternatives with long-term consequences |
 | `documentation/ARCHITECTURE.md`    | Task changes monorepo structure, data flow, or layer boundaries         |
 | `documentation/ai-context/*.md`    | Task changes schemas, endpoints, protocol, or pipeline stages           |
 | `packages/shared/src/schemas/*.ts` | Task changes request/response shapes or WS message types                |
 | `README.md`                        | Task changes quickstart steps, stack, or current status                 |
 **Format for doc updates:**
 ```
 - FILE: documentation/STATUS.md — REASON: Guest play flow is now live, update "What Works Today"
 - FILE: documentation/ai-context/03-api-contract.md — REASON: New endpoint POST /api/v1/game/guest-start added
 - FILE: packages/shared/src/schemas/game.ts — REASON: Added GuestGameRequestSchema
 ```
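 If you want to turn the LLM's reply into a to-do list, this format is regular enough to parse. A minimal sketch with a hypothetical `parseDocUpdates` helper: the separator is the em dash character, written as `\u2014` in the regex, and any line that doesn't match (chatter, headings) is ignored.
 ```ts
 // Parses "- FILE: <path> <em dash> REASON: <text>" lines; other lines are skipped.
 interface DocUpdate { file: string; reason: string; }
 const DOC_UPDATE_LINE = /^-\s*FILE:\s*(\S+)\s*\u2014\s*REASON:\s*(.+)$/;
 function parseDocUpdates(reply: string): DocUpdate[] {
   const updates: DocUpdate[] = [];
   for (const raw of reply.split(/\r?\n/)) {
     const match = DOC_UPDATE_LINE.exec(raw.trim());
     if (match) updates.push({ file: match[1], reason: match[2].trim() });
   }
   return updates;
 }
 ```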
 ---
 ## Ticket File Convention
 For completed tasks, produce a ticket file in `documentation/tickets/`:
 | Prefix   | Use when...                                          | Example                             |
 | -------- | ---------------------------------------------------- | ----------------------------------- |
 | `adr-`   | Decision between options with long-term consequences | `adr-websocket-library.md`          |
 | `feat-`  | New feature shipped                                  | `feat-guest-play.md`                |
 | `fix-`   | Bug fixed                                            | `fix-race-condition-lobby-join.md`  |
 | `chore-` | Routine maintenance, refactoring, tooling            | `chore-batch-distractor-queries.md` |
 **Ticket contents:**
 - What was done (summary)
 - Why it was needed (context)
 - What files changed (list)
 - Any follow-up work (notes)
 - Setup guide if applicable (how to verify it works)
 ---
 ## Tips for Effective LLM Collaboration
 1. **Start small.** Give the LLM one file or one function at a time, not the whole codebase.
 2. **Verify assumptions.** If the LLM assumes something about your stack, correct it immediately — wrong assumptions compound.
 3. **Ask for alternatives.** "What's the simplest way to do this?" vs. "What's the most robust way?" often yield different answers.
 4. **Don't accept code you don't understand.** Ask the LLM to explain a line until you do.
 5. **Test everything.** The LLM can suggest tests, but you run them. Trust nothing until it passes.
 6. **Keep context fresh.** If a conversation gets long, start a new one with the base prompt + current task template.
--- a/documentation/archive/notes.md
+++ b/documentation/archive/notes.md
--- a/documentation/archive/roadmap.md
+++ b/documentation/archive/roadmap.md
--- a/documentation/archive/spec.md
+++ b/documentation/archive/spec.md
--- a/documentation/design/GAME_MODES.md
+++ b/documentation/design/GAME_MODES.md
Author	SHA1	Message	Date
lila	caa2f7d395	updating docs	2026-05-25 01:04:49 +02:00
lila	7e0311683f	updating documentation	2026-05-16 01:59:43 +02:00