# Vocabulary Trainer — Project Specification

## 1. Overview

A multiplayer English–Italian vocabulary trainer with a Duolingo-style quiz interface (one word prompt, four answer choices). It supports both single-player practice and real-time competitive multiplayer rooms of 2–4 players, and is designed from the ground up to be language-pair agnostic.

### Core Principles

- **Minimal but extendable**: ship a working product fast, on a clean architecture that allows for future growth
- **Mobile-first**: touch-friendly, Duolingo-like UX
- **Type safety end-to-end**: TypeScript + Zod schemas shared between frontend and backend

---

## 2. Technology Stack

| Layer                | Technology                    |
| -------------------- | ----------------------------- |
| Monorepo             | pnpm workspaces               |
| Frontend             | React 18, Vite, TypeScript    |
| Routing              | TanStack Router               |
| Server state         | TanStack Query                |
| Client state         | Zustand                       |
| Styling              | Tailwind CSS + shadcn/ui      |
| Backend              | Node.js, Express, TypeScript  |
| Realtime             | WebSockets (`ws` library)     |
| Database             | PostgreSQL 18                 |
| ORM                  | Drizzle ORM                   |
| Cache / Queue        | Valkey 9                      |
| Auth                 | OpenAuth (Google + GitHub)    |
| Validation           | Zod (shared schemas)          |
| Testing              | Vitest, React Testing Library |
| Linting / Formatting | ESLint, Prettier              |
| Containerisation     | Docker, Docker Compose        |
| Hosting              | Hetzner VPS                   |

### Why `ws` over Socket.io

`ws` is a plain WebSocket library. For rooms of 2–4 players there is no need for Socket.io's transport fallbacks or room-management abstractions. The message protocol is defined explicitly as Zod schemas in `packages/shared`, so both ends get typed, validated messages without Socket.io's additional protocol layer.

### Why Valkey

Valkey stores ephemeral room state that does not need to survive a server restart. Keeping that state out of PostgreSQL keeps the schema clean, and rooms can be looked up by key in O(1).

### Why pnpm workspaces without Turborepo

Turborepo adds parallel task running and build caching on top of pnpm workspaces.
For a two-app monorepo of this size, the plain pnpm workspace commands (`pnpm -r run build`, `pnpm --filter`) are sufficient, and there is one less tool to configure and maintain.

---

## 3. Repository Structure

```
vocab-trainer/
├── apps/
│   ├── web/                  # React SPA (Vite + TanStack Router)
│   │   ├── src/
│   │   │   ├── routes/
│   │   │   ├── components/
│   │   │   ├── stores/       # Zustand stores
│   │   │   └── lib/
│   │   └── Dockerfile
│   └── api/                  # Express REST + WebSocket server
│       ├── src/
│       │   ├── routes/
│       │   ├── services/
│       │   ├── repositories/
│       │   ├── websocket/
│       │   └── lib/          # valkey.ts (Valkey client)
│       └── Dockerfile
├── packages/
│   ├── shared/               # Zod schemas, TypeScript types, constants
│   └── db/                   # Drizzle schema, migrations, seed script
├── scripts/
│   └── extract_omw.py        # One-time WordNet + OMW extraction → seed.json
├── docker-compose.yml
├── docker-compose.prod.yml
├── pnpm-workspace.yaml
└── package.json
```

`packages/shared` is the contract between frontend and backend. All request/response shapes and WebSocket event payloads are defined there once, as Zod schemas with inferred TypeScript types, and are never duplicated.

### pnpm workspace config

`pnpm-workspace.yaml` declares:

```yaml
packages:
  - 'apps/*'
  - 'packages/*'
```

### Root scripts

The root `package.json` defines convenience scripts that delegate to the workspaces:

- `dev`: starts `api` and `web` in parallel
- `build`: builds all packages in dependency order
- `test`: runs Vitest across all workspaces
- `lint`: runs ESLint across all workspaces

For parallel dev, `pnpm -r --parallel run dev` works out of the box; `concurrently` or simply two terminal tabs are also fine for MVP.

---

## 4. Architecture — N-Tier / Layered

```
┌────────────────────────────────────┐
│ Presentation (React SPA)           │  apps/web
├────────────────────────────────────┤
│ API / Transport                    │  HTTP REST + WebSocket
├────────────────────────────────────┤
│ Application (Controllers)          │  apps/api/src/routes
│ Domain (Business logic)            │  apps/api/src/services
│ Data Access (Repositories)         │  apps/api/src/repositories
├────────────────────────────────────┤
│ Database (PostgreSQL via Drizzle)  │  packages/db
│ Cache (Valkey)                     │  apps/api/src/lib/valkey.ts
└────────────────────────────────────┘
```

Each layer communicates only with the layer directly below it. Business logic lives in services, never in route handlers or repositories.

---

## 5. Infrastructure

### Domain structure

| Subdomain             | Service                 |
| --------------------- | ----------------------- |
| `app.yourdomain.com`  | React frontend          |
| `api.yourdomain.com`  | Express API + WebSocket |
| `auth.yourdomain.com` | OpenAuth service        |

### Docker Compose services (production)

| Container        | Role                                        |
| ---------------- | ------------------------------------------- |
| `postgres`       | PostgreSQL 18, named volume                 |
| `valkey`         | Valkey 9, ephemeral (no persistence needed) |
| `openauth`       | OpenAuth service                            |
| `api`            | Express + WS server                         |
| `web`            | Nginx serving the Vite build                |
| `nginx-proxy`    | Automatic reverse proxy                     |
| `acme-companion` | Let's Encrypt certificate automation        |

```
nginx-proxy (:80/:443)
  app.yourdomain.com   → web:80
  api.yourdomain.com   → api:3000        (HTTP + WS upgrade)
  auth.yourdomain.com  → openauth:3001
```

SSL is fully automatic via `nginx-proxy` + `acme-companion`, which issue and renew the Let's Encrypt certificates. No manual Certbot setup is needed.

---

## 6. Data Model

### Design principle

Words are modelled as language-neutral **terms**, each with one **translation** row per supported language. Adding a new language pair (e.g. English–French) requires **no schema changes**, only new rows in `translations` and `language_pairs`. The flat `english`/`italian` column pattern is explicitly avoided.
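To illustrate why this keeps the code pair-agnostic, here is a minimal sketch over in-memory arrays that mirror the `terms` and `translations` tables described below. The camelCase types and the `wordPairsFor` function are hypothetical illustrations, not part of the schema or the API. Fetching word pairs for any language pair is a join on `term_id` filtered by the pair's two language codes, so supporting French is purely a data change:

```typescript
// Hypothetical in-memory mirrors of rows in `terms` and `translations`.
interface Term {
  id: string;
  synsetId: string;
  pos: "noun" | "verb" | "adjective";
}

interface Translation {
  termId: string;
  languageCode: string; // BCP 47, e.g. "en", "it", "fr"
  text: string;
}

interface LanguagePair {
  source: string; // e.g. "en"
  target: string; // e.g. "it"
}

interface WordPair {
  termId: string;
  prompt: string; // word in the source language
  answer: string; // word in the target language
}

// Joins every term with its translations in the pair's two languages.
// Terms missing either side are skipped, so a pair only yields complete entries.
function wordPairsFor(
  pair: LanguagePair,
  allTerms: Term[],
  allTranslations: Translation[],
): WordPair[] {
  // (term_id, language_code) is unique in the schema, so one Map entry per key.
  const textByKey = new Map<string, string>();
  for (const t of allTranslations) {
    textByKey.set(`${t.termId}|${t.languageCode}`, t.text);
  }

  const pairs: WordPair[] = [];
  for (const term of allTerms) {
    const prompt = textByKey.get(`${term.id}|${pair.source}`);
    const answer = textByKey.get(`${term.id}|${pair.target}`);
    if (prompt !== undefined && answer !== undefined) {
      pairs.push({ termId: term.id, prompt, answer });
    }
  }
  return pairs;
}

// Sample data: one noun with English and Italian rows.
const terms: Term[] = [{ id: "t1", synsetId: "wn:01234567n", pos: "noun" }];
const translations: Translation[] = [
  { termId: "t1", languageCode: "en", text: "dog" },
  { termId: "t1", languageCode: "it", text: "cane" },
];

console.log(wordPairsFor({ source: "en", target: "it" }, terms, translations));

// Adding French is one more row; neither the types nor the lookup change.
translations.push({ termId: "t1", languageCode: "fr", text: "chien" });
console.log(wordPairsFor({ source: "en", target: "fr" }, terms, translations));
```

In the real API this join would run in PostgreSQL through a Drizzle repository; the point is that `source` and `target` are plain values, never column names.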
### Core tables

```
terms
  id              uuid PK
  synset_id       text UNIQUE          -- WordNet synset offset, e.g. "wn:01234567n"
  pos             varchar(20)          -- "noun" | "verb" | "adjective"
  frequency_rank  integer              -- 1–1000, reserved for difficulty filtering
  created_at      timestamptz

translations
  id              uuid PK
  term_id         uuid FK → terms.id
  language_code   varchar(10)          -- BCP 47: "en", "it", "de", ...
  text            text
  UNIQUE (term_id, language_code)

language_pairs
  id              uuid PK
  source          varchar(10)          -- "en"
  target          varchar(10)          -- "it"
  label           text                 -- "English → Italian"
  active          boolean DEFAULT true
  UNIQUE (source, target)

users
  id              uuid PK              -- OpenAuth sub claim
  email           varchar(255) UNIQUE
  display_name    varchar(100)
  games_played    integer DEFAULT 0
  games_won       integer DEFAULT 0
  created_at      timestamptz
  last_login_at   timestamptz

rooms
  id              uuid PK
  code            varchar(8) UNIQUE    -- human-readable, e.g. "WOLF-42"
  host_id         uuid FK → users.id
  pair_id         uuid FK → language_pairs.id
  status          text                 -- "waiting" | "in_progress" | "finished"
  max_players     smallint DEFAULT 4
  round_count     smallint DEFAULT 10
  created_at      timestamptz

room_players
  room_id         uuid FK → rooms.id
  user_id         uuid FK → users.id
  score           integer DEFAULT 0
  joined_at       timestamptz
  PRIMARY KEY (room_id, user_id)
```

### Indexes

```sql
CREATE INDEX ON terms (pos, frequency_rank);
CREATE INDEX ON rooms (status);
CREATE INDEX ON room_players (user_id);
```

---

## 7. Vocabulary Data — WordNet + OMW

### Source

- **Princeton WordNet**: English words + synset IDs
- **Open Multilingual Wordnet (OMW)**: Italian translations keyed by synset ID

### Extraction process

1. Run `scripts/extract_omw.py` once locally using NLTK
2. Filter to the 1 000 most common nouns (by WordNet frequency data)
3. Output: `packages/db/src/seed.json`, committed to the repo
4. `packages/db/src/seed.ts` reads the JSON and populates `terms` + `translations`

`terms.synset_id` stores the WordNet offset (e.g. `wn:01234567n`) for traceability and for future re-imports with additional languages.

---

## 8. Authentication — OpenAuth

All auth is delegated to the OpenAuth service at `auth.yourdomain.com`. Providers: Google, GitHub.

The API validates the OpenAuth JWT on every protected request. A user row is created on first login and updated on later logins, keyed by the token's `sub` claim, which serves as the primary key (`users.id`).

**Auth endpoint on the API:**

| Method | Path           | Description                 |
| ------ | -------------- | --------------------------- |
| GET    | `/api/auth/me` | Validate token, return user |

All other auth flows (login, callback, token refresh) are handled entirely by OpenAuth: the frontend redirects to `auth.yourdomain.com` and receives a JWT back.

---

## 9. REST API

All endpoints are prefixed with `/api`. Request and response bodies are validated with Zod on both sides, using the schemas from `packages/shared`.

### Vocabulary

| Method | Path                         | Description                       |
| ------ | ---------------------------- | --------------------------------- |
| GET    | `/language-pairs`            | List active language pairs        |
| GET    | `/terms?pair=en-it&limit=10` | Fetch quiz terms with distractors |

### Rooms

| Method | Path                | Description                         |
| ------ | ------------------- | ----------------------------------- |
| POST   | `/rooms`            | Create a room → returns room + code |
| GET    | `/rooms/:code`      | Get current room state              |
| POST   | `/rooms/:code/join` | Join a room                         |

### Users

| Method | Path              | Description            |
| ------ | ----------------- | ---------------------- |
| GET    | `/users/me`       | Current user profile   |
| GET    | `/users/me/stats` | Games played, win rate |

---

## 10. WebSocket Protocol

There is one WS connection per client, authenticated by passing the OpenAuth JWT as a query parameter on the upgrade request: `wss://api.yourdomain.com?token=...`.

All messages are JSON of the form `{ type: string, payload: unknown }`. The full set of message types is a Zod discriminated union in `packages/shared`, and both sides validate every message they receive.
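In the project this union is a Zod schema (Zod provides `z.discriminatedUnion` for exactly this case). The sketch below is a hand-written, dependency-free stand-in for illustration only, showing what validating the client → server messages in the following table amounts to: parse the frame, switch on `type`, check the payload shape, and reject anything else. `ClientMessage`, `ParseResult` and `parseClientMessage` are hypothetical names.

```typescript
// Client → Server messages as a discriminated union on `type`.
type ClientMessage =
  | { type: "room:join"; payload: { code: string } }
  | { type: "room:leave" }
  | { type: "room:start" }
  | { type: "game:answer"; payload: { questionId: string; answerId: string } };

type ParseResult =
  | { ok: true; message: ClientMessage }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parses one raw WebSocket frame. Anything malformed becomes an error result,
// which the server would answer with an `error` message. Unknown payload keys
// are dropped because each message is rebuilt from the checked fields only.
function parseClientMessage(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
  if (!isRecord(data)) {
    return { ok: false, error: "message must be a JSON object" };
  }

  const { type, payload } = data;
  const body: Record<string, unknown> = isRecord(payload) ? payload : {};

  switch (type) {
    case "room:join": {
      const code = body.code;
      if (typeof code === "string") {
        return { ok: true, message: { type: "room:join", payload: { code } } };
      }
      return { ok: false, error: "room:join requires { code }" };
    }
    case "room:leave":
      return { ok: true, message: { type: "room:leave" } };
    case "room:start":
      return { ok: true, message: { type: "room:start" } };
    case "game:answer": {
      const { questionId, answerId } = body;
      if (typeof questionId === "string" && typeof answerId === "string") {
        return {
          ok: true,
          message: { type: "game:answer", payload: { questionId, answerId } },
        };
      }
      return { ok: false, error: "game:answer requires { questionId, answerId }" };
    }
    default:
      return { ok: false, error: `unknown message type: ${String(type)}` };
  }
}
```

A connection handler would call this on every incoming frame and, on `ok: false`, reply with `{ type: "error", payload: { message } }`. With Zod, the hand-written checks collapse into one `z.discriminatedUnion("type", [...])` schema and a `safeParse` call.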
### Client → Server

| type          | payload                    | Description                      |
| ------------- | -------------------------- | -------------------------------- |
| `room:join`   | `{ code }`                 | Subscribe to a room's WS channel |
| `room:leave`  | —                          | Unsubscribe                      |
| `room:start`  | —                          | Host starts the game             |
| `game:answer` | `{ questionId, answerId }` | Player submits an answer         |

### Server → Client

| type                 | payload                                            | Description                                        |
| -------------------- | -------------------------------------------------- | -------------------------------------------------- |
| `room:state`         | Full room snapshot                                 | Sent on join and on any player join/leave          |
| `game:question`      | `{ id, prompt, options[], timeLimit }`             | New question broadcast to all players              |
| `game:answer_result` | `{ questionId, correct, correctAnswerId, scores }` | Broadcast once all players answer or time runs out |
| `game:finished`      | `{ scores[], winner }`                             | End-of-game summary                                |
| `error`              | `{ message }`                                      | Protocol or validation error                       |

### Multiplayer game mechanic — simultaneous answers

All players see the same question at the same time and submit their answers independently. The server waits until every player has answered **or** the 15-second timeout fires, then broadcasts `game:answer_result` with the updated scores.

There is no buzz-first mechanic, which keeps the experience Duolingo-like and symmetric.

### Game flow

```
host creates room (REST)
  → players join via room code (REST + WS room:join)
  → room:state broadcasts player list
  → host sends room:start
  → server broadcasts game:question
  → players send game:answer
  → server collects all answers or waits for timeout
  → server broadcasts game:answer_result
  → repeat for N rounds
  → server broadcasts game:finished
```

### Room state in Valkey

Active room state (connected players, current question, answers received this round) is stored in Valkey with a TTL. PostgreSQL holds the durable record (`rooms`, `room_players`). On server restart, in-progress games are considered abandoned, which is acceptable for MVP.

---

## 11. Game Mechanics

- **Question format**: a source-language word prompt + 4 target-language choices (1 correct + 3 distractors of the same POS)
- **Distractors**: generated server-side; they never include the correct answer and never repeat within a session
- **Scoring**: +1 point per correct answer. A speed bonus is out of scope for MVP.
- **Timer**: 15 seconds per question, server-authoritative in multiplayer
- **Single-player**: uses `GET /terms` and runs entirely client-side, with no WebSocket

---

## 12. Frontend Structure

```
apps/web/src/
├── routes/
│   ├── index.tsx             # Landing / mode select
│   ├── auth/
│   ├── singleplayer/
│   └── multiplayer/
│       ├── lobby.tsx         # Create or join by code
│       ├── room.$code.tsx    # Waiting room
│       └── game.$code.tsx    # Active game
├── components/
│   ├── quiz/                 # QuestionCard, OptionButton, ScoreBoard
│   ├── room/                 # PlayerList, RoomCode, ReadyState
│   └── ui/                   # shadcn/ui wrappers: Button, Card, Dialog ...
├── stores/
│   └── gameStore.ts          # Zustand: game session, scores, WS state
├── lib/
│   ├── api.ts                # TanStack Query wrappers
│   └── ws.ts                 # WS client singleton
└── main.tsx
```

### Zustand store (single store for MVP)

```typescript
// User, GameSession and Question are the types inferred from the Zod schemas in packages/shared.
interface AppStore {
  user: User | null;
  gameSession: GameSession | null;
  currentQuestion: Question | null;
  scores: Record<string, number>; // assumed: userId → score
  isLoading: boolean;
  error: string | null;
}
```

TanStack Query handles all server data fetching; Zustand handles ephemeral UI and WebSocket-driven state.

---

## 13. Testing Strategy

| Type        | Tool                 | Scope                                               |
| ----------- | -------------------- | --------------------------------------------------- |
| Unit        | Vitest               | Services, QuizService distractor logic, Zod schemas |
| Component   | Vitest + RTL         | QuestionCard, OptionButton, auth forms              |
| Integration | Vitest               | API route handlers against a test DB                |
| E2E         | Out of scope for MVP | —                                                   |

Tests are co-located with source files (`*.test.ts` / `*.test.tsx`).
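The distractor rules from Section 11 are pure logic, which makes them the easiest critical path to unit-test. Below is a minimal, dependency-free sketch, assuming candidates are already loaded as target-language words with their POS. `Candidate`, `shuffle`, `pickDistractors` and `buildOptions` are hypothetical names, and "never repeat within a session" is interpreted here as never reusing a term as a distractor across questions in the same session.

```typescript
// A candidate answer: a term's target-language word plus its part of speech.
interface Candidate {
  termId: string;
  pos: string;
  text: string;
}

// Fisher–Yates shuffle on a copy; `random` is injectable for deterministic tests.
function shuffle<T>(items: readonly T[], random: () => number): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = copy[i]!;
    copy[i] = copy[j]!;
    copy[j] = tmp;
  }
  return copy;
}

// Picks `count` distractors for `answer`: same POS, a different term, a
// different word, and a term not already used as a distractor this session.
function pickDistractors(
  answer: Candidate,
  pool: readonly Candidate[],
  usedInSession: Set<string>,
  random: () => number = Math.random,
  count = 3,
): Candidate[] {
  const seenTexts = new Set<string>([answer.text]);
  const eligible = pool.filter(
    (c) =>
      c.pos === answer.pos &&
      c.termId !== answer.termId &&
      !usedInSession.has(c.termId),
  );

  const picked: Candidate[] = [];
  for (const c of shuffle(eligible, random)) {
    if (picked.length === count) break;
    if (seenTexts.has(c.text)) continue; // same spelling as the answer or another option
    seenTexts.add(c.text);
    picked.push(c);
  }
  if (picked.length < count) {
    throw new Error(`not enough distractors for term ${answer.termId}`);
  }
  for (const c of picked) usedInSession.add(c.termId);
  return picked;
}

// Builds the four shuffled option strings shown to the player.
function buildOptions(
  answer: Candidate,
  distractors: readonly Candidate[],
  random: () => number = Math.random,
): string[] {
  return shuffle([answer, ...distractors], random).map((c) => c.text);
}

const samplePool: Candidate[] = [
  { termId: "t1", pos: "noun", text: "cane" },
  { termId: "t2", pos: "noun", text: "gatto" },
  { termId: "t3", pos: "noun", text: "casa" },
  { termId: "t4", pos: "noun", text: "libro" },
  { termId: "t5", pos: "noun", text: "albero" },
  { termId: "t6", pos: "verb", text: "correre" },
];

const sessionUsed = new Set<string>();
const correct = samplePool[0]!;
const options = buildOptions(correct, pickDistractors(correct, samplePool, sessionUsed));
console.log(options); // four Italian nouns in random order, exactly one of them "cane"
```

Because `random` is a parameter, a co-located `*.test.ts` can pass a seeded generator for exact assertions, or assert only the invariants (three distractors, same POS, answer never included, no reuse within a session) across many runs.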
**Critical paths to cover:**

- Distractor generation (correct POS, no duplicates, never includes the answer)
- Answer validation (server-side, correct scoring)
- Game session lifecycle (create → play → complete)
- JWT validation middleware

---

## 14. Definition of Done

### Functional

- [ ] User can log in via Google or GitHub (OpenAuth)
- [ ] User can play singleplayer: 10 rounds, score, result screen
- [ ] User can create a room and share a code
- [ ] User can join a room via code
- [ ] Multiplayer: 10 rounds, simultaneous answers, real-time score sync
- [ ] 1 000 English–Italian words seeded from WordNet + OMW

### Technical

- [ ] Deployed to Hetzner with HTTPS on all three subdomains
- [ ] Docker Compose running all services
- [ ] Drizzle migrations applied on container start
- [ ] 10–20 passing tests covering critical paths
- [ ] pnpm workspace build pipeline green

---

## 15. Out of Scope (MVP)

- Difficulty levels _(`frequency_rank` column exists, ready to use)_
- Additional language pairs _(schema already supports it; just add rows)_
- Leaderboards _(`games_played`, `games_won` columns exist)_
- Streaks / daily challenges
- Friends / private invites
- Audio pronunciation
- CI/CD pipeline (manual deploy for now)
- Rate limiting _(add before going public)_
- Admin panel for vocabulary management