Vocabulary Trainer — Project Specification
1. Overview
A multiplayer English–Italian vocabulary trainer with a Duolingo-style quiz interface (one word prompt, four answer choices). Supports both single-player practice and real-time competitive multiplayer rooms of 2–4 players. Designed from the ground up to be language-pair agnostic.
Core Principles
- Minimal but extendable: Working product fast, clean architecture for future growth
- Mobile-first: Touch-friendly Duolingo-like UX
- Type safety end-to-end: TypeScript + Zod schemas shared between frontend and backend
2. Technology Stack
| Layer | Technology |
|---|---|
| Monorepo | pnpm workspaces |
| Frontend | React 18, Vite, TypeScript |
| Routing | TanStack Router |
| Server state | TanStack Query |
| Client state | Zustand |
| Styling | Tailwind CSS + shadcn/ui |
| Backend | Node.js, Express, TypeScript |
| Realtime | WebSockets (ws library) |
| Database | PostgreSQL 18 |
| ORM | Drizzle ORM |
| Cache / Queue | Valkey 9 |
| Auth | OpenAuth (Google + GitHub) |
| Validation | Zod (shared schemas) |
| Testing | Vitest, React Testing Library |
| Linting / Formatting | ESLint, Prettier |
| Containerisation | Docker, Docker Compose |
| Hosting | Hetzner VPS |
Why ws over Socket.io
ws is the raw WebSocket library. For rooms of 2–4 players there is no need for Socket.io's transport fallbacks or room-management abstractions. The protocol is defined explicitly in packages/shared, which gives the same guarantees without the overhead.
Why Valkey
Valkey stores ephemeral room state that does not need to survive a server restart. It keeps the PostgreSQL schema clean and makes room lookups O(1).
Why pnpm workspaces without Turborepo
Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, the plain pnpm workspace commands (pnpm -r run build, pnpm --filter) are sufficient and there is one less tool to configure and maintain.
3. Repository Structure
vocab-trainer/
├── apps/
│ ├── web/ # React SPA (Vite + TanStack Router)
│ │ ├── src/
│ │ │ ├── routes/
│ │ │ ├── components/
│ │ │ ├── stores/ # Zustand stores
│ │ │ └── lib/
│ │ └── Dockerfile
│ └── api/ # Express REST + WebSocket server
│ ├── src/
│ │ ├── routes/
│ │ ├── services/
│ │ ├── repositories/
│ │ └── websocket/
│ └── Dockerfile
├── packages/
│ ├── shared/ # Zod schemas, TypeScript types, constants
│ └── db/ # Drizzle schema, migrations, seed script
├── scripts/
│ ├── datafiles/
│ │ └── en-it-nouns.json
│ └── extract-en-it-nouns.py # One-time WordNet + OMW extraction → datafiles/en-it-nouns.json
├── docker-compose.yml
├── docker-compose.prod.yml
├── pnpm-workspace.yaml
└── package.json
packages/shared is the contract between frontend and backend. All request/response shapes and WebSocket event payloads are defined there as Zod schemas and inferred TypeScript types — never duplicated.
pnpm workspace config
pnpm-workspace.yaml declares:
packages:
- 'apps/*'
- 'packages/*'
Root scripts
The root package.json defines convenience scripts that delegate to workspaces:
- dev: starts api and web in parallel
- build: builds all packages in dependency order
- test: runs Vitest across all workspaces
- lint: runs ESLint across all workspaces
For parallel dev, use concurrently or just two terminal tabs for MVP.
4. Architecture — N-Tier / Layered
┌────────────────────────────────────┐
│ Presentation (React SPA) │ apps/web
├────────────────────────────────────┤
│ API / Transport │ HTTP REST + WebSocket
├────────────────────────────────────┤
│ Application (Controllers) │ apps/api/src/routes
│ Domain (Business logic) │ apps/api/src/services
│ Data Access (Repositories) │ apps/api/src/repositories
├────────────────────────────────────┤
│ Database (PostgreSQL via Drizzle) │ packages/db
│ Cache (Valkey) │ apps/api/src/lib/valkey.ts
└────────────────────────────────────┘
Each layer only communicates with the layer directly below it. Business logic lives in services, not in route handlers or repositories.
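The layering rule can be illustrated with a minimal, dependency-free sketch. All names here (TermRow, TermRepository, QuizService, handleGetTerms) and the page-size cap are hypothetical, and an in-memory repository stands in for the Drizzle-backed one; it is not the project's actual code.

```typescript
// Hypothetical names throughout; only the layer boundaries come from the spec.
interface TermRow {
  id: string;
  prompt: string;
  answer: string;
}

// Data access layer: the only layer that knows how rows are stored.
interface TermRepository {
  findByPair(pair: string, limit: number): TermRow[];
}

// Domain layer: business rules, depending only on the repository interface.
class QuizService {
  private readonly terms: TermRepository;

  constructor(terms: TermRepository) {
    this.terms = terms;
  }

  getQuizTerms(pair: string, limit: number): TermRow[] {
    // Assumed rule for illustration: clamp the page size to 1..20.
    const capped = Math.min(Math.max(limit, 1), 20);
    return this.terms.findByPair(pair, capped);
  }
}

// Application layer: turns a transport-level request into a service call.
function handleGetTerms(
  service: QuizService,
  query: { pair: string; limit: string },
): { status: number; body: TermRow[] | { message: string } } {
  const limit = Number.parseInt(query.limit, 10);
  if (Number.isNaN(limit)) {
    return { status: 400, body: { message: "limit must be a number" } };
  }
  return { status: 200, body: service.getQuizTerms(query.pair, limit) };
}

// In-memory repository standing in for the real data access layer.
const inMemoryRepo: TermRepository = {
  findByPair: (_pair, limit) =>
    [
      { id: "1", prompt: "dog", answer: "cane" },
      { id: "2", prompt: "cat", answer: "gatto" },
      { id: "3", prompt: "house", answer: "casa" },
    ].slice(0, limit),
};

const layeredResponse = handleGetTerms(new QuizService(inMemoryRepo), {
  pair: "en-it",
  limit: "2",
});
```

Because the service depends only on the TermRepository interface, it can be unit-tested with an in-memory repository like the one above, without a database.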
5. Infrastructure
Domain structure
| Subdomain | Service |
|---|---|
| app.yourdomain.com | React frontend |
| api.yourdomain.com | Express API + WebSocket |
| auth.yourdomain.com | OpenAuth service |
Docker Compose services (production)
| Container | Role |
|---|---|
| postgres | PostgreSQL 18, named volume |
| valkey | Valkey 9, ephemeral (no persistence needed) |
| openauth | OpenAuth service |
| api | Express + WS server |
| web | Nginx serving the Vite build |
| nginx-proxy | Automatic reverse proxy |
| acme-companion | Let's Encrypt certificate automation |
nginx-proxy (:80/:443)
app.domain → web:80
api.domain → api:3000 (HTTP + WS upgrade)
auth.domain → openauth:3001
SSL is fully automatic via nginx-proxy + acme-companion. No manual Certbot needed.
5.1 Valkey Key Structure
Ephemeral room state is stored in Valkey with TTL (e.g., 1 hour). PostgreSQL stores durable history only.
Key Format: room:{code}:{field}
| Key | Type | TTL | Description |
|---|---|---|---|
| room:{code}:state | Hash | 1h | Current question index, round status |
| room:{code}:players | Set | 1h | Connected user IDs |
| room:{code}:answers:{round} | Hash | 15m | Temporary storage for current-round answers |
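A small helper can keep these key formats in one place so they are never assembled by hand. The sketch below is illustrative: the function and constant names are hypothetical, and only the key patterns and TTLs come from the table. Upper-casing the code mirrors the CHECK (code = UPPER(code)) constraint on rooms.code.

```typescript
// Hypothetical helpers; only the key formats and TTL values come from the spec.
type RoomField = "state" | "players";

const ROOM_TTL_SECONDS = 60 * 60; // 1h for room state and players
const ANSWERS_TTL_SECONDS = 15 * 60; // 15m for per-round answers

// Builds room:{code}:{field}; codes are normalised to upper case.
function roomKey(code: string, field: RoomField): string {
  return `room:${code.toUpperCase()}:${field}`;
}

// Builds room:{code}:answers:{round}. Whether rounds start at 0 or 1 is not
// specified, so any non-negative integer is accepted here.
function answersKey(code: string, round: number): string {
  if (!Number.isInteger(round) || round < 0) {
    throw new RangeError("round must be a non-negative integer");
  }
  return `room:${code.toUpperCase()}:answers:${round}`;
}
```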
Recovery Strategy
If the server crashes mid-game, the room's Valkey data is lost.
PostgreSQL room_players.score remains 0, because scores are written only at game end.
On startup, a health check sets the room status to finished for any unfinished room whose updated_at is stale.
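The stale-room check could look roughly like the sketch below. It works on plain objects rather than the database, and the names (RoomRecord, findStaleRooms, markFinished) and the one-hour threshold are assumptions chosen to match the Valkey TTL; the spec does not define what counts as stale.

```typescript
// Hypothetical shapes and names; the threshold is an assumption.
interface RoomRecord {
  code: string;
  status: "waiting" | "in_progress" | "finished";
  updatedAt: Date;
}

const STALE_AFTER_MS = 60 * 60 * 1000; // assumed: same as the 1h Valkey TTL

// Returns unfinished rooms whose updatedAt is older than the threshold.
function findStaleRooms(
  rooms: RoomRecord[],
  now: Date,
  staleAfterMs: number = STALE_AFTER_MS,
): RoomRecord[] {
  return rooms.filter(
    (room) =>
      room.status !== "finished" &&
      now.getTime() - room.updatedAt.getTime() > staleAfterMs,
  );
}

// Returns a copy of the room marked as finished; the input is not mutated.
function markFinished(room: RoomRecord, now: Date): RoomRecord {
  return { ...room, status: "finished", updatedAt: now };
}
```

In the real service the filter would be a WHERE clause on rooms.status and rooms.updated_at; the in-memory version only shows the rule.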
6. Data Model
Design principle
Words are modelled as language-neutral concepts (terms) separate from learning curricula (decks).
Adding a new language pair requires no schema changes — only new rows in translations, decks, and language_pairs.
Core tables
terms
  id uuid PK
  synset_id text UNIQUE -- OMW ILI (e.g. "ili:i12345")
  pos varchar(20) -- NOT NULL, CHECK (pos IN ('noun', 'verb', 'adjective', 'adverb'))
  created_at timestamptz DEFAULT now()
  -- REMOVED: frequency_rank (handled at deck level)

translations
  id uuid PK
  term_id uuid FK → terms.id
  language_code varchar(10) -- NOT NULL, BCP 47: "en", "it"
  text text -- NOT NULL
  created_at timestamptz DEFAULT now()
  UNIQUE (term_id, language_code, text) -- Allow synonyms, prevent exact duplicates

term_glosses
  id uuid PK
  term_id uuid FK → terms.id
  language_code varchar(10) -- NOT NULL
  text text -- NOT NULL
  created_at timestamptz DEFAULT now()

language_pairs
  id uuid PK
  source varchar(10) -- NOT NULL
  target varchar(10) -- NOT NULL
  label text
  active boolean DEFAULT true
  UNIQUE (source, target)

decks
  id uuid PK
  name text -- NOT NULL (e.g. "A1 Italian Nouns", "Most Common 1000")
  description text -- NULLABLE
  pair_id uuid FK → language_pairs.id -- NULLABLE (for single-language or multi-pair decks)
  created_by uuid FK → users.id -- NULLABLE (for system decks)
  is_public boolean DEFAULT true
  created_at timestamptz DEFAULT now()

deck_terms
  deck_id uuid FK → decks.id
  term_id uuid FK → terms.id
  position smallint -- NOT NULL, ordering within deck (1, 2, 3...)
  added_at timestamptz DEFAULT now()
  PRIMARY KEY (deck_id, term_id)

users
  id uuid PK -- Internal stable ID (FK target)
  openauth_sub text UNIQUE -- NOT NULL, OpenAuth sub claim (e.g. "google|12345")
  email varchar(255) UNIQUE -- NULLABLE (GitHub users may lack email)
  display_name varchar(100)
  created_at timestamptz DEFAULT now()
  last_login_at timestamptz
  -- REMOVED: games_played, games_won (derive from room_players)

rooms
  id uuid PK
  code varchar(8) UNIQUE -- NOT NULL, CHECK (code = UPPER(code))
  host_id uuid FK → users.id
  pair_id uuid FK → language_pairs.id
  deck_id uuid FK → decks.id -- Which vocabulary deck this room uses
  status varchar(20) -- NOT NULL, CHECK (status IN ('waiting', 'in_progress', 'finished'))
  max_players smallint -- NOT NULL, DEFAULT 4, CHECK (max_players BETWEEN 2 AND 10)
  round_count smallint -- NOT NULL, DEFAULT 10, CHECK (round_count BETWEEN 5 AND 20)
  created_at timestamptz DEFAULT now()
  updated_at timestamptz DEFAULT now() -- For stale room recovery

room_players
  room_id uuid FK → rooms.id
  user_id uuid FK → users.id
  score integer DEFAULT 0 -- Final score only (written at game end)
  joined_at timestamptz DEFAULT now()
  left_at timestamptz -- Populated on WS disconnect/leave
  PRIMARY KEY (room_id, user_id)
Indexes
-- Vocabulary
CREATE INDEX idx_terms_pos ON terms (pos);
CREATE INDEX idx_translations_lang ON translations (language_code, term_id);

-- Decks
CREATE INDEX idx_decks_pair ON decks (pair_id, is_public);
CREATE INDEX idx_decks_creator ON decks (created_by);
CREATE INDEX idx_deck_terms_term ON deck_terms (term_id);

-- Language Pairs
CREATE INDEX idx_pairs_active ON language_pairs (active, source, target);

-- Rooms
CREATE INDEX idx_rooms_status ON rooms (status);
CREATE INDEX idx_rooms_host ON rooms (host_id);
-- NOTE: idx_rooms_code omitted (UNIQUE constraint creates index automatically)

-- Room Players
CREATE INDEX idx_room_players_user ON room_players (user_id);
CREATE INDEX idx_room_players_score ON room_players (room_id, score DESC);
Repository Logic Note
DeckRepository.getTerms(deckId, limit, offset) fetches terms from a specific deck.
Query uses deck_terms.position for ordering.
For random practice within a deck: WHERE deck_id = X ORDER BY random() LIMIT N
(safe because deck is bounded, e.g., 500 terms max, not full table).
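The ordered, paginated read described above can be sketched against an in-memory array. This is only an illustration of the getTerms(deckId, limit, offset) contract: the DeckTermRow shape and the array-based lookup are assumptions, and the real repository would run the equivalent query through Drizzle.

```typescript
// Hypothetical row shape mirroring the deck_terms table.
interface DeckTermRow {
  deckId: string;
  termId: string;
  position: number;
}

// In-memory equivalent of:
//   WHERE deck_id = X ORDER BY position LIMIT limit OFFSET offset
function getTerms(
  rows: DeckTermRow[],
  deckId: string,
  limit: number,
  offset: number,
): DeckTermRow[] {
  return rows
    .filter((row) => row.deckId === deckId) // filter() copies, so sort() below is safe
    .sort((a, b) => a.position - b.position)
    .slice(offset, offset + limit);
}
```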
7. Vocabulary Data — WordNet + OMW
Source
- Open Multilingual Wordnet (OMW): English and Italian nouns via the Interlingual Index (ILI)
- External CEFR lists: for deck curation (e.g. GitHub: ecom/cefr-lists)
Extraction process
- Run extract-en-it-nouns.py once locally using the wn library
  - Imports ALL bilingual noun synsets (no frequency filtering)
  - Output: datafiles/en-it-nouns.json, committed to repo
- Run pnpm db:seed: populates the terms + translations tables from JSON
- Run pnpm db:build-decks: matches external CEFR lists to DB terms, creates decks + deck_terms
Benefits of deck-based approach
- WordNet frequency data is unreliable (e.g. chemical symbols rank high)
- Curricula can come from external sources (CEFR, Oxford 3000, SUBTLEX)
- Bad data excluded at deck level, not schema level
- Users can create custom decks later
- Multiple difficulty levels without schema changes
terms.synset_id stores the OMW ILI (e.g. ili:i12345) for traceability and future re-imports with additional languages.
8. Authentication — OpenAuth
All auth is delegated to the OpenAuth service at auth.yourdomain.com. Providers: Google, GitHub.
The API validates the JWT from OpenAuth on every protected request. User rows are created on first login and looked up on later logins by the sub claim, which is stored in users.openauth_sub; the internal users.id remains the primary key.
Auth endpoint on the API:
| Method | Path | Description |
|---|---|---|
| GET | /api/auth/me | Validate token, return user |
All other auth flows (login, callback, token refresh) are handled entirely by OpenAuth — the frontend redirects to auth.yourdomain.com and receives a JWT back.
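Once a token has been verified, mapping its sub claim to a user row might look like the sketch below. JWT verification itself is omitted, a Map stands in for the users table, and the claim fields other than sub (email, name) as well as every identifier are assumptions for illustration.

```typescript
// Hypothetical claim and row shapes; only sub and the users columns come from the spec.
interface TokenClaims {
  sub: string;
  email?: string;
  name?: string;
}

interface UserRow {
  id: string; // internal stable ID
  openauthSub: string;
  email: string | null;
  displayName: string | null;
  createdAt: Date;
  lastLoginAt: Date;
}

// Creates the user on first login; on later logins only lastLoginAt changes.
function upsertUserFromClaims(
  usersBySub: Map<string, UserRow>,
  claims: TokenClaims,
  now: Date,
  newId: () => string,
): UserRow {
  const existing = usersBySub.get(claims.sub);
  if (existing) {
    const updated: UserRow = { ...existing, lastLoginAt: now };
    usersBySub.set(claims.sub, updated);
    return updated;
  }
  const created: UserRow = {
    id: newId(),
    openauthSub: claims.sub,
    email: claims.email ?? null, // some providers may not supply an email
    displayName: claims.name ?? null,
    createdAt: now,
    lastLoginAt: now,
  };
  usersBySub.set(claims.sub, created);
  return created;
}
```

Passing newId and now as parameters keeps the function deterministic in tests; production code would likely generate the UUID in the database.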
9. REST API
All endpoints prefixed /api. Request and response bodies validated with Zod on both sides using schemas from packages/shared.
Vocabulary
| Method | Path | Description |
|---|---|---|
| GET | /language-pairs | List active language pairs |
| GET | /terms?pair=en-it&limit=10 | Fetch quiz terms with distractors |
Rooms
| Method | Path | Description |
|---|---|---|
| POST | /rooms | Create a room → returns room + code |
| GET | /rooms/:code | Get current room state |
| POST | /rooms/:code/join | Join a room |
Users
| Method | Path | Description |
|---|---|---|
| GET | /users/me | Current user profile |
| GET | /users/me/stats | Games played, win rate |
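Since games_played and games_won are not stored on users, the stats endpoint has to derive them from room_players. The sketch below shows one possible derivation over in-memory rows. The names are hypothetical, it assumes the caller passes only rows from finished rooms, and it counts a tie for the top score as a win for every tied player, which the spec does not settle.

```typescript
// Hypothetical shapes mirroring room_players (finished rooms only).
interface RoomPlayerRow {
  roomId: string;
  userId: string;
  score: number;
}

interface UserStats {
  gamesPlayed: number;
  gamesWon: number;
  winRate: number; // 0 when no games were played
}

function computeUserStats(rows: RoomPlayerRow[], userId: string): UserStats {
  // Highest score reached in each room.
  const topScoreByRoom = new Map<string, number>();
  for (const row of rows) {
    const best = topScoreByRoom.get(row.roomId);
    if (best === undefined || row.score > best) {
      topScoreByRoom.set(row.roomId, row.score);
    }
  }

  const mine = rows.filter((row) => row.userId === userId);
  const gamesPlayed = mine.length;
  // Assumption: sharing the top score counts as a win.
  const gamesWon = mine.filter(
    (row) => row.score === topScoreByRoom.get(row.roomId),
  ).length;

  return {
    gamesPlayed,
    gamesWon,
    winRate: gamesPlayed === 0 ? 0 : gamesWon / gamesPlayed,
  };
}
```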
10. WebSocket Protocol
One WS connection per client. Authenticated by passing the OpenAuth JWT as a query param on the upgrade request: wss://api.yourdomain.com?token=....
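On the server, the token has to be read from the upgrade request's URL before the connection is accepted. A minimal sketch of that step, using only the standard URL class, is shown here; the function name is hypothetical, and verifying the token is a separate step not covered.

```typescript
// Hypothetical helper: extracts ?token=... from an upgrade request URL.
// Node passes the request URL as a path plus query (e.g. "/?token=abc"),
// so a placeholder base is supplied to make it parseable.
function extractToken(requestUrl: string | undefined): string | null {
  if (!requestUrl) return null;
  try {
    const url = new URL(requestUrl, "http://localhost");
    const token = url.searchParams.get("token");
    return token !== null && token.length > 0 ? token : null;
  } catch {
    // Malformed URL: treat as unauthenticated.
    return null;
  }
}
```

A connection whose token is missing or fails verification would be rejected during the upgrade rather than after the socket is open.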
All messages are JSON: { type: string, payload: unknown }. The full set of types is a Zod discriminated union in packages/shared — both sides validate every message they receive.
Client → Server
| type | payload | Description |
|---|---|---|
| room:join | { code } | Subscribe to a room's WS channel |
| room:leave | (none) | Unsubscribe |
| room:start | (none) | Host starts the game |
| game:answer | { questionId, answerId } | Player submits an answer |
Server → Client
| type | payload | Description |
|---|---|---|
| room:state | Full room snapshot | Sent on join and on any player join/leave |
| game:question | { id, prompt, options[], timeLimit } | New question broadcast to all players |
| game:answer_result | { questionId, correct, correctAnswerId, scores } | Broadcast after all players have answered or the timeout fires |
| game:finished | { scores[], winner } | End of game summary |
| error | { message } | Protocol or validation error |
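The client-to-server half of the protocol can be modelled as a discriminated union keyed on type. The spec places these definitions in packages/shared as Zod schemas; the sketch below uses hand-written checks instead so that it runs without dependencies, and it assumes questionId and answerId are strings, which the spec does not state.

```typescript
// Dependency-free stand-in for the Zod discriminated union in packages/shared.
// Assumption: questionId and answerId are strings.
type ClientMessage =
  | { type: "room:join"; payload: { code: string } }
  | { type: "room:leave" }
  | { type: "room:start" }
  | { type: "game:answer"; payload: { questionId: string; answerId: string } };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns a typed message, or null when the frame is not valid protocol JSON.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;

  const { type, payload } = data;
  if (typeof type !== "string") return null;
  const body: Record<string, unknown> = isRecord(payload) ? payload : {};

  switch (type) {
    case "room:join":
      return typeof body.code === "string"
        ? { type: "room:join", payload: { code: body.code } }
        : null;
    case "room:leave":
      return { type: "room:leave" };
    case "room:start":
      return { type: "room:start" };
    case "game:answer":
      return typeof body.questionId === "string" &&
        typeof body.answerId === "string"
        ? {
            type: "game:answer",
            payload: { questionId: body.questionId, answerId: body.answerId },
          }
        : null;
    default:
      return null; // unknown message type
  }
}
```

A null result would typically be answered with the error message type from the Server → Client table.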
Multiplayer game mechanic — simultaneous answers
All players see the same question at the same time. Everyone submits independently. The server waits until all players have answered or the 15-second timeout fires — then broadcasts game:answer_result with updated scores. There is no buzz-first mechanic. This keeps the experience Duolingo-like and symmetric.
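The collect-then-broadcast rule can be captured in a small round object. The following is a sketch under assumptions: the class and method names are hypothetical, time is passed in as milliseconds so the logic is testable without real timers, and state is held in memory rather than in Valkey.

```typescript
// Hypothetical in-memory model of one round; the real state lives in Valkey.
interface RoundResult {
  correctAnswerId: string;
  scores: Record<string, number>;
}

class Round {
  private readonly answers = new Map<string, string>();
  private readonly players: string[];
  private readonly correctAnswerId: string;
  private readonly deadlineMs: number;

  constructor(
    players: string[],
    correctAnswerId: string,
    startedAtMs: number,
    timeLimitMs: number = 15_000, // 15-second server-side timer
  ) {
    this.players = [...players];
    this.correctAnswerId = correctAnswerId;
    this.deadlineMs = startedAtMs + timeLimitMs;
  }

  // Records an answer; unknown players, repeats and late answers are ignored.
  submit(playerId: string, answerId: string, nowMs: number): boolean {
    if (
      !this.players.includes(playerId) ||
      this.answers.has(playerId) ||
      nowMs > this.deadlineMs
    ) {
      return false;
    }
    this.answers.set(playerId, answerId);
    return true;
  }

  // True once everyone has answered or the deadline has passed.
  isComplete(nowMs: number): boolean {
    return this.answers.size === this.players.length || nowMs >= this.deadlineMs;
  }

  // Adds +1 per correct answer to the running scores.
  settle(previousScores: Record<string, number>): RoundResult {
    const scores: Record<string, number> = { ...previousScores };
    for (const player of this.players) {
      const gained = this.answers.get(player) === this.correctAnswerId ? 1 : 0;
      scores[player] = (scores[player] ?? 0) + gained;
    }
    return { correctAnswerId: this.correctAnswerId, scores };
  }
}
```

The server would call isComplete after each accepted answer and again when its timer fires, then broadcast the settled result as game:answer_result.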
Game flow
host creates room (REST) →
players join via room code (REST + WS room:join) →
room:state broadcasts player list →
host sends room:start →
server broadcasts game:question →
players send game:answer →
server collects all answers or waits for timeout →
server broadcasts game:answer_result →
repeat for N rounds →
server broadcasts game:finished
Room state in Valkey
Active room state (connected players, current question, answers received this round) is stored in Valkey with a TTL. PostgreSQL holds the durable record (rooms, room_players). On server restart, in-progress games are considered abandoned — acceptable for MVP.
11. Game Mechanics
- Question format: source-language word prompt + 4 target-language choices (1 correct + 3 distractors of the same POS)
- Distractors: generated server-side, never include the correct answer, never repeat within a session
- Scoring: +1 point per correct answer. Speed bonus is out of scope for MVP.
- Timer: 15 seconds per question, server-authoritative
- Single-player: uses GET /terms and runs entirely client-side. No WebSocket.
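The distractor rules listed above can be sketched as a pure function. The names (QuizTerm, pickDistractors) are hypothetical, and "never repeat within a session" is interpreted here as never reusing a term as a distractor once it has been shown, tracked through a usedIds set; that reading is an assumption.

```typescript
// Hypothetical term shape; pos values come from the terms.pos CHECK constraint.
interface QuizTerm {
  id: string;
  pos: "noun" | "verb" | "adjective" | "adverb";
  translation: string;
}

// Picks `count` distractors with the same POS as the correct term.
// usedIds is mutated so later calls in the same session skip these terms.
function pickDistractors(
  correct: QuizTerm,
  pool: QuizTerm[],
  usedIds: Set<string>,
  count: number = 3,
  random: () => number = Math.random,
): QuizTerm[] {
  const candidates = pool.filter(
    (term) =>
      term.id !== correct.id &&
      term.pos === correct.pos &&
      term.translation !== correct.translation && // a synonym must not look like a second right answer
      !usedIds.has(term.id),
  );
  if (candidates.length < count) {
    throw new Error("not enough distractors for this term");
  }

  // Partial Fisher-Yates shuffle: only the first `count` slots are randomised.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (candidates.length - i));
    [candidates[i], candidates[j]] = [candidates[j], candidates[i]];
  }

  const picked = candidates.slice(0, count);
  for (const term of picked) usedIds.add(term.id);
  return picked;
}
```

Injecting random makes the function deterministic under test, which suits the distractor cases listed in the testing strategy.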
12. Frontend Structure
apps/web/src/
├── routes/
│ ├── index.tsx # Landing / mode select
│ ├── auth/
│ ├── singleplayer/
│ └── multiplayer/
│ ├── lobby.tsx # Create or join by code
│ ├── room.$code.tsx # Waiting room
│ └── game.$code.tsx # Active game
├── components/
│ ├── quiz/ # QuestionCard, OptionButton, ScoreBoard
│ ├── room/ # PlayerList, RoomCode, ReadyState
│ └── ui/ # shadcn/ui wrappers: Button, Card, Dialog ...
├── stores/
│ └── gameStore.ts # Zustand: game session, scores, WS state
├── lib/
│ ├── api.ts # TanStack Query wrappers
│ └── ws.ts # WS client singleton
└── main.tsx
Zustand store (single store for MVP)
interface AppStore {
user: User | null;
gameSession: GameSession | null;
currentQuestion: Question | null;
scores: Record<string, number>;
isLoading: boolean;
error: string | null;
}
TanStack Query handles all server data fetching. Zustand handles ephemeral UI and WebSocket-driven state.
13. Testing Strategy
| Type | Tool | Scope |
|---|---|---|
| Unit | Vitest | Services, QuizService distractor logic, Zod schemas |
| Component | Vitest + RTL | QuestionCard, OptionButton, auth forms |
| Integration | Vitest | API route handlers against a test DB |
| E2E | Out of scope for MVP | — |
Tests are co-located with source files (*.test.ts / *.test.tsx).
Critical paths to cover:
- Distractor generation (correct POS, no duplicates, never includes answer)
- Answer validation (server-side, correct scoring)
- Game session lifecycle (create → play → complete)
- JWT validation middleware
14. Definition of Done
Functional
- User can log in via Google or GitHub (OpenAuth)
- User can play singleplayer: 10 rounds, score, result screen
- User can create a room and share a code
- User can join a room via code
- Multiplayer: 10 rounds, simultaneous answers, real-time score sync
- 1 000 English–Italian words seeded from WordNet + OMW
Technical
- Deployed to Hetzner with HTTPS on all three subdomains
- Docker Compose running all services
- Drizzle migrations applied on container start
- 10–20 passing tests covering critical paths
- pnpm workspace build pipeline green
15. Out of Scope (MVP)
- Difficulty levels (frequency_rank was removed from terms; difficulty is handled by adding decks)
- Additional language pairs (schema already supports it; just add rows)
- Leaderboards (games_played and games_won are not stored; derive them from room_players)
- Streaks / daily challenges
- Friends / private invites
- Audio pronunciation
- CI/CD pipeline (manual deploy for now)
- Rate limiting (add before going public)
- Admin panel for vocabulary management