Database Schema:
- Add decks table for curated word lists (A1, Most Common, etc.)
- Add deck_terms join table with position ordering
- Link rooms to decks via rooms.deck_id FK
- Remove frequency_rank from terms (now deck-scoped)
- Change users.id to uuid, add openauth_sub for auth mapping
- Add room_players.left_at for disconnect tracking
- Add rooms.updated_at for stale room recovery
- Add CHECK constraints for data integrity (pos, status, etc.)

Extraction Script:
- Rewrite extract.py to mirror complete OMW dataset
- Extract all 25,204 bilingual noun synsets (en-it)
- Remove frequency filtering and block lists
- Output all lemmas per synset for full synonym support
- Seed data now uncurated; decks handle selection

Architecture:
- Separate concerns: raw OMW data in DB, curation in decks
- Enables user-created decks and multiple difficulty levels
- Rooms select vocabulary by choosing a deck

# Vocabulary Trainer — Project Specification

## 1. Overview

A multiplayer English–Italian vocabulary trainer with a Duolingo-style quiz interface (one word prompt, four answer choices). Supports both single-player practice and real-time competitive multiplayer rooms of 2–4 players. Designed from the ground up to be language-pair agnostic.

### Core Principles

- **Minimal but extendable**: Working product fast, clean architecture for future growth
- **Mobile-first**: Touch-friendly Duolingo-like UX
- **Type safety end-to-end**: TypeScript + Zod schemas shared between frontend and backend

---

## 2. Technology Stack

| Layer                | Technology                    |
| -------------------- | ----------------------------- |
| Monorepo             | pnpm workspaces               |
| Frontend             | React 18, Vite, TypeScript    |
| Routing              | TanStack Router               |
| Server state         | TanStack Query                |
| Client state         | Zustand                       |
| Styling              | Tailwind CSS + shadcn/ui      |
| Backend              | Node.js, Express, TypeScript  |
| Realtime             | WebSockets (`ws` library)     |
| Database             | PostgreSQL 18                 |
| ORM                  | Drizzle ORM                   |
| Cache / Queue        | Valkey 9                      |
| Auth                 | OpenAuth (Google + GitHub)    |
| Validation           | Zod (shared schemas)          |
| Testing              | Vitest, React Testing Library |
| Linting / Formatting | ESLint, Prettier              |
| Containerisation     | Docker, Docker Compose        |
| Hosting              | Hetzner VPS                   |

### Why `ws` over Socket.io

`ws` is the raw WebSocket library. For rooms of 2–4 players there is no need for Socket.io's transport fallbacks or room-management abstractions. The protocol is defined explicitly in `packages/shared`, which gives the same guarantees without the overhead.

### Why Valkey

Valkey stores ephemeral room state that does not need to survive a server restart. It keeps the PostgreSQL schema clean and makes room lookups O(1).

### Why pnpm workspaces without Turborepo

Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, the plain pnpm workspace commands (`pnpm -r run build`, `pnpm --filter`) are sufficient and there is one less tool to configure and maintain.

---

## 3. Repository Structure

```
vocab-trainer/
├── apps/
│   ├── web/                        # React SPA (Vite + TanStack Router)
│   │   ├── src/
│   │   │   ├── routes/
│   │   │   ├── components/
│   │   │   ├── stores/             # Zustand stores
│   │   │   └── lib/
│   │   └── Dockerfile
│   └── api/                        # Express REST + WebSocket server
│       ├── src/
│       │   ├── routes/
│       │   ├── services/
│       │   ├── repositories/
│       │   └── websocket/
│       └── Dockerfile
├── packages/
│   ├── shared/                     # Zod schemas, TypeScript types, constants
│   └── db/                         # Drizzle schema, migrations, seed script
├── scripts/
│   ├── datafiles/
│   │   └── en-it-nouns.json
│   └── extract-en-it-nouns.py      # One-time WordNet + OMW extraction → datafiles/en-it-nouns.json
├── docker-compose.yml
├── docker-compose.prod.yml
├── pnpm-workspace.yaml
└── package.json
```

`packages/shared` is the contract between frontend and backend. All request/response shapes and WebSocket event payloads are defined there as Zod schemas and inferred TypeScript types — never duplicated.

### pnpm workspace config

`pnpm-workspace.yaml` declares:

```
packages:
  - 'apps/*'
  - 'packages/*'
```

### Root scripts

The root `package.json` defines convenience scripts that delegate to workspaces:

- `dev` — starts `api` and `web` in parallel
- `build` — builds all packages in dependency order
- `test` — runs Vitest across all workspaces
- `lint` — runs ESLint across all workspaces

For parallel dev, use `concurrently` or just two terminal tabs for MVP.

---

## 4. Architecture — N-Tier / Layered

```
┌────────────────────────────────────┐
│  Presentation (React SPA)          │  apps/web
├────────────────────────────────────┤
│  API / Transport                   │  HTTP REST + WebSocket
├────────────────────────────────────┤
│  Application (Controllers)         │  apps/api/src/routes
│  Domain (Business logic)           │  apps/api/src/services
│  Data Access (Repositories)        │  apps/api/src/repositories
├────────────────────────────────────┤
│  Database (PostgreSQL via Drizzle) │  packages/db
│  Cache (Valkey)                    │  apps/api/src/lib/valkey.ts
└────────────────────────────────────┘
```

Each layer only communicates with the layer directly below it. Business logic lives in services, not in route handlers or repositories.

---

## 5. Infrastructure

### Domain structure

| Subdomain             | Service                 |
| --------------------- | ----------------------- |
| `app.yourdomain.com`  | React frontend          |
| `api.yourdomain.com`  | Express API + WebSocket |
| `auth.yourdomain.com` | OpenAuth service        |

### Docker Compose services (production)

| Container        | Role                                        |
| ---------------- | ------------------------------------------- |
| `postgres`       | PostgreSQL 16, named volume                 |
| `valkey`         | Valkey 8, ephemeral (no persistence needed) |
| `openauth`       | OpenAuth service                            |
| `api`            | Express + WS server                         |
| `web`            | Nginx serving the Vite build                |
| `nginx-proxy`    | Automatic reverse proxy                     |
| `acme-companion` | Let's Encrypt certificate automation        |

```
nginx-proxy (:80/:443)
  app.domain  → web:80
  api.domain  → api:3000  (HTTP + WS upgrade)
  auth.domain → openauth:3001
```

SSL is fully automatic via `nginx-proxy` + `acme-companion`. No manual Certbot needed.

### 5.1 Valkey Key Structure

Ephemeral room state is stored in Valkey with a TTL (e.g., 1 hour). PostgreSQL stores durable history only.

Key format: `room:{code}:{field}`

| Key                           | Type | TTL | Description                            |
| ----------------------------- | ---- | --- | -------------------------------------- |
| `room:{code}:state`           | Hash | 1h  | Current question index, round status   |
| `room:{code}:players`         | Set  | 1h  | IDs of connected users                 |
| `room:{code}:answers:{round}` | Hash | 15m | Temp storage for current round answers |

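
The key layout in the table above can be kept in one place with a small helper, so that no other module assembles key strings by hand. This is a minimal sketch: the names `roomKey`, `answersKey`, `ROOM_TTL_SECONDS`, and `ANSWERS_TTL_SECONDS` are hypothetical, and upper-casing the code is an assumption based on the `CHECK (code = UPPER(code))` constraint on `rooms.code`.

```typescript
// Hypothetical helpers for building the Valkey keys listed in the table above.
type RoomField = "state" | "players";

// TTLs from the table, in seconds (1 h for room keys, 15 min for round answers).
const ROOM_TTL_SECONDS = 60 * 60;
const ANSWERS_TTL_SECONDS = 15 * 60;

// Room codes are stored upper-case in PostgreSQL, so normalise before building a key.
function roomKey(code: string, field: RoomField): string {
  return `room:${code.toUpperCase()}:${field}`;
}

// Assumption: rounds are identified by a non-negative integer index.
function answersKey(code: string, round: number): string {
  if (!Number.isInteger(round) || round < 0) {
    throw new RangeError(`round must be a non-negative integer, got ${round}`);
  }
  return `room:${code.toUpperCase()}:answers:${round}`;
}

console.log(roomKey("ab12cd", "state")); // room:AB12CD:state
console.log(answersKey("AB12CD", 3)); // room:AB12CD:answers:3
console.log(ROOM_TTL_SECONDS, ANSWERS_TTL_SECONDS); // 3600 900
```
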
**Recovery strategy**

If the server crashes mid-game, the Valkey data is lost. Because final scores are only written at game end, `room_players.score` stays at 0 in PostgreSQL for the interrupted game. A startup health check sets the room's status to `finished` when its `updated_at` is stale.

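
The stale-room check described above could look roughly like the following. It is a sketch under assumptions: the `RoomRow` shape, the function name `findStaleRoomIds`, and the one-hour threshold (chosen to match the Valkey TTL) are all hypothetical, and treating every unfinished room as a candidate, not only `in_progress` ones, is an interpretation of the text.

```typescript
// Hypothetical shape: only the columns the recovery check needs.
interface RoomRow {
  id: string;
  status: "waiting" | "in_progress" | "finished";
  updatedAt: Date;
}

// Assumed threshold: matches the 1 h Valkey TTL; the spec does not fix a value.
const STALE_AFTER_MS = 60 * 60 * 1000;

// Returns the ids of unfinished rooms whose updated_at is older than the threshold.
// The caller would then set status = 'finished' for these ids.
function findStaleRoomIds(
  rooms: RoomRow[],
  now: Date,
  staleAfterMs: number = STALE_AFTER_MS,
): string[] {
  return rooms
    .filter((room) => room.status !== "finished")
    .filter((room) => now.getTime() - room.updatedAt.getTime() > staleAfterMs)
    .map((room) => room.id);
}

const sampleRooms: RoomRow[] = [
  { id: "a", status: "in_progress", updatedAt: new Date("2025-01-01T10:00:00Z") }, // 2 h old
  { id: "b", status: "in_progress", updatedAt: new Date("2025-01-01T11:30:00Z") }, // 30 min old
  { id: "c", status: "finished", updatedAt: new Date("2024-12-01T00:00:00Z") }, // already finished
];
const staleIds = findStaleRoomIds(sampleRooms, new Date("2025-01-01T12:00:00Z"));
console.log(staleIds); // [ 'a' ]
```
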
---

## 6. Data Model

### Design principle

Words are modelled as language-neutral concepts (terms) separate from learning curricula (decks).
Adding a new language pair requires no schema changes — only new rows in `translations`, `decks`, and `language_pairs`.

### Core tables

    terms
      id          uuid PK
      synset_id   text UNIQUE  -- OMW ILI (e.g. "ili:i12345")
      pos         varchar(20)  -- NOT NULL, CHECK (pos IN ('noun', 'verb', 'adjective', 'adverb'))
      created_at  timestamptz DEFAULT now()
      -- REMOVED: frequency_rank (handled at deck level)

    translations
      id             uuid PK
      term_id        uuid FK → terms.id
      language_code  varchar(10)  -- NOT NULL, BCP 47: "en", "it"
      text           text         -- NOT NULL
      created_at     timestamptz DEFAULT now()
      UNIQUE (term_id, language_code, text)  -- Allow synonyms, prevent exact duplicates

    term_glosses
      id             uuid PK
      term_id        uuid FK → terms.id
      language_code  varchar(10)  -- NOT NULL
      text           text         -- NOT NULL
      type           varchar(20)  -- CHECK (type IN ('definition', 'example')), NULLABLE
      created_at     timestamptz DEFAULT now()

    language_pairs
      id      uuid PK
      source  varchar(10)  -- NOT NULL
      target  varchar(10)  -- NOT NULL
      label   text
      active  boolean DEFAULT true
      UNIQUE (source, target)

    decks
      id           uuid PK
      name         text  -- NOT NULL (e.g. "A1 Italian Nouns", "Most Common 1000")
      description  text  -- NULLABLE
      pair_id      uuid FK → language_pairs.id  -- NULLABLE (for single-language or multi-pair decks)
      created_by   uuid FK → users.id  -- NULLABLE (for system decks)
      is_public    boolean DEFAULT true
      created_at   timestamptz DEFAULT now()

    deck_terms
      deck_id   uuid FK → decks.id
      term_id   uuid FK → terms.id
      position  smallint  -- NOT NULL, ordering within deck (1, 2, 3...)
      added_at  timestamptz DEFAULT now()
      PRIMARY KEY (deck_id, term_id)

    users
      id             uuid PK  -- Internal stable ID (FK target)
      openauth_sub   text UNIQUE  -- NOT NULL, OpenAuth `sub` claim (e.g. "google|12345")
      email          varchar(255) UNIQUE  -- NULLABLE (GitHub users may lack email)
      display_name   varchar(100)
      created_at     timestamptz DEFAULT now()
      last_login_at  timestamptz
      -- REMOVED: games_played, games_won (derive from room_players)

    rooms
      id           uuid PK
      code         varchar(8) UNIQUE  -- NOT NULL, CHECK (code = UPPER(code))
      host_id      uuid FK → users.id
      pair_id      uuid FK → language_pairs.id
      deck_id      uuid FK → decks.id  -- Which vocabulary deck this room uses
      status       varchar(20)  -- NOT NULL, CHECK (status IN ('waiting', 'in_progress', 'finished'))
      max_players  smallint  -- NOT NULL, DEFAULT 4, CHECK (max_players BETWEEN 2 AND 10)
      round_count  smallint  -- NOT NULL, DEFAULT 10, CHECK (round_count BETWEEN 5 AND 20)
      created_at   timestamptz DEFAULT now()
      updated_at   timestamptz DEFAULT now()  -- For stale room recovery

    room_players
      room_id    uuid FK → rooms.id
      user_id    uuid FK → users.id
      score      integer DEFAULT 0  -- Final score only (written at game end)
      joined_at  timestamptz DEFAULT now()
      left_at    timestamptz  -- Populated on WS disconnect/leave
      PRIMARY KEY (room_id, user_id)

### Indexes

    -- Vocabulary
    CREATE INDEX idx_terms_pos ON terms (pos);
    CREATE INDEX idx_translations_lang ON translations (language_code, term_id);

    -- Decks
    CREATE INDEX idx_decks_pair ON decks (pair_id, is_public);
    CREATE INDEX idx_decks_creator ON decks (created_by);
    CREATE INDEX idx_deck_terms_term ON deck_terms (term_id);

    -- Language Pairs
    CREATE INDEX idx_pairs_active ON language_pairs (active, source, target);

    -- Rooms
    CREATE INDEX idx_rooms_status ON rooms (status);
    CREATE INDEX idx_rooms_host ON rooms (host_id);
    -- NOTE: idx_rooms_code omitted (UNIQUE constraint creates index automatically)

    -- Room Players
    CREATE INDEX idx_room_players_user ON room_players (user_id);
    CREATE INDEX idx_room_players_score ON room_players (room_id, score DESC);

### Repository logic note

`DeckRepository.getTerms(deckId, limit, offset)` fetches terms from a specific deck, ordered by `deck_terms.position`.

For random practice within a deck, `WHERE deck_id = X ORDER BY random() LIMIT N` is safe because a deck is bounded (e.g., 500 terms max), so the random sort never runs over the full `terms` table.

---

## 7. Vocabulary Data — WordNet + OMW

### Source

- Open Multilingual Wordnet (OMW) — English & Italian nouns via Interlingual Index (ILI)
- External CEFR lists — For deck curation (e.g. GitHub: ecom/cefr-lists)

### Extraction process

1. Run `extract-en-it-nouns.py` once locally using the `wn` library
   - Imports ALL bilingual noun synsets (no frequency filtering)
   - Output: `datafiles/en-it-nouns.json` — committed to repo
2. Run `pnpm db:seed` — populates `terms` + `translations` tables from JSON
3. Run `pnpm db:build-decks` — matches external CEFR lists to DB terms, creates `decks` + `deck_terms`

### Benefits of deck-based approach

- WordNet frequency data is unreliable (e.g. chemical symbols rank high)
- Curricula can come from external sources (CEFR, Oxford 3000, SUBTLEX)
- Bad data excluded at deck level, not schema level
- Users can create custom decks later
- Multiple difficulty levels without schema changes

`terms.synset_id` stores the OMW ILI (e.g. `ili:i12345`) for traceability and future re-imports with additional languages.

---

## 8. Authentication — OpenAuth

All auth is delegated to the OpenAuth service at `auth.yourdomain.com`. Providers: Google, GitHub.

The API validates the JWT from OpenAuth on every protected request. On login, the user row is created (first login) or updated by looking up the `sub` claim in `users.openauth_sub`; the internal uuid `users.id` remains the primary key and FK target.

**Auth endpoint on the API:**

| Method | Path           | Description                 |
| ------ | -------------- | --------------------------- |
| GET    | `/api/auth/me` | Validate token, return user |

All other auth flows (login, callback, token refresh) are handled entirely by OpenAuth — the frontend redirects to `auth.yourdomain.com` and receives a JWT back.

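
The create-or-update step on login can be sketched as below, following the `users` table in section 6, where `openauth_sub` is UNIQUE and `id` is the internal key. Everything here is illustrative: `InMemoryUserStore`, `VerifiedClaims`, and the injected `newId` generator are hypothetical names, the `email` and `name` claims are assumed to be optional, and a real implementation would go through a Drizzle repository after verifying the JWT.

```typescript
// In-memory stand-in for the `users` table (hypothetical; not the real repository).
interface UserRow {
  id: string; // internal uuid, the FK target
  openauthSub: string; // OpenAuth `sub` claim, unique
  email: string | null;
  displayName: string | null;
  createdAt: Date;
  lastLoginAt: Date;
}

// Claims assumed to be present on an already-verified JWT; only `sub` is required.
interface VerifiedClaims {
  sub: string;
  email?: string;
  name?: string;
}

class InMemoryUserStore {
  private bySub = new Map<string, UserRow>();
  private newId: () => string;

  constructor(newId: () => string) {
    this.newId = newId;
  }

  // Creates the row on first login, otherwise refreshes last_login_at. The id never changes.
  upsertFromClaims(claims: VerifiedClaims, now: Date): UserRow {
    const existing = this.bySub.get(claims.sub);
    if (existing) {
      existing.lastLoginAt = now;
      return existing;
    }
    const created: UserRow = {
      id: this.newId(),
      openauthSub: claims.sub,
      email: claims.email ?? null,
      displayName: claims.name ?? null,
      createdAt: now,
      lastLoginAt: now,
    };
    this.bySub.set(claims.sub, created);
    return created;
  }
}

// Usage: the id generator is injected so the sketch has no dependencies.
let userCounter = 0;
const userStore = new InMemoryUserStore(() => `user-${++userCounter}`);
const firstLogin = userStore.upsertFromClaims({ sub: "google|12345" }, new Date("2025-01-01T10:00:00Z"));
const secondLogin = userStore.upsertFromClaims({ sub: "google|12345" }, new Date("2025-01-02T10:00:00Z"));
console.log(firstLogin.id === secondLogin.id); // true
```
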
---

## 9. REST API

All endpoints are prefixed with `/api`. Request and response bodies are validated with Zod on both sides using schemas from `packages/shared`.

### Vocabulary

| Method | Path                         | Description                       |
| ------ | ---------------------------- | --------------------------------- |
| GET    | `/language-pairs`            | List active language pairs        |
| GET    | `/terms?pair=en-it&limit=10` | Fetch quiz terms with distractors |

### Rooms

| Method | Path                | Description                         |
| ------ | ------------------- | ----------------------------------- |
| POST   | `/rooms`            | Create a room → returns room + code |
| GET    | `/rooms/:code`      | Get current room state              |
| POST   | `/rooms/:code/join` | Join a room                         |

### Users

| Method | Path              | Description            |
| ------ | ----------------- | ---------------------- |
| GET    | `/users/me`       | Current user profile   |
| GET    | `/users/me/stats` | Games played, win rate |

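
Since `games_played` and `games_won` are no longer stored on `users`, the numbers behind `/users/me/stats` have to be derived from `room_players`. One possible derivation is sketched below. The names `RoomPlayerRow`, `UserStats`, and `computeStats` are hypothetical, the input is assumed to contain rows from finished rooms only, and counting a tie at the top score as a win for each tied player is an assumption the spec does not settle.

```typescript
// Hypothetical projection of room_players rows (finished rooms only).
interface RoomPlayerRow {
  roomId: string;
  userId: string;
  score: number;
}

interface UserStats {
  gamesPlayed: number;
  gamesWon: number;
  winRate: number; // 0 when no games were played
}

function computeStats(rows: RoomPlayerRow[], userId: string): UserStats {
  // Highest score per room, used to decide who won each game.
  const topScoreByRoom = new Map<string, number>();
  for (const row of rows) {
    const best = topScoreByRoom.get(row.roomId);
    if (best === undefined || row.score > best) {
      topScoreByRoom.set(row.roomId, row.score);
    }
  }
  const mine = rows.filter((row) => row.userId === userId);
  const gamesPlayed = mine.length;
  // Assumption: every player tied at the top score counts as a winner.
  const gamesWon = mine.filter((row) => row.score === topScoreByRoom.get(row.roomId)).length;
  return { gamesPlayed, gamesWon, winRate: gamesPlayed === 0 ? 0 : gamesWon / gamesPlayed };
}

const historyRows: RoomPlayerRow[] = [
  { roomId: "r1", userId: "alice", score: 5 },
  { roomId: "r1", userId: "bob", score: 3 },
  { roomId: "r2", userId: "alice", score: 2 },
  { roomId: "r2", userId: "bob", score: 4 },
];
console.log(computeStats(historyRows, "alice")); // { gamesPlayed: 2, gamesWon: 1, winRate: 0.5 }
```
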
---

## 10. WebSocket Protocol

One WS connection per client. Authenticated by passing the OpenAuth JWT as a query param on the upgrade request: `wss://api.yourdomain.com?token=...`.

All messages are JSON: `{ type: string, payload: unknown }`. The full set of types is a Zod discriminated union in `packages/shared` — both sides validate every message they receive.

### Client → Server

| type          | payload                    | Description                      |
| ------------- | -------------------------- | -------------------------------- |
| `room:join`   | `{ code }`                 | Subscribe to a room's WS channel |
| `room:leave`  | —                          | Unsubscribe                      |
| `room:start`  | —                          | Host starts the game             |
| `game:answer` | `{ questionId, answerId }` | Player submits an answer         |

### Server → Client

| type                 | payload                                            | Description                                            |
| -------------------- | -------------------------------------------------- | ------------------------------------------------------ |
| `room:state`         | Full room snapshot                                 | Sent on join and on any player join/leave              |
| `game:question`      | `{ id, prompt, options[], timeLimit }`             | New question broadcast to all players                  |
| `game:answer_result` | `{ questionId, correct, correctAnswerId, scores }` | Broadcast after all players answer or the timeout fires |
| `game:finished`      | `{ scores[], winner }`                             | End of game summary                                    |
| `error`              | `{ message }`                                      | Protocol or validation error                           |

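
To illustrate what "discriminated union" means for the client → server messages above, here is a dependency-free sketch. The spec calls for Zod schemas in `packages/shared`; the hand-written guard below is only a stand-in for them. The names `ClientMessage`, `ParseResult`, and `parseClientMessage` are hypothetical, and treating `code`, `questionId`, and `answerId` as strings is an assumption.

```typescript
// Client → server messages, discriminated on `type`.
type ClientMessage =
  | { type: "room:join"; payload: { code: string } }
  | { type: "room:leave" }
  | { type: "room:start" }
  | { type: "game:answer"; payload: { questionId: string; answerId: string } };

type ParseResult =
  | { ok: true; message: ClientMessage }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates a raw WS frame; anything that fails maps to the protocol's `error` message.
function parseClientMessage(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
  if (!isRecord(data)) return { ok: false, error: "message must be an object" };
  const type = data["type"];
  const payload = data["payload"];
  switch (type) {
    case "room:join":
      if (isRecord(payload) && typeof payload["code"] === "string") {
        return { ok: true, message: { type: "room:join", payload: { code: payload["code"] } } };
      }
      return { ok: false, error: "room:join requires { code }" };
    case "room:leave":
      return { ok: true, message: { type: "room:leave" } };
    case "room:start":
      return { ok: true, message: { type: "room:start" } };
    case "game:answer": {
      if (!isRecord(payload)) return { ok: false, error: "game:answer requires a payload" };
      const questionId = payload["questionId"];
      const answerId = payload["answerId"];
      if (typeof questionId === "string" && typeof answerId === "string") {
        return { ok: true, message: { type: "game:answer", payload: { questionId, answerId } } };
      }
      return { ok: false, error: "game:answer requires { questionId, answerId }" };
    }
    default:
      return { ok: false, error: "unknown message type" };
  }
}

console.log(parseClientMessage('{"type":"room:join","payload":{"code":"AB12CD"}}').ok); // true
console.log(parseClientMessage('{"type":"game:answer","payload":{}}').ok); // false
```
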
### Multiplayer game mechanic — simultaneous answers

All players see the same question at the same time. Everyone submits independently. The server waits until all players have answered **or** the 15-second timeout fires — then broadcasts `game:answer_result` with updated scores. There is no buzz-first mechanic. This keeps the experience Duolingo-like and symmetric.

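
The "all answered or timeout" rule can be modelled as a small state holder that the server consults after each answer and when the timer fires. The sketch below is one way to do it, not the project's implementation: `RoundCollector` and its method names are hypothetical, time is passed in as milliseconds so the logic stays testable, and keeping only a player's first answer is an assumption.

```typescript
// Tracks answers for one round. The round closes when every player has
// answered or the time limit has elapsed, whichever comes first.
class RoundCollector {
  private answers = new Map<string, string>();
  private playerIds: readonly string[];
  private startedAtMs: number;
  private timeLimitMs: number;

  constructor(playerIds: readonly string[], startedAtMs: number, timeLimitMs: number = 15_000) {
    this.playerIds = playerIds;
    this.startedAtMs = startedAtMs;
    this.timeLimitMs = timeLimitMs;
  }

  // Records the first answer per player. Returns false for unknown players,
  // repeated submissions, and answers that arrive at or after the deadline.
  submit(playerId: string, answerId: string, nowMs: number): boolean {
    if (!this.playerIds.includes(playerId)) return false;
    if (this.answers.has(playerId)) return false;
    if (nowMs - this.startedAtMs >= this.timeLimitMs) return false;
    this.answers.set(playerId, answerId);
    return true;
  }

  isClosed(nowMs: number): boolean {
    const everyoneAnswered = this.answers.size === this.playerIds.length;
    const timedOut = nowMs - this.startedAtMs >= this.timeLimitMs;
    return everyoneAnswered || timedOut;
  }

  // +1 for a correct answer; players who did not answer get 0 for the round.
  roundScores(correctAnswerId: string): Record<string, number> {
    const scores: Record<string, number> = {};
    for (const id of this.playerIds) {
      scores[id] = this.answers.get(id) === correctAnswerId ? 1 : 0;
    }
    return scores;
  }
}

const round = new RoundCollector(["p1", "p2"], 0);
round.submit("p1", "opt-b", 4_000);
console.log(round.isClosed(5_000)); // false
round.submit("p2", "opt-a", 9_000);
console.log(round.isClosed(9_000)); // true
console.log(round.roundScores("opt-b")); // { p1: 1, p2: 0 }
```
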
### Game flow

```
host creates room (REST) →
players join via room code (REST + WS room:join) →
room:state broadcasts player list →
host sends room:start →
server broadcasts game:question →
players send game:answer →
server collects all answers or waits for timeout →
server broadcasts game:answer_result →
repeat for N rounds →
server broadcasts game:finished
```

### Room state in Valkey

Active room state (connected players, current question, answers received this round) is stored in Valkey with a TTL. PostgreSQL holds the durable record (`rooms`, `room_players`). On server restart, in-progress games are considered abandoned — acceptable for MVP.

---

## 11. Game Mechanics

- **Question format**: source-language word prompt + 4 target-language choices (1 correct + 3 distractors of the same POS)
- **Distractors**: generated server-side, never include the correct answer, never repeat within a session
- **Scoring**: +1 point per correct answer. Speed bonus is out of scope for MVP.
- **Timer**: 15 seconds per question, server-authoritative
- **Single-player**: uses `GET /terms` and runs entirely client-side. No WebSocket.

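
The distractor rules above (same POS, never the correct answer, no repeats within a session) can be sketched as a pure function. This is an illustration under assumptions, not the project's `QuizService`: the names `Candidate` and `pickDistractors` are hypothetical, "no repeats" is interpreted as "no distractor text is shown twice in a session", and the random source is injectable so the behaviour can be tested.

```typescript
type Pos = "noun" | "verb" | "adjective" | "adverb";

interface Candidate {
  termId: string;
  pos: Pos;
  text: string; // target-language text shown as an answer option
}

// Picks `count` distractors for `correct` from `pool`.
// `usedTexts` carries session state and is updated with the picked texts.
function pickDistractors(
  correct: Candidate,
  pool: readonly Candidate[],
  usedTexts: Set<string>,
  count: number = 3,
  random: () => number = Math.random,
): Candidate[] {
  // Seeding `seen` with the correct text also filters synonyms that share it.
  const seen = new Set<string>([correct.text]);
  const eligible: Candidate[] = [];
  for (const candidate of pool) {
    if (candidate.pos !== correct.pos || candidate.termId === correct.termId) continue;
    if (seen.has(candidate.text) || usedTexts.has(candidate.text)) continue;
    seen.add(candidate.text);
    eligible.push(candidate);
  }
  if (eligible.length < count) {
    throw new Error(`need ${count} distractors, only ${eligible.length} eligible`);
  }
  // Partial Fisher-Yates shuffle: only the first `count` slots need randomising.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (eligible.length - i));
    [eligible[i], eligible[j]] = [eligible[j], eligible[i]];
  }
  const picked = eligible.slice(0, count);
  for (const candidate of picked) usedTexts.add(candidate.text);
  return picked;
}

const samplePool: Candidate[] = [
  { termId: "t1", pos: "noun", text: "cane" },
  { termId: "t2", pos: "noun", text: "gatto" },
  { termId: "t3", pos: "verb", text: "correre" },
  { termId: "t4", pos: "noun", text: "casa" },
  { termId: "t5", pos: "noun", text: "libro" },
  { termId: "t6", pos: "noun", text: "mare" },
];
const sessionTexts = new Set<string>();
const options = pickDistractors(samplePool[0], samplePool, sessionTexts);
console.log(options.length, options.every((o) => o.pos === "noun" && o.text !== "cane")); // 3 true
```
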
---

## 12. Frontend Structure

```
apps/web/src/
├── routes/
│   ├── index.tsx               # Landing / mode select
│   ├── auth/
│   ├── singleplayer/
│   └── multiplayer/
│       ├── lobby.tsx           # Create or join by code
│       ├── room.$code.tsx      # Waiting room
│       └── game.$code.tsx      # Active game
├── components/
│   ├── quiz/                   # QuestionCard, OptionButton, ScoreBoard
│   ├── room/                   # PlayerList, RoomCode, ReadyState
│   └── ui/                     # shadcn/ui wrappers: Button, Card, Dialog ...
├── stores/
│   └── gameStore.ts            # Zustand: game session, scores, WS state
├── lib/
│   ├── api.ts                  # TanStack Query wrappers
│   └── ws.ts                   # WS client singleton
└── main.tsx
```

### Zustand store (single store for MVP)

```typescript
interface AppStore {
  user: User | null;
  gameSession: GameSession | null;
  currentQuestion: Question | null;
  scores: Record<string, number>;
  isLoading: boolean;
  error: string | null;
}
```

TanStack Query handles all server data fetching. Zustand handles ephemeral UI and WebSocket-driven state.

---

## 13. Testing Strategy

| Type        | Tool                 | Scope                                               |
| ----------- | -------------------- | --------------------------------------------------- |
| Unit        | Vitest               | Services, QuizService distractor logic, Zod schemas |
| Component   | Vitest + RTL         | QuestionCard, OptionButton, auth forms              |
| Integration | Vitest               | API route handlers against a test DB                |
| E2E         | Out of scope for MVP | —                                                   |

Tests are co-located with source files (`*.test.ts` / `*.test.tsx`).

**Critical paths to cover:**

- Distractor generation (correct POS, no duplicates, never includes answer)
- Answer validation (server-side, correct scoring)
- Game session lifecycle (create → play → complete)
- JWT validation middleware

---

## 14. Definition of Done

### Functional

- [ ] User can log in via Google or GitHub (OpenAuth)
- [ ] User can play singleplayer: 10 rounds, score, result screen
- [ ] User can create a room and share a code
- [ ] User can join a room via code
- [ ] Multiplayer: 10 rounds, simultaneous answers, real-time score sync
- [ ] 1 000 English–Italian words seeded from WordNet + OMW

### Technical

- [ ] Deployed to Hetzner with HTTPS on all three subdomains
- [ ] Docker Compose running all services
- [ ] Drizzle migrations applied on container start
- [ ] 10–20 passing tests covering critical paths
- [ ] pnpm workspace build pipeline green

---

## 15. Out of Scope (MVP)

- Difficulty levels _(the deck model supports them: add one deck per level, no schema change; `frequency_rank` has been removed from `terms`)_
- Additional language pairs _(schema already supports it — just add rows)_
- Leaderboards _(games played and won can be derived from `room_players`; the `games_played` and `games_won` columns have been removed)_
- Streaks / daily challenges
- Friends / private invites
- Audio pronunciation
- CI/CD pipeline (manual deploy for now)
- Rate limiting _(add before going public)_
- Admin panel for vocabulary management