adding documentation

lila 2026-03-20 09:21:06 +01:00
parent 25bb43ee4b
commit 94f02b9904
2 changed files with 585 additions and 0 deletions

documentation/spec.md Normal file

@ -0,0 +1,436 @@
# Vocabulary Trainer — Project Specification
## 1. Overview
A multiplayer English–Italian vocabulary trainer with a Duolingo-style quiz interface (one word prompt, four answer choices). Supports both single-player practice and real-time competitive multiplayer rooms of 2–4 players. Designed from the ground up to be language-pair agnostic.
### Core Principles
- **Minimal but extendable**: Working product fast, clean architecture for future growth
- **Mobile-first**: Touch-friendly Duolingo-like UX
- **Type safety end-to-end**: TypeScript + Zod schemas shared between frontend and backend
---
## 2. Technology Stack
| Layer | Technology |
|---|---|
| Monorepo | pnpm workspaces |
| Frontend | React 18, Vite, TypeScript |
| Routing | TanStack Router |
| Server state | TanStack Query |
| Client state | Zustand |
| Styling | Tailwind CSS + shadcn/ui |
| Backend | Node.js, Express, TypeScript |
| Realtime | WebSockets (`ws` library) |
| Database | PostgreSQL 16 |
| ORM | Drizzle ORM |
| Cache / Queue | Valkey 8 |
| Auth | OpenAuth (Google + GitHub) |
| Validation | Zod (shared schemas) |
| Testing | Vitest, React Testing Library |
| Linting / Formatting | ESLint, Prettier |
| Containerisation | Docker, Docker Compose |
| Hosting | Hetzner VPS |
### Why `ws` over Socket.io
`ws` is a minimal WebSocket library for Node.js. For rooms of 2–4 players there is no need for Socket.io's transport fallbacks or room-management abstractions. The message protocol is defined explicitly in `packages/shared`, so both sides get typed, validated messages without Socket.io's extra layer.
### Why Valkey
Valkey stores ephemeral room state that does not need to survive a server restart. It keeps the PostgreSQL schema clean and makes room lookups O(1).
### Why pnpm workspaces without Turborepo
Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, the plain pnpm workspace commands (`pnpm -r run build`, `pnpm --filter`) are sufficient and there is one less tool to configure and maintain.
---
## 3. Repository Structure
```
vocab-trainer/
├── apps/
│   ├── web/                  # React SPA (Vite + TanStack Router)
│   │   ├── src/
│   │   │   ├── routes/
│   │   │   ├── components/
│   │   │   ├── stores/       # Zustand stores
│   │   │   └── lib/
│   │   └── Dockerfile
│   └── api/                  # Express REST + WebSocket server
│       ├── src/
│       │   ├── routes/
│       │   ├── services/
│       │   ├── repositories/
│       │   └── websocket/
│       └── Dockerfile
├── packages/
│   ├── shared/               # Zod schemas, TypeScript types, constants
│   └── db/                   # Drizzle schema, migrations, seed script
├── scripts/
│   └── extract_omw.py        # One-time WordNet + OMW extraction → seed.json
├── docker-compose.yml
├── docker-compose.prod.yml
├── pnpm-workspace.yaml
└── package.json
```
`packages/shared` is the contract between frontend and backend. All request/response shapes and WebSocket event payloads are defined there as Zod schemas and inferred TypeScript types — never duplicated.
### pnpm workspace config
`pnpm-workspace.yaml` declares:
```
packages:
- 'apps/*'
- 'packages/*'
```
### Root scripts
The root `package.json` defines convenience scripts that delegate to workspaces:
- `dev` — starts `api` and `web` in parallel
- `build` — builds all packages in dependency order
- `test` — runs Vitest across all workspaces
- `lint` — runs ESLint across all workspaces
For parallel dev, use `concurrently` or just two terminal tabs for MVP.
---
## 4. Architecture — N-Tier / Layered
```
┌────────────────────────────────────┐
│ Presentation (React SPA) │ apps/web
├────────────────────────────────────┤
│ API / Transport │ HTTP REST + WebSocket
├────────────────────────────────────┤
│ Application (Controllers) │ apps/api/src/routes
│ Domain (Business logic) │ apps/api/src/services
│ Data Access (Repositories) │ apps/api/src/repositories
├────────────────────────────────────┤
│ Database (PostgreSQL via Drizzle) │ packages/db
│ Cache (Valkey) │ apps/api/src/lib/valkey.ts
└────────────────────────────────────┘
```
Each layer only communicates with the layer directly below it. Business logic lives in services, not in route handlers or repositories.
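The layering rule can be illustrated without any framework. The sketch below is an assumption-laden illustration, not the project's code: `TermRepository`, `QuizService`, `handleGetTerms`, and the cap of 50 are hypothetical names and values, and an in-memory array stands in for PostgreSQL.

```typescript
// Hypothetical layering sketch: handler -> service -> repository.
// No Express or Drizzle here; an in-memory array stands in for the database.

interface TermRow {
  id: string;
  text: string;
}

// Data access layer: the only code that knows where rows come from.
interface TermRepository {
  findMany(limit: number): TermRow[];
}

class InMemoryTermRepository implements TermRepository {
  private rows: TermRow[];
  constructor(rows: TermRow[]) {
    this.rows = rows;
  }
  findMany(limit: number): TermRow[] {
    return this.rows.slice(0, limit);
  }
}

const MAX_TERMS_PER_REQUEST = 50; // assumed cap, not from the spec

// Domain layer: business rules (here, clamping the limit) live in the service.
class QuizService {
  private repo: TermRepository;
  constructor(repo: TermRepository) {
    this.repo = repo;
  }
  getTerms(requestedLimit: number): TermRow[] {
    const limit = Math.min(Math.max(1, Math.floor(requestedLimit)), MAX_TERMS_PER_REQUEST);
    return this.repo.findMany(limit);
  }
}

// Application layer: turns transport input into a service call, nothing more.
function handleGetTerms(
  service: QuizService,
  query: { limit?: string },
): { status: number; body: TermRow[] } {
  const parsed = Number(query.limit ?? "10");
  if (!Number.isFinite(parsed)) return { status: 400, body: [] };
  return { status: 200, body: service.getTerms(parsed) };
}
```

Because the handler only sees `QuizService` and the service only sees the `TermRepository` interface, the repository can be swapped for a test double in the integration tests.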
---
## 5. Infrastructure
### Domain structure
| Subdomain | Service |
|---|---|
| `app.yourdomain.com` | React frontend |
| `api.yourdomain.com` | Express API + WebSocket |
| `auth.yourdomain.com` | OpenAuth service |
### Docker Compose services (production)
| Container | Role |
|---|---|
| `postgres` | PostgreSQL 16, named volume |
| `valkey` | Valkey 8, ephemeral (no persistence needed) |
| `openauth` | OpenAuth service |
| `api` | Express + WS server |
| `web` | Nginx serving the Vite build |
| `nginx-proxy` | Automatic reverse proxy |
| `acme-companion` | Let's Encrypt certificate automation |
```
nginx-proxy (:80/:443)
  app.domain  → web:80
  api.domain  → api:3000       (HTTP + WS upgrade)
  auth.domain → openauth:3001
```
SSL is fully automatic via `nginx-proxy` + `acme-companion`. No manual Certbot needed.
---
## 6. Data Model
### Design principle
Words are modelled as language-neutral **terms** with one or more **translations** per language. Adding a new language pair (e.g. English–French) requires **no schema changes**, only new rows in `translations` and `language_pairs`. The flat `english/italian` column pattern is explicitly avoided.
### Core tables
```
terms
  id              uuid PK
  synset_id       text UNIQUE            -- WordNet synset offset, e.g. "wn:01234567n"
  pos             varchar(20)            -- "noun" | "verb" | "adjective"
  frequency_rank  integer                -- 1–1000, reserved for difficulty filtering
  created_at      timestamptz

translations
  id              uuid PK
  term_id         uuid FK → terms.id
  language_code   varchar(10)            -- BCP 47: "en", "it", "de", ...
  text            text
  UNIQUE (term_id, language_code)

language_pairs
  id              uuid PK
  source          varchar(10)            -- "en"
  target          varchar(10)            -- "it"
  label           text                   -- "English → Italian"
  active          boolean DEFAULT true
  UNIQUE (source, target)

users
  id              uuid PK                -- OpenAuth sub claim
  email           varchar(255) UNIQUE
  display_name    varchar(100)
  games_played    integer DEFAULT 0
  games_won       integer DEFAULT 0
  created_at      timestamptz
  last_login_at   timestamptz

rooms
  id              uuid PK
  code            varchar(8) UNIQUE      -- human-readable, e.g. "WOLF-42"
  host_id         uuid FK → users.id
  pair_id         uuid FK → language_pairs.id
  status          text                   -- "waiting" | "in_progress" | "finished"
  max_players     smallint DEFAULT 4
  round_count     smallint DEFAULT 10
  created_at      timestamptz

room_players
  room_id         uuid FK → rooms.id
  user_id         uuid FK → users.id
  score           integer DEFAULT 0
  joined_at       timestamptz
  PRIMARY KEY (room_id, user_id)
```
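To make the "no schema changes" claim concrete, here is a minimal sketch of building prompt/answer pairs for an arbitrary language pair from `terms` and `translations` rows. The row types and the `buildQuizPairs` helper are hypothetical and cover only the columns the sketch needs; the real query would presumably run in SQL via Drizzle.

```typescript
// Row shapes mirroring the `terms` and `translations` tables
// (only the columns this sketch needs; names are illustrative).
interface TermRecord {
  id: string;
  pos: string;
}

interface TranslationRecord {
  termId: string;
  languageCode: string;
  text: string;
}

interface QuizPair {
  termId: string;
  prompt: string; // word in the source language
  answer: string; // word in the target language
}

// Build prompt/answer pairs for any source/target language codes.
// Terms lacking a translation on either side are skipped.
function buildQuizPairs(
  terms: TermRecord[],
  translations: TranslationRecord[],
  source: string,
  target: string,
): QuizPair[] {
  // Index translations by "termId:languageCode", matching the
  // UNIQUE (term_id, language_code) constraint.
  const byKey = new Map<string, string>();
  for (const t of translations) {
    byKey.set(`${t.termId}:${t.languageCode}`, t.text);
  }

  const pairs: QuizPair[] = [];
  for (const term of terms) {
    const prompt = byKey.get(`${term.id}:${source}`);
    const answer = byKey.get(`${term.id}:${target}`);
    if (prompt !== undefined && answer !== undefined) {
      pairs.push({ termId: term.id, prompt, answer });
    }
  }
  return pairs;
}
```

Calling `buildQuizPairs(terms, translations, "en", "fr")` works as soon as French rows exist in `translations`; nothing in the function is specific to Italian.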
### Indexes
```sql
CREATE INDEX ON terms (pos, frequency_rank);
CREATE INDEX ON rooms (status);
CREATE INDEX ON room_players (user_id);
```
---
## 7. Vocabulary Data — WordNet + OMW
### Source
- **Princeton WordNet** — English words + synset IDs
- **Open Multilingual Wordnet (OMW)** — Italian translations keyed by synset ID
### Extraction process
1. Run `scripts/extract_omw.py` once locally using NLTK
2. Filter to the 1 000 most common nouns (by WordNet frequency data)
3. Output: `packages/db/src/seed.json` — committed to the repo
4. `packages/db/src/seed.ts` reads the JSON and populates `terms` + `translations`
`terms.synset_id` stores the WordNet offset (e.g. `wn:01234567n`) for traceability and future re-imports with additional languages.
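Step 4 could look roughly like the sketch below, which turns seed entries into insertable rows. The `SeedEntry` layout is an assumption (the spec does not define the JSON shape), `seedToRows` is a hypothetical helper, and the ID generator is injected so the sketch stays free of database and UUID dependencies.

```typescript
// Assumed shape of one seed.json entry; the real file layout may differ.
interface SeedEntry {
  synsetId: string; // e.g. "wn:01234567n"
  pos: string;
  frequencyRank: number;
  translations: Record<string, string>; // language code -> word
}

interface TermInsert {
  id: string;
  synsetId: string;
  pos: string;
  frequencyRank: number;
}

interface TranslationInsert {
  termId: string;
  languageCode: string;
  text: string;
}

// Convert seed entries into rows for `terms` and `translations`.
// `newId` is injected (e.g. a UUID generator) to keep the function pure.
function seedToRows(
  entries: SeedEntry[],
  newId: () => string,
): { termRows: TermInsert[]; translationRows: TranslationInsert[] } {
  const termRows: TermInsert[] = [];
  const translationRows: TranslationInsert[] = [];
  const seenSynsets = new Set<string>();

  for (const entry of entries) {
    // terms.synset_id is UNIQUE, so duplicate synsets are skipped.
    if (seenSynsets.has(entry.synsetId)) continue;
    seenSynsets.add(entry.synsetId);

    const id = newId();
    termRows.push({
      id,
      synsetId: entry.synsetId,
      pos: entry.pos,
      frequencyRank: entry.frequencyRank,
    });

    for (const languageCode of Object.keys(entry.translations)) {
      const text = entry.translations[languageCode];
      if (text === undefined || text.trim() === "") continue; // skip empty words
      translationRows.push({ termId: id, languageCode, text });
    }
  }
  return { termRows, translationRows };
}
```

A later re-import with an additional language would only need to produce new `translationRows` for existing `synsetId` values.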
---
## 8. Authentication — OpenAuth
All auth is delegated to the OpenAuth service at `auth.yourdomain.com`. Providers: Google, GitHub.
The API validates the JWT from OpenAuth on every protected request. A user row is created on first login and updated on later logins; the token's `sub` claim is used as the row's primary key.
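A minimal sketch of the create-or-update step, assuming the token has already been verified. The `TokenClaims` fields other than `sub` (`email`, `name`) are assumptions about what the token carries, `upsertUserFromClaims` is a hypothetical helper, and a `Map` stands in for the `users` table.

```typescript
// Claims assumed to be present on an already-verified token.
// Only `sub` is named in the spec; `email` and `name` are assumptions.
interface TokenClaims {
  sub: string;
  email: string;
  name?: string;
}

interface UserRecord {
  id: string; // equals the `sub` claim
  email: string;
  displayName: string;
  gamesPlayed: number;
  gamesWon: number;
  createdAt: Date;
  lastLoginAt: Date;
}

// Create the user on first login, otherwise refresh email and last login.
// `users` is an in-memory stand-in for the `users` table.
function upsertUserFromClaims(
  users: Map<string, UserRecord>,
  claims: TokenClaims,
  now: Date,
): UserRecord {
  const existing = users.get(claims.sub);
  if (existing !== undefined) {
    const updated: UserRecord = { ...existing, email: claims.email, lastLoginAt: now };
    users.set(claims.sub, updated);
    return updated;
  }

  // Fall back to the local part of the email when no name is provided.
  const fallbackName = claims.email.split("@")[0] ?? claims.email;
  const created: UserRecord = {
    id: claims.sub,
    email: claims.email,
    displayName: claims.name ?? fallbackName,
    gamesPlayed: 0,
    gamesWon: 0,
    createdAt: now,
    lastLoginAt: now,
  };
  users.set(claims.sub, created);
  return created;
}
```

In the real service this would most likely be a single `INSERT ... ON CONFLICT (id) DO UPDATE` statement rather than a read followed by a write.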
**Auth endpoint on the API:**
| Method | Path | Description |
|---|---|---|
| GET | `/api/auth/me` | Validate token, return user |
All other auth flows (login, callback, token refresh) are handled entirely by OpenAuth — the frontend redirects to `auth.yourdomain.com` and receives a JWT back.
---
## 9. REST API
All endpoints are prefixed with `/api`; the paths in the tables below omit that prefix. Request and response bodies are validated with Zod on both sides using schemas from `packages/shared`.
### Vocabulary
| Method | Path | Description |
|---|---|---|
| GET | `/language-pairs` | List active language pairs |
| GET | `/terms?pair=en-it&limit=10` | Fetch quiz terms with distractors |
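As an illustration of how the `pair` and `limit` query parameters might be validated, here is a dependency-free sketch; in the project this would presumably be a Zod schema in `packages/shared`. The `parseTermsQuery` helper, the cap of 50, and the restriction to plain two- or three-letter codes are assumptions (full BCP 47 tags such as `pt-BR` contain hyphens and would need a different separator).

```typescript
interface TermsQuery {
  source: string;
  target: string;
  limit: number;
}

interface QueryError {
  error: string;
}

const DEFAULT_LIMIT = 10; // matches the example URL in the table
const MAX_LIMIT = 50; // assumed cap, not from the spec

// Parse "?pair=en-it&limit=10" style parameters.
// Assumes plain 2-3 letter language codes separated by one hyphen.
function parseTermsQuery(params: { pair?: string; limit?: string }): TermsQuery | QueryError {
  const match = /^([a-z]{2,3})-([a-z]{2,3})$/.exec(params.pair ?? "");
  if (match === null) return { error: "pair must look like 'en-it'" };

  const [, source, target] = match;
  if (source === undefined || target === undefined || source === target) {
    return { error: "source and target must be two different languages" };
  }

  const limit = params.limit === undefined ? DEFAULT_LIMIT : Number(params.limit);
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    return { error: `limit must be an integer between 1 and ${MAX_LIMIT}` };
  }
  return { source, target, limit };
}
```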
### Rooms
| Method | Path | Description |
|---|---|---|
| POST | `/rooms` | Create a room → returns room + code |
| GET | `/rooms/:code` | Get current room state |
| POST | `/rooms/:code/join` | Join a room |
### Users
| Method | Path | Description |
|---|---|---|
| GET | `/users/me` | Current user profile |
| GET | `/users/me/stats` | Games played, win rate |
---
## 10. WebSocket Protocol
One WS connection per client. Authenticated by passing the OpenAuth JWT as a query param on the upgrade request: `wss://api.yourdomain.com?token=...`.
All messages are JSON: `{ type: string, payload: unknown }`. The full set of types is a Zod discriminated union in `packages/shared` — both sides validate every message they receive.
### Client → Server
| type | payload | Description |
|---|---|---|
| `room:join` | `{ code }` | Subscribe to a room's WS channel |
| `room:leave` | — | Unsubscribe |
| `room:start` | — | Host starts the game |
| `game:answer` | `{ questionId, answerId }` | Player submits an answer |
### Server → Client
| type | payload | Description |
|---|---|---|
| `room:state` | Full room snapshot | Sent on join and on any player join/leave |
| `game:question` | `{ id, prompt, options[], timeLimit }` | New question broadcast to all players |
| `game:answer_result` | `{ questionId, correct, correctAnswerId, scores }` | Broadcast after all players have answered or the timeout fires |
| `game:finished` | `{ scores[], winner }` | End of game summary |
| `error` | `{ message }` | Protocol or validation error |
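The client-to-server half of the protocol can be sketched as a TypeScript discriminated union with a hand-written parser. This is only an illustration of the idea: the spec calls for Zod schemas, and the string types for `code`, `questionId`, and `answerId`, the omission of `payload` on payload-less messages, and the `parseClientMessage` helper are assumptions.

```typescript
// Client -> Server messages as a discriminated union on `type`.
type ClientMessage =
  | { type: "room:join"; payload: { code: string } }
  | { type: "room:leave" }
  | { type: "room:start" }
  | { type: "game:answer"; payload: { questionId: string; answerId: string } };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a typed message, or null if the frame is not valid JSON
// or does not match any known message shape.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;

  const kind = data["type"];
  const payload = isRecord(data["payload"]) ? data["payload"] : {};

  switch (kind) {
    case "room:join": {
      const code = payload["code"];
      return typeof code === "string" ? { type: "room:join", payload: { code } } : null;
    }
    case "room:leave":
      return { type: "room:leave" };
    case "room:start":
      return { type: "room:start" };
    case "game:answer": {
      const questionId = payload["questionId"];
      const answerId = payload["answerId"];
      return typeof questionId === "string" && typeof answerId === "string"
        ? { type: "game:answer", payload: { questionId, answerId } }
        : null;
    }
    default:
      return null;
  }
}
```

A `null` result would map naturally onto the `error` message in the server-to-client table.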
### Multiplayer game mechanic — simultaneous answers
All players see the same question at the same time. Everyone submits independently. The server waits until all players have answered **or** the 15-second timeout fires — then broadcasts `game:answer_result` with updated scores. There is no buzz-first mechanic. This keeps the experience Duolingo-like and symmetric.
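The "all answered or timeout" rule can be sketched as a small per-round collector. `RoundCollector` is a hypothetical class; the choice to ignore repeat submissions and to keep the timer outside the class (the caller would arm a 15-second timer and call `close` when it fires) are assumptions of this sketch.

```typescript
// Collects answers for one question. The caller owns the timer and calls
// close() either when submit() returns true or when the timeout fires.
class RoundCollector {
  private expected: Set<string>;
  private answers = new Map<string, string>();
  private closed = false;

  constructor(playerIds: string[]) {
    this.expected = new Set(playerIds);
  }

  // Records an answer. Returns true when this submission completes the round.
  // Unknown players, repeat submissions, and late answers are ignored.
  submit(playerId: string, answerId: string): boolean {
    if (this.closed || !this.expected.has(playerId) || this.answers.has(playerId)) {
      return false;
    }
    this.answers.set(playerId, answerId);
    return this.answers.size === this.expected.size;
  }

  // Ends the round and returns updated scores: +1 per correct answer,
  // 0 for wrong or missing answers. Calling close() twice is a no-op.
  close(correctAnswerId: string, scores: Record<string, number>): Record<string, number> {
    if (this.closed) return scores;
    this.closed = true;

    const next: Record<string, number> = { ...scores };
    this.expected.forEach((playerId) => {
      const gained = this.answers.get(playerId) === correctAnswerId ? 1 : 0;
      next[playerId] = (next[playerId] ?? 0) + gained;
    });
    return next;
  }
}
```

Because `close` is idempotent, the "last answer arrived" path and the timeout path can both call it without coordinating.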
### Game flow
```
host creates room (REST) →
players join via room code (REST + WS room:join) →
room:state broadcasts player list →
host sends room:start →
server broadcasts game:question →
players send game:answer →
server collects all answers or waits for timeout →
server broadcasts game:answer_result →
repeat for N rounds →
server broadcasts game:finished
```
### Room state in Valkey
Active room state (connected players, current question, answers received this round) is stored in Valkey with a TTL. PostgreSQL holds the durable record (`rooms`, `room_players`). On server restart, in-progress games are considered abandoned — acceptable for MVP.
---
## 11. Game Mechanics
- **Question format**: source-language word prompt + 4 target-language choices (1 correct + 3 distractors of the same POS)
- **Distractors**: generated server-side, never include the correct answer, never repeat within a session
- **Scoring**: +1 point per correct answer. Speed bonus is out of scope for MVP.
- **Timer**: 15 seconds per question, server-authoritative
- **Single-player**: uses `GET /terms` and runs entirely client-side. No WebSocket.
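The distractor rules above can be sketched as follows. `pickDistractors` and the `Candidate` shape are hypothetical; "never repeat within a session" is interpreted here as "a distractor text is used at most once per session", which is an assumption. The random source is injected so the function can be tested deterministically.

```typescript
interface Candidate {
  termId: string;
  pos: string;
  text: string; // target-language word
}

// Picks `count` distractors for `correct` from `pool`.
// Rules: same POS, never the correct answer (by term or by text),
// no duplicate texts, and no text already used in this session.
// Returns null when the pool cannot supply enough distractors.
function pickDistractors(
  correct: Candidate,
  pool: Candidate[],
  usedTexts: Set<string>,
  count: number = 3,
  random: () => number = Math.random,
): Candidate[] | null {
  const seen = new Set<string>();
  const eligible = pool.filter((c) => {
    if (c.pos !== correct.pos) return false;
    if (c.termId === correct.termId || c.text === correct.text) return false;
    if (usedTexts.has(c.text) || seen.has(c.text)) return false;
    seen.add(c.text);
    return true;
  });
  if (eligible.length < count) return null;

  // Draw without replacement.
  const chosen: Candidate[] = [];
  while (chosen.length < count) {
    const index = Math.floor(random() * eligible.length);
    const [picked] = eligible.splice(index, 1);
    if (picked === undefined) break;
    chosen.push(picked);
  }

  // Remember the picks so they are not reused later in the session.
  for (const c of chosen) usedTexts.add(c.text);
  return chosen;
}
```

The caller would then shuffle the correct answer in with the three distractors before sending the options to the client.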
---
## 12. Frontend Structure
```
apps/web/src/
├── routes/
│   ├── index.tsx             # Landing / mode select
│   ├── auth/
│   ├── singleplayer/
│   └── multiplayer/
│       ├── lobby.tsx         # Create or join by code
│       ├── room.$code.tsx    # Waiting room
│       └── game.$code.tsx    # Active game
├── components/
│   ├── quiz/                 # QuestionCard, OptionButton, ScoreBoard
│   ├── room/                 # PlayerList, RoomCode, ReadyState
│   └── ui/                   # shadcn/ui wrappers: Button, Card, Dialog ...
├── stores/
│   └── gameStore.ts          # Zustand: game session, scores, WS state
├── lib/
│   ├── api.ts                # TanStack Query wrappers
│   └── ws.ts                 # WS client singleton
└── main.tsx
```
### Zustand store (single store for MVP)
```typescript
interface AppStore {
user: User | null;
gameSession: GameSession | null;
currentQuestion: Question | null;
scores: Record<string, number>;
isLoading: boolean;
error: string | null;
}
```
TanStack Query handles all server data fetching. Zustand handles ephemeral UI and WebSocket-driven state.
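One way to keep WebSocket-driven state predictable is to route every server message through a pure function and hand its result to the store. The sketch below is framework-free and hypothetical: `GameState` is a slimmed-down slice of the store, `applyGameEvent` is an illustrative name, and the payload shapes (options as `{ id, text }`, `scores` as a map of user ID to points) are assumptions.

```typescript
interface QuizQuestion {
  id: string;
  prompt: string;
  options: { id: string; text: string }[]; // assumed option shape
  timeLimit: number;
}

// Slice of the store that server messages are allowed to change.
interface GameState {
  currentQuestion: QuizQuestion | null;
  scores: Record<string, number>;
  error: string | null;
}

// Subset of the Server -> Client messages handled in this sketch.
type GameEvent =
  | { type: "game:question"; payload: QuizQuestion }
  | {
      type: "game:answer_result";
      payload: { questionId: string; correctAnswerId: string; scores: Record<string, number> };
    }
  | { type: "error"; payload: { message: string } };

// Pure state transition: returns a new state and never mutates the input.
function applyGameEvent(state: GameState, event: GameEvent): GameState {
  switch (event.type) {
    case "game:question":
      return { ...state, currentQuestion: event.payload, error: null };
    case "game:answer_result":
      // Ignore results for a question that is no longer on screen.
      if (state.currentQuestion?.id !== event.payload.questionId) return state;
      return { ...state, scores: event.payload.scores };
    case "error":
      return { ...state, error: event.payload.message };
  }
}
```

Inside a Zustand store this function would be called from the WebSocket message handler, with its return value passed to the store's `set`.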
---
## 13. Testing Strategy
| Type | Tool | Scope |
|---|---|---|
| Unit | Vitest | Services, QuizService distractor logic, Zod schemas |
| Component | Vitest + RTL | QuestionCard, OptionButton, auth forms |
| Integration | Vitest | API route handlers against a test DB |
| E2E | Out of scope for MVP | — |
Tests are co-located with source files (`*.test.ts` / `*.test.tsx`).
**Critical paths to cover:**
- Distractor generation (correct POS, no duplicates, never includes answer)
- Answer validation (server-side, correct scoring)
- Game session lifecycle (create → play → complete)
- JWT validation middleware
---
## 14. Definition of Done
### Functional
- [ ] User can log in via Google or GitHub (OpenAuth)
- [ ] User can play singleplayer: 10 rounds, score, result screen
- [ ] User can create a room and share a code
- [ ] User can join a room via code
- [ ] Multiplayer: 10 rounds, simultaneous answers, real-time score sync
- [ ] 1 000 English–Italian words seeded from WordNet + OMW
### Technical
- [ ] Deployed to Hetzner with HTTPS on all three subdomains
- [ ] Docker Compose running all services
- [ ] Drizzle migrations applied on container start
- [ ] 10–20 passing tests covering critical paths
- [ ] pnpm workspace build pipeline green
### Documentation
- [ ] `documentation/spec.md` complete
- [ ] `.env.example` files for all apps
- [ ] `README.md` with local dev setup instructions
---
## 15. Out of Scope (MVP)
- Difficulty levels *(`frequency_rank` column exists, ready to use)*
- Additional language pairs *(schema already supports it — just add rows)*
- Leaderboards *(`games_played`, `games_won` columns exist)*
- Streaks / daily challenges
- Friends / private invites
- Audio pronunciation
- CI/CD pipeline (manual deploy for now)
- Rate limiting *(add before going public)*
- Admin panel for vocabulary management