lila/documentation/spec.md
lila 2ebf0d0a83 infra: add Docker Compose setup for local development
- Configure PostgreSQL 18 and Valkey 9.1 services
- Create multi-stage Dockerfiles for API and Web apps
- Set up pnpm workspace support in container builds
- Configure hot reload via volume mounts for both services
- Add healthchecks for service orchestration
- Support dev/production stage targets (tsx watch vs compiled)
2026-03-25 18:56:04 +01:00

448 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Vocabulary Trainer — Project Specification
## 1. Overview
A multiplayer EnglishItalian vocabulary trainer with a Duolingo-style quiz interface (one word prompt, four answer choices). Supports both single-player practice and real-time competitive multiplayer rooms of 24 players. Designed from the ground up to be language-pair agnostic.
### Core Principles
- **Minimal but extendable**: Working product fast, clean architecture for future growth
- **Mobile-first**: Touch-friendly Duolingo-like UX
- **Type safety end-to-end**: TypeScript + Zod schemas shared between frontend and backend
---
## 2. Technology Stack
| Layer | Technology |
| -------------------- | ----------------------------- |
| Monorepo | pnpm workspaces |
| Frontend | React 18, Vite, TypeScript |
| Routing | TanStack Router |
| Server state | TanStack Query |
| Client state | Zustand |
| Styling | Tailwind CSS + shadcn/ui |
| Backend | Node.js, Express, TypeScript |
| Realtime | WebSockets (`ws` library) |
| Database | PostgreSQL 18 |
| ORM | Drizzle ORM |
| Cache / Queue | Valkey 9 |
| Auth | OpenAuth (Google + GitHub) |
| Validation | Zod (shared schemas) |
| Testing | Vitest, React Testing Library |
| Linting / Formatting | ESLint, Prettier |
| Containerisation | Docker, Docker Compose |
| Hosting | Hetzner VPS |
### Why `ws` over Socket.io
`ws` is the raw WebSocket library. For rooms of 24 players there is no need for Socket.io's transport fallbacks or room-management abstractions. The protocol is defined explicitly in `packages/shared`, which gives the same guarantees without the overhead.
### Why Valkey
Valkey stores ephemeral room state that does not need to survive a server restart. It keeps the PostgreSQL schema clean and makes room lookups O(1).
### Why pnpm workspaces without Turborepo
Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, the plain pnpm workspace commands (`pnpm -r run build`, `pnpm --filter`) are sufficient and there is one less tool to configure and maintain.
---
## 3. Repository Structure
```
vocab-trainer/
├── apps/
│ ├── web/ # React SPA (Vite + TanStack Router)
│ │ ├── src/
│ │ │ ├── routes/
│ │ │ ├── components/
│ │ │ ├── stores/ # Zustand stores
│ │ │ └── lib/
│ │ └── Dockerfile
│ └── api/ # Express REST + WebSocket server
│ ├── src/
│ │ ├── routes/
│ │ ├── services/
│ │ ├── repositories/
│ │ └── websocket/
│ └── Dockerfile
├── packages/
│ ├── shared/ # Zod schemas, TypeScript types, constants
│ └── db/ # Drizzle schema, migrations, seed script
├── scripts/
│ └── extract_omw.py # One-time WordNet + OMW extraction → seed.json
├── docker-compose.yml
├── docker-compose.prod.yml
├── pnpm-workspace.yaml
└── package.json
```
`packages/shared` is the contract between frontend and backend. All request/response shapes and WebSocket event payloads are defined there as Zod schemas and inferred TypeScript types — never duplicated.
### pnpm workspace config
`pnpm-workspace.yaml` declares:
```
packages:
- 'apps/*'
- 'packages/*'
```
### Root scripts
The root `package.json` defines convenience scripts that delegate to workspaces:
- `dev` — starts `api` and `web` in parallel
- `build` — builds all packages in dependency order
- `test` — runs Vitest across all workspaces
- `lint` — runs ESLint across all workspaces
For parallel dev, use `concurrently` or just two terminal tabs for MVP.
---
## 4. Architecture — N-Tier / Layered
```
┌────────────────────────────────────┐
│ Presentation (React SPA) │ apps/web
├────────────────────────────────────┤
│ API / Transport │ HTTP REST + WebSocket
├────────────────────────────────────┤
│ Application (Controllers) │ apps/api/src/routes
│ Domain (Business logic) │ apps/api/src/services
│ Data Access (Repositories) │ apps/api/src/repositories
├────────────────────────────────────┤
│ Database (PostgreSQL via Drizzle) │ packages/db
│ Cache (Valkey) │ apps/api/src/lib/valkey.ts
└────────────────────────────────────┘
```
Each layer only communicates with the layer directly below it. Business logic lives in services, not in route handlers or repositories.
---
## 5. Infrastructure
### Domain structure
| Subdomain | Service |
| --------------------- | ----------------------- |
| `app.yourdomain.com` | React frontend |
| `api.yourdomain.com` | Express API + WebSocket |
| `auth.yourdomain.com` | OpenAuth service |
### Docker Compose services (production)
| Container | Role |
| ---------------- | ------------------------------------------- |
| `postgres` | PostgreSQL 16, named volume |
| `valkey` | Valkey 8, ephemeral (no persistence needed) |
| `openauth` | OpenAuth service |
| `api` | Express + WS server |
| `web` | Nginx serving the Vite build |
| `nginx-proxy` | Automatic reverse proxy |
| `acme-companion` | Let's Encrypt certificate automation |
```
nginx-proxy (:80/:443)
app.domain → web:80
api.domain → api:3000 (HTTP + WS upgrade)
auth.domain → openauth:3001
```
SSL is fully automatic via `nginx-proxy` + `acme-companion`. No manual Certbot needed.
---
## 6. Data Model
### Design principle
Words are modelled as language-neutral **terms** with one or more **translations** per language. Adding a new language pair (e.g. EnglishFrench) requires **no schema changes** — only new rows in `translations` and `language_pairs`. The flat `english/italian` column pattern is explicitly avoided.
### Core tables
```
terms
id uuid PK
synset_id text UNIQUE -- WordNet synset offset e.g. "wn:01234567n"
pos varchar(20) -- "noun" | "verb" | "adjective"
frequency_rank integer -- 11000, reserved for difficulty filtering
created_at timestamptz
translations
id uuid PK
term_id uuid FK → terms.id
language_code varchar(10) -- BCP 47: "en", "it", "de", ...
text text
UNIQUE (term_id, language_code)
language_pairs
id uuid PK
source varchar(10) -- "en"
target varchar(10) -- "it"
label text -- "English → Italian"
active boolean DEFAULT true
UNIQUE (source, target)
users
id uuid PK -- OpenAuth sub claim
email varchar(255) UNIQUE
display_name varchar(100)
games_played integer DEFAULT 0
games_won integer DEFAULT 0
created_at timestamptz
last_login_at timestamptz
rooms
id uuid PK
code varchar(8) UNIQUE -- human-readable e.g. "WOLF-42"
host_id uuid FK → users.id
pair_id uuid FK → language_pairs.id
status text -- "waiting" | "in_progress" | "finished"
max_players smallint DEFAULT 4
round_count smallint DEFAULT 10
created_at timestamptz
room_players
room_id uuid FK → rooms.id
user_id uuid FK → users.id
score integer DEFAULT 0
joined_at timestamptz
PRIMARY KEY (room_id, user_id)
```
### Indexes
```sql
CREATE INDEX ON terms (pos, frequency_rank);
CREATE INDEX ON rooms (status);
CREATE INDEX ON room_players (user_id);
```
---
## 7. Vocabulary Data — WordNet + OMW
### Source
- **Princeton WordNet** — English words + synset IDs
- **Open Multilingual Wordnet (OMW)** — Italian translations keyed by synset ID
### Extraction process
1. Run `scripts/extract_omw.py` once locally using NLTK
2. Filter to the 1 000 most common nouns (by WordNet frequency data)
3. Output: `packages/db/src/seed.json` — committed to the repo
4. `packages/db/src/seed.ts` reads the JSON and populates `terms` + `translations`
`terms.synset_id` stores the WordNet offset (e.g. `wn:01234567n`) for traceability and future re-imports with additional languages.
---
## 8. Authentication — OpenAuth
All auth is delegated to the OpenAuth service at `auth.yourdomain.com`. Providers: Google, GitHub.
The API validates the JWT from OpenAuth on every protected request. User rows are created or updated on first login via the `sub` claim as the primary key.
**Auth endpoint on the API:**
| Method | Path | Description |
| ------ | -------------- | --------------------------- |
| GET | `/api/auth/me` | Validate token, return user |
All other auth flows (login, callback, token refresh) are handled entirely by OpenAuth — the frontend redirects to `auth.yourdomain.com` and receives a JWT back.
---
## 9. REST API
All endpoints prefixed `/api`. Request and response bodies validated with Zod on both sides using schemas from `packages/shared`.
### Vocabulary
| Method | Path | Description |
| ------ | ---------------------------- | --------------------------------- |
| GET | `/language-pairs` | List active language pairs |
| GET | `/terms?pair=en-it&limit=10` | Fetch quiz terms with distractors |
### Rooms
| Method | Path | Description |
| ------ | ------------------- | ----------------------------------- |
| POST | `/rooms` | Create a room → returns room + code |
| GET | `/rooms/:code` | Get current room state |
| POST | `/rooms/:code/join` | Join a room |
### Users
| Method | Path | Description |
| ------ | ----------------- | ---------------------- |
| GET | `/users/me` | Current user profile |
| GET | `/users/me/stats` | Games played, win rate |
---
## 10. WebSocket Protocol
One WS connection per client. Authenticated by passing the OpenAuth JWT as a query param on the upgrade request: `wss://api.yourdomain.com?token=...`.
All messages are JSON: `{ type: string, payload: unknown }`. The full set of types is a Zod discriminated union in `packages/shared` — both sides validate every message they receive.
### Client → Server
| type | payload | Description |
| ------------- | -------------------------- | -------------------------------- |
| `room:join` | `{ code }` | Subscribe to a room's WS channel |
| `room:leave` | — | Unsubscribe |
| `room:start` | — | Host starts the game |
| `game:answer` | `{ questionId, answerId }` | Player submits an answer |
### Server → Client
| type | payload | Description |
| -------------------- | -------------------------------------------------- | ----------------------------------------- |
| `room:state` | Full room snapshot | Sent on join and on any player join/leave |
| `game:question` | `{ id, prompt, options[], timeLimit }` | New question broadcast to all players |
| `game:answer_result` | `{ questionId, correct, correctAnswerId, scores }` | Broadcast after all answer or timeout |
| `game:finished` | `{ scores[], winner }` | End of game summary |
| `error` | `{ message }` | Protocol or validation error |
### Multiplayer game mechanic — simultaneous answers
All players see the same question at the same time. Everyone submits independently. The server waits until all players have answered **or** the 15-second timeout fires — then broadcasts `game:answer_result` with updated scores. There is no buzz-first mechanic. This keeps the experience Duolingo-like and symmetric.
### Game flow
```
host creates room (REST) →
players join via room code (REST + WS room:join) →
room:state broadcasts player list →
host sends room:start →
server broadcasts game:question →
players send game:answer →
server collects all answers or waits for timeout →
server broadcasts game:answer_result →
repeat for N rounds →
server broadcasts game:finished
```
### Room state in Valkey
Active room state (connected players, current question, answers received this round) is stored in Valkey with a TTL. PostgreSQL holds the durable record (`rooms`, `room_players`). On server restart, in-progress games are considered abandoned — acceptable for MVP.
---
## 11. Game Mechanics
- **Question format**: source-language word prompt + 4 target-language choices (1 correct + 3 distractors of the same POS)
- **Distractors**: generated server-side, never include the correct answer, never repeat within a session
- **Scoring**: +1 point per correct answer. Speed bonus is out of scope for MVP.
- **Timer**: 15 seconds per question, server-authoritative
- **Single-player**: uses `GET /terms` and runs entirely client-side. No WebSocket.
---
## 12. Frontend Structure
```
apps/web/src/
├── routes/
│ ├── index.tsx # Landing / mode select
│ ├── auth/
│ ├── singleplayer/
│ └── multiplayer/
│ ├── lobby.tsx # Create or join by code
│ ├── room.$code.tsx # Waiting room
│ └── game.$code.tsx # Active game
├── components/
│ ├── quiz/ # QuestionCard, OptionButton, ScoreBoard
│ ├── room/ # PlayerList, RoomCode, ReadyState
│ └── ui/ # shadcn/ui wrappers: Button, Card, Dialog ...
├── stores/
│ └── gameStore.ts # Zustand: game session, scores, WS state
├── lib/
│ ├── api.ts # TanStack Query wrappers
│ └── ws.ts # WS client singleton
└── main.tsx
```
### Zustand store (single store for MVP)
```typescript
interface AppStore {
user: User | null;
gameSession: GameSession | null;
currentQuestion: Question | null;
scores: Record<string, number>;
isLoading: boolean;
error: string | null;
}
```
TanStack Query handles all server data fetching. Zustand handles ephemeral UI and WebSocket-driven state.
---
## 13. Testing Strategy
| Type | Tool | Scope |
| ----------- | -------------------- | --------------------------------------------------- |
| Unit | Vitest | Services, QuizService distractor logic, Zod schemas |
| Component | Vitest + RTL | QuestionCard, OptionButton, auth forms |
| Integration | Vitest | API route handlers against a test DB |
| E2E | Out of scope for MVP | — |
Tests are co-located with source files (`*.test.ts` / `*.test.tsx`).
**Critical paths to cover:**
- Distractor generation (correct POS, no duplicates, never includes answer)
- Answer validation (server-side, correct scoring)
- Game session lifecycle (create → play → complete)
- JWT validation middleware
---
## 14. Definition of Done
### Functional
- [ ] User can log in via Google or GitHub (OpenAuth)
- [ ] User can play singleplayer: 10 rounds, score, result screen
- [ ] User can create a room and share a code
- [ ] User can join a room via code
- [ ] Multiplayer: 10 rounds, simultaneous answers, real-time score sync
- [ ] 1 000 EnglishItalian words seeded from WordNet + OMW
### Technical
- [ ] Deployed to Hetzner with HTTPS on all three subdomains
- [ ] Docker Compose running all services
- [ ] Drizzle migrations applied on container start
- [ ] 1020 passing tests covering critical paths
- [ ] pnpm workspace build pipeline green
### Documentation
- [ ] `SPEC.md` complete
- [ ] `.env.example` files for all apps
- [ ] `README.md` with local dev setup instructions
---
## 15. Out of Scope (MVP)
- Difficulty levels _(`frequency_rank` column exists, ready to use)_
- Additional language pairs _(schema already supports it — just add rows)_
- Leaderboards _(`games_played`, `games_won` columns exist)_
- Streaks / daily challenges
- Friends / private invites
- Audio pronunciation
- CI/CD pipeline (manual deploy for now)
- Rate limiting _(add before going public)_
- Admin panel for vocabulary management