lila/documentation/spec.md

# Glossa — Project Specification

> **This document is the single source of truth for the project.**
> It is written to be handed to any LLM as context. It contains the project vision, the current MVP scope, the tech stack, the architecture, and the roadmap.

---

## 1. Project Overview

A vocabulary trainer for English–Italian words. The quiz format is Duolingo-style: one word is shown as a prompt, and the user picks the correct translation from four choices (1 correct + 3 distractors of the same part-of-speech). The long-term vision is a multiplayer competitive game, but the MVP is a polished singleplayer experience.

**The core learning loop:**
Show word → pick answer → see result → next word → final score

The vocabulary data comes from WordNet + the Open Multilingual Wordnet (OMW). A one-time Python script extracts English–Italian noun pairs and seeds the database. The data model is language-pair agnostic by design — adding a new language later requires no schema changes.

### Core Principles

- **Minimal but extendable**: working product fast, clean architecture for future growth
- **Mobile-first**: touch-friendly Duolingo-like UX
- **Type safety end-to-end**: TypeScript + Zod schemas shared between frontend and backend

---

## 2. Full Product Vision (Long-Term)

- Users log in via Google or GitHub (OpenAuth)
- Singleplayer mode: 10-round quiz, score screen
- Multiplayer mode: create a room, share a code, 2–4 players answer simultaneously in real time, live scores, winner screen
- 1000+ English–Italian nouns seeded from WordNet

This is the full vision. The MVP deliberately ignores most of it.

---

## 3. MVP Scope

**Goal:** A working, presentable singleplayer quiz that can be shown to real people.

### What is IN the MVP

- Vocabulary data in a PostgreSQL database (already seeded)
- REST API that returns quiz terms with distractors
- Singleplayer quiz UI: configurable rounds (3 or 10), answer feedback, score screen
- Clean, mobile-friendly UI (Tailwind + shadcn/ui)
- Global error handler with typed error classes
- Unit + integration tests for the API
- Local dev only (no deployment for MVP)

### What is CUT from the MVP

| Feature                         | Why cut                                |
| ------------------------------- | -------------------------------------- |
| Authentication (OpenAuth)       | No user accounts needed for a demo     |
| Multiplayer (WebSockets, rooms) | Core quiz works without it             |
| Valkey / Redis cache            | Only needed for multiplayer room state |
| Deployment to Hetzner           | Demo to people locally first           |
| User stats / profiles           | Needs auth                             |

These are not deleted from the plan — they are deferred. The architecture is already designed to support them. See Section 11 (Post-MVP Ladder).

---

## 4. Technology Stack

The monorepo structure and tooling are already set up. This is the full stack — the MVP uses a subset of it.

| Layer        | Technology                     | MVP?        |
| ------------ | ------------------------------ | ----------- |
| Monorepo     | pnpm workspaces                | ✅          |
| Frontend     | React 18, Vite, TypeScript     | ✅          |
| Routing      | TanStack Router                | ✅          |
| Server state | TanStack Query                 | ✅          |
| Client state | Zustand                        | ✅          |
| Styling      | Tailwind CSS + shadcn/ui       | ✅          |
| Backend      | Node.js, Express, TypeScript   | ✅          |
| Database     | PostgreSQL + Drizzle ORM       | ✅          |
| Validation   | Zod (shared schemas)           | ✅          |
| Testing      | Vitest, supertest              | ✅          |
| Auth         | OpenAuth (Google + GitHub)     | ❌ post-MVP |
| Realtime     | WebSockets (`ws` library)      | ❌ post-MVP |
| Cache        | Valkey                         | ❌ post-MVP |
| Deployment   | Docker Compose, Hetzner, Nginx | ❌ post-MVP |

---

## 5. Repository Structure

```text
vocab-trainer/
├── apps/
│   ├── api/
│   │   └── src/
│   │       ├── app.ts                  — createApp() factory, express.json(), error middleware
│   │       ├── server.ts               — starts server on PORT
│   │       ├── errors/
│   │       │   └── AppError.ts         — AppError, ValidationError, NotFoundError
│   │       ├── middleware/
│   │       │   └── errorHandler.ts     — central error middleware
│   │       ├── routes/
│   │       │   ├── apiRouter.ts        — mounts /health and /game routers
│   │       │   ├── gameRouter.ts       — POST /start, POST /answer
│   │       │   └── healthRouter.ts
│   │       ├── controllers/
│   │       │   └── gameController.ts   — validates input, calls service, sends response
│   │       ├── services/
│   │       │   ├── gameService.ts      — builds quiz sessions, evaluates answers
│   │       │   └── gameService.test.ts — unit tests (mocked DB)
│   │       └── gameSessionStore/
│   │           ├── GameSessionStore.ts — interface (async, Valkey-ready)
│   │           ├── InMemoryGameSessionStore.ts
│   │           └── index.ts
│   └── web/
│       └── src/
│           ├── routes/
│           │   ├── index.tsx           — landing page
│           │   └── play.tsx            — the quiz
│           ├── components/
│           │   └── game/
│           │       ├── GameSetup.tsx    — settings UI
│           │       ├── QuestionCard.tsx — prompt + 4 options
│           │       ├── OptionButton.tsx — idle / correct / wrong states
│           │       └── ScoreScreen.tsx  — final score + play again
│           └── main.tsx
├── packages/
│   ├── shared/
│   │   └── src/
│   │       ├── constants.ts            — SUPPORTED_POS, DIFFICULTY_LEVELS, etc.
│   │       ├── schemas/game.ts         — Zod schemas for all game types
│   │       └── index.ts
│   └── db/
│       ├── drizzle/                    — migration SQL files
│       └── src/
│           ├── db/schema.ts            — Drizzle schema
│           ├── models/termModel.ts     — getGameTerms(), getDistractors()
│           ├── seeding-datafiles.ts    — seeds terms + translations from JSON
│           ├── seeding-cefr-levels.ts  — enriches translations with CEFR data
│           ├── generating-deck.ts      — builds curated decks
│           └── index.ts
├── scripts/                            — Python extraction/comparison/merge scripts
├── documentation/                      — project docs
├── docker-compose.yml
└── pnpm-workspace.yaml
```

`packages/shared` is the contract between frontend and backend. All request/response shapes are defined there as Zod schemas — never duplicated.

---

## 6. Architecture

### The Layered Architecture

```text
HTTP Request
     ↓
  Router        — maps URL + HTTP method to a controller
     ↓
 Controller     — handles HTTP only: validates input, calls service, sends response
     ↓
  Service       — business logic only: no HTTP, no direct DB access
     ↓
  Model         — database queries only: no business logic
     ↓
  Database
```

**The rule:** each layer only talks to the layer directly below it. A controller never touches the database. A service never reads `req.body`. A model never knows what a quiz is.
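The layering rule can be sketched with three plain functions. This is only an illustration: `findTerms`, `buildPrompts`, `handleStart`, and the `FakeRequest`/`FakeResponse` shapes are hypothetical names, and an in-memory array stands in for the database.

```typescript
// Model layer: data access only. The array stands in for the database.
interface TermRow {
  id: string;
  text: string;
  pos: string;
}

const fakeTable: TermRow[] = [
  { id: "t1", text: "dog", pos: "noun" },
  { id: "t2", text: "run", pos: "verb" },
  { id: "t3", text: "house", pos: "noun" },
];

function findTerms(pos: string): TermRow[] {
  return fakeTable.filter((row) => row.pos === pos);
}

// Service layer: business logic only. It calls the model and never sees a request object.
function buildPrompts(pos: string, rounds: number): string[] {
  return findTerms(pos)
    .slice(0, rounds)
    .map((row) => row.text);
}

// Controller layer: HTTP concerns only. These shapes are simplified
// stand-ins for Express's req and res, not the real types.
interface FakeRequest {
  body: { pos?: unknown; rounds?: unknown };
}
interface FakeResponse {
  status: number;
  body: unknown;
}

function handleStart(req: FakeRequest): FakeResponse {
  const { pos, rounds } = req.body;
  if (typeof pos !== "string" || typeof rounds !== "number") {
    return { status: 400, body: { error: "invalid input" } };
  }
  return { status: 200, body: { prompts: buildPrompts(pos, rounds) } };
}
```

Note that `handleStart` never touches `fakeTable`, and `buildPrompts` never reads `req.body`; swapping the array for a real query would only change `findTerms`.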

### Monorepo Package Responsibilities

| Package           | Owns                                                     |
| ----------------- | -------------------------------------------------------- |
| `packages/shared` | Zod schemas, constants, derived TypeScript types         |
| `packages/db`     | Drizzle schema, DB connection, all model/query functions |
| `apps/api`        | Router, controllers, services, error handling            |
| `apps/web`        | React frontend, consumes types from shared               |

**Key principle:** all database code lives in `packages/db`. `apps/api` never imports `drizzle-orm` for queries — it only calls functions exported from `packages/db`.

---

## 7. Data Model (Current State)

Words are modelled as language-neutral concepts (terms), kept separate from learning curricula (decks). Adding a new language pair requires no schema changes, only new rows in `translations` and `decks`.

**Core tables:** `terms`, `translations`, `term_glosses`, `decks`, `deck_terms`, `categories`, `term_categories`

Key columns on `terms`: `id` (uuid), `pos` (CHECK-constrained), `source`, `source_id` (unique pair for idempotent imports)

Key columns on `translations`: `id`, `term_id` (FK), `language_code` (CHECK-constrained), `text`, `cefr_level` (nullable varchar(2), CHECK A1–C2)

Deck model uses `source_language` + `validated_languages` array — one deck serves multiple target languages. Decks are frequency tiers (e.g. `en-core-1000`), not POS splits.

Full schema is in `packages/db/src/db/schema.ts`.
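To make the term/translation split concrete, here is a rough sketch using plain TypeScript row types rather than the actual Drizzle schema. The row values, the `translate` helper, and the Spanish row are illustrative assumptions; in particular it assumes the `language_code` CHECK constraint would accept `es`.

```typescript
// Row shapes mirroring the key columns listed above (other columns omitted).
type CefrLevel = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";

interface TermRow {
  id: string; // uuid
  pos: string; // CHECK-constrained in the database
  source: string;
  source_id: string; // (source, source_id) is unique, which keeps imports idempotent
}

interface TranslationRow {
  id: string;
  term_id: string; // FK to terms.id
  language_code: string; // CHECK-constrained in the database
  text: string;
  cefr_level: CefrLevel | null; // nullable
}

// Sample rows with made-up identifiers.
const dog: TermRow = { id: "term-1", pos: "noun", source: "omw", source_id: "example-0001" };

const translations: TranslationRow[] = [
  { id: "tr-1", term_id: "term-1", language_code: "en", text: "dog", cefr_level: "A1" },
  { id: "tr-2", term_id: "term-1", language_code: "it", text: "cane", cefr_level: "A1" },
];

// Looks up a term's text in one language; undefined if no translation row exists.
function translate(term: TermRow, languageCode: string): string | undefined {
  const matches = translations.filter(
    (row) => row.term_id === term.id && row.language_code === languageCode,
  );
  return matches[0]?.text;
}

// Adding a third language is one more row; the row shapes stay the same.
translations.push({ id: "tr-3", term_id: "term-1", language_code: "es", text: "perro", cefr_level: null });
```

The term itself carries no text at all, which is what lets the same `terms` row serve any language pair.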

---

## 8. API

### Endpoints

```text
POST /api/v1/game/start     GameRequest → GameSession
POST /api/v1/game/answer    AnswerSubmission → AnswerResult
GET  /api/v1/health          Health check
```

### Schemas (packages/shared)

**GameRequest:** `{ source_language, target_language, pos, difficulty, rounds }`
**GameSession:** `{ sessionId: uuid, questions: GameQuestion[] }`
**GameQuestion:** `{ questionId: uuid, prompt: string, gloss: string | null, options: AnswerOption[4] }`
**AnswerOption:** `{ optionId: number (0-3), text: string }`
**AnswerSubmission:** `{ sessionId: uuid, questionId: uuid, selectedOptionId: number (0-3) }`
**AnswerResult:** `{ questionId: uuid, isCorrect: boolean, correctOptionId: number (0-3), selectedOptionId: number (0-3) }`
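As an illustration of what the `AnswerSubmission` schema enforces, the sketch below validates the same shape by hand. It is a dependency-free stand-in, not the project's Zod schema: `parseAnswerSubmission`, `ParseResult`, and the UUID pattern are assumptions, and the result type only loosely imitates the success/failure split of `safeParse`.

```typescript
// Shape taken from the AnswerSubmission schema above.
interface AnswerSubmission {
  sessionId: string;
  questionId: string;
  selectedOptionId: number; // 0-3
}

// Result type modelled loosely on the success/failure split of Zod's safeParse.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; issues: string[] };

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value: unknown): value is string {
  return typeof value === "string" && UUID_PATTERN.test(value);
}

// Hypothetical hand-written validator; the real project uses a Zod schema instead.
function parseAnswerSubmission(input: unknown): ParseResult<AnswerSubmission> {
  if (typeof input !== "object" || input === null) {
    return { success: false, issues: ["body must be an object"] };
  }
  const { sessionId, questionId, selectedOptionId } = input as Record<string, unknown>;
  const issues: string[] = [];
  if (!isUuid(sessionId)) issues.push("sessionId must be a uuid");
  if (!isUuid(questionId)) issues.push("questionId must be a uuid");
  if (
    typeof selectedOptionId !== "number" ||
    !Number.isInteger(selectedOptionId) ||
    selectedOptionId < 0 ||
    selectedOptionId > 3
  ) {
    issues.push("selectedOptionId must be an integer from 0 to 3");
  }
  if (issues.length > 0) return { success: false, issues };
  return {
    success: true,
    data: { sessionId, questionId, selectedOptionId } as AnswerSubmission,
  };
}
```

Returning a result object instead of throwing is what lets a controller turn bad input into a clean 400 rather than an unhandled exception.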

### Error Handling

Typed error classes (`AppError` base, `ValidationError` 400, `NotFoundError` 404) with central error middleware. Controllers validate with `safeParse`, throw on failure, and call `next(error)` in the catch. The middleware maps `AppError` instances to HTTP status codes; unknown errors return 500.
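A minimal sketch of how such an error hierarchy might look. The class names come from this spec, but the `statusCode` field, the constructor signatures, and the `toErrorResponse` helper are assumptions; the helper shows only the mapping step an Express error middleware would perform before writing the response.

```typescript
// Base class: carries the HTTP status the middleware should respond with.
class AppError extends Error {
  readonly statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

// Known AppError instances keep their own status and message.
// Anything else collapses to a generic 500 so internal details are not leaked.
function toErrorResponse(err: unknown): { status: number; body: { error: string } } {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  return { status: 500, body: { error: "Internal server error" } };
}
```

Because subclasses only fix the status code, adding a new error type (for example a 409 conflict) would not require touching the mapping function.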

### Key Design Rules

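- Server-side answer evaluation: the correct answer is never included in the `GameSession` payload; the frontend only learns `correctOptionId` from the `AnswerResult` after submitting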
- `POST` not `GET` for game start (configuration in request body)
- `safeParse` over `parse` (clean 400s, not raw Zod 500s)
- Session state stored in `GameSessionStore` (in-memory now, Valkey later)
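The session-store rule and server-side evaluation can be sketched together as follows. `GameSessionStore` and `InMemoryGameSessionStore` are named in the repository structure, but the method names (`save`, `get`, `delete`), the `StoredSession` shape, and `evaluateAnswer` are assumptions made for this example.

```typescript
// What the server remembers per session: the correct option for each question.
// This is kept server-side and is not part of the GameSession payload.
interface StoredSession {
  correctOptionByQuestion: Record<string, number>;
}

// Async on purpose, so a network-backed store (e.g. Valkey) can implement it later.
interface GameSessionStore {
  save(sessionId: string, session: StoredSession): Promise<void>;
  get(sessionId: string): Promise<StoredSession | undefined>;
  delete(sessionId: string): Promise<void>;
}

class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, StoredSession>();

  async save(sessionId: string, session: StoredSession): Promise<void> {
    this.sessions.set(sessionId, session);
  }

  async get(sessionId: string): Promise<StoredSession | undefined> {
    return this.sessions.get(sessionId);
  }

  async delete(sessionId: string): Promise<void> {
    this.sessions.delete(sessionId);
  }
}

// Shape taken from the AnswerResult schema above.
interface AnswerResult {
  questionId: string;
  isCorrect: boolean;
  correctOptionId: number;
  selectedOptionId: number;
}

// Returns undefined when the session or question is unknown; a controller
// could translate that into a NotFoundError.
async function evaluateAnswer(
  store: GameSessionStore,
  sessionId: string,
  questionId: string,
  selectedOptionId: number,
): Promise<AnswerResult | undefined> {
  const session = await store.get(sessionId);
  const correctOptionId = session?.correctOptionByQuestion[questionId];
  if (correctOptionId === undefined) return undefined;
  return {
    questionId,
    isCorrect: selectedOptionId === correctOptionId,
    correctOptionId,
    selectedOptionId,
  };
}
```

Since `evaluateAnswer` depends only on the interface, replacing the in-memory class with a Valkey-backed one should not require changes to the service code.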

---

## 9. Game Mechanics

- **Format**: source-language word prompt + 4 target-language choices
- **Distractors**: same POS and difficulty as the correct answer, chosen server-side, never the correct answer itself, never repeated within a session
- **Session length**: 3 or 10 questions (configurable)
- **Scoring**: +1 per correct answer (no speed bonus for MVP)
- **Timer**: none in singleplayer MVP
- **No auth required**: anonymous users
- **Confirm before submit**: the user selects an option, then confirms it (prevents misclicks)
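The distractor rules above could be implemented roughly as follows. This is a sketch over an in-memory candidate list, not the project's `getDistractors()` query: `Candidate`, `pickDistractors`, `buildOptions`, and `shuffle` are hypothetical names, and "never repeated within a session" is interpreted here as tracking used distractor texts in a `Set`.

```typescript
interface Candidate {
  text: string;
  pos: string;
  difficulty: string;
}

// Fisher-Yates shuffle on a copy; `random` is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = copy[i] as T;
    copy[i] = copy[j] as T;
    copy[j] = tmp;
  }
  return copy;
}

// Picks 3 distractors with the same POS and difficulty as the answer,
// excluding the answer itself and anything already used in this session.
function pickDistractors(
  answer: Candidate,
  pool: readonly Candidate[],
  used: Set<string>,
  random: () => number = Math.random,
): string[] {
  const eligible = pool
    .filter((c) => c.pos === answer.pos && c.difficulty === answer.difficulty)
    .filter((c) => c.text !== answer.text && !used.has(c.text))
    .map((c) => c.text);
  const unique = Array.from(new Set(eligible));
  if (unique.length < 3) throw new Error("not enough distractors");
  const picked = shuffle(unique, random).slice(0, 3);
  picked.forEach((text) => used.add(text));
  return picked;
}

// Mixes the answer in with the distractors and assigns option ids 0-3.
// correctOptionId stays on the server; only `options` goes to the client.
function buildOptions(
  answerText: string,
  distractors: readonly string[],
  random: () => number = Math.random,
): { options: { optionId: number; text: string }[]; correctOptionId: number } {
  const texts = shuffle([answerText, ...distractors], random);
  return {
    options: texts.map((text, optionId) => ({ optionId, text })),
    correctOptionId: texts.indexOf(answerText),
  };
}
```

With +1 per correct answer, the final score is then just the count of `AnswerResult` objects whose `isCorrect` is true.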

---

## 10. Working Methodology

This project is a learning exercise. The goal is to understand the code, not just to ship it.

### How to use an LLM for help

1. Paste this document as context
2. Describe what you're working on and what you're stuck on
3. Ask for hints, not solutions

### Refactoring workflow

After completing a task: share the code, ask what to refactor and why. The LLM should explain the concept, not write the implementation.

---

## 11. Post-MVP Ladder

| Phase             | What it adds                                                   |
| ----------------- | -------------------------------------------------------------- |
| Auth              | OpenAuth (Google + GitHub), JWT middleware, user rows in DB    |
| User Stats        | Games played, score history, profile page                      |
| Multiplayer Lobby | Room creation, join by code, WebSocket connection              |
| Multiplayer Game  | Simultaneous answers, server timer, live scores, winner screen |
| Deployment        | Docker Compose prod config, Nginx, Let's Encrypt, Hetzner VPS  |
| Hardening         | Rate limiting, error boundaries, CI/CD, DB backups             |

### Future Data Model Extensions (deferred, additive)

- `noun_forms` — gender, singular, plural, articles per language
- `verb_forms` — conjugation tables per language
- `term_pronunciations` — IPA and audio URLs per language
- `user_decks` — which decks a user is studying
- `user_term_progress` — spaced repetition state per user/term/language
- `quiz_answers` — history log for stats

All are new tables referencing existing `terms` rows via FK. No existing schema changes required.

### Multiplayer Architecture (deferred)

- WebSocket protocol: `ws` library, Zod discriminated union for message types
- Room model: human-readable codes (e.g. `WOLF-42`), not matchmaking queue
- Game mechanic: simultaneous answers, 15-second server timer, all players see same question
- Valkey for ephemeral room state, PostgreSQL for durable records
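Since the multiplayer protocol is not designed yet, the following is only a sketch of the two ideas named above: a discriminated union for message types and human-readable room codes. The message names, fields, word list, and two-digit suffix are all hypothetical, and the hand-written parser stands in for the planned Zod discriminated union.

```typescript
// Hypothetical client-to-server messages, discriminated by the `type` field.
type ClientMessage =
  | { type: "join_room"; roomCode: string; playerName: string }
  | { type: "submit_answer"; questionId: string; selectedOptionId: number };

// Parses a raw WebSocket text frame; returns undefined for anything malformed.
function parseClientMessage(raw: string): ClientMessage | undefined {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof value !== "object" || value === null) return undefined;
  const { type, roomCode, playerName, questionId, selectedOptionId } =
    value as Record<string, unknown>;
  if (type === "join_room" && typeof roomCode === "string" && typeof playerName === "string") {
    return { type, roomCode, playerName };
  }
  if (
    type === "submit_answer" &&
    typeof questionId === "string" &&
    typeof selectedOptionId === "number"
  ) {
    return { type, questionId, selectedOptionId };
  }
  return undefined;
}

// The switch narrows `message` per case, so each branch sees only its own fields.
function summarize(message: ClientMessage): string {
  switch (message.type) {
    case "join_room":
      return `${message.playerName} joins ${message.roomCode}`;
    case "submit_answer":
      return `answer ${message.selectedOptionId} for ${message.questionId}`;
  }
}

// Room codes in the WORD-NN style, e.g. WOLF-42. The word list is made up.
const ROOM_WORDS = ["WOLF", "BEAR", "HAWK", "LYNX"];

function generateRoomCode(random: () => number = Math.random): string {
  const word = ROOM_WORDS[Math.floor(random() * ROOM_WORDS.length)];
  const suffix = Math.floor(random() * 90) + 10; // 10-99
  return `${word}-${suffix}`;
}
```

A real implementation would also need to check a generated code against the rooms already open, since a four-word list with two digits gives only a few hundred combinations.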

### Infrastructure (deferred)

- `app.yourdomain.com` → React frontend
- `api.yourdomain.com` → Express API + WebSocket
- `auth.yourdomain.com` → OpenAuth service
- Docker Compose with `nginx-proxy` + `acme-companion` for automatic SSL

---

## 12. Definition of Done (MVP)

- [x] API returns quiz terms with correct distractors
- [x] User can complete a quiz without errors
- [x] Score screen shows final result and a play-again option
- [x] App is usable on a mobile screen
- [x] No hardcoded data — everything comes from the database
- [x] Global error handler with typed error classes
- [x] Unit + integration tests for API

---

## 13. Roadmap

### Phase 0 — Foundation ✅

Empty repo that builds, lints, and runs end-to-end. `pnpm dev` starts both apps; `GET /api/v1/health` returns 200; React renders a hello page.

### Phase 1 — Vocabulary Data + API ✅

Word data lives in the DB. API returns quiz sessions with distractors. CEFR enrichment pipeline complete. Global error handler and tests implemented.

### Phase 2 — Singleplayer Quiz UI ✅

User can complete a full quiz in the browser. Settings UI, question cards, answer feedback, score screen.

### Phase 3 — Auth

Users can log in via Google or GitHub and stay logged in. JWT validated by API. User row created on first login.

### Phase 4 — Multiplayer Lobby

Players can create and join rooms. Two browser tabs can join the same room and see each other via WebSocket.

### Phase 5 — Multiplayer Game

Host starts a game. All players answer simultaneously in real time. Winner declared.

### Phase 6 — Production Deployment

App is live on Hetzner with HTTPS. Auth flow works end-to-end.

### Phase 7 — Polish & Hardening

Rate limiting, reconnect logic, error boundaries, CI/CD, DB backups.

### Dependency Graph

```text
Phase 0 (Foundation)
└── Phase 1 (Vocabulary Data + API)
    └── Phase 2 (Singleplayer UI)
        └── Phase 3 (Auth)
            ├── Phase 4 (Multiplayer Lobby)
            │   └── Phase 5 (Multiplayer Game)
            │       └── Phase 6 (Deployment)
            └── Phase 7 (Hardening)
```

---

## 14. Game Flow (Future)

Singleplayer: choose direction (en→it or it→en) → top-level category → part of speech → difficulty (A1–C2) → round count → game starts.

**Top-level categories (post-MVP):**

- **Grammar** — practice nouns, verb conjugations, etc.
- **Media** — practice vocabulary from specific books, films, songs, etc.
- **Thematic** — animals, kitchen, etc. (requires category metadata research)