diff --git a/documentation/mvp.md b/documentation/mvp.md
new file mode 100644
index 0000000..167cc46
--- /dev/null
+++ b/documentation/mvp.md
@@ -0,0 +1,460 @@
+# glossa mvp
+
+> **This document is the single source of truth for the project.**
+> It is written to be handed to any LLM as context. It contains the project vision, the current MVP scope, the tech stack, the working methodology, and the roadmap.
+
+---
+
+## 1. Project Overview
+
+A vocabulary trainer for English–Italian words. The quiz format is Duolingo-style: one word is shown as a prompt, and the user picks the correct translation from four choices (1 correct + 3 distractors of the same part-of-speech). The long-term vision is a multiplayer competitive game, but the MVP is a polished singleplayer experience.
+
+**The core learning loop:**
+Show word → pick answer → see result → next word → final score
+
+The vocabulary data comes from WordNet + the Open Multilingual Wordnet (OMW). A one-time Python script extracts English–Italian noun pairs and seeds the database. The data model is language-pair agnostic by design: adding a new language later requires no schema changes.
+
+---
+
+## 2. What the Full Product Looks Like (Long-Term Vision)
+
+- Users log in via Google or GitHub (OpenAuth)
+- Singleplayer mode: 10-round quiz, score screen
+- Multiplayer mode: create a room, share a code, 2–4 players answer simultaneously in real time, live scores, winner screen
+- 1000+ English–Italian nouns seeded from WordNet
+
+This is documented in `spec.md` and the full `roadmap.md`. The MVP deliberately ignores most of it.
+
+---
+
+## 3. MVP Scope
+
+**Goal:** A working, presentable singleplayer quiz that can be shown to real people.
+
+### What is IN the MVP
+
+- Vocabulary data in a PostgreSQL database (already seeded)
+- REST API that returns quiz terms with distractors
+- Singleplayer quiz UI: 10 questions, answer feedback, score screen
+- Clean, mobile-friendly UI (Tailwind + shadcn/ui)
+- Local dev only (no deployment for MVP)
+
+### What is CUT from the MVP
+
+| Feature | Why cut |
+|---|---|
+| Authentication (OpenAuth) | No user accounts needed for a demo |
+| Multiplayer (WebSockets, rooms) | Core quiz works without it |
+| Valkey / Redis cache | Only needed for multiplayer room state |
+| Deployment to Hetzner | Ship to people locally first |
+| User stats / profiles | Needs auth |
+| Testing suite | Add after the UI stabilises |
+
+These are not deleted from the plan; they are deferred. The architecture is already designed to support them. See Section 12 (Post-MVP Ladder).
+
+---
+
+## 4. Technology Stack
+
+The monorepo structure and tooling are already set up (Phase 0 complete). This is the full stack; the MVP uses a subset of it.
+
+| Layer | Technology | MVP? |
+|---|---|---|
+| Monorepo | pnpm workspaces | ✅ |
+| Frontend | React 18, Vite, TypeScript | ✅ |
+| Routing | TanStack Router | ✅ |
+| Server state | TanStack Query | ✅ |
+| Client state | Zustand | ✅ |
+| Styling | Tailwind CSS + shadcn/ui | ✅ |
+| Backend | Node.js, Express, TypeScript | ✅ |
+| Database | PostgreSQL + Drizzle ORM | ✅ |
+| Validation | Zod (shared schemas) | ✅ |
+| Auth | OpenAuth (Google + GitHub) | ❌ post-MVP |
+| Realtime | WebSockets (`ws` library) | ❌ post-MVP |
+| Cache | Valkey | ❌ post-MVP |
+| Testing | Vitest, React Testing Library | ❌ post-MVP |
+| Deployment | Docker Compose, Hetzner, Nginx | ❌ post-MVP |
+
+### Repository Structure (actual, as of Phase 1 data pipeline complete)
+
+```
+vocab-trainer/
+├── apps/
+│   ├── api/
+│   │   └── src/
+│   │       ├── app.ts                # createApp() factory, routes registered here
+│   │       └── server.ts             # calls app.listen()
+│   └── web/
+│       └── src/
+│           ├── routes/
+│           │   ├── __root.tsx
+│           │   ├── index.tsx         # placeholder landing page
+│           │   └── about.tsx
+│           ├── main.tsx
+│           └── index.css
+├── packages/
+│   ├── shared/
+│   │   └── src/
+│   │       ├── index.ts              # empty, Zod schemas go here next
+│   │       └── constants.ts
+│   └── db/
+│       ├── drizzle/                  # migration SQL files
+│       └── src/
+│           ├── db/schema.ts          # full Drizzle schema
+│           ├── seeding-datafiles.ts  # seeds terms + translations
+│           ├── generating-deck.ts    # builds curated decks
+│           └── index.ts
+├── documentation/                    # all project docs live here
+│   ├── spec.md
+│   ├── roadmap.md
+│   ├── decisions.md
+│   ├── mvp.md                        # this file
+│   └── CLAUDE.md
+├── scripts/
+│   ├── extract-en-it-nouns.py
+│   └── datafiles/en-it-noun.json
+├── docker-compose.yml
+└── pnpm-workspace.yaml
+```
+
+**What does not exist yet (to be built in MVP phases):**
+- `apps/api/src/routes/`: no route handlers yet
+- `apps/api/src/services/`: no business logic yet
+- `apps/api/src/repositories/`: no DB queries yet
+- `apps/web/src/components/`: no UI components yet
+- `apps/web/src/stores/`: no Zustand store yet
+- `apps/web/src/lib/api.ts`: no TanStack Query wrappers yet
+- `packages/shared/src/schemas/`: no Zod schemas yet
+
+`packages/shared` is the contract between frontend and backend. All request/response shapes are defined there as Zod schemas and are never duplicated.
+
+---
+
+## 5. Data Model (relevant tables for MVP)
+
+```
+export const terms = pgTable(
+  "terms",
+  {
+    id: uuid().primaryKey().defaultRandom(),
+    synset_id: text().unique().notNull(),
+    pos: varchar({ length: 20 }).notNull(),
+    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
+  },
+  (table) => [
+    check(
+      "pos_check",
+      sql`${table.pos} IN (${sql.raw(SUPPORTED_POS.map((p) => `'${p}'`).join(", "))})`,
+    ),
+    index("idx_terms_pos").on(table.pos),
+  ],
+);
+
+export const translations = pgTable(
+  "translations",
+  {
+    id: uuid().primaryKey().defaultRandom(),
+    term_id: uuid()
+      .notNull()
+      .references(() => terms.id, { onDelete: "cascade" }),
+    language_code: varchar({ length: 10 }).notNull(),
+    text: text().notNull(),
+    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
+  },
+  (table) => [
+    unique("unique_translations").on(
+      table.term_id,
+      table.language_code,
+      table.text,
+    ),
+    index("idx_translations_lang").on(table.language_code, table.term_id),
+  ],
+);
+
+export const decks = pgTable(
+  "decks",
+  {
+    id: uuid().primaryKey().defaultRandom(),
+    name: text().notNull(),
+    description: text(),
+    source_language: varchar({ length: 10 }).notNull(),
+    validated_languages: varchar({ length: 10 }).array().notNull().default([]),
+    is_public: boolean().default(false).notNull(),
+    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
+  },
+  (table) => [
+    check(
+      "source_language_check",
+      sql`${table.source_language} IN (${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))})`,
+    ),
+    check(
+      "validated_languages_check",
+      sql`validated_languages <@ ARRAY[${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))}]::varchar[]`,
+    ),
+    check(
+      "validated_languages_excludes_source",
+      sql`NOT (${table.source_language} = ANY(${table.validated_languages}))`,
+    ),
+    unique("unique_deck_name").on(table.name, table.source_language),
+  ],
+);
+
+export const deck_terms = pgTable(
+  "deck_terms",
+  {
+    deck_id: uuid()
+      .notNull()
+      .references(() => decks.id, { onDelete: "cascade" }),
+    term_id: uuid()
+      .notNull()
+      .references(() => terms.id, { onDelete: "cascade" }),
+    added_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
+  },
+  (table) => [primaryKey({ columns: [table.deck_id, table.term_id] })],
+);
+```
+
+The seed + deck-build scripts have already been run. Data exists in the database.
+
+---
+
+## 6. API Endpoints (MVP)
+
+All endpoints are prefixed with `/api`. Schemas live in `packages/shared` and are validated with Zod on both sides.
+
+| Method | Path | Description |
+|---|---|---|
+| GET | `/api/health` | Health check (already done) |
+| GET | `/api/language-pairs` | List active language pairs |
+| GET | `/api/decks` | List available decks |
+| GET | `/api/decks/:id/terms` | Fetch terms with distractors for a quiz |
+
+### Distractor Logic
+
+The `QuizService` picks 3 distractors server-side:
+- Same part-of-speech as the correct answer
+- Never the correct answer
+- Never repeated within a session
+
+---
+
+## 7. Frontend Structure (MVP)
+
+```
+apps/web/src/
+├── routes/
+│   ├── index.tsx                # Landing page / mode select
+│   └── singleplayer/
+│       └── index.tsx            # The quiz
+├── components/
+│   ├── quiz/
+│   │   ├── QuestionCard.tsx     # Prompt word + 4 answer buttons
+│   │   ├── OptionButton.tsx     # idle / correct / wrong states
+│   │   └── ScoreScreen.tsx      # Final score + play again
+│   └── ui/                      # shadcn/ui wrappers
+├── stores/
+│   └── gameStore.ts             # Zustand: question index, score, answers
+└── lib/
+    └── api.ts                   # TanStack Query fetch wrappers
+```
+
+### State Management
+
+TanStack Query handles fetching quiz data from the API. Zustand handles the local quiz session (current question index, score, selected answers). There is no overlap between the two.
+
+---
+
+## 8. Working Methodology
+
+> **Read this section before asking for help with any task.**
+
+This project is a learning exercise. The goal is to understand the code, not just to ship it.
+
+### How tasks are structured
+
+The roadmap (Section 10) lists broad phases. When work starts on a phase, it gets broken into smaller, concrete subtasks with clear done-conditions before any code is written.
+
+### How to use an LLM for help
+
+When asking an LLM for help:
+
+1. **Paste this document** (or the relevant sections) as context
+2. **Describe what you're working on** and what specifically you're stuck on
+3. **Ask for hints, not solutions.** Example prompts:
+   - "I'm trying to implement X. My current approach is Y. What am I missing conceptually?"
+   - "Here is my code. What would you change about the structure and why?"
+   - "Can you point me to the relevant docs for Z?"
+
+### Refactoring workflow
+
+After completing a task or a block of work:
+1. Share the current state of the code with the LLM
+2. Ask: *"What would you refactor here, and why? Don't show me the code; point me in the right direction and link relevant documentation."*
+3. The LLM should explain the *what* and *why*, link to relevant docs/guides, and let you implement the fix yourself
+
+**The LLM should never write the implementation for you.** If it does, ask it to delete it and explain the concept instead.
+
+### Decisions log
+
+Keep a `decisions.md` file in `documentation/`. When you make a non-obvious choice (a library, a pattern, a trade-off), write one short paragraph explaining what you chose and why. This is also useful context for any LLM session.
+
+---
+
+## 9. Game Mechanics
+
+- **Format**: source-language word prompt + 4 target-language choices
+- **Distractors**: same POS, server-side, never the correct answer, no repeats in a session
+- **Session length**: 10 questions
+- **Scoring**: +1 per correct answer (no speed bonus for MVP)
+- **Timer**: none in singleplayer MVP
+- **No auth required**: anonymous users
+
+---
+
+## 10. MVP Roadmap
+
+> Tasks are written at a high level. When starting a phase, break it into smaller subtasks before writing any code.
+
+### Current Status
+
+**Phase 0 (Foundation): ✅ Complete**
+**Phase 1 (Vocabulary Data): 🔄 Data pipeline complete. API layer is the immediate next step.**
+
+What is already in the database:
+- 999 unique English words (nouns), fully seeded from WordNet/OMW
+- 3171 term IDs resolved (higher than the word count because one word can map to several WordNet synsets)
+- Full Italian translation coverage (3171/3171 terms)
+- Decks created and populated via `packages/db/src/generating-decks.ts`
+- 34 words from the source wordlist had no WordNet match (expected, not a bug)
+
+---
+
+### Phase 1: Finish the API Layer
+
+**Goal:** The frontend can fetch quiz data from the API.
+
+**Done when:** `GET /api/decks/:id/terms?limit=10` (with `:id` being a deck UUID) returns 10 terms, each with 3 distractors of the same POS attached.
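
The distractor rules behind this done-condition (same POS, never the correct answer, no repeats within a session) can be pictured as a small pure function. What follows is only a sketch to reason about, not the project's `QuizService`: the `Candidate` type, the `pickDistractors` and `shuffle` helpers, and the idea of tracking the session in a `Set` of already used texts are all assumptions made for illustration.

```typescript
// Hypothetical shape for illustration; the real one would be inferred
// from the Zod schemas in packages/shared.
type Candidate = { termId: string; pos: string; text: string };

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = copy[i] as T;
    copy[i] = copy[j] as T;
    copy[j] = tmp;
  }
  return copy;
}

// Picks up to `count` distractors for `correct` from `pool`.
// `usedTexts` is assumed to live for one quiz session; it is mutated so that
// a distractor text is never handed out twice in that session.
function pickDistractors(
  correct: Candidate,
  pool: readonly Candidate[],
  usedTexts: Set<string> = new Set<string>(),
  count = 3,
  random: () => number = Math.random,
): Candidate[] {
  const picked: Candidate[] = [];
  for (const candidate of shuffle(pool, random)) {
    if (picked.length === count) break;
    if (candidate.pos !== correct.pos) continue; // same part-of-speech only
    if (candidate.termId === correct.termId) continue; // never the correct term
    if (candidate.text === correct.text) continue; // never the correct answer text
    if (usedTexts.has(candidate.text)) continue; // no repeats within a session
    usedTexts.add(candidate.text);
    picked.push(candidate);
  }
  return picked;
}
```

The function can return fewer than `count` items when the pool runs dry. How the service should react in that case (fail the request, relax the no-repeat rule, and so on) is a design decision this document leaves open.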
+
+**Broadly, what needs to happen:**
+- Define Zod response schemas in `packages/shared` for terms, decks, and language pairs
+- Implement a repository layer that queries the DB for terms belonging to a deck
+- Implement a service layer that attaches distractors to each term (same POS, no duplicates, no correct answer included)
+- Wire up the REST endpoints (`GET /api/language-pairs`, `GET /api/decks`, `GET /api/decks/:id/terms`)
+- Manually test the endpoints (curl or a REST client like Bruno/Insomnia)
+
+**Key concepts to understand before starting:**
+- Drizzle ORM query patterns (joins, where clauses)
+- The repository pattern (data access separated from business logic)
+- Zod schema definition and inference
+- How pnpm workspace packages reference each other
+
+---
+
+### Phase 2: Singleplayer Quiz UI
+
+**Goal:** A user can complete a full 10-question quiz in the browser.
+
+**Done when:** User visits `/singleplayer`, answers 10 questions, sees a score screen, and can play again.
+
+**Broadly, what needs to happen:**
+- Build the `QuestionCard` component (prompt word + 4 answer buttons)
+- Build the `OptionButton` component with three visual states: idle, correct, wrong
+- Build the `ScoreScreen` component (score summary + play again)
+- Implement a Zustand store to track quiz session state (current question index, score, whether an answer has been picked)
+- Wire up TanStack Query to fetch terms from the API on mount
+- Create the `/singleplayer` route and assemble the components
+- Handle the between-question transition (brief delay showing result → next question)
+
+**Key concepts to understand before starting:**
+- TanStack Query: `useQuery`, loading/error states
+- Zustand: defining a store, reading and writing state from components
+- TanStack Router: defining routes, navigating between them
+- React component composition
+- Controlled state for the answer selection (which button is selected, when to lock input)
+
+---
+
+### Phase 3: UI Polish
+
+**Goal:** The app looks good enough to show to people.
+
+**Done when:** The quiz is usable on mobile, readable on desktop, and has a coherent visual style.
+
+**Broadly, what needs to happen:**
+- Apply Tailwind utility classes and shadcn/ui components consistently
+- Make the layout mobile-first (touch-friendly buttons, readable font sizes)
+- Add a simple landing page (`/`) with a "Start Quiz" button
+- Add loading and error states for the API fetch
+- Visual feedback on correct/wrong answers (colour, maybe a brief animation)
+- Deck selection: let the user pick a deck from a list before starting
+
+**Key concepts to understand before starting:**
+- Tailwind CSS utility-first approach
+- shadcn/ui component library and how to add components
+- Responsive design with Tailwind breakpoints
+- CSS transitions for simple animations
+
+---
+
+## 11. Key Technical Decisions
+
+These are the non-obvious decisions already made. Any LLM helping with this project should be aware of them and not suggest alternatives without good reason.
+
+### Architecture
+
+**Express app: factory function pattern**
+`app.ts` exports `createApp()`. `server.ts` imports it and calls `.listen()`. This keeps tests isolated: a test can import the app without starting a server.
+
+**Layered architecture: routes → services → repositories**
+Business logic lives in services, not route handlers or repositories. Each layer only talks to the layer directly below it. For the MVP API, this means:
+- `routes/`: parse request, call service, return response
+- `services/`: business logic (e.g. attaching distractors)
+- `repositories/`: all DB queries live here, nowhere else
+
+**Shared Zod schemas in `packages/shared`**
+All request/response shapes are defined once as Zod schemas in `packages/shared` and imported by both `apps/api` and `apps/web`. Types are inferred from schemas (`z.infer`), never written by hand.
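
The routes → services → repositories layering described in this subsection can be sketched without Express or Drizzle, which makes the direction of the dependencies easier to see. Everything named here (`DeckRow`, `DeckSummary`, `DeckRepository`, `createDeckService`, `handleListDecks`) is hypothetical, and the rule of listing only decks with `is_public` set is an assumption made for the example, not a decision recorded in this document.

```typescript
// Hypothetical row and response shapes, loosely modelled on the `decks` table.
type DeckRow = { id: string; name: string; is_public: boolean };
type DeckSummary = { id: string; name: string };

// Repository layer: the only place that would talk to the database.
interface DeckRepository {
  findAll(): Promise<DeckRow[]>;
}

// Service layer: business rules only, with no HTTP and no SQL.
function createDeckService(repository: DeckRepository) {
  return {
    async listPublicDecks(): Promise<DeckSummary[]> {
      const rows = await repository.findAll();
      // Assumed rule for illustration: only public decks are listed.
      return rows
        .filter((row) => row.is_public)
        .map((row) => ({ id: row.id, name: row.name }));
    },
  };
}

// Route layer: calls the service and shapes the response.
// A real Express handler would write to `res` instead of returning a value.
async function handleListDecks(
  service: ReturnType<typeof createDeckService>,
): Promise<{ status: number; body: DeckSummary[] }> {
  return { status: 200, body: await service.listPublicDecks() };
}

// In-memory repository standing in for a Drizzle-backed one.
const inMemoryDeckRepository: DeckRepository = {
  findAll: async () => [
    { id: "a", name: "Animals", is_public: true },
    { id: "b", name: "Draft", is_public: false },
  ],
};
```

Because the service only depends on the `DeckRepository` interface, it can be exercised with the in-memory stand-in and no database, which is the same isolation argument the factory function pattern makes for the Express app.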
+
+### Data Model
+
+**Decks separate from terms (not frequency-rank filtering)**
+Terms are raw WordNet data. Decks are curated lists. This separation exists because WordNet frequency data is unreliable for learning: common chemical element symbols ranked highly, for example. Bad words are excluded at the deck level, not filtered from `terms`.
+
+**Deck language model: `source_language` + `validated_languages` array**
+A deck is not tied to a single language pair. `source_language` is the language the wordlist was curated from. `validated_languages` is an array of target languages with full translation coverage, calculated and updated by the deck generation script on every run.
+
+### Tooling
+
+**Drizzle ORM (not Prisma):** No binary, no engine. Queries map closely to SQL. Works naturally with Zod. Migrations are plain SQL files.
+
+**`tsx` as TypeScript runner (not `ts-node`):** Faster, zero config, uses esbuild. Does not type-check; that is handled by `tsc` and the editor.
+
+**pnpm workspaces (not Turborepo):** Two apps don't need the extra build caching complexity.
+
+---
+
+## 12. Post-MVP Ladder
+
+
+
+These phases are deferred but planned. The architecture already supports them.
+
+| Phase | What it adds |
+|---|---|
+| Auth | OpenAuth (Google + GitHub), JWT middleware, user rows in DB |
+| User Stats | Games played, score history, profile page |
+| Multiplayer Lobby | Room creation, join by code, WebSocket connection |
+| Multiplayer Game | Simultaneous answers, server timer, live scores, winner screen |
+| Deployment | Docker Compose prod config, Nginx, Let's Encrypt, Hetzner VPS |
+| Hardening | Rate limiting, error boundaries, CI/CD, DB backups |
+
+Each of these maps to a phase in the full `roadmap.md`.
+
+---
+
+## 13. Definition of Done (MVP)
+
+- [ ] `GET /api/decks/:id/terms` returns terms with correct distractors
+- [ ] User can complete a 10-question quiz without errors
+- [ ] Score screen shows final result and a play-again option
+- [ ] App is usable on a mobile screen
+- [ ] No hardcoded data: everything comes from the database