# glossa mvp
> **This document is the single source of truth for the project.**
> It is written to be handed to any LLM as context. It contains the project vision, the current MVP scope, the tech stack, the working methodology, and the roadmap.
---
## 1. Project Overview
A vocabulary trainer for English-Italian words. The quiz format is Duolingo-style: one word is shown as a prompt, and the user picks the correct translation from four choices (1 correct + 3 distractors of the same part-of-speech). The long-term vision is a multiplayer competitive game, but the MVP is a polished singleplayer experience.
**The core learning loop:**
Show word → pick answer → see result → next word → final score
The vocabulary data comes from WordNet + the Open Multilingual Wordnet (OMW). A one-time Python script extracts English-Italian noun pairs and seeds the database. The data model is language-pair agnostic by design: adding a new language later requires no schema changes.

---
## 2. What the Full Product Looks Like (Long-Term Vision)
- Users log in via Google or GitHub (OpenAuth)
- Singleplayer mode: 10-round quiz, score screen
- Multiplayer mode: create a room, share a code, 2-4 players answer simultaneously in real time, live scores, winner screen
- 1000+ English-Italian nouns seeded from WordNet
This is documented in `spec.md` and the full `roadmap.md`. The MVP deliberately ignores most of it.
---
## 3. MVP Scope
**Goal:** A working, presentable singleplayer quiz that can be shown to real people.
### What is IN the MVP
- Vocabulary data in a PostgreSQL database (already seeded)
- REST API that returns quiz terms with distractors
- Singleplayer quiz UI: 10 questions, answer feedback, score screen
- Clean, mobile-friendly UI (Tailwind + shadcn/ui)
- Local dev only (no deployment for MVP)
### What is CUT from the MVP
| Feature | Why cut |
|---|---|
| Authentication (OpenAuth) | No user accounts needed for a demo |
| Multiplayer (WebSockets, rooms) | Core quiz works without it |
| Valkey / Redis cache | Only needed for multiplayer room state |
| Deployment to Hetzner | Ship to people locally first |
| User stats / profiles | Needs auth |
| Testing suite | Add after the UI stabilises |
These are not deleted from the plan; they are deferred. The architecture is already designed to support them. See Section 12 (Post-MVP Ladder).

---
## 4. Technology Stack
The monorepo structure and tooling are already set up (Phase 0 complete). This is the full stack — the MVP uses a subset of it.
| Layer | Technology | MVP? |
|---|---|---|
| Monorepo | pnpm workspaces | ✅ |
| Frontend | React 18, Vite, TypeScript | ✅ |
| Routing | TanStack Router | ✅ |
| Server state | TanStack Query | ✅ |
| Client state | Zustand | ✅ |
| Styling | Tailwind CSS + shadcn/ui | ✅ |
| Backend | Node.js, Express, TypeScript | ✅ |
| Database | PostgreSQL + Drizzle ORM | ✅ |
| Validation | Zod (shared schemas) | ✅ |
| Auth | OpenAuth (Google + GitHub) | ❌ post-MVP |
| Realtime | WebSockets (`ws` library) | ❌ post-MVP |
| Cache | Valkey | ❌ post-MVP |
| Testing | Vitest, React Testing Library | ❌ post-MVP |
| Deployment | Docker Compose, Hetzner, Nginx | ❌ post-MVP |
### Repository Structure (actual, as of Phase 1 data pipeline complete)
```
vocab-trainer/
├── apps/
│   ├── api/
│   │   └── src/
│   │       ├── app.ts                # createApp() factory: routes registered here
│   │       └── server.ts             # calls app.listen()
│   └── web/
│       └── src/
│           ├── routes/
│           │   ├── __root.tsx
│           │   ├── index.tsx         # placeholder landing page
│           │   └── about.tsx
│           ├── main.tsx
│           └── index.css
├── packages/
│   ├── shared/
│   │   └── src/
│   │       ├── index.ts              # empty: Zod schemas go here next
│   │       └── constants.ts
│   └── db/
│       ├── drizzle/                  # migration SQL files
│       └── src/
│           ├── db/schema.ts          # full Drizzle schema
│           ├── seeding-datafiles.ts  # seeds terms + translations
│           ├── generating-deck.ts    # builds curated decks
│           └── index.ts
├── documentation/                    # all project docs live here
│   ├── spec.md
│   ├── roadmap.md
│   ├── decisions.md
│   ├── mvp.md                        # this file
│   └── CLAUDE.md
├── scripts/
│   ├── extract-en-it-nouns.py
│   └── datafiles/en-it-noun.json
├── docker-compose.yml
└── pnpm-workspace.yaml
```
**What does not exist yet (to be built in MVP phases):**
- `apps/api/src/routes/` — no route handlers yet
- `apps/api/src/services/` — no business logic yet
- `apps/api/src/repositories/` — no DB queries yet
- `apps/web/src/components/` — no UI components yet
- `apps/web/src/stores/` — no Zustand store yet
- `apps/web/src/lib/api.ts` — no TanStack Query wrappers yet
- `packages/shared/src/schemas/` — no Zod schemas yet
`packages/shared` is the contract between frontend and backend. All request/response shapes are defined there as Zod schemas — never duplicated.
---
## 5. Data Model (relevant tables for MVP)
```typescript
import { sql } from "drizzle-orm";
import {
  boolean,
  check,
  index,
  pgTable,
  primaryKey,
  text,
  timestamp,
  unique,
  uuid,
  varchar,
} from "drizzle-orm/pg-core";
// Assumption: both constants are exported from packages/shared (constants.ts);
// replace the specifier with the actual workspace package name.
import { SUPPORTED_LANGUAGE_CODES, SUPPORTED_POS } from "@glossa/shared";

// One row per WordNet synset; `pos` is restricted to SUPPORTED_POS.
export const terms = pgTable(
  "terms",
  {
    id: uuid().primaryKey().defaultRandom(),
    synset_id: text().unique().notNull(),
    pos: varchar({ length: 20 }).notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    check(
      "pos_check",
      sql`${table.pos} IN (${sql.raw(SUPPORTED_POS.map((p) => `'${p}'`).join(", "))})`,
    ),
    index("idx_terms_pos").on(table.pos),
  ],
);

// One row per (term, language, text); a term can have several translations per language.
export const translations = pgTable(
  "translations",
  {
    id: uuid().primaryKey().defaultRandom(),
    term_id: uuid()
      .notNull()
      .references(() => terms.id, { onDelete: "cascade" }),
    language_code: varchar({ length: 10 }).notNull(),
    text: text().notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    unique("unique_translations").on(
      table.term_id,
      table.language_code,
      table.text,
    ),
    index("idx_translations_lang").on(table.language_code, table.term_id),
  ],
);

// Curated word lists; `validated_languages` never contains `source_language`.
export const decks = pgTable(
  "decks",
  {
    id: uuid().primaryKey().defaultRandom(),
    name: text().notNull(),
    description: text(),
    source_language: varchar({ length: 10 }).notNull(),
    validated_languages: varchar({ length: 10 }).array().notNull().default([]),
    is_public: boolean().default(false).notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    check(
      "source_language_check",
      sql`${table.source_language} IN (${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))})`,
    ),
    check(
      "validated_languages_check",
      sql`validated_languages <@ ARRAY[${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))}]::varchar[]`,
    ),
    check(
      "validated_languages_excludes_source",
      sql`NOT (${table.source_language} = ANY(${table.validated_languages}))`,
    ),
    unique("unique_deck_name").on(table.name, table.source_language),
  ],
);

// Join table: which terms belong to which deck.
export const deck_terms = pgTable(
  "deck_terms",
  {
    deck_id: uuid()
      .notNull()
      .references(() => decks.id, { onDelete: "cascade" }),
    term_id: uuid()
      .notNull()
      .references(() => terms.id, { onDelete: "cascade" }),
    added_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [primaryKey({ columns: [table.deck_id, table.term_id] })],
);
```
The seed + deck-build scripts have already been run. Data exists in the database.
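
Because `translations` is keyed by `term_id` and `language_code`, a quiz pair for any two languages can be derived from the same rows. The following is a minimal, library-free sketch of that idea, not the project's actual query: the `Translation` and `QuizPair` shapes and the `buildPairs` helper are hypothetical names, and choosing the first source-language text as the prompt is an assumption.

```typescript
// Hypothetical in-memory shape mirroring the `translations` table columns.
interface Translation {
  termId: string;
  languageCode: string;
  text: string;
}

// Hypothetical result shape: one prompt plus every accepted answer.
interface QuizPair {
  termId: string;
  prompt: string;
  answers: string[];
}

// Groups translations by term and keeps only terms that have at least one
// text in both the source and the target language.
function buildPairs(
  translations: Translation[],
  source: string,
  target: string,
): QuizPair[] {
  const byTerm = new Map<string, { source: string[]; target: string[] }>();
  for (const t of translations) {
    if (t.languageCode !== source && t.languageCode !== target) continue;
    let entry = byTerm.get(t.termId);
    if (!entry) {
      entry = { source: [], target: [] };
      byTerm.set(t.termId, entry);
    }
    (t.languageCode === source ? entry.source : entry.target).push(t.text);
  }

  const pairs: QuizPair[] = [];
  byTerm.forEach((entry, termId) => {
    if (entry.source.length === 0 || entry.target.length === 0) return;
    // Assumption: the first source-language text serves as the prompt.
    pairs.push({ termId, prompt: entry.source[0], answers: entry.target });
  });
  return pairs;
}
```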
---
## 6. API Endpoints (MVP)
All endpoints are prefixed with `/api`. Schemas live in `packages/shared` and are validated with Zod on both sides.
| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Health check (already done) |
| GET | `/api/language-pairs` | List active language pairs |
| GET | `/api/decks` | List available decks |
| GET | `/api/decks/:id/terms` | Fetch terms with distractors for a quiz |
### Distractor Logic
The `QuizService` picks 3 distractors server-side:
- Same part-of-speech as the correct answer
- Never the correct answer
- Never repeated within a session
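
The three rules above can be sketched as a pure function. This is one possible reading, not the actual `QuizService`: the `Candidate` shape and the `pickDistractors` name are hypothetical, "never repeated within a session" is interpreted as "a term is used as a distractor at most once per session", and the extra check on identical translation text is an added assumption so that no option duplicates the correct answer.

```typescript
// Hypothetical shape: one translated term as the service might see it.
interface Candidate {
  termId: string;
  pos: string; // part of speech, e.g. "noun"
  text: string; // translation in the target language
}

// Returns `count` distractors for `correct`, drawn from `pool`.
// `used` holds term IDs already served as distractors in this session and is
// updated in place. `random` is injectable so tests can be deterministic.
function pickDistractors(
  correct: Candidate,
  pool: Candidate[],
  used: Set<string>,
  count = 3,
  random: () => number = Math.random,
): Candidate[] {
  // Texts already taken; starts with the correct answer so that a different
  // term with the same translation cannot become a second right answer.
  const seenTexts = new Set<string>([correct.text]);
  const eligible: Candidate[] = [];
  for (const c of pool) {
    if (c.pos !== correct.pos || c.termId === correct.termId) continue;
    if (used.has(c.termId) || seenTexts.has(c.text)) continue;
    seenTexts.add(c.text);
    eligible.push(c);
  }
  if (eligible.length < count) {
    throw new Error(`Only ${eligible.length} eligible distractors, need ${count}`);
  }

  // Partial Fisher-Yates shuffle: only the first `count` slots need randomising.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (eligible.length - i));
    [eligible[i], eligible[j]] = [eligible[j], eligible[i]];
  }

  const picked = eligible.slice(0, count);
  picked.forEach((c) => used.add(c.termId));
  return picked;
}
```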
---
## 7. Frontend Structure (MVP)
```
apps/web/src/
├── routes/
│   ├── index.tsx               # Landing page / mode select
│   └── singleplayer/
│       └── index.tsx           # The quiz
├── components/
│   ├── quiz/
│   │   ├── QuestionCard.tsx    # Prompt word + 4 answer buttons
│   │   ├── OptionButton.tsx    # idle / correct / wrong states
│   │   └── ScoreScreen.tsx     # Final score + play again
│   └── ui/                     # shadcn/ui wrappers
├── stores/
│   └── gameStore.ts            # Zustand: question index, score, answers
└── lib/
    └── api.ts                  # TanStack Query fetch wrappers
```
### State Management
TanStack Query handles fetching quiz data from the API. Zustand handles the local quiz session (current question index, score, selected answers). There is no overlap between the two.
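
The local session state described above can be modelled as plain data plus pure transition functions before it is wrapped in a store. The sketch below is library-free on purpose; the `QuizSession` shape and the function names are hypothetical, and in a Zustand store these transitions would typically become actions. Ignoring a second answer to the same question is an assumption about how input locking might work.

```typescript
// Hypothetical session shape: everything the quiz UI needs locally.
interface QuizSession {
  readonly total: number; // number of questions in the session
  readonly index: number; // current question index
  readonly score: number;
  readonly answers: readonly (number | null)[]; // picked option per question
}

function startSession(total = 10): QuizSession {
  const answers = new Array<number | null>(total).fill(null);
  return { total, index: 0, score: 0, answers };
}

// Records a pick for the current question. A second pick for the same
// question is ignored, which is how input gets "locked" after answering.
function answerQuestion(
  session: QuizSession,
  pickedIndex: number,
  correctIndex: number,
): QuizSession {
  if (session.index >= session.total) return session;
  if (session.answers[session.index] !== null) return session;
  const answers = session.answers.slice();
  answers[session.index] = pickedIndex;
  const gained = pickedIndex === correctIndex ? 1 : 0;
  return { ...session, answers, score: session.score + gained };
}

// Advances to the next question, but only once the current one is answered.
function nextQuestion(session: QuizSession): QuizSession {
  if (session.index >= session.total) return session;
  if (session.answers[session.index] === null) return session;
  return { ...session, index: session.index + 1 };
}

function isFinished(session: QuizSession): boolean {
  return session.index >= session.total;
}
```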
---
## 8. Working Methodology
> **Read this section before asking for help with any task.**
This project is a learning exercise. The goal is to understand the code, not just to ship it.
### How tasks are structured
The roadmap (Section 10) lists broad phases. When work starts on a phase, it gets broken into smaller, concrete subtasks with clear done-conditions before any code is written.
### How to use an LLM for help
When asking an LLM for help:
1. **Paste this document** (or the relevant sections) as context
2. **Describe what you're working on** and what specifically you're stuck on
3. **Ask for hints, not solutions.** Example prompts:
- "I'm trying to implement X. My current approach is Y. What am I missing conceptually?"
- "Here is my code. What would you change about the structure and why?"
- "Can you point me to the relevant docs for Z?"
### Refactoring workflow
After completing a task or a block of work:
1. Share the current state of the code with the LLM
2. Ask: *"What would you refactor here, and why? Don't show me the code — point me in the right direction and link relevant documentation."*
3. The LLM should explain the *what* and *why*, link to relevant docs/guides, and let you implement the fix yourself
**The LLM should never write the implementation for you.** If it does, ask it to delete it and explain the concept instead.
### Decisions log
Keep a `decisions.md` file in `documentation/`. When you make a non-obvious choice (a library, a pattern, a trade-off), write one short paragraph explaining what you chose and why. This is also useful context for any LLM session.
---
## 9. Game Mechanics
- **Format**: source-language word prompt + 4 target-language choices
- **Distractors**: same POS, server-side, never the correct answer, no repeats in a session
- **Session length**: 10 questions
- **Scoring**: +1 per correct answer (no speed bonus for MVP)
- **Timer**: none in singleplayer MVP
- **No auth required**: anonymous users
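
To make the question format concrete, here is a small sketch of how one question could be assembled from a correct answer and three distractors. The `QuizQuestion` shape and `buildQuestion` helper are hypothetical and not part of the documented API; exposing `correctIndex` in the question is an assumption that suits a singleplayer MVP where the client checks answers locally.

```typescript
// Hypothetical question shape for the singleplayer quiz.
interface QuizQuestion {
  prompt: string; // source-language word
  options: string[]; // 4 target-language choices, shuffled
  correctIndex: number; // position of the right answer in `options`
}

// Combines the correct answer with its distractors and shuffles them so the
// right answer does not always sit in the same slot.
function buildQuestion(
  prompt: string,
  correct: string,
  distractors: string[],
  random: () => number = Math.random,
): QuizQuestion {
  if (distractors.length !== 3) {
    throw new Error("A question needs exactly 3 distractors");
  }
  const options = [correct, ...distractors];
  // Fisher-Yates shuffle, in place.
  for (let i = options.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [options[i], options[j]] = [options[j], options[i]];
  }
  return { prompt, options, correctIndex: options.indexOf(correct) };
}
```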
---
## 10. MVP Roadmap
> Tasks are written at a high level. When starting a phase, break it into smaller subtasks before writing any code.
### Current Status
**Phase 0 (Foundation) — ✅ Complete**
**Phase 1 (Vocabulary Data) — 🔄 Data pipeline complete. API layer is the immediate next step.**
What is already in the database:
- 999 unique English terms (nouns), fully seeded from WordNet/OMW
- 3171 term IDs resolved (higher than the word count because one word can map to several WordNet synsets, e.g. homonyms)
- Full Italian translation coverage (3171/3171 terms)
- Decks created and populated via `packages/db/src/generating-deck.ts`
- 34 words from the source wordlist had no WordNet match (expected, not a bug)
---
### Phase 1 — Finish the API Layer
**Goal:** The frontend can fetch quiz data from the API.
**Done when:** `GET /api/decks/:id/terms?limit=10` (where `:id` is a deck UUID) returns 10 terms, each with 3 distractors of the same POS attached.
**Broadly, what needs to happen:**
- Define Zod response schemas in `packages/shared` for terms, decks, and language pairs
- Implement a repository layer that queries the DB for terms belonging to a deck
- Implement a service layer that attaches distractors to each term (same POS, no duplicates, no correct answer included)
- Wire up the REST endpoints (`GET /api/language-pairs`, `GET /api/decks`, `GET /api/decks/:id/terms`)
- Manually test the endpoints (curl or a REST client like Bruno/Insomnia)
**Key concepts to understand before starting:**
- Drizzle ORM query patterns (joins, where clauses)
- The repository pattern (data access separated from business logic)
- Zod schema definition and inference
- How pnpm workspace packages reference each other
---
### Phase 2 — Singleplayer Quiz UI
**Goal:** A user can complete a full 10-question quiz in the browser.
**Done when:** User visits `/singleplayer`, answers 10 questions, sees a score screen, and can play again.
**Broadly, what needs to happen:**
- Build the `QuestionCard` component (prompt word + 4 answer buttons)
- Build the `OptionButton` component with three visual states: idle, correct, wrong
- Build the `ScoreScreen` component (score summary + play again)
- Implement a Zustand store to track quiz session state (current question index, score, whether an answer has been picked)
- Wire up TanStack Query to fetch terms from the API on mount
- Create the `/singleplayer` route and assemble the components
- Handle the between-question transition (brief delay showing result → next question)
**Key concepts to understand before starting:**
- TanStack Query: `useQuery`, loading/error states
- Zustand: defining a store, reading and writing state from components
- TanStack Router: defining routes, navigating between them
- React component composition
- Controlled state for the answer selection (which button is selected, when to lock input)
---
### Phase 3 — UI Polish
**Goal:** The app looks good enough to show to people.
**Done when:** The quiz is usable on mobile, readable on desktop, and has a coherent visual style.
**Broadly, what needs to happen:**
- Apply Tailwind utility classes and shadcn/ui components consistently
- Make the layout mobile-first (touch-friendly buttons, readable font sizes)
- Add a simple landing page (`/`) with a "Start Quiz" button
- Add loading and error states for the API fetch
- Visual feedback on correct/wrong answers (colour, maybe a brief animation)
- Deck selection: let the user pick a deck from a list before starting
**Key concepts to understand before starting:**
- Tailwind CSS utility-first approach
- shadcn/ui component library and how to add components
- Responsive design with Tailwind breakpoints
- CSS transitions for simple animations
---
## 11. Key Technical Decisions
These are the non-obvious decisions already made. Any LLM helping with this project should be aware of them and not suggest alternatives without good reason.
### Architecture
**Express app: factory function pattern**
`app.ts` exports `createApp()`. `server.ts` imports it and calls `.listen()`. This keeps tests isolated — a test can import the app without starting a server.
**Layered architecture: routes → services → repositories**
Business logic lives in services, not route handlers or repositories. Each layer only talks to the layer directly below it. For the MVP API, this means:
- `routes/` — parse request, call service, return response
- `services/` — business logic (e.g. attaching distractors)
- `repositories/` — all DB queries live here, nowhere else
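
The layering can be illustrated without Express or Drizzle. In the sketch below every name (`TermRow`, `TermRepository`, `createQuizService`, `handleGetDeckTerms`, `createInMemoryTermRepository`) is hypothetical, the in-memory repository stands in for real DB queries, and the distractor selection is deliberately simplified to the first three matches so the focus stays on which layer talks to which.

```typescript
interface TermRow {
  id: string;
  pos: string;
  prompt: string; // source-language word
  answer: string; // target-language translation
}

interface QuizTerm {
  prompt: string;
  answer: string;
  distractors: string[];
}

// Repository layer: the only place that knows how rows are fetched.
interface TermRepository {
  findByDeck(deckId: string, limit: number): Promise<TermRow[]>;
  findByPos(pos: string): Promise<TermRow[]>;
}

// Service layer: business logic, talks only to the repository.
function createQuizService(repo: TermRepository) {
  return {
    async getQuizTerms(deckId: string, limit: number): Promise<QuizTerm[]> {
      const rows = await repo.findByDeck(deckId, limit);
      return Promise.all(
        rows.map(async (row) => {
          const samePos = await repo.findByPos(row.pos);
          // Simplified: first three same-POS terms that are not the answer.
          const distractors = samePos
            .filter((c) => c.id !== row.id && c.answer !== row.answer)
            .slice(0, 3)
            .map((c) => c.answer);
          return { prompt: row.prompt, answer: row.answer, distractors };
        }),
      );
    },
  };
}

type QuizService = ReturnType<typeof createQuizService>;

// Route layer: parse the request, call the service, shape the response.
async function handleGetDeckTerms(
  service: QuizService,
  params: { id: string; limit?: string },
): Promise<{ status: number; body: unknown }> {
  const limit = Number(params.limit ?? "10");
  if (!Number.isInteger(limit) || limit < 1) {
    return { status: 400, body: { error: "limit must be a positive integer" } };
  }
  return { status: 200, body: await service.getQuizTerms(params.id, limit) };
}

// In-memory stand-in for a database-backed repository.
function createInMemoryTermRepository(
  rows: TermRow[],
  deckTermIds: Record<string, string[]>,
): TermRepository {
  return {
    async findByDeck(deckId, limit) {
      const ids = new Set(deckTermIds[deckId] ?? []);
      return rows.filter((r) => ids.has(r.id)).slice(0, limit);
    },
    async findByPos(pos) {
      return rows.filter((r) => r.pos === pos);
    },
  };
}
```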
**Shared Zod schemas in `packages/shared`**
All request/response shapes are defined once as Zod schemas in `packages/shared` and imported by both `apps/api` and `apps/web`. Types are inferred from schemas (`z.infer<typeof Schema>`), never written by hand.
### Data Model
**Decks separate from terms (not frequency-rank filtering)**
Terms are raw WordNet data. Decks are curated lists. This separation exists because WordNet frequency data is unreliable for learning: chemical element symbols, for example, rank highly. Unsuitable words are excluded at the deck level rather than deleted from `terms`.
**Deck language model: `source_language` + `validated_languages` array**
A deck is not tied to a single language pair. `source_language` is the language the wordlist was curated from. `validated_languages` is an array of target languages with full translation coverage — calculated and updated by the deck generation script on every run.
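
As a rough illustration of what "full translation coverage" means, the sketch below computes the validated languages for a deck from in-memory rows. It is not the deck generation script: `TranslationRow` and `computeValidatedLanguages` are hypothetical names, and treating an empty deck as having no validated languages is an assumption.

```typescript
// Hypothetical minimal row: which term has a translation in which language.
interface TranslationRow {
  termId: string;
  languageCode: string;
}

// A language is "validated" when every term in the deck has at least one
// translation in it. The source language is always excluded, matching the
// `validated_languages_excludes_source` check constraint.
function computeValidatedLanguages(
  deckTermIds: string[],
  translations: TranslationRow[],
  sourceLanguage: string,
  supportedLanguages: readonly string[],
): string[] {
  const inDeck = new Set(deckTermIds);
  if (inDeck.size === 0) return []; // assumption: empty decks validate nothing

  // language code -> set of deck term IDs that have a translation in it
  const covered = new Map<string, Set<string>>();
  for (const t of translations) {
    if (!inDeck.has(t.termId)) continue;
    let termIds = covered.get(t.languageCode);
    if (!termIds) {
      termIds = new Set<string>();
      covered.set(t.languageCode, termIds);
    }
    termIds.add(t.termId);
  }

  return supportedLanguages.filter((lang) => {
    if (lang === sourceLanguage) return false;
    const termIds = covered.get(lang);
    return termIds !== undefined && termIds.size === inDeck.size;
  });
}
```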
### Tooling
**Drizzle ORM (not Prisma):** No binary, no engine. Queries map closely to SQL. Works naturally with Zod. Migrations are plain SQL files.
**`tsx` as TypeScript runner (not `ts-node`):** Faster, zero config, uses esbuild. Does not type-check — that is handled by `tsc` and the editor.
**pnpm workspaces (not Turborepo):** Two apps don't need the extra build caching complexity.
---
## 12. Post-MVP Ladder
These phases are deferred but planned. The architecture already supports them.
| Phase | What it adds |
|---|---|
| Auth | OpenAuth (Google + GitHub), JWT middleware, user rows in DB |
| User Stats | Games played, score history, profile page |
| Multiplayer Lobby | Room creation, join by code, WebSocket connection |
| Multiplayer Game | Simultaneous answers, server timer, live scores, winner screen |
| Deployment | Docker Compose prod config, Nginx, Let's Encrypt, Hetzner VPS |
| Hardening | Rate limiting, error boundaries, CI/CD, DB backups |
Each of these maps to a phase in the full `roadmap.md`.
---
## 13. Definition of Done (MVP)
- [ ] `GET /api/decks/:id/terms` returns terms with correct distractors
- [ ] User can complete a 10-question quiz without errors
- [ ] Score screen shows final result and a play-again option
- [ ] App is usable on a mobile screen
- [ ] No hardcoded data — everything comes from the database