glossa mvp
This document is the single source of truth for the project. It is written to be handed to any LLM as context. It contains the project vision, the current MVP scope, the tech stack, the working methodology, and the roadmap.
1. Project Overview
A vocabulary trainer for English–Italian words. The quiz format is Duolingo-style: one word is shown as a prompt, and the user picks the correct translation from four choices (1 correct + 3 distractors of the same part-of-speech). The long-term vision is a multiplayer competitive game, but the MVP is a polished singleplayer experience.
The core learning loop: Show word → pick answer → see result → next word → final score
The vocabulary data comes from WordNet + the Open Multilingual Wordnet (OMW). A one-time Python script extracts English–Italian noun pairs into a JSON datafile, and a seeding script loads them into the database. The data model is language-pair agnostic by design: adding a new language later requires no schema changes.
2. What the Full Product Looks Like (Long-Term Vision)
- Users log in via Google or GitHub (OpenAuth)
- Singleplayer mode: 10-round quiz, score screen
- Multiplayer mode: create a room, share a code, 2–4 players answer simultaneously in real time, live scores, winner screen
- 1000+ English–Italian nouns seeded from WordNet
This is documented in spec.md and the full roadmap.md. The MVP deliberately ignores most of it.
3. MVP Scope
Goal: A working, presentable singleplayer quiz that can be shown to real people.
What is IN the MVP
- Vocabulary data in a PostgreSQL database (already seeded)
- REST API that returns quiz terms with distractors
- Singleplayer quiz UI: 10 questions, answer feedback, score screen
- Clean, mobile-friendly UI (Tailwind + shadcn/ui)
- Local dev only (no deployment for MVP)
What is CUT from the MVP
| Feature | Why cut |
|---|---|
| Authentication (OpenAuth) | No user accounts needed for a demo |
| Multiplayer (WebSockets, rooms) | Core quiz works without it |
| Valkey / Redis cache | Only needed for multiplayer room state |
| Deployment to Hetzner | Ship to people locally first |
| User stats / profiles | Needs auth |
| Testing suite | Add after the UI stabilises |
These are not deleted from the plan; they are deferred. The architecture is already designed to support them. See Section 12 (Post-MVP Ladder).
4. Technology Stack
The monorepo structure and tooling are already set up (Phase 0 complete). This is the full stack — the MVP uses a subset of it.
| Layer | Technology | MVP? |
|---|---|---|
| Monorepo | pnpm workspaces | ✅ |
| Frontend | React 18, Vite, TypeScript | ✅ |
| Routing | TanStack Router | ✅ |
| Server state | TanStack Query | ✅ |
| Client state | Zustand | ✅ |
| Styling | Tailwind CSS + shadcn/ui | ✅ |
| Backend | Node.js, Express, TypeScript | ✅ |
| Database | PostgreSQL + Drizzle ORM | ✅ |
| Validation | Zod (shared schemas) | ✅ |
| Auth | OpenAuth (Google + GitHub) | ❌ post-MVP |
| Realtime | WebSockets (ws library) | ❌ post-MVP |
| Cache | Valkey | ❌ post-MVP |
| Testing | Vitest, React Testing Library | ❌ post-MVP |
| Deployment | Docker Compose, Hetzner, Nginx | ❌ post-MVP |
Repository Structure (actual, as of the completed Phase 1 data pipeline)
```
vocab-trainer/
├── apps/
│   ├── api/
│   │   └── src/
│   │       ├── app.ts        # createApp() factory; routes registered here
│   │       └── server.ts     # calls app.listen()
│   └── web/
│       └── src/
│           ├── routes/
│           │   ├── __root.tsx
│           │   ├── index.tsx # placeholder landing page
│           │   └── about.tsx
│           ├── main.tsx
│           └── index.css
├── packages/
│   ├── shared/
│   │   └── src/
│   │       ├── index.ts      # empty; Zod schemas go here next
│   │       └── constants.ts
│   └── db/
│       ├── drizzle/          # migration SQL files
│       └── src/
│           ├── db/schema.ts          # full Drizzle schema
│           ├── seeding-datafiles.ts  # seeds terms + translations
│           ├── generating-deck.ts    # builds curated decks
│           └── index.ts
├── documentation/            # all project docs live here
│   ├── spec.md
│   ├── roadmap.md
│   ├── decisions.md
│   ├── mvp.md                # this file
│   └── CLAUDE.md
├── scripts/
│   ├── extract-en-it-nouns.py
│   └── datafiles/en-it-noun.json
├── docker-compose.yml
└── pnpm-workspace.yaml
```
What does not exist yet (to be built in MVP phases):
- apps/api/src/routes/: no route handlers yet
- apps/api/src/services/: no business logic yet
- apps/api/src/repositories/: no DB queries yet
- apps/web/src/components/: no UI components yet
- apps/web/src/stores/: no Zustand store yet
- apps/web/src/lib/api.ts: no TanStack Query wrappers yet
- packages/shared/src/schemas/: no Zod schemas yet
packages/shared is the contract between frontend and backend. All request/response shapes are defined there as Zod schemas — never duplicated.
5. Data Model (relevant tables for MVP)
```ts
import { sql } from "drizzle-orm";
import {
  boolean,
  check,
  index,
  pgTable,
  primaryKey,
  text,
  timestamp,
  unique,
  uuid,
  varchar,
} from "drizzle-orm/pg-core";

// SUPPORTED_POS and SUPPORTED_LANGUAGE_CODES are project constants (import omitted).

// One row per WordNet synset. Terms hold no text; per-language words live in `translations`.
export const terms = pgTable(
  "terms",
  {
    id: uuid().primaryKey().defaultRandom(),
    synset_id: text().unique().notNull(),
    pos: varchar({ length: 20 }).notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    check(
      "pos_check",
      sql`${table.pos} IN (${sql.raw(SUPPORTED_POS.map((p) => `'${p}'`).join(", "))})`,
    ),
    index("idx_terms_pos").on(table.pos),
  ],
);

export const translations = pgTable(
  "translations",
  {
    id: uuid().primaryKey().defaultRandom(),
    term_id: uuid()
      .notNull()
      .references(() => terms.id, { onDelete: "cascade" }),
    language_code: varchar({ length: 10 }).notNull(),
    text: text().notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    unique("unique_translations").on(
      table.term_id,
      table.language_code,
      table.text,
    ),
    index("idx_translations_lang").on(table.language_code, table.term_id),
  ],
);

// validated_languages is recalculated by the deck generation script on every run.
export const decks = pgTable(
  "decks",
  {
    id: uuid().primaryKey().defaultRandom(),
    name: text().notNull(),
    description: text(),
    source_language: varchar({ length: 10 }).notNull(),
    validated_languages: varchar({ length: 10 }).array().notNull().default([]),
    is_public: boolean().default(false).notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    check(
      "source_language_check",
      sql`${table.source_language} IN (${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))})`,
    ),
    check(
      "validated_languages_check",
      sql`validated_languages <@ ARRAY[${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))}]::varchar[]`,
    ),
    check(
      "validated_languages_excludes_source",
      sql`NOT (${table.source_language} = ANY(${table.validated_languages}))`,
    ),
    unique("unique_deck_name").on(table.name, table.source_language),
  ],
);

// Join table: which terms belong to which deck.
export const deck_terms = pgTable(
  "deck_terms",
  {
    deck_id: uuid()
      .notNull()
      .references(() => decks.id, { onDelete: "cascade" }),
    term_id: uuid()
      .notNull()
      .references(() => terms.id, { onDelete: "cascade" }),
    added_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [primaryKey({ columns: [table.deck_id, table.term_id] })],
);
```
The seed + deck-build scripts have already been run. Data exists in the database.
6. API Endpoints (MVP)
All endpoints are prefixed with /api. Request and response schemas live in packages/shared and are validated with Zod on both sides.
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check (already done) |
| GET | /api/language-pairs | List active language pairs |
| GET | /api/decks | List available decks |
| GET | /api/decks/:id/terms | Fetch terms with distractors for a quiz |
Distractor Logic
The QuizService picks 3 distractors server-side:
- Same part-of-speech as the correct answer
- Never the correct answer
- Never repeated within a session
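The three rules above fit naturally into a pure function. What follows is a minimal sketch, not the QuizService itself. The `Candidate` shape, the name `pickDistractors` and the injectable `random` parameter are assumptions. The sketch reads "never repeated within a session" as "a term used once as a distractor is not offered again in the same session". It also drops candidates whose text duplicates the correct answer or another candidate, so two options never look identical.

```typescript
// Hypothetical candidate shape; the real one would come from the Zod schemas in packages/shared.
interface Candidate {
  termId: string;
  pos: string;
  text: string; // target-language translation shown as an answer option
}

// Picks `count` distractors for `correct` from `pool`: same POS, never the correct
// answer, and never a term already used as a distractor earlier in the session.
// Adds the chosen term ids to `usedInSession` so later questions skip them.
function pickDistractors(
  correct: Candidate,
  pool: readonly Candidate[],
  usedInSession: Set<string>,
  count = 3,
  random: () => number = Math.random,
): Candidate[] {
  // Keep one candidate per distinct text so no two options look identical.
  const byText = new Map<string, Candidate>();
  for (const c of pool) {
    const isEligible =
      c.pos === correct.pos &&
      c.termId !== correct.termId &&
      c.text !== correct.text &&
      !usedInSession.has(c.termId);
    if (isEligible && !byText.has(c.text)) byText.set(c.text, c);
  }
  const candidates = Array.from(byText.values());
  if (candidates.length < count) {
    throw new Error(`only ${candidates.length} eligible distractors for term ${correct.termId}`);
  }
  // Partial Fisher–Yates shuffle: randomise only the first `count` slots.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (candidates.length - i));
    const tmp = candidates[i]!;
    candidates[i] = candidates[j]!;
    candidates[j] = tmp;
  }
  const chosen = candidates.slice(0, count);
  for (const c of chosen) usedInSession.add(c.termId);
  return chosen;
}
```

The service would call this once per quiz term and pass the same `usedInSession` set for all ten questions. If a part of speech runs out of candidates, the function throws instead of silently returning fewer than 3 options.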
7. Frontend Structure (MVP)
```
apps/web/src/
├── routes/
│   ├── index.tsx              # Landing page / mode select
│   └── singleplayer/
│       └── index.tsx          # The quiz
├── components/
│   ├── quiz/
│   │   ├── QuestionCard.tsx   # Prompt word + 4 answer buttons
│   │   ├── OptionButton.tsx   # idle / correct / wrong states
│   │   └── ScoreScreen.tsx    # Final score + play again
│   └── ui/                    # shadcn/ui wrappers
├── stores/
│   └── gameStore.ts           # Zustand: question index, score, answers
└── lib/
    └── api.ts                 # TanStack Query fetch wrappers
```
State Management
TanStack Query handles fetching quiz data from the API. Zustand handles the local quiz session (current question index, score, selected answers). There is no overlap between the two.
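One way to keep the Zustand side small is to write the session transitions as plain functions and have the store call them. The sketch below shows one possible organisation of gameStore.ts, not the project's actual design. `SessionState`, `answer`, `next`, `reset` and the field names are all hypothetical. The sketch awards +1 per correct answer, locks input once an option is picked, and marks the session finished after 10 questions.

```typescript
// Hypothetical session state a Zustand store in gameStore.ts could hold.
interface SessionState {
  questionIndex: number; // 0-based index into the fetched questions
  score: number; // +1 per correct answer
  selected: number | null; // option picked for the current question; non-null locks input
  finished: boolean;
}

const QUESTIONS_PER_SESSION = 10;

const initialState: SessionState = { questionIndex: 0, score: 0, selected: null, finished: false };

// Records the picked option and updates the score; ignored once input is locked.
function answer(state: SessionState, optionIndex: number, correctIndex: number): SessionState {
  if (state.selected !== null || state.finished) return state;
  return {
    ...state,
    selected: optionIndex,
    score: state.score + (optionIndex === correctIndex ? 1 : 0),
  };
}

// Moves to the next question; only allowed after the current one has been answered.
function next(state: SessionState, total = QUESTIONS_PER_SESSION): SessionState {
  if (state.selected === null || state.finished) return state;
  const questionIndex = state.questionIndex + 1;
  return { ...state, questionIndex, selected: null, finished: questionIndex >= total };
}

// "Play again": start a fresh session.
function reset(): SessionState {
  return { ...initialState };
}
```

In a Zustand store, each function could become an action that wraps it, for example `set((state) => answer(state, i, correctIndex))`. Keeping the logic outside the store makes it testable without rendering any React.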
8. Working Methodology
Read this section before asking for help with any task.
This project is a learning exercise. The goal is to understand the code, not just to ship it.
How tasks are structured
The roadmap (Section 10) lists broad phases. When work starts on a phase, it gets broken into smaller, concrete subtasks with clear done-conditions before any code is written.
How to use an LLM for help
When asking an LLM for help:
- Paste this document (or the relevant sections) as context
- Describe what you're working on and what specifically you're stuck on
- Ask for hints, not solutions. Example prompts:
- "I'm trying to implement X. My current approach is Y. What am I missing conceptually?"
- "Here is my code. What would you change about the structure and why?"
- "Can you point me to the relevant docs for Z?"
Refactoring workflow
After completing a task or a block of work:
- Share the current state of the code with the LLM
- Ask: "What would you refactor here, and why? Don't show me the code — point me in the right direction and link relevant documentation."
- The LLM should explain the what and why, link to relevant docs/guides, and let you implement the fix yourself
The LLM should never write the implementation for you. If it does, ask it to delete it and explain the concept instead.
Decisions log
Keep decisions.md in documentation/, next to the other project docs. When you make a non-obvious choice (a library, a pattern, a trade-off), write one short paragraph explaining what you chose and why. This is also useful context for any LLM session.
9. Game Mechanics
- Format: source-language word prompt + 4 target-language choices
- Distractors: same POS, server-side, never the correct answer, no repeats in a session
- Session length: 10 questions
- Scoring: +1 per correct answer (no speed bonus for MVP)
- Timer: none in singleplayer MVP
- Auth: none required; users play anonymously
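The format rule means the correct answer must not always sit in the same position among the four choices. Here is a minimal sketch of turning one fetched term into a question using a Fisher–Yates shuffle. It assumes a hypothetical `QuizTerm` shape that carries the answer and its 3 distractors as strings. Whether the shuffle runs in the QuizService or in the frontend is left open.

```typescript
// Hypothetical shape of one fetched quiz item.
interface QuizTerm {
  prompt: string; // source-language word
  answer: string; // correct target-language translation
  distractors: string[]; // 3 same-POS wrong translations
}

interface Question {
  prompt: string;
  options: string[]; // 4 choices in random order
  correctIndex: number; // position of the correct answer within options
}

function toQuestion(term: QuizTerm, random: () => number = Math.random): Question {
  const options = [term.answer, ...term.distractors];
  // Fisher–Yates shuffle so the correct answer does not always sit in the same slot.
  for (let i = options.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = options[i]!;
    options[i] = options[j]!;
    options[j] = tmp;
  }
  // indexOf is safe because distractors never repeat the correct text.
  return { prompt: term.prompt, options, correctIndex: options.indexOf(term.answer) };
}
```

Returning `correctIndex` alongside the shuffled options lets OptionButton compare indices instead of strings when choosing between its correct and wrong states.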
10. MVP Roadmap
Tasks are written at a high level. When starting a phase, break it into smaller subtasks before writing any code.
Current Status
- Phase 0 (Foundation): ✅ Complete
- Phase 1 (Vocabulary Data): 🔄 Data pipeline complete. The API layer is the immediate next step.
What is already in the database:
- 999 unique English words (nouns), fully seeded from WordNet/OMW
- 3171 term IDs resolved (more than the word count because many words have several WordNet senses, and each synset is stored as its own term)
- Full Italian translation coverage (3171/3171 terms)
- Decks created and populated via packages/db/src/generating-decks.ts
- 34 words from the source wordlist had no WordNet match (expected, not a bug)
Phase 1 — Finish the API Layer
Goal: The frontend can fetch quiz data from the API.
Done when: GET /api/decks/:id/terms?limit=10 (with :id set to an existing deck's UUID) returns 10 terms, each with 3 distractors of the same POS attached.
Broadly, what needs to happen:
- Define Zod response schemas in packages/shared for terms, decks, and language pairs
- Implement a repository layer that queries the DB for terms belonging to a deck
- Implement a service layer that attaches distractors to each term (same POS, no duplicates, no correct answer included)
- Wire up the REST endpoints (GET /language-pairs, GET /decks, GET /decks/:id/terms)
- Manually test the endpoints (curl or a REST client like Bruno/Insomnia)
Key concepts to understand before starting:
- Drizzle ORM query patterns (joins, where clauses)
- The repository pattern (data access separated from business logic)
- Zod schema definition and inference
- How pnpm workspace packages reference each other
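A small checker can make the Phase 1 done condition repeatable during manual testing. This sketch assumes a hypothetical response shape in which each item carries `termId`, `pos`, `text` and a `distractors` array. It also treats one response as one quiz session. Adjust both assumptions to whatever the Zod schemas end up defining. The checker reports every violation it finds rather than stopping at the first.

```typescript
// Hypothetical response shapes for GET /api/decks/:id/terms.
interface TermDto {
  termId: string;
  pos: string;
  text: string;
}
interface QuizTermDto extends TermDto {
  distractors: TermDto[];
}

// Returns human-readable problems; an empty list means the response meets the
// done condition. Treats one response as one quiz session.
function checkQuizResponse(items: readonly QuizTermDto[], expectedCount = 10): string[] {
  const problems: string[] = [];
  if (items.length !== expectedCount) {
    problems.push(`expected ${expectedCount} terms, got ${items.length}`);
  }
  const seen = new Set<string>(); // distractor term ids used so far in this session
  for (const item of items) {
    if (item.distractors.length !== 3) {
      problems.push(`${item.termId}: ${item.distractors.length} distractors instead of 3`);
    }
    for (const d of item.distractors) {
      if (d.pos !== item.pos) problems.push(`${item.termId}: distractor ${d.termId} has POS ${d.pos}`);
      if (d.termId === item.termId || d.text === item.text) {
        problems.push(`${item.termId}: distractor ${d.termId} matches the correct answer`);
      }
      if (seen.has(d.termId)) problems.push(`${item.termId}: distractor ${d.termId} repeated`);
      seen.add(d.termId);
    }
  }
  return problems;
}
```

To use it, paste the JSON body returned by curl or Bruno into `checkQuizResponse(JSON.parse(body))`. An empty array means the endpoint passes.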
Phase 2 — Singleplayer Quiz UI
Goal: A user can complete a full 10-question quiz in the browser.
Done when: User visits /singleplayer, answers 10 questions, sees a score screen, and can play again.
Broadly, what needs to happen:
- Build the QuestionCard component (prompt word + 4 answer buttons)
- Build the OptionButton component with three visual states: idle, correct, wrong
- Build the ScoreScreen component (score summary + play again)
- Implement a Zustand store to track quiz session state (current question index, score, whether an answer has been picked)
- Wire up TanStack Query to fetch terms from the API on mount
- Create the /singleplayer route and assemble the components
- Handle the between-question transition (brief delay showing result → next question)
Key concepts to understand before starting:
- TanStack Query: useQuery, loading/error states
- Zustand: defining a store, reading and writing state from components
- TanStack Router: defining routes, navigating between them
- React component composition
- Controlled state for the answer selection (which button is selected, when to lock input)
Phase 3 — UI Polish
Goal: The app looks good enough to show to people.
Done when: The quiz is usable on mobile, readable on desktop, and has a coherent visual style.
Broadly, what needs to happen:
- Apply Tailwind utility classes and shadcn/ui components consistently
- Make the layout mobile-first (touch-friendly buttons, readable font sizes)
- Add a simple landing page (/) with a "Start Quiz" button
- Add loading and error states for the API fetch
- Visual feedback on correct/wrong answers (colour, maybe a brief animation)
- Deck selection: let the user pick a deck from a list before starting
Key concepts to understand before starting:
- Tailwind CSS utility-first approach
- shadcn/ui component library and how to add components
- Responsive design with Tailwind breakpoints
- CSS transitions for simple animations
11. Key Technical Decisions
These are the non-obvious decisions already made. Any LLM helping with this project should be aware of them and not suggest alternatives without good reason.
Architecture
Express app: factory function pattern
app.ts exports createApp(). server.ts imports it and calls .listen(). This keeps tests isolated — a test can import the app without starting a server.
Layered architecture: routes → services → repositories
Business logic lives in services, not in route handlers or repositories. Each layer only talks to the layer directly below it. For the MVP API, this means:
- routes/: parse the request, call the service, return the response
- services/: business logic (e.g. attaching distractors)
- repositories/: all DB queries live here, nowhere else
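Here is a minimal sketch of the three layers for GET /decks/:id/terms. Express and Drizzle are swapped out so the sketch runs on its own. The repository is an in-memory stand-in, and the route handler returns a `{ status, body }` object where an Express handler would call `res.status(...).json(...)`. All names (`TermRepository`, `QuizService.getQuizTerms`, `getDeckTerms`) and the default limit of 10 are assumptions. The distractor selection is deliberately simplified.

```typescript
// Hypothetical row and response shapes; real ones would be inferred from Zod schemas.
interface TermRow {
  termId: string;
  pos: string;
  text: string;
}
interface QuizTerm extends TermRow {
  distractors: TermRow[];
}

// repositories/: the only layer that knows where data lives.
interface TermRepository {
  findTermsByDeck(deckId: string, limit: number): Promise<TermRow[]>;
  findCandidatesByPos(pos: string): Promise<TermRow[]>;
}

// In-memory stand-in; a Drizzle-backed implementation would run the SQL instead.
function createMemoryRepository(rows: TermRow[]): TermRepository {
  return {
    async findTermsByDeck(_deckId, limit) {
      return rows.slice(0, limit);
    },
    async findCandidatesByPos(pos) {
      return rows.filter((r) => r.pos === pos);
    },
  };
}

// services/: business logic; depends only on the repository interface.
class QuizService {
  private readonly repo: TermRepository;
  constructor(repo: TermRepository) {
    this.repo = repo;
  }

  async getQuizTerms(deckId: string, limit: number): Promise<QuizTerm[]> {
    const terms = await this.repo.findTermsByDeck(deckId, limit);
    return Promise.all(
      terms.map(async (term) => {
        const candidates = await this.repo.findCandidatesByPos(term.pos);
        // Simplified: first 3 same-POS candidates that are not the correct term.
        const distractors = candidates.filter((c) => c.termId !== term.termId).slice(0, 3);
        return { ...term, distractors };
      }),
    );
  }
}

// routes/: parse the request, call the service, shape the response. No DB access here.
type HandlerResult =
  | { status: 200; body: QuizTerm[] }
  | { status: 400; body: { error: string } };

async function getDeckTerms(
  service: QuizService,
  params: { id: string },
  query: { limit?: string },
): Promise<HandlerResult> {
  const limit = Number(query.limit ?? "10");
  if (!Number.isInteger(limit) || limit < 1) {
    return { status: 400, body: { error: "limit must be a positive integer" } };
  }
  return { status: 200, body: await service.getQuizTerms(params.id, limit) };
}
```

Because each layer receives the one below it as a parameter, a test can build the service on top of the in-memory repository without touching PostgreSQL.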
Shared Zod schemas in packages/shared
All request/response shapes are defined once as Zod schemas in packages/shared and imported by both apps/api and apps/web. Types are inferred from schemas (z.infer<typeof Schema>), never written by hand.
Data Model
Decks separate from terms (not frequency-rank filtering)
Terms are raw WordNet data. Decks are curated lists. This separation exists because WordNet frequency data is unreliable for learning — common chemical element symbols ranked highly, for example. Bad words are excluded at the deck level, not filtered from terms.
Deck language model: source_language + validated_languages array
A deck is not tied to a single language pair. source_language is the language the wordlist was curated from. validated_languages is an array of target languages with full translation coverage — calculated and updated by the deck generation script on every run.
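The coverage rule could be written as a pure function inside the deck generation script. This is a sketch under stated assumptions. The inputs are plain arrays rather than Drizzle query results, and `computeValidatedLanguages` and `TranslationRow` are hypothetical names. A language counts as validated only if every term in the deck has at least one translation in it. The source language is always excluded, mirroring the validated_languages_excludes_source check in the schema.

```typescript
// Hypothetical input row; the real script would read translations via Drizzle.
interface TranslationRow {
  termId: string;
  languageCode: string;
}

// Returns the target languages in which every term of the deck has at least one
// translation, excluding the deck's source language.
function computeValidatedLanguages(
  deckTermIds: readonly string[],
  translations: readonly TranslationRow[],
  sourceLanguage: string,
  supportedLanguages: readonly string[],
): string[] {
  // language code -> ids of terms that have a translation in that language
  const covered = new Map<string, Set<string>>();
  for (const t of translations) {
    let ids = covered.get(t.languageCode);
    if (ids === undefined) {
      ids = new Set<string>();
      covered.set(t.languageCode, ids);
    }
    ids.add(t.termId);
  }
  if (deckTermIds.length === 0) return []; // an empty deck validates nothing
  return supportedLanguages.filter((lang) => {
    if (lang === sourceLanguage) return false;
    const ids = covered.get(lang);
    return ids !== undefined && deckTermIds.every((id) => ids.has(id));
  });
}
```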
Tooling
Drizzle ORM (not Prisma): No binary, no engine. Queries map closely to SQL. Works naturally with Zod. Migrations are plain SQL files.
tsx as TypeScript runner (not ts-node): Faster, zero config, uses esbuild. Does not type-check — that is handled by tsc and the editor.
pnpm workspaces (not Turborepo): Two apps don't need the extra build caching complexity.
12. Post-MVP Ladder
These phases are deferred but planned. The architecture already supports them.
| Phase | What it adds |
|---|---|
| Auth | OpenAuth (Google + GitHub), JWT middleware, user rows in DB |
| User Stats | Games played, score history, profile page |
| Multiplayer Lobby | Room creation, join by code, WebSocket connection |
| Multiplayer Game | Simultaneous answers, server timer, live scores, winner screen |
| Deployment | Docker Compose prod config, Nginx, Let's Encrypt, Hetzner VPS |
| Hardening | Rate limiting, error boundaries, CI/CD, DB backups |
Each of these maps to a phase in the full roadmap.md.
13. Definition of Done (MVP)
- GET /api/decks/:id/terms returns terms with correct distractors
- User can complete a 10-question quiz without errors
- Score screen shows final result and a play-again option
- App is usable on a mobile screen
- No hardcoded data — everything comes from the database