glossa mvp

This document is the single source of truth for the project. It is written to be handed to any LLM as context. It contains the project vision, the current MVP scope, the tech stack, the working methodology, and the roadmap.


1. Project Overview

A vocabulary trainer for English-Italian words. The quiz format is Duolingo-style: one word is shown as a prompt, and the user picks the correct translation from four choices (1 correct + 3 distractors of the same part-of-speech). The long-term vision is a multiplayer competitive game, but the MVP is a polished singleplayer experience.

The core learning loop: Show word → pick answer → see result → next word → final score

The vocabulary data comes from WordNet + the Open Multilingual Wordnet (OMW). A one-time Python script extracts English-Italian noun pairs, which are then seeded into the database. The data model is language-pair agnostic by design: adding a new language later requires no schema changes.


2. What the Full Product Looks Like (Long-Term Vision)

  • Users log in via Google or GitHub (OpenAuth)
  • Singleplayer mode: 10-round quiz, score screen
  • Multiplayer mode: create a room, share a code, 2-4 players answer simultaneously in real time, live scores, winner screen
  • 1000+ English-Italian nouns seeded from WordNet

This is documented in spec.md and the full roadmap.md. The MVP deliberately ignores most of it.


3. MVP Scope

Goal: A working, presentable singleplayer quiz that can be shown to real people.

What is IN the MVP

  • Vocabulary data in a PostgreSQL database (already seeded)
  • REST API that returns quiz terms with distractors
  • Singleplayer quiz UI: 10 questions, answer feedback, score screen
  • Clean, mobile-friendly UI (Tailwind + shadcn/ui)
  • Local dev only (no deployment for MVP)

What is CUT from the MVP

| Feature | Why cut |
| --- | --- |
| Authentication (OpenAuth) | No user accounts needed for a demo |
| Multiplayer (WebSockets, rooms) | Core quiz works without it |
| Valkey / Redis cache | Only needed for multiplayer room state |
| Deployment to Hetzner | Ship to people locally first |
| User stats / profiles | Needs auth |
| Testing suite | Add after the UI stabilises |

These are deferred, not deleted from the plan. The architecture is already designed to support them. See Section 12 (Post-MVP Ladder).


4. Technology Stack

The monorepo structure and tooling are already set up (Phase 0 complete). This is the full stack — the MVP uses a subset of it.

| Layer | Technology | MVP? |
| --- | --- | --- |
| Monorepo | pnpm workspaces | yes |
| Frontend | React 18, Vite, TypeScript | yes |
| Routing | TanStack Router | yes |
| Server state | TanStack Query | yes |
| Client state | Zustand | yes |
| Styling | Tailwind CSS + shadcn/ui | yes |
| Backend | Node.js, Express, TypeScript | yes |
| Database | PostgreSQL + Drizzle ORM | yes |
| Validation | Zod (shared schemas) | yes |
| Auth | OpenAuth (Google + GitHub) | post-MVP |
| Realtime | WebSockets (ws library) | post-MVP |
| Cache | Valkey | post-MVP |
| Testing | Vitest, React Testing Library | post-MVP |
| Deployment | Docker Compose, Hetzner, Nginx | post-MVP |

Repository Structure (actual, as of Phase 1 data pipeline complete)

vocab-trainer/
├── apps/
│   ├── api/
│   │   └── src/
│   │       ├── app.ts        # createApp() factory — routes registered here
│   │       └── server.ts     # calls app.listen()
│   └── web/
│       └── src/
│           ├── routes/
│           │   ├── __root.tsx
│           │   ├── index.tsx  # placeholder landing page
│           │   └── about.tsx
│           ├── main.tsx
│           └── index.css
├── packages/
│   ├── shared/
│   │   └── src/
│   │       ├── index.ts      # empty — Zod schemas go here next
│   │       └── constants.ts
│   └── db/
│       ├── drizzle/          # migration SQL files
│       └── src/
│           ├── db/schema.ts          # full Drizzle schema
│           ├── seeding-datafiles.ts  # seeds terms + translations
│           ├── generating-deck.ts    # builds curated decks
│           └── index.ts
├── documentation/            # all project docs live here
│   ├── spec.md
│   ├── roadmap.md
│   ├── decisions.md
│   ├── mvp.md                # this file
│   └── CLAUDE.md
├── scripts/
│   ├── extract-en-it-nouns.py
│   └── datafiles/en-it-noun.json
├── docker-compose.yml
└── pnpm-workspace.yaml

What does not exist yet (to be built in MVP phases):

  • apps/api/src/routes/ — no route handlers yet
  • apps/api/src/services/ — no business logic yet
  • apps/api/src/repositories/ — no DB queries yet
  • apps/web/src/components/ — no UI components yet
  • apps/web/src/stores/ — no Zustand store yet
  • apps/web/src/lib/api.ts — no TanStack Query wrappers yet
  • packages/shared/src/schemas/ — no Zod schemas yet

packages/shared is the contract between frontend and backend. All request/response shapes are defined there as Zod schemas — never duplicated.


5. Data Model (relevant tables for MVP)

export const terms = pgTable(
  "terms",
  {
    id: uuid().primaryKey().defaultRandom(),
    synset_id: text().unique().notNull(),
    pos: varchar({ length: 20 }).notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    check(
      "pos_check",
      sql`${table.pos} IN (${sql.raw(SUPPORTED_POS.map((p) => `'${p}'`).join(", "))})`,
    ),
    index("idx_terms_pos").on(table.pos),
  ],
);

export const translations = pgTable(
  "translations",
  {
    id: uuid().primaryKey().defaultRandom(),
    term_id: uuid()
      .notNull()
      .references(() => terms.id, { onDelete: "cascade" }),
    language_code: varchar({ length: 10 }).notNull(),
    text: text().notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    unique("unique_translations").on(
      table.term_id,
      table.language_code,
      table.text,
    ),
    index("idx_translations_lang").on(table.language_code, table.term_id),
  ],
);

export const decks = pgTable(
  "decks",
  {
    id: uuid().primaryKey().defaultRandom(),
    name: text().notNull(),
    description: text(),
    source_language: varchar({ length: 10 }).notNull(),
    validated_languages: varchar({ length: 10 }).array().notNull().default([]),
    is_public: boolean().default(false).notNull(),
    created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [
    check(
      "source_language_check",
      sql`${table.source_language} IN (${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))})`,
    ),
    check(
      "validated_languages_check",
      sql`validated_languages <@ ARRAY[${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))}]::varchar[]`,
    ),
    check(
      "validated_languages_excludes_source",
      sql`NOT (${table.source_language} = ANY(${table.validated_languages}))`,
    ),
    unique("unique_deck_name").on(table.name, table.source_language),
  ],
);

export const deck_terms = pgTable(
  "deck_terms",
  {
    deck_id: uuid()
      .notNull()
      .references(() => decks.id, { onDelete: "cascade" }),
    term_id: uuid()
      .notNull()
      .references(() => terms.id, { onDelete: "cascade" }),
    added_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
  },
  (table) => [primaryKey({ columns: [table.deck_id, table.term_id] })],
);

The seed + deck-build scripts have already been run. Data exists in the database.
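
To illustrate why the schema above is language-pair agnostic, here is a minimal in-memory sketch of resolving one term into a prompt and its accepted answers for an arbitrary pair. The names `TranslationRow`, `QuizPair` and `resolvePair` are hypothetical, and taking the first source-language text as the prompt is an assumption; the real lookup would be a Drizzle query in the repository layer.

```typescript
// In-memory rows mirroring three columns of the translations table.
interface TranslationRow {
  term_id: string;
  language_code: string;
  text: string;
}

// Hypothetical result shape for one quiz item.
interface QuizPair {
  termId: string;
  prompt: string; // text in the source language
  answers: string[]; // every accepted text in the target language
}

// Resolve a term for any (source, target) pair by filtering on language_code.
// Returns null when either side has no translation for the term.
function resolvePair(
  termId: string,
  rows: readonly TranslationRow[],
  source: string,
  target: string,
): QuizPair | null {
  const forTerm = rows.filter((r) => r.term_id === termId);
  const prompts = forTerm.filter((r) => r.language_code === source);
  const answers = forTerm
    .filter((r) => r.language_code === target)
    .map((r) => r.text);
  if (prompts.length === 0 || answers.length === 0) return null;
  // Assumption: the first source-language text is used as the prompt.
  return { termId, prompt: prompts[0].text, answers };
}
```

Swapping the `source` and `target` arguments reverses the quiz direction without touching the data, which is the property the schema is designed around.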


6. API Endpoints (MVP)

All endpoints prefixed /api. Schemas live in packages/shared and are validated with Zod on both sides.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/health | Health check (already done) |
| GET | /api/language-pairs | List active language pairs |
| GET | /api/decks | List available decks |
| GET | /api/decks/:id/terms | Fetch terms with distractors for a quiz |

Distractor Logic

The QuizService picks 3 distractors server-side:

  • Same part-of-speech as the correct answer
  • Never the correct answer
  • Never repeated within a session
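
The three rules above can be sketched as a pure function. This is an illustration under stated assumptions, not the project's QuizService: the `Candidate` shape, the `pickDistractors` name and the `usedTexts` set are hypothetical, and "never repeated within a session" is interpreted here as "a distractor text is not reused in later questions of the same session".

```typescript
// Hypothetical candidate shape; the real one would come from packages/shared.
interface Candidate {
  termId: string;
  pos: string;
  text: string; // translation in the target language
}

// Fisher-Yates shuffle on a copy, with an injectable random source for tests.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Pick `count` distractors for `correct` from `pool`.
// `usedTexts` carries distractor texts already shown in this session and is
// updated in place with the new picks.
function pickDistractors(
  correct: Candidate,
  pool: readonly Candidate[],
  usedTexts: Set<string>,
  count = 3,
  random: () => number = Math.random,
): Candidate[] {
  const seen = new Set<string>();
  const eligible = pool.filter((c) => {
    if (c.pos !== correct.pos) return false; // same part of speech
    if (c.termId === correct.termId) return false; // never the correct term
    if (c.text === correct.text) return false; // never the correct answer text
    if (usedTexts.has(c.text)) return false; // not shown earlier in the session
    if (seen.has(c.text)) return false; // no duplicate texts among the options
    seen.add(c.text);
    return true;
  });
  if (eligible.length < count) {
    throw new Error(`Need ${count} distractors, found ${eligible.length}`);
  }
  const picked = shuffle(eligible, random).slice(0, count);
  picked.forEach((c) => usedTexts.add(c.text));
  return picked;
}
```

Throwing when the pool is too small is one possible policy; a real service might instead relax the no-repeat rule or return fewer options.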

7. Frontend Structure (MVP)

apps/web/src/
├── routes/
│   ├── index.tsx             # Landing page / mode select
│   └── singleplayer/
│       └── index.tsx         # The quiz
├── components/
│   ├── quiz/
│   │   ├── QuestionCard.tsx  # Prompt word + 4 answer buttons
│   │   ├── OptionButton.tsx  # idle / correct / wrong states
│   │   └── ScoreScreen.tsx   # Final score + play again
│   └── ui/                   # shadcn/ui wrappers
├── stores/
│   └── gameStore.ts          # Zustand: question index, score, answers
└── lib/
    └── api.ts                # TanStack Query fetch wrappers

State Management

TanStack Query handles fetching quiz data from the API. Zustand handles the local quiz session (current question index, score, selected answers). There is no overlap between the two.
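
As a rough sketch of the local session state described above, the transitions can be written as pure functions in plain TypeScript. All names here (`QuizSession`, `startSession`, `submitAnswer`, `advance`, `isFinished`) are hypothetical; an actual Zustand store would wrap equivalent logic in its actions rather than use these functions directly.

```typescript
// Hypothetical session shape: question index, score, selected answers.
interface QuizSession {
  currentIndex: number;
  score: number;
  answers: (string | null)[]; // selected option per question, null = unanswered
  total: number;
}

function startSession(total = 10): QuizSession {
  const answers: (string | null)[] = Array.from({ length: total }, () => null);
  return { currentIndex: 0, score: 0, answers, total };
}

function isFinished(session: QuizSession): boolean {
  return session.currentIndex >= session.total;
}

// Record the answer for the current question. A second answer to the same
// question is ignored, which models locking input after the first pick.
function submitAnswer(session: QuizSession, selected: string, correct: string): QuizSession {
  if (isFinished(session)) return session;
  if (session.answers[session.currentIndex] !== null) return session;
  const answers = [...session.answers];
  answers[session.currentIndex] = selected;
  const gained = selected === correct ? 1 : 0; // +1 per correct answer
  return { ...session, answers, score: session.score + gained };
}

// Move to the next question, but only once the current one is answered.
function advance(session: QuizSession): QuizSession {
  if (isFinished(session)) return session;
  if (session.answers[session.currentIndex] === null) return session;
  return { ...session, currentIndex: session.currentIndex + 1 };
}
```

Keeping the transitions pure makes them easy to reason about separately from the fetched quiz data, which stays in TanStack Query.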


8. Working Methodology

Read this section before asking for help with any task.

This project is a learning exercise. The goal is to understand the code, not just to ship it.

How tasks are structured

The roadmap (Section 10) lists broad phases. When work starts on a phase, it gets broken into smaller, concrete subtasks with clear done-conditions before any code is written.

How to use an LLM for help

When asking an LLM for help:

  1. Paste this document (or the relevant sections) as context
  2. Describe what you're working on and what specifically you're stuck on
  3. Ask for hints, not solutions. Example prompts:
    • "I'm trying to implement X. My current approach is Y. What am I missing conceptually?"
    • "Here is my code. What would you change about the structure and why?"
    • "Can you point me to the relevant docs for Z?"

Refactoring workflow

After completing a task or a block of work:

  1. Share the current state of the code with the LLM
  2. Ask: "What would you refactor here, and why? Don't show me the code — point me in the right direction and link relevant documentation."
  3. The LLM should explain the what and why, link to relevant docs/guides, and let you implement the fix yourself

The LLM should never write the implementation for you. If it does, ask it to delete it and explain the concept instead.

Decisions log

Keep a decisions.md file in documentation/. When you make a non-obvious choice (a library, a pattern, a trade-off), write one short paragraph explaining what you chose and why. This is also useful context for any LLM session.


9. Game Mechanics

  • Format: source-language word prompt + 4 target-language choices
  • Distractors: same POS, server-side, never the correct answer, no repeats in a session
  • Session length: 10 questions
  • Scoring: +1 per correct answer (no speed bonus for MVP)
  • Timer: none in singleplayer MVP
  • No auth required: anonymous users

10. MVP Roadmap

Tasks are written at a high level. When starting a phase, break it into smaller subtasks before writing any code.

Current Status

  • Phase 0 (Foundation): Complete
  • Phase 1 (Vocabulary Data): 🔄 Data pipeline complete. API layer is the immediate next step.

What is already in the database:

  • 999 unique English words (nouns), fully seeded from WordNet/OMW
  • 3171 term IDs resolved (higher than the word count because one word can map to several WordNet synsets)
  • Full Italian translation coverage (3171/3171 terms)
  • Decks created and populated via packages/db/src/generating-deck.ts
  • 34 words from the source wordlist had no WordNet match (expected, not a bug)

Phase 1 — Finish the API Layer

Goal: The frontend can fetch quiz data from the API.

Done when: GET /api/decks/:id/terms?limit=10 (with :id being a deck UUID) returns 10 terms, each with 3 distractors of the same POS attached.

Broadly, what needs to happen:

  • Define Zod response schemas in packages/shared for terms, decks, and language pairs
  • Implement a repository layer that queries the DB for terms belonging to a deck
  • Implement a service layer that attaches distractors to each term (same POS, no duplicates, no correct answer included)
  • Wire up the REST endpoints (GET /language-pairs, GET /decks, GET /decks/:id/terms)
  • Manually test the endpoints (curl or a REST client like Bruno/Insomnia)

Key concepts to understand before starting:

  • Drizzle ORM query patterns (joins, where clauses)
  • The repository pattern (data access separated from business logic)
  • Zod schema definition and inference
  • How pnpm workspace packages reference each other

Phase 2 — Singleplayer Quiz UI

Goal: A user can complete a full 10-question quiz in the browser.

Done when: User visits /singleplayer, answers 10 questions, sees a score screen, and can play again.

Broadly, what needs to happen:

  • Build the QuestionCard component (prompt word + 4 answer buttons)
  • Build the OptionButton component with three visual states: idle, correct, wrong
  • Build the ScoreScreen component (score summary + play again)
  • Implement a Zustand store to track quiz session state (current question index, score, whether an answer has been picked)
  • Wire up TanStack Query to fetch terms from the API on mount
  • Create the /singleplayer route and assemble the components
  • Handle the between-question transition (brief delay showing result → next question)
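
The three visual states of the OptionButton can be derived from the selection rather than stored. The sketch below is one hypothetical way to do that; the `optionState` name and the choice to highlight the correct option after a wrong pick are assumptions, not decisions recorded in this document.

```typescript
type OptionState = "idle" | "correct" | "wrong";

// Derive the visual state of a single option button.
// `selected` is null until the user has picked an option.
function optionState(
  option: string,
  selected: string | null,
  correct: string,
): OptionState {
  if (selected === null) return "idle"; // nothing picked yet
  if (option === correct) return "correct"; // assumption: always reveal the right answer
  if (option === selected) return "wrong"; // the user's incorrect pick
  return "idle"; // untouched options stay neutral
}
```

Because the state is computed from `selected` and `correct`, the store only needs to track which option was picked.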

Key concepts to understand before starting:

  • TanStack Query: useQuery, loading/error states
  • Zustand: defining a store, reading and writing state from components
  • TanStack Router: defining routes, navigating between them
  • React component composition
  • Controlled state for the answer selection (which button is selected, when to lock input)

Phase 3 — UI Polish

Goal: The app looks good enough to show to people.

Done when: The quiz is usable on mobile, readable on desktop, and has a coherent visual style.

Broadly, what needs to happen:

  • Apply Tailwind utility classes and shadcn/ui components consistently
  • Make the layout mobile-first (touch-friendly buttons, readable font sizes)
  • Add a simple landing page (/) with a "Start Quiz" button
  • Add loading and error states for the API fetch
  • Visual feedback on correct/wrong answers (colour, maybe a brief animation)
  • Deck selection: let the user pick a deck from a list before starting

Key concepts to understand before starting:

  • Tailwind CSS utility-first approach
  • shadcn/ui component library and how to add components
  • Responsive design with Tailwind breakpoints
  • CSS transitions for simple animations

11. Key Technical Decisions

These are the non-obvious decisions already made. Any LLM helping with this project should be aware of them and not suggest alternatives without good reason.

Architecture

Express app: factory function pattern

app.ts exports createApp(). server.ts imports it and calls .listen(). This keeps tests isolated: a test can import the app without starting a server.

Layered architecture: routes → services → repositories

Business logic lives in services, not route handlers or repositories. Each layer only talks to the layer directly below it. For the MVP API, this means:

  • routes/ — parse request, call service, return response
  • services/ — business logic (e.g. attaching distractors)
  • repositories/ — all DB queries live here, nowhere else
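
A framework-agnostic sketch of the three layers follows. It deliberately leaves out Express and Drizzle so it runs on its own: every name (`TermRepository`, `createQuizService`, `createTermsHandler`, `createInMemoryTermRepository`) is hypothetical, the in-memory repository stands in for real DB queries, and clamping the limit to 1..10 is an assumed business rule used only to show where such rules belong.

```typescript
interface TermRow {
  id: string;
  pos: string;
  text: string;
}

// repositories/: the only layer that knows how data is fetched.
interface TermRepository {
  findByDeck(deckId: string, limit: number): Promise<TermRow[]>;
}

// Stand-in for a Drizzle-backed repository; ignores deckId for brevity.
function createInMemoryTermRepository(rows: readonly TermRow[]): TermRepository {
  return {
    async findByDeck(_deckId, limit) {
      return rows.slice(0, limit);
    },
  };
}

// services/: business logic, depends only on the repository interface.
interface QuizService {
  getTerms(deckId: string, limit: number): Promise<TermRow[]>;
}

function createQuizService(repo: TermRepository): QuizService {
  return {
    async getTerms(deckId, limit) {
      const clamped = Math.min(Math.max(limit, 1), 10); // assumed rule: 1..10
      return repo.findByDeck(deckId, clamped);
    },
  };
}

// routes/: parse the request, call the service, shape the response.
interface HttpResult {
  status: number;
  body: unknown;
}

function createTermsHandler(service: QuizService) {
  return async (
    params: { id: string },
    query: { limit?: string },
  ): Promise<HttpResult> => {
    const limit = Number(query.limit ?? "10");
    if (!Number.isInteger(limit)) {
      return { status: 400, body: { error: "limit must be an integer" } };
    }
    return { status: 200, body: await service.getTerms(params.id, limit) };
  };
}
```

Because the service receives its repository as an argument, it can be exercised with in-memory data long before the real queries exist.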

Shared Zod schemas in packages/shared

All request/response shapes are defined once as Zod schemas in packages/shared and imported by both apps/api and apps/web. Types are inferred from schemas (z.infer<typeof Schema>), never written by hand.
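
A minimal sketch of what one such shared schema could look like, assuming the `zod` package is installed. The schema name, its fields and the `correctIndex` field are hypothetical; the actual response shape has not been defined yet.

```typescript
import { z } from "zod";

// Hypothetical response shape for one quiz term.
export const QuizTermSchema = z.object({
  termId: z.string().uuid(),
  prompt: z.string().min(1),
  // 1 correct answer + 3 distractors
  options: z.array(z.string().min(1)).length(4),
  // Assumption: the client scores locally, so it needs the correct position.
  correctIndex: z.number().int().min(0).max(3),
});

export const QuizResponseSchema = z.array(QuizTermSchema);

// The type is inferred from the schema, never written by hand.
export type QuizTerm = z.infer<typeof QuizTermSchema>;

// Usable on both sides: the API validates before sending,
// the web app validates after fetching. Throws on invalid input.
export function parseQuizResponse(payload: unknown): QuizTerm[] {
  return QuizResponseSchema.parse(payload);
}
```

With this layout, a field renamed in the schema surfaces as a type error in both apps instead of a runtime surprise.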

Data Model

Decks separate from terms (not frequency-rank filtering)

Terms are raw WordNet data. Decks are curated lists. This separation exists because WordNet frequency data is unreliable for learning: for example, chemical element symbols rank highly. Bad words are excluded at the deck level, not filtered from terms.

Deck language model: source_language + validated_languages array

A deck is not tied to a single language pair. source_language is the language the wordlist was curated from. validated_languages is an array of target languages with full translation coverage, calculated and updated by the deck generation script on every run.
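
The coverage rule can be sketched as follows. This is not the deck generation script itself: `CoverageRow` and `computeValidatedLanguages` are hypothetical names, and "full coverage" is assumed to mean that every term in the deck has at least one translation in the language.

```typescript
// Minimal view of a translations row: which term has text in which language.
interface CoverageRow {
  term_id: string;
  language_code: string;
}

// Return the supported languages (excluding the source language) in which
// every term of the deck has at least one translation.
function computeValidatedLanguages(
  deckTermIds: readonly string[],
  translations: readonly CoverageRow[],
  sourceLanguage: string,
  supported: readonly string[],
): string[] {
  const inDeck = new Set(deckTermIds);
  if (inDeck.size === 0) return []; // an empty deck validates nothing

  // language code -> set of deck term ids that have a translation in it
  const covered = new Map<string, Set<string>>();
  for (const row of translations) {
    if (!inDeck.has(row.term_id)) continue;
    let terms = covered.get(row.language_code);
    if (!terms) {
      terms = new Set<string>();
      covered.set(row.language_code, terms);
    }
    terms.add(row.term_id);
  }

  return supported.filter(
    (lang) => lang !== sourceLanguage && covered.get(lang)?.size === inDeck.size,
  );
}
```

Excluding the source language mirrors the validated_languages_excludes_source check constraint in the schema.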

Tooling

Drizzle ORM (not Prisma): No binary, no engine. Queries map closely to SQL. Works naturally with Zod. Migrations are plain SQL files.

tsx as TypeScript runner (not ts-node): Faster, zero config, uses esbuild. Does not type-check — that is handled by tsc and the editor.

pnpm workspaces (not Turborepo): Two apps don't need the extra build caching complexity.


12. Post-MVP Ladder

These phases are deferred but planned. The architecture already supports them.

| Phase | What it adds |
| --- | --- |
| Auth | OpenAuth (Google + GitHub), JWT middleware, user rows in DB |
| User Stats | Games played, score history, profile page |
| Multiplayer Lobby | Room creation, join by code, WebSocket connection |
| Multiplayer Game | Simultaneous answers, server timer, live scores, winner screen |
| Deployment | Docker Compose prod config, Nginx, Let's Encrypt, Hetzner VPS |
| Hardening | Rate limiting, error boundaries, CI/CD, DB backups |

Each of these maps to a phase in the full roadmap.md.


13. Definition of Done (MVP)

  • GET /api/decks/:id/terms returns terms with correct distractors
  • User can complete a 10-question quiz without errors
  • Score screen shows final result and a play-again option
  • App is usable on a mobile screen
  • No hardcoded data — everything comes from the database