lila/documentation/api-development.md
2026-04-11 12:56:03 +02:00


Glossa — Architecture & API Development Summary

A record of all architectural discussions, decisions, and outcomes from the initial API design through the quiz model implementation.


Project Overview

Glossa is a vocabulary trainer (Duolingo-style) built as a pnpm monorepo. Users see a word and pick from 4 possible translations. Supports singleplayer and multiplayer. Stack: Express API, React frontend, Drizzle ORM, Postgres, Valkey, WebSockets.


Architectural Foundation

The Layered Architecture

The core mental model established for the entire API:

HTTP Request
     ↓
  Router        — maps URL + HTTP method to a controller
     ↓
 Controller     — handles HTTP only: validates input, calls service, sends response
     ↓
  Service       — business logic only: no HTTP, no direct DB access
     ↓
  Model         — database queries only: no business logic
     ↓
  Database

The rule: each layer only talks to the layer directly below it. A controller never touches the database. A service never reads req.body. A model never knows what a quiz is.
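
As a rough illustration of that rule, the sketch below wires three plain functions together, one per layer. Everything in it is hypothetical: the function names, the row shape, and the in-memory array standing in for Postgres are assumptions made for the example, and a real model would be asynchronous.

```typescript
// All names below are hypothetical; an in-memory array stands in for Postgres.

// Model: data access only. Knows about rows, not quizzes or HTTP.
type TermRow = { sourceText: string; targetText: string };

const FAKE_TERMS: TermRow[] = [
  { sourceText: "dog", targetText: "cane" },
  { sourceText: "house", targetText: "casa" },
  { sourceText: "book", targetText: "libro" },
];

function getTerms(limit: number): TermRow[] {
  return FAKE_TERMS.slice(0, limit);
}

// Service: business logic only. Never sees the request object, never queries.
function prepareQuestions(rounds: string): { prompt: string }[] {
  return getTerms(Number(rounds)).map((row) => ({ prompt: row.sourceText }));
}

// Controller: HTTP concerns only. Validates input, calls the service,
// and shapes the response.
type HttpResult = { statusCode: number; body: unknown };

function createGameController(requestBody: unknown): HttpResult {
  const rounds = (requestBody as { rounds?: unknown } | null | undefined)?.rounds;
  if (typeof rounds !== "string" || (rounds !== "3" && rounds !== "10")) {
    return { statusCode: 400, body: { error: 'rounds must be "3" or "10"' } };
  }
  return { statusCode: 200, body: prepareQuestions(rounds) };
}
```

Each function calls only the one directly below it, so swapping the in-memory array for a real query would touch the model alone.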

Monorepo Package Responsibilities

| Package | Owns |
| --- | --- |
| packages/shared | Zod schemas, constants, derived TypeScript types |
| packages/db | Drizzle schema, DB connection, all model/query functions |
| apps/api | Router, controllers, services |
| apps/web | React frontend, consumes types from shared |

Key principle: all database code lives in packages/db. apps/api never imports drizzle-orm for queries — it only calls functions exported from packages/db.


Problems Faced & Solutions

  • Problem 1: Messy API structure. Symptom: responsibilities bleeding across layers (DB code in controllers, business logic in routes). Solution: strict layered architecture with one responsibility per layer.
  • Problem 2: No shared contract between API and frontend. Symptom: the API could silently return different shapes, and the frontend would break at runtime. Solution: Zod schemas in packages/shared as the single source of truth. Both the API (validation) and the frontend (type inference) consume the same schemas.
  • Problem 3: Type safety gaps. Symptom: TypeScript any types on model parameters, Number vs number confusion. Solution: derived types from constants using the typeof CONSTANT[number] pattern. All valid values are defined once in constants, and types are derived automatically.
  • Problem 4: getGameTerms in the wrong package. Symptom: model queries living in apps/api/src/models/ meant apps/api had a direct drizzle-orm dependency and was accessing the DB itself. Solution: moved the models folder to packages/db/src/models/. All Drizzle code now lives in one package.
  • Problem 5: Deck generation complexity. Initial assumption: 12 decks needed (nouns/verbs × easy/intermediate/hard × en/it). Correction: decks are pools, not presets. POS and difficulty are query filters applied at runtime, not deck properties. Only 2 decks needed (en-core, it-core). Final decision: skip deck generation entirely for MVP. Query the terms table directly with difficulty + POS filters. Revisit post-MVP when spaced repetition or progression features require curated pools.
  • Problem 6: GAME_ROUNDS type conflict. Problem: z.enum() only accepts strings. GAME_ROUNDS = ["3", "10"] works with z.enum() but requires a Number(rounds) conversion in the service. Decision: keep the values as strings and convert to a number in the service before passing to the model. The coupling is documented with a code comment.
  • Problem 7: Gloss join could multiply question rows. Schema allowed multiple glosses per term per language, so the left join would duplicate rows. Fixed by tightening the unique constraint.
  • Problem 8: Model leaked quiz semantics. Return fields were named prompt / answer, baking quiz concepts into the database layer. Renamed to neutral field names.
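
Problems 3 and 6 can be sketched together. The constant values below are illustrative assumptions (the real ones live in packages/shared and may differ), and the helper names isSupportedLanguageCode and roundsToLimit are hypothetical.

```typescript
// Illustrative values only; the real constants live in packages/shared.
const SUPPORTED_LANGUAGE_CODES = ["en", "it"] as const;
const GAME_ROUNDS = ["3", "10"] as const;

// Indexing the tuple type with `number` yields the union of its elements,
// so the valid values are written once and the type follows automatically.
type SupportedLanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number]; // "en" | "it"
type GameRounds = (typeof GAME_ROUNDS)[number]; // "3" | "10"

// Hypothetical type guard: narrows untrusted input to the derived union.
function isSupportedLanguageCode(value: string): value is SupportedLanguageCode {
  return SUPPORTED_LANGUAGE_CODES.some((code) => code === value);
}

// Hypothetical service-side helper: rounds stay strings at the HTTP boundary
// (because z.enum() needs strings) and become a number only for the query limit.
function roundsToLimit(rounds: GameRounds): number {
  return Number(rounds);
}
```

Adding a new language or round count then means editing one array; every type and guard derived from it updates on the next compile.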

Decisions Made

  • Zod schemas belong in packages/shared. Both the API and the frontend import from the same schemas. If the shape changes, TypeScript compilation fails in both places, so drift is caught at build time rather than at runtime.
  • Server-side answer evaluation. The correct answer is never sent to the frontend in QuizQuestion. It is only revealed in AnswerResult after the client submits. This prevents cheating and keeps game logic authoritative on the server.
  • safeParse over parse in controllers. parse throws a ZodError, which surfaces as an unhandled 500 response; safeParse returns a result object, which allows a clean 400 with an early return. The global error handler planned for Step 6 of the roadmap will centralise this pattern.
  • POST not GET for game start. A GET request body has no defined semantics and may be dropped by clients or intermediaries. Game configuration is submitted as a JSON body, so POST is the semantically correct method.
  • express.json() middleware required. Without it, req.body is undefined. Added to createApp() in app.ts.
  • Type naming: PascalCase. TypeScript convention: supportedLanguageCode → SupportedLanguageCode, etc.
  • Primitive types: always lowercase. Use number, not Number, and string, not String. The uppercase versions are object wrapper types and are not assignable to the primitive types Drizzle expects.
  • Model parameters use shared types, not GameRequestType. The model layer should not know about GameRequestType; that is an HTTP boundary concern. Instead, parameters are typed using the derived constant types (SupportedLanguageCode, SupportedPos, DifficultyLevel) exported from packages/shared.
  • One gloss per term per language. The unique constraint on term_glosses was tightened from (term_id, language_code, text) to (term_id, language_code) to prevent the left join from multiplying question rows. Revisit if multiple glosses per language are ever needed (e.g. register or domain variants).
  • Model returns neutral field names, not quiz semantics. getGameTerms returns sourceText / targetText / sourceGloss rather than prompt / answer / gloss. Quiz semantics are applied in the service layer. Keeps the model reusable for non-quiz features.
  • Asymmetric difficulty filter. Difficulty is filtered on the target (answer) side only. A word can be A2 in Italian but B1 in English, and what matters is the difficulty of the word being learned.
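
The asymmetric difficulty filter from the last decision can be illustrated on in-memory rows. This is only a sketch: the real filter is a WHERE clause in the getGameTerms query, and the row shape, the function name, and the sample ratings here are assumptions.

```typescript
type Difficulty = "easy" | "intermediate" | "hard";

// Hypothetical row shape: one term with a translation on each side,
// each translation carrying its own difficulty rating.
type TermPair = {
  sourceText: string;
  sourceDifficulty: Difficulty;
  targetText: string;
  targetDifficulty: Difficulty;
};

// Only the target (answer) side is compared; the source rating is ignored.
function filterByTargetDifficulty(rows: TermPair[], difficulty: Difficulty): TermPair[] {
  return rows.filter((row) => row.targetDifficulty === difficulty);
}

// Sample ratings are made up for the example.
const samplePairs: TermPair[] = [
  { sourceText: "father", sourceDifficulty: "intermediate", targetText: "padre", targetDifficulty: "easy" },
  { sourceText: "dog", sourceDifficulty: "easy", targetText: "cane", targetDifficulty: "hard" },
];

const easyPairs = filterByTargetDifficulty(samplePairs, "easy");
```

Here "father" is kept for an easy game even though its source-side rating is intermediate, because the word being learned is "padre".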

Data Pipeline Work (Pre-API)

CEFR Enrichment Pipeline (completed)

A staged ETL pipeline was built to enrich translation records with CEFR levels and difficulty ratings:

Raw source files
      ↓
extract-*.py      — normalise each source to standard JSON
      ↓
compare-*.py      — quality gate: surface conflicts between sources (read-only)
      ↓
merge-*.py        — resolve conflicts by source priority, derive difficulty
      ↓
enrich.ts         — write cefr_level + difficulty to DB translations table

Source priority:

  • English: en_m3 > cefrj > octanove > random
  • Italian: it_m3 > italian
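
The priority rule for English can be sketched as follows. The actual merge step is a Python script (merge-*.py); this TypeScript version only illustrates the resolution logic, and the record shape and function name are assumptions.

```typescript
// Source names come from the priority list above; earlier means higher priority.
const ENGLISH_PRIORITY = ["en_m3", "cefrj", "octanove", "random"] as const;
type EnglishSource = (typeof ENGLISH_PRIORITY)[number];
type CefrLevel = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";

// Hypothetical record shape for one normalised source entry.
type SourceEntry = { source: EnglishSource; word: string; cefr: CefrLevel };

// For each word, keep the entry from the highest-priority source.
function resolveByPriority(entries: SourceEntry[]): Map<string, SourceEntry> {
  const rank = (source: EnglishSource): number => ENGLISH_PRIORITY.indexOf(source);
  const winners = new Map<string, SourceEntry>();
  for (const entry of entries) {
    const current = winners.get(entry.word);
    if (current === undefined || rank(entry.source) < rank(current.source)) {
      winners.set(entry.word, entry);
    }
  }
  return winners;
}
```

When two sources disagree on a word's level, the entry from the source listed first wins regardless of input order.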

Enrichment results:

| Language | Enriched | Total | Coverage |
| --- | --- | --- | --- |
| English | 42,527 | 171,394 | ~25% |
| Italian | 23,061 | 54,603 | ~42% |

Both languages have sufficient coverage for MVP. Italian C2 has only 242 terms — noted as a potential constraint for the distractor algorithm at high difficulty.


API Schemas (packages/shared)

GameRequestSchema (implemented)

{
  source_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  target_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  pos: z.enum(SUPPORTED_POS),
  difficulty: z.enum(DIFFICULTY_LEVELS),
  rounds: z.enum(GAME_ROUNDS),
}

Planned schemas (not yet implemented)

QuizQuestion      — prompt, optional gloss, 4 options (no correct answer)
QuizOption        — optionId + text
AnswerSubmission  — questionId + selectedOptionId
AnswerResult      — correct boolean, correctOptionId, selectedOptionId
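
For orientation, the planned shapes might look like the plain TypeScript types below. The field names follow the list above; the exact types (string ids, a nullable gloss, a questionId field on QuizQuestion) are assumptions, and the real definitions would be Zod schemas in packages/shared with types inferred from them.

```typescript
type QuizOption = { optionId: string; text: string };

type QuizQuestion = {
  questionId: string;
  prompt: string;
  gloss: string | null; // absent when no gloss exists for the term
  options: QuizOption[]; // four entries; none is marked as correct
};

type AnswerSubmission = { questionId: string; selectedOptionId: string };

type AnswerResult = {
  correct: boolean;
  correctOptionId: string;
  selectedOptionId: string;
};

// Example values, made up for illustration.
const exampleQuestion: QuizQuestion = {
  questionId: "q1",
  prompt: "dog",
  gloss: null,
  options: [
    { optionId: "a", text: "gatto" },
    { optionId: "b", text: "cane" },
    { optionId: "c", text: "casa" },
    { optionId: "d", text: "libro" },
  ],
};

const exampleResult: AnswerResult = {
  correct: false,
  correctOptionId: "b",
  selectedOptionId: "a",
};
```

The question payload carries no field that identifies the correct option; that information appears only in AnswerResult.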

API Endpoints

POST /api/v1/game/start     GameRequest → QuizQuestion[]
POST /api/v1/game/answer    AnswerSubmission → AnswerResult

Current File Structure (apps/api)

apps/api/src/
├── app.ts                  — Express app, express.json() middleware
├── server.ts               — starts server on PORT
├── routes/
│   ├── apiRouter.ts        — mounts /health and /game routers
│   ├── gameRouter.ts       — POST /start → createGame controller
│   └── healthRouter.ts
├── controllers/
│   └── gameController.ts   — validates GameRequest, calls service
└── services/
    └── gameService.ts      — calls getGameTerms, returns raw rows

Current File Structure (packages/db)

packages/db/src/
├── db/
│   └── schema.ts           — Drizzle schema (terms, translations, users, decks...)
├── models/
│   └── termModel.ts        — getGameTerms() query
└── index.ts                — exports db connection + getGameTerms

Completed Tasks

  • Layered architecture established and understood
  • GameRequestSchema defined in packages/shared
  • Derived types (SupportedLanguageCode, SupportedPos, DifficultyLevel) exported from constants
  • getGameTerms() model implemented with POS / language / difficulty / limit filters
  • Model correctly placed in packages/db
  • prepareGameQuestions() service skeleton calling the model
  • createGame controller with Zod safeParse validation
  • POST /api/v1/game/start route wired
  • End-to-end pipeline verified with test script — returns correct rows
  • CEFR enrichment pipeline complete for English and Italian
  • Double join on translations implemented (source + target language)
  • Gloss left join implemented
  • Model return type uses neutral field names (sourceText, targetText, sourceGloss)
  • Schema: gloss unique constraint tightened to one gloss per term per language

Roadmap Ahead

Step 1 — Learn SQL fundamentals (in progress)

Concepts needed: SELECT, FROM, JOIN, WHERE, LIMIT. Resources: sqlzoo.net or Khan Academy SQL section. Required before: implementing the double join for source language prompt.

Step 2 — Complete the model layer

  • Double join on translations — once for source language (prompt), once for target language (answer)
  • GlossModel.getGloss(termId, languageCode) — fetch gloss if available

Step 3 — Define remaining Zod schemas

  • QuizQuestion, QuizOption, AnswerSubmission, AnswerResult in packages/shared

Step 4 — Complete the service layer

  • QuizService.buildSession() — assemble raw rows into QuizQuestion[]
    • Generate questionId per question
    • Map source language translation as prompt
    • Attach gloss if available
    • Fetch 3 distractors (same POS, different term, same difficulty)
    • Shuffle options so correct answer is not always in same position
  • QuizService.evaluateAnswer() — validate correctness, return AnswerResult
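
A minimal sketch of the two service methods is shown below. It assumes the candidate pool has already been filtered by the model (same POS, same difficulty) and that the server keeps an answer key per session, which is still an open question. All type names, the id format, and the AnswerKey map are hypothetical.

```typescript
type Row = { termId: number; sourceText: string; targetText: string; sourceGloss: string | null };
type Option = { optionId: string; text: string };
type Question = { questionId: string; prompt: string; gloss: string | null; options: Option[] };
type AnswerKey = Map<string, string>; // questionId -> correct optionId, kept server-side

// Fisher-Yates shuffle on a copy; `random` is injectable for deterministic tests.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

function buildQuestion(
  row: Row,
  pool: readonly Row[],
  answerKey: AnswerKey,
  random: () => number = Math.random,
): Question {
  // Pick 3 distractors: different term, and no text equal to the answer or to each other.
  const seen = new Set<string>([row.targetText]);
  const distractors: Row[] = [];
  for (const candidate of shuffle(pool, random)) {
    if (candidate.termId === row.termId || seen.has(candidate.targetText)) continue;
    seen.add(candidate.targetText);
    distractors.push(candidate);
    if (distractors.length === 3) break;
  }
  if (distractors.length < 3) throw new Error(`not enough distractors for term ${row.termId}`);

  const questionId = `q${row.termId}`;
  // Shuffle so the correct answer does not always sit in the same position.
  const options = shuffle([row].concat(distractors), random).map((r) => ({
    optionId: `${questionId}-o${r.termId}`,
    text: r.targetText,
  }));
  answerKey.set(questionId, `${questionId}-o${row.termId}`);
  return { questionId, prompt: row.sourceText, gloss: row.sourceGloss, options };
}

function evaluateAnswer(
  answerKey: AnswerKey,
  submission: { questionId: string; selectedOptionId: string },
): { correct: boolean; correctOptionId: string; selectedOptionId: string } {
  const correctOptionId = answerKey.get(submission.questionId);
  if (correctOptionId === undefined) throw new Error(`unknown question ${submission.questionId}`);
  return {
    correct: correctOptionId === submission.selectedOptionId,
    correctOptionId,
    selectedOptionId: submission.selectedOptionId,
  };
}
```

Throwing when fewer than three distractors are available is one possible behaviour; whether to fall back to a wider difficulty range instead is listed under Open Questions.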

Step 5 — Implement answer endpoint

  • POST /api/v1/game/answer route, controller, service method

Step 6 — Global error handler

  • Typed error classes (ValidationError, NotFoundError)
  • Central error middleware in app.ts
  • Remove temporary safeParse error handling from controllers
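
One way this step could look is sketched below, without Express so that it stays self-contained. The AppError base class and the toErrorResponse helper are hypothetical; in the real app the mapping would be called from a four-argument Express error middleware (err, req, res, next) registered after the routes.

```typescript
// Hypothetical base class: carries the HTTP status alongside the message.
class AppError extends Error {
  readonly statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
    this.name = new.target.name;
    // Keeps instanceof working when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

// Central mapping: known errors keep their status and message,
// anything else becomes a generic 500 without leaking internals.
function toErrorResponse(err: unknown): { statusCode: number; body: { error: string } } {
  if (err instanceof AppError) {
    return { statusCode: err.statusCode, body: { error: err.message } };
  }
  return { statusCode: 500, body: { error: "Internal server error" } };
}
```

With this in place, a controller could throw a ValidationError on a failed safeParse instead of building the 400 response itself.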

Step 7 — Tests

  • Unit tests for QuizService — correct POS filtering, distractor never equals correct answer
  • Unit tests for evaluateAnswer — correct and incorrect cases
  • Integration tests for both endpoints

Step 8 — Auth (Phase 2 from original roadmap)

  • OpenAuth integration
  • JWT validation middleware
  • GET /api/auth/me endpoint
  • Frontend auth guard

Open Questions

  • Distractor algorithm: when Italian C2 has only 242 terms, should the difficulty filter fall back gracefully or return an error? Decision needed before implementing buildSession().
  • Session statefulness: game loop is currently stateless (fetch all questions upfront). Confirm this is still the intended MVP approach before building buildSession().
  • Glosses can leak answers: some WordNet glosses contain the target-language word in the definition text (e.g. "Padre" appearing in the English gloss for "father"). Address during the post-MVP data enrichment pass — either clean the glosses, replace them with custom definitions, or filter at the service layer.
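
If the service-layer option is chosen for the gloss leak, the check could be as simple as the sketch below. The function names and the tokenisation rule are assumptions; it matches whole words only and does not handle inflected forms.

```typescript
// Lowercase and split on whitespace and common punctuation.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,;:!?"'()\[\]]+/)
    .filter((token) => token.length > 0);
}

// True when the answer appears in the gloss as a whole word or phrase.
function glossLeaksAnswer(gloss: string, targetText: string): boolean {
  const answer = tokenize(targetText).join(" ");
  if (answer === "") return false;
  const haystack = ` ${tokenize(gloss).join(" ")} `;
  return haystack.indexOf(` ${answer} `) !== -1;
}

// Drop the gloss entirely when it would give the answer away.
function safeGloss(gloss: string | null, targetText: string): string | null {
  return gloss !== null && !glossLeaksAnswer(gloss, targetText) ? gloss : null;
}
```

Dropping a leaking gloss is cheap because the gloss is already optional in QuizQuestion; cleaning or replacing the gloss text would still be the better long-term fix.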