# Glossa — Architecture & API Development Summary

A record of all architectural discussions, decisions, and outcomes from the initial API design through the quiz model implementation.

---

## Project Overview

Glossa is a vocabulary trainer (Duolingo-style) built as a pnpm monorepo. Users see a word and pick from 4 possible translations. Supports singleplayer and multiplayer. Stack: Express API, React frontend, Drizzle ORM, Postgres, Valkey, WebSockets.

---

## Architectural Foundation

### The Layered Architecture

The core mental model established for the entire API:

```text
HTTP Request
    ↓
Router       — maps URL + HTTP method to a controller
    ↓
Controller   — handles HTTP only: validates input, calls service, sends response
    ↓
Service      — business logic only: no HTTP, no direct DB access
    ↓
Model        — database queries only: no business logic
    ↓
Database
```

**The rule:** each layer only talks to the layer directly below it. A controller never touches the database. A service never reads `req.body`. A model never knows what a quiz is.

### Monorepo Package Responsibilities

| Package           | Owns                                                     |
| ----------------- | -------------------------------------------------------- |
| `packages/shared` | Zod schemas, constants, derived TypeScript types         |
| `packages/db`     | Drizzle schema, DB connection, all model/query functions |
| `apps/api`        | Router, controllers, services                            |
| `apps/web`        | React frontend, consumes types from shared               |

**Key principle:** all database code lives in `packages/db`. `apps/api` never imports `drizzle-orm` for queries — it only calls functions exported from `packages/db`.

---

## Problems Faced & Solutions

- Problem 1: Messy API structure
  **Symptom:** responsibilities bleeding across layers — DB code in controllers, business logic in routes.
  **Solution:** strict layered architecture with one responsibility per layer.
- Problem 2: No shared contract between API and frontend
  **Symptom:** API could return different shapes silently, frontend breaks at runtime.
  **Solution:** Zod schemas in `packages/shared` as the single source of truth. Both API (validation) and frontend (type inference) consume the same schemas.
- Problem 3: Type safety gaps
  **Symptom:** TypeScript `any` types on model parameters, `Number` vs `number` confusion.
  **Solution:** derived types from constants using the `typeof CONSTANT[number]` pattern. All valid values are defined once in constants, and the types are derived automatically.
- Problem 4: `getGameTerms` in wrong package
  **Symptom:** model queries living in `apps/api/src/models/` meant `apps/api` had a direct `drizzle-orm` dependency and was accessing the DB itself.
  **Solution:** moved the models folder to `packages/db/src/models/`. All Drizzle code now lives in one package.
- Problem 5: Deck generation complexity
  **Initial assumption:** 12 decks needed (nouns/verbs × easy/intermediate/hard × en/it).
  **Correction:** decks are pools, not presets. POS and difficulty are query filters applied at runtime — not deck properties. Only 2 decks needed (en-core, it-core).
  **Final decision:** skip deck generation entirely for MVP. Query the terms table directly with difficulty + POS filters. Revisit post-MVP when spaced repetition or progression features require curated pools.
- Problem 6: `GAME_ROUNDS` type conflict
  **Problem:** `z.enum()` only accepts strings. `GAME_ROUNDS = ["3", "10"]` works with `z.enum()` but requires `Number(rounds)` conversion in the service.
  **Decision:** keep as strings, convert to number in the service before passing to the model. The coupling is documented with a code comment.
- Problem 7: Gloss join could multiply question rows.
  The schema allowed multiple glosses per term per language, so the left join would duplicate rows. Fixed by tightening the unique constraint.
- Problem 8: Model leaked quiz semantics.
  Return fields were named `prompt` / `answer`, baking quiz concepts (a service-layer concern) into the database layer. Renamed to neutral field names.
- Problem 9: `AnswerResult` wasn't self-contained.
  Frontend needed `selectedOptionId` to render feedback but the schema didn't include it (reasoning was "client already knows"). Discovered during frontend work; added the field.
- Problem 10: Distractor could duplicate the correct answer text.
  Different terms can share the same translation. Fixed with `ne(translations.text, excludeText)` in the query.
- Problem 11: TypeScript flagged Fisher-Yates shuffle array access.
  `noUncheckedIndexedAccess` (a compiler option that is separate from `strict`) treats `result[i]` as `T | undefined`. Fixed with a non-null assertion and a temp variable pattern.

---

## Decisions Made

- Zod schemas belong in `packages/shared`
  Both the API and frontend import from the same schemas. If the shape changes, TypeScript compilation fails in both places simultaneously, so drift is caught at compile time instead of surfacing silently at runtime.
- Server-side answer evaluation
  The correct answer is never sent to the frontend in `QuizQuestion`. It is only revealed in `AnswerResult` after the client submits. Prevents cheating and keeps game logic authoritative on the server.
- `safeParse` over `parse` in controllers
  `parse` throws a raw Zod error → ugly 500 response. `safeParse` returns a result object → clean 400 with early return. The global error handler to be implemented later (Step 6 of roadmap) will centralise this pattern.
- POST not GET for game start
  `GET` requests have no defined body semantics. Game configuration is submitted as a JSON body → `POST` is semantically correct.
- `express.json()` middleware required
  Without it, `req.body` is `undefined`. Added to `createApp()` in `app.ts`.
- Type naming: PascalCase
  TypeScript convention. `supportedLanguageCode` → `SupportedLanguageCode` etc.
- Primitive types: always lowercase
  `number` not `Number`, `string` not `String`. The uppercase versions are object wrappers and not assignable to Drizzle's expected primitive types.
- Model parameters use shared types, not `GameRequestType`
  The model layer should not know about `GameRequestType` — that's an HTTP boundary concern.
  Instead, parameters are typed using the derived constant types (`SupportedLanguageCode`, `SupportedPos`, `DifficultyLevel`) exported from `packages/shared`.
- One gloss per term per language.
  The unique constraint on `term_glosses` was tightened from `(term_id, language_code, text)` to `(term_id, language_code)` to prevent the left join from multiplying question rows. Revisit if multiple glosses per language are ever needed (e.g. register or domain variants).
- Model returns neutral field names, not quiz semantics.
  `getGameTerms` returns `sourceText` / `targetText` / `sourceGloss` rather than `prompt` / `answer` / `gloss`. Quiz semantics are applied in the service layer. Keeps the model reusable for non-quiz features.
- Asymmetric difficulty filter.
  Difficulty is filtered on the target (answer) side only. A word can be A2 in Italian but B1 in English, and what matters is the difficulty of the word being learned.
- `optionId` as integer 0-3, not UUID.
  Options only need uniqueness within a single question; cheating prevented by shuffling, not opaque IDs.
- `questionId` and `sessionId` as UUIDs.
  Globally unique, opaque, natural Valkey keys when storage moves later.
- `gloss` is `string | null` rather than optional, for predictable shape on the frontend.
- `GameSessionStore` stores only the answer key (`questionId` → `correctOptionId`).
  Minimal payload for easy Valkey migration.
- All `GameSessionStore` methods are async even for the in-memory implementation, so the service layer is already written for Valkey.
- Distractors fetched per-question (N+1 queries).
  Correct shape for the problem; 10 queries on local Postgres is negligible latency.
- No fallback logic for insufficient distractors.
  Data volumes are sufficient; strict query throws if something is genuinely broken.
- Distractor query excludes both the correct term ID and the correct answer text, preventing duplicate options from different terms with the same translation.
- Submit-before-send flow on frontend: user selects, then confirms.
  Prevents misclicks.

---

## Data Pipeline Work (Pre-API)

### CEFR Enrichment Pipeline (completed)

A staged ETL pipeline was built to enrich translation records with CEFR levels and difficulty ratings:

```text
Raw source files
    ↓
extract-*.py   — normalise each source to standard JSON
    ↓
compare-*.py   — quality gate: surface conflicts between sources (read-only)
    ↓
merge-*.py     — resolve conflicts by source priority, derive difficulty
    ↓
enrich.ts      — write cefr_level + difficulty to DB translations table
```

**Source priority:**

- English: `en_m3` > `cefrj` > `octanove` > `random`
- Italian: `it_m3` > `italian`

**Enrichment results:**

| Language | Enriched | Total   | Coverage |
| -------- | -------- | ------- | -------- |
| English  | 42,527   | 171,394 | ~25%     |
| Italian  | 23,061   | 54,603  | ~42%     |

Both languages have sufficient coverage for MVP. Italian C2 has only 242 terms — noted as a potential constraint for the distractor algorithm at high difficulty.

---

## API Schemas (packages/shared)

### `GameRequestSchema`

```typescript
import { z } from "zod";

// The constants referenced below are defined alongside the schemas in packages/shared.
const GameRequestSchema = z.object({
  source_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  target_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  pos: z.enum(SUPPORTED_POS),
  difficulty: z.enum(DIFFICULTY_LEVELS),
  rounds: z.enum(GAME_ROUNDS),
});
```

- `AnswerOption`: `{ optionId: number (0-3), text: string }`
- `GameQuestion`: `{ questionId: uuid, prompt: string, gloss: string | null, options: AnswerOption[4] }`
- `GameSession`: `{ sessionId: uuid, questions: GameQuestion[] }`
- `AnswerSubmission`: `{ sessionId: uuid, questionId: uuid, selectedOptionId: number (0-3) }`
- `AnswerResult`: `{ questionId: uuid, isCorrect: boolean, correctOptionId: number (0-3), selectedOptionId: number (0-3) }`

---

## API Endpoints

```text
POST /api/v1/game/start    GameRequest → GameSession
POST /api/v1/game/answer   AnswerSubmission → AnswerResult
```

---

## Current File Structure (apps/api)

```text
apps/api/src/
├── app.ts                 — Express app, express.json() middleware
├── server.ts              — starts server on PORT
├── routes/
│   ├── apiRouter.ts       — mounts /health and /game routers
│   ├── gameRouter.ts      — POST /start → createGame controller
│   └── healthRouter.ts
├── controllers/
│   └── gameController.ts  — validates GameRequest, calls service
└── services/
    └── gameService.ts     — calls getGameTerms, returns raw rows
```

---

## Current File Structure (packages/db)

```text
packages/db/src/
├── db/
│   └── schema.ts          — Drizzle schema (terms, translations, users, decks...)
├── models/
│   └── termModel.ts       — getGameTerms() query
└── index.ts               — exports db connection + getGameTerms
```

---

## Completed Tasks

- [x] Layered architecture established and understood
- [x] `GameRequestSchema` defined in `packages/shared`
- [x] Derived types (`SupportedLanguageCode`, `SupportedPos`, `DifficultyLevel`) exported from constants
- [x] `getGameTerms()` model implemented with POS / language / difficulty / limit filters
- [x] Model correctly placed in `packages/db`
- [x] `prepareGameQuestions()` service skeleton calling the model
- [x] `createGame` controller with Zod `safeParse` validation
- [x] `POST /api/v1/game/start` route wired
- [x] End-to-end pipeline verified with test script — returns correct rows
- [x] CEFR enrichment pipeline complete for English and Italian
- [x] Double join on translations implemented (source + target language)
- [x] Gloss left join implemented
- [x] Model return type uses neutral field names (sourceText, targetText, sourceGloss)
- [x] Schema: gloss unique constraint tightened to one gloss per term per language
- [x] Zod schemas defined: AnswerOption, GameQuestion, GameSession, AnswerSubmission, AnswerResult
- [x] getDistractors model implemented with POS/difficulty/language/excludeTermId/excludeText filters
- [x] createGameSession service: fetches terms, fetches distractors per question, shuffles options, stores session, returns GameSession
- [x] evaluateAnswer service: looks up session, compares submitted optionId to stored correct answer, returns AnswerResult
- [x] GameSessionStore interface +
  InMemoryGameSessionStore (Map-backed, swappable to Valkey)
- [x] POST /api/v1/game/answer endpoint wired (route, controller, service)
- [x] selectedOptionId added to AnswerResult (discovered during frontend work)
- [x] Minimal frontend: /play route with settings UI, QuestionCard, OptionButton, ScoreScreen
- [x] Vite proxy configured for dev

---

## Roadmap Ahead

### Step 1 — Learn SQL fundamentals - done

Concepts needed: SELECT, FROM, JOIN, WHERE, LIMIT.
Resources: sqlzoo.net or Khan Academy SQL section.
Required before: implementing the double join for source language prompt.

### Step 2 — Complete the model layer - done

- Double join on `translations` — once for source language (prompt), once for target language (answer)
- `GlossModel.getGloss(termId, languageCode)` — fetch gloss if available

### Step 3 — Define remaining Zod schemas - done

- `QuizQuestion`, `QuizOption`, `AnswerSubmission`, `AnswerResult` in `packages/shared`

### Step 4 — Complete the service layer - done

- `QuizService.buildSession()` — assemble raw rows into `QuizQuestion[]`
  - Generate `questionId` per question
  - Map source language translation as prompt
  - Attach gloss if available
  - Fetch 3 distractors (same POS, different term, same difficulty)
  - Shuffle options so correct answer is not always in same position
- `QuizService.evaluateAnswer()` — validate correctness, return `AnswerResult`

### Step 5 — Implement answer endpoint - done

- `POST /api/v1/game/answer` route, controller, service method

### Step 6 — Global error handler

- Typed error classes (`ValidationError`, `NotFoundError`)
- Central error middleware in `app.ts`
- Remove temporary `safeParse` error handling from controllers

### Step 7 — Tests

- Unit tests for `QuizService` — correct POS filtering, distractor never equals correct answer
- Unit tests for `evaluateAnswer` — correct and incorrect cases
- Integration tests for both endpoints

### Step 8 — Auth (Phase 2 from original roadmap)

- OpenAuth integration
- JWT validation
  middleware
- `GET /api/auth/me` endpoint
- Frontend auth guard

---

## Open Questions

- **Distractor algorithm:** when Italian C2 has only 242 terms, should the difficulty filter fall back gracefully or return an error? Decision needed before implementing `buildSession()`. => resolved
- **Session statefulness:** game loop is currently stateless (fetch all questions upfront). Confirm this is still the intended MVP approach before building `buildSession()`. => resolved
- **Glosses can leak answers:** some WordNet glosses contain the target-language word in the definition text (e.g. "Padre" appearing in the English gloss for "father"). Address during the post-MVP data enrichment pass — either clean the glosses, replace them with custom definitions, or filter at the service layer. => resolved
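
The derived-type pattern from Problem 3 and the string-to-number coupling from Problem 6 can be illustrated with a small sketch. `GAME_ROUNDS` uses the values quoted in Problem 6; the contents of `SUPPORTED_LANGUAGE_CODES` and the helper names `isSupportedLanguageCode` and `toRoundCount` are assumptions made for illustration, not the project's actual code.

```typescript
// Illustrative sketch only. SUPPORTED_LANGUAGE_CODES values and both helper
// names are assumptions; GAME_ROUNDS matches the values quoted in Problem 6.
const SUPPORTED_LANGUAGE_CODES = ["en", "it"] as const;
const GAME_ROUNDS = ["3", "10"] as const;

// Union types derived from the constants: "en" | "it" and "3" | "10".
// Adding a value to the array widens the type automatically.
type SupportedLanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number];
type GameRounds = (typeof GAME_ROUNDS)[number];

/** Type guard that narrows an arbitrary string to a supported language code. */
function isSupportedLanguageCode(value: string): value is SupportedLanguageCode {
  return (SUPPORTED_LANGUAGE_CODES as readonly string[]).includes(value);
}

/** Rounds stay strings at the validation boundary and become numbers in the service. */
function toRoundCount(rounds: GameRounds): number {
  return Number(rounds);
}
```

Because `GameRounds` is derived from the constant, a call such as `toRoundCount("7")` fails to compile, which keeps the string-to-number conversion confined to values the schema already accepts.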
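
Problem 11 and the option-shuffling decision can be sketched as follows. This is a minimal example of a Fisher-Yates shuffle that type-checks under `noUncheckedIndexedAccess`, followed by assigning `optionId` values after the shuffle. The function names `shuffle` and `buildOptions`, and the injectable `random` parameter, are hypothetical and exist only to make the sketch testable.

```typescript
// Illustrative sketch only: shuffle and buildOptions are hypothetical names.
type AnswerOption = { optionId: number; text: string };

/** Fisher-Yates shuffle that returns a new array and leaves the input untouched. */
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    // i and j are always within bounds here, so the non-null assertions are
    // safe; the temp variable makes the swap explicit.
    const temp = result[i]!;
    result[i] = result[j]!;
    result[j] = temp;
  }
  return result;
}

/** Mixes the correct answer with the distractors, then numbers the options 0..n-1. */
function buildOptions(
  correctText: string,
  distractors: readonly string[],
  random: () => number = Math.random,
): { options: AnswerOption[]; correctOptionId: number } {
  const shuffled = shuffle([correctText, ...distractors], random);
  const options = shuffled.map((text, optionId) => ({ optionId, text }));
  // Assumes no distractor shares the correct text, which the distractor
  // query is described as guaranteeing.
  const correctOptionId = shuffled.indexOf(correctText);
  return { options, correctOptionId };
}
```

Only `options` would be sent to the client; `correctOptionId` is the value that belongs in the session store's answer key.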
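
The `GameSessionStore` decisions (answer key only, async methods even in memory) can be sketched like this. The method names `save` and `getCorrectOptionId`, the `AnswerKey` type, and the `evaluateAnswer` signature are assumptions for illustration; the real interface may differ.

```typescript
// Illustrative sketch only: method names and signatures are assumptions.
type AnswerSubmission = { sessionId: string; questionId: string; selectedOptionId: number };
type AnswerResult = {
  questionId: string;
  isCorrect: boolean;
  correctOptionId: number;
  selectedOptionId: number;
};

/** Answer key for one session: questionId -> correctOptionId. */
type AnswerKey = Record<string, number>;

/** Async on purpose, so a Valkey-backed store can implement the same contract. */
interface GameSessionStore {
  save(sessionId: string, answerKey: AnswerKey): Promise<void>;
  getCorrectOptionId(sessionId: string, questionId: string): Promise<number | undefined>;
}

class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, Map<string, number>>();

  async save(sessionId: string, answerKey: AnswerKey): Promise<void> {
    this.sessions.set(sessionId, new Map(Object.entries(answerKey)));
  }

  async getCorrectOptionId(sessionId: string, questionId: string): Promise<number | undefined> {
    return this.sessions.get(sessionId)?.get(questionId);
  }
}

/** Compares the submitted option with the stored answer key. */
async function evaluateAnswer(
  store: GameSessionStore,
  submission: AnswerSubmission,
): Promise<AnswerResult> {
  const correctOptionId = await store.getCorrectOptionId(
    submission.sessionId,
    submission.questionId,
  );
  if (correctOptionId === undefined) {
    throw new Error("Unknown session or question");
  }
  return {
    questionId: submission.questionId,
    isCorrect: submission.selectedOptionId === correctOptionId,
    correctOptionId,
    selectedOptionId: submission.selectedOptionId,
  };
}
```

Since the service depends only on the interface, swapping in a Valkey implementation later should not require changes to `evaluateAnswer`.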