2026-04-11 21:32:13 +02:00

# Glossa — Architecture & API Development Summary
A record of all architectural discussions, decisions, and outcomes from the initial
API design through the quiz model implementation.

---
## Project Overview
Glossa is a vocabulary trainer (Duolingo-style) built as a pnpm monorepo. Users see a
word and pick from 4 possible translations. It supports singleplayer and multiplayer.

Stack: Express API, React frontend, Drizzle ORM, Postgres, Valkey, WebSockets.

---
## Architectural Foundation
### The Layered Architecture
The core mental model established for the entire API:
```text
HTTP Request
Router — maps URL + HTTP method to a controller
Controller — handles HTTP only: validates input, calls service, sends response
Service — business logic only: no HTTP, no direct DB access
Model — database queries only: no business logic
Database
```
**The rule:** each layer only talks to the layer directly below it. A controller never
touches the database. A service never reads `req.body`. A model never knows what a quiz is.
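
To make the rule concrete, here is a minimal, dependency-free TypeScript sketch of the three layers. It is illustrative only: `TermRow`, `findTermsByPos`, `pickTerms`, and `handleGetTerms` are hypothetical names, the model is faked with an in-memory array instead of Drizzle, and the controller takes a plain `{ body }` object rather than an Express `Request`.

```typescript
// --- Model layer: data access only (faked here with an in-memory array) ---
type TermRow = { id: number; pos: string; text: string };

const TERMS: TermRow[] = [
  { id: 1, pos: "noun", text: "cane" },
  { id: 2, pos: "verb", text: "mangiare" },
  { id: 3, pos: "noun", text: "gatto" },
];

async function findTermsByPos(pos: string, limit: number): Promise<TermRow[]> {
  return TERMS.filter((row) => row.pos === pos).slice(0, limit);
}

// --- Service layer: business logic only; never sees req or res ---
async function pickTerms(pos: string, rounds: number): Promise<string[]> {
  const rows = await findTermsByPos(pos, rounds);
  return rows.map((row) => row.text);
}

// --- Controller layer: HTTP concerns only (validate, call service, respond) ---
type HttpResult = { status: number; body: unknown };

async function handleGetTerms(req: { body: unknown }): Promise<HttpResult> {
  const { pos, rounds } = (req.body ?? {}) as { pos?: unknown; rounds?: unknown };
  if (typeof pos !== "string" || typeof rounds !== "number") {
    return { status: 400, body: { error: "pos (string) and rounds (number) are required" } };
  }
  return { status: 200, body: { texts: await pickTerms(pos, rounds) } };
}
```

Because only `findTermsByPos` touches data, replacing the fake array with a real query would leave the service and controller unchanged.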
### Monorepo Package Responsibilities
| Package | Owns |
| ----------------- | -------------------------------------------------------- |
| `packages/shared` | Zod schemas, constants, derived TypeScript types |
| `packages/db` | Drizzle schema, DB connection, all model/query functions |
| `apps/api` | Router, controllers, services |
| `apps/web` | React frontend, consumes types from shared |

**Key principle:** all database code lives in `packages/db`. `apps/api` never imports
`drizzle-orm` for queries; it only calls functions exported from `packages/db`.

---
## Problems Faced & Solutions
- Problem 1: Messy API structure
  - **Symptom:** responsibilities bleeding across layers: DB code in controllers, business
    logic in routes.
  - **Solution:** strict layered architecture with one responsibility per layer.
- Problem 2: No shared contract between API and frontend
  - **Symptom:** the API could silently return different shapes, and the frontend broke at runtime.
  - **Solution:** Zod schemas in `packages/shared` as the single source of truth. Both the API
    (validation) and the frontend (type inference) consume the same schemas.
- Problem 3: Type safety gaps
  - **Symptom:** TypeScript `any` types on model parameters, `Number` vs `number` confusion.
  - **Solution:** types derived from constants using the `typeof CONSTANT[number]` pattern.
    All valid values are defined once in constants, and the types are derived automatically.
- Problem 4: `getGameTerms` in wrong package
  - **Symptom:** model queries living in `apps/api/src/models/` meant `apps/api` had a
    direct `drizzle-orm` dependency and was accessing the DB itself.
  - **Solution:** moved the models folder to `packages/db/src/models/`. All Drizzle code now
    lives in one package.
- Problem 5: Deck generation complexity
  - **Initial assumption:** 12 decks needed (nouns/verbs × easy/intermediate/hard × en/it).
  - **Correction:** decks are pools, not presets. POS and difficulty are query filters applied
    at runtime, not deck properties. Only 2 decks are needed (en-core, it-core).
  - **Final decision:** skip deck generation entirely for MVP. Query the terms table directly
    with difficulty + POS filters. Revisit post-MVP when spaced repetition or progression
    features require curated pools.
- Problem 6: `GAME_ROUNDS` type conflict
  - **Problem:** `z.enum()` only accepts strings. `GAME_ROUNDS = ["3", "10"]` works with
    `z.enum()` but requires a `Number(rounds)` conversion in the service.
  - **Decision:** keep the values as strings and convert to a number in the service before
    passing them to the model. The coupling is acknowledged in a code comment.
- Problem 7: Gloss join could multiply question rows. The schema allowed multiple glosses per term per language, so the left join duplicated rows. Fixed by tightening the unique constraint.
- Problem 8: Model leaked quiz semantics. Return fields were named `prompt` / `answer`, baking service-layer (quiz) concepts into the database layer. Renamed to neutral field names.
- Problem 9: `AnswerResult` wasn't self-contained. The frontend needed `selectedOptionId` to render feedback, but the schema didn't include it (the reasoning was "the client already knows"). Discovered during frontend work; the field was added.
- Problem 10: A distractor could duplicate the correct answer text, because different terms can share the same translation. Fixed with `ne(translations.text, excludeText)` in the query.
- Problem 11: TypeScript flagged array access in the Fisher-Yates shuffle. With `noUncheckedIndexedAccess` enabled (a separate flag, not part of `strict`), `result[i]` is typed `T | undefined`. Fixed with a non-null assertion and a temp-variable swap.
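
The Problem 11 fix can be sketched as follows, assuming `noUncheckedIndexedAccess` is enabled in `tsconfig.json`. The optional `random` parameter is an addition for deterministic testing and is not necessarily part of the project's version. Both indices are always in bounds inside the loop, which is what makes the non-null assertions safe.

```typescript
// Fisher-Yates shuffle that type-checks under `noUncheckedIndexedAccess`.
// Returns a new array; the input is left untouched.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    // Pick j uniformly from 0..i (inclusive).
    const j = Math.floor(random() * (i + 1));
    // result[i] is typed T | undefined under the flag; i and j are in bounds,
    // so asserting non-null is safe. The temp variable performs the swap.
    const temp = result[i]!;
    result[i] = result[j]!;
    result[j] = temp;
  }
  return result;
}
```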
---
## Decisions Made
- Zod schemas belong in `packages/shared`.
  Both the API and frontend import from the same schemas. If the shape changes, TypeScript
  compilation fails in both places simultaneously, so silent drift is impossible.
- Server-side answer evaluation.
  The correct answer is never sent to the frontend in `GameQuestion`. It is only revealed
  in `AnswerResult` after the client submits. This prevents cheating and keeps game logic
  authoritative on the server.
- `safeParse` over `parse` in controllers.
  `parse` throws a raw Zod error, which surfaces as an ugly 500 response. `safeParse` returns
  a result object, allowing a clean 400 with an early return. A global error handler (Step 6
  of the roadmap) will centralise this pattern later.
- POST, not GET, for game start.
  `GET` request bodies have no defined semantics, and clients, proxies, or servers may drop
  or reject them. Game configuration is submitted as a JSON body, so `POST` is semantically correct.
- `express.json()` middleware required.
  Without it, `req.body` is `undefined`. Added to `createApp()` in `app.ts`.
- Type naming: PascalCase.
  TypeScript convention: `supportedLanguageCode` → `SupportedLanguageCode`, etc.
- Primitive types: always lowercase.
  `number` not `Number`, `string` not `String`. The uppercase versions are object wrapper
  types and are not assignable to Drizzle's expected primitive types.
- Model parameters use shared types, not `GameRequestType`.
  The model layer should not know about `GameRequestType`, which is an HTTP boundary concern.
  Instead, parameters are typed using the derived constant types (`SupportedLanguageCode`,
  `SupportedPos`, `DifficultyLevel`) exported from `packages/shared`.
- One gloss per term per language. The unique constraint on `term_glosses` was tightened from `(term_id, language_code, text)` to `(term_id, language_code)` to prevent the left join from multiplying question rows. Revisit if multiple glosses per language are ever needed (e.g. register or domain variants).
- Model returns neutral field names, not quiz semantics. `getGameTerms` returns `sourceText` / `targetText` / `sourceGloss` rather than `prompt` / `answer` / `gloss`. Quiz semantics are applied in the service layer, which keeps the model reusable for non-quiz features.
- Asymmetric difficulty filter. Difficulty is filtered on the target (answer) side only. A word can be A2 in Italian but B1 in English, and what matters is the difficulty of the word being learned.
- `optionId` as integer 0-3, not UUID. Options only need to be unique within a single question; cheating is prevented by shuffling, not by opaque IDs.
- `questionId` and `sessionId` as UUIDs. They are globally unique, opaque, and make natural Valkey keys when storage moves later.
- `gloss` is `string | null` rather than optional, giving the frontend a predictable shape.
- `GameSessionStore` stores only the answer key (`questionId` → `correctOptionId`). The minimal payload makes the Valkey migration easy.
- All `GameSessionStore` methods are async, even in the in-memory implementation, so the service layer is already written for Valkey.
- Distractors are fetched per question (N+1 queries). This is the right shape for the problem; 10 queries against local Postgres add negligible latency.
- No fallback logic for insufficient distractors. Data volumes are sufficient; the strict query throws if something is genuinely broken.
- The distractor query excludes both the correct term ID and the correct answer text, preventing duplicate options from different terms with the same translation.
- Submit-before-send flow on the frontend: the user selects an option, then confirms. This prevents misclicks.
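
The store decisions above (answer key only, async everywhere) can be sketched as below. The method names `create`, `get`, and `delete` and the `AnswerKey` type are hypothetical; the document only fixes the interface name, the Map-backed in-memory implementation, and the stored payload (`questionId` → `correctOptionId`). A plain object is used for the answer key on the assumption that it would serialise unchanged to JSON for a Valkey value later.

```typescript
// Answer key for one session: questionId → correctOptionId (0-3).
type AnswerKey = Record<string, number>;

// Every method returns a Promise so callers are already written for Valkey.
interface GameSessionStore {
  create(sessionId: string, answerKey: AnswerKey): Promise<void>;
  get(sessionId: string): Promise<AnswerKey | undefined>;
  delete(sessionId: string): Promise<void>;
}

class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, AnswerKey>();

  async create(sessionId: string, answerKey: AnswerKey): Promise<void> {
    // Store a copy so later mutation by the caller cannot change the answers.
    this.sessions.set(sessionId, { ...answerKey });
  }

  async get(sessionId: string): Promise<AnswerKey | undefined> {
    return this.sessions.get(sessionId);
  }

  async delete(sessionId: string): Promise<void> {
    this.sessions.delete(sessionId);
  }
}
```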
---
## Data Pipeline Work (Pre-API)
### CEFR Enrichment Pipeline (completed)
A staged ETL pipeline was built to enrich translation records with CEFR levels and
difficulty ratings:
```text
Raw source files
extract-*.py — normalise each source to standard JSON
compare-*.py — quality gate: surface conflicts between sources (read-only)
merge-*.py — resolve conflicts by source priority, derive difficulty
enrich.ts — write cefr_level + difficulty to DB translations table
```
**Source priority:**
- English: `en_m3` > `cefrj` > `octanove` > `random`
- Italian: `it_m3` > `italian`
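
The merge step's priority rule can be illustrated with a small TypeScript sketch. The actual merge scripts are Python (`merge-*.py`) and are not reproduced here; `resolveByPriority` and the per-source input shape are hypothetical, and the difficulty derivation is left out because its mapping is not described in this document.

```typescript
type CefrLevel = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";

// Priority order from the list above, highest first.
const EN_PRIORITY = ["en_m3", "cefrj", "octanove", "random"] as const;
const IT_PRIORITY = ["it_m3", "italian"] as const;

// CEFR level reported by each source that covers a given word (missing = no opinion).
type SourceLevels = Partial<Record<string, CefrLevel>>;

// Resolve a conflict by taking the level from the highest-priority source that
// has one. Returns null when no source covers the word.
function resolveByPriority(levels: SourceLevels, priority: readonly string[]): CefrLevel | null {
  for (const source of priority) {
    const level = levels[source];
    if (level !== undefined) return level;
  }
  return null;
}
```
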

**Enrichment results:**

| Language | Enriched | Total | Coverage |
| -------- | -------- | ------- | -------- |
| English | 42,527 | 171,394 | ~25% |
| Italian | 23,061 | 54,603 | ~42% |

Both languages have sufficient coverage for MVP. Italian C2 has only 242 terms, which was
noted as a potential constraint for the distractor algorithm at high difficulty.

---
## API Schemas (packages/shared)
### `GameRequestSchema`
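```typescript
import { z } from "zod";
// Constant tuples declared `as const` in packages/shared (module path assumed).
import { DIFFICULTY_LEVELS, GAME_ROUNDS, SUPPORTED_LANGUAGE_CODES, SUPPORTED_POS } from "./constants";

export const GameRequestSchema = z.object({
  source_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  target_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  pos: z.enum(SUPPORTED_POS),
  difficulty: z.enum(DIFFICULTY_LEVELS),
  rounds: z.enum(GAME_ROUNDS), // string values; converted to a number in the service
});

// Inferred request type, used at the HTTP boundary only.
export type GameRequestType = z.infer<typeof GameRequestSchema>;
```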
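
For `z.enum()` to infer literal unions, each constant is declared `as const` so it becomes a readonly tuple of string literals. The sketch below, written without Zod so it runs standalone, shows that pattern, the derived types, and the service-side `rounds` conversion from Problem 6. The POS and difficulty values are hypothetical placeholders (the real lists are not given here), and `TermQuery` / `toTermQuery` are illustrative names, not the project's.

```typescript
const SUPPORTED_LANGUAGE_CODES = ["en", "it"] as const;
const SUPPORTED_POS = ["noun", "verb"] as const; // hypothetical values
const DIFFICULTY_LEVELS = ["easy", "intermediate", "hard"] as const; // hypothetical values
const GAME_ROUNDS = ["3", "10"] as const;

type SupportedLanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number]; // "en" | "it"
type SupportedPos = (typeof SUPPORTED_POS)[number];
type DifficultyLevel = (typeof DIFFICULTY_LEVELS)[number];
type GameRounds = (typeof GAME_ROUNDS)[number]; // "3" | "10"

// Shape of a validated request (what z.infer would produce for GameRequestSchema).
type GameRequest = {
  source_language: SupportedLanguageCode;
  target_language: SupportedLanguageCode;
  pos: SupportedPos;
  difficulty: DifficultyLevel;
  rounds: GameRounds;
};

// Model-facing parameters use the derived types, not the HTTP request type.
type TermQuery = {
  sourceLanguage: SupportedLanguageCode;
  targetLanguage: SupportedLanguageCode;
  pos: SupportedPos;
  difficulty: DifficultyLevel;
  limit: number;
};

// Service-side mapping: the one place where the string "10" becomes the number 10.
function toTermQuery(req: GameRequest): TermQuery {
  return {
    sourceLanguage: req.source_language,
    targetLanguage: req.target_language,
    pos: req.pos,
    difficulty: req.difficulty,
    limit: Number(req.rounds),
  };
}
```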

### Game session and answer schemas

- `AnswerOption`: `{ optionId: number (0-3), text: string }`
- `GameQuestion`: `{ questionId: uuid, prompt: string, gloss: string | null, options: AnswerOption[4] }`
- `GameSession`: `{ sessionId: uuid, questions: GameQuestion[] }`
- `AnswerSubmission`: `{ sessionId: uuid, questionId: uuid, selectedOptionId: number (0-3) }`
- `AnswerResult`: `{ questionId: uuid, isCorrect: boolean, correctOptionId: number (0-3), selectedOptionId: number (0-3) }`
---
## API Endpoints
```text
POST /api/v1/game/start    GameRequest       → GameSession
POST /api/v1/game/answer   AnswerSubmission  → AnswerResult
```
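
The `/game/answer` contract can be sketched as a service function. This is a minimal illustration, not the project's `evaluateAnswer`: the `AnswerKeyLookup` function type and the local `NotFoundError` class are hypothetical, and the lookup stands in for whatever `GameSessionStore` read the real service performs. The submitted `selectedOptionId` is echoed back so the result is self-contained for the frontend.

```typescript
type AnswerSubmission = { sessionId: string; questionId: string; selectedOptionId: number };
type AnswerResult = {
  questionId: string;
  isCorrect: boolean;
  correctOptionId: number;
  selectedOptionId: number;
};

// Hypothetical read of the session store: sessionId → (questionId → correctOptionId).
type AnswerKeyLookup = (sessionId: string) => Promise<Record<string, number> | undefined>;

// Hypothetical error for unknown sessions or questions.
class NotFoundError extends Error {}

async function evaluateAnswer(
  submission: AnswerSubmission,
  lookup: AnswerKeyLookup,
): Promise<AnswerResult> {
  const answerKey = await lookup(submission.sessionId);
  if (answerKey === undefined) throw new NotFoundError("unknown session");
  const correctOptionId = answerKey[submission.questionId];
  if (correctOptionId === undefined) throw new NotFoundError("unknown question");
  return {
    questionId: submission.questionId,
    isCorrect: submission.selectedOptionId === correctOptionId,
    correctOptionId,
    selectedOptionId: submission.selectedOptionId,
  };
}
```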
---
## Current File Structure (apps/api)
```text
apps/api/src/
├── app.ts — Express app, express.json() middleware
├── server.ts — starts server on PORT
├── routes/
│ ├── apiRouter.ts — mounts /health and /game routers
│ ├── gameRouter.ts — POST /start → createGame controller
│ └── healthRouter.ts
├── controllers/
│ └── gameController.ts — validates GameRequest, calls service
└── services/
└── gameService.ts — calls getGameTerms, returns raw rows
```
---
## Current File Structure (packages/db)
```text
packages/db/src/
├── db/
│ └── schema.ts — Drizzle schema (terms, translations, users, decks...)
├── models/
│ └── termModel.ts — getGameTerms() query
└── index.ts — exports db connection + getGameTerms
```
---
## Completed Tasks
- [x] Layered architecture established and understood
- [x] `GameRequestSchema` defined in `packages/shared`
- [x] Derived types (`SupportedLanguageCode`, `SupportedPos`, `DifficultyLevel`) exported from constants
- [x] `getGameTerms()` model implemented with POS / language / difficulty / limit filters
- [x] Model correctly placed in `packages/db`
- [x] `prepareGameQuestions()` service skeleton calling the model
- [x] `createGame` controller with Zod `safeParse` validation
- [x] `POST /api/v1/game/start` route wired
- [x] End-to-end pipeline verified with test script — returns correct rows
- [x] CEFR enrichment pipeline complete for English and Italian
- [x] Double join on `translations` implemented (source + target language)
- [x] Gloss left join implemented
- [x] Model return type uses neutral field names (`sourceText`, `targetText`, `sourceGloss`)
- [x] Schema: gloss unique constraint tightened to one gloss per term per language
- [x] Zod schemas defined: `AnswerOption`, `GameQuestion`, `GameSession`, `AnswerSubmission`, `AnswerResult`
- [x] `getDistractors` model implemented with POS / difficulty / language / `excludeTermId` / `excludeText` filters
- [x] `createGameSession` service: fetches terms, fetches distractors per question, shuffles options, stores session, returns `GameSession`
- [x] `evaluateAnswer` service: looks up session, compares submitted `optionId` to stored correct answer, returns `AnswerResult`
- [x] `GameSessionStore` interface + `InMemoryGameSessionStore` (Map-backed, swappable to Valkey)
- [x] `POST /api/v1/game/answer` endpoint wired (route, controller, service)
- [x] `selectedOptionId` added to `AnswerResult` (discovered during frontend work)
- [x] Minimal frontend: `/play` route with settings UI, `QuestionCard`, `OptionButton`, `ScoreScreen`
- [x] Vite proxy configured for dev
---
## Roadmap Ahead
### Step 1 — Learn SQL fundamentals - done
Concepts needed: SELECT, FROM, JOIN, WHERE, LIMIT.
Resources: sqlzoo.net or Khan Academy SQL section.
Required before: implementing the double join for source language prompt.
### Step 2 — Complete the model layer - done
- Double join on `translations` — once for source language (prompt), once for target language (answer)
- `GlossModel.getGloss(termId, languageCode)` — fetch gloss if available
### Step 3 — Define remaining Zod schemas - done
- `QuizQuestion`, `QuizOption`, `AnswerSubmission`, `AnswerResult` in `packages/shared` (shipped as `GameQuestion`, `AnswerOption`, `GameSession`, `AnswerSubmission`, `AnswerResult`)
### Step 4 — Complete the service layer - done
- `QuizService.buildSession()` — assemble raw rows into `QuizQuestion[]`
- Generate `questionId` per question
- Map source language translation as prompt
- Attach gloss if available
- Fetch 3 distractors (same POS, different term, same difficulty)
- Shuffle options so correct answer is not always in same position
- `QuizService.evaluateAnswer()` — validate correctness, return `AnswerResult`
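
Step 4's assembly can be sketched for a single question. This assumes the model row carries the neutral fields `sourceText`, `targetText`, and `sourceGloss` (plus a hypothetical `termId`) and that three distractor texts have already been fetched. `buildQuestion` is a hypothetical name, and the shuffle is passed in (for example, a Fisher-Yates implementation) so the sketch stays deterministic under test.

```typescript
import { randomUUID } from "node:crypto";

// Neutral row as returned by the model; `termId` is a hypothetical field name.
type GameTermRow = { termId: number; sourceText: string; targetText: string; sourceGloss: string | null };
type AnswerOption = { optionId: number; text: string };
type GameQuestion = { questionId: string; prompt: string; gloss: string | null; options: AnswerOption[] };

// Applies quiz semantics in the service layer: sourceText becomes the prompt and
// targetText the correct option. The correct optionId is returned separately so
// it can go into the session store and never reaches the client.
function buildQuestion(
  row: GameTermRow,
  distractorTexts: readonly string[],
  shuffle: (texts: string[]) => string[],
): { question: GameQuestion; correctOptionId: number } {
  if (distractorTexts.length !== 3) throw new Error("expected exactly 3 distractors");
  const texts = shuffle([row.targetText, ...distractorTexts]);
  // indexOf is unambiguous because the distractor query excludes the correct text.
  const correctOptionId = texts.indexOf(row.targetText);
  return {
    question: {
      questionId: randomUUID(),
      prompt: row.sourceText,
      gloss: row.sourceGloss,
      options: texts.map((text, optionId) => ({ optionId, text })),
    },
    correctOptionId,
  };
}
```

The service would call this once per term row, collect `questionId` → `correctOptionId` pairs into the answer key, store it, and return the questions inside a `GameSession`.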
### Step 5 — Implement answer endpoint - done
- `POST /api/v1/game/answer` route, controller, service method
### Step 6 — Global error handler
- Typed error classes (`ValidationError`, `NotFoundError`)
- Central error middleware in `app.ts`
- Remove temporary `safeParse` error handling from controllers
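
A possible shape for Step 6, as a sketch rather than the planned implementation: `ValidationError` and `NotFoundError` come from the roadmap, while the shared `HttpError` base class, the `ResLike` type (a structural stand-in for Express's `Response`, so the block runs without Express), and the JSON error body are assumptions. Express treats a middleware function with four parameters as an error handler.

```typescript
// Base class: every typed error carries the HTTP status it maps to.
class HttpError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.status = status;
    this.name = new.target.name;
    // Keeps `instanceof` working even when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ValidationError extends HttpError {
  constructor(message: string) {
    super(400, message);
  }
}

class NotFoundError extends HttpError {
  constructor(message: string) {
    super(404, message);
  }
}

// Structural subset of Express's Response, so the sketch needs no imports.
type ResLike = { status(code: number): ResLike; json(body: unknown): unknown };

// Central error middleware: four parameters mark it as an error handler in Express.
function errorHandler(err: unknown, _req: unknown, res: ResLike, _next: unknown): void {
  if (err instanceof HttpError) {
    res.status(err.status).json({ error: err.message });
    return;
  }
  // Untyped errors are bugs: log them server-side and return a generic 500.
  console.error(err);
  res.status(500).json({ error: "Internal Server Error" });
}
```

In `app.ts` the handler would be registered after the routers (`app.use(errorHandler)`), and controllers would `throw new ValidationError(...)` instead of building 400 responses themselves. On Express 4, errors thrown from `async` handlers are not forwarded automatically and must be passed to `next(err)`; Express 5 forwards rejected promises itself.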
### Step 7 — Tests
- Unit tests for `QuizService` — correct POS filtering, distractor never equals correct answer
- Unit tests for `evaluateAnswer` — correct and incorrect cases
- Integration tests for both endpoints
### Step 8 — Auth (Phase 2 from original roadmap)
- OpenAuth integration
- JWT validation middleware
- `GET /api/auth/me` endpoint
- Frontend auth guard
---
## Open Questions
- **Distractor algorithm:** when Italian C2 has only 242 terms, should the difficulty
  filter fall back gracefully or return an error? Decision needed before implementing
  `buildSession()`. => resolved: no fallback logic; the strict query throws if something is genuinely broken.
- **Session statefulness:** game loop is currently stateless (fetch all questions upfront).
  Confirm this is still the intended MVP approach before building `buildSession()`. => resolved: all questions are still fetched upfront, and the server keeps only the answer key in `GameSessionStore`.
- **Glosses can leak answers:** some WordNet glosses contain the target-language
word in the definition text (e.g. "Padre" appearing in the English gloss for
"father"). Address during the post-MVP data enrichment pass — either clean the
glosses, replace them with custom definitions, or filter at the service layer. => resolved