# Glossa — Architecture & API Development Summary

A record of all architectural discussions, decisions, and outcomes from the initial
API design through the quiz model implementation.

---

## Project Overview

Glossa is a vocabulary trainer (Duolingo-style) built as a pnpm monorepo. Users see a
word and pick its translation from 4 options. The app supports singleplayer and multiplayer.
Stack: Express API, React frontend, Drizzle ORM, Postgres, Valkey, WebSockets.

---

## Architectural Foundation

### The Layered Architecture

The core mental model established for the entire API:

```text
HTTP Request
     ↓
  Router        — maps URL + HTTP method to a controller
     ↓
 Controller     — handles HTTP only: validates input, calls service, sends response
     ↓
  Service       — business logic only: no HTTP, no direct DB access
     ↓
  Model         — database queries only: no business logic
     ↓
  Database
```

**The rule:** each layer only talks to the layer directly below it. A controller never
touches the database. A service never reads `req.body`. A model never knows what a quiz is.
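
A minimal sketch of the layering in plain TypeScript. It deliberately uses no Express or Drizzle so that it runs on its own. The in-memory `TERMS` array stands in for the database. The names `findTermsByPos`, `startGame`, `isGameRounds` and `handleStartGame` are hypothetical and are not the real API. Each function only calls the layer directly below it.

```typescript
// Model layer: data access only. An in-memory array stands in for the database.
interface TermRow {
  id: number;
  pos: string;
  text: string;
}

const TERMS: TermRow[] = [
  { id: 1, pos: "noun", text: "casa" },
  { id: 2, pos: "noun", text: "libro" },
  { id: 3, pos: "noun", text: "cane" },
  { id: 4, pos: "verb", text: "correre" },
];

function findTermsByPos(pos: string, limit: number): TermRow[] {
  return TERMS.filter((t) => t.pos === pos).slice(0, limit);
}

// Service layer: business logic only. Knows nothing about requests or storage details.
function startGame(pos: string, rounds: "3" | "10"): string[] {
  const count = Number(rounds); // string -> number conversion lives in the service
  const rows = findTermsByPos(pos, count);
  if (rows.length < count) {
    // A business rule; mapping it to an HTTP status is not the service's job.
    throw new Error(`Not enough ${pos} terms for ${count} rounds`);
  }
  return rows.map((row) => row.text);
}

// Controller layer: HTTP only. Validates input, calls the service, shapes the response.
// A real controller would read req.body and call res.status(...).json(...).
interface HttpResponse {
  status: number;
  body: unknown;
}

function isGameRounds(value: unknown): value is "3" | "10" {
  return value === "3" || value === "10";
}

function handleStartGame(body: { pos?: unknown; rounds?: unknown }): HttpResponse {
  const { pos, rounds } = body;
  if (typeof pos !== "string" || !isGameRounds(rounds)) {
    return { status: 400, body: { error: 'pos (string) and rounds ("3" | "10") are required' } };
  }
  return { status: 200, body: startGame(pos, rounds) };
}
```

Calling `handleStartGame({ pos: "noun", rounds: "3" })` returns status 200 with the three noun texts. A numeric `rounds` is rejected with 400 before the service is reached.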

### Monorepo Package Responsibilities

| Package           | Owns                                                     |
| ----------------- | -------------------------------------------------------- |
| `packages/shared` | Zod schemas, constants, derived TypeScript types         |
| `packages/db`     | Drizzle schema, DB connection, all model/query functions |
| `apps/api`        | Router, controllers, services                            |
| `apps/web`        | React frontend, consumes types from shared               |

**Key principle:** all database code lives in `packages/db`. `apps/api` never imports
`drizzle-orm` for queries — it only calls functions exported from `packages/db`.

---

## Problems Faced & Solutions

- Problem 1: Messy API structure
  **Symptom:** responsibilities bleeding across layers — DB code in controllers, business
  logic in routes.
  **Solution:** strict layered architecture with one responsibility per layer.
- Problem 2: No shared contract between API and frontend
  **Symptom:** API could return different shapes silently, frontend breaks at runtime.
  **Solution:** Zod schemas in `packages/shared` as the single source of truth. Both API
  (validation) and frontend (type inference) consume the same schemas.
- Problem 3: Type safety gaps
  **Symptom:** TypeScript `any` types on model parameters, `Number` vs `number` confusion.
  **Solution:** types derived from `as const` constants using the `typeof CONSTANT[number]`
  pattern. All valid values are defined once in constants, and types are derived automatically.
- Problem 4: `getGameTerms` in wrong package
  **Symptom:** model queries living in `apps/api/src/models/` meant `apps/api` had a
  direct `drizzle-orm` dependency and was accessing the DB itself.
  **Solution:** moved models folder to `packages/db/src/models/`. All Drizzle code now
  lives in one package.
- Problem 5: Deck generation complexity
  **Initial assumption:** 12 decks needed (nouns/verbs × easy/intermediate/hard × en/it).
  **Correction:** decks are pools, not presets. POS and difficulty are query filters applied
  at runtime — not deck properties. Only 2 decks needed (en-core, it-core).
  **Final decision:** skip deck generation entirely for MVP. Query the terms table directly
  with difficulty + POS filters. Revisit post-MVP when spaced repetition or progression
  features require curated pools.
- Problem 6: GAME_ROUNDS type conflict
  **Problem:** `z.enum()` only accepts strings. `GAME_ROUNDS = ["3", "10"]` works with
  `z.enum()` but requires `Number(rounds)` conversion in the service.
  **Decision:** keep as strings and convert to a number in the service before passing to the
  model. The coupling is acknowledged in a code comment.
- Problem 7: Gloss join could multiply question rows
  **Symptom:** the schema allowed multiple glosses per term per language, so the gloss left
  join could return duplicate question rows.
  **Solution:** tightened the unique constraint on `term_glosses` to one gloss per term per
  language.
- Problem 8: Model leaked quiz semantics
  **Symptom:** return fields were named `prompt` / `answer`, baking quiz (service-layer)
  concepts into the database layer.
  **Solution:** renamed to neutral field names (`sourceText`, `targetText`, `sourceGloss`).
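
A sketch of the derived-type pattern from Problem 3 and the `rounds` conversion from Problem 6. Only `GAME_ROUNDS = ["3", "10"]` comes from this document. The `SUPPORTED_LANGUAGE_CODES` and `DIFFICULTY_LEVELS` values are assumptions taken from the deck discussion in Problem 5. The helper names `isSupportedLanguageCode` and `toRoundCount` are hypothetical.

```typescript
// Constants are declared once with `as const` so TypeScript keeps the literal types.
const SUPPORTED_LANGUAGE_CODES = ["en", "it"] as const; // assumed values
const DIFFICULTY_LEVELS = ["easy", "intermediate", "hard"] as const; // assumed values
const GAME_ROUNDS = ["3", "10"] as const;

// Derived union types: adding a value to a constant updates the type automatically.
type SupportedLanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number]; // "en" | "it"
type DifficultyLevel = (typeof DIFFICULTY_LEVELS)[number]; // "easy" | "intermediate" | "hard"
type GameRounds = (typeof GAME_ROUNDS)[number]; // "3" | "10"

// Runtime guard built from the same constant, so the value list and the type cannot drift.
function isSupportedLanguageCode(value: string): value is SupportedLanguageCode {
  return (SUPPORTED_LANGUAGE_CODES as readonly string[]).includes(value);
}

// z.enum() only accepts strings, so rounds arrive as "3" | "10".
// The service converts once, before calling the model, which expects a lowercase `number`.
function toRoundCount(rounds: GameRounds): number {
  return Number(rounds);
}
```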

---

## Decisions Made

- Zod schemas belong in `packages/shared`
  Both the API and frontend import from the same schemas. If the shape changes, TypeScript
  compilation fails in both places simultaneously — silent drift is impossible.
- Server-side answer evaluation
  The correct answer is never sent to the frontend in `QuizQuestion`. It is only revealed
  in `AnswerResult` after the client submits. Prevents cheating and keeps game logic
  authoritative on the server.
- `safeParse` over `parse` in controllers
  `parse` throws a raw Zod error → ugly 500 response. `safeParse` returns a result object
  → clean 400 with early return. A global error handler (Step 6 of the roadmap) will later
  centralise this pattern.
- POST not GET for game start
  A `GET` request body has no defined semantics, and many clients and proxies drop it. Game
  configuration is submitted as a JSON body → `POST` is semantically correct.
- `express.json()` middleware required
  Without it, `req.body` is `undefined`. Added to `createApp()` in `app.ts`.
- Type naming: PascalCase
  TypeScript convention. `supportedLanguageCode` → `SupportedLanguageCode` etc.
- Primitive types: always lowercase
  `number` not `Number`, `string` not `String`. The uppercase versions are object wrappers
  and not assignable to Drizzle's expected primitive types.
- Model parameters use shared types, not `GameRequestType`
  The model layer should not know about `GameRequestType` — that's an HTTP boundary concern.
  Instead, parameters are typed using the derived constant types (`SupportedLanguageCode`,
  `SupportedPos`, `DifficultyLevel`) exported from `packages/shared`.
- One gloss per term per language
  The unique constraint on `term_glosses` was tightened from `(term_id, language_code, text)`
  to `(term_id, language_code)` to prevent the left join from multiplying question rows.
  Revisit if multiple glosses per language are ever needed (e.g. register or domain variants).
- Model returns neutral field names, not quiz semantics
  `getGameTerms` returns `sourceText` / `targetText` / `sourceGloss` rather than `prompt` /
  `answer` / `gloss`. Quiz semantics are applied in the service layer, which keeps the model
  reusable for non-quiz features.
- Asymmetric difficulty filter
  Difficulty is filtered on the target (answer) side only. A word can be A2 in Italian but B1
  in English, and what matters is the difficulty of the word being learned.
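
The following is a small in-memory simulation of a left join from terms to glosses. It shows why the old constraint could multiply rows. The rows and the `leftJoinGlosses` helper are hypothetical. The point is only this: a term with two glosses in the same language yields two question rows, and the `(term_id, language_code)` constraint rules that case out.

```typescript
interface TermRow { termId: number; sourceText: string; }
interface GlossRow { termId: number; languageCode: string; text: string; }
interface JoinedRow { termId: number; sourceText: string; sourceGloss: string | null; }

// LEFT JOIN semantics: a term with no matching gloss appears once (gloss = null);
// otherwise it appears once per matching gloss row.
function leftJoinGlosses(terms: TermRow[], glosses: GlossRow[], languageCode: string): JoinedRow[] {
  const out: JoinedRow[] = [];
  for (const term of terms) {
    const matches = glosses.filter((g) => g.termId === term.termId && g.languageCode === languageCode);
    if (matches.length === 0) {
      out.push({ ...term, sourceGloss: null });
    } else {
      for (const g of matches) out.push({ ...term, sourceGloss: g.text });
    }
  }
  return out;
}

const terms: TermRow[] = [
  { termId: 1, sourceText: "house" },
  { termId: 2, sourceText: "book" },
];

// Allowed under the old (term_id, language_code, text) constraint: two English glosses for term 1.
const oldGlosses: GlossRow[] = [
  { termId: 1, languageCode: "en", text: "a building where people live" },
  { termId: 1, languageCode: "en", text: "a dwelling" },
];

// Under (term_id, language_code), at most one English gloss per term can exist.
const newGlosses: GlossRow[] = [
  { termId: 1, languageCode: "en", text: "a building where people live" },
];

const before = leftJoinGlosses(terms, oldGlosses, "en"); // 3 rows for 2 terms
const after = leftJoinGlosses(terms, newGlosses, "en"); // 2 rows for 2 terms
```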

---

## Data Pipeline Work (Pre-API)

### CEFR Enrichment Pipeline (completed)

A staged ETL pipeline was built to enrich translation records with CEFR levels and
difficulty ratings:

```text
Raw source files
      ↓
extract-*.py      — normalise each source to standard JSON
      ↓
compare-*.py      — quality gate: surface conflicts between sources (read-only)
      ↓
merge-*.py        — resolve conflicts by source priority, derive difficulty
      ↓
enrich.ts         — write cefr_level + difficulty to DB translations table
```

**Source priority:**

- English: `en_m3` > `cefrj` > `octanove` > `random`
- Italian: `it_m3` > `italian`
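
The merge step itself is a Python script. The sketch below restates only its resolution rule, in TypeScript. It assumes that each source has already been normalised by the extract step into a map from word to CEFR level. The `resolveCefr` name and the input shape are hypothetical. Difficulty derivation is left out because its mapping is not described here.

```typescript
type CefrLevel = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";

// Source priority from the list above, highest first.
const EN_PRIORITY = ["en_m3", "cefrj", "octanove", "random"] as const;
const IT_PRIORITY = ["it_m3", "italian"] as const;

// Hypothetical shape: source name -> (word -> CEFR level). Missing sources are allowed.
type SourceLevels = Record<string, Map<string, CefrLevel>>;

// Returns the level from the highest-priority source that knows the word,
// or undefined if no source has it (the word stays un-enriched).
function resolveCefr(
  word: string,
  priority: readonly string[],
  sources: SourceLevels,
): CefrLevel | undefined {
  for (const source of priority) {
    const level = sources[source]?.get(word);
    if (level !== undefined) return level;
  }
  return undefined;
}
```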

**Enrichment results:**

| Language | Enriched | Total   | Coverage |
| -------- | -------- | ------- | -------- |
| English  | 42,527   | 171,394 | ~25%     |
| Italian  | 23,061   | 54,603  | ~42%     |

Both languages have sufficient coverage for MVP. Italian C2 has only 242 terms — noted
as a potential constraint for the distractor algorithm at high difficulty.

---

## API Schemas (packages/shared)

### `GameRequestSchema` (implemented)

```typescript
{
  source_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  target_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  pos: z.enum(SUPPORTED_POS),
  difficulty: z.enum(DIFFICULTY_LEVELS),
  rounds: z.enum(GAME_ROUNDS),
}
```

### Planned schemas (not yet implemented)

```text
QuizQuestion      — questionId, prompt, optional gloss, 4 options (no correct answer)
QuizOption        — optionId + text
AnswerSubmission  — questionId + selectedOptionId
AnswerResult      — correct boolean, correctOptionId, selectedOptionId
```
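
Until the Zod schemas exist, the planned shapes can be sketched as plain TypeScript types. The field types are assumptions, and so are the server-only `ServerQuestion` type (which carries `correctOptionId`) and the `toQuizQuestion` helper. Together they illustrate how the correct answer stays out of `QuizQuestion` and only appears in `AnswerResult`.

```typescript
interface QuizOption {
  optionId: number;
  text: string;
}

interface QuizQuestion {
  questionId: string;
  prompt: string;
  gloss?: string;
  options: QuizOption[]; // 4 options, no correct answer
}

interface AnswerSubmission {
  questionId: string;
  selectedOptionId: number;
}

interface AnswerResult {
  correct: boolean;
  correctOptionId: number;
  selectedOptionId: number;
}

// Hypothetical server-side record: the public question plus the answer key.
interface ServerQuestion extends QuizQuestion {
  correctOptionId: number;
}

// Explicit allow-list of public fields, so a server-only field is never leaked by accident.
function toQuizQuestion(q: ServerQuestion): QuizQuestion {
  return {
    questionId: q.questionId,
    prompt: q.prompt,
    options: q.options,
    ...(q.gloss !== undefined ? { gloss: q.gloss } : {}),
  };
}
```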

---

## API Endpoints

```text
POST /api/v1/game/start     GameRequest → QuizQuestion[]
POST /api/v1/game/answer    AnswerSubmission → AnswerResult
```

---

## Current File Structure (apps/api)

```text
apps/api/src/
├── app.ts                  — Express app, express.json() middleware
├── server.ts               — starts server on PORT
├── routes/
│   ├── apiRouter.ts        — mounts /health and /game routers
│   ├── gameRouter.ts       — POST /start → createGame controller
│   └── healthRouter.ts
├── controllers/
│   └── gameController.ts   — validates GameRequest, calls service
└── services/
    └── gameService.ts      — calls getGameTerms, returns raw rows
```

---

## Current File Structure (packages/db)

```text
packages/db/src/
├── db/
│   └── schema.ts           — Drizzle schema (terms, translations, users, decks...)
├── models/
│   └── termModel.ts        — getGameTerms() query
└── index.ts                — exports db connection + getGameTerms
```

---

## Completed Tasks

- [x] Layered architecture established and understood
- [x] `GameRequestSchema` defined in `packages/shared`
- [x] Derived types (`SupportedLanguageCode`, `SupportedPos`, `DifficultyLevel`) exported from constants
- [x] `getGameTerms()` model implemented with POS / language / difficulty / limit filters
- [x] Model correctly placed in `packages/db`
- [x] `prepareGameQuestions()` service skeleton calling the model
- [x] `createGame` controller with Zod `safeParse` validation
- [x] `POST /api/v1/game/start` route wired
- [x] End-to-end pipeline verified with test script — returns correct rows
- [x] CEFR enrichment pipeline complete for English and Italian
- [x] Double join on translations implemented (source + target language)
- [x] Gloss left join implemented
- [x] Model return type uses neutral field names (sourceText, targetText, sourceGloss)
- [x] Schema: gloss unique constraint tightened to one gloss per term per language

---

## Roadmap Ahead

### Step 1 — Learn SQL fundamentals (in progress)

Concepts needed: SELECT, FROM, JOIN, WHERE, LIMIT.
Resources: sqlzoo.net or Khan Academy SQL section.
Required before: implementing the double join for source language prompt.

### Step 2 — Complete the model layer (completed)

- Double join on `translations` — once for source language (prompt), once for target language (answer)
- Gloss fetched via a left join inside `getGameTerms()` instead of a separate `GlossModel.getGloss(termId, languageCode)`

### Step 3 — Define remaining Zod schemas

- `QuizQuestion`, `QuizOption`, `AnswerSubmission`, `AnswerResult` in `packages/shared`

### Step 4 — Complete the service layer

- `QuizService.buildSession()` — assemble raw rows into `QuizQuestion[]`
  - Generate `questionId` per question
  - Map source language translation as prompt
  - Attach gloss if available
  - Fetch 3 distractors (same POS, different term, same difficulty)
  - Shuffle options so correct answer is not always in same position
- `QuizService.evaluateAnswer()` — validate correctness, return `AnswerResult`
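
A minimal sketch of the assembly part of `buildSession()`, under several assumptions:

- The raw row shape mirrors the neutral fields returned by `getGameTerms()`.
- Distractors are drawn from a caller-supplied pool that is already filtered to the same POS and difficulty.
- Question IDs come from an injected generator.
- Shuffling uses Fisher-Yates with an injectable random source, so tests can be deterministic.

`GameTermRow`, `BuiltQuestion` and the parameter list are illustrative. `correctOptionId` is meant to stay server-side.

```typescript
// Assumed row shape, mirroring getGameTerms()'s neutral field names.
interface GameTermRow {
  termId: number;
  sourceText: string;
  targetText: string;
  sourceGloss: string | null;
}

interface QuizOption { optionId: number; text: string; }

interface BuiltQuestion {
  questionId: string;
  prompt: string;
  gloss?: string;
  options: QuizOption[];
  correctOptionId: number; // kept server-side, never sent to the client
}

// Fisher-Yates shuffle on a copy; `random` must return a float in [0, 1).
function shuffle<T>(items: readonly T[], random: () => number): T[] {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
  return a;
}

function buildSession(
  rows: GameTermRow[],
  distractorPool: GameTermRow[],
  nextId: () => string,
  random: () => number = Math.random,
): BuiltQuestion[] {
  return rows.map((row) => {
    // Candidates: a different term whose answer text also differs from the correct one.
    const candidates = distractorPool.filter(
      (d) => d.termId !== row.termId && d.targetText !== row.targetText,
    );
    // Dedupe by text so two options never look identical.
    const uniqueTexts = Array.from(new Set(candidates.map((d) => d.targetText)));
    if (uniqueTexts.length < 3) {
      // How to handle a too-small pool is still an open question (see Open Questions).
      throw new Error(`Not enough distractors for term ${row.termId}`);
    }
    const distractors = shuffle(uniqueTexts, random).slice(0, 3);
    const texts = shuffle([row.targetText, ...distractors], random);
    const options = texts.map((text, index) => ({ optionId: index + 1, text }));
    const correct = options.find((o) => o.text === row.targetText)!;
    const question: BuiltQuestion = {
      questionId: nextId(),
      prompt: row.sourceText, // source language translation is the prompt
      options,
      correctOptionId: correct.optionId,
    };
    if (row.sourceGloss !== null) question.gloss = row.sourceGloss;
    return question;
  });
}
```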

### Step 5 — Implement answer endpoint

- `POST /api/v1/game/answer` route, controller, service method
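
A sketch of `evaluateAnswer()` that keeps the answer key on the server. It assumes the correct option for each `questionId` was recorded when the session was built. An in-memory `Map` stands in for whatever store is chosen; Valkey is in the stack, but that choice is not made here. Note that server-side evaluation implies keeping this key somewhere between the two requests, which bears on the session-statefulness question. `AnswerKeyStore` and `UnknownQuestionError` are hypothetical names.

```typescript
interface AnswerSubmission { questionId: string; selectedOptionId: number; }
interface AnswerResult { correct: boolean; correctOptionId: number; selectedOptionId: number; }

// Hypothetical store: questionId -> correctOptionId, written when the session is built.
type AnswerKeyStore = Map<string, number>;

class UnknownQuestionError extends Error {
  constructor(questionId: string) {
    super(`Unknown question: ${questionId}`);
    this.name = "UnknownQuestionError";
  }
}

function evaluateAnswer(store: AnswerKeyStore, submission: AnswerSubmission): AnswerResult {
  const correctOptionId = store.get(submission.questionId);
  if (correctOptionId === undefined) {
    throw new UnknownQuestionError(submission.questionId);
  }
  return {
    correct: submission.selectedOptionId === correctOptionId,
    correctOptionId,
    selectedOptionId: submission.selectedOptionId,
  };
}
```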

### Step 6 — Global error handler

- Typed error classes (`ValidationError`, `NotFoundError`)
- Central error middleware in `app.ts`
- Remove temporary `safeParse` error handling from controllers
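
A framework-free sketch of the typed error classes and of the status mapping a central middleware could apply. Only `ValidationError` and `NotFoundError` come from the plan above. The `AppError` base class, the `statusCode` and `issues` fields, the `toErrorResponse` helper and the response body shape are all assumptions. In Express, the error middleware is a four-argument `(err, req, res, next)` handler registered after the routers, and it could call a helper like this one and send the result.

```typescript
// Base class for errors whose message is safe to show to the client.
class AppError extends Error {
  constructor(message: string, readonly statusCode: number) {
    super(message);
    // Restore the prototype chain so instanceof works even when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

class ValidationError extends AppError {
  constructor(message: string, readonly issues: string[] = []) {
    super(message, 400);
  }
}

class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string; issues?: string[] };
}

function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof ValidationError) {
    return { status: err.statusCode, body: { error: err.message, issues: err.issues } };
  }
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  // Anything else is unexpected: hide the details from the client.
  return { status: 500, body: { error: "Internal server error" } };
}
```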

### Step 7 — Tests

- Unit tests for `QuizService` — correct POS filtering, distractor never equals correct answer
- Unit tests for `evaluateAnswer` — correct and incorrect cases
- Integration tests for both endpoints

### Step 8 — Auth (Phase 2 from original roadmap)

- OpenAuth integration
- JWT validation middleware
- `GET /api/auth/me` endpoint
- Frontend auth guard

---

## Open Questions

- **Distractor algorithm:** Italian C2 has only 242 terms, so a strict difficulty filter may
  not yield enough candidates. Should it fall back gracefully or return an error? Decision
  needed before implementing `buildSession()`.
- **Session statefulness:** game loop is currently stateless (fetch all questions upfront).
  Confirm this is still the intended MVP approach before building `buildSession()`.
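
The sketch below makes the distractor trade-off concrete for just one of the two options, graceful fallback. It widens the difficulty band outward from the requested level, trying the easier neighbour first, until enough candidates are available. It fails only when the whole range is too small. The CEFR ordering, the `pickDistractorLevels` name and its inputs are assumptions. Returning an error straight away remains an equally valid answer.

```typescript
// Levels ordered from easiest to hardest (assumption: plain CEFR order).
const LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"] as const;
type Level = (typeof LEVELS)[number];

// Returns the levels to query, nearest first, stopping as soon as `needed`
// candidates are available. Throws if even the full range is too small.
function pickDistractorLevels(
  requested: Level,
  countByLevel: Record<Level, number>,
  needed: number,
): Level[] {
  const start = LEVELS.indexOf(requested);
  const chosen: Level[] = [requested];
  let available = countByLevel[requested];
  // Widen one step at a time: easier neighbour first, then harder neighbour.
  for (let d = 1; available < needed && d < LEVELS.length; d++) {
    for (const i of [start - d, start + d]) {
      if (i >= 0 && i < LEVELS.length && available < needed) {
        chosen.push(LEVELS[i]);
        available += countByLevel[LEVELS[i]];
      }
    }
  }
  if (available < needed) {
    throw new Error(`Only ${available} candidates available, ${needed} needed`);
  }
  return chosen;
}
```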