feat(api): implement game terms query with double join

- Add double join on translations for source/target languages
- Left join term_glosses for optional source-language glosses
- Filter difficulty on target side only (intentionally asymmetric: a word's difficulty can differ between languages, and what matters is the difficulty of the word being learned)
- Return neutral field names (sourceText, targetText, sourceGloss) instead of quiz semantics; service layer maps to prompt/answer
- Tighten term_glosses unique constraint to (term_id, language_code) to prevent the left join from multiplying question rows
- Add TODO for ORDER BY RANDOM() scaling post-MVP
parent 9fc3ba375a
commit b59fac493d
4 changed files with 356 additions and 28 deletions
288
documentation/api-development.md
Normal file
@ -0,0 +1,288 @@
# Glossa — Architecture & API Development Summary

A record of all architectural discussions, decisions, and outcomes from the initial
API design through the quiz model implementation.

---

## Project Overview

Glossa is a vocabulary trainer (Duolingo-style) built as a pnpm monorepo. Users see a
word and pick from 4 possible translations. Supports singleplayer and multiplayer.
Stack: Express API, React frontend, Drizzle ORM, Postgres, Valkey, WebSockets.

---

## Architectural Foundation

### The Layered Architecture

The core mental model established for the entire API:

```
HTTP Request
    ↓
Router      — maps URL + HTTP method to a controller
    ↓
Controller  — handles HTTP only: validates input, calls service, sends response
    ↓
Service     — business logic only: no HTTP, no direct DB access
    ↓
Model       — database queries only: no business logic
    ↓
Database
```

**The rule:** each layer only talks to the layer directly below it. A controller never
touches the database. A service never reads `req.body`. A model never knows what a quiz is.
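
The rule can be sketched in plain TypeScript, with ordinary functions standing in for Express handlers and Drizzle queries. Every name below (`TermRow`, `FAKE_DB`, `findTerms`, `pickTerms`, `handleStart`) is hypothetical and exists only for illustration; the real code is asynchronous and uses Express, Zod and Drizzle.

```typescript
// Hypothetical row shape; the real model returns rows from Postgres via Drizzle.
type TermRow = { termId: string; sourceText: string; targetText: string };

// In-memory stand-in for the database so the sketch runs on its own.
const FAKE_DB: TermRow[] = [
  { termId: "t1", sourceText: "dog", targetText: "cane" },
  { termId: "t2", sourceText: "cat", targetText: "gatto" },
  { termId: "t3", sourceText: "house", targetText: "casa" },
];

// Model: data access only. It returns rows and knows nothing about quizzes.
// (Real queries are async; kept synchronous here for brevity.)
function findTerms(limit: number): TermRow[] {
  return FAKE_DB.slice(0, limit);
}

// Service: business logic only. It calls the model and never sees a request object.
function pickTerms(rounds: number): TermRow[] {
  if (!Number.isInteger(rounds) || rounds <= 0) {
    throw new RangeError("rounds must be a positive integer");
  }
  return findTerms(rounds);
}

// Controller: HTTP concerns only. It validates input, calls the service and
// shapes the response. It never reads FAKE_DB directly.
type HttpResponse = { status: number; body: unknown };

function handleStart(requestBody: unknown): HttpResponse {
  const rounds = (requestBody as { rounds?: unknown } | null | undefined)?.rounds;
  if (typeof rounds !== "number" || !Number.isInteger(rounds) || rounds <= 0) {
    return { status: 400, body: { error: "rounds must be a positive integer" } };
  }
  return { status: 200, body: pickTerms(rounds) };
}
```

Each function only calls the one directly below it, so swapping `FAKE_DB` for a real query changes `findTerms` and nothing else.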

### Monorepo Package Responsibilities

| Package | Owns |
|---------|------|
| `packages/shared` | Zod schemas, constants, derived TypeScript types |
| `packages/db` | Drizzle schema, DB connection, all model/query functions |
| `apps/api` | Router, controllers, services |
| `apps/web` | React frontend, consumes types from shared |

**Key principle:** all database code lives in `packages/db`. `apps/api` never imports
`drizzle-orm` for queries — it only calls functions exported from `packages/db`.

---

## Problems Faced & Solutions

- Problem 1: Messy API structure
  **Symptom:** responsibilities bleeding across layers — DB code in controllers, business
  logic in routes.
  **Solution:** strict layered architecture with one responsibility per layer.
- Problem 2: No shared contract between API and frontend
  **Symptom:** API could return different shapes silently, frontend breaks at runtime.
  **Solution:** Zod schemas in `packages/shared` as the single source of truth. Both API
  (validation) and frontend (type inference) consume the same schemas.
- Problem 3: Type safety gaps
  **Symptom:** TypeScript `any` types on model parameters, `Number` vs `number` confusion.
  **Solution:** derived types from constants using `typeof CONSTANT[number]` pattern.
  All valid values defined once in constants, types derived automatically.
- Problem 4: `getGameTerms` in wrong package
  **Symptom:** model queries living in `apps/api/src/models/` meant `apps/api` had a
  direct `drizzle-orm` dependency and was accessing the DB itself.
  **Solution:** moved models folder to `packages/db/src/models/`. All Drizzle code now
  lives in one package.
- Problem 5: Deck generation complexity
  **Initial assumption:** 12 decks needed (nouns/verbs × easy/intermediate/hard × en/it).
  **Correction:** decks are pools, not presets. POS and difficulty are query filters applied
  at runtime — not deck properties. Only 2 decks needed (en-core, it-core).
  **Final decision:** skip deck generation entirely for MVP. Query the terms table directly
  with difficulty + POS filters. Revisit post-MVP when spaced repetition or progression
  features require curated pools.
- Problem 6: GAME_ROUNDS type conflict
  **Problem:** `z.enum()` only accepts strings. `GAME_ROUNDS = ["3", "10"]` works with
  `z.enum()` but requires `Number(rounds)` conversion in the service.
  **Decision:** keep as strings, convert to number in the service before passing to the
  model. Documented coupling acknowledged with a comment.
- Problem 7: Gloss join could multiply question rows. Schema allowed multiple glosses per term per language, so the left join would duplicate rows. Fixed by tightening the unique constraint.
- Problem 8: Model leaked quiz semantics. Return fields were named prompt / answer, baking service-layer quiz concepts into the database layer. Renamed to neutral field names.
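
Problems 3 and 6 can be illustrated together. The sketch below shows the derived-type pattern and the string-to-number conversion for rounds. Only `GAME_ROUNDS = ["3", "10"]` is quoted from the text; the language codes and difficulty levels are assumed from the en/it and easy/intermediate/hard values mentioned under Problem 5, and `toRoundCount`, `isSupportedLanguageCode` and `describeGame` are hypothetical helper names.

```typescript
// Constants declared once with `as const` so TypeScript keeps the literal values.
// The language codes and difficulty levels are illustrative assumptions.
const SUPPORTED_LANGUAGE_CODES = ["en", "it"] as const;
const DIFFICULTY_LEVELS = ["easy", "intermediate", "hard"] as const;
const GAME_ROUNDS = ["3", "10"] as const; // kept as strings so they can feed z.enum()

// Derived types: indexing the tuple type with `number` yields a union of its elements.
type SupportedLanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number]; // "en" | "it"
type DifficultyLevel = (typeof DIFFICULTY_LEVELS)[number];
type GameRounds = (typeof GAME_ROUNDS)[number]; // "3" | "10"

// Hypothetical service-side helper: the request carries rounds as a string,
// while the model expects a primitive number.
function toRoundCount(rounds: GameRounds): number {
  return Number(rounds);
}

// Runtime guard built from the same constant, so the check cannot drift from the type.
function isSupportedLanguageCode(value: string): value is SupportedLanguageCode {
  return SUPPORTED_LANGUAGE_CODES.some((code) => code === value);
}

// Hypothetical consumer showing that only the declared literals type-check.
function describeGame(
  language: SupportedLanguageCode,
  level: DifficultyLevel,
  rounds: GameRounds,
): string {
  return `${language}/${level}/${toRoundCount(rounds)} rounds`;
}
```

Adding a value to one of the arrays widens the corresponding type automatically, which is the point of defining the valid values only once.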

---

## Decisions Made

- Zod schemas belong in `packages/shared`
  Both the API and frontend import from the same schemas. If the shape changes, TypeScript
  compilation fails in both places, so drift between the two is caught at build time instead of surfacing at runtime.
- Server-side answer evaluation
  The correct answer is never sent to the frontend in `QuizQuestion`. It is only revealed
  in `AnswerResult` after the client submits. Prevents cheating and keeps game logic
  authoritative on the server.
- `safeParse` over `parse` in controllers
  `parse` throws a raw Zod error → ugly 500 response. `safeParse` returns a result object
  → clean 400 with early return. Global error handler to be implemented later (Step 6 of
  roadmap) will centralise this pattern.
- POST not GET for game start
  A `GET` request body has no defined semantics, and many clients and proxies drop it.
  Game configuration is submitted as a JSON body, so `POST` is the appropriate method.
- `express.json()` middleware required
  Without it, `req.body` is `undefined`. Added to `createApp()` in `app.ts`.
- Type naming: PascalCase
  TypeScript convention. `supportedLanguageCode` → `SupportedLanguageCode` etc.
- Primitive types: always lowercase
  `number` not `Number`, `string` not `String`. The uppercase versions are object wrappers
  and not assignable to Drizzle's expected primitive types.
- Model parameters use shared types, not `GameRequestType`
  The model layer should not know about `GameRequestType` — that's an HTTP boundary concern.
  Instead, parameters are typed using the derived constant types (`SupportedLanguageCode`,
  `SupportedPos`, `DifficultyLevel`) exported from `packages/shared`.
- One gloss per term per language. The unique constraint on term_glosses was tightened from (term_id, language_code, text) to (term_id, language_code) to prevent the left join from multiplying question rows. Revisit if multiple glosses per language are ever needed (e.g. register or domain variants).
- Model returns neutral field names, not quiz semantics. getGameTerms returns sourceText / targetText / sourceGloss rather than prompt / answer / gloss. Quiz semantics are applied in the service layer. Keeps the model reusable for non-quiz features.
- Asymmetric difficulty filter. Difficulty is filtered on the target (answer) side only. A word can be A2 in Italian but B1 in English, and what matters is the difficulty of the word being learned.
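
The neutral-field-names decision implies a small mapping step in the service layer. A minimal sketch of that step follows, assuming a hypothetical `PromptAnswerPair` shape and `toPromptAnswerPair` helper; the real `QuizQuestion` schema is still planned and may look different.

```typescript
// Mirrors the model's return type as described in this commit.
type TranslationPairRow = {
  termId: string;
  sourceText: string;
  targetText: string;
  sourceGloss: string | null;
};

// Hypothetical service-layer shape; not the final QuizQuestion schema.
type PromptAnswerPair = {
  termId: string;
  prompt: string;
  answer: string;
  gloss?: string;
};

// Quiz semantics are applied here, in the service layer, not in the model.
function toPromptAnswerPair(row: TranslationPairRow): PromptAnswerPair {
  const pair: PromptAnswerPair = {
    termId: row.termId,
    prompt: row.sourceText, // the word shown to the player
    answer: row.targetText, // the translation the player has to pick
  };
  // The gloss comes from a left join, so it is null when no gloss exists.
  if (row.sourceGloss !== null) {
    pair.gloss = row.sourceGloss;
  }
  return pair;
}
```

Because the model only exposes `sourceText` and `targetText`, a non-quiz feature such as a dictionary lookup could reuse the same query without inheriting quiz vocabulary.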

---

## Data Pipeline Work (Pre-API)

### CEFR Enrichment Pipeline (completed)

A staged ETL pipeline was built to enrich translation records with CEFR levels and
difficulty ratings:

```
Raw source files
    ↓
extract-*.py   — normalise each source to standard JSON
    ↓
compare-*.py   — quality gate: surface conflicts between sources (read-only)
    ↓
merge-*.py     — resolve conflicts by source priority, derive difficulty
    ↓
enrich.ts      — write cefr_level + difficulty to DB translations table
```

**Source priority:**
- English: `en_m3` > `cefrj` > `octanove` > `random`
- Italian: `it_m3` > `italian`
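
Conflict resolution by source priority can be sketched as follows. This is not the actual `merge-*.py` code, which is written in Python and not shown in this commit; it is a TypeScript illustration of the idea only. The `CefrVote` record shape and the `mergeByPriority` name are assumptions, while the priority lists are the ones quoted above.

```typescript
// Hypothetical record shape: one CEFR level for a word, reported by one source.
type CefrVote = { word: string; source: string; cefr: string };

// Priority lists quoted from the text above; earlier entries win.
const SOURCE_PRIORITY: Record<string, readonly string[]> = {
  en: ["en_m3", "cefrj", "octanove", "random"],
  it: ["it_m3", "italian"],
};

// For each word, keep the vote from the highest-priority source.
function mergeByPriority(
  language: string,
  votes: readonly CefrVote[],
): Map<string, CefrVote> {
  const priority = SOURCE_PRIORITY[language];
  if (priority === undefined) {
    throw new Error(`no source priority defined for language: ${language}`);
  }
  // Lower rank means higher priority; unknown sources rank last.
  const rank = (source: string): number => {
    const index = priority.indexOf(source);
    return index === -1 ? priority.length : index;
  };
  const merged = new Map<string, CefrVote>();
  for (const vote of votes) {
    const current = merged.get(vote.word);
    if (current === undefined || rank(vote.source) < rank(current.source)) {
      merged.set(vote.word, vote);
    }
  }
  return merged;
}
```

How the merged CEFR level is turned into a difficulty rating is not specified here, so that step is left out of the sketch.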

**Enrichment results:**

| Language | Enriched | Total | Coverage |
|----------|----------|-------|----------|
| English | 42,527 | 171,394 | ~25% |
| Italian | 23,061 | 54,603 | ~42% |

Both languages have sufficient coverage for MVP. Italian C2 has only 242 terms — noted
as a potential constraint for the distractor algorithm at high difficulty.

---

## API Schemas (packages/shared)

### `GameRequestSchema` (implemented)
```typescript
{
  source_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  target_language: z.enum(SUPPORTED_LANGUAGE_CODES),
  pos: z.enum(SUPPORTED_POS),
  difficulty: z.enum(DIFFICULTY_LEVELS),
  rounds: z.enum(GAME_ROUNDS),
}
```

### Planned schemas (not yet implemented)
```
QuizQuestion     — prompt, optional gloss, 4 options (no correct answer)
QuizOption       — optionId + text
AnswerSubmission — questionId + selectedOptionId
AnswerResult     — correct boolean, correctOptionId, selectedOptionId
```
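
The planned shapes and the server-side evaluation rule can be sketched with plain TypeScript types. The field lists come from the outline above, with `questionId` on `QuizQuestion` taken from the roadmap's "Generate `questionId` per question" item. The in-memory `answerKey` map and the `registerQuestion` helper are assumptions: how the server remembers correct answers is still an open question in this document.

```typescript
type QuizOption = { optionId: string; text: string };
type QuizQuestion = {
  questionId: string;
  prompt: string;
  gloss?: string;
  options: QuizOption[]; // 4 options, none marked as correct
};
type AnswerSubmission = { questionId: string; selectedOptionId: string };
type AnswerResult = {
  correct: boolean;
  correctOptionId: string;
  selectedOptionId: string;
};

// Hypothetical server-side store: questionId -> correct optionId.
// It never leaves the server, so the client cannot read the answer early.
const answerKey = new Map<string, string>();

// Records the correct option and returns the client-safe question unchanged.
function registerQuestion(question: QuizQuestion, correctOptionId: string): QuizQuestion {
  answerKey.set(question.questionId, correctOptionId);
  return question;
}

// The correct option is revealed only here, after the client has submitted.
function evaluateAnswer(submission: AnswerSubmission): AnswerResult {
  const correctOptionId = answerKey.get(submission.questionId);
  if (correctOptionId === undefined) {
    throw new Error(`unknown questionId: ${submission.questionId}`);
  }
  return {
    correct: submission.selectedOptionId === correctOptionId,
    correctOptionId,
    selectedOptionId: submission.selectedOptionId,
  };
}
```

Once the Zod schemas exist, these hand-written types would presumably be replaced by types inferred from the schemas in `packages/shared`.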

---

## API Endpoints

```
POST /api/v1/game/start    GameRequest → QuizQuestion[]
POST /api/v1/game/answer   AnswerSubmission → AnswerResult
```

---

## Current File Structure (apps/api)

```
apps/api/src/
├── app.ts                 — Express app, express.json() middleware
├── server.ts              — starts server on PORT
├── routes/
│   ├── apiRouter.ts       — mounts /health and /game routers
│   ├── gameRouter.ts      — POST /start → createGame controller
│   └── healthRouter.ts
├── controllers/
│   └── gameController.ts  — validates GameRequest, calls service
└── services/
    └── gameService.ts     — calls getGameTerms, returns raw rows
```

---

## Current File Structure (packages/db)

```
packages/db/src/
├── db/
│   └── schema.ts          — Drizzle schema (terms, translations, users, decks...)
├── models/
│   └── termModel.ts       — getGameTerms() query
└── index.ts               — exports db connection + getGameTerms
```

---

## Completed Tasks

- [x] Layered architecture established and understood
- [x] `GameRequestSchema` defined in `packages/shared`
- [x] Derived types (`SupportedLanguageCode`, `SupportedPos`, `DifficultyLevel`) exported from constants
- [x] `getGameTerms()` model implemented with POS / language / difficulty / limit filters
- [x] Model correctly placed in `packages/db`
- [x] `prepareGameQuestions()` service skeleton calling the model
- [x] `createGame` controller with Zod `safeParse` validation
- [x] `POST /api/v1/game/start` route wired
- [x] End-to-end pipeline verified with test script — returns correct rows
- [x] CEFR enrichment pipeline complete for English and Italian
- [x] Double join on translations implemented (source + target language)
- [x] Gloss left join implemented
- [x] Model return type uses neutral field names (sourceText, targetText, sourceGloss)
- [x] Schema: gloss unique constraint tightened to one gloss per term per language

---

## Roadmap Ahead

### Step 1 — Learn SQL fundamentals (in progress)
Concepts needed: SELECT, FROM, JOIN, WHERE, LIMIT.
Resources: sqlzoo.net or Khan Academy SQL section.
Required before: implementing the double join for source language prompt.

### Step 2 — Complete the model layer
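- Double join on `translations`: once for the source language (prompt), once for the target language (answer). Implemented in `getGameTerms()`; see Completed Tasks.
- `GlossModel.getGloss(termId, languageCode)`: fetch gloss if available. For game queries this is now covered by the left join on `term_glosses` in `getGameTerms()`.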

### Step 3 — Define remaining Zod schemas
- `QuizQuestion`, `QuizOption`, `AnswerSubmission`, `AnswerResult` in `packages/shared`

### Step 4 — Complete the service layer
- `QuizService.buildSession()` — assemble raw rows into `QuizQuestion[]`
  - Generate `questionId` per question
  - Map source language translation as prompt
  - Attach gloss if available
  - Fetch 3 distractors (same POS, different term, same difficulty)
  - Shuffle options so correct answer is not always in same position
- `QuizService.evaluateAnswer()` — validate correctness, return `AnswerResult`
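
The distractor and shuffle steps above can be sketched as follows. The `shuffle` and `buildOptions` names, the `opt-N` id format and the decision to throw when fewer than 3 distractors are available are all assumptions; the fallback behaviour for small pools is still listed under Open Questions.

```typescript
type QuizOption = { optionId: string; text: string };

// Fisher-Yates shuffle on a copy. `random` is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

// Hypothetical helper: combine the correct answer with 3 distractors and shuffle,
// so the correct answer is not always in the same position.
function buildOptions(
  answer: string,
  candidates: readonly string[],
  random: () => number = Math.random,
): { options: QuizOption[]; correctOptionId: string } {
  // A distractor must never equal the correct answer; duplicates are dropped.
  const pool = Array.from(new Set(candidates)).filter((text) => text !== answer);
  if (pool.length < 3) {
    throw new Error("not enough distinct distractors"); // assumed behaviour, see Open Questions
  }
  const distractors = shuffle(pool, random).slice(0, 3);
  const texts = shuffle([answer, ...distractors], random);
  const options = texts.map((text, index) => ({ optionId: `opt-${index + 1}`, text }));
  const correctOptionId = `opt-${texts.indexOf(answer) + 1}`;
  return { options, correctOptionId };
}
```

The candidates are assumed to be already filtered by POS and difficulty in the model layer; `correctOptionId` stays on the server and is not part of the `QuizQuestion` sent to the client.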

### Step 5 — Implement answer endpoint
- `POST /api/v1/game/answer` route, controller, service method

### Step 6 — Global error handler
- Typed error classes (`ValidationError`, `NotFoundError`)
- Central error middleware in `app.ts`
- Remove temporary `safeParse` error handling from controllers
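
A framework-free sketch of the typed error classes planned in Step 6. The `AppError` base class, the `statusCode` field and the `toErrorResponse` helper are assumptions made for illustration; only the `ValidationError` and `NotFoundError` names come from the roadmap.

```typescript
// Hypothetical base class carrying the HTTP status an error should map to.
class AppError extends Error {
  readonly statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    this.name = new.target.name;
    this.statusCode = statusCode;
    // Keeps `instanceof` working when the code is compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

// Core of a central error handler: known errors keep their status and message,
// anything else becomes a generic 500 so internal details are not leaked.
function toErrorResponse(err: unknown): { status: number; body: { error: string } } {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  return { status: 500, body: { error: "Internal server error" } };
}
```

In Express, a function like `toErrorResponse` would typically be called from a four-argument error middleware `(err, req, res, next)` registered after all routes in `app.ts`.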

### Step 7 — Tests
- Unit tests for `QuizService` — correct POS filtering, distractor never equals correct answer
- Unit tests for `evaluateAnswer` — correct and incorrect cases
- Integration tests for both endpoints

### Step 8 — Auth (Phase 2 from original roadmap)
- OpenAuth integration
- JWT validation middleware
- `GET /api/auth/me` endpoint
- Frontend auth guard

---

## Open Questions

- **Distractor algorithm:** when Italian C2 has only 242 terms, should the difficulty
  filter fall back gracefully or return an error? Decision needed before implementing
  `buildSession()`.
- **Session statefulness:** game loop is currently stateless (fetch all questions upfront).
  Confirm this is still the intended MVP approach before building `buildSession()`.
@@ -5,6 +5,15 @@
 - pinning dependencies in package.json files
 - rethink organisation of datafiles and wordlists
 
+## notes
+
+- backend advice: https://github.com/MohdOwaisShah/backend
+- openapi
+- bruno for api testing
+- tailscale
+- husky/lint-staged
+- musicforprogramming.net
+
 ## openwordnet
 
 download libraries via
@@ -44,17 +53,3 @@ list all libraries:
 ```bash
 python -c "import wn; print(wn.lexicons())"
 ```
-
-## drizzle
-
-generate migration file, go to packages/db, then:
-
-```bash
-pnpm drizzle-kit generate
-```
-
-execute migration, go to packages/db (docker containers need to be running):
-
-```bash
-DATABASE_URL=postgresql://username:password@localhost:5432/database pnpm drizzle-kit migrate
-```
@@ -51,11 +51,7 @@ export const term_glosses = pgTable(
     created_at: timestamp({ withTimezone: true }).defaultNow().notNull(),
   },
   (table) => [
-    unique("unique_term_gloss").on(
-      table.term_id,
-      table.language_code,
-      table.text,
-    ),
+    unique("unique_term_gloss").on(table.term_id, table.language_code),
     check(
       "language_code_check",
       sql`${table.language_code} IN (${sql.raw(SUPPORTED_LANGUAGE_CODES.map((l) => `'${l}'`).join(", "))})`,
@@ -1,6 +1,7 @@
 import { db } from "@glossa/db";
-import { eq, and } from "drizzle-orm";
-import { terms, translations } from "@glossa/db/schema";
+import { eq, and, isNotNull, sql } from "drizzle-orm";
+import { terms, translations, term_glosses } from "@glossa/db/schema";
+import { alias } from "drizzle-orm/pg-core";
 
 import type {
   SupportedLanguageCode,
@@ -8,25 +9,73 @@ import type {
   DifficultyLevel,
 } from "@glossa/shared";
 
+export type TranslationPairRow = {
+  termId: string;
+  sourceText: string;
+  targetText: string;
+  sourceGloss: string | null;
+};
+
+// Note: difficulty filter is intentionally asymmetric. We filter on the target
+// (answer) side only — a word can be A2 in Italian but B1 in English, and what
+// matters for the learner is the difficulty of the word they're being taught.
+
 export const getGameTerms = async (
   sourceLanguage: SupportedLanguageCode,
   targetLanguage: SupportedLanguageCode,
   pos: SupportedPos,
   difficulty: DifficultyLevel,
-  count: number,
-) => {
+  rounds: number,
+): Promise<TranslationPairRow[]> => {
+  const sourceTranslations = alias(translations, "source_translations");
+  const targetTranslations = alias(translations, "target_translations");
+
   const rows = await db
-    .select()
+    .select({
+      termId: terms.id,
+      sourceText: sourceTranslations.text,
+      targetText: targetTranslations.text,
+      sourceGloss: term_glosses.text,
+    })
     .from(terms)
-    .innerJoin(translations, eq(translations.term_id, terms.id))
+    .innerJoin(
+      sourceTranslations,
+      and(
+        eq(sourceTranslations.term_id, terms.id),
+        eq(sourceTranslations.language_code, sourceLanguage), // language filter lives in the join condition
+      ),
+    )
+    .innerJoin(
+      targetTranslations,
+      and(
+        eq(targetTranslations.term_id, terms.id),
+        eq(targetTranslations.language_code, targetLanguage), // language filter lives in the join condition
+      ),
+    )
+    .leftJoin(
+      term_glosses,
+      and(
+        eq(term_glosses.term_id, terms.id),
+        eq(term_glosses.language_code, sourceLanguage),
+      ),
+    )
     .where(
       and(
         eq(terms.pos, pos),
-        eq(translations.language_code, targetLanguage),
-        eq(translations.difficulty, difficulty),
+        eq(targetTranslations.difficulty, difficulty),
+        isNotNull(sourceTranslations.difficulty), // data-quality check: skip source rows with no difficulty rating
       ),
     )
-    .limit(count);
+    // TODO(post-mvp): ORDER BY RANDOM() sorts the entire filtered result set before
+    // applying LIMIT, which is fine at current data volumes (low thousands of rows
+    // after POS + difficulty filters) but degrades as the terms table grows. Once
+    // the database is fully populated and tagged, replace with one of:
+    // - TABLESAMPLE BERNOULLI(n) for approximate sampling on large tables
+    // - Random offset: SELECT ... OFFSET floor(random() * (SELECT count(*) ...))
+    // - Pre-computed random column with a btree index, reshuffled periodically
+    // Benchmark first — don't optimise until it actually hurts.
+    .orderBy(sql`RANDOM()`)
+    .limit(rounds);
 
   return rows;
 };