lila/documentation/decisions.md
2026-03-31 10:06:06 +02:00

237 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Decisions Log
A record of non-obvious technical decisions made during development, with reasoning. Intended to preserve context across sessions.
---
## Tooling
### Monorepo: pnpm workspaces (not Turborepo)
Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, plain pnpm workspace commands are sufficient and there is one less tool to configure and maintain.
### TypeScript runner: `tsx` (not `ts-node`)
`tsx` is faster, requires no configuration, and uses esbuild under the hood. `ts-node` is older and more complex to configure. `tsx` does not do type checking — that is handled separately by `tsc` and the editor. Installed as a dev dependency in `apps/api` only.
### ORM: Drizzle (not Prisma)
Drizzle is lighter — no binary, no engine. Queries map closely to SQL. Migrations are plain SQL files. Works naturally with Zod for type inference. Prisma would add Docker complexity (engine binary in containers) and abstraction that is not needed for this schema.
### WebSocket: `ws` library (not Socket.io)
For rooms of 24 players, Socket.io's room management, transport fallbacks, and reconnection abstractions are unnecessary overhead. The WS protocol is defined explicitly as a Zod discriminated union in `packages/shared`, giving the same type safety guarantees. Reconnection logic is deferred to Phase 7.
### Auth: OpenAuth (not rolling own JWT)
All auth delegated to OpenAuth service at `auth.yourdomain.com`. Providers: Google, GitHub. The API validates the JWT on every protected request. User rows are created or updated on first login via the `sub` claim as the primary key.
---
## Docker
### Multi-stage builds for monorepo context
Both `apps/web` and `apps/api` use multi-stage Dockerfiles (`deps`, `dev`, `builder`, `runner`) because:
- The monorepo structure requires copying `pnpm-workspace.yaml`, root `package.json`, and cross-dependencies (`packages/shared`, `packages/db`) before installing
- `node_modules` paths differ between host and container due to workspace hoisting
- Stages allow caching `pnpm install` separately from source code changes
### Vite as dev server (not Nginx)
In development, `apps/web` uses `vite dev` directly, not Nginx. Reasons:
- Hot Module Replacement (HMR) requires Vite's WebSocket dev server
- Source maps and error overlay need direct Vite integration
- Nginx would add unnecessary proxy complexity for local dev
Production will use Nginx to serve static Vite build output.
---
## Architecture
### Express app structure: factory function pattern
`app.ts` exports a `createApp()` factory function. `server.ts` imports it and calls `.listen()`. This allows tests to import the app directly without starting a server, keeping tests isolated and fast.
### Data model: `decks` separate from `terms` (not frequency_rank filtering)
**Original approach:** Store `frequency_rank` on `terms` table and filter by rank range for difficulty.
**Problem discovered:** WordNet/OMW frequency data is unreliable for language learning. Extraction produced results like:
- Rank 1: "In" → "indio" (chemical symbol: Indium)
- Rank 2: "Be" → "berillio" (chemical symbol: Beryllium)
- Rank 7: "He" → "elio" (chemical symbol: Helium)
These are technically "common" in WordNet (every element is a noun) but useless for vocabulary learning.
**Decision:**
- `terms` table stores ALL available OMW synsets (raw data, no frequency filtering)
- `decks` table stores curated learning lists (A1, A2, B1, "Most Common 1000", etc.)
- `deck_terms` junction table links terms to decks with position ordering
- `rooms.deck_id` specifies which vocabulary deck a game uses
**Benefits:**
- Curricula can come from external sources (CEFR lists, Oxford 3000, SUBTLEX)
- Bad data (chemical symbols, obscure words) excluded at deck level, not schema level
- Users can create custom decks later
- Multiple difficulty levels without schema changes
### Multiplayer mechanic: simultaneous answers (not buzz-first)
All players see the same question at the same time and submit independently. The server waits for all answers or a 15-second timeout, then broadcasts the result. This keeps the experience Duolingo-like and symmetric. A buzz-first mechanic was considered and rejected.
### Room model: room codes (not matchmaking queue)
Players create rooms and share a human-readable code (e.g. `WOLF-42`) to invite friends. Auto-matchmaking via a queue is out of scope for MVP. Valkey is included in the stack and can support a queue in a future phase.
---
## TypeScript Configuration
### Base config: no `lib`, `module`, or `moduleResolution`
These are intentionally omitted from `tsconfig.base.json` because different packages need different values — `apps/api` uses `NodeNext`, `apps/web` uses `ESNext`/`bundler` (Vite), and mixing them in the base caused errors. Each package declares its own.
### `outDir: "./dist"` per package
The base config originally had `outDir: "dist"` which resolved relative to the base file location, pointing to the root `dist` folder. Overridden in each package with `"./dist"` to ensure compiled output stays inside the package.
### `apps/web` tsconfig: deferred to Vite scaffold
The web tsconfig was left as a placeholder and filled in after `pnpm create vite` generated `tsconfig.json`, `tsconfig.app.json`, and `tsconfig.node.json`. The generated files were then trimmed to remove options already covered by the base.
### `rootDir: "."` on `apps/api`
Set explicitly to allow `vitest.config.ts` (which lives outside `src/`) to be included in the TypeScript program. Without it, TypeScript infers `rootDir` as `src/` and rejects any file outside that directory.
---
## ESLint
### Two-config approach for `apps/web`
The root `eslint.config.mjs` handles TypeScript linting across all packages. `apps/web/eslint.config.js` is kept as a local addition for React-specific plugins only: `eslint-plugin-react-hooks` and `eslint-plugin-react-refresh`. ESLint flat config merges them automatically by directory proximity — no explicit import between them needed.
### Coverage config at root only
Vitest coverage configuration lives in the root `vitest.config.ts` only. Individual package configs omit it to produce a single aggregated report rather than separate per-package reports.
### `globals: true` with `"types": ["vitest/globals"]`
Using Vitest globals (`describe`, `it`, `expect` without imports) requires `"types": ["vitest/globals"]` in each package's tsconfig `compilerOptions`. Added to `apps/api`, `packages/shared`, and `packages/db`. Added to `apps/web/tsconfig.app.json`.
---
## Known Issues / Dev Notes
### glossa-web has no healthcheck
The `web` service in `docker-compose.yml` has no `healthcheck` defined. Reason: Vite's dev server (`vite dev`) has no built-in health endpoint. Unlike the API's `/api/health`, there's no URL to poll.
Workaround: `depends_on` uses `api` healthcheck as proxy. For production (Nginx), add a health endpoint or use TCP port check.
### Valkey memory overcommit warning
Valkey logs this on start in development:
```text
WARNING Memory overcommit must be enabled for proper functionality
```
This is **harmless in dev** but should be fixed before production. The warning appears because Docker containers don't inherit host sysctl settings by default.
Fix: Add to host `/etc/sysctl.conf`:
```conf
vm.overcommit_memory = 1
```
Then `sudo sysctl -p` or restart Docker.
---
## Data Model
### Users: internal UUID + openauth_sub (not sub as PK)
**Original approach:** Use OpenAuth `sub` claim directly as `users.id` (text primary key).
**Problem:** Embeds auth provider in the primary key (e.g. `"google|12345"`). If OpenAuth changes format or a second provider is added, the PK cascades through all FKs (`rooms.host_id`, `room_players.user_id`).
**Decision:**
- `users.id` = internal UUID (stable FK target)
- `users.openauth_sub` = text UNIQUE (auth provider claim)
- Allows adding multiple auth providers per user later without FK changes
### Rooms: `updated_at` for stale recovery only
Most tables omit `updated_at` (unnecessary for MVP). `rooms.updated_at` is kept specifically for stale room recovery—identifying rooms stuck in `in_progress` status after server crashes.
### Translations: UNIQUE (term_id, language_code, text)
Allows multiple synonyms per language per term (e.g. "dog", "hound" for same synset). Prevents exact duplicate rows. Homonyms (e.g. "Lead" metal vs. "Lead" guide) are handled by different `term_id` values (different synsets), so no constraint conflict.
### Decks: `pair_id` is nullable
`decks.pair_id` references `language_pairs` but is nullable. Reasons:
- Single-language decks (e.g. "English Grammar")
- Multi-pair decks (e.g. "Cognates" spanning EN-IT and EN-FR)
- System decks (created by app, not tied to specific user)
### Decks separate from terms (not frequency_rank filtering)
**Original approach:** Store `frequency_rank` on `terms` table and filter by rank range for difficulty.
**Problem discovered:** WordNet/OMW frequency data is unreliable for language learning. Extraction produced results like:
- Rank 1: "In" → "indio" (chemical symbol: Indium)
- Rank 2: "Be" → "berillio" (chemical symbol: Beryllium)
- Rank 7: "He" → "elio" (chemical symbol: Helium)
These are technically "common" in WordNet (every element is a noun) but useless for vocabulary learning.
**Decision:**
- `terms` table stores ALL available OMW synsets (raw data, no frequency filtering)
- `decks` table stores curated learning lists (A1, A2, B1, "Most Common 1000", etc.)
- `deck_terms` junction table links terms to decks with position ordering
- `rooms.deck_id` specifies which vocabulary deck a game uses
**Benefits:**
- Curricula can come from external sources (CEFR lists, Oxford 3000, SUBTLEX)
- Bad data (chemical symbols, obscure words) excluded at deck level, not schema level
- Users can create custom decks later
- Multiple difficulty levels without schema changes
---
## Current State
### Completed checkboxes (Phase 0)
- [x] Initialise pnpm workspace monorepo: `apps/web`, `apps/api`, `packages/shared`, `packages/db`
- [x] Configure TypeScript project references across packages
- [x] Set up ESLint + Prettier with shared configs in root
- [x] Set up Vitest in `api` and `web` and both packages
- [x] Scaffold Express app with `GET /api/health`
- [x] Scaffold Vite + React app with TanStack Router (single root route)
- [x] Configure Drizzle ORM + connection to local PostgreSQL
- [x] Write first migration (empty — just validates the pipeline works)
- [x] `docker-compose.yml` for local dev: `api`, `web`, `postgres`, `valkey`
- [x] `.env.example` files for `apps/api` and `apps/web`
- [x] update decisions.md
Phase 0 is finished with this.
### Current checkpoint
- [ ] Run `scripts/extract_omw.py` locally → generates `packages/db/src/seed.json`